
web 2.0


30
Aug 09

Key Value Store List


While playing with CouchDB, I decided to expand on the subject and research what else is out there. Apparently there are lots of cool implementations of key-value stores. And since trying them all would take a long time, I decided to make some notes on what I found, to simplify my journey.

Below is a simple reference list of different non-commercial key-value store implementations. Let me know if there are more interesting projects that are not on this list, so we can keep it updated.

Tokyo Cabinet

A C library that implements a very fast and space-efficient key-value store:

The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. There is neither concept of data tables nor data types. Records are organized in hash table, B+ tree, or fixed-length array.
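Why does the choice between a hash table and a B+ tree matter? A hash layout answers point lookups quickly but keeps no key order, while a tree layout keeps keys sorted and so supports range and prefix scans. Below is a minimal Java sketch of that trade-off. Tokyo Cabinet itself is a C library and this is not its API: HashMap and TreeMap stand in for the hash and B+ tree engines (TreeMap is really a red-black tree, but it shares the property that matters here, sorted keys), and the class name TwoLayouts is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class TwoLayouts {
    private final Map<String, String> hash = new HashMap<>();     // stand-in for the hash-table engine
    private final TreeMap<String, String> tree = new TreeMap<>(); // stand-in for the B+ tree engine

    // Store every record in both layouts so they can be compared side by side.
    void put(String key, String value) {
        hash.put(key, value);
        tree.put(key, value);
    }

    // Point lookup: both layouts can do it; the hash table does it in expected O(1).
    String get(String key) {
        return hash.get(key);
    }

    // Range scan over keys in [from, to): cheap only because the tree keeps keys sorted.
    SortedMap<String, String> range(String from, String to) {
        return tree.subMap(from, to);
    }
}
```

For example, `range("user:", "user;")` returns every key starting with `user:` in sorted order (`;` is the character right after `:`), which a pure hash layout could only answer by scanning all records.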

Besides bindings for Ruby, there are also APIs for Perl, Java, and Lua available.

To share Tokyo Cabinet across machines, Tokyo Tyrant provides a server for concurrent and remote connections.

Speed and efficiency are two consistent themes for Tokyo Cabinet. Benchmarks show that it takes only 0.7 seconds to store 1 million records in the regular hash table and 1.6 seconds in the B-Tree engine. To achieve this, the overhead per record is kept as low as possible, ranging between 5 and 20 bytes: 5 bytes for the B-Tree engine, 16-20 bytes for the hash-table engine. And if small overhead is not enough, Tokyo Cabinet also has native support for Lempel-Ziv or BWT compression algorithms, which can reduce your database to ~25% of its size (a typical text compression rate). Oh, and did I mention that it is thread safe (uses pthreads) and offers row-level locking?
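The ~25% figure depends heavily on your data, so it is worth measuring before relying on it. Here is a small Java sketch, assuming you can dump representative values as text, that estimates the ratio with java.util.zip.Deflater (DEFLATE, an LZ77-based Lempel-Ziv algorithm plus Huffman coding). It does not call Tokyo Cabinet; the class name CompressionCheck and the sample JSON record are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class CompressionCheck {
    // Compresses bytes with DEFLATE at the highest compression level.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Compressed size as a fraction of the original; 0.25 would mean "25% of its size".
    static double ratio(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        if (raw.length == 0) return 1.0; // nothing to compress
        return (double) deflate(raw).length / raw.length;
    }

    public static void main(String[] args) {
        String sample = "{\"name\":\"John\",\"city\":\"Boston\"}\n".repeat(1000);
        System.out.printf("compressed to %.1f%% of original%n", 100 * ratio(sample));
    }
}
```

The repeated sample is a best case; feeding in real values gives a far more honest number.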

good intro: Tokyo Cabinet Beyond Key Value Store

Project Voldemort

Voldemort is a very cool project that comes out of LinkedIn. They even seem to provide a full-time developer for development and support via a mailing list. Kudos to them, because Voldemort, as far as I can tell, is great. Best of all, it scales. You can add servers to the cluster, you don’t do any client-side hashing, and throughput increases as the size of the cluster increases. As far as I can tell, you can handle any increase in requests by adding servers, and those servers are fault tolerant, so a dead server doesn’t bring down the cluster.
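How can a cluster grow without reshuffling all of its data? Dynamo-style stores, Voldemort included, partition keys with consistent hashing. The sketch below is a generic Java consistent-hash ring with virtual nodes, not Voldemort's code; the class name HashRing, the MD5-based positions and the node names are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // ring position -> node name
    private final int virtualNodes;                              // points per node, to even out load

    HashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Position on the ring: the first 8 bytes of the MD5 digest, read as a long.
    static long position(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) ring.put(position(node + "#" + i), node);
    }

    // A key belongs to the first node point at or after its position, wrapping around.
    // Requires at least one node on the ring.
    String nodeFor(String key) {
        Map.Entry<Long, String> owner = ring.ceilingEntry(position(key));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }
}
```

When a fourth node joins a three-node ring, it only takes over the ranges just before its own points: every key either stays where it was or moves to the new node, instead of most keys moving as they would with a plain `hash % nodeCount` placement.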

Voldemort does have a downside for me: I primarily use Ruby and the provided client is written in Java, so you either have to use JRuby (which is awesome but not always realistic) or Facebook’s Thrift to interact with Voldemort. This means Thrift has to be compiled on all of your machines, and since Thrift uses the Boost C++ libraries, which are both slow and painful to compile, the deployment effort for Voldemort apps increases significantly.

Voldemort is also interesting because it has a pluggable storage backend, and the bulk of it is about sharding and fault tolerance rather than data storage. Voldemort might actually be a good layer on top of Redis or Tokyo Cabinet some day.

Voldemort, it should be noted, is also only going to be worth using if you actually need to spread your data out over a cluster of servers. If your data fits on a single server in Tokyo Tyrant, you are not going to gain anything by using Voldemort. Voldemort, however, might be seen as a good migration path from Tokyo * when you do hit that wall where performance isn’t enough. (from: NoSQL If Only It Was That Easy)

CouchDB

Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.

CouchDB provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests. There are myriad third-party client libraries that make this even easier from your programming language of choice. CouchDB’s built-in web administration console speaks directly to the database using HTTP requests issued from your browser.
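Because the API is plain HTTP, any HTTP client will do. The sketch below uses Java 11's java.net.http client to build the two most basic document requests: PUT /{db}/{docid} with a JSON body to store a document, and GET /{db}/{docid} to read it back. It assumes a CouchDB server on localhost at the default port 5984 and an existing database; the class name CouchDocs and the database, document id and fields in the usage line are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class CouchDocs {
    // Assumption: a local CouchDB listening on its default port.
    static final String BASE = "http://localhost:5984";

    // PUT /{db}/{docId}: creates the document (updating it requires the current _rev in the body).
    static HttpRequest putDocument(String db, String docId, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + db + "/" + docId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // GET /{db}/{docId}: returns the stored document as JSON, including _id and _rev.
    static HttpRequest getDocument(String db, String docId) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + db + "/" + docId)).GET().build();
    }

    // Sends a request and returns the response body; this part needs a running server.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Usage against a running server would look like `CouchDocs.send(CouchDocs.putDocument("users", "jdoe", "{\"name\":\"John\"}"))`, followed by `CouchDocs.send(CouchDocs.getDocument("users", "jdoe"))`.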

It’s a “distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”. Data is stored in ‘documents’, which are essentially key-value maps themselves, using the data types you see in JSON. CouchDB can do full text indexing of your documents, and lets you express views over your data in JavaScript. I could imagine using CouchDB to store lots of data on users: name, age, sex, address, IM name and lots of other fields, many of which could be null, and each site update adds or changes the available fields. In situations like that it quickly gets unwieldy adding and changing columns in a database, and updating versions of your application code to match. (from: Anti RDBMS a List of Distributed Key Value Stores)
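A view is built in two steps: a map function that emits (key, value) rows for each document, and an optional reduce function that combines the values per key. In CouchDB both are written in JavaScript (for example, a map function that calls `emit(doc.city, 1)`); the Java sketch below only mimics that idea on in-memory, schema-free documents, and the class name CityView and the "city" field are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CityView {
    // Map step: emit (doc.city, 1) for every document that has a "city" field.
    // Documents without the field are simply skipped, as in a schema-free store.
    static List<Map.Entry<String, Integer>> map(List<Map<String, Object>> docs) {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            Object city = doc.get("city");
            if (city != null) rows.add(Map.entry(city.toString(), 1));
        }
        return rows;
    }

    // Reduce step: sum the emitted values per key; TreeMap keeps the result sorted by key,
    // the way view rows are returned sorted by key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> rows) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, Integer> row : rows) {
            result.merge(row.getKey(), row.getValue(), Integer::sum);
        }
        return result;
    }
}
```

With four user documents, two in Boston, one in Austin and one without a city, `reduce(map(docs))` yields `{Austin=1, Boston=2}`.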

good intro: InfoQ: CouchDB From 10K Feet

MongoDB

MongoDB is not a key/value store, it’s quite a bit more. It’s definitely not an RDBMS either. I haven’t used MongoDB in production, but I have used it a little building a test app and it is a very cool piece of kit. It seems to be very performant and either has, or will soon have, fault tolerance and auto-sharding (aka it will scale). I think Mongo might be the closest thing to an RDBMS replacement that I’ve seen so far. It won’t work for all data sets and access patterns, but it’s built for your typical CRUD stuff. Storing what is essentially a huge hash, and being able to select on any of those keys, is what most people use a relational database for. If your DB is 3NF and you don’t do any joins (you’re just selecting a bunch of tables and putting all the objects together, AKA what most people do in a web app), MongoDB would probably kick ass for you.
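"Selecting on any of those keys" is the heart of it: a MongoDB query is itself a document, and a query like { city: "Boston" } matches every document whose city field equals "Boston". The Java sketch below imitates only that equality-matching idea on in-memory maps. It is not the MongoDB driver API, it scans linearly where a real database can use indexes, and the class name DocQuery is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class DocQuery {
    // Returns the documents whose fields equal every field/value pair in the query document.
    // An empty query matches every document. This is a linear scan with no indexes.
    static List<Map<String, Object>> find(List<Map<String, Object>> docs, Map<String, Object> query) {
        return docs.stream()
                .filter(doc -> query.entrySet().stream()
                        .allMatch(cond -> Objects.equals(doc.get(cond.getKey()), cond.getValue())))
                .collect(Collectors.toList());
    }
}
```

Querying with `{city=Boston, age=30}` returns only the documents matching both fields, whatever other fields they happen to carry.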

Oh, and did I mention that, of all the NoSQL options out there, MongoDB is one of the only ones being developed as a business with commercial support available? If you’re dealing with lots of other people’s data, and have a business built on the data in your DB, this isn’t trivial.

On a side note, if you use Ruby, check out MongoMapper for very easy and nice-to-use Ruby access. (from: NoSQL If Only It Was That Easy)

MongoDB: Dwight Merriman, 10gen (slides, video)

Cassandra

Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google’s BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.
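What makes the ColumnFamily model "richer than typical key/value systems" is that a row key maps not to one opaque value but to a set of columns kept sorted by name, and each column carries a timestamp that is used to reconcile conflicting writes between replicas. Here is a conceptual Java sketch of that model. It is not Cassandra's client API, the class and method names are hypothetical, and ties on equal timestamps simply keep the stored value (a simplification of Cassandra's own tie-break rule).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class ColumnFamilySketch {
    // One column: a value plus the client-supplied timestamp used to resolve conflicts.
    static final class Column {
        final String value;
        final long timestamp;
        Column(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    // row key -> (column name -> column), with columns kept sorted by name
    private final Map<String, TreeMap<String, Column>> rows = new HashMap<>();

    // Last write wins: a write that is not newer than the stored column is ignored.
    void insert(String rowKey, String name, String value, long timestamp) {
        TreeMap<String, Column> columns = rows.computeIfAbsent(rowKey, k -> new TreeMap<>());
        Column current = columns.get(name);
        if (current == null || timestamp > current.timestamp) {
            columns.put(name, new Column(value, timestamp));
        }
    }

    // Slice: the columns of one row whose names fall in [start, end), in sorted order.
    SortedMap<String, Column> slice(String rowKey, String start, String end) {
        TreeMap<String, Column> columns = rows.get(rowKey);
        if (columns == null) return new TreeMap<>();
        return columns.subMap(start, end);
    }
}
```

Because timestamps decide the winner, replicas that receive the same writes in a different order still converge on the same value, which is one ingredient of eventual consistency.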

Cassandra was open sourced by Facebook in 2008, where it was designed by one of the authors of Amazon’s Dynamo. In a lot of ways you can think of Cassandra as Dynamo 2.0. Cassandra is in production use at Facebook but is still under heavy development.

Sounded very promising when the source was released by Facebook last year. They use it for inbox search. It’s Bigtable-esque, but uses a DHT so doesn’t need a central server (one of the Cassandra developers previously worked at Amazon on Dynamo). Unfortunately it’s languished in relative obscurity since release, because Facebook never really seemed interested in it as an open-source project. From what I can tell there isn’t much in the way of documentation or a community around the project at present. (from: Anti RDBMS a List of Distributed Key Value Stores)

good intro: Up and Running With Cassandra

MySQL Cluster / NDB

Although it is not a native key-value store, I found it interesting enough to put on the list. While it is commonly used through an SQL interface, the architecture and performance are exactly what you want: cloud-like sharding, very good performance on key-value lookups, etc… And if you don’t want SQL, you can use the NDB API directly, or REST through the mod_ndb Apache module (http://code.google.com/p/mod-ndb/).

This would score high on your list if you evaluated it:

– Transparent sharding: Data is distributed through an md5sum hash on your primary key (or user defined key), yet you connect to whichever MySQL server you want, the partitions/shards are transparent behind that.

– Transparent re-sharding: In version 7.0, you can add more data nodes in an online manner, and re-partition tables without blocking traffic.

– Replication: Yes. (MySQL replication).

– Durable: Yes, ACID. (When using a redundant setup, which you always will.)

– Commits to disk: Not on commit, but with a 200-2000 ms delay. Durability comes from committing to more than one node, *on commit*.

– Less than 10ms response times: You bet! 1-2 ms for quite complex queries even.
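The "transparent sharding" point boils down to a deterministic function from primary key to partition, so any SQL node can find the right data node without a lookup table. The Java sketch below hashes the key with MD5, as the list above describes, and then takes the digest modulo the partition count; the modulo step and the class name HashPartitioner are assumptions for illustration, since NDB's internal hash-to-partition mapping is not described here.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashPartitioner {
    private final int partitions;

    HashPartitioner(int partitions) {
        if (partitions <= 0) throw new IllegalArgumentException("partitions must be positive");
        this.partitions = partitions;
    }

    // MD5 of the primary key, read as a non-negative integer, modulo the partition count.
    int partitionFor(String primaryKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(primaryKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest).mod(BigInteger.valueOf(partitions)).intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }
}
```

In this sketch, changing the partition count remaps most keys, so adding data nodes means moving data around, which is exactly the work that online re-partitioning has to do without blocking traffic.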

Other KVS I have in my backlog to expand on here later:

LightCloud

Lux IO

Flare

Tx

Oracle Berkeley DB

Ringo

Redis

Scalaris

Kai

HBase

Hypertable

Dynomite

MemcacheDB

ThruDB

And as a bonus – there is another quite interesting initiative started by Yehuda Katz:

Moneta

Moneta: A unified interface for key/value stores
================================================
 
Moneta provides a standard interface for interacting with various kinds of key/value stores.
 
Out of the box, it supports:
 
* File store for xattr
* Basic File Store
* Memcache store
* In-memory store
* The xattrs in a file system
* DataMapper
* S3
* Berkeley DB
* Redis
* SDBM
* Tokyo
* CouchDB
 
All stores support key expiration, but only memcache supports it natively. All other stores
emulate expiration.
 
The Moneta API is purposely extremely similar to the Hash API. In order to support an
identical API across stores, it does not support iteration or partial matches, but that
might come in a future release.
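"Emulating" expiration on a store with no native TTL support can be as simple as saving an expiry time next to each value and checking it on every read. Moneta is a Ruby library and this is not its implementation; it is a minimal Java sketch of one way to do lazy, on-read expiration, with a HashMap standing in for the real backend and a pluggable clock so it can be tested. The class name ExpiringStore is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class ExpiringStore<K, V> {
    // A stored value plus its absolute expiry time in milliseconds (Long.MAX_VALUE = never).
    private static final class Entry<T> {
        final T value;
        final long expiresAt;
        Entry(T value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> backend = new HashMap<>(); // stand-in for any plain key/value backend
    private final LongSupplier clock;                         // current time in milliseconds

    ExpiringStore(LongSupplier clock) {
        this.clock = clock;
    }

    void put(K key, V value) {
        backend.put(key, new Entry<>(value, Long.MAX_VALUE));
    }

    void put(K key, V value, long ttlMillis) {
        long now = clock.getAsLong();
        // Saturate instead of overflowing for very large TTLs.
        long expiresAt = ttlMillis >= Long.MAX_VALUE - now ? Long.MAX_VALUE : now + ttlMillis;
        backend.put(key, new Entry<>(value, expiresAt));
    }

    // Expiration is enforced lazily on read: an expired entry is deleted and reported as missing.
    V get(K key) {
        Entry<V> entry = backend.get(key);
        if (entry == null) return null;
        if (clock.getAsLong() >= entry.expiresAt) {
            backend.remove(key);
            return null;
        }
        return entry.value;
    }

    void delete(K key) {
        backend.remove(key);
    }
}
```

In real use you would pass `System::currentTimeMillis` as the clock. The trade-off of lazy expiration is that expired entries keep using space until someone reads them, whereas a store like memcache expires them natively.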

Any additional info / projects are welcome!


24
Jun 09

What is The Best Java Web Framework?


No, really, what is it? Are you one of those people who have wondered why there is more than one? Or have you worked with Struts for 5 years in a row, and think it is the best just because you know all the nasty hooks you have to implement to make it do what you actually need?

Are you one of those die-hard Rails fans who (like most Rails developers) do not really know Java, but already have a lot to say about how bad it is?

Do you work in C++ and feel absolutely confident that the framework your company has been developing for the last 25 years, which builds web pages via socket programming, is the best and most efficient one out there?

Is your company big enough to have a mature team of software (what they call themselves) architects who take open source frameworks and write their own single wrapper framework around them that, as they think (since they wrote it), is the 8th wonder of the (ancient) world?

You see, I am a consultant – I know many of you, since I have met “you” before :)

This one will be a very short article about my experiences with several Java Web Frameworks out there. Here we go:

Spring MVC: Would be a good choice for most of your needs

(I’ll give a short controller example below)

Wicket: Interesting to look at – no XML, no JSP (JSTL), just Java and HTML. Can mimic a flow in a WebPage object. Better separation of concerns than, for example, GWT (e.g. no Java-ish CSS, etc.). Good community.

The only thing that feels off is that your dynamic HTML elements are built in Java.

Spring Webflow: Yes – it is a separate beast. It is mostly good and makes sense; however, in practice, once you need to do something a bit more complex than a shopping cart or a hotel booking app (hint, hint), you can run into problems. “Back button” and “double click” are not handled very well by the framework, you may get an exception when bookmarking (there is a magic recipe, but it is far from simple and intuitive), and sharing data across the flow and last-resort error handling are not simple, etc.

Stripes: Good / simple (no XML, conventions), but not very actively maintained, hence not as mature (good community though). Worth a look for simple projects.

Struts: Just architected wrong from the very beginning: Validation (XML – why? What about minimum search criteria, what about several, what about nested OO validators!?) / 0 for NULLs / Multi Action Forms / Testing (without StrutsTestCase) / etc. Improved a bit since the WebWork merger, but still lots of “code smells”.

JSF: Quite hard to keep up with all these JSF-based JSP tags + integration with security is not simple + full JSF solutions are usually Frankensteins with many pieces from different vendors.

Tapestry: Not bad, actually makes sense once you get it. But have you ever looked at and tried to follow the Tapestry code? Very complex implementation, if you ever need to look inside + Tapestry does take time to learn, so forget about a new off-shore team, or fresh-out-of-college, not-so-geeky grads, taking it on.

And here is why I like Spring MVC:

  • Binding / Validation is done just right – clean, testable, reusable
  • Multiple view options (PDF, XML, Excel, Atom, etc.) made easy [AbstractExcelView, AbstractFeedView, AbstractJExcelView, AbstractPdfView, AbstractUrlBasedView, AbstractXsltView]
  • Annotation based – no XML madness, and very clear when looking at the code – check the Pet Clinic in Spring 3.0 M3
  • Integrates with Spring JS very nicely (if needed)
  • Handling requests and parameters with (Spring) expression language – quite flexible
  • In Spring 3, MVC is actually REST aware (GET, POST, PUT, DELETE)

You can download Spring STS, import sample projects, and see many examples of how to use it, but here is a very simple controller example from Spring’s Pet Clinic:

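import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.samples.petclinic.Clinic;
import org.springframework.stereotype.Controller;
import org.springframework.ui.ModelMap;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;

@Controller                                                 // it’s a controller
public class ClinicController {

      private final Clinic clinic;                          // Pet Clinic's data-access facade

      @Autowired                                            // Spring injects the Clinic implementation
      public ClinicController(Clinic clinic) {
            this.clinic = clinic;
      }

      @RequestMapping("/welcome.do")           // that's all there is to mapping
      public void welcomeHandler() {
      }

      /** @return a ModelMap with the model attributes for the view;
                                 attribute names are generated by org.springframework.core.Conventions */
      @RequestMapping("/vets.do")
      public ModelMap vetsHandler() {
            return new ModelMap(this.clinic.getVets());
      }

      @RequestMapping("/owner.do")
      public ModelMap ownerHandler(@RequestParam("ownerId") int ownerId) {  // parameters are passed in easily
            return new ModelMap(this.clinic.loadOwner(ownerId));
      }
}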

It all depends on the project’s requirements / timeline / resources / technologies already in place / etc… But given a choice, I do choose Spring MVC – it just makes sense: easy implementation with Spring Roo / integration with Spring’s back end / support / community / releases / etc… I also like where Wicket is going, but it feels like “it is still going…”

In any case – good luck, and remember – if it does not have to be Java, but you still like to “stay close”, I would definitely give Grails a shot.


29
Mar 09

Adobe Flex in Ubuntu: Develop, Compile and Run

Recently, while browsing InfoQ, I stumbled upon a very visual and interesting presentation by Christophe Coenraets, “Rich Internet Applications with Flex and AIR”.

This presentation took place during QCon London 2008, where Christophe Coenraets, a Senior Technical Evangelist at Adobe, presented Flex and AIR, two technologies from Adobe used to create, deploy and run Rich Internet Applications.

I have not had any experience with Flex in the past, and, naturally, right after the presentation, I decided to give it a try: to develop, compile, and run an ultra-simple Flex application. After some research, I found that there are two options out there for Flex developers:

Adobe® Flex® Builder™ – software is a highly productive Eclipse™ based development tool enabling intelligent coding, interactive step-through debugging, and visual design of the user interface layout, appearance, and behavior of rich Internet applications (RIAs).

OR

Adobe® Flex™ 3 Software Development Kit (SDK) – includes the Flex framework (component class library) and Flex compiler, enabling you to freely develop and deploy Flex applications using an IDE of your choice.

While Adobe® Flex® Builder™ is an appealing option, it is not free. It starts from $300, and goes up to $700 for a professional edition. Whereas Flex SDK is open source and free – which is “a bit” cheaper than $300. The biggest difference between the two is that with just SDK, I will have to use my own IDE / text editor to write Flex applications, which is totally fine by me.

Step 1. Download Flex SDK.

Go to the Flex SDK download page and check the box “I have read the Adobe Flex SDK License, and by downloading the software listed below I agree to the terms of the agreement.” You should then see the “Download the Flex SDK for all Platforms” link to a Flex SDK zip file. Download it.

Unzip it to any directory you like (in my case, /opt/flex-sdk):

sudo unzip flex_sdk_3.3.0.4589.zip -d /opt/flex-sdk

Step 2. Create an alias to compile MXML, ActionScript, etc. Flex applications.

Make sure java 6 is installed:

sudo apt-get install sun-java6-jdk

I need to have JAVA_HOME pointed to Java 5 (JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun), so I’ll hardcode the path to Java 6 into the Flex compiler alias:

in ~/.bashrc:

# flex SDK home
export FLEX_SDK_HOME=/opt/flex-sdk
alias mxmlc='/usr/lib/jvm/java-6-sun/bin/java -jar "$FLEX_SDK_HOME/lib/mxmlc.jar" +flexlib="$FLEX_SDK_HOME/frameworks"'

Re-login to the terminal (or open a new terminal session, or run: source ~/.bashrc). Now you can execute:

mxmlc yourFlexApp.mxml

to compile an MXML file into an executable “yourFlexApp.swf”

Step 3. Write a simple MXML application and compile it with Flex SDK.

Create a simple MXML file “flexTest.mxml”, that would create a button:

<?xml version="1.0" encoding="utf-8"?>
<mx:Application xmlns:mx="http://www.adobe.com/2006/mxml" layout="absolute">
        <mx:Button label="I am a simple flexy button" x="10" y="10" />
</mx:Application>

Compile it:

$ mxmlc flexTest.mxml
Loading configuration file /opt/flex-sdk/frameworks/flex-config.xml
/path/to/flexTest.swf (172884 bytes)

Now you should see a new SWF once the compiler is done:

$ ls -l
total 180
-rw-r--r-- 1 user group    217 2009-03-29 02:38 flexTest.mxml
-rw-r--r-- 1 user group 172884 2009-03-29 02:40 flexTest.swf

Step 4. Run compiled Flex application.

Open it with Firefox (make sure you have the Adobe Flash Player plugin installed; if not, install it):

$ firefox flexTest.swf &

Now you should see that flexy button:

a simple button written in MXML and compiled by flex SDK

It is quite simple, really. Good Luck Flexing!


7
Jan 09

Install Adobe Flash Player Firefox Plugin


Since I upgraded one box to Ubuntu 8.04 (Hardy Heron) and another box to Ubuntu 8.10 (Intrepid Ibex), it was quite irritating for some time to watch YouTube videos with no sound or flaky sound, along with skipping video.

In Firefox, when I went to “Tools -> Add-ons -> Plugins”, or just typed “about:plugins” in the address bar, I saw that I did have “Shockwave Flash 9.0 r124”; however, it just did not want to work smoothly. The same was true for “Shockwave Flash 9.0 r100”.

So you would think that the right thing to do was to go to the Adobe website, “http://get.adobe.com/flashplayer/”, choose the “get the one for Ubuntu 8.04+” option, and download the latest (v10 / v11 / v12 / v13 / v14 / whatever…) flash player, right? Well, not really. After I did that, I saw both “Shockwave Flash 9.0 r124” and “Shockwave Flash 10.0 r15”, so I disabled the 9.0 one and enabled 10.0 – should be good, right? NOPE.

What appeared to be the solution for this mess of flash plugins was to do some “sudo apt-cache search flash…” searches, and figure out what needed to go from both systems.

There were two culprits that overruled the only enabled “Shockwave Flash 10.0 r15” plugin: “swfdec-mozilla” and “mozilla-plugin-gnash”. So they had to go, along with all other potential inconsistencies:

sudo apt-get remove -y --purge flashplugin-nonfree gnash gnash-common mozilla-plugin-gnash swfdec-mozilla libflashsupport nspluginwrapper
sudo rm -f /usr/lib/mozilla/plugins/*flash*
sudo rm -f ~/.mozilla/plugins/*flash*
sudo rm -f /usr/lib/firefox/plugins/*flash*
sudo rm -f /usr/lib/firefox-addons/plugins/*flash*
sudo rm -rf /usr/lib/nspluginwrapper

After this, I had a good feeling and went to http://get.adobe.com/flashplayer/ again, chose “get the one for Ubuntu 8.04+”, saved “install_flash_player_version_linux.deb” locally, and installed it with my bare hands:

sudo dpkg -i install_flash_player_10_linux.deb

Restarted Firefox, and let me tell you – Quality of my Ubuntu life has improved significantly since then!
Want to improve the quality of your life significantly? Follow the two steps above :)


24
Sep 08

How to Digg at Work


How many people Digg at work? Do you? The answer to this is mostly “YES”; sometimes the answer is “NOT OFTEN”, to avoid saying “YES”; and the rarest answer is “NO”. And only occasionally, nowadays in the 21st century, could the answer be “WHAT THE HECK IS DIGG!?”

However, 99% of people will have no difficulty answering the question “Do you work?”. That is because “WORK” has been defined for a much longer period of time than “DIGG”. And most of the time, unless you are employed by Digg, its competitors, or “Time Magazine”, DIGG and WORK do not go together. Therefore employees do not like to be caught by their managers and supervisors while reading fresh news from Digg’s front page.

But there is nothing wrong with spending some time reading DIGG at work. In fact, if not abused and done right, it can (and most of the time will) boost productivity – really! According to our friend Albert Einstein, there is nothing as innovative and productive as taking 10-15 minute breaks. And Albert generally knew what he was talking about. :)

A wise man once told me: “The secret to creativity is knowing how to hide your sources” (I believe it was also Einstein) – so here is a simple way of using Digg at work without causing any suspicion.

Lynx is the answer to creating a “work and Digg” balance, and it can be used to read pretty much any website, not only Digg. It is available on Windows and Mac, comes pre-installed with most distributions of Linux, and according to its documentation is available for other platforms as well.

Here is what reading Digg at work using Lynx web browser would look like:

Reading Digg.com with Lynx Web Browser

The terminal colors can be changed to blend in better with the desktop and the applications you use most, which will depend on the nature of your job. Lynx is very easy to navigate, mostly with just the up/down/left/right arrow keys, and is considered “work safe” – it is simple text after all.

Boost your work productivity, and… happy Digging!

what else is interesting about digg: How does Digg Make Money?