One Small Citrus for Man; One Giant Leaf for Mankind

– Who does not like fruits!?
– Well.. that depends. Are you talking about “a structure of a plant that contains its seeds”?
– No, silly, of course not! I am talking about databases!

© by my brain

The Right Fruit for the Right Job


Nowadays, in order to be competent in the world of Big Data, you must get at least a Masters in Fruits, or as I call it, an “MF Degree”. Why!? Well, how about’em fruits:

You see? It is very important to know which fruit to choose for your next {m|b|tr}illion dollar gig.

To expand my MF degree, I love doing research in the big data space, and as I was walking around the #oscon 2011 expo, I was really pleased to discover a new sort of fruit that I had not heard of before. You would think “yea, ok.. YAFDB: Yet Another Fruit DB”, but no => this one is different => this one has a kicker, this one has a.. “leaf”!

Leafing A for C


You may notice that the above fruit DBs are missing that “power of the leaf” and look rather leafless. And in the world of NoSQL databases, a fruit without a leaf has somewhat inconsistent properties. Well, let’s rephrase that: eventually the leaf will grow, so we can say that eventually those fruits will look consistent.

But what if a NoSQL database already came with a leaf attached to it? You can’t argue that if it did, it would have a complete, consistent look to it.

Well, that is quite interesting.. Why can’t a NoSQL database have a configuration that actually makes it consistent? Think about it.. If the data is spread/sharded/persisted to multiple nodes using a “consistent hashing” algorithm, where clients have a guarantee that “this” data will live on “this” set of nodes, then any time an insert/update is completed (truly committed on those nodes), any read for that data would know exactly which nodes to read it from, since the hash is consistent.
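To make the “this data lives on these nodes” idea concrete, here is a minimal consistent-hashing sketch. It is a generic textbook ring, not CitrusLeaf’s actual implementation; the `HashRing` class, the MD5 hash, the 100 virtual nodes per server, and the replica count of 3 are all illustrative assumptions. The point it shows: any client that knows the node list computes the same replica set for a key, so reads know exactly where a committed write went.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a point on the hash ring (MD5 here; any stable hash works)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical, illustration only)."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node is placed on the ring `vnodes` times to even out load.
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._nodes = set(nodes)

    def replicas(self, key: str, n: int = 3) -> list:
        """Walk clockwise from the key's position, collecting n distinct nodes."""
        n = min(n, len(self._nodes))
        i = bisect.bisect(self._points, ring_position(key))
        owners = []
        while len(owners) < n:
            _, node = self._ring[i % len(self._ring)]  # wrap around the ring
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("bid:42"))  # the same 3 nodes on every call, from every client
```

A nice side effect of the ring: adding a node only moves the keys that now land on the new node’s points, so the rest of the placement stays put.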

The answer is actually obvious => by ensuring ‘C’ in the CAP theorem via consistent hashing, you would need to sacrifice some of ‘A’. Since certain data is limited to a concrete set of nodes (that the client relies on), if some of those nodes are down, the DB would need to lock/bring back/reconfigure/reshuffle data, and for that “moment” that data would be unAvailable. This can be improved/tuned with replication, but the “A sacrifice” is still there.
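Here is a toy model of that trade-off. It assumes a key is served only if at least `required` of its fixed replica nodes are up; the `is_available` function is hypothetical, and the majority-quorum setting is a common tuning knob in replicated stores in general, not something the CitrusLeaf folks described. It shows why replication helps, and why insisting on every replica brings the ‘A’ sacrifice right back.

```python
def is_available(replicas, down, required):
    """Return True if at least `required` replicas of a key are reachable.

    `replicas` is the fixed node set the key hashes to; `down` is the set of
    failed nodes. Toy model only, not CitrusLeaf's actual failover logic.
    """
    alive = sum(1 for node in replicas if node not in down)
    return alive >= required


down = {"node-b"}

# No replication: the key lives on exactly one node, and that node is down.
print(is_available(["node-b"], down, required=1))  # False

# Three replicas, but a strict setup that insists on all of them.
print(is_available(["node-a", "node-b", "node-c"], down, required=3))  # False

# Three replicas with a majority quorum (2 of 3).
print(is_available(["node-a", "node-b", "node-c"], down, required=2))  # True
```

Lose a second replica and even the quorum setup goes dark for that key, which is the “moment” of unavailability described above.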

Well, now I can actually try out the above with this new fruit DB that I discovered @ OSCON. It’s time you met CitrusLeaf DB.

Citrus DB with a Leaf Attached


You can go ahead and read their Architecture Paper, with pretty pictures and quite interesting claims, but here I’ll just mention some interesting facts, mostly not in the paper, which I gathered from talking to the CitrusLeaf dudes at OSCON. By the way, they were really open about the internals of CitrusLeaf, even though it is a closed-source, commercial product. So here we go:

  • The business niche CitrusLeaf aims to conquer is “Real Time Bidding”, which in short is a system that lets buyers dynamically bid on ad impressions (online advertising). More about it here: http://en.wikipedia.org/wiki/Sell_Side_Platform
  • The typical pattern in the Real Time Bidding space is 60/40 => 60% reads and 40% writes. CitrusLeaf promises to perform equally well for reads and writes
  • They claim to perform at 200,000 Transactions Per Second per node. The claim is based on 8-byte transactions, which according to the CitrusLeaf folks is the usual transaction size in the Real Time Bidding world
  • CitrusLeaf can use 3 different storage strategies: DRAM, SSD and Rotational Disks. It is optimized to work with SSDs, where the above benchmark drops to 20,000 Transactions Per Second for a single SSD. In a normal setup, a node would have about 4 SSDs attached, which gets it to 80,000 Transactions Per Second
  • Clients are available in C, C#, Java, Python, PHP, and Ruby
  • CitrusLeaf is ACID compliant, and uses consistent hashing to achieve ‘C’
  • Stores data in a B-Tree, since the workload has more (real-time) reads than writes
  • CitrusLeaf can store indices for 100 million records in 7 gigabytes of DRAM
  • The pricing model is usage-based => e.g. per TB. The trial release includes a tracking mechanism through which the system reports usage
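A few of those numbers are worth a quick back-of-the-envelope pass. The sketch below only re-derives figures from the claims listed above (as reported at OSCON, not measured); the variable names are mine, and reading “7 gigabytes” as 7 × 10⁹ bytes is an assumption.

```python
# Back-of-the-envelope math on the CitrusLeaf claims above.
# Inputs are the vendor's numbers as reported at OSCON, not measurements.
CLAIMED_TPS_PER_NODE = 200_000  # headline claim, 8-byte transactions
TXN_BYTES = 8
SSD_TPS = 20_000                # claimed throughput per SSD
SSDS_PER_NODE = 4               # "normal setup"
READ_PERCENT = 60               # 60/40 read/write pattern in RTB

ssd_node_tps = SSD_TPS * SSDS_PER_NODE                    # TPS per SSD-backed node
reads_per_sec = ssd_node_tps * READ_PERCENT // 100        # read share of that
writes_per_sec = ssd_node_tps - reads_per_sec             # write share of that

# Payload volume behind the headline number: 200,000 x 8 bytes per second.
payload_bytes_per_sec = CLAIMED_TPS_PER_NODE * TXN_BYTES

# Index footprint, assuming "7 gigabytes" means 7 * 10**9 bytes.
INDEX_BYTES = 7 * 10**9
RECORDS = 100_000_000
index_bytes_per_record = INDEX_BYTES / RECORDS

print(ssd_node_tps, reads_per_sec, writes_per_sec)        # 80000 48000 32000
print(payload_bytes_per_sec, index_bytes_per_record)      # 1600000 70.0
```

If “gigabytes” means GiB, the index budget is closer to 75 bytes per record. The payload figure (about 1.6 MB/s) suggests the 200,000 TPS headline is about operation rate on tiny records rather than raw bandwidth.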

I feel like CitrusLeaf would be a cool addition to my MF degree; besides, I have already come up with a slogan for them: “One small citrus for man; one giant leaf for mankind” © by my brain