dotkam.com stats


07
Mar 10

Think About Code Quality

Recently one of my friends from work asked me to help him improve the process around code quality and developer productivity. So I compiled my thoughts and e-mailed them to him, but then I realized that they may be very helpful for others who are involved in the software industry. Are you? Then keep reading.. :)

Although the tools and frameworks listed here focus on JVM-based languages, the approach can definitely be reused with any other environment / language / technology. So without further ado, here it goes:

The center of the code quality monitoring can be either Continuous Integration ( e.g. Cruise Control: http://cruisecontrol.sourceforge.net/ ) or Sonar ( http://sonar.codehaus.org/ ). Sometimes both.

Continuous Integration should be set up to “Integrate Continuously” :) which means every time something is checked in, force a build. That is the whole purpose of “Continuous Integration”, and that is why I am really against the way Cruise Control is used on some client sites [builds on demand by pressing a build button.. grrr, back to the 1990s].

So given that “Continuous Integration” is not misused, it will be plugged in with PMD ( http://pmd.sourceforge.net/ ) / Findbugs ( http://findbugs.sourceforge.net/ ), Checkstyle ( http://checkstyle.sourceforge.net/ ) and Test Coverage ( e.g. Cobertura: http://cobertura.sourceforge.net/ ) tools that will generate reports, and reflect the only true state of the code that is checked into the repository.

Now, as to a developer corner..

RAD is nothing more than just an Eclipse with lots of bloated (mostly unused) IBM plugins. But being Eclipse it is very pluggable by nature. This means that PMD / Checkstyle / Cobertura reports can be available to developers prior to checking in the code: PMD / Checkstyle at compile time, Cobertura at (test) run time. If possible, try to use something like Spring Tool Suite: http://www.springsource.com/products/sts, which is also an Eclipse, but much lighter (compared to RAD), faster and smarter “plugged-in”.

As far as testing goes, JUnit (http://www.junit.org/) should be aimed to cover two separate layers of testing: Component/Unit Testing and Integration Testing. Component tests should exclusively test the content of the component, with all of its dependencies mocked out ( http://mockito.org/ ), whereas Integration tests should test how well components integrate together, with all the test data staged ( in case a DB is used: http://www.dbunit.org/ ).
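To make the component side of that split concrete, here is a minimal plain-Java sketch of a component test where the dependency is replaced by a hand-written fake (Mockito would generate an equivalent stand-in for you). The names `RateSource`, `PriceCalculator` and the numbers are hypothetical, made up purely for illustration:

```java
class ComponentTestSketch {

    // A component test: the dependency is replaced by a fixed, in-memory fake,
    // so only PriceCalculator's own logic is exercised.
    static double convertWithFakeRate(double amount, double fakeRate) {
        RateSource fake = currency -> fakeRate;
        return new PriceCalculator(fake).convert(amount, "EUR");
    }

    public static void main(String[] args) {
        double result = convertWithFakeRate(10.0, 2.0);
        if (result != 20.0) {
            throw new AssertionError("expected 20.0 but got " + result);
        }
        System.out.println("component test passed: " + result);
    }
}

// Hypothetical collaborator that a real implementation would back with a DB or a remote call.
interface RateSource {
    double rateFor(String currency);
}

// The component under test: it only knows about the RateSource interface.
class PriceCalculator {
    private final RateSource rates;

    PriceCalculator(RateSource rates) {
        this.rates = rates;
    }

    double convert(double amount, String currency) {
        return amount * rates.rateFor(currency);
    }
}
```

An integration test for the same component would wire in the real `RateSource` implementation and staged test data instead of the fake.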

As for the test coverage, 85% to 90% is a good goal to aim for. Do not sign off until this level of coverage is reached, no matter how close your deadlines are, since if you do, that will increase the number of defects tenfold, that is just a law :)

If developers work on code that is / can be deployed to an Application Server, such as Tomcat, JBoss, Geronimo ( if you’re unlucky, Websphere :) ), consider getting JRebel ( http://www.zeroturnaround.com/jrebel/ ), it’ll boost developer productivity by.. let’s just say “a lot” :)

This is how I see it, and this really works for me, and projects around me.


10
Feb 10

Multiple Around Advices Applied to the Same Join Point

Spring and AspectJ

So, what do you think will happen if you apply two around advices to the same join point? They both call “proceedingJoinPoint.proceed()” which calls their target object.

But then if the target object is the same, then it is going to be called twice.. Hmm.. not something you would want to happen especially if that target object is a service that withdraws money from your bank account..

According to the Spring documentation, you may specify the order in which these advices are applied, using the Ordered interface, so in case the target object (that service that withdraws money from your account) is called twice, you have an opportunity to specify the order of who calls it first :)

To free the minds of many from unnecessary worries, I’ll show you what really happens when two around advices are applied to the same join point.

Here is this target object (transfer money service):

public class WireTransferMoneyService implements TransferMoneyService {
 
	public void transferMoney( Money money, 
			         Account sourceAccount,
			         Account targetAccount) {
 
		// Start the "Wire Transfer" manager
		// ...
 
		System.err.println( this.getClass().getSimpleName() + " is called" );
 
		sourceAccount.withdraw( money );
		targetAccount.deposit( money );	
	}
}

And here are the two around advices that are applied to this transfer service:

@Around("com.xmen.iii.aspect.SystemArchitecture.businessService()")
public Object doSomethingOnTransfer(ProceedingJoinPoint pjp) throws Throwable {
 
	System.err.println( Thread.currentThread().getStackTrace()[1].getMethodName() +
			" \t\twas called before " + pjp.getTarget().getClass().getSimpleName() );
 
	Object retVal = pjp.proceed();
 
	System.err.println( Thread.currentThread().getStackTrace()[1].getMethodName() +
			" \t\twas called after " + pjp.getTarget().getClass().getSimpleName());
 
	return retVal;
}
 
@Around("com.xmen.iii.aspect.SystemArchitecture.businessService()")
public Object doSomethingElseOnTransfer(ProceedingJoinPoint pjp) throws Throwable {
 
	System.err.println( Thread.currentThread().getStackTrace()[1].getMethodName() +
			" \twas called before " + pjp.getTarget().getClass().getSimpleName() );
 
	Object retVal = pjp.proceed();
 
	System.err.println( Thread.currentThread().getStackTrace()[1].getMethodName() +
			" \twas called after " + pjp.getTarget().getClass().getSimpleName() );
 
	return retVal;
}

Now it’s simple: let’s run it and see what happens… And that is what happens:

2010-02-10 19:52:47,542 INFO [org.springframework.test.context.TestContextManager] - <@TestExecutionListeners is not present for class [class com.xmen.iii.integration.TransferLoggingAspectTest]: using defaults.>
2010-02-10 19:52:47,667 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader] - <Loading XML bean definitions from class path resource [com/xmen/iii/integration/TransferLoggingAspectTest-context.xml]>
2010-02-10 19:52:47,808 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader] - <Loading XML bean definitions from class path resource [META-INF/spring/aspect-context.xml]>
2010-02-10 19:52:47,839 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader] - <Loading XML bean definitions from class path resource [META-INF/spring/app-context.xml]>
2010-02-10 19:52:47,917 INFO [org.springframework.context.support.GenericApplicationContext] - <Refreshing org.springframework.context.support.GenericApplicationContext@1820dda: display name [org.springframework.context.support.GenericApplicationContext@1820dda]; startup date [Wed Feb 10 19:52:47 EST 2010]; root of context hierarchy>
2010-02-10 19:52:47,917 INFO [org.springframework.context.support.GenericApplicationContext] - <Bean factory for application context [org.springframework.context.support.GenericApplicationContext@1820dda]: org.springframework.beans.factory.support.DefaultListableBeanFactory@1126b07>
2010-02-10 19:52:48,355 INFO [org.springframework.beans.factory.support.DefaultListableBeanFactory] - <Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@1126b07: defining beans [transferMoneyService,messageSource,org.springframework.aop.config.internalAutoProxyCreator,com.xmen.iii.aspect.logging.ServiceLoggingAspect#0,org.springframework.context.annotation.internalCommonAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor]; root of factory hierarchy>
"
doSomethingOnTransfer                   was called before WireTransferMoneyService
doSomethingElseOnTransfer               was called before WireTransferMoneyService
WireTransferMoneyService is called
doSomethingElseOnTransfer               was called after WireTransferMoneyService
doSomethingOnTransfer                   was called after WireTransferMoneyService
"
2010-02-10 19:52:48,433 INFO [org.springframework.context.support.GenericApplicationContext] - <Closing org.springframework.context.support.GenericApplicationContext@1820dda: display name [org.springframework.context.support.GenericApplicationContext@1820dda]; startup date [Wed Feb 10 19:52:47 EST 2010]; root of context hierarchy>
2010-02-10 19:52:48,433 INFO [org.springframework.beans.factory.support.DefaultListableBeanFactory] - <Destroying singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@1126b07: defining beans [transferMoneyService,messageSource,org.springframework.aop.config.internalAutoProxyCreator,com.xmen.iii.aspect.logging.ServiceLoggingAspect#0,org.springframework.context.annotation.internalCommonAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor]; root of factory hierarchy>

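So, as you can see, AspectJ and Spring are modest but smart: they chain those around advices for you, so each “proceed()” hands control to the next advice in line and the target is called exactly once, which is of course niice.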
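If you are curious how such chaining can work, here is a minimal plain-Java sketch of the idea. It is not Spring's or AspectJ's actual implementation: `JoinPoint`, `AroundAdvice` and `chain` are hypothetical stand-ins I made up so the example runs without any libraries. Each advice receives "whatever comes next", and only the innermost "next" is the real target:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AdviceChainSketch {

    // Stand-in for ProceedingJoinPoint: proceed() runs whatever comes next in the chain.
    interface JoinPoint {
        Object proceed();
    }

    // Stand-in for an @Around advice method.
    interface AroundAdvice {
        Object invoke(JoinPoint next);
    }

    // Builds a logging advice that records when it runs relative to the target.
    static AroundAdvice logging(String name, List<String> log) {
        return next -> {
            log.add(name + " before");
            Object result = next.proceed();
            log.add(name + " after");
            return result;
        };
    }

    // Wraps the target in the advices so that the first advice in the list is outermost.
    static JoinPoint chain(List<AroundAdvice> advices, JoinPoint target) {
        JoinPoint current = target;
        for (int i = advices.size() - 1; i >= 0; i--) {
            AroundAdvice advice = advices.get(i);
            JoinPoint next = current;
            current = () -> advice.invoke(next);
        }
        return current;
    }

    // Runs two advices around one target and returns the order of events.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        JoinPoint target = () -> {
            log.add("target called");
            return null;
        };
        List<AroundAdvice> advices = Arrays.asList(
                logging("doSomethingOnTransfer", log),
                logging("doSomethingElseOnTransfer", log));
        chain(advices, target).proceed();
        return log;
    }

    public static void main(String[] args) {
        // Prints the two "before" lines, one "target called", then the two "after" lines in reverse.
        run().forEach(System.out::println);
    }
}
```

The nesting is the whole trick: the second advice's `proceed()` reaches the target, the first advice's `proceed()` reaches the second advice, so the money is withdrawn only once.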

Happy Advising!


10
Feb 10

Sharing My Stargate Address

I guess you have it as well, if you use Comcast as your Internet Service Provider. Apparently they have these little Stargates that allow us to travel to different.. well not planets yet ( we, as a young human race, are starting slow ), but zip codes for starters. Intercontinental near real time travel is also available. But, of course, Earth would be our scope for now. Hopefully all the wormholes are blacklisted, and every traveler is equipped with a reliable iris.

It all started as a hunch, but then was officially confirmed by Comcast’s Freudian slip:
My Stargate Address


19
Dec 09

UnBuzz Me

I am so tired of this artificial and useless layer that is being created on top of our language.

I am pretty sure it is a problem in any field, but I am a technical guy, so I am talking about technical buzz words.

As I look at it, big corporations use these useless acronyms to impress and sell. Individuals use buzz words to hide their own incompetence behind them.

Several years ago, when I joined IBM, the first thing I learned was not the wonders of open source commitment, or the coolness of IBM research labs, but “what TLA stands for”. Because without knowing that, I could not really read any documentation, or talk to most of the people in the workplace. So what does TLA stand for? Three Letter Acronym.

And while a TLA on its own is not harmful, its misuse in the modern tech world is huge, and it makes it uber confusing for newcomers to understand what means what.

After I left IBM, I went even deeper into consulting, and just could not survive without learning more, and more, and more, and more… a little bit more of TLAs. Sometimes it’s useful to understand what the client really wants, sometimes just useful to be able to “talk corporate” to people who can’t talk any other language.

So what kind of misuse of TLAs am I talking about?

Let’s take for example “URL” (Uniform Resource Locator) – is this a TLA? Yes. Is it being misused? No. Why not? Because it means one thing, and one thing only. And when we say URL, we mean just that – something that specifies where an identified resource is available and the mechanism for retrieving it.

Now let’s look at “SOA” (Service Oriented Architecture) – is this a TLA? Yes. Is it being misused? Absolutely! Anywhere you go, you hear about SOA. People that work by your side, they do SOA. People who hire you, they want you to know SOA. Any software product, if it is big enough, will be “SOA ready”, “SOA compliant”, “SOA leader”, etc… So you wonder what SOA really is.. Well, nothing new, nor is it a revolutionary concept. SOA is what programming was all about from day one – a bunch of services exposed for others to use. That is IT! It is not a concept at all. So we have a TLA, and all it does is BUZZ – no meaning behind it, but lots and lots of confusion. Bunch Of Exposed Services, so how about I call it BOES, or just BS maybe?

If SOA is not enough, let’s look at another one – “BPM” (Business Process Management) – is this a TLA? Yes. Is it being misused? Absolutely! When you look at any Business, it will have at least some kind of Process, right? And it is a known fact that a process needs to be Managed in order to function. By people or programmatically, but it will be managed. So what does “Business Process Management” really mean? The answer is “nothing”. So when you hear somebody saying that “we need to apply a BPM solution here”, think about a simple “if-then-else” conditional sentence. An example can be “if the item is ordered, then ship it”. That is it! Sometimes people refer to it as “Work Flow”, instead of BPM [where 'M' sometimes stands for Modeling], which makes a bit more sense. But should it have this confusing “BPM” title? No! But guess what is easier to market (read “sell”): a “Business Process Management” solution or a bunch of simple “if-then-else” statements?

And these are just a couple of examples of tech buzz words that make no sense.

Looking beyond tech, we have empty terms/words like: Enterprise, Immersion, Leverage, Paradigm Shift, Synergy, Tipping Point, etc… All of them create this confusing and useless layer, that hurts our language, communication, clarity, and makes us live in our buzzy little clouds with no clear idea where we are and what we really know. And no, I did not mean the SaaS “cloud” YABW (Yet Another Buzz Word) or even the PaaS cloud, nor did I mean the IaaS cloud… grrrr, buzz all around.


26
Nov 09

Learn Grails by Its Plugins

How do you approach learning a new technology? Google it? Buy a book? Go to training? Start using it for your work?

Well, I figured that the answer to that would depend on the technology itself. And although I bought a Grails book, and spent some time googling and building little Grails POC projects, and actually used it for work, I still felt that something was missing, that there was a gap between me and Grails.

That is when I discovered that the best way to learn Grails, to understand its guts, is to contribute to it.

Normally, in order to get committer status on a mature open source project you would have to open lots of JIRAs, provide many patches, donate many ideas, etc… In the case of Grails, it is actually extremely easy and fast. Here are three easy steps to see your code posted on “grails.org”:

For more logistics refer to the official grails create plugins guide. But that is really it! That is how easy it is to join the Grails developer community, and grow from a Consumer to a Creator.

I got lucky and saw a live presentation by Jeff Brown at the Groovy on Grails One Day Seminar in Philly. That is when I got excited, and started to work on my own plugin during the seminar’s hackathon. Two days later I had a 0.1 version of the plugin committed to github, and three days later I released it to grails.org. Just think about it – three days from scratch, and you can become a creator of an official Grails plugin – how cool is that?

Now go and create that plugin!


31
Oct 09

Would you like your own Sputnik?

Do you like technology? What about the latest and greatest technology? Would it be cool if you had your own Earth-orbiting satellite that would look upon us from the sky, and every time something cool was invented, released, discovered, it would instantly record a video about it and make it available for you to watch? It would be cool, right? Well, you can have it then.

sput⋅nik

–noun, any of a series of Soviet Earth-orbiting satellites: “Sputnik I” was the world’s first space satellite. It was launched into an elliptical low earth orbit by the Soviet Union on 4 October 1957, and was the first in a series of satellites collectively known as the Sputnik program

Since 1957 Sputnik has spent a long time learning what’s cool about our techy world, how to aggregate this enormous amount of new discoveries, and present it in a clean, simple and concise fashion. Interestingly enough, the phrase “high tech” was first coined in that same year – coincidence?

But Sputnik does not work alone, it has the best breed of people who have specialized in high tech from birth, that’s right – “Uber Geeks”!

As wiki points out, geek is a slang term, noting individuals as “a peculiar or otherwise odd person, especially one who is perceived to be overly obsessed with one or more things including those of intellectuality, electronics, etc.”

Sputnik geeks are not only obsessed with intellectuality, electronics and much more, but they also love putting it into a simple video form and sharing it with the rest of the world. And that is exactly why Geek Sputnik TV was born – “from uber geek satellite to the rest of the world about technology”.

Since the whole “Sputnik Program” was started in the former Soviet Union, Geek Sputnik TV inherited the legacy and currently broadcasts in Russian. However, with immediate attention and high interest from the WWW, we may see these uber geeks expand to include more languages, including English, in the near future.

Stay tuned – geeks are sharing!


28
Oct 09

Spring Insight in Action – 5 Minutes From Scratch

“Spring Insight gives deep visibility into your application’s real activity on a request-by-request basis. For any request you can see all the JDBC queries it made, how much time it took to render, or timings for any of your major Spring beans.” – said Jon Travis in his article on the Spring blog.

That is such a great idea, I thought, and watched Jon’s screencast.

What actually surprised me is how simple and quick it is to try Spring Insight in action. Here are the 3 simple steps:

Step 1. Download Spring Tool Suite

I like to think about Spring Tool Suite as an Eclipse on Spring rocks (Spring IDE, Spring Interactive Tutorials, Exception Resolution, and much more). And now ( since version 2.2 ), it comes with tc Server Developer Edition that includes Spring Insight, so the easiest way to try out Spring Insight is to download Spring Tool Suite, since it comes with it: http://www.springsource.com/products/springsource-tool-suite-download

Note that although “tc Server Developer Edition” can be downloaded and run on its own, Spring Tool Suite (STS) gives us something extra: “ready to go” sample applications that we can deploy to tc Server – all in one.

Step 2. Import a sample web application

About those sample applications… Now that you have STS unpacked/unzipped, you can run it and go to “File” –> “Import” –> “Spring Tool Suite” –> “Sample Projects”. You should see three sample applications: “Hotel Booking”, “PetClinic” and “SpringTravel”. I chose “PetClinic”, but it does not really matter, we can use any sample application to play with Spring Insight.

Once you click “Ok”, STS will ask you if it is ok to download 20+ MB of JARs; you, of course, having a huge HD, would say yes, and.. here you go: 1 minute later you have yourself a fully functional, ready to deploy web app!

Step 3. Deploy a web application to tc Server Developer Edition

Now right click on PetClinic app, -> Run on server -> Choose Spring tc Server (it is not going to say anything about Insight, but it’s there :) ).

At this point STS will ask you to browse to the location of your tc Server. It should be under the directory where you installed STS, e.g. in my case it was: “/opt/springsource/tc-server-6.0.20.C”

After you click “Finish”, you should see “INFO: Deploying web application archive insight.war” as one of your deployment messages in STS console.

After another minute, once your app (PetClinic in my case) is deployed, go to http://localhost:8080/insight and you should be good to go:

Spring Insight

Insight Away!


21
Oct 09

They say offshore QA team, I say AUTOMATE IT!

Part I. Poor Trees

I really see no advantage in having all these test scripts that are made as Excel spreadsheets and then printed out in hundreds (don’t you hear trees begging: “save us”?) and given to QA team members who spend a couple of months following them to make sure every little condition/case is met.

Guess what happens when new requirements come in, or the old ones get changed… Spend a week or two going through all these Excel files, manually refactoring every little condition/case affected. And then what? Regression testing ..ummm.. another two months?

Oh.. shoot, requirements changed in the middle of testing, what should we do.. hmm.. print more paper. Poor trees…

Part II. Can I get some sleep please?

Let’s make all this process twice as effective, let’s offshore 50% of the QA (testing). Oh.. ok:

– Hello? Do you speak English?

– Yes, hi we are a great software company, and yes we do speak English.

– Great! Can I outsource 50% of my testing to you guys, so we have 24 hour coverage?

– Sure – we’ll be glad to do that for you.

– Awesome! I just sent you the package with everything that needs to be done.

2 a.m. “Hello. We are looking at requirements, and we did not find any definition for ‘A’..”

3 a.m. “Hello. We need to make an important decision, and we were wondering… ”

5 a.m. “Hello. Before we send you today’s status, we wanted to make sure..”

Part III. Open up a little

Of course we need people in charge of QA, and we need QA teams – the guys are great! They make us shine when we ship our high quality creation out the door. But can they be more effective, and do less work at the same time? Can they use technologies that are available, instead of using MS Office for things that it was really not made for? Can they ensure that whoever comes to test after them will not be lost, can pick up where they left off, and keep doing an awesome job? Yes, Yes, and Yes, and many more Yeses!

The hardest part is to let go of the fear and the standards that were set 20, 15, 10 years ago for Software Testing. Yes, it made sense then ( maybe :) ), but now there are dozens of tools and practices that will increase productivity tenfold, you just need to open up a little. You can do it!


Part IV. Automate yourself!

Let’s look at a simple example of using some of these time saving tools. Here is how one of the test cases can be automated by using easyB and Selenium. The automated test itself is worth a thousand words, so here it is:

before "start selenium", {
 given "selenium is up and running", {
   // start selenium
 }
}
 
scenario "a valid person has been entered", {
 
 when "filling out the person form with a first and last name", {
   selenium.open("http://acme.racing.net/greport/personracereport.html")
   selenium.type("fname", "Britney")
   selenium.type("lname", "Smith")
 }
 
 and "the submit link has been clicked", {
   selenium.click("submit")
 }
 
 then "the report should have a list of races for that person", {
   selenium.waitForPageToLoad("5000")
   values = ["Mclean 1/2 Marathon", "Reston 5K", "Herndon 10K", "Leesburg 10K"]
   for( i in 0..<values.size() ){
      selenium.getText("//table//tr[${(i+3)}]/td").shouldBeEqualTo values[i]
   }
 }
}
 
after "stop selenium", {
  then "selenium should be shutdown", {
    // stop selenium
  }
}

And, let’s say Groovy is not your first language, and maybe you’re not a programmer at all, but really, can you read, and understand, that:

 when 'filling out the person form with a first and last name'
 and   'the submit link has been clicked'
 then  'the report should have a list of races for that person'

Ok, granted, you may not know XPath, and “//table//tr[${(i+3)}]/td” may be just a little over your head, but guess what.. It was auto-generated (by selenium) for you the first time you drove through the screens. And you know what the beauty of it is? It can run on its own, and it can tell you whether your application satisfies “this” particular requirement. You can schedule it to run every day, if you’d like, or every time the code is modified, or every Labor Day, or… you get the point. :)
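And if that locator still looks scary: it is an XPath expression that means “the cell of the N-th row of the table”. Here is a small plain-Java sketch that evaluates the same shape of expression with the JDK's built-in XPath support. The markup in `REPORT` is hypothetical: I am assuming two header rows above the data, which would explain the “i+3” offset, but the real report page may look different:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

class XPathRowSketch {

    // Hypothetical report markup: two header rows, then one row per race.
    static final String REPORT =
            "<html><body><table>"
            + "<tr><th>Race report</th></tr>"
            + "<tr><th>Race</th></tr>"
            + "<tr><td>Mclean 1/2 Marathon</td></tr>"
            + "<tr><td>Reston 5K</td></tr>"
            + "</table></body></html>";

    // Returns the text of the cell in the given (1-based) table row.
    static String cellText(int row) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(REPORT.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Same shape as the locator in the scenario: //table//tr[N]/td
            return xpath.evaluate("//table//tr[" + row + "]/td", doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate the locator", e);
        }
    }

    public static void main(String[] args) {
        // The scenario starts at row i + 3 with i = 0, i.e. the third row.
        System.out.println(cellText(3)); // prints "Mclean 1/2 Marathon"
        System.out.println(cellText(4)); // prints "Reston 5K"
    }
}
```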


Part V. We still need people, but they can be more productive!

“So what?”, you say, “One little test case – we have hundreds, and it would take us a month to create all these “automated” cases and scripts.”, and I say – yes, it may take a month now, but it will save you a year later. Why? Because if you create these scripts carefully, reflecting all the business requirements, these scripts will turn into the fastest, most accurate and “easy to interact with” QA team on Earth for your company/project. And that, I would say, is worth a month of work!

The easyB example was taken from an extremely good presentation, Industrial Strength Groovy, by Paul King. Thank you Paul!


30
Aug 09

Key Value Store List

While playing with CouchDB, I decided to expand on the subject and research what else is out there. Apparently there are lots of cool implementations of Key Value Stores. And since trying them all would take a long time, I decided to make some notes of what I found, to simplify my journey.

Here I created a simple reference list of different, non-commercial implementations of Key Value Stores. Let me know if there are more interesting projects that are not in this list, so we can keep this list updated.

Tokyo Cabinet

A C library that implements a very fast and space efficient key-value store:

The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. There is neither concept of data tables nor data types. Records are organized in hash table, B+ tree, or fixed-length array.

Besides bindings for Ruby, there are also APIs for Perl, Java, and Lua available.

To share Tokyo Cabinet across machines, Tokyo Tyrant provides a server for concurrent and remote connections.

Speed and efficiency are two consistent themes for Tokyo Cabinet. Benchmarks show that it only takes 0.7 seconds to store 1 million records in the regular hash table and 1.6 seconds for the B-Tree engine. To achieve this, the overhead per record is kept as low as possible, ranging between 5 and 20 bytes: 5 bytes for B-Tree, 16-20 bytes for the Hash-table engine. And if small overhead is not enough, Tokyo Cabinet also has native support for Lempel-Ziv or BWT compression algorithms, which can reduce your database to ~25% of its size (typical text compression rate). Oh, and did I mention that it is thread safe (uses pthreads) and offers row-level locking?
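To get a feel for why a store would offer both a hash table and a tree organization, here is a rough plain-Java sketch. It does not use Tokyo Cabinet at all: `HashMap` and `TreeMap` are stand-ins (a `TreeMap` is a red-black tree, not a B+ tree), and the keys and values are made up. The point is only that both layouts answer a lookup by key, while a sorted layout can also scan a range of keys:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KeyValueLayoutSketch {

    // Hypothetical records, kept in key order by the tree.
    static TreeMap<String, String> sampleStore() {
        TreeMap<String, String> store = new TreeMap<>();
        store.put("user:2", "bob");
        store.put("order:7", "book");
        store.put("user:1", "alice");
        return store;
    }

    // Point lookup: works the same way with either layout.
    static String lookup(Map<String, String> store, String key) {
        return store.get(key);
    }

    // Prefix scan: relies on keys being sorted, which a tree gives and a hash table does not.
    static List<String> keysWithPrefix(TreeMap<String, String> store, String prefix) {
        List<String> keys = new ArrayList<>();
        for (String key : store.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // sorted order means no later key can match
            }
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        TreeMap<String, String> ordered = sampleStore();
        Map<String, String> hashed = new HashMap<>(ordered);

        System.out.println(lookup(hashed, "order:7"));        // prints "book"
        System.out.println(lookup(ordered, "order:7"));       // prints "book"
        System.out.println(keysWithPrefix(ordered, "user:")); // prints "[user:1, user:2]"
    }
}
```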

good intro: Tokyo Cabinet Beyond Key Value Store

Project Voldemort

Voldemort is a very cool project that comes out of LinkedIn. They seem to even be providing a full time guy doing development and support via a mailing list. Kudos to them, because Voldemort, as far as I can tell, is great. Best of all, it scales. You can add servers to the cluster, you don’t do any client side hashing, throughput is increased as the size of the cluster increases. As far as I can tell, you can handle any increase in requests by adding servers as well as those servers being fault tolerant, so a dead server doesn’t bring down the cluster.

Voldemort does have a downside for me, because I primarily use Ruby and the provided client is written in Java, so you either have to use JRuby (which is awesome but not always realistic) or Facebook Thrift to interact with Voldemort. This means Thrift has to be compiled on all of your machines, and since Thrift uses the Boost C++ library, and the Boost C++ library is both slow and painful to compile, deployment time of Voldemort apps is increased significantly.

Voldemort is also interesting because it has a pluggable data storage backend, and the bulk of it is about the sharding and fault tolerance and less about data storage. Voldemort might actually be a good layer on top of Redis or Tokyo Cabinet some day.

Voldemort, it should be noted, is also only going to be worth using if you actually need to spread your data out over a cluster of servers. If your data fits on a single server in Tokyo Tyrant, you are not going to gain anything by using Voldemort. Voldemort however, might be seen as a good migration path from Tokyo * when you do hit that wall where performance isn’t enough. (from: NoSQL If Only It Was That Easy)

CouchDB

Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.

CouchDB provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests. There are myriad third-party client libraries that make this even easier from your programming language of choice. CouchDB’s built-in Web administration console speaks directly to the database using HTTP requests issued from your browser.

It’s a “distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”. Data is stored in ‘documents’, which are essentially key-value maps themselves, using the data types you see in JSON. CouchDB can do full text indexing of your documents, and lets you express views over your data in Javascript. I could imagine using CouchDB to store lots of data on users: name, age, sex, address, IM name and lots of other fields, many of which could be null, and each site update adds or changes the available fields. In situations like that it quickly gets unwieldy adding and changing columns in a database, and updating versions of your application code to match. (from: Anti RDBMS a List of Distributed Key Value Stores)
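To picture what “documents as key-value maps” plus a MapReduce-style view means, here is a rough plain-Java sketch of the idea. It is not CouchDB's API (real views are written in JavaScript and run inside the database); the documents, the field names and the `usersPerCity` view are all made up for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ViewSketch {

    // Builds a schema-free "document": just field names mapped to values.
    static Map<String, Object> doc(Object... fieldsAndValues) {
        Map<String, Object> d = new HashMap<>();
        for (int i = 0; i < fieldsAndValues.length; i += 2) {
            d.put((String) fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return d;
    }

    // Hypothetical user documents; note that they do not share the same fields.
    static List<Map<String, Object>> sampleDocs() {
        return Arrays.asList(
                doc("name", "Ann", "city", "Philly"),
                doc("name", "Bob", "age", 30),
                doc("name", "Cid", "city", "Philly", "im", "cid42"));
    }

    // "Map" step: emit (city, 1) for every document that has a city field.
    // "Reduce" step: sum the emitted values per key.
    static Map<String, Integer> usersPerCity(List<Map<String, Object>> docs) {
        Map<String, Integer> view = new TreeMap<>();
        for (Map<String, Object> d : docs) {
            Object city = d.get("city");
            if (city != null) {
                view.merge((String) city, 1, Integer::sum);
            }
        }
        return view;
    }

    public static void main(String[] args) {
        System.out.println(usersPerCity(sampleDocs())); // prints "{Philly=2}"
    }
}
```

Documents with a missing field simply emit nothing, so adding or dropping fields never requires changing a schema.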

good intro: InfoQ: CouchDB From 10K Feet

MongoDB

MongoDB is not a key/value store, it’s quite a bit more. It’s definitely not a RDBMS either. I haven’t used MongoDB in production, but I have used it a little building a test app and it is a very cool piece of kit. It seems to be very performant and either has, or will have soon, fault tolerance and auto-sharding (aka it will scale). I think Mongo might be the closest thing to a RDBMS replacement that I’ve seen so far. It won’t work for all data sets and access patterns, but it’s built for your typical CRUD stuff. Storing what is essentially a huge hash, and being able to select on any of those keys, is what most people use a relational database for. If your DB is 3NF and you don’t do any joins (you’re just selecting a bunch of tables and putting all the objects together, AKA what most people do in a web app), MongoDB would probably kick ass for you.

Oh, and did I mention that, of all the NoSQL options out there, MongoDB is one of the only ones being developed as a business with commercial support available? If you’re dealing with lots of other people’s data, and have a business built on the data in your DB, this isn’t trivial.

On a side note, if you use Ruby, check out MongoMapper for very easy and nice to use Ruby access. (from: NoSQL If Only It Was That Easy)

MongoDB: Dwight Merriman, 10gen (slides, video)

Cassandra

Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google’s BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.

Cassandra was open sourced by Facebook in 2008, where it was designed by one of the authors of Amazon’s Dynamo. In a lot of ways you can think of Cassandra as Dynamo 2.0. Cassandra is in production use at Facebook but is still under heavy development.

Sounded very promising when the source was released by Facebook last year. They use it for inbox search. It’s Bigtable-esque, but uses a DHT so doesn’t need a central server (one of the Cassandra developers previously worked at Amazon on Dynamo). Unfortunately it’s languished in relative obscurity since release, because Facebook never really seemed interested in it as an open-source project. From what I can tell there isn’t much in the way of documentation or a community around the project at present. (from: Anti RDBMS a List of Distributed Key Value Stores)

Good intro: Up and Running With Cassandra

MySQL Cluster / NDB

Although it is not a native key-value store, I found it interesting enough to put on the list. While it is commonly used through an SQL interface, the architecture and performance are exactly what you want: cloud-like sharding, very good performance on key-value lookups, etc… And if you don’t want the SQL, you can use the NDB API directly, or REST through the mod_ndb Apache module (http://code.google.com/p/mod-ndb/).

This would score high on your list if you evaluated it:

- Transparent sharding: Data is distributed through an md5sum hash on your primary key (or user defined key), yet you connect to whichever MySQL server you want, the partitions/shards are transparent behind that.

- Transparent re-sharding: In version 7.0, you can add more data nodes in an online manner, and re-partition tables without blocking traffic.

- Replication: Yes. (MySQL replication).

- Durable: Yes, ACID. (When using a redundant setup, which you always will.)

- Commits to disk: Not on commit, but with a 200ms-2000ms delay. Durability comes from committing to more than 1 node, *on commit*.

- Less than 10ms response times: You bet! 1-2 ms for quite complex queries even.
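
To make the "hash on your primary key" idea from the list above a bit more concrete, here is a minimal sketch in Erlang. It is only an illustration of hash-based partitioning in general, not how NDB actually does it: the module name `shard_sketch`, the function `partition_for/2` and the "hash modulo number of partitions" rule are all my own assumptions.

```erlang
-module(shard_sketch).
-export([partition_for/2]).

%% Hypothetical helper (not part of MySQL Cluster / NDB):
%% map a primary key to one of NumPartitions shards by hashing it with MD5.
partition_for(PrimaryKey, NumPartitions)
  when is_integer(NumPartitions), NumPartitions > 0 ->
    %% erlang:md5/1 returns the 16-byte digest as a binary
    Digest = erlang:md5(term_to_binary(PrimaryKey)),
    %% read the whole digest as one unsigned 128-bit integer
    <<HashInt:128/unsigned-integer>> = Digest,
    %% assumption: shard = hash modulo number of partitions, so 0..NumPartitions-1
    HashInt rem NumPartitions.
```

Calling `shard_sketch:partition_for(42, 4)` gives an integer between 0 and 3, and the same key always lands on the same partition. The client never needs to know which one, which is the "transparent" part.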

Other KVS I have in my backlog to expand on here later:

LightCloud

Lux IO

Flare

Tx

Oracle Berkeley DB

Ringo

Redis

Scalaris

Kai

HBase

Hypertable

Dynomite

MemcacheDB

ThruDB

And as a bonus – there is another quite interesting initiative started by Yehuda Katz:

Moneta

Moneta: A unified interface for key/value stores
================================================
 
Moneta provides a standard interface for interacting with various kinds of key/value stores.
 
Out of the box, it supports:
 
* File store for xattr
* Basic File Store
* Memcache store
* In-memory store
* The xattrs in a file system
* DataMapper
* S3
* Berkeley DB
* Redis
* SDBM
* Tokyo
* CouchDB
 
All stores support key expiration, but only memcache supports it natively. All other stores
emulate expiration.
 
The Moneta API is purposely extremely similar to the Hash API. In order to support an
identical API across stores, it does not support iteration or partial matches, but that
might come in a future release.
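
The "emulated expiration" mentioned above can be sketched roughly like this. To be clear, this is not Moneta's code (Moneta is a Ruby library): it is a small Erlang illustration of the general idea, and the module name `expiring_store` and its functions are made up for the example. The assumption is that each value is stored together with its expiry time, and that a read treats an expired key as missing.

```erlang
-module(expiring_store).
-export([new/0, store/4, fetch/2]).

%% Hypothetical sketch: a map-backed key/value store that emulates
%% key expiration by keeping an expiry timestamp next to each value.
new() -> #{}.

%% TtlSeconds is the time to live; the atom 'infinity' means "never expires".
store(Key, Value, infinity, Store) ->
    Store#{Key => {Value, infinity}};
store(Key, Value, TtlSeconds, Store)
  when is_integer(TtlSeconds), TtlSeconds >= 0 ->
    ExpiresAt = erlang:monotonic_time(second) + TtlSeconds,
    Store#{Key => {Value, ExpiresAt}}.

%% Returns {ok, Value} for a live key, or 'error' if it is missing or expired.
fetch(Key, Store) ->
    Now = erlang:monotonic_time(second),
    case maps:find(Key, Store) of
        {ok, {Value, infinity}} -> {ok, Value};
        {ok, {Value, ExpiresAt}} when Now < ExpiresAt -> {ok, Value};
        _ -> error
    end.
```

A store that supports expiration natively (memcache, according to the README) would do this check on the server side. Here it happens on every read.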

Any additional info / projects are welcome!


19
Aug 09

Erlang ! { me, Hello }.


Erlang has been around since I was 7, so by this point I have had 23 years to say hello to Erlang, but I just did not get to it until today.. At 10 I was busy with chess, at 15 with girls, at 20 with girlfriends, at 25 I don’t remember (that was quite a frequent condition at that age), and only now, at 3 o’clock on a beautiful Wednesday morning, here in one of many Chicago hotels, I can really say it out loud: “Erlang ! { me, Hello }.”, or, as we are all used to saying it, “System.out.println( “Hello to Erlang from me” );”, or using the language of our grandparents: “Me: Hello Erlang!”

I have already spent about 30 minutes playing with it – 5 minutes with the erl interpreter, 5 minutes coding, and 20 minutes figuring out how the heck I can execute it from the command line while passing parameters to functions. So here goes a 30 minute summary in a “count yourself” number of sentences…

From “Erlang – Quick Start“, I stole a factorial (fac) function:

-module(test).
-export([fac/1]).
 
fac(0) -> 1;
fac(N) -> N * fac(N-1).

“Compile the program by typing c(test) then run it” said the Erlang site, so I did:

3> c(test).
{ok,test}
4> test:fac(20).
2432902008176640000
5> test:fac(40).
815915283247897734345611269596115894272000000000
6>

What I wanted to do now was to have a more useful factorial that can actually take a number parameter and run from the command line. See, Erlang is cool with numbers, because it uses arbitrary-sized integers when it does integer arithmetic, and I wanted to see that 10000.. digit number on my screen, but I wanted to do it from the command line…

Hence, reading further through Erlang’s documentation, I found this in the Erlang How To FAQ: “How to run an Erlang program directly from the unix shell:”

        matthias >erl -compile hello
        matthias >erl -noshell -s hello hello_world -s init stop
        hello, world

Great, I thought, let’s run my factorial function then:

$ erl -noshell -s test fac 4 -s init stop
{"init terminating in do_boot",{undef,[{test,'test:fac',[['4']]},{init,start_it,1},{init,start_em,1}]}}
 
Crash dump was written to: erl_crash.dump
init terminating in do_boot ()

I tried many different combinations, prefixes, suffixes, but nothing seemed to work. Google (at this time) did not really help, so I spent another 5-10 minutes actually studying the language from the multiple online resources I could find. After aggregating the knowledge, here goes a solution:

-module( matematika ).        % module name = file name
-export( [factorial/1] ).     % exporting a factorial function that takes exactly 1 argument
 
% public, since exported above
% "erl -s" passes the command line arguments as one list of atoms, e.g. ['10']
factorial( [CommandLineParameter] ) ->
 
    % converting the input parameter (an atom) to an integer, so we can use it in (private) fac below
    Number = list_to_integer( atom_to_list( CommandLineParameter ) ),
    FactorialResult = fac( Number ),
 
    io:format( "~w! = ~w~n", [Number, FactorialResult] ),    % pretty much like printf in C
    init:stop().                                             % need to explicitly stop the runtime
 
% private, since not exported
fac( 0 ) -> 1;                                % if 0, then return 1
fac( N ) when N > 0 -> N * fac( N - 1 ).      % else return N times fac of N minus 1

(I put %comments above for myself, so I can make sense of it tomorrow)

Let’s run it now:

$ erl -noshell -s matematika factorial 10

10! = 3628800

Awesome! My first stolen and adapted Erlang creation! Let’s see that factorial of .. let’s say 2500 :)

$ erl -noshell -s matematika factorial 2500


2500!=162888842416926354689668105747439663365399942834366593333
761170598517395953006666015681181171091114301822189949967063775
407379642957266480360849144773982699565766503949953039081536069
313589385624248687168633365117877728319632346514905978458047074
520807127737619451831790023662437656379915366899692425817099473
955735537991551620610205879561628364536090561091825520933523438
440298824173752468219542814600203368965255916069562338913433294
969546310263930229454748650689662592679638050717072642347493989
468072742236518740460239946352245451040613097756653973305720645
026457997934905356924399618617581860376174835804874205168542257
467008667252720784248969925977883224857503131037675382806351903
130554386521130700598953600694590165036980214021274304347037205
774546036842214862077129715702791830982471445806697511922924126
875707763824427831458131252725129871400134654305773736954160374
386043307314954277237484986013167770729137200202006247592856875
946971039429028314584331171481048021391502558449541563727025722
429319793486407721042419353225446943557177410280427218310573933
839468119502298621190184926686015339505156759957938618691111894
105137524428488796590017749394464101657140531047449031317150211
285312051145217906000448322292856476064080179041772517805638616
704522178956984018390162683438304694297727727823412207694734265
878202872900194730775246958252155279043555763913056000888393253
937210136778443737969895720575345197710315491879632577212080296
732791524306529332768002582234532193839787438122696823349137174
760687670811121707247122877205618078452290605963728534389393406
703483582596248272104119965697657195713053485619074455216492879
719763758474871783557654928157780691218383646855409834599921063
373144702996594627688077741944550267192758309026313016206320680
530057452746436412708183108931890404685083431502083760663324657
349706015263327982666486689576849283883469142513936741022368381
903094157650249629927012864342540407330646247523995884057015184
717062826800920338962166558742062917836339935141477580556616102
759761599188076139416375666490347795870693771994555537476372358
925544445791134700553339780029989334462364486499563386435498770
970697902521176942715439141796399164240719914064566047833333979
658667979051009689054775584486605430424545544714920455985028492
775158386405002083658607397637102066859718496781089357617987825
390662781413816362946370821897681257991937027979675382384665624
733872791767882787048074812304136442761397202291044563080832580
377638267813956876382413025080202917826793584257121650412123520
8825054296165661030756208371742686402825404804558501327839670
731298809850930719924452525141301863810787127140637580161952796
470931012669932742565234239616031337114081022694921413641260386
424388652301371711255153268827616495293442715781089495795404683
744579676459521729702016200147034375778237008585095355232063710
088291957991216310837002831440396924100323429063456827045895594
917126433490705797776990807538192111396635158758664846773837413
564155213989495350785689041240261464178518418465026963508203252
038246166655605208324074965984192733197462771017672726300923288
607540014472757890113403434211921496288437000162551272645252320
615215716654175248938850328046313070690361405371373329623736166
731299101093298365654056037733083226277004269609573104069448790
684864546219098996171110998911324197479806896470305987119560932
856582719643423019817880041224237194274668604715496198407207355
809431389490372488422206677783166941973289816036063374723748298
696836902300888969044824525828910570687623075008425420179724412
174632013134752558921448609478176626573353890791801685222886849
990731518133839408072332112603244018982882369997032825586118714
392208201914776888366261219130250913546151105147763080829651928
290074106631605007724254431488105804572887069328232683043301900
466160052172383665181738152989844063638391709547589900409420631
7468376377314153856018840069377218558903334939371343395777264
426365318130887683599836088345497158322556553595094840894654614
406383376396868199531042942079405403474627428650274595771829568
255995466465003325664472293654101201924430539079190443194487046
380416281749159026300696294947812447714612418298771332692848211
154125929312681287195951419960699391722211424311898584816200565
373258799897110855302098610980845978364362685231061369005215230
788471798866541648729889538439509890111560301062943923923769988
997147648363950708508382034563798637202146673551309938897540440
786998003118907883241416258343188784000679367284896011745823289
734561911522255979545433492376668285434347783001266108600121473
498779217963921384546755156766188951735866744108093964636023471
349527019498920540070752394649104708510770999131507315610483632
330982876174119389012968417760250231722884408641474340374893362
833795392300949100935853949419952854923937872507275773843355108
735204114864752466565493366395162830913421138498633186453324268
762572668445390770903011717202242312950408723851830364004678106
656009078222984968635868416766693788940980240045058068275614302
839043567118028698385054274108921811298084233976240006277809181
325997330000786683674194376654268377368369949823431206244095885
960702059149846318232687179678572748082487793549844548187548311
400556064744050666472476457947299489568330636742840865513804654
776149494225932236964194164784786218640251626514525549920894810
545056499250214652147224638246034892251401029945773476251196919
612592752826016419187902895630887737745713281219681141023586882
321633118519558748181198803719862314967759867112077291645583600
510531017148754676649505951293767894337326791942862414631993126
449392270145239108425380056269282654566300820253116701666036949
374826952156733286262700190206665725355635411559887439781283350
874285109852181238143636938256407098028863909426094536431398247
852543854757537942646791855344906017659039384905122375234004320
416026882823286655632286053156546748028668750196973803337027786
891236116711910780819707846081218679760982625290715984522150204
916425232591826590887410401320568045430558979943145617193692059
705465421939829497586908594951642657711985208269462425640322372
314303330475772671495014049400654155619566942191085558759378416
790160458835621823599743742665152596726797511419955382867531134
959437002011587472231152746047820309830039699792147174883667855
760287780241438038062954704157981072533480950728570115662921849
376586317866043009133972729549191452640224225320222817060812601
144385749818368781373067888827129923654130858902252804622762751
192618314790547633755929057344102888830740499275949997990026813
425216747458578654838647236496508876074530642629714199563056483
975900965902520059913821418901761372054615449749487163282905699
188377895083782877246773269374715942460000414707728522199826307
191493572953655749106244370767768017669677808468909623091993519
298870943235516530358122980923396832082235382703392292864000000
000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000

Now, THAT is cool!
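
One more note on the command line part. The `-s` flag hands the function a single list of atoms, which is why the atom had to be converted with `atom_to_list` first. The `erl` documentation also describes a `-run` flag that passes the arguments as a list of strings instead. Below is a sketch of the same program written for `-run`. The module name `matematika_run` is made up for this example, and `fac/1` is exported only so it can also be tried from the shell.

```erlang
-module(matematika_run).
-export([factorial/1, fac/1]).

%% Meant to be started with:
%%   erl -noshell -run matematika_run factorial 10
%% With -run the arguments arrive as a list of strings, e.g. ["10"],
%% whereas -s delivers a list of atoms, e.g. ['10'].
factorial([NumberString]) ->
    Number = list_to_integer(NumberString),          % no atom_to_list needed here
    io:format("~w! = ~w~n", [Number, fac(Number)]),
    init:stop().                                     % stop the runtime explicitly

fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N - 1).
```

Either flag works. The only real difference for this program is one conversion call.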