

19
Dec 09

UnBuzz Me

I am so tired of this artificial and useless layer that is being created on top of our language.

I am pretty sure it is a problem in any field, but I am a technical guy, so I am talking about technical buzz words.

As I look at it, big corporations use these useless acronyms to impress and sell. Individuals use buzz words to hide their own incompetence behind them.

Several years ago, when I joined IBM, the first thing I learned was not the wonders of open source commitment, or the coolness of IBM research labs, but “what TLA stands for”. Because without knowing that, I could not really read any documentation, or talk to most of the people in the workplace. So what does TLA stand for? Three Letter Acronym.

And while a TLA on its own is not harmful, the misuse of TLAs in the modern tech world is huge, and it makes it uber confusing for newcomers to understand what means what.

After I left IBM, I went even deeper into consulting, and just could not survive without learning more, and more, and more, and more… a little bit more of TLAs. Sometimes it’s useful to understand what the client really wants, sometimes it’s just useful to be able to “talk corporate” to people who can’t talk any other language.

So what kind of misuse of TLAs am I talking about?

Let’s take for example “URL” (Uniform Resource Locator) – is this a TLA? Yes. Is it being misused? No. Why not? Because it means one thing, and one thing only. And when we say URL, we mean just that – something that specifies where an identified resource is available and the mechanism for retrieving it.

Now let’s look at “SOA” (Service Oriented Architecture) – is this a TLA? Yes. Is it being misused? Absolutely! Anywhere you go, you hear about SOA. People who work by your side do SOA. People who hire you want you to know SOA. Any software product, if it is at all big, will be “SOA ready”, “SOA compliant”, “SOA leader”, etc… So you wonder what SOA really is. Well, it is nothing new, nor is it a revolutionary concept. SOA is what programming was all about from day one – a bunch of services exposed for others to use. That is IT! It is not a concept at all. So we have a TLA, and all it does is BUZZ – no meaning behind it, but lots and lots of confusion. Bunch Of Exposed Services, so how about I call it BOES, or maybe just BS?

If SOA is not enough, let’s look at another one – “BPM” (Business Process Management) – is this a TLA? Yes. Is it being misused? Absolutely! When you look at any Business, it will have at least some kind of Process, right? And it is a known fact that a process needs to be Managed in order to function. By people or programmatically, but it will be managed. So what does “Business Process Management” really mean? The answer is “nothing”. So when you hear somebody saying that “we need to apply a BPM solution here”, think about a simple “if-then-else” conditional sentence. An example can be “if the item is ordered, then ship it”. That is it! Sometimes people refer to it as “Work Flow” instead of BPM [where ‘M’ sometimes stands for Modeling], which makes a bit more sense. But should it have this confusing “BPM” title? No! But guess what is easier to market (read “sell”): a “Business Process Management” solution or a bunch of simple “if-then-else” statements?
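
To make the point concrete, here is a minimal sketch of that “if the item is ordered, then ship it” rule as plain code. The `Order` class and the `next_step` function are made up for this illustration; they do not come from any BPM product.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order record, invented for this illustration."""
    item: str
    ordered: bool = False
    shipped: bool = False


def next_step(order: Order) -> str:
    """The whole 'business process': a single if-then-else chain."""
    if not order.ordered:
        return "wait for order"
    elif not order.shipped:
        return "ship it"
    else:
        return "done"


if __name__ == "__main__":
    print(next_step(Order("book", ordered=True)))
```

Whether the rule is evaluated by a person, a script, or a workflow engine, the decision itself stays this small.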

And these are just a couple of examples of tech buzz words that make no sense.

Looking beyond tech, we have empty terms/words like: Enterprise, Immersion, Leverage, Paradigm Shift, Synergy, Tipping Point, etc… All of them create this confusing and useless layer that hurts our language, communication, and clarity, and makes us live in our buzzy little clouds with no clear idea where we are and what we really know. And no, I did not mean the SaaS “cloud” YABW (Yet Another Buzz Word) or even the PaaS cloud, nor did I mean the IaaS cloud… grrrr, buzz all around.


26
Nov 09

Learn Grails by Its Plugins

How do you approach learning new technology? Google it? Buy a book? Go to training? Start using it for your work?

Well, I figured that the answer to that would depend on the technology itself. And although I bought a Grails book, spent some time googling and building little Grails POC projects, and actually used it for work, I still felt that something was missing, that there was a gap between me and Grails.

That is when I discovered that the best way to learn Grails, to understand its guts, is to contribute to it.

Normally, in order to get committer status on a mature open source project you would have to open lots of JIRAs, provide many patches, donate many ideas, etc… In the case of Grails, it is actually extremely easy and fast. Here are three easy steps to see your code posted on “grails.org”:

For more logistics refer to the official grails create plugins guide. But that is really it! That is how easy it is to join Grails developer community, and grow from a Consumer to the Creator.

I got lucky and saw a live presentation by Jeff Brown at the Groovy on Grails One Day Seminar in Philly. That is when I got excited, and started to work on my own plugin during the seminar’s hackathon. Two days later I had a 0.1 version of the plugin committed to GitHub, and three days later I released it to grails.org. Just think about it – three days from scratch, and you can become a creator of an official Grails plugin – how cool is that?

Now go and create that plugin!


28
Oct 09

Spring Insight in Action – 5 Minutes From Scratch


“Spring Insight gives deep visibility into your application’s real activity on a request-by-request basis. For any request you can see all the JDBC queries it made, how much time it took to render, or timings for any of your major Spring beans.” – said Jon Travis in his article on a Spring blog.

“That is such a great idea”, I thought, and watched Jon’s screencast.

What actually surprised me is how simple and quick it is to try Spring Insight in action. Here are the 3 simple steps:

Step 1. Download Spring Tool Suite

I like to think about Spring Tool Suite as an Eclipse on Spring rocks (Spring IDE, Spring Interactive Tutorials, Exception Resolution, and much more). And now ( since version 2.2 ), it comes with tc Server Developer Edition that includes Spring Insight, so the easiest way to try out Spring Insight is to download Spring Tool Suite, since it comes with it: http://www.springsource.com/products/springsource-tool-suite-download

Note that while “tc Server Developer Edition” can be downloaded and run on its own, Spring Tool Suite (STS) gives us something extra: “ready to go” sample applications that we can deploy to tc Server – all in one.

Step 2. Import a sample web application

About those sample applications… Now that you have STS unpacked/unzipped, you can run it and go to “File” –> “Import” –> “Spring Tool Suite” –> “Sample Projects”. You should see three sample applications: “Hotel Booking”, “PetClinic” and “SpringTravel”. I chose “PetClinic”, but it does not really matter; we can use any sample application to play with Spring Insight.

Once you click “Ok”, STS will ask you if it is ok to download 20+ MB of JARs. You, of course, having a huge HD, would say yes, and.. here you go: 1 minute later you have yourself a fully functional, ready to deploy web app!

Step 3. Deploy a web application to tc Server Developer Edition

Now right click on PetClinic app, -> Run on server -> Choose Spring tc Server (it is not going to say anything about Insight, but it’s there :) ).

At this point STS will ask you to browse to the location of your tc Server. It should be under the directory where you installed STS, e.g. in my case it was: “/opt/springsource/tc-server-6.0.20.C”

After you click “Finish”, you should see “INFO: Deploying web application archive insight.war” as one of your deployment messages in STS console.

After another minute, once your app (PetClinic in my case) is deployed, go to http://localhost:8080/insight and you should be good to go:

Spring Insight

Insight Away!


21
Oct 09

They say offshore QA team, I say AUTOMATE IT!


Part I. Poor Trees

I really see no advantage in having all these test scripts that are made as Excel spreadsheets and then printed out in the hundreds (don’t you hear trees begging: “save us”?) and given to QA team members, who spend a couple of months following them to make sure every little condition/case is met.

Guess what happens when new requirements come in, or the old ones get changed… Spend a week or two going through all these Excel files, manually refactoring every little condition/case affected. And then what? Regression testing ..ummm.. another two months?

Oh.. shoot requirements changed in the middle of testing, what should we do.. hmm.. print more paper. Poor trees…

Part II. Can I get some sleep please?

Let’s make all this process twice as effective, let’s offshore 50% of the QA (testing). Oh.. ok:

– Hello? Do you speak English?

– Yes, hi we are a great software company, and yes we do speak English.

– Great! Can I outsource 50% of my testing to you guys, so we have 24 hour coverage?

– Sure – we’ll be glad to do that for you.

– Awesome! I just sent you the package with everything that needs to be done.

2 a.m. “Hello. We are looking at requirements, and we did not find any definition for ‘A’..”

3 a.m. “Hello. We need to make an important decision, and we were wondering… ”

5 a.m. “Hello. Before we send you today’s status, we wanted to make sure..”

Part III. Open up a little

Of course we need people in charge of QA, and we need QA teams – the guys are great! They make us shine when we ship our high quality creation out the door. But can they be more effective, and do less work at the same time? Can they use technologies that are available, instead of using MS Office for things that it was really not made for? Can they ensure that whoever comes to test after them will not be lost, can pick up where they left off, and keep doing an awesome job? Yes, Yes, and Yes, and many more Yeses!

The hardest part is to let go of the fear and the standards that were set 20, 15, 10 years ago for Software Testing. Yes, it made sense then ( maybe :) ), but now there are dozens of tools and practices that will increase productivity tenfold. You just need to open up a little. You can do it!

Part IV. Automate yourself!

Let’s look at a simple example of using some of these time-saving tools. Here is how one of the test cases can be automated by using easyB and Selenium. The automated test itself is worth a thousand words, so here it is:

before "start selenium", {
 given "selenium is up and running", {
   // start selenium
 }
}
 
scenario "a valid person has been entered", {
 
 when "filling out the person form with a first and last name", {
   selenium.open("http://acme.racing.net/greport/personracereport.html")
   selenium.type("fname", "Britney")
   selenium.type("lname", "Smith")
 }
 
 and "the submit link has been clicked", {
   selenium.click("submit")
 }
 
 then "the report should have a list of races for that person", {
   selenium.waitForPageToLoad("5000")
   values = ["Mclean 1/2 Marathon", "Reston 5K", "Herndon 10K", "Leesburg 10K"]
   for( i in 0..< values.size() ){
      selenium.getText("//table//tr[${(i+3)}]/td").shouldBeEqualTo values[i]
   }
 }
}
 
after "stop selenium", {
  then "selenium should be shutdown", {
    // stop selenium
  }
}

And, let’s say Groovy is not your first language, and maybe you’re not a programmer at all, but really, can you read, and understand, that:

 when 'filling out the person form with a first and last name'
 and   'the submit link has been clicked'
 then  'the report should have a list of races for that person'

Ok, granted, you may not know XPath, and “//table//tr[${(i+3)}]/td” may be just a little over your head, but guess what.. It was auto-generated (by Selenium) for you the first time you drove through the screens. And you know what the beauty of it is? It can run on its own, and it can tell you whether your application satisfies “this” particular requirement. You can schedule it to run every day, if you’d like, or every time the code is modified, or every Labor Day, or… you get the point. :)

Part V. We still need people, but they can be more productive!

“So what?”, you say, “One little test case – we have hundreds, and it would take us a month to create all these “automated” cases and scripts.” And I say: yes, it may take a month now, but it will save you a year later. Why? Because if you create these scripts carefully, reflecting all the business requirements, these scripts will turn into the fastest, most accurate and “easy to interact with” QA team on Earth for your company/project. And that, I would say, is worth a month of work!

The easyB example was taken from an extremely good presentation, Industrial Strength Groovy, by Paul King. Thank you Paul!


30
Aug 09

Key Value Store List


While playing with CouchDB, I decided to expand on the subject and see what else is out there. Apparently there are lots of cool implementations of Key Value Stores. And since trying them all would take a long time, I decided to make some notes of what I found, to simplify my journey.

Here I created a simple reference list of different, non-commercial implementations of Key Value Stores. Let me know if there are more interesting projects that are not in this list, so we can keep this list updated.

Tokyo Cabinet

A C library that implements a very fast and space efficient key-value store:

The database is a simple data file containing records, each is a pair of a key and a value. Every key and value is serial bytes with variable length. Both binary data and character string can be used as a key and a value. There is neither concept of data tables nor data types. Records are organized in hash table, B+ tree, or fixed-length array.

Besides bindings for Ruby, there are also APIs for Perl, Java, and Lua available.

To share Tokyo Cabinet across machines, Tokyo Tyrant provides a server for concurrent and remote connections.

Speed and efficiency are two consistent themes for Tokyo Cabinet. Benchmarks show that it only takes 0.7 seconds to store 1 million records in the regular hash table and 1.6 seconds for the B-Tree engine. To achieve this, the overhead per record is kept as low as possible, ranging between 5 and 20 bytes: 5 bytes for B-Tree, 16-20 bytes for the Hash-table engine. And if small overhead is not enough, Tokyo Cabinet also has native support for Lempel-Ziv or BWT compression algorithms, which can reduce your database to ~25% of its size (typical text compression rate). Oh, and did I mention that it is thread safe (uses pthreads) and offers row-level locking?
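
As a quick back-of-the-envelope check of what those figures mean, here is a small sketch. The only inputs are the numbers quoted in this section; the function names are mine, and nothing here calls Tokyo Cabinet itself.

```python
RECORDS = 1_000_000  # record count used in the quoted benchmark


def records_per_second(records: int, seconds: float) -> float:
    """Average write throughput implied by a total elapsed time."""
    return records / seconds


def overhead_megabytes(records: int, bytes_per_record: int) -> float:
    """Total per-record bookkeeping overhead, in megabytes (10^6 bytes)."""
    return records * bytes_per_record / 1_000_000


if __name__ == "__main__":
    # Timings quoted above: 0.7 s for the hash table, 1.6 s for the B-Tree.
    print("hash   records/s:", round(records_per_second(RECORDS, 0.7)))
    print("b-tree records/s:", round(records_per_second(RECORDS, 1.6)))
    # Overheads quoted above: 5 bytes (B-Tree), 16-20 bytes (hash table).
    print("b-tree overhead MB:", overhead_megabytes(RECORDS, 5))
    print("hash   overhead MB:", overhead_megabytes(RECORDS, 16),
          "to", overhead_megabytes(RECORDS, 20))
```

In other words, for a million records the bookkeeping cost is roughly 5 MB with the B-Tree engine and 16 to 20 MB with the hash table, before any compression.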

good intro: Tokyo Cabinet Beyond Key Value Store

Project Voldemort

Voldemort is a very cool project that comes out of LinkedIn. They seem to even be providing a full time guy doing development and support via a mailing list. Kudos to them, because Voldemort, as far as I can tell, is great. Best of all, it scales. You can add servers to the cluster, you don’t do any client side hashing, and throughput increases as the size of the cluster increases. As far as I can tell, you can handle any increase in requests by adding servers, and those servers are fault tolerant, so a dead server doesn’t bring down the cluster.

Voldemort does have a downside for me, because I primarily use Ruby and the provided client is written in Java, so you either have to use JRuby (which is awesome but not always realistic) or Facebook Thrift to interact with Voldemort. This means Thrift has to be compiled on all of your machines, and since Thrift uses the Boost C++ library, and the Boost C++ library is both slow and painful to compile, deployment time for Voldemort apps increases significantly.

Voldemort is also interesting because it has a pluggable data storage backend, and the bulk of it is about sharding and fault tolerance rather than data storage. Voldemort might actually be a good layer on top of Redis or Tokyo Cabinet some day.

Voldemort, it should be noted, is also only going to be worth using if you actually need to spread your data out over a cluster of servers. If your data fits on a single server in Tokyo Tyrant, you are not going to gain anything by using Voldemort. Voldemort, however, might be seen as a good migration path from Tokyo * when you do hit that wall where performance isn’t enough. (from: NoSQL If Only It Was That Easy)
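
To get a feel for how a cluster can spread keys over servers and survive one of them going away, here is a generic consistent-hash ring sketch. It is not Voldemort’s code or API; the `HashRing` class, the node names, and the choice of 100 virtual points per node are assumptions made for illustration only.

```python
import bisect
import hashlib


class HashRing:
    """Generic consistent-hash ring (illustrative, not Voldemort's code)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, an arbitrary choice
        self._ring = []           # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        """Drop every point of a dead (or retired) node."""
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):  # wrap around the end of the ring
            idx = 0
        return self._ring[idx][1]
```

The useful property is that removing a node only moves the keys that lived on it; every other key stays where it was, which is what makes adding or losing servers cheap.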

CouchDB

Apache CouchDB is a document-oriented database that can be queried and indexed in a MapReduce fashion using JavaScript. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.

CouchDB provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests. There are myriad third-party client libraries that make this even easier from your programming language of choice. CouchDB’s built in Web administration console speaks directly to the database using HTTP requests issued from your browser.

It’s a “distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”. Data is stored in ‘documents’, which are essentially key-value maps themselves, using the data types you see in JSON. CouchDB can do full text indexing of your documents, and lets you express views over your data in Javascript. I could imagine using CouchDB to store lots of data on users: name, age, sex, address, IM name and lots of other fields, many of which could be null, and each site update adds or changes the available fields. In situations like that it quickly gets unwieldy adding and changing columns in a database, and updating versions of your application code to match. (from: Anti RDBMS a List of Distributed Key Value Stores)
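
To picture what schema-free documents and a view over them look like, here is a small sketch in plain Python. It does not talk to CouchDB (whose views are written in JavaScript); the documents, the field names, and the `build_view` helper are invented for illustration.

```python
# Three hypothetical user documents; note that they do not share the same fields.
docs = [
    {"_id": "u1", "name": "Ann", "age": 31, "im": "ann_im"},
    {"_id": "u2", "name": "Bob", "address": "12 Main St"},
    {"_id": "u3", "name": "Cy", "age": 45},
]


def map_age(doc):
    """Map step: emit a (key, value) row only for documents that have an age."""
    if "age" in doc:
        yield doc["age"], doc["name"]


def build_view(documents, map_fn):
    """Run the map function over every document and sort the rows by key."""
    rows = [row for doc in documents for row in map_fn(doc)]
    return sorted(rows)


if __name__ == "__main__":
    for key, value in build_view(docs, map_age):
        print(key, value)
```

Adding a new field to some documents needs no schema change here: documents without the field are simply skipped by any view that asks for it.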

good intro: InfoQ: CouchDB From 10K Feet

MongoDB

MongoDB is not a key/value store, it’s quite a bit more. It’s definitely not a RDBMS either. I haven’t used MongoDB in production, but I have used it a little building a test app and it is a very cool piece of kit. It seems to be very performant and either has, or will have soon, fault tolerance and auto-sharding (aka it will scale). I think Mongo might be the closest thing to a RDBMS replacement that I’ve seen so far. It won’t work for all data sets and access patterns, but it’s built for your typical CRUD stuff. Storing what is essentially a huge hash, and being able to select on any of those keys, is what most people use a relational database for. If your DB is 3NF and you don’t do any joins (you’re just selecting a bunch of tables and putting all the objects together, AKA what most people do in a web app), MongoDB would probably kick ass for you.

Oh, and did I mention that, of all the NoSQL options out there, MongoDB is one of the only ones being developed as a business with commercial support available? If you’re dealing with lots of other people’s data, and have a business built on the data in your DB, this isn’t trivial.

On a side note, if you use Ruby, check out MongoMapper for very easy and nice to use Ruby access. (from: NoSQL If Only It Was That Easy)

MongoDB: Dwight Merriman, 10gen (slides, video)

Cassandra

Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google’s BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.
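
As a rough mental model of a ColumnFamily-based layout, think of nested maps: column family, then row key, then column name. The sketch below is deliberately simplified (no timestamps, no distribution, no eventual consistency), and the `ColumnFamilyStore` class and its methods are hypothetical names, not Cassandra’s API.

```python
class ColumnFamilyStore:
    """Toy model: column family -> row key -> {column name: value}."""

    def __init__(self):
        self._families = {}

    def insert(self, family, row_key, column, value):
        """Set one column in one row, creating family and row as needed."""
        rows = self._families.setdefault(family, {})
        rows.setdefault(row_key, {})[column] = value

    def get_row(self, family, row_key):
        """Return a copy of all columns of a row (empty dict if missing)."""
        return dict(self._families.get(family, {}).get(row_key, {}))


if __name__ == "__main__":
    store = ColumnFamilyStore()
    store.insert("Users", "ann", "email", "ann@example.com")
    store.insert("Users", "ann", "city", "Philly")
    store.insert("Users", "bob", "email", "bob@example.com")
    print(store.get_row("Users", "ann"))
```

Compared with a plain key/value store, each row can carry its own set of named columns, and two rows in the same family need not have the same ones.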

Cassandra was open sourced by Facebook in 2008, where it was designed by one of the authors of Amazon’s Dynamo. In a lot of ways you can think of Cassandra as Dynamo 2.0. Cassandra is in production use at Facebook but is still under heavy development.

Sounded very promising when the source was released by Facebook last year. They use it for inbox search. It’s Bigtable-esque, but uses a DHT so doesn’t need a central server (one of the Cassandra developers previously worked at Amazon on Dynamo). Unfortunately it’s languished in relative obscurity since release, because Facebook never really seemed interested in it as an open-source project. From what I can tell there isn’t much in the way of documentation or a community around the project at present. (from: Anti RDBMS a List of Distributed Key Value Stores)

good intro: Up and Running With Cassandra

MySQL Cluster / NDB

Although it is not a native Key Value Store, I found it interesting enough to put on the list. While it is commonly used through an SQL interface, the architecture and performance are exactly what you want: Cloud-like sharding, very good performance on key-value lookups, etc… And if you don’t want the SQL, you can use the NDB API directly, or REST through the mod_ndb Apache module (http://code.google.com/p/mod-ndb/).

This would score high on your list if you evaluated it:

– Transparent sharding: Data is distributed through an md5sum hash on your primary key (or user defined key), yet you connect to whichever MySQL server you want, the partitions/shards are transparent behind that.

– Transparent re-sharding: In version 7.0, you can add more data nodes in an online manner, and re-partition tables without blocking traffic.

– Replication: Yes. (MySQL replication).

– Durable: Yes, ACID. (When using a redundant setup, which you always will.)

– Commits to disk: Not on commit, but with a 200ms-2000ms delay. Durability comes from committing to more than 1 node, *on commit*.

– Less than 10ms response times: You bet! 1-2 ms for quite complex queries even.
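
The transparent sharding point above can be sketched in a few lines: hash the primary key, then pick a partition from the hash. This only shows the idea; the real partitioning function and node layout inside MySQL Cluster are not reproduced here, and `partition_for` and `distribution` are hypothetical helper names.

```python
import hashlib
from collections import Counter


def partition_for(primary_key, partitions: int) -> int:
    """Map a primary key to a partition number via an MD5 hash."""
    digest = hashlib.md5(str(primary_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def distribution(keys, partitions: int) -> Counter:
    """Count how many of the given keys land in each partition."""
    return Counter(partition_for(key, partitions) for key in keys)


if __name__ == "__main__":
    # Spread 1000 sequential ids over 4 partitions and show the split.
    print(sorted(distribution(range(1000), 4).items()))
```

Because the hash decides the placement, the application can connect to any SQL node and never needs to know which data node holds a given row.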

Other KVS I have in my backlog to expand on here later:

LightCloud

Lux IO

Flare

Tx

Oracle Berkeley DB

Ringo

Redis

Scalaris

Kai

HBase

Hypertable

Dynomite

MemcacheDB

ThruDB

And as a bonus – there is another quite interesting initiative started by Yehuda Katz:

Moneta

Moneta: A unified interface for key/value stores
================================================
 
Moneta provides a standard interface for interacting with various kinds of key/value stores.
 
Out of the box, it supports:
 
* File store for xattr
* Basic File Store
* Memcache store
* In-memory store
* The xattrs in a file system
* DataMapper
* S3
* Berkeley DB
* Redis
* SDBM
* Tokyo
* CouchDB
 
All stores support key expiration, but only memcache supports it natively. All other stores
emulate expiration.
 
The Moneta API is purposely extremely similar to the Hash API. In order to support an
identical API across stores, it does not support iteration or partial matches, but that
might come in a future release.
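
To show what a hash-like API with emulated key expiration can look like, here is a minimal in-memory sketch. Moneta itself is a Ruby library; this Python class only mirrors the idea, and `ExpiringStore`, `store`, and `expires_in` are names chosen for this example rather than Moneta’s actual interface.

```python
import time


class ExpiringStore:
    """In-memory key/value store with a dict-like API and emulated expiration."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}      # key -> (value, expires_at or None)

    def store(self, key, value, expires_in=None):
        """Save a value; `expires_in` is a lifetime in seconds, or None for no expiry."""
        expires_at = None if expires_in is None else self._clock() + expires_in
        self._data[key] = (value, expires_at)

    def __setitem__(self, key, value):
        self.store(key, value)

    def __getitem__(self, key):
        value, expires_at = self._data[key]
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expiry is emulated lazily, on read
            raise KeyError(key)
        return value

    def __contains__(self, key):
        try:
            self[key]
        except KeyError:
            return False
        return True

    def delete(self, key):
        """Remove a key and return its value, or None if it was absent."""
        value, _ = self._data.pop(key, (None, None))
        return value
```

A backend without native expiration can be wrapped the same way: keep the expiry time next to the value and check it on every read.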

Any additional info / projects are welcome!