
Adding Simple to HBase

Mutate and Complect!


The usual trend in functional programming is “immutable” => good, “mutable” => bad. That is not true in all cases, but it is true in most, especially when multiple threads, processes, or machines are involved.

HBase APIs are very much based on mutation. Because there are so many different ways to, for example, “scan” data, HBase relies on setters rather than overloaded constructors or builders. Count the number of setters in Scan, for example.

This just does not sit well with the “immutable is good” feeling.
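
To make that unease concrete, here is a minimal Java sketch of how setter-based configuration can surprise. MutableQuery and ImmutableQuery are hypothetical classes invented for this illustration. They are not part of the HBase API and only mimic the setter style that Scan uses.

```java
// Hypothetical stand-in for a setter-configured object; NOT an HBase class.
final class MutableQuery {
    private String prefix = "";
    private int batch = 0;

    void setPrefix(String prefix) { this.prefix = prefix; }
    void setBatch(int batch) { this.batch = batch; }

    String describe() { return "prefix=" + prefix + ", batch=" + batch; }
}

// Hypothetical immutable alternative: every "change" returns a new value.
final class ImmutableQuery {
    private final String prefix;
    private final int batch;

    ImmutableQuery(String prefix, int batch) {
        this.prefix = prefix;
        this.batch = batch;
    }

    ImmutableQuery withPrefix(String newPrefix) { return new ImmutableQuery(newPrefix, batch); }
    ImmutableQuery withBatch(int newBatch) { return new ImmutableQuery(prefix, newBatch); }

    String describe() { return "prefix=" + prefix + ", batch=" + batch; }
}

class SetterDemo {
    public static void main(String[] args) {
        MutableQuery shared = new MutableQuery();
        shared.setPrefix("abc");

        // Another piece of code holds the same reference and "tweaks" it.
        MutableQuery alias = shared;
        alias.setPrefix("xyz");

        // The original holder now sees a different query than the one it built.
        System.out.println(shared.describe()); // prints "prefix=xyz, batch=0"

        ImmutableQuery base = new ImmutableQuery("abc", 10);
        ImmutableQuery other = base.withPrefix("xyz");

        // The original value is untouched; the change lives in a new value.
        System.out.println(base.describe());  // prints "prefix=abc, batch=10"
        System.out.println(other.describe()); // prints "prefix=xyz, batch=10"
    }
}
```

With a single thread this is only confusing. With several threads sharing the mutable object, it becomes a source of bugs that are hard to reproduce.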

A long-time HBaser might not agree, but I believe the learning curve is quite steep for HBase newcomers. This is due to many things: Hadoop architecture, the data model, row key design, co-processors, all the cool things it does. But mainly, I think, it is due to a heavy set of APIs that are just not simple.

Connecting “with” HBase


Here is an example from the HBase book on how to find all columns in a row and family that start with “abc”. In SQL this would be done with something like:

SELECT * FROM <table> WHERE <row> LIKE 'abc%';

In HBase (this is a book example) it would be:

HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
byte[] prefix = Bytes.toBytes("abc");
Scan scan = new Scan(row, row);        // (optional) limit to one row
scan.addFamily(family);                // (optional) limit to one family
Filter f = new ColumnPrefixFilter(prefix);
scan.setFilter(f);
scan.setBatch(10);                     // set this if there could be many columns returned
ResultScanner rs = t.getScanner(scan);
for (Result r = rs.next(); r != null; r = rs.next()) {
  for (KeyValue kv : r.raw()) {
    // each kv represents a column
  }
}
rs.close();

And that is before the data is actually read into a comprehensible data structure (which would have to happen inside the nested loop), and it assumes that concepts like row, family, column, and scan are well understood. I say it is not that simple. But can it be?

I say yes, it can. How about:

(scan conn table-name :starts-with "abc")

While a connection (conn) needs to be created, and a family might be added if needed, this is a much simpler way to “connect with” HBase.
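
As for the nested loop in the book example, which visits each column but never collects anything, here is a minimal Java sketch of folding flat cells into a plain nested map of row -> (column -> value). FlatCell and Rows are hypothetical names used only for illustration. They are not HBase types, and this is not how cbass is implemented. It only shows the kind of "comprehensible data structure" a caller might prefer to receive.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical flat cell standing in for what the nested loop visits; NOT an HBase type.
final class FlatCell {
    final String row;
    final String column;
    final String value;

    FlatCell(String row, String column, String value) {
        this.row = row;
        this.column = column;
        this.value = value;
    }
}

final class Rows {
    // Groups flat cells into row -> (column -> value), with both levels sorted by key.
    static Map<String, Map<String, String>> toNestedMap(List<FlatCell> cells) {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        for (FlatCell cell : cells) {
            rows.computeIfAbsent(cell.row, k -> new TreeMap<>())
                .put(cell.column, cell.value);
        }
        return rows;
    }
}
```

The caller gets back ordinary maps that can be printed, compared, and passed around without knowing anything about scanners or cells.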

These are some of the reasons cbass was created: mainly to add “simple” to HBase.