HBase Scan: Let me cache it for you

HBase uses Scanners to SELECT things FROM table WHERE this AND that. An interesting bit that HBase scanners use is caching.

Since HBase designed to sit between a low level HDFS and a high level SQL front end, many of its APIs are “naturally leaking” which makes it harder to come up with good names, since they have to fit both worlds.

Scanner “caching” is a good example of a feature name that makes perfect sense internally in HBase, and is more than confusing for an HBase client.

Caching scans.. what!?


Here is how it works in a nutshell:

Scanners are iteratees that return results by calling “.next” on them.

Once the “.next” is called on a scanner, it will go and fetch the next result that matches its filter (i.e. similar to the SQL’s WHERE clause).

Here is the tricky part. Before the scanner returns that single result, it caches a chunk of results, so the next call to “.next” would come from this chunk, which effectively is a local “cache”. This is done so the trip to the region server for each call to “.next” can be avoided:

@Override
public Result next() throws IOException {
  // If the scanner is closed and there's nothing left in the cache, next is a no-op.
  if (cache.size() == 0 && this.closed) {
    return null;
  }
  if (cache.size() == 0) {
    loadCache();
  }
 
  if (cache.size() > 0) {
    return cache.poll();
  }
 
  // if we exhausted this scanner before calling close, write out the scan metrics
  writeScanMetrics();
  return null;
}

the snippet above is from HBase ClientScanner.

The loadCache() fetches a chunk size (read “cache size”) from a region server, and stuffs it in cache, which is just a Queue<Result>, that will be drained on all subsequent calls to “.next”.

Oh.. and Max Result Size is in bytes


Why is that tricky? Well, the tricky part is the size of this “chunk” that will be cached by the scanner on each request.

By default, this size is Integer.MAX_VALUE. Which means that the scanner will try to cache as much as possible, which is an Integer.MAX_VALUE or “hbase.client.scanner.max.result.size” which is by default 2MB. Depending on the size of the final result set, and on how scanners are used this can get unwieldy pretty quickly.

“hbase.client.scanner.max.result.size” is in bytes (not row numbers), so it is not exactly high level SQL’s “LIMIT” / “ROWNUM”. If not properly set, “caching” may end up either taking all the memory or simply return timeouts (e.g. “ScannerTimeoutException” which could get routed to “OutOfOrderScannerNextException”s, etc..) from the region server. Hence to get these two: “max size”, “cache size” play together nicely is important to scan smoothly.

Here is another good explanation by my friend Dan on why HBase caching is tricky and what it really does behind the covers.

Clients are Humans


If scanner “caching” was named a bit different, for example a “fetch size” would not be too bad, it would be a bit more obvious for the HBase client API. But then behind the covers, these values are truly “cached” and read from cache in each call to “scanner.next()”.

Both make sense, but I would argue that a client facing API should take precedence here. HBase developers already know the internals, and fetch size (as a property set to Scan) would not hurt them. “caching” can still be used internally under HBase covers.