Dec 15

Super Powers and Their Mutable Friends

After I released my bulletproof time series database, most of the world’s high frequency companies started converting to it. In less than a day major Fortune 7.3 billion players adopted it and embraced the simplicity and greatness of what my Clojure time series database delivered.

So what now? When all the money is made and the adoption rate is higher than I could have ever predicted.. What now? Well, now it’s time to fix it, because it’s, um, broken.

Keys to Time

Here is the data example for the current broken solution:

(def events
  {1449088877092 {:GOOG {:bid 762.74 :offer 762.79}}
   1449088876590 {:AAPL {:bid 116.60 :offer 116.70}}
   1449088877601 {:MSFT {:bid 55.22 :offer 55.27}}
   1449088877203 {:TSLA {:bid 232.57 :offer 232.72}}
   1449088875914 {:NFLX {:bid 128.95 :offer 129.05}}
   1449088870005 {:FB {:bid 105.96 :offer 106.6}}})

It is a map keyed by timestamp, and that is exactly the problem. Say we have a couple of events coming in at the exact same millisecond:

(def events [
  {:ts 1449088877203 :ticker :GOOG :event-id 1}    ;; <<
  {:ts 1449088876590 :ticker :AAPL :event-id 2}
  {:ts 1449088877601 :ticker :MSFT :event-id 3}
  {:ts 1449088877203 :ticker :TSLA :event-id 4}    ;; <<
  {:ts 1449088875914 :ticker :NFLX :event-id 5}
  {:ts 1449088870005 :ticker :FB   :event-id 6}])

Notice that Tesla and Google have the same timestamp. So (sorted-map-by) would not work here: it would re-assoc them, and the second event would overwrite the first. Of course a custom comparator that does not treat “the same keys as the same” can be used, but then there is a problem with key collisions.
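To see the overwrite in action, here is a minimal sketch. It reuses the Google and Tesla quotes from the first example, and assumes (for illustration only) that both arrived at 1449088877203:

```clojure
;; Two quotes keyed by the very same millisecond.
;; The shared timestamp is an assumption made for this illustration.
(def quotes
  (-> (sorted-map-by >)
      (assoc 1449088877203 {:GOOG {:bid 762.74 :offer 762.79}})
      (assoc 1449088877203 {:TSLA {:bid 232.57 :offer 232.72}})))

(count quotes)              ;; => 1, the Google quote is gone
(get quotes 1449088877203)  ;; => {:TSLA {:bid 232.57, :offer 232.72}}
```

A map has one value per key, so the last event for a given millisecond silently wins.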

Natural Numbers

So here I present to you a massively refactored solution with its codebase experiencing a twofold increase. The one and only: “The Time Series Database in One Line of Clojure 2.0”, or simply “The Time Series Database in 2.0 Lines of Clojure”.

I’ll format the first line for better readability:

(defn ts [{t1 :ts} {t2 :ts}]
  (if-not (= t1 t2)
    (compare t1 t2)
    1))   ;; same timestamp: never answer "equal", so both events are kept

This is a simple comparator with a twist: when it sees two timestamps that are the same, it lies.

Now on to the second line, a “database codebase conclusion”, as I call it:

(def db (sorted-set-by ts))

And.. done.


Some tools and queries from a previous 1.0 product:

;; database with data
(defn with [db data] (reduce conj db data))
;; find data before a timestamp
(defn before [db ts] (subseq db <= {:ts ts}))
;; find data after a timestamp
(defn after [db ts] (subseq db >= {:ts ts}))

Let’s look at the database with data:

=> (with db events)
#{{:ts 1449088870005, :ticker :FB, :event-id 6}
  {:ts 1449088875914, :ticker :NFLX, :event-id 5}
  {:ts 1449088876590, :ticker :AAPL, :event-id 2}
  {:ts 1449088877203, :ticker :GOOG, :event-id 1}  ;; << same
  {:ts 1449088877203, :ticker :TSLA, :event-id 4}  ;; << timestamp
  {:ts 1449088877601, :ticker :MSFT, :event-id 3}}

slicing and dicing:

(before (with db events) 1449088876592)
({:ts 1449088870005, :ticker :FB, :event-id 6} 
 {:ts 1449088875914, :ticker :NFLX, :event-id 5} 
 {:ts 1449088876590, :ticker :AAPL, :event-id 2})
(after (with db events) 1449088876592)
({:ts 1449088877203, :ticker :GOOG, :event-id 1} 
 {:ts 1449088877203, :ticker :TSLA, :event-id 4} 
 {:ts 1449088877601, :ticker :MSFT, :event-id 3})

Super Hero Friends

While it is nice to be able to slice a sorted set with a lying comparator, at times, it may not be desirable to do so.
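Here is a small sketch of why it may not be desirable. It assumes the comparator answers 1 for equal timestamps, so it never reports two events as equal, and a sorted set relies on that "equal" answer for lookups and removals:

```clojure
;; Assumption: the lying comparator returns 1 on equal timestamps.
(defn ts [{t1 :ts} {t2 :ts}]
  (if-not (= t1 t2)
    (compare t1 t2)
    1))

(def goog {:ts 1449088877203 :ticker :GOOG :event-id 1})
(def tsla {:ts 1449088877203 :ticker :TSLA :event-id 4})

(def db (conj (sorted-set-by ts) goog tsla))

(count db)              ;; => 2, both events survive
(contains? db goog)     ;; => false, the set cannot find its own element
(count (disj db goog))  ;; => 2, nothing can be removed
(count (conj db goog))  ;; => 3, the very same event is stored twice
```

So the set is great for appending and slicing, but membership checks, removals and deduplication are off the table.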

But every super hero has a true friend. Spiderman, for instance, has many. So does “The Time Series Database in 2.0 Lines of Clojure”. The friend’s name is multim and it’s also a Super.

Dec 15

Time Series Database in One Line of Clojure

If you have ever worked in the financial sector, specifically in high frequency trading, you know a time series database as a well known tool that orders up all those quotes, orders and trades for financial pleasure.

There are many of these databases available. Wall Street being Wall Street would of course primarily use proprietary ones, since, well, it’s proprietary :), but giving them credit: they do outperform open source ones by a lot, at least presently (talking about millions per second).

Disrupting Time Series Business

So I decided to write an open source time series database that will outperform them all not necessarily by performance, but definitely by clarity and size. Get ready for this one line.

If you read this far that means you are ready, so let’s begin by creating a database:

(def db (sorted-map-by >))

Oh, by the way we are done. It’s the one and only: The Time Series Database.

Map is King of Data

Let’s use it. First we’ll need some data:

(def data
  {1449088877092 {:GOOG {:bid 762.74 :offer 762.79}}
   1449088876590 {:AAPL {:bid 116.60 :offer 116.70}}
   1449088877601 {:MSFT {:bid 55.22 :offer 55.27}}
   1449088877203 {:TSLA {:bid 232.57 :offer 232.72}}
   1449088875914 {:NFLX {:bid 128.95 :offer 129.05}}
   1449088870005 {:FB {:bid 105.96 :offer 106.6}}})

The format is simple: {timestamp data}.

Now a query to have a database as a value with this data:

(defn with [db data] (merge db data))

And finally some time based queries, like before and after:

(defn before [database ts] (into {} (subseq database > ts)))
(defn after [database ts] (into {} (subseq database < ts)))



(before (with db data) 1449088877091)
{1449088876590 {:AAPL {:bid 116.6, :offer 116.7}},
 1449088875914 {:NFLX {:bid 128.95, :offer 129.05}},
 1449088870005 {:FB {:bid 105.96, :offer 106.6}}}
(after (with db data) 1449088877091)
{1449088877601 {:MSFT {:bid 55.22, :offer 55.27}},
 1449088877203 {:TSLA {:bid 232.57, :offer 232.72}},
 1449088877092 {:GOOG {:bid 762.74, :offer 762.79}}}
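It may look backwards that “before” uses > and “after” uses <. A minimal sketch with tiny made-up keys (1, 2, 3 stand in for timestamps) shows what is going on: subseq applies its test in terms of the map's own sort order, and this map is sorted descending.

```clojure
;; A descending map, same comparator as the database.
;; The keys 1, 2, 3 are stand-ins for timestamps.
(def m (merge (sorted-map-by >) {3 :c, 1 :a, 2 :b}))

(keys m)                  ;; => (3 2 1), newest first

;; ">" means "sorts after 2", which in a descending map is the older entries
(map key (subseq m > 2))  ;; => (1)

;; "<" means "sorts before 2", which is the newer entries
(map key (subseq m < 2))  ;; => (3)
```

With an ascending (sorted-map) the two tests would simply swap places.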

Beware, you, other time series databases!

P.S. Of course there is a possibility of events that come in at the exact same millisecond, so here is another line that solves it.

Nov 15

Clojure Libraries in The Matrix

The Clojure universe is mostly built on top of libraries rather than “frameworks” or “platforms”, which makes it really flexible and lots of fun to work with. Any library can be swapped, contributed to, or even created from scratch.

There are several things that make a library great. The quality of its solution is of course the main focus and delivers the most value, but there are others. The one I’d like to mention is not how much a library does, but how little it should.

I like apples, you like me, you like apples

Dependencies are often overlooked when developing libraries. There are quite a few libraries that suffer from depending on something for either convenience, or for its built in example, or just in case, etc.

This results in downloading the whole Maven repository when working on a project that depends on just a few such libraries.

It can also create conflicts between the dependencies these libraries bring in and the dependencies the project itself really requires.

We can do better, and we should.

Those people don’t know what they are doing

The reason I bring it up is not because I am tired of these libraries, or it is time for a rant, but it is simply because I do it myself. And usually by the time I notice I did it, it requires significant rework to make sure developers that use/depend on my libraries do not bring “apples” that I like and they might not.

Useful vs. The Core

A great example of this is me including an excellent clojure/tools.logging as a top level dependency of mount. Mount manages application state lifecycle, and it would only make sense if every time a state is started or stopped, mount would log it:

dev=> (mount/start)
14:34:10.813 [nREPL-worker-0] INFO  mount.core - >> starting..  app-config
14:34:10.814 [nREPL-worker-0] INFO  mount.core - >> starting..  conn
14:34:10.838 [nREPL-worker-0] INFO  mount.core - >> starting..  nyse-app
14:34:10.844 [nREPL-worker-0] INFO  mount.core - >> starting..  nrepl
dev=> (mount/stop-except #'app.www/nyse-app)
14:34:47.766 [nREPL-worker-0] INFO  mount.core - << stopping..  nrepl
14:34:47.766 [nREPL-worker-0] INFO  mount.core - << stopping..  conn
14:34:47.766 [nREPL-worker-0] INFO  mount.core - << stopping..  app-config

It’s useful, right? Of course it is. As a developer that depends on mount, you don’t have to do it, it is already there for you, very informative and clean.

But here is the catch:

* what if you don’t like the way it logs it?
* what if you don’t want it to log at all?
* what if you use a different library for logging?
* etc..

In other words: “what if you don’t like or need apples and you eat bananas instead?”.

It ends up that “useful” is most of the time a red flag. Stop and think whether this “useful” feature is really the core piece of functionality, or is a bolted on “nice to have”.

Novelty Freshness of Refactoring

While extra dependencies are not desirable, and the above idea to include logging was not great, what was great were the new thoughts during refactoring:

“Ok, I’ll remove logging, but now mount users won’t know anything about states..”

“Maybe they can use something like (states-with-deps) that would give them the current state of the application”:

dev=> (states-with-deps)
({:name app-config, :order 1, 
                    :started? true
                    :suspended? false
                    :ns #object[clojure.lang.Namespace 0x6e126efc "app.config"], 
                    :deps ()} 
 {:name conn, :order 2, 
              :started? true
              :suspended? false
              :ns #object[clojure.lang.Namespace 0xf1a66a6 "app.nyse"], 
              :deps ([app-config #'app.config/app-config])} 
 {:name nrepl, :order 3, 
               :started? true
               :suspended? false
               :ns #object[clojure.lang.Namespace 0x2c134117 "app"], 
               :deps ([app-config #'app.config/app-config])})

“That’s not bad, but what if they start/stop states selectively, or they suspended/resumed some states.. no visibility”

“Well, it’s simple, why not just return all the states that were affected by a lifecycle method?”

And that’s what I did. But I did not go through this thought process when I had logging in, since logging created an illusion of visibility and control, while in reality it gave “an ok” visibility and no control.

The solution just returns the states that were affected, as a vector under a key that names the lifecycle method:

dev=> (mount/start)
{:started [#'app.config/app-config 

The cool additional thing, and the reason it is a vector and not a set, is these states are in the vector in the order they were touched, in this case “started”.
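The idea can be sketched in a few lines. This is not mount's implementation: start-all and the shape of a “state” (a map with a :name and a :start function) are hypothetical, made up only to show a lifecycle function reporting what it touched instead of logging it.

```clojure
;; Hypothetical sketch, not mount's actual code.
(defn start-all
  "Starts the given states in order and returns {:started [...]},
   listing state names in the order they were touched."
  [states]
  {:started (reduce (fn [started {:keys [name start]}]
                      (start)               ;; run the state's start function
                      (conj started name))  ;; a vector keeps the order
                    []
                    states)})

(def result
  (start-all [{:name :app-config :start (fn [] :config-loaded)}
              {:name :conn       :start (fn [] :connected)}
              {:name :nrepl      :start (fn [] :listening)}]))

result  ;; => {:started [:app-config :conn :nrepl]}
```

The caller decides what to do with the result: log it, print it, assert on it in a test, or ignore it.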

Rules of The Matrix

While I made a mistake, I am glad I did. It gave me lots of food for thought as well as made me do some other cool tricks with robert hooke to demonstrate how to bring the same logging back if needed.

It does feel great to only depend on Clojure itself, and a tiny tools.macro, which I use a single function from, and could potentially just grab from there, and cut my dependencies to The One.

Nov 15

Managing Clojure app state since (reset)

After shipping two large projects with Component, and having several more in the works, I decided Component is not for me, and wrote mount.

Here are the differences between mount and Component, and below is the story of why.

Java is Good

For the last several years Java has been getting a bad rep from many people who now like “better” languages. Java is too verbose, too complex, too mutable, too ugly, too last century, etc..

I like Java, I liked it 10 years ago, I like it now. I did work in many languages, who doesn’t nowadays, and I still like Java. Java is solid. Java is simple. Yes, it is.

There are many corner cases that are well documented, and there are more that are not documented, but it does not make it complex. You can learn about Monads and Endofunctors on your own, while completing and shipping successful Java projects. Category Theory is not a prerequisite. Java is stable. I like Java.

Love is not solid, it’s more

Now I can’t say I love Java. Java is my good friend, we have a solid relationship. Sometimes we go to work together, but I do not have a feeling of excitement, I am not running (well, not too fast at least) to my laptop to try this cool Java thing, I would not spend a weekend with it unless we need to.

That’s where Clojure comes in. I love it. I won’t go into details on why I love it: first of all it’s personal, second of all there are plenty of other blog posts, books and videos that make Clojure shine. I just want to state that I love it. There is a difference.

Clojure. The Beginning.

I came to Clojure several years ago from a pretty common background: lots of Java and Spring. I like Spring a lot. It makes Java world shine, it taught me great ways to approach problems, it has great documentation and friendly community, I love friendly communities.

As I wrote more and more Clojure I fell in love with each new discovery, it made me think of time in a way Java didn’t. It greatly extended my reach into science behind a language.

Time went on and I started doing Clojure professionally. It is quite a different experience between using Clojure side by side with Java / Scala projects, using Clojure for tools and libraries, and using Clojure professionally: for products/applications. Products are very different and very stateful beings.

Clojure Developers are People

The question of state, not a “map” or a “vector” in a lexical scope but the state of a product, became one of great importance. And the choices are not great here.

From talking to people I concluded that

* some people keep their product (application) states in “def”s
* some in “atoms”
* others in Component

The “Clojure Gods” talk a lot about state, but very rarely about an application state and codebase organization. I doubt Datomic uses Component, but I don’t know.

Since Component was gaining popularity, and I talked to JUXT people (great people btw), they seem to be very enthusiastic about it, I decided to give it a go.

Component Framework

When I started creating projects with Component, I had already learned to like the Clojure way of functions and namespaces, so the object (record) oriented approach of Component was immediately suspicious.

Component is not exactly Spring of course. It aims to structure a stateful application, so it is reloadable in REPL, since REPL restart time is, well, slow.

In order to do that effectively, Component requires a whole app buy in, which makes it a framework rather than a library: another “unClojure” feeling that stayed with me while using Component.

Spring is a framework, and I like it. But in Java world that’s the culture. “Frameworks” is the approach. It is well accepted and tools are built around this.

Clojure world is all about libraries, and I love it, the same way I love open source solutions vs. closed packaged ones, i.e. open systems vs. closed systems. High cohesion, loose coupling, win win.

At Large

I understand that many people like Component, and I think their projects are based on it. It is not exactly evident though, since most of the products Component would be needed for are proprietary: enterprise (a.k.a. “at large”), “startups”, etc.. But there are several open source ones that look really good. A couple of examples:

* Onyx
* BirdWatch

although BirdWatch switched from Component to systems-toolbox:

“I have thrown out the Component library in master on GitHub, and I find using the systems-toolbox much more straight-forward”

I like Scotch

Long story short, while I delivered with Component, it did not deliver for me.

I rewrote Component projects with mount, and already shipped two of them. Both are live and blooming in prod.

Another team I work with liked this approach a lot, and rewrote one of their products in mount as well. First thing one of their developers said after trying it: “oh.. look, it’s like Clojure again!”.

It is, it is Clojure again for me too. I get it, Component may “do it” for many people, but alternatives are great!

I like Java, I love Clojure.

Oct 15

iterator-seq: chunks and hasNext

A couple of interesting facts to keep in mind about iterator-seq:

  • it calls hasNext right away
  • it “caches” reads by chunks of 32 items

Its implementation is quite simple and returns a lazy seq, but the above two are good to keep in mind when working with iterators:

private static final int CHUNK_SIZE = 32;
public static ISeq chunkIteratorSeq(final Iterator iter){
    if(iter.hasNext()) {   // called right away, before any seq is created
        return new LazySeq(new AFn() {
            public Object invoke() {
                Object[] arr = new Object[CHUNK_SIZE];
                int n = 0;
                while(iter.hasNext() && n < CHUNK_SIZE)   // read up to 32 items at once
                    arr[n++] = iter.next();
                return new ChunkedCons(new ArrayChunk(arr, 0, n), chunkIteratorSeq(iter));
            }
        });
    }
    return null;
}
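Both facts are easy to observe from the REPL. Below is a minimal sketch, assuming Clojure 1.7 or later (where iterator-seq goes through chunkIteratorSeq); counting-iterator is a hypothetical helper written just for this demo:

```clojure
;; Hypothetical helper: an Iterator over 0..n-1 that counts how many
;; times hasNext and next are called on it.
(defn counting-iterator [n calls]
  (let [i (atom 0)]
    (reify java.util.Iterator
      (hasNext [_]
        (swap! calls update :has-next inc)
        (< @i n))
      (next [_]
        (swap! calls update :next inc)
        (let [v @i]
          (swap! i inc)
          v)))))

(def calls (atom {:has-next 0 :next 0}))

(def s (iterator-seq (counting-iterator 100 calls)))

;; nothing has been read from the seq yet, but hasNext was already called
(def after-create @calls)      ;; => {:has-next 1, :next 0}

(def first-item (first s))     ;; => 0

;; asking for one item pulled a whole chunk out of the iterator
(def after-first @calls)
(:next after-first)            ;; => 32
```

If hasNext or next is expensive (a network call, for instance), those 32 reads all happen on the first item.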

Why 32? I like 42 better!

32 is a good choice for the CHUNK_SIZE since it matches the number of child nodes in Clojure (persistent) collections:

static public PersistentVector create(ISeq items){
    Object[] arr = new Object[32];
    int i = 0;
    for(;items != null && i < 32; items = items.next())
        arr[i++] = items.first();
    if(items != null) {  // >32, construct with array directly
        PersistentVector start = new PersistentVector(32, 5, EMPTY_NODE, arr);
        TransientVector ret = start.asTransient();
        for (; items != null; items = items.next())
            ret = ret.conj(items.first());
        return ret.persistent();
    } else if(i == 32) {   // exactly 32, skip copy
        return new PersistentVector(32, 5, EMPTY_NODE, arr);
    } else {  // <32, copy to minimum array and construct
        Object[] arr2 = new Object[i];
        System.arraycopy(arr, 0, arr2, 0, i);
        return new PersistentVector(i, 5, EMPTY_NODE, arr2);
    }
}
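The same number shows up when looking at a vector from the Clojure side. A quick sketch (the var name v is just for illustration):

```clojure
(def v (vec (range 100)))

;; a seq over a vector is chunked..
(chunked-seq? (seq v))         ;; => true

;; ..and its first chunk holds one full node worth of items
(count (chunk-first (seq v)))  ;; => 32
```

So a chunk produced by iterator-seq has the same size as the chunks vectors already hand out.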

The Dark Side of “hasNext()”

But before creating a lazy seq, the first call “iterator-seq” makes is iter.hasNext(). While this makes sense (why create a seq, if there is nothing to create it from), a thing to keep in mind is the implementation of the iterator which is passed to “iterator-seq”. Here is an example from my recent HBase journey.

cbass wraps an HBase Scanner in “iterator-seq”:

(let [results (-> (.iterator (.getScanner h-table (scan-filter criteria)))

Once “iterator-seq” makes a call to iter.hasNext(), the HBase scanner goes out and fetches the first result based on its filter. While this sounds ok, internally, depending on the HBase client caching configuration, it may end up fetching lots and lots of data to “cache” locally before returning a single item. Not exactly the “lazy seq behavior” one would expect. More about it here.

To conclude: it is always good to keep a fresh copy of Clojure source code in the head 🙂