Aug 15

Plain Old Clojure Object

There are times when you need to expose Java APIs. Some of these APIs need to return data. In Clojure that data is usually a map:

{:q "What is..?" :a 42}

In Java it is not that simple, for several reasons: Java maps are mutable, there are no idiomatic tools to inspect or destructure them, Java (programmers) like different types for different POJOs, etc..

So this data needs to be encapsulated in a way Java likes it, usually in the form of an object with private fields and getters and no behavior, i.e. a POJO.

Of course a Clojure project may have Java sources, where these POJOs can live, but why not just stick to Clojure all the way and create them using gen-class? Why? Because it is fun, and also because we can easily :require and use other Clojure libraries in these POJOs in case we need to.

JSL: Java as a second language

Oh yea, and let’s call them POCOs, cause they kind of are:

(ns poco
  (:gen-class
    :name org.stargate.PlainOldClojureObject
    :state "state"
    :init "init"
    :constructors {[Boolean Boolean String] []}
    :methods [[isHuman [] Boolean]
              [isFound [] Boolean]
              [planet [] String]]))
(defn -init [human? found? planet]
  [[] (atom {:human? human?
             :found? found?
             :planet planet})])
(defn- get-field [this k]
  (@(.state this) k))
(defn -isHuman [this]
  (get-field this :human?))
(defn -isFound [this]
  (get-field this :found?))
(defn -planet [this]
  (get-field this :planet))
(defn -toString [this]
  (str @(.state this)))

This compiles (gen-class does need to be AOT compiled) and behaves exactly like a Java POJO would, since it is a POJO, I mean POCO:

user=> (import '[org.stargate PlainOldClojureObject])
org.stargate.PlainOldClojureObject
user=> (def poco (PlainOldClojureObject. true true "42"))
#'user/poco
user=> poco
#object[org.stargate.PlainOldClojureObject 0x68033b90 "{:human? true, :found? true, :planet \"42\"}"]
user=> (.isHuman poco)
true
user=> (.isFound poco)
true
user=> (.planet poco)
"42"

Of course there are records, but POCOs are just more fun :)
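For comparison, here is a minimal sketch of the record route (same fields, names made up here): equality, hashCode, toString and map-like access come for free, you just don't get the bean-style getter names:

(ns poco.record)
;; an immutable "POCO" with named fields
(defrecord PlainOldClojureRecord [human? found? planet])
(def poco (map->PlainOldClojureRecord {:human? true :found? true :planet "42"}))
(:planet poco)              ;; => "42"
(assoc poco :found? false)  ;; "changing" it just returns a new immutable value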

Apr 15

Question Everything

Feeding Da Brain

In the 90s you would say: "I am a programmer". Some would reply with "o.. k". The more insightful would reply with a question: "which programming language?".

21st century.. socially accepted terminology has changed a bit: now you would say "I am a developer". Some would ask "which programming language?". The more insightful would reply with a question: "which of these 42 languages do you use the most?"

The greatest thing about using several at the same time is that feeling of constant adjustment as I jump between the languages. It feels like my brain goes through exuberant synaptogenesis over and over again building those new formations.

   What's for dinner today honey?
   Asynchronous brain refactoring with a gentle touch of "mental polish"

With all these new synapses, I came to love the fact that something that seemed so holy and “crystal right” before, now gets questioned and can easily be dismissed. Was it wrong all along? No. Did it change? No. So what changed then? Well.. perception did.

Inmates of the “Gang of Four” Prison

Design patterns are these “ways” of doing things that cripple new programmers, and imprison many senior ones. Instead of having an ability to think freely, we have all these “software standard patterns” which mostly have to do with limitations of “technology at time”.

Take the big guys, like C++ / Java / C#: while they have many great features and ideas, these languages have a horrible story of "behavior and state": you always have to guard something, whether it is from multiple threads or from other people misusing it. The languages themselves promote reuse over decoupling: i.e. "let's inherit that behavior", etc..

So how do we overcome these risks and limitations? Simple: let’s create dozens of “ways” that all developers will follow to fight this together. Oh, yea, and let’s make it industry standard, call them patterns, teach them in schools, and select people by how well they can “apply” these patterns to “any” problem at hand.

Not all developers bought into this cult of course. Here are Peter Norvig's notes from 1996, where he "dismisses" 16 out of 23 patterns from the Gang of Four by just using functions, types, modules, etc.

Builder Pattern vs. Immutable Data Structures

The Builder pattern makes sense unless.. several things. There is a great "Builders vs. Option Maps" short post that talks about builder pattern limitations:

* Builders are too verbose
* Builders are not data structures
* Builders are mutable
* Builders can’t easily compose
* Builders are order dependent

Due to mutable data structures in Java/C#/the like, Builders still make sense for things like Google protobufs with simple (i.e. primitive) types, but in most cases where immutable things need to be created it is best to use immutable data structures, for the reasons above.
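For contrast, a tiny sketch of the "option maps" side of that post (the names below are made up): the whole configuration is just an immutable map, so "composing" two configurations is an ordinary merge:

(def base-opts {:host "localhost" :port 4242})
;; composition is just data: no mutation, no ordering constraints
(def prod-opts (merge base-opts {:host "prod.example.com" :ssl? true}))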

While jumping between the languages, I often need to create things in Clojure that are implemented in Java with Builders. This is not always easy, especially when Builders rely on external state or/and when Builders need to be passed around (i.e. to achieve a certain level of composition).

Let’s say I need to create a notification client that, by design (on the Java side of things), takes some initial state (i.e. an external system connection props), and then event handlers (callbacks) are registered on it one by one, before it gets built, i.e. builds a final, immutable notification client:

SomeClassBuilder builder = SomeClass.newBuilder()
                             .setState( state )
                             .setAnotherThing( thing );
builder.register( notificationHandlerOne );
builder.register( notificationHandlerTwo );
builder.register( notificationHandlerN );

In case you need to decouple the "register events" logic from this monolithic piece above, you would pass that builder to the caller, which would pass it down the chain. It is something that seems "normal" to do (at least to a "9 to 5" developer), since methods with side effects do not really raise any eyebrows in OO languages. In fact most methods in those languages have side effects.

I quite often see people designing builders such as the one above (with lots of external state), and when I need to use them in Clojure, it becomes very apparent that the above is not well designed:

;; creates a "mutable" builder..
(defn- make-bldr [state thing]
  (-> (SomeClass/newBuilder)
      (.setState state)
      (.setAnotherThing thing)))
;; wraps "builder.register(foo)" into a composable function
(defn register-event-handler! [bldr handler]
  ;; in case handler is just a Clojure function, wrap it with something ".register" will accept
  (.register bldr handler))
(defn notification-client [state thing]
  (let [bldr (make-bldr state thing)]
    ;; ... do things that would call "register-event-handler!" passing them the "bldr"
    (.build bldr)))

Things that immediately feel “off” are: returning a mutable builder from “make-bldr”, mutating that builder in “register-event-handler!”, and returning that mutated builder back.. Not something common in Clojure at all.

Again, the goal is to "decouple the logic to register events from the notification client creation". If both could happen at the same time, the above Builder would work.

In Clojure it would just be a map. All data structures in Clojure are immutable, so there would be no intermediate mutable holder/builder, it would always be an immutable map. When all handlers are registered, this map would be passed to a function that would create a notification client with these handlers and start it with “state” and “thing”.
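A minimal sketch of that shape (all names here are made up): handler registration is just assoc'ing into an immutable map, and only the final step touches the (hypothetical) Java builder:

;; "registering" a handler is a pure function over an immutable map
(defn register-event-handler [client-opts handler]
  (update-in client-opts [:handlers] (fnil conj []) handler))
;; only at the very end does the Java builder come into play
(defn notification-client [{:keys [state thing handlers]}]
  (let [bldr (-> (SomeClass/newBuilder)
                 (.setState state)
                 (.setAnotherThing thing))]
    (doseq [h handlers]
      (.register bldr h))
    (.build bldr)))
;; the map can be built up anywhere, passed around and composed freely
(defn make-client [state thing handlers]
  (-> (reduce register-event-handler {:state state :thing thing} handlers)
      notification-client))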

Mocking Suspicions

Another synapse formation, created by using many languages at the same time, convinced me that if I have to use a mock to test something, that something needs a close look, and would potentially welcome refactoring.

The most common case for mocking is:

A method of a component "A" takes a component "B" that depends on a component "C".

So if I want to test A's method, I can just mock B and not worry about C.

The flaw here is:

"B" that depends on a component "C".

These things are extremely beneficial to question. I used to use Spring a lot, and when I did, I loved it. I learned from it, taught it to others, and had a great sense of accomplishment when high complexity could be broken down into simple pieces and rewired together with Spring.

Time went on, and in Erlang or Clojure, or even Groovy for that matter, I used Spring less and less. I still use it for all my Java work, but not as much. So if 10 years ago:

"B" that depends on a component "C".

was a way of life, now, every time I see it, I ask why? Does "B" have to depend on "C"? Can "B" be stateless and take "C" whenever it needs it, rather than be injected with it and carry that state burden?

If before “B” was:

public class B {
  private final C c;
  public B( C c ) {
    this.c = c;
  }
  public Profit doBusiness() {
    return new Profit( c.doYourBusiness() + 42 );
  }
}

Can it instead just be:

public final class B {
  public static Profit doBusiness( C c ) {
    return new Profit( c.doYourBusiness() + 42 );
  }
}

In most cases it can! It really can; the problem is that not enough of us question that dependency, but we should.

This does not mean "B" no longer depends on "C"; it means something more: there is no "B" ("there is no spoon..") as it just becomes a module, which does not need to be passed around as an instance. The only thing that is left from "B" is "doBusiness( C c )". Do we need to mock "C" now? Can it, or rather its instance, disappear the way "B" did? Most likely, and even if it can't for whatever reason (i.e. someone else's code), I did question it, and so should you.

The more synapse formations I go through the better I learn to question pretty much everything. It is fun, and it pays back with beautiful revelations. I love my brain :)

Jul 14

Pom Pom Clojure

Fun with lein, Money with maven

While doing Clojure projects, it is the second time I have faced a problem with a customer's "build team" that knows what Java is, loves Maven, but does not believe in Mr. Leiningen, hence all of the lein niceties (plugins, one liners, tasks, etc..) now need to be converted to "pom.xml"s.

A good start is "lein pom". While it only scratches the surface, it does generate a "pom.xml" with most of the dependencies. But in most cases it needs to be "massaged" well in order to fit a real maven build process.

Usual Suspects

Besides the most common “lein repl”, here is what I usually need lein to do:

* Compile Clojure code
* Some files need to be AOT compiled
* Run Clojure tests
* Compile ClojureScript

(not Clojure specific, but I’ll include it anyway)

* Compile protobuf
* Create a JAR for most projects
* Create a self executing “uberjar” for others

When Clojure is “Ahead Of Time”

Compiling, AOTing and running tests can be done with Clojure Maven Plugin:


notice “namespaces” and “compileDeclaredNamespaceOnly”, this is how AOT is done for selected namespaces.

For AOT it’s good to remember that “a side effect of compiling Clojure code is loading the namespaces in order to make macros and functions they use available”, here are AOT compilation gotchas to keep in mind.

Compiling ClojureScript

This one is a bit trickier. If it is possible to convince a build team to install lein as a library that is used for the build process (e.g. similar to “protoc” to compile protobufs), then to compile ClojureScript, a lein cljsbuild can be added to the profile:

vi ~/.lein/profiles.clj
{:user {:plugins [[lein-cljsbuild "1.0.0"]]}}

and an exec maven plugin can be used to relay the execution to “lein”:

            <id>compiling ClojureScript</id>

In fact, if “lein” is installed, it can be used via “exec-maven-plugin” to do everything else as well, but it all depends on build teams’ restrictions. For example, financial customers may have extremely strict “policies”/”rules”/”opinions”.

A couple more options to explore for building ClojureScript would be lein maven plugin and zi-cljs. Here is a related discussion on a ClojureScript google group.

Making Shippables

"lein uberjar" with some config in "project.clj" is all that is needed in the "lein" world. In the maven universe the maven shade plugin will do just that:

                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">

The above will create a self-executing JAR with all dependencies included and with an entry point (-main) in "org.gitpod.WhatsApp".

Google Protocol Buffers

With lein it is as simple as plugging in lein protobuf. In maven it is not as simple, but also not terribly difficult, and solved via maven-protoc-plugin:

        <!--    <param>${PROTOBUF_HOME}/src/google/protobuf</param>-->

here is a repository it currently lives at:


notice “additionalProtopathElements”. In case clojure-protobuf is used with extensions, a path to “descriptor.proto” can be specified in “additionalProtopathElements”.

Mar 14

Leiningen Templates with Arguments

This template is so wrong!

Project templates can be as excellent as they can be awful since they are very opinionated beings:

  • “a web project MUST be Compojure based!”
  • “a network project MUST be Netty based!”
  • “there is no way I am building a web project based on Compojure!”
  • “a network project? of course ZeroMQ!”

But they can be really useful for quick prototypes, for learning new things, even for the "real deal", if of course you agree with their opinion.

Whatsapp WWW

I’ve recently built a template WWW to bootstrap web apps based on Clojure, ClojureScript, Compojure and Ring. Reason? I needed a faster way of getting apps up and running, especially when prototyping ideas.

Building a Leiningen template is quite a simple task. Once the template is built and installed/deployed, I could just do lein new www whatsapp and make my next $19 billion. But I wanted more!

WWW by default would create a project template with a very simple structure that can be immediately brought up (e.g. "lein ring server"), removing any obstacles on the way to making billions; everything is set up: just open vi and start hacking.

However, what if I want a ClojureScript REPL connected to my browser? I would need to go through some docs, and then depending on my experience with lein, Clojure and ClojureScript, I could quickly (or not) set it up myself.

Well, I wanted the WWW template to have that set up for me right away. But here is a dilemma: sometimes I do want a ClojureScript browser connected REPL, other times I don't, and the way this REPL setup goes, it requires a code change to enable/disable it.

It’s Good to Have Options

The documented way of creating a lein template does not really talk about “options” that I want with my REPL. What do I do? Well.. It’s a Clojure universe, and, if anything, two things hold true most of the time:

  • It’s Simple
  • It’s a Function

A brand new lein template is done with lein new template www. It creates.. ready? “a template for a template”. That’s right, a template project to create a lein template.

An entry point into this template (of a template) will be located in src/leiningen/new/www.clj. Here www is the name of this template. Let’s peek inside:

$ lein new template www
Generating fresh 'lein new' template project.
$ cd www
$ cat src/leiningen/new/www.clj
(ns leiningen.new.www
  (:use [leiningen.new.templates :only [renderer name-to-path ->files]]))
(def render (renderer "www"))
(defn www
  "FIXME: write documentation"
  [name]
  (let [data {:name name
              :sanitized (name-to-path name)}]
    (->files data
             ["src/{{sanitized}}/foo.clj" (render "foo.clj" data)])))

See? www here is just a function, that in this case, takes a name parameter. And after this template is installed lein new www whatsapp will pass whatsapp to it.

Just a Function, Just Beautiful

Since www is just a function, why not make it take optional parameters which could potentially change the resulting template?

(defn www
  "Create a new Clojure + ClojureScript + Compojure + Ring project"
  ([name]
    (www name :noop))
  ([name opts]
    ;; ... render the files, with or without the brepl bits, based on "opts"
    ))

Great! Now www can either take just name as before, or name and some magic opts. How about one of the options determining whether or not a template with a ClojureScript browser connected REPL is added? I think yes.

And now we can do:

  • either lein new www [app name]
  • or lein new www [app name] :with-brepl

which will create two different flavors of the same template.
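Here is a minimal sketch of what that option handling could look like (the brepl.cljs file and the exact file list are made up for illustration; note that extra command line arguments arrive as strings):

(ns leiningen.new.www
  (:use [leiningen.new.templates :only [renderer name-to-path ->files]]))
(def render (renderer "www"))
(defn www
  "Create a new Clojure + ClojureScript + Compojure + Ring project"
  ([name]
    (www name :noop))
  ([name opt]
    (let [data  {:name name
                 :sanitized (name-to-path name)}
          files [["project.clj" (render "project.clj" data)]
                 ["src/{{sanitized}}/core.clj" (render "core.clj" data)]]
          ;; ":with-brepl" comes in from the command line as a string
          files (if (= opt ":with-brepl")
                  (conj files ["src/{{sanitized}}/brepl.cljs" (render "brepl.cljs" data)])
                  files)]
      (apply ->files data files))))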

Oct 13

Scala: Where Ingenuity Lies

Love(d) It!

From my two years of "making money" with Scala, I went from loving every bit of it to disliking the majority of it. Not all of it though, since I still feel that "guilty pleasure" when I get to do Scala: it is a more concise Java, all the stream APIs are built in, e.g. map/filter/groupBy/reduce/.. (Java 8 calls them Streams), pattern matching, a nice way to construct things, "suggested" immutability, etc..

AKKA and Visual Basic

I am lucky I learned to be dangerous in Erlang before I picked up Scala. Hence, as I was going deeper and deeper into AKKA, I could not help but notice how AKKA misses the point of Erlang simplicity. It was AKKA 1.2, and of course 2.X fixed a lot of it, but it still has this feeling of leaking "unnecessary cleverness".

And of course Erlang process isolation is what makes it the platform for distributed systems, where AKKA's biggest limitation is, well.. the JVM really. It became apparent to me that, while I still liked Scala, whenever I need to create or work with distributed systems, Erlang is a lot simpler and does a much better job. With enough time put into Visual Basic, it can also "embrace" OTP, but.. why?

Ain’t Exactly Hammock Driven

Doing Scala I constantly lived in the world of category theory. Not that it is necessary to know it to write decent Scala code, but anywhere you go, everybody "talks" it. It was ok, but felt somewhat like an unnecessary mental overload. Not category theory itself (it was very nice), but the burden of constantly staying on your monad toes.

It’s About Time: Scala State Succeeded by Clojure

Finally, learning and appreciating Clojure made it quite obvious that Scala is just.. overcooked. At that point it seemed to be ML's ugly step sister. Something that I could do in 10 lines of Clojure, I could also do in 15 lines of Scala (more often than not), but what an ugly, and not at all intuitive, 15 lines those were in comparison.

Clojure also taught me that it is a lot easier to reason about time when the whole codebase is just a series of state successions: (f''' ... (f'' (f' (f 42)))), which makes a "time increment" just a single step from (f) to (f'). This, by the way, is also the reason why the resulting codebase in Clojure is tiny compared to Scala. Not because it is dynamically typed, but because Clojure is opinionated as a language (immutability, composition, state succession, …), and Scala is opinionated as a community, while the language lags behind.
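A tiny illustration of that succession (just the shape of it): each step is a pure function over the previous value, so "time" is simply the distance between (f x) and (f' (f x)):

(-> 42
    inc      ;; f
    (* 2)    ;; f'
    str)     ;; f''
;; => "86"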

It is Not Because Of Type

Typed Clojure is great and makes Clojure optionally typed.

An interesting thing though.. I have had "this would have been better checked at compile time" problems in Groovy, but rarely, if ever, in Clojure.

I suspect the reason behind this is immutability and the plain, simple seq API (a.k.a. collections): [], (), {}, #{}, where everything has pretty much the same semantics.
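A small example of that "same semantics" point: the same seq functions work across vectors, lists, sets and maps without ceremony:

(map inc [1 2 3])                  ;; => (2 3 4)
(map inc '(1 2 3))                 ;; => (2 3 4)
(map inc #{1 2 3})                 ;; => (2 3 4), set order is not guaranteed
(map (comp inc val) {:a 1 :b 2})   ;; => (2 3)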

Scala, however, is a soup of mutable and immutable collections, with another soup of functions (which are explicitly objects => may also carry state) and "stateful" classes. Hence when you program in Scala, it is imperative to have a strong type system, otherwise there is no way to know whether A is contravariant to B and covariant to C and the method M can take it. In Clojure it is mostly a function that takes a seq or a map => not much to sweat about.

Where Ingenuity Lies

Simplicity and elegance take a lot of work and dedication. It is a lot easier to write yet another ML, but make it several times more complex. It is hard, however, to absorb and channel all that complexity through a very simple set of APIs with minimal syntax, as Rich did with Clojure.

Oct 13

Limited Async

An executor service (a.k.a. smart pool of threads) that is backed by a LimitedQueue.

The purpose of this tiny library is to be able to block on “.submit” whenever the q task limit is reached. Here is why..


If a regular BlockingQueue is used, a ThreadPoolExecutor calls the queue's "offer" method, which does not block: it inserts a task and returns true, or returns false in case the queue is "capacity-restricted" and its capacity has been reached.

While this behavior is useful, there are cases where we do need to block and wait until a ThreadPoolExecutor has a thread available to work on the task. One reason could be an off heap storage that is being read and processed by a ThreadPoolExecutor: there is no need, and it is sometimes completely undesirable, to use the JVM heap for something that is already available off heap.

Another good use is described in “Creating a NotifyingBlockingThreadPoolExecutor“.
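To make the idea concrete, here is a rough sketch of the technique (not necessarily how lasync implements it): back the executor with a queue whose "offer" blocks via "put", so ".submit" blocks once the limit is reached:

(import '[java.util.concurrent ArrayBlockingQueue ThreadPoolExecutor TimeUnit])
;; a queue whose "offer" blocks until there is room,
;; which is exactly what ThreadPoolExecutor calls when a task is submitted
(defn limited-queue [limit]
  (proxy [ArrayBlockingQueue] [limit]
    (offer [task]
      (.put this task)
      true)))
(defn limited-pool [nthreads limit]
  (ThreadPoolExecutor. nthreads nthreads 0 TimeUnit/MILLISECONDS (limited-queue limit)))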

How To

The easiest way to get lasync and try it is with Leiningen:

[lasync "0.1.1"]

Or maven from Clojars: https://clojars.org/lasync/versions

Or to see it “inside out”: github

Use It

To create a pool with a limited number of threads and a backing q limit:

(ns sample.project
  (:use lasync))
(def pool (limit-pool))

That is pretty much it. The pool is a regular ExecutorService that can have tasks submitted to it:

(.submit pool #(+ 41 1))

By default lasync will create a number of threads and a blocking queue limit that matches the number of available cores:

(defonce available-cores 
  (.. Runtime getRuntime availableProcessors))

But these numbers can be changed:

user=> (def pool (limit-pool :nthreads 42))
user=> (def pool (limit-pool :limit 42))
user=> (def pool (limit-pool :nthreads 42 :limit 42))

Show Me

To see lasync in action we can enjoy a built in “Lasync Show”:

lein repl
user=> (use 'lasync.show)
user=> (rock-on 69)  ;; Woodstock'69
INFO: pool q-size: 4, submitted: 1
INFO: pool q-size: 4, submitted: 3
INFO: pool q-size: 4, submitted: 2
INFO: pool q-size: 4, submitted: 0
INFO: pool q-size: 4, submitted: 4
INFO: pool q-size: 4, submitted: 5
INFO: pool q-size: 4, submitted: 6
INFO: pool q-size: 4, submitted: 7
INFO: pool q-size: 4, submitted: 62
INFO: pool q-size: 3, submitted: 60
INFO: pool q-size: 4, submitted: 63
INFO: pool q-size: 3, submitted: 65
INFO: pool q-size: 3, submitted: 64
INFO: pool q-size: 2, submitted: 66
INFO: pool q-size: 1, submitted: 67
INFO: pool q-size: 0, submitted: 68

Here the lasync show was rocking on a 4 core box (which it picked up on), so regardless of how many tasks are being pushed to it, the queue max size always stays at 4, and lasync creates that back pressure whenever the task q limit is reached. In fact the "blocking" can be seen in action: since each task sleeps for a second, the whole thing can be visually seen being processed 4 at a time, pause, next 4, pause, etc..

Here is the code behind the show

Oct 13

Datomic: Your Call Will Be Answered In The Order It Was Received

The Star Family

Mother Sun likes to have its planets close by at all times. Some closer than others, but that’s how life goes: someone grows up and becomes a star, others take their places next to that someone.

This narrative is about such a family, a Solar System family, where planets live at a certain distance from the Sun. The schema is simple, “solar/planet” with a “solar/distance”:

(def schema
  [{:db/id #db/id[:db.part/db]
    :db/ident :solar/planet
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db.install/_attribute :db.part/db}
   {:db/id #db/id[:db.part/db]
    :db/ident :solar/distance
    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one
    :db.install/_attribute :db.part/db}])

Here is the data from almighty wikipedia:

(def data
  [{:db/id #db/id[:db.part/user]
    :solar/planet "Mercury"
    :solar/distance 57909175}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Venus"
    :solar/distance 108208930}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Earth"
    :solar/distance 149597890}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Mars"
    :solar/distance 227936640}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Jupiter"
    :solar/distance 778412010}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Saturn"
    :solar/distance 1426725400}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Uranus"
    :solar/distance 2870972200}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Neptune"
    :solar/distance 4498252900}])

Creating a schema and importing the data:

(d/transact *conn* schema)
(d/transact *conn* data)


Great, so now we have all that knowledge in Datomic. Let’s check it out:

(d/q '[:find ?p ?d :in $ 
       :where [?e :solar/planet ?p]
              [?e :solar/distance ?d]] (db *conn*))
#{["Venus" 108208930] ["Earth" 149597890] ["Saturn" 1426725400] ["Uranus" 2870972200] 
  ["Jupiter" 778412010] ["Mercury" 57909175] ["Neptune" 4498252900] ["Mars" 227936640]}

Looks right, but.. out of order? Yea, that is strange, since we "imported" the data in a vector, i.e. with an order in mind. Let's focus on the planets themselves:

(d/q '[:find ?p :in $ 
       :where [?e :solar/planet ?p]] (db *conn*))
#{["Saturn"] ["Venus"] ["Jupiter"] ["Mercury"] ["Earth"] ["Mars"] ["Neptune"] ["Uranus"]}

Again out of order, this time in a “different” out of order.

Popping The Hood

One thing to notice above is that the result, that gets returned from a query, is a set, and a set has no (specific) order. So given the result and its type, Datomic did not do anything wrong, it just returned a set of planets: exactly what we asked for.

However, since all the facts were asserted in order, Datomic must have remembered them in order, right? Well let’s check. Every fact that gets asserted, gets assigned an entity id. Hence instead of looking at planet names, let’s look at corresponding entity ids:

(d/q '[:find ?e :in $
       :where [?e :solar/planet ?p]] (db *conn*))
#{[17592186045418] [17592186045420] [17592186045419] [17592186045422] [17592186045421] 
  [17592186045424] [17592186045423] [17592186045425]}

Better. Now we see that Datomic in fact has entity ids that we can easily sort:

(d/q '[:find (sort ?e) :in $ 
       :where [?e :solar/planet ?p]] (db *conn*))
[[(17592186045418 17592186045419 17592186045420 17592186045421 17592186045422 17592186045423 
   17592186045424 17592186045425)]]

And even convert these ids back to planet names, it’s all data after all:

(->> (d/q '[:find (sort ?e) :in $ 
            :where [?e :solar/planet ?p]] (db *conn*))
     (map (comp :solar/planet #(d/entity (db *conn*) %))))
("Mercury" "Venus" "Earth" "Mars" "Jupiter" "Saturn" "Uranus" "Neptune")

Very nice. But can we do better? Yes we can.

Now I Know The Trick

The problem with the solution above is that it does two lookups: first to get the entity ids, and second to look up the data for those entity ids. But we can do better. The query that gets an entity id already "works with" a planet name, and "knows" about it. So why not use both of them right away:

(d/q '[:find ?p ?e :in $ 
       :where [?e :solar/planet ?p]] (db *conn*))
#{["Mars" 17592186045421] ["Saturn" 17592186045423] ["Neptune" 17592186045425] 
  ["Uranus" 17592186045424] ["Earth" 17592186045420] ["Mercury" 17592186045418] 
  ["Venus" 17592186045419] ["Jupiter" 17592186045422]}

Same query, just a little more data back. And Clojure loves data; now it is trivial to get the planets in order with just Clojure:

(->> (d/q '[:find ?p ?e :in $ 
           :where [?e :solar/planet ?p]] (db *conn*))
     (sort-by second)
     (map first))
("Mercury" "Venus" "Earth" "Mars" "Jupiter" "Saturn" "Uranus" "Neptune")

That’s more like it. Of course we can use our knowledge about the data, and sort planets by the distance:

(->> (d/q '[:find ?p ?d :in $ 
            :where [?e :solar/planet ?p]
                   [?e :solar/distance ?d]] (db *conn*))
     (sort-by second)
     (map first))
("Mercury" "Venus" "Earth" "Mars" "Jupiter" "Saturn" "Uranus" "Neptune")

Not all data comes with such a direct ranking (as distance) of course, but whatever comes in Datomic’s way is definitely processed in the order it was received.

Sent from Earth

Jul 13

ClojureScript: Use Any JavaScript Library

Since ClojureScript relies on Google Closure compiler to “get down” to JavaScript, in order to take advantage of an “advanced” compilation, external JavaScript libraries have to follow certain Google Closure standards.

The Google Closure compiler introduces the concept of an "extern": a symbol that is defined in code external to the code processed by the compiler. This is needed to exclude certain JS vars and functions that do not follow the standards that the g-closure advanced compilation relies on.

Many JS libraries do not follow g-closure standards, and there is a closure-compiler repository with some pre-built externs for JQuery, Jasmine, AngularJS and others. However there are about X thousand more JS libraries that could be useful while writing apps with ClojureScript.

While there is a way to manually go through all the ClojureScript code, find “external” JS vars/functions and write externs for them, there is a much nicer alternative written by Chris Houser in a gist: “Externs for ClojureScript” that creates a “DummyExternClass” and attaches all the vars and functions that are not part of (not recognized by) “core.cljs”.

Here is an example of creating externs for an arbitrary ClojureScript file that uses a nice chap timeline JS library:

user=> (print (externs-for-cljs ".../cljs/timeline.cljs"))
var document={};
var links.Timeline={};
var links.events={};
var DummyExternClass={};

The original file itself is not important, the output is. "externs-for-cljs" treated a couple of namespaced functions as vars, but it is an easy fix:

var links={};

At this point the whole output can be saved as “timeline-externs.js”, and pointed to by “lein-cljsbuild”:

:cljsbuild {:builds
  [{:source-paths [...]
    :compiler {:source-map ...
               :output-to ...
               :externs ["resources/public/js/externs/timeline-externs.js"]
               :optimizations :advanced}}]}

ClojureScript files based on other JS libraries that are not in a closure compiler repo: e.g. Twitter Bootstrap, Raphael and others can be “extern”ed the same way in order to take advantage of g-closure advanced compilation.

An interesting bit here, not related to externs but to advanced compilation, is the ":source-map" attribute, which is a way to map a combined/minified file back to an unbuilt state. The compiler generates a source map which holds information about the original JS files. When you query a certain line and column number in the (advanced) generated JavaScript, you can do a lookup in the source map which returns the original location. Very handy for debugging ":advanced" compiled ClojureScript.

For more info:

Jun 13

Clojure: Down to Nanos

Here is what needs to happen: there is a URN that is part of an HTTP request. It needs to be parsed/split on the last ":". Everything to the left of the last colon becomes the key (we'll call it "by"), and everything to the right becomes the value (we'll call it "id" in this case). Here is an example:

user=> (def urn "company:org:account:347-68F3726A84C")

After parsing we should get a neat map:

{:by "company:org:account", :id "347-68F3726A84C"}

While it feels more readable to start “regex”ing the problem:

user=> (require '[clojure.string :as cstr])
user=> (def re-colon (re-pattern #":"))
user=> (cstr/split "company:org:account:347-68F3726A84C" re-colon)
["company" "org" "account" "347-68F3726A84C"]

Just splitting on a simple single character regex (above) takes almost a microsecond (i.e. in this case about 2242 CPU cycles):

user=> (bench (cstr/split "company:org:account:347-68F3726A84C" re-colon))
       Execution time mean : 830.235720 ns

In general it is always best to use language “builtins”, so we’d turn to Java’s own lastIndexOf:

(defn parse-urn-id [urn]
  (let [last-colon (.lastIndexOf urn ":")]
    {:by (subs urn 0 last-colon)
     :id (subs urn (+ last-colon 1))}))

Putting "validation" aside for a moment, this actually does what is needed:

=> (parse-urn-id urn)
{:by "company:org:account", :id "347-68F3726A84C"}
user=> (bench (parse-urn-id urn))
       Execution time mean : 5.588747 µs

Wow.. builtins seem to fail us. How come?

The culprit is not "lastIndexOf", but the way Clojure resolves an "untyped" "urn". Anything that is defined with "def" is kept inside a Clojure "Var", and without a type hint the interop call falls back to reflection, which is not amenable to HotSpot optimizations. An interesting read on what actually happens: "Why Clojure doesn't need invokedynamic, but it might be nice".
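One way to see this is to ask the compiler to report reflective calls (the warning text below is an approximation; its exact wording varies between Clojure versions):

user=> (set! *warn-on-reflection* true)
true
user=> (defn parse-urn-id [urn]
         (let [last-colon (.lastIndexOf urn ":")]
           {:by (subs urn 0 last-colon)
            :id (subs urn (+ last-colon 1))}))
Reflection warning - call to lastIndexOf can't be resolved.
#'user/parse-urn-id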

While, in most cases, parsing a String for 6 microseconds is a perfectly fine expectation, there is a simple hint that can make it run 60 times faster. It’s a hint.. It’s a type hint:

(defn parse-urn-id [^String urn]
  (let [last-colon (.lastIndexOf urn ":")]
    {:by (subs urn 0 last-colon)
     :id (subs urn (+ last-colon 1))}))
user=> (bench (parse-urn-id urn))
       Execution time mean : 83.409471 ns

By hinting the "urn" type to be "^String", this function is now 67 times faster.

Achieve a warm and fuzzy feeling...   [Done]

May 13

Datomic: Can Simple be also Fast?

“An ounce of performance is worth pounds of promises”
© Mae West

Speed for the sake of Speed

Benchmarks, performance, speed comparisons, metrics, apples to apples, apples to nuclear weapons.. We, software developers "for real", know that all of that is irrelevant. What is relevant is a clear problem definition, a flexible data model, a simple tool/way to solve it, a feeling of accomplishment, approved billing papers, moving along closer to the stars..

All is true, premature optimization is the root of all evil, readability over pointer arithmetic, performance does not really matter,… until it does. Until a problem definition itself is numbers over time i.e. ..ions per second.

It is not only HFT that defines the capabilities of trading companies by their ability to cope with the speed of light and picoseconds. Big data and real time online processing are coming out of basements and garages and finally hitting the brains of suits who suddenly "get it" and have money ready for it, which makes for a new wave of "performance problems" that are "just that": a little bit of business behind the problem, and mostly raw things per second.

Never Trust Sheep

© Ryan Stiles

An interesting thing about choosing the right database for a problem today is the fact that there are just so many available, and many scream that they can do everything for you, from storing data to taking your shirts to dry cleaning on Wednesdays if you buy a dozen of their enterprise licenses. One thing I learned is that "to be quick and efficient we have to mute screamers and assholes", so some fruit and fragile NoSQL boys and Oracle are out.

Datomic is kind of a new kid on the block. Although the development started about 3 years ago, it was done by Relevance and Rich Hickey, so it is definitely ready for an enterprise ride. To put it into perspective: if MongoDB was ready for Foursquare as soon as MongoDB learned to "async fsync", then given the creds, Datomic is ready for NASA.

Speed and Time

While usually all Datomic discussions involve "time" and rely on Clojure's core principle of the "Epochal Time Model", this novel introduces another main character: "Speed".

Datomic processes write transactions by sending them to a transactor via a queue, which is currently HornetQ. HornetQ claims to be able to do millions of messages a second with SpecJms2007.

Before a transaction is off to the tx queue, it needs to be created though. It usually consists of one or more queries. The documented, and recommended, way to create a query is Datomic's implementation of Datalog. It is a nice composable way to talk to the data.

While writes are great, and HornetQ looks quite promising, given that Datomic is ACID and all the writes are coordinated by a transactor, Datomic is not (at least not yet) a system that would be a good fit for things like clickstream data. That does not mean writes are slow, it just means that it is not for the 1% of problems that need "clickstream"-like writes.

Reads are another story. One of the kickers in Datomic is having access to the database as a value. While the "as a value" part sounds like a foreign term for a database, it provides crazy benefits: immutability, time, …, and among all the others it provides a database value inside the application server that runs queries on it. The closest analogy is a git repository, where you talk to the git repo locally, hence "locality of reference", hence caching, hence fast reads.

Peeking into The Universe

It is always good to start from the beginning, so why not start directly from the source: The Universe. Humans naively separated The Universe into galaxies, where a single galaxy has many planets. So here is our data model, many galaxies and many planets:

(def schema [
  {:db/id #db/id[:db.part/db]
   :db/ident :universe/planet
   :db/valueType :db.type/string
   :db/cardinality :db.cardinality/many
   :db/index true
   :db.install/_attribute :db.part/db}
  {:db/id #db/id[:db.part/db]
   :db/ident :universe/galaxy
   :db/valueType :db.type/string
   :db/cardinality :db.cardinality/many
   :db/index true
   :db.install/_attribute :db.part/db} ])

Here is the data we are going to be looking for:

(def data [
  {:db/id #db/id[:db.part/user]
   :universe/galaxy "Pegasus"
   :universe/planet "M7G_677"} ])

And here is a Datalogy way to look for planets in the Pegasus galaxy:

(d/q '[:find ?planet :in $ ?galaxy 
       :where [?e :universe/planet ?planet]
              [?e :universe/galaxy ?galaxy]] (db *conn*) "Pegasus")

Which returns an expected planet: #{["M7G_677"]}

Secret of a Clause

Now let’s bench it. I am going to use criterium:

(bench
  (d/q '[:find ?planet :in $ ?galaxy
         :where [?e :universe/planet ?planet]
                [?e :universe/galaxy ?galaxy]] (db *conn*) "Pegasus"))
   Evaluation count : 384000 in 60 samples of 6400 calls.
Execution time mean : 169.482941 µs

170µs for a lookup.. not too shabby, but not blazing either.. Can we do better? Yes.

The way Datalog is implemented in Datomic, it takes less time, and needs to do less, if the most specific query clauses are specified first. In the example above "[?e :universe/galaxy ?galaxy]" is the most specific, since "?galaxy" is provided as "Pegasus". Let's use this knowledge and bench it again:

(bench
  (d/q '[:find ?planet :in $ ?galaxy
         :where [?e :universe/galaxy ?galaxy]
                [?e :universe/planet ?planet]] (db *conn*) "Pegasus"))
   Evaluation count : 384000 in 60 samples of 6400 calls.
Execution time mean : 158.893114 µs

The “clause trick” shaved off 10µs, which is pretty good. But can we do better? We can.

The Art of Low Level

Since I was a kid, I have known that Basic is great, but it is for kids, and Assembly is the real deal. While Datalog is not Basic, and it is really powerful and good for 95% of cases, there are limits to how fast you can go. Again, most of the time a 200µs lookup is more than enough, but sometimes it is not. At times like these we turn to Datomic Assembly: we turn directly to the indices.

While in the universe of other DB engines and languages doing something low level requires reading 3 books and then paying a consultant to do it, in Clojuretomic land it is a simple function that is called datoms:

Usage: (datoms db index & components)
Raw access to the index data, by index. The index must be supplied,
and, optionally, one or more leading components of the index can be
supplied to narrow the result.
    :eavt and :aevt indexes will contain all datoms
    :avet contains datoms for attributes where :db/index = true.
    :vaet contains datoms for attributes of :db.type/ref
    :vaet is the reverse index

And it is not just simple, it also works. Let's cook a generic lookup function that would look up an attribute of an entity by another attribute of that same entity:

(defn lookup [?a _ a v] 
  (let [d (db *conn*)] 
    (map #(map :v (d/datoms d :eavt (:e %) ?a)) 
         (d/datoms d :avet a v))))

A couple of things:

First we look for an entity with an attribute “a” that has a value “v”. For that we’re peeking into the AVET index. In other words, we know “A” and “V” from “AVET”, we are looking for an “E”.

Then we map over all matches and get the values of an attribute in question: “?a”. This time we’d do that by peeking into “EAVT” index. In other words, we are looking for a value “V”, by “E” and “A” in “EAVT”.

Note that idiomatically, or should I say datomically, we should pass a database value into this “lookup” function, since it would enable lookups across time, and not just “the latest”.
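A sketch of the same lookup reworked to take the database value explicitly (the ":by" sugar is dropped here), so the caller can point it at any basis, e.g. a (d/as-of db t) value, not just the latest:

(defn lookup [db ?a a v]
  (map #(map :v (d/datoms db :eavt (:e %) ?a))
       (d/datoms db :avet a v)))
;; the caller decides which database value (i.e. which point in time) to look at
(lookup (db *conn*) :universe/planet :universe/galaxy "Pegasus")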

Let’s bench it already:

(bench
  (lookup :universe/planet :by :universe/galaxy "Pegasus"))
   Evaluation count : 11366580 in 60 samples of 189443 calls.
Execution time mean : 5.848209 µs

Yep, from 170µs to 6µs, not a bad jump.

Theory of Relativity

All performance numbers are of course relative: relative to the data, to the hardware, to the problem and other sequences of events. The above was done on a "Mac Book Pro 2.8 GHz, 8GB, i7 2 cores" with a 64 bit build of JDK 1.7.0_21 and "datomic-free 0.8.3862". Given Clojure's love for concurrency, I would expect Datomic performance (i.e. those 6µs) to improve even further on a real multicore server.

Another thing to note: the above results are from a single Datomic instance. Since Datomic is easily horizontally scalable, by partitioning reads and having them go out against "many Datomics" (I am not using the "cluster" term here, since the database is a value and it is/can be collocated with the readers) the performance will improve even further.

While most of the time I buy arguments on how useless benchmarks are, sometimes they really do help. And believe it or not, my journey from Datalog to Datomic indices, with a strictly performance agenda, made it possible to apply Datomic to a real big money problem, beyond hello worlds, git gists and prototypes that die after the "initial commit".