Oct 13

Scala: Where Ingenuity Lies

Love(d) It!

From my two years of “making money” with Scala, I went from loving every bit of it to disliking the majority of it. Not all of it though, since I still feel that “guilty pleasure” when I get to do Scala: it is a more concise Java with all the stream APIs built in, e.g. map/filter/groupBy/reduce/.. (Java 8 calls them Streams), pattern matching, a nice way to construct things, “suggested” immutability, etc.

AKKA and Visual Basic

I am lucky I learned to be dangerous in Erlang before I picked up Scala. Hence as I was going deeper and deeper into AKKA, I could not help but notice how AKKA is missing the point of Erlang simplicity. It was 1.2, and of course 2.X fixed a lot of it, but again, it still has this feeling of leaking “unnecessary cleverness”.

And of course Erlang process isolation is what makes it the platform for distributed systems, where AKKA’s biggest limitation is, well.. JVM really. It had become apparent to me that, while I still liked Scala, whenever I need to create/work with distributed systems, Erlang is a lot simpler and does a much better job. With enough time put into Visual Basic, it can also “embrace” OTP, but .. why?

Ain’t Exactly Hammock Driven

Doing Scala I constantly lived in the world of category theory. Not that it is necessary to know it to write decent Scala code, but anywhere you go, everybody “talks” it. It was ok, but felt somewhat like an unnecessary mental overload. Not the category theory itself (it was very nice), but the burden of constantly staying on your monad toes.

It’s About Time: Scala State Succeeded by Clojure

Finally, learning and appreciating Clojure made it quite obvious that Scala is just.. overcooked. At that point it seemed to be ML’s ugly stepsister. Something that I could do in 10 lines of Clojure, I could also do in 15 lines of Scala (more often than not), but what ugly, and not at all intuitive, 15 lines those were in comparison.

Clojure also taught me that it is a lot easier to reason about time when the whole codebase is just a series of state successions: (f””’… (f” (f’ (f 42)))), which makes a “time increment” be just a single increment from (f) to (f’). This, by the way, is also the reason why the resulting codebase in Clojure is tiny compared to Scala. Not because it is dynamically typed, but because Clojure is opinionated as a language (immutability, composition, state succession, …), and Scala is opinionated as a community, while the language lags behind.
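To make the “state succession” idea a bit more tangible, here is a minimal sketch. The transition function “f” below is purely hypothetical (it just increments a number); the point is only that each state is the result of applying a function to the previous one:

```clojure
;; a hypothetical state transition: each call yields the next state
(defn f [state]
  (inc state))

;; three "time increments" written out by hand
(def by-hand (f (f (f 42))))

;; the same succession as a lazy sequence of states, starting at 42
(def states (take 4 (iterate f 42)))

(println by-hand)  ;; 45
(println states)   ;; (42 43 44 45)
```

Nothing is mutated along the way: every previous state is still there, which is what makes a single step from one state to the next easy to reason about.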

It is Not Because Of Type

Typed Clojure is great and makes Clojure optionally typed.

Interesting thing though.. I have had problems with “would have better checked it at compile time” in Groovy, but rarely, if ever, in Clojure.

I suspect the reason behind this is immutability and just plain simple seq API (a.k.a. collections): [], (), {}, #{}, where everything has pretty much the same semantics.

Scala however, is a soup of mutable and immutable collections, with another soup of functions (which are explicitly objects => may also carry state) and “stateful” classes. Hence when you program in Scala, it is imperative to have a strong type system, otherwise there is no way to know whether A is contravariant to B and covariant to C and the method M can take it. In Clojure it is mostly a function that takes a seq or a map => not much to sweat about.

Where Ingenuity Lies

Simplicity and elegance take a lot of work and dedication. It is a lot easier to write yet another ML, but make it several times more complex. It is hard, however, to absorb and channel all that complexity through a very simple set of APIs with minimal syntax, as Rich did with Clojure.

Oct 13

Limited Async

An executor service (a.k.a. smart pool of threads) that is backed by a LimitedQueue.

The purpose of this tiny library is to be able to block on “.submit” whenever the q task limit is reached. Here is why..


If a regular BlockingQueue is used, a ThreadPoolExecutor calls queue’s “offer” method which does not block: inserts a task and returns true, or returns false in case a queue is “capacity-restricted” and its capacity was reached.
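The non-blocking “offer” can be seen in isolation, without any executor involved. This is a minimal sketch that uses a plain java.util.concurrent.LinkedBlockingQueue with a capacity of 1 (the capacity and the task names are made up for illustration, and this is not lasync’s own code):

```clojure
(import 'java.util.concurrent.LinkedBlockingQueue)

;; a "capacity-restricted" queue that can hold a single element
(def q (LinkedBlockingQueue. 1))

;; the queue has room: the element is inserted
(def first-offer (.offer q "task-1"))

;; the queue is full: "offer" returns right away instead of waiting
(def second-offer (.offer q "task-2"))

(println first-offer second-offer)  ;; true false
```

A “put” on the same full queue would block until space frees up, which is the behavior the paragraph below is after.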

While this behavior is useful, there are cases where we do need to block and wait until a ThreadPoolExecutor has a thread available to work on a task. One reason could be an off heap storage that is being read and processed by a ThreadPoolExecutor: e.g. there is no need, and sometimes completely undesired, to use JVM heap for something that is already available off heap.

Another good use is described in “Creating a NotifyingBlockingThreadPoolExecutor”.

How To

The easiest way to get lasync and try it is with Leiningen:

[lasync "0.1.1"]

Or maven from Clojars: https://clojars.org/lasync/versions

Or to see it “inside out”: github

Use It

To create a pool with limited number of threads and a backing q limit:

(ns sample.project
  (:use lasync))
(def pool (limit-pool))

That is pretty much it. The pool is a regular ExecutorService that can have tasks submitted to it:

(.submit pool #(+ 41 1))

By default lasync will create a number of threads and a blocking queue limit that matches the number of available cores:

(defonce available-cores 
  (.. Runtime getRuntime availableProcessors))

But this number can be changed by:

user=> (def pool (limit-pool :nthreads 42))
user=> (def pool (limit-pool :limit 42))
user=> (def pool (limit-pool :nthreads 42 :limit 42))

Show Me

To see lasync in action we can enjoy a built in “Lasync Show”:

lein repl
user=> (use 'lasync.show)
user=> (rock-on 69)  ;; Woodstock'69
INFO: pool q-size: 4, submitted: 1
INFO: pool q-size: 4, submitted: 3
INFO: pool q-size: 4, submitted: 2
INFO: pool q-size: 4, submitted: 0
INFO: pool q-size: 4, submitted: 4
INFO: pool q-size: 4, submitted: 5
INFO: pool q-size: 4, submitted: 6
INFO: pool q-size: 4, submitted: 7
INFO: pool q-size: 4, submitted: 62
INFO: pool q-size: 3, submitted: 60
INFO: pool q-size: 4, submitted: 63
INFO: pool q-size: 3, submitted: 65
INFO: pool q-size: 3, submitted: 64
INFO: pool q-size: 2, submitted: 66
INFO: pool q-size: 1, submitted: 67
INFO: pool q-size: 0, submitted: 68

Here the lasync show was rocking on a 4 core box (which it picked up on), so regardless of how many tasks are being pushed to it, the queue max size always stays at 4, and lasync creates back pressure whenever the task q limit is reached. In fact the “blocking” can be seen in action: each task sleeps for a second, so the whole thing can be visually seen being processed by 4, pause, next 4, pause, etc.

Here is the code behind the show

Oct 13

Datomic: Your Call Will Be Answered In The Order It Was Received

The Star Family

Mother Sun likes to have its planets close by at all times. Some closer than others, but that’s how life goes: someone grows up and becomes a star, others take their places next to that someone.

This narrative is about such a family, a Solar System family, where planets live at a certain distance from the Sun. The schema is simple, “solar/planet” with a “solar/distance”:

(def schema
  [{:db/id #db/id[:db.part/db]
    :db/ident :solar/planet
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db.install/_attribute :db.part/db}
   {:db/id #db/id[:db.part/db]
    :db/ident :solar/distance
    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one
    :db.install/_attribute :db.part/db}])

Here is the data from almighty wikipedia:

(def data
  [{:db/id #db/id[:db.part/user]
    :solar/planet "Mercury"
    :solar/distance 57909175}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Venus"
    :solar/distance 108208930}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Earth"
    :solar/distance 149597890}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Mars"
    :solar/distance 227936640}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Jupiter"
    :solar/distance 778412010}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Saturn"
    :solar/distance 1426725400}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Uranus"
    :solar/distance 2870972200}
   {:db/id #db/id[:db.part/user]
    :solar/planet "Neptune"
    :solar/distance 4498252900}])

Creating a schema and importing the data:

(d/transact *conn* schema)
(d/transact *conn* data)


Great, so now we have all that knowledge in Datomic. Let’s check it out:

(d/q '[:find ?p ?d :in $ 
       :where [?e :solar/planet ?p]
              [?e :solar/distance ?d]] (db *conn*))
#{["Venus" 108208930] ["Earth" 149597890] ["Saturn" 1426725400] ["Uranus" 2870972200] 
  ["Jupiter" 778412010] ["Mercury" 57909175] ["Neptune" 4498252900] ["Mars" 227936640]}

Looks right, but.. out of order? Yea, that is strange, since we “imported” the data in a vector, i.e. with an order in mind. Let’s focus on the planets themselves:

(d/q '[:find ?p :in $ 
       :where [?e :solar/planet ?p]] (db *conn*))
#{["Saturn"] ["Venus"] ["Jupiter"] ["Mercury"] ["Earth"] ["Mars"] ["Neptune"] ["Uranus"]}

Again out of order, this time in a “different” out of order.

Popping The Hood

One thing to notice above is that the result that gets returned from a query is a set, and a set has no (specific) order. So given the result and its type, Datomic did not do anything wrong, it just returned a set of planets: exactly what we asked for.

However, since all the facts were asserted in order, Datomic must have remembered them in order, right? Well let’s check. Every fact that gets asserted, gets assigned an entity id. Hence instead of looking at planet names, let’s look at corresponding entity ids:

(d/q '[:find ?e :in $
       :where [?e :solar/planet ?p]] (db *conn*))
#{[17592186045418] [17592186045420] [17592186045419] [17592186045422] [17592186045421] 
  [17592186045424] [17592186045423] [17592186045425]}

Better. Now we see that Datomic in fact has entity ids that we can easily sort:

(d/q '[:find (sort ?e) :in $ 
       :where [?e :solar/planet ?p]] (db *conn*))
[[(17592186045418 17592186045419 17592186045420 17592186045421 17592186045422 17592186045423 
   17592186045424 17592186045425)]]

And even convert these ids back to planet names, it’s all data after all:

(->> (d/q '[:find (sort ?e) :in $
            :where [?e :solar/planet ?p]] (db *conn*))
     ffirst  ;; the sorted ids come back nested: [[(id id ...)]]
     (map (comp :solar/planet #(d/entity (db *conn*) %))))
("Mercury" "Venus" "Earth" "Mars" "Jupiter" "Saturn" "Uranus" "Neptune")

Very nice. But can we do better? Yes we can.

Now I Know The Trick

The problem with the solution above is that it does two lookups: first to get the entity id, second to look up the data for this entity id. The query to get an entity id already “works with” a planet name, and “knows” about it. So why not use both of them right away:

(d/q '[:find ?p ?e :in $ 
       :where [?e :solar/planet ?p]] (db *conn*))
#{["Mars" 17592186045421] ["Saturn" 17592186045423] ["Neptune" 17592186045425] 
  ["Uranus" 17592186045424] ["Earth" 17592186045420] ["Mercury" 17592186045418] 
  ["Venus" 17592186045419] ["Jupiter" 17592186045422]}

Same query, just a little more data back. And Clojure loves data, so now it’s trivial to get them in order with just Clojure:

(->> (d/q '[:find ?p ?e :in $ 
            :where [?e :solar/planet ?p]] (db *conn*))
     (sort-by second)
     (map first))
("Mercury" "Venus" "Earth" "Mars" "Jupiter" "Saturn" "Uranus" "Neptune")

That’s more like it. Of course we can use our knowledge about the data, and sort planets by the distance:

(->> (d/q '[:find ?p ?d :in $ 
            :where [?e :solar/planet ?p]
                   [?e :solar/distance ?d]] (db *conn*))
     (sort-by second)
     (map first))
("Mercury" "Venus" "Earth" "Mars" "Jupiter" "Saturn" "Uranus" "Neptune")

Not all data comes with such a direct ranking (as distance) of course, but whatever comes in Datomic’s way is definitely processed in the order it was received.
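The sorting step itself does not need a database, so it can be tried on its own. Here is a minimal sketch that takes the set of [planet entity-id] tuples printed earlier as a plain literal (the var names “planets” and “in-order” are made up for illustration):

```clojure
;; the query result from above, as a plain Clojure set
(def planets
  #{["Mars" 17592186045421] ["Saturn" 17592186045423]
    ["Neptune" 17592186045425] ["Uranus" 17592186045424]
    ["Earth" 17592186045420] ["Mercury" 17592186045418]
    ["Venus" 17592186045419] ["Jupiter" 17592186045422]})

;; order the tuples by entity id, then keep only the names
(def in-order
  (->> planets
       (sort-by second)
       (map first)))

(println in-order)
;; (Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune)
```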

Sent from Earth

Jul 13

ClojureScript: Use Any JavaScript Library

Since ClojureScript relies on Google Closure compiler to “get down” to JavaScript, in order to take advantage of an “advanced” compilation, external JavaScript libraries have to follow certain Google Closure standards.

The Google Closure compiler introduces the concept of an “extern”: a symbol that is defined in code external to the code processed by the compiler. Externs are needed to keep the compiler from renaming JS vars and functions that do not follow the standard that g-closure advanced compilation relies on.

Many JS libraries do not follow g-closure standards, and there is a closure-compiler repository with some pre-built externs for JQuery, Jasmine, AngularJS and others. However there are about X thousand more JS libraries that could be useful while writing apps with ClojureScript.

While it is possible to manually go through all the ClojureScript code, find “external” JS vars/functions and write externs for them, there is a much nicer alternative written by Chris Houser in a gist, “Externs for ClojureScript”. It creates a “DummyExternClass” and attaches all the vars and functions that are not part of (not recognized by) “core.cljs”.

Here is an example of creating externs for an arbitrary ClojureScript file that uses a nice chap timeline JS library:

user=> (print (externs-for-cljs ".../cljs/timeline.cljs"))
var document={};
var links.Timeline={};
var links.events={};
var DummyExternClass={};

The original file itself is not important, the output is. “externs-for-cljs” treated a couple of namespaced functions as vars (“var links.Timeline” is not a valid JavaScript declaration), but it is an easy fix:

var links={};

At this point the whole output can be saved as “timeline-externs.js”, and pointed to by “lein-cljsbuild”:

    :cljsbuild
    {:builds
     [{:source-paths [...],
       :compiler
       {:source-map ...,
        :output-to ...,
        :externs ["resources/public/js/externs/timeline-externs.js"]
        :optimizations :advanced}}]}

ClojureScript files based on other JS libraries that are not in the closure compiler repo, e.g. Twitter Bootstrap, Raphael and others, can be “extern”ed the same way in order to take advantage of g-closure advanced compilation.

An interesting bit here that is not related to externs, but is related to advanced compilation, is the “:source-map” attribute, which is a way to map a combined/minified file back to its unbuilt state. It generates a source map that holds information about the original files. Given a certain line and column number in the (advanced) generated JavaScript, a lookup in the source map returns the original location. Very handy to debug “:advanced” compiled ClojureScript.

For more info:

Jun 13

Clojure: Down to Nanos

Here is what needs to happen: there is a URN that is a part of an HTTP request. It needs to be parsed/split on the last “:”. The left part would be the key, and the right part would be the value (we’ll call it “id” in this case). Here is an example:

user=> (def urn "company:org:account:347-68F3726A84C")

After parsing we should get a neat map:

{:by "company:org:account", :id "347-68F3726A84C"}

It feels more readable to start “regex”ing the problem:

user=> (require '[clojure.string :as cstr])
user=> (def re-colon (re-pattern #":"))
user=> (cstr/split "company:org:account:347-68F3726A84C" re-colon)
["company" "org" "account" "347-68F3726A84C"]

Just splitting on a simple single character regex (above) takes almost a microsecond (i.e. in this case about 2242 CPU cycles):

user=> (bench (cstr/split "company:org:account:347-68F3726A84C" re-colon))
       Execution time mean : 830.235720 ns

In general it is always best to use language “builtins”, so we’d turn to Java’s own lastIndexOf:

(defn parse-urn-id [urn]
  (let [last-colon (.lastIndexOf urn ":")]
    {:by (subs urn 0 last-colon)
     :id (subs urn (+ last-colon 1))}))

Putting “validation” aside for a moment, this actually does what is needed:

user=> (parse-urn-id urn)
{:by "company:org:account", :id "347-68F3726A84C"}
user=> (bench (parse-urn-id urn))
       Execution time mean : 5.588747 µs

Wow.. builtins seem to fail us. How come?

The culprit is not “lastIndexOf” itself, but the way Clojure resolves the call on an “untyped” “urn”. Since the compiler does not know the type of “urn”, it cannot pick the method at compile time, so the call goes through reflection at runtime, which is slow and not amenable to HotSpot optimizations. An interesting read on what actually happens: “Why Clojure doesn’t need invokedynamic, but it might be nice”.

While, in most cases, parsing a String in 6 microseconds is perfectly fine, there is a simple hint that can make it run over 60 times faster. It’s a hint.. It’s a type hint:

(defn parse-urn-id [^String urn]
  (let [last-colon (.lastIndexOf urn ":")]
    {:by (subs urn 0 last-colon)
     :id (subs urn (+ last-colon 1))}))
user=> (bench (parse-urn-id urn))
       Execution time mean : 83.409471 ns

By hinting a “urn” type to be a “^String”, this function is now 67 times faster.
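To double check that the hint changes only the speed and not the result, both versions can be put side by side. This is a minimal sketch: the function names “parse-urn-id-reflective” and “parse-urn-id-hinted” are made up here to tell the two apart, and no timings are claimed for it:

```clojure
;; at the REPL, (set! *warn-on-reflection* true) makes the compiler
;; report calls it can only resolve via reflection

;; no hint: the type of "urn" is unknown, .lastIndexOf is called reflectively
(defn parse-urn-id-reflective [urn]
  (let [last-colon (.lastIndexOf urn ":")]
    {:by (subs urn 0 last-colon)
     :id (subs urn (+ last-colon 1))}))

;; hinted: the compiler emits a direct call to String.lastIndexOf
(defn parse-urn-id-hinted [^String urn]
  (let [last-colon (.lastIndexOf urn ":")]
    {:by (subs urn 0 last-colon)
     :id (subs urn (+ last-colon 1))}))

(def urn "company:org:account:347-68F3726A84C")

(println (parse-urn-id-reflective urn))
(println (parse-urn-id-hinted urn))
```

Both calls return the same map, so the type hint is purely a hint to the compiler about which method to call.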

Achieve a warm and fuzzy feeling...   [Done]