Clojure: Down to Nanos

Here is what needs to happen: there is a URN that is part of an HTTP request. It needs to be split on the last “:”. The left part would be the key, and the right part would be the value (we’ll call it “id” in this case). Here is an example:

user=> (def urn "company:org:account:347-68F3726A84C")

After parsing we should get a neat map:

{:by "company:org:account", :id "347-68F3726A84C"}

It may feel more readable to start “regex”ing the problem:

user=> (require '[clojure.string :as cstr])
 
user=> (def re-colon #":")
user=> (cstr/split "company:org:account:347-68F3726A84C" re-colon)
 
["company" "org" "account" "347-68F3726A84C"]
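The split alone does not produce the map yet: the pieces still have to be stitched back together. A minimal sketch of how that could look (the name “parse-urn-id-split” is made up for illustration and is not part of the original code):

```clojure
(require '[clojure.string :as cstr])

;; hypothetical helper: split on every ":" and re-join all but the last piece
(defn parse-urn-id-split [urn]
  (let [parts (cstr/split urn #":")]
    {:by (cstr/join ":" (butlast parts))
     :id (last parts)}))

(parse-urn-id-split "company:org:account:347-68F3726A84C")
;; => {:by "company:org:account", :id "347-68F3726A84C"}
```

So the regex route would pay for the split and then for a join on top of it.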

Just splitting on a single-character regex (above) takes almost a microsecond (i.e. about 2242 CPU cycles, which corresponds to a clock of roughly 2.7 GHz):

user=> (require '[criterium.core :refer [bench]])   ; assuming "bench" is Criterium's
user=> (bench (cstr/split "company:org:account:347-68F3726A84C" re-colon))
 
       Execution time mean : 830.235720 ns
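The cycle count quoted above is just time multiplied by clock frequency. The clock speed is not stated anywhere in this post, so the 2.7 GHz below is an assumption that happens to reproduce the 2242 figure; “assumed-ghz” and “ns->cycles” are names invented for this sketch:

```clojure
;; assumed clock speed in GHz, i.e. cycles per nanosecond (not given in the text)
(def assumed-ghz 2.7)

;; convert a duration in nanoseconds to an approximate number of CPU cycles
(defn ns->cycles [nanos]
  (* nanos assumed-ghz))

(Math/round (double (ns->cycles 830.235720)))
;; => 2242
```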

It is usually best to lean on language “builtins”, so we’d turn to Java’s own lastIndexOf:

(defn parse-urn-id [urn]
  (let [last-colon (.lastIndexOf urn ":")]
    {:by (subs urn 0 last-colon)
     :id (subs urn (+ last-colon 1))}))

Setting “validation” aside for a moment, this actually does what is needed:

user=> (parse-urn-id urn)
{:by "company:org:account", :id "347-68F3726A84C"}
user=> (bench (parse-urn-id urn))
 
       Execution time mean : 5.588747 µs

Wow.. builtins seem to fail us. How come?

The culprit is not “lastIndexOf” itself, but the way Clojure handles the “untyped” “urn”. Without a type hint the compiler cannot tell which class “.lastIndexOf” belongs to, so it emits a reflective call: the method is looked up at runtime on every invocation, which is slow and not amenable to HotSpot optimizations. An interesting read on what actually happens: “Why Clojure doesn’t need invokedynamic, but it might be nice”.
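One way to see the reflection for yourself is to ask the compiler to warn about it. A small sketch (the “-untyped” suffix is only there to keep this copy apart from the function above):

```clojure
;; ask the compiler to report every interop call it cannot resolve at compile time
(set! *warn-on-reflection* true)

;; compiling this definition prints a "Reflection warning" that mentions
;; lastIndexOf, because the class of urn is unknown to the compiler
(defn parse-urn-id-untyped [urn]
  (let [last-colon (.lastIndexOf urn ":")]
    {:by (subs urn 0 last-colon)
     :id (subs urn (+ last-colon 1))}))

;; the function still works, it is just slower than it needs to be
(parse-urn-id-untyped "company:org:account:347-68F3726A84C")
;; => {:by "company:org:account", :id "347-68F3726A84C"}
```

The warning shows up once, when the function is compiled, not on every call.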

While, in most cases, spending 6 microseconds on parsing a String is perfectly fine, there is a simple hint that can make it run more than 60 times faster. It’s a hint.. It’s a type hint:

(defn parse-urn-id [^String urn]
  (let [last-colon (.lastIndexOf urn ":")]
    {:by (subs urn 0 last-colon)
     :id (subs urn (+ last-colon 1))}))

user=> (bench (parse-urn-id urn))
 
       Execution time mean : 83.409471 ns

By hinting “urn” as a “^String”, this function is now 67 times faster (5588.747 ns / 83.409 ns ≈ 67).
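As for the “validation” that was set aside earlier: “lastIndexOf” returns -1 when there is no “:”, and “subs” then throws on the negative index. One possible guard, assuming that returning nil for a malformed URN is acceptable (that choice, and the name “parse-urn-id-safe”, are assumptions made for this sketch and not part of the original code):

```clojure
(defn parse-urn-id-safe [^String urn]
  (let [last-colon (.lastIndexOf urn ":")]
    ;; require a non-empty key on the left and a non-empty id on the right;
    ;; a missing ":" gives -1, which fails the first comparison
    (when (< 0 last-colon (dec (count urn)))
      {:by (subs urn 0 last-colon)
       :id (subs urn (inc last-colon))})))

(parse-urn-id-safe "company:org:account:347-68F3726A84C")
;; => {:by "company:org:account", :id "347-68F3726A84C"}

(parse-urn-id-safe "no-colon-here")
;; => nil
```

The check is a couple of integer comparisons, so it should stay in the same nanosecond range, though that has not been benchmarked here.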

Achieve a warm and fuzzy feeling...   [Done]