
clojure


6
Oct 15

Adding Simple to HBase

Mutate and Complect!


The usual trend in functional programming is “immutable” => good, “mutable” => bad. This is not true in all cases, but it is true in most, especially when multiple threads, processes, or machines are involved.

HBase APIs are very much based on mutation. Since there are so many different ways to, for example, “scan” data, instead of using overloaded constructors or builders, HBase relies on setters. Count the number of setters in Scan, for example.

This just does not sit well with the “immutable is good” feeling.

A long time HBaser might not agree, but I believe the learning curve is quite steep for HBase newcomers. This is due to many things: Hadoop architecture, data model, row key design, co-processors, all the cool things it does. But mainly, I think, this is due to a heavy set of APIs that are just not simple.

Connecting “with” HBase


Here is an example from the HBase book on how to find all columns in a row and family that start with “abc”. In SQL this would be done with something like:

SELECT * FROM <table> WHERE <row> LIKE 'abc%';

In HBase (this is a book example) it would be:

HTableInterface t = ...;
byte[] row = ...;
byte[] family = ...;
byte[] prefix = Bytes.toBytes("abc");
Scan scan = new Scan(row, row);        // (optional) limit to one row
scan.addFamily(family);                // (optional) limit to one family
Filter f = new ColumnPrefixFilter(prefix);
scan.setFilter(f);
scan.setBatch(10);                     // set this if there could be many columns returned
ResultScanner rs = t.getScanner(scan);
for (Result r = rs.next(); r != null; r = rs.next()) {
  for (KeyValue kv : r.raw()) {
    // each kv represents a column
  }
}
rs.close();

and that is given that the data is not actually read into a comprehensible data structure (that would be done in the nested loop), and that concepts like row / family / column / scan, etc. are well understood. I say it is not that simple. But can it be?

I say yes, it can. How about:

(scan conn table-name :starts-with "abc")

while a connection (conn) needs to be created and a family might be added if needed, this is a much simpler way to “connect with” HBase.

These are some of the reasons cbass was created: mainly to add “simple” to HBase.


12
Aug 15

Plain Old Clojure Object

Those times you need to have Java APIs.. Some of these APIs need to return data. In Clojure it is usually a map:

{:q "What is..?" :a 42}

In Java it is not that simple for several reasons: Java maps are mutable, there are no idiomatic tools to inspect or destructure them, Java (programmers) like different types for different POJOs, etc.

So this data needs to be encapsulated in a way Java likes it, usually in a form of an object with private fields and getters with no behavior, i.e. POJOs.

Of course a Clojure project may have Java sources, where these POJOs can live, but why not just stick to Clojure all the way and create them using gen-class? Why? Because it is fun, and also because we can easily :require and use other Clojure libraries in these POJOs in case we need to.

JSL: Java as a second language


Oh yea, and let’s call them POCOs, cause they kind of are:

(ns poco)
 
(gen-class 
  :name org.stargate.PlainOldClojureObject
  :state "state"
  :init "init"
  :constructors {[Boolean Boolean String] []}
  :methods [[isHuman [] Boolean]
            [isFound [] Boolean]
            [planet [] String]])
 
(defn -init [human? found? planet]
  [[] (atom {:human? human?
             :found? found?
             :planet planet})])
 
(defn- get-field [this k]
  (@(.state this) k))
 
(defn -isHuman [this]
  (get-field this :human?))
 
(defn -isFound [this]
  (get-field this :found?))
 
(defn -planet [this]
  (get-field this :planet))
 
(defn -toString [this]
  (str @(.state this)))

This compiles and behaves exactly like a Java POJO would, since it is a POJO, I mean POCO:

user=> (import '[org.stargate PlainOldClojureObject])
org.stargate.PlainOldClojureObject
 
user=> (def poco (PlainOldClojureObject. true true "42"))
#'user/poco
 
user=> poco
#object[org.stargate.PlainOldClojureObject 0x68033b90 "{:human? true, :found? true, :planet \"42\"}"]
 
user=> (.isHuman poco)
true
user=> (.isFound poco)
true
user=> (.planet poco)
"42"

Of course there are records, but POCOs are just more fun :)
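For comparison, here is a minimal sketch of the record flavor. The Traveler name and its fields are made up for illustration; the fields are named without "?" since such characters get munged in the generated class, and for Java callers the namespace would presumably still need to be AOT compiled:

```clojure
;; hypothetical record mirroring the POCO above
(defrecord Traveler [human found planet])

(def traveler (->Traveler true true "42"))

;; Clojure side: a record reads like a map
(:planet traveler)
;; => "42"

;; interop side: record fields are public and final
(.planet traveler)
;; => "42"

;; "changing" a field returns a new record, the original is untouched
(:planet (assoc traveler :planet "Abydos"))
;; => "Abydos"
```

A record gives immutability and map-like access for free, while the gen-class version gives full control over method names such as isHuman.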


23
Apr 15

Question Everything

Feeding Da Brain


In the 90s you would say: “I am a programmer”. Some would reply with “o.. k”. The more insightful would reply with a question: “which programming language?”.

21st century.. socially accepted terminology has changed a bit, now you would say “I am a developer”. Some would ask “which programming language?”. More insightful would reply with a question “which out of these 42 languages do you use the most?”

The greatest thing about using several at the same time is that feeling of constant adjustment as I jump between the languages. It feels like my brain goes through exuberant synaptogenesis over and over again building those new formations.

   What's for dinner today honey?
   Asynchronous brain refactoring with a gentle touch of "mental polish"

With all these new synapses, I came to love the fact that something that seemed so holy and “crystal right” before, now gets questioned and can easily be dismissed. Was it wrong all along? No. Did it change? No. So what changed then? Well.. perception did.

Inmates of the “Gang of Four” Prison


Design patterns are these “ways” of doing things that cripple new programmers, and imprison many senior ones. Instead of having an ability to think freely, we have all these “software standard patterns” which mostly have to do with limitations of the “technology at the time”.

Take the big guys, like C++ / Java / C#: while they have many great features and ideas, these languages have a horrible story of “behavior and state”: you always have to guard something, whether it is from multiple threads, or from other people misusing it. The languages themselves promote reuse vs. decoupling: i.e. “let’s inherit that behavior”, etc.

So how do we overcome these risks and limitations? Simple: let’s create dozens of “ways” that all developers will follow to fight this together. Oh, yea, and let’s make it industry standard, call them patterns, teach them in schools, and select people by how well they can “apply” these patterns to “any” problem at hand.

Not all developers bought into this cult of course. Here are Peter Norvig’s notes from 1996, where he “dismisses” 16 out of 23 patterns from Gang of Four, by just using functions, types, modules, etc.

Builder Pattern vs. Immutable Data Structures


The Builder pattern makes sense unless.. several things. There is a great “Builders vs. Option Maps” short post that talks about builder pattern limitations:

* Builders are too verbose
* Builders are not data structures
* Builders are mutable
* Builders can’t easily compose
* Builders are order dependent

Due to mutable data structures (in Java/C#/alike) Builders still make sense for things like Google protobufs with simple (i.e. primitive) types, but for most cases where immutable things need to be created it is best to use immutable data structures for the above reasons.
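To make the “data structures” and “compose” points concrete, here is a tiny sketch with an option map. The option names are hypothetical and only serve the illustration:

```clojure
;; hypothetical option names, for illustration only
(def defaults  {:timeout-ms 1000 :retries 3})
(def overrides {:retries 5})

;; options are plain data: they merge, and the originals stay intact
(def opts (merge defaults overrides))

opts
;; => {:timeout-ms 1000, :retries 5}

;; the order in which keys are given does not matter
(= opts {:retries 5 :timeout-ms 1000})
;; => true
```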

While jumping between the languages, I often need to create things in Clojure that are implemented in Java with Builders. This is not always easy, especially when Builders rely on external state and/or when Builders need to be passed around (i.e. to achieve a certain level of composition).

Let’s say I need to create a notification client that, by design (on the Java side of things), takes some initial state (i.e. an external system connection props), and then event handlers (callbacks) are registered on it one by one, before it gets built, i.e. builds a final, immutable notification client:

SomeClassBuilder builder = SomeClass.newBuilder()
                             .setState( state )
                             .setAnotherThing( thing );
 
builder.register( notificationHandlerOne );
builder.register( notificationHandlerTwo );
...
builder.register( notificationHandlerN );
 
builder.build();

In case you need to decouple “register events” logic from this monolithic piece above, you would pass that builder to the caller that would pass it down the chain. It is something that seems “normal” to do (at least to a “9 to 5” developer), since methods with side effects do not really raise any eyebrows in OO languages. In fact most methods in those languages have side effects.

I quite often see people designing builders such as the one above (with lots of external state), and when I need to use them in Clojure, it becomes very apparent that the above is not well designed:

;; creates a "mutable" builder..
(defn- make-bldr [state thing]
  (-> (SomeClass/newBuilder)
      (.setState state)
      (.setAnotherThing thing)))
 
;; wraps "builder.register(foo)" into a composable function
(defn register-event-handler! [bldr handler]
    ;; in case handler is just a Clojure function wrap it with something ".register" will accept
    (.register bldr handler))
 
(defn notification-client [state thing]
  (let [bldr (make-bldr state thing)]
    ;; ... do things that would call "register-event-handler!" passing them the "bldr"
    (.build bldr)))

Things that immediately feel “off” are: returning a mutable builder from “make-bldr”, mutating that builder in “register-event-handler!”, and returning that mutated builder back.. Not something common in Clojure at all.

Again, the goal is to “decouple logic to register events from notification client creation”. If both can happen at the same time, the above Builder would work.

In Clojure it would just be a map. All data structures in Clojure are immutable, so there would be no intermediate mutable holder/builder, it would always be an immutable map. When all handlers are registered, this map would be passed to a function that would create a notification client with these handlers and start it with “state” and “thing”.
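A minimal sketch of that map based flavor. All the names here (base-config, with-handler, notify) are made up for illustration, and notify is only a stand-in for the real client:

```clojure
;; all names here are illustrative, not a real API

(def base-config
  {:state    {:host "localhost" :port 4242} ; external system connection props
   :thing    :another-thing
   :handlers []})

(defn with-handler
  "Returns a new config with `handler` added. Nothing is mutated."
  [config handler]
  (update config :handlers conj handler))

;; "register events" logic can live anywhere: it takes a map and returns a map
(def config
  (-> base-config
      (with-handler (fn [event] [:one event]))
      (with-handler (fn [event] [:two event]))))

(defn notify
  "Stand-in for the built client: passes `event` to every registered handler."
  [{:keys [handlers]} event]
  (mapv (fn [handler] (handler event)) handlers))

(notify config :ping)
;; => [[:one :ping] [:two :ping]]

;; the starting point is still intact
(count (:handlers base-config))
;; => 0
```

If the client still has to come out of a Java Builder, the mutable step can be confined to one function that creates the builder, registers everything found under :handlers and calls .build, so the builder never escapes.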

Mocking Suspicions


Another synapse formation, that was created from using many languages at the same time, convinced me that if I have to use a mock to test something, that something needs a close look, and would potentially welcome refactoring.

The most common case for mocking is:

A method of a component "A" takes a component "B" that depends on a component "C".

So if I want to test A’s method, I can just mock B and not worry about C.

The flaw here is:

"B" that depends on a component "C".

These things are extremely beneficial to question. I used to use Spring a lot, and when I did, I loved it. Learned from it, taught it to others, and had a great sense of accomplishment when high complexity could be broken down to simple pieces and rewired together with Spring.

Time went on, and in Erlang or Clojure, or even Groovy for that matter, I used Spring less and less. I still use it for all my Java work, but not as much. So if 10 years ago:

"B" that depends on a component "C".

was a way of life, now, every time I see it, I ask: why? Does “B” have to depend on “C”? Can “B” be stateless and take “C” whenever it needs it, rather than be injected with it and carry that state burden?

If before “B” was:

public class B {
 
  private final C c;
 
  public B ( C c ) {
    this.c = c;
  }
 
  public Profit doBusiness() {
    return new Profit( c.doYourBusiness() + 42 );
  }
}

Can it instead just be:

public final class B {
  public static Profit doBusiness( C c ) {
    return new Profit( c.doYourBusiness() + 42 );
  }
}

In most cases it can! It really can. The problem is that not enough of us question that dependency, but we should.

This does not mean “B” no longer depends on “C”, it means something more: there is no “B” (“there is no spoon..”) as it just becomes a module, which does not need to be passed around as an instance. The only thing that is left from “B” is “doBusiness( C c )”. Do we need to mock “C” now? Can it, or its instance, disappear the way “B” did? Most likely, and even if it can’t for whatever reason (i.e. someone else’s code), I did question it, and so should you.
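The same idea in Clojure terms, as a sketch: “B” is just a function, and “C” is whatever gets passed in. The names are hypothetical and Profit is reduced to a plain map:

```clojure
(defn do-business
  "Takes C's behavior as a plain function instead of holding on to C."
  [do-your-business]
  {:profit (+ (do-your-business) 42)})

;; in a test there is nothing to mock: any function will do
(do-business (constantly 8))
;; => {:profit 50}
```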


The more synapse formations I go through the better I learn to question pretty much everything. It is fun, and it pays back with beautiful revelations. I love my brain :)


2
Jul 14

Pom Pom Clojure

Fun with lein, Money with maven


While doing Clojure projects, it is the second time I faced a problem with a customer’s “build team” that knows what Java is, loves Maven, but does not believe in Mr. Leiningen, hence all of the lein niceties (plugins, one liners, tasks, etc.) now need to be converted to “pom.xml”s.

A good start is “lein pom”. While it only scratches the surface, it does generate a “pom.xml” with most of the dependencies. But in most cases it needs to be “massaged” well in order to fit a real maven build process.

Usual Suspects


Besides the most common “lein repl”, here is what I usually need lein to do:

* Compile Clojure code
* Some files need to be AOT compiled
* Run Clojure tests
* Compile ClojureScript

(not Clojure specific, but I’ll include it anyway)

* Compile protobuf
* Create a JAR for most projects
* Create a self executing “uberjar” for others

When Clojure is “Ahead Of Time”


Compiling, AOTing and running tests can be done with Clojure Maven Plugin:

<plugin>
    <groupId>com.theoryinpractise</groupId>
    <artifactId>clojure-maven-plugin</artifactId>
    <version>1.3.20</version>
    <extensions>true</extensions>
    <executions>
        <execution>
            <id>compile</id>
            <phase>compile</phase>
            <goals>
                <goal>compile</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <namespaces>
            <namespace>whatsapp.core</namespace>
        </namespaces>
        <compileDeclaredNamespaceOnly>true</compileDeclaredNamespaceOnly>
        <sourceDirectories>
            <sourceDirectory>src</sourceDirectory>
        </sourceDirectories>
        <testSourceDirectories>
            <testSourceDirectory>test</testSourceDirectory>
        </testSourceDirectories>
    </configuration>
</plugin>

notice “namespaces” and “compileDeclaredNamespaceOnly”, this is how AOT is done for selected namespaces.

For AOT it’s good to remember that “a side effect of compiling Clojure code is loading the namespaces in order to make macros and functions they use available”, here are AOT compilation gotchas to keep in mind.

Compiling ClojureScript


This one is a bit trickier. If it is possible to convince a build team to install lein as a library that is used for the build process (e.g. similar to “protoc” to compile protobufs), then to compile ClojureScript, a lein cljsbuild can be added to the profile:

vi ~/.lein/profiles.clj
{:user {:plugins [[lein-cljsbuild "1.0.0"]]}}

and an exec maven plugin can be used to relay the execution to “lein”:

<plugin>
    <artifactId>exec-maven-plugin</artifactId>
    <groupId>org.codehaus.mojo</groupId>
    <executions>
        <execution>
            <id>compiling ClojureScript</id>
            <phase>generate-sources</phase>
            <goals>
                <goal>exec</goal>
            </goals>
            <configuration>
                <executable>lein</executable>
                <arguments>
                    <argument>cljsbuild</argument>
                    <argument>once</argument>
                </arguments>
            </configuration>
        </execution>
    </executions>
</plugin>

In fact, if “lein” is installed, it can be used via “exec-maven-plugin” to do everything else as well, but it all depends on build teams’ restrictions. For example, financial customers may have extremely strict “policies” / “rules” / “opinions”.

A couple more options to explore for building ClojureScript would be lein maven plugin and zi-cljs. Here is a related discussion on a ClojureScript google group.

Making Shippables


“lein uberjar” with some config in “project.clj” is all that is needed in “lein” world. In maven universe maven shade plugin will do just that:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                        <mainClass>org.gitpod.WhatsApp</mainClass>
                    </transformer>
                </transformers>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
            </configuration>
        </execution>
    </executions>
</plugin>

above will create a self executing JAR with all dependencies included and with an entry point (-main) in “org.gitpod.WhatsApp”.

Google Protocol Buffers


With lein it is as simple as plugging in lein protobuf. In maven, it is not as simple, but also not terribly difficult, and is solved via maven-protoc-plugin:

<plugin>
    <groupId>com.google.protobuf.tools</groupId>
    <artifactId>maven-protoc-plugin</artifactId>
    <version>0.3.2</version>
    <extensions>true</extensions>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
            </goals>
            <phase>generate-sources</phase>
        </execution>
    </executions>
    <configuration>
        <protocExecutable>${PROTOBUF_HOME}/src/protoc</protocExecutable>
        <protoSourceRoot>resources/proto</protoSourceRoot>
        <outputDirectory>target/classes</outputDirectory>
        <!--<additionalProtopathElements>-->
        <!--    <param>${PROTOBUF_HOME}/src/google/protobuf</param>-->
        <!--</additionalProtopathElements>-->
    </configuration>
</plugin>

here is a repository it currently lives at:

<pluginRepositories>
    <pluginRepository>
        <id>protoc-plugin</id>
        <url>http://sergei-ivanov.github.com/maven-protoc-plugin/repo/releases/</url>
    </pluginRepository>
</pluginRepositories>

notice “additionalProtopathElements”. In case clojure-protobuf is used with extensions, a path to “descriptor.proto” can be specified in “additionalProtopathElements”.


8
Mar 14

Leiningen Templates with Arguments

This template is so wrong!


Project templates can be as excellent as they can be awful since they are very opinionated beings:

  • “a web project MUST be Compojure based!”
  • “a network project MUST be Netty based!”
  • “there is no way I am building a web project based on Compojure!”
  • “a network project? of course ZeroMQ!”

But they can be really useful for quick prototypes, for learning new things, even for “real deal”, if of course you agree with their opinion.

Whatsapp WWW


I’ve recently built a template WWW to bootstrap web apps based on Clojure, ClojureScript, Compojure and Ring. Reason? I needed a faster way of getting apps up and running, especially when prototyping ideas.

Building a Leiningen template is quite a simple task. Once the template is built and installed/deployed, I could just do lein new www whatsapp and make my next $19 billion. But I wanted more!

WWW by default would create a project template with a very simple structure that can be immediately brought up (e.g. “lein ring server”) removing any obstacles on the way of making billions. Everything is set up: just open vi and start hacking.

However, what if I want a ClojureScript REPL connected to my browser? I would need to go through some docs, and then depending on my experience with lein, Clojure and ClojureScript, I could quickly (or not) set it up myself.

Well, I wanted WWW template to have that setup for me right away. But here is a dilemma: sometimes I do want ClojureScript browser connected REPL, other times I don’t, and the way this REPL setup goes, it requires a code change to enable/disable it.

It’s Good to Have Options


The documented way of creating a lein template does not really talk about “options” that I want with my REPL. What do I do? Well.. It’s a Clojure universe, and, if anything, two things hold true most of the time:

  • It’s Simple
  • It’s a Function

A brand new lein template is done with lein new template www. It creates.. ready? “a template for a template”. That’s right, a template project to create a lein template.

An entry point into this template (of a template) will be located in src/leiningen/new/www.clj. Here www is the name of this template. Let’s peek inside:

$ lein new template www
Generating fresh 'lein new' template project.
$ cd www
$ cat src/leiningen/new/www.clj
(ns leiningen.new.www
  (:use [leiningen.new.templates :only [renderer name-to-path ->files]]))
 
(def render (renderer "www"))
 
(defn www
  "FIXME: write documentation"
  [name]
  (let [data {:name name
              :sanitized (name-to-path name)}]
    (->files data
             ["src/{{sanitized}}/foo.clj" (render "foo.clj" data)])))

See? www here is just a function, that in this case, takes a name parameter. And after this template is installed lein new www whatsapp will pass whatsapp to it.

Just a Function, Just Beautiful


Since www is just a function, why not make it take optional parameters which could potentially change the resulting template?

(defn www
  "Create a new Clojure + ClojureScript + Compojure + Ring project"
  ([name] 
   (www name :noop))
  ([name opts]
  ...))

Great! Now www can either take just name as before, or name and some magic opts. How about one of the options determining whether or not a template with a ClojureScript browser connected REPL is added? I think yes.

And now we can do:

  • either lein new www [app name]
  • or lein new www [app name] :with-brepl

which will create two different flavors of the same template.
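A minimal sketch of how such an option could be checked inside the two argument arity. It assumes that lein hands extra command line arguments to the template function as strings, which is why the option is compared via str (that also covers the :noop keyword default above). The brepl? and files-for helpers, as well as the file names, are hypothetical:

```clojure
(defn brepl?
  "True if the option is :with-brepl, given either as a string or a keyword."
  [opt]
  (= ":with-brepl" (str opt)))

(defn files-for
  "Picks the (hypothetical) template files to render depending on the option."
  [opt]
  (cond-> ["project.clj" "src/{{sanitized}}/core.clj"]
    (brepl? opt) (conj "src/{{sanitized}}/brepl.cljs")))

(files-for :noop)
;; => ["project.clj" "src/{{sanitized}}/core.clj"]

(files-for ":with-brepl")
;; => ["project.clj" "src/{{sanitized}}/core.clj" "src/{{sanitized}}/brepl.cljs"]
```

Keeping the decision in a small pure function means the two flavors can be checked in a REPL without generating a project.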