"; */ ?>


09 Apr 17

Hazelcast: Keep your cluster close, but cache closer

Hazelcast has a neat feature called Near Cache. Whenever clients talk to Hazelcast servers, each get/put is a network call, and depending on how far away the cluster is, these calls may get pretty costly.

The idea of Near Cache is to bring data closer to the caller and keep it in sync with the source, which is why it is highly recommended for data structures that are mostly read.

Hazelcast Near Cache

Near Cache can be created / configured on both the server and the client side.

Optionally Near Cache keys can be stored on the file system, and then preloaded when the client restarts.

The examples below are run from a Clojure REPL and use chazel, which is a Clojure library for Hazelcast. To follow along you can:

$ git clone https://github.com/tolitius/chazel
$ cd chazel
$ boot dev
boot.user=> ;; ready for examples

In case you don’t have boot installed, it is a one-liner install.

Server Side Setup

We’ll use two different servers that are not too far from each other, so the network latency is enough to get a good visual on how Near Cache could help.

On the server side we’ll create an "events" map (which will start the server if it was not yet started), and will add 100,000 pseudo events to it:

;; these are done on the server:
 
(def m (hz-map "events"))
 
(dotimes [n 100000] (put! m n n))

We can visualize all these puts with hface:

hface putting 100,000 entries

Client Side Without Near Cache

On the client side we’ll create a function to walk over the first n keys in the "events" map:

(defn walk-over [m n]
  (dotimes [k n] (get m k)))

Create a new Hazelcast client instance (without Near Cache configured), and walk over the first 100,000 events (twice):

(def hz-client (client-instance {:hosts ["10.x.y.z"]}))
 
(def m (hz-map "events" hz-client))
 
(time (walk-over m 100000))
=> "Elapsed time: 30534.997599 msecs"
 
(time (walk-over m 100000))
=> "Elapsed time: 30547.810322 msecs"

Each iteration took roughly 30.5 seconds, and monitoring the server’s network shows that it was sending packets back and forth for every get:

Hazelcast with no Near Cache

We can see that all these packets came from, and correlate well with, the "events" map:

hface putting 100,000 entries

Client Side With Near Cache

Now let’s create a different client and configure it with Near Cache for the "events" map:

(def client-with-nc (client-instance {:hosts ["10.x.y.z"]
                                      :near-cache {:name "events"}}))

Let’s repeat the exercise:

(def m (hz-map "events" client-with-nc))
 
(time (walk-over m 100000))
=> "Elapsed time: 30474.719965 msecs"
 
(time (walk-over m 100000))
=> "Elapsed time: 102.141527 msecs"

The first iteration took 30.5 seconds as expected, but the second one, and all the subsequent ones, took about 100 milliseconds. That’s because the Near Cache kicked in, and all these events are now close to the client: they are in the client’s memory.

As expected all subsequent calls do not use the server:

Hazelcast with Near Cache

Keeping Near Cache in Sync

The first logical question is: ok, I brought these events into memory, but won’t they become stale in case they change on the server?

Let’s check:

;; checking on the client side
(get m 41)
=> 41
;; on the server: changing the value of a key 41 to 42
(put! m 41 42)
;; checking again on the client side
(get m 41)
=> 42

Pretty neat. Hazelcast invalidates “nearly cached” entries by broadcasting invalidation events from the cluster members. These events are fire and forget, but Hazelcast is very good at figuring out if and when these events are lost.

There are a couple of system properties that could be configured to control this behaviour:

  • hazelcast.invalidation.max.tolerated.miss.count: Default value is 10. If missed invalidation count is bigger than this value, relevant cached data will be made unreachable, and the new value will be populated from the source.

  • hazelcast.invalidation.reconciliation.interval.seconds: Default value is 60 seconds. This is a periodic task that scans cluster members to compare the generated invalidation events with the ones received by the Near Cache.
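Both are plain JVM properties, so one way to tweak them is to set them before any Hazelcast instance or client is created (a sketch; the values below are arbitrary, and the same properties can also be passed as -D JVM arguments):

;; a sketch: tweaking the invalidation properties via JVM system properties
;; before any Hazelcast instance / client is created (values are arbitrary)
(System/setProperty "hazelcast.invalidation.max.tolerated.miss.count" "20")
(System/setProperty "hazelcast.invalidation.reconciliation.interval.seconds" "30")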

Near Cache Preloader

In case a client is restarted, all of its near caches are lost and need to be naturally repopulated by application / client requests.

Near Cache can be configured with a preloader that would persist all the keys from the map to disk, and would repopulate the cache using the keys from the file in case of a restart.

Let’s create a client instance with such a preloader:

(def client-with-nc (client-instance {:hosts ["10.x.y.z"] 
                                      :near-cache {:name "events"
                                                   :preloader {:enabled true
                                                               :store-initial-delay-seconds 60}}}))

And walk over the map:

(def m (hz-map "events" client-with-nc))
 
(walk-over m 100000)

As per the store-initial-delay-seconds config property, 60 seconds after we created a reference to this map the preloader will persist the keys into the nearCache-events.store file (the filename is configurable):

INFO: Stored 100000 keys of Near Cache events in 306 ms (1953 kB)

Now let’s restart the client and try to iterate over the map again:

(shutdown-client client-with-nc)
(def client-with-nc (client-instance {:hosts ["10.x.y.z"]
                                      :near-cache {:name "events"
                                      :preloader {:enabled true}}}))
 
(def m (hz-map "events" client-with-nc))
 
(time (walk-over m 100000))
INFO: Loaded 100000 keys of Near Cache events in 3230 ms
"Elapsed time: 2920.688369 msecs"
 
(time (walk-over m 100000))
;; "Elapsed time: 103.878848 msecs"

The first iteration took 3 seconds (and not 30) since, once the preloader loaded all the keys, the rest (27 seconds worth of data) came back from the client’s memory.

This 3 second spike can be observed in the network usage:

Hazelcast with Near Cache

And all the subsequent calls now again take 100 ms.

Near Cache Full Config

There are a lot more Near Cache knobs beyond the map name and preloader. All are well documented in the Hazelcast docs and available as edn config with chazel.

Here is an example:

{:in-memory-format :BINARY,
 :invalidate-on-change true,
 :time-to-live-seconds 300,
 :max-idle-seconds 30,
 :cache-local-entries true,
 :local-update-policy :CACHE_ON_UPDATE,
 :preloader {:enabled true,
             :directory "nearcache-example",
             :store-initial-delay-seconds 15,
             :store-interval-seconds 60},
 :eviction  {:eviction-policy :LRU,
             :max-size-policy :ENTRY_COUNT,
             :size 800000}}

Any config options that are not provided will be set to Hazelcast defaults.
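As a quick sketch, such a map goes under the same :near-cache key we used before when creating the client (a :name is added here, since the client still needs to know which map to near cache):

;; a sketch: passing a fuller near cache config when creating the client
;; (same client-instance / :near-cache shape as in the examples above)
(def client-with-nc
  (client-instance {:hosts ["10.x.y.z"]
                    :near-cache {:name "events"
                                 :in-memory-format :BINARY
                                 :invalidate-on-change true
                                 :time-to-live-seconds 300
                                 :eviction {:eviction-policy :LRU
                                            :max-size-policy :ENTRY_COUNT
                                            :size 800000}}}))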


10 Jan 17

Hubble Space Mission Securely Configured

This week we learned of a recent alien hack into Earth. We have a suspect, but are unsure about the true source. Could be The Borg, could be Klingons, could be Cardassians. One thing is certain: we need better security and flexibility for our space missions.

We’ll start with the first line of defense: science based education. The better we are educated, the better we are equipped to make decisions, to understand the universe, to vote.

One of the greatest space exploration frontiers is the Hubble Space Telescope. For the next 5 minutes we will work on configuring and bringing the telescope online, keeping things secure and flexible in the process.

The Master Plan

In order to keep things simple and solid we are going to use these tools:

  • Vault is a tool for managing secrets.
  • Consul, besides being a chief magistrate of the Roman Republic, is now also a service discovery and distributed configuration tool.
  • Hazelcast is a simple, powerful, in-memory data grid that is a pleasure to work with.

Here is the master plan to bring Hubble online:

override hazelcast hosts

Hubble has its own internal configuration file which is not environment specific:

{:hubble {:server {:port 4242}
          :store {:url "spacecraft://tape"}
          :camera {:mode "mono"}
          :mission {:target "Eagle Nebula"}
 
          :log {:name "hubble-log"
                :hazelcast {:hosts "OVERRIDE ME"
                            :group-name "OVERRIDE ME"
                            :group-password "OVERRIDE ME"
                            :retry-ms 5000
                            :retry-max 720000}}}}

As you can see, the initial default mission is the “Eagle Nebula”, Hubble’s state is stored on tape, it uses a mono (vs. color) camera, and it has an internal server that runs on port 4242.

Another thing to notice: Hubble stores an audit/event log in a Hazelcast cluster. This cluster needs an environment specific location and creds. While the location may or may not be encrypted, the creds definitely should be.

All of the above can of course be, and some of it will be, overridden at startup. We are going to keep the overrides in Consul, and the creds in Vault. On Hubble startup the Consul overrides will be merged with the Hubble internal config, and the creds will be decrypted, securely read from Vault, and used to connect to the Hazelcast cluster.

Environment Matters

Before configuring Hubble, let’s create and initialize the environment. As mentioned before, we need to set up Consul, Vault and Hazelcast.

Consul and Vault

Consul will play two roles in the setup:

  • a “distributed configuration” service
  • Vault’s secret backend

Both can be easily started with docker. We’ll use cault’s help to set up both.

$ git clone https://github.com/tolitius/cault
$ cd cault
 
$ docker-compose up -d
Creating cault_consul_1
Creating cault_vault_1

Cault runs both Consul and Vault’s official docker images, with Consul configured to be Vault’s backend. Almost done.

Once Vault is started, it needs to be “unsealed”:

docker exec -it cault_vault_1 sh
$ vault init          ## will show 5 unseal keys and a root token
$ vault unseal        ## use 3 out of 5 unseal keys
$ vault auth          ## use a root token        ## >>> (!) remember this token

Not to duplicate it here: you can follow unsealing Vault step by step, with visuals, in the cault docs.

We’ll also save the Hubble secrets here, from within the docker container:

$ vi creds

add {"group-name": "big-bank", "group-password": "super-s3cret!!!"} and save the file.

now write it into Vault:

$ vault write secret/hubble-audit value=@creds
 
Success! Data written to: secret/hubble-audit

This way the actual group name and password won’t show up in the bash history.

Hazelcast Cluster in 1, 2, 3

The next part of the environment is a Hazelcast cluster, where Hubble will be sending all of its events.

We’ll do it with chazel. I’ll use boot in this example, but you can use lein / gradle / pom.xml, anything that can bring [chazel "0.1.12"] from clojars.

Open a new terminal and:

$ boot repl
boot.user=> (set-env! :dependencies '[[chazel "0.1.12"]])
boot.user=> (require '[chazel.core :as hz])
 
;; creating a 3 node cluster
boot.user=> (hz/cluster-of 3 :conf (hz/with-creds {:group-name "big-bank"
                                                   :group-password "super-s3cret!!!"}))
 
Members [3] {
    Member [192.168.0.108]:5701 - f6c0f121-53e8-4be0-a958-e8d35571459d
    Member [192.168.0.108]:5702 - e773c493-efe8-4806-b568-d2af57947fc9
    Member [192.168.0.108]:5703 - f9e0719d-aec7-405e-9aef-48baa56b11ec this}

And we have a 3 node Hazelcast cluster up and running.

Note that in a real world scenario Consul, Vault and the Hazelcast cluster would already be running before we get to write and deploy the Hubble code.

Let there be Hubble!

The Hubble codebase lives on github, as it should :) So let’s clone it first:

$ git clone https://github.com/tolitius/hubble
 
$ cd hubble

“Putting some data where Consul is”

We do have Consul up and running, but we have no overrides in it. We can either:

  • manually add overrides for Hubble config or
  • just initialize Consul with current Hubble config / default overrides

Hubble has an init-consul boot task which will just copy a part of the Hubble config to Consul, so we can override values later if we need to:

$ boot init-consul
read config from resource: "config.edn"
22:49:34.919 [clojure-agent-send-off-pool-0] INFO  hubble.env - initializing Consul at http://localhost:8500/v1/kv

Let’s revisit Hubble config and figure out what needs to be overridden:

{:hubble {:server {:port 4242}
          :store {:url "spacecraft://tape"}
          :camera {:mode "mono"}
          :mission {:target "Eagle Nebula"}
 
          :log {:enabled false                              ;; can be overridden at startup / runtime / consul, etc.
                :auth-token "OVERRIDE ME"
                :name "hubble-log"
                :hazelcast {:hosts "OVERRIDE ME"
                            :group-name "OVERRIDE ME"
                            :group-password "OVERRIDE ME"
                            :retry-ms 5000
                            :retry-max 720000}}
 
          :vault {:url "OVERRIDE ME"}}}

The only obvious thing to override is hubble/log/hazelcast/hosts, since the creds, as well as hubble/log/auth-token, need to be overridden securely later at runtime. In fact, if you look into Consul, you will see neither the creds nor the auth token.

The less obvious thing to override is hubble/vault/url. We need this so Hubble knows where Vault lives once it needs to read and decrypt the creds at runtime.

We will also override hubble/log/enabled to enable Hubble event logging.

So let’s override these in Consul:

  • hubble/log/hazelcast/hosts to ["127.0.0.1"]
  • hubble/vault/url to http://127.0.0.1:8200
  • hubble/log/enabled to true

We could go to the Consul UI and override these one by one, but it is easier to do it programmatically in one shot.

Envoy Extraordinary and Minister Plenipotentiary

Hubble relies on envoy to communicate with Consul, so writing a value, or a map with all the overrides, can be done in a single go:

(from under /path/to/hubble)

$ boot dev
boot.user=> (require '[envoy.core :as envoy])
nil
boot.user=> (def overrides {:hubble {:log {:enabled true
                                           :hazelcast {:hosts ["127.0.0.1"]}}
 
                                     :vault {:url "http://127.0.0.1:8200"}}})
#'boot.user/overrides
boot.user=> (envoy/map->consul "http://localhost:8500/v1/kv" overrides)

We can spot check these in Consul UI:

override hazelcast hosts
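Or pull the whole subtree back right from the REPL (a sketch, assuming envoy’s consul->map, the counterpart of the map->consul call used above):

;; a sketch: reading the hubble subtree back from Consul to verify the overrides
;; (assumes envoy's consul->map, the counterpart of map->consul)
boot.user=> (envoy/consul->map "http://localhost:8500/v1/kv/hubble")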

Consul is all ready. And we are ready to bring Hubble online.

Secrets are best kept by people who don’t know them

Two more things are needed to solve the puzzle: the Hazelcast creds and the auth token. We know that the creds are encrypted and live in Vault. In order to securely read them out we need a token to access them. But we also do not want to expose the token that has access to these creds, so we ask Vault to place the creds in one of its cubbyholes for, say, 120 ms, and to generate a temporary, one time use token to access this cubbyhole. This way, once the Hubble app reads the creds at runtime, this auth token has done its job and can no longer be used.

In Vault lingo this is called “Response Wrapping”.

cault, the one you cloned at the very beginning, has a script to generate this token, as well as supporting documentation on response wrapping.

We saved the Hubble Hazelcast creds under secret/hubble-audit, so let’s generate this temp token for it. We need the Vault root token from the “vault init” step in order for the cault script to work:

(from under /path/to/cault)

$ export VAULT_ADDR=http://127.0.0.1:8200
$ export VAULT_TOKEN=797e09b4-aada-c3e9-7fe8-4b7f6d67b4aa
 
$ ./tools/vault/cubbyhole-wrap-token.sh /secret/hubble-audit
eda33881-5f34-cc34-806d-3e7da3906230

eda33881-5f34-cc34-806d-3e7da3906230 is the token we need, and, by default, it is going to be good for 120 ms. In order to pass it along to the Hubble start, we’ll rely on cprop to merge an ENV var (could be a system property, etc.) with the existing Hubble config.

In the Hubble config the token lives here:

{:hubble {:log {:auth-token "OVERRIDE ME"}}}

So to override it we can simply export an ENV var before running the Hubble app:

(from under /path/to/hubble)

$ export HUBBLE__LOG__AUTH_TOKEN=eda33881-5f34-cc34-806d-3e7da3906230
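The double underscores are how cprop maps an ENV var onto a nested config path (more on that in the cprop post below). A quick way to check the mapping, from a REPL started in the same shell (a sketch):

;; a sketch: cprop reads HUBBLE__LOG__AUTH_TOKEN into {:hubble {:log {:auth-token ...}}}
;; (assumes the REPL was started in the shell where the var above was exported)
(require '[cprop.source :as cs])
 
(get-in (cs/from-env) [:hubble :log :auth-token])
;; => "eda33881-5f34-cc34-806d-3e7da3906230"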

Now we are 100% ready. Let’s roll:

(from under /path/to/hubble)

$ boot up
INFO  mount-up.core - >> starting.. #'hubble.env/config
read config from resource: "config.edn"
INFO  mount-up.core - >> starting.. #'hubble.core/camera
INFO  mount-up.core - >> starting.. #'hubble.core/store
INFO  mount-up.core - >> starting.. #'hubble.core/mission
INFO  mount-up.core - >> starting.. #'hubble.watch/consul-watcher
INFO  hubble.watch - watching on http://localhost:8500/v1/kv/hubble
INFO  mount-up.core - >> starting.. #'hubble.server/http-server
INFO  mount-up.core - >> starting.. #'hubble.core/mission-log
INFO  vault.client - Read cubbyhole/response (valid for 0 seconds)
INFO  chazel.core - connecting to:  {:hosts [127.0.0.1], :group-name ********, :group-password ********, :retry-ms 5000, :retry-max 720000}
Jan 09, 2017 11:54:40 PM com.hazelcast.core.LifecycleService
INFO: hz.client_0 [big-bank] [3.7.4] HazelcastClient 3.7.4 (20161209 - 3df1bb5) is STARTING
Jan 09, 2017 11:54:40 PM com.hazelcast.core.LifecycleService
INFO: hz.client_0 [big-bank] [3.7.4] HazelcastClient 3.7.4 (20161209 - 3df1bb5) is STARTED
Jan 09, 2017 11:54:40 PM com.hazelcast.client.connection.ClientConnectionManager
INFO: hz.client_0 [big-bank] [3.7.4] Authenticated with server [192.168.0.108]:5703, server version:3.7.4 Local address: /127.0.0.1:52261
Jan 09, 2017 11:54:40 PM com.hazelcast.client.spi.impl.ClientMembershipListener
INFO: hz.client_0 [big-bank] [3.7.4]
 
Members [3] {
    Member [192.168.0.108]:5701 - f6c0f121-53e8-4be0-a958-e8d35571459d
    Member [192.168.0.108]:5702 - e773c493-efe8-4806-b568-d2af57947fc9
    Member [192.168.0.108]:5703 - f9e0719d-aec7-405e-9aef-48baa56b11ec
}
 
Jan 09, 2017 11:54:40 PM com.hazelcast.core.LifecycleService
INFO: hz.client_0 [big-bank] [3.7.4] HazelcastClient 3.7.4 (20161209 - 3df1bb5) is CLIENT_CONNECTED
Starting reload server on ws://localhost:52265
Writing adzerk/boot_reload/init17597.cljs to connect to ws://localhost:52265...
 
Starting file watcher (CTRL-C to quit)...
 
Adding :require adzerk.boot-reload.init17597 to app.cljs.edn...
Compiling ClojureScript...
• js/app.js
Elapsed time: 8.926 sec

Exploring Universe with Hubble

… All systems are check … All systems are online

Let’s go to http://localhost:4242/ where Hubble’s server is listening.

Let’s repoint Hubble to the Cat’s Eye Nebula by changing hubble/mission/target to “Cats Eye Nebula”.

Also, let’s upgrade Hubble’s camera from a monochrome one to one that captures color by changing hubble/camera/mode to “color”.
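Both changes can be made in the Consul UI, or programmatically (a sketch, assuming envoy’s put alongside the map->consul we used earlier):

;; a sketch: the same two overrides made programmatically (assumes envoy's put)
(envoy/put "http://localhost:8500/v1/kv/hubble/mission/target" "Cats Eye Nebula")
(envoy/put "http://localhost:8500/v1/kv/hubble/camera/mode" "color")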

Check the event log

The captain wanted a full report of events from the Hubble log. Aye aye, captain:

(from under a boot repl with a chazel dep, as we discussed above)

;; (Hubble serializes its events with transit)
boot.user=> (require '[chazel.serializer :as ser])
 
boot.user=> (->> (for [[k v] (into {} (hz/hz-map "hubble-log"))]
                   [k (ser/transit-in v)])
                 (into {})
                 pprint)
{1484024251414
 {:name "#'hubble.core/mission",
  :state {:active true, :details {:target "Cats Eye Nebula"}},
  :action :up},
 1484024437754
 {:name "#'hubble.core/camera",
  :state {:on? true, :settings {:mode "color"}},
  :action :up}}

This is the event log persisted in Hazelcast. In case Hubble goes offline, we still have both its configuration, reliably stored in Consul, and all the events, stored in the Hazelcast cluster.

Looking the Hazelcast cluster in the face

This is not necessary, but we can also monitor the state of the Hubble event log with hface.

But.. how?

To peek a bit inside, here is how Consul overrides are merged with the Hubble config:

(defn create-config []
  (let [conf (load-config :merge [(from-system-props)
                                  (from-env)])]
    (->> (conf :consul)
         to-consul-path
         (envoy/merge-with-consul conf))))

And here is how Hazelcast creds are read from Vault:

(defn with-creds [conf at token]
  (-> (vault/merge-config conf {:at at
                                :vhost [:hubble :vault :url]
                                :token token})
      (get-in at)))

And these creds are only merged into a subset of the Hubble config that is used once to connect to the Hazelcast cluster:

(defstate mission-log :start (hz/client-instance (env/with-creds env/config
                                                                 [:hubble :log :hazelcast]
                                                                 [:hubble :log :auth-token]))
                      :stop (hz/shutdown-client mission-log))

In other words, the creds never get into env/config; they are only seen once, at cluster connection time, and only by the Hazelcast client instance.

You can follow the hubble/env.clj to see how it all comes together.

While we attempt to be closer to rocket science, it is in fact really simple to integrate Vault and Consul into a Clojure application.

The first step is made

We are operating Hubble and raising human intelligence one nebula at a time.


09 Jan 17

cprop: internal tools worth opening

Most of the tools I push to github are created out of something that I needed at the moment but could not find a good alternative for. cprop was one such library. It sat there on github all alone for quite some time, used by only a few people on my team, until it was integrated into Luminus.

Suddenly I started talking to many different people who found flaws in it, or just wanted to add features. I learned a couple of interesting usages from the Heroku guys, as well as the importance of merging creds with Vault, of coexisting with configs from other fault tolerant and external services such as Consul, and more.

One of the useful cprop features is merging configs from various sources, which is quite an open extension point: i.e. once cprop does its work and comes up with an app config, you can decide how and what will be merged with it before it really becomes a thing. It can be a local map, a .properties file, more ENV vars, more system properties, more configs from anywhere else, including remote/external resources, the result of which can be converted to an EDN map.

To enable this merge extension point cprop has several tools that in practice can be really useful on their own: i.e. they can be used outside of the (load-config) scope.
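For example, here is the classic entry point with a couple of those sources plugged into :merge (a sketch; the sources are merged over the loaded config, in order):

;; a sketch: load-config reads the default config (e.g. a config.edn on the classpath)
;; and merges the additional sources over it, in order
(require '[cprop.core :refer [load-config]]
         '[cprop.source :refer [from-system-props from-env]])
 
(load-config :merge [(from-system-props)
                     (from-env)])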

Loading from various sources

These can be used as OS, file system and edn oriented I/O tools. They are also quite useful in the REPL.

Loading from a classpath

(require '[cprop.source :as cs])
 
(cs/from-resource "path/within/classpath/to-some.edn")

Loads an EDN file from anywhere within the classpath into a Clojure map.

Loading from a file system

(require '[cprop.source :as cs])
 
(cs/from-file "/path/to/something.edn")

Loads an EDN file from the file system, by absolute path, into a Clojure map.

Loading from system properties

(require '[cprop.source :as cs])
 
(cs/from-system-props)

Loads all the system properties into a Clojure map: i.e. all the properties that are set with
-Dkey=value, or programmatically with System.setProperty(key, value), etc.

System properties are usually separated by . (periods). cprop will convert these periods to - (dashes) for key separators.

In order to create a structure in the resulting EDN map use _ (an underscore).

For example:

-Dhttp_pool_socket.timeout=4242

or

System.setProperty("http_pool_socket.timeout" "4242");

will be read into:

{:http
 {:pool
  {:socket-timeout 4242}}}

Notice how . was used as a - key separator and _ was used to “get in”: i.e. to create a hierarchy.
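The same thing can be checked right from the REPL (a sketch; from-system-props will of course include all the other JVM properties as well):

;; a sketch: set the property programmatically, then read it back with cprop
(require '[cprop.source :as cs])
 
(System/setProperty "http_pool_socket.timeout" "4242")
 
(:http (cs/from-system-props))
;; => {:pool {:socket-timeout 4242}}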

Loading from OS (ENV variables)

(require '[cprop.source :as cs])
 
(cs/from-env)

Loads all the environment variables into a Clojure map.

ENV variables lack structure. The only way to mimic structure is via the use of an underscore character. A single _ is converted to - by cprop, so to identify nesting, two underscores are used instead.

For example:

export HTTP__POOL__SOCKET_TIMEOUT=4242

would be read into:

{:http
 {:pool
  {:socket-timeout 4242}}}

Notice how two underscores are used for “getting in” and a single underscore just gets converted to a dash as a key separator. More about it, including type inference, in the docs.

Loading from .properties files

(require '[cprop.source :as cs])
 
(cs/from-props-file "/path/to/some.properties")

Loads all the key value pairs from .properties file into a Clojure map.

The traditional syntax of a .properties file does not change. For example:

  • . means structure

four.two=42 would be translated to {:four {:two 42}}

  • _ would be a key separator

forty_two=42 would be translated to {:forty-two 42}

  • , in a value would be a seq separator

planet.uran.moons=titania,oberon would be translated to {:planet {:uran {:moons ["titania" "oberon"]}}}

For example let’s take a solar-system.properties file:

## solar system components
components=sun,planets,dwarf planets,moons,comets,asteroids,meteoroids,dust,atomic particles,electromagnetic.radiation,magnetic field
 
star=sun
 
## planets with Earth days to complete an orbit
planet.mercury.orbit_days=87.969
planet.venus.orbit_days=224.7
planet.earth.orbit_days=365.2564
planet.mars.orbit_days=686.93
planet.jupiter.orbit_days=4332.59
planet.saturn.orbit_days=10755.7
planet.uran.orbit_days=30688.5
planet.neptune.orbit_days=60148.35
 
## planets natural satellites
planet.earth.moons=moon
planet.jupiter.moons=io,europa,ganymede,callisto
planet.saturn.moons=titan
planet.uran.moons=titania,oberon
planet.neptune.moons=triton
 
# favorite dwarf planet's moons
dwarf.pluto.moons=charon,styx,nix,kerberos,hydra

(cs/from-props-file "solar-system.properties")

will convert it to:

{:components ["sun" "planets" "dwarf planets" "moons" "comets"
              "asteroids" "meteoroids" "dust" "atomic particles"
              "electromagnetic.radiation" "magnetic field"],
 :star "sun",
 :planet
 {:uran {:moons ["titania" "oberon"],
         :orbit-days 30688.5},
  :saturn {:orbit-days 10755.7,
           :moons "titan"},
  :earth {:orbit-days 365.2564,
          :moons "moon"},
  :neptune {:moons "triton",
            :orbit-days 60148.35},
  :jupiter {:moons ["io" "europa" "ganymede" "callisto"],
            :orbit-days 4332.59},
  :mercury {:orbit-days 87.969},
  :mars {:orbit-days 686.93},
  :venus {:orbit-days 224.7}},
 :dwarf {:pluto {:moons ["charon" "styx" "nix" "kerberos" "hydra"]}}}

Converting for other sources

Most Java apps store their configs in .properties files. Most docker deployments rely on ENV variables. cprop has some open tools it uses internally to work with these formats, to bring EDN closer to non-EDN apps and sources.

EDN to .properties

(require '[cprop.tools :as t])
 
(t/map->props-file config)

Converts a config map into a .properties file, saves the file under the temp directory and returns a path to it.

For example, let’s say we have a map m:

{:datomic
 {:url
  "datomic:sql://?jdbc:postgresql://localhost:5432/datomic?user=datomic&password=datomic"},
 :source
 {:account
  {:rabbit
   {:host "127.0.0.1",
    :port 5672,
    :vhost "/z-broker",
    :username "guest",
    :password "guest"}}},
 :answer 42}

(t/map->props-file m)

would convert it to a property file and would return an OS/env specific path to it, in this case:

"/tmp/cprops-1483938858641-2232644763732980231.tmp"
$ cat /tmp/cprops-1483938858641-2232644763732980231.tmp
answer=42
source.account.rabbit.host=127.0.0.1
source.account.rabbit.port=5672
source.account.rabbit.vhost=/z-broker
source.account.rabbit.username=guest
source.account.rabbit.password=guest
datomic.url=datomic:sql://?jdbc:postgresql://localhost:5432/datomic?user=datomic&password=datomic

EDN to ENV

(require '[cprop.tools :as t])
 
(t/map->env-file config)

Converts a config map into a file with ENV variable exports, saves the file under the temp directory and returns a path to it.

For example, let’s say we have a map m:

{:datomic
 {:url
  "datomic:sql://?jdbc:postgresql://localhost:5432/datomic?user=datomic&password=datomic"},
 :source
 {:account
  {:rabbit
   {:host "127.0.0.1",
    :port 5672,
    :vhost "/z-broker",
    :username "guest",
    :password "guest"}}},
 :answer 42}

(t/map->env-file m)

would convert it to a file with ENV variable exports and would return an OS/env specific path to it, in this case:

"/tmp/cprops-1483939362242-8501882574334641044.tmp"
$ cat /tmp/cprops-1483939362242-8501882574334641044.tmp
export ANSWER=42
export SOURCE__ACCOUNT__RABBIT__HOST=127.0.0.1
export SOURCE__ACCOUNT__RABBIT__PORT=5672
export SOURCE__ACCOUNT__RABBIT__VHOST=/z-broker
export SOURCE__ACCOUNT__RABBIT__USERNAME=guest
export SOURCE__ACCOUNT__RABBIT__PASSWORD=guest
export DATOMIC__URL=datomic:sql://?jdbc:postgresql://localhost:5432/datomic?user=datomic&password=datomic

Notice the double underscores used to preserve the original map’s hierarchy.

.properties to one level EDN

(require '[cprop.source :as cs])
 
(cs/slurp-props-file "/path/to/some.properties")

Besides the from-props-file function that converts a .properties file to a map with hierarchy, there is also a slurp-props-file function that simply converts a property file to a map, without parsing values or building a hierarchy.

For example this “solar-system.properties” file:

## solar system components
components=sun,planets,dwarf planets,moons,comets,asteroids,meteoroids,dust,atomic particles,electromagnetic.radiation,magnetic field
 
star=sun
 
## planets with Earth days to complete an orbit
planet.mercury.orbit_days=87.969
planet.venus.orbit_days=224.7
planet.earth.orbit_days=365.2564
planet.mars.orbit_days=686.93
planet.jupiter.orbit_days=4332.59
planet.saturn.orbit_days=10755.7
planet.uran.orbit_days=30688.5
planet.neptune.orbit_days=60148.35
 
## planets natural satellites
planet.earth.moons=moon
planet.jupiter.moons=io,europa,ganymede,callisto
planet.saturn.moons=titan
planet.uran.moons=titania,oberon
planet.neptune.moons=triton
 
# favorite dwarf planet's moons
dwarf.pluto.moons=charon,styx,nix,kerberos,hydra

by

(cs/slurp-props-file "solar-system.properties")

would be converted to a “one level” EDN map:

{"star" "sun",
 
 "planet.jupiter.moons" "io,europa,ganymede,callisto",
 "planet.neptune.moons" "triton",
 "planet.jupiter.orbit_days" "4332.59",
 "planet.uran.orbit_days" "30688.5",
 "planet.venus.orbit_days" "224.7",
 "planet.earth.moons" "moon",
 "planet.saturn.orbit_days" "10755.7",
 "planet.mercury.orbit_days" "87.969",
 "planet.saturn.moons" "titan",
 "planet.earth.orbit_days" "365.2564",
 "planet.uran.moons" "titania,oberon",
 "planet.mars.orbit_days" "686.93",
 "planet.neptune.orbit_days" "60148.35"
 
 "dwarf.pluto.moons" "charon,styx,nix,kerberos,hydra",
 
 "components" "sun,planets,dwarf planets,moons,comets,asteroids,meteoroids,dust,atomic particles,electromagnetic.radiation,magnetic field"}

06 Jan 17

Why Configuration Makes You Happier

The first programming language I learned was Basic. It was some time ago, and my first computing beast at the time was the ZX Spectrum.

I was in school, then another school, then college, then another college, etc. All that time application configuration was not really a thing I cared about. Who needs to create programs for different environments? Who needs pluggable resources? Why would you ever care about it?

A program that plays checkers, which I wrote in Pascal in 1993, certainly did not need it. A text editor written in C a couple of years later did not need it. A program for an AI class in college, which created travel plans for robots with various sensors, did not care about dev/qa/prod, since it was always “just prod”.

Then I suddenly started to make money creating programs. Well, not that suddenly, but it was a really strange feeling at first:

I can just do what I love, and also get paid?

Sweet.

All these people, so much care

But something did change in the way I approached creating software. All of a sudden it wasn’t just me who cared about the programs I created, but other people too. In fact they cared so much that they were ready to give me their money.

But not just people, businesses too. Which meant I could no longer empower those businesses with my daily brain dumps the way I had created programs before. At this point I had to organize my thoughts into polished blocks of software that had to be confirmed by other people before those blocks saw the light.

Creating vs. Nurturing

Usually developers do not take configuration seriously. A large number of developers “are here to create software, not to nurture it”. Some people prefer one set of patterns, others another set. We jam our ideas into one of those sets, and then “have to go” through this “bleak” DevOps phase to make sure our creation can survive. Sure, Docker made it more fun, Consul made it more convenient, Ansible made it saner, but it is still “so remote” from what developers love the most: creating.

Since besides programming I’ve always liked hardware and operating systems, DevOps intrigues me no less than programming does. But most of the software developers I’ve interacted with prefer other, specially dedicated, people to do the DevOps work for them. And not just developers: all the larger organizations I’ve worked at have, in addition to development, build, test and operations teams.

Configuration. Connecting People.

It is interesting that the only true common ground that technically connects developers to build, test and operations people is configuration. Great documentation and communication help, but configuration connects.

The strength, quality and flexibility of this connection amplifies happiness of all the people involved in this enthralling journey from inception to production.

Since I am lucky to be one of the happiest developers on the planet :) I have “experience to believe” that happiness naturally navigates to quality.

That’s why I see application configuration as a first class citizen in the world of software development. That’s why I try to make it better where I can.


21 Nov 16

No Ceremony

DI framework makes sense for OOP


In Java (or most OOP languages):

  • Objects need to be created
  • In most of the cases they are stateful
  • Dependencies (state) often need to be injected
  • Order of the creation needs to be determined/given for the injection to work

Hence an IoC framework such as Spring makes perfect sense (in Java):

for example creating a dataSource, a sessionFactory and a txManager in Spring

DI framework “hurts functionally”


In Clojure (or similar functional languages):

  • Explicit objects with state and behavior are discouraged
  • Code organized in namespaces and small functions
  • Functions are directly referenced across modules/namespaces

A DI/IoC framework would hurt all of the above: “beans” with functionality can only be accessed by creating other framework-managed “beans”, very much like the need to create an Object to access another Object’s stateful functionality.

Business


Let’s say we need to find a user in a database.

we would need to connect to a database:

;; in reality would return a database connection instance
(defn connect-to-database [{:keys [connection-uri]}]
  {:connected-to connection-uri})

and find a user by passing a database connection instance and a username:

;; pretending to execute a query
 
(defn find-user [database username]
  (if (:connection database)
    (do
      (println "running query:"
               "SELECT * FROM users WHERE username = "
               username "on" database)
      :jimi)
    (throw (RuntimeException. (str "can't execute the query => database is disconnected: " database)))))

examples are immediately REPL’able, hence we pretend to connect to a database, and pretend to execute the query, but the format and ideas remain.

Application Context


One way to use stateful external resources, such as a database, in the find-user function above is to follow the Spring approach and define a Lifecycle interface almost identical to Spring’s:

(defprotocol Lifecycle
  (start [this] "Start this component.")
  (stop [this] "Stop this component."))

Then define several records that would implement that interface.

By the way, Clojure records are usually used with methods (protocol implementations), which makes them “two fold”: they complect data with behavior, very much like Objects do. (Here is an interesting discussion about it.)

(defrecord Config [path]
  Lifecycle
  (start [component]
    (let [conf path] ;; would fetch/edn-read config from the "path", here just taking it as conf for the sake of an example
      (assoc component :config conf)))
  (stop [component]
    (assoc component :config nil)))

(defrecord Database [config]
  Lifecycle
  (start [component]
    (let [conn (connect-to-database config)]
      (assoc component :connection conn)))
  (stop [component]
    (assoc component :connection nil)))

(defrecord YetAnotherComponent [database]
  Lifecycle
  (start [this]
    (assoc this :admin (find-user database "admin")))
  (stop [this]
    this))

Now that the classes (the records above) are defined, we can create an “application context”:

(def config (-> (Config. {:connection-uri "postgresql://localhost:5432/clojure-spring"})
                start))
 
(def db (-> (Database. config) start))
 
(def yet-another-bean (-> (YetAnotherComponent. db) start))
;; >> running query: SELECT * FROM users WHERE username =  admin on #boot.user.Database{:config {:connection-uri postgresql://localhost:5432/clojure-spring}, :connection {:connected-to postgresql://localhost:5432/clojure-spring}}

and finally we get to the good stuff (the reason we did all this):

(:admin yet-another-bean)
;; >> :jimi

a couple of things to notice:

* Well defined order *

Start/stop order needs to be defined for all “beans”, because if it isn’t:

(def db (-> (Database. config)))
(def yet-another-bean (-> (YetAnotherComponent. db) start))
;; >> java.lang.RuntimeException: 
;;      can't execute the query => database is disconnected: boot.user.Database@399337a0

* Reality is not that simple *

All the “components” above can’t just be created as defs in reality, since they are unmanaged; hence something is needed where all these components:

  • are defined
  • created
  • injected into each other in the right order
  • and then destroyed properly and orderly

Library vs. Framework


This can be done as a library that plugs each component into the application on demand / incrementally. This would retain the way the code is navigated, organized and understood, and would allow the code to be retrofitted as new components are added and removed, etc., plus all the usual “library benefits”.

OR

It can be done as a framework where all the components live and are managed. This framework approach is what Spring does in Java / Groovy, and it in fact works great in Java / Groovy.

.. but not in Clojure.

Here is why: you can’t really call (:admin yet-another-bean) from just any function, since this function needs:

: access to yet-another-bean
: that needs access to the Database
: that needs access to the Config
: etc..

Which means that only “something” that has access to yet-another-bean can pass it to that function. That “something” is.. well, a “bean” that is part of the framework. Oh.. and that function becomes a method.

Which means the ecosystem is now complected: this framework changes the way you navigate, :require and reason about the code.

It changes the way functions are created in one namespace, then :required and simply used in another, since now you need to let the framework know about every function that takes in / has to work with a “component”.

This is exactly what frameworks mean
When they talk about requiring a “full app buy in”
And while it works great for Java and Spring
In Clojure you don’t create a bean after bean
You create a function and you’re “keeping it clean”

“Just doing” it


In the library approach (in this case mount) you can just do it, with no ceremony and without changing or losing the benefits of the Clojure ecosystem: namespaces and vars are beautiful things:

(require '[mount.core :as mount :refer [defstate]])
(defstate config :start {:connection-uri "postgresql://localhost:5432/clojure-spring"})
 
(defstate db :start {:connection (connect-to-database config)})
;; #'boot.user/db
(mount/start #'boot.user/db)
;; {:started ["#'boot.user/db"]}
(find-user db "admin")
;; running query: SELECT * FROM users WHERE username =  admin on
;; {:connection {:connected-to postgresql://localhost:5432/clojure-spring}}
 
;; :jimi

done.

no ceremony.

in fact the db state would most likely look like:

(defstate db :start (connect-to-database config)
             :stop (disconnect db))
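disconnect is not defined in the pretend examples above; to keep things REPL-able it could be sketched as:

;; a hypothetical counterpart to connect-to-database for these pretend examples
;; (a real driver would close the underlying connection here)
(defn disconnect [database]
  (println "disconnecting from" (:connected-to database)))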

Managing Objects


While most of the time it is unnecessary, we can use the records from the example above with this library approach as well:

boot.user=> (defstate db :start (-> (Database. config) start)
                         :stop (stop db))
#'boot.user/db
 
boot.user=> (defstate config :start (-> (Config. {:connection-uri "postgresql://localhost:5432/clojure-spring"}) start)
                             :stop (stop config))
#'boot.user/config

and they become intelligently startable:

boot.user=> (mount/start)
{:started ["#'boot.user/config" "#'boot.user/db"]}
 
boot.user=> (find-user db "admin")
;; running query: SELECT * FROM users WHERE username =  admin on
;; #boot.user.Database{:config #boot.user.Config{:path {:connection-uri postgresql://localhost:5432/clojure-spring},
;; :config {:connection-uri postgresql://localhost:5432/clojure-spring}},
;; :connection {:connected-to nil}}
 
;; :jimi

and intelligently stoppable:

boot.user=> (mount/stop)
{:stopped ["#'boot.user/db" "#'boot.user/config"]}
 
boot.user=> (find-user db "admin")
 
;; java.lang.RuntimeException: can't execute the query => database is disconnected:
;;   '#'boot.user/db' is not started (to start all the states call mount/start)

Easy vs. Simple


While usually a great argument, this is not it.

In this case this is pragmatic vs. dogma.