04
Apr 13

Software as Space-Time Continuum

Is Software for Real?


Here is a question and an answer. They both propose a sequence of logical conclusions, but the answer is of course a “stronger” form of communicating this sequence:

Q: Do you create computer programs? Do you design them based on a business/problem domain? Do you understand this domain? Does this domain apply to the real world? Do you understand the real world? Do you create computer programs based on your understanding of the real world? Do you need to?

A: If you create computer programs, they should be based on a business/problem domain. Hence you should understand this domain. Most problems/businesses directly apply to, or are modeled after, the real world. Hence you should understand the real world. Therefore you should create computer programs that are modeled after the real world as much as possible, since then you can “implement” the domain more closely and find a solution that can be applied directly to the real world.

While Q and A both follow the same logical sequence, I’d like to explore the Q of things. I’d like to question whether it is really true that we need to design/model our software/languages/libraries after the real world. And if it is true, how closely do we need to “get the real world right” in our software design?

Intuition vs. Intellect


The first instinct, or my “natural intuitive power”, suggests that it is really the case. If we do model after the real world, then we are building systems, evolving our metal and software friends, that can potentially just coexist with us in our world, following the laws of nature. Another supporting argument is a “business problem” that needs to be solved with software, for example “empowering businesses to predict nature”. Of course, in order to predict something, we have to understand it and its environment. Currently we do that by taking petabytes of data, slapping it with something like Random Forest, and looking at probabilities… But what if our software just “knew” how things work, not based on historical data, but based on metaphysics, i.e. based on “what the world is and how it really works”?

While I am still not convinced we have to model our software after the real world “all the time”, I do lean towards this idea for “most problems” I get to work with. In a minority of cases, for example when I get to write a stress test that sends billions of option quotes over the wire, I really don’t care whether it is an “ugly” imperative for loop or a “beautiful” pure function ‘send’ that is mapped over an immutable lazy sequence of quotes. This of course is an oversimplification: firstly, I should care, and it should be an imperative for loop, since it is faster; secondly, mapping functions over a sequence of things is not really a true modeling of the real world. Or is it?

“A Mad Tea-Party”


Mad Hatter
If modeling after the real world is important, it is obvious we can’t omit the “time” dimension when designing and creating software. But most of the time (no pun intended) we do just that. The most widely known/deployed software design approaches/patterns and instruments (e.g. programming languages) are stuck in “Wonderland”:

“Time has punished the Hatter by eternally standing still at 6 pm, and therefore there is always tea time”

Think about a C++/Ruby/Python/Java “object”. Does it have a notion of time? No. It may seem to have a notion of “now”, but when it (its state) is torn apart by multiple threads at the same time, there is no notion of “now” anymore, and the poor object suffers terribly.


There is a classic talk by Rich Hickey, “Are We There Yet?”, where he presents an “Epochal Time Model”:

Epochal Time Model

There he introduces time as a succession of states and advocates immutability, i.e. the value of state “V1” is forever “V1” at that particular point in time. We only get to a state with a value “V2” after applying a function “F” to the state “V1”. Theoretically, if we did support this notion of time, we could just talk to this “timeline”, which Rich calls an identity, and ask “what was its state at that point in time?” or, more precisely, “when/at which point in time was the state’s ‘value’ equal to ‘V1’?”. Practically, we can do it with Datomic.
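A minimal sketch of that “succession of states” in Java may help. All names here (“Identity”, “swap”, “valueAt”, “firstTimeOf”) are made up for illustration; they come neither from Rich’s talk nor from Datomic, and “time” is simplified to “the number of changes so far”:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only: an "identity" as a timeline of immutable values.
final class Identity<V> {

    // Every value this identity has ever had, oldest first; entries are never modified.
    private final List<V> timeline = new ArrayList<>();

    Identity(V initial) {
        timeline.add(initial);
    }

    // Moves to the next state by applying a function to the current value.
    synchronized V swap(UnaryOperator<V> f) {
        V next = f.apply(current());
        timeline.add(next);
        return next;
    }

    synchronized V current() {
        return timeline.get(timeline.size() - 1);
    }

    // "What was its state at that point in time?" (time = number of changes so far)
    synchronized V valueAt(int pointInTime) {
        return timeline.get(pointInTime);
    }

    // "At which point in time was the value equal to this one?" Returns -1 if it never was.
    synchronized int firstTimeOf(V value) {
        return timeline.indexOf(value);
    }

    public static void main(String[] args) {
        Identity<Integer> balance = new Identity<>(100);  // V1
        balance.swap(v -> v + 50);                        // V2 = F(V1)
        balance.swap(v -> v - 30);                        // V3 = G(V2)

        System.out.println(balance.valueAt(0));       // V1 is still 100
        System.out.println(balance.current());        // 120
        System.out.println(balance.firstTimeOf(150)); // 1
    }
}
```

The point of the sketch is only that “V1” never changes once it exists: a new state is a new value appended to the timeline, and the old ones stay available for questions about the past.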

Does Time Stutter?


It’s more of a metaphysical thought, but if we are after a true representation of the real world, does the “Epochal Time Model” really reflect how the world “changes”? If you wave your hand, does it just hyperjump from a state A to a state B? How granular is the path from A to B? Or is it absolutely “continuous”, in which case there is no granularity, i.e. no smallest time interval? If you observe someone waving her hand, it is a different story: “the human eye and its brain interface, the human visual system, can process 10 to 12 separate images per second”, so here is a “state succession” interval. But what really happens? Is the hand itself moving one Planck length at a time? Is it moving one “de Broglie wavelength” at a time? Is it even important when modeling software?

Do You “Function” Me?


Rich, and many other functional programming advocates, “sell” the idea of functional programming by stressing two things:

1. “This is how things work in the real world…”
2. “It is much easier to reason about…”

While the latter is mostly human-based (how “we think”), it is connected to #1, since we think one way or another because our thinking is limited by what we know (+ imagination [this is controversial :]). And quite possibly our thinking is based on what we know “about the real world”, which we are part of.

It does seem to be important to model after the real world [ maybe :]. But why now? Why not 20 or 50 years ago? Did any of the programming languages actually aim to even connect to the real world before?

1957 Fortran: concepts included easier entry of equations into a computer

1958 Lisp: created as a practical mathematical notation for computer programs, influenced by the notation of Alonzo Church’s lambda calculus.

1958 ALGOL: was the standard method for algorithm description used by the ACM in textbooks and academic works

1962 Simula: created as a special purpose programming language for simulating discrete event systems

OK, it looks like the closest match is Simula, which at least was created to simulate “something”. It is an ironic coincidence that it is also the source of Object Orientation (and the actor model), but that is beside the point.

1971 Smalltalk-71 was created by Ingalls in a few mornings on a bet that a programming language based on the idea of message passing inspired by Simula could be implemented in “a page of code.”

1972 Prolog was motivated in part by the desire to reconcile the use of logic as a declarative knowledge representation language with the procedural representation of knowledge

1972 C was designed to be compiled using a relatively straightforward compiler, to provide low-level access to memory, to provide language constructs that map efficiently to machine instructions, and to require minimal run-time support.

1973 ML was conceived to develop proof tactics in the LCF theorem prover (whose language, pplambda, a combination of the first-order predicate calculus and the simply typed polymorphic lambda calculus, had ML as its metalanguage).

1975 Scheme started as an attempt to understand Carl Hewitt’s Actor model, for which purpose Steele and Sussman wrote a “tiny Lisp interpreter” using Maclisp and then “added mechanisms for creating actors and sending messages.”

1979 C++ remembering his Ph.D. experience, Stroustrup set out to enhance the C language with Simula-like features

1986 Erlang was designed with the aim of improving the development of telephony applications

1991 Java was originally designed (as “Oak”) for interactive television, but it was too advanced for the digital cable television industry at the time.

1995 JavaScript was originally implemented as part of web browsers so that client-side scripts could interact with the user, control the browser, communicate asynchronously, and alter the document content that was displayed

Looking at the major languages created more than “10/20 years ago”, their original purpose does not seem to be “real world” related. For example, the first two are clearly strictly math related.

Emotional Mathematics


So is it about Math? After all, we are “Computer Scientists”, and although most “Computer Programmers” are (unfortunately) very far from math, the science behind computers is very much based on it. Is the goal of mathematics, as a science, to model the real world? Let’s look at how we “know to define” math:

math·e·mat·ics  [math-uh-mat-iks]
noun
1. The systematic treatment of magnitude, relationships between figures and forms, 
   and relations between quantities expressed symbolically.
2. A group of related sciences, including algebra, geometry, and calculus, 
   concerned with the study of number, quantity, shape, and space 
   and their interrelationships by using a specialized notation
3. Mathematical operations and processes involved in the solution of a problem 
   or study of some scientific field

Complementing with “the current world intellect” (Wikipedia):

Mathematics "has no generally accepted definition". 
Aristotle defined mathematics as "the science of quantity", 
Benjamin Peirce as "the science that draws necessary conclusions", 
an intuitionist definition is "Mathematics is the mental activity which consists 
                  in carrying out constructs one after the other.",
Haskell Curry defined mathematics simply as "the science of formal systems", 
where a formal system is a set of symbols, or tokens, 
and some rules telling how the tokens may be combined into formulas.

From the above definitions it is hard to say whether the “real world modeling” is what Math is after, although we definitely know that Math is there to “explain” things in the world, oh.. wait, that’s Physics :]

If we are after modeling the real world (if it is indeed important), should we try to be as close as possible to the “reality”? Or should we just take the real world in by pieces and convenient definitions that suit a certain process, a problem at hand, programming style, hype, tech movement, etc.? It feels right to be close to the real world, since we are a part of it. It feels right to be close to ourselves. Are we just selfish? :]


16
Jan 13

Backup and Reset Nexus 4: Cracked and Locked Screen

Story of a Flaw


Nexus 4 has an invisible design flaw: the back side of the phone is just glass, which, while it looks pretty to some, makes it slide down and fall from pretty much any surface. Another unfortunate caveat is the curved unprotected glass on the front. If the phone falls down, it’ll most likely crack…

And here I present a 20 day old brand new Nexus 4:

nexus screen

After the unfortunate flight down, the screen is done: it stopped responding to any tapping/touching. In addition, the screen is locked with a gesture pattern. The phone is not rooted and has a locked bootloader: exactly how it comes from Google.

Android iTunes


At this point I need to back it up and factory reset the phone in order to ship it safely for repairs. Since Google is moving towards Apple’s “lock down and control” system design, the phone can’t just be plugged in via USB to back up its files (e.g. via USB Mass Storage); nowadays there is MTP. As Google puts it:

“We did it because we wanted to be able to merge the “public shared storage” (i.e. for music and photos) with the internal private app storage.”

Yea, ok, while it does have a technical merit, it really brings user experience closer to the “iTunes level”, not as bad yet, but quite close. In any case, since the Nexus 4 screen is locked, Android iTunes complains that until that screen is unlocked it won’t be showing any files. Yea, thanks for the security Apple.. I mean Google.

Unlocking the Screen


There is a great tool written by Alexanre which allows controlling an Android device remotely from a Mac/PC. In other words, a regular keyboard/mouse can be used to control the Android device. The tool is called Android Screencast. The only caveat: the phone needs to be rooted in order to be controlled by this tool. I did get to see the phone’s screen on my Mac:

android screencast

In order to make the Mac keyboard work I need to do “chmod 777 /data/dalvik-cache” as root (e.g. “su”), but the phone is not rooted, bummer. The tool is great though.

Btw, if the phone was rooted, I could simply do: “adb shell rm /data/system/gesture.key” to get rid of the screen lock.

Backing Up What’s Dear


One thing I knew I could do for sure is to backup the “sdcard”. I knew this because of two things: I have “USB Debugging” on, which means I can use adb, and “sdcard” is not owned by root, which means it can be “pulled”/”read” by adb. Hence the first step is clear:

./adb pull /sdcard/ /destination

At this point all the pictures / videos / music, etc. are backed up. Now I need to back up my SMS, contacts, etc.: information that lives in Android “databases”. For example, contacts usually live here: “/data/data/com.android.providers.contacts/databases”. These can’t simply be “pulled” like the pictures, since they are protected, and hence cannot be read by adb directly.

Another tool to the rescue: Moborobo. The only caveat: it is a Windows tool, and in order to install it I needed to power up my virtual box. The tool is pretty neat and quite powerful. Unfortunately all it could back up for me was SMS, everything else failed, but it is one step further nevertheless + I have most of my contacts synced with Gmail. Apps would be nice to back up, but they can be reinstalled manually later on.

Waving Goodbye or The Factory Reset


Now the interesting bit: the factory reset. It was not really straightforward, since most of googled instructions either talk about doing it from within a phone by tapping through settings, which is not an option in this case, or by using Home / Back buttons which are also a part of the screen that does not work. But after some minutes “the way” revealed itself.

Firstly the phone needs to be rebooted in “Recovery mode”, which can be done through fastboot, in case “USB Debugging” is not enabled:

Disconnect the phone. Power it down (by holding the Power button). Reboot into fastboot mode by holding the Volume Down button and Power:

fastboot start

The “Recovery Mode” is two Volume button clicks away (confirm with a Power button once the mode is selected):

fastboot recovery

In case “USB Debugging” is enabled, the easier way to get to this step would be via adb:

./adb -d reboot recovery

Which will boot into:

recovery mode

From here press and hold Power button and then press Volume Up, which will get into:

android factory reset

Now all that needs to be done is to mentally wave good bye to all the data and confirm the reset:

confirm factory reset

And the Award Goes To…


All the pictures are courtesy of my good old Nexus S, which has fallen down a countless number of times over the years and has a couple of scratches on the back. Yes, “they” knew how to build phones for real in the good old days..


23
Nov 12

Convert HTML5 FileList to Clojure Vector

When uploading data/files via AJAX from input elements with ClojureScript, HTML5 returns a FileList, which is neither a list nor an array, and hence cannot be converted to Clojure by simply calling “js->clj”, although its usage is pretty similar to an array:

// uploadData is a form element
// fileChooser is input element of type 'file'
var file = document.forms['uploadData']['fileChooser'].files[0];

But as it is always in the Clojure world, there is a solution. This one is not trivial, but after spending almost 30 minutes, it surfaced under my finger tips:

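(defn toArray [js-col]
  (-> (clj->js [])       ; an empty JavaScript array
      (.-slice)          ; borrow its "slice" function
      (.call js-col)     ; slice.call(fileList) copies the items into a real array
      (js->clj)))        ; which js->clj can turn into a Clojure vector

Now in ClojureScript “(toArray fileList)” will return a Clojure vector that can be loved and iterated.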


06
Jul 12

Integrating Font Awesome with Bootswatch

Assuming you know what Twitter Bootstrap, Font Awesome and Bootswatch are, here is how to integrate the happy trio.

Install a LESS Compiler


In order to glue everything together we would need a LESS compiler. The easiest way to install a LESS compiler is via Node Package Manager (a.k.a npm). If you need to install NPM, you can choose a “zero line” or a “fancy install” from npmjs.org

Once NPM is good to go, installing a LESS compiler is as simple as:

$ npm install --global less

Connect Font Awesome with Twitter Bootstrap


Follow the first 3 steps of the integration instructions from the Font Awesome people. In case you plan not to keep your fonts in “../font”, open “font-awesome.less” and change the path accordingly:

@fontAwesomePath: '../font';

Less Twitter more Bootswatch


At this point you have Font Awesome integrated with Twitter Bootstrap’s LESS. Now we need to download Bootswatch LESS files and merge them with Twitter’s. Assuming you are in “twitter-bootstrap/less” directory…

  • Go to one of the Bootswatch themes and download “variables.less” and “bootswatch.less”
  • It would look similar to:

  • Now replace the default “variables.less” with one from Bootswatch
  • Copy “bootswatch.less” to the same directory as the other LESS files (e.g.”twitter-bootstrap/less”)
  • Open up bootstrap.less and add the line “@import “bootswatch.less”;” just before the last “utilities” import statement:

    @import "carousel.less";
    @import "hero-unit.less";
    @import "bootswatch.less"; // <<<<<< add this line

    // Utility classes
    @import "utilities.less"; // Has to be last to override when necessary

Creating the One and Only “bootstrap.css”


The final step is to compile the freshly baked “bootstrap.less” into a “bootstrap.css”:

$ lessc --compress ./less/bootstrap.less > bootstrap.css

Big Thanks

to Thomas Park, the author of Bootswatch, who helped to put the above steps together.


08
May 12

Scala: Fun with CanBuildFrom

As I found out through trying, it may not be an easy task to explain Scala’s CanBuildFrom.

Before I dive into a quick gist, I think it’d be helpful to mention the best explanation of what happens behind the CanBuildFrom’s scenes that can be found on Stack Overflow in this answer.

The gist is, Scala has multiple layers of collections extending different capabilities. Let’s look at one such capability, TraversableLike, which most of the collections implement since, let’s be honest, a collection is not very useful if it cannot be traversed. One of the most “famous” methods from TraversableLike is “map”:

def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That = {
  val b = bf(repr)
  b.sizeHint(this) 
  for (x <- this) b += f(x)
  b.result
}

Despite the fact that it is “Scala looking”, it is actually quite simple: it takes each element of the collection it is called on, applies the provided function “f” to it, and returns another collection “That” as the result.

The interesting bit here is:

... (implicit bf: CanBuildFrom[Repr, B, That]) ...

which seems a bit awkward (aren’t all Scala implicits..). “implicit” just means that the Scala compiler will search for a value of the type “CanBuildFrom[Repr, B, That]” in the implicit “scope”. In this case, roughly speaking, it’ll first check whether there is a matching “implicit CanBuildFrom[Repr, B, That]..” defined for the collection that “map” is invoked on (in its companion object), then it’ll look at its super type/class, etc., until it finds one.

Once it finds it, it’ll use that as a “builder” for the resulting “That” collection. The way it looks for it, though, is not just “let me see if a ‘CanBuildFrom’ is there”, but also “let me see if there is a ‘CanBuildFrom’ that is parametrized with the given ‘Repr’ (e.g. collection) and ‘B’ (element type)”.

Here is a quick example. Let’s say we have a BitSet:

scala> import scala.collection.immutable.BitSet
import scala.collection.immutable.BitSet
 
scala> val bits = BitSet( 42, 84, 126 )
bits: scala.collection.immutable.BitSet = BitSet(42, 84, 126)

Once we map over this BitSet with a function (“/ 2L”) that produces something other than “Int”s as elements, a BitSet can no longer hold the result (a BitSet can only have Ints as its elements), hence the Scala compiler falls back to a super type of BitSet, which is a Set, and uses its “CanBuildFrom”, since it is a bit more generic:

implicit def canBuildFrom[A]: CanBuildFrom[Coll, A, Set[A]] = setCanBuildFrom[A]

Here “A” matches a “Long” that is now (after a map was applied) a type of resulting elements:

scala> val aintBits = bits.map( _ / 2L )
aintBits: scala.collection.immutable.Set[Long] = Set(21, 42, 63)

But we want our BitSet back.. Give me my BitSet back I say:

scala> val bitsAgain = aintBits.map( _.toInt )
bitsAgain: scala.collection.immutable.Set[Int] = Set(21, 42, 63)

But no, it does not.. And how would it know I need a BitSet. Hmm.. Give me my BitSet I urge you:

scala> val bitsAgain = aintBits.map( _.toInt ).asInstanceOf[BitSet]
java.lang.ClassCastException: scala.collection.immutable.Set$Set3 cannot be cast to scala.collection.immutable.BitSet
                              ... ...

But no, it does not..

Logically if “CanBuildFrom” is what got us a Set from a BitSet in the first place, can it be used to get a BitSet back?

Well, let’s see. We know that we have a Set of Longs (Set[Long]), where each element after applying a map function “toInt” is of type “Int”, and we need a BitSet back. Let’s create our own “CanBuildFrom” that does just that:

scala> import scala.collection.generic.CanBuildFrom
import scala.collection.generic.CanBuildFrom
 
scala> val setToBitSetBuilder = new CanBuildFrom[Set[Long], Int, BitSet] { def apply(from: Set[Long]) = this.apply(); def apply() = BitSet.newBuilder }
setToBitSetBuilder: java.lang.Object with scala.collection.generic.CanBuildFrom[Set[Long],Int,scala.collection.immutable.BitSet] = $anon$1@60bc1caa

Now let’s use it:

scala> val bitsAgain = aintBits.map( _.toInt )( setToBitSetBuilder )
bitsAgain: scala.collection.immutable.BitSet = BitSet(21, 42, 63)

And woo hoo, “bitsAgain” is truly a BitSet again. What really happened: “map” needs a “CanBuildFrom” for the collection “Set[Long]” and the (resulting) element type “Int”, which the Scala compiler would normally look up implicitly. Instead, we just handed such a thing (“setToBitSetBuilder”) to it explicitly. “setToBitSetBuilder” just returns a “builder” that is used to build the resulting collection. In this case we use Scala’s own “BitSet.newBuilder”.
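For readers who are more at home in Java, here is a loose analogy of “handing the builder in explicitly”. It is not how Scala implements “CanBuildFrom”, and the names (“MapWithBuilder”, “newBuilder”) are made up; it only shows the idea that the caller, rather than the source collection, picks the collection the mapped elements end up in:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical analogy only; Java has no implicits, so the "builder" is always explicit.
final class MapWithBuilder {

    // 'newBuilder' plays the part of the CanBuildFrom argument: it decides
    // which collection type is created and filled with the mapped elements.
    static <A, B, C extends Collection<B>> C map(Collection<A> from,
                                                 Function<A, B> f,
                                                 Supplier<C> newBuilder) {
        C result = newBuilder.get();
        for (A element : from) {
            result.add(f.apply(element));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Long> longs = new HashSet<>(Arrays.asList(21L, 42L, 63L));

        // Same mapping function, different result collection, chosen by the third argument.
        TreeSet<Integer> sorted = map(longs, Long::intValue, TreeSet::new);
        HashSet<Integer> hashed = map(longs, Long::intValue, HashSet::new);

        System.out.println(sorted);                // [21, 42, 63]
        System.out.println(hashed.equals(sorted)); // true: same elements, different collection
    }
}
```

What Scala adds on top of this is that the third argument is usually found by the compiler, so most of the time nobody has to write it.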

To make it more readable, a pimp my library pattern can be later used => aintBits.to[BitSet].

This is a rather quick overview of what “CanBuildFrom” is: it does not really discuss the function currying that is used by “map(A)(B):C”, it skims over implicits, etc. But it gives a little insight into where and how “CanBuildFrom” can be used.


24
Jan 12

Clojure: Perfect Language for Perfect Numbers

In number theory, a perfect number is a positive integer that is equal to the sum of its proper positive divisors, that is, the sum of its positive divisors excluding the number itself (also known as its aliquot sum). Equivalently, a perfect number is a number that is half the sum of all of its positive divisors (including itself) i.e. σ1(n) = 2n.

After watching a Functional Thinking talk by Neal Ford, I needed to give it some Clojure…

(ns perfect-numbers.core)
 
(defn is-factor? [dividend divisor] 
   (zero? (mod dividend divisor)))
 
(defn factors [number] 
   (distinct                                      ; no dups for perfect squares (e.g. 16, 64, 49)
      (mapcat #(when (is-factor? number %)        ; when a factor is found
         [(/ number %) %])                        ; return a pair of [number/factor, factor]
         (range 1 (inc (Math/sqrt number)) 1))))  ; go up to (sqrt number) inclusively
 
(defn perfect? [number]
   (= (reduce + (factors number)) (* 2 number)))  ; check if sum of factors = 2*N
 
(def perfect-numbers
   (filter perfect? (nnext (range))))

Let’s give it a spin:

$ lein repl
REPL started; server listening on localhost port 61776
user=> (use 'perfect-numbers.core)
nil
user=> (take 4 perfect-numbers)
(6 28 496 8128)
user=>

Let’s time it:

user=> (time (doall (take 4 perfect-numbers)))
(6 28 496 8128)
"Elapsed time: 0.123 msecs"
user=> 
user=> (time (doall (take 5 perfect-numbers)))
(6 28 496 8128 33550336)
"Elapsed time: 1.8967263496E7 msecs" ( 5 hours 16 minutes )

11
Oct 11

AKKA Scheduler: Sending Message to Actor’s Self on Start

Akka has a little scheduler written using actors. This can be convenient if you want to schedule some periodic task for maintenance or similar. It allows you to register a message that you want to be sent to a specific actor at a periodic interval.

How Does AKKA Schedule Things?


Behind the scenes, AKKA scheduler relies on “ScheduledExecutorService” from the “java.util.concurrent” package. Hence when AKKA Scheduler needs to schedule “a message sent to an actor, given a certain initial delay and interval”, it just wraps the task of sending a message in a “java.lang.Runnable”, and uses a “ScheduledExecutorService” to schedule it:

service.scheduleAtFixedRate( createSendRunnable( receiver, message, true ), 
                             initialDelay, 
                             delay, 
                             timeUnit).asInstanceOf[ScheduledFuture[AnyRef]]
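To see that mechanism in isolation, here is a minimal sketch in plain Java with no AKKA involved. The class and method names are hypothetical, and a “CountDownLatch” stands in for “sending a message to an actor”:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the idea above: wrap "send a message" in a Runnable
// and let a ScheduledExecutorService run it at a fixed rate.
final class FixedRateSketch {

    // Schedules a "send" every intervalMillis, waits for 'beats' of them, then cancels.
    // Returns true if all beats were sent and the task ended up cancelled.
    static boolean run(int beats, long intervalMillis) {
        ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(beats);
        try {
            // The "message send" wrapped in a Runnable; here it only counts down.
            Runnable send = remaining::countDown;

            ScheduledFuture<?> task =
                service.scheduleAtFixedRate(send, 0, intervalMillis, TimeUnit.MILLISECONDS);

            boolean allSent = remaining.await(beats * intervalMillis + 2000, TimeUnit.MILLISECONDS);

            // Cancel just this one task through its ScheduledFuture.
            task.cancel(true);
            return allSent && task.isCancelled();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // This service is private to the sketch, so it is safe to shut it down here.
            service.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("all heartbeats sent: " + run(3, 200));
    }
}
```

The “ScheduledFuture” returned by “scheduleAtFixedRate” is the handle that allows one particular task to be cancelled later without touching anything else the service is running.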

“Heartbeat” Actor


Let’s look at an example of scheduling a message that should be sent to an Actor’s “self” as the Actor starts. Why? Because it is a cool use case : )

“Heartbeat” would be an ideal example of such a use case => “When a ‘Heartbeat Actor’ starts, it should start sending heartbeats with a given interval (e.g. every 2 seconds)”

Creating a Message

First we need to create a message that will be scheduled to be sent every so often. We’ll call it a “SendHeartbeat” message:

sealed trait HeartbeatMessage
case object SendHeartbeat extends HeartbeatMessage

Heartbeat of the Hollywood

Since sending a heartbeat needs to be scheduled as the Actor starts, the scheduling logic should be placed in the AKKA “preStart()” hook, which is called right before the Actor is started:

override def preStart() {
 
  logger.debug( "scheduling a heartbeat to go out every " + interval + " seconds" )
 
  // scheduling the task (with the 'self') should be the last statement in preStart()
  scheduledTask = Scheduler.schedule( self, SendHeartbeat, 0, interval, TimeUnit.SECONDS )
}

Another thing to note: all the other start-up logic that is not about scheduling, if any, should go before the call to the scheduler, otherwise the task will not be scheduled.

Heartbeat should also be stoppable. We could have called “Scheduler.shutdown()” in the Actor’s “postStop()”, but first, this would stop all the other tasks that were potentially scheduled by others, and second, it would result in some very dark AKKA magic behavior.

Instead, the heartbeat task itself should be cancelled => which is a lot cleaner than calling on the dark magic for no good reason:

override def postStop() {
  scheduledTask.cancel( true )
}

Having the above two in mind here is the Hollywood Heartbeat himself:

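// imports assume Akka 1.x and SLF4J
import akka.actor.{ Actor, Scheduler }
import java.util.concurrent.{ ScheduledFuture, TimeUnit }
import org.slf4j.LoggerFactory

class Heartbeat ( val interval: Long ) extends Actor {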
 
  private val logger = LoggerFactory.getLogger( this.getClass )
  private var scheduledTask: ScheduledFuture[AnyRef] = null
 
  override def preStart() {
 
    logger.debug( "scheduling a heartbeat to go out every " + interval + " seconds" )
 
    // scheduling the task (with the 'self') should be the last statement in preStart()
    scheduledTask = Scheduler.schedule( self, SendHeartbeat, 0, interval, TimeUnit.SECONDS )
  }
 
  def receive = {
 
    case SendHeartbeat =>
 
      logger.debug( "Sending a hearbeat" )
      // sending a heartbeat here.. socket.write( heartbeatBytes )
      true
 
    case unknown =>
 
      throw new RuntimeException( "ERROR: Received unknown message [" + unknown + "], can't handle it" )
  }
 
  override def postStop() {
    scheduledTask.cancel( true )
  }
}

Making Sure the Heart is Beating

We’ll use ScalaTest to run the beast. This is more of a runner than a real test, since it does not really test for anything, but it proves the point:

class HeartbeatTest extends WordSpec  {
 
  val heartBeatInterval = 2
 
  "it" should {
 
    "send heartbeats every " + heartBeatInterval + " seconds" in {
       val heartbeat = actorOf ( new Heartbeat( heartBeatInterval ) ).start()
       Thread.sleep( heartBeatInterval * 1000 + 3000 )
       heartbeat.stop()
     }
  }
}

And.. We are “In the Money”:

17:02:54,118 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]
 
Sending a hearbeat
Sending a hearbeat
Sending a hearbeat
 
Process finished with exit code 0

09
Sep 11

ØMQ and Google Protocol Buffers

Using the ZeroMQ API, we can both queue up and dispatch / route Google Protobuf messages with X lines of code, where X approaches zero.. Well, it is ZeroMQ after all.

Google Protocol Buffers Side


Say we have a “Trade” message that is described by Google protobufs as:

message TradeMessage {
 
    required string messageType = 1;
 
    required int32 maxFloor = 2;
    required int32 qty = 3;
    required int32 accountType = 4;
    required int32 encodedTextLen = 5;
    ... ...
}

Let’s assume that our “messageType” is always 2 bytes long. Then Google Protocol Buffers will encode it as a sequence of bytes, where the first two bytes are the protobuf field key (10, i.e. field number 1 with wire type 2, “length-delimited”) and the field length (2), and the rest is the actual UTF-8 byte sequence that represents the message type. Let’s make “TR” the message type for “Trade” messages.

Once a Google protobuf “Trade” message is generated, it will start with a message type in the following format:

byte [] messageType = { 10, 2, 84, 82 };

Where ‘84’ and ‘82’ are ASCII for ‘T’ and ‘R’.
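Where those four bytes come from can be sketched without the protobuf library at all. This is only an illustration under a couple of assumptions: the field is a string with a field number of at most 15 and a value shorter than 128 bytes, so that both the key and the length fit in a single byte. “ProtoPrefix” and “stringFieldPrefix” are made-up names:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the protobuf API: computes the leading bytes
// of a message whose first field is a short string.
final class ProtoPrefix {

    // Wire type 2 is "length-delimited" (strings, bytes, nested messages).
    private static final int WIRE_TYPE_LENGTH_DELIMITED = 2;

    static byte[] stringFieldPrefix(int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        if (fieldNumber < 1 || fieldNumber > 15 || utf8.length > 127) {
            throw new IllegalArgumentException("sketch only handles a one byte key and length");
        }
        byte[] prefix = new byte[2 + utf8.length];
        prefix[0] = (byte) ((fieldNumber << 3) | WIRE_TYPE_LENGTH_DELIMITED); // (1 << 3) | 2 = 10
        prefix[1] = (byte) utf8.length;                                       // 2 for "TR"
        System.arraycopy(utf8, 0, prefix, 2, utf8.length);                    // 84, 82
        return prefix;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stringFieldPrefix(1, "TR"))); // [10, 2, 84, 82]
    }
}
```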

Now let’s say we have some kind of “TradeGenerator” ( just for testing purposes to simulate the actual feed / load ) that will produce Google Protobuf encoded “Trade” messages:

public static Trade.TradeMessage nextTrade() {
 
    return
        Trade.TradeMessage.newBuilder()
                      .setMessageType( "TR" )
                      .setAccountType( 42 )
                         ... ... ...
    }

Note that it sets the message type to “TR” as we agreed upon.

ØMQ Side


Sending “Trade” messages with ØMQ is as simple as drinking a cup of coffee in the morning:

ZMQ.Context context = ZMQ.context( 1 );
ZMQ.Socket publisher = context.socket( ZMQ.PUB );
publisher.bind( "tcp://*:5556" );
 
// creating a static trade => encoding a trade message ONCE for this example
Trade.TradeMessage trade = TradeGenerator.nextTrade();
 
while ( true ) {
    publisher.send( trade.toByteArray(), 0 );
}

Consuming messages is as simple as eating a bagel with that coffee. The interesting part (call it “the kicker”) is that we can actually subscribe to the “TR” message type (the first 4 bytes) using just the ZeroMQ API:

ZMQ.Context context = ZMQ.context( 1 );
ZMQ.Socket subscriber = context.socket( ZMQ.SUB );
subscriber.connect( "tcp://localhost:5556" );
 
// subscribe to a Trade message type => Google Proto '10' ( type 2 )
//                                      Google Proto '2'  ( length 2 bytes )
//                                             ASCII '84' = "T"
//                                             ASCII '82' = "R"
 
byte [] messageType = { 10, 2, 84, 82 };
subscriber.subscribe( messageType );
 
for ( int i = 0; i < NUMBER_OF_MESSAGES; i++ ) {
 
    byte[] rawTrade = subscriber.recv( 0 );
 
    try {
        Trade.TradeMessage trade = Trade.TradeMessage.parseFrom( rawTrade );
        assert ( trade.getAccountType() == 42 );
    }
    catch ( InvalidProtocolBufferException pbe ) {
        throw new RuntimeException( pbe );
    }
}

Now all the “TR” messages will actually go to this subscriber.
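
For the curious, here is a minimal sketch of where those 4 bytes come from and how the match works. It assumes the message type is declared as field number 1 of type string in the .proto file ( which is what the leading ‘10’ implies ) and that it is the first field written on the wire. The class and method names are made up for illustration and are not part of ZeroMQ or Protocol Buffers:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TradePrefix {

    // Protobuf wire type for length-delimited fields ( strings, bytes, nested messages )
    static final int WIRE_TYPE_LENGTH_DELIMITED = 2;

    // Builds the leading bytes of a message whose first encoded field is a short string.
    // Hypothetical helper: only handles field numbers 1..15 and values under 128 bytes,
    // so that both the tag and the length fit into a single varint byte.
    static byte[] subscriptionPrefix( int fieldNumber, String value ) {
        byte[] text = value.getBytes( StandardCharsets.US_ASCII );
        if ( fieldNumber < 1 || fieldNumber > 15 || text.length > 127 ) {
            throw new IllegalArgumentException( "single-byte tag and length only" );
        }
        byte[] prefix = new byte[ 2 + text.length ];
        prefix[0] = (byte) ( ( fieldNumber << 3 ) | WIRE_TYPE_LENGTH_DELIMITED );
        prefix[1] = (byte) text.length;
        System.arraycopy( text, 0, prefix, 2, text.length );
        return prefix;
    }

    // Mirrors the idea behind a SUB socket subscription: a plain byte-prefix comparison.
    static boolean matches( byte[] message, byte[] prefix ) {
        if ( message.length < prefix.length ) {
            return false;
        }
        for ( int i = 0; i < prefix.length; i++ ) {
            if ( message[i] != prefix[i] ) {
                return false;
            }
        }
        return true;
    }

    public static void main( String[] args ) {
        // prints [10, 2, 84, 82]
        System.out.println( Arrays.toString( subscriptionPrefix( 1, "TR" ) ) );
    }
}
```

The filter never decodes anything: it only compares raw bytes, which is why the trick depends on the message type being the first thing in the encoded message.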

NOTE: Alternatively, you can use a “Union” Google Protocol Buffers technique (or extensions) in order to encode all different message types: here is how.


28
Jul 11

One Small Citrus for Man; One Giant Leaf for Mankind

– Who does not like fruits!?
– Well.. that depends. Are you talking about “a structure of a plant that contains its seeds?”
– No silly, of course not! I am talking about data bases!

© by my brain

The Right Fruit for the Right Job


Nowadays, in order to be competent in a world of Big Data, you must get at least a Masters in Fruits, or as I call it, an “MF Degree”. Why!? Well how about’em fruits:









You see? Very important to know which fruit to choose for your next {m|b|tr}illion dollar gig.

To expand my MF degree I love doing research in the big data space, and as I was walking around the #oscon 2011 expo, I was really pleased to discover a new sort of fruit that I had not heard of before. You would think “yea, ok.. YAFDB: Yet Another Fruit DB”, but no => this one is different => this one has a kicker, this one has a.. “leaf”!

Leafing A for C


You may notice that the above fruit DBs are missing that “power of the leaf”, and look rather leafless. And in the world of NoSQL databases a fruit without a leaf has somewhat inconsistent properties. Well, let’s rephrase that: eventually the leaf will grow, so we can say that eventually those fruits will look consistent.

But what if a NoSQL database already came with a leaf attached to it? You can’t argue that if it did, it would have a complete, consistent look to it.

Well that is quite interesting.. Why can’t a NoSQL database have a configuration to actually be consistent? Think about it.. If the data is spread/sharded/persisted to multiple nodes using a “consistent hashing” algorithm, where clients have a guarantee that “this” data lives on “this” set of nodes, then any time an insert/update is completed ( truly committed ), any read for that data knows exactly which nodes to read it from, since the hash is consistent.

The answer is actually obvious => by ensuring ‘C’ in the CAP theorem via a consistent hash, you would need to sacrifice some of ‘A’.. Since certain data is limited to a concrete set of nodes (that the client relies on), if some of those nodes are down, the DB would need to lock/bring back/reconfigure/reshuffle data, and for that “moment” that data would be unAvailable. This can be improved/tuned with replication, but the “A sacrifice” remains.
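
To make the placement argument a bit more tangible, here is a rough sketch of a consistent hash ring. It says nothing about how any real database does it: the class name, the use of CRC32 as the hash, and the number of virtual nodes are all assumptions picked for brevity. It only shows that a key always maps to the same fixed set of nodes, and that the key becomes unreadable once every node in that set is down:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.zip.CRC32;

public class HashRing {

    // position on the ring => node name
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public HashRing( int virtualNodes ) {
        this.virtualNodes = virtualNodes;
    }

    // CRC32 is used only to keep the sketch dependency-free ( an assumption, not a recommendation )
    static long hash( String s ) {
        CRC32 crc = new CRC32();
        crc.update( s.getBytes( StandardCharsets.UTF_8 ) );
        return crc.getValue();
    }

    // Each node is placed on the ring several times to even out the distribution.
    public void addNode( String node ) {
        for ( int i = 0; i < virtualNodes; i++ ) {
            ring.put( hash( node + "#" + i ), node );
        }
    }

    // Walks clockwise from the key's position and collects the first `replicas` distinct nodes.
    public List<String> nodesFor( String key, int replicas ) {
        Set<String> owners = new LinkedHashSet<>();
        long position = hash( key );
        collect( ring.tailMap( position, true ).values(), owners, replicas );
        collect( ring.headMap( position, false ).values(), owners, replicas );  // wrap around
        return new ArrayList<>( owners );
    }

    private static void collect( Iterable<String> nodes, Set<String> owners, int replicas ) {
        for ( String node : nodes ) {
            if ( owners.size() >= replicas ) {
                return;
            }
            owners.add( node );
        }
    }

    // The "A sacrifice": a key is readable only while at least one of its fixed owners is up.
    public boolean isReadable( String key, int replicas, Set<String> downNodes ) {
        for ( String owner : nodesFor( key, replicas ) ) {
            if ( !downNodes.contains( owner ) ) {
                return true;
            }
        }
        return false;
    }

    public static void main( String[] args ) {
        HashRing ring = new HashRing( 64 );
        ring.addNode( "node-a" );
        ring.addNode( "node-b" );
        ring.addNode( "node-c" );
        System.out.println( ring.nodesFor( "account:42", 2 ) );
    }
}
```

With two replicas, losing one owner keeps the key readable, while losing both makes it unavailable until the data is brought back or reshuffled. The sketch covers placement only; how writes get committed to those owners is a separate matter.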

Well now I can actually try out the above with this new fruit DB that I discovered @ OSCON. It’s time you meet CitrusLeaf DB.

Citrus DB with a Leaf Attached


You can go ahead and read their Architecture Paper with pretty pictures and quite interesting claims, but here I’ll just mention some interesting facts that are mostly not in the paper, which I gathered from talking to the CitrusLeaf dudes at OSCON. By the way, they were really open about the internals of CitrusLeaf, even though it is a closed-source, commercial product. So here we go:

  • The business niche CitrusLeaf aims to conquer is “Real Time Bidding”, which in short is a bidding system that offers the opportunity to dynamically bid on impressions ( Online Advertisement ). More about it here: http://en.wikipedia.org/wiki/Sell_Side_Platform
  • The pattern in Real Time Bidding space is 60/40 => 60% reads and 40% writes. CitrusLeaf promises to perform equally well for reads and writes
  • They claim to perform at 200,000 Transactions Per Second per node. The claim is based on 8 byte transactions, which according to the CitrusLeaf folks is the usual transaction size in the Real Time Bidding world
  • CitrusLeaf can use 3 different storage strategies: DRAM, SSD and rotational disks. They are optimized to work with SSDs, where the above benchmark drops to 20,000 Transactions Per Second for a single SSD. In a normal setup, a node would have about 4 SSDs attached, where 80,000 Transactions Per Second can be achieved
  • Clients are available in C, C#, Java, Python, PHP, and Ruby
  • CitrusLeaf is ACID compliant, and uses consistent hashing to achieve ‘C’
  • Stores data in a B-Tree, since it does more (real time) reads than writes
  • CitrusLeaf can store indices for 100 million records in 7 gigabytes of DRAM
  • Pricing model is per usage => e.g. per TB. Trial release includes a tracking mechanism where the system is reporting the usage
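
A quick back-of-the-envelope check of the numbers above, as a small sketch. The figures are the ones quoted in the list; the class and method names are made up, “gigabyte” is taken as 10^9 bytes, and applying the 60/40 mix to the 4 SSD figure is my own extrapolation rather than a CitrusLeaf claim:

```java
public class CitrusLeafNumbers {

    // DRAM spent on the index per record
    static long indexBytesPerRecord( long indexBytes, long records ) {
        return indexBytes / records;
    }

    // Throughput of a node, assuming it scales linearly with the number of SSDs
    static long nodeThroughput( long tpsPerSsd, int ssds ) {
        return tpsPerSsd * ssds;
    }

    // Share of the total transactions that falls to one side of a read/write mix
    static long share( long tps, int percent ) {
        return tps * percent / 100;
    }

    public static void main( String[] args ) {
        long perRecord = indexBytesPerRecord( 7_000_000_000L, 100_000_000L );
        long perNode = nodeThroughput( 20_000L, 4 );

        System.out.println( perRecord + " bytes of index per record" );     // 70
        System.out.println( perNode + " TPS per node with 4 SSDs" );        // 80000
        System.out.println( share( perNode, 60 ) + " reads per second" );   // 48000
        System.out.println( share( perNode, 40 ) + " writes per second" );  // 32000
    }
}
```

So the index claim works out to roughly 70 bytes of DRAM per record, and the 4 SSD figure is simply the single SSD number multiplied by four.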

I feel like CitrusLeaf would be a cool addition to my MF degree. Besides, I already came up with a slogan for them: “One small citrus for man; one giant leaf for mankind” © by my brain


22
Jul 11

Having Cluster Fun @ Chariot Solutions

The best way to experiment with distributed computing is to have a distributed cluster of things to play with. One approach would of course be to spin up multiple Amazon EC2 instances, which would be wise and pretty cheap:

“Micro instances provide 613 MB of memory and support 32-bit and 64-bit platforms on both Linux and Windows. Micro instance pricing for On-Demand instances starts at $0.02 per hour for Linux and $0.03 per hour for Windows”

However some problems are better solved/simulated by having real, “touchable” hardware, that would have real dedicated disks, dedicated cores, RAM, and would only share any kind of state with other nodes over the network. Easier said than done though.. Do you have a dozen spare (same spec’ed) PCs lying around?

But what if you had an awesome training room with, let’s say, 10 iMacs? That would look something like:

Chariot Solutions Training Room

This is in fact the real deal => the “Chariot Solutions Training Room”, which is usually occupied by people learning about Scala, Clojure, Hadoop, Spring, Hibernate, Maven, Rails, etc..

So once upon a time, after training hours, we decided to run some distributed simulations. As we were passing by the training room, we had a thought: “It’s Friday night, and like any other creatures, these beautiful machines would definitely like to hang out together”…

Cluster at Chariot Solutions

This is one of that night’s highlights: a MongoDB playground. The same Friday night we played with Riak, Cassandra, RabbitMQ and vanilla distributed Erlang. As you can imagine, the iMacs had a lot of fun in the process, pumping data in and out via a 10 Gigabit switch. And we geeked out like real men!