"; */ ?>


24
Jan 12

Clojure: Perfect Language for Perfect Numbers

In number theory, a perfect number is a positive integer that is equal to the sum of its proper positive divisors, that is, the sum of its positive divisors excluding the number itself (also known as its aliquot sum). Equivalently, a perfect number is a number that is half the sum of all of its positive divisors (including itself) i.e. σ1(n) = 2n.

After watching a Functional Thinking talk by Neal Ford, needed to give it some clojure…

(ns perfect-numbers.core)
 
(defn is-factor? [divident divisor] 
   (zero? (mod divident divisor)))
 
(defn factors [number] 
   (distinct                                      ; no dups for perfect squares (e.g. 16, 64, 49)
      (mapcat #(when (is-factor? number %)        ; when a factor is found
         [(/ number %) %])                        ; return a pair of [number/factor, factor]
         (range 1 (inc (Math/sqrt number)) 1))))  ; go upto a (sqrt number) inclusively
 
(defn perfect? [number]
   (= (reduce + (factors number)) (* 2 number)))  ; check if sum of factors = 2*N
 
(def perfect-numbers
   (filter perfect? (nnext (range))))

Let’s give it a spin:

$ lein repl
REPL started; server listening on localhost port 61776
user=> (use 'perfect-numbers.core)
nil
user=> (take 4 perfect-numbers)
(6 28 496 8128)
user=>

Let’s time it:

user=> (time (doall (take 4 perfect-numbers)))
(6 28 496 8128)
"Elapsed time: 0.123 msecs"
user=> 
user=> (time (doall (take 5 perfect-numbers)))
(6 28 496 8128 33550336)
"Elapsed time: 1.8967263496E7 msecs" ( 5 hours 16 minutes )

11
Oct 11

AKKA Scheduler: Sending Message to Actor’s Self on Start

Akka has a little scheduler written using actors. This can be convenient if you want to schedule some periodic task for maintenance or similar. It allows you to register a message that you want to be sent to a specific actor at a periodic interval.

How Does AKKA Schedule Things?


Behind the scenes, AKKA scheduler relies on “ScheduledExecutorService” from the “java.util.concurrent” package. Hence when AKKA Scheduler needs to schedule “a message sent to an actor, given a certain initial delay and interval”, it just wraps the task of sending a message in a “java.lang.Runnable”, and uses a “ScheduledExecutorService” to schedule it:

service.scheduleAtFixedRate( createSendRunnable( receiver, message, true ), 
                             initialDelay, 
                             delay, 
                             timeUnit).asInstanceOf[ScheduledFuture[AnyRef]]

“Heartbeat” Actor


Let’s look at the example of scheduling a message that should be sent to an Actor’s “self” as the Actor on start. Why? Because it is a cool use case : )

“Heartbeat” would be an ideal example of such use case => “When a ‘Hearbeat Actor’ starts, it should start sending heartbeats with a given interval (e.g. every 2 seconds)”

Creating a Message

First we need to create a message that will be scheduled to be sent every so often. We’ll call it a “SendHeartbeat” message:

sealed trait HeartbeatMessage
case object SendHeartbeat extends HeartbeatMessage
Heartbeat of the Hollywood

Since sending a heartbeat needs to be scheduled as the Actor starts, the scheduling logic should be placed in the AKKA “preStart()” hook, which is called right before the Actor is started:

override def preStart() {
 
  logger.debug( "scheduling a heartbeat to go out every " + interval + " seconds" )
 
  // scheduling the task (with the 'self') should be the last statement in preStart()
  scheduledTask = Scheduler.schedule( self, SendHeartbeat, 0, interval, TimeUnit.SECONDS )
}

Another thing to note, all the other non scheduling on start logic, if any, should go before the call to the scheduler, otherwise the task will not be scheduled.

Heartbeat should also be stoppable. We could have called “Scheduler.shutdown()” in Actor’s “postStop()”, but first, this would stop all the other tasks that were potentially scheduled by others, and second, it will result in a very dark AKKA magic behavior.

Instead, the heartbeat task itself should be cancelled => which is lot cleaner than calling for the dark magic for no good reason:

override def postStop() {
  scheduledTask.cancel( true )
}

Having the above two in mind here is the Hollywood Heartbeat himself:

class Heartbeat ( val interval: Long ) extends Actor {
 
  private val logger = LoggerFactory.getLogger( this.getClass )
  private var scheduledTask: ScheduledFuture[AnyRef] = null
 
  override def preStart() {
 
    logger.debug( "scheduling a heartbeat to go out every " + interval + " seconds" )
 
    // scheduling the task (with the 'self') should be the last statement in preStart()
    scheduledTask = Scheduler.schedule( self, SendHeartbeat, 0, interval, TimeUnit.SECONDS )
  }
 
  def receive = {
 
    case SendHeartbeat =>
 
      logger.debug( "Sending a hearbeat" )
      // sending a heartbeat here.. socket.write( heartbeatBytes )
      true
 
    case unknown =>
 
      throw new RuntimeException( "ERROR: Received unknown message [" + unknown + "], can't handle it" )
  }
 
  override def postStop() {
    scheduledTask.cancel( true )
  }
}
Making Sure the Heart is Beating

We’ll use ScalaTest to run the beast. This is more of a runner than a real test, since it does not really test for anything, but it proves the point:

class HeartbeatTest extends WordSpec  {
 
  val heartBeatInterval = 2
 
  "it" should {
 
    "send heartbeats every" + heartBeatInterval + " seconds" in {
       val heartbeat = actorOf ( new Heartbeat( heartBeatInterval ) ).start()
       Thread.sleep( heartBeatInterval * 1000 + 3000 )
       heartbeat.stop()
     }
  }
}

And.. We are “In the Money“:

17:02:54,118 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - Attaching appender named [STDOUT] to Logger[ROOT]
 
Sending a hearbeat
Sending a hearbeat
Sending a hearbeat
 
Process finished with exit code 0

09
Sep 11

ØMQ and Google Protocol Buffers

Using ZeroMQ API, we can both: queue up and dispatch / route Google Protobuf messages with X lines of code, where X approaches to zero.. Well, it is ZeroMQ after all.

Google Protocol Buffers Side


Say we have a “Trade” message that is described by Google protobufs as:

message TradeMessage {
 
    required string messageType = 1;
 
    required int32 maxFloor = 2;
    required int32 qty = 3;
    required int32 accountType = 4;
    required int32 encodedTextLen = 5;
    ... ...
}

Let’s assume that our “messageType” is always 2 bytes long. Then Google Protocol Buffers will encode it as a sequence of bytes, where first two bytes will determine protobuf’s field type (10) and field lenght (2), and the rest will be the actual UTF-8 byte sequence that would represent a message type. Let’s make “TR” a message type for “Trade” messages.

Once a Google protobuf “Trade” message is generated it will start with a message type in a following format:

byte [] messageType = { 10, 2, 84, 82 };

Where ’84’ and ’82’ are ASCII for ‘T’ and ‘R’.

Now let’s say we have a some kind of “TradeGenerator” ( just for testing purposes to simulate the actual feed / load ) that will produce Google Protobuf encoded “Trade” messages:

public static Trade.TradeMessage nextTrade() {
 
    return
        Trade.TradeMessage.newBuilder()
                      .setMessageType( "TR" )
                      .setAccountType( 42 )
                         ... ... ...
    }

Note that it sets the message type to “TR” as we agreed upon.

ØMQ Side


Sending “Trade” messages with ØMQ is as simple as drinking a cup of coffee in the morning:

ZMQ.Context context = ZMQ.context( 1 );
ZMQ.Socket publisher = context.socket( ZMQ.PUB );
publisher.bind( "tcp://*:5556" );
 
// creating a static trade => encoding a trade message ONCE for this example
Trade.TradeMessage trade = TradeGenerator.nextTrade();
 
while ( true ) {
    publisher.send( trade.toByteArray(), 0 );
}

Consuming messages is as simple as eating a bagel with that coffee. The interesting part (call it “the kicker”) is that we can actually subscribe to a “TR” message type (first 4 bytes) using just ZeroMQ API:

ZMQ.Context context = ZMQ.context( 1 );
ZMQ.Socket subscriber = context.socket( ZMQ.SUB );
subscriber.connect( "tcp://localhost:5556" );
 
// subscribe to a Trade message type => Google Proto '10' ( type 2 )
//                                      Google Proto '2'  ( length 2 bytes )
//                                             ASCII '84' = "T"
//                                             ASCII '82' = "R"
 
byte [] messageType = { 10, 2, 84, 82 };
subscriber.subscribe( messageType );
 
for ( int i = 0; i < NUMBER_OF_MESSAGES; i++ ) {
 
    byte[] rawTrade = subscriber.recv( 0 );
 
    try {
        Trade.TradeMessage trade = Trade.TradeMessage.parseFrom( rawTrade );
        assert ( trade.getAccountType() == 42 );
    }
    catch ( InvalidProtocolBufferException pbe ) {
        throw new RuntimeException( pbe );
    }
}

Now all the “TR” messages will actually go to this subscriber.

NOTE: Alternatively, you can use a “Union” Google Protocol Buffers technique (or extensions) in order to encode all different message types: here is how.


28
Jul 11

One Small Citrus for Man; One Giant Leaf for Mankind

– Who does not like fruits!?
– Well.. that depends. Are you talking about “a structure of a plant that contains its seeds?”
– No silly, of course not! I am talking about data bases!

© by my brain

The Right Fruit for the Right Job


Now days in order to be competent in a world of Big Data you must get at least a Masters in Fruits, or as I call it an “MF Degree”. Why!? Well how about’em fruits:

You see? Very important to know which fruit to choose for your next {m|b|tr}illion dollar gig.

To expand my MF degree I love doing research in a big data space, and as I was walking around #oscon 2011 expo, I was really pleased to discover a new sort of fruits that I have not heard of before. You would think “yea, ok.. YAFDB: Yet Another Fruit DB”, but no => this one is different => this one has a kicker, this one has a.. “leaf”!

Leafing A for C


You may notice that the above fruit DBs missing that “power of the leaf”, and look rather leafless. And in the world of NoSQL databases fruit without a leaf has somewhat inconsistent properties. Well, let’s rephrase that: eventually the leaf will grow, so we can say that eventually those fruits will look consistent.

But what if a NoSQL database already came with leaf attached to it? You can’t argue that if it did, it would have a complete, consistent look to it.

Well that is quite interesting.. Why a NoSQL database can’t have a configuration to actually be consistent? Think about it.. If the data is spread/sharded/persisted to multiple nodes using a “consistent hashing” algorithm, where clients could have a guarantee that “this” data would live on “these” set of nodes, then any time an insert/update is completed ( truly committed ), any reads for that data would know exactly where/which nodes to read this data from. Since the hash is consistent.

The answer is actually obvious => by ensuring ‘C’ in a CAP theorem via consistent hash, you would need to sacrifice some of ‘A’.. Since certain data is limited by a concrete set of nodes (that client relies on), if some of those nodes are down, DB would need to lock/bring back/reconfigure/reshuffle data, and for that “moment” that data would be unAvailable. This can be improved/tuned with replication, but the “A sacrifice” remains to be there.

Well now I can actually try out the above with this new fruit DB that I discovered @ OSCON. It’s time you meet CitrusLeaf DB

Citrus DB with a Leaf Attached


You can go ahead and read their Architecture Paper with pretty pictures and quite interesting claims, but here I’ll just mention some interesting facts that are mostly not in a paper, which I gathered from talking to CitrusLeaf dudes at OSCON. By the way, they were really open about the internals of CitrusLeaf, even though it is a closed source, commercial product. So here we go:

  • The business niche CitrusLeaf aims to conquer is “Real Time Bidding” which in short is a bidding system that offers the opportunity to dynamically bid impressions ( Online Advertisement ). More about it here: http://en.wikipedia.org/wiki/Sell_Side_Platform
  • The pattern in Real Time Bidding space is 60/40 => 60% reads and 40% writes. CitrusLeaf promises to perform equally well for reads and writes
  • They claim to perform at 200,000 Transactions Per Second per node. Claim is based on 8 byte transactions, which according to CitrusLeaf folks is the usual transaction size in Real Time Bidding world
  • CitrusLeaf can use 3 different storage strategies: DRAM, SSD and Rotation Disks. They are optimized to work with SSDs, where the above benchmark drops to 20,000 Transactions Per Second for a single SSD. In a normal setup, a node would have about 4 SSD attached, where 80,000 Transactions Per Second can be achieved
  • Clients are available in C, C#, Java, Python, PHP, and Ruby
  • CitrusLeaf is ACID compliant, and uses consistent hashing to achieve ‘C’
  • Stores data in a B-Tree, since it does more (real time) reads than writes
  • Citrusleaf can store indices for 100 million records in 7 gigabytes of DRAM
  • Pricing model is per usage => e.g. per TB. Trial release includes a tracking mechanism where the system is reporting the usage

I feel like CitrusLeaf would be a cool addition to my MF degree, besides I already came up with a slogan for them: “One small citrus for man; one giant leaf for mankind” © by my brain


22
Jul 11

Having Cluster Fun @ Chariot Solutions

The best way to experiment with distributed computing is to have a distributed cluster of things to play with. One approach would of course be to spin off multiple Amazon EC2 instances, which would be wise and pretty cheap:

Micro instances provide 613 MB of memory and support 32-bit and 64-bit platforms on both Linux and Windows. Micro instance pricing for On-Demand instances starts at $0.02 per hour for Linux and $0.03 per hour for Windows”

However some problems are better solved/simulated by having real, “touchable” hardware, that would have real dedicated disks, dedicated cores, RAM, and would only share any kind of state with other nodes over network. Easier said that done though.. Do you have a dozen of spare (same spec’ed) PCs laying around?

But what if you had an awesome training room with, let’s say, 10 iMacs? That would look something like:

Chariot Solutions Training Room

This is in fact the real deal => “Chariot Solutions Training Room“, which is usually occupied by people learning about Scala, Clojure, Hadoop, Spring, Hibernate, Maven, Rails, etc..

So once upon a time, in after training hours, we decided to run some distributed simulations. As we were passing by the training room, we had a thought: “It’s Friday night, and as any other creatures, these beautiful machines would definitely like to hang out together”…

Cluster at Chariot Solutions

This is one of this night’s highlights: a MongoDB playground. The same Friday night we played with Riak, Cassandra, RabbitMQ and vanilla distributed Erlang. As you can imagine iMacs had a lot of fun in a process pumping data in and out via 10 Gigabit switch. And we geeked out like real men!