Jul 11

One Small Citrus for Man; One Giant Leaf for Mankind

– Who does not like fruits!?
– Well.. that depends. Are you talking about “a structure of a plant that contains its seeds?”
– No silly, of course not! I am talking about data bases!

© by my brain

The Right Fruit for the Right Job

Nowadays, in order to be competent in the world of Big Data, you must get at least a Masters in Fruits, or as I call it, an “MF Degree”. Why!? Well, how about ’em fruits:

You see? Very important to know which fruit to choose for your next {m|b|tr}illion dollar gig.

To expand my MF degree I love doing research in the big data space, and as I was walking around the #oscon 2011 expo, I was really pleased to discover a new sort of fruit that I had not heard of before. You would think “yea, ok.. YAFDB: Yet Another Fruit DB”, but no => this one is different => this one has a kicker, this one has a.. “leaf”!

Leafing A for C

You may notice that the above fruit DBs are missing that “power of the leaf” and look rather leafless. And in the world of NoSQL databases, a fruit without a leaf has somewhat inconsistent properties. Well, let’s rephrase that: eventually the leaf will grow, so we can say that those fruits will eventually look consistent.

But what if a NoSQL database already came with a leaf attached to it? You can’t argue that if it did, it would have a complete, consistent look to it.

Well, that is quite interesting.. Why can’t a NoSQL database have a configuration to actually be consistent? Think about it.. If the data is spread/sharded/persisted to multiple nodes using a “consistent hashing” algorithm, where clients have a guarantee that “this” data lives on “this” set of nodes, then any time an insert/update is completed ( truly committed ), any read for that data knows exactly which nodes to read it from, since the hash is consistent.

The answer is actually obvious => by ensuring ‘C’ of the CAP theorem via a consistent hash, you would need to sacrifice some of ‘A’.. Since certain data is tied to a concrete set of nodes (that clients rely on), if some of those nodes are down, the DB would need to lock/bring back/reconfigure/reshuffle data, and for that “moment” the data would be unAvailable. This can be improved/tuned with replication, but the “A sacrifice” remains.
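That trade-off is easy to see in a toy hash ring (a minimal sketch in plain Java; all names below are mine, not any particular DB’s): every client derives the same owner node for a key, so reads know exactly where committed data lives, and if that owner is down, the key is simply unavailable until the ring is reconfigured.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy consistent-hash ring: a key's owner is the first node clockwise
// from the key's position, so every client computes the same owner.
class HashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    void addNode(String node) {
        ring.put(position(node), node);
    }

    String ownerOf(String key) {
        if (ring.isEmpty()) return null;
        // first node at or after the key's position, wrapping around the ring
        SortedMap<Integer, String> tail = ring.tailMap(position(key));
        Integer pos = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(pos);
    }

    private static int position(String s) {
        return s.hashCode() & 0x7fffffff; // fold into a non-negative position
    }
}
```

Replication would place each key on several consecutive ring nodes, which softens, but does not remove, the “A sacrifice”: when all owners of a key are down, that key is unavailable until the ring reshuffles.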

Well, now I can actually try out the above with this new fruit DB that I discovered @ OSCON. It’s time you meet CitrusLeaf DB.

Citrus DB with a Leaf Attached

You can go ahead and read their Architecture Paper with pretty pictures and quite interesting claims, but here I’ll just mention some interesting facts that are mostly not in the paper, which I gathered from talking to the CitrusLeaf dudes at OSCON. By the way, they were really open about the internals of CitrusLeaf, even though it is a closed-source, commercial product. So here we go:

  • The business niche CitrusLeaf aims to conquer is “Real Time Bidding”, which in short is a bidding system that offers the opportunity to dynamically bid on impressions ( Online Advertisement ). More about it here: http://en.wikipedia.org/wiki/Sell_Side_Platform
  • The pattern in Real Time Bidding space is 60/40 => 60% reads and 40% writes. CitrusLeaf promises to perform equally well for reads and writes
  • They claim to perform at 200,000 Transactions Per Second per node. Claim is based on 8 byte transactions, which according to CitrusLeaf folks is the usual transaction size in Real Time Bidding world
  • CitrusLeaf can use 3 different storage strategies: DRAM, SSD and rotational disks. They are optimized to work with SSDs, where the above benchmark drops to 20,000 Transactions Per Second for a single SSD. In a normal setup, a node would have about 4 SSDs attached, where 80,000 Transactions Per Second can be achieved
  • Clients are available in C, C#, Java, Python, PHP, and Ruby
  • CitrusLeaf is ACID compliant, and uses consistent hashing to achieve ‘C’
  • Stores data in a B-Tree, since it does more (real time) reads than writes
  • CitrusLeaf can store indices for 100 million records in 7 gigabytes of DRAM
  • Pricing model is per usage => e.g. per TB. The trial release includes a tracking mechanism through which the system reports the usage
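As a quick sanity check on that index claim (back-of-the-envelope arithmetic only; the 2^30-bytes-per-gigabyte assumption is mine):

```java
// Back-of-the-envelope: DRAM bytes per index entry implied by the
// "100 million records in 7 gigabytes" claim (assuming 1 GB = 2^30 bytes).
class IndexMath {
    static long bytesPerEntry(long gigabytes, long records) {
        return (gigabytes * (1L << 30)) / records;
    }
}
```

That works out to ~75 bytes of DRAM per entry, enough room for a key digest plus a storage location pointer, so the claim is at least plausible.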

I feel like CitrusLeaf would be a cool addition to my MF degree, besides I already came up with a slogan for them: “One small citrus for man; one giant leaf for mankind” © by my brain

Jul 11

Having Cluster Fun @ Chariot Solutions

The best way to experiment with distributed computing is to have a distributed cluster of things to play with. One approach would of course be to spin up multiple Amazon EC2 instances, which would be wise and pretty cheap:

“Micro instances provide 613 MB of memory and support 32-bit and 64-bit platforms on both Linux and Windows. Micro instance pricing for On-Demand instances starts at $0.02 per hour for Linux and $0.03 per hour for Windows.”

However, some problems are better solved/simulated by having real, “touchable” hardware, that would have real dedicated disks, dedicated cores, RAM, and would only share any kind of state with other nodes over the network. Easier said than done though.. Do you have a dozen spare (same-spec’ed) PCs lying around?

But what if you had an awesome training room with, let’s say, 10 iMacs? That would look something like:

Chariot Solutions Training Room

This is in fact the real deal => the “Chariot Solutions Training Room”, which is usually occupied by people learning about Scala, Clojure, Hadoop, Spring, Hibernate, Maven, Rails, etc..

So once upon a time, after training hours, we decided to run some distributed simulations. As we were passing by the training room, we had a thought: “It’s Friday night, and as any other creatures, these beautiful machines would definitely like to hang out together”…

Cluster at Chariot Solutions

This is one of this night’s highlights: a MongoDB playground. The same Friday night we played with Riak, Cassandra, RabbitMQ and vanilla distributed Erlang. As you can imagine, the iMacs had a lot of fun in the process, pumping data in and out via a 10 Gigabit switch. And we geeked out like real men!

Jul 11

NoRAM DB => “If It Does Not Fit in RAM, I Will Quietly Die For You”

Out of the 472 NoSQL databases / distributed caches that are currently available, highly buzzed, and screaming that only their precious brand solves the world’s toughest problems.. there are only a few that I have found so far with less screaming and more doing:

Choosing a Drink

See.. Choice makes things better and worse at the same time =>

if I am thirsty, and I only have water available, 
I'll take that, satisfy my thirst, 
and come back to focusing on whatever it is I was doing

at the same time

if I am thirsty, and I have 472 sodas, and 253 juices, and 83 waters available
I'll take a two hour break to choose the right one, satisfy my thirst, 
and come back to focusing on whatever it is I was doing

It may not seem like two different experiences at first, but they are different approaches.

It is especially bad when, out of those 472 sodas, #58 has such a seductive ad on the can, promising you a $1,000,000 prize if only you take a sip. So you are interested ( many people drink it too ), you try it… and spit it out immediately => now there are 471 left to go.. the thirst remains, and such a dissatisfaction.

If there is CAP, there is CRACS

NoSQL is no different nowadays. Choosing the right data store for your task / project / problem really depends on several factors. Let’s redefine the CAP theorem (because we can) and call it CRACS instead:

  • Cost: Do you have spare change for several TB of RAM?
  • Reliability: Fault Tolerance, Availability
  • Amount of data: Do you operate on Megabytes, Terabytes, or maybe Petabytes?
  • Consistency: Can you survive two reads @ the same millisecond returning different results?
  • Speed: Reading, which includes various aggregates, AND writing

That’s pretty much it. Let me know if you have a favorite that is missing from CRACS, but at least for now, five is a good number. Of course there are other important things, like simplicity ( hello Cassandra: NOT ) and ease of development ( hello VoltDB, KDB: NOT ), and of course fun to work with, and of course great documentation, and of course great community, and of course… there are others, but the above five seem to nail it.

Distributed Cache: Why Redis and Hazelcast?

Well, let’s see: Redis and Hazelcast are distributed caches with an optional persistent store. They score because they do just that => CACHE, and they are data-structure based: e.g. List, Set, Queue.. even DistributedTask, DistributedEvent, etc. Again, they are upfront with you: “we do awesome CACHE” => and they do. I have a good feeling about GemFire, but I have not tried it, and last time I contacted them, they did not respond, so that’s that.

NoSQL: Going Beyond RAM

See, what I have really learned to dislike are the “if it does not fit in RAM, I will quietly die for you” NoSQL data stores ( hello MongoDB ). And it is not just the index that should entirely fit into memory, but also the data that has not yet been fsync’ed, data that was brought back for querying, data that was fsync’ed but still hangs around, etc..

The thing to understand when comparing Riak to MongoDB is that Riak actually writes to disk, while MongoDB writes to mmap files ( memory ). Try setting Mongo’s WriteConcern to “FSYNC_SAFE”, and now compare it to Riak => that would be a fair comparison. And with LevelDB as Riak’s backend, or even good old Bitcask, Riak will take Gold. Yes, Riak’s storage is pluggable : ).

Another thing besides that obvious Mongo “RAM problem” is JSON, I mean BSON. Don’t let this ‘B‘ fool you: a key name such as “name” will take at least 4 bytes, without even getting to the actual value for this key, which affects performance as well as storage. Riak has protocol buffers as an alternative to JSON, which can really help with the document size. And with secondary indices on the way, it may even prove to be searchable : ).
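The overhead is easy to put a number on (a rough illustration of the idea only; it ignores BSON’s per-field type and length bytes):

```java
// Rough cost of repeating field names in every document: BSON stores each
// key name verbatim as a C-string (name bytes + a trailing 0x00 terminator).
class KeyNameOverhead {
    static long nameBytes(String[] fieldNames, long documentCount) {
        long perDocument = 0;
        for (String name : fieldNames) {
            perDocument += name.length() + 1; // +1 for the string terminator
        }
        return perDocument * documentCount;
    }
}
```

For 100 million documents with fields “name”, “email” and “created_at”, that is 22 bytes per document => roughly 2.2 GB of storage spent on key names alone, which is exactly the kind of fat a terser format such as protocol buffers trims.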

Both Riak and MongoDB struggle with Map/Reduce: MongoDB does it in a single (!) SpiderMonkey thread, and of course it has a friendly GlobalLock that does not help, but it takes a point from Riak by having secondary indices. But both Mongo and Riak are in the process of rewriting their MapReduce frameworks completely, so we’ll see how it goes.

Cassandra Goodness and Awkwardness

Cassandra is, however, a solid piece of software, with one caveat: you have to hire the DataStax guys, who are really, really, really good, by the way, but you have to pay up good money for such goodness. Otherwise, on your own, you don’t really need a PhD to handle Cassandra [ you only really need a PhD in Comp Sci if you actually can and like to hang around in school for a couple more years, but that is a topic for another blog post ].

Cassandra documentation is somewhat good, and the code is there; the only problem is it looks and feels a bit Frankenstein: murmurhash from here, a bloom filter from there, here is a sprinkle of Thrift ( really!? ); in case you want to Insert/Update, here is a “mutator”, etc.. Plus, adding a node is really an afterthought => if you have TBs of data, adding a node can take days, no really => days. Querying is improving with the CQL rollout in 0.8, but even the simplest kind of aggregation requires some “Thriftiveness” [ read “Awkwardness” ].

CouchDB is Awesome, but JSON

CouchDB looks and feels like an awesome choice if the size of the data is not a problem. Same as with MongoDB, the “JSON only” approach is a bit weird: why only JSON? The point of no schema is NOT that it changes ALL the time => then you would have just a BLOB, whose content you cannot predict, hence cannot index. The point is that the schema ( or rather some part of it ) may/will change, so WHY the heck do I have to carry those key names (which are VARCHARs and take space) around? Again, CouchDB + an alternative protocol would make it into my “Real NoSQL Dudes” list easily, as I like most things about it, especially secondary indices, Ubuntu integration, mobile expansion and, of course, Erlang. But Riak has Erlang too : )

VoltDB: With Great Speed Comes Great Responsibility

As far as speed goes, VoltDB would probably leave most of the others behind (well, besides OneTick and KDB), but it actually is not NoSQL (hence it is fully ACID), and it is not a NotJustInRAM store, since it IS Just RAM. Three huge caveats:

1. Everything is a precoded stored procedure => ad hoc querying is quite difficult
2. Aggregation queries can’t work with data greater than 100MB ( for a temp table )
3. Data “can” be exported to Hadoop, but there is no real integration (yet) to analyze the data that is currently in RAM along with the data in Hadoop.

But it is young and worth mentioning, as I think that as RAM and hardware get cheaper, “commodity nodes” become less important, so “Scale Up” solutions may actually win back some architectures. There is of course the question of Fault Tolerance, which Erlang / Akka based systems will solve a lot better than any “Scale Up”s, but that is a topic for another time.

Dark Horses of NoSQL

There are others, such as Tokyo Cabinet ( or is it Kyoto Cabinet nowadays ) and Project Voldemort => I have not tried them, but have heard good stories about them. The only problem I see with these “dark horse” solutions is lack of adoption.

Neo4j: Graph it Like You Mean It

Ok, so why Neo4j? Well, because it is a graph data store that screws with your mind a bit (data modeling) until you get it. But once you get it, there is no excuse not to use it, especially when you create your next “social network, location based, shopping smart” start up => modeling connected things as a graph just MAKES SENSE, and Neo4j is perfect for it.

You know how fast it is to find “all” connections for a single node in a non-graph data store? Well, it takes longer and longer as “all” becomes a big number. With Neo4j, it is as fast as finding a single connection => because it is a graph, the natural data structure for graph-shaped data. It comes with a price: it is not as easy to Scale Out, at least for free. But.. it is fully transactional ( even JTA ), it persists things to disk, and it is baked into Spring Data. Try it: it makes your brain do a couple of front splits, but your mind feels really stretched afterwards.
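The speed claim comes down to the data structure (a toy illustration, not Neo4j’s actual API): with an adjacency list, finding “all” connections of a node costs one lookup plus that node’s degree, no matter how big the rest of the graph grows, whereas joining over an edge table pays per hop.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy adjacency-list graph: a node's connections are one map lookup away.
class TinyGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    void connect(String a, String b) {
        adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // O(1) lookup + O(degree) to walk the neighbors, independent of graph size.
    Set<String> connectionsOf(String node) {
        return adjacency.getOrDefault(node, Collections.emptySet());
    }
}
```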

The Future is 600 to 1

There is of course HBase, which the Hadapt guys are promising to beat 600 to 1, so we’ll see what Hadapt brings to the table. Meanwhile, I invite Riak, Redis, Hazelcast and Neo4j to walk the slippery slope of NoSQL together with me.

Apr 11

Spock It Like You Mean It!

So here we go.. Last night we hacked our way into The Ancient Database where, besides the data about the ancients themselves, we found incredible stats about many different living species on all the planets the ancients traveled to.

So what do we do now? Well, we need a way to query/read this data. So we asked Sam to develop a reader ( the ancients called it a ‘DAO’ ) to find the number of living species on each planet since the “beginning of time”.

So she did, and called this “species reader” a ‘SpeciesDao’.

We, of course, trust Sam. But in ancient technology, trust is always based on challenging the assumption by testing all the possible permutations. So we are calling Mr. Spock to help us out…

Sam of course used a well known ancient technique ( code name: Spring Framework ) to create a “species reader”, so Mr. Spock will use all the goodies of this technique to inject all the dependencies ( e.g. data source, a reader itself, etc.. ) for a test.

Since Mr. Spock is a peaceful inhabitant of the planet Simplicity, he speaks a simple language called Groovy. This language is unique in that living beings of all the galaxies can pick it up and understand it within minutes, which, by the way, makes it a perfect choice for various intergalactic conferences.

To start off, we’ll tell Mr. Spock where to look for all the dependencies:

@ContextConfiguration( locations = [ "classpath:/conf/test-context.xml", "classpath:/conf/persistence-context.xml" ] )

The data we are going to test against will be in the form of DbUnit datasets, which, by the way, can be XML based, Excel based or even YAML based. In this example we are going to focus on planets of the two galaxies that pique our curiosity the most: Milky Way and Pegasus.

  // Species of the Milky Way Galaxy by Planet
  static final P3X_888_SPECIES_DATASET = "/dataset/stage/p3x-888-species.xml"
  static final DAKARA_SPECIES_DATASET = "/dataset/stage/dakara-species.xml"
  // Species of the Pegasus Galaxy by Planet
  static final ASURAS_SPECIES_DATASET = "/dataset/stage/asuras-species.xml"
  static final WRAITH_HOMEWORLD_SPECIES_DATASET = "/dataset/stage/wraith-homeworld-species.xml"

Since it is a test, we already know the exact number of living species since the ‘beginning of time’ for each dataset. And we’d need to call the reader Sam wrote ( ‘SpeciesDao’ ) for each dataset and compare the result with the expected number. So you see, lots of repetitive actions. But no worries, humans! Spock has an elegant way to deal with all these repetitions, by using its ‘WHERE’ power:

      where:
             planet                        |          numberOfSpeciesFound
      // Milky Way Galaxy
      P3X_888_SPECIES_DATASET              |                  888
      DAKARA_SPECIES_DATASET               |                 123804
      HELIOPOLIS_SPECIES_DATASET           |                   7

That’s neat. The only problem is that if the number does not match ( yes, even Sam has bugs ), for example for planet Dakara, Mr. Spock will tell us that [ “something did not match up..” ], but will forget to mention that “it” did not match up specifically for the Dakara planet. And if we have thousands of such planets, it’ll be extremely difficult to find the culprit. But again, humans, this is easily solvable by using Mr. Spock’s secret power: the power of ‘@Unroll’!

  @Unroll("#planet should have #numberOfSpeciesFound species found")
  def "only living species since the beginning of time should be found"() { ... }

By annotating a test method with @Unroll, in the case where the number of living species found does not match the number we expected, Mr. Spock will say just that. For example, for Dakara it’ll now say: “Dakara should have 123804 species found”, while also telling us the actual number that was found for comparison. Pretty handy!

One last touch before we can fully trust Mr. Spock.. The way the ancient technology ( Spring ) was written, it won’t allow injecting collaborators ( e.g. the data source ) statically before all the specifications / permutations. It can be tweaked to do that, but c’mon, who are we to tweak the ancients.. Instead, we’ll tell Mr. Spock to do all the setup it needs only the first time a data source is injected:

    // setupSpec() cannot access Spring beans ( e.g. dataSource ), hence need to check it every time
    if ( ! databaseTester ) {
      databaseTester = new DataSourceDatabaseTester( dataSource )
    }

Now we are ready to roll. Let’s put all the pieces together and make sure Sam’s creation does what it’s supposed to:

@ContextConfiguration( locations = [ "classpath:/conf/test-context.xml", "classpath:/conf/persistence-context.xml" ] )
class FindNumberOfSpeciesTest extends Specification {

  // Species of the Milky Way Galaxy by Planet
  static final P3X_888_SPECIES_DATASET = "/dataset/stage/p3x-888-species.xml"
  static final DAKARA_SPECIES_DATASET = "/dataset/stage/dakara-species.xml"
  static final HELIOPOLIS_SPECIES_DATASET = "/dataset/stage/heliopolis-species.xml"
  static final ASCHEN_PRIME_SPECIES_DATASET = "/dataset/stage/aschen-prime-species.xml"
  static final P4X_650_SPECIES_DATASET = "/dataset/stage/p4x-650-species.xml"
  static final VIS_UBAN_SPECIES_DATASET = "/dataset/stage/vis-uban-species.xml"
  static final PROCLARUSH_SPECIES_DATASET = "/dataset/stage/proclarush-species.xml"
  static final HEBRIDAN_SPECIES_DATASET = "/dataset/stage/hebridan-species.xml"

  // Species of the Pegasus Galaxy by Planet
  static final ASURAS_SPECIES_DATASET = "/dataset/stage/asuras-species.xml"
  static final WRAITH_HOMEWORLD_SPECIES_DATASET = "/dataset/stage/wraith-homeworld-species.xml"
  static final SATEDA_SPECIES_DATASET = "/dataset/stage/sateda-species.xml"
  static final DAGAN_SPECIES_DATASET = "/dataset/stage/dagan-species.xml"
  static final LORD_PROTECTORS_SPECIES_DATASET = "/dataset/stage/lord-protectors-species.xml"
  static final M7G_677_SPECIES_DATASET = "/dataset/stage/m7g-677-species.xml"
  static final ATHOS_SPECIES_DATASET = "/dataset/stage/athos-677-species.xml"

  static final THE_BEGINNING_OF_TIME = Date.parse( "yyyy-M-d", "1979-01-01" )

  SpeciesDao speciesDao
  DataSource dataSource

  @Shared IDatabaseTester databaseTester

  @Unroll("#planet should have #numberOfSpeciesFound species found")
  def "only living species since the beginning of time should be found"() {

    when: stageTestData planet
    then: speciesDao.findNumberOfSpeciesLivingSince( THE_BEGINNING_OF_TIME ) == numberOfSpeciesFound

    where:
      planet                               |          numberOfSpeciesFound
      // Milky Way Galaxy
      P3X_888_SPECIES_DATASET              |                  888
      DAKARA_SPECIES_DATASET               |                 123804
      HELIOPOLIS_SPECIES_DATASET           |                   7
      ASCHEN_PRIME_SPECIES_DATASET         |                 2423984
      P4X_650_SPECIES_DATASET              |                  2600
      VIS_UBAN_SPECIES_DATASET             |                   0
      PROCLARUSH_SPECIES_DATASET           |                 8869346
      HEBRIDAN_SPECIES_DATASET             |                   67
      // Pegasus Galaxy
      ASURAS_SPECIES_DATASET               |                  823
      WRAITH_HOMEWORLD_SPECIES_DATASET     |                 62634
      SATEDA_SPECIES_DATASET               |                  327
      DAGAN_SPECIES_DATASET                |                  777
      LORD_PROTECTORS_SPECIES_DATASET      |                  8786
      M7G_677_SPECIES_DATASET              |                  4739
      ATHOS_SPECIES_DATASET                |                3767822
  }

  def stageTestData = { dataSetLocation ->
    assert dataSource != null
    assert speciesDao != null
    // setupSpec() cannot access Spring beans ( e.g. dataSource ), hence the check every time
    if ( ! databaseTester ) {
      databaseTester = new DataSourceDatabaseTester( dataSource )
    }
    databaseTester.setDataSet( new FlatXmlDataFileLoader().load( dataSetLocation ) )
  }
}

Spock away Humans!

Apr 11

Mobile Devices Visiting Dotkam

Looking at the mobile stats I can conclude that Android users are the most interested in my content ( number of users vs. an average of 21 seconds on the page ), while iPhone users are the most curious ( max number of visits ). And of course, thank you to the lonely “Sony” user who spent four and a half minutes here :)

Mobile Devices Visiting Dotkam

But overall most of my daily visits ( currently about 1300 ) are coming from other places. Need to work on that..

Mar 11

Spring Batch: CommandLineJobRunner to Run Multiple Jobs

Sometimes this requirement may jump out at you: “I still want to have them as several jobs, but I want no operational overhead, and need to run them in sequence via a single launch…”.

So if you were using CommandLineJobRunner to launch these jobs, it would only be natural to aggregate those calls under your own, e.g. “CommandLineMultipleJobRunner”, that just delegates its calls to “CommandLineJobRunner” for each job in a list..

The only caveat is the default SystemExiter, which is set to JvmSystemExiter and makes it impossible to launch more than once via “CommandLineJobRunner”, since “JvmSystemExiter” does:

public void exit(int status) {
    System.exit(status);
}

Fortunately, there is a static accessor to override this behavior:

// disabling a default "JvmSystemExiter" to be able to run several times
CommandLineJobRunner.presetSystemExiter( new SystemExiter() { public void exit( int status ) {} } );

Now, call “CommandLineJobRunner.main( String[] args )” as many times as you’d like.
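Stripped of Spring Batch, the pattern is easy to see in a self-contained sketch (all class and method names below are mine, standing in for CommandLineJobRunner / SystemExiter / presetSystemExiter): a static, pluggable exit strategy that defaults to ending the JVM, which a multi-job wrapper swaps for a no-op before launching jobs in sequence.

```java
// Self-contained mimic of the pluggable exit strategy (names are mine;
// the real CommandLineJobRunner and SystemExiter live in Spring Batch).
interface Exiter {
    void exit(int status);
}

class MiniJobRunner {
    // Default behaviour mimics JvmSystemExiter: end the JVM after the first job.
    private static Exiter exiter = status -> System.exit(status);

    static int jobsLaunched = 0;

    static void presetExiter(Exiter e) {
        exiter = e;
    }

    static void run(String jobName) {
        jobsLaunched++;   // pretend the job executed here
        exiter.exit(0);   // with the default exiter, nothing after this would run
    }
}
```

A “CommandLineMultipleJobRunner” would do exactly this: preset a no-op exiter once, then call the runner for each job in its list.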

If you have a choice, think about creating a single multi-step job instead, but if you don’t have that luxury, a “CommandLineMultipleJobRunner” should now be not all that hard to implement.

Batch Away! :)

Mar 11

Having Fun with Groovy Date Parsing

How do you convert a String to Date in Groovy? Well it’s simple:

   Date.parse( "yyyy-M-d", "2011-01-15" )

Now, let’s say I would like to shorten this: yes, it looks too long; remember, the actual data I am interested in is “2011-01-15”, everything else really means nothing.. data-wise.

Ok, so I can

  Date.metaClass.'static'.fromString = { str ->
    Date.parse( "yyyy-M-d", str )
  }

which gives me a shorter representation ( less fluff around the actual data ):

  Date.fromString( "2011-01-15" )

That’s not bad, but I would like to take it further :) I am staging data ( domain objects ) in my Spock tests, and need as little fluff as possible => “only data matters”. So here it is, the geeky solution:

Create a DateParser:

  class DateParser {
    def static te = { str -> Date.parse( "yyyy-M-d", str ) }
  }

Wherever you need to parse dates, import it as ‘d’:

  import org.dotkam.util.date.DateParser as d

Now create your d.tes ( I mean dates :) ) as:

  d.te( "2011-01-05" )

//TODO: it can of course be extended with multiple formats

Jan 11

Android Development: Best Practices

I got introduced to Android application development during the Philly ETE conference by listening to the “A guided tour of the Android ETE mobile application” talk, where Andrew Oswald ( a Java Architect from Chariot Solutions ) talked about creating an Android app for the conference, which was a cool introduction to “what it takes” to write one of those apps from scratch. I am looking forward to more talks around mobile development at this year’s Philly ETE conference as well.

Meanwhile, I rely mostly on Android’s Developer Guide whenever I seek answers to my questions and/or best practices. Here are my notes on the practices I picked up while listening to the “A Beginner’s Guide to Android” Google I/O 2010 talk:

1. Avoid creating objects unless you really need to; try reusing Android’s APIs instead. Creating an object in the “desktop” world is relatively cheap; however, in such a resource-constrained environment as a mobile phone, it will drastically impact performance.. not in a good way.

2. In order to avoid friendly “Force Close” popups from your applications, use Android’s “AsyncTask”, which allows executing work in the background:

 private class DownloadFilesTask extends AsyncTask<URL, Integer, Long> {
     protected Long doInBackground(URL... urls) {
         int count = urls.length;
         long totalSize = 0;
         for (int i = 0; i < count; i++) {
             totalSize += Downloader.downloadFile(urls[i]);
             publishProgress((int) ((i / (float) count) * 100));
         }
         return totalSize;
     }
     protected void onProgressUpdate(Integer... progress) {
         setProgressPercent(progress[0]);
     }
     protected void onPostExecute(Long result) {
         showDialog("Downloaded " + result + " bytes");
     }
 }

read more about AsyncTask in Android Developer Guide

3. Think about the absolute minimum amount of updates / syncs you can do, and stick to that minimum. This will greatly improve battery life as well as the application’s resource usage.

4. Only use a WakeLock when you need one, and with the minimum level possible: PARTIAL_WAKE_LOCK, SCREEN_DIM_WAKE_LOCK, FULL_WAKE_LOCK. Here is more about PowerManager.

5. Respect the “Back” button: make sure it actually brings the user back to the previous state rather than to another state of the application’s flow.

6. Always check whether “data transfer” is enabled on a device, before attempting to transfer data:

ConnectivityManager cm = ( ConnectivityManager ) getSystemService( Context.CONNECTIVITY_SERVICE );
boolean backgroundEnabled = cm.getBackgroundDataSetting();

this is especially important while roaming, when that “twitter update” can cost the user a lot. So do respect the user’s settings.

7. Don’t use undocumented ( not officially supported ) APIs => with the next Android release, your app is going to break.

8. Respect and use the application lifecycle:

void onCreate(Bundle savedInstanceState)
void onStart()
void onRestart()
void onResume()
void onPause()
void onStop()
void onDestroy()

read more about Android Application Lifecycle

9. Externalize resources: localization / optimized layouts / strings / arrays of strings / etc.. Android compiles them into a list of internal resources by assigning an integer ID to each of them, hence making them “cheaper” at runtime and easier to change => since they are defined in a single location. Here is an example of auto-generated ( random layout ) resources:

    public static final class layout {
        public static final int about_details=0x7f030000;
        public static final int all_upcoming_filtered_buttons=0x7f030001;
        public static final int details_text=0x7f030002;
        public static final int details_webview=0x7f030003;
        public static final int faq_details=0x7f030004;
        public static final int faq_details_row=0x7f030005;
        public static final int main_tabs=0x7f030006;
        public static final int map_details=0x7f030007;
        //.... the above is auto generated
    }

10. Think about hiring, or delegating UI work to, people who are designers. Beauty is important.

11. Be resolution independent. Check out “layoutopt” and “hierarchyviewer” that come with the Android SDK under “tools”. They help analyze and optimize layouts.

12. Consider using “non sticky” services when appropriate:

public int onStartCommand( Intent intent, int flags, int startId ) {
     handleCommand( intent );
    // If this Service gets killed, don't restart it
    return START_NOT_STICKY;
}

this is useful for services that are going to be executed on a regular basis. So when you are polling for updates every 10-15 minutes, it is ok if one such update is missed in favor of healthy resource management.

13. Do not use foreground services unless you absolutely need to.

And if you do use foreground services, use an ongoing notification ( which, starting from Android 2.0, is used automatically if a service is started as a foreground one, but it is something to keep in mind for older OS versions ).

14. Kill your own services via stopSelf():

protected void onPostExecute( Void result ) {
    stopSelf();
}

15. Use Intents and Intent Filters to Leverage Other Apps

16. Prefer Alarms and Intent Receivers to Services

Now, with this in mind, go and Rock That Android Space!

Jan 11

Android: Prefer Alarms and Intent Receivers to Services

Continuing to learn from the “A Beginner’s Guide to Android” Google I/O 2010 talk, here is an example of how to use an Intent Filter + Intent Receiver + Alarm to implement “schedule to execute every so often” functionality.

In Android, Alarms let you set an event to happen either once in the future or on an ongoing basis.

Let’s say we have a listener ( extends BroadcastReceiver ) that would execute a certain action ( MyService ):

public class MyReceiver extends BroadcastReceiver {
    public void onReceive( Context context, Intent intent ) {
        Intent myIntent = new Intent( context, MyService.class );
        context.startService( myIntent );
    }
}

Now let’s connect/map this listener/receiver to a “REFRESH_THIS” Intent by creating an intent filter in a manifest file:

<receiver android:name="MyReceiver">
    <intent-filter>
        <action android:name="REFRESH_THIS"/>
    </intent-filter>
</receiver>

So whenever the system broadcasts a “REFRESH_THIS” intent, MyReceiver is going to spawn up and “context.startService( myIntent )” is going to be executed.

Now in order to schedule “REFRESH_THIS” intent to be broadcasted, we would use an AlarmManager:

String alarm = Context.ALARM_SERVICE;
AlarmManager am = ( AlarmManager ) getSystemService( alarm );
Intent intent = new Intent( "REFRESH_THIS" );
PendingIntent pi = PendingIntent.getBroadcast( this, 0, intent, 0 );
int type = AlarmManager.ELAPSED_REALTIME_WAKEUP;
long interval = AlarmManager.INTERVAL_FIFTEEN_MINUTES;
long triggerTime = SystemClock.elapsedRealtime() + interval;
am.setRepeating( type, triggerTime, interval, pi );

The above Alarm will wake up the device every 15 minutes and execute MyReceiver’s onReceive() method. The cool thing is that even if your application is killed, this alarm will continue to fire without your application running in the background consuming resources.
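The scheduling arithmetic above is simple enough to check on a plain JVM. A minimal sketch, assuming the setRepeating() semantics used in the example: first fire at triggerTime ( one interval from “now” ), then every interval milliseconds after that:

```java
public class RepeatingAlarmTimes {

    // First n firing times of a repeating alarm, mirroring
    // am.setRepeating( type, triggerTime, interval, pi ) from the example above
    static long[] firingTimes( long now, long interval, int n ) {
        long triggerTime = now + interval; // as in: SystemClock.elapsedRealtime() + interval
        long[] times = new long[ n ];
        for ( int i = 0; i < n; i++ ) {
            times[ i ] = triggerTime + i * interval;
        }
        return times;
    }

    public static void main( String[] args ) {
        long fifteenMinutes = 15 * 60 * 1000L; // AlarmManager.INTERVAL_FIFTEEN_MINUTES
        for ( long t : firingTimes( 0L, fifteenMinutes, 3 ) ) {
            System.out.println( t ); // 900000, 1800000, 2700000
        }
    }
}
```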

One thing to note..

Prefer Inexact Alarms …so OS can optimize when the alarm goes off

Why!?.. Let’s say there are 15 applications that set 15 alarms, each taking a minute to execute, and all scheduled with a 15 minute interval => potentially ( depending on the time they were scheduled at ) they can end up executing every minute ( one after another ), keeping the Android device constantly awake, which has a dramatic impact on resource usage.

“Inexact Alarms” let the OS “phase shift” these alarms to execute at the same time, rather than being arbitrarily distributed depending on the time they were scheduled at. This allows the OS to optimize and allocate resources in a more intelligent fashion. So if you have something that needs to happen regularly, but does not need to happen at an exact time, use “Inexact Alarms”.
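The effect is easy to see with some back-of-the-envelope counting on a plain JVM. The coalescing model below is an assumption for illustration, not the actual AlarmManager algorithm: exact alarms keep their scheduling offsets, inexact ones get shifted into one shared firing point.

```java
import java.util.HashSet;
import java.util.Set;

public class AlarmCoalescing {

    // Exact alarms: each app wakes the device at its own minute offset within the cycle
    static int exactWakeupsPerCycle( int apps, int intervalMinutes ) {
        Set<Integer> wakeups = new HashSet<>();
        for ( int offset = 0; offset < apps; offset++ ) {
            wakeups.add( offset % intervalMinutes );
        }
        return wakeups.size();
    }

    // Inexact alarms: the OS "phase shifts" all of them into one firing point
    static int inexactWakeupsPerCycle( int apps ) {
        return apps > 0 ? 1 : 0;
    }

    public static void main( String[] args ) {
        // 15 apps, each with a 15 minute repeat interval
        System.out.println( exactWakeupsPerCycle( 15, 15 ) ); // 15: device wakes every minute
        System.out.println( inexactWakeupsPerCycle( 15 ) );   // 1: a single wakeup per cycle
    }
}
```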

In the above “alarm example”, in order to use Inexact Alarm change this line:

am.setRepeating( type, triggerTime, interval, pi );

to:

am.setInexactRepeating( type, triggerTime, interval, pi );

Now the alarm will rely on the OS to optimize its execution time.

Jan 11

Android: Using Intents and Intent Filters to Leverage Other Apps

Using “Intent Filters” is a very powerful way to connect different applications together, which allows greater reuse and makes the user experience transparent to the fact that more than one application is used to achieve a certain task.

Here is an example that is discussed in “A Beginner’s Guide to Android” Google I/O 2010 talk.

Let’s say there is an application that finds hotels and would like to use another application to book them. For that it creates an implicit “Intent” where it says: “hey Android, I intend to book this hotel, please find an application that is capable of booking it, and pass it the data to do the booking”:

String action = "com.hotelapp.ACTION_BOOK";
String hotel = "hotel://name/" + selectedHotelName;
Uri data = Uri.parse( hotel );
Intent bookingIntent = new Intent( action, data );
// BOOKING_REQUEST_CODE is an app-defined int constant used to match the result
startActivityForResult( bookingIntent, BOOKING_REQUEST_CODE );
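Just to see what the booking side will receive, here is the same URI taken apart on a plain JVM using java.net.URI ( standing in for Android’s Uri class; “Hilton” is a made-up hotel name ):

```java
import java.net.URI;

public class HotelUriDemo {

    // Mirrors what the booking Activity extracts via intent.getData().getPath()
    static String hotelPathFrom( String uriString ) {
        URI data = URI.create( uriString );
        return data.getPath(); // the part after the "name" authority
    }

    public static void main( String[] args ) {
        String selectedHotelName = "Hilton"; // hypothetical selection
        String hotel = "hotel://name/" + selectedHotelName;
        System.out.println( hotelPathFrom( hotel ) ); // prints "/Hilton"
    }
}
```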

Now let’s say there is such a booking app installed on the device. Then, in its manifest, it will announce itself through an “intent-filter” as an application capable of performing this action ( “com.hotelapp.ACTION_BOOK” ), getting the data from another app and booking a hotel:

<activity android:name="Booking" android:label="Book">
    <intent-filter>
        <action android:name="com.hotelapp.ACTION_BOOK"/>
        <data android:scheme="hotel"/>
    </intent-filter>
</activity>

Within the “Activity” of the booking app, you can then call “getIntent()” to get the intent that was used, read the action along with the data, and do the booking:

@Override
public void onCreate( Bundle savedInstanceState ) {
    super.onCreate( savedInstanceState );
    setContentView( R.layout.main );
    Intent intent = getIntent();
    String action = intent.getAction();
    Uri data = intent.getData();
    String hotelName = data.getPath();
    // Provide the booking functionality
    // ... ...
    setResult( RESULT_OK, null );
    finish(); // return the result to the calling app
}

THE TAKE AWAY: think about exposing functionality ( can be full or partial workflows ) of your apps for other developers / apps to use. At the same time, check what is already exposed for you to leverage from other developers / apps.

P.S. Read more about Intents and Intent Filters