May 9, 2012

Introduction to Data Aggregation with NoSQL Databases: Blog Series Part I

The new world of data modeling

Data aggregation in the world of Big Data is changing the way companies deploy products. Thinking about a product in terms of non-relational databases requires a shift in how you model your data. Implementing a relational data model ‘as-is’ on top of a NoSQL store will usually cost your application dearly in performance. NoSQL data modeling relies on techniques such as data flattening, aggregation and inverted indexes, which break with the relational paradigm in order to achieve performance and scalability.
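To make those techniques a bit more concrete, here is a minimal sketch in plain Erlang. It does not talk to Riak at all: the module name `kv_model_sketch`, the order records and their fields are all made up for illustration, and ordinary in-memory maps stand in for what would be separate keys in a key-value store. It only shows the general idea of building an inverted index and a precomputed aggregate at write time, so that a later read becomes a single key lookup instead of a join or a scan.

```erlang
-module(kv_model_sketch).
-export([demo/0, index_by_customer/1, totals_by_customer/1]).

%% Hypothetical sample data: one map per order, roughly what a row in a
%% relational "orders" table would hold.
orders() ->
    [#{id => 1, customer => <<"alice">>, total => 30},
     #{id => 2, customer => <<"bob">>,   total => 15},
     #{id => 3, customer => <<"alice">>, total => 20}].

%% Inverted index: customer name -> ids of that customer's orders.
%% In a key-value store this could live under one key per customer.
index_by_customer(Orders) ->
    lists:foldl(
      fun(#{id := Id, customer := C}, Acc) ->
              maps:update_with(C, fun(Ids) -> [Id | Ids] end, [Id], Acc)
      end,
      #{}, Orders).

%% Precomputed aggregate: customer name -> {OrderCount, SumOfTotals}.
%% Maintained on write, so reading a count needs no scan over the orders.
totals_by_customer(Orders) ->
    lists:foldl(
      fun(#{customer := C, total := T}, Acc) ->
              maps:update_with(C,
                               fun({N, Sum}) -> {N + 1, Sum + T} end,
                               {1, T}, Acc)
      end,
      #{}, Orders).

%% Returns {Index, Totals} for the sample data above.
demo() ->
    Orders = orders(),
    {index_by_customer(Orders), totals_by_customer(Orders)}.
```

Calling `kv_model_sketch:demo()` returns both maps. The trade-off is the usual one for this style of modeling: writes do more work and the same fact is stored in more than one place, in exchange for cheap reads.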

This blog series tries to draw up a few guidelines for transitioning to a NoSQL-backed application. We’ll take an application whose data model would be fairly simple if it were implemented on top of a relational database, and model it for a key-value store, namely Riak. Specific to Riak, we’ll look at how MapReduce queries can be used to run analytics (aggregates, ‘counts’ if you will) on the data our sample application manages. We will then compare this approach, from a performance perspective, with an implementation built only on basic operations such as ‘fetch’. I’ll also expose some of the downsides of working with MapReduce and try to point out where to draw the line when employing Riak’s MapReduce.

Basic knowledge of Riak and associated NoSQL concepts is preferred but not required. Code samples are provided in Erlang. You can learn more about Riak here.