Concept Four: For map-reduce operations, take advantage of ‘data locality’
In the context of a distributed system, data locality refers to the observation that transferring data to a node that knows how to process it is more expensive than taking the computation to the node that already holds the data.
For map-reduce operations, Riak provides data locality for the map phases: all mapping is performed on the nodes that hold the objects being mapped. Reduce phases, by contrast, always run on the ‘coordinating node’ for the map-reduce operation. For more details on how map-reduce operations work with Riak, see Basho’s Riak map-reduce page.
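To make the difference concrete, here is a minimal sketch in plain Python that simulates the idea. It does not use Riak or any Riak client; the cluster layout, node names, keys and the `hits` field are all hypothetical. The map function runs next to each node's objects and only its small results travel to the coordinator, where the reduce runs once.

```python
import json

def make_scan(hits):
    # Hypothetical scan object: a small counter plus a large body.
    return json.dumps({"hits": hits, "body": "x" * 1000})

# Hypothetical cluster: node name -> {key: serialized object value}.
cluster = {
    "node_a": {"scan-1": make_scan(3), "scan-2": make_scan(5)},
    "node_b": {"scan-3": make_scan(7)},
}

def map_phase(value):
    """Runs on the node holding the object; emits only what reduce needs."""
    return json.loads(value)["hits"]

def run_map_reduce(nodes):
    shipped = []  # the only data that crosses the network
    for objects in nodes.values():
        # Map executes locally on each node, next to its data.
        shipped.extend(map_phase(v) for v in objects.values())
    # Reduce executes once, on the coordinating node.
    return sum(shipped), shipped

total, shipped = run_map_reduce(cluster)

# Compare the bytes moved with locality against shipping every object.
bytes_with_locality = len(json.dumps(shipped))
bytes_if_data_moved = sum(
    len(v) for objects in cluster.values() for v in objects.values()
)
```

In this toy setup the coordinator receives a short list of integers instead of three objects of roughly a kilobyte each, which is the saving that data locality is meant to capture.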
Note that in the context of our application you can trade the amount of filtering you do on the blog objects against the number of get operations you perform on the scans bucket by changing the size of a scan object. For example, if you decide to store a month’s worth of scanning information in a single object in the scans bucket (rather than a day’s worth), you’ll do approximately 30 times more filtering and 30 times less fetching from that bucket. Always time your queries as the project goes along, and find a good balance based on the metrics in the non-functional requirements: number of users, expected growth rate, etc.
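The arithmetic behind this trade-off can be sketched with a deliberately simplified cost model. The function below is hypothetical: it assumes a 30-day month, records spread evenly over the days, and a query covering a contiguous range of days.

```python
import math

def query_cost(query_days, days_per_object, records_per_day):
    """Return (gets, records_filtered_per_get) for one query.

    Simplified model: each scan object holds `days_per_object` days of
    records, and every fetched object has to be filtered in full.
    """
    gets = math.ceil(query_days / days_per_object)
    records_filtered_per_get = days_per_object * records_per_day
    return gets, records_filtered_per_get

# A 90-day query with 100 records per day (made-up numbers).
daily = query_cost(90, days_per_object=1, records_per_day=100)
monthly = query_cost(90, days_per_object=30, records_per_day=100)
```

With these made-up numbers, daily objects cost 90 gets of 100 records each, while monthly objects cost 3 gets of 3000 records each: 30 times fewer fetches, 30 times more filtering per fetched object. Which side wins depends on your actual workload, which is why timing real queries matters.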
Concept Five: Serialize object values in a format that can be processed by code that lives on cluster nodes.
As mentioned above, you can get to the objects that are subject to a computation by using Riak’s ability to chain map operations. If you choose to do so, there’s a good chance that every map operation will have to inspect the values of the objects involved. Map operations are performed on the nodes that hold the data, which means that the nodes must at least know how to deserialize the data stored in an object.
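As a rough illustration of this constraint, the sketch below uses plain Python and JSON rather than Riak itself. The function names and the `blog` and `url` fields are hypothetical; the point is only that the code running on the node relies on nothing but a format it can decode on its own.

```python
import json

def client_serialize(scan):
    """Client side: encode the value in a format the nodes can also read."""
    return json.dumps(scan).encode("utf-8")

def node_side_map(stored_bytes, wanted_blog):
    """Node side: a map step that must decode the stored bytes itself."""
    scan = json.loads(stored_bytes.decode("utf-8"))
    # Emit the URL only when the object belongs to the wanted blog.
    return [scan["url"]] if scan["blog"] == wanted_blog else []

stored = client_serialize(
    {"blog": "example-blog", "url": "http://example.com/post-1"}
)
matches = node_side_map(stored, "example-blog")
misses = node_side_map(stored, "other-blog")
```

Had the client stored the value in a format private to the client application, the node-side step would have no way to look inside it, and the filtering would have to move back to the client.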
Hopefully this blog series will prove useful for getting started with data modeling and data aggregation in NoSQL. The code provided should also shed some light on doing map-reduce queries with Riak’s Erlang client.