In the first part of this blog series we discussed the reasons for choosing NoSQL databases over relational databases. The second part covered the various types of NoSQL databases. This third and concluding part looks at the selection criteria for a NoSQL database.
Let's cut to the chase. These are the important factors to take into account before deciding which NoSQL database is right for your needs:
1. Storage Type – The storage type (key-value, column, document or graph) is a good first indicator of whether a NoSQL database fits your data.
2. Concurrency Control – Concurrency control defines how two users can simultaneously edit the same piece of information. Quite often one of the users is locked out and cannot edit or perform other actions until the active user has finished editing.
3. Replication – Replication keeps mirror copies of the data in sync. Whether copies are updated synchronously or asynchronously differs from one database to another.
4. Implementation Language – The implementation language influences how fast a database processes requests. Typically, NoSQL databases written in low-level languages such as C/C++ tend to be the fastest, while Erlang is designed for highly concurrent systems. On the other hand, those written in higher-level languages such as Java make customization easier.
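To make the concurrency control point concrete, here is a minimal in-memory sketch in Python. It is not the API of any particular NoSQL database: the class names `LockedRecord` and `VersionedRecord` are hypothetical and exist only for illustration. The first class shows the "locked out" behaviour described above (pessimistic locking). The second shows a common alternative, optimistic concurrency, where nobody is locked out but a write based on a stale version is rejected.

```python
import threading


class LockedRecord:
    """Pessimistic locking: only one user may edit the record at a time."""

    def __init__(self, value):
        self.value = value
        self.editor = None
        self._lock = threading.Lock()

    def begin_edit(self, user):
        # Non-blocking acquire: returns False if another user holds the lock.
        if self._lock.acquire(blocking=False):
            self.editor = user
            return True
        return False

    def finish_edit(self, user, new_value):
        if self.editor != user:
            raise PermissionError(f"{user} does not hold the edit lock")
        self.value = new_value
        self.editor = None
        self._lock.release()


class VersionedRecord:
    """Optimistic concurrency: writes carry the version they were based on."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write(self, new_value, expected_version):
        # Reject the write if someone else updated the record in the meantime.
        if expected_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


record = LockedRecord("draft")
print(record.begin_edit("alice"))  # alice gets the lock
print(record.begin_edit("bob"))    # bob is locked out until alice finishes
record.finish_edit("alice", "final")
```

Which of the two styles a given database uses, and at what granularity, is something to check in its documentation before committing to it.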
Comparison by Data (Size & Complexity)
Comparison by Type
Category | Description | Name of the Database |
Document Oriented | Data is stored as documents. An example format is FirstName="Arun", Address="St. Xavier's Road", Spouse=[{Name:"Kiran"}], Children=[{Name:"Rohit", Age: 8}] | CouchDB, Jackrabbit, MongoDB, OrientDB, SimpleDB, Terrastore, etc. |
XML Database | Data is stored in XML format | BaseX, eXist, MarkLogic Server, etc. |
Graph databases | Data is stored as a collection of nodes, where nodes are analogous to objects in a programming language. Nodes are connected using edges. | AllegroGraph, DEX, Neo4j, FlockDB, Sones GraphDB, etc. |
Key-value store | In the key-value store category of NoSQL databases, a user can store data in a schema-less way. Values, which may be strings, hashes, lists, sets or sorted sets, are stored against keys. | Cassandra, Riak, Redis, memcached, Bigtable, etc. |
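As a rough illustration of the difference between the document-oriented and key-value categories, the sketch below models both with plain Python structures. It uses the example document from the table above. The helper names `find_parents_of` and `kv_get`, and the key `person:arun`, are made up for this example and do not correspond to any real database client.

```python
import json

# The example document from the table, as a nested structure.
person = {
    "FirstName": "Arun",
    "Address": "St. Xavier's Road",
    "Spouse": [{"Name": "Kiran"}],
    "Children": [{"Name": "Rohit", "Age": 8}],
}

# Document store (simulated): the database understands the document's
# fields, so it can be queried by what is inside the document.
collection = [person]


def find_parents_of(collection, child_name):
    """Return documents whose Children list contains the given name."""
    return [
        doc
        for doc in collection
        if any(child.get("Name") == child_name for child in doc.get("Children", []))
    ]


# Key-value store (simulated): the value is opaque to the store and is
# looked up only by its key.
kv_store = {"person:arun": json.dumps(person)}


def kv_get(store, key):
    """Fetch and decode a value by key, or return None if the key is absent."""
    raw = store.get(key)
    return json.loads(raw) if raw is not None else None


print(find_parents_of(collection, "Rohit")[0]["FirstName"])
print(kv_get(kv_store, "person:arun")["Spouse"][0]["Name"])
```

The practical consequence is that a key-value design works best when you always know the key up front, whereas a document store suits lookups by attributes inside the record.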
Detailed Comparison
Store | Name | API | Protocol | Query Method | Replication | Written In | CAP Characteristics |
Key Value | Riak | JSON | REST | MapReduce | Async | Erlang | High Availability, Partition Tolerance, Persistence |
Key Value | MemcachedDB | C, Python | Memcache protocol | Memcache pattern | No | C, Python | Consistency, Partition Tolerance |
Column | HBase | Java | Any Write Call | MapReduce | HDFS | Java | Consistency, Partition Tolerance, Persistence |
Column | Cassandra | CQL and Thrift | Thrift | Cassandra Query Language | Peer-to-Peer | Java | High Availability, Partition Tolerance, Persistence |
Document | MongoDB | BSON | C | Dynamic object based language & MapReduce | Master Slave & Auto-Sharding | C++ | Consistency, Partition Tolerance, Persistence |
Document | Couchbase | Memcached | Memcached REST interface for cluster configuration | JavaScript | Peer-to-Peer | C, C++, Erlang | Consistency, High Availability, Persistence |
Graph | InfoGrid | Java | OpenID, RSS, Atom, JSON, Java embedded | Web user interface with HTML, RSS, Atom, JSON output, Java native | Peer-to-Peer | Java | High Availability, Partition Tolerance |
Graph | InfiniteGraph | Java | Direct Language Binding | Graph Navigation API, Predicate Language Qualification | Peer-to-Peer | Java | High Availability, Partition Tolerance |
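One way to use the detailed comparison is to treat it as data and filter it by the characteristics you cannot do without. The sketch below encodes only the store type and the last column of the table, with product names in their usual spelling and partition tolerance written as a single trait. The function name `shortlist` is hypothetical, and the result is only as accurate as the table itself.

```python
# Rows taken from the detailed comparison table (store type and traits only).
DATABASES = [
    {"name": "Riak", "store": "Key Value",
     "traits": {"High Availability", "Partition Tolerance", "Persistence"}},
    {"name": "MemcachedDB", "store": "Key Value",
     "traits": {"Consistency", "Partition Tolerance"}},
    {"name": "HBase", "store": "Column",
     "traits": {"Consistency", "Partition Tolerance", "Persistence"}},
    {"name": "Cassandra", "store": "Column",
     "traits": {"High Availability", "Partition Tolerance", "Persistence"}},
    {"name": "MongoDB", "store": "Document",
     "traits": {"Consistency", "Partition Tolerance", "Persistence"}},
    {"name": "Couchbase", "store": "Document",
     "traits": {"Consistency", "High Availability", "Persistence"}},
    {"name": "InfoGrid", "store": "Graph",
     "traits": {"High Availability", "Partition Tolerance"}},
    {"name": "InfiniteGraph", "store": "Graph",
     "traits": {"High Availability", "Partition Tolerance"}},
]


def shortlist(store=None, required_traits=()):
    """Return names of databases matching the store type and all required traits."""
    required = set(required_traits)
    return [
        db["name"]
        for db in DATABASES
        if (store is None or db["store"] == store) and required <= db["traits"]
    ]


# Column stores that favour consistency.
print(shortlist(store="Column", required_traits=["Consistency"]))
# Any store type that offers both availability and partition tolerance.
print(shortlist(required_traits=["High Availability", "Partition Tolerance"]))
```

A shortlist like this is a starting point for evaluation rather than a final answer, since replication model, query method and implementation language from the same table also matter.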
Conclusion
Here is another quick comparison between NoSQL and RDBMS:
Opt for NoSQL
1. If data is huge, unstructured, sparse or growing
2. If a less rigid schema is needed
3. If performance and availability matter more than avoiding redundant data
4. Scaling out is an out-of-the-box feature, and it does not prevent scaling up
5. Cost effective: uses clusters of cheap commodity servers to manage exploding data and transaction volumes
Opt for RDBMS
1. If analytics, BI or reporting is required
2. If the benefits of ACID are needed
3. If a rigid schema is acceptable
4. If no redundancy is allowed
5. Allows scale-up and limited scale-out (sharding)
6. Expensive: relies on expensive proprietary servers and storage systems
This concludes our 3-part series on NoSQL and the merits of adopting it. We hope you liked reading it. Please leave your comments in the box below.
Good article. All the three parts were good and easy to understand
Great 3 blog posts. Helped me get my head around NoSQL. thanks :))
Nice set of articles that introduces a newbie to NoSQL…
Hey Girish,
Excellent work compiling these resources and very informative blog.
I am an SSE in an MNC and, as the trend goes, am breaking down a huge monolith into smaller micro-services. I am responsible for a use case that involves user management for an enterprise-level solution: CRUD of the users and their access control (roles and permissions, something like an RBAC solution). We have close to 12M users in production and ours is a read-intensive application. User writes are typically a few thousand every day, and user reads tend to be on the order of 10M-12M every day. The app is distributed across 50 different clusters around the world.
I am trying to select a new database for the micro-service, and for my use case "audit logging" is really important. Is there any database that supports automatic audit logging and versioning?
PS: I am leaning towards Couchbase for the implementation, and I read that it always creates a new version of the record. But I was hoping some DB could do this more neatly. Using Couchbase will involve reading a lot of older version records.
Thanks in anticipation.
Best,
Anshul
anshulsharma1208@gmail.com