Just Say Yes to NoSQL Part i

As the popularity of data virtualization continues to rise, companies are increasingly relying on data storage and retrieval mechanisms like NoSQL to extract tangible value out of the voluminous amounts of data available today.

This first post in a three-part series explains why we believe NoSQL is among the best database technologies for storing vast amounts of data. The second part will explore the different categories of NoSQL databases, and the final part will focus on how to zero in on the best NoSQL database solution.

Part 1: Reasons in Support of Adopting NoSQL

The need for data management has increased over the last few years, owing to a surge of interactive web and mobile applications being used by enterprises irrespective of their size. Emerging technologies like Big Data and Mobile have driven the adoption of NoSQL technology.

Big Data:

Personal user information, geolocation data, social media activity, and sensor-generated data are just a few examples of the ever-expanding array of data being captured. Capturing and interpreting this vast amount of structured and unstructured data is bolstering the analytical capabilities of enterprises and enabling them to create innovative products that open new routes to revenue.

Any change in the structure of the content can be a cause for concern if the database is rigid and cannot accommodate new data types. Developers need a database flexible enough to handle unstructured data, which is the norm today and is increasingly used by companies for analytical purposes. Relational database management systems typically enforce a fixed schema, which is a poor match for unstructured data. This is precisely where NoSQL comes in handy, as it can handle varying data types with ease.
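
To make the mismatch concrete, here is a minimal sketch in plain Python rather than a real database. The `FixedSchemaTable` and `DocumentCollection` classes are hypothetical stand-ins invented for illustration, not any product's API. The fixed-schema table rejects a record with an unexpected field, while the schema-free collection accepts it.

```python
class FixedSchemaTable:
    """Hypothetical stand-in for a relational table with a fixed set of columns."""

    def __init__(self, columns):
        self.columns = set(columns)
        self.rows = []

    def insert(self, row):
        unknown = set(row) - self.columns
        if unknown:
            # A relational database would need a schema change before accepting this.
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        self.rows.append(dict(row))


class DocumentCollection:
    """Hypothetical stand-in for a schema-free document collection."""

    def __init__(self):
        self.documents = []

    def insert(self, document):
        # Any mapping is accepted; each document carries its own structure.
        self.documents.append(dict(document))


users_table = FixedSchemaTable(["name", "email"])
users_collection = DocumentCollection()

profile = {"name": "Asha", "email": "asha@example.com"}
geo_event = {"name": "Asha", "geo": {"lat": 28.6, "lon": 77.2}}

users_table.insert(profile)
users_collection.insert(profile)
users_collection.insert(geo_event)  # new field, no schema change required

try:
    users_table.insert(geo_event)
    rejected = False
except ValueError:
    rejected = True  # the fixed schema has no "geo" column
```

The sample names and coordinates are made up; the point is only that the second structure arrives without any change to the collection.
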

NoSQL (Not Only SQL)

NoSQL works well for the distributed data stores run by companies such as Google and Facebook, where very large volumes of data must be handled and stored. Data stores of this kind typically do not require a fixed schema, avoid join operations, and scale horizontally.

Key Characteristics of NoSQL Databases

1. Distributed Computing (Scalability, Reliability, Sharing of Resources, Performance) – NoSQL databases are distributed, can scale horizontally, and can handle data volumes of several terabytes or petabytes with low latency.

As the number of users and the volume of data rise, web and mobile applications and their supporting databases must scale accordingly. This can be achieved using the following two methods:

a. Vertical Scaling (or scale up) – This means adding resources to a single node, for example by adding CPUs or increasing memory and storage.

Scaling Up with Relational Databases: To support more concurrent users and/or store more data, you need bigger servers with additional CPUs to handle the workload, more memory, and more disk storage to hold all the tables. Installing and maintaining big servers is a complex process that also adds to capital expenditure, unlike the low-cost commodity hardware typically used at the web/application server tier.

b. Horizontal Scaling (or scale out) – This means adding more nodes to a system, for example adding a new computer to a distributed software application.

Scale Out with NoSQL Database: NoSQL databases use a cluster of servers to store data and support database operations. The cluster is expanded by adding servers, which spreads the database operations across a larger cluster. Since commodity servers are susceptible to failure, NoSQL databases are designed to withstand and recover from such failures, making them highly resilient.
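
One way to picture scale-out is a simple hash-based placement rule. The sketch below is an assumption made for illustration (the `shard_for` and `distribute` helpers and the node names are hypothetical), not how any particular NoSQL product places data. It spreads the same 1,000 keys over three servers and then over four.

```python
import hashlib


def shard_for(key, nodes):
    """Pick a node for a key using a stable hash (hypothetical placement rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Return a mapping of node name -> list of keys placed on that node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[shard_for(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]

three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

# Compare the busiest node before and after adding a fourth server.
largest_before = max(len(placed) for placed in three_nodes.values())
largest_after = max(len(placed) for placed in four_nodes.values())
```

Plain modulo placement like this moves many keys to a different node whenever the node count changes, which is why real systems often use schemes such as consistent hashing instead.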

2. More Flexible Data Model – NoSQL databases follow one of the following data models/stores:

a. Key Value Stores

b. Document Stores

c. Column Based Stores

d. Graph Databases

e. XML Databases
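
As a rough illustration of how these models differ, the sketch below holds the same made-up user record in each of the five shapes using ordinary Python structures. The record, the key names and the `FOLLOWS` edge label are assumptions chosen for the example; real products add indexing, persistence and their own query interfaces on top.

```python
import json
import xml.etree.ElementTree as ET

# Key-value store: an opaque value looked up by a single key.
key_value_store = {"user:42": '{"name": "Asha", "city": "Noida"}'}

# Document store: a structured, nested document that can be queried by field.
document_store = [
    {"_id": 42, "name": "Asha", "address": {"city": "Noida"}, "tags": ["admin"]},
]

# Column-based store: a row key maps to column families, each holding columns.
column_store = {
    "user:42": {
        "profile": {"name": "Asha"},
        "location": {"city": "Noida"},
    }
}

# Graph database: nodes plus labeled edges between them.
graph_nodes = {42: {"name": "Asha"}, 7: {"name": "Ravi"}}
graph_edges = [(42, "FOLLOWS", 7)]

# XML database: the record is kept as an XML document.
xml_document = ET.fromstring(
    "<user id='42'><name>Asha</name><city>Noida</city></user>"
)

# Reading the same fact back from each shape.
city_from_kv = json.loads(key_value_store["user:42"])["city"]
city_from_doc = document_store[0]["address"]["city"]
city_from_columns = column_store["user:42"]["location"]["city"]
city_from_xml = xml_document.findtext("city")
followed = [dst for src, label, dst in graph_edges if src == 42 and label == "FOLLOWS"]
```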

3. Asynchronous Inserts & Updates/Weak Transactional – NoSQL databases do not provide full transactional guarantees or simultaneous completion of transactions at all nodes in the distributed environment. Instead, they guarantee the availability of the data at the distributed level (through an internal synchronization process). This is why NoSQL is a good fit for cases like social media applications, where simultaneous completion of transactions is not a constraint.

4. Query Language – Unlike relational databases, these databases do not support SQL. However, a few NoSQL databases support some other form of query language; CouchDB, for example, stores data as JSON and uses JavaScript as its query language.

5. No Joins – NoSQL databases don’t use the concept of joins.

6. Low Cost – NoSQL databases use clusters of cheap commodity servers instead of proprietary servers to manage exploding data and transaction volumes.

7. Easy Implementation – NoSQL provides schema flexibility and less complicated relationships than an RDBMS.

8. Good for scenarios which mostly require querying/searching (but not complex search or analytics) and very few or no updates.
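
The "no joins" characteristic usually shows up as denormalized data. The following sketch uses plain Python and invented sample data (the `join_orders` helper is hypothetical). It contrasts an application-side join over two table-like lists with a single document that already embeds the related records.

```python
# Relational style: two "tables" linked by customer_id and combined with a join.
customers = [{"customer_id": 1, "name": "Asha"}]
orders = [
    {"order_id": 100, "customer_id": 1, "total": 23.5},
    {"order_id": 101, "customer_id": 1, "total": 40.0},
]


def join_orders(customers, orders):
    """Application-side equivalent of a JOIN on customer_id."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**o, "name": by_id[o["customer_id"]]["name"]} for o in orders]


# Document style: orders are embedded in the customer document, so one lookup
# returns everything and no join is needed.
customer_document = {
    "customer_id": 1,
    "name": "Asha",
    "orders": [
        {"order_id": 100, "total": 23.5},
        {"order_id": 101, "total": 40.0},
    ],
}

joined_total = sum(row["total"] for row in join_orders(customers, orders))
embedded_total = sum(order["total"] for order in customer_document["orders"])
```

The trade-off is that embedded data is duplicated wherever it is needed, which suits the read-heavy, update-light scenarios described in point 8.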

| NoSQL Advantages | NoSQL Disadvantages |
| --- | --- |
| High scalability | Too many options (more than 150), making it hard to know which one to pick |
| Schema flexibility | Limited query capabilities (so far) |
| Distributed computing (reliability, scalability, sharing of resources, speed) | Eventual consistency is not intuitive to program for strict scenarios like banking applications |
| No complicated relationships | Lacks join, group by and order by facilities |
| Lower cost (hardware costs) | Lack of ACID transactions |
| Open source – Most NoSQL options, with exceptions such as Amazon's hosted offerings (for example DynamoDB), are open-source solutions. This provides a low-cost entry point. | Limited guarantee of support, as is common with open source |

| Feature | NoSQL | RDBMS |
| --- | --- | --- |
| Scalability | Horizontally | Horizontally & vertically |
| Query Language | No declarative query language | Structured Query Language (SQL) |
| Schema | No predefined schema, or less rigid schemas | Predefined schema (Data Definition Language & Data Manipulation Language) |
| Data Type | Supports unstructured and unpredictable data | Supports relational data; relationships are stored in separate tables |
| ACID/BASE | Based on the BASE principle (Basically Available, Soft state, Eventual consistency) | Based on the ACID principle (Atomicity, Consistency, Isolation and Durability) |
| Transaction Management | Weaker transactional guarantees | Strong transactional guarantees |
| Data Storage Technique | Schema-free collections store different types and document structures; for example, {“color”, “blue”} and {“price”, “23.5”} can be stored within a single collection. | No collections are used for data storage; data is stored in tables and manipulated with DML. |
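
The BASE behaviour in the comparison above can be sketched with a toy two-replica store. Everything here is an assumption made for illustration: `EventuallyConsistentStore` is a hypothetical class, and the explicit `replicate()` call stands in for whatever background synchronization a real database performs.

```python
from collections import deque


class EventuallyConsistentStore:
    """Hypothetical two-replica store with asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        # May return stale data until replication catches up.
        return self.replica.get(key)

    def replicate(self):
        # Stand-in for the background synchronization process.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = EventuallyConsistentStore()
store.write("likes:post-1", 10)

stale_read = store.read_from_replica("likes:post-1")  # replica has not caught up
store.replicate()
fresh_read = store.read_from_replica("likes:post-1")  # replica now agrees with primary
```

A like counter that is briefly out of date is usually acceptable; an account balance that is briefly out of date is usually not, which is the point the disadvantages list makes about banking applications.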

CAP Theorem

As per the CAP theorem, a distributed system cannot provide all three of the following guarantees at the same time:

  • Consistency – All nodes see the same data at the same time. Data in the database remains consistent after the execution of an operation.
  • Availability – A guarantee that every request receives a response about whether it succeeded or failed. In other words, the system is always up, with no downtime.
  • Partition Tolerance – The system continues to operate despite the failure of part of the system. The network can break into two or more parts, each with active servers that cannot communicate with, or influence, the servers in the other parts.

According to the CAP theorem, a distributed system can satisfy at most two of these three requirements at the same time. To scale out, you have to partition.

NoSQL databases follow different combinations of the C, A and P guarantees. Here is a description of the three combinations:

CA – A single-site cluster keeps all nodes in contact, and any partition blocks the system.

CP – Under this arrangement, some data might not be accessible, but consistency and accuracy are not compromised. There is also no need for distributed concurrency control.

AP – With the AP approach, the returned data might not be accurate, but the system remains available in spite of any partitioning. This is best suited for replication needs and fault tolerance.
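
The difference between CP and AP can be sketched with a toy two-replica cluster. This is a deliberately simplified model built on assumptions: the `Cluster` and `Replica` classes are hypothetical, and the CP mode here refuses every write during a partition, whereas real CP systems typically keep serving from the side that still holds a majority.

```python
class Replica:
    """One copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Hypothetical two-replica cluster; mode is "CP" or "AP"."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False

    def write(self, replica, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write rather than let the replicas diverge.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally, so the replicas diverge.
            replica.data[key] = value
            return
        # No partition: apply the write to both replicas.
        self.a.data[key] = value
        self.b.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


cp = Cluster("CP")
cp.write(cp.a, "balance", 100)
cp.partitioned = True
try:
    cp.write(cp.a, "balance", 50)
    cp_write_ok = True
except RuntimeError:
    cp_write_ok = False  # consistency kept, availability given up

ap = Cluster("AP")
ap.write(ap.a, "balance", 100)
ap.partitioned = True
ap.write(ap.a, "balance", 50)  # accepted despite the partition
ap_reads = (ap.read(ap.a, "balance"), ap.read(ap.b, "balance"))
```

In the AP run, the two replicas answer the same question differently until the partition heals and they are reconciled, which is the "returned data might not be accurate" case described above.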

In the next post on NoSQL databases, we will explore the different categories of NoSQL databases that are most popular with software developers today.

Girish Kumar

Technical Lead

Girish Kumar is a Technical Lead at 3Pillar Global and the head of our Java Competency Center in India. He has been working in the Java domain for over 8 years and has gained rich expertise in a wide array of Java technologies including Spring, Hibernate and Web Services. In addition, he has good exposure to implementing the complete SDLC using Agile and TDD methodologies. Prior to joining 3Pillar Global, Girish worked with Cognizant Technology Solutions for more than 5 years, where he worked for some of the biggest names in the Banking and Finance verticals in the U.S. and U.K.

Girish’s current challenges at 3Pillar include getting the best out of Apache Hadoop, NoSQL and distributed systems. He provides day-to-day leadership to the members of the Java Competency Center in India by enforcing best practices and providing technical guidance in key projects.

7 Responses to “Just Say Yes to NoSQL Part i”
  1. Adlen G. on

    Great article, it really helped us to switch to NoSQL

  2. Santo on

    Great article.. thanks for putting the details together

  3. Vivek Sharma on

    Nice article, but I am a bit confused about the CAP theorem. As per my understanding, in any distributed architecture the network is not in our hands, and therefore Partition Tolerance has to be one of the factors. With P as a mandatory rule, any distributed architecture can achieve either Consistency or Availability. If everything is working fine, all three CAP requirements can be met. However, in case of a network failure, C or A are the only options left. RDBMSs enforce Consistency and NoSQL databases enforce Availability.

    Please correct me if I am wrong.


  4. Manish on

    So based on this, can it be said that No SQL would not be best for banking transactions or for stock market transactions?

  5. Munish Bansal on

    Very well put together the details which could be quite complex to understand otherwise. Thanks.

  6. Anirudh Khanna on

    Nice content. Very Informative. Thanks for sharing.
