Just Say Yes to NoSQL Part i

As the popularity of data virtualization continues to rise, companies are increasingly relying on data storage and retrieval mechanisms like NoSQL to extract tangible value out of the voluminous amounts of data available today.

This first post in a three-part series explains why we believe NoSQL is among the best database technologies for storing vast amounts of data. The second part will explore the different categories of NoSQL databases, and the final part will focus on how to zero in on the best NoSQL database solution.

Part 1: Reasons in Support of Adopting NoSQL

The need for data management has increased over the last few years, owing to a surge of interactive web and mobile applications being used by enterprises irrespective of their size. Emerging technologies like Big Data and Cloud Computing have driven the adoption of NoSQL technology.

Big Data:

Personal user information, geolocation data, social media activity, and sensor-generated data are just a few examples of the ever-expanding array of data being captured. Capturing and interpreting this vast amount of structured and unstructured data is bolstering the analytical capabilities of enterprises and enabling them to create innovative products that open new routes to revenue.

Any change in the content structure can be a cause for concern if the database is rigid and cannot accommodate new data types. Developers need a database flexible enough to handle even unstructured data, which is the norm today and is increasingly used by companies for analytical purposes. Relational database management systems typically use a schema-based approach, which is a poor match for processing unstructured data. It is precisely here that NoSQL comes in handy, as it can handle varying data types with ease.

NoSQL (Not Only SQL)

To its advantage, NoSQL works well with the distributed data stores of companies such as Google and Facebook, where very large volumes of data must be handled and stored. Data at this scale often does not require join operations or a fixed schema, and it typically calls for horizontal scaling.

Key Characteristics of NoSQL Databases

1. Distributed Computing (Scalability, Reliability, Sharing of Resources, Performance) – NoSQL databases are distributed, can scale horizontally, and handle large data volumes of several terabytes or petabytes with low latency.

The rise in concurrent users and data volume requires web and mobile applications, and the databases that support them, to scale accordingly. This can be achieved using the following two methods:

a. Vertical Scaling (or scale up) – This means adding resources to a single node, for example by adding CPUs or increasing memory and storage.

Scaling Up with Relational Databases: To support a large number of concurrent users and/or store more data, you need bigger servers with additional CPUs to handle the workload, more memory, and more disk storage to keep all the tables. Installing and maintaining big servers is a complex process that also adds to capital expenditure, in contrast to the low-cost commodity hardware typically used at the web/application server tier.

b. Horizontal Scaling (or scale out) – This means adding more nodes to a system, e.g. adding a new computer to a distributed software application.

Scale Out with NoSQL Database: Scaling out relies on a cluster of servers to store data and support database operations. The cluster is expanded by adding servers, so that database operations are spread across a larger cluster. Since commodity servers are susceptible to failure, NoSQL databases are designed to withstand and recover from such recurring failures, making them highly resilient.

NoSQL databases provide a much easier, linear approach to database scaling. They are best suited to counter a sudden spike in new user activity. To handle unexpected spikes in volumes, all one has to do is add a new database server to expand the size of the cluster.
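To make the "just add a server" idea concrete, here is a minimal sketch of consistent hashing, one common technique for spreading keys across a cluster. This is an illustration under assumptions rather than a description of any particular product: the `HashRing` class, the server names `db-1` to `db-4`, and the choice of 100 virtual points per server are all hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a stable integer position on the hash ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring; each server owns several virtual points."""

    def __init__(self, servers, points_per_server=100):
        self.points_per_server = points_per_server
        self._ring = []  # sorted list of (position, server) tuples
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        """Place the server's virtual points on the ring, keeping it sorted."""
        for i in range(self.points_per_server):
            bisect.insort(self._ring, (ring_position(f"{server}#{i}"), server))

    def server_for(self, key: str) -> str:
        """Return the owner of the first point at or after the key's position."""
        idx = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring


keys = [f"user:{n}" for n in range(10_000)]
ring = HashRing(["db-1", "db-2", "db-3"])
before = {k: ring.server_for(k) for k in keys}

ring.add_server("db-4")  # scale out by one node
after = {k: ring.server_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all to the new server")
```

Only the keys that now fall to the new server change owner; everything else stays where it was, which is what keeps adding a node cheap compared with re-partitioning the whole data set.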

2. More Flexible Data Model – NoSQL databases follow one of the following data models/stores:

a. Key Value Stores

b. Document Stores

c. Column Based Stores

d. Graph Databases

e. XML Databases
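The first four models above can be sketched with plain Python data structures. This is only an illustration of the shapes involved, not the API of any real database; the sample records and the `find` and `neighbours` helpers are hypothetical.

```python
# Key-value store: an opaque value looked up by its key.
key_value = {"session:42": "alice|cart=3"}

# Document store: self-describing documents; fields may differ per document.
products = [
    {"_id": 1, "name": "T-shirt", "color": "blue"},
    {"_id": 2, "name": "Mug", "price": 23.5, "tags": ["kitchen", "gift"]},
]

# Column-based store: each row key maps to its own set of columns.
columns = {
    "user:1": {"name": "Alice", "city": "Pune"},
    "user:2": {"name": "Bob"},
}

# Graph database: nodes connected by labelled edges.
edges = [("alice", "FOLLOWS", "bob"), ("bob", "FOLLOWS", "carol")]


def find(collection, **criteria):
    """Return documents that contain every given field with the given value."""
    return [
        doc for doc in collection
        if all(field in doc and doc[field] == value
               for field, value in criteria.items())
    ]


def neighbours(graph_edges, node, label):
    """Return the nodes reachable from `node` over edges with `label`."""
    return [dst for src, lbl, dst in graph_edges if src == node and lbl == label]


print(find(products, color="blue"))
print(neighbours(edges, "alice", "FOLLOWS"))
```

Note that the two product documents share a collection even though one has a `color` and the other a `price`; no schema change was needed to add either field.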

 

3. Asynchronous Inserts & Updates/Weak Transactional Guarantees – NoSQL databases do not provide full transactional guarantees or simultaneous completion of a transaction at all nodes in the distributed environment. Instead, they guarantee that the data becomes available across the distributed nodes through an internal synchronization process. This is why NoSQL is a good fit for applications such as social media, where it is not a constraint for every node to see a transaction at the same moment.

4. Follows BASE/CAP instead of ACID – Instead of ACID, NoSQL databases more or less follow a model called “BASE” (Basically Available, Soft State, Eventual Consistency). NoSQL databases typically relax one or more of the ACID properties, a trade-off described by the CAP theorem. For example, once no new updates occur for a certain period of time (possibly a few seconds, depending on load, cluster size and network traffic), all earlier updates propagate through the system and all the nodes become consistent.

However, eventually consistent tools are a poor fit for strict scenarios like banking applications. In these cases, a good option could be an in-memory, distributed SQL/ACID database, like VoltDB.
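The eventual-consistency behaviour described in point 4 can be sketched with a toy in-memory cluster. This is a simplified model under assumptions, not how any specific NoSQL database is implemented: the `Replica` and `Cluster` classes are hypothetical, a single counter stands in for real versioning, and an explicit `propagate()` call stands in for background synchronization.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Last-writer-wins: keep the update only if it is newer.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    """Writes are acknowledged after one replica; the rest catch up later."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = deque()  # undelivered (replica, key, version, value)
        self.clock = 0

    def write(self, key, value):
        self.clock += 1
        first, *others = self.replicas
        first.apply(key, self.clock, value)
        for replica in others:
            self.pending.append((replica, key, self.clock, value))

    def propagate(self):
        """Deliver every queued update (stands in for background sync)."""
        while self.pending:
            replica, key, version, value = self.pending.popleft()
            replica.apply(key, version, value)

    def read_all(self, key):
        return [r.read(key) for r in self.replicas]


cluster = Cluster(["a", "b", "c"])
cluster.write("likes", 10)
stale = cluster.read_all("likes")      # replicas disagree at this point
cluster.propagate()
converged = cluster.read_all("likes")  # every replica now agrees
print(stale, converged)
```

Between the write and the propagation, a reader may see an old value depending on which replica answers. That is acceptable for a "likes" counter and unacceptable for an account balance, which is the distinction the paragraph above draws.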

5. Query Language – Unlike relational databases, these databases generally do not support SQL. However, a few NoSQL databases support some other form of query language; CouchDB, for example, stores data as JSON and uses JavaScript as its query language.

6. No Joins – NoSQL databases don’t use the concept of joins.

7. Low Cost – NoSQL databases use clusters of cheap commodity servers instead of proprietary servers to manage exploding data and transaction volumes.

8. Easy Implementation – NoSQL provides schema flexibility and less complicated relationships than an RDBMS.

9. Good for scenarios which mostly require querying/searching (but not complex search or analytics) and very few or no updates.
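Since joins are not available (point 6 above), related data is usually either combined in application code or stored together in one record. The sketch below shows both options with hypothetical data and helper names; it does not reflect the API of any particular NoSQL product.

```python
# Hypothetical data: two "collections" as they might come back from a store.
users = {
    "u1": {"name": "Alice"},
    "u2": {"name": "Bob"},
}
orders = [
    {"order_id": 1, "user_id": "u1", "total": 30.0},
    {"order_id": 2, "user_id": "u2", "total": 12.5},
    {"order_id": 3, "user_id": "u1", "total": 8.0},
]


def join_in_application(order_list, user_map):
    """Option 1: fetch both collections and combine them in application code."""
    return [
        {**order, "user_name": user_map[order["user_id"]]["name"]}
        for order in order_list
    ]


def embed_orders(order_list, user_map):
    """Option 2: denormalise so each user document carries its own orders."""
    documents = {uid: {**user, "orders": []} for uid, user in user_map.items()}
    for order in order_list:
        documents[order["user_id"]]["orders"].append(
            {"order_id": order["order_id"], "total": order["total"]}
        )
    return documents


joined = join_in_application(orders, users)
embedded = embed_orders(orders, users)
print(joined[0])
print(embedded["u1"])
```

Embedding makes reads a single lookup at the cost of duplicated or harder-to-update data, which fits point 9: it works best when reads dominate and updates are rare.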

| NoSQL Advantages | NoSQL Disadvantages |
| --- | --- |
| High scalability | Too many options (more than 150), making it hard to pick one |
| Schema flexibility | Limited query capabilities (so far) |
| Distributed computing (reliability, scalability, sharing of resources, speed) | Eventual consistency is not intuitive to program for in strict scenarios like banking applications |
| No complicated relationships | Lacks join, GROUP BY and ORDER BY facilities |
| Lower cost (hardware costs) | No full ACID transactions |
| Open source – Most NoSQL options are open-source solutions, with exceptions such as Amazon's offerings (Amazon S3, Amazon Dynamo). This provides a low-cost entry point. | Limited guarantee of support, as is common with open source |

 

| Feature | NoSQL | RDBMS |
| --- | --- | --- |
| Data Volume | Handles huge data volumes | Handles limited data volumes |
| Data Validity | Less guaranteed | Highly guaranteed |
| Scalability | Horizontally | Horizontally & vertically |
| Query Language | No declarative query language | Structured Query Language (SQL) |
| Schema | No predefined schema, or less rigid schemas | Predefined schema (Data Definition Language & Data Manipulation Language) |
| Data Type | Supports unstructured and unpredictable data | Supports relational data, with relationships stored in separate tables |
| ACID/BASE | Based on the BASE principle (Basically Available, Soft State, Eventually Consistent) | Based on the ACID principle (Atomicity, Consistency, Isolation and Durability) |
| Transaction Management | Weaker transactional guarantees | Strong transactional guarantees |
| Data Storage Technique | Schema-free collections store different types and document structures; for example, {“color”: “blue”} and {“price”: “23.5”} can be stored within a single collection. | No collections are used for data storage; data is stored in tables and manipulated with DML. |

CAP Theorem

As per the CAP theorem, a distributed system cannot provide all three of the following guarantees at the same time:

  • Consistency – All nodes view the same data at the same time. Data in the database remains consistent after the execution of an operation.
  • Availability – A guarantee that every request receives a response about whether it was successful or failed. In other words the system is always up with no downtime.
  • Partition Tolerance – The system continues to operate despite failure of part of the system. The servers may be partitioned into multiple groups that cannot communicate with every other group. The network can break into two or more parts, each with active systems that cannot influence other parts.

The CAP theorem states that a distributed system can satisfy at most 2 of these 3 requirements at any one time. To scale out, you have to partition.

NoSQL databases follow different combinations of the C, A and P guarantees. The 3 combinations are described below:

CA – The cluster sits at a single site, so all nodes remain in contact; any partition blocks the system.

CP – Under this arrangement, some data might not be accessible, but consistency and accuracy are not compromised. There is also no need for distributed concurrency control.

AP – With the AP approach, the returned data might not be accurate, but the system remains available in spite of any partitioning. This is best suited for replication needs and fault tolerance.
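The difference between the CP and AP choices shows up only when a partition happens. The following is a deliberately tiny model with two replicas and a manual `partitioned` flag; the `TwoNodeStore` class and its `mode` parameter are hypothetical, and real systems use quorums and conflict resolution that are not modelled here.

```python
class Node:
    """A replica holding a single value."""

    def __init__(self):
        self.value = None


class TwoNodeStore:
    """Two replicas; `mode` decides what happens while they are partitioned."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Node(), Node()]
        self.partitioned = False

    def write(self, node_index, value):
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write rather than let the replicas diverge.
                raise RuntimeError("unavailable: cannot reach the other replica")
            # AP: accept the write locally; the other replica is now stale.
            self.nodes[node_index].value = value
            return
        for node in self.nodes:  # healthy network: update both replicas
            node.value = value

    def read(self, node_index):
        return self.nodes[node_index].value


cp = TwoNodeStore("CP")
cp.write(0, "v1")
cp.partitioned = True
try:
    cp.write(0, "v2")
    cp_accepted = True
except RuntimeError:
    cp_accepted = False
cp_reads = (cp.read(0), cp.read(1))

ap = TwoNodeStore("AP")
ap.write(0, "v1")
ap.partitioned = True
ap.write(0, "v2")
ap_reads = (ap.read(0), ap.read(1))

print("CP accepted write:", cp_accepted, "reads:", cp_reads)
print("AP reads:", ap_reads)
```

The CP store gives up availability (the write is rejected) to keep both replicas identical, while the AP store stays available and returns different answers depending on which replica is asked.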

In the next post on NoSQL databases, we will explore the different categories of NoSQL databases that are most popular with software developers today.

Girish Kumar

Technical Lead

Girish Kumar is a Technical Lead at 3Pillar Global and the head of our Java Competency Center in India. He has been working in the Java domain for over 8 years and has gained rich expertise in a wide array of Java technologies including Spring, Hibernate and Web Services. In addition, he has good exposure to implementing the complete SDLC using Agile and TDD methodologies. Prior to joining 3Pillar Global, Girish worked with Cognizant Technology Solutions for more than 5 years, where he worked for some of the biggest names in the Banking and Finance verticals in the U.S. and U.K.

Girish’s current challenges at 3Pillar include getting the best out of Apache Hadoop, NoSQL and distributed systems. He provides day-to-day leadership to the members of the Java Competency Center in India by enforcing best practices and providing technical guidance in key projects.
