December 5, 2014

Data stories and MongoDB

There is a lot of fuss in the software development space about big data nowadays. My question is: is it simply a set of techniques and concepts for capturing, storing, and manipulating large amounts of data, or is there more to big data than that?

We often talk superficially about data preservation, which means storing historical data. Why do we need to do that? Any guesses? If you thought, “For predictive analysis and data mining,” you’re on the right track. To add more to it, we can look at the data’s relationship to data science, statistics, and programming, as well as its usage in marketing and scientific research and, above all, the ethical issues that lie behind its use.
So what are some potential innovative applications of big data?

There are many answers, but here are a few:

  1. It can help spot problem areas in a network and add throughput to help prepare for future demand.
  2. It is able to analyze traffic details for various devices.
  3. Big data can give insight into the type of content customers prefer, which enables providers to make more accurate suggestions as to what subscribers might like.
  4. One company is using big data for DNA processing to help clients make connections. With some saliva in a tube, it can sequence a client’s DNA and match the client with other people in its database, such as distant cousins.
  5. A medical institute in the US is using big data in research that includes more than 1 million DNA variants in an effort to understand why some strains develop resistance to antibiotics.
  6. Las Vegas is using big data to aggregate data from various sources into a single real-time 3D model. The model includes both above and below ground utilities, and it is being used to visualize the location and performance of critical assets located under the city.

The six examples above show how widely big data is being applied. Let’s now move on to one of the specific technologies used in big data systems and take a closer look at MongoDB.


Most of you may know what MongoDB is, but as a brief recap, MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.

High performance and high availability are things every database talks about, but what is automatic scaling? MongoDB’s key ingredient is automatic scalability, also known as horizontal scalability, which rests on two main features.

  1. Sharding distributes data across a cluster of machines. It has to be enabled for a collection, but once it is, MongoDB balances the data across the shards automatically.
  2. Replica sets are used for low-latency, high-throughput deployments.

Let’s discuss these two features in detail:

Sharding: Sharding is a method for storing data across multiple machines. Large data sets can exceed the storage capacity of a single machine, and working sets larger than the system’s RAM stress the I/O capacity of disk drives. To address these issues of scale, big data systems have a basic approach to handling large amounts of data, and that is sharding.

Sharding in MongoDB: Sharding, or horizontal scaling, divides the data set and distributes the data over multiple servers, or shards. This is in contrast to vertical scaling, which adds more capacity to a single server. Each shard is an independent database, and collectively, the shards make up a single logical database. MongoDB supports sharding through the configuration of sharded clusters.
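
To make the idea concrete, here is a minimal sketch in plain Python of how a shard key can route documents to shards. It is a toy model, not MongoDB's implementation: the names `shard_for` and `ShardedCollection` are made up for this post, and the simple hash-and-modulo routing is an assumption chosen for brevity. A real sharded cluster manages data placement and balancing itself.

```python
import hashlib


def shard_for(key_value, num_shards):
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedCollection:
    """Toy collection whose documents are spread over several shards."""

    def __init__(self, num_shards, shard_key):
        self.shard_key = shard_key
        # Each shard is an independent store; together they form one
        # logical collection.
        self.shards = [[] for _ in range(num_shards)]

    def insert(self, doc):
        idx = shard_for(doc[self.shard_key], len(self.shards))
        self.shards[idx].append(doc)
        return idx

    def find_by_key(self, value):
        # A query on the shard key only needs to visit one shard.
        idx = shard_for(value, len(self.shards))
        return [d for d in self.shards[idx] if d[self.shard_key] == value]

    def count(self):
        return sum(len(shard) for shard in self.shards)


users = ShardedCollection(num_shards=3, shard_key="user_id")
for i in range(100):
    users.insert({"user_id": i, "name": f"user{i}"})

print([len(shard) for shard in users.shards])  # documents per shard
print(users.find_by_key(42))
```

The point of the sketch is that the application sees one collection, while each document physically lives on exactly one shard chosen by its shard key.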

Replication: Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.

Replication in MongoDB: A replica set is a group of MongoDB instances that host the same data set. One instance, the primary, receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set.
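
The primary/secondary flow can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the `Node` and `ReplicaSet` classes are hypothetical names for this post, the operation log is an in-memory list, and there is no networking, election, or failover.

```python
class Node:
    """One member of the toy replica set."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # how many logged operations this node has applied

    def apply(self, op):
        kind, key, value = op
        if kind == "set":
            self.data[key] = value
        elif kind == "delete":
            self.data.pop(key, None)
        self.applied += 1


class ReplicaSet:
    """Toy replica set: one primary, any number of secondaries."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.oplog = []  # ordered log of operations accepted by the primary

    def write(self, kind, key, value=None):
        # All writes go to the primary and are recorded in the log.
        op = (kind, key, value)
        self.oplog.append(op)
        self.primary.apply(op)

    def sync(self):
        # Each secondary replays the operations it has not applied yet.
        for node in self.secondaries:
            for op in self.oplog[node.applied:]:
                node.apply(op)


rs = ReplicaSet(Node("primary"), [Node("secondary-1"), Node("secondary-2")])
rs.write("set", "a", 1)
rs.write("set", "b", 2)
rs.write("delete", "a")

before_sync = dict(rs.secondaries[0].data)  # secondaries lag until they sync
rs.sync()

print(before_sync)
print(rs.primary.data, [n.data for n in rs.secondaries])
```

Until `sync()` runs, the secondaries lag behind the primary. After it runs, every member holds the same data set, which is what protects the database from the loss of a single server.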

To conclude, a MongoDB deployment hosts a number of databases. A database holds a set of collections. A collection holds a set of documents. A document is a set of key-value pairs. Documents have a dynamic schema, which means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection’s documents may hold different types of data.
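
As a rough picture of that hierarchy, the sketch below models a deployment with plain Python dictionaries and lists. The database name `shop`, the collection name `customers`, the sample documents, and the helper `field_types` are all invented for illustration. This is not how MongoDB stores data internally.

```python
# deployment -> databases -> collections -> documents (key-value pairs)
deployment = {
    "shop": {  # a database
        "customers": [  # a collection
            {"_id": 1, "name": "Ada", "phone": "555-0100"},
            # Same collection, but an extra field and a different type
            # for "phone": the schema is dynamic.
            {"_id": 2, "name": "Grace", "phone": 5550101, "tags": ["vip"]},
        ]
    }
}


def field_types(collection):
    """Return, for each field name, the set of value types seen for it."""
    types = {}
    for doc in collection:
        for field, value in doc.items():
            types.setdefault(field, set()).add(type(value).__name__)
    return types


customers = deployment["shop"]["customers"]
print(field_types(customers))
```

Here `phone` holds a string in one document and an integer in the other, and only one document has `tags`, yet both sit in the same collection.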