Data stories and MongoDB

There is a lot of fuss in the software development space about big data nowadays. My question is: is it just a set of techniques and concepts for capturing, storing, and manipulating large amounts of data, or is there more to big data than that?

We talk superficially about preserving data, that is, storing historical data. Why do we need to do that? Any guesses? If you thought, “For predictive analysis and data mining,” you’re on the right track. Beyond that, we can look at big data’s relationship to data science, statistics, and programming, as well as its use in marketing and scientific research and, above all, the ethical issues behind its use.

So what are some potential innovative applications of big data?

There are many answers, but here are a few:

  1. It can help spot problem areas in a network and add throughput to help prepare for future demand.
  2. It is able to analyze traffic details for various devices.
  3. Big data can give insight into the type of content customers prefer, which enables content providers to make more accurate suggestions as to what subscribers might like.
  4. Ancestry.com is performing DNA processing with the help of big data to help clients make connections. With some saliva in a tube, it can sequence a client’s DNA and match the client with other people in its database, like distantly removed cousins.
  5. A medical institute in the US is using big data in research that includes more than 1 million DNA variants in an effort to understand why some strains develop resistance to antibiotics.
  6. Las Vegas is using big data to aggregate data from various sources into a single real-time 3D model. The model includes both above and below ground utilities, and it is being used to visualize the location and performance of critical assets located under the city.

The six points above illustrate the range of big data applications. Let’s now move on to one specific technology used in big data systems: MongoDB.

MongoDB

Most of you may know what MongoDB is, but in brief, MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling.

High performance and high availability are things every database talks about, but what is automatic scaling? MongoDB’s key ingredient is automatic scaling, also known as horizontal scalability, which it provides through two main features:

  1. Sharding distributes data across a cluster of machines. Once a sharded cluster is configured, MongoDB balances the data across the shards automatically.
  2. Replica sets can provide eventually consistent reads for low-latency, high-throughput deployments.

Let’s discuss these two features in detail:

Sharding: Sharding is a method for storing data across multiple machines. Larger data sets exceed the storage capacity of a single machine, and working set sizes larger than the system’s RAM stress the I/O capacity of disk drives. To address these issues of scale, big data systems commonly rely on sharding.

Sharding in MongoDB: Sharding, or horizontal scaling, divides the data set and distributes the data over multiple servers, or shards. This contrasts with vertical scaling, which adds capacity to a single server. Each shard is an independent database, and collectively, the shards make up a single logical database. MongoDB supports sharding through the configuration of sharded clusters.
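
To make the idea of "independent shards that form one logical database" more concrete, here is a minimal sketch in plain Python. It is a toy model, not MongoDB's implementation: the names shard_for and ShardedCollection are hypothetical, and hashing the shard key with MD5 is simply an assumption chosen to get a stable, evenly spread mapping.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard-key value to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedCollection:
    """Toy model: several independent stores that act as one logical collection."""

    def __init__(self, num_shards, shard_key):
        self.shard_key = shard_key
        self.shards = [dict() for _ in range(num_shards)]

    def insert(self, doc):
        # Route the document to exactly one shard based on its shard key.
        key = doc[self.shard_key]
        self.shards[shard_for(key, len(self.shards))][key] = doc

    def find_one(self, key):
        # A lookup by shard key only needs to visit a single shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def count(self):
        # The logical collection is the union of all shards.
        return sum(len(shard) for shard in self.shards)


users = ShardedCollection(num_shards=3, shard_key="user_id")
for i in range(1000):
    users.insert({"user_id": i, "name": f"user-{i}"})

print([len(shard) for shard in users.shards])  # documents held by each shard
print(users.find_one(42))
```

Each shard holds only part of the data, yet callers work with users as if it were a single collection. In a real deployment the routing and balancing are handled by the sharded cluster itself rather than by application code.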

Replication: Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.

Replication in MongoDB: A replica set is a group of MongoDB instances that host the same data set. One instance, the primary, receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set.
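
The primary/secondary flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how MongoDB is implemented: Node and ReplicaSet are hypothetical classes, the operation log is a plain list, and sync() stands in for the continuous process by which secondaries catch up.

```python
class Node:
    """One member of the toy replica set, holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log entries this node has applied

    def apply(self, entry):
        op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied += 1


class ReplicaSet:
    """Toy replica set: one primary takes writes, secondaries replay its log."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries
        self.oplog = []  # ordered log of write operations

    def write(self, op, key, value=None):
        # All writes go to the primary and are recorded in the log.
        entry = (op, key, value)
        self.oplog.append(entry)
        self.primary.apply(entry)

    def sync(self):
        # Each secondary applies the log entries it has not seen yet, in order.
        for node in self.secondaries:
            for entry in self.oplog[node.applied:]:
                node.apply(entry)


rs = ReplicaSet(Node("primary"), [Node("secondary-1"), Node("secondary-2")])
rs.write("set", "a", 1)
rs.write("set", "b", 2)
rs.write("delete", "a")

stale = dict(rs.secondaries[0].data)  # secondaries lag until they sync
rs.sync()
in_sync = all(node.data == rs.primary.data for node in rs.secondaries)
print(stale, rs.primary.data, in_sync)
```

Because secondaries replay the same operations in the same order, they converge on the primary's data set. The lag before sync() is the reason reads from secondaries are only eventually consistent.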

To conclude, a MongoDB deployment hosts a number of databases. A database holds a set of collections. A collection holds a set of documents. A document is a set of key-value pairs. Documents have a dynamic schema, which means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection’s documents may hold different types of data.
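
As a rough illustration of this hierarchy and of dynamic schema, the sketch below models a deployment with nested Python dictionaries and lists. The database name shop, the collection name customers, and the sample documents are all made up for the example, and plain Python objects stand in for what MongoDB stores on disk.

```python
# deployment -> databases -> collections -> documents (key-value pairs)
deployment = {
    "shop": {                # a database
        "customers": [       # a collection
            {"_id": 1, "name": "Asha", "phone": "555-0100"},
            # Same collection, but an extra field and a different type for "phone".
            {"_id": 2, "name": "Ben", "phone": 5550101, "tags": ["vip"]},
        ],
    },
}

customers = deployment["shop"]["customers"]

# Documents in one collection may carry different sets of fields...
fields = [sorted(doc) for doc in customers]

# ...and a shared field may hold different types of data.
phone_types = {type(doc["phone"]).__name__ for doc in customers}

print(fields)
print(phone_types)
```

The flexibility is convenient, but it also means application code has to be ready for a missing field or an unexpected type when it reads a document.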

Vikas Kukreti

Technical Lead

Vikas Kukreti is working as a Technical Lead in Development Engineering at 3Pillar Global. Vikas has 9 years of database design, ETL, DWH, AWS, development and Data Science experience. He has database architecture experience in areas such as ETL, Data warehousing, Oracle Database, Hadoop, Hive, and Data Modeling. He has strong knowledge of Big Data, Data Science and Cloud Computing. Vikas is passionate about big data technology and Data Science, especially Hadoop and MongoDB. Vikas is a graduate of Uttar Pradesh Technical University (UPTU), India, and is a keen reader of classic novels and a movie freak.
