High Availability and Automatic Failover in Hadoop

By Vikas Kukreti, Technical Lead

Hadoop in Brief

Hadoop is one of the most popular big data processing frameworks in use today. From Adobe and eBay to Facebook and Hulu, Hadoop is used by a wide range of companies for everything from machine learning and natural language processing to data analysis and ad optimization. Hadoop is commonly described as catering to “the 3 V’s” of big data – volume, velocity, and variety. With use cases in analytics, search, log processing, and recommendation engines, Hadoop is a natural fit for data-intensive applications.

The Hadoop Distributed File System (HDFS) is Hadoop’s reliable shared storage layer. It scales horizontally, meaning nodes and servers can be added without shutting down the system, which makes it very cost effective on commodity hardware. It can be used to process both structured and unstructured data, and it is highly optimized for very large data sets.

The Hadoop Landscape

According to Forbes, 53% of companies in one recent study were employing big data analytics, which was “defined as systems that enable end-user access to and analysis of data contained and managed within the Hadoop ecosystem.” For a list of companies that use Hadoop and short descriptions of how they use it, see the Powered By page on the Hadoop Wiki.

Problems with the Hadoop Framework

Prior to Hadoop 2.0 (i.e., in Hadoop 1.0), the NameNode was a single point of failure (SPOF). By this, we mean that if the NameNode failed, the whole Hadoop cluster went down. It could then only be recovered manually, with the help of a Secondary NameNode, which resulted in overall downtime: the Hadoop cluster would not be available until the Hadoop administrator restarted the NameNode.

Solution for SPOF

This shortcoming of Hadoop 1.0 was overcome in Hadoop 2.0 by providing support for two NameNodes. The Hadoop 2.0 High Availability feature adds an extra NameNode (a passive Standby NameNode) to the Hadoop architecture, configured for automatic failover. Hadoop now has built-in fault tolerance: data blocks are replicated across multiple DataNodes according to a replication factor that the administrator can configure, and the Standby NameNode is kept in sync with the Active NameNode’s namespace by courtesy of the JournalNodes. If the Active NameNode goes down, the Standby NameNode can take over with an up-to-date copy of the metadata.
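As a rough illustration, the two-NameNode setup described above is wired together in hdfs-site.xml. The following is a hedged sketch, not a complete configuration: the nameservice ID "mycluster", the NameNode IDs "nn1"/"nn2", and all hostnames are placeholders you would replace with your own.

```xml
<!-- Sketch of an HA section in hdfs-site.xml; IDs and hostnames are placeholders -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<!-- Shared edits directory: the quorum of JournalNodes both NameNodes talk to -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
<!-- Block replication factor configured at the admin end -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

Note that `dfs.replication` governs data block copies across DataNodes, while the JournalNode quorum is what keeps the Standby NameNode's metadata in sync.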

Why and How Can the NameNode Go Down?

  1. Suppose you hand your cluster to a developer who does not understand its performance characteristics and runs a naive recursive directory traversal. This can place a massive load on the NameNode, crash it, and bring the entire cluster down.
  2. Processing a large number of small files can hurt Hadoop performance and can bring the NameNode down.
  3. Latency spikes due to garbage collection on the NameNode can bring it down.
  4. If the root partition runs out of space, the NameNode can crash with a corrupted edit log.
  5. HDFS deployments often require a commercial NAS to which the NameNode writes a copy of its edit log. Because the NameNode hard-mounts the NAS, the entire cluster goes down any time the NAS is down.

Enabling High Availability and Automatic Failover

If you are running Hadoop and want to ensure the stability and reliability of your cluster, you can do so by enabling Hadoop’s high availability and automatic failover options. The problem this addresses is that, without high availability and automatic failover, any time the NameNode goes down, the entire system goes down. Think of high availability and automatic failover as your insurance policy against a single point of failure.
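Enabling automatic failover on top of an HA pair comes down to two settings. A hedged sketch, assuming an HA nameservice named "mycluster" and a three-node ZooKeeper ensemble whose hostnames are placeholders:

```xml
<!-- hdfs-site.xml: turn on automatic failover for the HA pair -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: the ZooKeeper quorum used for failover coordination -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```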

Approximate Downtime During Automatic Failover

As per Hortonworks:

  • A 60-node cluster with 6 million blocks, 300 TB of raw storage, and 100K files: 30 seconds. Hence total failover time ranges from 1 to 3 minutes.
  • A 200-node cluster with 20 million blocks, 1 PB of raw storage, and 1 million files: 110 seconds. Hence total failover time ranges from 2.5 to 4.5 minutes.

Is automatic failover more expensive than SPOF?

Split-brain scenario

Besides the small downtime involved in an HA failover, the other important point to note is that only one of the NameNodes must be active at a time. Otherwise, the namespace state would diverge between the two, risking data loss or other incorrect results. This is the so-called “split-brain scenario,” in which multiple master nodes each believe they are in charge of the cluster. The JournalNodes take care of this: they allow only a single NameNode to be a writer at a time. During a failover, the NameNode that is to become active takes over the role of writing to the JournalNodes, which fences the previous Active NameNode and prevents it from continuing in the Active state, allowing the new Active NameNode to safely complete the failover.
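The single-writer guarantee above rests on an epoch-numbering idea: each NameNode that becomes active obtains a fresh, higher epoch from the journals, and any write carrying a stale epoch is rejected. The following is an illustrative Python sketch of that idea, not Hadoop’s actual code; the class and method names are invented for the example.

```python
# Illustrative sketch (not Hadoop source) of epoch-based fencing,
# the mechanism JournalNodes use to allow only one writer at a time.

class JournalNode:
    def __init__(self):
        self.promised_epoch = 0   # highest epoch this journal has accepted
        self.edits = []           # the edit log records it has stored

    def new_epoch(self, epoch):
        """A NameNode becoming active proposes a strictly higher epoch."""
        if epoch <= self.promised_epoch:
            return False          # stale proposal: reject it
        self.promised_epoch = epoch
        return True

    def write(self, epoch, record):
        """Writes carrying a stale epoch are rejected (the writer is fenced)."""
        if epoch < self.promised_epoch:
            raise PermissionError("fenced: stale writer epoch")
        self.edits.append(record)


journals = [JournalNode() for _ in range(3)]

# NameNode A becomes active with epoch 1 and writes an edit.
assert all(j.new_epoch(1) for j in journals)
for j in journals:
    j.write(1, "mkdir /a")

# Failover: NameNode B takes over with epoch 2, fencing A on every journal.
assert all(j.new_epoch(2) for j in journals)

# A's subsequent write (still using epoch 1) is rejected.
try:
    journals[0].write(1, "mkdir /b")
    fenced = False
except PermissionError:
    fenced = True
print(fenced)  # True: the old Active NameNode can no longer write
```

Even if the old Active NameNode is still running, it cannot append to a quorum of journals once a higher epoch exists, which is what makes split-brain writes impossible.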

The video tutorial

This video tutorial was created for Hadoop administrators or database administrators responsible for administering their company’s Hadoop clusters. It assumes intermediate knowledge of Hadoop and provides a step-by-step walkthrough of how to enable high availability and automatic failover using Zookeeper.

Configuration details after HA implementation

After HA is implemented in a Hadoop cluster, the changes are reflected in the cluster’s configuration files, primarily hdfs-site.xml (nameservice, NameNode, JournalNode, and failover settings) and core-site.xml (the default filesystem URI and the ZooKeeper quorum).
Automatic failover in Hadoop kicks in automatically if the NameNode crashes. It adds a ZooKeeper quorum and ZKFailoverController (ZKFC) processes to an HDFS deployment. Work is ongoing to improve the automatic failover process further, for example to fail over without restarting the HDFS service and to reduce the failover time.
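Bringing the ZKFC components online typically takes a handful of admin commands. A hedged sketch, assuming the NameNode IDs "nn1" and "nn2" from an HA configuration and a running ZooKeeper ensemble:

```shell
# Initialize the znode in ZooKeeper that ZKFC uses for leader election
# (run once, on one of the NameNode hosts):
hdfs zkfc -formatZK

# Start a ZKFC process on each NameNode host
# (on Hadoop 2.x: hadoop-daemon.sh start zkfc):
hdfs --daemon start zkfc

# Verify which NameNode is currently active:
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```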

As per Cloudera, work was recently started to allow Hadoop configuration changes without a restart.


A number of changes (e.g., HDFS-1070, HDFS-1295, and HDFS-1391) are under way to significantly improve the time it takes to restart HDFS.

About The Author

Vikas Kukreti works as a Technical Lead in Development Engineering at 3Pillar Global. Vikas has 9 years of experience in database design, ETL, data warehousing, AWS, development, and data science. He has database architecture experience in areas such as ETL, data warehousing, Oracle Database, Hadoop, Hive, and data modeling, and strong knowledge of big data, data science, and cloud computing. Vikas is passionate about big data technology and data science, especially Hadoop and MongoDB. He is a graduate of Uttar Pradesh Technical University (UPTU), India, a keen reader of classic novels, and a movie buff.
