How to Get Started with Hive on Cloudera

Apache Hive is a data warehouse package built on top of Hadoop that provides data summarization, querying, and analysis. Hive was initially developed by Facebook and later contributed to the open source community. It is aimed mostly at users who are comfortable with SQL: its query language, HiveQL, is similar to SQL.

Step 1:

Cloudera CDH3 Setup:

CDH is Cloudera's open source distribution of Apache Hadoop and related projects. CDH delivers the core elements of Hadoop (scalable storage and distributed computing) along with additional components such as a user interface, plus enterprise capabilities such as security and integration with a broad range of hardware and software solutions.

You can download the CDH3 VM file from this link.

Extract the zip file and open the extracted virtual machine in VMware Player.

Step 2:

Click on "Play virtual machine" and log in to the Cloudera VM with the credentials below:

Login as:

Username - cloudera
Password - cloudera


Step 3:

Create a folder with any name on the Cloudera VM desktop. For this example, I have named it himanshuHive.


Step 4:

  • Open a terminal and execute the command:

cloudera@cloudera-vm:/home/cloudera# > sudo su

  • If it prompts for a password, then type: cloudera
  • Now run these commands to open the Hive configuration file:

root@cloudera-vm:/home/cloudera# > cd /usr/lib/hive/conf/

root@cloudera-vm:/usr/lib/hive/conf# > sudo gedit hive-site.xml

Now copy the path of the folder created in Step 3 (i.e. himanshuHive) into the value of the ConnectionURL property (named javax.jdo.option.ConnectionURL in hive-site.xml) of the Hive configuration file.
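To make the edit concrete, here is a minimal sketch that builds a Derby connection URL from the folder path and prints the property element as it might look in hive-site.xml. The desktop location /home/cloudera/Desktop and the metastore_db database name are assumptions, not taken from the original screenshot, so adjust them to match your own VM and the value already present in your file.

```shell
# Hypothetical folder path: replace with the folder you created on the desktop.
HIVE_DIR="/home/cloudera/Desktop/himanshuHive"

# Derby JDBC URL that places the metastore database inside that folder.
# The "metastore_db" name is an assumption for illustration.
CONNECTION_URL="jdbc:derby:;databaseName=${HIVE_DIR}/metastore_db;create=true"

# Print the property element to compare against the one in hive-site.xml.
cat <<EOF
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>${CONNECTION_URL}</value>
</property>
EOF
```

Only the value element needs to change in the existing file; the property name stays as it is.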


Step 5:

Type this command to enter the Hive shell: sudo hive
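As a quick check that the setup works, a few HiveQL statements can also be run from a script file instead of being typed at the prompt. The sketch below is only an illustration: the table name demo_words and the file path /tmp/first_queries.hql are hypothetical, and inside the VM you may need to prefix the hive call with sudo, as in the command above.

```shell
# Write a few HiveQL statements to a script file (table name is hypothetical).
cat > /tmp/first_queries.hql <<'EOF'
CREATE TABLE IF NOT EXISTS demo_words (word STRING, freq INT);
SHOW TABLES;
DESCRIBE demo_words;
EOF

# Run the script non-interactively if the hive CLI is on the PATH.
if command -v hive >/dev/null 2>&1; then
  hive -f /tmp/first_queries.hql
else
  echo "hive CLI not found; run this inside the Cloudera VM"
fi
```

The same statements can be typed one at a time at the hive> prompt; the script form is just easier to rerun.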

Now you are all set to execute Hive commands and run Hive queries in the Hive shell. Interested in learning more about Apache Hive and Cloudera? See the Apache Hive documentation on the Apache Hive home page.