Welcome to part 3 of my series on how to set up a single-node Hadoop pseudo-cluster using Cloudera’s CDH distribution and HUE. In part 1, I walked through the first steps of configuring a CentOS 6.5 system for use as a single-node Hadoop pseudo-cluster. In part 2, I walked through installing and configuring HBase, Zookeeper and Snappy. Below I’ll present the steps necessary to install HUE, the web-based management tool, for use with your cluster. (Hue is a trademark of Cloudera but is released under the Apache license, version 2.) This series is based on the Cloudera documentation, but I’ve modified it to be easier to follow when setting up a pseudo-cluster.
Update: Please be careful if you are copying lines from these articles to paste into your Hadoop config files. I have found that the double hyphen characters used in the comment lines may copy over as a long hyphen instead. This is likely to cause issues when attempting to run the various components.
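One way to catch this after pasting is to search the config file for the long dash characters. The sketch below is only a suggestion: the function name find_long_dashes is my own, and it assumes your files are UTF-8 encoded.

```shell
# Report lines that contain an en dash (U+2013) or em dash (U+2014),
# which usually means a pasted "--" was converted along the way.
find_long_dashes() {
    # The octal escapes are the UTF-8 bytes of the two dash characters.
    en_dash=$(printf '\342\200\223')
    em_dash=$(printf '\342\200\224')
    # -n prints line numbers; each -e adds one literal string to look for.
    grep -n -e "$en_dash" -e "$em_dash" "$@"
}
```

For example, find_long_dashes /etc/hadoop/conf/core-site.xml prints any offending lines with their line numbers and prints nothing if the file is clean.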
Before you can install HUE on your pseudo server, you need to see if Python 2.6 or Python 2.7 is installed. Open a terminal and enter: python <ENTER>. If Python is installed, the version will be displayed as it starts up. At the Python prompt enter: quit()<ENTER> to exit Python and return to the command prompt. If you have the correct version of Python installed, skip down to the next section, on installing HUE.
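If you would rather not start the interpreter, the same check can be scripted. This is a minimal sketch; the function names are my own, and it assumes the interpreter is on your PATH as python.

```shell
# Return success if a version string such as "2.6.6" is Python 2.6 or 2.7.
is_supported_python() {
    case "$1" in
        2.6|2.6.*|2.7|2.7.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the version of the "python" found on PATH, for example "2.6.6".
# Python 2 writes "Python X.Y.Z" to stderr, so stderr is merged into stdout.
python_version() {
    python -V 2>&1 | awk '{print $2}'
}
```

Running is_supported_python "$(python_version)" && echo "Python is OK for HUE" prints the message only when a 2.6 or 2.7 interpreter is found.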
If you don’t have Python installed, you’ll need to add the RHEL EPEL (Extra Packages for Enterprise Linux) repository. Open a terminal window and switch to the root account by using “su -”. If you are running 64-bit CentOS 6, enter the following command to install the EPEL repository:
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
After adding the repository, you can install Python 2.6 from the command line. As root, enter this command:
yum install python26
Once you’ve verified that Python is installed, or have installed it, you can add HUE to your Hadoop system. Open a terminal window and switch to the root account. Enter the following command to start the HUE installation:
yum install hue
Any needed dependencies will be located and installed along with HUE, so the installation time depends on how many are required and on the speed of your Internet connection; on my system it took less than ten minutes. Once the installation completes without errors, enter at a command prompt:
service hue start
Open a web browser and point it to your system name and port 8888 to access the HUE web interface. For example, if your system is called HadoopTest, you would enter http://HadoopTest:8888 in your browser. You’ll be prompted for a user ID and password, and you’ll be warned that this will become your administrator account, so be careful to remember what you enter.
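If the page does not load, you can check from a terminal whether anything is answering on that port. The sketch below assumes curl is installed; the function names are my own, and HadoopTest is just the example host name used above.

```shell
# Build the HUE URL for a host; 8888 is the port used in this article.
hue_url() {
    printf 'http://%s:%s/\n' "$1" "${2:-8888}"
}

# Print the HTTP status code returned by the server.
# curl reports 000 when nothing answers on that host and port.
hue_status() {
    curl -s -o /dev/null -w '%{http_code}\n' "$(hue_url "$@")"
}
```

Running hue_status HadoopTest should print a three-digit status code; anything other than 000 means a server is listening on port 8888.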
HUE will open up with the Quick Start window, and probably will be displaying a number of problems with your configuration. That’s all normal, so don’t worry about it. You’ll need to configure HUE and several other applications to get it working properly.
Because we are setting up a pseudo-clustered machine, I’m assuming that you have direct access to the system and are not accessing the Hadoop machine from outside your local firewall. Given that, we don’t need to set up HttpFS on this machine.
Once again using the terminal window as root, edit the file:
/etc/hadoop/conf/hdfs-site.xml and add this property before the final </configuration> tag:
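The property listing does not appear on this page. Going by the Cloudera documentation this series follows, the property that enables WebHDFS is dfs.webhdfs.enabled with a value of true. Treat that as an assumption and confirm it against the documentation for your CDH version. The sketch below shows the assumed fragment in its comments and adds a small helper (has_property is my own name) to confirm the property made it into the file.

```shell
# Assumed fragment for hdfs-site.xml (taken from Cloudera's HUE
# documentation, not from this page):
#   <property>
#     <name>dfs.webhdfs.enabled</name>
#     <value>true</value>
#   </property>

# Return success if FILE ($1) contains a <name> element for PROPERTY ($2).
has_property() {
    # -F treats the dots in the property name literally; -q suppresses output.
    grep -qF "<name>$2</name>" "$1"
}
```

For example, has_property /etc/hadoop/conf/hdfs-site.xml dfs.webhdfs.enabled && echo "WebHDFS property found" confirms the edit was saved.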
Save the file, exit the editor and restart your HDFS system with this command:
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x restart ; done
Now we need to configure the system to allow Hue to submit requests for any user or group with access to the system. As root from a terminal prompt, edit the file /etc/hadoop/conf/core-site.xml and add these two properties before the final </configuration> tag:
<!-- Hue WebHDFS proxy user setting -->
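The two properties themselves do not appear on this page. In the Cloudera documentation this series follows, they are hadoop.proxyuser.hue.hosts and hadoop.proxyuser.hue.groups, each with a value of *. Treat those names and values as an assumption and check them against the documentation for your CDH version. The sketch below lists the assumed fragment in its comments and reports which of the two are missing from a file; the function name is my own.

```shell
# Assumed fragment for core-site.xml (taken from Cloudera's HUE
# documentation, not from this page):
#   <property>
#     <name>hadoop.proxyuser.hue.hosts</name>
#     <value>*</value>
#   </property>
#   <property>
#     <name>hadoop.proxyuser.hue.groups</name>
#     <value>*</value>
#   </property>

# Print each assumed proxy-user property that FILE ($1) does not define.
missing_hue_proxy_properties() {
    for name in hadoop.proxyuser.hue.hosts hadoop.proxyuser.hue.groups; do
        grep -qF "<name>$name</name>" "$1" || echo "$name"
    done
}
```

Running missing_hue_proxy_properties /etc/hadoop/conf/core-site.xml prints nothing when both properties are present.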
Now edit the file: /etc/hue/conf/hue.ini. Near the top is a section called [desktop] with the first parameter there being secret_key. Again, because this system is not meant to be a production server, you can leave it blank or add a random string of characters after the secret_key=. I recommend adding some characters, because not having any will cause HUE to generate a warning later.
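If you don’t want to invent the random string yourself, the shell can generate one. This is a minimal sketch; the function name and the 50-character length are my own choices, not a HUE requirement.

```shell
# Print a 50-character random hex string to paste after secret_key=.
generate_secret_key() {
    # 25 random bytes become 50 hex digits; tr strips od's spaces and newlines.
    head -c 25 /dev/urandom | od -An -tx1 | tr -d ' \n'
    # End the output with a newline so it displays cleanly in a terminal.
    echo
}
```

Run generate_secret_key and copy the output into hue.ini. Hex digits are used so that the value contains no characters that might need escaping in the ini file.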
Now search for [[hdfs_clusters]] and look for a comment section below it that starts with “#Use WebHDFS/HttpFs as the communication mechanism“. At the end of that section, add this:
webhdfs_url=http://<full server name with domain>:50070/webhdfs/v1/
substituting the fully qualified domain name for your server where indicated. Save the file and exit back to a terminal prompt.
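To avoid typos in the host name, you can let the shell build the line for you. The sketch below assumes the NameNode web port is 50070, as in the line above, and that curl is installed; both function names are my own.

```shell
# Print the hue.ini line for a given fully qualified host name.
webhdfs_url_line() {
    printf 'webhdfs_url=http://%s:50070/webhdfs/v1/\n' "$1"
}

# Ask WebHDFS to list the HDFS root directory as a quick connectivity check.
# LISTSTATUS is a standard WebHDFS REST operation; the reply is JSON.
webhdfs_list_root() {
    curl -s "http://$1:50070/webhdfs/v1/?op=LISTSTATUS"
}
```

Running webhdfs_url_line "$(hostname -f)" prints the line using whatever fully qualified name the system reports, and webhdfs_list_root "$(hostname -f)" should return a JSON listing once HDFS has been restarted with WebHDFS enabled.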
The Hue plugin that is used to communicate with the JobTracker should already be installed as part of the dependencies. You can verify this by changing to the /usr/lib/hadoop/lib folder and confirming that a hue-plugins jar file is present.
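The same check can be done without changing folders. This sketch assumes the jar file name starts with hue-plugins and ends in .jar; the version part of the name will vary with your CDH release, and the function name is my own.

```shell
# Return success if DIR ($1) contains at least one hue-plugins*.jar file.
has_hue_plugin() {
    for jar in "$1"/hue-plugins*.jar; do
        # If nothing matches, the pattern stays literal and -e is false.
        [ -e "$jar" ] && return 0
    done
    return 1
}
```

For example, has_hue_plugin /usr/lib/hadoop/lib && echo "plugin found" prints the message only when the jar is there.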
Restart the HUE server with service hue restart. At this point the HUE web interface is available, and you can access HBase through it. However, you will see several error messages when you log in. We will work through these next time.