This is part 2 of setting up a single-node Hadoop YARN system for sandbox use. Part 1 is here, or for the series on using MapReduce v1, go here. I’m hoping to keep this series in a similar order to the original set of articles, and will deviate only when necessary. All the content here is based on the Cloudera documentation, but I’ve modified it to be easier to follow for setting up a pseudo cluster and added additional content where necessary.
Please be careful when copying lines from these articles into Hadoop config files or a terminal window. I have found that the double-hyphen characters used in comment lines may copy over as a single long dash (an en or em dash) instead, which is likely to cause issues when attempting to run the various components.
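One quick way to catch this, sketched below under the assumption of a GNU userland, is to grep a pasted file for the raw UTF-8 byte sequences of the en and em dashes. The temp file here just mimics a limits.conf entry pasted from a web page:

```shell
# Demonstration: detect an en dash that silently replaced a plain "-".
tmp=$(mktemp)
printf 'hdfs \xe2\x80\x93 nofile 32768\n' > "$tmp"   # \xe2\x80\x93 is an en dash

# Search for the raw UTF-8 bytes of the en dash and em dash.
if grep -q -e $'\xe2\x80\x93' -e $'\xe2\x80\x94' "$tmp"; then
  echo "suspect dash found"
fi
rm -f "$tmp"
```

A real check would point at the actual file you pasted into, such as /etc/security/limits.conf, instead of a temp file.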
- Open a terminal window on the CentOS server and switch to the root user account with: su -
- Enter this command:
yum install hbase
- Type Y when prompted, and hit enter. After a few seconds the HBase package will be downloaded and installed.
- HBase can hold a large number of files open at once, which may conflict with the system’s ulimit setting, which allows a default maximum of 1024 concurrently open files per user. To increase this limit, as the root user in the terminal window, edit the /etc/security/limits.conf file.
- At the bottom of the file, before the #End of file line, add these two lines:
hdfs - nofile 32768
hbase - nofile 32768
- Restart the system to allow the changes to take effect.
- Edit the /etc/hadoop/conf/hdfs-site.xml file to allow the DataNode to serve a larger number of files by adding the following property section to the file. (4096 is the minimum value that should be used, and yes, xcievers is spelled correctly.)
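The property section itself is not shown above; based on the Cloudera documentation this walkthrough follows (which is where the property name dfs.datanode.max.xcievers and the 4096 value come from), it should look like the snippet below, placed between the configuration tags:

```xml
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```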
- Restart HDFS by either restarting the server or using this command:
for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x restart ; done
- At this point HBase is installed in standalone mode, and it should be switched to a pseudo-distributed mode. Install the HBase Master, which controls the HBase system and its subordinate servers. Once again at the command line as root, enter the following:
yum install hbase-master
- Once the install completes, start the service with this command:
service hbase-master start
- Open a web browser and point it to the system name and port 60010, for example http://HadoopTest:60010. In CDH version 5, you’ll see a website with multiple tabs at the top that you can click through to check the status of the HBase system.
- Scroll down on the HOME page, and there should be one entry under the Region Servers section. Click on the Region Server link, and the webpage will refresh with the Region Server status information. At the bottom of the page is a link to return to the HBase Master webpage.
- Finally, in the terminal window, to access the HBase command line, enter the command:
hbase shell
- Check the version number of HBase with the command:
version
- Exit the HBase CLI with the command:
exit
INSTALL A THRIFT SERVER
- Now install a Thrift server and a REST server to communicate with HBase. As root in a terminal window, enter this command:
yum install hbase-thrift
and then start it with:
service hbase-thrift start
- Install the REST server next. At the command line, enter:
yum install hbase-rest
- The HBase REST server uses port 8080 by default, which is a commonly used port. It can be left as is, or changed. To alter the port, edit the config file /etc/hbase/conf/hbase-site.xml and add this property section between the configuration tags, using another port in the value section:
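The property section is not shown above; it should look like the following, where hbase.rest.port is the standard HBase property name and 8070 is just an example of an alternative port:

```xml
<property>
  <name>hbase.rest.port</name>
  <value>8070</value>
</property>
```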
- Still within the hbase-site.xml file, add these two additional properties:
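The two properties are not shown above; based on the standard CDH pseudo-distributed setup, they should look like the following. The hostname HadoopTest matches the example system name used earlier, and 8020 is the default CDH NameNode port; adjust both to match your own core-site.xml:

```xml
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://HadoopTest:8020/hbase</value>
</property>
```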
- The hostname in the value for hbase.rootdir must match the hostname in the fs.default.name or fs.defaultFS property in /etc/hadoop/conf/core-site.xml.
- Save the file.
- Create the hbase directory in HDFS and give ownership to the HBase account, by entering these two lines (be sure to use the sudo -u hdfs portion to make sure the command is run by the correct Hadoop user):
sudo -u hdfs hadoop fs -mkdir /hbase
sudo -u hdfs hadoop fs -chown hbase /hbase
- Before the HBase installation is complete, the ZooKeeper server needs to be installed and configured to run on a single server. Install the package by entering this command as root:
yum install zookeeper-server
- Create a Zookeeper folder in the local Linux file system and change the ownership by entering the following:
mkdir -p /var/lib/zookeeper
chown -R zookeeper /var/lib/zookeeper/
- Since this is a first-time installation of the ZooKeeper server, initialize and start ZooKeeper by entering these commands as root:
service zookeeper-server init
service zookeeper-server start
(You’ll receive a message about specifying a myid if you are running in non-standalone mode. It’s safe to ignore it.)
- Please note: for a production system, ZooKeeper should be installed with an odd number of servers to maintain reliability. Because this is a pseudo-distributed installation primarily for testing and development, it is OK to use a single ZooKeeper server.
- Once the ZooKeeper subsystem is running, start the hbase-master service again with this command:
service hbase-master start
- One more subordinate service for HBase needs to be installed now. To install the region server, type the following command as root:
yum install hbase-regionserver
- And start it with:
service hbase-regionserver start
- At this point, nine services should be running as part of the Hadoop YARN installation. To verify this, enter the following command in a terminal window as root:
jps
A list of ten processes with their PIDs should be returned. The tenth one is the Jps command itself.
- Open a web browser and point it to the system name and port 60010, for example http://HadoopTest:60010. Scroll down slightly to the Region Servers section on the HOME page, and there should now be two servers listed.
At this point, you have a basic Hadoop YARN system configured with HBase and ZooKeeper. You can access the various components using the command-line tools. Next time: the configuration of the Snappy compression library, and the first portion of installing and configuring Hue, Cloudera’s web interface for working with a Hadoop cluster.