This is part 4 of my series on how to set up a single-node Hadoop pseudo-cluster using Cloudera’s CDH distribution. In part 1, I walked through the initial steps to configure a CentOS 6.5 system for use as a single-node Hadoop pseudo-cluster. In part 2, I walked through installing and configuring HBase, ZooKeeper and Snappy. Part 3 covered installing Hue, the graphical front end from Cloudera that makes it easier to interact with the various components in our cluster.
In this installment, we’ll set up Oozie, a job scheduler component for Hadoop, and fix a minor problem with the Hue setup. This series is based on the Cloudera documentation, but I’ve modified it to be easier to follow when setting up a pseudo-cluster.
Update: Please be careful if you copy lines from these articles and paste them into your Hadoop config files. I have found that the double hyphens used in the comment lines may copy over as a single long dash instead. This is likely to cause problems when you try to run the various components, so check any pasted comment lines and retype the hyphens if needed.
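If you would rather check pasted files automatically than by eye, here is a minimal sketch of a checker. It is my own illustration, not something from the Cloudera documentation: the function name `find_suspect_dashes` is made up, and it assumes the substituted character is a Unicode en dash (U+2013) or em dash (U+2014) and that the files are UTF-8 text.

```python
import sys

# Typographic dashes that may be substituted for "--" when copying from a web page.
# Assumption: only these two code points are of interest.
SUSPECT_DASHES = {"\u2013": "EN DASH", "\u2014": "EM DASH"}


def find_suspect_dashes(text):
    """Return a list of (line_number, column, name) for each typographic dash."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT_DASHES:
                hits.append((line_no, col, SUSPECT_DASHES[ch]))
    return hits


if __name__ == "__main__":
    # Each command-line argument is treated as a path to a config file.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for line_no, col, name in find_suspect_dashes(f.read()):
                print(f"{path}:{line_no}:{col}: {name}")
```

Saved under a name of your choosing, the script can be run with one or more config file paths as arguments. It only reports where the suspect characters are, so you can fix them by hand; it does not modify the files.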