Mac OS X Yosemite and Pentaho Kettle 5.2

The last two weeks have brought a couple of new software releases, and, as always, not everything plays well together out of the box. Apple released Mac OS X Yosemite (10.10) as a free upgrade, and Pentaho released Kettle (also known as Pentaho Data Integration, or PDI) community edition 5.2.0. Upgrading both on my Mac went very smoothly, although the Apple update does take several hours.

However, when I tried to run Kettle, I first got a message that Java was not available. After several security issues with Java, Apple removed it from OS X earlier this year, and apparently the Yosemite update removes it again. It's an easy fix: the message that appears when you try to run a Java-based app points to where you can download an updated version for OS X. Download and install it.
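Before launching Kettle again, it can help to confirm that a Java runtime is actually visible. The following is a minimal sketch, not part of Kettle itself: `find_java` is a hypothetical helper name, and the code assumes Python 3.3 or later (for `shutil.which`). It checks `$JAVA_HOME/bin/java` first and then searches `PATH`:

```python
import os
import shutil


def find_java(env=None):
    """Hypothetical helper: return the path of a usable `java` binary, or None.

    Checks $JAVA_HOME/bin/java first, then falls back to searching PATH.
    """
    if env is None:
        env = os.environ
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        # Accept JAVA_HOME only if it really contains an executable java binary.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # shutil.which() returns None when `java` is not found on the search path.
    # If env has no PATH entry, it falls back to the process's own PATH.
    return shutil.which("java", path=env.get("PATH"))


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("No Java runtime found; install one before starting Kettle.")
    else:
        print("Java found at " + java)
```

Running `java -version` in Terminal gives the same answer interactively; the script is only useful if you want to automate the check.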

Try to run PDI again, and now an error message says the file is damaged. Sigh.

Set Up a Single-node Hadoop YARN Machine Using CDH5 – Part 4

This is part 4 of a series about setting up a single-node Hadoop YARN system for sandbox use. Part 1 was here, part 2 here, and part 3 here. I have another series for using MapReduce v1 (MRv1), which is here. I'm hoping to keep this series in the same order as the original set of articles and will deviate only when necessary. All the content here is based on the Cloudera documentation, but I've modified it to make setting up a pseudo-distributed cluster easier to follow, and added content where necessary.

Please be careful when copying lines from these articles to paste into Hadoop config files or a terminal window. I have found that the double hyphens used in comment lines may be copied as a single long dash instead, which is likely to cause errors when you try to run the various components.
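If a pasted line looks suspect, the long dashes can be swapped back before the text goes into a config file. This is a minimal sketch, assuming Python 3 and assuming the substituted characters are the Unicode en dash (U+2013) or em dash (U+2014); `normalize_dashes` and `find_suspect_lines` are hypothetical helper names:

```python
# Typographic dashes that blog software may substitute for "--".
# Assumption: only the en dash and em dash need to be handled.
DASH_REPLACEMENTS = {
    "\u2013": "--",  # EN DASH
    "\u2014": "--",  # EM DASH
}


def normalize_dashes(text):
    """Hypothetical helper: replace en/em dashes with an ASCII double hyphen."""
    for bad, good in DASH_REPLACEMENTS.items():
        text = text.replace(bad, good)
    return text


def find_suspect_lines(text):
    """Hypothetical helper: list (line_number, line) pairs that contain a long dash."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if any(dash in line for dash in DASH_REPLACEMENTS)
    ]
```

For example, `normalize_dashes` turns a mangled XML comment opener such as `<!–` back into `<!--`, and `find_suspect_lines` reports which lines of a pasted file still need a look.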

Before starting, make sure that Python 2.6 or 2.7 is installed on the server. This is easy to check: open a terminal window and, at the command line, enter: python

If Python is installed, the interpreter starts and displays its version. On my test PC, it reported Python 2.6.6. To return to the command line, enter the following at the Python prompt: quit()
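The same check can be scripted by parsing the version string the interpreter reports (for example `Python 2.6.6`, which is also what `python -V` prints). This is a minimal sketch; `parse_python_version` and `is_supported` are hypothetical names, and the code sticks to syntax that runs on Python 2.6, 2.7, and 3:

```python
import re

# Versions accepted, per the Python 2.6 or 2.7 requirement above.
# A tuple (not a set literal) keeps this runnable on Python 2.6.
SUPPORTED = ((2, 6), (2, 7))


def parse_python_version(banner):
    """Hypothetical helper: extract (major, minor, micro) from 'Python 2.6.6'."""
    match = re.search(r"Python (\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        return None  # no version string found, e.g. "command not found"
    major, minor, micro = match.groups()
    return (int(major), int(minor), int(micro or 0))


def is_supported(banner):
    """Hypothetical helper: True if the banner reports Python 2.6.x or 2.7.x."""
    version = parse_python_version(banner)
    return version is not None and version[:2] in SUPPORTED
```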


Set Up a Single-node Hadoop YARN Machine Using CDH5 – Part 3

This is part 3 of a series about setting up a single-node Hadoop YARN system for sandbox use. Part 1 was here, and part 2 here. I have another series for using MapReduce v1 (MRv1), which is here. I'm hoping to keep this series in the same order as the original set of articles and will deviate only when necessary. All the content here is based on the Cloudera documentation, but I've modified it to make setting up a pseudo-distributed cluster easier to follow, and added content where necessary.


This entry is fairly short, but next time I'll delve into installing HUE.

Set Up a Single-node Hadoop YARN Machine Using CDH5 – Part 2

This is part 2 of a series about setting up a single-node Hadoop YARN system for sandbox use. Part 1 was here; for the series on using MapReduce v1 (MRv1), go here. I'm hoping to keep this series in the same order as the original set of articles and will deviate only when necessary. All the content here is based on the Cloudera documentation, but I've modified it to make setting up a pseudo-distributed cluster easier to follow, and added content where necessary.


Install HBase
