Photo Break!

Once in a while I like to post a picture I’ve taken if I like how it turned out. The one above is an example. :)

It was taken at 1/8000 of a second, f/11, and ISO 4000, using a 100mm lens and an off-camera flash to the left with a blue gel. If none of that makes any sense to you, don’t worry. I’ll be back to posting about ETL/Big Data soon.


Pentaho – Using Database Lookups

At my previous day job, we were often tasked with producing data extracts for distribution to outside companies who sold our products. To accomplish this, we used a well-known reporting platform to produce either Excel or CSV format files. While it worked, it was like using a hammer to drive in a screw. It’s just not the best way to accomplish your goal. So during a rare period of calm, I dissected a couple of our extract reports and attempted to convert them to Pentaho Data Integration (aka Kettle).

Because we were combining data from multiple sources into one unified output file, each extract was made up of several queries. The first query would pull the bulk of the data, and then specific fields from those results were used as filters to pull additional fields from tables in other data sources. This concept can be used in Pentaho as well, making it unnecessary to temporarily store the data from the first extract and join it later with data from subsequent extracts. An additional benefit is that processing in this manner tends to be much quicker.

This type of processing can also be used for denormalizing data when building a data warehouse and in many other scenarios when working with ETL processes.
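To make the idea concrete, here is a minimal sketch of the lookup pattern in plain Python. It is not Kettle itself, just an illustration under stated assumptions: two in-memory SQLite databases stand in for the separate data sources, and the table names, column names, and the `extract_with_lookup` helper are all hypothetical. The small cache is my own addition so repeated keys do not hit the second source again.

```python
import sqlite3

# Two separate in-memory databases stand in for two independent data sources.
# All table and column names here are hypothetical, chosen for illustration.
sales_db = sqlite3.connect(":memory:")
product_db = sqlite3.connect(":memory:")

sales_db.execute("CREATE TABLE sales (order_id INTEGER, product_id INTEGER, qty INTEGER)")
sales_db.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 10, 2), (2, 11, 1), (3, 99, 5)],
)

product_db.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT)")
product_db.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(10, "Widget"), (11, "Gadget")],
)


def extract_with_lookup(main_conn, lookup_conn):
    """Stream rows from the main query and add a field looked up by key."""
    cache = {}  # remembers keys already looked up (including misses)
    main_rows = main_conn.execute(
        "SELECT order_id, product_id, qty FROM sales ORDER BY order_id"
    )
    for order_id, product_id, qty in main_rows:
        if product_id not in cache:
            # A field from the first result is used as the filter
            # for the query against the second source.
            hit = lookup_conn.execute(
                "SELECT name FROM products WHERE product_id = ?",
                (product_id,),
            ).fetchone()
            cache[product_id] = hit[0] if hit else None
        yield (order_id, product_id, qty, cache[product_id])


rows = list(extract_with_lookup(sales_db, product_db))
for row in rows:
    print(row)
```

Each row from the first query is enriched as it passes through, so nothing from the first extract has to be stored and joined afterwards. A key with no match in the second source simply gets `None` for the looked-up field.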


A short explanation for why the site has been quiet for a couple of weeks: In my last post I referenced updating to OS X Yosemite. While the operating system has been performing fairly well, one installed software package has not.

For the most part, I use virtual machines to develop and test different things like Hadoop, SQL Server, Pentaho and various other items. This allows me to try out multiple packages and operating systems without the expense of a lot of physical boxes. For the host software, I have been using Fusion, the workstation software from VMware that allows you to run multiple virtual systems on a Mac. The last time I upgraded my Mac’s OS (less than a year ago to Mavericks), I had to fork over $45 to upgrade Fusion to overcome some issues with the new OS.

Mac OS X Yosemite and Pentaho Kettle 5.2

The last two weeks have seen a couple of new software releases, and as always, not everything plays well together out of the box. Apple released Mac OS X Yosemite (10.10) as a free upgrade, and Pentaho released Kettle (aka Pentaho Data Integration, or PDI) community version 5.2.0. Upgrading both on my Mac went very smoothly (the Apple update does take several hours).

However, once I tried to run Kettle, I first got a message that Java was not available. After several security issues with Java, Apple removed it from OS X earlier this year. Apparently during the Yosemite update, Java gets removed again. It’s an easy fix to get it back, and the message that appears when trying to run a Java-based app points to where to download an updated version for OS X. Download and install that.

Try to run PDI again, and an error message about the file being damaged is displayed. Sigh.