Pentaho: To Lookup or Not to Lookup

Generally, when developing an ETL process, if you have to replace a value from a source with a corresponding value, you should use a lookup table. For example, if you were replacing a country abbreviation with the full name of the country, you could have a simple two-column table with the abbreviation in one column and the full name in the other. A lookup table makes it easy to update values, add new ones, or delete obsolete ones. It also gives you the added benefit of being able to reuse the table elsewhere if you need to. But what if you have a small group of items (say, a dozen or fewer) that you need to replace? In that case you might want to look at using Pentaho’s Value Mapper component.
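To make the trade-off concrete, here is a minimal sketch in plain Python rather than Pentaho itself. It assumes a hypothetical `country_lookup` table held in an in-memory SQLite database, and a hypothetical `STATUS_MAP` dictionary that stands in for the kind of small, fixed mapping a Value Mapper would hold. All names and sample values are illustrative and do not come from an actual transformation.

```python
import sqlite3

# Hypothetical two-column lookup table: abbreviation -> full country name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE country_lookup (abbr TEXT PRIMARY KEY, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO country_lookup (abbr, name) VALUES (?, ?)",
    [("US", "United States"), ("CA", "Canada"), ("MX", "Mexico")],
)


def lookup_country(abbr, default=None):
    """Return the full name for an abbreviation, or default if it is missing."""
    row = conn.execute(
        "SELECT name FROM country_lookup WHERE abbr = ?", (abbr,)
    ).fetchone()
    return row[0] if row else default


# Hypothetical small, fixed mapping kept inline instead of in a table,
# roughly the situation where an in-step value mapping is attractive.
STATUS_MAP = {"A": "Active", "I": "Inactive", "P": "Pending"}


def map_status(code, default="Unknown"):
    """Return the mapped value for a code, or default if there is no match."""
    return STATUS_MAP.get(code, default)


# Adding a value to the lookup table is a data change, not a code change.
conn.execute(
    "INSERT INTO country_lookup (abbr, name) VALUES (?, ?)", ("DE", "Germany")
)
```

The table version can be updated and shared by other processes without touching the code, while the inline mapping is quicker to set up but has to be edited wherever it is defined.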

Mac OS X Yosemite and Pentaho Kettle 5.3

The Pentaho Community Edition 5.3 Business Intelligence suite was released a few weeks ago, and I downloaded the ETL application tonight to install on my Mac. As with the past few versions, the application generates an error when you try to start it on Mac OS X because of security features in the operating system. I’ve previously covered a couple of ways to overcome those security issues, and recently reader Ian emailed me with a third method that I decided to try out.

Using MariaDB JDBC with Pentaho Kettle (PDI)

A few weeks ago I got an email from reader Zachary Nielsen asking some questions about using the MariaDB JDBC driver with Pentaho Data Integration (aka PDI or Kettle). He had gotten it working as a JNDI option in PDI but wanted to have MariaDB listed as a database option in the database connection window. I looked into it a bit, since I had not worked with the MariaDB JDBC connector, and here is what I found.

For those unfamiliar with MariaDB, it’s a fork of MySQL by the original developers of MySQL, who had concerns over the acquisition of MySQL by Oracle. MariaDB can be used as a drop-in replacement for MySQL and works with the MySQL syntax, ports, and tools (MySQL Workbench and the MySQL JDBC drivers), but additional functionality is also available if you want it. The MariaDB team also released a JDBC driver to work in place of the MySQL one that appears to process faster (although the benchmarks are almost two years old, so your mileage may vary).

In this part of the series, I’ll walk through setting up Pentaho DI to use the MariaDB JDBC driver. I’m still working on implementing the driver on a Pentaho ETL server, so that part of the series will come later.