Using MariaDB JDBC with Pentaho Kettle (PDI)

A few weeks ago I got an email from reader Zachary Nielsen asking some questions about using the MariaDB JDBC driver with Pentaho Data Integration (aka PDI or Kettle). He had gotten it working as a JNDI option in PDI but wanted to have MariaDB listed as a database option in the database connection window. I looked into it a bit, since I had not worked with the MariaDB JDBC connector, and here is what I found.

For those unfamiliar with MariaDB, it's a fork of MySQL by the original developers of MySQL, who had concerns over the acquisition of MySQL by Oracle. MariaDB can be used as a drop-in replacement for MySQL and works with the MySQL syntax, ports and tools (MySQL Workbench and the MySQL JDBC drivers), but additional functionality is also available if you want it. The MariaDB team also released a JDBC driver to work in place of the MySQL one that appears to process faster (although the benchmarks are almost two years old, so your mileage may vary).

In this part of the series, I’ll walk through setting up Pentaho DI to use the MariaDB JDBC driver. I’m still working on implementing the driver on a Pentaho ETL server, so that part of the series will come later.

CentOS – Show details during boot

By default CentOS 6 shows an animation while the system boots up, indicating its progress with either a rotating ring or a progress bar (in my experience physical machine installs show the rings, and VM installs show the progress bar). However, if you come from a sysadmin background or are responsible for monitoring one or more CentOS boxes, you may want to see what’s happening while the system comes up, rather than a simple animation.


A couple of years ago, Cloudera released Impala, an open source application for querying data stored in Hadoop with much of the familiar SQL syntax used by database professionals. Cloudera seems to have positioned Impala as a replacement for Hive and Pig and has taken some hits for it. Regardless of corporate motivations, because my day-to-day work over the past 8 years has revolved around using SQL to develop and administer various DB systems, I have taken a keen interest in Impala and how it might be useful. (I’m also interested in Hortonworks’ Stinger initiative to improve Hive, but that will be a different post.)

One of the biggest issues with open source applications, as I have noted before, is the lack of documentation and training materials for people trying to use them. Those of us who work in the corporate world don’t have the luxury of figuring stuff out on our own at our day jobs, so we often look beyond the supplied documentation for better resources for learning new applications. In the past year, two publishers have released books on Cloudera Impala. I will look at both, compare and contrast them, and tell you which one I think is better.
