Send Push Notifications from your Kettle jobs – Android

When supporting ETL processes, one of the things you have to be conscious of is job failures. To get timely notification of those failures, you can include email steps in your workflows that fire when a failure occurs. Another option, one that many Pentaho Data Integration users may not be aware of, is sending push notifications from your workflows. A developer named Joel Latino has contributed plugins to the Pentaho Marketplace that can be used to send notifications to Android and Apple iOS devices.

In this post, I’ll look at sending notifications to Android devices, and in a follow-up article I hope to cover sending to iOS using Joel’s plugins.
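Joel’s plugins wrap the delivery details inside a PDI job entry, but to give a rough idea of what a push delivery involves, here is a minimal, self-contained Java sketch. This is not the plugin’s code: it assumes Google’s FCM HTTP push endpoint, and the server key, device token, and job name are hypothetical placeholders.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PushOnFailure {

    // Hypothetical values -- substitute your own push-service key and device token.
    private static final String SERVER_KEY = "YOUR_SERVER_KEY";
    private static final String DEVICE_TOKEN = "TARGET_DEVICE_TOKEN";

    public static void main(String[] args) throws Exception {
        String jobName = args.length > 0 ? args[0] : "nightly_load";

        // JSON payload: a title/body notification addressed to one device.
        String payload = "{"
                + "\"to\":\"" + DEVICE_TOKEN + "\","
                + "\"notification\":{"
                + "\"title\":\"Kettle job failed\","
                + "\"body\":\"Job " + jobName + " reported an error\"}"
                + "}";

        // POST the payload to the push gateway (assumed FCM endpoint).
        HttpURLConnection conn = (HttpURLConnection)
                new URL("https://fcm.googleapis.com/fcm/send").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Authorization", "key=" + SERVER_KEY);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes(StandardCharsets.UTF_8));
        }

        // A 200 response means the gateway accepted the message for delivery.
        System.out.println("Push gateway responded: " + conn.getResponseCode());
    }
}
```

The plugin’s job entry takes care of a call like this for you; the sketch is only meant to show roughly what happens on the wire.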

Continue reading

RU ready 4 some football?

Posts have been light here for a few weeks due to a number of factors, but I should resume my normal blogging soon.

In the interim, I did want to share this picture I took at the Cleveland Browns training camp a couple of weeks ago. That’s Johnny Manziel working on his passing in the rain, a few days after the Browns lost their preseason exhibition game to the Detroit Lions. Click on the picture for a larger view. And keep your fingers crossed as they start their season this weekend against the Pittsburgh Steelers.

Emailing files from Kettle

A common task I encounter when working with ETL tools is sending output files somewhere. I often have to FTP files, but just as often I need to email them. I’ll cover FTP in a future post; this time I’ll walk through how to set up a job in Pentaho Kettle (aka Pentaho Data Integration, or PDI) to email data files.

Unlike the “Put FTP” step in PDI, where you specify the file or files to upload directly in the job component, sending files via email requires a transformation step that defines the files you want to send and then pipes that information into the Email step. This is similar to how variables work in Pentaho: you define them in one step before you can use them in another.

If this is something you need to do, and you want to know how to do it, read on!

At its most basic level, this kind of task in PDI is very simple, building on the familiar task of creating output files, whether they are text, Excel, or another format. Once the output files are created, sending them via email involves only a couple of steps.
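To make those steps concrete, here is a minimal Java sketch of what emailing an output file as an attachment boils down to. This is not the PDI Mail step itself, just the underlying JavaMail idea; the SMTP host, addresses, and file path are hypothetical placeholders.

```java
import java.util.Properties;
import javax.activation.DataHandler;
import javax.activation.FileDataSource;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

public class EmailOutputFile {

    public static void main(String[] args) throws Exception {
        // Hypothetical SMTP server and addresses -- replace with your own.
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com");
        Session session = Session.getInstance(props);

        MimeMessage message = new MimeMessage(session);
        message.setFrom(new InternetAddress("etl@example.com"));
        message.setRecipients(Message.RecipientType.TO,
                InternetAddress.parse("ops@example.com"));
        message.setSubject("Nightly extract");

        // Plain-text body part.
        MimeBodyPart body = new MimeBodyPart();
        body.setText("Attached is the output file from tonight's run.");

        // Attachment part -- the output file the transformation produced.
        MimeBodyPart attachment = new MimeBodyPart();
        attachment.setDataHandler(new DataHandler(
                new FileDataSource("/tmp/output/extract.csv")));
        attachment.setFileName("extract.csv");

        // Combine body and attachment and send.
        MimeMultipart multipart = new MimeMultipart();
        multipart.addBodyPart(body);
        multipart.addBodyPart(attachment);
        message.setContent(multipart);

        Transport.send(message);
    }
}
```

Inside PDI you don’t write this code yourself; the transformation supplies the filenames and the Email step performs the equivalent of the attachment handling shown here.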

Continue reading

Debugging ETL with Error Output


When developing new ETL flows, you should include steps for error output at an early stage so you can more easily locate and fix problems, especially when moving between different data platforms. As an example, I have recently been working on data transformations that move records on a regular basis between PostgreSQL, Microsoft SQL Server, and a DB2 mainframe system. All of these systems use similar data types, but not always the same ones, and the way they handle those types can vary as well.

Generally what I do is create an On Error step at the destination points in the transformation and dump any bad records to a text file. This lets me review whether the error affects all of the records (which would indicate a problem with the configuration) or only some of them (which may indicate a problem with how the step is coded). In some cases the error output even reveals an issue with the source data that wasn’t foreseen! The biggest benefit, however, is that the complete sample of records runs through your workflow, so you can see how well it processes different values rather than having it fail on the first error.
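For readers who think better in code, here is a minimal plain-Java sketch of the same pattern the error output implements: bad records are diverted to an error file along with the reason, and the run continues instead of stopping at the first failure. The file names and the parsing rule are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ErrorOutputExample {

    public static void main(String[] args) throws IOException {
        // Hypothetical input and error-output files.
        Path input = Paths.get("incoming/records.csv");
        Path errorOut = Paths.get("errors/bad_records.txt");

        List<String> badRecords = new ArrayList<>();
        int good = 0;

        for (String line : Files.readAllLines(input)) {
            try {
                // "Destination" rule for this sketch: the third column must be numeric.
                String[] fields = line.split(",");
                Double.parseDouble(fields[2]);
                good++;
            } catch (RuntimeException e) {
                // Instead of failing the whole run, divert the bad record
                // (plus the reason it failed) to the error output and keep going.
                badRecords.add(line + "  <-- " + e);
            }
        }

        Files.createDirectories(errorOut.getParent());
        Files.write(errorOut, badRecords);

        System.out.println(good + " records loaded, "
                + badRecords.size() + " diverted to " + errorOut);
    }
}
```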

Continue reading