Create a Pentaho Kettle Repository on SQL Server

As I have stated previously when creating ETL workflows, its useful to store the information in a database repository, rather than as individual files on your workstation. This allows multiple users to have access to the information (why recreate the wheel?),  it allows you to pull it into your jobs quickly and easily, and you can back it up quickly and restore it if necessary. With the community version 7.0 of PENTAHO® DATA INTEGRATION (PDI), I am happy to report that you can finally create a repository for your ETL code on Microsoft SQL Server. Previously, you could setup a repository on MySQL or PostgreSQL with the community edition but there were compatibility problems with the code that Kettle used that didn’t work with SQL Server. After downloading the latest version I was attempting to make a connection to SQL Server, and decided to test setting up a repository again. I am happy to say it works so the remainder of this article will walk through the process of setting up a Pentaho repository on SQL Server 2016 from a Windows 10 machine.

Prerequisites:

  • Download the jTDS open source SQL Server JDBC driver. Extract the ZIP file, and copy the jtds-1.3.1.jar file from your download and save it into the data-integration\lib folder of your Pentaho application. Although Microsoft provides a JDBC driver, it did not work for me.
  • Create an empty database on your Microsoft SQL Server. I created one called “PentahoRepository”
  • Setup a SQL Server user account (not an Active Directory account) on your database server and give the account  DBO (owner) permissions on the database. Using a DDLADMIN level does not work. I created my account and called it “repository”. I also set the default database for this account to the new database.

Now that we have our prerequisites setup, we can start the PDI client.

Continue reading

Running Pentaho Kettle 7 on Windows 10

Recently I switched PCs to a newer Windows 10 based laptop for some of my work, and I wanted to get Pentaho Data Integration up and running on it. I downloaded the pdi-ce-7.0.0.0.25.zip file from the Community website, and extracted the contents to a folder in my Program Files directory. I tried running the SPOON.BAT to start it up but a window flashed on screen quickly and disappeared, but nothing else happened. I opened a command prompt and executed the SPOON.BAT file, but got a message that the JAVAW.EXE file could not be found. So I needed to perform a few other things to get it working.

A quick search engine query showed me that many people had the same issue, but there didn’t seem to be a consensus on how to resolve it. Below is how I managed to get it running.

Continue reading

Microsoft SQL Server on Docker (part 1)

sql_on_dockerAfter last week’s announcement that Microsoft has released a Public Preview version of SQL Server that would run on Linux platforms (currently Ubuntu 16.04 and Red Hat Enterprise 7.2), I wondered if a Docker version was going to be made available as well. Checking out Microsoft’s SQL Server v.Next website, I quickly saw that Lo and behold! there was! As I stated previously, a good portion of my professional career has been spent working with Microsoft SQL Server and a lot of my spare time is spent experimenting with Linux and Open Source applications. The merging of the two is hugely exciting to me and the ability to run SQL Server in a Docker container is amazing!

Previous Docker containers of Microsoft products have been a mixed bag of requirements. Some required you to run the Docker engine on a Windows platform in order for it to work, while others could be run on different Linux variants. Even the previously released version of SQL Server Express, the free version of SQL would only work on Windows hosted containers. But so far it seems with this version of SQL Server, Microsoft may be starting to explore the open source world possibilities.

There are some caveats though:

Continue reading

Docker Fun – Play Text Adventures

zorkThis time around in the Docker Fun series, we are getting a little more ambitious. Zork was one of the first popular text adventures, building on the work of Will Crowther and Dan Wells who created the mainframe game Colossal Cave (aka Adventure). Zork was originally written to run on a DEC PDP-10 system, and was later ported to just about every personal computer that was available. While text adventures at that time were struggling with two word commands like “Open Door”, Zork understood much more elaborate command like “Hit the grue with the Elvish sword”. The developers of Zork founded Infocom and eventually released a number of sequels and prequels to Zork as well as several dozen other text adventures in a number of different genres. Eventually the company was sold to Activision and the Infocom games have since lapsed into a sort of purgatory. Technically the download file for Zork is in violation of copyright laws, so feel free to substitute one of the many free Z5 files available on the Internet.

Continue reading