Several weeks back, I posted a tutorial on how to use the Update/Insert function in Pentaho Data Integration (PDI aka Kettle). Recently at work, I had an occasion to revisit the Update/Insert because a workflow using it was not getting all of the updated records. By switching it out with a workflow like that illustrated here, I was able to improve the runtime minutely. and also to rectify the problem.
For a setup, I use the same starting database that the Update/Insert tutorial uses. The instructions are reproduced here for ease of use:
Assumptions and requirements
For this tutorial, I am assuming you have access to a MySQL (or MariaDB) database server. We’ll be creating a a sample database based on one originally created by Fusheng Wang and Carlo Zaniolo at Siemens Corporate Research. For our purposes we only need one table with a small amount of data. Copy the script below and save it as a SQL file on your system. Run it in MySQL to create the database and populate the table (Yes Production is spelling incorrectly).
DROP DATABASE IF EXISTS sample;
CREATE DATABASE IF NOT EXISTS sample;
CREATE TABLE departments
( dept_no CHAR(4) NOT NULL,
dept_name VARCHAR(40) NOT NULL,
PRIMARY KEY (dept_no),
UNIQUE KEY (dept_name) );
INSERT INTO departments
Once you have it loaded, do a select on your departments table, and it should return five records.
Now copy the lines below into a text file using a basic text editor (Notepad, TextEdit, Notepad++, Gedit, etc) and save it as changes.txt to a location you can reach from within Pentaho. Notice we are adding four departments and correcting the spelling on the Production department.
USING TABLE OUTPUT AND UPDATE
- Open up Pentaho and create a new database connection to your sample database. Be sure to test the connection to make sure you can access the data.
- Create a new transformation, and from the Design tab, under the Input node, drag a Text File Input step onto the canvas.
- Double click the Text file input step, and on the first tab – File, click the Browse button and navigate to where you saved the changes.txt file, choosing OK on the file location window to be returned to the File tab. Click the Add button to move the file down to the Selected Files table.
- At this point, you can click the Show File Content button at the bottom to see what is in the text file. It should look like this:
- Close the Preview Data window and the Text File Input step.
- From the Output node drag a Table Output step. Create a normal hop between the two steps.
- Open the Table Output step and supply the connection information for the MySQL database that was created earlier in this tutorial. Leave the commit size set to the default value. Make sure the Truncate Table box is unchecked. Check the Specify database fields box.
- Switch to the Database Fields tab, and click the Get Fields button on the right side. The Fields to Insert grid should populate with the table name and stream field information.
- Click OK to exit the window.
- Drag an Update step from the Output node. Start to create a hop between the Table Output step and the Output node, and when the output type menu appears, choose Error Handling of Step.
- The hop will be created between the two steps, but it will be a dotted red line with a red X appearing on the hop. This denotes its an error handling step.
- Open the Update Step, and enter the connection information to the sample database and the departments table. Click the Get fields button next to the key(s) to look up the value(s) grid. Both of the fields from the departments table will be entered. Since the primary key is only the dept_no field, delete the dept_name line.
- To the right of the Update fields grid, click the Get Update fields button. Again the grid will be populated with both columns from the departments table. Because the update should not be updating the PK, remove the dept_no field from the grid.
- Click OK to close the update window, and save the transformation. Click the RUN button.The transformation should process very quickly, and the metrics window should appear similar to this:
- Notice that the Output step shows 4 records were Output and 1 was Rejected. The four new records were added to the deparments table, and one record that was already in the database was passed to the Error handling step. There the record was updated.
- Switching back to the MySQL console and doing a Select on the table will verify the results. Four new departments were added, and the spelling for the Production department was corrected.
Pentaho, Kettle and PDI are trademarks of Pentaho LLC.