Run Talend Open Studio – Data Integration on Mac OS X 10.12

After downloading Talend Open Studio Data Integration this week, I ran into a similar issue to one I’ve experienced before while using Pentaho Data Integration. I extracted the TOS zip file to a folder under Applications and attempted to start the TOS_DI_macosx_cocoa application. I got an error message that Talend couldn’t access its resource library, and it shut down.

Since Talend DI runs on JAVA, I thought the issue might lie with the quarantine process in OS X Sierra, that has been “improved”. Well I was right! Keep reading to see how to get TOS to run on your Mac!

Continue reading

Running Kettle (Pentaho Data Integration) on Mac OSX 10.12 Sierra

A new version of Mac OSX and a new version of Pentaho Data Integration (aka Kettle) but the same old problem getting Kettle to run. Apple tries to keep their operating system locked down and secure, so if you download applications from the Internet that aren’t from the Apple App Store, the files are quarantined.

With the update to Sierra, the quarantine process has been “improved”. Keep reading to see how to do it!

Continue reading

Extract PDF data with Tabula

tabulaAdobe’s PDF file format is a wonderful tool, allowing users on disparate operating systems to share documents easily. Because Adobe made the file format an open-standard in 2008, applications to create and read PDF files readers can be found on pretty much every operating system you can think of – Linux distros, Windows, Mac OS X and BSD,  just to name a few, from no cost to several hundred dollars. And in most cases, the original document, if not identical to the PDF, is close enough to identical to make the differences irrelevant.

In my work as a BI developer, occasionally I have to extract data from PDF documents and get it into a database. While reading the file is no problem, getting the data out in a usable format where I am not having to retype or reformat the output excessively is often times not so easy. Luckily I have come across an open-source tool, called Tabula,  that makes extracting data from a PDF much easier. It doesn’t work for every PDF, only on text-based PDFs. That means reports and data sets that were exported to a PDF file, rather than documents that were scanned into a computer and saved as a PDF file. (The latter tends to be image type files rather than text based documents.)

Continue reading

SQuirreL SQL Client for accessing different databases – Part 2

squirrel2In my last post on using SQuirrel SQL Client (aka Squirrel), I walked through downloading and installing it, and adding a connection to a locally hosted MySQL database. This time, I walk through using Squirrel to manipulate the MySQL database. I’m assuming you have the MySQL test database Sakila installed for this. If not, check out the documentation at the MySQL website.

OBJECTS DISPLAY
  1. Open Squirrel, and from the Aliases tab on the left, double click the entry you setup in Part 1, for the MySQL database. The Connect to window will appear. In the URL field, make sure you have specified a database name. For my setup, the URL reads: jdbc:mysql://localhost:3306/sakila
    mysql_login
  2. Enter your password in the appropriate spot and click Connect at the bottom.
  3. On the main work area in the center of the screen, a new window will appear with A LOT going on. In the upper left portion will be a drop down list labeled Catalog. Currently the list should show “sakila” as the selected entry. (Even though all of the databases you have access to on the server will show up in the Object list, in order to switch to a different database, you’ll need to select the new database from the Catalog drop down list.)catalog
  4. Below the Catalog list you’ll see two tabs (on Mac OS X you-ll see buttons instead of tabs): Objects and SQL. If Objects is not selected, click on it to make it active. Below Objects on the left will be a search field, and then below that will be an object tree, showing the MySQL databases on the server you are connected to, and at the bottom, a USERS entry. To the right you’ll see a set of tabs, a large number of them being reference ones:
    • FUNCTION names for numbers, strings and time/date and
    • KEYWORDS – reserved keywords for the particular database
    • TABLE and DATA TYPES -lists of the various kinds of tables and field types the particular database supports.
  5. Depending on the RDBMS you are connected to you may see system maintenance tabs as well.
    Objects

    SQuirreL SQL Client connected to MySQL database. Objects tab active.

    Please Note: None of the information displayed in the Objects area is editable by the user. It is read only.

  6. Double click the entry for “sakila” in the tree, to toggle it open and display its sub-tree. The tabs displayed on the right will change and only a few will be shown. The Info tab will give you some very brief Name information about the database. Click on the MySQL Open tables and you will see a list of the tables in the “sakila” database and if the table is in use and/or locked. Click on the MySQL Table Status tab, and you’ll get more information about each table and view in the database (MySQL Engine in use, how many rows in the table, etc).
  7. Double click the entry labeled TABLE under “sakila” to see a list of tables within the database. The window on the right will clear, until you select a table, so click on the first one, labeled “actor”.
    actor_table

    SQuirreL SQL Client connected to MySQL database with table selected

    • The first tab, Info again provides some basic naming information about the table.
    • Click the second tab (Content) and you can see the first 100 records in the table.
    • Row Count shows you how many records are in the table.
    • Columns provides information on the fields in the table.
    • There are several tabs showing information about Primary Keys and Exported and Imported Keys (Foreign Key relationships).
    • Several other tabs are also present, covering various settings and configurations of the database. Depending on the RDBMS you are connected to, what you see may vary.
  8. Below the TABLE entry in the Object list on the left, there is a VIEW entry. Double click that to see the VIEWS that are part of the database. In the “sakila” database there are seven of them, and clicking on each will update the window to the right with information about them. Of special interest is the last tab, labeled Source. Clicking on this will show the query that defines the view (very handy!)
  9. Below the VIEW entry, you will see one labeled PROCEDURE. Double clicking on this will show you the six Stored Procedures in the “sakila” database. Clicking on the name of any of them will provide information to the right on them. As with the VIEW section, you can see the source code for the Stored Procedure.
  10. Finally, below PROCEDURE, you will see an option labeled UDT for User Defined Types. If there are any UDTs defined in your database, they will display here. (Sakila has none).
  11. That’s the Objects side of the display!

Continue reading