Start MongoDB on Demand – Ubuntu

mongodbI use my laptop for development and testing on a number of different database platforms. And for a host of reasons, I like to keep programs installed locally rather than having to connect to another box whenever possible. That can present a challenge however because most database platforms want to start up when the computer starts up.  It makes sense (because typically these are server based applications) but I usually don’t want them to startup when I start the laptop up. My machine has enough to contend with when I am working without having to manage the overhead of several database, especially since I don’t always use each platform every day.

Recently, I started a project using MongoDB, and installed it via the Ubuntu Software Center. Because of some incompatibilities between Ubuntu 15.04 and MongoDB 3.0.4, the Software Center installed version 2.6.3. After restarting my laptop the next day, I noticed that Mongo starts when the laptop starts. Not good! So I poked around a bit to figure out how to disable this, and start it only when I want to start it. Continue reading

SQuirreL SQL Client for accessing different databases – Part 2

squirrel2In my last post on using SQuirrel SQL Client (aka Squirrel), I walked through downloading and installing it, and adding a connection to a locally hosted MySQL database. This time, I walk through using Squirrel to manipulate the MySQL database. I’m assuming you have the MySQL test database Sakila installed for this. If not, check out the documentation at the MySQL website.

OBJECTS DISPLAY
  1. Open Squirrel, and from the Aliases tab on the left, double click the entry you setup in Part 1, for the MySQL database. The Connect to window will appear. In the URL field, make sure you have specified a database name. For my setup, the URL reads: jdbc:mysql://localhost:3306/sakila
    mysql_login
  2. Enter your password in the appropriate spot and click Connect at the bottom.
  3. On the main work area in the center of the screen, a new window will appear with A LOT going on. In the upper left portion will be a drop down list labeled Catalog. Currently the list should show “sakila” as the selected entry. (Even though all of the databases you have access to on the server will show up in the Object list, in order to switch to a different database, you’ll need to select the new database from the Catalog drop down list.)catalog
  4. Below the Catalog list you’ll see two tabs (on Mac OS X you-ll see buttons instead of tabs): Objects and SQL. If Objects is not selected, click on it to make it active. Below Objects on the left will be a search field, and then below that will be an object tree, showing the MySQL databases on the server you are connected to, and at the bottom, a USERS entry. To the right you’ll see a set of tabs, a large number of them being reference ones:
    • FUNCTION names for numbers, strings and time/date and
    • KEYWORDS – reserved keywords for the particular database
    • TABLE and DATA TYPES -lists of the various kinds of tables and field types the particular database supports.
  5. Depending on the RDBMS you are connected to you may see system maintenance tabs as well.
    Objects

    SQuirreL SQL Client connected to MySQL database. Objects tab active.

    Please Note: None of the information displayed in the Objects area is editable by the user. It is read only.

  6. Double click the entry for “sakila” in the tree, to toggle it open and display its sub-tree. The tabs displayed on the right will change and only a few will be shown. The Info tab will give you some very brief Name information about the database. Click on the MySQL Open tables and you will see a list of the tables in the “sakila” database and if the table is in use and/or locked. Click on the MySQL Table Status tab, and you’ll get more information about each table and view in the database (MySQL Engine in use, how many rows in the table, etc).
  7. Double click the entry labeled TABLE under “sakila” to see a list of tables within the database. The window on the right will clear, until you select a table, so click on the first one, labeled “actor”.
    actor_table

    SQuirreL SQL Client connected to MySQL database with table selected

    • The first tab, Info again provides some basic naming information about the table.
    • Click the second tab (Content) and you can see the first 100 records in the table.
    • Row Count shows you how many records are in the table.
    • Columns provides information on the fields in the table.
    • There are several tabs showing information about Primary Keys and Exported and Imported Keys (Foreign Key relationships).
    • Several other tabs are also present, covering various settings and configurations of the database. Depending on the RDBMS you are connected to, what you see may vary.
  8. Below the TABLE entry in the Object list on the left, there is a VIEW entry. Double click that to see the VIEWS that are part of the database. In the “sakila” database there are seven of them, and clicking on each will update the window to the right with information about them. Of special interest is the last tab, labeled Source. Clicking on this will show the query that defines the view (very handy!)
  9. Below the VIEW entry, you will see one labeled PROCEDURE. Double clicking on this will show you the six Stored Procedures in the “sakila” database. Clicking on the name of any of them will provide information to the right on them. As with the VIEW section, you can see the source code for the Stored Procedure.
  10. Finally, below PROCEDURE, you will see an option labeled UDT for User Defined Types. If there are any UDTs defined in your database, they will display here. (Sakila has none).
  11. That’s the Objects side of the display!

Continue reading

CleanDataCLEAN DATA
by Megan Squire
Grade: A

One of the most time consuming, but also most important aspects of a data analysis project is cleaning the source data you are using. In a New York Times piece last year, this process was called data wrangling or data janitor work, and it was stated that 50 to 80 percent of a data scientist’s time will be spent on it.  The challenges can involve anything from removing extraneous characters to misspelled words to incorrect dates to any of hundreds of other issues. But the thing they all have in common is that before fixing the issue you have to understand what the issues are.

Megan Squire, professor of Computing Sciences at Elon University in North Carolina, attempts to break through some of these problems with a series of techniques to make the process more manageable in her book Clean Data from Packt Publishing.  Aimed at data scientists, this book also can benefit those who work in the ETL end of the process, since there is considerable cross over in functions.

Continue reading

SQuirreL SQL Client for accessing different databases – Part 1

squirrelIts been my experience that if you work on ETL projects, you eventually accumulate client software for a number of database systems on your development PC. The reason is pretty straightforward – you need to be able to access the systems you are working with to determine data types, schema structures, and occasionally to check that a User account and Password you have been given actually works.

One problem I’ve run into though is that not all operating systems are supported by different database vendors with their tools. While Windows has the largest installation base, Mac OS X, and Linux also are used for ETL development  but Microsoft’s SQL Server management tool will only work on Windows machines. Apple’s FileMaker software is similar, running on Mac OS X and Windows, but not Linux (since version 7). The examples go on and on. Also, because each tool is laid out differently, it can be difficult to find what you need quickly when you only work infrequently on a specific platform. Often times remembering where I need to go in a specific tool will take me longer than getting the actual information I was looking for.

All of this leads to the point of this post – using a free open source product call SQuirreL SQL Client to access multiple database platforms via one application regardless of whether you are running Windows, Mac OS X or any of a large variety of Linux distributions.

Continue reading