Pentaho Data Integration (aka PDI or Kettle) is one of the most fully-featured tools for extracting data from a MongoDB environment. MongoDB stores information in documents instead of records: the data for a distinct subject instance lives in a single document, where a traditional database might spread it across multiple tables linked via primary and foreign keys and reassembled with joins. This paradigm generally makes retrieval quicker than with a comparable relational database system, since no joins are needed to assemble the data. If you are using PDI to connect to MongoDB, MongoDB will probably be the initial source or final destination for the data. For this article, I’ll cover how to use PDI to extract information from a MongoDB collection and save it to a text file. The data could just as easily be passed on to another database or manipulated for other processing downstream.
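To make the document-versus-tables idea concrete, here is a minimal Python sketch of what "extract a document and save it to a text file" amounts to. It is only an illustration, not how PDI works internally; the `customer` document, its field names, and the helper functions `flatten_orders` and `to_delimited_text` are all made up for this example, and no MongoDB connection is involved.

```python
import csv
import io

# Hypothetical document, shaped the way a MongoDB collection might store it.
# In a relational design this would likely be three tables:
# customers, addresses, and orders.
customer = {
    "_id": 1,
    "name": "Ada Lovelace",
    "address": {"city": "London", "country": "UK"},
    "orders": [
        {"order_id": 100, "total": 25.0},
        {"order_id": 101, "total": 40.5},
    ],
}


def flatten_orders(doc):
    """Yield one flat row per order, repeating the parent customer fields."""
    for order in doc.get("orders", []):
        yield {
            "customer_id": doc["_id"],
            "name": doc["name"],
            "city": doc["address"]["city"],
            "order_id": order["order_id"],
            "total": order["total"],
        }


def to_delimited_text(docs, delimiter="|"):
    """Render documents as delimited text, one line per order."""
    fields = ["customer_id", "name", "city", "order_id", "total"]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for doc in docs:
        writer.writerows(flatten_orders(doc))
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_delimited_text([customer]))
```

Everything about the customer comes out of one document with no joins; the only real work is deciding how nested fields and arrays map onto flat rows, which is the same decision you make when configuring the extraction in PDI.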
I do make one assumption: you have a development MongoDB environment set up and running.