It’s been a crazy busy month personally: I lost one family member, added a new one, celebrated some stuff, and Halloween was here. I’ve been looking into a variety of information on the web, so here is a roundup of some interesting topics until I have some time to devote to normal articles and tutorials.

RethinkDB – this newer Big Data database platform seems to be getting some traction. It’s aimed at real-time web applications and uses a JSON-style data structure (similar to MongoDB), but it also provides support for JOINs (which MongoDB doesn’t). PacktPub’s blog posted an article on Learning RethinkDB. The documentation on the platform’s website is very good compared to what is normally associated with Open Source projects.

While I was lounging around the back roads of Texas last week, Pentaho, Inc. was busy releasing version 5.1 of Kettle in both Enterprise and Community editions. I’ve downloaded and installed the CE version on my laptop and plan to put it through its paces this weekend. Pentaho’s website lists the improvements in more detail, but here are some of the highlights that look interesting to me.

There are new Big Data features including:

Public Data Sets

If you are just starting to work with Hadoop and Big Data, you may be at a loss for data to experiment with. Luckily, there is an abundant supply of freely available data sets on the Internet. Here I will highlight a few of the sources I have found, and I’ll add more as I find them.

InfoChimps is a company of data scientists, cloud computing experts, and open source experts who provide solutions that help their customers build Big Data platforms. They provide over 11,000 freely available data sets for you to download. Everything from an Excel-readable list of crossword puzzle words to UFO sighting data is here.