In the IT world over the past few years there has been a lot of talk / discussion / consternation / anticipation about the concept of “BIG DATA”. (You have to put it in quotes and all caps.) If you don’t work or live with the technology sector, it may have passed you by, but lately it’s started to seep into the mainstream consciousness. Like many fashionable terms, what exactly is meant often isn’t explained. It’s just assumed that people know. After all, it’s the next big thing. You should already know what it is.
It’s been stated in many articles I’ve seen that more than 80% of the world’s data has been created in the past couple of years. While that certainly sounds impressive, let’s look a little closer at “data”.
Data isn’t information, and it’s certainly not knowledge. A lot of data is just noise. If you post on Twitter what you had for lunch, that’s data (technically it’s a datum, because it’s one piece of data), but it’s not useful information for most people. (Your spouse might find it useful in deciding what to do for dinner, but most people aren’t going to care.) If your BFF went to a movie last night without you and texted you a spoiler from the cinema, that’s data as well.
To garner information, that raw data has to be evaluated and sifted to develop facts, concepts, or an understanding of an idea. Newton being hit on the head by an apple falling from a tree is, in isolation, data. Newton taking multiple pieces of data, such as releasing a book in the air and watching it fall to earth, combining them with his observations about the noggin-thumping fruit, and gaining an understanding of gravity: that is information.
Big data is taking huge amounts of what may be unrelated bits and pieces of data, combing through them, and trying to find something useful or insightful. Take all of the Google searches for a given week and try to discern some kind of pattern in what people were searching for. You could find out pretty easily which TV shows were popular, or who was in the news that week and got their fifteen minutes of fame. A lot of the data there might not be relevant to anything. But there could also be insights to glean that we didn’t realize were there.
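To make the "what was popular this week" idea concrete, here is a tiny sketch in Python. The search log is completely made up for illustration, and the `top_queries` helper is my own hypothetical name, not anything a search engine actually exposes. A real log would have billions of entries and need far more cleanup, but the basic move is the same: normalize, count, and see what floats to the top.

```python
from collections import Counter

# Hypothetical, invented search log for one week (illustration only).
queries = [
    "flu symptoms", "popular tv show finale", "Flu Symptoms",
    "weather tomorrow", "popular tv show finale", " flu symptoms ",
    "how to boil an egg",
]

def top_queries(log, n=2):
    """Return the n most frequent queries, ignoring case and stray spaces."""
    normalized = (q.strip().lower() for q in log)
    return Counter(normalized).most_common(n)

print(top_queries(queries))
# [('flu symptoms', 3), ('popular tv show finale', 2)]
```

Most of the remaining queries are the "noise" in this toy example: they show up once and tell us nothing about what lots of people cared about that week.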
A few years ago Google announced that they had come up with an algorithm showing that users’ queries in the search engine predicted the spread of the H1N1 flu virus. Seriously. As people in different parts of the country became more aware of the flu, and started seeing initial symptoms in those around them, they started searching for it. Geographically, larger numbers of searches tended to cluster around areas where the flu was becoming more prevalent. In other words, the searches coincided with the outbreaks in real time, while the health organizations were getting their data several days to a couple of weeks later.
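I don’t know the details of Google’s actual algorithm, so what follows is only a rough sketch of the underlying idea, with invented numbers. Suppose we have weekly flu-related search counts for one region and the officially reported case counts for the same weeks. If the reports trail the searches, then shifting the reports back by the right number of weeks should make the two series line up. The function names (`pearson`, `best_lag`) and both data series are my own assumptions for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def best_lag(searches, reports, max_lag=3):
    """Find how many weeks the reports trail the searches.

    For each candidate lag, compare searches in week t with reports in
    week t + lag, and return the (lag, correlation) pair that fits best.
    """
    scores = {}
    for lag in range(max_lag + 1):
        xs = searches[:len(searches) - lag]
        ys = reports[lag:]
        scores[lag] = pearson(xs, ys)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]

# Invented weekly numbers for one region (illustration only).
searches = [10, 12, 30, 80, 120, 90, 40, 15]   # flu-related searches
reports = [2, 3, 3, 8, 20, 30, 23, 10]         # officially reported cases

lag, r = best_lag(searches, reports)
print(f"reports trail searches by about {lag} week(s)")
```

With these toy numbers the best fit is a lag of one week, which is the whole point: the search data would have told you the same story a week sooner. Correlation alone doesn’t prove the searches are caused by real illness, of course, so treat this as a way of spotting a lead, not a diagnosis.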
This example shows some of the benefit of big data. It takes large amounts of data that weren’t previously available in a cost-effective and easily accessible form, and makes them available digitally. We can then analyze that data and look for trends and correlations that might not have been apparent to anyone. Sure, there is a lot of noise in the data, but the insights that may be uncovered are mind-boggling.
(Image courtesy of http://www.digital-delight.ch/)