After re-reading my initial post on What is Big Data? I decided a little more clarification was in order.
Humanity is generating more data now than at any point in our history. A commonly cited figure is that 90% of the world's data has been created in the past few years. I think that may be misleading, because a lot of the data being generated isn't really human-created data, but machine-generated data. Human data might be a blog post like this one or a book from your local library.
Machine-generated data could be a computer's system and application logs. The amount of data in those logs can be pretty extensive, but it is produced simply by monitoring the computer and its interactions, both with external sources (other systems and people) and internally (hard drive reads/writes, CPU activity, system temperature, etc.). Machines can generate significantly more data in a short amount of time than a human can.
How you sift through all of that accumulated data to find something meaningful depends a lot on what you are trying to find. From the computer logs, you can look for errors if the system is malfunctioning. If the system was hacked, you might look for failed log-in attempts. A typical PC log might be several megabytes long, which can be filtered pretty quickly. The biggest issue is deciding what to filter on so you can search quickly and efficiently for what you are looking for. But when you have logs from a web server farm like those Amazon or Google run, sifting through them quickly and efficiently becomes a major challenge.
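That kind of log filtering can be sketched in a few lines of Python. This is just an illustration, not a real tool; the log lines below are made up, and real log formats vary from system to system:

```python
import re

# Hypothetical auth-log lines for illustration; actual formats
# differ by operating system and service.
log_lines = [
    "Mar 01 10:02:11 host sshd[311]: Accepted password for alice",
    "Mar 01 10:02:45 host sshd[312]: Failed password for root",
    "Mar 01 10:03:02 host sshd[313]: Failed password for admin",
]

# Filter for failed log-in attempts, much as you might grep a
# server log when investigating a possible break-in.
failed = [line for line in log_lines if re.search(r"Failed password", line)]

for line in failed:
    print(line)
```

On a single machine this approach works fine, but the same pattern applied to a server farm's worth of logs is where the real challenge begins.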
Companies, organizations, and governments are producing massive amounts of data because they are monitoring and measuring so many different aspects of their activities. But once they have that information, they need to come up with a way to search it, manipulate it, sort it, and separate what is valuable from what isn't. (And something that isn't necessarily valuable today may be valuable tomorrow.)
That is the concept behind BIG DATA.