With Big Data making headlines daily, it’s easy to mistake “lots of data” for “Big Data.” As most IT folks agree, organizations of all stripes, from government agencies to academia, have been dealing with massive data sets for years. “But just because you have a lot of data, that doesn’t mean it should be considered ‘Big Data,’” says Jim Gallo, national director of business analytics at ICC, a leader in business technology solutions focusing on Big Data and application development.
“If an organization has large volumes of structured data – point-of-sale data, inventory data, sensor data – that doesn’t translate directly to a Big Data problem or opportunity,” says Gallo.
Today, most organizations use data warehouses and business intelligence (BI) suites to meet their analytics needs. But BI suites are limited to analyzing structured data in relational databases. When you combine the three “Vs” of Big Data – volume, variety and velocity – with unstructured data such as YouTube videos or medical images, and with the desire to learn something new from those mashups, you enter Big Data territory, according to Gallo.
“When you want to do something other than store and fetch images; when you begin to look inside the images and draw correlations to other data types like electronic health records (EHRs) or a Twitter feed or weather data, that’s when you have a Big Data challenge,” says Gallo.
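To make that concrete, here is a minimal sketch of the idea in Python. The synthetic “scans,” the EHR columns and the toy feature (mean pixel intensity) are all invented for illustration – a real imaging pipeline would extract clinically meaningful measurements – but the shape of the problem is the same: reduce the unstructured object to variables you can correlate with structured records.

```python
# A minimal sketch of "looking inside" images rather than just storing
# them: derive a number from each scan and correlate it with a field
# from the patient's record. The synthetic scans, the EHR columns and
# the toy feature (mean pixel intensity) are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Stand-ins for decoded medical images, keyed by patient ID.
scans = {pid: rng.random((64, 64)) * pid for pid in (1, 2, 3, 4)}

ehr = pd.DataFrame({                      # hypothetical EHR extract
    "patient_id": [1, 2, 3, 4],
    "blood_pressure": [118, 126, 133, 141],
})

# "Look inside" each image: reduce it to an analyzable variable.
ehr["scan_feature"] = ehr["patient_id"].map(lambda p: scans[p].mean())

# The Big Data step: correlate the image-derived variable with a
# structured clinical variable instead of just storing and fetching files.
print(ehr["scan_feature"].corr(ehr["blood_pressure"]))
```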
So how can an organization know if the challenge it is facing is Big Data or just lots of data?
Here are five ways, identified by Gallo, to help you determine whether you’re dealing with a Big Data challenge or just lots of data.
When you want to learn new things from your existing data by loading it into an analytics platform and combining it with other, unstructured data sets to find cause and effect, you are dealing with a Big Data challenge. But if you are simply storing and retrieving large files, with no need to combine them with other information or run analytics on them, you’re likely facing a “lots of data” challenge.
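A hedged sketch of that distinction, using invented sales figures and a deliberately naive word-count stand-in for real text analytics:

```python
# Structured sales data combined with an unstructured source (free-text
# reviews) to look for cause and effect. The data, the join key and the
# naive sentiment score are illustrative assumptions, not a production
# analytics platform.
import pandas as pd

sales = pd.DataFrame({
    "week": [1, 2, 3, 4],
    "units_sold": [120, 95, 180, 210],
})

reviews = pd.DataFrame({
    "week": [1, 2, 3, 4],
    "text": ["slow shipping", "broken on arrival",
             "great value", "love it, great value"],
})

# Naive stand-in for real text analytics: count positive words.
reviews["sentiment"] = reviews["text"].str.count("great|love")

combined = sales.merge(reviews[["week", "sentiment"]], on="week")
print(combined.corr(numeric_only=True))  # does sentiment track sales?
```

The “lots of data” version of this would stop at storing the review files; the Big Data version joins them to the sales table and asks whether sentiment tracks units sold.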
Another “tell” is when the information you want to analyze streams by at high velocity and its value is time dependent. High-frequency trading (HFT) is a good example of time-dependent data analyzed in real time so Wall Street traders can make money. HFT algorithms can be written to weigh myriad factors – currency fluctuations, political decisions, Federal Reserve actions, breaking news headlines – in deciding trades. Those trades make or lose money based on timing the stock market to the millisecond.
“Suddenly you’re correlating more variables that were historically unstructured in nature,” says Gallo. “It’s all about timing, and the organization that builds the best algorithms – the ones that consider the most variables and have the highest degree of predictability – wins.”
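A toy illustration of why velocity matters here – the feeds, weights, threshold and five-millisecond staleness window are all invented, and real HFT systems are nothing like a few lines of Python, but the time-dependence is the point:

```python
# A trading signal that only counts if it is acted on within a tight
# window. Feed names, weights and the threshold are hypothetical.
import time

WEIGHTS = {"fx_move": 0.5, "fed_news": 0.3, "headline_score": 0.2}
STALE_AFTER_MS = 5  # the signal's value decays within milliseconds

def decide(feeds: dict, generated_at: float) -> str:
    age_ms = (time.time() - generated_at) * 1000
    if age_ms > STALE_AFTER_MS:
        return "DISCARD"  # too late; the opportunity is gone
    score = sum(WEIGHTS[k] * feeds[k] for k in WEIGHTS)
    return "BUY" if score > 0.1 else "HOLD"

print(decide({"fx_move": 0.4, "fed_news": -0.1, "headline_score": 0.3},
             generated_at=time.time()))
```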
Single-feed data streams that relay one type of event, such as a cash register sale sent to a back-end inventory control and ordering system, are a typical form of data collection. A big-box retailer may operate tens of thousands of terminals that together stream hundreds of thousands of simultaneous data feeds into the same database. That is still not Big Data, because those feeds simply trigger predetermined actions, such as ordering more inventory or shipping items now.
But when those same data feeds are combined with other data streams – say, fuel prices at a logistics company trying to optimize shipping costs – they become an example of a Big Data challenge.
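Side by side, the two cases might look like this sketch, where the stream contents, fuel threshold and routing rule are all hypothetical:

```python
# The same point-of-sale reorder feed, first triggering a fixed action
# ("lots of data"), then combined with a second stream (fuel prices)
# that changes the decision ("Big Data"). All values are invented.

def reorder_only(sale):
    # Lots of data: one feed, one predetermined action.
    return {"sku": sale["sku"], "action": "reorder"}

def reorder_with_fuel(sale, fuel_price_per_gallon):
    # Big Data: a second stream changes how the action is carried out.
    ship_mode = "truck" if fuel_price_per_gallon < 4.00 else "rail"
    return {"sku": sale["sku"], "action": "reorder", "ship_via": ship_mode}

sale_event = {"sku": "A-1001", "store": 42, "qty": 1}
print(reorder_only(sale_event))
print(reorder_with_fuel(sale_event, fuel_price_per_gallon=4.35))
```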
If most of your data sits in a data warehouse somewhere, in nice rows and columns, without much interaction with the outside world, then, almost regardless of size, you are not dealing with “Big Data.” There are a few caveats here, however. If that data needs to be ready at a moment’s notice to help with, say, fraud detection at a credit card provider, that moves you into Big Data territory. It becomes Big Data because the information feeds a real-time process – one that takes its cues from any number of sources to determine whether someone is getting ripped off or is simply on vacation. Luckily, today’s emerging crop of high-speed databases is well equipped to handle this workload.
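As a rough sketch, a real-time fraud check pulls cues from several sources at once. The cues and scoring rule below are invented – production systems use statistical models, not three if-statements – but the pattern of combining warehouse data with live context is the same:

```python
# Warehouse rows pulled into a real-time decision. The cues (distance
# from home, travel flag, amount) and the scoring rule are hypothetical.
def fraud_score(txn: dict, profile: dict) -> float:
    score = 0.0
    if txn["distance_from_home_km"] > 500 and not profile["on_vacation"]:
        score += 0.5  # card used far from home with no travel signal
    if txn["amount"] > 3 * profile["avg_amount"]:
        score += 0.3  # unusually large purchase
    if txn["merchant_category"] not in profile["usual_categories"]:
        score += 0.2  # unfamiliar merchant type
    return score

profile = {"on_vacation": False, "avg_amount": 60.0,
           "usual_categories": {"grocery", "fuel"}}
txn = {"distance_from_home_km": 2400, "amount": 410.0,
       "merchant_category": "jewelry"}

if fraud_score(txn, profile) >= 0.7:
    print("flag for review")
```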
The power of Big Data does not originate from some singularity somewhere in cyberspace. Back in the days when a terabyte (TB) of enterprise-class storage cost millions, not $79.99, and processor speeds were measured in kilohertz, everything was Big Data. Today, when a typical smartphone has access to almost unlimited amounts of storage and processing in the cloud, defining Big Data vs. lots of data comes down to what you want to do with that data.
If you are looking for new business insights by combining your data with data from the outside world and then throwing a bunch of advanced analytics at it to answer pressing business questions – like when to jump into a new market, or what color shoes to recommend to a customer based on their preferences and the preferences of people just like them – then you are looking at a Big Data challenge.
But if you just want to move massive amounts of data around your network faster and faster, or compress an overnight batch process into a few hours, then, really, you are dealing with a lots-of-data problem. What you want to do with your data makes it Big Data, not how much of it there is.
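The shoe-color scenario above is, at heart, collaborative filtering: score items by what similar customers preferred. Here is a minimal sketch, with a toy ratings matrix and cosine similarity standing in for a real recommender:

```python
# User-based collaborative filtering in miniature. The ratings matrix,
# customer index and similarity measure are toy assumptions.
import numpy as np

colors = ["black", "brown", "red", "blue"]
# Rows = customers, columns = colors; values = preference (0 = unknown).
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 3, 4, 4],
], dtype=float)

def recommend_for(user: int) -> str:
    # Cosine similarity between this customer and every other customer.
    sims = np.array([
        np.dot(ratings[user], ratings[v]) /
        (np.linalg.norm(ratings[user]) * np.linalg.norm(ratings[v]) + 1e-9)
        for v in range(len(ratings))
    ])
    sims[user] = 0.0  # do not count the customer against themselves
    predicted = sims @ ratings  # similarity-weighted score per color
    predicted[ratings[user] > 0] = -1  # skip colors they already rated
    return colors[int(predicted.argmax())]

print(recommend_for(user=1))  # hypothetical customer index
```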