In survey after survey, executives rank big data initiatives high on their project lists for 2012 and 2013. That's not surprising. The promise is simply too important to ignore: big data can improve and streamline presentations, yield insights into consumer purchasing habits and, in the healthcare industry, even help save lives.
However, several key questions must be answered:
- What data should you consider?
- How is data captured?
- What tangible benefits can big data initiatives provide my organization?
- What is the ROI for a big data initiative?
IT executives may have other questions on their minds, but these four dominate most conversations. Here is a closer look at each.
What data should you consider?
Data comes in three formats: structured, semi-structured and unstructured. Structured data is organized in a fixed format that both computers and humans can read; the most obvious example is a relational database. Semi-structured data, which includes XML, email and electronic data interchange (EDI), lacks such formal structure but still contains tags that separate semantic elements. Finally, unstructured data refers to content such as images, audio and video that does not fit into a database.
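To make the three formats concrete, here is a minimal sketch using only Python's standard library. The table, the XML snippet and the byte string are invented for illustration. The structured record is queried by column name, the semi-structured record is read through its tags, and the unstructured bytes offer no fields to query at all.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: a fixed schema in which every row has the same typed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'Acme', 250.0)")
structured = conn.execute("SELECT customer, total FROM orders").fetchall()

# Semi-structured: no fixed table schema, but tags mark the semantic elements.
xml_doc = """<order id="2">
  <customer>Globex</customer>
  <note>Rush delivery</note>
</order>"""
root = ET.fromstring(xml_doc)
semi_structured = {child.tag: child.text for child in root}

# Unstructured: raw bytes (here, the first bytes of a PNG image header)
# with no named fields that can be queried directly.
unstructured = bytes([0x89, 0x50, 0x4E, 0x47])

print(structured)         # [('Acme', 250.0)]
print(semi_structured)    # {'customer': 'Globex', 'note': 'Rush delivery'}
print(len(unstructured))  # 4
```

Extracting meaning from the last category requires specialized processing such as image or speech recognition, which is why unstructured data is the hardest of the three to put to work.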
The foremost challenge is unlocking the data and gaining access to it so you can store and use it. The information can then stay in its raw format, where it can be analyzed and reported on as it streams into an analytics system in real time. For structured data, this process is fairly straightforward. Unstructured data, on the other hand, requires advanced algorithms and powerful engines to process it as it arrives.
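The idea of keeping data raw while still analyzing it as it streams in can be sketched as follows. This is a simplified illustration, not a production design. `raw_store`, `event_counts` and `ingest` are hypothetical names: the in-memory list stands in for a real append-only store, and the `Counter` stands in for a real-time analytics view. Every record is preserved exactly as received. Only the records that parse as JSON objects update the running counts, so free-text input is kept for later, more advanced processing.

```python
import json
from collections import Counter

raw_store = []            # hypothetical append-only store: every record kept exactly as received
event_counts = Counter()  # hypothetical real-time aggregate, updated as records stream in

def ingest(raw_line: str) -> None:
    """Store the raw record untouched, then update the running counts if it is structured."""
    raw_store.append(raw_line)
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return  # unstructured input stays in raw_store for later processing
    if not isinstance(record, dict):
        return  # valid JSON, but not a record with named fields
    event_counts[record.get("event", "unknown")] += 1

stream = [
    '{"event": "view", "sku": "A1"}',
    '{"event": "purchase", "sku": "A1"}',
    'free-text customer comment',
    '{"event": "view", "sku": "B7"}',
]
for line in stream:
    ingest(line)

print(len(raw_store))      # 4
print(dict(event_counts))  # {'view': 2, 'purchase': 1}
```

Because nothing is discarded at ingest time, the same raw records can be reprocessed later with new questions in mind.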
How is data captured?
There are countless data sources available to you, and they provide countless types of data. Ultimately, it comes down to deciding which combination of data you need to collect.
One of the most commonly discussed data sources that today's companies use to gain insight into their consumers and brand following is social media. This is possible because Facebook, Twitter and the other major social media sites all offer some sort of data access through an application programming interface (API).
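As a rough illustration of what companies do with API data, the sketch below filters a batch of posts for mentions of a brand. The JSON payload, its `posts`/`user`/`text` fields and the brand name "Contoso" are all hypothetical. Each real social media API defines its own response format, authentication and rate limits, so the parsing would need to be adapted to the actual service.

```python
import json
import re

# Hypothetical API response; real social media APIs each use their own fields.
api_response = """{
  "posts": [
    {"id": "1", "user": "ana", "text": "Loving my new Contoso headphones!"},
    {"id": "2", "user": "ben", "text": "contoso support was slow today"},
    {"id": "3", "user": "cy",  "text": "Weekend hike photos"}
  ]
}"""

def brand_mentions(payload: str, brand: str) -> list:
    """Return posts whose text mentions the brand as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(brand)}\b", re.IGNORECASE)
    posts = json.loads(payload).get("posts", [])
    return [p for p in posts if pattern.search(p.get("text", ""))]

hits = brand_mentions(api_response, "Contoso")
print([p["user"] for p in hits])  # ['ana', 'ben']
```

Simple keyword matching like this only establishes that a post mentions the brand. Judging whether a mention is positive or negative (compare the two hits above) requires further analysis.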
The next significant data source pertains to location and movement patterns. As RFID, infrared and wireless technologies get smaller and more affordable, companies will have more assets, employees and customers reporting their location to the appropriate business application.
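Location data typically arrives as a stream of sightings, and a common first step is reducing that stream to each item's current position. The sketch below assumes a hypothetical read format (a tag ID, the reader that saw it, and a timestamp); actual RFID middleware will differ. It keeps the newest sighting per tag and compares timestamps rather than trusting arrival order, because reads from different readers can arrive late.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocationRead:
    tag_id: str       # hypothetical RFID tag on an asset, badge or cart
    reader_id: str    # hypothetical fixed reader that detected the tag
    timestamp: float  # seconds since epoch

def latest_positions(reads):
    """Map each tag to the reader of its most recent sighting."""
    latest = {}
    for r in reads:
        current = latest.get(r.tag_id)
        if current is None or r.timestamp > current.timestamp:
            latest[r.tag_id] = r
    return {tag: r.reader_id for tag, r in latest.items()}

reads = [
    LocationRead("pallet-17", "dock-door-2", 100.0),
    LocationRead("badge-042", "lobby", 105.0),
    LocationRead("pallet-17", "aisle-9", 130.0),
    LocationRead("badge-042", "lab-3", 101.0),  # arrives late; older than the lobby read
]
print(latest_positions(reads))  # {'pallet-17': 'aisle-9', 'badge-042': 'lobby'}
```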
As organizations capture data from these sources and combine it with the structured and unstructured data they store on site and in the cloud, they must ensure that it is used in ways that pay off.