In survey after survey, executives report that big data initiatives rank high on their project lists for 2012 and 2013. That's not surprising: the promise that big data can improve and streamline presentations, provide insight into consumer purchasing habits and, in the healthcare industry, even help save lives is simply too important to ignore.
However, several key questions must be answered:
- What data should you consider?
- How is data captured?
- What tangible benefits can big data initiatives provide my organization?
- What is the ROI for a big data initiative?
While there may be more questions on many IT executives' minds, these are the four that dominate most conversations. Here are the details and answers to the above questions.
What data should you consider?
Data comes in three formats: structured, semi-structured and unstructured. Structured data is organized in a way that both computers and humans can read. The most obvious example is a relational database. Semi-structured data, which includes XML, email and electronic data interchange (EDI), lacks such formal structure but nonetheless contains tags that separate semantic elements. Finally, unstructured data refers to data types, including images, audio and video, that are not part of a database.
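To make the distinction concrete, here is a minimal Python sketch of the three formats. The table name, XML document and sample values are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows in a relational table with a fixed schema.
# (The "orders" table and its values are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'Acme', 250.0)")
structured_row = conn.execute(
    "SELECT customer, total FROM orders WHERE id = 1"
).fetchone()

# Semi-structured: no fixed schema, but tags separate the semantic elements.
xml_doc = "<order id='1'><customer>Acme</customer><note>Rush delivery</note></order>"
root = ET.fromstring(xml_doc)
semi_structured = {child.tag: child.text for child in root}

# Unstructured: raw bytes (here, the first four bytes of a PNG image header)
# that carry no tags or schema a query could use directly.
unstructured = bytes([0x89, 0x50, 0x4E, 0x47])

print(structured_row)
print(semi_structured)
print(len(unstructured), "bytes of unstructured data")
```

The structured row can be queried by column and the XML fields can be pulled out by tag, while the image bytes need further processing before they yield anything to analyze.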
The foremost challenge is the need to unlock the data and gain access to it so you can store it and use it. This allows the information to stay in its raw format, where it can be analyzed and reported on as it streams into an analytics system in real time. For structured data, this process is fairly straightforward. When working with unstructured data, on the other hand, advanced algorithms and powerful engines are needed to process the incoming data.
How is data captured?
There are countless data sources available to you, and they likewise provide countless types of data. Ultimately, it boils down to the combination of data that needs to be collected.
One of the most commonly discussed data sources that today's companies use to gain insight into their consumers and brand following is social media. This is possible because Facebook, Twitter and the other major social media sites all offer some sort of data access through an application programming interface (API).
The next significant data source pertains to location and movement patterns. As RFID, infrared and wireless technologies get smaller and more affordable, companies will have more assets, employees and customers reporting their location to the appropriate business application.
As organizations capture data from these sources and combine it with the structured and unstructured data they are storing on site and in the cloud, they must ensure that it is being used in a way that pays off.
What tangible benefits can big data initiatives provide my organization?
In most market verticals today, the majority of the information needed for a big data initiative is already available; however, it may in some cases lack volume and standardization. Many organizations face the challenge of quickly implementing the right platform to capture and extract data from the different business application silos and make it available for data analysis.
Today's marketing firms and internal marketing departments face the challenge of leveraging data to first generate more leads and then accurately measure the effectiveness of marketing campaigns. With many consumers discussing brands on different social media sites, it can be a daunting task to track Tweets, user reviews and Likes throughout the Web. However, with many social sites offering the ability to pull data, organizations can leverage big data to get customer insight and real-time analytics.
There are numerous online data sources that retailers can use. Much of this data relates to consumer browsing behavior and overall brand sentiment. As noted above, data can be extracted using APIs from social media services, as well as from Google and Web server logs. In addition, many retailers capture data within their stores, whether by tracking physical shopping carts or by using customer rewards cards to monitor customer shopping patterns.
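As a rough illustration of how browsing behavior can be pulled from Web server logs, the Python sketch below counts successful page views per URL. It assumes the logs use the Common Log Format; the sample lines, the `page_views` helper and the paths are hypothetical, and a production pipeline would need to handle other log layouts and far larger volumes.

```python
import re
from collections import Counter

# Assumes Common Log Format:
# host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def page_views(lines):
    """Count successful (HTTP 200) requests per path; skip unparseable lines."""
    views = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") == "200":
            views[match.group("path")] += 1
    return views

# Hypothetical sample lines for illustration only.
sample_lines = [
    '203.0.113.7 - - [10/Oct/2012:13:55:36 -0700] "GET /products/shoes HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2012:13:56:01 -0700] "GET /products/shoes HTTP/1.1" 200 2326',
    '203.0.113.7 - - [10/Oct/2012:13:57:12 -0700] "GET /cart HTTP/1.1" 404 512',
]

views = page_views(sample_lines)
print(views.most_common(1))
```

Counts like these are only a starting point; combining them with rewards card or social media data is where the larger insight would come from.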
Healthcare can utilize data stored in electronic health record systems. In addition, recent federal legislation such as the stimulus bill, the HITECH Act and health care reform provides financial incentives for adopting health information technology, which makes more structured and semi-structured data available to physicians, executives and other stakeholders. Finally, wearable medical devices and mobile health applications are gaining popularity. Both the devices and the apps generate a continuous flow of data, which healthcare organizations are preparing to capture and use in the name of improving patient care.
Advances in medical research have enabled drug manufacturers to create highly targeted pharmaceutical testing for patients with specific genetic markers. This has been accelerating with the overwhelming scale of data that scientists can utilize, with that data further shared through international efforts such as the Annotated Human Genome Data project.
For the global logistics industry, which deals with supply chain management and control of goods, big data originating from numerous sources (GPS technology, EDI messages from suppliers and shipping vendors, pallets and cases of goods, mobile devices with customer data, internal ERP systems and social media sources) can provide significant insight and support to the re-engineering process.
Enterprise finance institutions such as Citigroup have begun the journey toward a big data initiative. The focal point here is reducing fraud: bringing patterns ordinarily hidden within data sets to the surface, exposing activity such as money laundering and ensuring compliance with United States and international banking rules.
A combination of grants and executive initiatives, including a recent directive from President Barack Obama, aims to support big data within the federal government. The primary goal is to improve the accessibility of government services and information to the American people, in part by making data access possible from mobile devices. Federal CIO Steven VanRoekel will oversee the White House Roadmap for a Digital Government, which plans to provide free data access to the public and private sectors. That would help many organizations advance their big data initiatives.
Of course, many industries in addition to those listed here can take advantage of big data to help drive change and improvement.
Can you easily measure ROI on big data initiatives?
While some vendors may claim immediate ROI on big data, you will need to identify the different aspects of the data you have access to and what value it will provide to your organization.
In most ROI scenarios, you must tie a cost-benefit analysis to the proposed project. In the case of a big data initiative, there are typically several non-measurable aspects. Since you're looking at large volumes of data with the intent to discover insights into potential changes to business processes, it's hard to predict the value of what can be discovered.
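For the measurable portion of the analysis, the arithmetic is simple. The Python sketch below uses the common formula ROI = (benefit - cost) / cost; every figure and category name in it is hypothetical, and the non-measurable aspects described above are deliberately left out.

```python
def simple_roi(total_benefit, total_cost):
    """Return ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost

# Hypothetical first-year figures in dollars, for illustration only.
costs = {"platform": 120_000, "staff": 200_000, "training": 30_000}
benefits = {"fraud_reduction": 260_000, "marketing_lift": 160_000}

total_cost = sum(costs.values())        # 350,000
total_benefit = sum(benefits.values())  # 420,000
roi = simple_roi(total_benefit, total_cost)

print(f"Estimated ROI: {roi:.0%}")  # prints "Estimated ROI: 20%"
```

Because the benefit estimates are the uncertain part, it can help to rerun the calculation with pessimistic and optimistic benefit figures rather than relying on a single number.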
That said, there are a couple of rules of thumb to keep in mind.
The cost of a big data initiative is unlikely to increase as data volume increases, since big data technologies tend to be highly scalable. In addition, although data comes in different formats (structured, semi-structured and unstructured) at ever-increasing growth rates, implementing and maintaining platforms that support Hadoop can be much more cost-effective than doing the same with traditional database management systems. That's because newer solutions can run on commodity hardware running open-source software.
As with any new trend, hype surrounds big data. In the past, many organizations used large data warehouses for data analysis and evidence-based decision making. As we know it today, big data brings into the mix additional information, such as social media behavior, which was previously unavailable to users of a siloed warehouse. Now that data can be stored and managed in the cloud for a fraction of what it once cost. To make the most of that data, it's critical to find out what big data means to your organization and what you want your next (or first) big data initiative to accomplish.
Reda Chouffani is a vice president at Biz Technology Solutions, which helps medium and large companies in the Southeastern United States deploy BI and ER software as well as IT infrastructure.
This story, "4 questions to ask before starting a big data initiative" was originally published by CIO.