Big data prep: 5 things IT should do now

Ready or not, big data is coming. Here are 5 things IT managers can do today to prepare for the data deluge of tomorrow.

Got your "big data" plan in place? If not, you may want to start thinking about implementing one.

Big data is being hailed -- or hyped, depending on your point of view -- as a key strategic business asset of the future. That means it's only a matter of time before the suits in the corner office want to know IT's thoughts on the matter.

What to tell them? To be sure, handling large amounts of data isn't virgin territory for most IT departments, but beyond the hype, analysts say, big data really is different from the data warehousing, data mining and business intelligence (BI) analysis that have come before it.

Data is being generated with greater velocity and variability than ever before, and, unlike data in the past, most of it is unstructured and raw (sometimes called "gray data").

Blogs, social media networks, machine sensors and location-based data are generating a whole new universe of unstructured data that -- when quickly captured, managed and analyzed -- can help companies uncover facts and patterns they weren't able to recognize in the past.

"We've collected data for a long time, but it was very limited -- we produced a lot of it, but no one was doing much with it," says Paul Gustafson, director of Computer Sciences Corp.'s Leading Edge Forum, Technology Programs. "The data was archived and it was modeled around business processes, not modeled as a broader set of core knowledge for the enterprise. The mantra is this shift from collecting to connecting."

For example, the U.S. healthcare industry could drive efficiencies and increase productivity by effectively harnessing data related to quality of care, success rates and patient history, according to a May report on big data issued by McKinsey Global Institute, which estimated that the industry could generate more than $300 billion in value every year with such big-data initiatives. The report likewise suggests that big data has the potential to increase an average retailer's operating margin by more than 60 percent.

IT is standing at the forefront of this data revolution, industry observers say.

"This is an opportunity to walk into the CEO's office and say, 'I can change this business and provide knowledge at your fingertips in a matter of seconds for a price point I couldn't touch five years ago,'" says Eric Williams, CIO at Catalina Marketing.

Williams should know -- Catalina maintains a 2.5-petabyte customer-loyalty database, including data on more than 190 million U.S. grocery shoppers collected by the largest retail chains. This information is, in turn, used to generate coupons at checkout based on purchase history.

To steer organizations into the era of real-time predictive intelligence, Williams and other industry watchers say, tech managers must evolve their enterprise information management architecture and culture to support advanced analytics on data stores that measure in terabytes and petabytes (potentially scaling to exabytes and zettabytes).

"IT is always saying they want to find ways to get closer to the business -- [big data] is a phenomenal opportunity to do exactly that," Williams says.

Overcoming big-data hurdles

Because it's early on, big-data technologies are still evolving and haven't yet reached the level of product maturity to which IT managers have grown accustomed with enterprise software.

Many emerging big-data products are rooted in open-source technologies, and while commercial distributions are available, many still lack the well-developed third-party consulting and support ecosystem that accompanies traditional enterprise applications like ERP, points out Marcus Collins, research director at Gartner.

What's more, there is a significant gap in big-data skills in most IT departments, which have, up until now, focused on building and maintaining more traditional, structured data warehouses.

And there are major shifts to be made, both in terms of culture and in traditional information management practices, before big data can successfully take hold within an IT organization and throughout the company, notes Mark Beyer, Gartner's research vice president of information management.

Rather than waiting for the pieces to fall into place, savvy IT leaders should start prepping themselves and their organizations to get ahead of the transformation, say Beyer and other analysts.

Here are the top five things tech managers should be doing today to lay out a proper foundation for the big-data era of tomorrow.

Take stock of your data

Nearly every organization potentially has access to a steady stream of unstructured data -- whether it's pouring in from social media networks or from sensors monitoring a factory floor. But just because an organization is producing this fire hose of information, that doesn't mean there's a business imperative to save and act on every byte.

"With this initial surge around big data, people are feeling an artificial need to understand all the data out there coming from weblogs or sensors," notes Neil Raden, vice president and principal analyst at Constellation Research.

Part of that anxiety may be coming from vendors and consultants eager to promote the next big thing in enterprise computing. "There's a certain push to this coming from people who are commercializing the technology," Raden observes.

Smart IT managers will resist the urge to try to drink from the fire hose, and instead serve as a filter in helping to figure out what data is and isn't relevant to the organization.

A good first step is to take stock of what data is created internally and determine what external data sources, if any, would fill in knowledge gaps and bring added insight to the business, Raden says.

Once the data scoping is underway, IT should proceed with highly targeted projects that can be used to showcase results as opposed to opting for a big-bang, big-data project. "You don't have to spend a few million dollars to start a project and see if it's worth it," Raden says.

Let business needs drive data dives

It sounds like a broken record, but the concept of IT/business alignment is absolutely critical to an initiative as big and varied as big data, IT analysts say.

Many of the initial big-data opportunities have been seeded in areas outside of IT, they say -- marketing, for example, has been early to tap into social media streams to gain better insights into customer requirements and buying trends.

While the business side may understand the opportunities, it is IT's responsibility to take charge of the data sharing and data federation concepts that are part and parcel of a big-data strategy.

"This is not something IT can go out and do on its own," says Dave Patton, principal of information management industries at PricewaterhouseCoopers LLP. "It will be hard to turn this into a story of success if [the initiative] is not aligned to business objectives."

Early in Catalina Marketing's big-data initiative, Williams brought business managers together with the company's financial planning and analysis (FPA) group in a team effort to make a business case for information architecture investments.

The business side identified areas where new insights could deliver value -- for example, in determining subsequent purchases based on shopping cart items or through a next-buy analysis based on product offers -- and the FPA team ran the numbers to quantify what the results would mean in terms of enhanced productivity or increased sales.
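As a rough illustration of the next-buy idea, a simple co-occurrence count over past shopping baskets can suggest which items tend to be bought together. This is only a sketch under stated assumptions, not Catalina's method, which the article does not describe. The sample baskets and the function names build_next_buy_table and suggest are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_next_buy_table(baskets):
    """Count how often each item shares a basket with every other item."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # set() ignores duplicate items within a single basket
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return co_counts


def suggest(co_counts, item, top_n=2):
    """Return up to top_n items most often bought alongside `item`."""
    return [other for other, _ in co_counts[item].most_common(top_n)]


# Hypothetical purchase history, invented for illustration
baskets = [
    ["coffee", "filters", "milk"],
    ["coffee", "filters"],
    ["coffee", "milk", "sugar"],
    ["tea", "milk"],
]

table = build_next_buy_table(baskets)
print(suggest(table, "coffee"))
```

A production system would work from far larger purchase histories and would likely weigh factors such as recency and margin, which is where the FPA-style quantification comes in.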

Re-evaluate infrastructure and data architecture

Big data will require major changes in both server and storage infrastructure and information management architecture at most companies, Gartner's Beyer and other experts contend. IT managers need to be prepared to expand the IT platform to deal with the ever-expanding stores of both structured and unstructured data, they say.

That requires figuring out the best approach to making the platform both extensible and scalable and developing a roadmap for integrating all of the disparate systems that will be the feeders for the big-data analysis effort.

"Today, most enterprises have disparate, siloed systems for payroll, for customer management, for marketing," says Anjul Bhambhri, IBM's vice president of big-data products. "CIOs really need to have a strategy in place for bringing these disparate, siloed systems together and building a system of systems. You want to be asking questions that flow across all these systems to get answers."

To be sure, not every system will need to be integrated; approaches will vary depending on the size of company, the scope of the business problem, and the data requirements. But Bhambhri and others say the overarching goal should be to create an information management architecture that ensures data flow between systems. To create this foundation, companies will leverage technologies like middleware, service-oriented architecture, and business process integration, among others.

In the meantime, traditional data warehouse architectures are also under pressure. Gartner's Beyer says that 85 percent of currently deployed data warehouses will, in some respect, fail to address the new issues around extreme data management by 2015.

Even so, he says, "we don't want to give the idea that rip-and-replace is even on the table." Instead, existing repositories can be expanded and adapted to encompass built-in data processing capabilities.

"The warehouses of the past have been focused on determining what kind of data repository you have and where you have it. The new mindset is that data warehouses will be a combination of new and existing repositories plus data processing and delivery services," Beyer explains.

Bone up on the technology

The big-data world comes with a big list of new acronyms and technologies that have likely never graced a CIO's radar screen.

Open-source software is getting most of the attention, with technologies like Hadoop, MapReduce, and NoSQL taking credit for helping Web-based giants like Google and Facebook churn through their reservoirs of big data. Many of these technologies, while starting to be offered in more commercial forms, are still fairly immature and require people with very specific skill sets.
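For readers new to the term, the MapReduce pattern can be sketched in a few lines of plain Python. This is a single-machine toy, not Hadoop's API: a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step combines each group. The function names and sample log lines are assumptions made up for illustration. Real frameworks run the same steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over all records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical log lines, invented for illustration
log_lines = ["error disk full", "warning disk slow", "error network down"]
counts = run_job(log_lines)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, the work divides cleanly across servers, which is what makes the approach suited to very large data sets.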

Beyond the new open-source options, IT groups will also have to ensure they are up to speed on other technologies important to the big-data world, such as in-database analytics, columnar databases and data warehouse appliances.

IT managers and their staffs need to dive in and at least familiarize themselves with these new tools in order to be properly situated to make big-data decisions going forward.

Prepare to hire or retrain staff

Whether it's a Hadoop expert or a data scientist, most IT organizations are sorely lacking the right talent to take the next steps with big data. The analytic skill sets are perhaps the most crucial, and they represent the area where the gap is currently largest.

McKinsey projects that in the U.S. alone, there will be a need by 2018 for between 140,000 and 190,000 additional experts in statistical methods and data-analysis technologies, including the widely hyped emerging role of "data scientist."

In addition, McKinsey anticipates the need for another 1.5 million data-literate managers, on either the business or tech side of the house, who have formal training in predictive analytics and statistics.

Under the IT department's jurisdiction, traditional data warehouse and BI professionals will require some retraining.

And in addition to traditional skills in information management, governance and database structure, the new big-data professionals need an understanding of semantics and mathematical disciplines -- not to mention expertise in the new predictive analytics tools and data management platforms that make up the big-data toolkit.

"The people who built the databases of the past are not necessarily the people who will be building the databases of the future," says Catalina's Williams. "Don't underestimate the complexity in trying to produce something like this."

For some companies, especially those in less populated areas, staffing will likely complicate the challenge. "[Big data] definitely requires a different mindset and skills in a host of areas," says Rick Cowan, CIO at True Textiles, in Guilford, Maine, a contract manufacturer of interior fabrics for the commercial market.

"As a medium-sized business, it's been a challenge to be able to get staff and keep them up to speed with the ever-changing environment." To address the need, Cowan has begun formally retraining programmers and database analysts to come up to speed on advanced analytics.

IT department heads will have to do some transforming of their own to excel in this brave new world. While the best tech leaders of the past have been partly information librarian and partly infrastructure engineer, the IT managers of the future will be a combination of data scientist and business process engineer, says Gartner's Beyer.

"CIOs have been used to managing infrastructure based on a given instruction set from the business, as opposed to a CIO that is able to identify the opportunity and therefore push towards innovative use of information," he explains. "That's the transformation that needs to happen."

Stackpole, a frequent Computerworld contributor, has reported on business and technology for more than 20 years.

This story, "Big data prep: 5 things IT should do now," was originally published by Computerworld.

Copyright © 2011 IDG Communications, Inc.
