Big data means big challenges in lifecycle management

No matter what its size or variety, data must still be managed through its lifecycle, even when the tools are immature

Integrated lifecycle management (ILM) faces a new frontier when it comes to big data. The core challenges are threefold: the sheer unbounded size of big data, the ephemeral nature of much of the new data, and the difficulty of enforcing consistent quality as the data scales along any and all of the three Vs (volume, velocity, and variety).

That's my takeaway from a recent article by Loraine Lawson. What she says is consistent with my general thinking on the topic. However, I disagree with her assertion that ILM "matters more" with big data than with smaller-scale data analytics environments. Keeping all of your business data assets secure, governed, and managed matters just as much in this new era as it ever did -- no more, no less.


What has changed is that comprehensive ILM has grown more difficult to ensure in big data environments, given rapid changes in the following areas:

  • New big data platforms: Big data is ushering a menagerie of new platforms (Hadoop, NoSQL, in-memory, and graph databases) into enterprise computing environments, alongside stalwarts such as MPP RDBMS, columnar, and dimensional databases. The chance that your existing ILM tools work out of the box with all of these new platforms is slim. Also, to the extent that you're doing big data in a public cloud, you may be required to use whatever ILM features -- strong, weak, or middling -- are native to the provider's environment. To mitigate your risks in this heterogeneous new world and to maintain strong confidence in your core data, you'll need to examine new big data platforms closely to ensure they have ILM features (data security, governance, archiving, retention) that are commensurate with the roles for which you plan to deploy them.
  • New big data subject domains: Big data has not altered enterprise requirements for data governance hubs where you store and manage official systems of record (customers, finances, HR). This is the role of your established EDWs, most of which run on traditional RDBMS-based data platforms and incorporate strong ILM. But these system-of-record data domains may have very little presence on your newer big data platforms, many of which focus instead on handling fresh data from social, event, sensor, clickstream, geospatial, and other new sources. These new data domains are often "ephemeral" in the sense that there may be no need to retain the bulk of the data in permanent systems of record.
  • New big data scales: Big data does not mean that your new platforms support infinite volume, instantaneous velocity, or unbounded variety. The sheer magnitudes of new data will make it impossible to store most of it anywhere, given the stubborn technological and economic constraints we all face. This reality will deepen big data managers' focus on tweaking multitemperature storage management, archiving, and retention policies. As you scale your big data environment, you will need to ensure that ILM requirements can be supported within your current constraints of volume (storage capacity), velocity (bandwidth, processor, and memory speeds), and variety (metadata depth).
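
To make the multitemperature idea concrete, here is a minimal sketch of what a tiering-and-retention rule might look like. Everything in it is an assumption for illustration: the `Dataset` fields, the `storage_tier` function, and the day thresholds are hypothetical and do not come from any particular ILM product. The point is only that system-of-record data gets archived, while ephemeral data eventually ages out.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds, in days since last access. Real values would
# depend on your storage costs and compliance obligations.
HOT_DAYS = 7
WARM_DAYS = 90
EPHEMERAL_RETENTION_DAYS = 365


@dataclass
class Dataset:
    name: str
    last_accessed: date
    system_of_record: bool  # True for customer, finance, HR data and the like


def storage_tier(ds: Dataset, today: date) -> str:
    """Return 'hot', 'warm', 'cold', or 'delete' for a dataset."""
    age = (today - ds.last_accessed).days
    if age <= HOT_DAYS:
        return "hot"
    if age <= WARM_DAYS:
        return "warm"
    # In this sketch, systems of record are archived rather than deleted;
    # ephemeral data is kept cold only until its retention window closes.
    if ds.system_of_record or age <= EPHEMERAL_RETENTION_DAYS:
        return "cold"
    return "delete"
```

Under these assumed thresholds, a clickstream dataset untouched for 400 days would be marked for deletion, while a general ledger of the same age would simply move to cold storage.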

I also take issue with the pundits who think the big data revolution will eliminate the need for any of us to delete any data unless we truly want to. Yes, it seems as if big data will continue to grow exponentially forevermore. It also seems as if big data platforms will continue to drop precipitously in cost. But I seriously doubt the cost of implementing and managing a big data cloud will ever drop to absolute zero.

If my hunch is correct, we won't ever be able to save every last scrap of the seemingly infinite, never-ending stream of big data pouring into the cloud -- even if we wanted to. Life cycles have an end, which is a key reason we need ILM in the first place.

