Cleversafe slices up storage to increase big data reliability

New version of company's dispersed storage network aims to be a more efficient alternative to HDFS in Hadoop MapReduce deployments

Mo' data, mo' problems is one of the themes heard at organizations straining to get their arms around big data.

In an ideal world, data would continually flow into bottomless storage troves and be available 24/7 for real-time analysis from anywhere, anytime. But the reality is that costs for always-available storage, including essential backups, can quickly skyrocket, and network connections are notoriously fallible, so there's no guarantee the data you need will be accessible at any given moment.

A company called Cleversafe, which has been in the storage game for three years now, is taking a noteworthy approach to easing the storage strain. The company on Tuesday announced Version 3.0 of its Dispersed Storage Network (dsNet) system, designed to leverage the power of Hadoop MapReduce while replacing the Hadoop Distributed File System (HDFS).

Cleversafe claims dsNet eliminates a key shortcoming of HDFS: the requirement to keep three full copies of your data. Moreover, in today's HDFS environments, failure of the single metadata node (the NameNode) can render stored data inaccessible or result in permanent data loss, according to the company.

Instead, Cleversafe's dsNet system slices each object into a predetermined number of virtualized slices -- 16, for example. Those 16 slices are automatically distributed among separate disks, storage nodes, and geographic locations. When an application needs a full data file, the system retrieves the necessary slices and reassembles them. And it doesn't need to pull all 16 slices at once: any 10 suffice to re-create a usable version of the object.
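Cleversafe's actual information dispersal algorithm isn't published in this announcement, but the any-k-of-n property it describes can be sketched with a classic technique: encode each group of k data bytes as the coefficients of a polynomial over the prime field GF(257) and evaluate it at n points, one per slice. Any k slices then recover the polynomial -- and the data -- by interpolation. The function names and the 16/10 parameters below are illustrative only.

```python
P = 257  # prime modulus; every byte value 0..255 fits in GF(257)

def _mul_by_linear(poly, xm):
    """Multiply a coefficient list by (x - xm), mod P."""
    out = [0] * (len(poly) + 1)
    for t, c in enumerate(poly):
        out[t + 1] = (out[t + 1] + c) % P
        out[t] = (out[t] - xm * c) % P
    return out

def make_slices(data: bytes, n: int, k: int):
    """Encode data into n slices such that any k of them can rebuild it."""
    padded = data + b"\x00" * (-len(data) % k)   # pad to a multiple of k
    slices = [[] for _ in range(n)]
    for i in range(0, len(padded), k):
        coeffs = padded[i:i + k]                 # k bytes = polynomial coefficients
        for x in range(1, n + 1):                # evaluate at points x = 1..n
            y = 0
            for c in reversed(coeffs):           # Horner's rule mod P
                y = (y * x + c) % P
            slices[x - 1].append(y)
    # Each slice records its evaluation point, its values, and the true length.
    return [(x, vals, len(data)) for x, vals in enumerate(slices, start=1)]

def rebuild(any_k_slices):
    """Reconstruct the original bytes from any k distinct slices."""
    xs = [s[0] for s in any_k_slices]
    k, length = len(any_k_slices), any_k_slices[0][2]
    out = bytearray()
    for j in range(len(any_k_slices[0][1])):
        ys = [s[1][j] for s in any_k_slices]
        coeffs = [0] * k
        for i in range(k):
            # Lagrange basis polynomial for point xs[i], in coefficient form
            basis, denom = [1], 1
            for m in range(k):
                if m != i:
                    basis = _mul_by_linear(basis, xs[m])
                    denom = denom * (xs[i] - xs[m]) % P
            scale = ys[i] * pow(denom, P - 2, P) % P   # divide via Fermat inverse
            for t in range(k):
                coeffs[t] = (coeffs[t] + scale * basis[t]) % P
        out.extend(coeffs)   # recovered coefficients are the original bytes
    return bytes(out[:length])

obj = b"some object worth keeping"
slices = make_slices(obj, n=16, k=10)
assert rebuild(slices[6:]) == obj   # lose any 6 slices; 10 still rebuild it
```

The point of the sketch is the asymmetry it demonstrates: losing up to n − k slices costs nothing, yet no single node ever holds a full copy of the object.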

If the system works as Cleversafe claims, the advantages are twofold. First, a company that has, say, 100PB of storage wouldn't need 300PB worth of SAN or NAS (enough to save a copy of each object in triplicate). Rather, 160PB of storage could be enough; it would depend in part on how an admin configured the system to slice up objects.
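The arithmetic behind that comparison, assuming raw capacity scales with the ratio of slices written to slices needed (16/10 in this configuration):

```python
data_pb = 100                    # logical data to protect, in PB

hdfs_raw = data_pb * 3           # triple replication: three full copies
dsnet_raw = data_pb * 16 / 10    # 16 slices written, any 10 rebuild an object

print(hdfs_raw, dsnet_raw)       # 300 PB vs. 160.0 PB of raw storage
```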

Second, the fact that a storage object sliced into 16 parts could be reconstructed with just 10 of those parts would increase reliability. If one or two storage nodes were not functioning properly and were unable to deliver a couple of slices, for example, the storage object could still be reformed and used.

According to Cleversafe, admins can create and configure three different types of vaults with the system: object vaults, metadata vaults, and analytics vaults. Object vaults are the holding cells for data slices; metadata vaults track where slices reside at any given time; and analytics vaults house the data being analyzed.

Admins can choose the number of slices that objects get divided into, as well as how many slices are necessary to reform the object. For example, rather than having the system slice an object into 16 parts with 10 required to reform it, an admin could have an object split into 24 parts and require 18 to piece it back together. The gap between the two numbers determines how many slice failures the system can ride out, while their ratio determines the raw storage overhead.
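Under the usual erasure-coding accounting -- the system survives (width − threshold) lost slices and stores width/threshold times the data size -- the two configurations mentioned above compare like this (a rough sketch, not Cleversafe's published math):

```python
def profile(width, threshold):
    """Failure tolerance and raw-storage multiplier for a slicing config."""
    return {"tolerates": width - threshold, "overhead": width / threshold}

print(profile(16, 10))   # tolerates 6 lost slices at 1.6x raw storage
print(profile(24, 18))   # also tolerates 6, at only ~1.33x raw storage
```

Note that the 24/18 configuration tolerates the same six failures as 16/10 while consuming less raw capacity; it's the gap between the two numbers, not the slice count alone, that buys reliability.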

Cleversafe is offering its system as software, with customers bringing their own storage; it also sells commodity hardware on which the software runs. In addition, OEMs could use the software to offer a cloud storage service, according to the company. Lockheed Martin is currently working with Cleversafe to develop a version of the dispersed storage system specifically for federal government agencies.

Cleversafe plans to make Version 3.0 of its software available by the end of the year.

This story, "Cleversafe slices up storage to increase big data reliability," was originally published at InfoWorld.com.