Symantec marries Hadoop with its own Cluster File System

Symantec Enterprise for Hadoop lets companies run big data analytics atop the storage infrastructure they already own

Add Symantec to the rapidly growing list of tech vendors aiming to groom Apache Hadoop for the enterprise. The company today announced Symantec Enterprise for Hadoop, an add-on that brings the platform's big data analysis chops to companies' existing Symantec Cluster File System infrastructure.

"Being able to run Hadoop on your own infrastructure is something no one else is doing," said Don Angspatt, vice president of product management for Symantec's Storage and Availability Management Group.

Symantec is not the first company to embrace Hadoop. IDC has predicted explosive revenue growth from Hadoop and MapReduce in coming years. Other players on the Hadoop field include Greenplum, Amazon, Cloudera, HP, Hortonworks, VMware, Microsoft, Dell, and Oracle.

Symantec Enterprise for Hadoop is built on Hortonworks' implementation of the platform. According to Symantec, the offering aims to address familiar Hadoop pain points: poor compute and storage utilization, the high cost of replication (storing three copies of each piece of data on separate nodes), costly data movement for processing, and single points of failure in the NameNode and JobTracker.
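To illustrate the replication point: stock Hadoop triples raw storage consumption because the block replication factor defaults to 3, set in hdfs-site.xml. A minimal sketch (the values shown are illustrative, not Symantec's recommendations):

```xml
<!-- hdfs-site.xml: Hadoop's default block replication factor.
     On plain HDFS this triples raw storage; when an underlying
     cluster file system provides its own redundancy, the factor
     could in principle be reduced. Values are illustrative. -->
<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- stock default; lower if the storage layer replicates -->
</property>
```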

"These are things that IT operating at enterprise levels has come to expect, but they are not part of the Hadoop paradigm," Angspatt said.

The solution includes a Hadoop Connector that integrates Symantec's CFS with the Hadoop stack. Effectively, CFS takes the place of Hadoop's regular file system. The idea: A company can run Hadoop analytics on its existing storage infrastructure instead of investing in a parallel storage system or extracting, transforming, and loading data into a separate cluster. The solution scales up to 16PB of both structured and unstructured data.
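Symantec hasn't published the connector's configuration details, but conceptually the substitution happens at the point where stock Hadoop 1.x names its default filesystem: the fs.default.name property in core-site.xml. A sketch, with a hypothetical mount path and scheme (not Symantec's actual values):

```xml
<!-- core-site.xml: Hadoop 1.x resolves all unqualified paths
     through the filesystem named here. Stock deployments point
     at HDFS (hdfs://namenode:9000); a connector like Symantec's
     would instead back this with the shared CFS mount.
     The path below is a hypothetical illustration. -->
<property>
  <name>fs.default.name</name>
  <value>file:///mnt/cfs/hadoop</value>
</property>
```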


The pairing of Hadoop and CFS also improves availability, according to Symantec. In a regular Apache Hadoop environment, data is distributed across nodes but a single metadata server tracks data locations, which can create both a performance bottleneck and a single point of failure. With Symantec Enterprise Solution for Hadoop, every node in the cluster can access data simultaneously, which the company says eliminates both problems, and analytics applications are designed to keep running as long as at least one node in the cluster is working.

The Hadoop-CFS combo also delivers features such as snapshots, deduplication, and compression to help companies better manage their data troves.

The Symantec Enterprise Solution for Hadoop is available immediately at no additional charge to Cluster File System customers. It supports Hortonworks Data Platform 1.0 and Apache Hadoop 1.0.2.
