Yahoo adds security and workflow management to Hadoop

Yahoo is integrating Hadoop with the Kerberos authentication standard, allowing for secure collaboration and multitenancy

Yahoo, which has championed the open source Apache Hadoop distributed computing platform, is adding capabilities for Kerberos security and workflow management to the platform.

The company on Tuesday will unveil the open source additions at the Yahoo-sponsored Hadoop Summit 2010 conference in Santa Clara, Calif., and donate them to the Apache Software Foundation.

[ InfoWorld's Paul Krill reported last week that Cloudera and Quest are linking Hadoop to Oracle databases. ]

Yahoo will integrate Hadoop with the Kerberos authentication standard, enabling more secure collaboration and sharing of authenticated data. Kerberos capabilities also allow for multi-tenancy, in which hardware can be used by multiple parties through authenticated access and processing of sensitive data, according to Yahoo, which developed and uses Hadoop.

"In the past, when we had data that had varying security requirements, we would isolate [the data] on different grids, and then we would control access to those clusters," said Shelton Shugar, senior vice president of cloud computing at Yahoo. Kerberos, though, will make it easier to manage security access.

"This [provides] strong authentication and protection of the data, and now we can mix applications and data on the same grid," said Shugar.

Hadoop has been used for data center and cloud computing.

We're using [Hadoop] as our big data infrastructure," Shugar said. The company collects almost 100 billion compute events each day, such as information about clicks and page views, and stores this data in Hadoop, Shugar said. Hadoop is used for customizing which content shows up for specific users based on indicated preferences, he said.

Yahoo's Oozie workflow engine for Hadoop provides workflow management and a coordination engine to manage jobs running on Hadoop. These could include the Hadoop Distributed File System, the Pig data flow language and execution framework for parallel computation, and the MapReduce software framework for distributed processing of large data sets.

Oozie is event-driven and can manage complex jobs, Shugar said.

Yahoo with its Hadoop moves is pushing its company profile, which has been diminished in the wake of Google's emergence, explained analyst Melanie Posey, of IDC.

"One of the reasons that Yahoo's kind of taken the lead and run with a lot of the extensions to Hadoop is to reassert its position in the Internet space as an innovator," Posey said.

The security and workflow technologies will  be available at the Yahoo Developer Network.

Yahoo started using Hadoop in 2005. It has become popular with Internet companies and is spreading to other types of companies as well, he said.

This article, "Yahoo adds security and workflow management to Hadoop," was originally published at Follow the latest developments in business technology news and get a digest of the key stories each day in the InfoWorld Daily newsletter and on your mobile device at

Copyright © 2010 IDG Communications, Inc.