Hadoop has always been a catch-all for disparate open source initiatives that combine into a more or less unified big data architecture. Some would claim that Hadoop has always been, at its very heart, simply a distributed file system (HDFS), but the range of databases that can stand in for HDFS, including HBase and Cassandra, undermines that assertion.
Until recently, Hadoop has been, down deep, a specific job-execution layer -- MapReduce -- that executes on one or more alternative, massively parallel data-persistence layers, one of which happens to be HDFS. But the recent introduction of the next-generation execution layer for Hadoop -- known as YARN (Yet Another Resource Negotiator) -- eliminates the strict dependency of Hadoop environments on MapReduce.
Just as critical, YARN eliminates a job-execution bottleneck that has bedeviled MapReduce from the start: the fact that all MapReduce jobs (pre-YARN) have had to run as batch processes through a single daemon (JobTracker), a constraint that limits scalability and dampens processing speed. These MapReduce constraints have spurred many vendors to implement their own speedups, such as IBM's Adaptive MapReduce, to get around the bottleneck of native MapReduce.
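The bottleneck described above can be pictured with a toy bookkeeping model. This is a minimal sketch, not Hadoop code: the `Job` class, the two functions, and the job sizes are all hypothetical, and the model counts only how many items each daemon has to keep track of. It assumes the commonly described YARN design, in which a cluster-wide ResourceManager handles applications while a per-application ApplicationMaster tracks that application's own tasks.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str   # hypothetical job name, for illustration only
    tasks: int  # number of tasks whose progress must be tracked


def jobtracker_load(jobs):
    """Pre-YARN model: one JobTracker tracks every job and every task."""
    return {"JobTracker": len(jobs) + sum(job.tasks for job in jobs)}


def yarn_load(jobs):
    """Simplified YARN model: the ResourceManager tracks applications,
    and each application's own ApplicationMaster tracks its tasks."""
    load = {"ResourceManager": len(jobs)}
    for job in jobs:
        load[f"ApplicationMaster[{job.name}]"] = job.tasks
    return load


# Hypothetical workload: three jobs of different sizes.
jobs = [Job("etl", 400), Job("report", 250), Job("index", 350)]

print(jobtracker_load(jobs))
print(yarn_load(jobs))
```

In this simplified picture, the single daemon's tracking burden grows with the total number of tasks across the cluster, while under the split design no single component tracks more than one application's tasks. Real clusters involve far more than this count (heartbeats, container requests, failure handling), so treat the numbers as an illustration only.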
All of this might make one wonder what, specifically, "Hadoop" means anymore, in terms of an identifiable "stack" distinct from other big data and analytics platforms and tools. That's a definitional quibble. What matters is that YARN is a foundational component of the evolving big data mosaic. YARN puts traditional Hadoop into a larger context of composable, fit-to-purpose platforms for processing the full gamut of data management, analytics, and transactional computing jobs.
YARN transforms Hadoop (however defined) into a general-purpose, distributed job-execution layer of the sort that the open source initiative's original definition (still on the Apache website) alludes to. Though it retains backward compatibility with the MapReduce API and continues to execute MapReduce jobs, a YARN engine is capable of executing a wide range of jobs that are not written as MapReduce programs, including jobs developed in other languages.