MapR invites Docker and Mesos to its big data party

MapR's updated Hadoop distribution provides persistent storage for Dockerized apps and enables governance of Hadoop jobs via Mesos

The latest version of MapR's Converged Data Platform (CDP) has Docker -- and Apache Mesos -- on its mind.

The update adds the MapR POSIX Client, which exposes MapR-FS, the proprietary file system underlying CDP, to Docker containers as "a fully distributed, secure, reliable, read-write file system." Applications running in containers can therefore keep their persistent data on MapR-FS instead of in Docker data volumes. According to MapR, this lets users "deploy data-oriented applications in Docker with the assurance that critical data will be persisted across application or server failures, or container movement across servers with no manual intervention."

CDP also incorporates Apache Myriad, an open source framework for scaling Hadoop clusters with Apache Mesos. Myriad lets various components of CDP use Mesos as their resource manager. Mesos was designed as an application deployment and resource management framework for entire data centers; by contrast, YARN (Yet Another Resource Negotiator), the job framework used in modern Hadoop deployments, deals only with Hadoop jobs.

Myriad lets Mesos launch and manage YARN NodeManagers and, by extension, YARN jobs. MapR claims this offers a convenient path to multitenant architectures, with Myriad partitioning resources among multiple YARN clusters.

YARN and Mesos are hard to reconcile because they take philosophically different approaches to requesting and allocating resources. Mesos advertises available resources to each framework in the form of offers, and the framework decides which offer best fits the job at hand. YARN, on the other hand, makes scheduling decisions centrally, so resources managed by the two frameworks have had to be kept separate. Myriad bridges that gap by letting resources be shared across both frameworks.
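To make the philosophical difference concrete, here is a toy Python model of the two styles. It is not Mesos, YARN, or Myriad code: the `Node` type, the function names, and the placement policies (tightest fit chosen by the framework, first fit chosen by a central scheduler) are illustrative assumptions. The point is simply who makes the placement decision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Free capacity on one machine (hypothetical, for illustration only)."""
    name: str
    cpus: int
    mem_gb: int


def framework_pick_offer(offers: List[Node], need_cpus: int, need_mem: int) -> Optional[Node]:
    """Offer-based style (Mesos-like): the cluster manager offers free
    resources and the *framework* chooses which offer to accept.
    Assumed framework policy: tightest fit, i.e. least leftover capacity."""
    fits = [o for o in offers if o.cpus >= need_cpus and o.mem_gb >= need_mem]
    if not fits:
        return None  # decline every offer and wait for new ones
    return min(fits, key=lambda o: (o.cpus - need_cpus) + (o.mem_gb - need_mem))


def scheduler_place(nodes: List[Node], need_cpus: int, need_mem: int) -> Optional[Node]:
    """Request-based style (YARN-like): the application states what it needs
    and the *central scheduler* decides where it runs.
    Assumed scheduler policy: first node that fits."""
    for n in nodes:
        if n.cpus >= need_cpus and n.mem_gb >= need_mem:
            return n
    return None


if __name__ == "__main__":
    pool = [Node("node-a", 8, 32), Node("node-b", 2, 8)]
    # Same request, same pool, different decision-maker and policy.
    print(framework_pick_offer(pool, 2, 4).name)  # node-b
    print(scheduler_place(pool, 2, 4).name)       # node-a
```

Because the two models can pick different nodes for the same request, letting both manage the same machines independently would risk double allocation. That is why, without a bridge such as Myriad, the resources each framework controls are typically carved out separately.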

Myriad was created in part by MapR, whose distribution becomes the first major commercial Hadoop distribution to use the project in production. That said, Myriad has since been accepted into the Apache Incubator, so other distributions -- Hortonworks, Cloudera, Pivotal -- could add Myriad in time.

The growing use of containers has inspired novel approaches to the problem of data locality. Cisco, for instance, has new hardware designs intended to alleviate it, whereas MapR's strategy is implemented entirely in software.
