Joyent CTO: How big data platforms should work

Jason Hoffman, CTO/founder of cloud service provider Joyent, lays out his views on optimizing public cloud infrastructure

This week on the New Tech Forum, we're taking a look at the challenges of traditional storage and compute in the world of big data -- and the growing role of object storage and integrated compute resources.

Jason Hoffman, CTO and founder of cloud service provider Joyent, details how combining object storage and parallel compute clusters can make working with big data easier and faster by eliminating bottlenecks.

How objects and compute will eat the world
Networked storage vendors' days are numbered. Customers are fleeing to consolidated online object storage, and soon the digital object storage will surpass traditional file storage as the primary model for data outside of a DBMS. But there's a subtle and often unappreciated downside to most distributed object storage: data inertia. The implicit limits on moving huge data sets to in-network compute nodes deter business or clinical insights from surfacing.

At Joyent, we architected the Manta Storage and Compute Service -- "Manta," for short -- to be a best-in-class object store and an in-storage massively parallel compute cluster. It drives data latency effectively to zero, moving weekly or monthly jobs to an hourly or even an on-demand analytic cadence.

Whence big data?
Massive data volumes arise from machines (log files, API calls), digitized nature (DNA sequences, video, audio, environmental sensors), and the humanity of billions of people online (Facebook, Baidu, e-commerce). Take a mere 10 million patients' genomes, for example. That requires 20 exabytes (EB) of storage. Then there's camera phone resolution and market penetration, which has been growing exponentially. And according to Digital Marketing Ramblings, Twitter distributes 400 million updates per day to 500 million subscribers.

But in 2012, all enterprise storage vendors shipped just 16EB of capacity.

With the big data wherewithal to capture it all, we could be at the early stages of a deeply disruptive wave of innovation. This sweeping crush threatens business models and technical architectures that assumed a paucity of data and scarcity of places to put it.

The additional hidden cost to networked object storage is the implicit inertia in petabytes of recorded audio or e-commerce server logs. Namely, there is a need to move that data from its resting place to a computational node. Computation is necessary to glean the business insights, social relevancy, and clinical results that make saving digital ephemera worthwhile. That theoretical 1Gbps or even 40Gbps network is a severe limit to the class of algorithms that can be considered and the rate at which they can be applied.

What is object storage?
Objects differ from files in one key respect: They are immutable once written. They can be updated and overwritten in their entirety, but the in-place update of a POSIX or similar file system is verboten. In practice this is not a severe constraint, especially given that most object data itself is immutable. DNA rarely mutates, log tampering is bad, cat videos are remixed and republished. Besides, mutable data winds up in a richly indexed DBMS, which itself may reside on an object or file store.

Enforcing immutability yields architectural simplicity. A massively distributed storage system across networked nodes must contend with network congestion and component failures. A technically seductive approach is to distribute a single object's data across nodes, with erasure-coded partial redundancy to survive one or more node failures at reasonable economic investment. The limits of physics and metaphysics are captured in Brewer's CAP theorem and elsewhere like in Abadi's PACELC refinement.

These acronyms themselves express the trade-offs inherent in a distributed architecture. "CAP" stands for "consistency or availability under network partition"? "PACELC" is "partition, availability, or consistency, else latency or consistency"? Without going into the math, data writes orchestrated at the grain of an entire object are synchronous (consistency first) or asynchronous (availability first).

Unwittingly, these popular redundant-array-of-independent-nodes (RAIN) architectures force an expensive reconstitution operation across the network for each object read. Furthermore, no meaningful computation can be performed at a storage node because each node has only a fractional view, tesselated arbitrarily by the erasure code chosen. By design, this popular object storage architecture requires non-negotiable network bandwidth to operate on an object.

Joyent Manta Storage and Compute Architecture
The Joyent Manta Storage Service's architecture is a departure from storage-only approaches. Instead, a high-performance compute cluster is included in each storage node. We aimed for a converged storage-and-compute design goal.

Rather than erasure-coding across nodes, Manta uses erasure codes across disks (currently 9+2 across three stripes with three hot standby disks, if you must know). It also uses a default multiple-data-center full-copy redundancy to achieve equal data durability to RAIN systems. The economic overhang for a default two copies is surprisingly negligible compared to common RAIN architectures for standard redundancy services. This suggests a two-fold redundancy or a rich gross margin associated with these services.

Regardless, the architectural limitations exist when distributing objects across nodes. Instead of beginning with a full object copy, a Manta storage node already has the entire object over a very high-bandwidth PCIe 3.0 bus and a theoretical 256Gbps, and it can perform meaningful computation on that object with minimal data movement latency.

What types of meaningful computation, you ask? The biggest use cases we'ave seen are delightfully poetic one-liners operating on large data sets. They may extract user or other cohort data from logs for aggregate, operational, billing, sentiment, intention, or geographic insights. Another use case leverages the Internet-visible nature of Manta: bulk in-place media transcoding. These include more prosaic needs like e-commerce catalog thumbnail creation or video format changes. We're beginning to see higher-level machine learning and classification computations that exploit Manta's map/reduce framework even further.

Common, familiar tools
What really evokes ecstatic comments from Manta customers is its use of standard, familiar tools. It's big data without the steep learning curve. Developers familiar with a machine vision library like OpenCV or text-processing languages like Perl are free to use them at the parallel compute nodes in Manta without modification. This reduces training and retooling time, and it taps a relatively large pool of developers in addition to self-styled data scientists.

With the launch of Manta, it's only becoming clearer that object storage is eating the world of storage. There's no doubt that computation on big data is driving new business models. Our view is that simplified, high-performance compute on stored objects will unleash another wave of innovation and business advantage. Object storage that can also provide analytic insights -- what a concept. Now it's our new reality.

New Tech Forum provides a means to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all enquiries to newtechforum@infoworld.com.

This article, "Joyent CTO: How big data platforms should work," was originally published at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

Join the discussion
Be the first to comment on this article. Our Commenting Policies