How Yahoo wrangles its giant private cloud

Yahoo VP of Cloud Services Preeti Somal pulls back the curtain on the internet company's vast private cloud—and offers valuable lessons any enterprise can use


Every week it seems we hear about another large enterprise moving a major chunk of workloads to AWS or some other public cloud. Meanwhile, the private cloud—once considered a vital part of the enterprise’s future—gets no respect. “The enterprises that banked on private clouds a few years ago are now having second thoughts,” says InfoWorld’s David Linthicum in a recent post.

I can assure you that Yahoo isn’t one of those enterprises. InfoWorld recently interviewed Yahoo’s VP of Cloud Services, Preeti Somal, who gave us an in-depth virtual tour of the company’s enormous private cloud, which runs hundreds of thousands of servers worldwide, averages one terabit per second of traffic to over a billion monthly users, and accommodates roughly 50,000 build jobs per day.

At that scale, a private cloud makes perfect sense, particularly if you’ve built out the sophisticated IaaS platform, PaaS offering, and developer services that Yahoo has. Both the gargantuan footprint and Yahoo’s early adoption of OpenStack, containers, Jenkins, and other emerging technologies make it a fascinating private cloud case study.

Building on OpenStack

When Somal left her position as VP of R&D for VMware to join Yahoo in 2013, the internet company was already a year into its OpenStack odyssey. From the start, the No. 1 mandate was to answer the needs of Yahoo’s developers, which is why the first OpenStack-enabled project, OpenHouse, let developers provision their own virtual machines.

“We got developers hooked and then we moved to production,” says Somal. As other OpenStack adopters have observed, however, managing OpenStack at scale is no walk in the park. “It’s been quite a journey,” she says. “We’ve had some scalability issues. The fact that rolling upgrades are not there, not easy … so it’s been a growth and learning process with OpenStack.”

Nonetheless, with the largest community behind it, OpenStack remains pretty much the only reasonable open source choice for building a monster private cloud. Yahoo is all-in, running Nova, Glance, Horizon, Keystone, Neutron, and other services within the multifaceted OpenStack project. Somal’s Cloud Services group is also a big user of OpenStack Ironic, which enables automated provisioning of bare metal. “If I may be so bold in saying so,” says Somal, “I think we are the ones that have pushed the bare metal part to the max. We contributed very heavily to Ironic.”

OpenStack adopters often find they need to fill in gaps with their own code, and Yahoo is no exception. For example, Somal says, “We have major data centers around the globe and OpenStack does not have a good federation mechanism. We have multiple OpenStack clusters, and we built both automation and processes around how we can manage these in a cost-effective manner.”

Ultimately, Somal notes that Yahoo is in the same boat as other major OpenStack adopters, including Walmart, PayPal, and eBay. “Deploying OpenStack for a large-scale environment needs engineering resources, and it’s only companies that are willing to invest in those resources where OpenStack has been a success.” That includes a commitment to recruiting OpenStack talent, which is “really hard,” Somal admits. “We’re big with university recruiting, and we tend to just start from there and build up the talent.”

On the container vanguard

The scale of big internet companies pushes them to adopt or develop fresh technology to meet challenges other organizations simply don’t have. That’s one reason Yahoo today has what Somal estimates are “tens of thousands” of Docker containers running in production. Yahoo saw the benefits of containers early on, creating its own way to manage LXC (Linux Containers), the container technology on which Docker was originally built.

Atop the OpenStack IaaS layer, Yahoo has built a homegrown PaaS that has evolved to incorporate “the best open source software technology out there,” according to Somal. “Over the last few years we took our custom monolithic stack, made it modular, and have brought a tremendous amount of efficiency. Today our PaaS runs Docker containers using Mesos as the scheduling engine and using ZooKeeper for all the service registrations.”
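Somal doesn’t describe how Yahoo lays out its ZooKeeper registrations, but the general pattern is well known: each running instance creates an ephemeral, sequential znode under a per-service path, and clients list that path’s children to find live endpoints. Because ZooKeeper deletes ephemeral nodes when the owning session ends, a crashed or killed container drops out of discovery automatically. The Python sketch below models that behavior in memory so it runs without a ZooKeeper ensemble; the ToyRegistry class, the /services/<name> layout, and the helper names are all hypothetical. With a real client such as the kazoo library, the registration step would be a create call with ephemeral=True and sequence=True.

```python
from collections import defaultdict
from itertools import count


class ToyRegistry:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, illustration only).

    Models the two ZooKeeper behaviors service registration depends on:
    sequential node names and ephemeral nodes tied to a client session.
    """

    def __init__(self):
        self._nodes = {}                # path -> (data, owning session id)
        self._seq = defaultdict(count)  # parent path -> sequence counter

    def create_ephemeral_sequential(self, parent, prefix, data, session):
        # Like ZooKeeper, append a zero-padded 10-digit sequence number.
        path = f"{parent}/{prefix}{next(self._seq[parent]):010d}"
        self._nodes[path] = (data, session)
        return path

    def children(self, parent):
        base = parent + "/"
        return sorted(p[len(base):] for p in self._nodes if p.startswith(base))

    def get(self, path):
        return self._nodes[path][0]

    def close_session(self, session):
        # Ephemeral nodes vanish when their session ends, e.g. when a
        # container crashes or is killed and rescheduled elsewhere.
        self._nodes = {p: v for p, v in self._nodes.items() if v[1] != session}


def register(registry, service, host, port, session):
    """Run by each container at startup to advertise its endpoint."""
    return registry.create_ephemeral_sequential(
        f"/services/{service}", "instance-", f"{host}:{port}", session)


def discover(registry, service):
    """Run by clients to list the live endpoints of a service."""
    parent = f"/services/{service}"
    return [registry.get(f"{parent}/{child}") for child in registry.children(parent)]


reg = ToyRegistry()
register(reg, "web", "10.0.0.5", 8080, session="container-a")
register(reg, "web", "10.0.0.6", 8080, session="container-b")
print(discover(reg, "web"))  # ['10.0.0.5:8080', '10.0.0.6:8080']
reg.close_session("container-a")
print(discover(reg, "web"))  # ['10.0.0.6:8080']
```

The point of the ephemeral node is that no one has to remember to deregister a dead instance: the registry reflects what is actually running, which matters when a scheduler like Mesos is constantly starting and stopping containers.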

Why Mesos instead of Kubernetes? Somal explains that “Mesos emerged for us as the right choice. Having said that, we’ve been tracking Kubernetes and it definitely looks like it’s picking up a lot of momentum, so we are doing some prototyping and starting to see what Kubernetes would mean.” At some point, she says, her team may build a hybrid environment using Mesos and Kubernetes side by side, but the scale of her current container deployment makes a wholesale migration to Kubernetes unrealistic at this time.

Along with the container-based PaaS for dev, test, and deployment, Yahoo earlier this year launched the open source project Screwdriver, a collection of continuous delivery services that first saw life largely as an abstraction layer created by Yahoo developers for Jenkins. Integration with Git repositories is part of the deal: According to the company, in addition to the 50,000 or more daily builds, Yahoo developers average 170,000 Git operations per day.

The challenge ahead

When I asked Somal what her biggest challenge was going forward, she responded with a phrase that may sound familiar to the average enterprise: modernizing applications. Yes, Yahoo’s public-facing consumer applications have been refactored to take advantage of internal cloud services and now run on the massive private cloud. But there’s still work to be done.

“How can we bring more [applications] on so that we can truly start leveraging that scale?” asks Somal. “Another way to say it is we delivered the agility, we have all the operational pieces in play, and now we need to start seeing the efficiency because we have a massive footprint running on our platforms. That is step two for us.”

Yahoo, like most enterprises, is also grappling with the role public cloud providers might play. As Somal puts it, “Even with a tremendous investment in private cloud, we have to be very cognizant of public cloud, and for us hybrid is a very real option. There are some use cases where it provides a tremendous amount of first-mover type opportunities, like any kind of burst capacity or specific regions where we don’t have big data centers.”

The latter issue ties into yet another important part of Somal’s purview: edge services. The Cloud Services group constantly works on optimizing content delivery to consumers, providing Yahoo developers with APIs for edge pods and using Yahoo’s massive analytics capability to analyze latencies (see “Yahoo struts its Hadoop stuff”). Yahoo’s intense cycle of monitoring and optimization is something any high-traffic, consumer-facing enterprise would be wise to emulate.

What enterprises can learn from Yahoo

Somal offers other lessons enterprise IT will find useful. For one, she advises against the Big Bang approach to modernizing legacy systems. “The first thing people think is I’m going to rebuild it all fully. That approach is too time-consuming, too costly. The approach we’ve taken here is we first wrap an API in front of that system, get customers moved to it, and then start modernizing beneath that system. That’s worked really well for us.”
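The wrap-then-modernize sequence Somal describes is commonly known as the strangler fig pattern. Here is a minimal Python sketch, assuming a hypothetical user-lookup service: customers first move to a new API that simply forwards to the legacy backend, and traffic is then shifted to a rebuilt backend a percentage at a time, without customers changing anything. All class and method names are illustrative, not Yahoo’s.

```python
import zlib


class LegacyUserStore:
    """Stand-in for the old system (hypothetical)."""

    def get_user(self, user_id):
        return {"id": user_id, "source": "legacy"}


class ModernUserStore:
    """Stand-in for the rebuilt system (hypothetical)."""

    def get_user(self, user_id):
        return {"id": user_id, "source": "modern"}


class UserApi:
    """The API customers call instead of the legacy system.

    Step 1: ship it with migrated_percent=0, so it only forwards to the
    legacy backend, and move customers onto it.
    Step 2: raise migrated_percent to route a growing share of users to the
    modern backend while the API contract stays the same.
    """

    def __init__(self, legacy, modern, migrated_percent=0):
        self.legacy = legacy
        self.modern = modern
        self.migrated_percent = migrated_percent

    def get_user(self, user_id):
        # CRC32 gives a stable bucket (0-99) per user, so the same user always
        # lands on the same backend at a given rollout percentage.
        bucket = zlib.crc32(user_id.encode("utf-8")) % 100
        backend = self.modern if bucket < self.migrated_percent else self.legacy
        return backend.get_user(user_id)


api = UserApi(LegacyUserStore(), ModernUserStore())
print(api.get_user("alice")["source"])  # legacy: nothing migrated yet
api.migrated_percent = 100
print(api.get_user("alice")["source"])  # modern: fully migrated
```

Because the bucket is derived from the user ID rather than picked at random, each user sees consistent behavior during a partial rollout, and the legacy backend can be retired once all traffic has moved behind the unchanged API.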

In that same context, she suggests simplifying to the extent that you can. “Focus on the 80 percent,” she says, and don’t knock yourself out trying to accommodate requests that make you scratch your head. Moving to a service-based cloud almost always means eliminating some choices in exchange for making the mainstream use cases vastly more efficient.

Her broadest advice, however, may be the most valuable to enterprise IT. “Focus on the value,” she says, which you can only do when you know your customer. For Somal, those customers are engineers who need “agility, self-service, and ease of use.” For just about any enterprise, those qualities reign supreme across a wide range of internal and external customers.