Walmart doubles down on OpenStack

After migrating one of the world's largest e-commerce operations to OpenStack last year, Walmart's technologists hungrily anticipate the latest improvements as the project evolves

Not long ago, you'd have been hard-pressed to find a single enterprise customer with a big production OpenStack deployment. That's because early adoption of OpenStack, the leading open source "cloud operating system," required a leap of faith. You had to believe that an unusual consortium of passionate developers and corporate interests could deliver a continuously improving cloud solution -- and that dealing with the inevitable rough edges would be worth it.

Last August, Walmart made that leap and bet its entire e-commerce operation on OpenStack to the tune of more than 100,000 cores and several petabytes of storage. Amandeep Singh Juneja, senior director of cloud operations and engineering for Walmart Labs, credits Walmart's "awesome team of architects, engineers, and technologists" for the successful effort, which he says "worked as seamlessly as you can imagine a migration could work."

Juneja himself jumped on the OpenStack bandwagon early when he joined HP -- the first major industry player to commit to OpenStack -- in February 2012 as chief architect of Web services for HP Cloud. InfoWorld Executive Editor Doug Dineley and I interviewed Juneja at Walmart Labs, where he was hired shortly before the OpenStack migration was complete. "We kind of had heart surgery, brain surgery, and a bunch of other surgeries at once -- while the site was running," says Juneja. "So it was a huge initiative, but we did well. Our site was up and running without a hiccup throughout the holidays."

By Juneja's account, Walmart.com was due for an upgrade. The site ran on a legacy ATG e-commerce solution atop a scale-up production infrastructure that was mostly bare metal. "We realized that the application infrastructure that we had was not going to scale to the needs we had," says Juneja. "So we decided to go into SOA architecture -- to split our applications into a bunch of resource services and orchestration services -- and add IaaS at the bottom of that."

But why OpenStack for IaaS versus a commercial software solution, such as that offered by VMware? Juneja responds:

"OpenStack met most of our needs, but the beauty of OpenStack was that there was this huge community investment as well as big technology company investments. They were using it at scale in public clouds -- that was one of the reasons -- you get something that is not ready yet, but will be ready by the time you need it. Second, you can customize it to your needs; you can make changes to it. Third, if there are issues, there's a community out there that will fix it for you."

Walmart.com has adopted most components of the Havana version of OpenStack: Nova (compute), Swift (object storage), Cinder (block storage), Neutron (networking), Horizon (dashboard), and Keystone (identity). Walmart actually divides its infrastructure into multiple clouds, says Juneja, and federates those clouds using Keystone.
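To make the multicloud arrangement concrete, here is a minimal sketch in Python of the general idea: several independent clouds, each exposing some of the services listed above, with one shared lookup layer that knows about all of them. This is a toy model, not Keystone's API and not Walmart's configuration. The cloud names, the example.invalid URLs, and the FederatedCatalog class are all hypothetical.

```python
from dataclasses import dataclass, field

# OpenStack projects named in the article, mapped to the role each plays.
SERVICES = {
    "nova": "compute",
    "swift": "object storage",
    "cinder": "block storage",
    "neutron": "networking",
    "horizon": "dashboard",
    "keystone": "identity",
}


@dataclass
class Cloud:
    """One independent cloud. The name and endpoint URLs are hypothetical."""
    name: str
    endpoints: dict = field(default_factory=dict)  # project name -> URL


@dataclass
class FederatedCatalog:
    """Toy stand-in for a shared identity layer that knows every cloud."""
    clouds: dict = field(default_factory=dict)  # cloud name -> Cloud

    def register(self, cloud: Cloud) -> None:
        self.clouds[cloud.name] = cloud

    def endpoint(self, cloud_name: str, project: str) -> str:
        """Return the URL of one project's service in one cloud."""
        if project not in SERVICES:
            raise KeyError(f"unknown project: {project}")
        return self.clouds[cloud_name].endpoints[project]

    def clouds_offering(self, project: str) -> list:
        """List, in sorted order, the clouds that expose a given project."""
        return sorted(
            name for name, cloud in self.clouds.items()
            if project in cloud.endpoints
        )


# Hypothetical example data: two clouds with different service sets.
catalog = FederatedCatalog()
catalog.register(Cloud("cloud-a", {
    "nova": "https://cloud-a.example.invalid/compute",
    "swift": "https://cloud-a.example.invalid/object",
}))
catalog.register(Cloud("cloud-b", {
    "nova": "https://cloud-b.example.invalid/compute",
}))

print(catalog.clouds_offering("nova"))
print(catalog.endpoint("cloud-a", "swift"))
```

The point of the sketch is only that a caller asks one catalog where a service lives instead of tracking each cloud separately. A real deployment would handle authentication, tokens, and regions, none of which are modeled here.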

Neutron is at an early phase, Juneja says, but Walmart.com plans to put more effort into networking improvements:

"SDN is going to be our next step. Network is one area we need to put a lot of effort into. When you grow horizontally, you add compute, and the network is kind of the bottleneck for everything. That's an area where you want more redundancy. Right now we have Open vSwitches on hypervisors. We're also looking at a couple of other SDN vendors."

Walmart Labs' charter is to look ahead and experiment. "Think of a technology, and we're using it," Juneja says. With his multicloud, distributed IaaS architecture, "we can have betas running all over the place." The database layer has stayed the same, for example, with Oracle running on bare metal, but the company is actively determining how to rearchitect that layer -- and plans to use Cassandra "more and more." Juneja also sees promise in a new OpenStack project called Ironic, which provisions bare-metal machines in much the same way other OpenStack services provision VMs.

Soon, Walmart.com will upgrade from Havana to Juno, the latest OpenStack release. This was supposed to be an in-place upgrade. But in fact, it will be destructive, so the cloud will have to be taken down and rebuilt. From Juno going forward, Juneja says he's confident in-place upgrades will be supported.

The bottom line, according to Juneja, is that OpenStack is now mature enough to use at scale in production, which was not true a few years ago. "With all the new projects coming in, with metering and measuring, you can build a public cloud from scratch and a private cloud from scratch. That's the best part." And the worst part? "The documentation is lacking," he says.

Clearly, Juneja and his team have assembled the expert human capital necessary to handle OpenStack. But he needs more. He has even hacked his own LinkedIn page, so in place of his current position at Walmart Labs, you see: "Hiring Sr/Staff/Principal Software Engineers for OpenStack Cloud."

That says something about the level of concentrated effort required to make OpenStack run seamlessly in production. Many customers have essentially outsourced deployment and management of OpenStack to the likes of Mirantis or Rackspace, and for good reason. At the scale of Walmart.com, however, it makes sense to build deep understanding and experience in-house.

It's great to see a large organization like Walmart embrace the basic OpenStack proposition that a broad open source effort can lift all participants' boats, with significant improvements rolling out in every twice-yearly release. But it's also clear that if you're going to try OpenStack at home without professional services hand-holding, you'd better be prepared to make a major investment in the "awesome" architects, engineers, and technologists needed to make it work.

Copyright © 2015 IDG Communications, Inc.