To Istio and beyond: Azure’s Service Mesh Interface

Microsoft’s latest Kubernetes development is set to change the way we deploy distributed applications

Modern, cloud-first application development, at least on Azure, has become almost dependent on Kubernetes. Technologies such as Virtual Kubelet, AKS (the Azure Kubernetes Service), and Azure Service Fabric Mesh are key to building scalable distributed applications on Azure, using containers to deploy and manage microservices.

Looking at Azure’s Kubernetes tools, it’s clear that Microsoft is doing a lot of work in and around the Cloud Native Computing Foundation, working on all aspects of the open source framework. We shouldn’t be surprised; Microsoft hired one of the founders of the Kubernetes project and then acquired Deis, a significant vendor. The Deis team is behind one of the latest Azure contributions to the Kubernetes ecosystem, the Service Mesh Interface (SMI).

Introducing service meshes

It’s perhaps best to first explain what a service mesh is and why it’s important to any Kubernetes-based application.

Modern IT architectures are all about abstraction. With cloud services we no longer need to think about the underlying hardware. If we’re using IaaS we define virtual machines to host our code. With PaaS we’re even further from the hardware, using the services and APIs we choose, picking an appropriate performance level for our applications and budgets. With container-based architectures such as Kubernetes, we’re at a point somewhere between the two: using services like AKS we can define the underlying virtual machines, which then host our container pods and scale out with changes in compute and memory (and now, with KEDA (Kubernetes-based event-driven autoscaling), on receipt of events).

That’s just one aspect of abstraction. Kubernetes microservices are, at heart, stateless; they use external storage and sit on top of either physical or virtual networks. It’s the network aspect of running Kubernetes that’s probably the trickiest: as services scale out and scale down, you need to modify your network to match the changes to your application. But how do you keep services connected when an application front end and back end may be scaling at different rates?

That’s where service meshes come in. They’re a new layer of abstraction, one that lifts your code away from the underlying network by taking advantage of the capabilities of a modern software-defined network. By acting as a set of network proxies that are deployed with your code, a service mesh manages service-to-service communication without your code needing any awareness of the underlying network. You can think of a service mesh as an automated control plane for your application’s networking, managing the underlying data plane as Kubernetes scales your code up and down.

A software-defined network for microservices

Perhaps best thought of as a way to implement smart, latency-aware, scalable load-balancing alongside service discovery, a service mesh is basically a distributed router with dynamic routing rules that are managed as part of a Kubernetes deployment. You can define additional rules; for example, keeping production and test systems separate, or handling the rollout of a new release by shifting traffic between container versions. Each pod in an application has a service mesh instance running as a sidecar, with service discovery and other stateful elements running outside your services.

With a service mesh you’re pushing intelligence into a new network layer, so you don’t have to put it into your microservices. Need to encrypt a connection? That’s a job for your service mesh. Need to authorize clients? Another task for the service mesh.
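That authorization task is one of the things SMI’s traffic access control API is designed to describe. As a rough sketch, a TrafficTarget resource declares which source workloads may call a destination; note that the service-account names here are illustrative, and the exact apiVersion and field layout vary between SMI spec releases:

```yaml
# Hypothetical SMI access-control policy: only pods running as the
# "frontend" service account may call pods running as "backend".
apiVersion: access.smi-spec.io/v1alpha1
kind: TrafficTarget
metadata:
  name: frontend-to-backend
  namespace: default
destination:
  kind: ServiceAccount
  name: backend      # illustrative name
  namespace: default
sources:
- kind: ServiceAccount
  name: frontend     # illustrative name
  namespace: default
```

The mesh’s sidecar proxies enforce the policy (and can layer mutual TLS on the connection), so the application code itself never handles authorization or encryption.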

Too many meshes

Combining a Kubernetes deployment with a service mesh makes a lot of sense. However, there’s one more big problem: Which one do you use? Envoy? Istio? Linkerd? Aspen Mesh? If you choose one, what’s to stop a development team in another part of your business from choosing another? Then what happens if your company decides to standardize on a specific platform?

That’s the problem Microsoft is setting out to solve with the Service Mesh Interface. Instead of each service mesh having its own set of APIs, the SMI is a way to implement common APIs that work across different service meshes, managing that new smart network. Instead of locking your code into a specific service mesh and its APIs, you can write code that addresses most common use cases via a common API. If you need to swap out a service mesh—if you change providers or you find one that works better—there’s no need to change your code, as long as the service mesh implements the SMI. All you need to do is change your service mesh sidecars and redeploy your code.

SMI: common service mesh APIs

Working with Kubernetes-ecosystem companies such as HashiCorp and Buoyant, Microsoft has been defining the key features for SMI that support common requests from its customers. In the initial release it has focused on three areas: traffic policy, traffic telemetry, and traffic management. These three areas are controlled by most service meshes, and the intention is to make this a specification that’s easy to implement without changing the underlying application.
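The traffic management area is the easiest to picture. A TrafficSplit resource, as a rough sketch, shifts a percentage of requests for one service to different backing versions; the service names here are illustrative, and weight semantics differ slightly between SMI spec versions:

```yaml
# Hypothetical canary rollout: send 90% of traffic for the "checkout"
# service to v1 and 10% to v2, enforced by whichever SMI-compliant
# mesh is installed in the cluster.
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: checkout-rollout
  namespace: default
spec:
  service: checkout       # root service clients address
  backends:
  - service: checkout-v1  # illustrative backend service names
    weight: 90
  - service: checkout-v2
    weight: 10
```

Because the resource only describes intent, the same manifest should work whether Linkerd, Istio, or another SMI-compliant mesh is doing the routing.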

By making SMI a set of standard APIs, there’s nothing to stop service mesh vendors from continuing to offer their own APIs or additional features outside those specified. Alternatively, they don’t need to make any changes; third parties can build translation layers that sit between SMI APIs and proprietary service APIs. You won’t need a new version of Kubernetes either, as the SMI APIs are implemented as extension API servers and custom resource definitions. You can go ahead and install them in any cluster, using existing management tools. That should make it easy for Azure and other cloud providers to build SMI into their existing managed Kubernetes services.

Whether you want to use Linkerd or Aspen Mesh or VMware’s NSX Service Mesh, with SMI you’ll be able to choose the one you prefer, improving code portability and avoiding lock-in to specific cloud services. Then there’s the opportunity to switch service meshes without affecting your code. If a new service mesh offers better performance, all you need to do is change your build pipeline to use the new mesh and then deploy an updated application.

It’s interesting to see Microsoft take the lead on a project like this, working with a wide cross section of the Kubernetes community. By taking an approach that’s explicitly not focused on building a service mesh, Azure can offer different service meshes as part of configuring AKS, letting you choose the tool you want without needing to change any code.

Copyright © 2019 IDG Communications, Inc.