How to gain visibility into a multicloud environment

A successful multicloud strategy requires effective awareness of all interdependencies—and a new approach to network monitoring

Multicloud is typically understood to be an evolutionary step for enterprises that are moving past a single-cloud starting phase toward a “best of breed” approach to cloud offerings. Several factors drive this shift. For some, it’s a diversity of workloads that require platform-specific functionality. For others, it’s a gradual evolution, or the result of mergers and acquisitions.

Lately, we are seeing companies choose multicloud as their primary, cloud-first strategy right out of the gate. In some cases the goal is to reduce dependency on a single vendor, as platform vendors build more stickiness into their offerings. In other cases it is to optimize costs based on workload characteristics. There are strong arguments on both of these dimensions, and the situation is not unlike what we saw with hardware platforms.

Regardless of why you operate in multiple clouds, doing so introduces complexity that, if not managed carefully, can outweigh the cost savings of the multicloud strategy and bedevil your performance goals.

That’s why visibility is so important. But just as multicloud is a shift in how infrastructure is consumed, operational visibility requires a shift in data sets: you need to measure the health and performance of WAN, Internet, cloud, and SaaS provider segments in addition to on-prem networks. In this article, we’ll unpack a few key terms connected to multicloud deployment, explain why traditional visibility approaches fall short in the cloud, and explore the approach needed to gain visibility for multicloud operations.

Hybrid cloud vs. multicloud

Hybrid cloud typically refers to a combination of existing legacy data centers and some services consumed from the cloud. Most applications today are hybrid because they use one or more external API-based services, whether for authentication, payments, or logistics. If your internally hosted app makes a call to Azure AD or Okta for authentication, you are effectively running a hybrid cloud. If your website has a PayPal or Visa payment widget, you’re using hybrid cloud.

As applications get atomized into their constituent services and communicate only via structured API calls, it becomes feasible to locate and scale each component separately. This makes infrastructure and platform services like AWS extremely enticing. So while some core assets and functions may remain on premises, you can scale out the stateless components independently and have them reside in the cloud closer to the users.

VMware, the dominant vendor in enterprise data center virtualization, has a viable hybrid cloud offering through its partnership with Amazon. VMware Cloud on AWS allows you to extend VM workloads and virtual networks to the Amazon cloud while still managing everything through vSphere.

Multicloud, on the other hand, refers to a combination of legacy data centers and two or more cloud vendors. Let’s consider multicloud to include any type of external cloud offering, such as IaaS, PaaS, or SaaS. This is a far more complex environment with multiple substrates, each with its own orchestration quirks. The objective here is to let cost economics and “best-of-breed” capabilities dictate where workloads land. From a management perspective, you are dealing with a level of unpredictability and a pace of change that can be challenging. Your call flows also include many more permutations, which makes performance tuning and troubleshooting especially complex.
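The distinction between the two terms can be made concrete with a small classifier. This is a simplified sketch that assumes you already have an inventory mapping each of an application’s dependencies to where it is hosted; the dependency names and vendor labels are hypothetical, and, following the definition above, any external IaaS, PaaS, or SaaS provider counts as a cloud vendor. As a further simplification, it only counts vendors and does not check whether on-prem components are present.

```python
from __future__ import annotations

ON_PREM = "on-prem"


def classify(dependencies: dict[str, str]) -> str:
    """Label a deployment from a map of dependency name -> hosting location.

    Any location other than ON_PREM is treated as an external cloud vendor,
    whether it provides IaaS, PaaS, or SaaS. Simplification: the presence of
    on-prem components is not checked, only the number of distinct vendors.
    """
    vendors = {where for where in dependencies.values() if where != ON_PREM}
    if not vendors:
        return "on-prem only"
    if len(vendors) == 1:
        return "hybrid cloud"
    return "multicloud"
```

For example, an on-prem app whose only external call is to Okta for authentication classifies as hybrid cloud; add components running on AWS and Azure and the same app becomes multicloud.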

Microservices APIs

The microservices architecture has been prevalent for a number of years now, and it has fundamentally changed how new applications are built. Uber is a great example of a service that primarily runs on a microservices ecosystem. Uber relies on third-party APIs for mapping, payments, notifications, and telephony. Each of these APIs may further rely on other back-end APIs. So every time you hail a ride on Uber, multiple API flows, cloud services, and network paths need to work correctly in order for your ride to take you home.
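One way to see why a single ride request touches so many systems is to model services as a dependency graph and walk it. The sketch below is illustrative only: the service names and edges in `GRAPH` are hypothetical stand-ins, not Uber’s actual architecture. `transitive_deps` does a breadth-first walk to collect everything a service ultimately relies on, and `affected_by` inverts the question to ask which services a single failed component takes down with it.

```python
from __future__ import annotations

from collections import deque

# Hypothetical service -> direct dependencies (not any real company's topology).
GRAPH: dict[str, list[str]] = {
    "ride-app": ["maps-api", "payments-api", "notify-api", "telephony-api"],
    "maps-api": ["cloud-region-a"],
    "payments-api": ["card-network-api", "fraud-api"],
    "fraud-api": ["cloud-region-a"],
    "notify-api": ["push-gateway"],
}


def transitive_deps(graph: dict[str, list[str]], service: str) -> set[str]:
    """Breadth-first walk collecting every direct and indirect dependency."""
    seen: set[str] = set()
    queue = deque(graph.get(service, []))
    while queue:
        dep = queue.popleft()
        if dep not in seen:  # the seen-set also keeps cycles from looping forever
            seen.add(dep)
            queue.extend(graph.get(dep, []))
    return seen


def affected_by(graph: dict[str, list[str]], failed: str) -> set[str]:
    """Services (graph keys) whose dependency tree includes the failed component."""
    return {svc for svc in graph if failed in transitive_deps(graph, svc)}
```

With this graph, `affected_by(GRAPH, "cloud-region-a")` returns `ride-app`, `maps-api`, `payments-api`, and `fraud-api`. The payments path fails even though `payments-api` never calls the region directly, which is the kind of hidden dependency that stays invisible until something breaks.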

This is a level of complexity that IT organizations have never had to deal with before. It is not immediately obvious when everything works, but it makes failures extremely difficult to troubleshoot.

A great example of this is the March 2 AWS outage. From an infrastructure perspective, AWS had a minor power outage, and systems recovered fairly quickly. However, applications relying on AWS Direct Connect for their back-end flows continued to fail for several hours after the initial incident. The providers of a number of applications and services, including Atlassian, Slack, and Twilio, failed to factor in the hidden dependencies between their multiple clouds.

Image: ThousandEyes view of outages at the AWS Ashburn location (credit: ThousandEyes)

A March 2 power outage affecting a small set of services in Amazon’s AWS US-East region (Ashburn) quickly cascaded into a major issue for users of AWS Direct Connect. ThousandEyes revealed that more than 240 critical services felt the impact of the outage.

The cloak of invisibility

One of the challenges with the cloud, and with the Internet in general, is the lack of visibility. Many traditional network monitoring tools rely on techniques like SNMP polling, flow records, or packet captures. All of these require some level of privileged access to the servers, switches, firewalls, and routers that make up the data center, and none of them can be applied to the provider-owned infrastructure behind IaaS or PaaS services. You simply cannot put wiretaps inside Microsoft Azure, or stream flow records from Amazon’s data centers. As a result, enterprises have gotten used to thinking of the cloud as a monolithic black box, hidden under a cloak of invisibility.

That mindset does not work with a single cloud or a hybrid cloud, and it certainly does not work with multicloud infrastructures. The number of possible paths grows combinatorially with the number of clouds, and each of those paths has numerous unpredictable elements, so your risk increases by orders of magnitude. You cannot continue treating these clouds as black boxes. So what are your options?
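The growth in path combinations can be illustrated with a deliberately simplified model: treat the on-prem data center and each cloud as a “site,” then count (a) the distinct site pairs a flow could cross and (b) the ordered call chains that pass through two or more distinct sites without revisiting one. Chains that visit every site alone number n!, so this count grows at least factorially. The model ignores the many network hops inside each path, so real numbers are larger; treat it as a back-of-the-envelope sketch, not a measurement.

```python
import math


def pairwise_paths(sites: int) -> int:
    """Distinct site-to-site connections: n choose 2."""
    return math.comb(sites, 2)


def ordered_call_chains(sites: int) -> int:
    """Ordered call chains through at least two distinct sites, none repeated.

    Sums n!/(n-k)! over chain lengths k = 2..n (requires Python 3.8+).
    """
    return sum(math.perm(sites, k) for k in range(2, sites + 1))


if __name__ == "__main__":
    # One on-prem data center plus 1..4 cloud vendors.
    for clouds in range(1, 5):
        n = clouds + 1
        print(f"{clouds} cloud(s): {pairwise_paths(n)} site pairs, "
              f"{ordered_call_chains(n)} ordered call chains")
```

Going from one cloud to four raises the number of site pairs from 1 to 10, while the number of ordered call chains grows from 2 to 320.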

The cloud uncloaked

Some clouds offer their own network visibility solutions. In Microsoft Azure, for instance, you can visualize the path from your enterprise network to your Virtual Network (VNet) inside Azure over your ExpressRoute connection. However, this does not give you a complete end-to-end picture that includes external interdependencies. And of course, such a solution is specific to Azure and offers no information about other clouds or your legacy data center. With a multicloud strategy, as workloads move around, your visibility solution needs to follow each resource, regardless of where it resides.

How can you achieve this? Active monitoring techniques use specially instrumented application calls to measure not only application availability and response times but also the underlying network and cloud infrastructure used to deliver those applications. Because this approach requires no privileged information from the cloud infrastructure, it can be cloud- and vendor-agnostic. Typically, all it requires is the target URL of the resource.
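A minimal sketch of such an active check, using only the Python standard library and assuming nothing but the target URL: it times DNS resolution, a bare TCP handshake, and a full HTTP request separately, so a slow or failed result can be attributed to the network layer or the application layer. It is an illustration of the general technique, not ThousandEyes’ implementation; a fuller tool might also trace the network path hop by hop and run checks on a schedule from many locations. The function name `probe` and its result keys are hypothetical.

```python
import socket
import time
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Ignore proxy settings so the HTTP phase measures the same direct path as the TCP phase.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 5.0) -> dict:
    """Run one synthetic check against `url`, timing DNS, TCP connect, and HTTP."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    result = {"url": url, "ok": False}
    try:
        # Phase 1: DNS resolution through the system resolver.
        t0 = time.perf_counter()
        family, socktype, proto, _, addr = socket.getaddrinfo(
            parts.hostname, port, type=socket.SOCK_STREAM)[0]
        result["dns_ms"] = (time.perf_counter() - t0) * 1000

        # Phase 2: bare TCP handshake to the first resolved address (reachability).
        t0 = time.perf_counter()
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
        result["connect_ms"] = (time.perf_counter() - t0) * 1000

        # Phase 3: full HTTP(S) request on a fresh connection (application availability).
        t0 = time.perf_counter()
        try:
            with _OPENER.open(url, timeout=timeout) as resp:
                resp.read()
                result["status"] = resp.status
        except urllib.error.HTTPError as err:
            result["status"] = err.code  # the server answered, but with an error code
        result["http_ms"] = (time.perf_counter() - t0) * 1000
        result["ok"] = 200 <= result["status"] < 400
    except OSError as exc:  # DNS failure, refused or timed-out connection, TLS error
        result["error"] = repr(exc)
    return result
```

Calling `probe("https://example.com/")` returns a dictionary with `dns_ms`, `connect_ms`, `http_ms`, `status`, and `ok`, or an `error` entry (plus whatever timings completed) if a phase fails. The handshake and the HTTP request use separate connections, which keeps each phase easy to time at the cost of one extra connection per check.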

This is the approach we take at ThousandEyes, where we operate a global set of software agents that perform Internet-aware network monitoring. ThousandEyes monitors critical services across the Internet from multiple vantage points and algorithmically correlates data to understand service impacts. Thus we were able to determine that more than 240 critical services relying on AWS Direct Connect were impacted by the March 2 power outage.
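The correlation step can be sketched with two simple heuristics, offered as an assumption-laden illustration rather than a description of how ThousandEyes’ own algorithms work. First, a service counts as impacted only if it fails from at least a threshold fraction of the vantage points that tested it, which filters out problems local to a single agent. Second, ranking the dependencies shared by impacted services points toward a likely common cause. The service names, vantage points, dependency map, and the 0.5 threshold are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def impacted_services(results: dict[str, dict[str, bool]],
                      threshold: float = 0.5) -> set[str]:
    """Services failing from at least `threshold` of the vantage points that tested them.

    `results` maps service -> {vantage_point: probe_succeeded}.
    """
    impacted = set()
    for service, by_vantage in results.items():
        if not by_vantage:
            continue  # no data, nothing to conclude
        failures = sum(1 for ok in by_vantage.values() if not ok)
        if failures / len(by_vantage) >= threshold:
            impacted.add(service)
    return impacted


def likely_common_causes(impacted: set[str],
                         deps: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank dependencies by how many impacted services share them."""
    counts = Counter(dep for svc in impacted for dep in deps.get(svc, set()))
    return counts.most_common()


# Hypothetical probe results and dependency map.
RESULTS = {
    "svc-a": {"london": False, "tokyo": False, "dallas": False},
    "svc-b": {"london": False, "tokyo": True, "dallas": False},
    "svc-c": {"london": True, "tokyo": True, "dallas": False},
}
DEPS = {
    "svc-a": {"direct-connect", "us-east"},
    "svc-b": {"direct-connect", "cdn-x"},
    "svc-c": {"cdn-x"},
}
```

In the example data, `svc-a` and `svc-b` fail from most locations while `svc-c` fails from only one, so only the first two are flagged, and their shared `direct-connect` dependency ranks first as the suspected cause.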

There is no such thing as a steady state in the cloud. IaaS and PaaS vendors make heavy use of DevOps and automation tools, so changes happen rapidly and without advance notice. At the same time, multicloud deployments often use containers and orchestration platforms like Kubernetes to move workloads to the optimal cloud platform. In this rapidly changing world, you need continuous visibility that reflects changes in application delivery paths, giving you a complete, up-to-date view.
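Continuous visibility implies comparing each new observation of a delivery path against the previous one, rather than relying on a one-time map. The sketch below assumes a hypothetical `get_path` callable that returns the ordered hops toward an application (for example, from a traceroute-style probe); it polls on a fixed interval and yields an event whenever the path changes. The hop names used in the usage note are made up.

```python
from __future__ import annotations

import time
from typing import Callable, Iterator, Sequence


def diff_paths(old: Sequence[str], new: Sequence[str]) -> tuple[list[str], list[str]]:
    """Return (hops added, hops removed) between two path observations."""
    added = [hop for hop in new if hop not in old]
    removed = [hop for hop in old if hop not in new]
    return added, removed


def watch_path(get_path: Callable[[], Sequence[str]],
               interval_s: float,
               rounds: int) -> Iterator[tuple[int, list[str], list[str]]]:
    """Poll `get_path` `rounds` times, `interval_s` seconds apart, yielding each change.

    Each event is (round index, hops added, hops removed). A pure reordering of
    the same hops still counts as a change, reported with both lists empty.
    """
    previous: tuple[str, ...] | None = None
    for i in range(rounds):
        current = tuple(get_path())
        if previous is not None and current != previous:
            added, removed = diff_paths(previous, current)
            yield i, added, removed
        previous = current
        if i < rounds - 1:
            time.sleep(interval_s)
```

For example, if successive observations are `("a", "b", "c")`, `("a", "b", "c")`, then `("a", "x", "c")`, the watcher yields a single event at round 2 reporting `x` added and `b` removed. Diffing against the previous snapshot, rather than a fixed baseline, keeps the view current as workloads and routes move.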

Alex Henthorn-Iwane is vice president of product marketing at ThousandEyes. Alex leads product marketing and brings a perspective gained from working on innovative networking and analytics technologies since the early days of the commercial Internet.

Ameet Naik is technical marketing manager at ThousandEyes. Ameet has more than 20 years of experience in networking, IT systems, and information security and has held senior solutions engineering roles at several of the leading networking and security vendors. He has advised multiple global service providers and financial services organizations on best practices in enterprise networking since the early days of the Internet.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.