3 mistakes we made moving to a microservices architecture

During our move away from a monolithic architecture, we identified three sources of friction in our microservices migration. We also developed ways to address them.

My company, CircleCI, is a big believer in the blameless postmortem—the idea that when you discuss a project and take emotion out of the picture, you create a true learning experience. Following our migration to a microservices architecture, we had a good opportunity to run a blameless postmortem on what we did right and wrong, and what we’d do differently next time. If you’re thinking about starting the journey to microservices, I’d like to share some advice for creating a smoother transition.

Our move away from monolithic architecture took on urgency when we had a 24-hour outage in 2015. We wanted to be cautious: We’d heard a lot of tales of poor decision-making when transitioning full-stop into microservices. On the other hand, incremental changes to architecture weren’t bringing the transformation we needed.

Early wins breaking up our architecture gave us confidence that this was the right direction for our team, and we decided to go all in. Almost immediately, the wheels started coming off the wagon. Engineering productivity ground to a halt. We realized we were throwing people into an unfamiliar environment—like moving from the small-town comfort of the monolith to the unknown microservices big city.

In our postmortem, we identified three sources of friction in our microservices migration, and we developed ways to address them.

1. Decision-making

“Analysis paralysis” is being faced with a decision that’s so complex that you spend ages considering all the options without pulling the trigger on anything. The solution is to make hard decisions early on, and then reduce future decision-making to exceptions only—choosing to go in another direction only when that initial decision fails you.

In our case, we said to our engineers, “We’re a Clojure shop. It’s not an option for you to decide what language or stack you’re going to use. We all know Clojure, and it has treated us well.”

In deciding to use gRPC, Postgres, Docker, and Kubernetes, we felt like we had agreed on a common stack that would serve the project. It turns out that the nuances of those decisions were more complex than we anticipated: What version of Clojure? What libraries?
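One way to make those follow-up decisions visible is to write them down in a single shared manifest and check each service against it. What follows is a minimal sketch of that idea in Python, not a description of what we actually ran: the `APPROVED_STACK` contents, the version numbers, and the `find_drift` helper are all hypothetical names chosen for illustration.

```python
# Hypothetical shared stack manifest. Every component name and version
# below is an illustrative assumption, not an actual configuration.
APPROVED_STACK = {
    "clojure": "1.9.0",
    "postgres": "10",
    "grpc": "1.10",
}


def find_drift(service_name, declared):
    """Return readable descriptions of where a service deviates from the manifest.

    `declared` maps component names to the versions the service uses.
    An empty list means the service matches the approved stack.
    """
    problems = []
    for component, version in sorted(declared.items()):
        approved = APPROVED_STACK.get(component)
        if approved is None:
            # The component was never agreed on by the team.
            problems.append(
                f"{service_name}: {component} is not in the approved stack"
            )
        elif version != approved:
            # The component was agreed on, but at a different version.
            problems.append(
                f"{service_name}: {component} {version} differs from approved {approved}"
            )
    return problems
```

A check like this does not make the decisions for you. It only turns "What version of Clojure?" into a question that gets asked once, answered in one place, and revisited as an explicit exception.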

While we thought we had made our important decisions upfront, we didn’t anticipate the depth of decisions we were going to run into—we weren’t even close. So, what did we learn? We could have spent more time creating guidance upfront, but in an agile world, that isn’t a great investment of time. Instead, your team needs a very clear definition of how to make decisions, who can make them, and how to share those decisions efficiently with the rest of the team. Because you can’t anticipate every decision at the outset, make sure you have clear protocols to smoothly handle the unexpected.

2. Novelty

Engineers love new stuff. Sometimes, it’s because the old stuff hasn’t satisfactorily solved our problems, in which case it makes sense to seek new solutions. But there are times when old stuff might be the right choice for your microservices project. Moving to microservices is on its own a significant change, so limiting additional changes is a wise strategy.

At CircleCI, we’d been a MongoDB shop since 2011. It was the devil we knew, as they say. But when we moved to microservices, we decided this was a good opportunity to go back to PostgreSQL, which many of us preferred. Turns out, folks didn’t know Postgres as deeply as we’d assumed, and this ended up creating more friction and novelty, not less.

To understand which tools and systems are being used solely because of novelty (or because of habit), improve communication among team members. Invest time in figuring out how you’re going to roll out each tool for broader use (and for better reasons than novelty). You don’t want to find out that everyone is trying to solve issues independently and running into the exact same problems, when switching to a common tool would help everyone move faster.

3. Repetition

In line with novelty, repetition is a problem that will drag down your microservices ambitions if you don’t stop it in its tracks. In our case, we found out that three engineers each decided to write their own libraries to deal with the gRPC communication framework. That was two engineers’ worth of time wasted.

The lesson we learned was that we could use shared components for infrastructure concerns. The value of microservices is in the autonomy they unlock, but with that autonomy comes increased overhead. The key is to find the areas where the value created by shared components outweighs the overhead. But don’t leave it up to individual teams to make those trade-offs; it’s imperative to make it someone’s job, so that person can assess the overall benefits to the team. Shared components that don’t solve the needs of other teams don’t end up being shared components.
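To make "shared components for infrastructure concerns" concrete, here is a rough sketch of the kind of small helper that tends to get written three times: a retry wrapper around a remote call. It is written in Python for brevity and deliberately avoids any real RPC library. `TransientError`, `call_with_retry`, and the default values are assumptions made for this example, not part of gRPC or of our codebase.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def call_with_retry(call, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Run `call`, retrying transient failures with exponential backoff.

    `call` is any zero-argument function that performs the remote request.
    `sleep` is injectable so the helper can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Wait base_delay, then 2x, then 4x, and so on between attempts.
            sleep(base_delay * (2 ** attempt))
```

If one owner maintains a helper like this, every service gets the same retry behavior, and a fix to the backoff policy lands everywhere at once instead of in three separate libraries.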

In our case, we created guilds, which are cross-team groups built around areas of expertise. (We used Spotify’s model for creating guilds.) For example, our site reliability engineering (SRE) guild has a weekly meeting that anyone can attend. When people are working on the same problem—like a gRPC library or connecting databases—we can centralize the work.

Today, more than a year after our initial launch, we’re feeling the impact of microservices. We’re running at five times the utilization of the previous platform, and the old operational concerns (like slow builds and outages) have largely disappeared. We were successful because we recognized that this project was an investment in engineering, not a product delivering value to customers.

Getting buy-in on this point is key when you’re breaking up architecture into microservices, because you’ll be making a lot of upfront decisions and investments. If you choose to approach a transition like this piecemeal, be aware that you’ll be facing a lot of rework. Make sure to have a healthy conversation about this investment before you take the first small steps toward microservices.

Copyright © 2018 IDG Communications, Inc.