If you haven’t heard, we have this awesome new thing now called a cloud server. Though the name doesn’t mean anything in particular, you can think of cloud servers as instances of compute and I/O resources that we can instantiate and destroy as we like. It’s really neat. (Note: Unsticks tongue from cheek.)
But it’s not all utopia. Sure, it makes standing up scalable environments extremely easy, but managing those environments gets funky when you start to worry about automatic scaling and service growth. All of a sudden the standard methods don’t really work anymore.
We used to consider scalability at a comparatively glacial pace. If we were adding a bunch of new employees, IT would need to accommodate them with expanded server resources for storage and app services, perhaps bring in a more powerful database cluster, that sort of thing. We’d plan our scaling many months, even years, in advance. Big Internet sites ran boatloads of physical servers no matter the actual load because they had to be prepared for spikes as well as normal traffic patterns. In slow times, those servers would sit idle.
Now we consider scaling almost instantaneously. We can spawn new instances at will and toss them away when the load spike is over. We move in minutes, not months. But automating that can be dangerous and difficult to get right. The variables and tuning of automated app scaling are also highly application-specific; what works great for one app will blow up another. The devil is absolutely in the details.
As an example, consider a typical tiered Web app. We have database, storage, and front-end app servers. In order for this infrastructure to grow and shrink dynamically with changing load conditions, we need to monitor all of these parts in concert and make changes to them as the load dictates -- while taking the load on other parts into consideration as well.
If our front-end servers are starting to spike, we’ll need more of those, and though the database servers aren’t running terribly hot at the moment, they will be when we spin up another dozen front-end servers. Thus, we’ll need to bring in a few more database nodes. Then storage I/O will become an issue, so we may need to expand resources there too.
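To make that cascade concrete, here is a minimal sketch of a scale-up planner. Everything in it is an assumption for illustration: the Tier type, the plan_scale_up function, the 70 percent target utilization, and especially the idea that database and storage work grow in direct proportion to the number of front-end servers. Real apps rarely behave that neatly, so treat it as a starting point rather than a recipe.

```python
import math
from dataclasses import dataclass


@dataclass
class Tier:
    """Hypothetical snapshot of one tier of the app."""
    name: str
    nodes: int
    load: float  # average utilization per node, 0.0 to 1.0


def nodes_needed(total_work: float, target: float) -> int:
    """Nodes required to carry total_work at the target utilization."""
    return max(1, math.ceil(total_work / target))


def plan_scale_up(front: Tier, db: Tier, storage: Tier, target: float = 0.7) -> dict:
    """Size the front end first, then project the knock-on load downstream."""
    new_front = nodes_needed(front.nodes * front.load, target)
    new_front = max(front.nodes, new_front)  # this plan only grows tiers

    # Assumption: downstream work scales linearly with front-end node count.
    growth = new_front / front.nodes

    plan = {front.name: new_front}
    for tier in (db, storage):
        projected_work = tier.nodes * tier.load * growth
        plan[tier.name] = max(tier.nodes, nodes_needed(projected_work, target))
    return plan


if __name__ == "__main__":
    plan = plan_scale_up(
        Tier("front", 12, 0.95),
        Tier("db", 4, 0.50),
        Tier("storage", 2, 0.40),
    )
    print(plan)
```

Note that the database tier grows in this example even though it is only running at 50 percent when the decision is made. That is the point: the planner reacts to the load the new front-end servers will create, not just the load that exists right now.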
Later, when the load begins to wane across this infrastructure, we’ll need to start destroying some of those resources -- but not too many and not too quickly. We’ll also need to keep tabs on the load everywhere else when we do that, because reducing capacity in one area may negatively impact another. If we reduce database resources, the load may spike on the app servers because of a bottleneck, rather than front-end load -- we’ll need to be sensitive to that. There’s no sense in adding more app servers that will not address that load issue.
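A rough sketch of those two precautions might look like the following. The function names, the 25 percent per-step removal cap, the two-node floor, and the "fraction of request time spent waiting on the database" signal are all hypothetical choices made for this example; the right numbers and signals depend entirely on the app.

```python
import math


def scale_down_step(current: int, needed: int,
                    max_fraction: float = 0.25, floor: int = 2) -> int:
    """Return the node count after one cautious shrink step.

    Removes at most max_fraction of the current nodes (but at least one),
    and never drops below the needed count or the safety floor.
    """
    if needed >= current:
        return current  # nothing to remove
    max_removal = max(1, math.floor(current * max_fraction))
    return max(floor, needed, current - max_removal)


def app_spike_cause(db_wait_fraction: float, wait_threshold: float = 0.5) -> str:
    """Guess why the app servers are hot.

    db_wait_fraction is the assumed share of request time spent waiting
    on the database. If it dominates, more app servers will not help.
    """
    if db_wait_fraction >= wait_threshold:
        return "database_bottleneck"
    return "front_end_load"


if __name__ == "__main__":
    # Load has dropped: 20 nodes running, only 8 needed.
    print(scale_down_step(20, 8))
    # App servers are busy, mostly waiting on the database.
    print(app_spike_cause(0.7))
```

Shrinking in capped steps gives the monitoring a chance to notice trouble in a neighboring tier before too much capacity is gone, and checking the cause of a spike first avoids throwing app servers at what is really a database problem.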
As you can see, the decision tree in this scenario can become quite extensive, and it can contain significant pitfalls. It’s full of monitoring and hold-down timers, wait states, thresholds, and counters. Countless declarative and comparative rules combine to produce an adaptive infrastructure, and the logic itself needs to be watched and adjusted over time. It’s definitely not a destination, but a constant journey.
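For a feel of how a threshold, a counter, and a hold-down timer fit together, here is a small sketch. The ScaleTrigger class and its default values (80 percent threshold, three consecutive breaches, a five-minute hold-down) are made up for this example, and the caller passes in the current time so the logic is easy to test.

```python
from dataclasses import dataclass


@dataclass
class ScaleTrigger:
    """Fires only after sustained load, then stays quiet for a while."""
    threshold: float = 0.8            # load above this counts as a breach
    breaches_required: int = 3        # consecutive breaches before firing
    hold_down_seconds: float = 300.0  # quiet period after firing
    _breaches: int = 0
    _last_fired: float = float("-inf")

    def observe(self, load: float, now: float) -> bool:
        """Record one load sample; return True if a scale-up should happen."""
        # Hold-down: ignore samples while the last change is still settling.
        if now - self._last_fired < self.hold_down_seconds:
            self._breaches = 0
            return False

        # Counter: a single dip below the threshold resets the streak.
        if load > self.threshold:
            self._breaches += 1
        else:
            self._breaches = 0

        if self._breaches >= self.breaches_required:
            self._breaches = 0
            self._last_fired = now
            return True
        return False


if __name__ == "__main__":
    trigger = ScaleTrigger()
    samples = [0.9, 0.9, 0.9, 0.95, 0.95]
    for i, load in enumerate(samples):
        print(i * 60, trigger.observe(load, now=i * 60.0))
```

The counter keeps a one-off blip from spawning servers, and the hold-down keeps the system from reacting again before the last change has had time to take effect. Multiply this by every tier and every direction (up and down) and the size of the decision tree becomes clear.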
If the infrastructure is more complex than this relatively simple Web app, perhaps integrating several public APIs, caching and queuing servers, companion NoSQL database servers, or any number of modern service accessories, the complexity of dynamically managing the load grows dramatically, because every added component is one more thing whose load depends on, and feeds back into, all the others. It’s definitely not as simple as “if a server is overloaded, make another server.”
Oh, and all of this assumes the app in question has been developed with this type of rapid scalability in mind. If it wasn’t, retrofitting it may be extremely difficult or impossible.
The benefits of scaling dynamically this way are significant. Done well, it can deliver better performance and availability at a lower price, and it really is a win-win. But don’t take it for granted.