If you operate clouds (you can call yourselves "cloud ops"), you already know the work is harder than you were originally told. There are many systems and subsystems to consider, including the storage, database, and application layers.
Furthermore, today's reality is not the single-cloud deployment originally envisioned. Complex hybrid clouds and multiclouds are now more prevalent than single clouds. Indeed, you could have as many as a dozen clouds under management.
If you're charged with cloud ops, you need to learn how to place a layer of technology between yourself and complex cloud services, and you need to learn it fast.
Core to this goal is gathering cloud metrics, such as performance and transaction data, as they occur. This information is valuable for several reasons:
- You can track trends in the data and spot issues in recent operations. For example, a rising number of read/write errors from your private cloud storage could indicate that a drive is about to fail and needs to be replaced.
- You can use the data for predictive analytics, such as determining when demand on your servers will require you to provision more of them. This insight lets you match capacity to demand, so you use private and public cloud resources more cost-effectively.
- You can make your cloud-based systems self-healing. Cloud providers offer automatic fixes for a few common infrastructure problems, but for the most part you're in charge of the application layer. When you gather operational metrics, you can define policies with thresholds that trigger automated responses to specific conditions in that layer. That means you're far less likely to get calls in the middle of the night because an application stopped due to a database issue or a connectivity problem.
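To make the three uses of metrics concrete, here is a minimal Python sketch. It is not tied to any cloud provider's monitoring API and assumes metrics have already been collected into plain lists sampled at fixed intervals. It uses an ordinary least-squares slope both for trend detection and for a simple linear capacity projection (a stand-in; real predictive analytics usually relies on richer models), plus a tiny threshold-policy evaluator for self-healing. The names `Policy` and `evaluate`, the metric `db_connection_failures`, every threshold, and the remediation string are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of the values against their sample index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def errors_trending_up(error_counts: Sequence[float], min_slope: float = 0.5) -> bool:
    """Trend check: flag a drive whose per-interval read/write error count is climbing.

    min_slope is an assumed tuning value, not a recommended setting.
    """
    return slope(error_counts) >= min_slope


def intervals_until_capacity(utilization: Sequence[float], limit: float = 0.8) -> Optional[float]:
    """Capacity projection: intervals left before utilization (0..1) reaches `limit`.

    Assumes demand keeps growing linearly; returns None if it is flat or falling.
    """
    s = slope(utilization)
    if s <= 0:
        return None
    return max((limit - utilization[-1]) / s, 0.0)


@dataclass
class Policy:
    """Hypothetical self-healing rule: if `metric` exceeds `threshold`, run `action`."""
    metric: str
    threshold: float
    action: Callable[[], str]


def evaluate(policies: Sequence[Policy], latest: Dict[str, float]) -> List[str]:
    """Run the action of every policy whose latest metric value exceeds its threshold."""
    return [p.action() for p in policies if latest.get(p.metric, 0.0) > p.threshold]


if __name__ == "__main__":
    print(errors_trending_up([0, 1, 1, 3, 5, 8]))  # True
    print(intervals_until_capacity([0.50, 0.55, 0.60, 0.65]))  # about 3 intervals
    restart = Policy("db_connection_failures", 5, lambda: "restart app-tier connection pool")
    print(evaluate([restart], {"db_connection_failures": 12}))  # ['restart app-tier connection pool']
```

In a real deployment the policy action would call your own runbook or orchestration tooling rather than return a string; returning a string here simply keeps the sketch easy to test.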
Cloud ops is gaining importance as enterprises depend more on public and private clouds. However, few best practices or proven tools have emerged in this area. As a result, most people charged with operating clouds have had to feel their way in the dark these last few years. That needs to change.
It's time we start thinking about what cloud metrics mean for quality cloud operations. At the most basic level, that means uptime: cloud ops needs to deliver better uptime than traditional on-premises operations. Although IT will focus on operational metrics, the business will measure the value of the cloud by the number of service disruptions.
From both the IT and business perspectives, the objectives are to prevent bad things from happening and to promote continuous operations of cloud systems. The effective use of cloud metrics can help you reach these goals.