Of all the measures we deploy in our infrastructures to keep them up and running, monitoring systems are among the most critical, yet also among the most overlooked. Building a solid monitoring system to warn you of impending doom, help you quickly size up a disaster in progress, or aid in swiftly bringing complex performance troubleshooting to a close can be a hugely valuable endeavor at any scale of operation. However, application deployment timetables and infrastructure upgrades often end up pushing monitoring deployment and maintenance to the back burner.
It's no wonder, either. Whatever kind of monitoring system you opt to use, it is very rarely a turnkey, set-it-and-forget-it affair. Unless you have an extremely simple environment, monitoring all of the important details while also weeding out unnecessary distractions and false positives can be a hugely time-consuming process.
It's important to be able to size up your options early on and make sure that the time you invest in your preferred tool is time well spent. Based on my experience working with a variety of both open source and closed source monitoring stacks, I've come up with a list of five things that every monitoring package in a sufficiently complicated environment should be able to do and be configured to do. If you find that your monitoring deployment is lacking in any of these areas, you might want to roll up your sleeves and fix it before it's too late.
Monitoring must-have No. 1: Multiple redundant collectors
Most monitoring systems use a software service to do their data collection and probing. In some cases, this data collection service is the same component that performs alerting and provides the user interface. In other cases, the collection service is a separate piece of software, and you can run more than one instance of it. Systems where you can deploy more than one collector and orchestrate them centrally are definitely preferable to ones where you can't.
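To make the idea of centrally orchestrated collectors a little more concrete, here is a minimal sketch in Python. It is not taken from any particular monitoring product: the `Collector` class and the `assign_targets` function are hypothetical names invented for illustration, and the round-robin policy is just one simple assumption about how a central coordinator might spread monitored hosts across whichever collectors are currently healthy.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Collector:
    """Hypothetical stand-in for one data collection service."""
    name: str
    healthy: bool = True
    targets: List[str] = field(default_factory=list)


def assign_targets(collectors: List[Collector],
                   targets: Iterable[str]) -> Dict[str, List[str]]:
    """Spread targets round-robin across the collectors that are healthy.

    Calling this again after a collector's health changes moves that
    collector's targets onto the remaining healthy ones.
    """
    # Start from a clean slate so stale assignments never linger.
    for collector in collectors:
        collector.targets = []

    healthy = [c for c in collectors if c.healthy]
    if not healthy:
        raise RuntimeError("no healthy collectors available")

    # Sort for a deterministic assignment regardless of input order.
    for index, target in enumerate(sorted(targets)):
        healthy[index % len(healthy)].targets.append(target)

    return {c.name: list(c.targets) for c in collectors}
```

With two collectors and four hosts, each collector would take two hosts. If one collector is then marked unhealthy and `assign_targets` is called again, the surviving collector picks up all four, which is the kind of failover a single, non-redundant collector cannot offer.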