5 must-have capabilities for every monitoring system

Solid infrastructure and application monitoring is critical to maintaining performance and uptime, but not all systems will make the cut

Of all the measures we deploy to keep our infrastructures up and running, monitoring systems are among the most critical, yet also among the most overlooked. Building a solid monitoring system to warn you of impending doom, help you quickly size up a disaster in progress, or swiftly close out complex performance troubleshooting is a hugely valuable endeavor, whatever the scale of your operation. However, application deployment timetables and infrastructure upgrades often push monitoring deployment and maintenance to the back burner.

It's no wonder, either. Whatever monitoring system you choose, it is rarely a turnkey, set-it-and-forget-it affair. Unless your environment is extremely simple, monitoring all the important details while weeding out unnecessary distractions and false positives can be hugely time-consuming.

It's important to size up your options early and make sure the time you invest in your chosen tool is time well spent. Based on my experience with a variety of open and closed source monitoring stacks, I've come up with five capabilities that every monitoring package in a sufficiently complex environment should have, and should be configured to use. If your monitoring deployment falls short in any of these areas, you might want to roll up your sleeves and fix it before it's too late.

Monitoring must-have No. 1: Multiple redundant collectors

Most monitoring systems use a software service for data collection and probing. In some cases, that collection service is the same system that handles alerting and provides the user interface. In others, the collector is a separate piece of software, and you can run more than one. Systems that let you deploy multiple collectors and orchestrate them centrally are definitely preferable to those that don't.

There are two main reasons for this. First, multiple collectors give you redundancy in case a collector is itself hit by a disaster. Second, having multiple views of the same item can be very useful. Larger WANs especially benefit from a collector at each site: a problem with the WAN can then be told apart from a problem at a single site, and data collection can continue even during a WAN outage.
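To make the WAN-versus-site distinction concrete, here is a minimal Python sketch of the reasoning a central console might apply once each collector reports whether it can reach a given site's hosts. The function name, the site names, and the decision rules are hypothetical illustrations, not any particular product's logic, and the sketch assumes one collector runs at each monitored site.

```python
from typing import Dict


def diagnose(site: str, results: Dict[str, bool]) -> str:
    """Classify a site's health from several collectors' views.

    `results` maps each collector's site name to whether that collector could
    reach the monitored hosts at `site`. Assumes a collector runs at `site`.
    """
    local_ok = results.get(site)
    remote = [ok for collector, ok in results.items() if collector != site]
    if local_ok is None:
        return "unknown"              # the on-site collector hasn't reported
    if not local_ok:
        return "site problem"         # even the local collector can't reach it
    if remote and not any(remote):
        return "WAN problem"          # healthy locally, unreachable from elsewhere
    if remote and not all(remote):
        return "partial WAN problem"  # some paths into the site are down
    return "ok"


if __name__ == "__main__":
    # Hypothetical reports: the branch collector sees its hosts fine,
    # but neither headquarters nor the DR site can reach them.
    print(diagnose("branch", {"branch": True, "hq": False, "dr": False}))
```

With a single central collector, all three of those cases would look identical: the hosts are simply down.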

Monitoring must-have No. 2: Excellent graphing capability

Any monitoring system worth its salt will have an excellent graphing engine. That doesn't just mean smooth lines and nice colors, though those are great. If you've ever used a monitoring system to troubleshoot a performance problem, you'll know that being able to line up, zoom, scale, and pan across multiple graphs simultaneously can be hugely useful.

Imagine that you're trying to figure out what's slowing down a multitier Web app. Being able to stack up a bunch of seemingly unrelated graphs (storage latency, network throughput, database transactions per second, and so on) and watch for correlations between them can be tremendously enlightening when determining a root cause.
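Eyeballing stacked graphs is the usual approach, but the same idea can be sketched numerically. The Python below scores each candidate metric by its Pearson correlation with the symptom you're chasing (here, page latency) and sorts the most strongly related metrics first. The metric names and sample values are made up for illustration, and the sketch assumes every series has already been sampled on the same time grid, which is the lining-up a good graphing engine does for you. Correlation only points at suspects; it doesn't prove a root cause.

```python
import math
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length, time-aligned series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must be aligned and hold at least two samples")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a flat line can't track anything
    return cov / (spread_x * spread_y)


def rank_suspects(symptom: List[float],
                  metrics: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Sort metrics by how strongly they move with (or against) the symptom."""
    scores = [(name, pearson(symptom, series)) for name, series in metrics.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical five-minute samples from a multitier Web app.
    page_latency_ms = [120, 125, 300, 310, 130]
    metrics = {
        "storage_latency_ms": [5, 5, 40, 42, 6],
        "network_mbps": [50, 52, 49, 51, 50],
        "db_tps": [200, 198, 90, 85, 201],
    }
    for name, r in rank_suspects(page_latency_ms, metrics):
        print(f"{name:20s} r={r:+.2f}")
```

In this made-up data, storage latency rises and database transactions per second fall almost in lockstep with page latency, while network throughput correlates only weakly, which is the same pattern you'd hope to spot by stacking the graphs.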

Monitoring must-have No. 3: Easy event suppression

Among the worst aspects of any monitoring system are false positives and event floods caused by known or expected problems. Every time your cellphone blows up with 80 texts about the same thing, you become a little less sensitive to what the monitoring system is telling you, and you risk missing an important warning in all the noise. It's critical to be able to quickly suppress specific events that you know aren't important.
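Flood control usually comes down to two rules: drop events you've explicitly marked as unimportant, and collapse repeats of the same problem into a single notification. The Python sketch below applies both. The event tuple layout, the mute list, and the five-minute default window are assumptions for illustration rather than any product's behavior.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (timestamp in seconds, source host, check name)
Event = Tuple[float, str, str]


def filter_events(events: Iterable[Event], muted: Set[Tuple[str, str]],
                  window: float = 300.0) -> List[Event]:
    """Return only the events worth notifying someone about.

    Muted (source, check) pairs are dropped outright. Repeats of the same
    (source, check) are dropped until `window` seconds have passed since the
    last notification for that pair.
    """
    last_sent: Dict[Tuple[str, str], float] = {}
    notify: List[Event] = []
    for ts, source, check in sorted(events):
        key = (source, check)
        if key in muted:
            continue  # explicitly suppressed: known and unimportant
        prev = last_sent.get(key)
        if prev is not None and ts - prev < window:
            continue  # same problem, still inside the flood window
        last_sent[key] = ts
        notify.append((ts, source, check))
    return notify
```

With this filter, 80 identical disk alerts arriving within a couple of minutes produce a single text, and a problem that keeps firing produces at most one reminder per window.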

It's also important to be able to suppress events from a particular source when you know that maintenance or upgrade activity will generate errors. I've been in several situations where a planned upgrade caused unexpected secondary effects in other systems, but those effects went unnoticed until much later because everyone was ignoring the monitoring system during the upgrade. Being able to schedule downtime windows within the monitoring system, limited to the systems under maintenance, helps on that front.
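A scheduled downtime window only helps if it is scoped tightly. The Python sketch below, with hypothetical host names and times, suppresses alerts only from the source under maintenance and only inside its window, so secondary effects on other systems still get through. The `Downtime` type and `is_suppressed` helper are illustrative names, not part of any real monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Downtime:
    """A scheduled maintenance window for one monitored source."""
    source: str       # host or service under maintenance
    start: datetime
    end: datetime     # exclusive


def is_suppressed(source: str, when: datetime, windows: Iterable[Downtime]) -> bool:
    """True only if `source` itself has a window covering `when`."""
    return any(w.source == source and w.start <= when < w.end for w in windows)


if __name__ == "__main__":
    # Hypothetical upgrade of db01 on a Saturday night.
    windows = [Downtime("db01", datetime(2013, 6, 1, 22, 0), datetime(2013, 6, 2, 2, 0))]
    during = datetime(2013, 6, 1, 23, 30)
    print(is_suppressed("db01", during, windows))   # True: expected noise is muted
    print(is_suppressed("app01", during, windows))  # False: side effects still alert
```

The design choice that matters is the per-source match: a blanket "maintenance mode" for the whole console is what leads people to stop watching it.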

Monitoring must-have No. 4: Multiple data collection methods

You have a variety of ways to get information out of an infrastructure. Almost any monitoring package supports the basic options, such as ICMP pings to test for uptime, SNMP for collecting network statistics, and WMI for pulling event log data from Windows boxes. Those basic methods used to cover the vast majority of systems. However, SNMP is being left by the wayside in favor of more modern monitoring and management interfaces such as WBEM and the Common Information Model (CIM) it is built on. In fact, many vendors are starting to deprecate SNMP support entirely in favor of CIM, and that trend will only accelerate over time.

In addition to the newer protocols replacing SNMP, a range of other queries can be useful. Examples include executing SQL queries directly and timing them or checking their output, watching a Web service for a specific HTTP response body or status code, and watching text-based log files for specific entries. The more tools you have in the toolbox, the more likely you'll find a way to monitor the minutiae that matter to you.
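Two of those checks are easy to sketch with nothing but the Python standard library. The first runs a SQL query, times it, and returns the rows so you can inspect the output; SQLite stands in here for whatever database you actually run, and the half-second threshold is an arbitrary assumption. The second scans log lines for a regular expression. Neither reflects how any particular monitoring package implements these probes, and the function names are hypothetical.

```python
import re
import sqlite3
import time
from typing import Iterable, List, Tuple


def timed_sql_check(conn: sqlite3.Connection, query: str,
                    max_seconds: float) -> Tuple[str, float, list]:
    """Run a query, time it, and return (status, elapsed seconds, rows)."""
    start = time.perf_counter()
    rows = conn.execute(query).fetchall()
    elapsed = time.perf_counter() - start
    status = "OK" if elapsed <= max_seconds else "SLOW"
    return status, elapsed, rows


def scan_log(lines: Iterable[str], pattern: str) -> List[str]:
    """Return the log lines that match a regular expression."""
    rx = re.compile(pattern)
    return [line.rstrip("\n") for line in lines if rx.search(line)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO orders (id) VALUES (?)", [(1,), (2,), (3,)])
    print(timed_sql_check(conn, "SELECT COUNT(*) FROM orders", max_seconds=0.5))

    log = ["INFO service started\n", "ERROR disk /var full\n", "INFO request served\n"]
    print(scan_log(log, r"\bERROR\b"))
```

Because `scan_log` accepts any iterable of lines, an open file object works as input too. A real log watcher would also remember how far into the file it has already read, so each entry is reported only once.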

Monitoring must-have No. 5: Ease of integration and extension

No matter what else your chosen monitoring system does or doesn't do, the ability to extend it or integrate it with other systems may eventually decide whether you can keep it or must replace it, writing off all the time you've invested in it. Though many monitoring systems are very good at what they do, none of them excels at everything. Sometimes the only way to get the information you need is to write your own solution or use a different tool. In those situations, being able to extend your monitoring package or integrate it with other software is key. That might mean running an external script and interpreting its output, or integrating with a ticket management platform.
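Many monitoring packages run external scripts using the convention popularized by Nagios plugins: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the first line of standard output carries a human-readable summary, optionally followed by performance data after a `|`. The Python sketch below shows how a monitoring system might run such a script and interpret its result. The `run_check` name and the 30-second timeout are assumptions, and treating anything unexpected as UNKNOWN is one reasonable design choice rather than a rule.

```python
import subprocess
import sys
from typing import List, Tuple

# Nagios-style plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command: List[str], timeout: float = 30.0) -> Tuple[str, str]:
    """Run an external check script and return (state, summary)."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A check that can't run, or hangs, tells us nothing about the target.
        return "UNKNOWN", f"check failed to run: {exc}"
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    summary = first_line.split("|", 1)[0].strip()  # drop any performance data
    return STATES.get(proc.returncode, "UNKNOWN"), summary


if __name__ == "__main__":
    # A stand-in "plugin": prints a status line with perfdata and exits CRITICAL.
    fake_plugin = [sys.executable, "-c",
                   "import sys; print('DISK CRITICAL - 97% used | used=97%'); sys.exit(2)"]
    print(run_check(fake_plugin))
```

The same wrapper is a natural hook for further integration, for example opening a ticket whenever a check returns CRITICAL.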

At the end of the day, which monitoring application you choose will depend on what you want it to accomplish. Some are better at monitoring predominantly Linux environments than Windows environments. Some are better at networks and infrastructure than at applications. Whatever you choose, though, should check the five boxes I've listed above, regardless of what your environment looks like.

If what you're running today or thinking of running tomorrow doesn't fit the bill, it may be worth looking elsewhere. Monitoring systems are too important, and take too much time to dial in and build into your workflow, to leave to chance.

This article, "5 must-have capabilities for every monitoring system," originally appeared at InfoWorld.com. Read more of Matt Prigge's Information Overload blog and follow the latest developments in storage at InfoWorld.com.

Copyright © 2013 IDG Communications, Inc.