Network monitoring is most certainly the gift that keeps on giving, and a network without some form of monitoring is like driving a car without gauges: It might be running fine, but you'll never know if you're almost out of gas, or if the engine temp is off the charts until it's too late.
The feature article I wrote this week, "Killer open-source monitoring tools," highlights tools such as Nagios and Cacti that I use daily to keep tabs on my networks. Every tool there has saved me countless hours and given me plenty of ammunition to end arguments with carriers and ISPs over performance. The article ends with "Do-it-yourself," which is pretty self-explanatory, but more often than not I find that I need to hammer this point home to admins and management alike. When confronted with a new piece of hardware or software that needs to be monitored, my first thought isn't "How much will it cost to buy software to monitor this?" but rather "How long will it take me to write the monitoring code?"
Usually, it takes me only an hour or so to write something that does exactly what I want, rather than trying to shoehorn code meant for other purposes into my requirements -- if anything even exists that might fit the bill. It's really not that challenging, even for someone with limited coding skills. A basic understanding of SNMP and a reasonable facility with PHP, Perl, or even bash/csh is all that's required. Many times, all I need to write is a plug-in for Nagios or Cacti, which is pretty simple since it just has to plug into that existing framework and doesn't need to stand by itself. However, there are times when the code does have to stand on its own.
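To give a sense of how little a Nagios plug-in needs, here is a minimal sketch in Python (standing in for PHP, Perl, or shell). It is not code from the article: the function names and the "VPN sessions" metric are assumptions made up for illustration. The only things it relies on are the standard Nagios plug-in conventions: one line of status text on stdout, optional performance data after a pipe character, and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN).

```python
# Standard Nagios plug-in exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Compare a measured value against thresholds.

    Returns (exit_code, status_line). The metric name "sessions" is a
    hypothetical example, not something Nagios requires.
    """
    if value >= crit:
        status = CRITICAL
    elif value >= warn:
        status = WARNING
    else:
        status = OK
    # Text before the pipe is shown to humans; text after it is perfdata
    # in the form label=value;warn;crit.
    line = f"{LABELS[status]} - {value} VPN sessions | sessions={value};{warn};{crit}"
    return status, line


def main(argv):
    """Expects: <script> VALUE WARN CRIT. Prints one line, returns the exit code."""
    try:
        value, warn, crit = (int(arg) for arg in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: report UNKNOWN rather than guess.
        print("UNKNOWN - usage: check_sessions VALUE WARN CRIT")
        return UNKNOWN
    status, line = evaluate(value, warn, crit)
    print(line)
    return status
```

In a real plug-in you would end the file with `sys.exit(main(sys.argv))` and replace the VALUE argument with whatever poll actually produces the number; Nagios handles the scheduling, alerting, and display.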
In order to illustrate this, I'm going to highlight a simple tool I wrote just last week. It's a PHP script that polls a Cisco 3000-series VPN concentrator for current user information and displays it in an easily digested format. You can get the same basic results by logging into the concentrator itself, but there are always instances where IT staff may need to know who's connected via VPN for troubleshooting purposes but should not be logging into the actual device to retrieve that data.
To that end, this is a very simple script that uses SNMP to gather info from the concentrator. The most difficult part of writing the code was figuring out why my first attempt was so very slow. Turns out that while on most devices it's faster to run an SNMP walk of an OID and then parse out the data, the Cisco/Altiga 3000-series concentrators are very sluggish with SNMP walks. Using one walk to determine the number of connected users, followed by an snmpget of each data point, ended up being twice as fast as a single walk of all the data. It's not nearly as elegant from a coding perspective, but did I mention that it's twice as fast?
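The walk-then-get pattern described above can be sketched roughly as follows, in Python rather than the PHP of the actual script. Everything specific here is an assumption: the OIDs are made-up placeholders (the real ones come from the concentrator's MIB), and `walk` and `get` are hypothetical callables standing in for whatever SNMP library or snmpwalk/snmpget wrapper you use. The point is only the shape: one small walk to learn which sessions exist, then a targeted get for each field of each session.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Placeholder OIDs for illustration only; NOT the concentrator's real OIDs.
SESSION_INDEX_OID = "1.3.6.1.4.1.99999.1.1"
FIELD_OIDS = {
    "user": "1.3.6.1.4.1.99999.1.2",
    "ip": "1.3.6.1.4.1.99999.1.3",
}

# walk(base_oid) yields (oid, value) pairs beneath base_oid.
Walk = Callable[[str], Iterable[Tuple[str, str]]]
# get(oid) returns the value of exactly one OID.
Get = Callable[[str], str]


def session_indexes(walk: Walk) -> List[str]:
    """Walk a single column and return the index suffix of each session row."""
    prefix = SESSION_INDEX_OID + "."
    return [
        oid[len(prefix):]
        for oid, _value in walk(SESSION_INDEX_OID)
        if oid.startswith(prefix)
    ]


def collect_sessions(walk: Walk, get: Get) -> List[Dict[str, str]]:
    """One short walk to enumerate sessions, then one get per field per session."""
    return [
        {name: get(f"{base}.{index}") for name, base in FIELD_OIDS.items()}
        for index in session_indexes(walk)
    ]
```

Passing `walk` and `get` in as arguments keeps the sketch independent of any particular SNMP binding and makes it easy to time the two strategies against each other on your own hardware.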