Check it out: Deep into APC hardware management

I just barely finished turning up two new datacenters in two different states within two weeks. Exhausting? Definitely. On the plus side, however, I wrote several new tools and plugins to manage all of the APC gear that went into both sites with Nagios and Cacti.

First, a little background. Both datacenters were built to be nearly identical to each other -- from rack layout to equipment to color-coded patch cabling. The major difference is that one site is cooled with APC ACSC100 In-row air units, and the other is cooled with ACRC100 In-row water-cooling units. Both sites are powered from APC Symmetra PX UPSes and PDUs, and use APC racks and 3-phase zero-U rackmount PDUs. In addition, several NetBotz WallBotz 500 units were installed to provide external environmental monitoring and surveillance of the rooms. Basically, it's all APC gear. I'll be posting more on the build process over the next few weeks, but I wanted to get some of the code out there first.

I wrote two main plugins for Nagios and Cacti to assist in monitoring all this new stuff. The Nagios plugin checks the most pertinent data on the ACRC and ACSC units, as well as the main sensors on the NetBotz units and the load on each phase of the PDUs. It's come in very handy since the sites were turned up, because I have an easily digested central view of all PDUs, or all AC units, on one page. Tweaking parameters on the AC units becomes very simple when you have all the data in one place, versus having to log into each unit to get status info, or even using APC's InfraStruXure Central Console.

I've released the Nagios plugin, check_apcext, and will be posting the Cacti templates soon. Here's the overview of the Nagios plugin, and a link to the NagiosExchange page. Enjoy.

Usage: ./check_apcext.pl -H <hostip> -C <community> -p <parameter> -w <warnval> -c <critval>

Parameters:

APC NetBotz
    nbmstemp        NetBotz main sensor temp
    nbmshum         NetBotz main sensor humidity
    nbmsairflow     NetBotz main sensor airflow

APC Metered Rack PDU (3 phase)
    rpduamps        Amps on each phase

APC ACSC In-Row
    acscstatus      System status (on/standby)
    acscload        Cooling load
    acscoutput      Cooling output
    acscsupair      Supply air
    acscairflow     Air flow
    acscracktemp    Rack inlet temp
    acsccondin      Condenser input temp
    acsccondout     Condenser outlet temp

APC ACRC In-Row
    acrcstatus      System status (on/standby)
    acrcload        Cooling load
    acrcoutput      Cooling output
    acrcairflow     Air flow
    acrcracktemp    Rack inlet temp
    acrcsupair      Supply air
    acrcretair      Return air
    acrcfanspeed    Fan speed
    acrcfluidflow   Fluid flow
    acrcflenttemp   Fluid entering temp
    acrcflrettemp   Fluid return temp

Thus, in checkcommands.cfg, place the following:

define command{
    command_name    check_apcext
    command_line    $USER1$/check_apcext.pl -H $HOSTADDRESS$ -C $ARG1$ -p $ARG2$ -w $ARG3$ -c $ARG4$
}

and in services.cfg, you'll have something similar to the following:

define service{
    use                  generic-service
    hostgroup_name       acsc
    service_description  ACSC Status
    is_volatile          0
    contact_groups       admins
    check_command        check_apcext!public!acscstatus
}

define service{
    use                  generic-service
    hostgroup_name       acsc
    service_description  ACSC Rack Temps
    is_volatile          0
    contact_groups       admins
    check_command        check_apcext!public!acscracktemp!90!95
}
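Since every service definition follows the same pattern, the stanzas can be generated rather than typed by hand. The following is a minimal Python sketch of that idea, not part of check_apcext itself. The `service_stanza` helper and the `SERVICES` table are hypothetical names; only the two rows shown come from the examples above, and the `public` community string is simply the one used there.

```python
# Hypothetical helper: emits Nagios service stanzas for check_apcext.
# Only the two rows below come from the post; add rows for your own checks.
SERVICES = [
    # (hostgroup, description, parameter, warn, crit)
    ("acsc", "ACSC Status", "acscstatus", None, None),
    ("acsc", "ACSC Rack Temps", "acscracktemp", 90, 95),
]


def service_stanza(hostgroup, description, parameter,
                   warn=None, crit=None, community="public"):
    """Return one 'define service' block as a string."""
    args = [community, parameter]
    # Status checks take no thresholds, so only append them when both are given.
    if warn is not None and crit is not None:
        args += [str(warn), str(crit)]
    check_command = "check_apcext!" + "!".join(args)
    lines = [
        "define service{",
        "    use                  generic-service",
        "    hostgroup_name       " + hostgroup,
        "    service_description  " + description,
        "    is_volatile          0",
        "    contact_groups       admins",
        "    check_command        " + check_command,
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print("\n\n".join(service_stanza(*row) for row in SERVICES))
```

Running the script prints the stanzas to standard output, so the result can be redirected into services.cfg and reviewed before Nagios is reloaded.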

... and so on, for all parameters you wish to inspect. There are two special cases:

1) The ACSC and ACRC status checks take no warn/critical values: the result is OK if the unit is operating, and WARNING if it's on standby.

2) Rack PDU checks flag WARNING or CRITICAL if any of the three phases is beyond the corresponding threshold.
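To make the two special cases concrete, here is a rough Python sketch of the decision logic as described. It is not the plugin's actual Perl code. The function names are hypothetical, the sample amp readings are made up, and treating "beyond the threshold" as strictly greater-than is an assumption. The numeric values are the standard Nagios plugin exit codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def status_state(operating):
    """Special case 1: no thresholds. Operating is OK, standby is WARNING."""
    return OK if operating else WARNING


def threshold_state(value, warn, crit):
    """Compare one reading to its thresholds (assumes 'beyond' means >)."""
    if value > crit:
        return CRITICAL
    if value > warn:
        return WARNING
    return OK


def pdu_state(phase_amps, warn, crit):
    """Special case 2: the worst state across all phases wins."""
    if not phase_amps:
        return UNKNOWN  # no readings came back from the device
    # OK < WARNING < CRITICAL numerically, so max() picks the worst phase.
    return max(threshold_state(amps, warn, crit) for amps in phase_amps)


if __name__ == "__main__":
    # Made-up readings: only phase 2 exceeds the warning threshold.
    state = pdu_state([4.2, 11.5, 6.0], warn=10, crit=14)
    print(LABELS[state])  # prints "WARNING"
```

A real plugin would also print a status line and exit with the chosen code, which is how Nagios learns the state.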

TODO:

1) NetBotz external sensor monitoring

2) Other rack PDUs (although I don't have any to test)

3) Bugfixes?
