Kings of open source monitoring
Built on open source, OpenNMS and Zenoss Enterprise take different paths to rich, scalable, and extensible network and systems monitoring
OpenNMS supports several methods of alert notification. Of course e-mail and e-mail-based pager notifications are supported out of the box, but OpenNMS can also send instant-message alerts via the XMPP (Jabber) protocol, as well as traditional numeric and text pager messages. If you want to use another type of notification service, you can define a custom notification command in an OpenNMS XML configuration file that calls a command-line utility of your choice to send the messages. OpenNMS also uses the concept of a duty schedule to make alert and notification rules more flexible and reusable. Duty schedules can be applied to users, groups, and roles, so off-duty employees aren't paged in the middle of the night when they're not supposed to be working.
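To make the duty-schedule idea concrete, here is a minimal Python sketch of how a notification router might decide who is on duty at a given moment. It is an illustration only: the `DutySchedule` class, its fields, and `on_duty_recipients` are hypothetical and do not reflect OpenNMS's own configuration format or code. The sketch assumes a schedule is a set of weekdays plus a daily time window, and that a user is paged only if at least one of their schedules covers the current time.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class DutySchedule:
    """Hypothetical duty schedule: weekdays (0 = Monday) plus a daily time window."""
    days: FrozenSet[int]
    start: time  # inclusive
    end: time    # exclusive; windows that cross midnight are not handled here

    def covers(self, moment: datetime) -> bool:
        return moment.weekday() in self.days and self.start <= moment.time() < self.end


def on_duty_recipients(schedules: Dict[str, List[DutySchedule]], moment: datetime) -> List[str]:
    """Return, sorted, the users who have at least one schedule covering `moment`."""
    return sorted(
        user for user, user_schedules in schedules.items()
        if any(s.covers(moment) for s in user_schedules)
    )


# Example: an office-hours rotation and a weekend daytime rotation.
office_hours = DutySchedule(frozenset(range(5)), time(9, 0), time(17, 0))  # Mon-Fri 09:00-17:00
weekend_days = DutySchedule(frozenset({5, 6}), time(8, 0), time(20, 0))    # Sat-Sun 08:00-20:00
team = {"alice": [office_hours], "bob": [weekend_days], "carol": [office_hours, weekend_days]}

print(on_duty_recipients(team, datetime(2009, 6, 1, 10, 30)))  # Monday 10:30 -> ['alice', 'carol']
print(on_duty_recipients(team, datetime(2009, 6, 6, 3, 0)))    # Saturday 03:00 -> []
```

Because schedules are separate objects, the same one can be attached to many users (or, in a fuller design, to groups and roles), which is the reusability the duty-schedule concept is meant to provide.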
My company has used various versions of OpenNMS in production for more than five years, and in our experience it scales very well to thousands of devices. In fact, at least one OpenNMS customer monitors 144,000 devices. Of those 144,000 devices, SNMP performance data is collected on 50,000 interfaces, resulting in 450,000 data points being gathered every five minutes.
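To put those figures in perspective, a quick back-of-the-envelope calculation shows the average load they imply. These are derived averages that assume the data points are spread evenly across interfaces and across each five-minute interval; they are not numbers reported for that deployment.

```python
# Figures quoted above for one large OpenNMS deployment.
interfaces = 50_000
points_per_interval = 450_000
interval_seconds = 5 * 60

# Derived averages (assumes points are spread evenly across interfaces and time).
points_per_interface = points_per_interval / interfaces      # 9.0 values per interface per poll
points_per_second = points_per_interval / interval_seconds   # 1500.0 values written per second
polls_per_day = 24 * 60 * 60 // interval_seconds              # 288 collection cycles per day
points_per_day = points_per_interval * polls_per_day          # 129,600,000 values per day

print(points_per_interface, points_per_second, points_per_day)
```

In other words, the system sustains roughly 1,500 stored values per second around the clock.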
A common weak spot for open source software projects is documentation, and OpenNMS is no exception. The documentation is hosted on the opennms.org wiki, which should make collaboration on documentation easier but only partially delivers the goods. Although plenty of good OpenNMS documentation is available, it is oddly organized. Instead of being written as a beginning-to-end software manual, it is a collection of docs on individual OpenNMS features. I should note that OpenNMS does ship a set of how-to and reference docs as part of the installation, but these are not extensive, and they are often somewhat outdated. The OpenNMS team recognizes that documentation is a weakness of the project and is working on a new set of documentation for the upcoming 1.8 release.
Another common weak spot for open source software projects is the user interface, and OpenNMS's Web-based GUI inspires something of a love/hate relationship. The Web interface is attractive in its simplicity, but the lack of AJAX features makes it feel a bit clunky; for instance, changing a configuration setting takes several clicks and full page loads in the browser. On the other hand, one side effect of such a simple Web interface is that it is very fast. The Web interface is being updated for the 1.8 release to include more AJAX features that reduce page loads.
OpenNMS comes up short on some enterprise-grade features, and these gaps will be especially apparent to service providers (xSPs). For example, OpenNMS does not have a full ACL system to restrict users to particular nodes or screens within the Web interface. Currently, admins can set up read-only, view-controlled dashboards for select users. This provides only part of the functionality of a full ACL setup, because users with limited access cannot move beyond the dashboard screen. The OpenNMS developers are working on full-blown ACLs for a point release in the upcoming 1.8 series.
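As a rough illustration of what a "full ACL system" adds over a fixed dashboard, the Python sketch below checks two things for every request: whether the user may open a given screen, and whether the node that screen displays belongs to the user. The `AccessPolicy` class, the screen names, and the node names are all hypothetical; this is not how OpenNMS implements or plans to implement ACLs.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class AccessPolicy:
    """Hypothetical per-user ACL: the nodes and Web UI screens a user may see."""
    nodes: Set[str] = field(default_factory=set)
    screens: Set[str] = field(default_factory=set)


def can_view(policy: AccessPolicy, screen: str, node: Optional[str] = None) -> bool:
    # The screen must be allowed, and if the screen is about a node, so must the node.
    if screen not in policy.screens:
        return False
    return node is None or node in policy.nodes


def visible_nodes(policy: AccessPolicy, inventory: Iterable[str]) -> List[str]:
    # Filter a node list (e.g. a search result) down to what the user may see.
    return [n for n in inventory if n in policy.nodes]


# Example: a hosting customer who may browse only its own equipment.
customer_a = AccessPolicy(nodes={"cust-a-rtr1", "cust-a-sw1"}, screens={"dashboard", "node-detail"})
inventory = ["cust-a-rtr1", "cust-a-sw1", "cust-b-rtr1", "core-rtr1"]

print(visible_nodes(customer_a, inventory))               # ['cust-a-rtr1', 'cust-a-sw1']
print(can_view(customer_a, "node-detail", "cust-b-rtr1"))  # False
print(can_view(customer_a, "reports"))                     # False
```

Unlike a single read-only dashboard, a check like this lets a restricted user move between several screens while still seeing only their own nodes on each one.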
A feature that has become more common among high-end network monitoring systems over the past few years is network topology discovery and mapping. This allows the system to find switches and routers and draw a simple network diagram. Some implementations also use the topology data to improve outage alerts: when a router goes down, they notify administrators about the router but not about the now-unreachable devices behind it. Thanks to this "root-cause analysis," administrators receive a single notification about the router itself instead of hundreds of alerts.
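The suppression logic behind root-cause alerting can be sketched in a few lines of Python. This is a minimal sketch under an assumed data model, not any particular product's implementation: each device maps to the single upstream device it depends on (a hypothetical `parent` map), and an outage is reported only if nothing on the device's path toward the monitoring server is also down.

```python
from typing import Dict, List, Optional, Set


def has_down_ancestor(node: str, parent: Dict[str, Optional[str]], down: Set[str]) -> bool:
    """True if any device on the path from `node` toward the monitoring server is down."""
    seen: Set[str] = set()
    current = parent.get(node)
    while current is not None and current not in seen:  # `seen` guards against loops
        if current in down:
            return True
        seen.add(current)
        current = parent.get(current)
    return False


def alerts_to_send(parent: Dict[str, Optional[str]], down: Set[str]) -> List[str]:
    """Notify only for root causes; devices behind a failed device are suppressed."""
    return sorted(node for node in down if not has_down_ancestor(node, parent, down))


# Hypothetical topology: each device maps to the device it depends on upstream.
parent = {
    "edge-rtr": None,
    "sw1": "edge-rtr", "sw2": "edge-rtr",
    "host1": "sw1", "host2": "sw1", "host3": "sw2",
}

print(alerts_to_send(parent, {"edge-rtr", "sw1", "sw2", "host1", "host2", "host3"}))  # ['edge-rtr']
print(alerts_to_send(parent, {"sw1", "host1", "host2"}))                             # ['sw1']
```

When the edge router fails, six devices stop answering, but only one notification goes out.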
The current version (1.6.5) of OpenNMS does topology discovery, but its auto-generated network map works only in Internet Explorer. This will change with OpenNMS 1.8, when a switch from SVG 1.2 to SVG 1.1 will allow network maps to render in most modern browsers. Another disappointment is that OpenNMS does not yet use topology data to set up root-cause relationships automatically. It does provide a way to configure root-cause relationships manually for smarter alert notifications, but this can mean a lot of configuration work in a large deployment. Automatic configuration of root-cause relationships is not slated for the 1.8 series, so its arrival date is unknown.
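Deriving root-cause relationships from topology data is, at its simplest, a graph search. The Python sketch below assumes discovery yields a list of undirected device-to-device links (the link list and device names are hypothetical) and runs a breadth-first search outward from the device the monitoring server sits behind. Each device's upstream parent is the neighbor through which it is first reached, which is its next hop toward the monitoring server on a shortest path. This is one plausible technique, not a description of how OpenNMS or any other product does it.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Dict, Iterable, List, Optional, Tuple


def derive_parents(links: Iterable[Tuple[str, str]], root: str) -> Dict[str, Optional[str]]:
    """Map each reachable device to its upstream neighbor on a shortest path to `root`."""
    adjacency: DefaultDict[str, List[str]] = defaultdict(list)
    for a, b in links:  # discovered links are treated as undirected
        adjacency[a].append(b)
        adjacency[b].append(a)

    parents: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    while queue:  # breadth-first search outward from the monitoring server's side
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in parents:
                parents[neighbor] = node  # first discovery = next hop toward root
                queue.append(neighbor)
    return parents


# Hypothetical links, as a topology discovery run might report them.
links = [("core-rtr", "sw1"), ("core-rtr", "sw2"), ("sw1", "host1"), ("sw1", "host2"), ("sw2", "host3")]
print(derive_parents(links, "core-rtr"))
# {'core-rtr': None, 'sw1': 'core-rtr', 'sw2': 'core-rtr', 'host1': 'sw1', 'host2': 'sw1', 'host3': 'sw2'}
```

This works cleanly for tree-shaped networks. Where there are redundant links, the search simply picks one shortest path, so a real implementation would need to treat a device as unreachable only when every path to it has failed.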
A final enterprise feature not fully implemented by OpenNMS is distributed collection of monitoring data. Ideally, we should be able to add multiple collection servers to our monitoring system, each of which gathers monitoring data from nearby devices and reports back to the main monitoring server. Currently, OpenNMS can split monitoring across multiple machines, but these systems write directly to the primary database and thus need to be on a fast network link to the primary OpenNMS server. The OpenNMS developers are in the process of making each component of the system capable of distributed collection. However, this will happen piecemeal, as the developers make other improvements to each component over the remainder of the 1.x series. Full-fledged distributed collection will be the defining criterion for the 2.0 release, which has no target date yet.
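The architecture described above, with remote collectors that report to a central server instead of writing to its database, can be sketched in Python as follows. Everything here is hypothetical and simplified: `RemoteCollector` and `CentralServer` are illustrative names, the polling function is a stand-in for real SNMP collection, and the network transport is replaced by a direct method call. The point is the data flow. Each collector polls its local devices, buffers the results, and forwards one batch per cycle, so only the central server touches storage and the WAN link carries one message per collector per cycle.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str, float]  # (collector site, metric name, value)


@dataclass
class CentralServer:
    """Hypothetical main monitoring server; the only component that writes to storage."""
    store: List[Sample] = field(default_factory=list)
    batches_received: int = 0

    def ingest(self, batch: List[Sample]) -> None:
        self.batches_received += 1
        self.store.extend(batch)


@dataclass
class RemoteCollector:
    """Hypothetical remote collector: polls nearby devices, forwards one batch per cycle."""
    site: str
    poll: Callable[[], Dict[str, float]]  # stand-in for local SNMP polling
    buffer: List[Sample] = field(default_factory=list)

    def collect(self) -> None:
        for metric, value in self.poll().items():
            self.buffer.append((self.site, metric, value))

    def flush(self, server: CentralServer) -> None:
        if self.buffer:
            server.ingest(self.buffer)  # in practice this would be a network call
            self.buffer = []


server = CentralServer()
branches = [
    RemoteCollector("london", lambda: {"rtr1.ifInOctets": 1200.0, "rtr1.ifOutOctets": 800.0}),
    RemoteCollector("tokyo", lambda: {"sw3.cpu": 17.0}),
]
for collector in branches:
    collector.collect()
    collector.flush(server)

print(server.batches_received, len(server.store))  # 2 3
```

A production design would also need to buffer batches while the central server is unreachable and replay them afterward, which is part of what makes full distributed collection a substantial project.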