Since the dawn of the business network, there has been a need to ensure that the network services provided to the enterprise are alive and responsive. Traditionally, in midsized businesses, this role has been filled by complex, closed source, and fantastically expensive solutions from manufacturers such as BMC, CA, HP, and IBM. And while these extravagant expenses make no customer happy, many users of these packages also complain of their complexity. Enough administrators have spent enough time wrangling with their monitoring systems to make a lot of smart people imagine that there must be a better way.
[ Sidebar: GroundWork: Something old, something new ]
Enter the latest salvo in this war: Zenoss Core 2, an enterprise-class open source monitoring package that has been built from the ground up to replace complex, closed source monitoring solutions. It certainly won't take the place of every function of these high-end solutions, but for the vast majority of IT shops, Zenoss Core will be deployed faster, be managed by fewer people, and cost a fraction as much as its closed source rivals.
Zenoss' key strength is a unified design that collects many types of information from numerous sources and displays them in an intelligent way. While many monitoring products feel like an amalgamation of several different pieces of software that have been stapled together, Zenoss stands apart with a unified, object-based repository and a tightly integrated set of tools and reports, yet doesn't draw itself into a corner as far as extensibility and future growth are concerned.
The latest Zenoss Core releases, Versions 2.0 and 2.1, sport several new features as well as a raft of bug fixes. New features include integration with Google Maps, providing interactive network topology views, plus network status visualizations and a drag-and-drop dashboard that allows users to assemble the dashboard components that they use most often. Although these features may seem somewhat superfluous, they are actually some of the most sought-after features in a management system. In a real-world scenario, it is sometimes far more helpful to have a large diagram with one system or site lit up in red than it is to receive an e-mail or SMS with error code in it.
Inside Zenoss Core
Zenoss is built on the open source Zope application server and the Python programming language, which provide a solid, standards-based development platform that is largely responsible for Zenoss' meteoric growth. The class-based relationships between monitored devices, performance data, event logs, and all of their associated organizational containers are well thought out and easy to navigate. The underlying data structures are equally straightforward, making the software easy for developers to extend and grow. These factors fuel a dynamic open source project that is one of the most active on SourceForge.
Zenoss is distributed either in the form of Linux RPM (Red Hat Package Manager) packages or as a prebuilt VMware appliance. It is readily compatible with a wide range of popular Linux distributions as well as Apple's OS X. Distribution in the form of a VMware appliance makes Zenoss easy to evaluate and helps pave the way for shops with no Linux expertise or available dedicated hardware to implement it. The RPM installation is nicely scripted and works well enough such that an admin with very limited Linux experience will find it relatively painless to get up and running. Upgrades are also relatively easy to accomplish – usually only requiring the application of a new RPM and the execution of a data migration script.
After Zenoss is installed and running, the Web management GUI becomes available and you can start adding devices. Depending on the type of device being added, generally all that's needed for full discovery is a hostname and a read-only SNMP community string. The included device modeling software is intelligent enough to figure out whether it is parsing a switch, a server, a UPS, or a number of other basic types of devices, and determine which operating system is running. The vast majority of properly configured SNMP-capable devices will be automatically detected without very much direct intervention. If a modeler has been written to recognize the device (in the case of Dell or HP servers, for instance), a great deal of extra hardware-specific information can also be gleaned from the manufacturer's management agents.
As with any auto-discovery process, there are always devices that won't be detected correctly, but these will generally be the exception rather than the rule – even in fairly diverse environments. Moreover, it's fairly easy to suggest to Zenoss what types of devices they are and how they should be treated if the device modeler doesn't quite get it to begin with. Only the more unusual devices such as a network-attached Fibre Channel switch or tape backup unit will be entirely unrecognized. Even in these cases, some statistics can still be recorded about the Ethernet interfaces and basic SNMP information about the device so long as it implements a standard SNMP MIB (Message Information Block).
Device support is always a challenge for monitoring packages, and this is one area in which commercial software solutions are generally more capable; after all, they typically have had more capital to invest in developing monitoring plug-ins. However, this increased capability is usually at the expense of user and community extensibility – not a trade-off Zenoss has made, and thankfully so. Zenoss is betting that a combination of internal commercial development and a great deal of community development will bridge this gap and make Zenoss' built-in device support comparable with that of much larger competitors. Although this is admittedly not the case today, Zenoss Core did correctly recognize and model more than 80 percent of a rather complex corporate network that I used for testing.
Once you have a number of devices in the system, the simple ingenuity of the unified object database becomes clearer. Each device Zenoss monitors becomes linked to many other objects within the database – most without any user intervention. For example, an HP ProLiant server running Windows Server 2003 with the HP Insight Manager agents installed will be related to automatically created objects that represent every piece of hardware and software within the box, all the way down to individual Microsoft hotfixes that are installed and RAID cards being used. Selecting any of these objects from within the device view will switch your perspective to the new object. You can then see what other devices share the same object. For example, if HP sends you an advisory that dictates a critical firmware upgrade for a specific type of RAID controller, it is very easy to identify all the monitored devices using that card. As such, Zenoss becomes more than just a monitoring framework – it can just as easily perform a broad set of inventory management tasks just by virtue of the fact that it tracks the relationships among all the devices it has collected.
Events and alerts
As soon as a device is added and properly modeled, Zenoss will immediately start collecting performance and event data about the device. Performance data generally include network interface, CPU, memory, and disk statistics, which Zenoss stores in standard RRDtool round-robin databases. All of these performance metrics are displayed in an intuitive graph viewer. Different types of devices will have different degrees of performance data recorded by default. Fortunately, it's fairly easy to define new performance characteristics for monitoring, though it does require some knowledge of SNMP and which MIBs the devices in question will answer to. A built-in SNMP browser integrated with some kind of monitoring wizard would make this task far easier – perhaps we should look for this in a future release.
Detailed event data is captured and recorded into a MySQL database back end. Events can be acknowledged and moved to history, and the admin handling them can make notes to provide a historical record of what happened, as well as how and when it was resolved. This provides the data necessary for determining the historical uptime of a device. It also provides a way of identifying recurring events and how they might be correlated. Each Zenoss user can define which types of devices they want to be alerted about, what method should be used to alert them, and when they do or don't want to be alerted, as well as for what types of failures. Alerting is generally done via e-mail, though Zenoss can also generate SNPP (Simple Network Paging Protocol) and TAP (Telelocator Alphanumeric Paging) pages through the use of a gateway package such as Sendpage if it's required.
Events are generated via several different means. Zenoss can automatically monitor a device via ICMP (Internet Control Message Protocol) pings, TCP probes to service ports, process table monitoring via SNMP, and Windows process and service monitoring via WMI (Windows Management Instrumentation). In previous versions of Zenoss, WMI discovery, modeling, and monitoring functionalities were implemented via an outboard service that would be installed on a Windows proxy server and communicate with the Linux-based Zenoss host. This was due to the absence of a Linux-compatible WMI stack, and it could prove to be unwieldy – especially if the proxy server was experiencing a problem.
In Zenoss Core 2, this functionality has been integrated natively within the main installation of Zenoss through the use of the WMI implementation introduced in Samba 4.0. Zenoss also supports the use of standard Nagios plug-ins, which immediately provide a huge library of specialized monitoring tools. Setting these up is not a fully automated process, and doing it correctly does require some knowledge of how the plug-ins work.
Tackling the test network
In my testing, I implemented Zenoss in a production network consisting of approximately 30 servers and about as many network devices. The test network was largely Windows-based, but also included a number of Linux and VMS hosts as well as a huge variety of network equipment. I downloaded Zenoss Core 2.1 as an RPM and installed it into a CentOS Linux virtual machine. Within a few minutes, I had the Web interface up and running and manually added a few test devices. By far the most time-consuming part of the initial setup was configuring the servers and network devices in the test environment with the proper SNMP settings. Once that was done, getting them to be properly recognized by Zenoss was easy. If your environment is already configured correctly, you can use an automated network discovery feature to detect and model whole subnets en masse.
Slightly more difficult was getting the WMI functionality of Zenoss to operate properly. I found the WMI implementation to be sensitive to the case of the device name. In some cases it seemed to work with a capitalized name, and in other cases it would work only with a lowercase name – regardless of the actual host name capitalization. Additionally, domain controllers required slightly different Windows user name syntax in order to function correctly. These wrinkles were easy to iron out in my test environment, but in a much larger Windows environment, they could take a significant amount of time to work through. Zenoss fixed many other WMI-related problems in previous minor releases, but it looks as though there are still a few left. But overall, once the WMI subsystem was configured properly, it worked well.
Once my Windows machine names were sorted out, Zenoss discovered all Windows services running on the servers and made it easy to enable monitoring of the services per device or globally. Even given the hitches, it still took far less time to deploy Zenoss than an agent-based installation would have required.
Not counting the time getting SNMP configured on the servers (the only real client-side task involved in deployment), I spent perhaps two or three hours working with Zenoss to get the bulk of the servers and network devices added, monitored, and alerting with a useful set of service and performance thresholds. Much more tweaking would be required to get Zenoss to report on all the various services I'd want to monitor and to tune the alerting rules properly, but still, a few hours from zero to monitoring is darn fast. Even the basic level of reporting I enabled could take days of tedious work with other tools I've used.
Having trouble installing and setting up Win10? You aren’t alone. Here are many of the most common...
Win7 Update scans got you fuming? Here’s how to make the most of Microsoft’s 'magic' speed-up patch
Picking an Android phone can be difficult, but we're here to help. These are the top Android phones you...
Sponsored by Hewlett Packard Enterprise
Sponsored by Intel
Sponsored by Puppet
In fact, wait as long as Microsoft will let you, since this is mostly a minor upgrade
The demise of Visual Studio LightSwitch shouldn’t prevent power users from building line-of-business...
Google's container orchestration platform can now scale to massive clusters
IT expertise is no match for execs' stubbornness and agendas, even under dire circumstances