Monitoring virtual servers for availability, performance, health, and workload capacity has never been easy, but Operations Manager goes a long way toward making it easier.

Operations Manager monitors every aspect of a vSphere build, from the hypervisor to the individual VM, including CPU, RAM, network I/O, and storage I/O. It builds a profile of each monitored object over time, then uses that profile to distinguish normal from abnormal behavior. Thus, if a particular set of VMs spikes CPU utilization every night at 11 p.m. over a week or two, Operations Manager will determine that this is a normal occurrence, then factor it in when deciding whether to trigger an alert or flag a subsystem for abnormal behavior. This type of profiling is very useful, as it helps prevent false-positive alerts and smooths out trends.

Naturally, this learning process takes many weeks. During this time, Operations Manager shows data for every monitored aspect but refrains from judging conditions normal or abnormal, and the overall health scoring will not be completely accurate. Once enough data has been collected and analyzed, Operations Manager can begin making accurate determinations on the health of the monitored objects.
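To make the profiling idea concrete, here is a minimal sketch of a per-hour baseline that withholds judgment until it has seen enough data. It is not Operations Manager's actual analytics, which the product does not expose here; the class name, the seven-sample learning threshold, the three-sigma band, and the one-point floor on the band are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_SAMPLES = 7    # hypothetical: observations needed per hour before judging
SIGMA_LIMIT = 3.0  # hypothetical: width of the "normal" band in std deviations
SIGMA_FLOOR = 1.0  # hypothetical: keeps the band from collapsing on flat data


class HourlyBaseline:
    """Learns what is normal for a metric at each hour of the day."""

    def __init__(self):
        # Maps hour of day (0-23) to the list of values seen at that hour.
        self.samples = defaultdict(list)

    def observe(self, hour, value):
        """Record one measurement, e.g. CPU utilization in percent."""
        self.samples[hour].append(value)

    def is_abnormal(self, hour, value):
        """Return True/False once trained, or None while still learning."""
        history = self.samples[hour]
        if len(history) < MIN_SAMPLES:
            return None  # still learning: show data, make no determination
        mu = mean(history)
        sigma = max(stdev(history), SIGMA_FLOOR)
        return abs(value - mu) > SIGMA_LIMIT * sigma


baseline = HourlyBaseline()
for day in range(14):
    baseline.observe(23, 90 + day % 3)  # nightly spike at 11 p.m.
    baseline.observe(14, 20 + day % 3)  # quiet mid-afternoon

print(baseline.is_abnormal(23, 91))  # the nightly spike is expected
print(baseline.is_abnormal(14, 91))  # the same load at 2 p.m. is not
print(baseline.is_abnormal(3, 50))   # no history for 3 a.m. yet
```

After two weeks of simulated data, the sketch treats 91 percent CPU at 11 p.m. as normal, flags the same value at 2 p.m., and declines to judge an hour it has never observed.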

Navigating the UI
Operations Manager's UI is Flash-based and well appointed. The left sidebar is a hierarchical tree view of all monitored vCenter servers and child objects, with the center area displaying whatever element is currently in focus. The basic vSphere Datacenter/Cluster/Host/VM tree is present, but you can define your own groups and display data relevant to those group members alone. This is very handy: you can collect the objects relevant to a particular application or framework and get analysis and monitoring displays for just those objects, rather than for a whole cluster or a single host or VM.

These groups can be configured manually or dynamically. Manual selection lets you add specific objects, while dynamic selection adds objects to the group based on defined criteria, such as workload, child/parent status, and name. If you wanted to group together VMs named web01, web02, web03, and so on, you could create a dynamic group and define the name criterion as contains "Web". From there, all VMs with "Web" in the name will automatically become part of that group. Note that whenever you define a group, Operations Manager needs to gather data for the new group before it can report on it. That is, it will not be able to provide Health, Risk, and Efficiency scores right away, and some badges and values can take up to 24 hours to calculate.
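The dynamic name rule can be pictured as a simple filter over the inventory. The sketch below is illustrative only: the `VM` class and `dynamic_group` function are hypothetical names, not part of any VMware API, and the case-insensitive comparison is an assumption made because the example above pairs VMs named web01 with the criterion "Web".

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str


def dynamic_group(vms, name_contains):
    """Return the VMs whose name contains the given text.

    Matching is case-insensitive here, which is an assumption
    of this sketch rather than documented product behavior.
    """
    needle = name_contains.lower()
    return [vm for vm in vms if needle in vm.name.lower()]


inventory = [VM("web01"), VM("web02"), VM("db01"), VM("web03")]

# Membership is derived from the criterion, not stored by hand,
# so re-running the filter picks up any newly added "web" VM.
web_group = dynamic_group(inventory, "Web")
print([vm.name for vm in web_group])  # ['web01', 'web02', 'web03']
```

A manual group, by contrast, would simply be a fixed list of objects that changes only when someone edits it.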

The UI is reasonably fast and responsive, and the graphs and data displays are clean and easily digested. Given the vast amount of data on display, it's something of a challenge to absorb every element at first, but after some time working with the UI, you learn where to look for certain information and can get to it quickly.

Monitoring clusters, hosts, and VMs
An overview of the monitoring data is presented through a series of grades and symbols. There is a dashboard view for every monitored object, from the World, which includes all linked vCenter instances, down to the physical host and datastore level, as well as every VM on the system. Clicking a cluster header and selecting Dashboard shows three columns: Health, Risk, and Efficiency. Each column is graded from 1 to 100, with a badge reflecting the graded status.

For instance, a physical host or cluster might show a Health of 84, a Risk of 27, and an Efficiency of 20. At a high level, this means the selected object is in good shape in terms of available resources and workload, and Risk is fairly low because there are no expected conditions that should upset proper operations. Efficiency, however, is quite low in this example, perhaps due to a number of powered-on but dormant or low-utilization VMs, and several that are oversized.
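One way to keep the three scores straight is to remember that they do not all point the same direction: higher is better for Health and Efficiency, while lower is better for Risk. The sketch below encodes that reading for the example above. The function name and the cutoff of 50 are hypothetical, chosen only for illustration; they are not the thresholds Operations Manager uses for its badges.

```python
CUTOFF = 50  # hypothetical midpoint, not a documented badge threshold


def needs_attention(health, risk, efficiency, cutoff=CUTOFF):
    """List the scores that look unfavorable under the assumed cutoff."""
    flagged = []
    if health < cutoff:      # Health: higher is better
        flagged.append("Health")
    if risk > cutoff:        # Risk: lower is better
        flagged.append("Risk")
    if efficiency < cutoff:  # Efficiency: higher is better
        flagged.append("Efficiency")
    return flagged


# The example host or cluster from the text above.
print(needs_attention(health=84, risk=27, efficiency=20))  # ['Efficiency']
```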

