Know your options for infrastructure monitoring

As the number and variety of monitoring tools grow, it's getting harder to track which metrics to collect and why. Ernest Mueller of CopperEgg walks you through the decision points.

Available tooling

Modern IT systems are complex, so a host of tools has sprung up that can instrument specific points within those systems using a variety of methods. Here's a breakdown of the most common tool instrumentation approaches you can apply to a typical three-tier Web system:

  1. Browser RUM (real user monitoring) uses JavaScript instrumentation embedded in Web pages to sample information on the user experience. RUM can capture actual user experience across a wide part of the service, but its scope is limited to what your users are currently doing, and it generates a lot of data. Web analytics are closely related to browser RUM. Pure API traffic isn't captured, and mobile apps typically require a separate implementation.
  2. Global probes generate synthetic Web transactions applied to the system from various external geographic locations. These have the benefit of repeatedly testing the service in the same way from various points, and they provide performance-over-time information that can be used effectively for measuring SLA attainment. However, they can't exercise all parts of the service, and they generate load on the service in the process.
  3. Network RUM is based on network capture of user traffic on the server side. It's feasible only when you have physical access to the network, and it doesn't see activity served from browser or CDN caches. However, it can see more protocols than browser RUM and doesn't suffer from browser compatibility issues.
  4. Local probes apply synthetic Web transactions to the system from inside the service network. They're very actionable for alerting (if a probe fails, the service is most likely down), but they do not cover the full chain required to deliver the service to an end-user. A variation runs the probe on an individual system against services running on that same system.
  5. Network APM (application performance monitoring) analyzes the behavior of system components by watching the interchange on the network. It covers many protocols and provides insight into network-based performance issues, but it's blind to the complexity that lies behind each IP address.
  6. Database APM offers deep-dive analysis of database activity and performance statistics. It provides plenty of information, including database errors and performance issues, but does not expose issues across the other 90 percent of the stack. Also, support for the explosion in diversity of NoSQL/NewSQL data stores is a challenge.
  7. Network monitoring of network devices and flows is important to identify and diagnose network problems, but it usually does not take into account the higher-level operation of applications, services, and users that is more meaningful to the business.

Those are merely the types of monitoring external to the servers and applications themselves. By decomposing a specific system, we can observe many more instrumentation techniques.
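
To make the probe approaches in the list above (items 2 and 4) more concrete, here is a minimal Python sketch of a synthetic Web transaction and a simple SLA attainment calculation. It is an illustration under stated assumptions, not how any particular monitoring product works: the names `ProbeResult`, `run_probe`, and `sla_attainment` are hypothetical, the success rule (any HTTP 2xx or 3xx status) is an assumption, and "attainment" is taken here to mean the fraction of probes that succeed within a response-time threshold.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ProbeResult:
    ok: bool                      # True if the transaction returned HTTP 2xx/3xx
    seconds: float                # wall-clock duration of the transaction
    error: Optional[str] = None   # failure detail, if any


def http_get_status(url: str, timeout: float = 5.0) -> int:
    """Fetch url and return the HTTP status code (raises on network/HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_probe(url: str, fetch: Callable[[str], int] = http_get_status) -> ProbeResult:
    """Run one synthetic transaction and time it.

    `fetch` is injectable so the probe can be exercised without a network.
    """
    start = time.perf_counter()
    try:
        status = fetch(url)
        ok = 200 <= status < 400          # assumed success rule
        error = None if ok else f"HTTP {status}"
    except Exception as exc:              # timeouts, DNS failures, HTTP errors
        ok, error = False, str(exc)
    return ProbeResult(ok, time.perf_counter() - start, error)


def sla_attainment(results: List[ProbeResult], max_seconds: float) -> float:
    """Fraction of probes that succeeded within max_seconds (assumed definition)."""
    if not results:
        raise ValueError("no probe results")
    good = sum(1 for r in results if r.ok and r.seconds <= max_seconds)
    return good / len(results)
```

The same function can serve as a local probe when run from inside the service network or as one location of a global probe when run from outside it; only the vantage point changes. Collecting the results of repeated `run_probe` calls and passing them to `sla_attainment` with a threshold such as 1.0 second gives the performance-over-time figure described above, at the cost of the extra load each probe places on the service.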
