Visibility into the data center is widely sought but very difficult to achieve. Determining which metrics are important to collect, how this data should be viewed and measured, and how much weight this information should carry in operations is a never-ending challenge for data center administrators.
Virtualization complicates matters even further. When VM operations data is abstracted across many applications running on many physical hosts, several questions arise. Are you measuring the right data? Are you measuring it in the right way? Is this data measured in the correct location?
Misinterpreted data can be as harmful as no data, leading to incorrect storage design and management decisions that ultimately harm application performance and waste money. So how do you bring accurate infrastructure analytics to a virtualized data center, and what benefits are achieved when this level of visibility is in place?
There are two primary ways that companies manage storage in a virtualized environment. One approach is to use monitoring solutions that pull data from the hypervisor management server (such as VMware vCenter). However, these solutions are problematic for at least two reasons:
- They only see data provided by vCenter Server, which is primarily focused on CPU and memory metrics. There is little insight into storage I/O, which accounts for approximately 70 percent of all VM performance problems.
- They are dependent on vCenter’s ability to collect and measure data. If vCenter cannot see the correct data, doesn’t sample the data at the right rate, or is limited in how much data it can collect, these solutions will be ineffective.
A second approach is to use monitoring tools provided by the storage array vendor. These, too, are often inadequate for a number of reasons:
- They collect data in the wrong location. It might be interesting data, but it is not necessarily accurate when assessing VM performance. That is because observed VM latency can be impacted by many parts of the storage stack, not only the array. As a result, an array might report low latency when the latency actually observed by the VM is quite high.
- Arrays have limited visibility. Valuable intelligence about VM workload characteristics is lost the moment the I/O leaves the host. This is one of the reasons why arrays are extremely limited in their ability to provide real intelligence as it relates to specific VMs.
- Storage monitoring tools are tied to specific storage products. In a data center with multiple types of storage arrays, this is an incomplete solution.
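The gap between array-reported and VM-observed latency can be illustrated with a rough sketch. It assumes a simplified model in which each layer of the storage stack adds its delay in series. The layer names and every number below are invented for illustration and are not measurements from any product.

```python
# Hypothetical per-layer latency contributions for one VM I/O, in microseconds.
# The layer names and values are assumptions made up for this example.
layers_us = {
    "guest_and_hypervisor_queue": 1800,
    "host_adapter": 300,
    "storage_fabric": 900,
    "array_service_time": 1000,
}

# The array can only see its own service time.
array_reported_us = layers_us["array_service_time"]

# Under the simple additive model, the VM experiences the sum of all layers.
vm_observed_us = sum(layers_us.values())

print(f"array reports: {array_reported_us / 1000:.1f} ms")
print(f"VM observes:   {vm_observed_us / 1000:.1f} ms")
```

In this made-up case the array reports 1.0 ms while the VM experiences 4.0 ms, which is why the measurement location matters.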
To address the above challenges, storage array vendors often use vCenter in an attempt to make correlations between VM-related statistics and storage array statistics. The result, unfortunately, is an odd amalgamation of disparate statistical data that can be difficult to decipher and correlate -- and therefore difficult to trust.
The architecture behind PernixData Architect
PernixData created Architect to address the above challenges. Architect is a hardware-agnostic platform for storage management and design that marries VM and storage intelligence to provide an end-to-end view of storage operations and performance.
The Architect agent is deployed as a kernel-level module, which allows us to gain a complete and accurate understanding of the data passed through the virtualization I/O stack. Operating at the kernel level gives us fine-grained control over which data we collect, including data that is captured neither by the hypervisor itself nor, ultimately, by vCenter.
Thus Architect extends the capabilities of the hypervisor and vCenter to provide operations statistics, analytics, and even design recommendations. The entire solution consists of the following components:
- The Host Extension Module. This is the software agent that is installed on each ESXi host in a vSphere cluster. Because PernixData Architect is a VMware Partner Verified product, it can be easily installed and updated using VMware's Update Manager.
- The Architect Management Server. This is a virtual machine that is responsible for collecting the data from all of your virtualization hosts and presenting it via the Web-based UI.
Unlike vCenter and storage management solutions that rely on vCenter, Architect sits in the data path, capturing every I/O issued by the VMs on each host. The data path is the ideal location for a storage control plane because it is not dependent on communications with the storage array, and it works across all types of arrays. Architect has no dependencies on storage fabrics or protocols for compatibility, nor does it require additional licensing on your storage systems or elevated vSphere licensing.
PernixData recognizes that in storage infrastructures, milliseconds matter. To provide finer granularity, Architect collects data continuously in 20-second intervals, versus the 5- or 10-minute sampling intervals of other products. A proper sampling interval is critical when identifying performance behaviors that are transient in nature.
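To see why the sampling interval matters for transient behavior, consider a minimal sketch that averages the same synthetic latency series over one 5-minute window and over 20-second windows. The series is invented for illustration (a 2 ms baseline with a 30-second burst at 50 ms), and the helper name `window_averages` is hypothetical. This is not how Architect or vCenter aggregates data internally.

```python
from statistics import mean

def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]

# One synthetic latency value per second for 5 minutes (values are invented):
# a 2 ms baseline with a 30-second burst at 50 ms.
latency_ms = [2.0] * 300
for second in range(100, 130):
    latency_ms[second] = 50.0

coarse = window_averages(latency_ms, 300)  # a single 5-minute sample
fine = window_averages(latency_ms, 20)     # fifteen 20-second samples

print(f"5-minute view, worst sample:  {max(coarse):.1f} ms")
print(f"20-second view, worst sample: {max(fine):.1f} ms")
```

With these numbers the single 5-minute sample averages the burst down to 6.8 ms, while the worst 20-second sample still shows the full 50 ms.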
As infrastructures generate an enormous amount of metadata, there are often technical limitations to collecting all of the information needed for detailed analysis. Architect addresses this by leveraging server media for a scale-out approach to data collection. This enables Architect to collect substantially more information than what is available in vCenter or in third-party products that hook into vCenter.
A walk through the Architect UI
Collecting the data is only one part of the equation. The information also must be presented to the user in a consumable, helpful form. Architect was designed to provide a superior experience through an easy-to-use, HTML5-based interface that presents the right information at the right time.
With the clean and simple UI, we aimed to make it easy for the user to discover relevant data and take appropriate action (see Figure 2). Instead of overwhelming users with an abundance of red, yellow, and green lights (“death by information overload”), Architect communicates information using "progressive disclosure" -- the most valuable metrics are presented first, with drill-downs available as needed.
Figure 3 shows how a data center’s operational traits can be easily analyzed. With a quick glance, you can see read/write ratios, block size distribution, and the percentage of VMs that are write intensive. These are important measurements for any environment, but have historically been very difficult to access.
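As a rough illustration of what these traits mean, the sketch below derives a read fraction, a block size distribution, and a list of write-intensive VMs from a handful of I/O records. The record layout, the VM names, and the 50 percent write threshold are all assumptions made for this example. They do not describe Architect's data model or its definition of "write intensive."

```python
from collections import Counter

# Hypothetical I/O records: (vm_name, operation, block_size_in_bytes).
# The layout and all values are invented for illustration.
ios = [
    ("sql01", "write", 65536),
    ("sql01", "write", 65536),
    ("sql01", "read", 8192),
    ("web01", "read", 4096),
    ("web01", "read", 4096),
    ("web01", "write", 4096),
    ("vdi07", "read", 32768),
    ("vdi07", "read", 4096),
]

def read_fraction(records):
    """Fraction of I/Os that are reads (0.0 to 1.0)."""
    reads = sum(1 for _, op, _ in records if op == "read")
    return reads / len(records)

def block_size_distribution(records):
    """Count of I/Os per block size."""
    return Counter(size for _, _, size in records)

def write_intensive_vms(records, threshold=0.5):
    """VMs whose share of writes exceeds `threshold` (an arbitrary cutoff)."""
    totals, writes = Counter(), Counter()
    for vm, op, _ in records:
        totals[vm] += 1
        if op == "write":
            writes[vm] += 1
    return sorted(vm for vm in totals if writes[vm] / totals[vm] > threshold)

overall_reads = read_fraction(ios)
sizes = block_size_distribution(ios)
heavy_writers = write_intensive_vms(ios)

print(f"reads: {overall_reads:.1%}, writes: {1 - overall_reads:.1%}")
print("block sizes:", dict(sizes))
print("write-intensive VMs:", heavy_writers)
```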
To get more information, you can drill down into any key metric, such as IOPS, latency, or throughput. Figure 4 shows how, with a single click, you can see the 10 top VMs contributing to IOPS.
With another click, you can see read/write ratio for any one of those VMs, in real time (Figure 5). It is this progressive approach that makes Architect an effective tool for learning about your environment.
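The kind of ranking behind a "top VMs by IOPS" view can be sketched in a few lines. The function name `top_vms_by_iops`, the VM names, and the I/O counts are hypothetical. The 20-second interval matches the collection interval mentioned earlier, but the rest is only an assumption about how such a ranking could be computed.

```python
def top_vms_by_iops(io_counts, interval_seconds, n=10):
    """Return the n busiest VMs as (vm, iops) pairs, highest first."""
    iops = {vm: count / interval_seconds for vm, count in io_counts.items()}
    return sorted(iops.items(), key=lambda item: item[1], reverse=True)[:n]

# Hypothetical I/O counts per VM over one 20-second collection interval.
io_counts = {"sql01": 48000, "web01": 6000, "vdi07": 12000, "file02": 900}

top = top_vms_by_iops(io_counts, interval_seconds=20, n=3)
for vm, iops in top:
    print(f"{vm}: {iops:.0f} IOPS")
```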
Data-driven decision making
The first step in accurate storage design and management is to understand how much data is being used and how it is being used. Working set characteristics and storage block sizes -- to take two examples -- can have a dramatic impact on your environment. With Architect, IT administrators have a straightforward path to the following:
• Gather working set intelligence. Working sets are constantly changing based on VM, application, and infrastructure requirements. While hypervisors and storage arrays can provide working set statistics in isolation, accurately reporting on this data across an entire virtualized environment is a challenge. Because Architect sits in the path of all I/O, it can accurately collect working set data on a per-VM and per-host basis across the entire environment. For example, Architect lets you easily see how much of your hot data consists of reads versus writes, which is critical for making various storage design decisions (Figure 6).
Accurately determining working set sizes plays an important role in the design and operation of your environment. This knowledge can be used to help properly size top performing tiers of your persistent storage, or to properly size caching tiers in hyperconverged infrastructure solutions, hybrid arrays, or server-side acceleration solutions like PernixData FVP. This data can even be used when estimating resource requirements for the replication of data to another data center, and when providing chargeback or showback information to your primary consumers.
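One common way to think about a working set is as the set of unique blocks touched during an observation window, split by reads and writes. The sketch below estimates it from a short I/O trace. The trace format, the 4 KB tracking granularity, and the function name `working_set_bytes` are assumptions for illustration. The source does not describe how Architect computes working sets.

```python
def working_set_bytes(ios, block_bytes=4096):
    """Estimate read and write working sets as unique blocks touched.

    `ios` is an iterable of (operation, offset_bytes, length_bytes) tuples.
    """
    touched = {"read": set(), "write": set()}
    for op, offset, length in ios:
        first = offset // block_bytes
        last = (offset + length - 1) // block_bytes
        touched[op].update(range(first, last + 1))
    total = touched["read"] | touched["write"]
    return {
        "read": len(touched["read"]) * block_bytes,
        "write": len(touched["write"]) * block_bytes,
        "total": len(total) * block_bytes,
    }

# Hypothetical trace for one VM over one observation window.
trace = [
    ("read", 0, 8192),      # touches blocks 0 and 1
    ("read", 4096, 4096),   # block 1 again; a re-read adds nothing
    ("write", 8192, 4096),  # block 2
    ("write", 0, 4096),     # block 0, which was also read
]

ws = working_set_bytes(trace)
print(ws)
```

Because block 0 is both read and written, the total (12,288 bytes) is smaller than the sum of the read and write working sets (8,192 bytes each). That overlap matters when sizing a cache or a fast tier.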
• Optimize performance based on block sizes. Block size is often considered one of the most misunderstood performance metrics in a data center, yet it has a profound impact on the performance of a storage infrastructure. VMs, and the applications that run in them, suffer when they use block sizes that the storage system cannot adequately handle. With the rising popularity of flash in the data center, the impact of large block sizes is even more noticeable due to the more erratic handling of reads and writes by flash media.
With Architect, one can easily use block information to optimize VM performance. For example, at a specific point in time you can look at a VM’s latency, determine if reads or writes are contributing to that latency, then look at the block sizes during that period of latency (Figure 7). Understanding latency per block size will help you make more intelligent design and optimization decisions in your environment.
You can then compare historical data for accurate trending analysis. In the example below (Figure 8), we see latency associated with block sizes over a period of 10 minutes for a specific VM. The chart shows us there was significant latency associated with large block sizes, though the VM was running on flash-based storage.
Figure 9 shows how block sizes can change substantially for a single workload even during a short period of time (say, one hour). Understanding this variability is critical to proper storage design, optimization, and management.
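The per-block-size analysis described above amounts to grouping I/Os by operation and block size, then summarizing latency within each group. Here is a minimal sketch under that assumption. The sample format and all values are invented, and `latency_by_block_size` is a hypothetical helper, not part of any product.

```python
from collections import defaultdict
from statistics import mean

def latency_by_block_size(ios):
    """Mean latency (ms) per (operation, block size) pair."""
    groups = defaultdict(list)
    for op, size, latency_ms in ios:
        groups[(op, size)].append(latency_ms)
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical samples: (operation, block_size_bytes, latency_ms).
samples = [
    ("read", 4096, 0.5), ("read", 4096, 1.5),
    ("write", 4096, 2.0),
    ("write", 262144, 9.0), ("write", 262144, 11.0),
]

result = latency_by_block_size(samples)
for (op, size), latency in sorted(result.items()):
    print(f"{op:5s} {size // 1024:4d} KB: {latency:.1f} ms")
```

In this made-up data, the 256 KB writes average 10 ms while the 4 KB I/Os stay at 1 to 2 ms. A single blended latency figure would hide that pattern.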
The bigger picture
It is not enough to look at individual metrics alone. One must be able to correlate these measurements across data centers for holistic storage management. For example, you need visibility into how IOPS, latency, and throughput collectively impact the performance of every VM.
Architect has a VM performance plot that makes this easy. As Figure 10 shows, you can quickly analyze thousands of VMs at once to rapidly identify potential problems due to latency, IOPS, or both.
In addition, you can drill down into the performance of a single VM or cluster of VMs, with individual graphs showing how latency, IOPS, and throughput vary over time (Figure 11).
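Conceptually, a plot of latency against IOPS lets you sort VMs into groups: healthy, slow, busy, or both. The sketch below does that bucketing with fixed thresholds. The thresholds (5 ms and 1,000 IOPS), the VM names, and the function `classify_vms` are assumptions chosen for illustration, not values or logic taken from Architect.

```python
def classify_vms(vm_stats, latency_limit_ms=5.0, iops_limit=1000):
    """Bucket each VM by whether its latency and IOPS exceed chosen limits."""
    buckets = {"ok": [], "high_latency": [], "high_iops": [], "both": []}
    for vm, (latency_ms, iops) in sorted(vm_stats.items()):
        slow = latency_ms > latency_limit_ms
        busy = iops > iops_limit
        if slow and busy:
            buckets["both"].append(vm)
        elif slow:
            buckets["high_latency"].append(vm)
        elif busy:
            buckets["high_iops"].append(vm)
        else:
            buckets["ok"].append(vm)
    return buckets

# Hypothetical per-VM averages: vm -> (latency_ms, iops).
vm_stats = {
    "sql01": (12.0, 2400),
    "web01": (1.5, 300),
    "vdi07": (8.0, 600),
    "file02": (2.0, 1500),
}

buckets = classify_vms(vm_stats)
print(buckets)
```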
As we’ve seen, Architect collects all of the operations data from your VM and storage environment in real time and makes it easy to digest. But in addition to depicting what is currently happening within your data center, Architect predicts what will happen in the future and provides detailed design recommendations to avoid performance problems before they arise. Figure 12 shows some of these recommendations.
Examples of alerts and recommendations made by Architect include:
- Flash or network bottlenecks as a result of large block-size I/Os
- Overlapping I/Os on a specific workload
- Recommendations for VMs with different types of write I/O intensity
- How much high-performing storage is required per workload
- Infrastructure recommendations for the PernixData FVP peering network
- Storage acceleration recommendations based on workload and environmental conditions
PernixData Architect collects more complete information on VM storage I/O than vCenter-based solutions, and it provides a more complete view of the virtualization environment than storage array-based solutions. With Architect’s infrastructure analytics, you can eliminate costly overprovisioning or improper sizing of storage for your virtualized environment. You can quickly diagnose storage performance issues with VMs and your infrastructure. You can optimize existing and future applications so that they continue to perform well as workloads change.
Architect measures the right data, at the right location, in the right way. By pairing that high-quality data collection with superior visual analysis tools, Architect gives you better visibility into your vSphere clusters, a better understanding of your VMware environment, and ultimately, the data-driven intelligence to help you design, operate, and optimize your virtualized data center.
Pete Koehler is technical marketing engineer at PernixData.
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to firstname.lastname@example.org.