Get predictable performance from flash storage

How storage QoS shields mission-critical apps from latency spikes and resource contention in all-flash and hybrid arrays

As IT shops begin to deploy flash arrays to consolidate diverse applications, the conversation is shifting from “We need flash performance” to “We need predictable flash performance.” The impetus for the focus on predictability is that latency spikes and resource contention can easily impact hybrid and all-flash arrays, causing applications to miss their SLAs.

Latency stems from the packaging of the array (disk controllers, algorithms, NICs, RAID), not the flash itself. Some will argue that latency spikes in all-flash are less problematic than in slower, disk-based arrays, but as more all-flash arrays are used to consolidate workloads, latency spikes have become more commonly observed. Resource contention rears its ugly head whenever workloads are consolidated on an array.

Storage quality of service (QoS) provides a way to control and prioritize the impacts of latency and resource contention so that mission-critical application workloads see consistent storage performance. How does it work exactly?

Let’s start by defining the three main categories of storage QoS functionality:

  • Service levels
  • Management capabilities
  • Data path and data service automation

I’ll look at each in more detail and outline the key concepts in each category.

Define a storage service level

Fundamentally, service levels are used to define how the storage array manages performance during events that impact performance. Service levels are composed of two critical elements:

  1. Targets, or the ability to define the amount of performance to allocate or reserve for a workload
  2. Priorities, or the ability to define priorities for how the system will meet each workload’s performance targets

Performance targets can be defined in terms of IOPS, bandwidth, latency, or burst settings. The following list shows how a storage system might use targets to reserve or limit performance resources based on storage QoS settings:

  • Minimums: Uses minimum IOPS or bandwidth and maximum latency to reserve performance resources
  • Maximums: Uses maximum IOPS or bandwidth and minimum latency to limit performance resources consumed
  • Burst: Temporarily increases IOPS or bandwidth maximums or reduces latency minimums
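To make the three kinds of targets concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not any vendor's data model: the class name, field names, and the choice to model IOPS only are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceTargets:
    """Hypothetical per-workload targets, expressed in IOPS only for brevity."""
    min_iops: int    # reserved floor (the "minimum")
    max_iops: int    # steady-state ceiling (the "maximum")
    burst_iops: int  # temporary ceiling while bursting

    def __post_init__(self) -> None:
        # A burst ceiling below the maximum, or a maximum below the minimum,
        # would make the targets contradictory.
        if not (0 <= self.min_iops <= self.max_iops <= self.burst_iops):
            raise ValueError("expected 0 <= min_iops <= max_iops <= burst_iops")


def clamp_grant(targets: PerformanceTargets, requested_iops: int,
                burst_active: bool = False) -> int:
    """Cap a request at whichever ceiling applies right now."""
    ceiling = targets.burst_iops if burst_active else targets.max_iops
    return min(requested_iops, ceiling)


# Example values are invented for illustration.
erp = PerformanceTargets(min_iops=1_000, max_iops=5_000, burst_iops=8_000)
normal_grant = clamp_grant(erp, 7_000)                    # capped at the maximum
burst_grant = clamp_grant(erp, 7_000, burst_active=True)  # fits under the burst ceiling
```

The minimum does not appear in the clamp because it is a reservation rather than a limit; a real engine would use it when deciding how to divide resources among workloads.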

In addition to how much performance a workload should have access to, the design of a QoS engine must consider overprovisioning. Overprovisioning performance (similar to the concept of overprovisioning capacity) is the ability to reserve more resources than are physically available to the system. The assumption is that not all workloads will run at peak demand at the same time. Without overprovisioning, the system would have to allocate resources by peak workload -- that is, whenever a workload was not running at its peak, the resources reserved for it would sit unused. This approach can be inefficient and expensive, especially for service providers who could be allocating those unused resources to clients and charging for them.
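The arithmetic behind overprovisioning is simple enough to sketch in Python. The workload names and IOPS figures below are invented for illustration, and the function names are hypothetical.

```python
def overprovisioning_ratio(reserved_iops: dict, physical_iops: int) -> float:
    """Total reserved performance divided by what the array can deliver."""
    return sum(reserved_iops.values()) / physical_iops


def in_contention(current_demand_iops: dict, physical_iops: int) -> bool:
    """Contention occurs only when simultaneous demand exceeds physical capability."""
    return sum(current_demand_iops.values()) > physical_iops


# Illustrative numbers: 120,000 IOPS reserved on an array that delivers 100,000.
reservations = {"erp": 40_000, "vdi": 30_000, "analytics": 50_000}
physical = 100_000
ratio = overprovisioning_ratio(reservations, physical)

# Typical moment: analytics is idle, so total demand fits comfortably.
quiet = in_contention({"erp": 35_000, "vdi": 25_000, "analytics": 5_000}, physical)
# Rare moment: every workload peaks at once and demand exceeds supply.
busy = in_contention(reservations, physical)
```

In this example the array is overprovisioned by a factor of 1.2, which causes no harm until all three workloads peak together.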

However, with overprovisioning comes a risk that resource contention may occur. This is where the benefits of setting QoS priorities come into play. With priorities, storage QoS fulfills the performance requirements of higher-priority workloads by automatically throttling the performance of less critical workloads during periods of resource contention. The following describes the different ways a storage QoS system can prioritize workload performance:

  • One workload takes priority: Always gives preference to the identified workload. All other workloads will be impacted to the same degree if the priority workload requires more resources.
  • Priority by ratio: Allows a number of I/Os per a preset ratio. For example, Workload 1 = 10 percent, Workload 2 = 40 percent, Workload 3 = 20 percent, Workload 4 = 30 percent. In this case, if the system experiences contention, Workload 1 will get one I/O processed, Workload 2 will get four I/Os processed, and so on.
  • Priority by service level: Service levels are typically predefined according to a Gold/Silver/Bronze or Mission Critical/Business Critical/Non Critical scheme. By categorizing all workloads into a service level, the system knows how to make trade-offs in any situation.
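Priority by ratio behaves much like weighted round-robin scheduling. The following Python sketch shows one dispatch round under that assumption; the weights of 1, 4, 2, and 3 are illustrative values chosen to add up to ten, and the function and queue names are hypothetical.

```python
from collections import deque

# Illustrative weights only: one round dispatches at most 1 + 4 + 2 + 3 = 10 I/Os.
WEIGHTS = {"workload1": 1, "workload2": 4, "workload3": 2, "workload4": 3}


def dispatch_round(queues: dict, weights: dict) -> list:
    """Take up to `weight` pending I/Os from each workload's queue, in order."""
    dispatched = []
    for name, weight in weights.items():
        pending = queues[name]
        # A workload with fewer pending I/Os than its weight simply sends fewer.
        for _ in range(min(weight, len(pending))):
            dispatched.append((name, pending.popleft()))
    return dispatched


# Each workload starts with five pending I/Os.
queues = {name: deque(f"{name}-io{i}" for i in range(5)) for name in WEIGHTS}
first_round = dispatch_round(queues, WEIGHTS)
per_workload = {name: sum(1 for n, _ in first_round if n == name) for name in WEIGHTS}
```

Note that the ratio is fixed: nothing in this loop looks at whether a workload is meeting its targets.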

The first approach, where one workload takes priority, is dated and of limited use. The second option, priority by ratio, will work in certain scenarios but is limited. That is, if the overall performance available to a system is reduced (such as during a firmware upgrade or a RAID rebuild), all workloads will be reduced by the same proportion, which can negatively impact critical workloads. The third option, prioritizing by service levels, uses performance targets as an input and dynamically changes the I/O ratios based on current overall workload conditions in real time. Thus, prioritizing by service levels delivers greater consistency for higher-priority application workloads all the time, no matter what overall system performance looks like. The ability to do this requires real-time automated control over the I/O queue, in addition to the real-time automation of memory, metadata, cache, and tier management.
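The difference between a fixed ratio and service-level prioritization shows up when capacity drops. The Python sketch below compares the two under a simplifying assumption: the service-level allocator satisfies minimum targets strictly in priority order. Real engines are more nuanced, and the names and numbers here are hypothetical.

```python
def allocate_by_ratio(capacity_iops: int, ratios: dict) -> dict:
    """Split capacity by a fixed ratio, whatever the targets are."""
    total = sum(ratios.values())
    return {name: capacity_iops * share // total for name, share in ratios.items()}


def allocate_by_service_level(capacity_iops: int, workloads: list) -> dict:
    """Satisfy minimum targets in priority order (lower number = more important)."""
    grants = {}
    remaining = capacity_iops
    for name, _priority, min_iops in sorted(workloads, key=lambda w: w[1]):
        grant = min(min_iops, remaining)
        grants[name] = grant
        remaining -= grant
    return grants


# Illustrative workloads: (name, priority, minimum IOPS target).
workloads = [("gold_db", 0, 50_000), ("silver_vdi", 1, 30_000), ("bronze_test", 2, 20_000)]
ratios = {"gold_db": 5, "silver_vdi": 3, "bronze_test": 2}

# Assume a RAID rebuild cuts deliverable performance from 100,000 to 60,000 IOPS.
degraded = 60_000
by_ratio = allocate_by_ratio(degraded, ratios)
by_level = allocate_by_service_level(degraded, workloads)
```

With the fixed ratio, the gold workload falls to 30,000 IOPS and misses its 50,000 target. With service-level prioritization it keeps the full 50,000, and the shortfall lands on the lower tiers instead.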

Static QoS implementations -- those that limit controls to minimums, maximums, and burst settings -- do not allow administrators to prioritize workloads versus one another. Rather, administrators must manually update target settings whenever application priorities change.

Simplifying QoS with policy-based management

Storage QoS is a complicated feature that can be overwhelming to manage unless the implementation is simplified so that the burden of managing system performance doesn’t outstrip its benefits. A number of management techniques can greatly simplify the implementation of storage QoS:

  • Predefined performance targets: It’s rarely known exactly how much performance a workload needs or where it should be capped. Having predefined performance targets gives users a good starting place and removes the uncertainty around setting minimum, maximum, and burst QoS settings.
  • Predefined priority levels: Stack ranking the importance of every single application workload can be challenging, perhaps impossible. Having a simplified priority framework such as Gold/Silver/Bronze provides a way to place one workload over another without having to stack rank all of them.
  • Predefined service levels (including both targets and priorities): The next step beyond the first two techniques is to combine performance targets and workload priorities into simple predefined service levels.
  • Modify in real time: The ability to change any of these settings in real time and have the system react immediately allows quick fixes to potential performance issues.
  • Schedule changes: When workloads have a known cycle, the ability to automate a change in priority for a given application is quite useful. For example, in cases where an ERP system is required to perform month-end reporting during the last week of the month, the system could be scheduled into a service level with a higher priority and a higher performance target for that week. The higher service level would simultaneously give that application more performance and increase the consistency of the performance.
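A scheduled change like the month-end ERP example can be expressed as a small policy function. This Python sketch assumes "the last week of the month" means the final seven calendar days; the workload names, level names, and function names are hypothetical.

```python
import calendar
from datetime import date


def in_last_week_of_month(day: date) -> bool:
    """True for the final seven calendar days of the month (an assumed definition)."""
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return day.day > days_in_month - 7


def service_level_for(workload: str, day: date, default_levels: dict) -> str:
    """Promote the ERP workload to gold during month-end; otherwise use the default."""
    if workload == "erp" and in_last_week_of_month(day):
        return "gold"
    return default_levels[workload]


# Illustrative defaults.
defaults = {"erp": "silver", "vdi": "silver", "test": "bronze"}
mid_month = service_level_for("erp", date(2024, 2, 10), defaults)
month_end = service_level_for("erp", date(2024, 2, 25), defaults)
```

Because the policy is evaluated against the calendar, the ERP workload returns to its default level on the first day of the next month without administrator action.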

Automating the data path and data services

Service-level targets and priorities are only half the solution. To ensure consistent performance for mission-critical applications, the storage system needs internal low-level software capabilities that use both administrator inputs and real-time workload metrics to control the data path and automate data services such as caching. The following data path and data services capabilities are critical for an effective storage QoS implementation:

  • Parallel I/O processing: Physically or logically separates data path processing from different application sources.
  • QoS-controlled cache management: Using the user inputs on performance targets and workload priorities, the data stored in any cache is actively managed to ensure higher-priority workloads are hitting their performance targets.
  • QoS-controlled tier migration: Using the user inputs on performance targets and workload priorities, the data stored in any tier is actively managed to ensure higher-priority workloads are hitting their performance targets.
  • I/O queue management: As I/O requests hit the system from various applications, the system dynamically prioritizes which requests get processed first based on the point-in-time workload and the user-defined performance targets and priorities.
  • Prioritized system tasks: By comparing the user-defined performance targets and priorities against whether recent I/Os from a specific application have met those targets, the system determines which tasks should and should not be executed. System tasks include garbage collection, device rebuild, postprocess deduplication, and more.
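Two of these capabilities, I/O queue management and prioritized system tasks, can be sketched in a few lines of Python. This is a deliberately simplified illustration: it uses strict priority ordering rather than the dynamic, target-aware ordering described above, and every name in it is hypothetical.

```python
import heapq
import itertools

# Lower rank is served first.
PRIORITY = {"gold": 0, "silver": 1, "bronze": 2}


class IoQueue:
    """Priority queue of I/O requests, first-in first-out within a service level."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, service_level: str, request: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[service_level], next(self._seq), request))

    def next_request(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def should_run_background_task(achieved_iops: dict, target_iops: dict) -> bool:
    """Defer tasks such as garbage collection while any workload misses its target."""
    return all(achieved_iops[name] >= target for name, target in target_iops.items())


queue = IoQueue()
queue.submit("bronze", "test-read")
queue.submit("gold", "db-write-1")
queue.submit("silver", "vdi-read")
queue.submit("gold", "db-write-2")
order = [queue.next_request() for _ in range(len(queue))]

targets = {"db": 50_000, "vdi": 30_000}
run_gc = should_run_background_task({"db": 48_000, "vdi": 31_000}, targets)
```

Here the two gold requests are served first, in arrival order, and garbage collection is deferred because the database workload is running below its target.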

Whether you deploy an all-flash or a hybrid array, you should expect fast, predictable performance. However, there are two common barriers to success: latency spikes and resource contention. Storage QoS functionality reduces the impacts of latency spikes and resource contention, helping to deliver consistent and predictable performance for mission-critical applications.

Storage QoS even goes beyond managing performance to automate many types of data services, including data protection, encryption, and data placement. For a specific example and deeper technical dive into how a QoS engine is implemented, check out the materials on the NexGen Storage QoS patent.

Chris McCall is senior vice president of marketing at NexGen Storage.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content.