Storage Basics – Part V: Controllers, Cache and Coalescing

My Storage Basics series has been neglected for some time (sick kids, snow storms, VMware Upgrades, SAN implementations and some Cisco switch upgrades took all my free time), so let’s jump right in to Part V – Cache, Controllers, and Coalescing. Between the alliteration and fancy words, it might seem like I am about to tell a tale of international espionage. Unfortunately, my introductory treatment of these aspects of a storage system will probably not keep you on the edge of your seat – but I’ll try to keep it interesting.

Throughout this series, we’ve been working our way from the basic building block of any storage system – the disks – outwards towards the brains of the operation – the controller. You’ll recall that in Part II I introduced IOPS and the math that goes into calculating the IOPS capacity of a disk array. In Part III we considered a RAID implementation’s impact on performance and availability. And most recently in Part IV we looked at the common interface types when dealing with storage arrays. If we put the previous parts together we still don’t have a functional storage system. The missing piece is the controller. Simply put, the storage controller is the hardware adapter between the disks and the servers that connect to the storage. The controller has a specific ‘interface‘ type, is responsible for RAID operations, and handles advanced storage functionality. A controller can be as simple as the Dell PERC or HP Smart Array add-in card on your server, or as complex as the Storage Processor in an enterprise class Storage Area Network (SAN) such as an EMC CLARiiON or NetApp FAS.

Controllers

As we look at controllers and the advanced features they provide we’ll see that some of the earlier performance equations start to break down. The simplest controllers take disk read/write commands from the operating system and send commands down to the disk(s) attached to be read or written. This gets data onto the disk, but often does not do so in an efficient or reliable manner. RAID-capable controllers take on the added responsibility of configuring disks in the desired RAID level, calculating & writing parity data, and writing the data in disk-spanning stripes or mirrors depending on the RAID level.

Cache

To increase performance and improve reliability, storage vendors implement a caching system on their controllers. Cache is memory that acts as a buffer for disk I/O, and is usually battery-backed to prevent data loss in the event of a power failure. Because of the exponentially greater speed of RAM over spinning magnetic disks, cache can improve performance by orders of magnitude. Cache can operate on both reads and writes to disk.

When dealing with writes, the controller cache is typically used in one of two ways: write-through or write-back. In write-through mode, data is written to volatile cache and then to disk, and only acknowledged as written once the data resides on the non-volatile disk. Write-back mode allows the controller to acknowledge the data as having been written as soon as it is held in cache. This allows the cache to buffer writes quickly and then write them to the slower disk when the disk has cycles to accept I/O. The greater your cache size, the more data that can be buffered, ultimately resulting in better performance as measured in both IOPS and throughput. This graph from my article on troubleshooting write performance on an IBM DS3300 iSCSI array shows how throughput increased and latency decreased when enabling write cache. The extent to which cache increases performance is highly dependent on the workload characteristics (I/O size, randomness, and ratio of reads:writes).

Read-cache acts as a buffer for reads in a couple ways. First, some controllers attempt to ‘read-ahead’, anticipating future read requests from the operating system and buffering what it expects to be the next blocks of desired data. Some entry-level controllers simply buffer the next physical chunk of data and fill cache memory with it, while more advanced controllers may attempt to predict the right block of data based on previous requests (you just asked for 3 blocks in a row, I’m guessing you’ll come asking for the 4th next so I’ll just buffer it in fast cache for you now). Secondly, read cache holds data that has been previously read, regardless of any pre-fetching the controller may have done. This allows for much faster subsequent access of the same data because it is held in the faster cache, eliminating the need for the controller to go to disk for the data again. Just like with write cache, the extent to which cache increases performance is highly dependent on the workload characteristics.

A given storage array controller only has so much cache to work with. A Dell PERC5/E, for example, has 256MB of cache that can be used for both read and write. While this may be enough for a direct-attached storage array, SAN’s serving multiple systems demand more cache. In contrast, an EMC CLARiiON CX4-960 has 32GB. Some storage vendors, such as NetApp, are getting creative with cache. NetApp’s Performance Acceleration Module (PAM) is an add-in card that provides up to a whopping 512GB of Layer 2 cache to the storage system.

Caching mechanisms can dramatically influence performance under the right conditions. With healthy cache in place, IOPS calculations become skewed. However, cache can be exhausted or may not hold the data you are interested in. If cache is insufficient to satisfy read requests, or has reached its high-water mark for writes, performance can drop off. When cache is exhausted, the backing disk must be able to satisfy the I/O workload or performance will be unacceptable. This is where the IOPS calculations kick in, and where having the right disk type and configuration really matters.

Queuing & Coalescing

Advanced storage systems introduce additional features to reduce I/O contention and improve cache utilization. I won’t go into all of the features here because they vary by storage vendor. However, I will point out two common techniques – queuing and coalescing.

Queuing refers to the ability of a storage system to queue storage commands for later processing. Queuing can take place at various points in your storage environment, from the HBA to the storage processor/controller. A little queuing may be OK depending on your workload, but too many outstanding I/Os can negatively impact performance (this is measured in latency). Queue depths can be adjusted on many components in your storage and VMware landscape, but check with your vendor’s support group before you make changes to these settings.

Coalescing is performed by some storage systems to modify the character of the workload. To better understand coalescing, picture a bunch of random write activity. Without cache in place, the disk heads will be bouncing all over the platters trying to get the data on to disk. A little write cache will allow the storage array to acknowledge the write for the OS, but the array still needs to de-stage that data from cache to disk quickly to prevent cache exhaustion. The back-end disks will still be doing the chicken dance, bouncing around trying to write the random workload…. Now picture an intelligent system that re-orders the random writes that are held in cache and writes them to the disk in nice sequential stripes. The disk heads will be less prone to jumping around the platter and the behavior will start to look more like a nice waltz than the funky chicken dance. Coalescing is used for writes, not reads, so not all workloads benefit.

Wrap-up

With this article on Controllers, Cache, and Coalescing we’ll end our look at the basic building blocks of a storage array. Before we end the Storage Basic series I am planning a few more articles on Storage Workload Characterization (which has been mentioned, but not directly addressed in this and previous articles), Identifying a Stressed Storage System, and Best Practices for Storage Performance in a VMware Environment.

If you are interested in more reading on Controllers, Cache, and Coalescing, I recommend the following:

Additional Reading: