Another good example is how ZFS handles simple disk mirrors. In a traditional two-disk mirror, reads are handled in round-robin fashion to improve read performance. This means that if there’s bit rot on one disk but not on the other, there's a fifty-fifty chance that data requested by an application will be invalid. With traditional RAID configurations, this corruption goes largely unnoticed by the underlying layers, but the application will certainly realize that there’s a problem. ZFS guards against this kind of corruption by verifying the checksum of each block as it’s returned from disk. If the block doesn't match its 256-bit checksum, ZFS abandons that read and pulls the block from the other member of the mirror set, verifies that copy against the checksum, and delivers the valid data to the application. In a subsequent operation, the bad block on the first disk is replaced with the valid data from the second, essentially providing a continuous file system check.
But aren’t checksums expensive? Yes. Well, at least they used to be. In the era of multicore CPUs, dedicating a single core to performing checksums still leaves plenty of horsepower to handle everything else. The benefits of this form of I/O consistency validation outweigh the performance hit on modern hardware, and judging by my performance tests, it’s certainly not an issue.
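The read-and-repair behavior described above can be sketched in a few lines of Python. This is a toy model, not ZFS code: the Mirror class and its methods are hypothetical names, the "disks" are plain dictionaries, and SHA-256 is used only as a stand-in for a 256-bit checksum.

```python
import hashlib


class Mirror:
    """Toy two-way mirror. Each 'disk' is a dict of block number -> bytes."""

    def __init__(self):
        self.disks = [{}, {}]
        self.checksums = {}   # checksums are kept apart from the data blocks
        self.next_disk = 0    # round-robin pointer for reads

    def write(self, block_no, data):
        # Write the block to every mirror member and record its checksum.
        for disk in self.disks:
            disk[block_no] = data
        self.checksums[block_no] = hashlib.sha256(data).digest()

    def read(self, block_no):
        expected = self.checksums[block_no]
        first = self.next_disk
        self.next_disk = (self.next_disk + 1) % len(self.disks)
        # Try the round-robin choice first, then the remaining members.
        order = [first] + [i for i in range(len(self.disks)) if i != first]
        bad = []
        for i in order:
            data = self.disks[i][block_no]
            if hashlib.sha256(data).digest() == expected:
                # Good copy found: repair any members that failed the check.
                for j in bad:
                    self.disks[j][block_no] = data
                return data
            bad.append(i)
        raise IOError(f"block {block_no}: no mirror member has a valid copy")


mirror = Mirror()
mirror.write(0, b"payroll records")
mirror.disks[0][0] = b"payroll recorcs"   # simulate bit rot on the first disk
data = mirror.read(0)                      # served from the second disk
```

After the read, the first disk holds the corrected block again, and the application never sees the damaged copy. In this sketch the repair happens inline with the read rather than in a later operation, which is a simplification.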
Beyond the mirror
Of course, ZFS is capable of handling many more than two drives. In fact, it’s a 128-bit file system. Thus, the total capacity addressable by ZFS not only exceeds the limits of earthbound storage, but the power requirements for the number of drives required to reach this limit would be enough to boil the earth’s oceans. That’s serious scalability.
ZFS has a number of neat tricks for managing numerous drives. Because all disks are thrown into a single pool, adding drives to existing arrays is instantaneous and requires no re-initialization. During quiescent periods, ZFS will reallocate the data across all disks for better performance, even while making newly added storage immediately available, with writes crossing all drives and reads coming from the original array members.
It appears that Sun also gave careful consideration to disk workload profiling. Server file systems are commonly asked to handle multiple sequential requests to single files. At first blush, these calls may appear to be random I/O, but a closer look will often reveal they are not so random. ZFS can smooth this type of workload with intelligent read-ahead caching at the block level, resulting in significant performance gains for streaming media and for some database workloads.
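To make the "not so random" point concrete, here is a minimal sketch of sequential-stream detection. It is an illustration only and assumes nothing about how ZFS actually implements its prefetch logic: the Prefetcher class, its window parameter, and the per-file tracking are all hypothetical.

```python
class Prefetcher:
    """Toy read-ahead: track the next expected block for each file."""

    def __init__(self, window=4):
        self.window = window   # how many blocks to read ahead (assumed value)
        self.expected = {}     # file id -> block number we expect next

    def access(self, file_id, block):
        """Record a read and return the list of blocks worth prefetching."""
        sequential = self.expected.get(file_id) == block
        self.expected[file_id] = block + 1
        if sequential:
            # The stream is advancing one block at a time, so read ahead.
            return list(range(block + 1, block + 1 + self.window))
        return []


p = Prefetcher(window=2)
# Two clients streaming different files produce an interleaved request
# pattern that looks random at the disk but is sequential per file.
for file_id, block in [("a", 0), ("b", 10), ("a", 1), ("b", 11)]:
    ahead = p.access(file_id, block)
```

Viewed as one stream, the block numbers jump between 0, 10, 1, and 11. Tracked per file, each request after the first is recognized as sequential and triggers a read-ahead.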
Another facet of the advanced I/O scheduling in ZFS is request prioritization. When a system is I/O bound, it’s generally due to the disk not keeping up with requests, or major swap operations. Once those requests stack up, basic system interaction slows to a crawl, and there’s nothing more frustrating than trying to kill the misbehaving process with a command that takes forever to run because it needs to be fetched from the very same disk that the runaway process is thrashing. Because ZFS gives reads priority over writes, the read necessary to execute the kill command in these cases gets pushed to the front of the queue, allowing order to be restored in a timely manner.
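The effect of putting reads ahead of writes can be shown with a small priority queue. This is a sketch under simple assumptions, not the ZFS I/O scheduler: the IOQueue class and its method names are hypothetical, and a real scheduler would also need some way to keep writes from starving.

```python
import heapq
import itertools

READ, WRITE = 0, 1   # lower number = served first


class IOQueue:
    """Toy I/O queue: reads always go ahead of writes, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker preserving arrival order

    def submit(self, kind, name):
        heapq.heappush(self._heap, (kind, next(self._seq), name))

    def next_request(self):
        kind, _, name = heapq.heappop(self._heap)
        return name


q = IOQueue()
q.submit(WRITE, "runaway write 1")
q.submit(WRITE, "runaway write 2")
q.submit(READ, "load kill binary")
first = q.next_request()   # the read jumps the queue of pending writes
```

Even though the read arrived last, it is served first, which is the behavior that lets the kill command run while the runaway process is still flooding the disk with writes.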
Paul Venezia is senior contributing editor of the InfoWorld Test Center and writes The Deep End blog.