It’s somewhat surprising that in the past five years, file systems haven’t changed much on any platform. There are dozens of file systems available for UNIX-like operating systems (ext3, XFS, UFS, and ReiserFS, for example), plus Microsoft’s ubiquitous NTFS, but since the journaling revolution there has been a dearth of innovation in mainstream file systems. Until now.
Soon after I started working with ZFS (Zettabyte File System), one thing became clear: the file system of the next 10 years will either be ZFS or something extremely similar. The fluidity, the malleability, and the scalability of ZFS far surpass anything available now on any platform. We’re talking about a file system that can address 256 quadrillion zettabytes of storage and that can handle a maximum file size of 16 exabytes. For reference, a zettabyte is equal to one billion terabytes. To wrap your mind around what ZFS is and what it can do, you need to toss out just about everything you know about file systems and start over.
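The capacity figures above are easier to trust after a quick back-of-the-envelope check. The sketch below assumes that the pool limit comes from ZFS's 128-bit design (2^128 bytes), that the file size limit is 2^64 bytes, and that the quoted units are the binary ones (zebibytes and exbibytes). Those unit choices are assumptions made for this illustration, not something the figures themselves spell out.

```python
# Back-of-the-envelope check of the capacity figures quoted above.
# Assumption: binary units are meant for the pool and file limits.
ZETTABYTE = 10**21   # decimal zettabyte, in bytes
TERABYTE = 10**12    # decimal terabyte, in bytes
ZEBIBYTE = 2**70     # binary "zettabyte", in bytes
EXBIBYTE = 2**60     # binary "exabyte", in bytes

# "A zettabyte is equal to one billion terabytes."
terabytes_per_zettabyte = ZETTABYTE // TERABYTE

# Assumed pool limit for a 128-bit file system: 2**128 bytes.
pool_limit_zebibytes = 2**128 // ZEBIBYTE

# "Quadrillion" is read here as 2**50, the binary cousin of 10**15.
binary_quadrillions = pool_limit_zebibytes // 2**50

# Assumed per-file limit: 2**64 bytes.
max_file_exbibytes = 2**64 // EXBIBYTE

print(terabytes_per_zettabyte)  # 1000000000
print(binary_quadrillions)      # 256
print(max_file_exbibytes)       # 16
```

Under those assumptions, the "256 quadrillion zettabytes" and "16 exabytes" figures fall straight out of the 128-bit and 64-bit limits.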
[ Sun ZFS was selected for an InfoWorld Technology of the Year award. ]
Perhaps the easiest way to communicate the underlying concepts of ZFS is a comparison the Sun developers drew during the design stages of the file system back in 2001. When you add RAM to a server, you don’t partition it and allocate one DIMM to this application and another DIMM to that application; you throw all of the RAM into a pile and let the memory manager decide who gets what and when. That simple, pragmatic view forms the basis of ZFS: There are no partitions, no fixed block sizes, no file system consistency check, no RAID initialization procedure, and no inodes. There’s just a pile of disks with ZFS in between.
I worked with ZFS extensively on Sun’s 48-disk Sun Fire X4500 server (see companion review), aptly named the Thumper. In fact, without ZFS, the Thumper wouldn’t be half the solution it is. Simply addressing the sheer number of physical drives in the X4500, not to mention the logical volume sizes that are possible, is at best difficult with any other file system. With ZFS, it’s surprisingly simple.
For now, ZFS is a CLI adventure; you get no luxurious GUI tools to manage the file systems. Given the focus of ZFS, that’s hardly surprising. What is surprising is how simple ZFS is in practice. Creating a ZFS pool of drives can be done in one line. Creating volumes in that pool is another line. Turning a volume into an NFS share or iSCSI target can be accomplished within the same line as the volume creation, and everything is instantaneous, with no waiting for RAID initialization or file system creation. Creating a 20TB pool and a few volumes on the X4500 took about 20 seconds (the time required to type in the commands), and it was ready to go. To see for yourself just how fast and easy it is to drive ZFS, watch the accompanying screencast.
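To give a feel for what those one-liners look like, here is a small sketch that assembles the command lines as argument lists and prints them without running anything. The pool name `tank`, the disk names, the dataset names, and the volume size are all hypothetical placeholders. The `sharenfs` and `shareiscsi` properties are assumed to be available on the Solaris release in use; `shareiscsi` in particular may not exist on other ZFS implementations.

```python
def build_zfs_commands(pool, disks, filesystem, volume, volume_size):
    """Return argv lists for a pool, an NFS-shared file system, and an iSCSI volume.

    All names passed in are hypothetical examples; nothing is executed here.
    """
    # One line to create a RAID-Z pool from the given disks.
    create_pool = ["zpool", "create", pool, "raidz", *disks]
    # One line to create a file system and share it over NFS at the same time.
    create_fs = ["zfs", "create", "-o", "sharenfs=on", f"{pool}/{filesystem}"]
    # One line to create a block volume (-V size) and export it as an iSCSI target.
    # Assumption: the shareiscsi property is supported on this platform.
    create_vol = ["zfs", "create", "-V", volume_size,
                  "-o", "shareiscsi=on", f"{pool}/{volume}"]
    return [create_pool, create_fs, create_vol]


commands = build_zfs_commands(
    pool="tank",
    disks=["c0t0d0", "c1t0d0", "c0t1d0"],  # hypothetical Solaris device names
    filesystem="home",
    volume="vol1",
    volume_size="100g",
)
for argv in commands:
    print(" ".join(argv))
```

Printing the commands rather than running them keeps the sketch safe to try on a machine with no spare disks; on a real system, each printed line would be typed at a root shell.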
Under the covers
There’s far more to ZFS than is possible to cover in this space, so I’m hitting the high points. Starting with the essentials, ZFS comprises three parts. The ZPL (ZFS POSIX Layer) runs at a high level, taking instruction from the OS on I/O requests. Below that is the DMU (Data Management Unit), which takes those instructions and translates them into transaction batches. Rather than requesting data blocks and sending single write requests, ZFS batches these into object-based transactions that can be optimized before any disk activity occurs. Once this is done, the batches are handed off to the SPA (Storage Pool Allocator) to schedule and aggregate the raw I/O. The copy-on-write basis of I/O transactions, coupled with checksums performed on a per-block basis, precludes the need for journaling: live data is never overwritten in place, so the on-disk state is always either the old consistent version or the new one. An abrupt power loss at any point is recoverable.
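The copy-on-write and checksum idea can be illustrated with a toy model. This is a minimal sketch, not how ZFS is implemented: the class `ToyCowStore` and its methods are invented for illustration, the real block-pointer tree is flattened into a single dictionary, and SHA-256 merely stands in for whatever checksum the file system is configured to use.

```python
import hashlib


class ToyCowStore:
    """Toy copy-on-write store: blocks are never overwritten in place."""

    def __init__(self):
        self.blocks = {}    # address -> bytes, simulated disk blocks
        self.next_addr = 0  # next free block address
        self.root = {}      # name -> (address, checksum); the committed state

    def _write_block(self, data):
        # Always write to a fresh address and record the block's checksum.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = data
        return addr, hashlib.sha256(data).hexdigest()

    def commit(self, updates, crash_before_switch=False):
        # Build the new state next to the old one.
        new_root = dict(self.root)
        for name, data in updates.items():
            new_root[name] = self._write_block(data)
        if crash_before_switch:
            return  # simulated power loss: the old root is still intact
        self.root = new_root  # a single pointer switch makes the batch visible

    def read(self, name):
        addr, checksum = self.root[name]
        data = self.blocks[addr]
        # Verify the block against the checksum stored with its pointer.
        if hashlib.sha256(data).hexdigest() != checksum:
            raise IOError(f"checksum mismatch for {name!r}")
        return data


store = ToyCowStore()
store.commit({"report": b"version 1"})
store.commit({"report": b"version 2"}, crash_before_switch=True)
print(store.read("report"))  # b'version 1'
```

The interrupted second commit leaves orphaned blocks behind but never touches the committed version, which is why no journal replay is needed in this model. Silent corruption of a block would surface as a checksum error on read rather than as bad data.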
Paul Venezia is senior contributing editor of the InfoWorld Test Center and writes The Deep End blog.