Will ZFS be in Leopard?

So yesterday’s WWDC keynote wasn’t exactly heavy on technical detail, and there isn’t anything on the Apple site. But there are some indications that Mac OS X Leopard will support the ZFS file system.

Consider the following quotes from Apple’s description of Time Machine:

… what makes Time Machine different from other backup applications is that it not only keeps a spare copy of every file, it remembers how your system looked on any given day — so you can revisit your Mac as it appeared in the past.

And…

The first time you attach an external drive to your Mac, Time Machine asks if you’d like to use that drive as your backup. Say yes and Time Machine takes care of everything else. Automatically. In the background. You’ll never have to worry about backing up again.

To me that sounds a lot like Time Machine is using ZFS Snapshots.

Snapshots are copies of the entire file system, but they are not the same as backups: they are much more efficient and much faster. A snapshot only needs to keep the individual disk blocks that have changed since it was taken, so it uses far less disk space than a traditional backup. Snapshots are also created almost instantaneously regardless of the size of the file system; indeed, the time it takes to create one is often so small that there appears to be no delay. In other words, backups happen automatically. In the background. And the entire system is backed up.

——

UPDATE
It seems that ZFS will not be the default file system in Leopard (bugger). Instead:

ZFS “is only available [as] a read-only option from the command line,” according to an Apple spokesperson.

In a follow-up interview today, Croll explained, “ZFS is not the default file system for Leopard. We are exploring it as a file system option for high-end storage systems with really large storage. As a result, we have included ZFS — a read-only copy of ZFS — in Leopard.”

“Read-only means that at a later date, if there are ZFS volumes, those systems would be able to read ZFS volumes,” Croll added. “You cannot write data into the system. It will allow you to read ZFS volumes later.”

The Zettabyte File System (ZFS) is coming to Mac OS X – what is it?

Since Mac OS 8.1 (nine years ago) Apple’s operating system has run on the HFS+ filesystem (which in turn is based on the 22-year-old HFS), but soon we may see a major upgrade with the introduction of the Zettabyte File System (ZFS). ZFS is very powerful for a number of reasons – and could make a huge difference to the user experience.

ZFS is a 128-bit file system, which means it can address 18 billion billion (18.4 × 10¹⁸) times more data than current 64-bit file systems. Its limits are designed to be so large that they will never be encountered in practice. As an example of how large these numbers are: a single ZFS file system can hold up to 2⁴⁸ files, so if 1,000 files were created every second, it would take about 9,000 years to reach that limit. As project leader Bonwick said:
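Both figures are easy to sanity-check. The Python below is a quick back-of-the-envelope sketch; the helper names are my own, and it assumes a 365.25-day year and ZFS’s limit of 2**48 files in a single file system.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed Julian year, in seconds


def capacity_ratio(bits_new: int = 128, bits_old: int = 64) -> int:
    """How many times more addresses a bits_new-bit space has than a bits_old-bit one."""
    return 2**bits_new // 2**bits_old


def years_to_exhaust(limit: int, per_second: float) -> float:
    """Years needed to create `limit` objects at `per_second` objects per second."""
    return limit / per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{capacity_ratio():.3e}")  # prints 1.845e+19, i.e. about 18.4 x 10^18
    # Assumed limit: at most 2**48 files in a single ZFS file system
    print(f"{years_to_exhaust(2**48, 1_000):,.0f} years")  # prints 8,919 years
```

So “about 9,000 years” is, if anything, slightly generous.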

“Populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn’t fill a 128-bit storage pool without boiling the oceans.” [Seth Lloyd, “Ultimate physical limits of computation”, Nature 406, 1047-1054 (2000)]

There are, however, a number of other notable features:

Pooled storage
ZFS can span a file system seamlessly across multiple disks, and more disks can be added at any time. This is good because each new disk can add redundancy and improve performance by spreading I/O across more drives. But it also improves the UX, because users don’t have to worry about volumes; they just have storage.
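To make the idea concrete, here is a toy model in Python. The `Pool` and `Disk` classes and the “most free space first” placement rule are illustrative assumptions, not ZFS’s real allocator, but they show the two properties described above: disks can join the pool at any time, and writes get spread across every member disk.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    capacity: int  # size in blocks
    used: int = 0

    @property
    def free(self) -> int:
        return self.capacity - self.used


@dataclass
class Pool:
    """Toy storage pool: one pool of storage spread over any number of disks."""
    disks: list[Disk] = field(default_factory=list)

    def add_disk(self, disk: Disk) -> None:
        # A new disk is usable immediately; there are no volumes to repartition.
        self.disks.append(disk)

    @property
    def free(self) -> int:
        return sum(d.free for d in self.disks)

    def allocate(self, nblocks: int) -> list[str]:
        """Place each block on the disk with the most free space (toy policy)."""
        if nblocks > self.free:
            raise OSError("pool is full")
        placement = []
        for _ in range(nblocks):
            disk = max(self.disks, key=lambda d: d.free)
            disk.used += 1
            placement.append(disk.name)
        return placement
```

With two equal empty disks, consecutive blocks alternate between them, which is where the I/O spreading comes from.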

Stability and data integrity
ZFS provides three core components to its data integrity model:

  • Everything is copy-on-write which means live data is never overwritten
  • Everything is transactional – sets of changes either succeed or fail as a whole
  • Everything is checksummed – preventing silent data corruption
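The three properties fit together neatly. Below is a deliberately tiny Python sketch of the idea, not ZFS’s on-disk format: `CowStore` is a hypothetical class, and SHA-256 is simply my choice of checksum. New data always goes to fresh blocks, a set of changes only becomes visible when the root pointer is swapped at the end, and every read is verified against its checksum.

```python
import hashlib


class CowStore:
    """Toy copy-on-write store: live blocks are never overwritten in place."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}   # block id -> contents
        self.checksums: dict[int, str] = {}  # block id -> SHA-256 of contents
        self.root: dict[str, int] = {}       # committed view: file name -> block id
        self._next_id = 0

    def _write_block(self, data: bytes) -> int:
        # Always write to a fresh block id, so existing data stays intact.
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = data
        self.checksums[bid] = hashlib.sha256(data).hexdigest()
        return bid

    def commit(self, changes: dict[str, bytes]) -> None:
        """Apply all changes or none: the root is only swapped at the very end."""
        new_root = dict(self.root)  # work on a copy of the pointer map
        for name, data in changes.items():
            new_root[name] = self._write_block(data)
        # If anything above raised, we never get here and the old root stays live;
        # blocks written by the failed transaction are simply never referenced.
        self.root = new_root

    def read(self, name: str) -> bytes:
        bid = self.root[name]
        data = self.blocks[bid]
        if hashlib.sha256(data).hexdigest() != self.checksums[bid]:
            raise OSError(f"checksum mismatch in block {bid}")
        return data
```

A crash in the middle of `commit` leaves the previous, consistent state in place, which is the intuition behind surviving those forced crashes.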

All this results in an incredibly robust filesystem: during Sun’s tests it was subjected to over a million forced, violent crashes without losing data integrity or leaking a single block.

The use of checksums on all data and metadata allows for ‘self-healing’: ZFS detects silent data corruption before passing the data off to the process that asked for it, and repairs it using the good copy from the other side of the mirror.

[Image: ZFS self-healing]
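Here is a rough Python sketch of that read path for a two-way mirror. It is an illustration under my own assumptions (the function names are hypothetical and SHA-256 stands in for the real checksum), not Sun’s code: the block is verified before it is returned, and any copy that fails verification is rewritten from one that passes.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 stands in here for whatever checksum the file system records."""
    return hashlib.sha256(data).hexdigest()


def self_healing_read(copies: list[bytearray], expected: str) -> bytes:
    """Read one mirrored block, verifying it before handing it to the caller.

    `copies` holds the block as stored on each side of the mirror and
    `expected` is the checksum recorded when the block was written.
    """
    good = None
    for side in copies:
        if checksum(bytes(side)) == expected:
            good = bytes(side)
            break
    if good is None:
        # Corruption is detected, but there is no clean copy to repair from.
        raise OSError("all copies failed checksum verification")
    for side in copies:
        if checksum(bytes(side)) != expected:
            side[:] = good  # rewrite the damaged copy from the verified one
    return good
```

The caller never sees the corrupted bytes; at worst it gets an error, never wrong data.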

Snapshots

A snapshot is a copy of the entire file system, but snapshots are not the same as backups; the two most significant differences are efficiency and speed.

A snapshot only needs to keep the individual disk blocks that have changed since it was taken, which means it uses far less disk space than a traditional backup. Snapshots are also created almost instantaneously regardless of the size of the file system; indeed, the time it takes to create one is often so small that there appears to be no delay.
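Why are snapshots so cheap? On a copy-on-write file system, taking one just means holding on to the current block map instead of copying any data. The Python below is a toy model built on that assumption (`SnapshotFS` and its methods are hypothetical): the live map is an immutable tuple, so a snapshot is a single reference, and a changed block only costs extra space while some snapshot still points at the old version.

```python
from typing import Optional


class SnapshotFS:
    """Toy block store where a snapshot just pins an old block map."""

    def __init__(self, nblocks: int) -> None:
        self.store: dict[int, bytes] = {}  # physical block id -> data
        self._next = 0
        # The live block map is immutable, so it can be shared safely.
        self.live: tuple[int, ...] = tuple(self._alloc(b"\0") for _ in range(nblocks))
        self.snapshots: dict[str, tuple[int, ...]] = {}

    def _alloc(self, data: bytes) -> int:
        bid = self._next
        self._next += 1
        self.store[bid] = data
        return bid

    def snapshot(self, name: str) -> None:
        # No data is copied: the snapshot keeps a reference to the current map.
        self.snapshots[name] = self.live

    def write(self, index: int, data: bytes) -> None:
        # Copy-on-write: new data goes to a fresh block and the map is rebuilt.
        live = list(self.live)
        live[index] = self._alloc(data)
        self.live = tuple(live)
        self._free_unreferenced()

    def read(self, index: int, snapshot: Optional[str] = None) -> bytes:
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[block_map[index]]

    def _free_unreferenced(self) -> None:
        # Old blocks survive only while the live map or a snapshot refers to them.
        referenced = set(self.live)
        for block_map in self.snapshots.values():
            referenced.update(block_map)
        for bid in list(self.store):
            if bid not in referenced:
                del self.store[bid]

    def blocks_used(self) -> int:
        return len(self.store)
```

For example, a four-block file system still uses four blocks right after `snapshot("monday")`, and five after one block is rewritten; `read(2, "monday")` still returns the old contents. (Rebuilding the tuple on every write is a shortcut of this toy, not how ZFS updates its block tree.)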

So what might this all mean?

Beyond the obvious benefits related to performance and data integrity there may also be important UX considerations.

I’ve written previously about the issues of the two-copy file system. ZFS’s use of snapshots would mean very little performance or storage overhead in automatically versioning data. Apple could then remove the Save dialogue box from much of the UI: files could be saved safely and automatically in the background, with old versions retrieved via Time Machine as needed, removing the need for explicit saves and hiding more of the filesystem from the user.