
ZFS FAQ

ZFS Frequently Asked Questions (FAQ)
For information about dedup, see the ZFS Dedup FAQ.

1. ZFS Product Release Questions
2. ZFS Technical Questions
3. ZFS/UFS Comparison Questions
4. ZFS Administration Questions
5. ZFS and Other Product Interaction Questions

ZFS Product Release Questions
1. How can I get ZFS?
2. When will ZFS be available for <insert OS here>?
3. What does ZFS stand for?

How can I get ZFS?
ZFS is available in the following releases:

o Solaris Nevada release, build 27a and later
o Solaris Express releases
o Starting in the Solaris 10 6/06 release

When will ZFS be available for <insert OS here>?

Projects are under way to port ZFS to FreeBSD and to Linux (using FUSE). For more information on CDDL, see the licensing FAQ.

What does ZFS stand for?

Originally, ZFS was an acronym for "Zettabyte File System." The largest SI prefix we liked was 'zetta' ('yotta' was out of the question). Since ZFS is a 128-bit file system, the name was a reference to the fact that ZFS can store 256 quadrillion zettabytes (where each ZB is 2^70 bytes). Over time, ZFS gained a lot more features besides 128-bit capacity, such as rock-solid data integrity, easy administration, and a simplified model for managing your data.

ZFS Technical Questions

1. Why does ZFS have 128-bit capacity?
2. What limits does ZFS have?

Why does ZFS have 128-bit capacity?

File systems have proven to have a much longer lifetime than most traditional pieces of software, due in part to the fact that the on-disk format is extremely difficult to change. Given the fact that UFS has lasted in its current form (mostly) for nearly 20 years, it's not unreasonable to expect ZFS to last at least 30 years into the future. At this point, Moore's law starts to kick in for storage, and we start to predict that we'll be storing more than 64 bits of data in a single file system. For a more thorough description of this topic, and why 128 bits is enough, see Jeff's blog entry.

What limits does ZFS have?

The limitations of ZFS are designed to be so large that they will never be encountered in any practical operation. ZFS can store 16 Exabytes in each storage pool, file system, file, or file attribute. ZFS can store billions of names: files or directories in a directory, file systems in a file system, or snapshots of a file system. ZFS can store trillions of items: files in a file system, file systems, volumes, or snapshots in a pool.

ZFS/UFS Comparison Questions

1. Why doesn't ZFS have an fsck-like utility?
2. Why does du(1) report different file sizes for ZFS and UFS?
3. Why doesn't the space consumption that is reported by the df command and the zfs list command match?
4. Can I set quotas on ZFS file systems?

Why doesn't ZFS have an fsck-like utility?

There are two basic reasons to have an fsck-like utility:

o Verify file system integrity - Many times, administrators simply want to make sure that there is no on-disk corruption within their file systems. With most file systems, this involves running fsck while the file system is offline. This can be time consuming and expensive. Instead, ZFS provides the ability to 'scrub' all data within a pool while the system is live, finding and repairing any bad data in the process (see the example after this answer).

o Repair on-disk state - If a machine crashes, the on-disk state of some file systems will be inconsistent. The addition of journalling has solved some of these problems, but failure to roll the log may still result in a file system that needs to be repaired. In this case, there are well known pathologies of errors, such as creating a directory entry before updating the parent link, which can be reliably repaired. ZFS does not suffer from this problem because data is always consistent on disk.

A more insidious problem occurs with faulty hardware or software. Even file systems or volume managers that have per-block checksums are vulnerable to a variety of other pathologies that result in valid but corrupt data. In this case, the failure mode is essentially random, and most file systems will panic (if it was metadata) or silently return bad data to the application. In either case, an fsck utility will be of little benefit. Since the corruption matches no known pathology, it will likely be unrepairable. With ZFS, these errors will be (statistically) nonexistent in a redundant configuration. In a non-redundant configuration, these errors are correctly detected, but will result in an I/O error when trying to read the block. Of course, ZFS is equally vulnerable to software bugs, but the bugs would have to result in a consistent pattern of corruption to be repaired by a generic tool. During the 5 years of ZFS development, no such pattern has been seen. It is theoretically possible to write a tool to repair such corruption, though any such attempt would likely be a one-off special tool.
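For example, the following is an illustrative sketch (the pool name tank is only a placeholder); a scrub of a live pool can be started, and its progress checked, with the standard zpool commands:

# zpool scrub tank
# zpool status -v tank

The zpool status output reports the scrub progress along with any checksum errors that were detected and repaired.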

Why does du(1) report different file sizes for ZFS and UFS?

On UFS, the du command reports the size of the data blocks within the file. On ZFS, du reports the actual size of the file as stored on disk. This size includes metadata as well as compression. This reporting really helps answer the question of "how much more space will I get if I remove this file?" So, even when compression is off, you will still see different results between ZFS and UFS.

Why doesn't the space consumption that is reported by the df command and the zfs list command match?

When you compare the space consumption that is reported by the df command with the zfs list command, consider that df is reporting the pool size and not just file system sizes. In addition, df doesn't understand descendent datasets or whether snapshots exist. If any ZFS properties, such as compression and quotas, are set on file systems, reconciling the space consumption that is reported by df might be difficult.

Consider the following scenarios that might also impact reported space consumption:

o For files that are larger than recordsize, the last block of the file is generally about 1/2 full. With the default recordsize set to 128 KB, approximately 64 KB is wasted per file, which might be a large impact. The integration of RFE 6812608 would resolve this scenario. You can work around this by enabling compression. Even if your data is already compressed, the unused portion of the last block will be zero-filled, and compresses very well.

o On a RAIDZ-2 pool, every block consumes at least 2 sectors (512-byte chunks) of parity information. The space consumed by the parity information is not reported, but because it can vary, and be a much larger percentage for small blocks, an impact to space reporting might be seen. The impact is more extreme for a recordsize set to 512 bytes, where each 512-byte logical block consumes 1.5 KB (3 times the space).

o The df command is not aware of deduplicated file data.

Regardless of the data being stored, if space efficiency is your primary concern, you should leave the recordsize at the default (128 KB) and enable compression (to the default of lzjb).
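As an illustrative sketch (the dataset name tank/data is only a placeholder), the relevant properties can be checked and compression enabled as follows:

# zfs get recordsize,compression tank/data
# zfs set compression=on tank/data
# zfs get compressratio tank/data

The read-only compressratio property reports the compression ratio achieved for the dataset, which helps explain differences between df and zfs list reporting.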

Can I set quotas on ZFS file systems?

Yes, ZFS file system quotas are flexible and easy to set up. ZFS provides several different quota features:

o File system quotas (quota property) - Limits the amount of space that a file system and its descendents can consume, including file systems and snapshots
o Reference file system quotas (refquota property) - File system quota that does not limit space used by descendents, such as snapshots and clones
o User and group quotas (userquota and groupquota properties) - Limits the amount of space that is consumed by the specified user or group, similar to the refquota property. The userquota or groupquota space calculation does not include space that is used by descendent datasets, such as snapshots and clones. RFE 6501037 is integrated into Nevada build 114 and the Solaris 10 10/09 release. (A short example appears at the end of this answer.)

ZFS quotas are intentionally not associated with a particular user because file systems are points of administrative control. ZFS quotas can be set on file systems that could represent users, groups, projects, and so on, as well as on entire portions of a file system hierarchy. This allows quotas to be combined in ways that traditional per-user quotas cannot. Per-user quotas were introduced because multiple users had to share the same file system.

For home directory servers, the ZFS model enables you to easily set up one file system per user. ZFS file systems can be used as logical administrative control points, which allow you to view usage, manage properties, perform backups, take snapshots, and so on. A quota can be applied when the file system is created. For example:

# zfs create -o quota=20g tank/home/users

User file systems created in this file system automatically inherit the 20-Gbyte quota set on the parent file system. For example:

# zfs create tank/home/users/user1
# zfs create tank/home/users/user2
# zfs list -r tank/home/users
NAME                    USED  AVAIL  REFER  MOUNTPOINT
tank/home/users        76.5K  20.0G  27.5K  /tank/home/users
tank/home/users/user1  24.5K  20.0G  24.5K  /tank/home/users/user1
tank/home/users/user2  24.5K  20.0G  24.5K  /tank/home/users/user2

ZFS quotas can be increased when the disk space in the ZFS storage pools is increased while the file systems are active, without having any down time.

In general, file system quotas are appropriate for most environments, but user/group quotas are needed in some environments, such as universities that must manage many student user accounts.

An alternative to user-based quotas for containing disk space used for mail is using mail server software that includes a quota feature, such as the Sun Java System Messaging Server. This software provides user mail quotas, quota warning messages, and expiration and purge features.
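The userquota and groupquota properties can be set and inspected in a similar way. The following is an illustrative sketch (the pool, file system, and user names are only placeholders):

# zfs set userquota@student1=10G tank/home
# zfs get userquota@student1 tank/home
# zfs userspace tank/home

The zfs userspace command summarizes, for each user of the file system, the space consumed and the quota assigned.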

ZFS Administration Questions

1. Why doesn't the space that is reported by the zpool list command and the zfs list command match?
2. What can I do if ZFS file system panics on every boot?
3. Does ZFS support hot spares?
4. Can devices be removed from a ZFS pool?
5. Can I use ZFS as my root file system? What about for zones?
6. Can I split a mirrored ZFS configuration?
7. Why has the zpool command changed?

Why doesn't the space that is reported by the zpool list command and the zfs list command match?

The SIZE value that is reported by the zpool list command is generally the amount of physical disk space in the pool, but varies depending on the pool's redundancy level. The zfs list command lists the usable space that is available to file systems, which is disk space minus ZFS pool redundancy metadata overhead, if any. This reporting is referred to as the deflated space value. See the examples below.

o A non-redundant storage pool created with one 136-GB disk reports SIZE and initial FREE values as 136 GB. The initial AVAIL space reported by the zfs list command is 134 GB, due to a small amount of pool metadata overhead.

# zpool create tank c0t6d0
# zpool list tank
NAME   SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
tank   136G  95.5K   136G    0%  1.00x  ONLINE  -
# zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank    72K   134G    21K  /tank

o A mirrored storage pool created with two 136-GB disks reports SIZE as 136 GB and initial FREE values as 136 GB. The initial AVAIL space reported by the zfs list command is 134 GB, due to a small amount of pool metadata overhead.

# zpool create tank mirror c0t6d0 c0t7d0
# zpool list tank
NAME   SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
tank   136G  95.5K   136G    0%  1.00x  ONLINE  -
# zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank    72K   134G    21K  /tank

o A RAIDZ-2 storage pool created with three 136-GB disks reports SIZE as 408 GB and initial FREE values as 408 GB. This reporting is referred to as the inflated disk space value, which includes redundancy overhead, such as parity information. The initial AVAIL space reported by the zfs list command is 133 GB, due to the pool redundancy overhead.

# zpool create tank raidz2 c0t6d0 c0t7d0 c0t8d0
# zpool list tank
NAME   SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
tank   408G   286K   408G    0%  1.00x  ONLINE  -
# zfs list tank
NAME   USED   AVAIL  REFER  MOUNTPOINT
tank   73.2K   133G  20.9K  /tank

What can I do if ZFS file system panics on every boot?

ZFS is designed to survive arbitrary hardware failures through the use of redundancy (mirroring or RAID-Z). Unfortunately, certain failures in non-replicated configurations can cause ZFS to panic when trying to load the pool. This is a bug, and will be fixed in the near future (along with several other nifty features, such as background scrubbing). In the meantime, if you find yourself in the situation where you cannot boot due to a corrupt pool, do the following:

<boot using '-m milestone=none'>
# mount -o remount /
# rm /etc/zfs/zpool.cache
# reboot

This will remove all knowledge of pools from your system. You will have to re-create your pool and restore from backup.

If a ZFS root file system panics, then you must boot from alternate media, import the root pool, resolve the issue that is causing the failure, export the root pool, and reboot the system. For more information, see the ZFS Troubleshooting Guide.
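The following is a minimal sketch of that root pool recovery sequence, assuming the root pool is named rpool and the system has been booted from alternate media; the alternate root path /a is only a placeholder:

# zpool import -f -R /a rpool
<resolve the issue that is causing the failure>
# zpool export rpool
# reboot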

Does ZFS support hot spares?

Yes, the ZFS hot spares feature is available in the Solaris Express Community Release, build 42, the Solaris Express July 2006 release, and the Solaris 10 11/06 release. For more information about hot spares, see the ZFS Administration Guide.

Can devices be removed from a ZFS pool?

Removal of a top-level vdev, such as an entire RAID-Z group or a disk in an unmirrored configuration, is not currently supported. This feature is planned for a future release and can be tracked with CR 4852783. You can remove a device from a mirrored ZFS configuration by using the zpool detach command. You can replace a device with a device of equivalent size in both a mirrored or RAID-Z configuration by using the zpool replace command.
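For example, the following is an illustrative sketch of these commands (the pool name tank and the device names are only placeholders):

# zpool add tank spare c0t8d0
# zpool detach tank c0t7d0
# zpool replace tank c0t6d0 c0t9d0

The first command adds a hot spare to the pool, the second detaches one side of a mirror, and the third replaces an existing device with a new one of equivalent size.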

Can I use ZFS as my root file system? What about for zones?

You can install and boot a ZFS root file system starting in the SXCE build 90 release and starting in the Solaris 10 10/08 release. For more information, see ZFS Boot. ZFS can be used as a zone root path in the Solaris 10 10/08 release, but configurations that can be patched and upgraded are limited. Additional ZFS zone root configurations that can be patched and upgraded are supported starting in the Solaris 10 5/09 release. In addition, you cannot create a cachefs cache on a ZFS file system.

Can I split a mirrored ZFS configuration?

Yes, you can split a mirrored ZFS configuration for cloning or backup purposes in the SXCE, build 131 release. The best method for cloning and backups is to use ZFS clone and snapshot features. For information about using ZFS clone and snapshot features, see the ZFS Admin Guide. See RFE 6421958 (recursively send snapshots), which will improve the replication process across systems. In addition to ZFS clone and snapshot features, remote replication of ZFS file systems is provided by the Sun StorageTek Availability Suite product. AVS/ZFS demonstrations are available here.

Keep the following cautions in mind if you attempt to split a mirrored ZFS configuration for cloning or backup purposes:

o Support for splitting a mirrored ZFS configuration is integrated with RFE 5097228.
o You cannot remove a disk from a mirrored ZFS configuration, back up the data on the disk, and then use this data to create a cloned pool.
o If you want to use a hardware-level backup or snapshot feature instead of the ZFS snapshot feature, then you will need to do the following steps:
  o zpool export pool-name
  o Hardware-level snapshot steps
  o zpool import pool-name
o Any attempt to split a mirrored ZFS storage pool by removing disks or changing the hardware that is part of a live pool could cause data corruption.

Why has the zpool command changed?

Changes to the zpool command in Nevada, builds 125-129, are as follows:

1. Device name changes due to integration of 6574286 (removing a slog doesn't work) - This change adds a top-level virtual device name to support device removal operations. The top-level device name is constructed by using the logical name (mirror, raidz2, and so on) and appending a unique number. The zpool status and zpool import commands now display configuration information using this new naming convention. For example, the following configuration contains two top-level virtual devices named mirror-0 and mirror-1.

# zpool status
  pool: export

 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
        logs
          mirror-1  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0

In this release, you could potentially remove the mirrored log device (mirror-1) as follows:

# zpool remove export mirror-1
# zpool status export
  pool: export
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        export      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0

Currently, only cache, log, and spare devices can be removed from a pool.

2. The zpool list output has changed due to integration of 6897693 (deduplication can only go so far) - In previous releases, the zpool list command reported used and available physical block space, and the zfs list command reported used and available space to the file system. The previous zpool list used and available columns have changed to report allocated and free physical blocks. For example:

# zpool list
NAME     SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
export   928G  47.5G   881G    5%  1.77x  ONLINE  -
rpool    928G  25.7G   902G    2%  1.40x  ONLINE  -

These changes should help clarify the accounting difference reported by the zpool list and zfs list commands. Any scripts that utilized the old used and available properties of the zpool command should be updated to use the new naming conventions.

3. New dedup ratio property due to integration of 6677093 (zfs should have dedup capability) - The zpool list command includes dedupratio for each pool. You can also display the value of this read-only property by using the zpool get command.

# zpool list
NAME     SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
export   928G  47.5G   881G    5%  1.77x  ONLINE  -
rpool    928G  25.7G   902G    2%  1.40x  ONLINE  -

# zpool get dedup rpool
NAME   PROPERTY    VALUE  SOURCE
rpool  dedupratio  1.40x  -

ZFS and Other Product Interaction Questions

1. Is ZFS supported in a clustered environment?
2. Which third party backup products support ZFS?
3. Does ZFS work with SAN-attached devices?

Is ZFS supported in a clustered environment?

Solaris Cluster 3.2 supports a local ZFS file system as highly available (HA) in the Solaris 10 11/06 release. This support allows for live failover between systems, with automatic import of pools between systems. Solaris Cluster 3.2 is not supported on the OpenSolaris or Nevada releases. For information about using the open-source Solaris Cluster version, go to the Open High-Availability Cluster community page.

ZFS is not a native cluster, distributed, or parallel file system and cannot provide concurrent access from multiple, different hosts. ZFS works great when shared in a distributed NFS environment. In the long term, we plan on investigating ZFS as a native cluster file system to allow concurrent access. This work has not yet been scoped.

If you use Solaris Cluster 3.2 to configure a local ZFS file system as highly available, review the following caution: Do not add a configured quorum device to a ZFS storage pool. When a configured quorum device is added to a storage pool, the disk is relabeled and the quorum configuration information is lost. This means the disk no longer provides a quorum vote to the cluster. After a disk is added to a storage pool, you can configure that disk as a quorum device. Or, you can unconfigure the disk, add it to the storage pool, and then reconfigure the disk as a quorum device.

Which third party backup products support ZFS?

o EMC Networker 7.3.2 backs up and restores ZFS file systems, including ZFS ACLs.
o Veritas Netbackup 6.5 backs up and restores ZFS file systems, including ZFS ACLs.
o IBM Tivoli Storage Manager client software (5.4.1.2) backs up and restores ZFS file systems with both the CLI and the GUI. ZFS ACLs are also preserved.
o Computer Associates' BrightStor ARCserve product backs up and restores ZFS file systems, but ZFS ACLs are not preserved.

Does ZFS work with SAN-attached devices?

Yes, ZFS works with either direct-attached devices or SAN-attached devices. However, if your storage pool contains no mirror or RAID-Z top-level devices, ZFS can only report checksum errors but cannot correct them. If your storage pool consists of mirror or RAID-Z devices built using storage from SAN-attached devices, ZFS can report and correct checksum errors.

For example, consider a SAN-attached hardware-RAID array, set up to present LUNs to the SAN fabric that are based on its internally mirrored disks. If you use a single LUN from this array to build a single-disk pool, the pool contains no duplicate data that ZFS needs to correct detected errors. In this case, ZFS could not correct an error introduced by the array. If you use two LUNs from this array to construct a mirrored storage pool, or three LUNs to create a RAID-Z storage pool, ZFS then would have duplicate data available to correct detected errors. In this case, ZFS could typically correct errors introduced by the array (see the sketch at the end of this answer).

ZFS always detects silent data corruption. Some storage arrays can detect checksum errors, but might not be able to detect the following class of errors:

o Accidental overwrites or phantom writes
o Mis-directed reads and writes
o Data path errors

Keep the following points in mind when using ZFS with SAN devices:

o Overall, ZFS functions as designed with SAN-attached devices, as long as all the drives are only accessed from a single host at any given time. You cannot share SAN disks between pools on the same system or different systems. This limitation includes sharing SAN disks as shared hot spares between pools on different systems.
o In all cases where ZFS storage pools lack mirror or RAID-Z top-level virtual devices, pool viability depends entirely on the reliability of the underlying storage devices, whether from SAN-attached or direct-attached storage.
o If your ZFS storage pool only contains a single device, you cannot take advantage of features such as RAID-Z, dynamic striping, I/O load balancing, and so on.

o If you expose simpler devices to ZFS, you can better leverage all available features.

In summary, if you use ZFS with SAN-attached devices, you can take advantage of the self-healing features of ZFS by configuring redundancy in your ZFS storage pools, even though redundancy is also available at a lower hardware level.
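The following is an illustrative sketch of the mirrored two-LUN configuration described above (the pool name sanpool and the LUN device names are only placeholders):

# zpool create sanpool mirror c2t1d0 c3t1d0
# zpool status -x sanpool

With two LUNs in a mirror, ZFS holds a redundant copy of every block and can repair any checksum errors it detects on either LUN.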