Oracle Solaris ZFS


Satyajit Tripathi
Objectives

To understand what makes ZFS unique
To know what facilities ZFS provides
To learn how to administer ZFS

Agenda

Introduction to ZFS
ZFS Setup
ZFS Components
ZFS Storage Pool and File System
ZFS Properties
General Architecture
ZFS Features Simplifying Deployment
ZFS Administration
Web-Based Management UI
ZFS Limitations
Best Practices

Introduction to ZFS

First-of-its-kind 128-bit file system
ZFS is the acronym for Zettabyte File System
Storage capacity of 256 quadrillion zettabytes
Directories with up to 256 trillion entries
No limits on the number of file systems or files
Dynamic metadata allocation, i.e. no inode pre-allocation
Data integrity management using 256-bit checksums
It is a(n):
  revolution over traditional file systems
  fundamentally new approach to data and volume management
  transactional file system with self-healing capabilities
  design for robustness, scalability, and easy administration
  architecture with storage pools of heterogeneous devices
ZFS is the default root file system in Oracle Solaris 11
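
The headline limits above can be checked with a little arithmetic. The sketch below assumes that "zettabyte" is read as 2**70 bytes and that "quadrillion" and "trillion" are binary approximations (2**50 and 2**40); the slides do not state this reading, so treat it as an interpretation.

```python
# Worked arithmetic for the 128-bit capacity figures quoted on the slide.
# Assumptions (not from the slides): zettabyte = 2**70 bytes,
# "quadrillion" ~ 2**50, "trillion" ~ 2**40.

POOL_BYTES = 2 ** 128        # 128-bit addressing of pool capacity in bytes
ZETTABYTE = 2 ** 70          # binary zettabyte

pool_zettabytes = POOL_BYTES // ZETTABYTE      # 2**58 zettabytes
quadrillions = pool_zettabytes // 2 ** 50      # multiples of 2**50

DIR_ENTRIES = 2 ** 48                          # entries per directory
trillions = DIR_ENTRIES // 2 ** 40             # multiples of 2**40

print(f"pool capacity: {quadrillions} x 2**50 zettabytes")
print(f"directory entries: {trillions} x 2**40")
```

Under these assumptions both figures come out as 256 units, which matches the wording of the slide.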

Setup Requirement

A SPARC or x86/x64 machine running Solaris 10 6/06 or newer
Minimum disk size of 128 MB for a ZFS environment
Minimum disk space of 64 MB for a storage pool
Recommended memory of 1 GB or more

ZFS Components

ZFS components are built from virtual devices such as
  Whole disks (recommended), disk slices, or files
Component names should follow these naming conventions
  Empty components are not permitted
  Names can contain only alphanumeric characters, plus
    Underscore (_), Hyphen (-), Colon (:), Period (.)
  Pool names must begin with a letter, with these restrictions
    the beginning sequence c[0-9] is not allowed
    the names log and cache are reserved
    names beginning with mirror, raidz, or spare are not allowed
    names containing a percent symbol (%) are not allowed
  Dataset names must begin with an alphanumeric character
  Dataset names must not contain a percent symbol (%)
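
The naming rules above can be sketched as a small validator. This is an illustration of the rules as listed on the slide, not the check that zpool or zfs actually performs; the function names are hypothetical.

```python
import re

# Allowed characters per the slide: alphanumerics plus _ - : .
ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_\-:.]+")
RESERVED_NAMES = {"log", "cache"}
RESERVED_PREFIXES = ("mirror", "raidz", "spare")


def is_valid_pool_name(name):
    """Apply the pool-name rules listed on the slide (hypothetical helper)."""
    if not ALLOWED_CHARS.fullmatch(name):     # also rejects empty names and %
        return False
    if not name[0].isalpha():                 # must begin with a letter
        return False
    if re.match(r"c[0-9]", name):             # looks like a disk name, e.g. c0t0d0
        return False
    if name in RESERVED_NAMES or name.startswith(RESERVED_PREFIXES):
        return False
    return True


def is_valid_dataset_name(path):
    """Check pool/component/... paths; empty components are rejected."""
    pool, *rest = path.split("/")
    if not is_valid_pool_name(pool):
        return False
    return all(ALLOWED_CHARS.fullmatch(c) and c[0].isalnum() for c in rest)


print(is_valid_pool_name("tank"), is_valid_pool_name("c0pool"))
print(is_valid_dataset_name("tank/home/user1"), is_valid_dataset_name("tank//home"))
```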

Storage Pool and ZFS File System

A zpool is constructed from virtual devices
A zpool can be configured as
  Non-redundant (similar to traditional RAID-0)
  Mirrored (similar to traditional RAID-1)
  RAID-Z (single parity, similar to RAID-5, e.g. using 3 devices)
  RAID-Z2 (double parity, group of 4 or more devices)
  RAID-Z3 (triple parity)
A zpool may additionally contain
  Hot spares for failing disks
  Read cache devices (L2ARC)
  Separate log devices for the ZFS Intent Log (ZIL)
A zpool can contain heterogeneous storage devices without limitation
A zpool can be dynamically expanded without reconfiguration
Multiple ZFS file systems or datasets can be created in a zpool
The file system properties quota and reservation can be set
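
To compare the pool layouts above, a rough capacity estimate helps. The sketch below is a simplification under stated assumptions: it treats every device in a redundant group as being as large as the smallest one and ignores metadata, padding, and reserved space, so real pools report less. The function and layout names are illustrative only.

```python
# Approximate usable capacity of a single vdev group, by layout.
PARITY_DEVICES = {"raidz": 1, "raidz2": 2, "raidz3": 3}


def usable_capacity(layout, disk_sizes):
    """Return an approximate usable size in the same unit as disk_sizes."""
    if not disk_sizes:
        raise ValueError("at least one device is required")
    if layout == "stripe":                    # non-redundant: sizes add up
        return sum(disk_sizes)
    smallest = min(disk_sizes)
    if layout == "mirror":                    # every device holds a full copy
        return smallest
    if layout in PARITY_DEVICES:
        parity = PARITY_DEVICES[layout]
        if len(disk_sizes) <= parity:
            raise ValueError("not enough devices for this parity level")
        return (len(disk_sizes) - parity) * smallest
    raise ValueError(f"unknown layout: {layout}")


for layout, disks in [("stripe", [100] * 3), ("mirror", [100] * 2),
                      ("raidz", [100] * 3), ("raidz2", [100] * 4),
                      ("raidz3", [100] * 5)]:
    print(layout, usable_capacity(layout, disks))
```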

ZFS Properties

ZFS properties are of two types: native and user-defined
Native properties control file system behavior; user-defined properties do not
Native properties can be read-only or settable
Many settable properties are inherited from the parent
Every settable property has an associated source, one of
  default (neither set locally nor inherited)
  local (explicitly set on the dataset)
  inherited from <dataset-name> (names the dataset the value comes from)
Native read-only properties include
  available compressratio creation mounted origin type used
Settable properties include (see Appendix-I for the complete list)
  aclinherit aclmode canmount checksum compression dedup
  devices encryption mountpoint quota readonly recordsize
  reservation setuid sharenfs snapdir volsize zoned
User properties are specified as <module>:<property>=<value>
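
The three property sources can be illustrated with a toy model. This is a sketch of the lookup order only (local, then nearest ancestor, then default); the Dataset class, the DEFAULTS table, and the com.example:backup user property are hypothetical, and non-inheritable properties such as quota are not modeled.

```python
# Toy model of how a property value and its source are resolved.
DEFAULTS = {"compression": "off", "readonly": "off"}


class Dataset:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.local = {}                       # values set explicitly here

    def set(self, prop, value):
        self.local[prop] = value

    def get(self, prop):
        """Return (value, source) for an inheritable property."""
        if prop in self.local:
            return self.local[prop], "local"
        ancestor = self.parent
        while ancestor is not None:           # walk up to the nearest setter
            if prop in ancestor.local:
                return ancestor.local[prop], f"inherited from {ancestor.name}"
            ancestor = ancestor.parent
        if prop in DEFAULTS:
            return DEFAULTS[prop], "default"
        return "-", "-"                       # unset user property


pool = Dataset("tank")
home = Dataset("tank/home", parent=pool)
user = Dataset("tank/home/user1", parent=home)

home.set("compression", "on")
pool.set("com.example:backup", "nightly")     # user property, <module>:<property>

for ds in (pool, home, user):
    print(ds.name, ds.get("compression"), ds.get("com.example:backup"))
```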

General Architecture
[Figure: ZFS architecture diagram, layers from top to bottom]
User space: File System consumers, Device consumers, GUI (JNI), Application, libzfs
Kernel interface layer: ZFS POSIX Layer, ZFS Volume, /dev/zfs
Transactional object layer: ZFS Intent Log, ZFS Attribute Processor, Traversal, Dataset and Snapshot Layer, Data Management Unit
Pooled storage layer: Adaptive Replacement Cache (ARC), ZFS I/O Pipeline, Virtual Device, Configuration
Layered Driver Interface (LDI)

ZFS Features Simplifying Deployment

Root Pool File System
Boot Environment Enhancement
Delegation in a Zone
Deduplication, Compression, Encryption
Snapshot
Clone
ZFS Application Interface

Root Pool File System
To boot from a ZFS file system, specify
  the boot device, identified as a storage pool
  the ZFS root file system within the pool
Install the ZFS root file system using
  Automated Installer on a SPARC or x86 based system
  Live CD on an x86 based system
Recommendations for a ZFS root file system
  Memory capacity 1 GB
  Disk capacity 13 GB
  Swap (sized relative to physical memory) and dump devices in the root pool
  Boot environment size 4-6 GB
  Solaris OS components reside in the root file system
Requirements for storage pool configuration
  Disk should have an SMI label and be smaller than 2 TB
  Disk must contain a Solaris fdisk partition (x86 systems)
  Configure a root pool mirror only after the root pool is installed
  Enable compression only after the root pool is installed
  Do not rename the root pool after initial installation
Boot Environment Enhancement
Boot Environment (BE) management is enhanced
  Managed using beadm(1M)
  Replaces the deprecated Live Upgrade lu* commands
Auto install (new feature) facilitates
  Mirrored ZFS root with the bootblock applied automatically
  Creation of swap and dump devices on the root pool
  Boot support for hot spares
IPS facilitates BE updates using the latest build
  No need to apply individual patches
  Use pkg image-update or pkg update
No need to create a separate boot environment
  pkg update creates a new BE automatically
  Just boot using the new boot environment
Send stream to manage ZFS root properties

[Figure: Oracle Solaris ZFS Boot Environment]

Delegation in a Zone
For a zone, use zonecfg add device to add a ZFS volume
Creating or modifying a zpool inside a zone is not allowed
In a zone, a privileged user can modify ZFS properties
  except sharenfs, zoned, quota, reservation
In the global zone, the administrator can modify ZFS properties
  except sharenfs, mountpoint
Delegating a dataset to a zone
  The zoned property must be set explicitly
On first boot of a zone containing a ZFS dataset
  The zoned property (boolean) is turned on automatically
A dataset cannot be mounted or shared in the global zone if
  the dataset property zoned=on
Removing a dataset from a zone does not reset zoned to off

Deduplication, Compression, Encryption
Enable the dedup property on a ZFS file system to
  Synchronously remove redundant data blocks
  Property scope is the entire zpool
  Only unique data is stored; common components are shared
  Use the dedup ratio to estimate possible space savings
Enable the compression property on a ZFS file system to
  Transparently compress data blocks
  Property scope is the individual ZFS file system
  Retrieve the compression ratio using the zfs utility
Enable the encryption property on a ZFS file system to
  Encrypt data before it is stored in the ZFS file system
  Property scope is the file system, inherited by descendants
  The file system owner's key is required to access encrypted data
  A wrapping key encrypts the data encryption key
  The wrapping key is stored in a file (raw or hex) or derived from a passphrase
  Encryption policy is inherited by descendant file systems
  The inherited policy of a descendant file system cannot be modified
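
The idea behind block-level deduplication and the dedup ratio can be sketched with a toy store. This is an illustration only, assuming fixed-size blocks keyed by a SHA-256 checksum; the DedupStore class is hypothetical and does not reflect how ZFS lays out its dedup table.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one copy of each unique block."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}            # checksum -> block data (stored once)
        self.refcount = {}          # checksum -> number of references
        self.logical_blocks = 0     # blocks written, counting duplicates

    def write(self, data):
        """Split data into blocks and return the list of block checksums."""
        pointers = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:     # new unique block
                self.blocks[digest] = block
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            self.logical_blocks += 1
            pointers.append(digest)
        return pointers

    def dedup_ratio(self):
        """Logical blocks divided by unique blocks actually stored."""
        return self.logical_blocks / len(self.blocks) if self.blocks else 1.0


store = DedupStore(block_size=4)
store.write(b"AAAABBBBAAAA")
store.write(b"AAAA")
print(len(store.blocks), store.logical_blocks, store.dedup_ratio())
```

A ratio well above 1.0 on representative data suggests that enabling dedup could save space; a ratio near 1.0 suggests it would not.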

Snapshot
Use zfs snapshot fs@snapN to create a snapshot
  Takes only one argument, the snapshot name fs@snapN
  Instantly creates a read-only snapshot of fs
  fs@snapN is stored in the same zpool
  Initially the space is shared by fs and the snapshot
  fs@snapN initially consumes no additional disk space
  fs@snapN grows in size as the active dataset fs changes
  Persists across system reboots
  The directory .zfs/snapshot lists all snapshots
  Use the command zfs list -t snapshot
  Theoretical maximum N = 2^64
Use zfs rollback -r fs@snapN to roll back to a snapshot
  By default, rollback goes to the most recent snapshot
  To roll back to an earlier snapshot N, intermediate snapshots must be destroyed (the -r option does this)
  During rollback, a mounted file system is unmounted and remounted
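
The space behavior and the rollback rule above can be illustrated with a toy model. This is a sketch only: a dictionary of file contents stands in for block pointers, and the ToyFileSystem class and its destroy_later flag (playing the role of -r) are hypothetical.

```python
class ToyFileSystem:
    """Toy copy-on-write model: snapshots share data until the live copy changes."""

    def __init__(self):
        self.live = {}          # file name -> content (stands in for block pointers)
        self.snapshots = {}     # snapshot name -> frozen references, oldest first

    def snapshot(self, name):
        self.snapshots[name] = dict(self.live)    # copies references, not data

    def snapshot_unique(self, name):
        """Data kept alive only by the snapshot, not by the live file system."""
        snap = self.snapshots[name]
        return {k: v for k, v in snap.items() if self.live.get(k) != v}

    def rollback(self, name, destroy_later=False):
        names = list(self.snapshots)
        later = names[names.index(name) + 1:]
        if later and not destroy_later:
            raise RuntimeError("more recent snapshots exist")
        for n in later:                           # what -r does in this model
            del self.snapshots[n]
        self.live = dict(self.snapshots[name])


fs = ToyFileSystem()
fs.live["a"] = "v1"
fs.snapshot("snap1")
shared_at_creation = fs.snapshot_unique("snap1")  # nothing unique yet
fs.live["a"] = "v2"
unique_after_change = fs.snapshot_unique("snap1") # old data now held by snap1
fs.snapshot("snap2")
fs.rollback("snap1", destroy_later=True)
print(shared_at_creation, unique_after_change, fs.live, list(fs.snapshots))
```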

Clone
Use zfs clone pool/fs1@snapN pool/fs2
  Takes two arguments, the snapshot name and the new file system
  Can be created from a snapshot only
  Results in a new file system with the contents of the original file system
  The new file system is writable
  The snapshot cannot be destroyed as long as a clone exists
Use zfs send and zfs recv to save or replicate a ZFS file system
  Creates a stream representation of a snapshot for transfer
  Incremental changes between snapshots can be saved
  Individual file restoration is not possible
  The entire file system must be restored
  Receive a full stream to recreate the entire file system
Different property values can be applied to ZFS snapshot streams
  A stream can be received with a property value different from the one sent
  Specify at receive time to use the original property value
  Specify at receive time to disable a specific file system property
Use zfs send -I and -R, or zfs recv -F, for complex streams
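
A full or incremental replication is a pipeline of zfs send into zfs recv. The sketch below only builds the command line as text and does not run anything; the helper name and the dataset names (pool/fs1, backup/fs1) are hypothetical, and it uses the simple -i incremental form rather than the -I and -R options mentioned above.

```python
import shlex


def send_recv_pipeline(snapshot, target, incremental_from=None, force=False):
    """Return a shell pipeline string for replicating a snapshot (not executed)."""
    send = ["zfs", "send"]
    if incremental_from:
        send += ["-i", incremental_from]      # send only changes since this snapshot
    send.append(snapshot)

    recv = ["zfs", "recv"]
    if force:
        recv.append("-F")                     # roll the target back before receiving
    recv.append(target)

    return " | ".join(
        " ".join(shlex.quote(arg) for arg in cmd) for cmd in (send, recv)
    )


full = send_recv_pipeline("pool/fs1@snap1", "backup/fs1")
incr = send_recv_pipeline("pool/fs1@snap2", "backup/fs1",
                          incremental_from="pool/fs1@snap1", force=True)
print(full)
print(incr)
```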
Delegated Administration
Fine-grained permissions for a specific user, group, or everyone
ZFS supports two types of delegated permissions
  Individual permissions
    zfs allow satya create,destroy,mount,snapshot zfsN
  Permission sets
    zfs allow mystaff @myset zfsN
Advantages of delegated administration
  Permissions follow the zpool when it is migrated
  Control over permission propagation (dynamic inheritance)
  Newly created file systems can automatically pick up permissions
  Ability to create snapshots over NFS
Delegation can be disabled through the zpool property delegation
  By default the zpool property delegation=on
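
The difference between individual permissions and permission sets can be shown with a toy model. This is a sketch of the concept only: the Delegation class is hypothetical, it ignores per-dataset scope and descendant inheritance, and the contents of @myset are an assumption since the slide does not list them.

```python
class Delegation:
    """Toy model: grants hold permission names or references to named sets."""

    def __init__(self):
        self.sets = {}      # "@setname" -> set of permissions
        self.grants = {}    # user or group -> granted names (may include sets)

    def define_set(self, name, perms):
        if not name.startswith("@"):
            raise ValueError("permission set names begin with @")
        self.sets[name] = set(perms)

    def allow(self, principal, perms):
        self.grants.setdefault(principal, set()).update(perms)

    def effective(self, principal):
        """Expand set references at lookup time, so set changes propagate."""
        result = set()
        for p in self.grants.get(principal, ()):
            result |= self.sets.get(p, set()) if p.startswith("@") else {p}
        return result


d = Delegation()
d.allow("satya", ["create", "destroy", "mount", "snapshot"])   # individual
d.define_set("@myset", ["create", "mount"])                    # assumed contents
d.allow("mystaff", ["@myset"])                                 # via a set
before = d.effective("mystaff")
d.sets["@myset"].add("snapshot")        # changing the set changes every grantee
after = d.effective("mystaff")
print(sorted(before), sorted(after))
```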

New ACL Model

ZFS provides a pure ACL model, and every file has an associated ACL
The new Solaris ACL model is based on the NFSv4 specification
Set and display ACLs using chmod and ls, not setfacl and getfacl
An ACL comprises multiple Access Control Entries (ACEs)
ACLs are finer grained than standard file permissions
Use the ACL-aware cp, mv, tar, cpio, or rcp to transfer UFS files to ZFS
  POSIX-draft based ACLs are translated to equivalent NFSv4 ACLs
Use ufsrestore to restore UFS data into ZFS, unlike tar and cpio archives (UFS)
By default, ACLs are not inherited unless an inheritance flag is specified
  file_inherit dir_inherit
  inherit_only no_propagate
Set the ZFS property aclinherit to restricted (default) or
  discard, noallow, passthrough, passthrough-x
Set the ZFS property aclmode to groupmask (default) or
  discard, passthrough

Administration
ZFS supports both CLI and web-based administration
Create a ZFS pool poolN
  zpool create poolN c0t0d0 c0t1d0 c0t1d2
Create a zpool with disk mirroring
  zpool create poolN mirror c0t0d0 c0t1d0
Define log or cache devices in poolN
  zpool create poolN c0t0d0 log c0t1d0 cache c0t1d2
Create a file system in poolN
  zfs create poolN/zfsN
Add devices to poolN
  zpool add poolN c1t1d1
Set a file system property
  zfs set compression=on poolN/zfsN
Get a property value
  zfs get compressratio
Remove a file system
  zfs destroy poolN/zfsN
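
The commands above can be scripted. The sketch below wraps them in a small helper that defaults to a dry run, so it only returns the command text unless told otherwise; the zcmd helper is hypothetical, and the pool, dataset, and device names are the placeholders used on the slide.

```python
import subprocess


def zcmd(*args, dry_run=True):
    """Return the command line; execute it only when dry_run is False."""
    if not dry_run:
        subprocess.run(args, check=True)      # raises if the command fails
    return " ".join(args)


# Dry-run plan: mirrored pool, one file system, compression on, then query.
plan = [
    zcmd("zpool", "create", "poolN", "mirror", "c0t0d0", "c0t1d0"),
    zcmd("zfs", "create", "poolN/zfsN"),
    zcmd("zfs", "set", "compression=on", "poolN/zfsN"),
    zcmd("zfs", "get", "compressratio", "poolN/zfsN"),
]
for line in plan:
    print(line)
```

Keeping dry_run as the default avoids creating or changing pools by accident while a script is still being reviewed.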

Web-Based Management
Use https://host:6789/zfs for ZFS Administration
Create new storage pool
Add capacity to existing pool
Export zpool to another system
Import zpool from another system
View and monitor storage pools
Create new file system
Create volume configuration
Take snapshots
Rollback using snapshot
Use /usr/sbin/smcwebserver start (or enable) to start the web console server

ZFS Limitations
The number of top-level vdevs in a zpool cannot be reduced
A disk cannot be added as a column to a RAID-Z vdev
Virtual devices cannot be nested in a zpool
  A mirror or RAID-Z top-level vdev can contain only files or disks
ZFS cannot provide concurrent access from multiple hosts
ZFS expects a disk cache flush command to commit data to media
ZFS fragmentation can impact sequential read performance
  Block Pointer Rewrite functionality is expected to eliminate the fragmentation issue
In a non-redundant pool, ZFS can detect and report, but not repair, silent data corruption errors
  unless copies=N (where N>1) is explicitly specified
ZFS RAID resilvering may take a long time
ZFS does not support TRIM, which is used with SSDs

Best Practices
Create zpools from whole disks instead of disk slices (EFI label)
  ZFS can then safely enable the disk write cache automatically
For a root pool, use a disk slice instead of a whole disk (SMI label)
  Allocate the entire disk capacity to slice 0
Create a zpool from several groups of vdevs instead of a single large vdev
  Improves IOPS performance
Keep the vdevs belonging to one zpool similar in size
  Reads get skewed toward larger vdevs as the zpool fills up, which hurts performance
Do not create a zpool that contains components from another zpool
RAID-Z is not recommended for random reads, e.g. databases
  Variable covariance between random and sequential reads
  Sequential reads of fragmented files adversely impact random reads
Match the ZFS record size to the database block size for OLTP workloads
Keep pool space under 80% utilization to maintain performance
A mirrored pool or hardware RAID is preferred over RAID-Z
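
The 80% guideline above is easy to monitor. The sketch below parses text in the tab-separated form produced by zpool list -H -o name,capacity and reports pools at or above the threshold; the sample text and the function name are made up for illustration.

```python
def pools_over_threshold(zpool_list_output, threshold=80):
    """Return names of pools whose capacity is at or above threshold percent."""
    over = []
    for line in zpool_list_output.splitlines():
        if not line.strip():
            continue                          # skip blank lines
        name, capacity = line.split("\t")
        if int(capacity.rstrip("%")) >= threshold:
            over.append(name)
    return over


# Hypothetical sample in the scripted-mode format: name<TAB>capacity
sample = "rpool\t45%\ntank\t83%\n"
print(pools_over_threshold(sample))
```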

Appendix-I
Use zfs set for ZFS Properties in Oracle Solaris 11 Express
PROPERTY              EDIT  INHERIT  VALUES

available             NO    NO       <size>
compressratio         NO    NO       <1.00x or higher if compressed>
creation              NO    NO       <date>
defer_destroy         NO    NO       yes | no
keystatus             NO    NO       undefined | unavailable | available
mounted               NO    NO       yes | no
origin                NO    NO       <snapshot>
referenced            NO    NO       <size>
rekeydate             NO    NO       <date>
type                  NO    NO       filesystem | volume | snapshot
used                  NO    NO       <size>
usedbychildren        NO    NO       <size>
usedbydataset         NO    NO       <size>
usedbyrefreservation  NO    NO       <size>
usedbysnapshots       NO    NO       <size>
userrefs              NO    NO       <count>
aclinherit            YES   YES      discard | noallow | restricted | passthrough | passthrough-x
atime                 YES   YES      on | off
canmount              YES   NO       on | off | noauto
casesensitivity       NO    YES      sensitive | insensitive | mixed
checksum              YES   YES      on | off | fletcher2 | fletcher4 | sha256
compression           YES   YES      on | off | lzjb | gzip | gzip-[1-9] | zle
copies                YES   YES      1 | 2 | 3
dedup                 YES   YES      on | off | verify | sha256[,verify]
devices               YES   YES      on | off
encryption            NO    YES      on | off | aes-128-ccm | aes-192-ccm | aes-256-ccm | aes-128-gcm | aes-192-gcm | aes-256-gcm

exec                  YES   YES      on | off
keysource             YES   YES      raw | hex | passphrase,prompt | file://<path>
logbias               YES   YES      latency | throughput
mlslabel              YES   YES      <sensitivity label>
mountpoint            YES   YES      <path> | legacy | none
nbmand                YES   YES      on | off
normalization         NO    YES      none | formC | formD | formKC | formKD
primarycache          YES   YES      all | none | metadata
quota                 YES   NO       <size> | none
readonly              YES   YES      on | off
recordsize            YES   YES      512 to 128k, power of 2
refquota              YES   NO       <size> | none
refreservation        YES   NO       <size> | none
reservation           YES   NO       <size> | none
rstchown              YES   YES      on | off
secondarycache        YES   YES      all | none | metadata
setuid                YES   YES      on | off
sharenfs              YES   YES      on | off | share(1M) options
sharesmb              YES   YES      on | off | sharemgr(1M) options
snapdir               YES   YES      hidden | visible
sync                  YES   YES      standard | always | disabled
utf8only              NO    YES      on | off
version               YES   NO       1 | 2 | 3 | 4 | current
volblocksize          NO    YES      512 to 128k, power of 2
volsize               YES   NO       <size>
vscan                 YES   YES      on | off
xattr                 YES   YES      on | off
zoned                 YES   YES      on | off
userused@...          NO    NO       <size>
groupused@...         NO    NO       <size>
userquota@...         YES   NO       <size> | none
groupquota@...        YES   NO       <size> | none

References
Download Oracle Solaris 11 Express
  www.oracle.com/technetwork/server-storage/solaris11/overview/
Oracle Solaris ZFS Administration Guide
  download.oracle.com/docs/cd/E19963-01/html/821-1448
Oracle Solaris 11 Information Library
  download.oracle.com/docs/cd/E19963-01/index.html
Write to ISV Technical Support
  <isvsupport_ww@oracle.com>

Oracle Solaris 11 Express ZFS


<Satyajit.Tripathi@Oracle.COM>
