
ZFS REPLICATION
With: ZFS Send, ZFS Receive

Jason Pack
CISC-361
Spring 2011
ARCHITECTURE

•  Virtual devices (“vdev”s) are organized into groups (“zpool”s)

•  vdev: a block device
   •  A collection of files
   •  A physical storage device, or a group thereof
   •  A logical section of a storage device (e.g. a hard drive partition), or a group thereof
   •  May have internal redundancy

•  zpool: a collection of vdevs (see the example below)
   •  Data is striped across all vdevs in the pool
   •  Redundancy is possible among vdevs
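
A minimal sketch of creating each kind of pool; the pool name “tank” and the device names are hypothetical:

$ zpool create tank /dev/sdb /dev/sdc         # pool of two plain disk vdevs; data stripes across both
$ zpool create tank mirror /dev/sdb /dev/sdc  # pool of one mirrored vdev with internal redundancy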



ARCHITECTURE: VDEV

[Diagram: things that can back a vdev — a storage array; a physical storage medium (hard disk, flash disk, floppy disk, tape drive); a logical section of a medium (e.g. partition part1 of disk1); or a file in a directory (e.g. /tmp/zfs_foo).]



ARCHITECTURE: ZPOOL

[Diagram: two example pools, zpool0 and zpool1, built from devices /dev/sdb, /dev/sdb2, /dev/sda1, /dev/sda2, and /dev/sda3, hosting filesystems mounted at /tmp, /etc, /usr, and /usr/local.]



ARCHITECTURE: ZPOOL

•  A zpool can be configured to store duplicates of each data block
   •  On the same vdev – internal redundancy
   •  Across multiple vdevs – zpool redundancy
      •  Tolerates the failure or removal of a vdev

•  ZFS preferentially distributes redundant physical blocks across different vdevs

•  Best practice: give each vdev internal redundancy (see the sketch below)
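
A minimal sketch of both levels of redundancy, using the real copies property; the pool name “tank” and device names are hypothetical:

$ zpool create tank mirror /dev/sdb /dev/sdc  # vdev with internal redundancy (two-way mirror)
$ zfs set copies=2 tank                       # store two copies of every data block in the pool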



ARCHITECTURE: ZPOOL

•  Data is striped across all vdevs in a zpool (see the zpool add example after the diagrams)
   •  Striping is dynamic, which enhances scalability
   •  Adding a vdev causes new writes to be striped to it, but old stripes remain on the old vdevs

[Diagram, before: the stripes data[1/2] and data[2/2] are stored redundantly across vdev1 and vdev2; vdev3 has just been added and is empty.]



[Diagram, after a new write: the old stripes data[1/2] and data[2/2] remain on vdev1 and vdev2, while copies of newdata[1/1] are written across all three vdevs, including the new vdev3.]
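
Growing a pool this way is a single command; the names are again hypothetical:

$ zpool add tank /dev/sdd  # attach a new vdev; subsequent writes stripe across it
$ zpool status tank        # show the pool’s vdev layout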



DATA STRUCTURE

•  Data block: a physical block on a vdev

•  File Control Block (FCB): metadata for a file or directory
   •  Holds references to its child FCB(s) and/or child data block(s)
   •  The checksum of each child block is stored in its parent
      •  This applies to FCBs, data blocks, and the root node alike

•  Each data block can have redundant metadata FCBs

•  Each FCB can have redundant physical blocks

•  The FCBs form a tree rooted at the “Uberblock” (the scrub example below walks this tree)
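
Because every block’s checksum is stored in its parent, the whole tree can be verified top-down. A scrub does exactly that; the pool name “tank” is hypothetical:

$ zpool scrub tank   # re-read every block and check it against the checksum held by its parent
$ zpool status tank  # report any checksum errors the scrub found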



DATA MANIPULATION

•  All operations are copy-on-write (COW):

1.  Duplicate the FCB(s) for the target file
2.  Duplicate the data block(s) for the target file
3.  Add references to the new data block(s) to the new FCB(s)
4.  Calculate checksums for the new data block(s) and store them in the new FCB(s)

Image source: www.opensolaris.org/os/community/zfs


DATA MANIPULATION

5.  Calculate checksums for the new FCB(s)
6.  Duplicate the parent FCB(s) and set their pointers to the new FCB(s)
7.  Store the new FCB(s)’ checksums in the new parent
    •  Don’t de-allocate the old FCB
8.  Repeat steps 6 and 7 up the tree until reaching the pointer to the uberblock

Image source: www.opensolaris.org/os/community/zfs


DATA MANIPULATION

9.  Save the old uberblock pointer as a snapshot
10. Update the uberblock pointer to refer to the new uberblock
•  The previous state of the entire filesystem is saved in the “snapshot” rooted at the old uberblock (see the commands below)

Image source: www.opensolaris.org/os/community/zfs
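
In practice this snapshot machinery is driven through the zfs command; the dataset name “pool/fs” follows the examples later in this deck:

$ zfs snapshot pool/fs@before  # preserve the current uberblock as snapshot “before”
$ zfs list -t snapshot         # list existing snapshots
$ zfs rollback pool/fs@before  # restore the filesystem to the snapshotted state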


REPLICATION

•  Snapshots are incremental: they record changes in the filesystem
   •  The old uberblock pointer refers to a tree containing all of the old FCBs as they were before the write

•  $ zfs send
   •  Input: an FCB (a file or directory)
   •  Writes a stream containing the entire tree rooted at that FCB to stdout
   •  Tree → file, network stream, or a pipe (see the SSH example below)

•  $ zfs receive
   •  Input: a stream generated by a zfs send command, plus an optional destination vdev
   •  Creates a snapshot from the stream
   •  Writes the snapshot to a file or to a vdev
   •  Can be written into a filesystem to force a rollback to that snapshot
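
Because the stream goes to stdout, it can be piped straight to another machine; the host name “backuphost” is hypothetical:

$ zfs send pool/fs@a | ssh backuphost zfs receive poolB/received/fs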



REPLICATION: EXAMPLES

Write a stream of the entire filesystem pool/fs at snapshot “a” to a file:

$ zfs send pool/fs@a > /tmp/backup_full

[Diagram: pool/fs@a → stream of fs@a → /tmp/backup_full]

Read the stream of pool/fs from the file and write it to poolB/received:

$ zfs recv poolB/received/fs@a < /tmp/backup_full

[Diagram: /tmp/backup_full → poolB/received/fs]



REPLICATION: EXAMPLES

Write a stream of the incremental changes to filesystem pool/fs between snapshots “a” and “b” to a file:

$ zfs send -i a pool/fs@b > /tmp/backup.today

[Diagram: pool/fs@b → changes in fs → /tmp/backup.today]

Read the stream of pool/fs from the file and write it to poolB/received:

$ zfs receive poolB/received/fs < /tmp/backup.today

[Diagram: /tmp/backup.today → poolB/received/fs]



ALTERNATIVES

•  Believe it or not, there were other solutions before ZFS

•  RAID – Redundant Array of Inexpensive Disks
   •  Flexible – different configurations trade speed against integrity
   •  Usually requires special hardware
   •  Does not detect R/W errors introduced by the hardware stack

•  rsync
   •  Command-line tool
   •  Synchronizes filesystems, either local or connected via a network
   •  Works independently of filesystem type
   •  Multiplatform support

•  Let’s contrast rsync backups with ZFS backups



RSYNC

$ rsync [options] source host:destination

•  Synchronizes a remote or local filesystem with a local one
   •  Replaces rcp/scp (older UNIX remote file transfer tools)
   •  Works remotely over a shell connection (e.g. SSH) or the rsync protocol

•  Transfers only the differences between the two filesystems (see the example below)
   •  Each file is broken into fixed-size pieces, and a checksum is calculated for each piece
   •  The receiver sends its checksums to the sender
   •  The sender compares them to rolling checksums* from its version of the file
      •  *Rolling checksum: a hash of the data in a fixed-size window moving through the file
   •  Differences are verified using an MD5 hash
   •  The sender sends the differing parts to the receiver, along with instructions for merging
   •  The receiver merges the changes into its version of the file
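
A minimal example of a push over SSH; the paths and host name are hypothetical:

$ rsync -avz /home/user/docs/ backuphost:/srv/backup/docs/  # -a archive mode, -v verbose, -z compress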



RSYNC VS ZFS REPLICATION

•  Both use checksums of fixed-size data blocks in a file to determine the differences between two filesystems

•  ZFS keeps copies of old files via its snapshot system
   •  Checksums are always stored separately from the data blocks
   •  rsync only keeps track of the differences between source and destination
   •  The integrity of rsync-managed data is only as reliable as the underlying filesystem

•  rsync operates higher up the software stack than ZFS
   •  It is therefore vulnerable to errors in the intermediate layers



SUMMARY

•  ZFS provides hardware-independent data redundancy
   •  Dynamic striping across multiple physical media
   •  Maintains multiple copies of data blocks
   •  Verifies data integrity by comparing checksums

•  ZFS provides hardware-independent filesystem integrity
   •  Maintains multiple copies of file control blocks
   •  Verifies filesystem integrity by comparing the checksums of FCBs

•  ZFS provides incremental backup support
   •  Snapshots of the filesystem are made possible by copy-on-write

•  ZFS has advantages over rsync
   •  Checksums are always stored separately from the data
   •  ZFS checksums are created at write time, while rsync checksums are generated on demand
   •  ZFS guarantees the accuracy of its checksums, while rsync may not



FIN.

Any questions?

Thanks to the OpenSolaris community for the ZFS information.
Thanks to the BSD documentation for the rsync information.

