Solaris 10 Administration Topics Workshop 3 - File Systems

By Peter Baer Galvin
For Usenix

Last Revision April 2009
Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

About the Speaker
Peter Baer Galvin - 781 273 4100 pbg@cptech.com www.cptech.com peter@galvin.info My Blog: www.galvin.info Bio
Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading systems integrator and VAR, and was the Systems Manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines. He was contributing editor of the Solaris Corner for SysAdmin Magazine, wrote Pete's Wicked World, the security column for SunWorld magazine, and Pete's Super Systems, the systems administration column there. He is now the Sun columnist for the Usenix ;login: magazine. Peter is co-author of the Operating System Concepts and Applied Operating System Concepts textbooks. As a consultant and trainer, Mr. Galvin has taught tutorials in security and system administration and given talks at many conferences and institutions.

Copyright 2009 Peter Baer Galvin - All Rights Reserved

2

Saturday, May 2, 2009

Objectives
Cover a wide variety of topics in Solaris 10 Useful for experienced system administrators Save time Avoid (my) mistakes Learn about new stuff Answer your questions about old stuff Won't read the man pages to you Workshop for hands-on experience and to reinforce concepts Note – Security covered in separate tutorial

Copyright 2009 Peter Baer Galvin - All Rights Reserved

3

Saturday, May 2, 2009

More Objectives
What makes novice vs. advanced administrator? Bytes as well as bits, tactics and strategy Knows how to avoid trouble How to get out of it once in it How to not make it worse Has reasoned philosophy Has methodology
Copyright 2009 Peter Baer Galvin - All Rights Reserved
4

Saturday, May 2, 2009

Prerequisites
Recommend at least a couple of years of Solaris experience Or at least a few years of other Unix Best is a few years of admin experience, mostly on Solaris

Copyright 2009 Peter Baer Galvin - All Rights Reserved

5

Saturday, May 2, 2009

About the Tutorial
Every SysAdmin has a different knowledge set A lot to cover, but notes should make good reference
So some covered quickly, some in detail
Setting base of knowledge

Please ask questions
But let's take off-topic discussions off-line (or to the Solaris BOF)
Copyright 2009 Peter Baer Galvin - All Rights Reserved
6

Saturday, May 2, 2009

Fair Warning
Sites vary Circumstances vary Admin knowledge varies My goals Provide information useful for each of you at your sites Provide opportunity for you to learn from each other
Copyright 2009 Peter Baer Galvin - All Rights Reserved
7

Saturday, May 2, 2009

Why Listen to Me
20 Years of Sun experience Seen much as a consultant Hopefully, you've used:
My Usenix ;login: column The Solaris Security FAQ SunWorld “Pete's Wicked World” SunWorld “Pete's Super Systems” Unix Secure Programming FAQ (out of date) Operating System Concepts (The Dino Book), now 8th ed Applied Operating System Concepts

The Solaris Corner @ www.samag.com

Copyright 2009 Peter Baer Galvin - All Rights Reserved

8

Saturday, May 2, 2009

Slide Ownership
As indicated per slide, some slides copyright Sun Microsystems Feel free to share all the slides - as long as you don’t charge for them or teach from them for fee

Copyright 2009 Peter Baer Galvin - All Rights Reserved

9

Saturday, May 2, 2009

Overview
Lay of the Land

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

Schedule

Times and Breaks

Copyright 2009 Peter Baer Galvin - All Rights Reserved

11

Saturday, May 2, 2009

Coverage
Solaris 10+, with some Solaris 9 where needed Selected topics that are new, different, confusing, underused, overused, etc

Copyright 2009 Peter Baer Galvin - All Rights Reserved

12

Saturday, May 2, 2009

Outline
Overview Objectives Choosing the most appropriate file system(s) UFS / SDS Veritas FS / VM (not in detail) ZFS

Copyright 2009 Peter Baer Galvin - All Rights Reserved

13

Saturday, May 2, 2009

Polling Time
Solaris releases in use? Plans to upgrade? Other OSes in use? Use of Solaris rising or falling? SPARC and x86 OpenSolaris?
Copyright 2009 Peter Baer Galvin - All Rights Reserved
14

Saturday, May 2, 2009

Your Objectives?

Copyright 2009 Peter Baer Galvin - All Rights Reserved

15

Saturday, May 2, 2009

Lab Preparation
Have a device capable of telnet on the USENIX network Or have a buddy Learn your "magic number" Telnet to 131.106.62.100 + "magic number" User "root", password "lisa" It's all very secure
Copyright 2009 Peter Baer Galvin - All Rights Reserved
16

Saturday, May 2, 2009

Lab Preparation
Or... Use virtualbox Use your own system Use a remote machine you have legit access to

Copyright 2009 Peter Baer Galvin - All Rights Reserved

17

Saturday, May 2, 2009

Choosing the Most Appropriate File Systems

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

Choosing the Most Appropriate File Systems
Many file systems, many not optional (tmpfs et al) Where you have choice, how to choose? Consider Solaris version being used < S10 means no ZFS ISV support For each ISV make sure desired FS is supported Apps, backups, clustering Priorities Now weigh priorities of performance, reliability, experience, features, risk / reward
Copyright 2009 Peter Baer Galvin - All Rights Reserved
19

Saturday, May 2, 2009

Consider...
Root file system

Pros and cons of mixing file systems Not much value in using vxfs / vxvm here unless used elsewhere Interoperability (need to detach from one type of system and attach to another?) Cost Supportability & support model Non-production vs. production use
Copyright 2009 Peter Baer Galvin - All Rights Reserved
20

Saturday, May 2, 2009

Root Disk Mirroring
The Crux of Performance

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

Topics

•Root disk mirroring •ZFS

Copyright 2009 Peter Baer Galvin - All Rights Reserved

22

Saturday, May 2, 2009

Root Disk Mirroring
Complicated because Must be bootable Want it protected from disk failure And want the protection to work Can increase or decrease upgrade complexity Veritas Live upgrade
Copyright 2009 Peter Baer Galvin - All Rights Reserved
23

Saturday, May 2, 2009

Manual Mirroring
Vxvm encapsulation can cause lack of availability Vxvm needs a rootdg disk Any automatic mirroring can propagate errors Consider
Use disksuite (Solaris Volume Manager) to mirror boot disk Use 3rd disk as rootdg, 3rd disksuite metadb, manual mirror copy Or use 10Mb rootdg on 2 boot disks in disksuite to do the mirroring Best of all worlds – details in column at www.samag.com/solaris
Copyright 2009 Peter Baer Galvin - All Rights Reserved
24

Saturday, May 2, 2009

Manual Mirroring
Sometimes want more than no mirroring, less than real mirroring Thus "manual mirroring" Nightly cron job to copy partitions elsewhere Can be used to duplicate root disk, if installboot used Combination of newfs, mount, ufsdump | ufsrestore Quite effective, useful, and cheap Easy recovery from corrupt root image, malicious error, sysadmin error Has saved at least one client But disk failure can require manual intervention Complete script can be found at www.samag.com/solaris
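A minimal sketch of such a nightly job, assuming the live root is c0t0d0s0, the manual-mirror slice is c0t2d0s0, and the copy is mounted at /mnt/rootcopy (all illustrative names); the full script referenced above is more thorough:
# Re-create and repopulate the copy of / on the spare slice (SPARC installboot shown)
newfs /dev/rdsk/c0t2d0s0 < /dev/null
mount /dev/dsk/c0t2d0s0 /mnt/rootcopy
ufsdump 0f - /dev/rdsk/c0t0d0s0 | (cd /mnt/rootcopy && ufsrestore rf -)
installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t2d0s0
umount /mnt/rootcopy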
Copyright 2009 Peter Baer Galvin - All Rights Reserved
25

Saturday, May 2, 2009

Best Practice – Root Disk
Have 4 disks for root! 1st is primary boot device 2nd is disksuite mirror of first 3rd is manual mirror of 1st 4th is manual mirror, kept on a shelf! Put nothing but system files on these disks (/, /var, /opt, /usr, swap)

Copyright 2009 Peter Baer Galvin - All Rights Reserved

26

Saturday, May 2, 2009

Aside: Disk Performance
Which is faster?

73GB drive 10000 RPM 3Gb/sec

300GB drive 10000 RPM 3Gb/sec
27

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

UFS / SDS

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

UFS Overview
Standard pre-Solaris 10 file system Many years old, updated continuously But still showing its age No integrated volume manager; instead use SDS (disk suite) Very fast, but feature poor For example, snapshots exist but are only useful for backups Painful to manage, change, repair
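For example, a UFS snapshot taken with fssnap(1M) is typically only a stable source for a ufsdump backup (paths here are illustrative):
# fssnap -F ufs -o bs=/var/tmp /export           # create snapshot; backing store in /var/tmp
/dev/fssnap/0
# ufsdump 0f /backup/export.dump /dev/rfssnap/0  # back up from the stable snapshot device
# fssnap -d /export                              # delete the snapshot when finished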
Copyright 2009 Peter Baer Galvin - All Rights Reserved
29

Saturday, May 2, 2009

Features

64-bit pointers 16TB file systems (on 64-bit Solaris) 1TB maximum file size Metadata logging (by default) increases performance and keeps file systems (usually) consistent after a crash Lots of ISV and internal command (dump) support Only bootable Solaris file system (until S10 10/08) Dynamic multipathing, but via separate "traffic manager" facility
Copyright 2009 Peter Baer Galvin - All Rights Reserved
30

Saturday, May 2, 2009

Issues
Sometimes there is still corruption Need to run fsck Sometimes it fails Many limits Many features lacking (compared to ZFS) Lots of manual administration tasks format to slice up a disk newfs to format the file system, fsck to check it mount and /etc/vfstab to mount a file system share commands, plus svcadm commands, to NFS export Plus separate volume management
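A minimal sketch of that manual workflow, with an illustrative disk (c1t0d0) and mount point (/data):
# format                                       # label / slice the disk interactively
# newfs /dev/rdsk/c1t0d0s0                     # create the UFS file system
# fsck /dev/rdsk/c1t0d0s0                      # check it
# mkdir /data; mount /dev/dsk/c1t0d0s0 /data   # mount it (add an /etc/vfstab entry to persist)
# share -F nfs -o rw /data                     # NFS export it
# svcadm enable nfs/server                     # and make sure the NFS service is running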
Copyright 2009 Peter Baer Galvin - All Rights Reserved
31

Saturday, May 2, 2009

Volume Management
Separate set of commands (meta*) to manage volumes (RAID et al) For example, to mirror the root file system Have 2 disks with identical partitioning Have 2 small partitions per disk for meta-data (here slices 5 and 6)
newfs the file systems

Create meta-data state databases (at least 3, for quorum) # metadb -a /dev/dsk/c0t0d0s5 # metadb -a /dev/dsk/c0t0d0s6 # metadb -a /dev/dsk/c0t1d0s5 # metadb -a /dev/dsk/c0t1d0s6
Copyright 2009 Peter Baer Galvin - All Rights Reserved
32

Saturday, May 2, 2009

Volume Management (cont)
Initialize submirrors (components of mirrors) and mirror the partitions - here we do /, swap, and /var
# metainit -f d10 1 1 c0t0d0s0
# metainit -f d20 1 1 c0t1d0s0
# metainit d0 -m d10

Make the new / bootable
# metaroot d0
# metainit -f d11 1 1 c0t0d0s1
# metainit -f d21 1 1 c0t1d0s1
# metainit d1 -m d11
# metainit -f d14 1 1 c0t0d0s4
# metainit -f d24 1 1 c0t1d0s4
# metainit d4 -m d14

# metainit -f d17 1 1 c0t0d0s7
# metainit -f d27 1 1 c0t1d0s7
# metainit d7 -m d17
Copyright 2009 Peter Baer Galvin - All Rights Reserved
33

Saturday, May 2, 2009

Volume Management (cont)
Update /etc/vfstab to reflect new meta devices
/dev/md/dsk/d1  -                -        swap  -  no   -
/dev/md/dsk/d4  /dev/md/rdsk/d4  /var     ufs   1  yes  -
/dev/md/dsk/d7  /dev/md/rdsk/d7  /export  ufs   1  yes  -

Finally attach the submirror to each device to be mirrored
# metattach d0 d20
# metattach d1 d21
# metattach d4 d24
# metattach d7 d27

Now the root disk is mirrored, and commands such as Solaris upgrade, live upgrade, and boot understand that

Copyright 2009 Peter Baer Galvin - All Rights Reserved

34

Saturday, May 2, 2009

Veritas VM / FS

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

Overview
A popular, commercial addition to Solaris Integrated volume management (vxfs + vxvm) 64-bit Mirrored root disk via "encapsulation" Good ISV support Good extended features such as snapshots, replication Shrink and grow file systems Extent based (for better and worse), journaled, clusterable Cross-platform
Copyright 2009 Peter Baer Galvin - All Rights Reserved
36

Saturday, May 2, 2009

Features
Very large limits Dynamic multipathing included Hot spares to automatically replace failed disks Dirty region logging (DRL) transaction logs for fast recovery from volume crash But still can require consistency check
Copyright 2009 Peter Baer Galvin - All Rights Reserved
37

Saturday, May 2, 2009

Issues
$$$ Adds supportability complexities (who do you call first?) Complicates OS upgrades (unencapsulate first) Fairly complex to manage Comparison of performance vs. ZFS at
http://www.sun.com/software/whitepapers/solaris10/zfs_veritas.pdf
Copyright 2009 Peter Baer Galvin - All Rights Reserved
38

Saturday, May 2, 2009

ZFS

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

ZFS

Looks to be the “next great thing” Shipped officially in S10U2 (the 06/06 release) From scratch file system Includes volume management, file system, reliability, scalability, performance, snapshots, clones, replication 128-bit file system, almost everything is “infinite” Checksumming throughout Simple, endian independent, export/importable… Still using traffic manager for multipathing

(some following slides are from ZFS talk by Jeff Bonwick and Bill Moore – ZFS team leads at Sun)
Copyright 2009 Peter Baer Galvin - All Rights Reserved

40

Saturday, May 2, 2009

Trouble with Existing Filesystems
No defense against silent data corruption
Any defect in disk, controller, cable, driver, or firmware can corrupt data silently; like running a server without ECC memory

Brutal to manage
Labels, partitions, volumes, provisioning, grow/shrink, /etc/vfstab... Lots of limits: filesystem/volume size, file size, number of files, files per directory, number of snapshots, ... Not portable between platforms (e.g. x86 to/from SPARC)

Dog slow
Linear-time create, fat locks, fixed block size, naïve prefetch, slow random writes, dirty region logging
Copyright 2009 Peter Baer Galvin - All Rights Reserved
41

Saturday, May 2, 2009

Design Principles
Pooled storage
Completely eliminates the antique notion of volumes Does for storage what VM did for memory

End-to-end data integrity
Historically considered “too expensive” Turns out, no it isn't And the alternative is unacceptable

Transactional operation
Keeps things always consistent on disk Removes almost all constraints on I/O order Allows us to get huge performance wins
Copyright 2009 Peter Baer Galvin - All Rights Reserved
42

Saturday, May 2, 2009

Why “volumes” Exist

In the beginning, each filesystem managed a single disk Customers wanted more space, bandwidth, reliability
Rewrite filesystems to handle many disks: hard Insert a little shim (“volume”) to cobble disks together: easy

An industry grew up around the FS/volume model
Filesystems, volume managers sold as separate products Inherent problems in FS/volume interface can't be fixed
Copyright 2009 Peter Baer Galvin - All Rights Reserved
43

Saturday, May 2, 2009

Traditional Volumes
FS Volume (stripe) FS Volume (mirror)

Copyright 2009 Peter Baer Galvin - All Rights Reserved

44

Saturday, May 2, 2009

ZFS Pools
Abstraction: malloc/free No partitions to manage Grow/shrink automatically All bandwidth always available All storage in the pool is shared

Copyright 2009 Peter Baer Galvin - All Rights Reserved

45

Saturday, May 2, 2009

ZFS Pooled Storage
FS FS Storage Pool (RAIDZ) FS FS FS Storage Pool (Mirror)

Copyright 2009 Peter Baer Galvin - All Rights Reserved

46

Saturday, May 2, 2009


Terms
Pool - set of disks in one or more RAID formats (e.g. mirrored stripe); no "/" in a pool name
File system - mountable container of files
Data set - file system, block device (volume), snapshot, or clone within a pool
Named via pool/path[@snapshot]
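For instance (pool and file system names are illustrative):
# zfs create tank/home/pbg               # data set "home/pbg" in pool "tank"
# zfs snapshot tank/home/pbg@friday      # named pool/path@snapshot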
Copyright 2009 Peter Baer Galvin - All Rights Reserved
72


Saturday, May 2, 2009

Terms (cont)
ZIL - ZFS intent log
On-disk duplicate of the in-memory log of changes to make to data sets
Write goes to memory and the ZIL, is acknowledged, then goes to disk
ARC - in-memory read cache
L2ARC - level 2 ARC - on flash memory
Copyright 2009 Peter Baer Galvin - All Rights Reserved
73


Saturday, May 2, 2009

What ZFS doesn’t do
Can't remove individual devices from pools
Rather, replace the device, or 3-way mirror including the device and then remove the device
Can't shrink a pool (yet)
Can add individual devices, but not optimum (yet)
If adding a disk to RAIDZ or RAIDZ2, you end up with RAIDZ(2) + 1 concatenated device
Instead add full RAID elements to a pool
Add a mirror pair or RAIDZ(2) set
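For example, growing a pool by whole redundancy groups rather than by single disks (device names illustrative):
# zpool add tank mirror c2t0d0 c3t0d0           # add another mirror pair to the pool
# zpool add tank raidz c4t0d0 c5t0d0 c6t0d0     # or add a whole new RAID-Z set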
Copyright 2009 Peter Baer Galvin - All Rights Reserved
74



Saturday, May 2, 2009

zpool
# zpool
missing command
usage: zpool command args ...
where 'command' is one of the following:

        create [-fn] [-o property=value] ... [-O file-system-property=value] ...
               [-m mountpoint] [-R root] <pool> <vdev> ...
        destroy [-f] <pool>
        add [-fn] <pool> <vdev> ...
        remove <pool> <device> ...
        list [-H] [-o property[,...]] [pool] ...
        iostat [-v] [pool] ... [interval [count]]
        status [-vx] [pool] ...
        online <pool> <device> ...
        offline [-t] <pool> <device> ...
        clear <pool> [device]

Copyright 2009 Peter Baer Galvin - All Rights Reserved

75

Saturday, May 2, 2009

zpool (cont)
scrub [-s] <pool> ...

        attach [-f] <pool> <device> <new-device>
        detach <pool> <device>
        replace [-f] <pool> <device> [new-device]

        import [-d dir] [-D]
        import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile]
               [-D] [-f] [-R root] -a
        import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile]
               [-D] [-f] [-R root] <pool | id> [newpool]
        export [-f] <pool> ...
        upgrade
        upgrade -v
        upgrade [-V version] <-a | pool ...>
        history [-il] [<pool>] ...
        get <"all" | property[,...]> <pool> ...
        set <property=value> <pool>

Copyright 2009 Peter Baer Galvin - All Rights Reserved

76

Saturday, May 2, 2009

zpool (cont)
# zpool create ezfs raidz c2t0d0 c3t0d0 c4t0d0 c5t0d0
# zpool status -v
  pool: ezfs
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ezfs        ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0

errors: No known data errors

Copyright 2009 Peter Baer Galvin - All Rights Reserved

77

Saturday, May 2, 2009

zpool (cont)
  pool: zfs
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfs         ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0d0s7  ONLINE       0     0     0
            c0d1s7  ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0

errors: No known data errors

Copyright 2009 Peter Baer Galvin - All Rights Reserved

78

Saturday, May 2, 2009

zpool (cont)
(/)# zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
bigp         630G   392G      2      4  41.3K   496K
  raidz      630G   392G      2      4  41.3K   496K
    c0d0s6      -      -      0      2  8.14K   166K
    c0d1s6      -      -      0      2  7.77K   166K
    c1d0s6      -      -      0      2  24.1K   166K
    c1d1s6      -      -      0      2  22.2K   166K
----------  -----  -----  -----  -----  -----  -----

Copyright 2009 Peter Baer Galvin - All Rights Reserved

79

Saturday, May 2, 2009

zpool (cont)
# zpool status -v
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0d0s0  ONLINE       0     0     0
            c0d1s0  ONLINE       0     0     0

errors: No known data errors

  pool: zpbg
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpbg        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0

errors: No known data errors

Copyright 2009 Peter Baer Galvin - All Rights Reserved

80

Saturday, May 2, 2009

zpool (cont)
zpool iostat -v
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       6.72G   225G      0      1  9.09K  11.6K
  mirror    6.72G   225G      0      1  9.09K  11.6K
    c0d0s0      -      -      0      0  5.01K  11.7K
    c0d1s0      -      -      0      0  5.09K  11.7K
----------  -----  -----  -----  -----  -----  -----
zpbg        3.72T   833G      0      0  32.0K  1.24K
  raidz1    3.72T   833G      0      0  32.0K  1.24K
    c4t0d0      -      -      0      0  9.58K    331
    c4t1d0      -      -      0      0  10.3K    331
    c5t0d0      -      -      0      0  10.4K    331
    c5t1d0      -      -      0      0  10.3K    331
    c6t0d0      -      -      0      0  9.54K    331
----------  -----  -----  -----  -----  -----  -----
81

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

zpool (cont)
Note that for import and export, the pool is the unit of granularity You can't import or export a file system because it's an integral part of a pool Might cause you to use smaller pools than you otherwise would
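For example, moving a whole pool between hosts (pool name illustrative):
# zpool export zpbg        # on the old host: unmount file systems and mark the pool exported
# zpool import             # on the new host: show pools available for import
# zpool import zpbg        # import it (add -f if the pool was not cleanly exported)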

Copyright 2009 Peter Baer Galvin - All Rights Reserved

82

Saturday, May 2, 2009

zfs
# zfs
missing command
usage: zfs command args ...
where 'command' is one of the following:

        create [-p] [-o property=value] ... <filesystem>
        create [-ps] [-b blocksize] [-o property=value] ... -V <size> <volume>
        destroy [-rRf] <filesystem|volume|snapshot>
        snapshot [-r] [-o property=value] ... <filesystem@snapname|volume@snapname>
        rollback [-rRf] <snapshot>
        clone [-p] [-o property=value] ... <snapshot> <filesystem|volume>
        promote <clone-filesystem>
        rename <filesystem|volume|snapshot> <filesystem|volume|snapshot>
        rename -p <filesystem|volume> <filesystem|volume>
        rename -r <snapshot> <snapshot>
83

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

zfs (cont)
        list [-rH] [-o property[,...]] [-t type[,...]] [-s property] ...
             [-S property] ... [filesystem|volume|snapshot] ...
        set <property=value> <filesystem|volume|snapshot> ...
        get [-rHp] [-o field[,...]] [-s source[,...]] <"all" | property[,...]>
            [filesystem|volume|snapshot] ...
        inherit [-r] <property> <filesystem|volume|snapshot> ...
        upgrade [-v]
        upgrade [-r] [-V version] <-a | filesystem ...>
        mount
        mount [-vO] [-o opts] <-a | filesystem>
        unmount [-f] <-a | filesystem|mountpoint>
        share <-a | filesystem>
        unshare [-f] <-a | filesystem|mountpoint>

Copyright 2009 Peter Baer Galvin - All Rights Reserved

84

Saturday, May 2, 2009

zfs (cont)

        send [-R] [-[iI] snapshot] <snapshot>
        receive [-vnF] <filesystem|volume|snapshot>
        receive [-vnF] -d <filesystem>
        allow [-ldug] <"everyone"|user|group>[,...] <perm|@setname>[,...] <filesystem|volume>
        allow [-ld] -e <perm|@setname>[,...] <filesystem|volume>
        allow -c <perm|@setname>[,...] <filesystem|volume>
        allow -s @setname <perm|@setname>[,...] <filesystem|volume>
        unallow [-rldug] <"everyone"|user|group>[,...] [<perm|@setname>[,...]] <filesystem|volume>
        unallow [-rld] -e [<perm|@setname>[,...]] <filesystem|volume>
        unallow [-r] -c [<perm|@setname>[,...]] <filesystem|volume>
        unallow [-r] -s @setname [<perm|@setname>[,...]] <filesystem|volume>

Each dataset is of the form: pool/[dataset/]*dataset[@name]
For the property list, run: zfs set|get
For the delegated permission list, run: zfs allow|unallow

Copyright 2009 Peter Baer Galvin - All Rights Reserved

85

Saturday, May 2, 2009

zfs (cont)
# zfs get
missing property argument
usage:
        get [-rHp] [-o field[,...]] [-s source[,...]] <"all" | property[,...]>
            [filesystem|volume|snapshot] ...

The following properties are supported:

        PROPERTY         EDIT  INHERIT   VALUES
        available          NO       NO   <size>
        compressratio      NO       NO   <1.00x or higher if compressed>
        creation           NO       NO   <date>
        mounted            NO       NO   yes | no
        origin             NO       NO   <snapshot>
        referenced         NO       NO   <size>
        type               NO       NO   filesystem | volume | snapshot
        used               NO       NO   <size>
        aclinherit        YES      YES   discard | noallow | restricted | passthrough
        aclmode           YES      YES   discard | groupmask | passthrough
        atime             YES      YES   on | off

Copyright 2009 Peter Baer Galvin - All Rights Reserved

86

Saturday, May 2, 2009

zfs (cont)
        canmount          YES       NO   on | off | noauto
        casesensitivity    NO      YES   sensitive | insensitive | mixed
        checksum          YES      YES   on | off | fletcher2 | fletcher4 | sha256
        compression       YES      YES   on | off | lzjb | gzip | gzip-[1-9]
        copies            YES      YES   1 | 2 | 3
        devices           YES      YES   on | off
        exec              YES      YES   on | off
        mountpoint        YES      YES   <path> | legacy | none
        nbmand            YES      YES   on | off
        normalization      NO      YES   none | formC | formD | formKC | formKD
        primarycache      YES      YES   all | none | metadata
        quota             YES       NO   <size> | none
        readonly          YES      YES   on | off
        recordsize        YES      YES   512 to 128k, power of 2
        refquota          YES       NO   <size> | none
        refreservation    YES       NO   <size> | none
        reservation       YES       NO   <size> | none
87

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

zfs (cont)
        secondarycache    YES      YES   all | none | metadata
        setuid            YES      YES   on | off
        shareiscsi        YES      YES   on | off | type=<type>
        sharenfs          YES      YES   on | off | share(1M) options
        sharesmb          YES      YES   on | off | sharemgr(1M) options
        snapdir           YES      YES   hidden | visible
        utf8only           NO      YES   on | off
        version           YES       NO   1 | 2 | 3 | current
        volblocksize       NO      YES   512 to 128k, power of 2
        volsize           YES       NO   <size>
        vscan             YES      YES   on | off
        xattr             YES      YES   on | off
        zoned             YES      YES   on | off

Sizes are specified in bytes with standard units such as K, M, G, etc. User-defined properties can be specified by using a name containing a colon (:).

Copyright 2009 Peter Baer Galvin - All Rights Reserved

88

Saturday, May 2, 2009

zfs (cont)
(/)# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
bigp       630G   384G      -  /zfs/bigp
bigp/big   630G   384G   630G  /zfs/bigp/big

(root@sparky)-(7/pts)-(06:35:11/05/05)(/)# zfs snapshot bigp/big@5-nov
(root@sparky)-(8/pts)-(06:35:11/05/05)(/)# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
bigp             630G   384G      -  /zfs/bigp
bigp/big         630G   384G   630G  /zfs/bigp/big
bigp/big@5-nov      0      -   630G  /zfs/bigp/big@5-nov

# zfs send bigp/big@5-nov | ssh host zfs receive poolB/received/big@5-nov
# zfs send -i 5-nov bigp/big@6-nov | ssh host \
      zfs receive poolB/received/big
Copyright 2009 Peter Baer Galvin - All Rights Reserved
89

Saturday, May 2, 2009

zfs (cont)
# zpool history
History for 'zpbg':
2006-04-03.11:47:44 zpool create -f zpbg raidz c5t0d0 c10t0d0 c11t0d0 c12t0d0 c13t0d0
2006-04-03.18:19:48 zfs receive zpbg/imp
2006-04-03.18:41:39 zfs receive zpbg/home
2006-04-03.19:04:22 zfs receive zpbg/photos
2006-04-03.19:37:56 zfs set mountpoint=/export/home zpbg/home
2006-04-03.19:44:22 zfs receive zpbg/mail
2006-04-03.20:12:34 zfs set mountpoint=/var/mail zpbg/mail
2006-04-03.20:14:32 zfs receive zpbg/mqueue
2006-04-03.20:15:01 zfs set mountpoint=/var/spool/mqueue zpbg/mqueue

# zfs create -V 2g tank/volumes/v2
# zfs set shareiscsi=on tank/volumes/v2
# iscsitadm list target
Target: tank/volumes/v2
    iSCSI Name: iqn.1986-03.com.sun:02:984fe301-c412-ccc1-cc80-cf9a72aa062a
    Connections: 0
Copyright 2009 Peter Baer Galvin - All Rights Reserved
90

Saturday, May 2, 2009

zpool history -l
Shows user name, host name, and zone of command
# zpool history -l users
History for 'users':
2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0 [user root on corona:global]
2008-07-10.09:43:13 zfs create users/marks [user root on corona:global]
2008-07-10.09:43:44 zfs destroy users/marks [user root on corona:global]
2008-07-10.09:43:48 zfs create users/home [user root on corona:global]
2008-07-10.09:43:56 zfs create users/home/markm [user root on corona:global]
2008-07-10.09:44:02 zfs create users/home/marks [user root on corona:global]
Copyright 2009 Peter Baer Galvin - All Rights Reserved
91

Saturday, May 2, 2009

zpool history -i
Shows zfs internal activities - useful for debugging
# zpool history -i users
History for 'users':
2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0
2008-07-10.09:43:13 [internal create txg:6] dataset = 21
2008-07-10.09:43:13 zfs create users/marks
2008-07-10.09:43:48 [internal create txg:12] dataset = 27
2008-07-10.09:43:48 zfs create users/home
2008-07-10.09:43:55 [internal create txg:14] dataset = 33

Copyright 2009 Peter Baer Galvin - All Rights Reserved

92

Saturday, May 2, 2009

ZFS Delegate Admin
Use zfs allow and zfs unallow to grant and remove permissions Use the pool "delegation" property to manage whether delegation is enabled Then delegate:
# zfs allow cindys create,destroy,mount,snapshot tank/cindys
# zfs allow tank/cindys
-------------------------------------------------------------
Local+Descendent permissions on (tank/cindys)
        user cindys create,destroy,mount,snapshot
-------------------------------------------------------------
# zfs unallow cindys tank/cindys
# zfs allow tank/cindys

Copyright 2009 Peter Baer Galvin - All Rights Reserved

93

Saturday, May 2, 2009

ZFS - Odds and Ends
zfs get all will display all set attributes of all ZFS file systems Recursive snapshots (via -r) as of S10 8/07 zfs clone makes a RW copy of a snapshot zfs promote makes a clone independent of its origin snapshot (swapping the clone and origin roles) You can undo a zpool destroy with zpool import -D As of S10 8/07 ZFS is integrated with FMA As of S10 11/06 ZFS supports double-parity RAID (raidz2)
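A small illustrative sequence (dataset names hypothetical):
# zfs snapshot tank/ws@stable               # snapshot a file system
# zfs clone tank/ws@stable tank/ws-test     # read-write clone of the snapshot
# zfs promote tank/ws-test                  # make the clone independent of tank/ws
# zpool import -D                           # list destroyed pools that can still be recovered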
Copyright 2009 Peter Baer Galvin - All Rights Reserved
94

Saturday, May 2, 2009

ZFS “GUI”
Did you know that Solaris has an admin GUI? Webconsole enabled by default Turn off via svcadm if not used By default (on Nevada b64 at least) ZFS administration is the only feature enabled
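For example, assuming the default Java Web Console service FMRI in Solaris 10 (svc:/system/webconsole:console):
# svcs webconsole                                   # check whether the console is running
# svcadm disable svc:/system/webconsole:console     # turn it off if it is not used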

Copyright 2009 Peter Baer Galvin - All Rights Reserved

95

Saturday, May 2, 2009

Copyright 2009 Peter Baer Galvin - All Rights Reserved

96

Saturday, May 2, 2009

ZFS Automatic Snapshots
In Nevada 100 (LSARC 2008/571) - will be in OpenSolaris 2008.11 SMF service and GNOME app Can take automatic scheduled snapshots By default all zfs file systems, at boot, then every 15 minutes, every hour, every day, etc Auto delete of oldest snapshots if user-defined amount of space is not available Can perform incremental or full backups via those snapshots Nautilus integration allows user to browse and restore files graphically
Copyright 2009 Peter Baer Galvin - All Rights Reserved
97

Saturday, May 2, 2009

ZFS Automatic Snapshots (cont)

One SMF service per time frequency:
        frequent    snapshots every 15 mins, keeping 4 snapshots
        hourly      snapshots every hour, keeping 24 snapshots
        daily       snapshots every day, keeping 31 snapshots
        weekly      snapshots every week, keeping 7 snapshots
        monthly     snapshots every month, keeping 12 snapshots

Details here: http://src.opensolaris.org/source/xref/jds/zfssnapshot/README.zfs-auto-snapshot.txt

Copyright 2009 Peter Baer Galvin - All Rights Reserved

98

Saturday, May 2, 2009

ZFS Automatic Snapshots (cont)
Service properties provide more details zfs/fs-name The name of the filesystem. If the special filesystem name "//" is used, then the system snapshots only filesystems with the zfs user property "com.sun:auto-snapshot:<label>" set to true, so to take frequent snapshots of tank/timf, run the following zfs command: # zfs set com.sun:auto-snapshot:frequent=true tank/timf The "snap-children" property is ignored when using this fs-name value. Instead, the system automatically determines when it's able to take recursive, vs. non-recursive snapshots of the system, based on the values of the ZFS user properties. zfs/interval [ hours | days | months | none]

When set to none, we don't take automatic snapshots, but leave an SMF instance available for users to manually fire the method script whenever they want - useful for snapshotting on system events. zfs/keep How many snapshots to retain - eg. setting this to "4" would keep only the four most recent snapshots. When each new snapshot is taken, the oldest is destroyed. If a snapshot has been cloned, the service will drop to maintenance mode when attempting to destroy that snapshot. Setting to "all" keeps all snapshots. zfs/period How often you want to take snapshots, in intervals set according to "zfs/ interval" (eg. every 10 days)
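A minimal sketch of adjusting these properties, assuming the service is delivered as svc:/system/filesystem/zfs/auto-snapshot (the FMRI used by the zfs-auto-snapshot project; adjust to match your build):
# svccfg -s svc:/system/filesystem/zfs/auto-snapshot:frequent setprop zfs/keep = astring: "8"
# svcadm refresh svc:/system/filesystem/zfs/auto-snapshot:frequent
# svcadm restart svc:/system/filesystem/zfs/auto-snapshot:frequent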

Copyright 2009 Peter Baer Galvin - All Rights Reserved

99

Saturday, May 2, 2009

ZFS Automatic Snapshots (cont)
zfs/snapshot-children "true" if you would like to recursively take snapshots of all child filesystems of the specified fs-name. This value is ignored when setting zfs/fs-name='//'
zfs/backup [ full | incremental | none ]
zfs/backup-save-cmd The command string used to save the backup stream.
zfs/backup-lock You shouldn't need to change this - but it should be set to "unlocked" by default. We use it to indicate when a backup is running.
zfs/label A label that can be used to differentiate this set of snapshots from others, not required. If multiple schedules are running on the same machine, using distinct labels for each schedule is needed - otherwise one schedule could remove snapshots taken by another schedule according to its snapshot-retention policy. (see "zfs/keep")
zfs/verbose Set to false by default, setting to true makes the service produce more output about what it's doing.
zfs/avoidscrub Set to false by default, this determines whether we should avoid taking snapshots on any pools that have a scrub or resilver in progress. More info in the bugid: 6343667 need itinerary so interrupted scrub/resilver doesn't have to start over
Copyright 2009 Peter Baer Galvin - All Rights Reserved
100

Saturday, May 2, 2009

ZFS Automatic Snapshot (cont)

http://blogs.sun.com/erwann/resource/menu-location.png

Copyright 2009 Peter Baer Galvin - All Rights Reserved

101

Saturday, May 2, 2009

ZFS Automatic Snapshot (cont)

If life-preserver icon enabled in file browser, then backup of directory is available Press to bring up nav bar

Copyright 2009 Peter Baer Galvin - All Rights Reserved

102

Saturday, May 2, 2009

ZFS Automatic Snapshot (cont)
Drag the slider into the past to show previous versions of files in the directory Then right-click on a file and select "Restore to Desktop" if you want it back More features coming

Press to bring up nav bar
Copyright 2009 Peter Baer Galvin - All Rights Reserved
103

Saturday, May 2, 2009

ZFS Status
NetBackup and Legato support ZFS for backup / restore VCS supports ZFS as the file system for clustered services an app runs on Most vendors don't care which file system is used Performance as good as other file systems Feature set better
Copyright 2009 Peter Baer Galvin - All Rights Reserved
104

Saturday, May 2, 2009

ZFS Futures
Support by ISVs
Backup / restore
Some don’t get metadata (yet) Use zfs send to emit file containing filesystem

Clustering (see Lustre)

Performance still a work in progress Being ported to BSD, Mac OS Leopard Check out the ZFS FAQ at
http://www.opensolaris.org/os/community/zfs/faq/

Copyright 2009 Peter Baer Galvin - All Rights Reserved

105

Saturday, May 2, 2009

ZFS Performance
From http://www.opensolaris.org/jive/thread.jspa?messageID=14997

billm: On Thu, Nov 17, 2005 at 05:21:36AM -0800, Jim Lin wrote:
> Does ZFS reorganize (ie. defrag) the files over time?
Not yet.

> If it doesn't, it might not perform well in "write-little read-much"
> scenarios (where read performance is much more important than write
> performance).
As always, the correct answer is "it depends". Let's take a look at several cases:
- Random reads: No matter if the data was written randomly or sequentially, random reads are random for any filesystem, regardless of their layout policy. Not much you can do to optimize these, except have the best I/O scheduler possible.
Copyright 2009 Peter Baer Galvin - All Rights Reserved
106

Saturday, May 2, 2009

ZFS Performance (cont)
- Sequential writes, sequential reads: With ZFS, sequential writes lead to sequential layout on disk. So sequential reads will perform quite well in this case. - Random writes, sequential reads: This is the most interesting case. With random writes, ZFS turns them into sequential writes, which go *really* fast. With sequential reads, you know which order the reads are going to be coming in, so you can kick off a bunch of prefetch reads. Again, with a good I/O scheduler (which ZFS just happens to have), you can turn this into good read performance, if not entirely as good as totally sequential. Believe me, we've thought about this a lot. There is a lot we can do to improve performance, and we're just getting started.
Copyright 2009 Peter Baer Galvin - All Rights Reserved
107

Saturday, May 2, 2009

ZFS Performance (cont)
For DBs and other applications wanting direct disk access There is no direct I/O in ZFS But can get very good performance by matching the I/O size of the app (e.g. Oracle uses 8K) with the recordsize of the ZFS file system This is set at filesystem create time
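For example (pool and file system names are illustrative):
# zfs create -o recordsize=8k tank/oradata        # match the database's 8K I/O size
# zfs get recordsize tank/oradata                 # verify the setting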
Copyright 2009 Peter Baer Galvin - All Rights Reserved
108

Saturday, May 2, 2009

ZFS Performance (cont)
NFS does sync writes, so the ZIL can be a bottleneck on NFS servers
Put the ZIL on another disk, or on SSD
ZFS aggressively uses memory for caching
Low priority user, but can cause temporary conflicts with other users
Use arcstat to monitor memory use
http://www.solarisinternals.com/wiki/index.php/Arcstat
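A sketch of dedicating devices to the ZIL and to the L2ARC, assuming a pool version new enough to support log and cache vdevs (device names illustrative):
# zpool add tank log c3t0d0          # separate intent log, ideally a write-optimized SSD
# zpool add tank cache c4t0d0        # L2ARC device, ideally a read-optimized SSD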
Copyright 2009 Peter Baer Galvin - All Rights Reserved
109

Saturday, May 2, 2009

ZFS Backup Tool
Zetaback is a thin-agent based ZFS backup tool
Runs from a central host
Scans clients for new ZFS filesystems
Manages varying desired backup intervals (per host) for full backups and incremental backups
Maintains varying retention policies (per host)
Summarizes existing backups
Restores any host:fs backup at any point in time to any target host

https://labs.omniti.com/trac/zetaba
Copyright 2009 Peter Baer Galvin - All Rights Reserved
110

Saturday, May 2, 2009

zfs upgrade
On-disk format of ZFS changes over time Forward-upgradeable, but not backward compatible Watch out when attaching and detaching zpools Also 'zfs send' streams are not readable by older zfs versions
# zfs upgrade
This system is currently running ZFS filesystem version 2.

The following filesystems are out of date, and can be upgraded. After being
upgraded, these filesystems (and any 'zfs send' streams generated from
subsequent snapshots) will no longer be accessible by older software versions.

VER  FILESYSTEM
---  ------------
 1   datab
 1   datab/users
 1   datab/users/area51

Copyright 2009 Peter Baer Galvin - All Rights Reserved

111

Saturday, May 2, 2009

Automatic Snapshots and Backups

Unsupported services, may become supported
http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_10
http://blogs.sun.com/timf/entry/zfs_automatic_for_the_people

Copyright 2009 Peter Baer Galvin - All Rights Reserved

112

Saturday, May 2, 2009

ZFS - Smashing!

http://www.youtube.com/watch?v=CN6iDzesEs0&fmt=18
Copyright 2009 Peter Baer Galvin - All Rights Reserved
113

Saturday, May 2, 2009

Storage Odds and Ends
iostat -y shows performance info on multipathed devices
raidctl is the RAID configuration tool for multiple RAID controllers
fsstat is a file-system based stat command

# fsstat -F
[Output: one row per file system type (ufs, proc, nfs, zfs, lofs, tmpfs, mntfs, nfs3, nfs4, autofs) with counters for new files, name removes/changes, attribute gets/sets, lookups, readdirs, and read/write ops and bytes]
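Illustrative invocations (arguments are only examples):
# fsstat zfs nfs3 5        # ZFS and NFSv3 activity, reported every 5 seconds
# raidctl -l               # list RAID controllers and volumes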
114

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes
http://developers.sun.com/openstorage/articles/opensolaris_storage_server.html
Example 1: ZFS Filesystem
Objectives: Understand the purpose of the ZFS filesystem. Configure a ZFS pool and filesystem.
Requirements: A server (SPARC or x64 based) running the OpenSolaris OS. Configuration details from the running server.
Step 1: Identify your disks. Identify the storage available for adding to the ZFS pool using the format(1M) command. Your output will vary from that shown here:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t2d0
          /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@2,0
       1. c0t3d0
          /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@3,0
Specify disk (enter its number): ^D

Copyright 2009 Peter Baer Galvin - All Rights Reserved

115

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 2: Add your disks to your ZFS pool.

# zpool create -f mypool c0t3d0s0
# zpool list
NAME     SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
mypool   10G    94K    10.0G   0%    ONLINE   -

Step 3: Create a filesystem in your pool.

# zfs create mypool/myfs
# df -h /mypool/myfs
Filesystem    size   used   avail   capacity   Mounted on
mypool/myfs   9.8G   18K    9.8G    1%         /mypool/myfs

Copyright 2009 Peter Baer Galvin - All Rights Reserved

116

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 2: Network File System (NFS)
Objectives: Understand the purpose of the NFS filesystem. Create an NFS shared filesystem on a server and mount it on a client.
Requirements: Two servers (SPARC or x64 based) - one from the previous example - running the OpenSolaris OS. Configuration details from the running systems.
Step 1: Create the NFS shared filesystem on the server. Switch on the NFS service on the server:

# svcs nfs/server
STATE      STIME     FMRI
disabled   6:49:39   svc:/network/nfs/server:default

# svcadm enable nfs/server
Share the ZFS filesystem over NFS:

# zfs set sharenfs=on mypool/myfs
# dfshares
RESOURCE              SERVER   ACCESS   TRANSPORT
x4100:/mypool/myfs    x4100       -         -
117

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 2: Switch on the NFS service on the client. This is similar to the procedure for the server:

# svcs nfs/client
STATE      STIME     FMRI
disabled   6:47:03   svc:/network/nfs/client:default

# svcadm enable nfs/client
Mount the shared filesystem on the client:

# mkdir /mountpoint
# mount -F nfs x4100:/mypool/myfs /mountpoint
# df -h /mountpoint
Filesystem           size   used   avail   capacity   Mounted on
x4100:/mypool/myfs   9.8G   18K    9.8G    1%         /mountpoint

Copyright 2009 Peter Baer Galvin - All Rights Reserved

118

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 3: Common Internet File System (CIFS)
Objectives: Understand the purpose of the CIFS filesystem. Configure a CIFS share on one machine (from the previous example) and make it available on the other machine.
Requirements: Two servers (SPARC or x64 based) running the OpenSolaris OS. Configuration details provided here.
Step 1: Create a ZFS filesystem for CIFS.

# zfs create -o casesensitivity=mixed mypool/myfs2
# df -h /mypool/myfs2
Filesystem     size   used   avail   capacity   Mounted on
mypool/myfs2   9.8G   18K    9.8G    1%         /mypool/myfs2

Step 2: Switch on the SMB Server service on the server.

# svcs smb/server
STATE      STIME     FMRI
disabled   6:49:39   svc:/network/smb/server:default

# svcadm enable smb/server

Copyright 2009 Peter Baer Galvin - All Rights Reserved

119

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 3: Share the filesystem using CIFS.

# zfs set sharesmb=on mypool/myfs2
Verify using the following command:

# zfs get sharesmb mypool/myfs2
NAME           PROPERTY   VALUE   SOURCE
mypool/myfs2   sharesmb   on      local

Step 4: Verify the CIFS naming. Because we have not explicitly named the share, we can examine the default name assigned to it using the following command:

# sharemgr show -vp
default nfs=()
zfs
    zfs/mypool/myfs nfs=()
          /mypool/myfs
    zfs/mypool/myfs2 smb=()
          mypool_myfs2=/mypool/myfs2
Both the NFS share (/mypool/myfs) and the CIFS share (mypool_myfs2) are shown. Step 5: Edit the file /etc/pam.conf to support creation of an encrypted version of the user's password for CIFS. Add the following line to the end of the file:

other password required pam_smb_passwd.so.1 nowarn

Copyright 2009 Peter Baer Galvin - All Rights Reserved

120

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont

Step 6: Change the password using the passwd command.
# passwd username
New Password:
Re-enter new Password:
passwd: password successfully changed for root

Now repeat Steps 5 and 6 on the Solaris client. Step 7: Enable CIFS client services on the client node.
# svcs smb/client
STATE      STIME     FMRI
disabled   6:47:03   svc:/network/smb/client:default

# svcadm enable smb/client

Copyright 2009 Peter Baer Galvin - All Rights Reserved

121

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 8: Make a mount point on the client and mount the CIFS resource from the server. Mount the resource across the network and check it using the following command sequence:
# mkdir /mountpoint2
# mount -F smbfs //root@x4100/mypool_myfs2 /mountpoint2
Password: *******
# df -h /mountpoint2
Filesystem                  size   used   avail   capacity   Mounted on
//root@x4100/mypool_myfs2   9.8G   18K    9.8G    1%         /mountpoint2
# df -n
/              : ufs
/mountpoint    : nfs
/mountpoint2   : smbfs
Copyright 2009 Peter Baer Galvin - All Rights Reserved
122


Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Example 4: Comstar Fibre Channel Target

Objectives: Understand the purpose of the Comstar Fibre Channel target. Configure an FC target and initiator on two servers.
Requirements: Two servers (SPARC or x64 based) running the OpenSolaris OS. Configuration details provided here.
Step 1: Start the SCSI Target Mode Framework and verify it. Use the following commands to start up and check the service on the host that provides the target:

# svcs stmf
STATE      STIME      FMRI
disabled   19:15:25   svc:/system/device/stmf:default

# svcadm enable stmf
# stmfadm list-state
Operational Status: online
Config Status     : initialized

Copyright 2009 Peter Baer Galvin - All Rights Reserved

123

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 2: Ensure that the framework can see the ports. Use the following command to ensure that the target mode framework can see the HBA ports:
# stmfadm list-target -v
Target: wwn.210000E08B909221
    Operational Status: Online
    Provider Name     : qlt
    Alias             : qlt0,0
    Sessions          : 4
        Initiator: wwn.210100E08B272AB5
            Alias: ute198:qlc1
            Logged in since: Thu Mar 27 16:38:30 2008
        Initiator: wwn.210100E08B296A60
            Alias: ute198:qlc3
            Logged in since: Thu Mar 27 16:38:30 2008
        Initiator: wwn.210000E08B072AB5
            Alias: ute198:qlc0
            Logged in since: Thu Mar 27 16:38:30 2008
        Initiator: wwn.210000E08B096A60
            Alias: ute198:qlc2
            Logged in since: Thu Mar 27 16:38:30 2008

Copyright 2009 Peter Baer Galvin - All Rights Reserved

124

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Target: wwn.210100E08BB09221
    Operational Status: Online
    Provider Name     : qlt
    Alias             : qlt1,0
    Sessions          : 4
        Initiator: wwn.210100E08B272AB5
            Alias: ute198:qlc1
            Logged in since: Thu Mar 27 16:38:30 2008
        Initiator: wwn.210100E08B296A60
            Alias: ute198:qlc3
            Logged in since: Thu Mar 27 16:38:30 2008
        Initiator: wwn.210000E08B072AB5
            Alias: ute198:qlc0
            Logged in since: Thu Mar 27 16:38:30 2008
        Initiator: wwn.210000E08B096A60
            Alias: ute198:qlc2
            Logged in since: Thu Mar 27 16:38:30 2008

Copyright 2009 Peter Baer Galvin - All Rights Reserved

125

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 3: Create a device to use as storage for the target. Use ZFS to create a volume (zvol) for use as the storage behind the target:
# zpool list
NAME     SIZE   USED   AVAIL   CAP   HEALTH   ALTROOT
mypool   68G    94K    68.0G   0%    ONLINE   -

# zfs create -V 5gb mypool/myvol
# zfs list
NAME           USED    AVAIL   REFER   MOUNTPOINT
mypool         5.00G   61.9G   18K     /mypool
mypool/myvol   5G      66.9G   16K     -

Copyright 2009 Peter Baer Galvin - All Rights Reserved

126

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 4: Register the zvol with the framework. The zvol becomes the SCSI logical unit (disk) behind the target:
# sbdadm create-lu /dev/zvol/rdsk/mypool/myvol
Created the following LU:

              GUID                    DATA SIZE       SOURCE
--------------------------------  --------------  ----------------------------
6000ae4093000000000047f3a1930007  5368643584      /dev/zvol/rdsk/mypool/myvol

Confirm its existence as follows:
# stmfadm list-lu -v
LU Name: 6000AE4093000000000047F3A1930007
    Operational Status: Online
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/mypool/myvol
    View Entry Count  : 0
127

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 5: Find the initiator HBA ports to which to map the LUs. Discover HBA ports on the initiator host using the following command:
# fcinfo hba-port
HBA Port WWN: 25000003ba0ad303
        Port Mode: Initiator
        Port ID: 1
        OS Device Name: /dev/cfg/c5
        Manufacturer: QLogic Corp.
        Model: 2200
        Firmware Version: 2.1.145
        FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver
        Type: L-port
        State: online
        Supported Speeds: 1Gb
        Current Speed: 1Gb
        Node WWN: 24000003ba0ad303

Copyright 2009 Peter Baer Galvin - All Rights Reserved

128

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 5: Find the initiator HBA ports to which to map the LUs. Discover HBA ports on the initiator host using the following command:
# fcinfo hba-port
HBA Port WWN: 25000003ba0ad303
        Port Mode: Initiator
        Port ID: 1
        OS Device Name: /dev/cfg/c5
        Manufacturer: QLogic Corp.
        Model: 2200
        Firmware Version: 2.1.145
        FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver
        Type: L-port
        State: online
        Supported Speeds: 1Gb
        Current Speed: 1Gb
        Node WWN: 24000003ba0ad303
. . .

Copyright 2009 Peter Baer Galvin - All Rights Reserved

129

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 6: Create a host group and add the world-wide numbers (WWNs) of the initiator host HBA ports to it. Name the group mygroup:
# stmfadm create-hg mygroup
# stmfadm list-hg
Host Group: mygroup

Add the WWNs of the ports to the group:
# stmfadm add-hg-member -g mygroup wwn.210000E08B096A60 \
      wwn.210100E08B296A60 \
      wwn.210100E08B272AB5 \
      wwn.210000E08B072AB5

Now check that everything is in order:
# stmfadm list-hg-member -v -g mygroup

With the host group created, you're now ready to export the logical unit. This is accomplished by adding a view entry to the logical unit using this host group, as shown in the following command:
# stmfadm add-view -h mygroup 6000AE4093000000000047F3A1930007

Copyright 2009 Peter Baer Galvin - All Rights Reserved

130

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
Step 7: Check the visibility of the targets on the initiator host. First, force the devices on the initiator host to be rescanned with a simple script:
#!/bin/ksh
fcinfo hba-port | grep "^HBA" | awk '{print $4}' | while read ln
do
        fcinfo remote-port -p $ln -s >/dev/null 2>&1
done

The disk exported over FC should then appear in the format list:
# format
Searching for disks...done

c6t6000AE4093000000000047F3A1930007d0: configured with capacity of 5.00GB
Copyright 2009 Peter Baer Galvin - All Rights Reserved
131

Saturday, May 2, 2009

Build an OpenSolaris Storage Server in 10 Minutes - cont
...
partition> p
Current partition table (default):
Total disk cylinders available: 20477 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0 -   511      128.00MB    (512/0/0)      262144
  1       swap    wu     512 -  1023      128.00MB    (512/0/0)      262144
  2     backup    wu       0 - 20476        5.00GB    (20477/0/0)  10484224
  3 unassigned    wm       0                   0      (0/0/0)             0
  4 unassigned    wm       0                   0      (0/0/0)             0
  5 unassigned    wm       0                   0      (0/0/0)             0
  6        usr    wm    1024 - 20476        4.75GB    (19453/0/0)   9959936
  7 unassigned    wm       0                   0      (0/0/0)             0

partition>

Copyright 2009 Peter Baer Galvin - All Rights Reserved

132

Saturday, May 2, 2009

ZFS Root
Solaris 10 10/08 (aka S10U6) supports installation with ZFS as the root file system (as does OpenSolaris)
Note that you can't as of U6 flash archive a ZFS root system(!)
Can upgrade by using Live Upgrade (LU) to mirror to a second disk (ZFS pool) and upgrading there, then booting there
lucreate to copy the primary BE to create an alternate BE
# zpool create mpool mirror c1t0d0s0 c1t1d0s0
# lucreate -c c1t2d0s0 -n zfsBE -p mpool
The default file systems are created in the specified pool and the non-shared file systems are then copied into the root pool
Run luupgrade to upgrade the alternate BE (optional)
Run luactivate on the newly upgraded alternate BE so that when the system is rebooted, it will be the new primary BE
# luactivate zfsBE

Copyright 2009 Peter Baer Galvin - All Rights Reserved

133

Saturday, May 2, 2009

Life is good

Once on ZFS as root, life is good Mirror the root disk with 1 command (if not mirrored):

# zpool attach rpool c1t0d0s0 c1t1d0s0

Note that you have to manually do an installboot on the mirrored disk Now consider all the ZFS features, used on the boot disk Snapshot before patch, upgrade, any change Undo change via 1 command Replicate to another system for backup, DR ...
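A sketch of those steps on SPARC; the boot-environment dataset name (rpool/ROOT/zfsBE) and the remote host are illustrative:
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0
# zfs snapshot -r rpool@prepatch                  # snapshot the whole root pool before a change
# zfs rollback rpool/ROOT/zfsBE@prepatch          # undo the change on the boot environment
# zfs send rpool/ROOT/zfsBE@prepatch | ssh drhost zfs receive backup/zfsBE   # replicate for DR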
Copyright 2009 Peter Baer Galvin - All Rights Reserved
134

Saturday, May 2, 2009

ZFS Labs
What pools are available in your zone?
What are their states? What is their performance like?

What ZFS file systems?
Create a new file system
Create a file there
Take a snapshot of that file system
Delete the file
Revert to the file system state as of the snapshot
How do you see the contents of a snapshot?
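One possible walk-through of the lab, assuming a pool named tank and a file system tank/lab (names are yours to choose):
# zpool list; zpool status; zpool iostat -v 5     # pools, their states, their performance
# zfs list                                        # existing ZFS file systems
# zfs create tank/lab                             # new file system
# touch /tank/lab/afile                           # create a file
# zfs snapshot tank/lab@before                    # snapshot the file system
# rm /tank/lab/afile                              # delete the file
# zfs rollback tank/lab@before                    # revert to the snapshot
# ls /tank/lab/.zfs/snapshot/before               # browse a snapshot's contents directly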
Copyright 2009 Peter Baer Galvin - All Rights Reserved
135

Saturday, May 2, 2009

ZFS Final Thought
Eric Schrock's Weblog, Thursday Nov 17, 2005 - UFS/SVM vs. ZFS: Code Complexity

A lot of comparisons have been done, and will continue to be done, between ZFS and other filesystems. People tend to focus on performance, features, and CLI tools as they are easier to compare. I thought I'd take a moment to look at differences in the code complexity between UFS and ZFS. It is well known within the kernel group that UFS is about as brittle as code can get. 20 years of ongoing development, with feature after feature being bolted on tends to result in a rather complicated system. Even the smallest changes can have wide ranging effects, resulting in a huge amount of testing and inevitable panics and escalations. And while SVM is considerably newer, it is a huge beast with its own set of problems.

Since ZFS is both a volume manager and a filesystem, we can use this script written by Jeff to count the lines of source code in each component. Not a true measure of complexity, but a reasonable approximation to be sure. Running it on the latest version of the gate yields:

UFS:   kernel=  46806   user=  40147   total=  86953
SVM:   kernel=  75917   user= 161984   total= 237901
TOTAL: kernel= 122723   user= 202131   total= 324854
ZFS:   kernel=  50239   user=  21073   total=  71312

The numbers are rather astounding. Having written most of the ZFS CLI, I found the most horrifying number to be the 162,000 lines of userland code to support SVM. This is more than twice the size of all the ZFS code (kernel and user) put together! And in the end, ZFS is about 1/5th the size of UFS and SVM. I wonder what those ZFS numbers will look like in 20 years...

Copyright 2009 Peter Baer Galvin - All Rights Reserved

136

Saturday, May 2, 2009


Where to Learn More
Wikipedia: http://en.wikipedia.org/wiki/ZFS
Community: http://www.opensolaris.org/os/community/zfs
ZFS blogs: http://blogs.sun.com/main/tags/zfs
ZFS ports
    Apple Mac: http://developer.apple.com/adcnews
    FreeBSD: http://wiki.freebsd.org/ZFS
    Linux/FUSE: http://zfs-on-fuse.blogspot.com
    As an appliance: http://www.nexenta.com
Beginner's Guide to ZFS: http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp
Copyright 2009 Peter Baer Galvin - All Rights Reserved
138

Saturday, May 2, 2009

Sun Storage 7x10

Copyright 2009 Peter Baer Galvin - All Rights Reserved

139

Saturday, May 2, 2009

Speaking of Futures

The future of Sun storage? Announced 11/10/2008

Copyright 2009 Peter Baer Galvin - All Rights Reserved

140

Saturday, May 2, 2009

Most Scalable Storage System Design
• Hybrid Flash Storage Pools
  > Data is intelligently placed in DRAM, Flash or Disk
  > Transparently managed as one storage pool
  > Optimizes $/GB and $/IOP performance
  [Diagram: Read/L2ARC SSDs, Write/ZIL SSDs, HDD Pool (SATA)]

• Enterprise Grade Flash
> 3-5 year lifetime
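On a stock Solaris release that supports log and cache vdevs, the same hybrid layout can be sketched by hand; the pool name tank and the SSD device names are hypothetical. The write-optimized SSD becomes a separate ZFS intent log (ZIL) and the read-optimized SSD becomes an L2ARC cache device:

# zpool add tank log c2t0d0
# zpool add tank cache c2t1d0
# zpool status tank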

Copyright 2009 Peter Baer Galvin - All Rights Reserved


141

Saturday, May 2, 2009

Latency Comparison
Bridging the DRAM to HDD Gap
[Chart: access latency spans roughly 1 s down to 1 ns across tape, HDD, flash/SSD, DRAM, and CPU; flash/SSD bridges the gap between HDD and DRAM.]
Copyright 2009 Peter Baer Galvin - All Rights Reserved


142

Saturday, May 2, 2009

ZFS Hybrid Pool Example
Based on Actual Benchmark Results
[Chart: benchmark comparison of a hybrid storage pool (DRAM + read SSD + write SSD + 5x 4200 RPM SATA) against a traditional storage pool (DRAM + 7x 10K RPM 2.5” drives) across read IOPs, write IOPs, cost, storage power (watts), and raw capacity (TB); the figures shown on the slide are 3.2x, 2x, 4%, 11%, and 4.9x.]

Copyright 2009 Peter Baer Galvin - All Rights Reserved

143

Saturday, May 2, 2009

Full Complement of Storage Software
Included with the system at no additional cost
Data Protocols
• NFS v3 and v4
• CIFS
• iSCSI
• HTTP
• WebDAV
• FTP
• NDMP v4
• FC Target (Roadmap)
• InfiniBand (Roadmap)
• SNMP

Data Services
• Write Flash Acceleration
• Read Flash Acceleration
• RAID-Z DP (6)
• Mirroring
• Striping
• Active-active Clustering
• Remote Replication
• Antivirus Quarantine
• Snapshots (r/o, r/w, unlimited)
• Compression

Additional Data Management
• DTrace Analytics
• Self-healing system and data
• Simple out-of-the-box setup
• Secure Browser UI and CLI
• Advanced Networking
• NIS, LDAP, and AD
• Users, Roles
• Dashboard
• Alerts
• Phone Home
• Scripting
• Upgrade
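As a rough analogue on a plain Solaris/OpenSolaris host (the appliance drives these services from its browser UI), several of the protocols hang off ZFS properties; the dataset names here are hypothetical, and sharesmb/shareiscsi availability depends on the release:

# zfs set sharenfs=on tank/home
# zfs set sharesmb=on tank/home
# zfs create -V 10g tank/vol1
# zfs set shareiscsi=on tank/vol1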

Copyright 2009 Peter Baer Galvin - All Rights Reserved

144

Saturday, May 2, 2009

Copyright 2009 Peter Baer Galvin - All Rights Reserved


145

Saturday, May 2, 2009

Providing Unprecedented Storage Analytics
• Automatic real-time visualization of application and storage-related workloads
• Simple yet sophisticated instrumentation provides real-time, comprehensive analysis
• Supports multiple simultaneous application and workload analyses in real time
• Analyses can be saved, exported, and replayed for further analysis
• Built on DTrace instrumentation
> NFSv3, NFSv4, CIFS, iSCSI
> ZFS and the Solaris I/O path
> CPU and memory utilization
> Networking (TCP, UDP, IP)
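Analytics builds on the same DTrace facility available at the Solaris command line; as a flavor of that instrumentation, this standard one-liner counts disk I/O requests by the process that issued them:

# dtrace -n 'io:::start { @[execname] = count(); }'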
Copyright 2009 Peter Baer Galvin - All Rights Reserved

146

Saturday, May 2, 2009

ANSWERING KEY QUESTIONS
“What is CPU and memory utilization?”
“How much storage is being utilized?”
“How is the disk performing? How many ops/sec?”
“What services are active?”
“Which applications/users are causing performance issues?”
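On a plain Solaris 10 host the same questions can be approximated with standard tools: vmstat for CPU and memory, zpool list for capacity used, zpool iostat for per-device ops/sec, and fsstat for file system operation rates (the pool name tank is illustrative):

# vmstat 5
# zpool list
# zpool iostat -v tank 5
# fsstat zfs 5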
Copyright 2009 Peter Baer Galvin - All Rights Reserved

147

Saturday, May 2, 2009

Data Services
ZFS - Continued
"

• ZFS Useable Space
Double Parity RAID Double Parity RAID Wide Stripes

Market Leading Usable Space
Mirrored Single Parity RAID Striped

72%
Saturday, May 2, 2009

83%

42%

60%

90%
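As a rough back-of-the-envelope check (an assumption about typical stripe widths, not taken from the slide): an n-disk double-parity RAID-Z2 stripe yields about (n-2)/n of raw capacity, so a 6-wide stripe gives roughly 67% usable while a 12-wide stripe gives roughly 83%; two-way mirroring tops out at 50% before hot spares and space reservations are subtracted.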
148

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Sun Storage 7000 Unified Storage Systems
Price, Performance, Capacity and Availability
7410 Cluster: 288 x 3.5” SATAII disks, up to 287TB* total storage, Hybrid Storage Pool with read- and write-optimized SSD

7410: 288 x 3.5” SATAII disks, up to 287TB* total storage, Hybrid Storage Pool with read- and write-optimized SSD

7210: 48 x 3.5” SATAII disks, up to 46TB total storage, Hybrid Storage Pool with write-optimized SSD

7110: 16 x 2.5” SAS disks, 2.3TB, standard storage pool (no SSD)

*Up to 575TB soon after release

[Chart: the models are positioned along price versus capacity/performance axes, from the 7110 up to the 7410 Cluster.]
149

Copyright 2009 Peter Baer Galvin - All Rights Reserved

Saturday, May 2, 2009

References
You Are Now Free to Move About Solaris

Copyright 2009 Peter Baer Galvin - All Rights Reserved

150

Saturday, May 2, 2009

References
[Kozierok] TCP/IP Guide, No Starch Press, 2005
[Nemeth] Nemeth et al., Unix System Administration Handbook, 3rd Edition, Prentice Hall, 2001
[SunFlash] The SunFlash announcement mailing list run by John J. Mclaughlin. News and a whole lot more. Mail sunflash-info@sun.com
Sun online documents at docs.sun.com
[Kasper] Kasper and McClellan, Automating Solaris Installations, SunSoft Press, 1995

Copyright 2009 Peter Baer Galvin - All Rights Reserved

151

Saturday, May 2, 2009

References (continued)
[O’Reilly] Networking CD Bookshelf, Version 2.0, O’Reilly, 2002
[McDougall] Richard McDougall et al., Resource Management, Prentice Hall, 1999 (and other "Blueprint" books)
[Stern] Stern, Eisler, and Labiaga, Managing NFS and NIS, 2nd Edition, O’Reilly and Associates, 2001

Copyright 2009 Peter Baer Galvin - All Rights Reserved

152

Saturday, May 2, 2009

References (continued)
[Garfinkel and Spafford] Simson Garfinkel and Gene Spafford, Practical Unix & Internet Security, 3rd Edition, O’Reilly & Associates, Inc., 2003 (best overall Unix security book)
[McDougall, Mauro, Gregg] McDougall, Mauro, and Gregg, Solaris Internals and Solaris Performance and Tools, 2007 (great Solaris internals, DTrace, and mdb books)

Copyright 2009 Peter Baer Galvin - All Rights Reserved

153

Saturday, May 2, 2009

References (continued)
Subscribe to the Firewalls mailing list by sending "subscribe firewalls <mailing-address>" to Majordomo@GreatCircle.COM
USENIX membership and conferences: contact the USENIX office at (714) 588-8649 or office@usenix.org
Sun Support: Sun’s technical bulletins, plus access to the bug database: sunsolve.sun.com
Solaris 2 FAQ by Casper Dik: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/Solaris2/FAQ

Copyright 2009 Peter Baer Galvin - All Rights Reserved

154

Saturday, May 2, 2009

References (continued)

Sun Managers Mailing List FAQ by John DiMarco: ftp://ra.mcs.anl.gov/sun-managers/faq
Sun's unsupported tool site (IPv6, printing): http://playground.sun.com/
SunSolve STBs and Infodocs: http://www.sunsolve.com

Copyright 2009 Peter Baer Galvin - All Rights Reserved

155

Saturday, May 2, 2009

References (continued)

comp.sys.sun.* FAQ by Rob Montjoy: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/comp-sys-sun-faq

“Cache File System” White Paper from Sun: http://www.sun.com/sunsoft/Products/Solaris-whitepapers/Solariswhitepapers.html

“File System Organization, The Art of Automounting” by Sun: ftp://sunsite.unc.edu/pub/sun-info/white-papers/TheArtofAutomounting-1.4.ps

Solaris 2 Security FAQ by Peter Baer Galvin: http://www.sunworld.com/common/security-faq.html

Secure Unix Programming FAQ by Peter Baer Galvin: http://www.sunworld.com/swol-08-1998/swol-08-security.html

Copyright 2009 Peter Baer Galvin - All Rights Reserved
156

Saturday, May 2, 2009

References (continued)

Firewalls mailing list FAQ: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/firewalls-faq

A few Solaris-helping files are available via anonymous ftp at ftp://ftp.cs.toronto.edu/pub/darwin/solaris2

Peter’s Solaris Corner at SysAdmin Magazine: http://www.samag.com/solaris

Marcus and Stern, Blueprints for High Availability, Wiley, 2000

Privilege Bracketing in Solaris 10: http://www.sun.com/blueprints/0406/819-6320.pdf
Copyright 2009 Peter Baer Galvin - All Rights Reserved
157

Saturday, May 2, 2009

References (continued)
Peter Baer Galvin's Sysadmin column (and old Pete's Wicked World security columns, etc.): http://www.galvin.info

My blog: http://pbgalvin.wordpress.com

Operating Environments: Solaris 8 Operating Environment Installation and Boot Disk Layout by Richard Elling (March 2000): http://www.sun.com/blueprints

Sun’s BigAdmin web site, including Solaris and Solaris x86 tools and information: http://www.sun.com/bigadmin

Copyright 2009 Peter Baer Galvin - All Rights Reserved

158

Saturday, May 2, 2009

References (continued)
DTrace:
http://users.tpg.com.au/adsln4yb/dtrace.html
http://www.solarisinternals.com/si/dtrace/index.php
http://www.sun.com/bigadmin/content/dtrace/

Copyright 2009 Peter Baer Galvin - All Rights Reserved

159

Saturday, May 2, 2009
