
Booting up Solaris 10 from a SAN replicated LUN on a different Sun SPARC server
July 9, 2010 By Andrew Lin

The quickest way to recover from a total disaster is to have some sort of replication implemented. There are two different methods of real-time replication, hardware and software. My experiences with software replication, such as Symantec Veritas Volume Replicator for AIX, were not pleasing. It required constant maintenance and troubleshooting. Hardware replication is the best option if you can afford it. A lot of organizations pick software replication because it generally costs a lot less up front, but the cost of maintenance eventually adds up.

I will explain how to recover a Solaris 10 server from a hardware replicated SAN disk. It took me some time to figure out how to boot up from the replicated SAN LUN (disk), and many more hours to understand why the steps I applied work.

In this example I have a Sun SPARC M3000 server with two QLogic fiber channel cards (HBAs) installed in the PCI slots. The HBAs were already configured to connect to the SAN disk (LUN). This LUN contained the replicated copy of a production Solaris 10 server. The production server had two ZFS pools residing in a single LUN.

Using the Solaris 10 installation CD, boot the SPARC server into single user mode from the ok prompt.

    boot cdrom -s

The first thing you need to do is to see if the HBAs are working. The CONNECTED status indicates that communication between the server and the switch is working. There are two HBAs installed for redundancy, both connected to the same LUN.

    # luxadm -e port
    /devices/pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0:devctl    CONNECTED
    /devices/pci@1,700000/pci@0/pci@0/SUNW,qlc@0/fp@0,0:devctl    CONNECTED

Now you need to find out if the SAN disk is visible from the server. Even though both HBAs are connected to the same SAN disk, you will see two separate SAN disks in the results below. It just means there are two paths to the SAN.

    # luxadm probe
    No Network Array enclosures found in /dev/es

    Found Fibre Channel device(s):
      Node WWN:50060e80058c7b10  Device Type:Disk device
        Logical Path:/dev/rdsk/c1t50060E80058C7B10d1s2
      Node WWN:50060e80058c7b00  Device Type:Disk device
        Logical Path:/dev/rdsk/c2t50060E80058C7B00d1s2

In the above example the first LUN is c1t50060E80058C7B10d1s2. This is the logical device name, which is a symbolic link to the physical device name stored in the /devices directory. Logical device names contain the controller number (c1), target number (t50060E80058C7B10), disk number (d1), and slice number (s2).

The next step is to find out how the disk is partitioned. The format command will give you that information. You need this information to understand how to boot up the disk.

    # format
    Searching for disks...done

    AVAILABLE DISK SELECTIONS:

    0. c1t50060E80058C7B10d1 <... 1066>
       /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1
    1. c2t50060E80058C7B00d1 <... 1066>
       /pci@1,700000/pci@0/pci@0/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b00,1

Select the first disk, 0.

    Specify disk (enter its number): 0
    selecting c1t50060E80058C7B10d1
    [disk formatted]

    FORMAT MENU:
            disk       - select a disk
            type       - select (define) a disk type
            partition  - select (define) a partition table
            current    - describe the current disk
            format     - format and analyze the disk
            repair     - repair a defective sector
            label      - write label to the disk
            analyze    - surface analysis
            defect     - defect list management
            backup     - search for backup labels
            verify     - read and display labels
            save       - save new disk/partition definitions
            inquiry    - show vendor, product and revision
            volname    - set 8-character volume name
            !<cmd>     - execute <cmd>, then return
            quit

Display the labels and slices (partitions). In the below example you can tell that the disk is labeled as VTOC (Volume Table of Contents) because you can see the cylinders. VTOC is also known as SMI label. If the disk was labeled with EFI (Extensible Firmware Interface), then you would see sectors instead of cylinders. Please note that you cannot boot from a disk with an EFI label.

    format> verify

    Primary label contents:

    Volume name = <        >
    ascii name  = <...>
    pcyl        = 65535
    ncyl        = 65533
    acyl        =     2
    nhead       =    15
    nsect       =  1066
    Part      Tag    Flag     Cylinders         Size            Blocks
      0       root    wm       0 -  3356       25.60GB    (3357/0/0)    53678430
      1 unassigned    wm       0                0         (0/0/0)              0
      2     backup    wm       0 - 65532      499.66GB    (65533/0/0) 1047872670
      3 unassigned    wm       0                0         (0/0/0)              0
      4 unassigned    wm       0                0         (0/0/0)              0
      5 unassigned    wm       0                0         (0/0/0)              0
      6 unassigned    wm       0                0         (0/0/0)              0
      7 unassigned    wm    3357 - 65532      474.07GB    (62176/0/0)  994194240

Slice 2 is the entire physical disk because it contains all cylinders, 0 - 65532. Partition 0 (slice 0) holds the operating system files; it is the boot slice. In Solaris each slice is treated as a separate physical disk.
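
The Blocks and Size columns in the verify output follow from the disk geometry: a slice spanning N cylinders holds N x nhead x nsect blocks of 512 bytes. The sketch below reproduces that arithmetic for the values in this article. The function names slice_blocks and blocks_to_gb are made up for illustration, and the 512-byte block size is an assumption that matches the sizes printed above.

```shell
#!/bin/sh
# Geometry as reported by "format> verify" in this article.
NHEAD=15
NSECT=1066

# slice_blocks FIRST_CYL LAST_CYL
# Print the number of blocks in a slice covering the given cylinder range.
slice_blocks() {
    echo $(( ($2 - $1 + 1) * NHEAD * NSECT ))
}

# blocks_to_gb BLOCKS
# Print the size in GB (2^30 bytes) with two decimals, assuming 512-byte blocks.
blocks_to_gb() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b * 512 / (1024 * 1024 * 1024) }'
}

# Slice 0 (root) covers cylinders 0 - 3356.
slice_blocks 0 3356
blocks_to_gb "$(slice_blocks 0 3356)"
```

For slice 0 this prints 53678430 and 25.60, which agrees with the table. Running it with 3357 65532 gives the figures for slice 7.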

Here is what we know so far: the disk name is c1t50060E80058C7B10d1, and slice 0 on this disk contains the boot files. The physical path for this disk is /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1. Now we need to find out the physical path for slice 0.

I know that the disk contains ZFS filesystems because it is a replica of the production disk. When a ZFS filesystem is moved to a different SPARC server it must first be imported because the hostid is different.

List the ZFS pools contained on the disk using the zpool import command. There are two ZFS pools in the below example, epool and rpool. Take a note of the status and action.

    # zpool import
      pool: epool
        id: 16865366839830765202
     state: ONLINE
    status: The pool was last accessed by another system.
    action: The pool can be imported using its name or numeric identifier and the -f flag.
       see: http://www.sun.
    config:
            epool                      ONLINE
              c2t50060E80058C7B00d1s7  ONLINE

      pool: rpool
        id: 10594898920105832331
     state: ONLINE
    status: The pool was last accessed by another system.
    action: The pool can be imported using its name or numeric identifier and the -f flag.
       see: http://www.sun.
    config:
            rpool                      ONLINE
              c2t50060E80058C7B00d1s0  ONLINE

Import the pools with the zpool import command. The -a option will import all the ZFS pools it can find, and the -f option will force the import. If you do not specify the force option, then the import may fail with the error "cannot import rpool : pool may be in use from other system, it was last accessed by server name (hostid: 123456)". Ignore the error message about failed to create mountpoint.

    # zpool import -af
    cannot mount /epool : failed to create mountpoint
    cannot mount /rpool : failed to create mountpoint

List the imported ZFS pools.

    # zpool list
    NAME    SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
    epool   472G   260G   212G  55%  ONLINE  -
    rpool  25.5G  5.75G  19.8G  22%  ONLINE  -

List the ZFS filesystems. In the example below notice that the mountpoint / is set on the ZFS filesystem rpool/ROOT/zfsboot. This is the boot partition, and it resides in rpool.

    # zfs list
    NAME                     USED  AVAIL  REFER  MOUNTPOINT
    epool                    260G   205G    21K  /epool
    rpool                   7.74G  17.4G    99K  /rpool
    rpool/ROOT              4.74G  17.4G    21K  legacy
    rpool/ROOT/zfsboot      4.74G  17.4G  4.48G  /
    rpool/ROOT/zfsboot/var   139M  17.4G   105M  /var
    rpool/dump              1.00G  17.4G  1.00G  -
    rpool/swap              2.00G  19.4G    16K  -

Change the mountpoint for rpool/ROOT/zfsboot to /mnt so you can mount it to read the contents.

    # zfs set mountpoint=/mnt rpool/ROOT/zfsboot

Confirm that the mountpoint was changed.

    # zfs list
    NAME                     USED  AVAIL  REFER  MOUNTPOINT
    epool                    260G   205G    21K  /epool
    rpool                   7.74G  17.4G    99K  /rpool
    rpool/ROOT              4.74G  17.4G    21K  legacy
    rpool/ROOT/zfsboot      4.74G  17.4G  4.48G  /mnt
    rpool/ROOT/zfsboot/var   139M  17.4G   105M  /mnt/var
    rpool/dump              1.00G  17.4G  1.00G  -
    rpool/swap              2.00G  19.4G    16K  -

Now mount rpool/ROOT/zfsboot.

    # zfs mount rpool/ROOT/zfsboot

List the logical disks.

    # cd /dev/dsk
    # ls
    c1t50060E80058C7B10d1s0  c2t50060E80058C7B00d1s0
    c1t50060E80058C7B10d1s1  c2t50060E80058C7B00d1s1
    c1t50060E80058C7B10d1s2  c2t50060E80058C7B00d1s2
    c1t50060E80058C7B10d1s3  c2t50060E80058C7B00d1s3
    c1t50060E80058C7B10d1s4  c2t50060E80058C7B00d1s4
    c1t50060E80058C7B10d1s5  c2t50060E80058C7B00d1s5
    c1t50060E80058C7B10d1s6  c2t50060E80058C7B00d1s6
    c1t50060E80058C7B10d1s7  c2t50060E80058C7B00d1s7

As stated earlier, the physical path for the disk we are looking for is /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1. We can derive from the physical path that the disk name is 50060e80058c7b10. We also know from the output of the format command that the physical disk 50060e80058c7b10 maps to the logical disk c1t50060E80058C7B10d1. The boot slice is 0. Therefore we can derive that the logical boot disk is c1t50060E80058C7B10d1s0.

Now find out what physical path c1t50060E80058C7B10d1s0 is a symbolic link for, and that is your complete boot path. In the below example the boot path starts at the first slash (/) right after /devices. It is /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1:a. You need to replace ssd@ with disk@ when entering the path into EEPROM.

    # ls -l c1t50060E80058C7B10d1s0
    lrwxrwxrwx 1 root root 82 Jul 5 10:21 c1t50060E80058C7B10d1s0 -> ../../devices/pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1:a
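
The two text manipulations described here (reading the WWN out of the physical path, and turning the symbolic link target into an EEPROM boot path) can be sketched with sed. This is only an illustration of the string handling, not part of the recovery procedure itself; the function names wwn_of and obp_boot_path are hypothetical, and the sketch assumes the link target has the ../../devices/...ssd@w<WWN>,<lun>:<slice> shape shown in this article.

```shell
#!/bin/sh
# wwn_of PATH
# Print the target WWN embedded in an "ssd@w<WWN>,<lun>" path component.
wwn_of() {
    printf '%s\n' "$1" | sed -n 's|.*/ssd@w\([0-9a-fA-F]*\),.*|\1|p'
}

# obp_boot_path LINK_TARGET
# Turn the target of a /dev/dsk symbolic link into a boot path:
#   1. drop everything up to and including "/devices"
#   2. replace the "ssd@" node name with "disk@"
obp_boot_path() {
    printf '%s\n' "$1" | sed -e 's|^.*/devices/|/|' -e 's|/ssd@|/disk@|'
}

# Link target copied from the "ls -l c1t50060E80058C7B10d1s0" output.
target='../../devices/pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/ssd@w50060e80058c7b10,1:a'

wwn_of "$target"
obp_boot_path "$target"
```

With the link target from this example, the first call prints 50060e80058c7b10 and the second prints /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/disk@w50060e80058c7b10,1:a, which is the value to give to boot-device.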

If you have more than one ZFS pool, the non-root pool may not get mounted upon booting up the server. You may get the below error.

    SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
    EVENT-TIME: Mon Jul 5 11:54:14 EDT 2010
    PLATFORM: SUNW,SPARC-Enterprise, CSN: PX654321, HOSTNAME: Andrew-Lin
    SOURCE: zfs-diagnosis, REV: 1.0
    EVENT-ID: 33e5a9f1-49ac-6ebc-f2a9-dff25dea6b86
    DESC: A ZFS device failed. Refer to http://sun. for more information.
    AUTO-RESPONSE: No automated response will occur.
    IMPACT: Fault tolerance of the pool may be compromised.
    REC-ACTION: Run 'zpool status -x' and replace the bad device.

    {root}: zpool status -x
      pool: epool
     state: UNAVAIL
    status: One or more devices could not be opened. There are insufficient
            replicas for the pool to continue functioning.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://www.sun.
     scrub: none requested
    config:
            NAME                                       STATE    READ WRITE CKSUM
            epool                                      UNAVAIL     0     0     0  insufficient replicas
              c3t60060E8005652C000000652C00002100d0s7  UNAVAIL     0     0     0  cannot open

The above error is caused by the zpool.cache file. The default behavior of Solaris 10 is to read the disk paths from the zpool.cache file to speed up the boot sequence. This file contains the old paths of the disks from the previous server. You should remove this file, and the system will recreate a fresh one during the boot up sequence. Below are the steps to rename the zpool.cache file.

    # cd /mnt/etc/zfs
    # ls
    zpool.cache
    # mv zpool.cache zpool.cache.old

Now you need to reverse the changes you applied to the mountpoint earlier. Make sure that you change directory out of /mnt to /, as in this example, otherwise the set mountpoint command will fail with the error device busy. Ignore the "cannot mount / : directory is not empty" message.

    # cd /
    # zfs set mountpoint=/ rpool/ROOT/zfsboot
    cannot mount / : directory is not empty
    property may be set but unable to remount filesystem

Confirm that the mount points were changed.

    # zfs list
    NAME                     USED  AVAIL  REFER  MOUNTPOINT
    epool                    260G   205G    21K  /epool
    rpool                   7.74G  17.4G    99K  /rpool
    rpool/ROOT              4.74G  17.4G    21K  legacy
    rpool/ROOT/zfsboot      4.74G  17.4G  4.48G  /
    rpool/ROOT/zfsboot/var   139M  17.4G   105M  /var
    rpool/dump              1.00G  17.4G  1.00G  -
    rpool/swap              2.00G  19.4G    16K  -

Shut down the server.

    # init 0

Now set the boot device in EEPROM.

    {0} ok setenv boot-device /pci@0,600000/pci@0/pci@8/SUNW,qlc@0/fp@0,0/disk@w50060e80058c7b10,1:a

The server is ready to be booted with the boot command.

    {0} ok boot