T.O.I. Handbook
07/24/06
Disclaimer
This handbook is best used as a tool to get you in the right frame of
mind (product-wise) when preparing to go on a call.
toi.handbook@east.sun.com
http://webhome.east/boston/toi.html
Table of contents
Desktop configurations: ........................................................................................................ 1
Firmware revision number: ................................................................................................... 1
OBP Escape hatches ........................................................................................................... 1
nvalias, NVRAMRC ........................................................................................................... 2
reset Host ID ..................................................................................................................... 2
Boot sequence ................................................................................................................... 2
Run Levels ........................................................................................................................ 2
Restore Boot Block ........................................................................................................... 2
E1000/2000 info ............................................................................................................... 3
E series info ..................................................................................................................... 4
OBP commands ................................................................................................................ 5
OBP device path breakdown ............................................................................................. 6
Device tree listing - desktop ............................................................................................... 6
E- 450 information ............................................................................................................. 7
E- 10000 information ......................................................................................................... 8
Blacklist ............................................................................................................................. 10
System Bd power procedure ............................................................................................ 10
E 10k component numbering ............................................................................................... 11
Scsi Array Model 100 ........................................................................................................ 12
Model 200 Array ............................................................................................................... 13
ssaadm commands ............................................................................................................. 13
Replace WWN on SSA ................................................................................................... 14
A1000 Array ..................................................................................................................... 14
D1000 Array .................................................................................................................... 14
RSM Disk Tray ................................................................................................................ 15
A3000/3500 Array ........................................................................................................... 16
A5000 Array .................................................................................................................... 16
luxadm commands ............................................................................................................. 17
Disk replacement in Veritas ................................................................................................ 18
A5000 min configuration ................................................................................................. 18
A5000 addressing ............................................................................................................... 19
A5000 Target assignments ................................................................................................ 19
RDAC ................................................................................................................................ 19
Raid Overview .................................................................................................................... 19
Raid Levels ....................................................................................................................... 20
Boot process ....................................................................................................................... 20
Diagnostic commands .......................................................................................................... 21
Diagnostic Files ................................................................................................................... 22
Watchdog resets .................................................................................................................. 23
What to look for on a watchdog reset ................................................................................. 24
Dump Analysis ..................................................................................................................... 25
adb commands ..................................................................................................................... 26
crash commands ................................................................................................................... 27
kadb ..................................................................................................................................... 27
Sunsolve ............................................................................................................................... 27
SunVTS ............................................................................................................................... 28
STORtools ........................................................................................................................... 29
Explorer Scripts .................................................................................................................... 30
Performance Analysis tools ................................................................................................. 31
Backup .............................................................................................................................. 32
ufsdump .............................................................................................................................. 32
ufsrestore ........................................................................................................................... 32
tar ...................................................................................................................................... 33
cpio ................................................................................................................................... 33
dd .................................................................................................................................... 33
How to get a core dump on a 2.x server ............................................................................ 34
Dump device bad when saving core on encapsulated root ................................................ 36
Uncompressing Files ........................................................................................................ 39
T300 (purple) ...................................................................................................................... 40
ACT (A Coredump Tool) ................................................................................................... 44
Advantages of Splitting a Drive into Multiple File Systems ............................................... 46
How to configure a system to run on a network ................................................................... 48
SEVM - How to recover a primary boot disk ..................................................................... 49
Disable DMP .................................................................................................................... 51
Memory Scrubber ............................................................................................................... 52
Display remote App GUI locally.......................................................................................... 52
Cluster 2.x .......................................................................................................................... 53
Encapsulating root after using Environmental CD to load O/S .......................................... 56
Adding a second network interface ...................................................................................... 56
Adding a default gateway ..................................................................................................... 56
Volume Manager (general info) ........................................................................................... 57
FTPing to and from sunsolve ............................................................................................ 60
Serengeti 3800, 4800, 6800 ............................................................................................... 61
mounting CDROM without vold ........................................................................................ 67
mailx: send files/messages ............................................................................................. 67
StarCat 15k notes ................................................................................................. 68
local-mac-address .................................................................................................. 73
SDS- How to mirror root .............................................................................................. 73
IPMP .................................................................................................. 75
T3B or T3+ Firmware Rev 2.1 New Functions: ............................................................... 76
Hitachi StorEdge 99X0 Arrays: ...................................................................................... 77
SunFire forgotten password ........................................................................................... 78
StorEdge Network FC Switch ....................................................................................... 79
Hitachi 9900v notes ..................................................................................................... 81
Minnow 3300 Array .................................................................................................... 84
Tuning ecache scrubber scan rate ..................................................................................... 86
VxWorks (serengeti) ........................................................................................................ 86
LVD adapter information ................................................................................................. 87
Replacing a nordica bd in a 15K SC .............................................................................. 87
Serengeti/15k DR boards ............................................................................................. 87
Clean up non-root disk “controller” numbers .................................................................... 88
Starcat Portid cheat sheet ................................................................................................. 88
Starcat SC clean slate ..................................................................................................... 89
Starcat redx info ............................................................................................................ 89
StorADE ................................................................................................................... 90
Get FRU info from serengeti .......................................................................................... 90
Swap ...................................................................................................................... 91
Maserati Notes- StorEdge 6320 and 6120 ....................................................................... 92
Flash Archive interactive install ..................................................................................... 93
UltraSPARC III CPU Diagnostic Monitor (CDM) ......................................................... 94
SunFire Service Mode Password Generator ................................................................... 94
V440 ALOM, raidctl .................................................................................................... 94
Finding Solaris release and distribution loaded .............................................................. 95
Find local NIS servers ................................................................................................... 95
Network troubleshooting command, files, daemons ..................................................... 96
How to find your way around a B1600 ................................................................ 97
Cluster 3.x ........................................................................................... 103
SMSupgrade 1.4.1 info ........................................................................................... 106
Solaris 9 SVM (sds) disk replacement ............................................................................ 107
SC rebuild after total disk failure ............................................................................ 108
15K DR / hpost examples .......................................................................................... 109
smsbackup: manually check a backup file: ..................................................................... 110
3310/3510 Disk replacement: ........................................................................................... 110
How to mount a CD image file (.iso) as a filesystem: ....................................................... 110
Removing the top cover on a V20z .................................................................................. 111
Explorer -w scextended with cron .................................................................................. 111
Useful COD commands ..................................................................................................... 111
ALOM4v Ontario/Erie (Niagara) ........................................................................................... 111
Forgotten password (ALOM4v) .......................................................................................... 113
Solaris to Linux cross reference .......................................................................................... 113
SSH information ............................................................................................................... 114
Galaxy ILOM info ............................................................................................................. 115
SSH with SMS 1.5 ............................................................................................................. 115
Desktop Configurations
L1-a (stop-a) (Ctrl-Break)* To stop a process in OBP or to bring a system down in Solaris (not recommended)
L1-f (stop-f) Enters command mode on ttya before probing H/W; use 'fexit' to continue with the initialization
sequence.
L1-d (stop-d) Sets the diag-switch? parameter to true. Enables verbose output during POST.
L1-n (stop-n) Resets NVRAM contents to defaults. (not recommended; see 'nvrecover')
L1 (stop) Runs POST in INIT mode (does not depend on security mode)
1 ok show-disks
2 select a disk controller a,b,c
3 ok nvalias (alias name) (ctl-y) ..... control-y is the yank command, and will give you the path you
selected in the show-disks command.
You have to type sd@n,n for Sbus or disk@n for PCI at the end.
Page 1
To recover NVRAMRC and printenv variables:
ok nvrecover (ctl-c)
ok nvstore
ok 17 0 mkp (return)
ok banner
Boot Sequence
1 Beep (keyboard)
2 Led's blink, screen goes blank, (POST)
3 Banner
4 Testing memory (selftest#mem)
5 Boot (auto-boot?)
6 diag-switch?
7 prom loads boot block (UFS reader)
page 2
Deskside server
Key switch
-standby no power
-on normal
-diag verbose post, on board, master bd (1000,2000)
-secure prevents a (stop-a) and disables reset switch
*auto master- if you replace any CPU/Mem cards put new card in slot 0
1 ok print-nvram-stat
If you need to change a CPU board, you do not need to do anything with the NVRAM. There is a copy
on the control board and it will be automatically transferred. If you need to change a control board you
must use the procedure in the FE handbook (pg. cpu81) to invalidate the contents of NVRAM on the
new control board.
Page 3
Ultra Enterprise 3000 Information
2 power supplies
6 cpus
I/O board w/sbus, internal scsi adapter
clock board, clock, voltage monitor, reset, console (keep firmware)
CPU boards
Clock boards
501-2975 83mhz
501-4286 83mhz
501-4946 83-90-100 (x500 servers)
501-5365 83-90-100 (x500 servers, shipped with the E6500)
page 4
OBP commands:
banner a brief description of the system: MAC address, firmware level, host ID
boot -v will verbose boot the system from defaults set in printenv list and devalias file.
boot -a will boot without the use of /etc/system file (interactive boot)
boot -s will boot in single user mode
boot (alias) will boot the server from the specified alias in the devalias file
cd / will put you in a directory hierarchy for listing hardware paths. 'device-end' gets
you out of this mode
devalias shows you a listing of your device aliases
limit-ecache-size will allow you to boot a 400MHz 8MB-ecache processor on an OS 2.5.1 or 2.6 CD;
Solaris 7 works fine. Jumbo patch 105181-14 for 2.6 or 103640-27 for 2.5.1
nvalias is used to create an alias
page 5
Command to reset the line in the environment to defaults:
You might also be able to switch the Sbus-probe-list order to change the C# in c#t#d#s#.
/sbus@7,0/SUNW,fas@3,8800000/sd@1,0
      |            |            |
      |            |            target#,lun#
      |            sbus slot#
      board# (convert hex to decimal, divide by 2, round down)
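The breakdown above can be scripted. A minimal sketch, using a made-up path (a real one would come from format or prtconf on the system in question):

```shell
# Sketch: decode board#, sbus slot and scsi target from an Enterprise device path.
path='/sbus@7,0/SUNW,fas@3,8800000/sd@1,0'
h=`echo "$path" | sed -n 's|^/sbus@\([0-9a-f]*\),.*|\1|p'`   # sbus node address (hex)
dec=$((0x$h))                                                # convert to decimal
board=`expr $dec / 2`                                        # divide by 2, round down
slot=`echo "$path" | sed -n 's|.*fas@\([0-9a-f]*\),.*|\1|p'`
target=`echo "$path" | sed -n 's|.*/sd@\([0-9]*\),.*|\1|p'`
echo "board=$board slot=$slot target=$target"
```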
ultra 2
/upa/sbus/hme path to on-board network
/fas@e path to on-board scsi devices
ultra 5,10
/upa/pci@1f/apb/pci@1,0 path to pci slots 1-3
/upa/pci@1f/apb/pci@1,1/ide@3 path to cdrom and disk
/network@1,1 path to on-board network
/m64b path to on-board graphics adapter
/ebus@1 path to system devices
ultra 30
/upa/pci@1f,2000 path to pci slots 1(33/66mhz) - 4 (33mhz)
page 6
/upa/pci@1f,4000/scsi@3 path to on-board scsi devices
/network@1,1 path to on-board network (hme)
/ebus@1 path to system devices
ultra 60
/upa/pci@1f,2000 path to pci slots 1(33/66mhz) -4 (33mhz)
/upa/pci@1f,4000/scsi@3 path to internal scsi devices
/scsi@3,1 path to external scsi devices
/network@1,1 path to network (hme)
/ebus@1 path to system devices
Ultra 450
and
Ultra Enterprise 450
ok setenv disk_led_assoc add a pci adapter to printenv list to get entries into prtconf so you
can do the following procedure:
1. To find a drive path on an ultra 450, get the path '/pci@6,40001# - - - - - - - - - - /sd@0,0
from the format command.
2. Change the 'sd' to 'disk' and '0,0' to 0
3. #prtconf -vp | grep 'c#t#d#. . . . . . . . . . . . . /disk@#
4. results will be the slot# and the disk# will tell you the drive.
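Steps 1-2 amount to a small text substitution. A sketch using a hypothetical path (the real one comes from the format command on the host):

```shell
# Sketch: turn the format(1m) sd@ path into the disk@ form reported by prtconf -vp.
p='/pci@6,4000/scsi@4,1/sd@2,0'
d=`echo "$p" | sed 's|/sd@\([0-9a-f]*\),0$|/disk@\1|'`
echo "$d"    # this is the string to grep for in prtconf -vp output
```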
Device tree listing: FE Handbook 1, cpu-126 and cpu-128
mfg-options is an NVRAM variable (a decimal value) that sets up the system as a workstation or a server.
The UE 450 is currently not offered as a workstation.
ok setenv mfg-options 0 (workstation default) Ultra 450
ok setenv mfg-options 49 (server default) Ultra Enterprise 450
upa-port-skip-list is a NVRAM variable used to skip probing of upa ports, following upa ports are used:
page 7
obdiag is a command you can run for prom based diagnostics
pci0-probe-list is an NVRAM variable used to control the probe order for onboard PCI devices (/pci@1f,4000)
pci-slot-skip-list is an NVRAM variable used to skip probing of PCI devices plugged into the backpanel slots
memory-interleave is an NVRAM variable that controls how OBP sets memory interleaving
env-monitor is an NVRAM variable that determines how OBP responds to environmental monitoring via the I2C
serial bus.
/associations The associations tree node contains entries representing categories of associations or connections
between system components that are dispersed in the device tree.
ex: ok cd /associations/slot2dev
ok .properties
ok cd /associations/slot2led
ok .properties
ok cd /associations/slot2disk
ok .properties
E10000
page 8
Recreate a domain that previously existed (domain_history file)
ssp:domain% domain_create -d domain
domain_switch will change the domain your ssp window is connected to.
domain_history Displays the contents of domain_history file (contains removed domain info)
autoconfig Must be run when adding a new revision of a board to the system
May also be required when moving a board to a new slot
Not required if all boards are the same revision level
(Do not run on a system board that is running the OS, or on the
centerplane when any domain is running the OS)
bringup boot the domain ex: # bringup -A off -l32 will bring system to the <ok> prompt
and run hpost at level 32 (7-128)
ex: # bringup will bring up system (autoboot)
Redlist:
$SSPVAR/etc/platform_name/redlist is an ASCII file that enables the system administrator
or root to restrict, from the SSP, the configuration of the host system. It lists components
that POST cannot touch, and whose state POST cannot change. Redlisted components are
also considered effectively blacklisted. Never use redlisting if blacklisting will do.
1. Have the customer bring down all jobs on the domain in question.
Next, they need to either use the shutdown command or use the init0 command
to bring the system to the <ok> prompt.
2. After this has been done, go to the ssp login window. Login as ssp and (ssp password)
3. At the SUNW_HOSTNAME prompt, enter either the platform name or the name of the
existing domain
4. Issue the 'domain_status' command , this will list all the domains and system boards
associated with each domain.
5. Issue the 'domain_switch (domain name)' command , to get to the proper domain.
6. Use the 'power -off -sb #' (#= system board #) command , to power off the system board to
be removed. MAKE SURE THE YELLOW LEDS ARE OFF BEFORE REMOVING BOARD.
7. After completing the work on the system board and the board has been reinstalled, use the
'power -on -sb #' (#=system board#) command, to return the power to the system board.
8. Next use the 'bringup' command to autoboot or the 'bringup -A off' to stop at the <ok>
prompt.
Page 10
Component Numbering
Processors
  Component        Solaris                       Hostview    POST
  System Board     0 - 15                        SB 0 - 15   sysbd0 - sysbd15
  Proc. Mod. 0-3   /SUNW,UltraSPARC@0,0 (00-63,  00 - 63     proc0.0 - proc15.3
                   proc. in hex, 0 - 3f)                     (sysbd#.proc#)
I/O (SBus)
Memory
  Component             POST
  System board memory   mem x.0  (system bd#.bank#)
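The POST processor names above can be derived from the Solaris processor id, assuming ids run sysbd*4 + proc as the table suggests:

```shell
# Sketch: map a Solaris processor id (0-63) to POST's sysbd#.proc# name.
cpu=63
name="proc$((cpu / 4)).$((cpu % 4))"
echo "$name"
```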
SSP: (notes)
/etc/netmasks should be: 10.0.0.0 255.255.255.0 (for private net or cb1 will not come up)
share cdrom to load VTS share -F nfs -o ro,anon=0 /cdrom/cdrom0/s0
3.4 commands:
showfailover: Shows you the failover status
showdatasync Shows you the datasync status (from main to spare)
setfailover on enables failover
force forces a failover to spare
off disables failover to spare
setdatasync backup backup files to spare
ssp_backup creates a ssp_backup.cpio file ex: # ssp_backup /var/tmp
ssp_restore restores ssp_backup.cpio file ex: # ssp_restore /var/tmp/ssp_backup.cpio
ssp_config float lets you change the hostname for the floating hostname (name should be in the hosts
files of both SSPs and also in /etc/ssphostname on the domains)
Page 11
SCSI Array
MODEL 100
POST Located in the top left corner. (circle with line at 12:00) indicates post is running
Service Under POST icon (wrench). Service is needed, always displayed with another icon
Controller Located to the right of service icon (looks like a se scsi icon). indicates a controller
problem
Alpha-     POST: test codes and status value of the failing test are flashed continuously.
numerics   Normal operation: four least significant digits of the World Wide Number.
           Controller errors: panic code is flashed continuously, and the controller icon is on.
Fan Fan failure or heat problem
Battery Fast write cache Low NVRAM battery voltage, battery should be replaced.
Drive a small solid rectangle represents an available drive
fibre Fiber optic link state. Two link icons A and B. Switched on when link is
established.
POST codes
Model number breakdown (e.g. Model 112): '11' = 100/110MHz, last digit = size of drives.
Layout:
        __________POWER SUPPLY___________
  F    | d0       | d0       | d0       |
  A    | d1       | d1       | d1       |
  N    | d2       | d2       | d2       |
       | d3   T0  | d3   T2  | d3   T4  |
  T    | d4       | d4       | d4       |
  R    |__________|__________|__________|
  A    | d0       | d0       | d0       |
  Y    | d1       | d1       | d1       |
       | d2   T1  | d2   T3  | d2   T5 <-- c0t5d0s?
       | d3       | d3       | d3       |
       | d4       | d4       | d4       |
       |__________|__________|__________|
MIRROR arrays: use channel B first on the controller. Fiber-to-copper adapter: 1 port for each host.
Solstice DiskSuite: "md" devices. Can change /etc/vfstab and /etc/system to bypass and use the raw device.
Use the command 'metastat -s' to tie the "md" device name in vfstab to the physical partition name.
# solstice & (will run the GUI)
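The vfstab side of that tie-in can be pulled out with awk. A sketch using sample vfstab lines (not from a real host):

```shell
# Sketch: list the md devices and their mount points so they can be
# matched against 'metastat -s' output.
cat > /tmp/vfstab.demo <<'EOF'
/dev/md/dsk/d10   /dev/md/rdsk/d10   /        ufs  1  no  -
/dev/dsk/c0t1d0s7 /dev/rdsk/c0t1d0s7 /export  ufs  2  yes -
EOF
awk '$1 ~ /^\/dev\/md\// {print $1, $3}' /tmp/vfstab.demo
```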
MODEL 200
The SPARCstorage Array Model 200 is a rack-mount disk array controller. Up to six differential SCSI disk
trays can be connected to it. Each tray can hold up to six drives. Ports are numbered 0-5 right to left, top
to bottom.
NVRAM LED Gives info on the SSA NVRAM. Press the NVRAM button when the SSA
is off, if the NVRAM LED comes on, then there is data pending on the
NVRAM that must be flushed to disk using the fastwrite software
command.
NVRAM button Used to determine if there is any data pending on the SSA NVRAM
DIAG switch Used to set the diag level of the SSA. DIAG position for normal
diagnostics. DIAG EXT for extended diagnostics.
Reset switch Resets array... Do not press while array is in use.
SYS OK LED Gives info on SSA status. Blinking is running normally.( freq=activity)
Off is no power or hung. Solid On is power but hung.
D1000 disk tray is used in the StorEdge A3500 RAID array. 5, 8, or 15 D1000s can be used,
depending on the configuration. It uses the same disk tray as the A1000, but a different controller. It
has 2 sets of scsi connectors, you can run 2 scsi busses into it and divide the drives or jumper the
busses together and have the array on one buss.
Scsi Id and Array Id are set on the rear DIP switch ( D1000 can be configured for 1 or 2 busses)
sw1: Disk Array 1 Id: up: drive IDs 8-11 or 8-13, Down: drive IDs 0-3 or 0-5
sw2: Disk Array 2 Id: up: drive IDs 8-11 or 8-13, Down: drive IDs 0-3 or 0-5
sw3: Drives Remote Start: up wait for scsi command, Down: check sw4
sw4: Drives Delay Start: up: Start with delay (id*12), Down: start at power-on
sw5: Reserved
Module ID switch (rear): Wheel switch used to ID unit (1-5) when used in an A3500
configuration.
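The sw4 delayed-start timing works out as follows (a sketch of the id*12 arithmetic above):

```shell
# Sketch: with sw4 up (delayed start), each drive spins up id * 12 seconds
# after power-on.
for id in 0 1 2 3 4 5; do
    echo "id $id: delay $((id * 12))s"
done
```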
Page 14
Disk Layout: D1000
Array2 | Array1
sw2: down | 0 1 2 3 4 5 | 0 1 2 3 4 5| sw 1: down
sw2: up | 8 9 10 11 12 13| 8 9 10 11 12 13| sw 1: up
Front view
RSM disk trays are used in the StorEdge A3000 RAID array. Each A3000 contains 5 RSM disk trays
*** Scsi Id for the tray is set on the I/O board. Setting of 0-6 or 8-14; 8-14 is required for the
RDAC module.
* Scsi Id for the SEN card is a wheel selection and should be set to 15 (F).
RSM
_____front view___________
*** | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
_________or_____________
| 8 | 9 | 10 | 11 | 12 | 13 | 14 |
target IDs
Leds/switches-
Disk leds:
Red-fault, Green I/O activity
Panel leds:
Power on/off switch
Power indicator (green)
Power module A and B fault (red)
Fan module warning (amber)
Fan module failure (red)
Over temp (red)
Reset Alarm (pbs)
page 15
A3000/A3500
A3000
- 56 inch rack.
- contains 5 RSM disk trays
- 1 RDAC Module
- each RDAC module has dual hot plug RAID controllers
A3500
- 72 inch rack
- contains 5, 7, or 15 D1000 disk trays
- 1, 2, or 3 RDAC modules
- each RDAC module has dual hot plug RAID controllers
A5000 (photon)
Model #'s
A5000 - 14 7200 rpm Drives of 9.1GB each
A5100 - 14 7200 rpm Drives of 18.2GB each
A5200 - 22 10000 rpm Drives of 9.1 GB each
RAID Manager
Commands:
# /usr/lib/osa/bin/rm6 to run
# /usr/lib/osa/lad will give ctd#s, controller serial #s and lun configurations
# fwutil /usr/lib/osa/fw/aaaaaaaaa.apd cxtxdxs0 Downloads appware to a controller (halt all I/O)
# fwutil /usr/lib/osa/fw/bbbbbbbb.bwd cxtxdxs0 Downloads bootware to a controller (halt all I/O)
# raidutil -c (c#t#d#) -b battery age info for that controller (A3x00)
- r to reset battery age after replacement (A3x00)
page 16
luxadm commands for the A5000
luxadm probe -p Display information about all attached A5000s. This will give you the
enclosure names
luxadm display Use the display subcommand to display enclosure or device specific info
enclosure info ex: # luxadm display mars-0
device info ex: # luxadm display mars-0,f3 (f3= front disk slot# 3)
luxadm inq Use the inquiry subcommand to display inquiry info for the enclosure or
specific disk
enclosure info ex: # luxadm inq mars-0
device info ex: # luxadm inq mars-0,f4 (f4=front disk slot#4)
luxadm led_blink Use the led_blink subcommand to start flashing the yellow led
associated with a specific disk.
ex: # luxadm led_blink mars-0,f2 (f2=front disk slot 2)
luxadm led_off Use the led_off subcommand to turn off the yellow LED
associated with a specific disk.
ex: # luxadm led_off mars-0,r3 (r3= rear disk slot#3)
luxadm power_off Use the power_off subcommand to set an enclosure or disk to
power save mode
enclosure ex: # luxadm power_off mars-0
disk ex: # luxadm power_off mars-0,f5 (f5=front disk slot#5)
luxadm power_on Use the power_on subcommand to set a drive or enclosure to
its normal power on state.
enclosure ex: # luxadm power_on mars-0
disk ex: # luxadm power_on mars-0,f1 (f1=front disk slot#1)
luxadm remove_device Use this subcommand to 'hot remove' a device or enclosure, when
removing failed disk units for replacement. Verbose output will
walk you through the procedure
enclosure ex: # luxadm remove_device mars-0
disk only ex: # luxadm remove_device mars-0,f6
luxadm insert_device Use the insert_device subcommand for 'hot' insertion of a new disk or
enclosure. Use after the remove_device command to replace a failed
drive with a new one. Verbose output will walk you through the procedure.
ex: # luxadm insert_device mars-0,f5
luxadm reserve Use the reserve subcommand to reserve the specified disk(s) for exclusive
use by the host from which the subcommand was issued.
ex: # luxadm reserve mars-0,f6
luxadm release The release command releases the drive from the reserve state
ex: # luxadm release mars-0,f6
luxadm enclosure_name Use the enclosure_name subcommand to change the enclosure name of
one or more A5000s
ex: # luxadm enclosure_name mars1 pluto2
(change from pluto2 to mars1)
luxadm download Use the download command to download a prom image to the
FEPROMs on an A5000 interface board. Stop all activity on this
connection before downloading firmware, the array will recycle
automatically after the download.
ex: # luxadm download -s mars-0 (will download firmware from
default file /usr/lib/locale/C/LC_MESSAGES/ibfirmware)
ex: # luxadm download -s -f /special/upgrade/ibfirmware.latest
mars-0
-f you can specify the file name and do not use the default
page 17
luxadm fcal_s_download Use the fcal_s_download command to download new fcode into ALL
the FC100-HA sbus cards or display the current versions of the fcode
in each FC100-HA Sbus card.
display: ex: # luxadm fcal_s_download
download:
ex: # luxadm fcal_s_download -f /usr/lib/firmware/fc_s/fcal_s_fcode
remove 1. # vxdiskadm
2. item 4 (Remove disk for replacement), Enter disk name, Remove another disk? n
3. item 11 (Disable (offline) a disk device) offline the same disk so it can be removed, q
4. # vxdctl enable (This will reconfigure DMP)
5. # luxadm remove_device mars-0,f0 (mars-0,f0 is enclosure name, diskslot#) return
(physically remove disk drive) (return)
replacement 6. # luxadm insert_device mars-0,f0 (mars-0,f0 is enclosure name, diskslot#) return
(physically insert new disk) return
7. # vxdctl enable (This will reconfigure DMP)
8. #vxdiskadm
9. item 5 (Replace a failed or removed disk) Enter disk name, enter c#t#d#, continue y,
replace another? n, quit q
10. From here you have a choice of 2 ways to complete this (most of the time this is up to
the customer to do). Read both before choosing.
1. make the new disk a spare and the spare disk part of the RAID
14 disk array The minimum configuration system has drives in slots 3, 6 in front and drives in
0, 3, and 6 in the rear. No other configuration is authorized. As disks are added they
should be spaced to minimize gaps between disks.
22 disk array The minimum configuration system has drives in slots 0, 5 in front and drives in
0, 3, 6,and 10 in the rear. No other configuration is authorized. As disks are added they
should be spaced to minimize gaps between disks.
Page 18
A5000 Addressing
RDAC Module
RAID Overview
page 19
RAID LEVELS
RAID 0
RAID 0 is actually just an AID (Array of Independent Disks); the R (redundant) part isn't
there. RAID 0 puts multiple physical disks together so they appear as
one large virtual disk. There are no parity drives or parity stripes.
RAID 1
RAID 1 is a mirrored array. That means there are 2 sets of disks, and every disk has a
counterpart that is an exact copy. If one fails, the other takes its place.
RAID 3
RAID 3 stripes data across multiple volumes and uses a dedicated parity drive. If one of the
data drives fails, its data can be reconstructed from the parity drive.
RAID 5
RAID 5 stripes data across multiple volumes as RAID 3 does, but its parity is also striped
across multiple volumes. RAID 5 can likewise recover from a failed disk.
Boot process
page 20
Diagnostic commands:
pkginfo -l Will give you a description of all the packages (w/o pkg name) or one package (w pkg
name)
prtdiag Display system configuration and diagnostic information (/usr/platform/ 'uname -m'/sbin)
prtconf -v Get system device information from POST probe
prtconf -vp Device tree info and PROM version (OBP)
page 21
Diagnostic commands continued:
prtvtoc List the vtoc (disk label) of a disk drive ex: prtvtoc /dev/rdsk/c0t0d0s0
psrinfo -v Will give you processor information
psradm -f (-n) -f will take a processor offline; -n will bring a specified processor online
/usr/ucb/ps -aux Lists processes in descending order of CPU utilization
pwck checks the password file for inconsistencies
sar Analyse system performance information (must be initialized in /etc/init.d/perf)
showrev -p list currently installed patches; patchadd -p in solaris 2.6 and above
snoop (-s) display and analyse network traffic
strings Search object and binary files for ASCII strings
sysdef Analyse device and software configuration information.
swap Add, delete and monitor system swap areas
sum Calculate and print a checksum value for a named file
sys-unconfig Enables you to change information entered during sysidtool phase of installation
tail -f Leave file open for reading and display what is there
tic Terminfo compiler; translates a terminfo file from source to compiled format
timex List runtime and system activity information during command execution
traceroute Show the route followed by packets transferred in a subnet environment
truss Trace system calls issued and used by a program or command
tunefs Modify file system parameters that affect layout policies
uname Print platform, architecture, operating system, and system node information.
vmstat Analyse memory performance statistics
who am i Display the effective current user name, terminal line and login time
xhost hostname allows graphical access to your host from the host specified in hostname
Diagnostic files
/etc/defaultdomain Name of the current domain, read and set at each boot by script /etc/init.d/inetinit
/etc/default/cron Determine logging activity for the cron daemon through specification of the cronlog
variable
/etc/default/login Control root logins at the console through specification of the console variable and other
defaults.
/etc/default/su Determine logging activity for the su command through specification of
the sulog variable
/etc/dfs/dfstab List what distributed file systems will be shared at boot time
/etc/dfs/sharetab List currently shared NFS file systems
/etc/hosts Host file linked to /etc/inet/hosts
/etc/hostname.le0 Assign a system name, and through cross-referencing the /etc/hosts file, add an IP address
/etc/hostname.hme0 to a particular network interface
/etc/inetd.conf List information for network services that can be invoked by the inetd daemon
/etc/inittab Read by init daemon at startup to determine which rc script to execute; also contains
default run level.
/etc/minor_perm Specifies permissions to be assigned to device files
/etc/mnttab Display a list of currently mounted file systems
/etc/name_to_major Display a list of configured major device numbers.
/etc/netconfig Display the network configuration database read during network initialization and use
/etc/nsswitch.conf List the database configuration file for the name service switch engine.
/etc/path_to_inst List the contents of the system device tree using the format of physical device names
and instance numbers
/etc/protocols List known protocols used in conjunction with internet
/etc/release O/S release and date
/etc/rmtab List the current remotely mounted file systems
page 22
Diagnostic files continued:
Watchdog Resets
CPU Watchdog Reset is initiated on a single-processor machine when a trap condition occurs while traps
are disabled and the register bit to enable traps is not set. The system tries to come down in a
deterministic state and traps to a reserved physical address
obpsym module should be loaded to maximize the amount of symbolic information available in the
PROM (obp) environment. Without this module, addresses are displayed without symbolic (textual)
information.
page 23
obp register commands - sun4m (used with watchdog reset analysis)
Note the number next to the OK prompt, which is the number of the CPU that hit the watchdog
reset (multi-processor only)
Solaris commands and files that can be used in watchdog reset analysis:
showrev -p
prtconf -v
pkginfo
/usr/ccs/bin/nm /dev/ksyms > symbol_file
/usr/platform/sun4u/sbin/prtdiag -v > prtdiag_file
/etc/system
/var/adm/messages
page 24
Dump analysis
Three debuggers:
adb: Assembly debugger. It is an interactive, general-purpose utility that can be used to
examine files, and it provides a controlled environment for executing programs. By default
it does not supply a prompt.
if no info do this:
1. do a stack trace... $c
(this will give you a listing to use in step2)
2. get register pointer,
64 bit system 2nd value from 'die' ex: die (0x9, 0xf05246f4, 0x30, 0x326,...
32 bit system 2nd value from 'trap' ex: trap (0xf028a1d8, 0xf05246f4, ...
(use this value in step 3)
3. get values in register 'pc'
0xf05246f4$<regs
(use the value under the pc heading for step4)
ex: pc
fc479dbc
page 25
To find thread involved with panic:
adb commands
cpu$<cpus Display cpu0 which contains the address of the currently running thread.
$<msgbuf Display the msgbuf structure, which contains the console messages
leading up to the panic.
$C Show the call trace, and stack trace leading up to a panic from the bottom
up.
$r Display the SPARC window registers, including the program counter and
the stack pointer
<sp$<stacktrace Use the sp(stack pointer) address to locate and display a detailed
stacktrace
$q Quit adb
page 26
crash similar to adb, but the command interface is different. crash is used to examine the memory of a
running or crashed system.
crash commands:
u or user will give info on the process that was running when the crash occurred
defproc will give you the current process slot number (used with proc command)
kadb is similar to adb. It must be loaded prior to the standalone program it is to debug. To run the
kernel under kadb type 'boot kadb' at the ok prompt
iscda is Initial System Crash Dump Analysis... The script is included on the sunsolve CD under
the top level directory ISCDA. The following is an example of usage:
# cd /var/crash/machine_name
# iscda unix.0 vmcore.0 > /tmp/iscda.output
This will run the iscda script on the core dump in /var/crash/machine_name. The output
will go to /tmp/iscda.output. The output will consist of the results from a sequence of adb
and crash commands. If needed, you can send this file to the Sun solution center via Email.
SunSolve
The Sunsolve CD is a valuable tool in diagnosing problems. The following are home page
selections:
Power search Provides a menu-driven database selection for searching bug reports,
FAQs, patch descriptions, tech bulletins, Info docs, and Symptoms and
Resolutions
Patch Diag Tool Determines the patch level of your system compared to Sun's recommended
patch lists. Can be run from the command line: # patchdiag
Page 27
Crash Dump Analysis Displays how to load and run the ISCDA script. (Initial System
Crash Dump Analysis)
Sun Courier Submits a service request to Sun solution center. (sendmail must be
running)
SunVTS is Sun's validation test suite. VTS is run at the Solaris level, but should not be run
while the customer's applications are up. VTS comes with the Solaris package; there
are different revisions for Solaris releases: rev 2.12 for Solaris 2.6, 3.0 for Solaris 7,
and 3.4 for Solaris 8. It is recommended to use the version of VTS that corresponds to the O/S
you are running. Also check sunsolve for related patches.
# cd /cdrom/cdrom0/Product
# pkgadd -d . SUNWvts SUNWvtsx SUNWodu SUNWvtsmn
or
# /cdrom/cdrom0/installer or run thru file manager window
sunvts -t Navigation: (the <ctl> keys are good if you forgot to set the TERM)
<tab> move between windows
<ctl> w move between windows
<arrow> move within window
<ctl> r move within window on same line
<ctl> u move within window up/down lines
<ctl> f move within window forward
<ctl> b move within window backwards
<ctl> l refresh screen
<esc> close pop-up menu
<space> select / deselect test
<enter> select function
Page 28
STORtools
- Revision Checking
- Configuration Management
- Monitoring and Notification
- Troubleshooting and Fault Isolation
To run STORtools
# /opt/STORtools/bin/stormenu
page 29
Explorer Scripts:
New Version:
The new version of explorer can be found on Sunsolve under "navigation - diagnostic tools"
It is now a software package (SUNWexplo) and can be installed and run (initially) with the
pkgadd -d command.
To expand: # zcat SUNWexplo.tar.Z | tar xvf -
To install: # pkgadd -d . SUNWexplo
Old Version:
The following is documentation sent out with the explorer script. It contains information
on how to expand, run and mail the output from the explorer.
1. #su root
2. Save the explorer.tar.Z file in directory where root has write permission
3. for encoded files :
#uudecode filename
#zcat explorer.tar.Z | tar xvf -
4. #./explorer
-While executing this script, you will be prompted to enter information about your site.
- If you have internet access, we ask that you enter "y" to the question Would you like
to e-mail results [y/n]" so that we get the output automatically.
- If you choose not to e-mail the explorer file automatically, please send the resulting file
(*.uu) as an attachment to your PTAS account manager.
Note: if crontab -e does not work correctly, try setting the following variable
'setenv EDITOR vi'
Tools: (commands)
for write performance: (this will write over data. do not use if data is needed on this disk)
# dd if=/dev/zero of=/dev/rdsk/cxtxdxs2 bs=1024k
for read performance:
# dd if=/dev/rdsk/cxtxdxs2 of=/dev/null bs=1024k
# iostat -pxn 5
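The same dd throughput tests can be rehearsed against a scratch file before aiming them at a raw /dev/rdsk device. This is a sketch only: /tmp/ddtest stands in for the real device path, and on a real slice the write test destroys data.

```shell
#!/bin/sh
# Sketch: the write/read dd tests run against a scratch file.
# /tmp/ddtest stands in for /dev/rdsk/cxtxdxs2 -- on a real slice the
# write test DESTROYS data, so only point it at an unneeded disk.
SCRATCH=/tmp/ddtest.$$

# write test: 10 blocks of 1024k = 10485760 bytes of zeros
dd if=/dev/zero of=$SCRATCH bs=1024k count=10 2>/dev/null

# read test: read the data back and discard it
dd if=$SCRATCH of=/dev/null bs=1024k 2>/dev/null

ls -l $SCRATCH | awk '{print $5}'   # prints 10485760
rm -f $SCRATCH
```

Timing each dd (e.g. with timex, listed on page 21) turns the byte counts into throughput figures.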
page 31
Backups
ufsdump backs up all files specified by files_to_dump (normally either a whole file
system or files within a file system changed after a certain date) to
magnetic tape, diskette, or disk file. Filesystems to be backed up
must be inactive (unmounted or single user mode)
0-9 dump level, 0 is a full dump. Each level is relative to what has been backed
up: it backs up everything modified since the most recent dump of a lower level.
If a level 2 was done, then a level 4 backup the next day: a level 5 the day after
would back up all files modified since the level 4, while a level 3 instead would
back up all files modified since the level 2.
c cartridge. Sets the defaults for cartridge instead of the standard
half-inch reel.
f Dump file. Use dump_file as the file to dump to, instead of
/dev/rmt/0. If dump_file is specified as -, dump to standard output.
u update the dump record. Add an entry to the file /etc/dumpdates.
v verify. After each tape or diskette is written, verify the contents
of the media against the source file system.
ufsrestore ufsrestore utility restores files from backup media created with the
ufsdump command.
page 33
How is a Coredump Generated?
When a system crashes, it writes a copy of its memory to a temporary location on a disk, usually to the
primary swap partition. Savecore is a program which runs at boot time to retrieve the memory copy
from the temporary location and to save it to a place where it can be accessed. Savecore must be run
during the bootup process, or very shortly thereafter, before the copy is overwritten by the
running operating system, which uses the primary swap partition for other purposes.
b) Wait until the access lamp goes out in the CDrom drive.
2) Determine how much memory you have on your system. This can be done by:
a) examining your system banner if your system is down by typing "banner" at the "OK" prompt.
b) doing a "wsinfo" on a 2.x system running openwindows, and checking the "physical memory" column.
c) looking at the /var/adm/messages file, or output of the dmesg command, and searching for the line
which starts with "mem =". The number which follows will be in bytes. Divide by 1048576 to
get megabytes.
3) Find any locally mounted partition, other than /tmp, which has enough room to hold the coredump. A
coredump takes usually about 35% of the size of main RAM memory.
4) Verify that your dump area is at least 35% of the size of main RAM memory. A regular disk is preferred
to a meta-filesystem running under Veritas or DiskSuite control. The dump area is usually the primary
swap file.
Execute a "swap -l" command and observe the first line with values in it. Take the number in the
"blocks" column and divide by 2048. This is the number of megabytes in the primary swap file.
Compare this to the size of main RAM memory found in step (2) above.
Page 34
5) Enable savecore as follows: (Savecore is enabled by default in Solaris 2.7.)
c) ( optional if you don't want the core copied to the /var or if /var
wasn't large enough)
Substitute the name of the partition found in (3)
above for "/var" wherever it shows in the statements in (i) above.
Incidentally, if you know that savecore is enabled but do not know where
the corefiles are put, checking the "savecore" statement listed above
will tell you.
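The arithmetic in steps 2(c) and 4 can be scripted directly; the memory and swap figures below are made-up example values, not output from any real system:

```shell
#!/bin/sh
# Step 2c: "mem =" value from /var/adm/messages, in bytes (example value)
MEM_BYTES=134217728
MEM_MB=`expr $MEM_BYTES / 1048576`     # 134217728 / 1048576 = 128 MB

# Step 4: "blocks" column from swap -l (example value); 512-byte blocks,
# so divide by 2048 to get megabytes
SWAP_BLOCKS=611040
SWAP_MB=`expr $SWAP_BLOCKS / 2048`     # 611040 / 2048 = 298 MB

# A coredump usually needs about 35% of main RAM
NEED_MB=`expr $MEM_MB \* 35 / 100`     # 128 * 35 / 100 = 44 MB

echo "ram=${MEM_MB}mb swap=${SWAP_MB}mb dump needs ~${NEED_MB}mb"
```

On a live system you would substitute the real values pulled from /var/adm/messages and swap -l.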
Page 35
Dump device bad when saving core on encapsulated root
Problem:
Systems with VxVM encapsulated boot disks will not be able to do system dumps if the swap
slice is not tagged as swap. With the root drive encapsulated, if the system tries to do a
system dump in the event of a panic, it may present messages similar to the following:
Problem Solution:
If the swap slice was not tagged as swap in format when the root
drive was encapsulated, the encapsulation process will zero out
the swap slice when it makes the swap volume:
When the system dumps, it needs to use the physical device and not the swap volume. The dump
fails because slice 1 shows a zero size in format.
To solve the dump dev problem, you need to go into format and edit
slice 1, change the tag to swap, and give it the start and end
cylinders.
Page 36
To get the end cylinder, you need to look in /etc/vx/reconfig.d/disk.d/c?t?d?/vtoc:
# cd /etc/vx/reconfig.d/disk.d/c0t0d0
# more vtoc
partition> p
Current partition table (unnamed):
Total disk cylinders available: 2733 + 2 (reserved cylinders)
partition> 1
partition> p
page 37
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 67 50.47MB (68/0/0) 103360
1 swap wm 68 - 469 298.36MB (402/0/0) 611040
2 backup wm 0 - 2732 1.98GB (2733/0/0) 4154160
3 usr wm 473 - 877 300.59MB (405/0/0) 615600
4 var wm 878 - 1012 100.20MB (135/0/0) 205200
5 unassigned wm 0 0 (0/0/0) 0
6 - wu 0 - 2732 1.98GB (2733/0/0) 4154160
7 - wu 2732 - 2732 0.74MB (1/0/0) 1520
partition> q
page 38
Uncompressing Files:
Use the 'file (file_name)' command to determine what type of compression was used.
Ex: # file 2.6_x86_Recommended.tar.gz
2.6_x86_Recommended.tar.gz:
gzip compressed data - deflate method , original file name
*.tar.Z files use the 'zcat (file_name.tar.Z) | tar xvf -' command
Ex: # zcat explorer.v.3.1.0.tar.Z | tar xvf -
*.tar.gz files use the 'gzcat (file_name.tar.gz) | tar xvf -' command
Ex: # gzcat 2.6_x86_Recommended.tar.gz | tar xvf -
you can also use the 'gunzip' command but that will result in a *.tar file and
you will have to use the 'tar -xvf (file_name.tar)' command to expand it
****NOTE: It is a good idea (due to the locations of these commands) to have them on a floppy
or CD that you can bring on-site. *****
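The zcat/gzcat pipelines above can be rehearsed end to end on a throwaway archive. 'gzip -dc' is used here as a portable equivalent of gzcat, and every file name is made up for the sketch:

```shell
#!/bin/sh
# Build a small .tar.gz, then expand it with the pipeline from above.
WORK=/tmp/unpack_demo.$$
mkdir -p $WORK/src && cd $WORK

echo "patch contents" > src/README
tar cf demo.tar src
gzip demo.tar                         # produces demo.tar.gz

mkdir extract && cd extract
gzip -dc ../demo.tar.gz | tar xvf -   # same idea as 'gzcat file.tar.gz | tar xvf -'
cat src/README                        # prints: patch contents
cd / && rm -rf $WORK
```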
page 39
T300 (purple): Also see page 67
Description:
The T300 array is a hardware RAID FCAL device. As such please make sure all firmware
and patches are up to date. You can use STORtools* to exercise and troubleshoot the product.
The T300 also has a com (rs232) port so you can tip into it, and an ethernet port so you can
use telnet, ftp, or tftp boot, or administer it through Component Manager.
The T300 has an EP (extended PROM) boot that runs POST and has its own set of commands,
and also runs a limited-function unix O/S called PSOS (accessed through tip or telnet). PSOS
can be run from the reserved area on the array drives or tftp can be used to load it from
the server.
*STORtools will only test to the MIA on the T300 product line.
Partner group
Two T300s cabled together through the UICs. The cables coming from the 2-dot (OUT ..)
ports on the UIC designate the primary array. The other array (UIC 1-dot IN) becomes the
secondary array. Only 2 T300s can be in a partner group at this time. In a partner group
with 2 fiber paths, the server accesses the LUNs through both paths: top array LUNs
through the top array controller and bottom array LUNs through the bottom array controller. If something
happens to one of the controllers, the LUNs will fail over to the remaining controller.
PCU (Power Cooling Unit) battery is good for only 2 years; messages appear in syslog 45 days prior
to expiration. Once a PCU is unplugged you have 30 min to change it before the array starts
a shutdown sequence. The array requires 3 fans to stay below critical temp.
UIC (Unit Interconnect Controller) verify status through 'fru stat'. Once a UIC is removed you
have 30 min to change it before the array starts a shutdown sequence.
Raid Controller is only redundant in a partner group. Some type of DMP (veritas) also needs
to be running to fail over so the server can access the disks on the
failed array.
page 40
T300 (continued) Also see page 67
Disk(s) Numbered 1 - 9 left to right while facing the front of the array. Pull the disk out (use the
spring-loaded latch handle) one inch, wait 30 seconds, then remove it from the array.
Once a disk drive is removed you have 30 min to replace it before the array starts
a shutdown sequence.
MIA Media Interface Adapter (fiber-to-copper connection) is only redundant in a partner group.
Some type of DMP (veritas) also needs to be running to fail over so the server
can access the disks on the failed array.
LEDs: ( in general, for specific info see pg 6-9 & 6-15 install and admin manual)
Solid | Blinking
Green: normal status | system activity
Amber: FRU is being initialized | FRU failure (controller, uic, pcu, disk)
Path:
****Use format, scsi, inquiry, mode bytes, 10 = primary path 30 = secondary path ****
****You will cause a LUN failover if you try to access the secondary path LUNSs through *****
****low level commands like format and dd in a partner group*****
sbus@1f,0/SUNW,socal@1,0/sf@1,0/ssd@w50020f2300000a06,1:a
sbus@1f = I/O bd # (convert 1f to decimal, divide by 2, round down for the sbus slot)
socal@1,0 = on-board soc+ (0 = port A, 1 = port B)
sf@1,0 = loop connection (port on the HBA)
ssd@w50020f2300000a06 = WWN# (last 6 digits are from the mac address; see the set command)
,1 = LUN / volume on the array (port listmap)
:a = slice (a = 0)
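The "convert to decimal, divide by 2, round down" rule for the sbus@ component of the path above can be checked from the shell:

```shell
#!/bin/sh
# sbus@1f,0 -> hex 1f -> decimal 31 -> divide by 2, round down -> sbus slot 15
HEX=1f
DEC=`printf '%d\n' 0x$HEX`     # 31
SLOT=`expr $DEC / 2`           # expr truncates, which rounds down: 15
echo "sbus@$HEX is decimal $DEC, sbus slot $SLOT"
```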
T300 Boot:
page 41
T300 (continued) Also see page 67
TFTP BOOT: (if the chassis is swapped, enter the new mac address into the /etc/ethers file of the tftp server)
On Server:
1. Modify /etc/hosts file on server with ip and name of array
2. Modify (create) /etc/ethers file on server with mac and array name
3. Create /tftpboot directory and copy nbxxx.bin (psos) to it
4. Uncomment '#tftp' in /etc/inetd.conf
5. kill -HUP inetd PID#
6. ps -ef | grep in.rarpd (should be running... restart if tftp doesn't work)
On Array:
7. Modify Bootmode to tftp (:/:set bootmode tftp)
8. Modify tftphost to server's IP (:/: set tftphost xxx.xxx.xxx.xxx)
9. Modify tftpfile to nbxxx.bin (step#3) (:/: set tftpfile nbxxx.bin)
10. Modify IP to ip assigned to your array ** (:/: set ip xxx.xxx.xxx.xxx)
11. Reset array
** if rarp is working, the array should get its IP from the server. If the IP is assigned through the "set"
command, then the array will go to the 'who is tftphost' phase of tftpboot.
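The server-side steps 1-3 above can be sketched as a script. The array name, IP, mac address, and psos file name below are all placeholder values, and the sketch writes to a staging directory instead of the live /etc files:

```shell
#!/bin/sh
# Sketch of TFTP-boot server prep (steps 1-3); all values are examples.
# Writes to a staging area so nothing on the live system is touched.
STAGE=/tmp/tftp_stage.$$
mkdir -p $STAGE/tftpboot

ARRAY_NAME=t300-array        # placeholder array name
ARRAY_IP=192.168.1.50        # placeholder IP
ARRAY_MAC=8:0:20:ab:cd:ef    # placeholder mac address

# 1. hosts entry (on a real system: append to /etc/hosts)
echo "$ARRAY_IP $ARRAY_NAME" >> $STAGE/hosts

# 2. ethers entry (on a real system: append to /etc/ethers)
echo "$ARRAY_MAC $ARRAY_NAME" >> $STAGE/ethers

# 3. psos image goes into the tftp directory
#    (nbxxx.bin would come from the firmware distribution)
touch $STAGE/tftpboot/nbxxx.bin

grep $ARRAY_NAME $STAGE/hosts $STAGE/ethers
rm -rf $STAGE
```

Steps 4-6 (inetd.conf, kill -HUP, in.rarpd) still have to be done against the live system by hand.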
Add a volume (lun) to an array: (:/: sys blocksize (n)k should be set to the correct value before 'vol add')
T300 useful commands: (use the 'help' command to get specific switches)
File management:
mkdir, rmdir, cd, pwd, touch, cat, more*, tail, rm, mv, telnet, ftp**
*more command: q = quit, f = forward, b = backward
** ftp requires a password on the root account
vol commands:
vol list, vol add, vol remove, vol init, vol mount, vol unmount, vol mode,
vol verify, vol stat.
Firmware upgrading: (strongly recommended to have the array "out of use" before upgrading
firmware. This includes disabling polling from Component Manager)
FTP the firmware files to / on the array. At this moment the files can be found at
http://icode.ebay but in the future they will be available on sunsolve as patch 109115.xx.
/syslog Array error log file, 1Meg in size; when full it gets copied to /syslog.old
/syslog.old Backup of /syslog
/etc/syslog.conf Configures where to send error messages
page 43
ACT ( A Crashdump Tool)
ACT is a tool that can be run against a core dump or a live system. It generates a report that gives you
server state information based on the core. ACT should be run on the server that panicked, or should
at least be run on a server that has the same O/S version as the core being analysed. The
engineers that maintain ACT recommend you give it to your customers and have them install it on
their servers. When a core dump is produced they can run it on the core and forward the output
to the solution center; because it is much smaller than the core, it will save time in transmission.
ACT is supposed to become the standard output that all centers will accept.
# gunzip CTEact.tar.gz
(this will create a CTEact.tar file)
# tar -xvf CTEact.tar
(this will explode the CTEact directory)
# pkgadd -d . CTEact
(will install the package into /opt/CTEact)
(answer install questions, I selected 'n' for mailout option)
(executable is /opt/CTEact/bin/act)
Examples:
# ./act -l (output on live server to screen)
# ./act -l -s /tmp/dir/ (output from live server to separate files)
# ./act -d /var/crash/hostname/vmcore.0 -s /tmp/dir/ (output core
file to separate files in /tmp/dir)
# ./act -d /var/crash/hostname/vmcore.0 > /tmp/act_out (output core
file to file /tmp/act_out)
ACT was conceived and developed by Steve Cumming, while working for what was
SunService and then while working for SMCC European CTE. After a short
illness Steve died on July 12th 1998.
page 44
Installation
ACT now resides in package format for both x86 and sparc, so pkgadd should be
used for installation.
By installing one of the packages below ACT will be installed for the
appropriate architecture and version of Solaris you are running and a new
RC script will be installed which will configure savecore and run ACT
against the newly generated crash dump upon system reboot.
Alternatively, if you have KENV installed, you can tar the following
over kenv in order to update KENV with the latest version of ACT.
Instructions
ACT takes the following options; they may appear in any order:
-d corefile
ACT assumes that the file corefile contains the kernel core image.
This file could be /dev/mem if you want ACT to analyze the running
system.
-l
Should be used when running act on a live system.
-n namelist
ACT assumes that the file namelist contains a valid kernel
namelist. This file could be /dev/ksyms if you want ACT to
analyze the running system.
-s directory
Tells act to split its output into several files, writing the data
into the directory specified to aid readability. The names of the
files created speak for themselves.
-u
Displays stack information in an alternate form
-z
This informs ACT to display timezone information in localtime
rather than GMT
page 45
Advantages of Splitting a Drive into Multiple File Systems (info doc 14622)
Rather than using an entire disk drive for one file system, which may lead to inefficiencies and
other problems, you can split a single drive into sections. The sections are called slices, as
each is a slice of the disk's capacity. Once a partition has been allocated, it becomes a logical
disk drive. A disk can be split into eight slices. The splitting of the disk is often called partitioning
or labeling of the disk drive.
Here are some of the reasons for multiple filesystems on one hard drive.
1. Damage Control: If the system were to crash due to software error, hardware failure,
or power problems, some of the disk blocks might still be in the file system cache and not
have been written to disk yet. This can cause damage to the filesystem structure. While the
methods used try to reduce this damage, and the FSCK utility can repair most of the damage,
spreading the files across multiple filesystems minimizes the possibility of damage, especially
to those files that are needed during boot-up. When the files are split up across the disk
slices, critical files end up on slices that rarely change or are mounted read-only and never
change. The chances of them being damaged and preventing you from recovering the remainder
of the system are greatly reduced.
3. Space Management: Files are allocated from a reserve of free space on a per-filesystem basis.
If, for example, a user has allocated a large amount of space, depleting the free space, and the
entire system disk were a single filesystem, there would be no free space left for critical system
files. The entire system would freeze when it ran out of space.
Using separate filesystems, especially for user files, allows only a single user, or group of
users, to be inconvenienced when a filesystem becomes full. The system will continue to operate,
allowing the System Administrator to handle the problem. The exception to the above scenario is
the root filesystem.
4. Performance: The larger the filesystem, the larger the tables that must be managed.
As the disk fragments and space become scarce, the further apart the fragments of a file
might be placed on the disk. Using multiple (smaller) partitions reduces the absolute distance
and keeps the sizes of the tables manageable. Although the UFS filesystem does not suffer
page 46
Advantages of Splitting a Drive into Multiple File Systems (cont.)
from table size and fragmentation problems as much as System V file systems, this is still a
concern.
5. Backups: Many of the back-up utilities, such as "ufsdump" work on a complete filesystem basis.
If a filesystem is large, it could take longer than you want to allocate to back-up. Most importantly,
multiple smaller backups are easier to handle and recover from.
Below is a listing of slices, some that are required, root and swap, and the recommended additional
slices such as usr, var, opt, home and tmp.
1. The root slice: The root slice is mounted at the top of the filesystem hierarchy. It is mounted automatically
as the system boots, and cannot be unmounted. All other file systems are mounted below the root.
The root partition typically runs between 15 and 30mb. It is usually placed on the first slice of
the disk, more commonly known as slice 0 or a.
2. The swap slice: The default rule is that there is twice as much swap space as there is RAM
installed on the system. For example, if you have 16mb of ram, the swap space would need
to be 32mb. Although this is just a preliminary template as to how much swap to use,
there are other factors to consider; an example would be a user's system running large
applications that use large amounts of data, such as a CAD application. You can monitor the
amount of swap space used via the pstat or swap commands. If you did not allow enough swap
space during the initial install you can add additional swap with either the swapon or swap
commands.
3. The usr slice: The usr slice holds the remainder of the operating system utilities. It needs to be
large enough to hold all the packages you chose to install when installing the OS. If you are going to
install local applications or third-party applications in this slice, it needs to be large enough to hold
them. It is generally better if the usr slice contains the operating system and only symbolic
links to the applications. The filesystem is often mounted read-only to prevent changes.
4. The var slice: The var slice holds the spool directories used to queue printer files and mail, as well
as log files that may be unique to the system. It also holds the /var/tmp directory, which is used for
larger temporary files. It is the read-write counterpart to the usr slice. Every system, even a diskless
client, needs its own var filesystem. It is not a filesystem that can be shared with any other
system(s).
5. The opt slice: In the newer UNIX systems based on System V release 4 (Solaris 2.x), many sections
are now optional and no longer need to be loaded on the /usr filesystem. They are now installed
onto the /opt filesystem. Additional add on packages are also installed in this filesystem.
6. The home or export home (remote users) slice: The home directory is where the user's login directories
are placed. Making home its own slice prevents users from hurting anything else if they run this
filesystem out of space. A good starting point for the size of this slice is 1mb per application user plus
5mb per power user and 10mb per developer you intend to support.
Page 47
Advantages of Splitting a Drive into Multiple File Systems (cont):
These are rough estimates and should be used only as a guideline; your configuration may need
more or less space per user. Usually this is /export/home. Don't put things into /home,
as this is a reserved mount point for automounted NFS filesystems. It's fine to use when the
automounter is turned off, but it is on by default.
7. The tmp slice: Large temporary files are placed in /var/tmp, but short-lived temporary files are
placed in /tmp. The files in the /tmp directory are very short-lived and are cleared out during a reboot of
the system. If users run mostly application-based programs, 5 to 10mb should be sufficient for this
slice. If developers are the primary users of the system, 10 to 20mb may be needed. Once again these
numbers are only a guideline; your needs may be different.
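The sizing guidelines above can be turned into quick shell arithmetic. A sketch; the user counts here are invented for illustration:

```shell
#!/bin/sh
# Rough /export/home sizing from the guideline:
# 1 MB per application user + 5 MB per power user + 10 MB per developer.
app_users=20
power_users=5
developers=3
home_mb=$((app_users * 1 + power_users * 5 + developers * 10))
echo "suggested /export/home size: ${home_mb} MB"
```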
How to configure a system to run on a network (info doc 14981) (also see pg 56 Adding a 2nd network interface)
1. /etc/hosts
This file is used to resolve host names into IP addresses. This file must be updated if no naming
service is being used. It should contain the IP and host name of each system on the
local network, including any gateways or routers.
Example:
127.0.0.1 localhost
129.145.71.109 kishori loghost #this is the IP and host name for the local machine
129.145.71.110 sage #this is the IP and host name for a host on the network
2. # ifconfig -a
Be sure that both the loopback and network interface are up and running.
Example:
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
inet 127.0.0.1 netmask ff000000
le0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
inet 129.145.71.109 netmask ffffff00 broadcast 129.145.71.255
3. /etc/netmasks
This file should contain the netmasks. If you are using the default netmasks and it appears in
ifconfig -a, this file is not necessary.
Example:
# The netmasks file associates Internet Protocol (IP) address
# masks with IP network numbers.
#
# network-number netmask
#
# Both the network-number and the netmasks are specified in
# "decimal dot" notation, e.g:
# 128.32.0.0 255.255.255.0
#
129.145.0.0 255.255.255.0
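As a cross-check of the example above, the broadcast address shown in the ifconfig -a output follows from the IP and netmask. A sketch in plain shell arithmetic (this is just the math, not a Sun tool):

```shell
#!/bin/sh
# Broadcast = IP OR (bitwise complement of the netmask), octet by octet.
ip=129.145.71.109
mask=255.255.255.0
oldIFS=$IFS; IFS=.
set -- $ip;   i1=$1 i2=$2 i3=$3 i4=$4
set -- $mask; m1=$1 m2=$2 m3=$3 m4=$4
IFS=$oldIFS
bcast="$((i1 | 255 - m1)).$((i2 | 255 - m2)).$((i3 | 255 - m3)).$((i4 | 255 - m4))"
echo "broadcast: $bcast"
```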
page 48
How to configure a system to run on a network(cont.):
4. /etc/defaultrouter
If you want to define a default router include the router name in this file.
5. /etc/hostname.le0 or /etc/hostname.hme0 (depending on your interface type) This file should contain
the name of the local host.
6. /etc/resolv.conf
If you are using DNS, this file should contain the name of the domain and the IP address of the nameserver.
It is acceptable to list more than one nameserver (up to 4); they will be consulted in the
order listed. Be careful: this file is very sensitive to extra spaces and tabs.
Example:
domain support.Corp.Sun.Com
nameserver 129.150.254.2
7. /etc/nsswitch.conf
Check this file for the appropriate entries. If a naming service is being used this file should reflect that.
8. It is a good idea to reboot the system at this point. Check to see if the network is working by pinging other
machines both inside and outside of your network.
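A rough sketch of a checklist for the files in steps 1-7. To stay self-contained it builds a throwaway tree with mktemp instead of touching the live /etc; set ROOT to an empty string to check the real system:

```shell
#!/bin/sh
# Build a demo tree with the always-required files present;
# /etc/defaultrouter is deliberately left out (it is optional).
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc"
touch "$ROOT/etc/hosts" "$ROOT/etc/netmasks" "$ROOT/etc/nsswitch.conf"
missing=0
for f in /etc/hosts /etc/netmasks /etc/defaultrouter /etc/nsswitch.conf; do
    if [ -f "$ROOT$f" ]; then
        echo "ok:     $f"
    else
        echo "absent: $f"    # only a problem if the file is required for your setup
        missing=$((missing + 1))
    fi
done
```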
NOTE: This document was written for VxVM 2.x. New functionality in VxVM 3.x renders many
of the "extra steps" in replacing a primary root disk obsolete. See the comments interspersed
below regarding steps when using VxVM 3.x.
If Volume Manager (VxVM) is running on a system with the root disk encapsulated and mirrored, and
the root disk fails, the system stays up and running because it is mirrored. But how can you
recover the original root disk?
The 'primary' root disk is the system disk on which the OS was originally installed. This
disk was "encapsulated" into VxVM and then mirrored. Since this disk is encapsulated, there is a
direct mapping of partitions onto volumes for /, swap, /usr, and /var.
The 'secondary' root disk is a disk which was first initialized into VxVM and then used to form a mirror
for the primary root disk.
VxVM 2.x: Since it was initialized, rather than encapsulated, there is no mapping of partitions onto the
volumes /, swap, /usr, and /var. VxVM 3.x: When the mirror of the root disk is created, the mapping
of partitions onto the volumes /, swap, /usr, and /var is maintained.
Page 49
SEVM - How to recover a primary boot disk. (cont.)
If the 'secondary' system disk fails, replacement of the disk is straightforward. It is handled in
the same manner as any other failed drive.
The easiest way to do this is to run 'vxdiskadm' and choose option #4 (Remove a disk for replacement).
Then, shut down the system (if necessary) to physically replace the disk, and reboot.
Run 'vxdiskadm' again, this time choosing option #5 (Replace a failed or removed disk). When asked
to 'encapsulate' the disk, reply "no", and then reply "yes" when asked if you wish to initialize it.
This will begin recovery of the disk and the mirrors will resync automatically.
NOTE: If you are running Volume Manager version 3.x.x or above, it is not necessary to follow the
steps below. Instead, the process for replacing the 'primary' boot disk is EXACTLY the same as that
for the 'secondary' boot disk, shown above. The reason is that Volume
Manager 3.x automatically creates the underlying "hard" partitions for /usr and /var on the replacement
disk, whereas older versions did not.
The recovery of the 'primary' boot disk contains a few additional steps because the procedure must
reestablish the direct mapping between the partitions on the disk and the system volumes. This is
necessary so that the system can be changed back to use underlying devices, should this be
necessary (for example, to perform a system upgrade or boot from cdrom to fsck one of these filesystems).
1. Run 'vxdiskadm' and choose option #4 (Remove a disk for replacement). Then, shut down the system
(if necessary) to physically replace the disk, and reboot.
2. Run 'vxdiskadm' and choose option #5 (Replace a failed or removed disk). When asked to 'encapsulate' the
disk, reply "no", and then reply "yes" when asked if you wish to initialize it.
3. This step will change depending on the number of partitions on the boot disk. The 'vxdiskadm'
command will put back partition 0 (for /) automatically, and may also do this for swap. However,
if you have any additional volumes on that disk (e.g., /usr or /var), you will have to run a command
to put the partition on the new disk in the correct location.
Examine the partitions on the replaced disk by running 'format' or 'prtvtoc' on it. At the very least, you
will see a partition for root, one for the VxVM public region, and one for the private region. Determine
if any partitions are missing. If so, these "missing" partitions can be recreated easily using the steps below.
The command to use is 'vxmksdpart'. You give this command the name of a particular subdisk, and it creates a
partition on the disk in the correct location. The syntax is:
Page 50
SEVM - How to recover a primary boot disk. (cont.)
For example, if you have a subdisk named "disk01-02" and wanted to create partition 7 on the disk to map
this subdisk, you can run
where <subdisk> is the name of the subdisk used in the swapvol volume on the primary boot disk
(for example, "rootdisk-01"), and <partition> is the unused partition to use for swap (for
example, "1"). The "0x03" tag specifies this partition is for 'swap'.
3b. USR. To create a partition for /usr (if this disk contains /usr), run:
3c. VAR. To create a partition for /var (if this disk contains /var), run:
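The vxmksdpart command lines themselves are not reproduced above. This is a hedged sketch of plausible invocations for steps 3a-3c, echoed rather than executed; the subdisk names, partition numbers, and flags column are assumptions. Only the 0x03 swap tag comes from the text (the usual Solaris VTOC tags are 0x03 swap, 0x04 usr, 0x07 var):

```shell
#!/bin/sh
# Dry run: print the commands instead of running them.
# Subdisk names and partition numbers are examples only.
run() { echo "+ $*"; }    # swap echo for real execution when ready
run vxmksdpart -g rootdg rootdisk-01 1 0x03 0x01    # 3a. swap
run vxmksdpart -g rootdg rootdisk-04 4 0x04 0x00    # 3b. /usr
run vxmksdpart -g rootdg rootdisk-05 7 0x07 0x00    # 3c. /var
```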
Disable DMP
Note: Be sure to do these steps first: 1. umount all file systems created on Volume
Manager volumes 2. Stop the Volume Manager (vxdctl stop).
page 51
Memory Scrubbing
On Ultra Enterprise (sun4u) platforms ECC is generated and checked by the UPA devices
(CPU, SYSIO and PSYCHO), not by the memory controller (Address Controller or AC).
Thus, ECC covers the entire data path between devices and memory.
***This means that an ECC error can be reported against a memory (DIMM/SIMM) that might not be bad ***
For a few ECC errors one would not normally recommend DIMM/SIMM replacement; however, when
the errors are exactly 12 hours apart, the DIMM/SIMM must be replaced. The memory scrubber runs every
12 hours after the system is booted. The purpose of scanning physical memory is to read each memory
location and determine if the data and ECC are correct. If the data does not match the ECC, the ECC is
rerun and a correction made to the memory content. Errors exactly 12 hours apart mean the error
reappeared despite the correction; it will be corrected again, but the DIMM/SIMM must be replaced.
physmem 3b7b
disable_memscrub:
disable_memscrub: 0
if it is "0", scrubbing is enabled
if it is "1", scrubbing is disabled
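The 12-hour replacement rule above can be sketched as a small check. This is an illustrative helper, not a Sun tool:

```shell
#!/bin/sh
# Given two correctable-error timestamps in epoch seconds, flag the
# "exactly 12 hours apart" pattern left by the memory scrubber.
twelve_hour_repeat() {
    # $1, $2: timestamps in seconds; 43200 s = 12 h
    [ $(( $2 - $1 )) -eq 43200 ]
}
if twelve_hour_repeat 1000000 1043200; then
    echo "errors exactly 12h apart: replace the DIMM/SIMM"
else
    echo "isolated CE: monitor, replacement not yet indicated"
fi
```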
When using telnet to connect to a remote server, you can have an application with a GUI
interface (like VTS) display on your local server by doing the following:
1. # /usr/openwin/bin/xhost + (run this on your local server. 'xhost -' removes permissions)
2. Connect to the remote server and set the display:
If using csh, use this syntax: # setenv DISPLAY <hostname>:0.0
example: # setenv DISPLAY persia:0.0
If using sh or ksh, use this syntax: # DISPLAY=<hostname>:0.0
# export DISPLAY
3. Run the application and the GUI should display on the local server
page 52
Cluster 2.x http://suncluster.eng
http://neato.east/suncluster/scinstall.html (good install doc)
General:
Up to 4 nodes in cluster
Only Sun Storage is supported (can get waiver, but seldom granted)
HA or PDB (Parallel Data Base)
HA - 1 server runs at up to 100%, or 2 run at up to 50%, so the other node can take over in case of
failure
PDB - Both servers access the database simultaneously; no logical hosts or shared ccd
Supports Solaris 2.6, 7, 8
Supports QFE, SCI, fast ethernet, gigabit ethernet on the private net
Supports different types of server nodes in the cluster
Terminal concentrator is a special model; it does not send a break on power on
DMP and Fast Write Cache not supported
(touch /kernel/drv/ap before vxvm install to not load DMP)
Topologies:
Clustered Pair
N+1 (hot standby node)
Ring or cascade
N to N scalable (cascading failover)
Shared Nothing ( used for Informix parallel server)
No logical hosts
The Oracle instance syncing goes over the private network
No shared CCD
Must select CVM on install even with Volume Manager 3.0.4, to get OPS pick at end.
Must install UDLM (Oracle CD)
Create shared disk group while only one node in cluster.
Page 53
Cluster 2.x (cont.)
Hardware Notes:
Must change the initiator id on one node if using SCSI arrays between 2 nodes
(see procedure 5-17)
If Quorum device is replaced it needs to be reconfigured.
# scconf -q
A5000 - full loop only
must be mirrored
DMP, FW cache not supported
Direct or Hub attached (pg 5-23 5-27)
Wiring Diagrams
(pg 5-30)
SCI - scrubber jumpers need to be 'on' on one node 'off' on all the other nodes
/opt/SUNWsma/bin (has the SCI sm_config template files you need to
modify and run sm_config)
switch1.sc (4 nodes, 8 cards, 2 switches)
switch2.sc (2 nodes, 4 cards, 2 switches)
link1.sc (2 nodes, 4 cards, 0 switches)
# /opt/SUNWsma/bin/sm_config -f template_file
Terminal Concentrator - port 1 is used for setup (ports are numbered 1-8, not 0-7) (pg 5-56)
Enable setup mode - power on, wait < 30 sec, press the test button, wait 15 more sec, press the test button again
should get monitor::
:: erase EEPROM (to set the password to default; the default is the IP address of the box)
Remove the password from port 8 in a 3-node N to N cluster for 'port locking'
Cluster Commands:
ccp Command used to run the cluster control panel software on the
admin workstation
# ccp clustername &
cconsole Command used to start up the cluster console on the admin W/S
# cconsole
get_node_status Command used to get the status of a node (also can use hastat and
scconf clustername -p commands)
# get_node_status
haswitch Switch logical host to another node (will start the reconfiguration)
# haswitch nodename
hastat Will give you the status of the cluster; it will lie if the private network is
down. You can run it in the common window to get all views
# hastat (-m 0 to skip messages)
hareg Registers a data service with HA and associates it with the given logical
host.
# hareg -s -r dataservice -h logicalhost
# hareg -y dataservicename (to turn on a dataservice)
# hareg (to verify a service is turned on)
# hareg -n dataservicename (to stop a data service)
# hareg -u dataservicename (will shutdown the dataservice on all logical hosts)
Page 54
Cluster 2.x (cont)
pnmset Command to create PNM NAFO groups (on each node) for the public
network interfaces to be used for the NFS data service.
# /opt/SUNWpnm/bin/pnmset (follow interactive install)
pnmstat -l Command lists the /etc/pnmconfig file (to set up NAFO groups)
scadmin startcluster The first node into the cluster must enter with the 'startcluster' switch.
# scadmin startcluster nodename clustername
scadmin startnode All remaining nodes can join the cluster with the startnode switch
# scadmin startnode
scadmin stopnode To remove your node from the cluster use the stopnode switch. (do
this before init or shutdown commands)
# scadmin stopnode
scadmin switch Switch logical host to another node (will start the reconfiguration)
same as haswitch command
# scadmin switch nodename
scconf Command used to configure cluster parameters (many, use MAN)
# scconf -F (creates admin filesystem, each node)
# scconf -L (for logical hosts) (one node, diskset)
# scconf -q (for quorum device)
# scconf -N (to change a node ethernet address)
scdidadm Command to initialize the Disk ID pseudo driver (SDS install only);
builds a file with paths from each node to disks
# scdidadm -r (on node 0 to initialize)
# scdidadm -l (lowercase L) (verify DID configuration)
scinstall Installation command for Sun Cluster from CD
scmgr Command to start Sun Cluster manager (cluster monitor) (set DISPLAY)
# /opt/SUNWcluster/bin/scmgr nodename &
xhost Command on admin W/S to allow all xhost connections from
cluster nodes (graphics)
# /usr/openwin/bin/xhost +
Cluster Files:
/etc/opt/SUNWcluster/conf/clustername.cdb
Contains install info; flat file, use the more command to view.
/etc/opt/SUNWcluster/conf/ccd.database
Contains cluster database, viewed by scconf, scadmin commands. If you have to restore
this file to a 'bad' node, you must reboot (file info is kept in memory)
/etc/opt/SUNWcluster/conf/hanfs/vfstab.logicalhostname
Logical hosts vfstab file
/etc/opt/SUNWcluster/conf/hanfs/dfstab.logicalhostname
Logical hosts dfstab file (shared filesystems)
/etc/clusters
Admin W/S file, contains cluster names and node names
/etc/serialports
Admin W/S file, contains node names and port assignments on the concentrator
/etc/pnmconfig
Public network file. The pnmset command creates it; the pnmstat -l command will list it.
/etc/hosts
You must enter logical host name and IP.
Page 55
Cluster 2.x (cont)
Cluster Files:
/etc/name_to_major
vxio must have the same number on both nodes to switch nfs logical host
(unencapsulate first, change number)
/opt/SUNWcluster/bin
Most SC2.2 commands are located in this directory
/var/opt/SUNWcluster
Cluster error messages are located in this directory and in /var/adm/messages
The newer PCI-based servers come with an Operating Environment Installation CD to use with
Solaris 2.5 and 2.6. This CD will create a mini-root partition and allows you to install and boot the server
from the older versions of Solaris.
The mini-root is currently Solaris 7 and starts at cylinder 0 on the boot disk. Once the intended version
of Solaris is loaded, the Operating Environment CD makes the mini-root (not mini-me) swap (slice 1), leaving it
starting at cylinder 0. This is alright if you are not encapsulating root.
When you then encapsulate root, swap (slice1) remains starting at cylinder 0, and veritas will not allow that
space to be used for a core dump. It assumes it is reserved for the VTOC.
One way we have used to get around this is to boot from the Operating Environment Installation CD,
load mini-root onto one disk and the intended O/S on another, through the custom install option. Then
boot from the other disk and encapsulate it.
- add hostname and IP address to the /etc/hosts file (hostname is usually hostname_interface, ex: sunnie_qfe0)
- create a /etc/hostname.interface file # touch /etc/hostname.qfe0
- vi the /etc/hostname.interface file and add the entry at the top (no spaces): hostname_interface
- ifconfig interface (hme0, qfe0, etc.) plumb
- ifconfig interface inet IP_address # ifconfig qfe0 inet 129.145.121.123
- ifconfig interface netmask 255.255.255.0 # ifconfig qfe0 netmask 255.255.255.0
- ifconfig interface broadcast IP_address.255 # ifconfig qfe0 broadcast 129.145.121.255
- ifconfig interface up # ifconfig qfe0 up
- ifconfig -a (if ready to use, should look like this:)
qfe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
inet 129.145.121.123 netmask ffffff00 broadcast 129.145.121.255
ether 8:0:20:88:xx:xx
*** Warning: touch the file /etc/notrouter so the server will not route between the two ethernet interfaces***
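The steps above can be collected into a dry-run script that echoes each command for review before anything is executed. The interface and addresses are the document's examples:

```shell
#!/bin/sh
# Dry run of the second-interface setup; nothing is actually configured.
IF=qfe0
IP=129.145.121.123
run() { echo "+ $*"; }    # replace echo with real execution when ready
run ifconfig $IF plumb
run ifconfig $IF inet $IP
run ifconfig $IF netmask 255.255.255.0
run ifconfig $IF broadcast 129.145.121.255
run ifconfig $IF up
run touch /etc/notrouter    # keep the server from routing between interfaces
```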
page 56
Veritas Volume Manager :
Volume Manager takes physical disks and allows you to create logical volumes across these disks.
A group of physical disks is called a 'disk group'
All or portions of these physical disks can be combined to create logical 'volumes'
You then can create filesystems on these logical volumes that span multiple physical disks.
Rules:
- There must be a rootdg for vxvm to come up at boot. This is usually made when you
run vxinstall to install volume manager and encapsulate your boot disk. Although you do not
have to encapsulate the boot disk, rootdg can be made up of any disk.
- You must have 2 unassigned slices to encapsulate a disk. (public and private regions)
- vxunroot will unencapsulate a volume only if /, swap, /usr, /var, and /opt are the only
filesystems on the encapsulated disk.
In general, the flow of building logical volumes, creating a filesystem, and mounting it is as follows:
1. assign physical disks to free disk pool (to use with volume manager)
# vxdisksetup -i c#t#d# c#t#d# (etc...)
2. create a disk group (uses disks in the free disk pool; you assign names. nconfig is private db
copies, default is 4, and nlog is kernel logs; both switches are optional)
# vxdg init diskgrp_name disk_name=cxtxdx nconfig=# nlog=#
Page 57
Veritas Volume Manager (cont):
In general, the flow of building logical volumes, creating a filesystem, and mounting it (cont.):
# vxprint -htg dg_name (find the plex name of the mirror volume you want to use)
# vxplex -g dg_name dis plex_name (dissociate the plex from its volume)
# vxmake -g dg_name -U fsgen vol vol_name plex=plex_name (make the volume)
# mkdir /mp_name (create a mount point)
# vxvol -g dg_name start vol_name (start the newly created volume)
# mount /dev/vx/dsk/dg_name/vol_name /mp_name
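The flow above, collected into a dry-run sketch that echoes the commands instead of executing them; the disk group, plex, and mount point names are invented:

```shell
#!/bin/sh
# Dry run of the break-off-a-plex-into-a-new-volume flow.
run() { echo "+ $*"; }    # swap echo for real execution when ready
run vxprint -htg datadg                                  # find the plex name
run vxplex -g datadg dis datavol-02                      # dissociate the plex
run vxmake -g datadg -U fsgen vol newvol plex=datavol-02 # make the volume
run mkdir /newmp                                         # mount point
run vxvol -g datadg start newvol                         # start the volume
run mount /dev/vx/dsk/datadg/newvol /newmp
```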
rem out 'vxio' lines in /etc/system (usually 2 lines at the end of vm section)
copy /etc/vfstab to /etc/vfstab.vm
copy /etc/vfstab.prevm to /etc/vfstab
touch /etc/vx/reconfig.d/state.d/install-db
reboot
(to reverse)
uncomment 'vxio' lines in the /etc/system file (on both disks if root was mirrored)
copy /etc/vfstab.vm to /etc/vfstab (on both disks if root was mirrored)
rm /etc/vx/reconfig.d/state.d/install-db
reboot
Page 58
Veritas Volume Manager (cont):
vxdg free how much free space in a diskgroup: vxdg -g dg_name free
vxdg list list all imported disk groups (exported use: vxdisk -s list | grep dgname)
vxdg init Creates a disk group: vxdg init dg_name disk_name=c#t#d#
vxdg adddisk Add disk to dg: vxdg -g dg_name adddisk disk_name=cxtxdx
vxdg rmdisk Remove disk from dg: vxdg -g dg_name rmdisk disk_name
vxdg upgrade Upgrade dg after VM upgrade: vxdg upgrade dg_name
vxdg deport deport a dg: vxdg deport dg_name
vxdg import import a dg: vxdg import dg_name
vxassist make makes a logical volume:
vxassist -g diskgrp_name -U fsgen make vol_name size layout=mirror|stripe|raid5 nstripe=# disk_name disk_name (etc.)
vxassist maxsize what is the max size volume you can make in a disk group:
vxassist -g dg_name maxsize layout=mirror|stripe|raid5 nstripe=#
vxassist mirror mirror a stripe or concat vol: vxassist -g dg_name mirror vol_name disk_name(s) &
vxassist remove mirror Used to remove a mirror permanently (do not use to break a mirror)
vxassist -g dg_name remove mirror vol_name
vxplex used to attach and dissociate plexes with volumes:
vxplex att vol_name plex_name or vxplex -o rm dis plex_name
vxdisk -s list | grep dgname Gives you a listing of all disk groups
vxdisksetup -i used to add a disk to the volume manager free disk pool: vxdisksetup -i c#t#d#
vxdiskunsetup -C used to remove a disk from the free disk pool: vxdiskunsetup -C c#t#d#
vxdiskadd will do both the vxdisksetup and vxdg adddisk: vxdiskadd c#t#d#
vxvol start start a volume after it was made with vxassist or vxmake: vxvol start vol_name
vxvol stop used to stop a volume after a umount: vxvol stop vol_name
vxedit -r rm allows you to recursively remove a volume, plex or subdisk: vxedit -r rm vol_name (or plex_name)
Page 59
FTPing to and from sunsolve:
You can use this to temporarily store files that you may want to access at a customer's site, or to
send files from a customer site that you can retrieve on swan.
Anything sent to sunsolve will be deleted after two days
Internal to sunsolve:
(change to directory where the file you want to send resides)
# rftp sunsolve.sun.com
Name: anonymous or suncore
Password: (enter your e-mail address, or the suncore passwd; it changes weekly, check url:)
https://livelink.central.sun.com/livelink/livelink?func=ll&objId=5537115&objAction=browse&sort=name
ftp> cd cores
ftp> mkdir dir_name (as of 5/01 you cannot create directories. Skip to bin command)
ftp>cd dir_name
ftp>pwd
257 "/cores/dir_name" is current directory.
ftp> bin
ftp> put file_name_to_be_sent
ftp> quit
#
Page 60
Serengeti: 3800 - 6800
Serengeti 8 (3800):
Serengeti 24 (6800):
SC Board: System Console. You can tip or telnet to the SC card to configure/maintain the server.
(SSC) There are 3 shells you can access and configure from the SC: Platform shell, Domain shell,
and O/S shell on a specific domain. The SC bd is part of the platform; it is not
configured into a domain. A second (slave) SC board is installed if the redundancy
kit is ordered. The SC runs its own O/S and is upgraded and backed up across the
ethernet connection.
Repeater Bds: The repeater boards establish and maintain the connections between the system boards
(RP) and the IO boats. The 3800 and 4800 have 2 repeater boards, although the circuitry
for the repeaters on the 3800 is on the centerplane. The 6800 has 4 repeater bds.
System Boards: The system board is common across all 3 servers. It can have 2 or 4 CPUs
(SB) installed on it (they are not field replaceable). The system board has sockets
for 8 banks of 4 dimms. Each CPU has 2 corresponding dimm banks. It is possible
that a CPU might not have any dimms installed in its corresponding banks.
However, a populated dimm bank must have a corresponding CPU installed.
I/O boat: The I/O boat types: PCI or cPCI; there is no sbus I/O boat. The PCI and compact PCI
(IB) adapters are installed in the I/O boats. Currently cPCI is only available on the 3800.
LEDS:
Activate (green): on - Bd is activated. You must NOT remove the board when this LED is on.
off - Bd is not activated. You can remove the board when this LED is off.
Removal ok (amber): on - you can safely remove the component under hot-pluggable conditions.
off - you must not remove the component under hot-pluggable conditions.
Partitioning:
You can configure the server in single or dual partition mode. If you select dual partition
mode, each partition will be electrically separated from the other. The 3800 (on-bd repeaters)
and the 48x0 have dual repeaters, one of which will be configured for each partition; the 6800 has
4 repeater bds, 2 of which will be configured for each partition. Dual partition mode is recommended for
keeping domains electrically separated.
Domains: On the serengeti, you configure the resources you want allocated to each domain. The domain
(like on an E10K) then becomes an independent server. At a minimum each domain must have
a system bd, I/O boat with ethernet/scsi PCI card, and a boot disk.
Domain/Partition configurations:
6800: Domains A,B even bd #s grid0, C,D odd bd#s grid1 (best practices)
2 partitions 3 domains ABC, ABD, ACD, or BCD
2 partitions 4 domains A,B,C,D
Page 62
Serengeti: 3800 - 6800: (cont.)
Power on hardware:
Connect to SCC
enter 0 (platform shell)
> poweron all
to verify: > showboards -v
Run this command from the platform shell. Keep in mind this command will not
update the slave SC. To update it you must make it the primary or run the command
from the slave SC.
> flashupdate -c <source board> <replacement board> (to copy firmware between like bds)
> setupplatform (enter information and modify ACLs; for each domain use the
deleteboard and addboard -d commands)
In setupplatform:
Syslog loghost [ ] : ip_of_adminStation
Log Facility [ ]: local0 (can be 0-7)
In setupdomain: (for each domain)
Syslog loghost [ ] : ip_of_adminStation
Log Facility [ ]: local1 (can be 0-7)
In syslog.conf on admin station:
local0.notice /var/adm/messages.platform
local1.notice /var/adm/messages.domainA
(ect...)
Admin station:
create the files: # touch /var/adm/messages.nnnnnnn
restart syslog: # kill -HUP `cat /etc/syslog.pid` or ( /etc/init.d/syslog stop) ( /etc/init.d/syslog start)
Page 65
Setup remote logging: (cont.)
/usr/lib/newsyslog file: (so logs do not grow forever. On line 2 enter all the message file names you created.)
- # logger -p local0.notice "test message for platform log file" (check the contents of the log files to make sure
logging is working; if not, check permissions on the log file)
- setfailover off/on and check the log file on the log host (if no log entry arrives, snoop the interface to make
sure the entry is reaching the loghost; also make sure syslogd is not running with the -t switch)
Notes:
- Use 'connections' command to see if ghost sessions are keeping you from connecting to a domain.
(reset the SC , from slave sc or reset button, to remove those sessions.)
- Use the dash (-) to remove an entry when running setupplatform
Firmware: http://pts-americas.west/esg/msg/techinfo/platform/sun_fire/firmware-matrix/
Patch # SC Firmware CPU (MHz) Domain Firmware Other features
-------- --------- --------------- ------------
112127-xx 5.12.5 750/900 (Masks 2.1/2.2 only) 5.12.x
5.12.6 750/900 (Masks 2.1/2.2 only) 5.12.x DR
5.12.7 750/All 900 5.12.x DR/900 2.3
112494-xx 5.13.0 750/All 900 5.12.x or 5.13.x DR/ SC auto failover
5.13.1 750/All 900 5.12.x or 5.13.x “
5.13.2 750/All 900/1050 5.12.x or 5.13.x DR/1050/failover
5.13.3 750/All 900/1050 5.12.x or 5.13.x “
750/All 900/1050 5.12.x or 5.13.x “
5.13.5 750/All 900/1050 5.12.x or 5.13.x “ /L2 timing
112883-xx 5.14.0 750/All 900/1050 5.12.x, 5.13.x or 5.14.0 DR/Failover/COD
5.14.4 750/All 900/1050/1200 5.12.x, 5.13.x or 5.14.x “ /L2 timing
Freshchoice (scsi2/ethernet) adapter firmware has a problem booting from CDROM. Bug 4397457
workaround: patch the get-mail word of the ISP fcode to give a longer timeout period:
ok cd /ssm@0,0/pci@b,2000/pci@2/SUNW,isptwo@4
ok patch 100 64 get-mail
ok
Page 66
Mounting and unmounting CD without vold:
This will dump the file into the heart of the e-mail. Use for text documents, post output etc...
More T3 info:
Forgotten password:
reset the T3
press (return) within 3 seconds of reset (on the console session you have open)
type set passwd (this will display the current password)
T3 Logging: (you will need to modify the T3's host file and syslog.conf file by ftping them to a unix
server, editing them, sending the files back to the T3, and resetting the T3)
You should already have the T3 connected to the network and be able to telnet to the T3
type 'set' to make sure you have an ip, netmask, gateway, and hostname on the T3
:/: set logto *
modify the T3's host file (add ip and hostname of loghost)
modify the T3's syslog.conf (add line '*.info @ip_address_of_loghost')
modify the loghost's syslog.conf file (add line 'local7.info [tab] /var/adm/messages.t3'); must use local7
touch /var/adm/messages.t3 on the loghost
kill -HUP the syslogd pid, or stop and restart syslogd, on the loghost
ftp the modified host and syslog.conf files back to the T3
reset the T3 to have the changes take effect
Page 67
StarCat 15K:
General:
StarCat 15K:
Has 18 available slots for system board sets. In each of the 18 available slots you can configure (1)
slot 0 system bd and (1) slot 1 bd (hsPCI, MaxCPU or SunFire Link).
A system board set is made up of a system board (slot 0 bd) and a slot 1 type board. A slot 1
type board is usually an I/O (hsPCI) board, but can be a SunFire Link or MaxCPU bd. The
slot 0 and slot 1 boards are physically mounted on a 'carrier plate and expander board'.
The expander bd/carrier plate is then inserted into one of the 18 available slots of the StarCat.
Control Board Set: (2) See Fin I0771-1 (keep the old id bd if replacing the CP1500 bd on the SC)
Also see I0761-1 (upgrade CP1500 post & OBP)
Control Board set is made up of a 'System Controller Bd', 'System Controller Peripheral Bd'
and a 'CenterPlane Support Bd'. The system controller runs solaris and the SMS packages.
The System controller peripheral board has 2 SDS mirrored boot disks, DVD-rom and a 4mm DAT
that are used by the System Controller board.
The System Controller bd and SC peripheral bd are mounted on the CenterPlane Support bd. The
Centerplane support bd is then inserted into one of the 2 control bd slots on the StarCat.
The Control Bd set provides system clock, I2C monitoring bus, console bus to all domains,
serial port and 2 net ports to outside world, serial port internal to other SC and internal net connection
to each domain and other SC.
The SCs come with an O/S installed in a 'sys-unconfig' state. When you run smsconfig -m to
configure your SCs, it is easiest if the SCs are on the network and able to reach their default gateway.
IPMP contacts the gateway to determine if the physical interface is up.
Floating = community hostname and IP address. This address will follow the main SC
failover = virtual IP and hostname that will float between hme0 and eri1 on each SC
hme0, eri1= regular IP and hostnames for the interfaces
SC console port pinout: (plus null modem info for connection to 25pin/9pin serial ports)
[pinout diagram: pin 5 rd --> td (2/3); pin 3 td --> rd (3/2); pin 2 cts --> dtr (20/4);
pin 1 dtr --> dsr,dcd (6,8/6,1); pin 4 gnd --> gnd (7/5)]
Page 68
StarCat 15K: (cont...)
Example of IPMP configuration on Sun Fire 15K system controllers C (Community) Network:
System controller: The SCs are fully functional servers with 2 SDS-mirrored 18gb disks,
DVD-rom and a 4mm DAT. They will come already loaded from the factory with Solaris
and SMS. At this time, there is no way to create the 'idprom.image' files in the field (so make sure
they are backed up). The default login and password is sms-svc, sms-svc.
Domain install: If the domain has a D240 attached, the install (after creating the domain:
setupplatform, deleteboard, addboard, setobpparams, setkeyswitch) can be done from
the D240's DVD-rom. If you do not have a DVD-rom attached to the domain you are loading,
you will most likely have to boot net.
Page 69
StarCat 15K: (cont...)
Example device path breakdown:
/pci@17c,700000/pci@1/SUNW,isptwo@4/disk@0,0
"17" - change the hex value to decimal and divide by 2; the result is the EX slot:
hex 17 = 23 decimal, 23/2 = 11 r1, so EX slot = 11
"c" - IOC0 (slot 0 or 1); "d" would be IOC1 (slot 2 or 3)
"700000" - 6 = I/O slot 0 or 2, 7 = I/O slot 1 or 3
"pci@1" - always 1
"SUNW,isptwo" - board type
"disk@0,0" - device identifier
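The slot arithmetic can be checked with shell string and hex handling (the path fragment is the document's example):

```shell
#!/bin/sh
# Decode the expander (EX) slot from the /pci@17c node: take the hex
# digits before the IOC letter, convert to decimal, divide by 2.
node=17c
hexpart=${node%?}                   # "17"
ioc=${node#"$hexpart"}              # "c" -> IOC0, "d" -> IOC1
dec=$(printf '%d' "0x$hexpart")     # 23
exslot=$((dec / 2))
echo "pci@$node -> decimal $dec, EX slot $exslot, IOC letter $ioc"
```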
SMS daemons:
dca - domain configuration agent. One for every POST. Talks to dcs on domain (only on active SC.)
dsmd - domain status monitoring daemon (only on active SC.)
dxs - domain X server. One for each domain. (only on active SC.)
efe - event front-end daemon. Part of SMC; acts as intermediary between the SMC agent and SMS (only act SC)
Page 70
StarCat 15K: (cont...)
SMS daemons: (cont)
SMS commands:
console - creates a remote connection to the domain's virtual console driver, making the window in which
the command is executed a "console window" for the specified domain
deleteboard - removes a board from the domain it is currently assigned to
deletetag - remove the domain tag name associated with the domain
disablecomponent - adds a component to the domain or platform blacklist
enablecomponent - removes a component from the platform, domain or ASR blacklist
flashupdate - updates the Flash PROM in the system controller (SC), and the Flash PROMs in
a domain's CPU and MaxCPU boards, given the board location.(/opt/SUNWSMS/firmware)
ex: flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash SB1 (leave Name blank to do all SBs)
fruupdate (command in 'help' listing, but no description or man page)
help - displays a list of valid SMS commands along with their correct syntax
initcmdsync - The command synchronization commands work together to control the recovery of user-defined
scripts interrupted by a system controller (SC) failover
marginclock [-f (65|75|83.333) | -s synth-freq | -m [+/-] margin-percent][-y]
marginvoltage [-p1.5] [-p2.5] [-p3.3] [-p5.0] [-pcore] [-m(0|+|-)] [-d domain_id|domain_tag]
[-d domain_id|domain_tag...] [-b location] [-b location...] [-y]
moveboard - first attempts to unassign location from the domain it is currently assigned to and possibly active
in, then proceeds to assign, connect, and configure location to the domain
poweroff - powers off the specified dual 48V power supply, fan tray, or board
poweron - powers on the specified dual 48V power supply, fan tray, or board
reset - allows you to reset one or more domains in one of two ways: reset the hardware to a clean state
or send an externally initiated reset (XIR) signal
resetsc - resets the other SC
runcmdsync - prepares the specified script for automatic synchronization (recovery) after a failover
savecmdsync - the command synchronization commands work together to control the recovery of user-defined
scripts interrupted by a system controller (SC) failover
setbus - perform dynamic bus reconfiguration on active expanders in a domain
setchs - SMS1.4 set component health status. SMS can auto-fail components; setchs lets you change the status
setcsn - SMS1.4 set chassis serial number. Allows you to set the CSN once. (showplatform) # setcsn -c serial#
setdatasync - enables you to specify a user-created file to be added to or removed from the
data propagation list
setdate - allows the SC platform administrator to set the SC or optionally a domain date and time values.
Allows domain administrators to set the date and time values for their domains.
setdefaults - removes all SMS instances of a previously active domain. A domain instance includes all
pcd entries except network information; all message, console, and syslog log files; and, optionally,
all NVRAM and boot parameters. pcd entries and NVRAM and boot parameters are returned to
system default settings
setfailover - provides the ability to modify the state of failover for the SC failover mechanisms
setkeyswitch - changes the position of the virtual keyswitch to the specified value
setobpparams - allows a domain administrator to set the virtual NVRAM and REBOOT variables passed to
OpenBoot PROM by setkeyswitch
setupplatform - sets up the available component list for domains.
showboards - displays board assignments
showbus - display the bus configuration of expanders in active domains
showchs - SMS1.4 displays component health status. EX: showchs -r sb15
showcmdsync - displays the command synchronization list to be used by the spare system controller (SC) to
determine which commands or scripts need to be restarted after an SC failover.
showcomponent - displays whether the specified component is listed in the platform, domain, or ASR blacklist file.
showdatasync - provides the current status of files propagated (copied) from the main SC to its spare
showdate - display the date and time for the system controller (SC) or a domain
showdevices - displays the configured physical devices on system boards and the resources made available by
these devices.
Page 72
StarCat 15K: (cont...)
local-mac-address? :
The "local-mac-address?" eeprom parameter is used to enable the MAC addresses which are burnt in on
network cards.
false - do not use the card's burnt-in addresses; use the nvram default address for all interfaces
(shown on the obp banner)
true - use the on-board MAC address (if there is any). This setting is necessary to get a
unique MAC address per interface.
The default setting of local-mac-address? is "false". On non-clustered servers the installation
engineer must not forget to set local-mac-address? to true, to avoid having one MAC address appear several
times in the network, which causes network problems.
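To flip the setting from a running system, the standard eeprom(1M) invocation looks like this (a sketch; it must be run as root on the machine itself):

```
# from Solaris:
eeprom 'local-mac-address?=true'
eeprom local-mac-address?        # verify the new value

# or from the OBP ok prompt:
ok setenv local-mac-address? true
```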
Mirror the root disk with DiskSuite (SDS):
- first format the second disk exactly like the original root disk: (typically s7 is reserved for the metadevice
state database)
# metadb -a -f -c 3 c0t0d0s7 c1t0d0s7 (-a and -f create the initial state database replicas; -c 3
puts three state database replicas on each specified slice)
- for each slice, you must create 3 new metadevices: one for the existing slice, one for the slice on the
mirrored disk, and one for the mirror. To do this, make the appropriate entries in the md.tab file.
Follow this example, creating groups of 3 entries for each data slice on the root disk.
- run the metainit command to create all the metadevices you have just defined in the md.tab file.
If you use the -a option, all the metadevices defined in the md.tab will be created.
# metainit -a -f (-f is required because the slices on the root disk are currently mounted)
- run the metaroot command for the metadevice you designated for the root mirror. In the example
above, we created d0 to be the mirror device for the root partition, so we would run:
# metaroot d0
- edit the /etc/vfstab file to change each slice to the appropriate metadevice. The 'metaroot' command has
already done this for you for the root slice. EX:
/dev/dsk/c0t0d0s1 - - swap - no -
to
/dev/md/dsk/d1 - - swap - no -
Make sure that you change the slice to the main mirror (d1), not to the simple submirror (d11).
- reboot the system. Do not proceed without rebooting your system, or data corruption will occur.
- After the system has rebooted, you can verify that root and other slices are under DiskSuite's control:
# df -k
# swap -l
The outputs of these commands should reflect the metadevice names, not the slice names.
# metattach d0 d20 (must be done for each partition on the disk, and will start the syncing of data)
- to follow the progress of this syncing for this mirror, enter the command
# metastat d0
Although you can run all the metattach commands one right after another, it is a good idea to run the next
metattach command only after the first syncing has completed. Once you have attached all the submirrors
to the metamirrors, and all the syncing has completed, your root disk is mirrored.
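The md.tab example referenced in the steps above is not reproduced in this handbook; a typical sketch for root (d0) and swap (d1), assuming c0t0d0 is the original disk and c1t0d0 the mirror (device names are illustrative), would be:

```
# md.tab - groups of 3 entries per data slice
# root: d10 = existing slice, d20 = mirror slice, d0 = one-way mirror of d10
d10 1 1 /dev/dsk/c0t0d0s0
d20 1 1 /dev/dsk/c1t0d0s0
d0 -m d10
# swap: d11 = existing slice, d21 = mirror slice, d1 = one-way mirror of d11
d11 1 1 /dev/dsk/c0t0d0s1
d21 1 1 /dev/dsk/c1t0d0s1
d1 -m d11
```

The names match those used in the steps above (d0/d1 mirrors, d11 submirror, d20 attached later with metattach).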
Page 74
IPMP: (Solaris 8 Update 2 10/01)
General Description:
IPMP allows you to create a logical IP address that can be swapped on-the-fly to another
physical network interface.
IPMP Test IP Address: physical interfaces (hme0,qfex,ge). This address is used by IPMP to determine
the status of the physical interface. It is not for use by applications.
IPMP Logical IP Address: IP address is used by applications for data transfers to and from
the server. This IP address will failover between the configured interfaces.
Setup IPv4 IPMP: (IPMP group w/ 1 standby interface) see IP Multipathing Admin Guide
/etc/hostname.hme0 :
/etc/hostname.qfe0 :
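The contents of the two hostname files were not captured above; assuming hostnames myhost/myhost-test1/myhost-test2 and group name ipmp0 (all made up), a Solaris 8 one-standby configuration might look like:

```
# /etc/hostname.hme0 - active interface: data address plus non-failover test address
myhost group ipmp0 netmask + broadcast + up \
addif myhost-test1 deprecated -failover netmask + broadcast + up

# /etc/hostname.qfe0 - standby interface: test address only
myhost-test2 deprecated -failover standby group ipmp0 netmask + broadcast + up
```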
Page 75
T3B or T3+ Firmware Rev 2.1 New Functions:
Volume slicing:
- Create max 16 slices within a T3, either WG or PP.
- Layered on top of volumes. If volume is unmounted all slices go away.
- Volume slices cannot be seen until the volume is initialized and mounted.
- Minimum size is 1GB, increments of 1GB, starts on GB boundaries.
- Maximum size is size of volume.
- Once enabled cannot be disabled.
EX: (simple example of slicing a volume on a T3+)
Enabled by new system variable enable_volslice.
sys enable_volslice (Note: if volslice is enabled, you must create a slice to see lun in format)
vol add vol_name data u#d#-# raid # standby* u#d9
vol init vol_name data rate(1-16) optional
volslice create slice_name -z size vol_name
volslice list
lun perm list (should be rw, else `lun default all_lun rw')
vol mount vol_name
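A hypothetical concrete run of the generic sequence above (volume name, slice name, and size are made up; the -z argument format is an assumption based on the 1GB-increment rule stated above - check the 2.1 firmware docs):

```
t3:/:<1> sys enable_volslice on
t3:/:<2> vol add v0 data u1d1-8 raid 5 standby u1d9
t3:/:<3> vol init v0 data
t3:/:<4> volslice create s0 -z 20GB v0
t3:/:<5> volslice list
t3:/:<6> lun perm list
t3:/:<7> vol mount v0
```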
WWN Groups: Allows groups of wwns to share security features, saves lazy typists.
Hitachi StorEdge 99X0 Arrays:
SE9960- One DKC logic cabinet, one to six DKU disk cabinets, arranged on right and left (R1-3, L1-3).
R1 is added first, add on alternate sides for best performance.
Max 32GB cache, 32 host ports, 512 disk drives.
up to 4096 logical devices can be configured and presented.
SE9980V- One DKC logic cabinet, one to four DKU cabinets. Added same as 9960.
Max 64GB cache, 64 host ports, 1024 disk drives.
up to 8092 logical devices can be configured and presented.
All use the concept of "storage clusters" redundant combinations of cache boards, host adapter boards (CHA)
and disk adapters boards (DKA). All array transactions run through the cache.
Basic building block is called the B4, which is 4 trays of disks (HDUs). In 9910 and 9970 B4 is all 4 HDUs of
disks, in 9960 and 9980 a B4 is 4 (of 8) HDUs in a cabinet (bottom 4 or top 4). HDUs will be numbered in N
shape. The same 4 drives in a B4 are a parity group, which is where the RAID level is set. A parity group will
always be 4 drives. In 9970 and 9980 parity groups can span 2 B4's.
B4's are numbered 1 through 12; 1 and 2 are in cabinet R1, 3 and 4 are in L1, 5 and 6 are in R2 etc. Disk drives
in each 9910 and 9960 HDU are numbered 0 through B (11), thus 12 drives. Disk drives in each 9970 and 9980
HDU are numbered 00 thru 0F and 10 thru 1F. Accessing drives 10 thru 1f requires an additional card in the
HDU.
Each parity group is set to an emulation mode, the system then divides that parity group into the appropriate
number of LDEV's based on the emulation mode sizing. LDEV's can be presented on the host ports as LUN's
as is or combined to create larger LUNs.
In 9910 and 9960 drive B (top last drive on left) in each HDU in the L1 and R1 DKU's is used as a universal
spare, the bottom B4 drive B will always be a spare if installed, the top B4 drive B may be designated as spares
or may be a normal parity group. In a 9910 any drives installed in slot B will be spares. In 9970 and 9980, drive
0F will be the spare (top left drive next to center cards). Same rules apply for slot 0F as B in 9910 and 9960.
In 9970 the HDU can be "split" using special cards to create two B4's.
SVP: a Windows PC mounted in the array. 9970 and 9980 have an optional second SVP mounted in cold standby.
Two modes of operation, View and Modify, View will come on when the Remote Console is connected.
Disconnect Remote Console or reboot SVP to go back to Modify mode.
Page 77
Hitachi StorEdge 99X0 Arrays: (cont...)
Passwords:
raid-initialsetup
raid-install
raid-online
horc-forcibly
MAINTENANCE: lots of jumpers on boards, must be carefully checked. All changes must be made thru
modify mode on the svp, carefully following the procedures. Repair procedures have
a pre change section, a change section and a post change section, follow all steps.
USE THE MANUALS (on the CD that comes with the firmware)!!
SunFire forgotten password: (SRDB 26846) This procedure works with firmware version 5.11.3 and higher.
If the platform administrator's password is lost, the following procedure can be used to
clear the password.
1. Reboot the System Controller (SC). You won't be able to do this by logging into the platform shell.
You'll need to hit the reset button on the SC to do this.
2. The normal sequence of a System Controller rebooting is for SCPOST to run, then ScApp. You'll need
to wait for ScApp to start loading, then hit Control-A to spawn a vxWorks shell. SCPOST is done running
when you see the message 'POST Complete'. At this point, ScApp will begin to load. When you see
the copyright message 'Copyright 2001 Sun Microsystems, Inc. All rights reserved.', Hit CONTROL-A.
You should see the following:
Page 78
Sunfire forgotten password: (cont:)
This last line is the vxWorks prompt. Keep in mind that ScApp will still continue to load all the way
to the point of giving you the menu to enter the platform/domain shells. To make it less confusing,
wait for the ScApp menu to display on your screen, then hit return. You should see the
vxWorks prompt -> again.
3. Make a note of the current boot flags settings. This will be used to restore the boot flags to the original value.
-> getBootFlags()
5. Reboot the System Controller (CONTROL-X or reboot ). Once reset, it will stop at the -> prompt.
6. If you are running firmware 5.17.x or above, enter the following commands, otherwise, go to step 7:
-> ld 1,0,"/sc/flash/vxAddOn.o"
If you are running firmware 5.17.x or 5.18.x, enter the following command at the prompt
-> uncompressJVM("/sc/flash/JVM.zip", "/sc/flash/JVM");
If you are running firmware 5.19.x or later, enter the following command at the prompt
-> uncompressFile("/sc/flash/JVM.zip", "/sc/flash/JVM");
Wait for the following System Controller messages to display. Your prompt will come back right away,
but it'll take about 10 seconds for these messages to show up:
8. After the above messages are displayed, restore the bootflags to the original value using the
setBootFlags() command.
9. Reboot the System Controller using CONTROL-X or the reboot command. Once rebooted,
the platform administrator's password will be cleared.
Page 79
StorEdge Network FC Switch:
The StorEdge Network FC Switches are replacing the fibre hubs. When you receive them they
are configured similar to a hub (all ports in one zone). The switch will initially get its IP address
by RARPing (though it has a default IP of 10.0.0.1). You cannot telnet to the switch; you must use
the GUI to configure it (may change with future firmware).
Remember: each array in a zone must have a unique tag address or box id...
Page 80
Hitachi Lightning 9900V notes:
also see: http://storage.east/hitachi
Cluster - set of boards in a subsystem. 2 clusters: CL1, CL2. Mirror config across clusters
Emulations - Lun Specifications (what type of disk drive do you want the lun to appear to be?)
LUSE - Lun Size Expansion: Make a large Lun from Ldevs (concatenate)
CVS/VLL - Make smaller Luns from free space, 35gb and lower (must be smaller than the emulation size selected)
Parity Group (aka: Array Group): 4 disks only. Select physical disks, select emulation (this will give
you a number of Ldevs depending on emulation), assign Ldevs to CU
Lun Mapping: Map a Ldev to ports on the CHAs. Done thru Storage Navigator.
Host mode 0 is standard, host mode 9 for Solaris, host mode C for windows
Host Groups: When Lun Security is on, up to 128 host groups/port. Can config host mode and have lun0 per
group. Need to know the WWN of the HBA
High Speed Mode: All the processors on a CHA will be working 1 port: 1 port, 4 procs (other 3 ports
disabled)
Standard speed mode: 1 processor per port on a 4-port CHA, 1 proc/2 ports on an 8-port CHA
Offline SVP: Software (m/c CD) to load on your PC. Use to configure without SVP. Requires config floppy
DCI - Define Configure Install: DCI operation destroys customer data use for new install only.
Use 'Change Configuration' on existing subsystems. (Shift ctl i raid-initialsetup)
Page 81
Hitachi Lightning 9900V notes: Cont.
Customer wants (10) 500gb luns. How many HDDs do you need?
1, (1) 500gb lun = (14) 36gb open-L Ldevs 500/36= 13 r32 (round up to 14)
2, (10) luns = 140 Ldevs 14x10=140
3, parity groups = 24 6 Ldevs/parity group 140/6= 23 r2 (round up to 24)
4, 96 HDDs required 24 parity groups x 4 disks/group = 96 disks
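The sizing above is ceiling division at each step; a shell re-derivation (using the handbook's figures: 36gb OPEN-L Ldevs, 6 Ldevs per parity group, 4 disks per group):

```shell
# ceil(a/b) via (a + b - 1) / b in integer arithmetic
lun_gb=500 luns=10 ldev_gb=36 ldevs_per_pg=6 disks_per_pg=4
ldevs_per_lun=$(( (lun_gb + ldev_gb - 1) / ldev_gb ))        # ceil(500/36) = 14
total_ldevs=$(( ldevs_per_lun * luns ))                       # 14 x 10 = 140
pgs=$(( (total_ldevs + ldevs_per_pg - 1) / ldevs_per_pg ))   # ceil(140/6) = 24
hdds=$(( pgs * disks_per_pg ))                                # 24 x 4 = 96
echo "$hdds HDDs"
```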
Microcode CD:
- Read the ECN (Engineering Change Notice) that comes with the m/c CD
- Includes Manuals (use them)
- Includes Offline SVP software
If message led is on, check subsystem status: (if blinking communication problem with the SVP)
- Maintenance button on SVP
Storage Navigator - Allows you to do Lun mapping, LUSE, CVS, DCR, True Copy, Shadow Image
from a client thru the lan to the SVP. Make sure the SVP is not in 'modify' mode so
you can get write access. Default login: root pwd: root
http://ipaddress-main-SVP//cgi-bin/utility/sjc0000.cgi
DCR/Flashaccess - Dynamic Cache Residency: Will keep a Ldev resident in cache, save on transfer time.
If purchased set it up on install, will save downtime later
Page 82
Hitachi Lightning 9900V notes: Cont.
HDLM - Hitachi Dynamic Link Manager: Loaded on the server, similar to DMP.
/opt/dynamiclinkmanager/log /bin
Defaults:
Sun Windows Setting
Path Health Check off off 15 - 1444 min
auto failback none off
HDLM commands:
# dlnkmgr view (-path), (-sys),
offline (-path)
online (-path)
set -ellv log-level, -elfs log-size, -systflv trace-level, -pchk, -s
clear
help
True Copy: Remote copy to another disk subsystem (9900 to 9900). Mainly used for disaster recovery.
You configure it on each subsystem using Storage Navigator. One will be the Master (MCU)
and the other Remote (RCU).
2 transfer methods:
SYNC: Data that is transferred to the MCU is in turn sent to the RCU thru a dedicated port.
When the data is acknowledged at the RCU, the MCU sends an acknowledgement back to the HBA
ASYNC: Data sent to the MCU is acknowledged to the HBA before the MCU receives
acknowledgement from the RCU
The dedicated port has to be configured as 'initiator' on the MCU and 'RCU target' on the RCU.
This port is a point to point connection between the disk subsystems.
The PVOL is the primary volume (Ldev) the data is sent to it from the server.
The SVOL is the secondary volume (Ldev) on the RCU that True Copy copies to.
SMPL - simplex volume prior to any pair operation or result of 'pairsplit -s' command
COPY - (initial copy in progress) a result of a 'paircreate' command
PAIR - initial copy complete and doing updates as data changes on pvol
PSUS - pair operations suspended as a result of a 'pairsplit' command
PSUE - pair operations suspended as a result of a failure
Page 83
Hitachi Lightning 9900V notes: Cont.
Shadow Image: A local copy within a disk subsystem. Configured using Storage Navigator.
The PVOL is the primary volume (Ldev) the data is sent to from the server. The SVOL is the secondary
volume (Ldev) that Shadow Image copies to. You can have a max of 9 copies (svols), this includes
(3)level 1 SVOLs and (6) level2 SVOLs (cascade)
Level1    Level2
            _____S
     ____S<
    |       _____S
    |       _____S
P---+____S<
    |       _____S
    |       _____S
    |____S<
            _____S
Quick Functions:
quicksplit : makes it possible to read and write SVOLs immediately after split
quickresync: reduces the resync time considerably
quickrestore: reduces restore time considerably
Minnow StorEdge 3300 Series array: (also see page 110 for disk replacement)
OEM'd from Dot Hill. Small cheap array. Scsi hardware raid and jbod. Fiber array soon.
Raid levels 0, 1, 3, 5, 1+0, and 0+1 supported.
Up to 12 drives per box, 2 redundant RAID controllers.
Model 3310 Ultra 160 LVD SCSI (will work Single Ended as well).
Use new LVD card and SUNWqus driver.
Luns are created and owned by one controller, other is failover for it. Controllers can be active/active or
active/passive. All interface to array is done thru the master controller.
Parts are raid controllers (2), event monitor units (emus) (2), power supplies (2), terminator board (1), io
board (1), disks (12). All hot swappable. Replacing the terminator and io board will interrupt io.
Page 84
Minnow StorEdge 3300 Series array: cont...
Cabling can be complex, refer to the manual. 4 channels within the box: 2 are for host, 2 for drives.
Single bus- all drives same channel.
Dual bus- split drives between two channels (split drives 1-6 & 7-12, channels 0 &2).
IO Board
Channels 0 and 2 are drive channels
Channels 1 and 3 are host channel ports
SB and DB ports are “jumper” ports: Single bus jumper cable from channel 0 to SB port.
Dual bus jumper cable from channel 2 to DB port.
Expansion unit (JBOD) has no controllers, has 4 port IO Board (A, Aterm, B, Bterm).
Aterm and Bterm are self terminating ports, need to be at end of chain.
Single bus in expander jumper cable from B to Aterm.
Dual bus in expander no jumper cable installed.
If adding an expander to a “controller” box run the cable to the “non term” ports.
Box Management thru serial port or GUI (GUI doesn't work well yet).
If using network connect both controllers to same subnet, only master controller has ip address. IP
assigned by DHCP or static thru serial port connection.
Standard RS232 null modem (9 pin female) serial cable to either controller. Settings are 38400 baud, 8N1.
control-l refreshes screen (if just connected to running array hit control-l choose VT100 mode)
control-w switches between the controllers.
control-acbd reset to factory defaults, password “oemmaint”
Config tool is a text based menu, common to all arrays, main selections are: (use Return and ESC to navigate)
view & edit logical drives (create, expand, delete, raid configs, partition, set spares)
view & edit logical volumes (create, delete logical volumes)
view & edit host luns (assign lun id's and map host channels)
view & edit scsi drives (view drive status,flash drive leds, set global spares, clone drives)
view & edit scsi channels (status, properties, set controller target id)
view & edit config parameters (controller settings, set baud and ip address)
view & edit peripheral devices (set expansion box, secondary controller, array status (emu))
view system information (cache size, firmware revision, etc...)
system functions (reset, shutdown, fw upgrade)
event logs
Create LUNs: (in general; example does not use logical volumes so no "+" raid levels)
setup global spares- v/e scsi drives-select disk-add global spare-yes
setup logical drive- v/e logical drives-select LG-create logical drive-yes-raid-select disks-capacity-ESC
partition logical drive- v/e logical drives-select logical drive-partition-select partition(arrow)-size-yes
map luns to host- v/e host luns-select controller-select lun#-select logical drive-select partition-map(y)
Page 85
Tuning ecache scrubber scan rate:
To adjust ecache_scan_rate:
1. As root, run the following command to adjust ecache_scan_rate.
NOTE: This does not require downtime. Be very careful, though, as mis-typing the command could
result in downtime.
2. To make the change permanent, add the parameter setting to /etc/system. It is best to insert all
3 parameters together into /etc/system if the settings are not already there:
set ecache_scrub_enable=1
set ecache_scan_rate=1000
set ecache_calls_a_sec=100
VxWorks (serengeti SC): Use when you cannot get into scapp or to recover a failed SC flashupdate
- Reset the SC using the reset button on the front of the SC.
- when “ Copyright 2001-2002 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms. “ appears hit CTRL A
-> setBootFlags(0x0) then CTRL X (will reboot and stop booting at the -> prompt)
-> setBootFlags(0xd) (then "reboot" to change the boot settings back so the SC automatically boots ScApp)
Page 86
LVD PCI Adapter: (ultra scsi-3 375-3057)
Code named jasper, it is a Low Voltage Differential card. Mainly supports the S1, D2 and Minnow
(SE 3310) arrays.
The LVD drivers are not on any Solaris CD yet (8 02/02 or 9). You will have to either make a temp boot disk
and patch it, or boot net from a patched image, to see the disks on a LVD adapter, until a bootable CD is
released that has driver support for the LVD.
Do the following to see disks on a LVD adapter: (drivers and patches available on EIS cd sun/progs, sun/patch)
Once loaded you can install Solaris on the LVD disks. But you have to select 'manual
boot' so you can then patch the install image before reboot as follows:
- cd /net/ipaddress_of_install_server/shared_dir_where_pkgs_located/
- pkgadd -R /a -d . (add all four pkgs 32 and 64 bit)
- patchadd -R /a 112697-02
- reboot
see doc 816-2156-11.pdf, StorEdge PCI Dual Ultra3 SCSI Host Adapter Install Guide.
Nordica board: The Nordica bd is used both in the netra line and the SC of a 15k. When replacing the Nordica
bd (501-5473) in a 15k you have to upgrade the OBP so you will have all the SC functionality.
The info doc says you should do the procedure on rev -12 and below. We had to do the procedure on
a -13 board to get it to work (without it we could not see the 'man' network interfaces).
In general you have to: (see fin and download readme for specifics)
You can find “The current Nordica OBP firmware image available for download” at :
http://pts-americas.west/esg/hsg/starcat/patches.html
Serengeti/15k Dynamic Reconfiguration: Requires min Solaris 8 (02/02 u7), SC 5.12.6
(also see 15k dr examples page 109)
(Solaris commands)
To get a list of component NAMES: # cfgadm -al
To remove a bd from a domain: # cfgadm -o unassign,nopoweroff -c disconnect NAME (ex: N0.SB1)
To add a bd into a domain: # cfgadm -v -c configure NAME (ex: N0.SB1)
To see if board has perm mem: # cfgadm -val | grep permanent
Page 87
To clean up non-root disk "controller" numbers: (see info docs 15019, 27756)
# mv /etc/path_to_inst /etc/path_to_inst.orig
# rm /etc/path_to_inst.old
# cd /dev/dsk
# rm c1* c2* c3* c4* (do not remove your boot device)
# cd /dev/rdsk
# rm c1* c2* c3* c4* (do not remove your boot device)
# rm -rf /dev/cfg/* (new on solaris 8)
If boot disk is under Sun StorEdge Volume Manager, search for "rootdev:" in /etc/system.
ex: rootdev: /pseudo/vxio@0:0 (Write down this device name exactly, you will use it on boot.)
# init 0
ok boot -ar (take the default through all prompts except: “Do you want to rebuild this file [n]?” y )
(and if you have the boot disk under StorEdge Volume Manager, when asked for)
( the physical root device, enter the device name you found above)
In Hex:
-----------------------------------------------------------------------------
| Exp | cpu0 | cpu1 | cpu2 | cpu3 | max0 | max1 | pci0 | pci1 | axq0 | axq1 |
-----------------------------------------------------------------------------
|   0 |    0 |    1 |    2 |    3 |    8 |    9 |   1c |   1d |   1e |   1f |
|   1 |   20 |   21 |   22 |   23 |   28 |   29 |   3c |   3d |   3e |   3f |
|   2 |   40 |   41 |   42 |   43 |   48 |   49 |   5c |   5d |   5e |   5f |
|   3 |   60 |   61 |   62 |   63 |   68 |   69 |   7c |   7d |   7e |   7f |
|   4 |   80 |   81 |   82 |   83 |   88 |   89 |   9c |   9d |   9e |   9f |
|   5 |   a0 |   a1 |   a2 |   a3 |   a8 |   a9 |   bc |   bd |   be |   bf |
|   6 |   c0 |   c1 |   c2 |   c3 |   c8 |   c9 |   dc |   dd |   de |   df |
|   7 |   e0 |   e1 |   e2 |   e3 |   e8 |   e9 |   fc |   fd |   fe |   ff |
|   8 |  100 |  101 |  102 |  103 |  108 |  109 |  11c |  11d |  11e |  11f |
|   9 |  120 |  121 |  122 |  123 |  128 |  129 |  13c |  13d |  13e |  13f |
|  10 |  140 |  141 |  142 |  143 |  148 |  149 |  15c |  15d |  15e |  15f |
|  11 |  160 |  161 |  162 |  163 |  168 |  169 |  17c |  17d |  17e |  17f |
|  12 |  180 |  181 |  182 |  183 |  188 |  189 |  19c |  19d |  19e |  19f |
|  13 |  1a0 |  1a1 |  1a2 |  1a3 |  1a8 |  1a9 |  1bc |  1bd |  1be |  1bf |
|  14 |  1c0 |  1c1 |  1c2 |  1c3 |  1c8 |  1c9 |  1dc |  1dd |  1de |  1df |
|  15 |  1e0 |  1e1 |  1e2 |  1e3 |  1e8 |  1e9 |  1fc |  1fd |  1fe |  1ff |
|  16 |  200 |  201 |  202 |  203 |  208 |  209 |  21c |  21d |  21e |  21f |
|  17 |  220 |  221 |  222 |  223 |  228 |  229 |  23c |  23d |  23e |  23f |
-----------------------------------------------------------------------------
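The table follows a fixed pattern: each expander owns a block of 0x20 agent IDs, with cpu0-3 at offsets 0-3, max0/1 at 8-9, pci0/1 at 1c-1d, and axq0/1 at 1e-1f. A shell sketch of the formula:

```shell
# Agent ID = expander * 0x20 + component offset (table values are hex)
exp=11
base=$(( exp * 32 ))
printf 'cpu0=%x max0=%x pci1=%x axq1=%x\n' \
  $(( base + 0 )) $(( base + 8 )) $(( base + 0x1d )) $(( base + 0x1f ))
# prints: cpu0=160 max0=168 pci1=17d axq1=17f (matches the Exp 11 row)
```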
Page 88
Starcat SC: clean the slate: (bring down domains)
# redx -l (will put you in local mode to look at dumps. redxl.csh for non-SC analysis)
redxl>dumpf load dump-file-name (will load dump and give you a brief summary)
redxl>dumpf types (will list the domain board configuration)
redxl> wfail (will give you failure info “1E”= 1st error “1E+”= accumulated errors)
SB (slot 0) redx commands:
redxl> shproc 0 0 3 (show PROC. 0 0 3 = exb0 slot0 cpu 3 shproc connects to DCDS, SDC, AR, SBBC)
redxl> shdcds 0 0 1 (show DCDS. 0 0 1= exb 0 slot0 dcds 1 shdcds connects to PROC, DX)
redxl> shdx 0 0 3 (show DX. 0 0 3= exb 0 slot0 dx 3 shdx connects to SDI(exb) DCDS)
redxl> shar 0 0 (show AR. 0 0 = exb 0 slot0 shar connects to AQX(exb) SDI 0(exb) PROCs)
redxl> shbbc 0 0 1 (show SBBC. 0 0 1 = exb 0 slot0 sbbc 1 shbbc connects to SDC, PROCs)
redxl> shsdc 0 0 (show SDC. 0 0 = exb 0 slot0 shsdc connects to SBBC, PROCs)
I/O(slot1) redx commands:
redxl> shioc 0 1 1 (show IOC. 0 1 1 = exb0 slot1 ioc 1 shioc connects to SDC, DXs, AR)
redxl> shar 0 1 (show AR. 0 1 = exb 0 slot1 shar connects to AQX(exb) SDI 0 (exb) IOCs)
redxl> shdx 0 1 1 (show DX. 0 1 1 = exb 0 slot1 dx 1 shdx connects to SDI(exb) IOCs)
redxl> shsdc 0 1 (show SDC. 0 1 = exb 0 slot1 shsdc connects to SBBC, IOCs)
redxl> shbbc 0 1 (show SBBC. 0 1 = exb 0 slot1 shbbc connects to SDC, IOCs)
Expander (exb) redx commands:
redxl> shaxq 0 (show AXQ. 0 = exb 0 shaxq connects to AMXs(cp) ARs, SDCs, SDI 0)
redxl> shcbr axq 0 (show CBR AXQ. 0 = exb 0 )
redxl> shsdi 0 0 (show SDI. 0 0 = exb 0 sdi 0 shsdi connects to DARBs(cp) DMXs(cp) ARs,
SDCs, SDIs(exb) AXQ(exb); 6 SDIs/exb)
redxl> shcbr exb 0 (show CBR EXB. 0 = exb 0)
CenterPlane (cp) redx commands:
redxl> shamx 0 1 (show AMX. 0 1 = cp 0 amx 1 shamx connects to AXQs (exbs))
redxl> shrmx 1 (show RMX. 1 = cp 1 shrmx connects to AXQs (exbs))
redxl> shdmx 0 (show DMX. 0 = cp 0 shdmx connects to SDIs (exbs) port 0-3, 1-2, 2-1, 3-0, 4-4, 5-5)
redxl> shdarb 1 (show DARB. 1 = cp 1 shdarb connects to SDI 0 (exbs) shows domain configs)
Terms:
AR Address Repeater (1 per SB, IO, max CPU)
AMX Address MultipleXer (2 per centerplane bus: C0, C1)
AXQ Address controller (1 per expander board)
DARB Data ARBiter (1 per centerplane bus: C0, C1)
DCDS Dual CPU Data Switch (2 per SB, 1 per Max CPU; 1 DCDS for 2 PROCs)
DMX Data MultipleXer (6 per centerplane bus C0, C1; connects to SDI exbs)
DX Data Switch (4 per slot0, 2 per slot1 bd)
RMX Response MultipleXer (1 per centerplane bus C0, C1)
SBBC System Boot Bus Controller (2 per slot0, 1 per slot1 bd)
SDC System Data path Controller (1 per SB, IO, max CPU)
SDI System Data Interface (6 per EXB; 0 is master, connects to DMXs)
Page 89
StorADE:
You can bring up the GUI by typing (in a browser window, any server):
http://hostname:7654 (default login: ras password: agent)
(I found the cli diags to be more useful than the GUI)
Get fru info from a Serengeti: (prtfru does not work on Serengeti; explorer must be loaded)
#cd /opt/SUNWexplo/bin
# LD_LIBRARY_PATH=/opt/SUNWexplo/lib
# export LD_LIBRARY_PATH
# CLASSPATH=/opt/SUNWexplo/java/fruid-scappclient.jar:/opt/SUNWexplo/java/libfru.jar
# export CLASSPATH
# ./rprtfru.sparc -b sc_ip_address:password >/tmp/fruid (must use password; will put output in file /tmp/fruid)
Page 90
SWAP
What is the recommended (2003) swap size for servers with gigabytes of physical memory?
(http://docs.sun.com/db/doc/817-0798/6mgisnqfi?a=view)
Performance considerations:
How much and how often?
# swap -s (command to monitor swap resources)
# swap -l (command to determine if your system needs more swap space)
How to tell how much swapping? (if too much should consider adding more physical memory)
# vmstat 5 5 (look at the sr column, also note po, the page out column. Non-zero sr numbers mean the
page scanner is looking for pages to mark as free; non-zero po means we're sending stuff out.)
# iostat -npxc 5 5 (check for kw/s on the swap partition - if non-zero, the page outs from
vmstat are really writes to the swap partition(s))
(http://docs.sun.com/db/doc/816-4553/6maop1hik?a=view)
Dump considerations:
How much memory do you want dumped? all, kernel, kernel + active process
# dumpadm
Dump content: kernel pages
Dump device: /dev/dsk/c0t3d0s1 (swap)
Savecore directory: /var/crash/pluto ***(large enough to hold core)
Savecore enabled: yes
savecore -L (live core dump, WATCH OUT, do not do a savecore -L to a dumpslot under volume
manager control)
DR considerations:
How much physical memory on most populated System board?
Nonpermanent Memory (currently 32gb physical mem max/bd). Before you can delete
a board, the environment must vacate the memory on that board. Vacating a board means
flushing its nonpermanent memory to swap space.
http://education.central/AliasArchive/Archives/ILT/ses_systemadmin-ext/msg08612.html
http://education.central/AliasArchive/Archives/ILT/ses_systemadmin-ext/msg05509.html
Page 91
from /net/cores.central/cores/dir5/
(REAL DATA: looked at the explorer output for ram size and examined the core to check size)
Maserati Notes- StorEdge 6320 and 6120:
Two models: 6120- standalone, desk side or rack, like T3 WG or PP. 6320- rack solution like the 3900 (Indy),
includes service processor, management net. Next generation T3, just don't call it the T4. Very much like the T3.
Drives in front, two power supplies in back on top, one controller, two loop cards. Components are similar to the
T3 but are physically enclosed differently, not swappable between T3 and T4. Units are 3U high. On back,
controller in middle, loop cards on each side. Loop cables are different (use RJ-45 type connector). All fiber
connections use the LC style connector.
Arrays are 2Gb capable on the front end using the QLogic 2300 chipset. Internally they run at 1Gb using the QLogic
2200 chipset.
All commands are the same as a T3 with 2.1 and above firmware. Max luns per array is 64, max luns per volume
is 32. Each tray is still limited to two volumes, using contiguous disks. If you have min config (7 disks) and
build two volumes, you will need to remove/create a volume to add more disks.
Note - internally, brick terminology is the same as the T3 (volslice, volume), although the Maserati manuals refer
to them as pools (volumes on the T3) and volumes (volslices on the T3).
Page 92
Maserati Notes- StorEdge 6320 and 6120: cont.
FC switches may be mounted in the rack but are no longer monitored or controlled via the SP.
Flash Archive interactive install: (saves time on multi domain installs)(see info doc 40131)
Create a flash image from a patched server: (load patches and packages before creating image)
# cd /
# flarcreate -S -n image_name /path_to/image_file (~2.2GB - can use -c compress: 2x longer, only 1/5 smaller)
# share -F nfs -o ro,anon=0 /path_to/image_file (share the image file) (/etc/init.d/nfs.server start)
Boot the new server and load from the image: (if booting from CD, best to use the same release as the flash image, ex: Sol9 04/04)
(note: you need network connectivity between the image server and the new server to download the image)
- On the server to be loaded: boot cdrom or boot net (if you have created an install server or on a 12/15K)
- Answer all install questions until you get to “F2 Standard” “F4 Flash” select “F4”
- Select NFS
- NFS Location: ip_address:/path_to/image_file (ex: 192.148.220.113:/var/tmp/flash )
- Continue answering install questions as you would on a regular interactive install
- Server will load Solaris from the image you specified/created
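The image-creation side of the procedure can be sketched as a dry-run script. This is a hypothetical sketch: flarcreate and share exist only on Solaris, so it only assembles the command lines, and IMAGE_NAME and IMAGE_FILE are placeholder values, not from the handbook.

```shell
# Hypothetical dry-run of the flash-archive creation/share steps above.
# The name and path below are assumptions; nothing is executed.
IMAGE_NAME=patched_sol9
IMAGE_FILE=/export/flash/sol9.flar

cmds=$(cat <<EOF
cd /
flarcreate -S -n $IMAGE_NAME $IMAGE_FILE
share -F nfs -o ro,anon=0 $IMAGE_FILE
/etc/init.d/nfs.server start
EOF
)
echo "$cmds"
```

On a real image server you would run these commands directly instead of echoing them, after loading patches and packages.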
Page 93
UltraSPARC III CPU Diagnostic Monitor (CDM): ( see Sun Alert ID: 55081 )
CDM is supported only on UltraSPARC III processor-based platforms running the Solaris 8 or Solaris 9 releases.
CDM contains 3 packages with a total size of less than 1MB.
To start CDM, add the packages and boot the server. It will run at 'default' settings without modifications
to /etc/cpudiagd.conf; to change settings, modify /etc/cpudiagd.conf. See the cpudiagd man pages for log files
and config info.
To remove CDM :
# /etc/init.d/cpudiag stop
# pkgrm SUNWcdiam SUNWcdiar SUNWcdiax
(Generator will ask for hostid of main SC, ScApp version, RTOS version. If you type 'service' (return, return)
in the platform shell the SC will list the needed info)
To enter service mode type 'service' and enter password in the platform shell.
To exit service mode type 'service'
ex: setchs -s ok, suspect, faulty -r "reason for status" -c /N0/SB2/p2
raidctl: Solaris command (V440 hardware RAID command, mirror within a controller only)
Page 95
Network troubleshooting:
Commands:
arp -a display entries in the arp table
dmesg check status of interface at boot time
ifconfig allows you to add/modify/delete interface parameters (see page 48,75)
kstat -n interface kernel stats for interface (good info)
kstat -p kstat -p | grep interface gives speed and duplex information
ndd -set /dev/eri instance 0 sets view to eri0
ndd /dev/eri \? shows which eri parameters are modifiable
ndd -get /dev/tcp tcp_status displays tcp parameter value 'tcp_status' also ndd -get /dev/eri link_status
netstat -i gives you interface details: # of packets, collisions, errors, etc.
netstat -Pn protocol protocol info, no name resolution
netstat -rnv routing info, no name resolution, local view
netstat -k interface same info as kstat -p but not well formatted
ping 192.168.47.2 command contacts and reports status of 192.168.47.2
rup 192.168.47.2 contacts and reports up time for 192.168.47.2
route (add, get, flush, delete) command allows you to add, get, delete, flush, entries in the routing table
snoop monitors network traffic use -v ,-d ,interface, ipaddress to filter view
spray 192.168.47.2 will send packets to 192.168.47.2 report on transfer rate and number received
traceroute 192.168.47.2 maps and times route from your server to 192.168.47.2
Files:
Daemons:
Page 96
How to find your way around a B1600... (min O/S Sol8 12/02, Sol9 04/03)
Default login sc: admin / no passwd sw: admin / admin
SC commands:
console console connection to switch or blade (use showplatform name. #. to return)
help lists available commands
showplatform -v platform and blade config and status information
setupsc initial sc setup...
showsc lists config data provided to setupsc command
poweroff s# Poweroff blade number s# (console to blade & shutdown first)
poweron s# Poweron blade number s#
SW commands:
help lists available commands
? command ? will list available syntax
show vlan listing and ports assigned to vlans
show running-config current switch configuration
show startup-config Config used at boot time
show mac-address-table mac addresses learned by ports
show system platform wide config information
show interface Shows status/config of selected interface
show spanning-tree displays spanning-tree info
switch ports:
NETPn ports are external uplink switch ports. There is no correlation of NETPn port to blade number.
SNPn ports are internal downlink switch ports that are connected to the blades ce interfaces.
There is a 1 to 1 correlation of SNPn port to blade number ( ce0 to ssc0/swt, ce1 to ssc1/swt)
Setting up Vlans:
Vlans are assigned to ports and can be designated as tagged or untagged. A tagged vlan is
one that uses tagged communication to a vlan-aware interface. An untagged vlan passes
all untagged traffic. Ports that have the same vlan assigned to them can communicate together.
The formula for determining a Solaris interface number for a tagged vlan (VID) is:
1000 * VID + device PPA = Vlan logical PPA
vlan 15 on ce0 : 1000 * 15 + 0 (for ce0) = ce15000
vlan 15 on ce1 : 1000 * 15 +1 (for ce1) = ce15001
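The numbering rule above is simple enough to compute in the shell. A small sketch (the `vlan_ifname` helper is hypothetical; the "ce" prefix is just the driver name used in the handbook's examples):

```shell
# Sketch of the tagged-VLAN interface numbering rule:
#   logical PPA = 1000 * VID + device PPA
# vlan_ifname is a hypothetical helper, not a real Solaris command.
vlan_ifname() {
    vid=$1
    ppa=$2
    echo "ce$((1000 * vid + ppa))"   # driver prefix assumed to be ce
}

vlan_ifname 15 0   # vlan 15 on ce0 -> ce15000
vlan_ifname 15 1   # vlan 15 on ce1 -> ce15001
```

The resulting name (e.g. ce15000) is what you plumb with ifconfig in the blade-side steps below.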
Ex: to assign blade s0 and blade s1 interface ce0 to vlan 15 you would do the
following:
on S0 and S1:
# ifconfig ce15000 plumb
# ifconfig ce15000 inet ip_address netmask + broadcast + up
create/add hostname to /etc/hostname.ce15000
add ip_address(es) and hostnames to /etc/hosts
on switch:
Console# config
Console(config)#vlan database
Console(config-vlan)#vlan 15 name VLAN15 media ethernet
Console(config-vlan)#end
Console#config
Console(config)#interface ethernet SNP0 (s0 ce0 is connected to SNP0 port)
(continued on next page)
Page 97
b1600 cont...
(You would follow the same procedures when creating untagged vlans, except the interface would remain
ce0 and the switch command would not have 'tagged' at the end. ALSO: if you want the vlan to
be seen outside the chassis you must allow it on an external port, NETPn.)
Console#config
Console(config)#interface port-channel 2
Console(config-if)#exit
Console(config)#interface ethernet netp2
Console(config-if)#channel-group 2
Console(config-if)#exit
Console(config)#interface ethernet netp3
Console(config-if)#channel-group 2
Console(config-if)#end
Console#show interface status port-channel 2
To create an LACP (Link Aggregation Control Protocol) trunk: ports must be connected to LACP-enabled
trunk ports on another switch.
(The trunk is automatically activated if LACP is enabled on the connected port of the
target switch. A trunk formed with another switch using LACP is automatically assigned the
next available trunk ID)
Spanning tree:
Where two bridges are used to connect the same two network segments, a spanning
tree configuration occurs. Because spanning trees have multiple paths to the same destination,
a condition called a 'bridge loop' is created. Spanning Tree Protocol is communication between
bridges designed to eliminate the loop path. Use caution if you are configuring the
switch for Spanning Tree Protocol, as it will affect switches in the customer's network.
Page 98
b1600 cont...
sc commands:
bootmode reset_nvram|diag|skip_diag|normal|bootscript=string sn {sn} This command allows you to specify a
boot mode for a blade. You need to use it to boot Linux blades for the first time.
break -y s# Command causes blade to drop from Solaris into either kadb or OBP
console -f -r Access console of a switch or blade. (ssc#/swt,s#) type #. to return to the sc> prompt
consolehistory -b -e -g Displays the contents of the switch or blade consoles buffer. (boot|run ssc#/swt|s#)
flashupdate -s IPaddress -f path -v ssc# s# Enables you to upgrade firmware to a System Controller or to a blade
help [command] Provides help text for specified command
logout
password command allows a user to change his or her own password
poweroff -f -y -s -r Powers off components (ch,ssc#,s#)
poweron -f -y -s -r Powers on components. (ch,ssc#,s#)
removefru -f -y Powers down components (ch,ssc#,s#)
reset -y -x Resets components (s#,ssc#/swt,ssc#/sc,ssc#)
resetsc -y Resets the active System Controller.
setdate set the time of day on the System Controller, switches, and server blades.
setdefaults -y Returns the active System Controller (but not its switch) to the factory default settings.
setfailover Tells you which System Controller is the active and standby System Controller.
setlocator on|off Turns the blade locator LED on/off
setupsc Enables you to configure the active System Controller interactively.
showdate Displays the current date and time
showenvironment -v Displays environmental sensors status in components of the chassis. (ssc#,psn,s#)
showfru Displays the contents of component (s) FRUID database (ssc#,s#,ch,psn)
showlocator Tells you whether the locator LED is on or off.
showlogs -b -e -g -v Displays the events (s#, ssc#)
showplatform -v -p Displays the status of each component. (ssc#,ssc#/swt,psn,s#,ch)
showsc [-v] Displays a summary of the configuration of the active System Controller.
showusers Shows the users currently logged into the System Controller.
standbyfru -f -y Powers down components (ch, ssc#, s#)
u Gives user administration privileges
useradd username Adds a named user to the list of permitted System Controller users.
userdel username Deletes a user from the list of permitted System Controller users.
userpassword username Allows a user with a-level permissions to alter another user's password.
userperm username aucr Specifies the named user's permission levels.
usershow username Shows details of the specified user's login account.
Page 99
Page 102
Cluster 3.x: http://suncluster.eng http://cluster.central (Installation Information)
Introduction: Sun Cluster 3 is the first integrated release of Sun's next generation
Full Moon clustering technology. Sun Cluster 3 extends Solaris with the
Full Moon cluster framework, enabling the use of core Solaris services such
as file systems, devices, and networks seamlessly across a tightly coupled
cluster and maintaining full Solaris compatibility for existing applications.
General: Configuration guide is located at suncluster.eng. All Information is too much to show here.
Below are some highlights.
Admin w/s: Admin Workstation not mandatory. Management GUI is now web based.
Good to install Sun Console software on Sun machine to have access to double window GUI.
Server: Requires the End User distribution; however, server storage and some software
may require more. Best to install at least the Full distribution.
Hardware Notes:
Must change the initiator id on one node if using SCSI arrays between 2 nodes
See info Doc 20704 for scsi initiator change procedure.
When a disk is replaced, the cluster needs to be made aware through the
scdidadm command.
Page 103
Cluster 3.x: (cont...)
Wiring Diagrams - See the configuration guide on internal site: suncluster.eng.
Commands:
scrgadm manage registration and unregistration of resource types, resource groups, and resources
scconf Update the cluster software configuration. Recommendation: run scsetup, which will print out the
scconf command it used; remember and reuse the commands you run repetitively.
-pv[v] Prints out the configuration.
scinstall Install Sun Cluster software and initialize new cluster nodes.
-pv[v] Print out packages and versions installed.
scdidadm The scdidadm utility administers the device identifier (DID) pseudo device driver, did.
-C Removes references to nonexistent devices on the cluster nodes.
-l Lists the local devices in the DID configuration file.
-L Lists all the paths, including those on remote hosts, of the devices in the DID config file.
-r Reconfigures the database.
-R Performs a repair procedure on a particular device instance.
Page 104
Cluster 3.x: (cont...) Commands:
scvxinstall The scvxinstall utility provides automatic VxVM installation and optional root-disk encapsulation
for Sun Cluster nodes.
scswitch Perform ownership and state change of resource groups and disk device groups in Sun Cluster
configurations. Below are some examples:
Misc Procedures:
Device Groups:
Register a new disk group:
scconf -a -D type=vxvm,name=new_disk_group,nodelist=nodex:nodex
Sync device group info after adding a volume:
scconf -c -D name=diskgroup,sync
Getting registered device group information:
scstat -D
Switch a device group off a node:
scswitch -z -D device_group -h node
Switch a device group offline (must be quiescent and unmounted)
scswitch -F -D device_group
Switch a device group into maintenance state (must be quiescent and unmounted)
scswitch -m -D device_group
Switch a device group online:
scswitch -z -D device_group -h node
Resource Groups:
Get current resource group status:
scstat -g
Switch a resource group to another node:
scswitch -z -g resource_group -h node
Switch all resource and device groups off a node:
scswitch -S -h node
Take a resource group offline on all nodes:
scswitch -F -g resource_group
Bring a resource group online on all nodes:
scswitch -Z -g resource_group
View configured resource groups:
scrgadm -p[v][v]
Removing a resource group: Before a resource group may be removed, all resources within the group
must be removed. The steps required are:
1) take the resource group offline
scswitch -F -g resource_group
2) disable the resources within the group
scswitch -n -j name_of_resource
3) remove the resources within the group
scrgadm -r -j name_of_resource
4) remove the resource group
scrgadm -r -g resource_group
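The four removal steps above can be sketched as a dry-run helper. This is hypothetical: it only echoes the scswitch/scrgadm commands rather than running them (they exist only on cluster nodes), and the group and resource names are placeholders, not from the handbook.

```shell
# Hypothetical dry-run of resource-group removal; echoes the commands
# in the required order instead of executing them.
remove_resource_group() {
    rg=$1
    shift
    echo "scswitch -F -g $rg"        # 1) take the resource group offline
    for res in "$@"; do
        echo "scswitch -n -j $res"   # 2) disable each resource in the group
        echo "scrgadm -r -j $res"    # 3) remove each resource in the group
    done
    echo "scrgadm -r -g $rg"         # 4) remove the now-empty resource group
}

# placeholder names for illustration only
remove_resource_group web-rg web-lh web-hastorage
```

Order matters: a resource group cannot be removed while it is online or still contains resources, which is why offline/disable/remove must happen in that sequence.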
Page 105
SMS upgrade 1.4.1: (see SMS 1.4.1 install guide http://www.sun.com/servers/highend/sms.html)
Download your SMS packages: http://www.sun.com/servers/highend/sms.html (make sure to run cksum and compare)
(also on EIS CD3 starting Apr-27-04)
- unzip file and note location
Update the SC and CPU flash PROMs on the new main SC (SC1)
- switch user to sms-svc
- flash SC: sc1:sms-svc:> flashupdate -f /opt/SUNWSMS/firmware/SCOBPimg.di sc1/fp0
sc1:sms-svc:> flashupdate -f /opt/SUNWSMS/firmware/nSSCPOST.di sc1/fp1 CP1500 only
sc1:sms-svc:> flashupdate -f /opt/SUNWSMS/firmware/oSSCPOST.di sc1/fp1 SCV2(cp2140) only
Page 106
smsupgrade 1.4.1: (Cont...)
Mirrored disk replacement: (use when a submirror shows “State: Needs maintenance” in metastat output)
On the failing disk (if you can access the disk; if not, start at the cfgadm -c unconfigure step):
# umount filesystem (unmount any non-svm open filesystems on failed disk)
# metadb -d c1t0d0s7 (if replicas on this disk, remove them)
# metadb | grep c1t0d0s0 (verify there are no existing replicas left on the disk)
# cfgadm -c unconfigure c1::dsk/c1t0d0 (might not complete command if busy, remove failed disk)
Raid-5 disk replacement: (use when a raid unit shows “State: Needs maintenance” in metastat output)
On the failing disk (if you can access the disk; if not, start at the cfgadm -c unconfigure step):
# umount filesystem (unmount any open non-svm filesystems on this disk)
# metadb -d c1t0d0s7 (any replicas on this disk, remove them)
# metadb | grep c1t0d0 (verify there are no existing replicas left on the disk)
# cfgadm -c unconfigure c1::dsk/c1t0d0 (might not complete command if busy, remove the failed disk)
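The failing-disk side of both procedures above follows the same pattern, which can be sketched as a dry-run. This is hypothetical: the SVM and cfgadm commands are Solaris-only, so the helper only prints them, and the controller/disk names are placeholders.

```shell
# Hypothetical dry-run of the failing-disk steps shared by the mirrored
# and RAID-5 procedures; prints the commands instead of running them.
failed_disk_cmds() {
    ctrl=$1   # placeholder controller, e.g. c1
    disk=$2   # placeholder disk, e.g. c1t0d0
    echo "metadb -d ${disk}s7"                         # remove any replicas on the disk
    echo "metadb | grep ${disk}"                       # verify no replicas remain
    echo "cfgadm -c unconfigure ${ctrl}::dsk/${disk}"  # then pull the failed disk
}

failed_disk_cmds c1 c1t0d0
```

Remember to unmount any open non-SVM filesystems on the disk first, as the steps above note.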
Page 107
Solaris 9 SVM (sds) disk replacement: (cont...)
(if you did not have the smsbackup file, and restored the IDPROM files, you will have to Set platform name
and change base ip addresses if necessary. Use explorer output from failed SC, Customer supplied info for
reference. Also see infodoc ID71490)
- The smsconfig -m command modifies the hosts file. Check it to be sure things are as they should be.
- Verify auto-boot?=true, watchdog-reboot?=false (eeprom auto-boot?, eeprom watchdog-reboot?)
- Shutdown newly loaded SC and do hard reset. (Press reset button on SC).
On the MAIN SC as user sms-svc: setfailover on Wait 5 minutes.....
On the MAIN SC as user sms-svc: Verify setfailover (showfailover -v) and showdatasync are "ACTIVE" to propagate
changes to the spare SC.
- Run explorer and SunCheckup on both SCs, compare outputs and correct any errors.
- When datasync is completed: On Main and spare SC, make a backup copy of sms files (smsbackup)
15K DR examples: (also see Serengeti/15K DR commands page 87, infodoc 76795 How to DR a Single PCI Card)
(cfgadm commands run from domain)
# cfgadm -val (get name “app ID” of board to use with cfgadm -c 'disconnect' or configure command)
# cfgadm -val | grep permanent (see which SB has permanent memory)
# cfgadm -c disconnect SB0 (removes SB0)
# cfgadm -c configure SB0 (adds SB0 back into domain)
# cfgadm -c disconnect IO1 ( removes IO1 and all pci adapters on it)
# cfgadm -c configure IO1 (configures IO1 back into domain) IO PCI slot #s
# cfgadm -c disconnect pcisch5:e01b1slot0 (removes pci card in IO1 slot 0) |3|1|
# cfgadm -c disconnect pci_pci0:e00b1slot1 (removes pci card in IO0 slot1) |2|0|
# cfgadm -c configure pcisch5:e01b1slot0 (configures pci card in IO1 slot 0 into domain)
15/25K hpost:
sms-svc> hpost -d r -l127 (run hpost on domain R at level 127)
.postrc (/etc/opt/SUNWSMS/adm/config/platform or A-R)
level 64 (run level 64)
dash_H_level 127 (run level 127 when DRing a board into domain)
no_ioadapt_ok (test SB only. Good when you create a test domain w/o IO)
no_obp_handoff (when testing SB only don't attempt to load obp)
Page 109
SMSbackup: (how to manually expand backup file) also see infodoc 77357
3310/3510 Disk replacement: (also see infodoc 78432 and page 84)
- save nvram info: system functions, Controller maintenance, Save NVRAM to disks, yes
- Identify bad disk: view and edit scsi device, look for BAD or FAILED status, note Chl, Id and LG_DRV #s,
select bad drive, Identify scsi drive, flash all But Selected drive, Flash Drive Time, yes (go find the disk)
disk ID #s (single bus 3310)    disk ID #s (dual bus 3310)    disk ID #s (3510)
          Chl 0                      Chl 2     Chl 0             Ch 0 / Ch 2
     0  3  8 11                      0  3      0  3              0  3  6  9
     1  4  9 12                      1  4      1  4              1  4  7 10
     2  5 10 13                      2  5      2  5              2  5  8 11
- Physically unseat bad disk, let spin down 20 sec, then remove
- Install replacement disk
- view and edit scsi device, look for NEW_DRV or USED_DRV status.
If not seen: select a disk, Scan scsi drive, select Chl (use noted #), select Id# (of replacement), yes
- Is replacement to be new local or global spare? If not skip to copy and replace step
if so: view and edit scsi device, select replacement disk, add Global spare drive or add Local spare drive, yes
- If the replaced disk cannot be a spare: view and edit logical drives, select logical drive, select the PREVIOUS spare
disk, copy and replace drive, yes (when copy is completed assign PREVIOUS spare back in step above)
Page 110
Removing the top cover on a V20z: (very tricky :-)
Keep top button down, pull cover forward until click, slide to the rear.
Useful COD commands: ( to obtain a license www.sun.com/licensing) 5.14.00 and up see Info doc 81531
showcodlicense (-r)
addcodlicense sc> addcodlicense 01:80d8a855:000000000:0201010100:c:00000000:BLqg5Ko
deletecodlicense
enablecodboard <sb#> Used to replace a COD sb (need service passwd on Sun Fires)
showcodusage
showplatform -p cod (addcodlicense will populate this area)
setupplatform -p cod
showboards
ALOM4v: Niagara (Ontario, Erie) (initial login/password admin/admin1) also see ALOM commands on page 94
New in ALOM4v:
Password recovery (procedure on page 113)
If the admin password is lost/forgotten, can reset the NVRAM to factory defaults, including clearing all users.
Requires physical access to the machine to unplug power cords and connect to the ALOM serial port.
Flashupdate protection
ALOM flash is in two segments with a persistent switch.
'flashupdate' always operates on the non-running segment. Segments are only switched after flashupdate
completes and image is CRC verified. A jumper can also switch the segments.
Ex: sc> flashupdate -s 129.148.173.99 -f /tmp/122430-01/System_Firmware-6_1_2-Sun_Fire_T2000.bin-latest
Supports new LED States:
White locator LED flashes at 4Hz when activated.
Green LED states:
Standby blink: 0.1sec on, 2.9sec off. When system is on standby power
Slow blink: 0.5 sec on, 0.5sec off: When system is in transition (running POST, powering down, etc)
Steady ON: system is running
Amber LED states:
Off: No faults.
On: Service required.
Amber slow blink to indicate unacknowledged faults not supported.
Page 111
ALOM4v: (cont)
showfaults Prints any faults: environmental faults, faulty FRUs, POST-detected faults (which result in ASR-disable),
and FMA-detected faults; also prints the time and status of the last POST run.
clearfault <UUID> to manually clear an FMA-diagnosed fault. (get UUID from showfaults output)
ASR commands:
showcomponent view and manage the list of blacklisted (ASR-disabled) devices
enablecomponent disabled state is stored on the actual FRU, such as the DIMM itself.
disablecomponent A FRU disabled on one system will remain disabled when inserted in another system
clearasrdb
setkeyswitch
normal: System can be used normally.
stby: Powers off the system and prevents 'poweron' command or button from operating.
diag: Forces the system to run servicemode diagnostics at next reset.
locked: Prevents 'flashupdate' and 'break' commands, system can power on/off and reset normally.
showkeyswitch
showfru command prints both static and dynamic sections
setfru command to set Customer_Data in all FRUs
showhost version command to print the software versions contained in the Host flash prom.
obpupdate command to update the Host flash prom (POST, OBP, etc). 'obpupdate' and 'flashupdate' will be merged
into a single command which will update both ALOM and the Host flash from a single master image
Servicemode commands: Be sure to set sc_servicemode to false when done!
setsc sc_servicemode true Warning: misuse of this mode may invalidate your warranty.
showplatform -v will print CPU #Cores and version information.
ping <ipaddress> - test network connectivity
clearnvramlog - erases persistent 'showlogs -v'
frucapture - offload a FRU's DFRUID image via FTP
fruupdate - update (overwrite) a DFRUID image via FTP
setcsn - set the chassis serial number, required when replacing the PDB board.
Can only be executed one time and only with a blank (new) PDB
fmagentconfupdate - field update FMA agent via FTP
showfmfaults - show current FMA faults stored on the DOC (Disk-on-chip)
showfmerptlog1 - show the first 40 ereports on DOC
showfmerptlog2 - show the last 40 ereports on DOC
clearereports - clear the ereport logs from DOC
docftpput - FTP a DOC file off of ALOM. Note: the above command names may change by product ship!
spdiag consists of the following commands:
i2ctest - run a single pass of the i2c test
envtest - run a single pass of the environmental test
sptest - run a single pass of the SP diag tests
setdiagopt - set diag test options used by 'rundiag'
rundiag - start diagnostics in the background
stopdiag - stop any running background diagnostic tests
showdiagstatus - show the status of background tests
resetdiagstatus - reset the diagnostic status
Servicemode: spdiag suite
Page 112
ALOM4v (cont...)
Solaris to Linux cross-reference: ( http://www.unixporting.com/quickguide.html and Linux overview for Solaris users
817-3341-10)
To manually harden an SC with SMS 1.5: (note: telnet, rlogin, ftp, vold will not work, so make sure you have
serial console access before you harden it) infodoc 83763
# /opt/SUNWjass/bin/jass-execute -q -d sunfire_15k_sc-secure.driver
To Power on:
To turn on main power mode (all components powered on), press and release the small Power button on the server
front panel. When main power is applied to the full server, the Power/OK LED next to the Power button lights and
remains lit. or
Page 115
Galaxy ILOM: cont...
(Connect a serial cable from the RJ-45 Serial Mgt port on your ILOM SP to laptop)
-> start /SYS
To Power off: press and release the small Power button on the server front panel
or -> stop /SYS
To start the serial console: (Connect a serial cable from the RJ-45 Serial Mgt port on your ILOM SP to laptop)
-> cd /SP/console
start (type esc ( to return to the SP)
The eeprom default is screen and keyboard. Use the Solaris eeprom command to
get a serial console in Solaris (ssh to the host, or see remote console below):
eeprom input-device=ttya
eeprom output-device=ttya
BIOS: You need to change the BIOS setting to have serial port control
after POST. (This will not override the eeprom setting in Solaris.)
to change setting:
F2 (ctl-E) on reset, Advanced, Remote access Configuration,
Redirect after POST [always]
(Some OSs may not work if set to always)
CLI
<verb><options><target><properties>
VERBS:
See Sun Fire X4100 and X4200 Servers System Management Guide for guidance on
CLI commands.
cd Navigate the object namespace.
create Set up an object in the namespace
delete Remove an object from the namespace.
exit Terminate a session to the CLI.
help Displays help information about commands and targets.
load Transfers a file from an indicated source to an indicated target.
reset Resets the state of the target.
set Sets target properties to the specified value.
show Displays information about targets and properties.
start Starts the target
stop Stops the target.
version Displays the version of service processor firmware running.
Options: short-cuts
-default n/a Causes the verb to perform only its default functions.
-destination n/a Specifies the location of a destination for data.
-display -d Shows the data the user wants to display.
-examine -x Examines the command but does not execute it.
-force -f Causes an immediate shutdown, instead of an orderly shutdown.
-help -h Displays help information.
-level -l Executes the command for the current target and all targets contained through the level specified.
-output -o Specifies the content and form of command output.
-resetstate n/a Resets the state of the target to its default.
-script n/a Skips warnings or prompts normally associated with the command.
-source n/a Indicates the location of a source image.
Page 116
Galaxy ILOM: cont...
-> cd ../SP
-> show
//SP
Targets:
alert cli clients clock console logs network serial
services sessions users
Properties:
Web GUI allows you to: (to log on, use https://ipaddress)
redirect graphical console to remote host.
connect a virtual floppy or CD-ROM drive.
monitor and manage fans remotely.
monitor BIOS messages, OS messages and system status remotely.
interrogate NICs for MAC remotely.
Power on, off and reset remotely
USERs:
Can't delete the following accounts: root/anonymous/ldapproxy
Can create an additional 7 accounts.
Send break: When logged into the SP using ssh with a console session running: ESC + Shift-b
Page 117
Revision History: