
BOSTON DISTRICT

T. O. I
Hand Book
07/24/06
Disclaimer

This document is for reference only.

The purpose of this document is to give the SSE a quick
reference to a broad range of material. It is not intended
to replace the original product manuals, and should not be
used in place of these manuals or substituted for training on
these products.

This is best used as a tool to get you in the right frame of
mind (product-wise) when preparing to go on a call.

Comments, suggestions, and requests for updated copies should
be sent to:

toi.handbook@east.sun.com

Copies also available at:

http://webhome.east/boston/toi.html
Table of contents
Desktop configurations: ........................................................................................................ 1
Firmware revision number: ................................................................................................... 1
OBP Escape hatches ........................................................................................................... 1
nvalias, NVRAMRC ........................................................................................................... 2
reset Host ID ..................................................................................................................... 2
Boot sequence ................................................................................................................... 2
Run Levels ........................................................................................................................ 2
Restore Boot Block ........................................................................................................... 2
E1000/2000 info ............................................................................................................... 3
E series info ..................................................................................................................... 4
OBP commands ................................................................................................................ 5
OBP device path breakdown ............................................................................................. 6
Device tree listing - desktop ............................................................................................... 6
E- 450 information ............................................................................................................. 7
E- 10000 information ......................................................................................................... 8
Blacklist ............................................................................................................................. 10
System Bd power procedure ............................................................................................ 10
E 10k component numbering ............................................................................................... 11
Scsi Array Model 100 ........................................................................................................ 12
Model 200 Array ............................................................................................................... 13
ssaadm commands ............................................................................................................. 13
Replace WWN on SSA ................................................................................................... 14
A1000 Array ..................................................................................................................... 14
D1000 Array .................................................................................................................... 14
RSM Disk Tray ................................................................................................................ 15
A3000/3500 Array ........................................................................................................... 16
A5000 Array .................................................................................................................... 16
luxadm commands ............................................................................................................. 17
Disk replacement in Veritas ................................................................................................ 18
A5000 min configuration ................................................................................................. 18
A5000 addressing ............................................................................................................... 19
A5000 Target assignments ................................................................................................ 19
RDAC ................................................................................................................................ 19
Raid Overview .................................................................................................................... 19
Raid Levels ....................................................................................................................... 20
Boot process ....................................................................................................................... 20
Diagnostic commands .......................................................................................................... 21
Diagnostic Files ................................................................................................................... 22
Watchdog resets .................................................................................................................. 23
What to look for on a watchdog reset ................................................................................. 24
Dump Analysis ..................................................................................................................... 25
abd commands ..................................................................................................................... 26
crash commands ................................................................................................................... 27
kadb ..................................................................................................................................... 27
Sunsolve ............................................................................................................................... 27
SunVTS ............................................................................................................................... 28
STORtools ........................................................................................................................... 29
Explorer Scripts .................................................................................................................... 30
Performance Analysis tools ................................................................................................. 31
Backup .............................................................................................................................. 32
ufsdump .............................................................................................................................. 32
ufsrestore ........................................................................................................................... 32
tar ...................................................................................................................................... 33
cpio ................................................................................................................................... 33
dd .................................................................................................................................... 33
How to get a core dump on a 2.x server ............................................................................ 34
Dump device bad when saving core on encapsulated root ................................................ 36
Uncompressing Files ........................................................................................................ 39
T300 (purple) ...................................................................................................................... 40
ACT (A Coredump Tool) ................................................................................................... 44
Advantages of Splitting a Drive into Multiple File Systems ............................................... 46
How to configure a system to run on a network ................................................................... 48
SEVM - How to recover a primary boot disk ..................................................................... 49
Disable DMP .................................................................................................................... 51
Memory Scrubber ............................................................................................................... 52
Display remote App GUI locally.......................................................................................... 52
Cluster 2.x .......................................................................................................................... 53
Encapsulating root after using Environmental CD to load O/S .......................................... 56
Adding a second network interface ...................................................................................... 56
Adding a default gateway ..................................................................................................... 56
Volume Manager (general info) ........................................................................................... 57
FTPing to and from sunsolve ............................................................................................ 60
Serengeti 3800, 4800, 6800 ............................................................................................... 61
mounting CDROM without vold ........................................................................................ 67
mailx: send files/messages ............................................................................................. 67
StarCat 15k notes ................................................................................................. 68
local-mac-address .................................................................................................. 73
SDS- How to mirror root .............................................................................................. 73
IPMP .................................................................................................. 75
T3B or T3+ Firmware Rev 2.1 New Functions: ............................................................... 76
Hitachi StorEdge 99X0 Arrays: ...................................................................................... 77
SunFire forgotten password ........................................................................................... 78
StorEdge Network FC Switch ....................................................................................... 79
Hitachi 9900v notes ..................................................................................................... 81
Minnow 3300 Array .................................................................................................... 84
Tuning ecache scrubber scan rate ..................................................................................... 86
VxWorks (serengeti) ........................................................................................................ 86
LVD adapter information ................................................................................................. 87
Replacing a nordica bd in a 15K SC .............................................................................. 87
Serengeti/15k DR boards ............................................................................................. 87
Clean up non-root disk “controller” numbers .................................................................... 88
Starcat Portid cheat sheet ................................................................................................. 88
Starcat SC clean slate ..................................................................................................... 89
Starcat redx info ............................................................................................................ 89
StorADE ................................................................................................................... 90
Get FRU info from serengeti .......................................................................................... 90
Swap ...................................................................................................................... 91
Maserati Notes- StorEdge 6320 and 6120 ....................................................................... 92
Flash Archive interactive install ..................................................................................... 93
UltraSPARC III CPU Diagnostic Monitor (CDM) ......................................................... 94
SunFire Service Mode Password Generator ................................................................... 94
V440 ALOM, raidctl .................................................................................................... 94
Finding Solaris release and distribution loaded .............................................................. 95
Find local NIS servers ................................................................................................... 95
Network troubleshooting command, files, daemons ..................................................... 96
How to find your way around a B1600 ................................................................ 97
Cluster 3.x ........................................................................................... 103
SMSupgrade 1.4.1 info ........................................................................................... 106
Solaris 9 SVM (sds) disk replacement ............................................................................ 107
SC rebuild after total disk failure ............................................................................ 108
15K DR / hpost examples .......................................................................................... 109
smsbackup: manually check a backup file: ..................................................................... 110
3310/3510 Disk replacement: ........................................................................................... 110
How to mount a CD image file (.iso) as a filesystem: ....................................................... 110
Removing the top cover on a V20z .................................................................................. 111
Explorer -w scextended with cron .................................................................................. 111
Useful COD commands ..................................................................................................... 111
ALOM4v Ontaeri/Erie (Niagara) ........................................................................................... 111
Forgotten password (ALOM4v) .......................................................................................... 113
Solaris to Linux cross reference .......................................................................................... 113
SSH information ............................................................................................................... 114
Galaxy ILOM info ............................................................................................................. 115
SSH with SMS 1.5 ............................................................................................................. 115
Desktop Configurations

          processor          memory                          sbus slots  onboard hosts  network
SS4       70, 85, 110        1 simm/bank (16, 32)            1           scsi II        10bt/AUI
SS5       70, 85, 110, 170   1 simm/bank (8/32)              3           scsi II        10bt
SS10      20, 30, 40, 50     1 simm/bank (16/64)             4           scsi II        10bt
SS20      50-150mhz          1 simm/bank (16/32/64)          4           scsi II        10bt/AUI
Ultra 2   167, 200, 300      2 simms/bank (16/32/64/128)     4           fast/wide      10/100
Ultra 5   270                2 simms/bank (can't use 256mb)  PCI         N/A            10/100
Ultra 10  d/b                2 simms/bank                    PCI         N/A            10/100
1000      .....              4/group                         3           ......         ......
1000e     .....              ......                          3           ......         ......
2000      51, 61, 81         4/group (8/32 meg)              4           ......         .....
2000e     51, 61, 81         4/group                         4           ......         ......

Commands to find firmware revision number:

# /usr/platform/`uname -i`/sbin/prtdiag -v   (gives you a listing of all boards)
# /usr/sbin/prtconf -V                       (gives you the master board's version)
ok .version
ok banner

OBP escape hatches

L1-a (stop-a) (Ctrl Break)*  To stop a process in OBP or to bring a system down in Solaris (not recommended)
L1-f (stop-f)  Enters command mode on ttya before probing H/W; use 'fexit' to continue with the initialization
               sequence.
L1-d (stop-d)  Sets the diag-switch? parameter to true. Enables verbose output during POST.
L1-n (stop-n)  Resets NVRAM contents to defaults. (not recommended; see 'nvrecover')
L1 (stop)      Runs POST in INIT mode (does not depend on security mode)

* laptop key strokes

Make a new alias... OBP, printenv, nvramrc

1 ok show-disks
2 select a disk controller a,b,c
3 ok nvalias (alias name) (ctl-y) ..... control-y is the yank command, and will give you the path you
selected in the show-disks command.
You have to type sd@n,n for Sbus or disk@n for PCI at the end.
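A worked example (the path and target are hypothetical; normally you paste the path with ctl-y from
show-disks and append the sd@ or disk@ part yourself):

ok nvalias bootdisk2 /sbus@3,0/SUNW,fas@3,8800000/sd@1,0
ok devalias                    (verify the new alias is listed)
ok boot bootdisk2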

Page 1
To recover NVRAMRC... printenv, nvrecover, nvstore

ok nvrecover (ctl-c)
ok nvstore

To remove an alias nvramrc, printenv, devalias

ok nvunalias (alias name)

To reprogram your MAC address and host ID

ok 17 0 mkp (return)

ok 8 0 20 xx yy zz 080020xxyyzz mkpl (return)


(cursor disappears)
(ctl-d)
(ctl-r)

ok banner
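A hypothetical example following the template above, using MAC address 08:00:20:c0:ff:ee (substitute
your own last three octets for c0 ff ee):

ok 17 0 mkp
ok 8 0 20 c0 ff ee 080020c0ffee mkpl
(cursor disappears; press ctl-d, then ctl-r)
ok banner          (verify the new Ethernet address and host ID)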

Boot Sequence

1 Beep (keyboard)
2 LEDs blink, screen goes blank (POST)
3 Banner
4 Testing memory (selftest#mem)
5 Boot (auto-boot?)
6 diag-switch?
7 prom loads boot block (UFS reader)

Run levels                         O/S command    rc.script

1  single user                     init 1         /etc/rc1
2  multi-user but no sharing       init 2         /etc/rc2
3  multi-user with sharing         init 3         /etc/rc3
4  N/A                             user configurable
5  shutdown and shuts off pwr      init 5         /etc/rc5
6  stop and reboot                 init 6         /etc/rc6
0  go to firmware (ok prompt)      init 0         /etc/rc0

Restore boot block

# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0
                             |
                      platform name, ex: sun4u

page 2
Deskside server

Key switch

-standby no power
-on normal
-diag verbose post, on board, master bd (1000,2000)
-secure prevents a (stop-a) and disables reset switch

1000/2000 server Info

1000 40mhz control card


1000e 50 mhz control card
2000 40 mhz control card
2000e 50mhz control card

*auto master- if you replace any CPU/Mem cards put new card in slot 0

Master Board requirements:


CPU
Memory
Latest firmware rev.

* To determine which board is master:

1 ok print-nvram-stat

2 switch cables to board you want to be master


<2> ok 0 switch-cpu

3 make board 0 a master bd


<0> ok set-master-nvram
<0> ok print-nvram-stat

4 Get rid of unwanted master


<0> ok 2 switch-cpu
<2> ok clear-master-nvram
<2> ok print-nvram-stat (move rs232 cable to master board)

command to show all sbus cards: <ok> show-devs

NVRAM contents 1000/2000

If you need to change a CPU board, you do not need to do anything with the NVRAM. There is a copy
on the control board and it will be automatically transferred..... If you need to change a control board you
must use the procedure in the FE handbook (pg. cpu81) to invalidate the contents of NVRAM on the
new control board.

Page 3
Ultra Enterprise 3000 Information

2 power supplies
6 cpus
I/O board w/sbus, internal scsi adapter
clock board, clock, voltage monitor, reset, console (keep firmware)

CPU boards

CPU/mem bd Speeds Processors memory

501-2976 83mhz 167mhz 8@8


501-4312 more sram 250mhz 32 @ 8
501-4882 83-90-100mhz 333mhz 128 @ 8
400mhz
600mhz
I/O boards

I/O type Speed sbus o/b fiber on board host network

1 83mhz 3 soc fas 10/100


2 83mhz 2 (upa) soc fas 10/100
3 83 and 83/90/100 0 (2pci) n/a ultra wide 10/100
4 83 and 83/90/100 3 soc+ fas 10/100
5 83/90/100 2upa soc+ f/w scsi 10/100

Clock boards

Clock board numbers Speed

501-2975 83mhz
501-4286 83mhz
501-4946 83-90-100 (x500 servers)
501-5365 83-90-100 (x500 servers, shipped with the E6500)

page 4
OBP commands:

(OBP reference guide) get this... http://docs.sun.com

banner a brief description of the system: MAC address, firmware level, host ID
boot -v will verbose boot the system from defaults set in printenv list and devalias file.
boot -a will boot without the use of /etc/system file (interactive boot)
boot -s will boot in single user mode
boot (alias) will boot the server from the specified alias in the devalias file
cd / will put you in a directory hierarchy for listing hardware paths. 'device-end' gets
you out of this mode
devalias shows you a listing of your device aliases
limit-ecache-size will allow you to boot a 400mhz 8meg cache processor on an OS 2.5.1 or 2.6 CD;
solaris 7 works fine. Jumbo patch 105181-14 for 2.6 or 103640-27 for 2.5.1
nvalias is used to create an alias

nvunalias is used to remove an alias. see previous example.


nvrecover is used to recover a deleted alias
nvstore is used with nvrecover
.properties when you are in device hierarchy mode ( cd /) on 3.x systems you can use the
.properties command to see info about the device path you are on. use
.attributes for the same function on 2.x systems.
probe-scsi list only internal disks
probe-scsi-all list all scsi devices
probe-fcal list all photon drives
printenv used to give you a listing of the environment settings
prom-copy will copy the flash prom from one board to another boards must be the same type.
prom-copy (src dest) ex : ok prom-copy 0 2 will copy flash prom from
board 0 to board 2
reset will reset the system
setenv (variable) used to set an environment setting (variable). use printenv to get setting syntax.
set-default will set a line in the environment to default. ex ok set-default output-device
show-post-results show results of the last POST
show-disks will give you a disk controller listing and is used when creating an nvalias.
show-devs will give you a listing of all device paths on the system. Use the 'cd /' command
to go down the path.
socal-diag-all when you are in device hierarchy mode (cd /), you can go down a socal path
(ex: cd /sbus@3,0/SUNW,socal@0,0) and run OBP diags on that path.
show-wwn when you are in device hierarchy mode (cd /), you can go down a socal path
(ex: cd /sbus@3,0/SUNW,socal@0,0) and show the world wide number and loop
id.
selftest when you are in device hierarchy mode (cd /), you can go down a socal path
(ex: cd /sbus@3,0/SUNW,socal@0,0) and run the socal selftest.
sifting will search for the command specified. ex: ok sifting probe-scsi
update-proms will update the proms (do not use to copy to cpu board 0, use the prom-copy # #
command)
watch-net watch packets and clock ticks
watch-net-all watch packets and clock ticks
words will list all the Forth commands for the current node
.xir-state-all eXternally Initiated Reset command, used to gather info on a hung machine

page 5
Command to reset a line in the environment to defaults:

set-default ex: ok set-default output-device

Move an SBus card from one slot to another:

1. Remove controller (SBus card)
2. boot -r, remove path_to_inst
3. boot -ra

You might also be able to switch the sbus-probe-list order to change the c# in c#t#d#s#.
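A sketch of one way to carry out steps 2-3 above (standard Solaris paths):

ok boot -r                     (reconfiguration boot after the card is moved)
# rm /etc/path_to_inst         (force instance numbers to be rebuilt on the next boot)
ok boot -ra                    (interactive reconfiguration boot; take the defaults when prompted)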

OBP path breakdown for Enterprise machines

ex: /sbus@7,0/SUNW,fas@3,8800000/sd@1,0

   sbus@7      convert to decimal, divide by 2, round down: result is bd #
   fas@3       sbus slot #
   sd@1,0      target# , lun#
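Working that example with the rule above: sbus@7 -> 7 decimal, 7 / 2 = 3.5, round down = board 3;
fas@3 -> sbus slot 3; sd@1,0 -> target 1, lun 0. So the disk hangs off system board 3, sbus slot 3,
scsi target 1, lun 0.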

Device tree listings for desktop machines

4m: ss4, ss5, ss10, ss20


/iommu/sbus/cgsix path to monitor card
/ledma/lc path to on-board network adapter
/espdma/esp path to on-board scsi devices

4u ultra 1 - 140, 170


/sbus@1f,0/ledma@e/le path to on-board network adapter
/espdma/esp path to onboard scsi devices

ultra 1 - 140e, 170e, 200e


/sbus@1f,0/hme path to on board network
/fas path to on-board scsi devices

ultra 2
/upa/sbus/hme path to on-board network
/fas@e path to on-board scsi devices

ultra 5,10
/upa/pci@1f/apb/pci@1,0 path to pci slots 1-3
/upa/pci@1f/apb/pci@1,1/ide@3 path to cdrom and disk
/network@1,1 path to on-board network
/m64b path to on-board graphics adapter
/ebus@1 path to system devices
ultra 30
/upa/pci@1f,2000 path to pci slots 1(33/66mhz) - 4 (33mhz)
page 6
/upa/pci@1f,4000/scsi@3 path to on-board scsi devices
/network@1,1 path to on-board network (hme)
/ebus@1 path to system devices
ultra 60
/upa/pci@1f,2000 path to pci slots 1(33/66mhz) -4 (33mhz)
/upa/pci@1f,4000/scsi@3 path to internal scsi devices
/scsi@3,1 path to external scsi devices
/network@1,1 path to network (hme)
/ebus@1 path to system devices

acronyms for above listings


esp scsi2 50 pin
fas fast and wide scsi 68 pin
hme 100mb ethernet
isp Intelligent Scsi Processor
le0 10mb ethernet
qe Quad Ethernet
qfe Quad fast Ethernet
soc Serial Optical Controller
socal Serial Optical Controller +

Ultra 450
and
Ultra Enterprise 450

ok setenv disk_led_assoc     adds a pci adapter to the printenv list to get entries into prtconf so you
can do the following procedure:

1. To find a drive path on an ultra 450, get the path '/pci@6,40001# - - - - - - - - - - /sd@0,0
from the format command.
2. Change the 'sd' to 'disk' and '0,0' to 0
3. #prtconf -vp | grep 'c#t#d#. . . . . . . . . . . . . /disk@#
4. results will be the slot# and the disk# will tell you the drive.
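A hypothetical illustration of steps 1-3 (the path is made up; use the one that format reports): if
format shows /pci@6,4000/scsi@4/sd@3,0 for the drive, change 'sd@3,0' to 'disk@3' and run

# prtconf -vp | grep disk@3

The slot# and disk# in the matching output identify the physical drive (step 4).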

Device tree listing ----- ----- ------ ---- ---- FE Handbook 1 cpu-126 and cpu-128

mfg-options is an NVRAM variable (a decimal value) that sets up the system as a workstation or a server.
The UE 450 is currently not offered as a workstation.
ok setenv mfg-options 0 (workstation default) Ultra 450
ok setenv mfg-options 49 (server default) Ultra Enterprise 450

upa-port-skip-list is a NVRAM variable used to skip probing of upa ports, following upa ports are used:

Processors upa ports 0,1,3


framebuffers upa ports 1d and 1e
psycho upa ports 4,6,1f

ex: ok setenv upa-port-skip-list 3,1d (skips CPU3 and FFB1)

page 7
obdiag is a command you can run for prom based diagnostics

pci0-probe-list is an NVRAM variable used to control the probe order for onboard PCI devices (/pci@1f,4000)

pci-slot-skip-list is an NVRAM variable used to skip probing of PCI devices plugged into the backpanel slots

memory-interleave is a NVRAM variable that controls how OBP sets memory interleaving

env-monitor is an NVRAM variable that determines how OBP responds to environmental monitoring via the I2C
serial bus.

.post command displays the results of POST

.asr command displays the system devices and settings

asr-enable , asr-disable commands enable and disables system devices.

/associations The associations tree node contains entries representing categories of associations or connections
between system components that are dispersed in the device tree.

ex: ok cd /associations/slot2dev
ok .properties

ok cd /associations/slot2led
ok .properties

ok cd /associations/slot2disk
ok .properties
E10000

SSP basic commands

hostinfo will give you a status of different parts of the E10k


-F fan status, on/off, speed
-S signature blocks (board ID)
-h processor status
-p power status (boards and centerplane)
-t temperature status

domain_create requirements: system boards must be present not in use


Sufficient memory and at least one proc
At least 1 network interface
Connection to a disk for OS
Unique hostname
Entry in host database
template eeprom.image file
syntax:
Create a new domain:
ssp:domain% domain_create -d domain -b 0 3 4 -o 2.5.1 -p platform

page 8
Recreate a domain that previously existed (domain_history file)
ssp:domain% domain_create -d domain

domain_remove Domain must be halted


syntax: # ssp:domain% domain_remove -d domain

domain_rename syntax # ssp:domain% domain_rename -d old_name -n new_name

domain_status will tell you which boards are in each domain

domain_switch will change the domain your ssp window is connected to.

domain_history Displays the contents of domain_history file (contains removed domain info)

power no argument will tell you the voltages at each board


-on
-off
-all = everything except AC sequencers ex: power -on -all
-ps = powersupply ex: power -on -ps # (#=0-7)
-p = AC sequencer ex: power -on -p # (#=0-4)

-cb = control board ex: power -on -cb # (#=0-1)


-sb = system board ex: power -on -sb # (#=0-15)
-csb = center plane sprt bd ex: power -on -csb # (#=0-1)

fan no arguments same as hostinfo -F (fan status)


-t =tray ex: # fan -t x -p off (x = 0-15)
-1 =group of trays ex: # fan -1 x -p off (x=front,rear)
-p on all fans on

autoconfig Must be run when adding a new revision of a board to the system
May also be required when moving a board to a new slot
Not required if all boards are the same revision level
(Do not run on a system board that is running the OS, or on the
centerplane when any domain is running the OS)

board_id will read the serial number eeprom on specified board


(has no effect on running domain)

thermcal_config thermcal_config must be run when installing a new board


or moving a board to a new slot, or else temperature sensing
for that board will be incorrect.

Target board must be off for 30 minutes before running


Updates SSP file with conversion factors from serial eeproms
ssp:domain% edd_cmd -x stop
ssp:domain% thermcal_config
ssp:domain% edd_cmd -x start

bringup boot the domain ex: # bringup -A off -l32 will bring system to the <ok> prompt
and run hpost at level 32 (7-128)
ex: # bringup will bring up system (autoboot)

netcon start network console session


page 9
Blacklist

- Edit via hostview or manually (vi)


- Explicit removal of components for isolation of intermittent faults or benchmarking
- processors
- IO controllers
- ASICs
- Memory banks
- Boards
- Busses

- Default location of blacklist file


/var/opt/SUNWssp/etc/platform_name/blacklist
- After editing the blacklist file, halt the domain and re-run bringup to make changes take
effect. (reboot does not cause hpost to reread the blacklist file)

Hostview To remove a device from the blacklist file:


- Edit
- Blacklist
(change view if required)

- MIDDLE click on blacklisted device (should change from black to white)


- run bringup to make changes take effect.

Redlist:
$SSPVAR/etc/platform_name/redlist is an ASCII file that enables the system administrator
or root to restrict, from the SSP, the configuration of the host system. It lists components
that POST cannot touch, and whose state POST cannot change. Redlisted components are
also considered effectively blacklisted. Never use redlisting if blacklisting will do.

System Board Power off Procedure

1. Have the customer bring down all jobs on the domain in question.
Next, they need to either use the shutdown command or use the init 0 command
to bring the system to the <ok> prompt.
2. After this has been done, go to the ssp login window. Login as ssp and (ssp password)
3. At the SUNW_HOSTNAME prompt, enter either the platform name or the name of the
existing domain
4. Issue the 'domain_status' command , this will list all the domains and system boards
associated with each domain.
5. Issue the 'domain_switch (domain name)' command , to get to the proper domain.
6. Use the 'power -off -sb #' (#= system board #) command , to power off the system board to
be removed. MAKE SURE THE YELLOW LEDS ARE OFF BEFORE REMOVING BOARD.
7. After completing the work on the system board and the board has been reinstalled, use the
'power -on -sb #' (#=system board#) command, to return the power to the system board.
8. Next use the 'bringup' command to autoboot or the 'bringup -A off' to stop at the <ok>
prompt.
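A sketch of the command sequence from the SSP side, assuming system board 5 in a domain named
xf4-b5 (both the board number and domain name are hypothetical):

ssp% domain_status
ssp% domain_switch xf4-b5
ssp% power -off -sb 5
      (replace/reseat the board once the yellow LEDs are off)
ssp% power -on -sb 5
ssp% bringup -A off          (stop at the ok prompt; omit '-A off' to autoboot)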

Page 10
Component Numbering

Processors
component Solaris Hostview Post
System Board 0 - 15 SB 0 - 15 sysbd 0- 15
proc. Mod. 0-3 /SUNW,ultraSparc@0,0 00-63 proc0.0 - proc 15.3
| |
proc. in hex (0 - 3f ) sysbd#.proc#
I/O ( SBus)

Component Solaris Cable Label Post


I/O port 0 - 3 /sbus@40 SB0.0.0 scard 0.0.0
| | |
Subtract 40 sysbd#.Sbus#.Slot# sysbd#.SBus#.Slot#
change to decimal
divide by 4
answer is board #
remainder is SBus #
I/O (PCI)

Component Solaris Cable Label Post


I/O port 0 - 3 /PCI@40 PCI0.0.0 scard 0.0.0
| | |
Subtract 40 sysbd#.PCI#.0 sysbd#.PCI#.0
change to decimal
divide by 4
answer is board #
remainder is PCI #
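Following the rule above with a hypothetical path: /sbus@48 -> 48 - 40 = 8 (hex) = 8 decimal;
8 / 4 = 2 remainder 0, so system board 2, SBus 0. An adapter in slot 1 on that bus would be labeled
SB2.0.1 on the cable and scard 2.0.1 in POST. The same arithmetic applies to the PCI paths above.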

Memory

Component Post
System board memory mem x.0
|
system bd.#.bank#
SSP: (notes)

/etc/netmasks should be: 10.0.0.0 255.255.255.0 (for private net or cb1 will not come up)
share cdrom to load VTS share -F nfs -o ro,anon=0 /cdrom/cdrom0/s0

3.4 commands:
showfailover: Shows you the failover status
showdatasync Shows you the datasync status (from main to spare)
setfailover on enables failover
force forces a failover to spare
off disables failover to spare
setdatasync backup backup files to spare
ssp_backup creates a ssp_backup.cpio file ex: # ssp_backup /var/tmp
ssp_restore restores ssp_backup.cpio file ex: # ssp_restore /var/tmp/ssp_backup.cpio
ssp_config float lets you change the hostname for the floating hostname (name should be in the hosts
files of both SSPs and also in /etc/ssphostname on the domains)

Page 11
SCSI Array

MODEL 100

Front Panel LCD indications:

POST Located in the top left corner. (circle with line at 12:00) indicates post is running
Service Under POST icon (wrench). Service is needed, always displayed with another icon
Controller Located to the right of service icon (looks like a se scsi icon). indicates a controller
problem
Alphanumerics  POST - test codes and status value of failing test are flashed continuously.
               Normal operation - four least significant digits of the world wide number
               Controller errors - panic code is flashed continuously, and controller icon is on
Fan Fan failure or heat problem
Battery Fast write cache Low NVRAM battery voltage, battery should be replaced.
Drive a small solid rectangle represents an available drive
fibre Fiber optic link state. Two link icons A and B. Switched on when link is
established.

POST codes

01 LCD failure Replace fan tray?


08 Fan failure Replace fan tray
09 P/S failure Replace Power supply
30 Battery failure Replace battery module
xx Controller failure Replace Controller

100/110mhz
|
Model 11/2
|
size of drives

Layout:
__ _________POWER SUPPLY_________
| |d0 |d0 |d0 |
| F |d1 |d1 | d1 |
| A |d2 |d2 | d2 |
| N |d3 T0 | d3 T2 | d3 T4 |
| |d4 |d4 | d4 |
| T |_________________________________|
| R | d0 |d0 |d0__________|____________
| A| d1 |d1 |d1 | |
| Y | d2 T1 |d2 T3 | d2 T5____|_________ |
| | d3 |d3 | d3 | | |
| | d4 |d4 | d4 | c0t5d0s?
|__|_________________________________|

          Tray 1            Tray 2            Tray 3

page 12


STRIPE Trays...

MIRROR arrays.

*Use channel B first on controller Fiber to copper adapter, 1 port for each host.

** run #ssaadm display cn (where 'n' = controller number)


|
this will give you array info on this controller

Solstice Disk Suite: "md" devices... Can change /etc/vfstab and /etc/system to bypass and use raw device
Use the 'metastat' command to tie the "md" device name in vfstab to a physical partition name.
# solstice & (will run the GUI)
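A hypothetical example: if /etc/vfstab mounts /dev/md/dsk/d10 on /export, running

# metastat d10

lists the c#t#d#s# component(s) that make up the metadevice; that physical partition is what you would
substitute in vfstab/system if the metadevice has to be bypassed.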

MODEL 200

The Sparc storage array model 200 is a rack mount disk array controller. Up to six differential SCSI disk
trays can be connected to it. Each tray can hold up to six drives. Ports are numbered 0-5 right to left, top
to bottom.

c2t2d0
   c2 = controller #
   t2 = port # on controller of array (or tray #, determined by port on array controller)
   d0 = drive in tray

Connectors and switches:

Fiber optic connector Connects F/O cables from host to array


Scan connector Used to test SSA controller in factory

NVRAM LED Gives info on the SSA NVRAM. Press the NVRAM button when the SSA
is off, if the NVRAM LED comes on, then there is data pending on the
NVRAM that must be flushed to disk using the fastwrite software
command.
NVRAM button Used to determine if there is any data pending on the SSA NVRAM
DIAG switch Used to set the diag level of the SSA. DIAG position for normal
diagnostics. DIAG EXT for extended diagnostics.
Reset switch Resets array... Do not press while array is in use.
SYS OK LED Gives info on SSA status. Blinking is running normally.( freq=activity)
Off is no power or hung. Solid On is power but hung.

100/200 Array Commands:

# ssaadm release /dev/rdsk/c#t#d#s# To release a specific disk


# ssaadm release c# To release all drives on a specific controller
# ssaadm stop /dev/rdsk/c#t#d#s# To stop a specific disk
# ssaadm stop -t2 c# To stop a specific tray on a specific controller
# ssaadm stop c# To stop all drives on a specific controller
# ssaadm display c# To display status on all drives on a controller
# ssaadm -v download -w ####WWN##### c# To download old wwn to new SSA controller (2.5 and >)
page 13
Procedure to replace WWN on a SSA

1. Boot from CDROM


ok boot cdrom -sw
2. Locate the new array controller
# ls -l /dev/dsk/c*t0d0s2 | grep NWWN (=new controllers WWN, SSA display)
3. Mount servers '/' filesystem on /a
# mount -o ro /dev/dsk/c0t0d0s0 /a
4. Download the old address to the new controller
# /a/usr/sbin/ssaadm -v download -w ####WWN##### c#
(old WWN) (c# from step2)
5. # halt

6. Press reset on the back of the SSA


(if you don't know the original WWN, mount the root filesystem on /a and do an ls -l on
/dev/dsk/c#t0d0s2)

A1000 Disk array

- Same disk tray as the D1000


- Hardware raid controller
- Ultra Differential Fast/Wide host Connection
- 8-16 meg Processor Memory (2 simm slots)
- 16-64 meg Data cache (2 simm slots)
- Battery Backup for data cache
- Scsi ID switch on controller
- two models 8 HH or 12 low Profile chassis

D1000 Disk tray

The D1000 disk tray is used in the StorEdge A3500 RAID array. 5, 8, or 15 D1000s can be used,
depending on the configuration. It uses the same disk tray as the A1000, but different controller. It
has 2 sets of scsi connectors, you can run 2 scsi busses into it and divide the drives or jumper the
busses together and have the array on one buss.

- Does not have a hardware raid controller


- 16bit Ultra Differential Fast/Wide Scsi bus
- two models 8HH (9.1gb or 18.2gb) or 12 low profile (4.2gb or 9.1gb)
- hot plug disk drives
- hot plug power and cooling units
- dual power cables to separate sequencers

Scsi Id and Array Id are set on the rear DIP switch ( D1000 can be configured for 1 or 2 busses)

sw1: Disk Array 1 Id: up: drive IDs 8-11 or 8-13, Down: drive IDs 0-3 or 0-5
sw2: Disk Array 2 Id: up: drive IDs 8-11 or 8-13, Down: drive IDs 0-3 or 0-5
sw3: Drives Remote Start: up wait for scsi command, Down: check sw4
sw4: Drives Delay Start: up: Start with delay (id*12), Down: start at power-on
sw5: Reserved

Module ID switch (rear): Wheel switch used to ID unit (1-5) when used in an A3500
configuration.
Page 14
Disk Layout: D1000
Array2 | Array1
sw2: down | 0 1 2 3 4 5 | 0 1 2 3 4 5| sw 1: down
sw2: up | 8 9 10 11 12 13| 8 9 10 11 12 13| sw 1: up
Front view

Leds on back: Color location


Power supply Status led: Normal (green), failure and other p/s is ok (amber) P/S
Cooling status leds (4): Normal (green), blower failure (amber) fan housing
Temp fault: Normal (off), fault (amber) Control bd
controller power: Normal (green), no power (off) Control bd

RSM Disk Tray

RSM trays are used in the StorEdge A3000 RAID array. Each A3000 contains 5 RSM disk trays

- Internally drives operate on a 16-bit Single-Ended Fast/Wide Scsi bus


- Externally the tray interface is a 16-bit Differential Fast/Wide Scsi bus
- 3 to 7 4.2gb or 9.1gb HH disk drives
- hot plug disk drives
- hot plug, redundant power and cooling units
- dual power cables to separate sequencers

*** Scsi Id for the tray is set on the I/O board. setting of 0-6 or 8-14, 8-14 is required for the
RDAC module.

* Scsi Id for the SEN card is a wheel selection and should be set to 15 (F).

RSM
_____front view___________
*** | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
_________or_____________
| 8 | 9 | 10 | 11 | 12 | 13 | 14 |
target IDs

Leds/switches-

Disk leds:
Red-fault, Green I/O activity
Panel leds:
Power on/off switch
Power indicator (green)
Power module A and B fault (red)
Fan module warning (amber)
Fan module failure (red)
Over temp (red)
Reset Alarm (pbs)

page 15
A3000/A3500

A3000
- 56 inch rack.
- contains 5 RSM disk trays
- 1 RDAC Module
- each RDAC module has dual hot plug RAID controllers

A3500
- 72 inch rack
- contains 5, 7, 15 D1000 disk trays
- 1, 2, or 3 RDAC modules
- each RDAC module has dual hot plug RAID controllers

# raidutil -c c#t#d# -B    battery age info for that controller (A3x00)
# raidutil -c c#t#d# -R    to reset battery age after replacement (A3x00)

Break, (esc), Q40, ld </Debug, arrayPrintSummary, cfgUnitList, vdShow, dstDevs,
rdacMgrSetModeActivePassive, rdacMgrSetModeDualActive, rdacMgrAltCtlFail, rdacMgrAltCtlResetRelease,
moduleList, sysReboot

A5000 (photon)

- The A5000 or Photon is a Fiber channel array


- up to 14 hh drives or 22 low profile hot pluggable, dual ported FC-AL disk drives

Model #'s
A5000 - 14 7200 rpm Drive of 9.1GB each
A5100 - 14 7200 rpm Drives of 18.2GB each
A5200 - 22 10000 rpm Drives of 9.1 GB each
RAID Manager

Commands:
# /usr/lib/osa/bin/rm6 to run
# /usr/lib/osa/lad will give ctd#s, controller serial #s and lun configurations
# fwutil /usr/lib/osa/fw/aaaaaaaaa.apd cxtxdxs0 Downloads appware to a controller (halt all I/O)
# fwutil /usr/lib/osa/fw/bbbbbbbb.bwd cxtxdxs0 Downloads bootware to a controller (halt all I/O)
# raidutil -c c#t#d# -B    battery age info for that controller (A3x00)
# raidutil -c c#t#d# -R    to reset battery age after replacement (A3x00)

RAID Manager Device Naming Conventions

C# T# D# S#

   C# = host controller #
   T# = target ID of RAID controller
   D# = lun # (created when setting up array)
   S# = slice

page 16
luxadm commands for the A5000

luxadm probe -p Display information about all attached A5000s. This will give you the
enclosure names
luxadm display Use the display subcommand to display enclosure or device specific info
enclosure info ex: # luxadm display mars-0
device info ex: # luxadm display mars-0,f3 (f3= front disk slot# 3)
luxadm inq Use the inquiry subcommand to display inquiry info for the enclosure or
specific disk
enclosure info ex: # luxadm inq mars-0
device info ex: # luxadm inq mars-0,f4 (f4=front disk slot#4)
luxadm led_blink Use the led_blink subcommand to start flashing the yellow led
associated with a specific disk.
ex: # luxadm led_blink mars-0,f2 (f2=front disk slot 2)
luxadm led_off Use the led_off subcommand to turn off the yellow LED
associated with a specific disk.
ex: # luxadm led_off mars-0,r3 (r3= rear disk slot#3)
luxadm power_off Use the power_off subcommand to set an enclosure or disk to
power save mode
enclosure ex: # luxadm power_off mars-0
disk ex: # luxadm power_off mars-0,f5 (f5=front disk slot#5)
luxadm power_on Use the power_on subcommand to set a drive or enclosure to
its normal power on state.
enclosure ex: # luxadm power_on mars-0
disk ex: # luxadm power_on mars-0,f1 (f1=front disk slot#1)
luxadm remove_device Use this subcommand to 'hot remove' a device or enclosure, when
removing failed disk units for replacement. Verbose output will
walk you thru the procedure
enclosure ex: # luxadm remove_device mars-0
disk only ex: # luxadm remove_device mars-0,f6
luxadm insert_device Use the insert_device subcommand for 'hot' insertion of a new disk or
enclosure. Use after the remove_device command to replace a failed
drive with a new one. Verbose output will walk you thru the procedure.
ex: # luxadm insert_device mars-0,f5
luxadm reserve Use the reserve subcommand to reserve the specified disk(s) for exclusive
use by the host from which the subcommand was issued.
ex: # luxadm reserve mars-0,f6
luxadm release The release command releases the drive from the reserve state
ex: # luxadm release mars-0,f6
luxadm enclosure_name Use the enclosure_name subcommand to change the enclosure name of
one or more A5000s
ex: # luxadm enclosure_name mars1 pluto2
(change from pluto2 to mars1)
luxadm download Use the download command to download a prom image to the
FEPROMs on an A5000 interface board. Stop all activity on this
connection before downloading firmware, the array will recycle
automatically after the download.
ex: # luxadm download -s mars-0 (will download firmware from
default file /usr/lib/locale/C/LC_MESSAGES/ibfirmware)
ex: # luxadm download -s -f /special/upgrade/ibfirmware.latest
mars-0
-f you can specify the file name and do not use the default

page 17
luxadm fcal_s_download Use the fcal_s_download command to download new fcode into ALL
the FC100-HA sbus cards or display the current versions of the fcode
in each FC100-HA Sbus card.
display: ex: # luxadm fcal_s_download
download:
ex: # luxadm fcal_s_download -f /usr/lib/firmware/fc_s/fcal_s_fcode

Disk failure and replacement Veritas

remove 1. # vxdiskadm
2. item 4 (Remove disk for replacement), Enter disk name, Remove another disk? n
3. item 11 (Disable (offline) a disk device) offline the same disk so it can be removed, q
4. # vxdctl enable (This will reconfigure DMP)
5. # luxadm remove_device mars-0,f0 (mars-0,f0 is enclosure name, diskslot#) return
(physically remove disk drive) (return)
replacement 6. # luxadm insert_device mars-0,f0 (mars-0,f0 is enclosure name, diskslot#) return
(physically insert new disk) return
7. # vxdctl enable (This will reconfigure DMP)
8. #vxdiskadm
9. item 5 (Replace a failed or removed disk) Enter disk name, enter c#t#d#, continue y,
replace another? n, quit q
10. from here you have a choice of 2 ways to complete this. (most of the time this is up to
the customer to do) read both before choosing.

1. make new disk spare and spare disk part of the RAID

# /usr/sbin/vxedit -g rootdg set spare=on disk01


# /usr/sbin/vxedit -g rootdg set spare=off disk05
OR
2. Take the data from the rebuilt spare and put it back on the new drive
Evacuate the spare, disk05 back to disk01 to recover original configuration
# /etc/vx/bin/vxevac disk05 disk01

Minimum Configuration A5000

These are minimum disk configurations to ensure adequate signal retransmission.

14 disk array The minimum configuration system has drives in slots 3, 6 in front and drives in
0, 3, and 6 in the rear. No other configuration is authorized. As disks are added they
should be spaced to minimize gaps between disks.

22 disk array The minimum configuration system has drives in slots 0, 5 in front and drives in
0, 3, 6,and 10 in the rear. No other configuration is authorized. As disks are added they
should be spaced to minimize gaps between disks.

Page 18
A5000 Addressing

"sf" = Host Adapter (socal) has 2 ports sf@0,0 and sf@1,0


"ses" = Interface Boards (IB) in the A5000, 2 IBs/array, 2 ports/IB ses 0 and 1 = IB-A
"ssd" = disk drives ses 2 and 3 = IB-B

ex: /sbus@1f,0/SUNW,socal@1,0/sf@1,0/ssd@w2100002037007fa1,0:a

   sbus@1f           convert to decimal, divide by 2, round down: result is I/O bd #
   socal@1,0         sbus slot (d = on-bd soc+)
   sf@1,0            loop connection, port on the HBA: 0 = port A, 1 = port B
                     (data path through IB to disk: 21 = node A, 22 = node B)
   ssd@w...7fa1,0:a  WWN#, lun (always 0), slice (:a = 0)

A5000 Target ID assignments

(Box ID x 32) + (Backplane# x 16) + (Disk slot#) = Target ID

   Box ID: 0,1,2,3     Backplane#: 0 = front, 1 = rear     Disk slot#: 0-11, left to right

ex: a rear disk slot 5 in an A5000 with box ID of 3 would be (3 x 32) + (1 x 16) + 5 = t117
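Working the formula the other way (a hypothetical target number): t37 = (1 x 32) + (0 x 16) + 5, so
target 37 is box ID 1, front backplane, disk slot 5.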

RDAC Module

- used in the A3000 and A3500 arrays


- dual hot plug RAID controllers
- Hot plug power and cooling units
- Battery backed up data cache
- Scsi out must be terminated (UDWIS)
- Controller Status leds Pattern will give you error information.
- SCSI ID jumpers for both RAID controllers, Default is 5 for top controller and 4 for the lower one

RAID Overview

RAID Manager Device Naming Conventions

C# T# D# S#

   C# = host controller #
   T# = target ID of RAID controller
   D# = lun # (created when setting up array)
   S# = slice

page 19
RAID LEVELS

RAID 0
RAID 0 is actually an AID (Array of Interconnected Disks); the R (redundant) part just isn't
there. RAID 0 is being able to put multiple physical disks together to make them appear as
one large virtual disk. There are no parity drives or parity stripes.

RAID 1
RAID 1 is an array that is mirrored. That means there are 2 sets of disks; every disk has a
counterpart that is an exact copy. If one fails the other will take its place.
RAID 3
RAID 3 has striped data across multiple volumes and a dedicated parity drive. If one of the
drives should fail, its data can be reconstructed from the parity drive.
RAID 5
RAID 5 has striped data across multiple volumes as RAID 3 does, but also has its parity striped
across multiple volumes. RAID 5 is also able to recover from a failed disk.

Boot process

1. VTOC (volume table of contents) Sector 0 of boot disk


2. Boot Block Sector 1-15 UFS reader can be rebuilt with the
installboot command.
3. UFSboot /platform/`uname -m`/ufsboot Loads standalone
kernel. You can tell it is loaded by the first instance of
the spinning wheel (after the memory size post
spinning wheel.)
4. genunix /kernel/genunix; generic unix kernel for the
operating system; specific only to the O/S release
5. unix /platform/`uname -m`/kernel/unix
specific to O/S and architecture type.
(you can tell it is loaded by the
second instance of spinning
wheel, at the Sun O/S Release 5.7
message).
6. /etc/system has the variables to custom load kernel parameters.
boot -a will not use /etc/system file on boot
7. /etc/inittab sysinit: as we are trying to grab the console.
respawn: respawn proc if it dies
initdefault: default run level
wait: wait for job to complete
Powerfail: on PWR signal run appropriate command.

page 20
Diagnostic commands:

arp Displays Address Resolution Protocol tables.


catman -w Create the /usr/share/man/windex database for use with index function available
thru the apropos command. Creates a windex file that includes every solaris command
and a brief description.
compare Will tell you the difference between two files ex: compare /kernel /usr/kernel
crash Used to analyse crash dumps
devlinks Creates symbolic links in /dev using info in /devices
df -k Displays disk space usage in Kbytes, including free space
dfmounts Display remote filesystem mount info.
dfshares Displays shared filesystem info.
diff Compare file contents
disks Creates symbolic links in /dev/dsk and /dev/rdsk, used after the drvconfig command
drvconfig Configure the /devices directory and the device information tree.
eeprom Analyse and change PROM settings.
file Determine a file's type
find Search for specific files
format Analyse or modify partition information
fsck Check UFS filesystems for inconsistencies
fstyp -v Display extensive file system parameters for a specified file system.
grep Analyse file contents, and search for specific patterns.
groups Display group definitions for a given user
ifconfig -a Add, display, and analyse the status of network interfaces
iostat Analyse I/O performance issues
isainfo -v Will tell you if you are running 32 or 64 bit applications
last Display history of system login information
ls Analyse file properties
mpstat reports processor stats on a per processor basis
ndd get and set named device driver parameters
netstat (-i, -r, -k) Analyse network tuning information, including active routes. -i interface info/collisions,
-r router info, -k kernel info (pipe to more and look for the interface; verbose version of -i)
newfs Create and examine file system parameters
nfsstat Analyse NFS performance information
od Octal dump of a file. ex: od -c /etc/nsswitch.conf will display all characters in the file
pagesize print the size of a memory page in bytes
patchdiag (sunsolve CD) Listing of recommended patches
patchadd -p Displays patches loaded on your system,
patchinstall (sunsolve CD) Is used to install patches
(ex: # cd /cdrom/cdrom0)
( # ./patchinstall)
backoutpatch (sunsolve CD) Will remove a patch after you cd to that directory
(ex: # cd /var/sadm/patch/102044-01)
( #./backoutpatch .)
perfmeter Provide graphic display of performance metrics
ping (-s) Contact network hosts by sending Internet Control Message Protocol (ICMP) request and
reply datagrams.
pkgchk check file integrity and accuracy of installation

pkginfo -l Will give you a description of all the packages (w/o pkg name) or one package (w pkg
name)
prtdiag Display system configuration and diagnostic information (/usr/platform/`uname -m`/sbin)
prtconf -v Get system device information from POST probe
prtconf -vp Device tree info and PROM version (OBP)
page 21
Diagnostic commands continued:

prtvtoc List the vtoc (disk label) of a disk drive ex: prtvtoc /dev/rdsk/c0t0d0s0
psrinfo -v Will give you processor information
psradm -f (-n) -f will allow you to offline a processor. -n will online a specified processor
/usr/ucb/ps -aux Lists processes in CPU utilization descending order.
pwck checks the password file for inconsistencies
sar Analyse system performance information (must be initialized in /etc/init.d/perf)
showrev -p list currently installed patches; patchadd -p in solaris 2.6 and above
snoop (-s) display and analyse network traffic
strings Search object and binary files for ASCII strings
sysdef Analyse device and software configuration information.
swap Add, delete and monitor system swap areas
sum Calculate and print a checksum value for a named file
sys-unconfig Enables you to change information entered during sysidtool phase of installation
tail -f Leave file open for reading and display what is there
tic Terminfo compiler; translates a terminfo file from source to compiled format
timex List runtime and system activity information during command execution
traceroute Show the route followed by packets transferred in a subnet environment
truss Trace system calls issued and used by a program or command
tunefs Modify file system parameters that affect layout policies
uname Print platform, architecture, operating system, and system node information.
vmstat Analyse memory performance statistics
who am i Display the effective current user name, terminal line and login time
xhost hostname allows graphical access to your host from the host specified in hostname

Diagnostic files

/etc/defaultdomain Name of the current domain, read and set at each boot by script /etc/init.d/inetinit
/etc/default/cron Determine logging activity for the cron daemon through specification of the cronlog
variable
/etc/default/login Control root logins at the console through specification of the console variable and other
defaults.
/etc/default/su Determine logging activity for the su command thru specification of
the sulog variable
/etc/dfs/dfstab List what distributed file systems will be shared at boot time
/etc/dfs/sharetab List currently shared NFS file systems
/etc/hosts Host file linked to /etc/inet/hosts
/etc/hostname.le0 Assign a system name, and through cross-referencing the /etc/hosts file, add an IP address
/etc/hostname.hme0 to a particular network interface
/etc/inetd.conf List information for network services that can be invoked by the inetd daemon
/etc/inittab Read by init daemon at startup to determine which rc script to execute; also contains
default run level.
/etc/minor_perm Specifies permissions to be assigned to device files
/etc/mnttab Display a list of currently mounted file systems
/etc/name_to_major Display a list of configured major device numbers.
/etc/netconfig Display the network configuration database read during network initialization and use
/etc/nsswitch.conf List the database configuration file for the name service switch engine.
/etc/path_to_inst List the contents of the system device tree using the format of physical device names
and instance numbers
/etc/protocols List known protocols used in conjunction with internet
/etc/release O/S release and date
/etc/rmtab List the current remotely mounted file systems

page 22
diagnostic files continued:

/etc/rpc List available RPC programs


/etc/services List the well-known networking services and associated port numbers; maintained by NIC
/etc/system Tunable Kernel parameters boot -a will boot w/o an /etc/system file
/etc/vfstab List local and remote filesystems mounted at boot time.
/var/adm/messages Lists recent console window and boot messages
/var/adm/sulog Display a record for each invocation of the su command
/var/adm/utmpx List user and accounting information for the who and login commands
/var/adm/wtmpx Maintain history of user information for the accounting package and report facility.
/var/crash/hostname Crash files, unix is the symbol lookup file, vmcore is the core dump, bounds is incremental
value for next core set.
/var/lp/log List print services activity
/var/sadm/install/contents List installed software packages
/var/sadm/install_data/install_log A listing of the way the install was completed
/var/sadm/pkg patch and package information (new O/Ss)
/var/sadm/patch patch and package information (old O/Ss)
/var/sadm/system/admin/INST_RELEASE List of clusters installed on the system.
/var/saf/_log List activity of the Service Access Facility (SAF)
/var/spool/locks/lck Clean up (remove) to clear a bad tip session (otherwise you will get an 'all ports busy' error)

Watchdog Resets

A CPU Watchdog Reset is initiated on a single-processor machine when a trap condition occurs while traps
are disabled and the register bit to enable traps is not set. The system tries to come down in a
deterministic state and traps to a reserved physical address.

A System Watchdog Reset occurs when a fatal error is detected on a multi-processor machine.

The obpsym module should be loaded to maximize the amount of symbolic information available in the
PROM (OBP) environment. Without this module, addresses are displayed without symbolic (textual)
information.

To check if obpsym is loaded:


# modinfo | grep obpsym
To load the module from command line:
# modload -p misc/obpsym
To load module with each boot, enter the following in /etc/system:
forceload: misc/obpsym

obp register commands - sun 4u (used with watchdog reset analysis)

.locals Displays the local CPU registers


.registers Dumps the registers of the current window, those in use at the time of the crash.
ctrace Displays a stack trace, listing routines that were active when the system went down
(obpsym module should be loaded; see above)
.pstate Formatted display of the process state register
.ver Formatted display of the version register
.ccr Formatted display of the ccr or cache control register
.trap-registers Display of trap related registers

page 23
obp register commands - sun4m (used with watchdog reset analysis)

.locals Displays the local CPU registers


.registers Dumps the registers of the current window, those in use at the time of the crash.
ctrace Displays a stack trace, listing routines that were active when the system went down
(obpsym module should be loaded; see above)
.psr Formatted display of the process status register
.fregisters Display of the floating point registers

What to look for at the OK prompt of a watchdog reset:

Note the number next to the OK prompt, which is the number of the CPU that hit the watchdog
reset (multi-processor only)

Note the information in the following fields from the OK prompt (a sample command sequence follows this list):


.registers - Valid addresses associated with the window registers on
display
.locals - Valid addresses associated with the registers on this display
ctrace - pc addresses and routine names
.ver - The implementation (IMPL) and manufacturer (MANUF)
numbers.
.trap-registers - The trap type (TT), the trap state (TSTATE), and the processor state
(PSTATE)
.pstate - The RED value, which is similar to the ET (enable trap) bit on
SPARC Version 8.
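
A minimal command sequence at the ok prompt of the CPU that took the watchdog (a sketch only;
the output is system specific and is omitted here):

ok .trap-registers
ok .registers
ok .locals
ok ctrace
ok .ver
ok .pstate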

Solaris commands and files that can be used in watchdog reset analysis:

showrev -p
prtconf -v
pkginfo
/usr/ccs/bin/nm /dev/ksyms > symbol_file
/usr/platform/sun4u/sbin/prtdiag -v > prtdiag_file
/etc/system
/var/adm/messages
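
A minimal collection sketch, assuming a scratch directory such as /var/tmp/watchdog (a hypothetical
path) with enough free space to hold the output files:

# mkdir -p /var/tmp/watchdog
# showrev -p > /var/tmp/watchdog/showrev.out
# prtconf -v > /var/tmp/watchdog/prtconf.out
# pkginfo > /var/tmp/watchdog/pkginfo.out
# /usr/ccs/bin/nm /dev/ksyms > /var/tmp/watchdog/symbol_file
# /usr/platform/sun4u/sbin/prtdiag -v > /var/tmp/watchdog/prtdiag_file
# cp /etc/system /var/adm/messages /var/tmp/watchdog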

Related document numbers in the SunSolve database include

1360 - Trouble Shooting Watchdog Resets


14133- Is the system crash due to hardware or software
14230- System crashes and how to prepare for analysis by Sun Service

page 24
Dump analysis

****Cores sent in from the customer are located in:


/net/eastcores/corefiles/SO# (SO#= SO opened by customer)
/net/cores.central/cores/gesd/fidelity/open/SO#

***(STOP-A) then 'sync' ... on a hung system will force a core dump.

Three debuggers:

adb: Assembly debugger. It is an interactive, general purpose utility that can be used to
examine files, and it provides a controlled environment for executing programs. By default
it does not supply a prompt.

(to run adb on a dump file)


#cd /var/crash/host_name
# adb -k unix.n vmcore.n

(to run on a live system)


#adb -kw /dev/ksyms /dev/mem

What to look for in a core dump with the adb debugger:

$<msgbuf This will give you the:


Name of the failing process
Register pointer (rp=)
PID (pid=)
Program counter (pc=)
stack pointer (sp=)
thread of failing process (g7)

If the above gives no information, do the following:

To find executing instruction:

1. do a stack trace... $c
(this will give you a listing to use in step 2)
2. get register pointer,
64 bit system 2nd value from 'die' ex: die (0x9, 0xf05246f4, 0x30, 0x326,...
32 bit system 2nd value from 'trap' ex: trap (0xf028a1d8, 0xf05246f4, ...
(use this value in step 3)
3. get values in register 'pc'
0xf05246f4$<regs
(use the value under the pc heading for step4)
ex: pc
fc479dbc

4. use the value in 'pc' to see the executing instruction


fc479dbc/ai
(it will tell you something like 'ram_write')

page 25
To find thread involved with panic:

1. panic_thread/x (32 bit systems)


panic_thread/k (64bit systems)
(this will print out something like... panic_thread: f5c66480)
2. use the thread value to find the 'procp'
f5c66480$<thread
(look @ the structure and retrieve 'procp' value)
ex: procp
f5c0fcc8
3. Take a look at the process structure to get the process name and arguments (psargs)
f5c0fcc8$<proc2u
(you should see something in text for process name)
4. You can also use the 'procp' value found in step 2 to get the 'pidp' address
f5c0fcc8$<proc
pidp
f74ccf93
5. Use the 'pidp' address from step 4 with the 'pid.print' macro
f74ccf93$<pid.print

adb commands

cpu$<cpus Display cpu0 which contains the address of the currently running thread.

cpun $< cpu Display the cpu identified by n

$<msgbuf Display the msgbuf structure, which contains the console messages
leading up to the panic.

$c Display the stack trace

$C Show the call trace, and stack trace leading up to a panic from the bottom
up.

$r Display the SPARC window registers, including the program counter and
the stack pointer

<sp$<stacktrace Use the sp(stack pointer) address to locate and display a detailed
stacktrace

$q Quit adb

$>file Redirect output to file
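
A minimal adb session sketch tying the commands above together (the .0 dump suffix is hypothetical;
the $> redirect sends the output to the file adb.out):

# cd /var/crash/host_name
# adb -k unix.0 vmcore.0
$>adb.out
$<msgbuf
$c
$r
$q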

page 26
crash is similar to adb, but the command interface is different. crash is used to examine the memory of a
running or crashed system.

(to run crash on a dump file)


# cd /var/crash/host_name
# crash vmcore.n unix.n

(to run on a live system)


# crash (without any arguments)

crash commands:
u or user will give info on the process that was running when the crash occurred

stat will give you the following information:


system name
version information
time of crash
age of system
type of panic

proc will give you listing of process table

defproc will give you the current process slot number (used with proc command)

defthread will give you the current thread address
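
A short crash session sketch on the same dump (command names taken from the list above; the '>'
is the crash prompt and 'q' quits):

# cd /var/crash/host_name
# crash vmcore.0 unix.0
> stat
> proc
> u
> q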

kadb is similar to adb. It must be loaded prior to the standalone program it is to debug. To run the
kernel under kadb type 'boot kadb' at the ok prompt

iscda is the Initial System Crash Dump Analysis script. The script is included on the SunSolve CD under
the top level directory ISCDA. The following is an example of usage:

# cd /var/crash/machine_name
# iscda unix.0 vmcore.0 > /tmp/iscda.output

This will run the iscda script on the core dump in /var/crash/machine_name. The output
will go to /tmp/iscda.output. The output will consist of the results from a sequence of adb
and crash commands. If needed, you can send this file to the Sun solution center via Email.

SunSolve

The Sunsolve CD is a valuable tool in diagnosing problems. The following are home page
selections:

Power search Provides a menu-driven database selection for searching bug reports,
FAQs, patch descriptions, tech bulletins, Info docs, and Symptoms and
Resolutions

Patch Diag Tool Determines the patch level of your system compared to Sun's recommended
patch lists. Can be run from the command line: # patchdiag

Page 27
Crash Dump Analysis Displays how to load and run the ISCDA script. (Initial System
Crash Dump Analysis)

Sun Courier Submits a service request to Sun solution center. (sendmail must be
running)

Installing a patch with Sunsolve CD:


# cd /cdrom/cdrom0
# ./patchinstall (patch#)
Removing a patch with the Sunsolve CD:
# showrev -p (list all patches installed on your system, get name and rev)
# find / -name 102044-01 -print (find installed location of patch)
# cd /var/sadm/patch/102044-01 (change to patch directory)
# ./backoutpatch .
# reboot
SUN VTS

SunVTS is the Sun Validation Test Suite. VTS is run at the Solaris level, but should not be run
while the customer's applications are up. VTS comes with the Solaris package; there
are different revisions for Solaris releases: rev 2.12 for Solaris 2.6, 3.0 for Solaris 7,
and 3.4 for Solaris 8. It is recommended to use the version of VTS that corresponds to the O/S
you are running. Also check SunSolve for related patches.

Installation: (loads to the /opt/SUNWvts directory)


(share -F nfs -o ro,anon=0 /cdrom/cdrom0/s0 if ssp)

# cd /cdrom/cdrom0/Product
# pkgadd -d . SUNWvts SUNWvtsx SUNWodu SUNWvtsmn
or
# /cdrom/cdrom0/installer or run thru file manager window

To run: (programs reside in the /opt/SUNWvts/bin directory)

# sunvts - Default graphical interface (CDE) on local machine


# sunvts -l Runs OpenLook graphical interface on local machine
# sunvts -t Runs in tty mode*
# sunvts -h host-name Runs graphical interface on local machine
while connecting and testing a remote machine (Sun vts
must be loaded on both machines)
* #TERM=vt100; export TERM (use this command when running in tty mode from notebook)
*** set_options / Thresholds to 00 ( to log errors and continue )

sunvts -t Navigation: (the <ctl> keys are good if you forgot to set the TERM)
<tab> move between windows
<ctl> w move between windows
<arrow> move within window
<ctl> r move within window on same line
<ctl> u move within window up/down lines
<ctl> f move within window forward
<ctl> b move within window backwards
<ctl> l refresh screen
<esc> close pop- up menu
<space> select / deselect test
<enter> select function
Page 28
STORtools

The STORtools Toolkit simplifies the monitoring and troubleshooting of Sun
StorEdge A5000, A5100, and A5200 disk array installations. The tool provides
an easy to use menu-driven front end program with task explanations and
help information. Command line utilities are provided for advanced customized
use. The utilities have standard man pages for online documentation.

STORtools provides tools for performing the following tasks:

- Revision Checking
- Configuration Management
- Monitoring and Notification
- Troubleshooting and Fault Isolation

To install from CDrom:


# pkgadd -d . STORtools

To install after downloading from the web site:


# uncompress STORtools.tar.Z
# tar -xvf STORtools.tar
# pkgadd -d . STORtools

To run STORtools
# /opt/STORtools/bin/stormenu

page 29
Explorer Scripts:

New Version:

The new version of explorer can be found on SunSolve under "navigation - diagnostic tools".
It is now a software package (SUNWexplo) and can be installed and run (initially) with the
pkgadd -d command.
To expand: # zcat SUNWexplo.tar.z | tar xvf -
To install: # pkgadd -d . SUNWexplo

Once the package is installed explorer can be run from /opt/SUNWexplo/bin/explorer.

Old Version:
The following is documentation sent out with the explorer script. It contains information
on how to expand, run and mail the output from the explorer.

1. #su root
2. Save the explorer.tar.Z file in a directory where root has write permission
3. for encoded files :
#uudecode filename
#zcat explorer.tar.Z | tar xvf -
4. #./explorer

-While executing this script, you will be prompted to enter information about your site.
- If you have internet access, we ask that you enter "y" to the question "Would you like
to e-mail results [y/n]" so that we get the output automatically.
- If you choose not to e-mail the explorer file automatically, please send the resulting file
(*.uu) as an attachment to your PTAS account manager.

Explorer in CRON (for this example, explorer will reside in /usr/tmp)

**** Do steps 1-3 above


1. Copy the file 'explorer.template' to another file (i.e., file_name)
2. # chmod 755 file_name
3. Edit file_name and fill in the appropriate lines.
4. Edit the root crontab file using the 'crontab -e' command and make an entry
similar to the following:

00 23 1 * * cd /usr/tmp; /usr/bin/zcat explorer.tar.Z | /usr/bin/tar xvf - ; /usr/tmp/explorer -file


/usr/tmp/file_name -mail

5. If you choose not to email the explorer file automatically (-mail option)
please send the resulting file (*.uu) as an attachment to your PTAS Account
manager.

Note: if crontab -e does not work correctly, try setting the following variable
'setenv EDITOR vi'

To view the explorer output file


run uudecode on the *.uu file (this will create a host_id.tar.z file)
run gunzip on the tar.z file (this will create a host_id.tar file)
run tar -xvf on the .tar file (this will expand the file to the explorer output
structure)
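
A command-level sketch of the same steps (the *.uu file name is hypothetical; the host_id names
come from the uudecode output):

# uudecode explorer_output.uu (creates a host_id.tar.z file)
# gunzip host_id.tar.z (creates a host_id.tar file)
# tar -xvf host_id.tar (expands the explorer output structure)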
page 30
Performance Analysis

Tools: (commands)

timex reports system activity for the execution of a single command


-o reports I/O transfers
-s reports sar activity during command
-h reports 'hog factor'
ex: # timex ps -ef (will tell you the amount of time the ps command took to
execute)
top display and update information about the top cpu processes
ex: # top 20 (will give you stats on the top 20 processes default is 10)

vmstat reports Virtual memory statistics


ex: # vmstat 15 2 (will collect and report virtual memory stats twice, at
15 second intervals)

iostat reports I/O statistics


ex: # iostat 60 3 (will collect and report I/O statistics for 3 60 second intervals)

disk throughput test: (from infodoc 21931)

for write performance: (this will write over data. do not use if data is needed on this disk)
# dd if=/dev/zero of=/dev/rdsk/cxtxdxs2 bs=1024k
for read performance:
# dd if=/dev/rdsk/cxtxdxs2 of=/dev/null bs=1024k
# iostat -pxn 5

mpstat Reports processor statistics per processor


ex: # mpstat 30 2 (will collect and report per-processor stats twice, at 30 second intervals)

sar reports overall system activity


-u CPU usage data
-q average length of run queue
-r collect paging data
ex: sar -u 60 30 (will collect cpu data for 30 intervals of 60 seconds each)
sar -q 60 30 (will collect run queue data for 30 intervals of 60 seconds each)
sar -r 60 30 (will collect paging data for 30 intervals of 60 seconds each)

w reports on current system activity per user

page 31
Backups

ufsdump backs up all files specified by files_to_dump (normally either a whole file
system or files within a file system changed after a certain date) to
magnetic tape, diskette, or disk file. Filesystems to be backed up
must be inactive (unmounted or single user mode)

0-9 dump level, 0 is a full dump. Levels are relative to what has already been backed
up. For example, say a level 2 was done, then a level 4 backup was done the next day.
If the next day you did a level 5, all files modified since the level 4 would
be backed up.... If instead you did a level 3 backup, all files modified
since the level 2 would be backed up.
c cartridge. Sets the defaults for cartridge instead of the standard
half-inch reel.
f Dump file. Use dump_file as the file to dump to, instead of
/dev/rmt/0. If dump_file is specified as -, dump to standard output.
u update the dump record. Add an entry to the file /etc/dumpdates.
v verify. After each tape or diskette is written, verify the contents
of the media against the source file system.

ex: # ufsdump 0cfu /dev/rmt/0 /dev/rdsk/c0t3d0s0 (full dump of a root file


system on c0t3d0 on cartridge tape unit 0)
# ufsdump 0uf /dev/rmt/0 /usr (dump the /usr filesystem to tape)
# ufsdump 5fuv /dev/rmt/1 /dev/rdsk/c0t3d0s6 (make and verify an
incremental dump at level 5 of the /usr partition of c0t3d0,
on tape unit 1)

ufsrestore ufsrestore utility restores files from backup media created with the
ufsdump command.

i Interactive. After reading in the directory information from the


media, ufsrestore invokes an interactive interface that allows
you to browse through the dump file's directory hierarchy and
select individual files to be extracted. Valid commands
are ls, cd, add, verbose, delete, extract, quit
r Recursive. Restore the entire contents of the media into the
current directory (which should be the top-level of the file system).
To completely restore a file system, use this function letter to
restore the level 0 dump, and again for each incremental dump.
t Table of contents. List each filename that appears on the media.
If no filename argument is given, the root directory is listed.
x Extract the named files from the media. If a named file matches
a directory whose contents were written onto the media, and the h
modifier is not in effect, the directory is recursively extracted
f Use dump_file instead of /dev/rmt/0 as the file to restore from.
Typically dump_file specifies a tape or diskette drive.

ex: # ufsrestore tvf /dev/rmt/0 (list tape contents of /dev/rmt/0)


# ufsrestore rvf /dev/rmt/0 (restore contents of tape /dev/rmt/0 to
the current directory you are in)
# ufsrestore ivf /dev/rmt/0 (interactive restore of tape /rmt/0)
page 32
tar Copies and Archives files
-c create (backup)
-v verbose (details)
-f device
-t table of contents (list)
-x extract
-p restore to original mode
-h follow symbolic link
-d access special files

ex: tar -cvf /dev/rmt/0 /usr (backup /usr to tape /rmt/0)


tar -xvf /dev/rmt/0 /usr (restores /usr from tape /rmt/0)
tar -tvf /dev/rmt/0 (lists the contents of tape /rmt/0)
zcat file_name.tar.Z | tar xvf - (expand a tar.Z file)

cpio copies and archives files


-o output
-v verbose
-i input
-t list
-d create directories
-m retain modification time

ex: # find /usr -print | cpio -ov > /dev/rmt/0 (copies /usr to /dev/rmt/0)


cpio -itv < /dev/rmt/0 (list the contents of /dev/rmt/0)
cpio -idmv < /dev/rmt/0 (restores /dev/rmt/0)

dd Device to device copy


ex: # dd if=ascii_file of=ebcdic_file conv=ebcdic (converts an ascii file to ebcdic)
# dd if=/dev/rmt/0 of=/dev/rmt/1 (copies from rmt/0 to rmt/1)
# dd if=/dev/rdsk/c0t0d0s2 of=/dev/rdsk/c1t0d0s2 bs=512000
(for a quick copy of c0t0d0 onto c1t0d0)

page 33
How is a Coredump Generated?

When a system crashes, it writes a copy of its memory to a temporary location on a disk, usually to the
primary swap partition. Savecore is a program which runs at boot time to retrieve the memory copy
from the temporary location and save it to a place where it can be accessed. Savecore must be run
during the bootup process, or very shortly thereafter, before the dump is overwritten by the
running operating system, which uses the primary swap partition for other purposes.

How to Get a Coredump from a Solaris 2.x system

Getting a coredump is not enabled by default, because corefiles can be


quite large. Enabling a coredump requires the following to be done:

1) Verify that savecore exists.


Do the following command:
ls -l /usr/bin/savecore
savecore is located in the SUNWtoo package (Programming tools) in 2.X, and is not part of the core install.

If savecore does not exist on a 2.X system, do a pkgadd on SUNWtoo.


a) Put the correct OS version installation CDrom in the CDrom drive.

b) Wait until the access lamp goes out in the CDrom drive.

c) # pkgadd -d /cdrom/sol*/s0/Sol* SUNWtoo

d) Answer the questions.

2) Determine how much memory you have on your system. This can be done by:

a) examining your system banner if your system is down by typing "banner" at the "OK" prompt.

b) doing a "wsinfo" on a 2.x system running openwindows, and checking the "physical memory" column.

c) looking at the /var/adm/messages file, or output of the dmesg command, and searching for the line
which starts with "mem =". The number which follows will be in bytes. Divide by 1048576 to
get megabytes.

3) Find any locally mounted partition, other than /tmp, which has enough room to hold the coredump. A
coredump usually takes about 35% of the size of main RAM memory.

4) Verify that your dump area is at least 35% of the size of main RAM memory. A regular disk is preferred
to a meta-filesystem running under Veritas or DiskSuite control. The dump area is usually the primary
swap file.
Execute a "swap -l" command and observe the first line with values in it. Take the number in the
"blocks" column and divide by 2048. This is the number of megabytes in the primary swap file.
Compare this to the size of main RAM memory found in step (2) above. (A worked example follows step 5.)

Page 34
5) Enable savecore as follows: (Savecore is enabled by default in Solaris 2.7.)

a) Edit /etc/init.d/sysetup, and search for the word


"savecore". You will find something similar to
##
## Default is to not do a savecore
##
#if [ ! -d /var/crash/`uname -n` ]
#then mkdir -p /var/crash/`uname -n`
#fi
#echo 'checking for crash dump...\c '
#savecore /var/crash/`uname -n`
#echo ''

b) Remove the left "#" signs from the bottom 6
statements in (a) above.

c) (optional, if you don't want the core copied to /var or if /var
isn't large enough)
Substitute the name of the partition found in (3)
above for "/var" wherever it shows in the statements in (a) above.

Incidentally, if you know that savecore is enabled but do not know where
the corefiles are put, checking the "savecore" statement listed above
will tell you.
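
A worked example of the step (4) arithmetic, using hypothetical swap -l output:

# swap -l
swapfile             dev  swaplo blocks   free
/dev/dsk/c0t0d0s1   32,1      16 1049312 1049312

1049312 blocks / 2048 = roughly 512 megabytes of primary swap; compare this with the RAM size
found in step (2).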

Page 35
Dump device bad when saving core on encapsulated root

Problem:

Systems with VxVM encapsulated boot disks will not be able to do system dumps if the swap
slice is not tagged as swap. With the root drive encapsulated, if the system tries to do a
system dump in the event of a panic, it may present messages similar to the following:

panic: <some OS kernel panic message>


syncing file systems... done
2084 static and sysmap kernel pages
380 dynamic kernel data pages
385 kernel-pageable pages
0 segkmapkernel pages
0 segvn kernel pages
253 current user process pages
3102 total pages (3102 chunks)

dumping to vp fc2f9204, offset 171232


0 total pages, dump device bad <=- The problem!
rebooting...

Problem Solution:

If the swap slice was not tagged as swap in format when the root
drive was encapsulated, the encapsulation process will zero out
the swap slice when it makes the swap volume:

Part Tag Flag Cylinders Size Blocks


0 root wm 0 - 134 100.20MB (135/0/0) 205200
1 unassigned wm 0 0 (0/0/0) 0
2 backup wm 0 - 2732 1.98GB (2733/0/0) 4154160
3 usr wu 825 - 1229 300.59MB (405/0/0) 615600
4 usr wm 1230 - 1667 325.08MB (438/0/0) 665760
5 unassigned wm 0 0 (0/0/0) 0
6 - wu 0 - 2732 1.98GB (2733/0/0) 4154160
7 - wu 135 - 135 0.74MB (1/0/0) 1520

In this example, slice 1 is the swap slice.

When the system dumps, it needs to use the physical device and not the swap volume. The dump
fails because slice 1 shows a zero size in format.

To solve the dump device problem, you need to go into format and edit
slice 1: change the tag to swap, and give it the correct starting cylinder and size.

Page 36
To get the size for slice 1, you need to look in /etc/vx/reconfig.d/disk.d/c?t?d?/vtoc:

# cd /etc/vx/reconfig.d/disk.d/c0t0d0
# more vtoc

#THE PARTITIONING OF /dev/rdsk/c0t0d0s2 IS AS FOLLOWS :

#SLICE TAG FLAGS START SIZE


0 0x2 0x200 0 103360
1 0x0 0x201 103360 611040
2 0x5 0x200 0 4154160
3 0x4 0x200 718960 615600
4 0x7 0x200 1334560 205200
5 0x0 0x200 1539760 410400
6 0x0 0x000 0 0
7 0x0 0x000 0 0

In this example, 611040 (blocks) is the size of slice 1; this is the value entered as '611040b' in format below.

In format, select the root drive and edit slice 1:

partition> p
Current partition table (unnamed):
Total disk cylinders available: 2733 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks


0 root wm 0 - 67 50.47MB (68/0/0) 103360
1 unassigned wm 0 0 (0/0/0) 0
2 backup wm 0 - 2732 1.98GB (2733/0/0) 4154160
3 usr wm 473 - 877 300.59MB (405/0/0) 615600
4 var wm 878 - 1012 100.20MB (135/0/0) 205200
5 unassigned wm 0 0 (0/0/0) 0
6 - wu 0 - 2732 1.98GB (2733/0/0) 4154160
7 - wu 2732 - 2732 0.74MB (1/0/0) 1520

partition> 1

Part Tag Flag Cylinders Size Blocks


1 unassigned wm 0 0 (0/0/0) 0

Enter partition id tag[unassigned]: swap


Enter partition permission flags[wm]:
Enter new starting cyl[0]: 68
Enter partition size[0b, 0c, 0.00mb]: 611040b <==from vtoc file
partition> l
Ready to label disk, continue? y

partition> p

Current partition table (unnamed):


Total disk cylinders available: 2733 + 2 (reserved cylinders)

page 37
Part Tag Flag Cylinders Size Blocks
0 root wm 0 - 67 50.47MB (68/0/0) 103360
1 swap wm 68 - 469 298.36MB (402/0/0) 611040
2 backup wm 0 - 2732 1.98GB (2733/0/0) 4154160
3 usr wm 473 - 877 300.59MB (405/0/0) 615600
4 var wm 878 - 1012 100.20MB (135/0/0) 205200
5 unassigned wm 0 0 (0/0/0) 0
6 - wu 0 - 2732 1.98GB (2733/0/0) 4154160
7 - wu 2732 - 2732 0.74MB (1/0/0) 1520

partition> q

page 38
Uncompressing Files:

What to use to uncompress files:

Use the 'file (file_name)' command to determine what type of compression was used.
Ex: # file 2.6_x86_Recommended.tar.gz
2.6_x86_Recommended.tar.gz:
gzip compressed data - deflate method , original file name

*.tar.Z files use the 'zcat (file_name.tar.Z) | tar xvf -' command
Ex: # zcat explorer.v.3.1.0.tar.Z | tar xvf -

*.tar.gz files use the 'gzcat (file_name.tar.gz) | tar xvf -' command
Ex: # gzcat 2.6_x86_Recommended.tar.gz | tar xvf -
you can also use the 'gunzip' command but that will result in a *.tar file and
you will have to use the 'tar -xvf (file_name.tar)' command to expand it

*.tar.z files copy to *.tar.Z and use zcat (see above)

*.zip files use the 'unzip (file_name.zip)' command


Ex: # unzip stuff.zip

*.tar files use the 'tar -xvf (file_name.tar)' command


Ex: # tar -xvf 2.6_x86_Recommended.tar

*****zcat can be found on most versions of Solaris in /usr/bin******


gzcat can be found on the web and Sunsolve CD
gunzip or gzunzip can be found in /usr/dist/exe on the corporate network
tar can be found on most versions of Solaris in /usr/bin
unzip can be found in /usr/dist/local/exe on the corporate network

****NOTE: It is a good idea (due to the locations of these commands) to have them on a floppy
or CD that you can bring on-site. *****

page 39
T300 (purple): Also see page 67

Description:

The T300 array is a hardware RAID FCAL device. As such, please make sure all firmware
and patches are up to date. You can use STORtools* to exercise and troubleshoot the product.
The T300 also has a com (rs232) port so you can tip into it, and an ethernet port so you can
use telnet, ftp, tftp boot, or administer it through Component Manager.
The T300 has an EP (extended PROM) boot that runs POST and has its own set of commands,
and it also runs a limited-function unix O/S called PSOS (accessed thru tip or telnet). PSOS
can be run from the reserved area on the array drives, or tftp can be used to load it from
the server.
*STORtools will only test to the MIA on the T300 product line.
Partner group

Two T300s cabled together through the UICs. The cables coming from the 2 dot (OUT ..)
ports on the UIC designate the primary array. The other array (uic 1 dot IN) becomes the
secondary array. Only 2 T300s can be in a partner group at this time. In a partner group
with 2 fiber paths, the server will access the LUNs thru both paths: top array LUNs
thru the top array controller and bottom array LUNs thru the bottom array controller. If something
happens to one of the controllers then its LUNs will fail over to the remaining controller.

Tray ID #s (fru stat, fru list)

u# = unit; currently valid numbers are u1 and u2


u1d3 = unit 1 disk 3
u2pcu1 = unit 2 power cooling unit 1
u1l1 = unit 1 loop 1 (uic1)
u2ctr1 = unit 2 raid controller 1

Default array login: :/:> root (return) no password

Default Configuration: 1 LUN RAID 5

Chassis Model number history:

p1.7 Darker gray, 2 fiber data ports on raid controller bd.


p1.8 Single fiber data port and HH 1.6" seagate drives
p1.9 Single fiber data port and LP (1.0") drives
p2.0 Redesigned chassis called "barney" (have not seen yet 3/12/00)

Hot pluggable FRUs:

PCU (Power Cooling Unit) The battery is good for only 2 years; messages appear in syslog 45 days prior
to expiration. Once a PCU is unplugged you have 30 min to change it before the array starts
a shutdown sequence. The array requires 3 fans to stay below the critical temperature.

UIC (Unit Interconnect Controller) verify status thru fru stat. Once UIC is removed you
have 30 min to change before array starts a shutdown sequence.

Raid Controller is only redundant in a partner group. Also needs to have some type of DMP
running (veritas) to fail over and have the server be able to access the disks on the
failed array.
page 40
T300 (continued) Also see page 67

Disk(s) Numbered 1 - 9 left to right while facing front of array. Pull disk out ( use
spring loaded latch handle) one inch, wait 30 seconds then remove from array.
Once Disk drive is removed you have 30 min to change before array starts
a shutdown sequence.

MIA Media Interface Adapter (fiber to copper connection) is only redundant in a partner group.
Also needs to have some type of DMP running (veritas) to fail over and have the server
be able to access the disks on the failed array.

LEDs: ( in general, for specific info see pg 6-9 & 6-15 install and admin manual)

Solid | Blinking
Green: normal status | system activity
Amber: Fru is being initialized | Fru failure (controller, uic, pcu, disk)

Path:

C#T#D#S#
C# - Sbus controller # (hba)
T# - Target ID of array (use 'port list', 'port set' commands)
D# - T300 volume number (LUN) (use 'port listmap' command)
S# - Slice

****Use format, scsi, inquiry, mode bytes, 10 = primary path 30 = secondary path ****
****You will cause a LUN failover if you try to access the secondary path LUNs through *****
****low level commands like format and dd in a partner group*****

Example device path with each element broken down:

sbus@1f,0/SUNW,socal@1,0/sf@1,0/ssd@w50020f2300000a06,1:a

sbus@1f - sbus slot: convert to decimal, divide by 2, round down; the result is the I/O bd #
socal@1 - soc+ HBA (an address of d = the on board soc+)
sf@1 - loop connection, the port on the HBA: 0 = port A, 1 = port B
ssd@w50020f2300000a06 - WWN# of the array; the last 6 digits are from the mac address ('set' command)
,1 - volume on array (LUN) (port listmap)
:a - slice (a = 0)

T300 Boot:

- Eprom: T300 EP boot (1st stage)
- POST
- U1d1 (will try to get PSOS from U1d1-d9, or TFTP if 'set bootmode tftp')
- PSOS boot (T300 Release x.x) (2nd stage)
- POST
- Mount filesystems
- Load daemons
- Login prompt

page 41
T300 (continued) Also see page 67

TFTP BOOT: (if the chassis is swapped, enter the new mac address into the /etc/ethers file of the tftp server)

On Server:
1. Modify /etc/hosts file on server with ip and name of array
2. Modify (create) /etc/ethers file on server with mac and array name
3. Create /tftpboot directory and copy nbxxx.bin (psos) to it
4. Uncomment '#tftp' in /etc/inetd.conf
5. kill -HUP inetd PID#
6. ps -ef | grep in.rarpd (should be running... restart if tftp doesn't work)
On Array:
7. Modify Bootmode to tftp (:/:set bootmode tftp)
8. Modify tftphost to server's IP (:/: set tftphost xxx.xxx.xxx.xxx)
9. Modify tftpfile to nbxxx.bin (step#3) (:/: set tftpfile nbxxx.bin)
10. Modify IP to ip assigned to your array ** (:/: set ip xxx.xxx.xxx.xxx)
11. Reset array

** if rarp is working, the array should get its IP from the server. If the IP is assigned thru the "set"
command then the array will go to the 'who is tftphost' phase of tftpboot. Example server-side entries are shown below.
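
A sketch of the server-side entries from steps 1-4, using hypothetical IP, MAC address, and file
names (the inetd.conf line is the standard Solaris tftp entry with the leading '#' removed):

/etc/hosts:      192.168.1.20   t300array
/etc/ethers:     0:20:f2:0:a:6  t300array
/tftpboot:       # cp nbxxx.bin /tftpboot
/etc/inetd.conf: tftp dgram udp wait root /usr/sbin/in.tftpd in.tftpd -s /tftpboot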

Add a volume (lun) to an array: (:/: sys blocksize (n)k should be set to the correct value before 'vol add')

vol add vol_name data u#d#-# raid # standby* u#d9


vol init vol_name data rate(1-16)
vol mount vol_name
vol stat*
vol list*
vol mode*
(Note: if t3b and volslice is enabled, you must create a slice to see lun in format- pg. 76)
*optional
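
A hedged worked example of the sequence above, creating a hypothetical RAID 5 volume named v0
from disks u1d1-u1d8 with u1d9 as the standby (an init rate of 1-16 may be appended to 'vol init',
per the syntax shown above):

:/:> vol add v0 data u1d1-8 raid 5 standby u1d9
:/:> vol init v0 data
:/:> vol mount v0
:/:> vol list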

T300 useful commands: (use the 'help' command to get specific switches)

File management:
mkdir, rmdir, cd, pwd, touch, cat, more*, tail ,rm, mv, telnet, ftp**
*more command use q=quit, f= forward, b= backward
** ftp requires a password on the root account

vol commands:
vol list, vol add, vol remove, vol init, vol mount, vol unmount, vol mode,
vol verify, vol stat.

boot Boot system (-i, -s,)


disable Disable controller (u1,u2) or loop cards (ux lx)
disk Disk administration (version)
date set date and time (200003071607 = 03/07/2000 16:07)
enable Enable controller (u1,u2) or loop cards (ux lx)
ep Program the flash prom
fru Display FRU information (-s , -st, list, stat,)
help Display reference Manual pages
id Display fru identification summary
lpc Get interconnect card property (ledtest)
page 42
T300 (continued) Also see page 67

passwd change or display array password


port configure the interface port number (list, listmap, set)
proc Display status of outstanding vol processes (list, kill)
refresh Start/stop battery refreshing or display its status
reset Reset system
set Display or modify the set information
shutdown Shutdown disk tray or partner group
sys Display or modify the system information (list) (*mp_support to rw for dmp)
tzset set the time zone
ver Display the software version
vol Display or modify volume information

Firmware upgrading: (strongly recommended to have the array "out of use" before upgrading
firmware. This includes disabling polling from Component Manager)

FTP firmware files to / on the array. At this moment the files can be found at
http://icode.ebay but in the future they will be available on sunsolve Patch 109115.xx.

Raid controller firmware upgrade:


:/:> boot -i nb###.bin
:/:> reset -y (Warning: if base firmware was below 1.17a, use serial port to reset)
EEprom upgrade:
:/:> ep download ep2_09.bin
UIC upgrade:
:/:> lpc download u#l# lpc_04.11 (3minutes/card, will take card off line)
Disk upgrade: (unmount volumes, 20min for 9 disks, led goes amber during download)
:/:> disk download u1d1-9 D44a.lod

Useful Array files:

/syslog Array error log file, 1Meg in size. Then gets copied to .old
/syslog.old backup to syslog
/etc/syslog.conf Configures where to send error messages

Comm port wiring for notebook: (it works; I verified it)

RJ11 to DB9 or DB25 (RJ11 pins are numbered 1-6)

RJ11 pin 1 (grd) ---------- DB9 pin 5 (grd) ---------- DB25 pin 7 (grd)
RJ11 pin 5 (RXD) ---------- DB9 pin 3 (TXD) ---------- DB25 pin 2 (TXD)
RJ11 pin 6 (TXD) ---------- DB9 pin 2 (RXD) ---------- DB25 pin 3 (RXD)

Useful web sites:


http://icode.ebay Firmware
http://ISI.com PSOS o/s information
http://thedance.ebay/hardware/arrays/purple/hardware.html White papers and documentation

page 43
ACT ( A Crashdump Tool)

ACT is a tool that can be run against a core dump or a live system. It generates a report that gives you
server state information based on the core. ACT should be run on the server that panicked, or should
at least be run on a server that has the same O/S version as the core that is being analyzed. The
engineers that maintain ACT recommend you give it to your customers and have them install it on
their servers. When a core dump is produced they can run it on the core and forward the output
to the solution center; because the output is much smaller than the core, it will save time in transmission.
ACT output is supposed to become the standard that all centers will accept.

Available at http://cte-www.uk. It is in *.gz format. To expand it:

# gunzip CTEact.tar.gz
(this will create a CTEact.tar file)
# tar -xvf CTEact.tar
(this will extract the CTEact directory)
# pkgadd -d . CTEact
(will install the package into /opt/CTEact)
(answer install questions, I selected 'n' for mailout option)
(executable is /opt/CTEact/bin/act)

Examples:
# ./act -l (output on live server to screen)
# ./act -l -s /tmp/dir/ (output from live server to separate files)
# ./act -d /var/crash/hostname/vmcore.0 -s /tmp/dir/ (output core
file to separate files in /tmp/dir)
# ./act -d /var/crash/hostname/vmcore.0 > /tmp/act_out (output core
file to file /tmp/act_out)

****** Info from our website ******

ACT is a tool developed over several years to aid in the process of


analyzing kernel dumps. It attempts to perform a good first pass on a
kernel dump.

ACT prints detailed and accurate information about:


- Where the kernel panicked
- A complete list of threads on the system.
- The contents of the /etc/system file which was read when the failed
system booted
- A list of kernel modules that were loaded at the time of the panic.
- The output of the kernel message buffer
- Full deadlock detection relating to threads blocked on mutexes or
readers/writer locks.
- Threads blocked in either getblk() or biowait().

ACT was conceived and developed by Steve Cumming, while working for what was
SunService and then while working for SMCC European CTE. After a short
illness Steve died on July 12th 1998.

ACT is under continuous development by members of Computer Systems European


CTE group based in Bagshot, UK.

page 44
Installation

ACT now resides in package format for both x86 and SPARC, so pkgadd should be
used for installation. Check the web site for the current version.

By installing one of the packages below ACT will be installed for the
appropriate architecture and version of Solaris you are running and a new
RC script will be installed which will configure savecore and run ACT
against the newly generated crash dump upon system reboot.

CTEactx.tar.gz. ACT for X86

CTEact.tar.gz. ACT for SPARC

Alternatively, if you have KENV installed, you can untar the following
over KENV in order to update KENV with the latest version of ACT.

KENVact.tar.gz. ACT for KENV.

Instructions

ACT takes the following options; options may appear in any order:

-d corefile
ACT assumes that the file corefile contains the kernel core image.
This file could be /dev/mem if you want ACT to analyze the running
system.

-l
Should be used when running act on a live system.

-n namelist
ACT assumes that the file namelist contains a valid kernel
namelist. This file could be /dev/ksyms if you want ACT to
analyze the running system.

-s directory
Tells act to split its output into several files, writing the data
into the directory specified, to aid readability. The files created
are (the names speak for themselves):

biowait getblk modules msgbuf mutex rwlock


threads system summary sunsolve

-u
Displays stack information in an alternate form

-z
This informs ACT to display timezone information in localtime
rather than GMT
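
A hedged example combining the options above on a saved crash dump (hypothetical paths; -d, -n,
and -s as described):

# mkdir /tmp/act.out
# /opt/CTEact/bin/act -d /var/crash/hostname/vmcore.0 -n /var/crash/hostname/unix.0 -s /tmp/act.out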

page 45
Advantages of Splitting a Drive into Multiple File Systems (info doc 14622)

Rather than using an entire disk drive for one file system, which may lead to inefficiencies and
other problems, you can split a single drive into sections. The sections are called slices, as
each is a slice of the disk's capacity. Once a partition has been allocated, it becomes a logical
disk drive. A disk can be split into eight subdisks. The splitting of the disk is often called partitioning
or labeling the disk drive. Below is an example:

Current partition table (original):


Total disk cylinders available: 2036 + 2 (reserved cylinders)

Part Tag Flag Cylinders Size Blocks


0 root wm 0 - 1872 921.87MB (1873/0/0) 1887984
1 unassigned wm 0 0 (0/0/0) 0
2 backup wm 0 - 2035 1002.09MB (2036/0/0) 2052288
3 unassigned wm 1873 - 2035 80.23MB (163/0/0) 164304
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0

partition>

Here are some of the reasons for multiple filesystems on one hard drive.

1. Damage Control: If the system were to crash due to software error, hardware failure,
or power problems, some of the disk blocks might still be in the file system cache and not
have been written to disk yet. This can cause damage to the filesystem structure. While the
methods used try to reduce this damage, and the FSCK utility can repair most of the damage,
spreading the files across multiple filesystems minimizes the possibility of damage, especially
to those files that are needed during boot-up. When the files are split up across the disk
slices, critical files end up on slices that rarely change or are mounted read-only and never
change. The chances of them being damaged and preventing you from recovering the remainder
of the system are greatly reduced.

2. Access Control: Only complete slices can be marked as read-only or read-write.


If you desire to mount the shared Operating System sections as read-only to prevent changes, they
have to be on their own slice.

3. Space Management: Files are used from a reserve of free space on a per-file system basis.
If, for example, a user has allocated a large amount of space, depleting the free space, and the
entire system disk were a single filesystem, there would be no free space left for critical system
files. The entire system would freeze when it ran out of space.
Using separate filesystems, especially for user files, ensures that only a single user, or group of
users, is inconvenienced when a filesystem becomes full. The system will continue to operate,
allowing the System Administrator to handle the problem. The exception to the above scenario is
the root filesystem.

4. Performance: The larger the filesystem, the larger the tables that must be managed.
As the disk fragments and space becomes scarce, the further apart the fragments of a file
might be placed on the disk. Using multiple (smaller) partitions reduces the absolute distance
and keeps the sizes of the tables manageable. Although the UFS filesystem does not suffer
page 46
Advantages of Splitting a Drive into Multiple File Systems (cont.)

from table size and fragmentation problems as much as System V file systems do, this is still a
concern.

5. Backups: Many of the back-up utilities, such as "ufsdump" work on a complete filesystem basis.
If a filesystem is large, it could take longer than you want to allocate to back-up. Most importantly,
multiple smaller backups are easier to handle and recover from.

Below is a listing of slices: some are required (root and swap), plus the recommended additional
slices such as usr, var, opt, home and tmp.

1. The root slice: The root slice is mounted at the top of the filesystem hierarchy. It is mounted automatically
as the system boots, and cannot be unmounted. All other file systems are mounted below the root.

The root filesystem needs to be large enough to hold the following:


* The boot information and the bootable kernel (kernel/genunix), and a backup
of the kernel just in case the main one gets damaged.
* Any local system configuration files, which typically reside in the /etc directory.
* Any stand-alone programs, such as diagnostics, that may be run instead of the OS.

The root partition typically runs between 15 and 30mb. It is usually placed on the first slice of
the disk, more commonly known as slice 0 or a.

2. The swap slice: The default rule is that there is twice as much swap space as there is RAM
installed on the system. For example, if you have 16mb of ram, the swap space would need
to be 32mb. Although this is just a preliminary template as to how much swap to use,
there are other factors to consider; an example would be if a user's system is running large
applications that use large amounts of data, such as a CAD application. You can monitor the
amount of swap space used via the pstat or swap commands (see the example following this list).
If you did not allow enough swap space during the initial install you can add additional swap
with either the swapon or swap commands.

3. The usr slice: The usr slice holds the remainder of the operating system utilities. It needs to be
large enough to hold all the packages you chose to install when installing the OS. If you are going to
install local applications or third-party applications in this slice, it needs to be large enough to hold
them. It is generally better if the usr slice contains the operating system and only symbolic
links to the applications. The filesystem is often mounted read-only to prevent changes.

4. The var slice: The var slice holds the spool directories used to queue printer files and mail, as well
as log files that may be unique to the system. It also holds the /var/tmp directory, which is used for
larger temporary files. It is the read-write counterpart to the usr slice. Every system, even a diskless
client, needs its own var filesystem. It is not a filesystem that can be shared with any other
system(s).

5. The opt slice: In the newer UNIX systems based on System V release 4 (Solaris 2.x) many sections
are now optional and no longer needed to be loaded on the /usr filesystem. They are now installed
onto the /opt filesystem. Additional add on packages are also installed in this filesystem.

6. The home or export home (remote users) slice: The home directory is where the user's login directories
are placed. Making home its own slice prevents users from hurting anything else if they run this
filesystem out of space. A good starting point for the size of this slice is 1mb per application user plus
5mb per power user and 10mb per developer you intend to support.
Page 47
Advantages of Splitting a Drive into Multiple File Systems (cont):

These are rough estimates and are only to be used as a guideline; your configuration may need
more or less space per user. Usually this is /export/home. Don't put things into /home,
as this is a reserved mount point for automounted NFS filesystems. It's fine to use when the
automounter is turned off, but it is on by default.

7. The tmp slice: Large temporary files are placed in /var/tmp, while smaller temporary files are
placed in /tmp. The files in the /tmp directory are very short-lived and are cleared out during a reboot of
the system. If users run mostly application based programs, 5 to 10mb should be sufficient for this
slice. If developers are the primary users of the system, 10 to 20mb may be needed. Once again these
numbers are only a guideline; your needs may be different.
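
A minimal sketch of checking swap usage with the swap command (output omitted; -s prints a usage
summary, -l lists the configured swap areas):

# swap -s
# swap -l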

How to configure a system to run on a network (info doc 14981) (also see pg 56 Adding a 2nd network interface)

1. /etc/hosts
This file is used to resolve host names into IP addresses. This file must be updated if no naming
service is being used. This file should contain the IP and host name of each system on the
local network, including any gateways or routers.

Example:
127.0.0.1 localhost
129.145.71.109 kishori loghost #this is the IP and host name for the local machine
129.145.71.110 sage #this is the IP and host name for a host on the network

2. # ifconfig -a
Be sure that both the loopback and network interface are up and running.

Example:
lo0: flags=849<UP,LOOPBACK,RUNNING,MULTICAST> mtu 8232
inet 127.0.0.1 netmask ff000000
le0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
inet 129.145.71.109 netmask ffffff00 broadcast 129.145.71.255

If the interface to the network is not up and running do the following:

# ifconfig le0 plumb


NOTE: The default may be hme0 (for most Ultra machines)

3. /etc/netmasks
This file should contain the netmasks. If you are using the default netmasks and it appears in
ifconfig -a, this file is not necessary.

Example:
# The netmasks file associates Internet Protocol (IP) address
# masks with IP network numbers.
#
# network-number netmask
#
# Both the network-number and the netmasks are specified in
# "decimal dot" notation, e.g:
# 128.32.0.0 255.255.255.0
#
129.145.0.0 255.255.255.0
page 48
How to configure a system to run on a network(cont.):

4. /etc/defaultrouter
If you want to define a default router include the router name in this file.

5. /etc/hostname.le0 or /etc/hostname.hme0 (depending on your interface type) This file should contain
the name of the local host.

6. /etc/resolv.conf
If you are using DNS this file should contain the name of the domain and the IP address of the nameserver.
It is acceptable to list more than one nameserver (up to 4). The nameservers will be consulted in the
order listed. Be careful: this file is very sensitive to extra spaces and tabs.

Example:
domain support.Corp.Sun.Com
nameserver 129.150.254.2

7. /etc/nsswitch.conf
Check this file for the appropriate entries. If a naming service is being used this file should reflect that.

8. It is a good idea to reboot the system at this point. Check to see if the network is working by pinging other
machines both inside and outside of your network (see the example below).
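
A minimal sketch of the check in step 8, using the host and nameserver addresses from the examples
above (any reachable inside and outside hosts will do):

# ping sage (a host on the local network)
# ping -s 129.150.254.2 (a host outside the local network; -s sends continuous probes, ^C to stop)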

SEVM - How to recover a primary boot disk. (info doc 14820)

NOTE: This document was written for VxVM 2.x. New functionality in VxVM 3.x renders many
of the "extra steps" in replacing a primary root disk obsolete. See the comments interspersed
below regarding steps when using VxVM 3.x.

If Volume Manager (VxVM) is running on a system with the root disk encapsulated and mirrored, and
the root disk fails, the system stays up and running, due to the fact that it is mirrored, but how can you
recover the original root disk?

First, some terminology:

The 'primary' root disk is the system disk on which the OS was originally installed. This
disk was "encapsulated" into VxVM and then mirrored. Since this disk is encapsulated, there is a
direct mapping of partitions onto volumes for /, swap, /usr, and /var.

The 'secondary' root disk is a disk which was first initialized into VxVM and then used to form a mirror
for the primary root disk.

VxVM 2.x: Since it was initialized, rather than encapsulated, there is no mapping of partitions onto the
volumes /, swap, /usr, and /var. VxVM 3.x: When the mirror of the root disk is created, the mapping
of partitions onto the volumes /, swap, /usr, and /var is maintained.

Page 49
SEVM - How to recover a primary boot disk. (cont.)

RECOVERING THE 'SECONDARY' BOOT DISK:

If the 'secondary' system disk fails, the replacement of the disk is straightforward. It is handled in
the same manner that any other failed drive needs to be replaced.

The easiest way to do this is to run 'vxdiskadm' and choose option #4 (Remove a disk for replacement).
Then, shut down the system (if necessary) to physically replace the disk, and reboot.

Run 'vxdiskadm' again, this time choosing option #5 (Replace a failed or removed disk). When asked
to 'encapsulate' the disk, reply "no", and then reply "yes" when asked if you wish to initialize it.

This will begin recovery of the disk and the mirrors will resync automatically.

RECOVERING THE 'PRIMARY' BOOT DISK:

NOTE: If you are running Volume Manager version 3.x.x or above, it is not necessary to follow the
steps below. Instead, the process for replacing the 'primary' boot disk is EXACTLY the same as that
for the 'secondary' boot disk, which is shown above. The reason for this is because Volume
Manager 3.x automatically creates the underlying "hard" partitions for /usr and /var on the replacement
disk, whereas older versions did not.

If you are using Volume Manager 2.x, continue on:

The recovery of the 'primary' boot disk contains a few additional steps because the procedure must
reestablish the direct mapping between the partitions on the disk and the system volumes. This is
necessary so that the system can be changed back to use underlying devices, should this be
necessary (for example, to perform a system upgrade or boot from cdrom to fsck one of these filesystems).

1.Run 'vxdiskadm' and choose option #4 (Remove a disk for replacement). Then, shut down the system
(if necessary) to physically replace the disk, and reboot.

2. Run 'vxdiskadm' and choose option #5 (Replace a failed or removed disk). When asked to 'encapsulate' the
disk, reply "no", and then reply "yes" when asked if you wish to initialize it.

3. This step will change depending on the number of partitions on the boot disk. The 'vxdiskadm'
command will put back partition 0 (for /) automatically, and may also do this for swap. However,
if you have any additional volumes on that disk (i.e., /usr or /var), you will have to run a command
to put the partition on the new disk in the correct location.

Examine the partitions on the replaced disk by running 'format' or 'prtvtoc' on it. At the very least, you
will see a partition for root and one for the public and one for the private partitions for VxVM. Determine
if any partitions are missing. If so, these "missing" partitions can be recreated easily using the steps below.

The command to use is 'vxmksdpart'. You give this command the name of a particular subdisk, and it creates a
partition on the disk in the correct location. The syntax is:

/etc/vx/bin/vxmksdpart <subdisk> <partition> <tag> <flags>

Page 50
SEVM - How to recover a primary boot disk. (cont.)

For example, if you have a subdisk named "disk01-02" and wanted to create partition 7 on the disk to map
this subdisk, you can run

/etc/vx/bin/vxmksdpart disk01-02 7 0x00 0x00

3a. SWAP. To create a partition for the swap volume, run:

/etc/vx/bin/vxmksdpart -g rootdg <subdisk> <partition> 0x03 0x01

where <subdisk> is the name of the subdisk used in the swapvol volume on the primary boot disk
(for example, "rootdisk-01"), and <partition> is the unused partition to use for swap (for
example, "1"). The "0x03" tag specifies this partition is for 'swap'.

3b. USR. To create a partition for /usr (if this disk contains /usr), run:

/etc/vx/bin/vxmksdpart -g rootdg <subdisk> <partition> 0x04 0x00

3c. VAR. To create a partition for /var (if this disk contains /var), run:

/etc/vx/bin/vxmksdpart -g rootdg <subdisk> <partition> 0x07 0x00

There is no reason to create any other partitions on the boot disk.

Disable DMP

Note: Be sure to do these steps first: 1. umount all file systems created on Volume
Manager volumes 2. Stop the Volume Manager (vxdctl stop).

1. remove the "vxdmp" driver from the "/kernel/drv" directory


rm /kernel/drv/vxdmp
2. edit /etc/system, and remove the line:
forceload: drv/vxdmp
3. Remove the Volume Manager DMP files:
rm -rf /dev/vx/dmp /dev/vx/rdmp
4. symbolically link /dev/vx/dmp to /dev/dsk
ln -s /dev/dsk /dev/vx/dmp
5. symbolically link /dev/vx/rdmp to /dev/rdsk
ln -s /dev/rdsk /dev/vx/rdmp
6. shut down the system to disable the DMP functionality
7. reboot

Patch 105181-20 not loading... Check for 106125, 106292, 106361-08

page 51
Memory Scrubbing

On Ultra Enterprise (sun4u) platforms ECC is generated and checked by the UPA devices
(CPU, SYSIO and PSYCHO), not by the memory controller (Address Controller or AC).
Thus, ECC covers the entire data path between devices and memory.

***This means that an ECC error can be reported against a memory (DIMM/SIMM) that might not be bad ***

For a few ECC errors one may not recommend DIMM/SIMM replacement; however, when
the errors are exactly 12 hours apart the DIMM/SIMM must be replaced. The memory scrubber runs every
12 hours after the system is booted. The purpose of scanning physical memory is to read each memory
location and determine if the data and ECC are correct. If the data does not match the ECC, the ECC will be
rerun and a correction made to the memory content. If the same location fails exactly 12 hours apart, the error
reappeared despite the correction; it will be corrected again, but the DIMM/SIMM must be replaced.

To check whether memory scrubbing is enabled, do:


# echo disable_memscrub\ /X | adb -k

physmem 3b7b
disable_memscrub:
disable_memscrub: 0

if it is "0" it is enabled
if it is "1" it is disabled
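
If scrubbing ever needs to be switched off or back on, one approach (a sketch; disable_memscrub is
the same kernel variable queried above, assuming it is tunable on your O/S release) is an /etc/system
entry followed by a reboot:

* in /etc/system: 1 disables the memory scrubber, 0 leaves it enabled
set disable_memscrub = 1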

Display a remote application GUI on your local server

When using telnet to connect to a remote server, you can have an application with a GUI
interface (like VTS) display on your local server by doing the following:

1. # /usr/openwin/bin/xhost + (run this on your local server. 'xhost - ' removes permissions)
2. Connect to remote server and:
If using csh, use this syntax:
# setenv DISPLAY <hostname>:0.0
example: # setenv DISPLAY persia:0.0

If using sh or ksh, use this syntax:
# DISPLAY=<hostname>:0.0
# export DISPLAY

3. Run application and the GUI should display on the local server

page 52
Cluster 2.x http://suncluster.eng
http://neato.east/suncluster/scinstall.html (good install doc)
General:

Up to 4 nodes in cluster
Only Sun Storage is supported (can get waiver, but seldom granted)
HA or PDB (Parallel Data Base)
HA - one server runs at up to 100% (or two at up to 50%) so the other node can take over in case of
failure
PDB - Both servers access the database simultaneously; no logical hosts or shared ccd
Supports Solaris 2.6, 7, 8
Supports QFE, SCI, fast ethernet, gigabit ethernet on the private net
Supports different types of server nodes in the cluster
Terminal concentrator is a special model; it does not send a break on power on
DMP and Fast Write Cache not supported
(touch /kernel/drv/ap before vxvm install to not load DMP)

Cluster install (chapter 8 sun cluster 2.2 book)

Admin w/s Only requires end user distribution


2.2 release 7/00 has all the cluster related o/s patches
install order: o/s, cluster patches, cluster software
important files:
/etc/clusters logical hostname and nodes
/etc/serialports node name and concentrator port

Server install Requires full distribution, 10k requires full+oem


installer must be root
Avoid the 'scinstall' "change" option if possible. Use the 'scconf' command
Software components:
CMM -Cluster Membership Monitor
CCD - Cluster Configuration Database
SMA - Private Network Management
SSVM/CVM - Volume manager
PNM - Public Network Management
Logical Hosts
DLM - Distributed Lock Manager
Data Services

Topologies:
Clustered Pair
N+1 (hot standby node)
Ring or cascade
N to N scalable (cascading failover)
Shared Nothing ( used for Informix parallel server)

OPS: (Oracle Parallel Server)

No logical hosts
The Oracle instances sync over the private network
No shared CCD
Must select CVM on install even with Volume Manager 3.0.4; to get OPS, pick it at the end.
Must install UDLM (Oracle CD)
Create shared disk group while only one node is in the cluster.
Page 53
Cluster 2.x (cont.)

Hardware Notes:
Must change the initiator id on one node if using SCSI arrays between 2 nodes
(see procedure 5-17)
If Quorum device is replaced it needs to be reconfigured.
# scconf -q
A5000 - full loop only
must be mirrored
DMP, FW cache not supported
Direct or Hub attached (pg 5-23 5-27)
Wiring Diagrams
(pg 5-30)
SCI - scrubber jumpers need to be 'on' on one node 'off' on all the other nodes
/opt/SUNWsma/bin (has the SCI sm_config template files you need to
modify and run sm_config)
switch1.sc (4 nodes, 8 cards, 2 switches)
switch2.sc (2 nodes, 4 cards, 2 switches)
link1.sc (2 nodes, 4 cards, 0 switches)
# /opt/SUNWsma/bin/sm_config -f template_file

Terminal Concentrator - port 1 is used for setup (numbered 1-8 not 0-7) (pg 5-56)
Enable setup mode - Power On < 30sec (test button) 15 more sec (test button)
should get monitor::
:: erase EEPROM (to set password to default, default is IP address of box)
Remove the password from port 8 in a 3 node N to N cluster for 'port locking'
Cluster Commands:

abort partition Same as scadmin stopnode... Use scadmin stopnode command


ccdadm <clustname> -p ccd.database.ssa - creates a ccd.database.pure file for recovery use
-r ccd.database.pure - restores to ccd.database file
-v verify consistency of the dynamic copy of ccd.database
-x convert the candidate file to a CCD database. Or verifies the CCD file.

ccp Command used to run the cluster control panel software on the
admin workstation
# ccp clustername &
cconsole Command used to start up the cluster console on the admin W/S
# cconsole
get_node_status Command used to get the status of a node (also can use hastat and
scconf clustername -p commands)
# get_node_status
haswitch Switch logical host to another node (will start the reconfiguration)
# haswitch nodename
hastat Will give you the status of the cluster, will lie if private network is
down. You can run it in the common window to get all views
# hastat (-m 0 to skip messages)
hareg registers a data service with HA and associates it with the given logical
host.
# hareg -s -r dataservice -h logicalhost
# hareg -y dataservicename (to turn on a dataservice)
# hareg (to verify a service is turned on)
# hareg -n dataservicename (to stop a data service)
# hareg -u dataservicename (will shutdown the dataservice on all logical hosts)
Page 54
Cluster 2.x (cont)

Cluster Commands: (cont)

pnmset Command to create PNM NAFO groups (on each node) for the public
network interfaces to be used for the NFS data service.
# /opt/SUNWpnm/bin/pnmset (follow interactive install)
pnmstat -l Command lists the /etc/pnmconfig file (to set up NAFO groups)
scadmin startcluster The first node into the cluster must enter with the 'cluster ' switch.
# scadmin startcluster nodename clustername
scadmin startnode All remaining nodes can join the cluster with the startnode switch
# scadmin startnode
scadmin stopnode To remove your node from the cluster use the stopnode switch. (do
this before init or shutdown commands)
# scadmin stopnode
scadmin switch Switch logical host to another node (will start the reconfiguration)
same as haswitch command
# scadmin switch nodename
scconf Command used to configure cluster parameters (many, use MAN)
# scconf -F (creates admin filesystem, each node)
# scconf -L (for logical hosts) (one node, diskset)
# scconf -q (for quorum device)
# scconf -N (to change a node ethernet address)
scdidadm Command to initialize the Disk ID pseudo driver (SDS install only)
builds a file with paths from each node to disks
# scdidadm -r (on node 0 to initialize)
# scdidadm -l (lowercase L) (verify DID configuration)
scinstall Installation command for Sun Cluster from CD
scmgr Command to start Sun Cluster manager (cluster monitor) (set DISPLAY)
# /opt/SUNWcluster/bin/scmgr nodename &
xhost Command on admin W/S to allow all xhost connections from
cluster nodes (graphics)
# /usr/openwin/bin/xhost +

Cluster Files:

/etc/opt/SUNWcluster/conf/clustername.cdb
Contains Install info, flat file use more command to view.
/etc/opt/SUNWcluster/conf/ccd.database
Contains cluster database, viewed by scconf, scadmin commands. If you have to restore
this file to a 'bad' node, you must reboot (file info is kept in memory)
/etc/opt/SUNWcluster/conf/hanfs/vfstab.logicalhostname
Logical hosts vfstab file
/etc/opt/SUNWcluster/conf/hanfs/dfstab.logicalhostname
Logical hosts dfstab file (shared filesystems)
/etc/clusters
Admin W/S file, contains cluster names and node names
/etc/serialports
Admin W/S file, contains node names and port assignments on the concentrator
/etc/pnmconfig
Public network file. The pnmset command creates it, the pnmstat -l command will list it.
/etc/hosts
You must enter logical host name and IP.
Page 55
Cluster 2.x (cont)

Cluster Files:

/etc/name_to_major
vxio must have the same number on both nodes to switch nfs logical host
(unencapsulate first, change number)
/opt/SUNWcluster/bin
Most SC2.2 commands are located in this directory
/var/opt/SUNWcluster
Cluster error messages are located in this directory and in /var/adm/messages

Encapsulating root after using Environmental CD to load O/S:

The newer PCI-based servers come with an Operating Environment Installation CD to use with
Solaris 2.5 and 2.6. This CD will create a mini-root partition and allows you to install and boot the server
from the older versions of Solaris.

The mini-root is currently Solaris 7 and starts at cylinder 0 on the boot disk. Once the intended version
of Solaris is loaded, the environmental CD makes the mini-root (not mini-me) swap (slice 1), leaving it starting
at cylinder 0. This is alright if you are not encapsulating root.

When you then encapsulate root, swap (slice1) remains starting at cylinder 0, and veritas will not allow that
space to be used for a core dump. It assumes it is reserved for the VTOC.

One way we have used to get around this is to boot from the Operating Environment Installation CD,
load mini-root onto one disk and the intended O/S on another, through the custom install option. Then
boot from the other disk and encapsulate it.

Adding a second network interface:

(also see pg 48 - 49 How to configure a system to run on a network)


This procedure can also be used to add the first network interface and may work without rebooting the machine.

- add hostname and ip address to /etc/hosts file (hostname is usually hostname_interface ex: sunnie_qfe0)
- create a /etc/hostname.interface file # touch /etc/hostname.qfe0
- vi the /etc/hostname.interface file, add entry at top (no spaces): hostname_interface
- ifconfig interface (hme0, qfe0, etc.) plumb
- ifconfig interface inet IP_address # ifconfig qfe0 inet 129.145.121.123
- ifconfig interface netmask 255.255.255.0 # ifconfig qfe0 netmask 255.255.255.0 (see the /etc/netmasks note below)
- ifconfig interface broadcast IP_address.255 # ifconfig qfe0 broadcast 129.145.121.255
- ifconfig interface up # ifconfig qfe0 up
- ifconfig -a (if ready to use, should look like this:)
qfe0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500
inet 129.145.121.123 netmask ffffff00 broadcast 129.145.121.255
ether 8:0:20:88:xx:xx

*** Warning: touch the file /etc/notrouter so the server will not route between the two ethernet interfaces***
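
For the netmask to persist across reboots, the network also needs an entry in /etc/netmasks (a minimal sketch using the example network above; older Solaris releases may expect the natural class network number instead):

# grep 129.145.121 /etc/netmasks
129.145.121.0   255.255.255.0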

Adding a default gateway:

# route add default gateway_IP_address


then vi /etc/defaultrouter and enter gateway_IP_address (to keep the default route through reboots)
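
To verify the new route:

# netstat -rn        (the 'default' entry should list the gateway)
# ping gateway_IP_address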

page 56
Veritas Volume Manager :

Volume Manager takes physical disks and allows you to create logical volumes across these disks.
A group of physical disks is called a 'disk group'
All or portions of these physical disks can be combined to create logical 'volumes'
You then can create filesystems on these logical volumes that span multiple physical disks.

Disks initialized under Veritas have 2 regions on them, a public and a private region.


The public region is the size of the whole physical disk.
The private region is 1024 sectors long. The configuration database is located in this region.
There is enough room in the private region to define 128 disks.
The private region is usually located at the beginning of a disk
and is usually slice 3.

If you run # prtvtoc /dev/rdsk/c#t#d#s2 on a disk initialized under vm (# vxdisksetup -i c#t#d#)


a '15' in the Tag column output indicates the private region
a '14' in the Tag column output indicates the public region

Rules:

- There must be a rootdg for vxvm to come up at boot. This is usually created when you
run vxinstall to install Volume Manager and encapsulate your boot disk. Although you do not
have to encapsulate the boot disk, rootdg can be made up of any disk.
- You must have 2 unassigned slices to encapsulate a disk. (public and private regions)
- vxunroot will unencapsulate a volume only if /, swap, /usr, /var, and /opt are the only
filesystems on the encapsulated disk.

The general flow of building logical volumes, creating a filesystem, and mounting it is as follows:

1. assign physical disks to free disk pool (to use with volume manager)
# vxdisksetup -i c#t#d# c#t#d# (etc...)

2. create a disk group (uses disks in the free disk pool. You assign names. nconfig = number of private db
copies, default is 4; nlog = number of kernel logs. Both switches are optional)
# vxdg init diskgrp_name disk_name=cxtxdx nconfig=# nlog=#

3. add disks from the free disk pool to the diskgroup


# vxdg -g diskgrp_name adddisk disk_name=cxtxdx disk_name=cxtxdx (etc...)

4. Create a logical volume in your disk group


# vxassist -g diskgrp_name -U fsgen make vol_name size layout=stripe nstripe=# disk_name disk_name (etc..)
(size ex: 100m; layout can also be mirror or raid5 {nolog})
5. mirror a striped or concat logical volume (optional)
# vxassist -g diskgrp_name mirror vol_name disk_name disk_name disk_name (etc..)

6. start the volume


# vxvol start vol_name

7. Make the filesystem that sits on the logical volume


# newfs /dev/vx/rdsk/diskgrp_name/vol_name

Page 57
Veritas Volume Manager (cont):

The general flow of building logical volumes, creating a filesystem, and mounting it is as follows (a worked example follows step 9):

8. create a mount point (you decide dir_name)


# mkdir /dir_name

9. Mount the filesystem on the mount point


# mount /dev/vx/dsk/diskgrp_name/vol_name /dir_name
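
A concrete end-to-end run of steps 1-9 (the disk group, disk, volume, and mount-point names here are illustrative assumptions; device names will vary):

# vxdisksetup -i c1t1d0
# vxdisksetup -i c1t2d0
# vxdg init datadg datadg01=c1t1d0
# vxdg -g datadg adddisk datadg02=c1t2d0
# vxassist -g datadg -U fsgen make vol01 500m layout=stripe nstripe=2 datadg01 datadg02
# vxvol -g datadg start vol01        (only needed if the volume is not already enabled)
# newfs /dev/vx/rdsk/datadg/vol01
# mkdir /data01
# mount /dev/vx/dsk/datadg/vol01 /data01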

Break a mirror and unencapsulate:

# vxprint -htg rootdg (get the names of the mirror plexes)


# vxplex -g rootdg -o rm dis rootvol-02 swapvol-02 (use pl names from vxprint)
# vxunroot (this will ask for a re-boot when completed)

(you can use vxdiskadm to re-encapsulate)

Break a mirror and take the plex to make another volume:

# vxprint -htg dg_name (find the plex name of the mirror volume you want to use)
# vxplex -g dg_name dis plex_name (dissociate the plex from the volume)
# vxmake -g dg_name -U fsgen vol vol_name plex=plex_name (make the volume)
# mkdir /mp_name (create a mount point)
# vxvol -g dg_name start vol_name (start the newly created volume)
# mount /dev/vx/dsk/dg_name/vol_name /mp_name

To boot without Volume manager:

rem out 'vxio' lines in /etc/system (usually 2 lines at the end of vm section)
copy /etc/vfstab to /etc/vfstab.vm
copy /etc/vfstab.prevm to /etc/vfstab
touch /etc/vx/reconfig.d/state.d/install-db
reboot
(to reverse)
uncomment 'vxio' lines in the /etc/system file (on both disks if root was mirrored)
copy /etc/vfstab.vm to /etc/vfstab (on both disks if root was mirrored)
rm /etc/vx/reconfig.d/state.d/install-db
reboot
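
A minimal command sketch of the steps above (assumes /etc/vfstab.prevm exists from the original install):

# vi /etc/system        (comment out the vxio/rootdev lines at the end of the vm section)
# cp /etc/vfstab /etc/vfstab.vm
# cp /etc/vfstab.prevm /etc/vfstab
# touch /etc/vx/reconfig.d/state.d/install-db
# init 6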

Deport and Import a disk group:

# vxdg list (get a list of disk groups)


# vxdg deport dg_name
# vxdg import dg_name (optional switches: -n name, -s for shared, -t for temporary)

Remove a volume from Volume Manager:

# umount /vol_name (or whatever filesystem sits on the volume)


# vxvol -g dg_name stop vol_name (stop the volume)
# vxedit -g dg_name -r rm vol_name (recursively removes volume, plex, and sub-disk from vm)

Page 58
Veritas Volume Manager (cont):

Volume Manager commands:

vxdg free how much free space is in a diskgroup: vxdg -g dg_name free
vxdg list list all imported disk groups (exported use: vxdisk -s list | grep dgname)
vxdg init Creates a disk group: vxdg init dg_name disk_name=c#t#d#
vxdg adddisk Add disk to dg: vxdg -g dg_name adddisk disk_name=cxtxdx
vxdg rmdisk Remove disk from dg: vxdg -g dg_name rmdisk disk_name
vxdg upgrade Upgrade dg after VM upgrade: vxdg upgrade dg_name
vxdg deport deport a dg: vxdg deport dg_name
vxdg import import a dg: vxdg import dg_name
vxassist make makes a logical volume:
vxassist -g diskgrp_name -U fsgen make vol_name size layout=stripe|mirror|raid5 nstripe=# disk_name disk_name (etc.)
vxassist maxsize what is the max size volume you can make in a disk group:
vxassist -g dg_name maxsize layout=stripe|mirror|raid5 nstripe=#
vxassist mirror mirror a stripe or concat vol: vxassist -g dg_name mirror vol_name disk_name(s) &
vxassist remove mirror Used to remove a mirror permanently (do not use to break a mirror)
vxassist -g dg_name remove mirror vol_name
vxplex used to attach and dissociate plex(es) with volumes:
vxplex att vol_name plex_name or vxplex -o rm dis plex_name
vxdisk -s list | grep dgname Gives you a listing of all disk groups
vxdisksetup -i used to add a disk to the volume manager free disk pool: vxdisksetup -i c#t#d#
vxdiskunsetup -C used to remove a disk from the free disk pool: vxdiskunsetup -C c#t#d#
vxdiskadd will do both the vxdisksetup and vxdg adddisk: vxdiskadd c#t#d#
vxvol start start a volume after it was made with vxassist or vxmake: vxvol start vol_name
vxvol stop used to stop a volume after a umount: vxvol stop vol_name
vxedit -r rm allows you to recursively remove a volume, plex or subdisk: vxedit -r rm vol_name or
plex_name

vxmake sd manually make a sub-disk: vxmake sd sd_name offset=# len=size disk=disk_name


vxmake plex manually make a plex from a sub disk: vxmake plex plex_name sd=sd_name
vxmake vol manually make a volume from a plex: vxmake -U fsgen vol vol_name plex=plex_name
vxunroot unencapsulates a disk: vxunroot disk_name
vxdiskadm menu driven disk administration
vxiod set set the number of vxio daemons (default is 10; 2/CPU is recommended): vxiod set #
permanently set daemons in the S85vxvm-startup2 file.

Volume Manager files:

Set PATH to: /etc/vx/bin:/opt/VRTSvmsa/bin:.


/etc/vx/reconfig.d/state.d/install-db Touch this file to prevent Volume manager from starting
/etc/vx/reconfig.d/disk.d/cxtxdx
/var/opt/vmsa/logs/commands a log of all GUI commands is kept here
/etc/vfstab.prevm Copy of the vfstab before vm was installed
/opt/VRTS/bin/vea GUI for version 3.5

Page 59
FTPing to and from sunsolve:

You can use this to temporarily store files that you may want to access at a customer's site or to
send files from a customer site that you can retrieve on swan.
Anything sent to sunsolve will be deleted after two days

Internal to sunsolve:
(change to directory where the file you want to send resides)
# rftp sunsolve.sun.com
Name : anonymous or suncore
Password: (enter your e-mail address, or the suncore passwd, which changes weekly; check url:)
https://livelink.central.sun.com/livelink/livelink?func=ll&objId=5537115&objAction=browse&sort=name
ftp> cd cores
ftp> mkdir dir_name (as of 5/01 you cannot create directories. Skip to bin command)
ftp>cd dir_name
ftp>pwd
257 "/cores/dir_name" is current directory.
ftp> bin
ftp> put file_name_to_be_sent
ftp> quit
#

External from sunsolve:

# ftp sunsolve.sun.com (192.9.9.24)


login: anonymous
password: your_email_address
ftp> cd cores/dir_name/ (as of 5/01 you cannot create directories. Skip to bin command)
ftp> bin
ftp> get file_name_to_be_retrieved
ftp> quit
#

Page 60
Serengeti: 3800 - 6800

General: (first supported O/S on Serengeti is Solaris 8 4/01)

Serengeti 8 (3800):

Support for 2 to 8 Ultrasparc III processors (2 system bds max)


Up to 64 Gbytes of Memory (8 banks of 4 dimms each, 2 banks/CPU. A CPU may be installed
without a populated bank, but a populated bank must have a corresponding CPU installed)
12 hot-swappable compact pci (cPCI) slots
Up to 2 domains
Power Server: up to 3 power supplies nema 6-15P (connect internal to rack )
Rack mount: up to 2 NEMA L6-30P
Serengeti 12 (4800):

Support for 2 to 12 Ultrasparc III processors (3 system bds max)


Up to 96 Gbytes of Memory (8 banks of 4 dimms each, 2 banks/CPU. A CPU may be installed
without a populated bank, but a populated bank must have a corresponding CPU installed)
16 PCI slots or * 8 hot swappable cPCI slots or *combination of 8 PCI and 4 cPCI
Up to 2 domains
Power Server: up to 3 power supplies nema 6-15P
Rack mount: up to 2 NEMA L6-30P

Serengeti 12i (4810): (100% front access for specialized environments.)

Support for 2 to 12 Ultrasparc III processors (3 system bds max)


Up to 96 Gbytes of Memory (8 banks of 4 dimms each, 2 banks/CPU. A CPU may be installed
without a populated bank, but a populated bank must have a corresponding CPU installed)
16 PCI slots or * 8 hot swappable cPCI slots or *combination of 8 PCI and 4 cPCI
Up to 2 domains
Power Server: up to 3 power supplies nema 6-15P (connect internal to rack )
Rack mount: up to 2 NEMA L6-30P

Serengeti 24 (6800):

Support for 2 to 24 Ultrasparc III processors (6 system bds max)


Up to 192 Gbytes of Memory (8 banks of 4 dimms each, 2 banks/CPU. A CPU may be installed
without a populated bank, but a populated bank must have a corresponding CPU installed)
32 PCI slots or * 16 hot swappable cPCI slots or *combination of PCI and cPCI
Up to 4 domains (2 domains / partition)
Power Rack mount: up to 4 NEMA L6-30P
Hardware:

SC Board: System Controller. You can tip or telnet to the SC card to configure/maintain the server.
(SSC) There are 3 shells you can access and configure from the SC: Platform shell, Domain shell
and O/S shell on a specific domain. The SC bd is part of the platform; it is not
configured into a domain. A second (slave) SC board is installed if the redundancy
kit is ordered. The SC runs its own O/S and is upgraded and backed up across the
ethernet connection.

Repeater Bds: The repeater boards establish and maintain the connections between the system boards
(RP) and the IO boats. The 3800 and 4800 have 2 repeater boards, although the circuitry
for the repeaters on the 3800 is on the centerplane. The 6800 has 4 repeater bds.

* When available Page 61


Serengeti: 3800 - 6800: (cont.)

System Boards: The system board is common across all 3 servers. It can have 2 or 4 CPUs
(SB) installed on it (they are not field replaceable). The system board has sockets
for 8 banks of 4 dimms. Each CPU has 2 corresponding dimm banks. It is possible
that a CPU might not have any dimms installed in its corresponding banks.
However, a populated dimm bank must have a corresponding CPU installed.

I/O boat: The I/O boat types : PCI or cPCI, no sbus I/O boat. The PCI and compact PCI
(IB) adapters are installed in the I/O boats. Currently cpci is only available on the 3800.

ID Board: The ID board is a pre-programmed daughter board that is on the centerplane.


The 3800's ID board is incorporated into the centerplane. The ID board has
the System chassis ID #, System serial #/host ID, (6) MAC addresses for the 6800
and (4) for the 3800, 4800.

LEDS:
Activate (green):    ON  - Bd is activated. You must NOT remove the board when this LED is on.
                     OFF - Bd is not activated: you can remove the board when this LED is off.

Fault (amber):       ON  - an internal fault occurred.
                     OFF - no internal fault occurred.

Removal ok (amber):  ON  - you can safely remove the component under hot-pluggable conditions.
                     OFF - you must not remove the component under hot-pluggable conditions.

Partitioning:
You can configure the server in single or dual partition mode. If you select dual partition
mode, each partition will be electrically separated from the other. The 3800 (on bd repeaters)
and the 48x0 have dual repeaters one will be configured for each partition, the 6800 has
4 repeater bds, 2 will be configured for each partition. Dual partition mode is recommended for
keeping domains electrically separated.

Domains: On the serengeti, you configure the resources you want allocated to each domain. The domain
(like on an E10K) then becomes an independent server. At a minimum each domain must have
a system bd, I/O boat with ethernet/scsi PCI card, and a boot disk.

Domain/Partition configurations:

3800/4800/6800: configuration Domain IDs


1 partition 1 domain A
1 partition 2 domains A,B
2 partitions 2 domains A,C (1 per partition, 6800 see comment below)

6800: Domains A,B even bd #s grid0, C,D odd bd#s grid1 (best practices)
2 partitions 3 domains ABC, ABD, ACD, or BCD
2 partitions 4 domains A,B,C,D

Page 62
Serengeti: 3800 - 6800: (cont.)

To connect to the SCC:


# tip hardwire (from admin workstation, or notebook pc to SCC0 console port)
# telnet ip_address_of_SC (ip address of sc must be configured > setupplatform)

Power on hardware:
Connect to SCC
enter 0 (platform shell)
> poweron all
to verify: > showboards -v

Switch to domain: (from platform shell)


> console -d [a, b, c or d]

Power on domain: (from domain shell)


> setkeyswitch on (will start POST)
to verify >showkeyswitch
(once POST is complete should get OK prompt and be able
to run standard OBP commands and boot)

Power off domain: (after init 0, shutdown, etc)


(from domain shell) > setkeyswitch off (wait, takes a while)

Power off platform: (after domain(s) are set off)


(from platform shell) > poweroff all

update SC firmware: (from platform shell on the SC)

>flashupdate -y -f ftp://root:password@host_ip/path_to_new_firmware all rtos

Run this command from the platform shell. Keep in mind this command will not
update the slave SC. To update it you must make it the primary or run the command
from the slave SC.

>flashupdate -c <source board> <replacement board> (to copy firmware btwn like bds)

Save SC configuration: (from platform shell on the SC)

> dumpconfig -f ftp://root:password@host_ip/path_to_dumpdir

Restore SC Configuration: (from platform shell on the SC)

> restoreconfig -f ftp://root:password@host_ip/path_to_dumpdir

To create/modify Platform: (from platform shell on the SC)

> setupplatform (enter information and modify ACLs. For each domain use the
deleteboard and addboard -d commands)

To create/modify Domain: (from Domain shell on the SC)


> setupdomain (enter information. Defaults in [ ]s)
Page 63
Serengeti: 3800 - 6800: (cont.)

Navigating between shells:

When you first connect: enter 0 (platform) 1 (domain A) 2 (domain B) etc...


Platform -> Domain > console -d [A,B,C,D] (will go to OBP, O/S or shell)
Domain -> Platform > disconnect
Domain -> OBP > break (after 'setkeyswitch on ' had been run)
OBP -> Domain ctrl ] or ~# or (telnet) ctl ] send break or (ssh) #.
Domain -> O/S > resume (after O/S was brought up via 'boot' command)
O/S -> Domain ctrl ] or ~# or (telnet) ctl ] send break or (ssh)#.
Platform Shell commands: (command -h will give you a listing... ** command available on slave SC)
addboard assign a board to a domain -d,
connections ** show connections to the system controller or a domain
console connect to a domain shell/console -d
deleteboard delete a board from a domain
disablecomponent add a component to the blacklist
disconnect ** disconnect this connection or a specified connection
dumpconfig ** save the system controller configuration to a server
enablecomponent delete a component from the blacklist
flashupdate ** update flash prom images -y, -f,
help show help for a command or list commands
history ** show shell command history
password change platform or domain password
poweroff turn components off
poweron turn components on
reboot ** reboot the system controller
reset ** reset the other system controller
restoreconfig ** restore the system controller configuration from a server
service service mode (see page 94)
setchs (service cmd) setchs -s ok, suspect, faulty -r "reason for status" -c /N0/SB2/p2
setdate set the date and time for the platform
setdefaults set default configuration values
setescape (5.16.00) change escape characters (default #.)
setfailover (5.13.00) changes the state of SC failover on, off, force,
setkeyswitch set the keyswitch position for a domain on
set-keygen (5.16.00) Generates/lists ssh host keys/fingerprint -l -r
setupplatform ** configure the platform -p acls, -p partition,
showboards show board information -d,-e, -p, -v,
showchs (service cmd) shows chs status (use with setchs)
showcomponent show state of a component -v,
showdate show the current date and time for the platform
showenvironment show environment sensors -u, -w, -l, -p, -v,
showescape (5.16.00) lists escape character
showframe show frame information -v,
showfailover (5.13.00) displays SC and clock failover status
showfru (5.16.00) list frus in system -r manr
showkeyswitch show the keyswitch positions
showlogs show the logs -d, -v,
showplatform ** show the status of domain and platform configuration -d, -p, -v,
showsc ** show system controller uptime, version, and configuration -v,
sshrestart (5.16.00) restarts ssh server to put new host keys into effect
testboard test a board
testinterconnect run interconnect test (available in service mode only)
Page 64
Serengeti: 3800 - 6800: (cont.)

Domain Shell commands: (command -h will give you a listing... )

addboard assign a board to a domain -d


break send break to the domain console
connections show connections to the domain
deleteboard delete a board from a domain
disablecomponent add a component to the blacklist
disconnect disconnect this connection
enablecomponent delete a component from the blacklist
help show help for a command or list commands
history show shell command history
password change domain password
poweroff turn components off
poweron turn components on
reset (-x) reset the domain (XIR, will dump a hung domain)
resume return to domain console
setdate set the date and time for the domain
setdefaults set default configuration values
setkeyswitch set the keyswitch position on, off
setupdomain configure the domain -v,
showboards show board information -v,
showcomponent show state of a component -v,
showdate show the current date and time for the domain
showdomain show domain configuration -v
showenvironment show environment sensors -v,
showkeyswitch show the keyswitch position
showlogs show the logs -v
testboard test a board

Setup remote logging:

In setupplatform:
Syslog loghost [ ] : ip_of_adminStation
Log Facility [ ]: local0 (can be 0-7)
In setupdomain: (for each domain)
Syslog loghost [ ] : ip_of_adminStation
Log Facility [ ]: local1 (can be 0-7)
In syslog.conf on admin station: (selector and file name must be separated by a tab)
local0.notice	/var/adm/messages.platform
local1.notice	/var/adm/messages.domainA
(etc...)

Admin station:
create the files: # touch /var/adm/messages.nnnnnnn
restart syslog: # kill -HUP `cat /etc/syslog.pid` or ( /etc/init.d/syslog stop) ( /etc/init.d/syslog start)

(continued on next page)

Page 65
Setup remote logging: (cont.)
/usr/lib/newsyslog file: (so logs do not grow forever. On line 2 enter all message file names you created.)

--- Change ---

LOG=messages
cd /var/adm
test -f $LOG.2 && mv $LOG.2 $LOG.3
test -f $LOG.1 && mv $LOG.1 $LOG.2
test -f $LOG.0 && mv $LOG.0 $LOG.1
mv $LOG $LOG.0
cp /dev/null $LOG
chmod 644 $LOG

--- To ---

#LOG=messages
for LOG in messages messages.platform messages.domainA (etc..)
do
cd /var/adm
test -f $LOG.2 && mv $LOG.2 $LOG.3
test -f $LOG.1 && mv $LOG.1 $LOG.2
test -f $LOG.0 && mv $LOG.0 $LOG.1
mv $LOG $LOG.0
cp /dev/null $LOG
chmod 644 $LOG
done
to test logging:

- # logger -p local0.notice "test message for platform log file" (check contents of log files to make sure
logging is working) (if not check permissions on log file)

- setfailover off/on and check the log file on the loghost (if entries do not appear, snoop the interface to make sure
the log entry is reaching the loghost; also make sure syslogd is not running with the -t switch)
Notes:
- Use 'connections' command to see if ghost sessions are keeping you from connecting to a domain.
(reset the SC , from slave sc or reset button, to remove those sessions.)
- Use the dash (-) to remove an entry when running setupplatform

Firmware: http://pts-americas.west/esg/msg/techinfo/platform/sun_fire/firmware-matrix/
Patch # SC Firmware CPU (MHz) Domain Firmware Other features
-------- --------- --------------- ------------
112127-xx 5.12.5 750/900 (Masks 2.1/2.2 only) 5.12.x
5.12.6 750/900 (Masks 2.1/2.2 only) 5.12.x DR
5.12.7 750/All 900 5.12.x DR/900 2.3
112494-xx 5.13.0 750/All 900 5.12.x or 5.13.x DR/ SC auto failover
5.13.1 750/All 900 5.12.x or 5.13.x “
5.13.2 750/All 900/1050 5.12.x or 5.13.x DR/1050/failover
5.13.3 750/All 900/1050 5.12.x or 5.13.x “
750/All 900/1050 5.12.x or 5.13.x “
5.13.5 750/All 900/1050 5.12.x or 5.13.x “ /L2 timing
112883-xx 5.14.0 750/All 900/1050 5.12.x, 5.13.x or 5.14.0 DR/Failover/COD
5.14.4 750/All 900/1050/1200 5.12.x, 5.13.x or 5.14.x “ /L2 timing

112884-xx 5.15.0 750/All 900/1050/1200 ASR


114523-01 5.16.0 750/All 900/1050/1200 SSH

Freshchoice (scsi2/ethernet) adapter firmware has a problem booting from CDROM. Bug 4397457
workaround: patch the get-mail word of the ISP fcode to give a longer timeout period:

(set nvram parameter fcode-debug? to true.)

ok cd /ssm@0,0/pci@b,2000/pci@2/SUNW,isptwo@4
ok patch 100 64 get-mail
ok

Page 66
Mounting and unmounting CD without vold:

to stop vold : (automount daemon for cdrom and floppy)


# /etc/init.d/volmgt stop
to mount cdrom:
# mount -F hsfs -o ro /dev/dsk/c0t6d0s0 /cdrom
to unmount cdrom:
# umount /cdrom
to start vold :
# /etc/init.d/volmgt start

Send a file using mailx Command line:

# mailx -r return_email_address -s subject_no_spaces sendto_email_address < filename

This will dump the file into the body of the e-mail. Use for text documents, post output etc...

Send a message using mailx Command line:

# mailx -r return_email_address -s subject_no_spaces sendto_email_address


Cc: (enter cc: e-mail address if any)
Type text of e-mail (control d) when finished
EOT
#

More T3 info:

Forgotten password:
reset the T3
press (return) within 3 seconds of reset (on the console session you have open)
type set passwd (this will display the current password)

T3 Logging: (you will need to modify the T3s host file and syslog.conf file by ftping them to a unix
server, edit them, send the files back to the T3 and reset the T3)
You should already have the T3 connected to the network and be able to telnet to the T3
type 'set' to make sure you have an ip, netmask, gateway, and hostname on the T3
:/: set logto *
modify T3s host file (add ip and hostname of loghost)
modify T3s syslog.conf (add line '*.info @ip_address_of_loghost')
modify loghost syslog.conf file (add line ' local7.info [tab] /var/adm/messages.t3') must use local7
touch /var/adm/messages.t3 on loghost
kill -HUP the syslogd PID, or stop and restart syslogd, on the loghost (see the loghost example below)
ftp modified host and syslog.conf to the T3
reset the T3 to have changes take effect
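
The loghost side of those steps, as a minimal sketch (the facility must be local7 and the two syslog.conf fields must be tab separated; the file name is the one created above):

loghost# grep local7 /etc/syslog.conf
local7.info	/var/adm/messages.t3          (selector and file name separated by a TAB)
loghost# touch /var/adm/messages.t3
loghost# kill -HUP `cat /etc/syslog.pid`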

Page 67
StarCat 15K:

General:

StarCat 15K:
Has 18 available slots for system board sets. In each of the 18 available slots, you can configure (1)
System bd slot 0 bd and (1) hsPCI, MaxCPU or SunFire Link bd slot 1 bd.

Supports up to 18 system bds (72 CPUs)


and a combination of, not to exceed 18 total slot1 bds:
up to 17 MaxCPU bds (34 CPUs),
up to 18 hsPCI boards (2 3.3v and 2 5v PCI adapters slots per board),
up to 18 SunFire link boards. (includes: 1 3.3v and 1 5v PCI adapter slots per board)
Domains: up to 18 Domains
Power requirements: (12) NEMA L6-30P (2 separate power grids)

System Board set: (up to 18)

System board set is made up of a system board (slot 0 bd) and a slot 1 type board. A slot 1
type board is usually an I/O (hsPCI) board, but can be a SunFire link or MaxCPU bd. The
slot0 and slot1 boards are physically mounted on a 'carrier plate and expander board'.
The expander bd/carrier plate is then inserted into one of the 18 available slots of the StarCat.

Control Board Set: (2) See Fin I0771-1 (keep the old ID bd if replacing the CP1500 bd on the SC)
Also see I0761-1 (upgrade CP1500 post & OBP )

Control Board set is made up of a 'System Controller Bd', 'System Controller Peripheral Bd'
and a 'CenterPlane Support Bd'. The system controller runs solaris and the SMS packages.
The System controller peripheral board has 2 SDS mirrored boot disks, DVD-rom and a 4mm DAT
that are used by the System Controller board.
The System Controller bd and SC peripheral bd are mounted on the CenterPlane Support bd. The
Centerplane support bd is then inserted into one of the 2 control bd slots on the StarCat.
The Control Bd set provides system clock, I2C monitoring bus, console bus to all domains,
serial port and 2 net ports to outside world, serial port internal to other SC and internal net connection
to each domain and other SC.

The SCs come with an O/S installed in a 'sys-unconfig' state. When you run smsconfig -m to
configure your SCs, it is easiest if the SCs are on the network and able to reach their default gateway.
IPMP contacts the gateway to determine if the physical interface is up.

Floating = community hostname and IP address. This address will follow the main SC
failover = virtual IP and hostname that will float between hme0 and eri1 on each SC
hme0, eri1= regular IP and hostnames for the interfaces

SC console port pinout: (plus null modem info for connection to 25pin/9pin serial ports)

o o o
td (2/3)<--- rd – 5 > o o o < 3 -- td ----> (3/2) rd
dtr (20/4)<--- cts -- 2 > o | o < 1 – dtr ----> (6,8/6,1) dsr, dcd
|
4 gnd (7/5)

Page 68
StarCat 15K: (cont...)
Example of IPMP configuration on Sun Fire 15K system controllers C (Community) Network:

System Controller Floating IP <===== .150


|
_____________|_____________
| |
_______ .151 .152 ======> IPMP Logical IP failover Address
/ | |
/ ______|_________ _______|________
IPMP | | | |
LEVEL | SC0 | | SC1 |
\ hme0 eri1 hme0 eri1
\_ .100 .101 .200 .201 ===> IPMP Test IP Address

Private internal net interfaces:

scman0: SC's internal ethernet interface to each domain ( I1 network )


scman1: SC's internal interface to other SC (I2 network)
dman0: Domain's internal ethernet interface to each SC and domain ( I1 network)

15k O/S install:

System controller: The SC's are fully functional servers with 2 SDS mirrored 18gb disks,
DVD-rom and a 4mm DAT. They will come already loaded from the factory with Solaris
and SMS. At this time, there is no way to create the ' idprom.image' files in the field (so make sure
they are backed up). The default login and password is sms-svc, sms-svc.

Domain install: If the domain has a D240 attached the install (after creating the domain:
setupplatform, deleteboard, addboard, setobpparams, setkeyswitch) can be done from
the D240s DVD-rom. If you do not have a DVD-rom attached to the domain you are loading
you will most likely have to boot net.

To boot from an install server:


(on install server)
- add the ethernet address and node_name in the /etc/ethers file
- add the node_name in the /etc/hosts file
# add_install_client node_name sun4u (solaris CD in the Tools directory, keep CD mounted)
- check the /tftpboot directory for created files (file name is the hex representation of the node's IP address)
- check the /etc/bootparams file for node_name

(on the domain)


- check out your network interfaces ok> watch-net-all
- check out network interface alias ok> devalias
- change if desired interface is not listed, nvunalias, show-nets, nvalias, nvstore
- boot net_alias -install

Blacklist: is populated/unpopulated by hand or with the 'enablecomponent'/'disablecomponent'


commands. The path is /etc/opt/SUNWSMS/config/platform/blacklist (or A-R/blacklist for a domain).
Use the 'hpost -? blacklist' command to list possible entries

.postrc: The path is /etc/opt/SUNWSMS/config/platform/.postrc (or A-R/.postrc for a domain).


Use the 'hpost -? .postrc' command to list possible entries

Page 69
StarCat 15K: (cont...)

Send BREAK to domain: (be careful, will stop solaris):


from the console connection: ~# (goes right to OK prompt, NOT domain shell like Serengeti)

Decoding CPU locations: 15k

/SUNW,UltraSPARC III @1c 2,0


| |
change to decimal CPU ID = 0-3 system bd, 8,9 (MaxCPU bd.)
divide by 2
result is EX slot
1C (hex) = 28 (decimal)   28/2 = 14
EX slot = 14

Decoding Memory locations: 15k


memory offset 4=bank0 6=bank1
|
/SUNW,memory-controller @12 2,400000
| |
change to decimal CPU ID = 0-3 system bd, 8,9 (MaxCPU bd.)
divide by 2
result is EX slot
12 (hex) = 18 (decimal)   18/2 = 9
EX slot = 9

Decoding I/O card locations: 15k

c= IOC0 d= IOC1
(slot 0 or 1) (slot 2 or 3)
| always 1 board type
| | |
/pci@17c,700000/pci@1/SUNW,isptwo@4/disk@0,0
| | |
change to decimal 6= I/O slot 0 or 2 device identifier
divide by 2 7= I/O slot 1 or 3
result is EX slot
17 (hex) = 23 (decimal)   23/2 = 11 r1
EX slot = 11

SMS (System Management Services)

Default login: sms-svc


Default password: sms-svc

SMS daemons:

dca - domain configuration agent. One for every POST. Talks to dcs on domain (only on active SC.)
dsmd - domain status monitoring daemon (only on active SC.)
dxs - domain X server. One for each domain. (only on active SC.)
efe - event front-end daemon. Part of SMC; acts as intermediary between the SMC agent and SMS (only on active SC)

Page 70
StarCat 15K: (cont...)
SMS daemons: (cont)

esmd - environmental status monitoring daemon (only on active SC)


fomd - failover monitoring daemon
frad - field replaceable unit access daemon
hwad - hardware access daemon
kmd - key management daemon (only on active SC)
mand - management network daemon
mld - messages logging daemon
osd - OpenBoot Server daemon (only on active SC.)
pcd - platform configuration database daemon (only on active SC)
ssd - SMS startup daemon
tmd - task management daemon (only on active SC)

SMS Files:

/export/home/sms-svc/.sms_env - SMS user environment


/export/home/sms-svc/.cshrc - SMS user .cshrc
/export/home/sms-svc/.login - SMS user .login

/etc/opt/SUNWSMS/.sms_groups - sms groups file

/etc/opt/SUNWSMS/config/dsmd_tuning.txt - Domain status and monitoring daemon tuning info


/etc/opt/SUNWSMS/config/esmd_tuning.txt - Environmental status and monitoring daemon tuning info
/etc/opt/SUNWSMS/config/fomd.cf - Failover monitoring daemon config file
/etc/opt/SUNWSMS/config/fomd_sys_datasync.cf - Failover monitoring daemon datasync file

/etc/opt/SUNWSMS/config/platform/.postrc - Platform specific .postrc file


/etc/opt/SUNWSMS/config/platform/blacklist - Platform specific blacklist file

/etc/opt/SUNWSMS/config/A/.postrc - Domain specific (A-R) .postrc file


/etc/opt/SUNWSMS/config/A/blacklist - Domains specific (A-R) blacklist

/etc/opt/SUNWSMS/startup/ssd_start - Start script for the ssd daemons


/etc/opt/SUNWSMS/startup/sms_env.sh -

/var/opt/SUNWSMS/.pcd/domain_info - Platform configuration database daemon domain info


/var/opt/SUNWSMS/.pcd/platform_info - Platform configuration database daemon platform info
/var/opt/SUNWSMS/.pcd/sysboard_info - Platform configuration database daemon sysboard info

/var/opt/SUNWSMS/adm/.logger - Message logging daemon specifics


/var/opt/SUNWSMS/data/osdTimeDeltas - OpenBoot Prom server daemon info file

/var/opt/SUNWSMS/data/A/nvramdata - Domains specific (A-R) nvram information


/var/opt/SUNWSMS/data/A/idprom.image - Domains specific (A-R) idprom information
/var/opt/SUNWSMS/data/A/bootparamdata - Domains specific (A-R) boot parameters

SMS commands: (/opt/SUNWSMS/bin)

addboard - assigns, attaches and configures a board to the domain (domain_id|domain_tag.)


addtag - adds the specified domain tag name (new_tag) to a domain
cancelcmdsync - The command synchronization commands work together to control the recovery of
user-defined scripts interrupted by a system controller (SC) failover
Page 71
SMS commands :(cont)

console - creates a remote connection to the domain's virtual console driver, making the window in which
the command is executed a "console window" for the specified domain
deleteboard - removes a board from the domain it is currently assigned to
deletetag - remove the domain tag name associated with the domain
disablecomponent - adds a component to the domain or platform blacklist
enablecomponent - removes a component from the platform, domain or ASR blacklist
flashupdate - updates the Flash PROM in the system controller (SC), and the Flash PROMs in
a domain's CPU and MaxCPU boards, given the board location.(/opt/SUNWSMS/firmware)
ex: flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash SB1 (leave Name blank to do all SBs)
fruupdate (command in 'help' listing, but no description or man page)
help - displays a list of valid SMS commands along with their correct syntax
initcmdsync - The command synchronization commands work together to control the recovery of user-defined
scripts interrupted by a system controller (SC) failover
marginclock [-f (65|75|83.333) | -s synth-freq | -m [+/-] margin-percent][-y]
marginvoltage [-p1.5] [-p2.5] [-p3.3] [-p5.0] [-pcore] [-m(0|+|-)] [-d domain_id|domain_tag]
[-d domain_id|domain_tag...] [-b location] [-b location...] [-y]
moveboard - first attempts to unassign location from the domain it is currently assigned to and possibly active
in, then proceeds to assign, connect, and configure location to the domain
poweroff - powers off the specified dual 48V power supply, fan tray, or board
poweron - powers on the specified dual 48V power supply, fan tray, or board
reset - allows you to reset one or more domains in one of two ways: reset the hardware to a clean state
or send an externally initiated reset (XIR) signal
resetsc - resets the other SC
runcmdsync - command prepares the specified script for automatic synchronization (recovery) after a failover.
Savecmdsync - The command synchronization commands work together to control the recovery of user-defined
scripts interrupted by a system controller (SC) failover
setbus - perform dynamic bus reconfiguration on active expanders in a domain
setchs - SMS1.4 set component health status. SMS can auto fail components. Setchs lets you change the status
setcsn - SMS1.4 set chassis serial number. Allows you to set the csn once. (showplatform) # setcsn -c serial#
setdatasync - schedule filename enables you to specify a user-created file to be added to or removed from the
data propagation list.
setdate - allows the SC platform administrator to set the SC or optionally a domain date and time values.
Allows domain administrators to set the date and time values for their domains.
setdefaults - removes all SMS instances of a previously active domain. A domain instance includes all
pcd entries except network information; all message, console, and syslog log files; and, optionally,
all NVRAM and boot parameters. pcd entries and NVRAM and boot parameters are returned to
system default settings
setfailover - provides the ability to modify the state of failover for the SC failover mechanisms
setkeyswitch - changes the position of the virtual keyswitch to the specified value
setobpparams - allows a domain administrator to set the virtual NVRAM and REBOOT variables passed to
OpenBoot PROM by setkeyswitch
setupplatform - sets up the available component list for domains.
showboards - displays board assignments
showbus - display the bus configuration of expanders in active domains
showchs - SMS1.4 displays component health status. EX: showchs -r sb15
showcmdsync - displays the command synchronization list to be used by the spare system controller (SC) to
determine which commands or scripts need to be restarted after an SC failover.
showcomponent - displays whether the specified component is listed in the platform, domain, or ASR blacklist file.
showdatasync - provides the current status of files propagated (copied) from the main SC to its spare
showdate - display the date and time for the system controller (SC) or a domain
showdevices - displays the configured physical devices on system boards and the resources made available by
these devices.
Page 72
StarCat 15K: (cont...)

showenvironment - displays the environmental data


showfailover - provides the ability to monitor the state of the SC failover mechanism.
showkeyswitch - displays the position of the virtual keyswitch of the specified domain
showlogs - displays platform or domain log files. The default is the platform message log.
showobpparams - allows a domain administrator to display the virtual NVRAM and REBOOT parameters
passed to OpenBoot PROM by setkeyswitch
showplatform - Show the available component list and domain state for domains.
showxirstate - displays CPU dump information after sending a reset pulse to the processors
smsbackup - creates a cpio(1) archive of files that maintain the operational environment of SMS
smsconfig -m - configures and modifies the host name and IP address settings used by the MAN daemon,
mand (must have SCs on the network and able to contact the default router for IPMP to work.)
smsrestore - restores the operational environment of the SMS from a backup file created by smsbackup
smsversion - Displays the active version and exits when only one version
of SMS is installed.
sysid {-d domain_id|-f filename} [-m YYYYMMDDhhmm] [-M machineType (defaults to 0x82)]
[-e etherAddr] [-s serial#|-H host_id] sysid -F textIDPROMfile -f newBinaryfile
thermcal - Use this command if replacing a CSB bd.
testemail - SMS1.4 allows you to generate a test email to verify SMS logging and recipients
xir [-d domain_id|domain_tag [-d domain_id|domain_tag]...] [-q] [-y]

local-mac-address :
The "local-mac-address?" eeprom parameter is used enable the MAC addresses which are burnt-in on
network cards.
false - do not use the card's burnt-in adresses, use the nvram default address for all interfaces
(shown on obp banner)
true - use the on-board MAC address (if there is any). This setting is necessary to get a
unique MAC address per interface.

The default setting of the local-mac-address? is set to "false". On non clustered servers the installation
engineer must not forget to set local-mac-address? to true to avoid having one MAC address several
times in the network, which causes network problems.
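
To check and change the setting from a running system (standard eeprom command; it can also be set at the ok prompt):

# eeprom local-mac-address?
local-mac-address?=false
# eeprom local-mac-address?=true
ok setenv local-mac-address? true        (equivalent OBP command)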

SDS - How to mirror the root disk


Use this procedure to mirror the system disk partitions using Solstice DiskSuite:

- first format the second disk exactly like the original root disk: (typically s7 is reserved for metadatabase)

# prtvtoc /dev/rdsk/c0t0d0s2 > /tmp/firstdisk


# fmthard -s /tmp/firstdisk /dev/rdsk/c1t0d0s2

- create at least 3 state database replicas on unused (10mb) slices.

# metadb -a -f -c 3 c0t0d0s7 c1t0d0s7 (-a and -f options create the initial state database replicas. -c 3
puts three state database replicas on each specified slice)

- for each slice, you must create 3 new metadevices: one for the existing slice, one for the slice on the
mirrored disk, and one for the mirror. To do this, make the appropriate entries in the md.tab file.

slice 0, create the following entries in (/etc/lvm/md.tab)


d10 1 1 /dev/dsk/c0t0d0s0
d20 1 1 /dev/dsk/c1t0d0s0
d0 -m d10
Page 73
SDS - How to mirror the root disk (cont...)

slice 1, create the following entries in (/etc/lvm/md.tab)


d11 1 1 /dev/dsk/c0t0d0s1
d21 1 1 /dev/dsk/c1t0d0s1
d1 -m d11

Follow this example, creating groups of 3 entries for each data slice on the root disk.

- run the metainit command to create all the metadevices you have just defined in the md.tab file.
If you use the -a option, all the metadevices defined in the md.tab will be created.

# metainit -a -f (-f is required because the slices on the root disk are currently mounted)

- make a backup copy of the vfstab file: # cp /etc/vfstab /etc/vfstab.pre_sds

- run the metaroot command for the metadevice you designated for the root mirror. In the example
above, we created d0 to be the mirror device for the root partition, so we would run:

# metaroot d0

- edit the /etc/vfstab file to change each slice to the appropriate metadevice. 'metaroot' command has already
done this for you for the root slice.

/dev/dsk/c0t0d0s1 - - swap - no -
to
/dev/md/dsk/d1 - - swap - no -

Make sure that you change the slice to the main mirror, d1 not to the simple submirror, d11.

- reboot the system. Do not proceed without rebooting your system, or data corruption will occur.

- After the system has rebooted, you can verify that root and other slices are under DiskSuite's control:

# df -k
# swap -l

The outputs of these commands should reflect the metadevice names, not the slice names.

- Last, attach the second submirror to the metamirror device.

# metattach d0 d20 (must be done for each partition on the disk, and will start the syncing of data)

- to follow the progress of this syncing for this mirror, enter the command

# metastat d0

Although you can run all the metattach commands one right after another, it is a good idea to run the next
metattach command only after the first syncing has completed. Once you have attached all the submirrors
to the metamirrors, and all the syncing has completed, your root disk is mirrored.
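
For example, after d0 has finished syncing, the swap mirror defined above is attached and watched the same way:

# metattach d1 d21
# metastat d1        (wait for the resync to complete before attaching the next submirror)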

Page 74
IPMP: (Solaris 8 Update 2 10/01)

General Description:

IPMP allows you to create a logical IP address that can be swapped on-the-fly to another
physical network interface.

IPMP Test IP Address: physical interfaces (hme0,qfex,ge). This address is used by IPMP to determine
the status of the physical interface. It is not for use by applications.

IPMP Logical IP Address: IP address is used by applications for data transfers to and from
the server. This IP address will failover between the configured interfaces.

_______ .151 ======> IPMP Logical IP failover Address


/ |
/ ______|_________
IPMP | |
LEVEL | |
\ hme0 qfe0
\_ .100 .101 ===> IPMP Test IP Address

Setup ipv4 IPMP: (IPMP group w/ 1 standby interface) see IP Multipathing Admin Guide

ok> setenv local-mac-address? true


# ifconfig hme0 plumb 172.20.66.100 netmask + broadcast +
# ifconfig qfe0 plumb
# ifconfig hme0 group test-group
# ifconfig qfe0 group test-group
# ifconfig hme0 addif 172.20.66.151 netmask + broadcast + -failover deprecated up
# ifconfig qfe0 plumb 172.20.66.101 netmask + broadcast + deprecated -failover standby up
# ifconfig -a

/etc/hostname.hme0 :

172.20.66.100 netmask + broadcast + group test-group up \


addif 172.20.66.151 deprecated -failover netmask + broadcast + up

/etc/hostname.qfe0 :

172.20.66.101 netmask + broadcast + deprecated group test-group -failover standby up
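
To exercise a failover by hand once the group is up (assuming the if_mpadm utility that ships with IPMP):

# if_mpadm -d hme0        (offline hme0; its addresses fail over to the standby)
# if_mpadm -r hme0        (reattach hme0; addresses fail back)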

Page 75
T3B or T3+ Firmware Rev 2.1 New Functions:

Volume slicing:
- Create max 16 slices within a T3, either WG or PP.
- Layered on top of volumes. If volume is unmounted all slices go away.
- Volume slices cannot be seen until the volume is initialized and mounted.
- Minimum size is 1GB, increments of 1GB, starts on GB boundaries.
- Maximum size is size of volume.
- Once enabled cannot be disabled.
EX: (simple example of slicing a volume on a t3+)
Enabled by new system variable enable_volslice.
sys enable_volslice (Note: if volslice is enabled, you must create a slice to see lun in format)
vol add vol_name data u#d#-# raid # standby* u#d9
vol init vol_name data rate(1-16) optional
volslice create slice_name -z size vol_name
volslice list
lun perm list (should be rw, else `lun default all_lun rw')
vol mount vol_name

Lun mapping and masking:


- Enabled with volume slicing.
- Each slice must be mapped to a lun.
- Slices can be renumbered to unused lun.
- Luns range from number 0 to 15.
- Lun masking controls access to lun
- Lun permissions can be none, ro (read only), rw (read write).
- Lun permissions set for all or by WWN of hba.
- Default lun permissions is rw when slice is created from existing volume.
- Lun permissions are none when slices are made of a volume created after volume slicing is enabled.

New Mapping / Masking command: lun


Mapping: lun map add lun <lun#> slice <slice#>
lun map rm lun <lun#> [slice <slice#>]
lun map rm all
lun map list [lun <lun#> | slice <slice#>]
Masking: lun perm
lun perm list
lun default
lun wwn list
lun wwn rm all
lun wwn rm wwn <wwn#>

WWN Groups: Allows groups of wwns to share security features, saves lazy typists.

New command: hwwn


hwwn add <grp_name> wwn <wwn#>, rm <grp_name> wwn <wwn#>
hwwn list <grp_name>
hwwn rmgrp <grp_name>
hwwn listgrp
Fabric Support: Enabled thru new sys variable fc_topology, three possible settings:
- auto: chooses between loop and fabric_p2p, depending on capability of attached device.
- loop: establishes an arbitrated loop connection thru a translated loop (TL) port
- fabric_p2p: establishes a fabric connection thru an F port
NTP can also run in the array to sync time with external server.
Page 76
Hitachi StorEdge 99X0 Arrays:

SE9910- Single cabinet, logic boards in front, disk drives in rear.


Max 16GB cache, 24 host ports, 48 disk drives.
up to 4096 logical devices can be configured and presented.

SE9960- One DKC logic cabinet, one to six DKU disk cabinets, arranged on right and left (R1-3, L1-3).
R1 is added first, add on alternate sides for best performance.
Max 32GB cache, 32 host ports, 512 disk drives.
up to 4096 logical devices can be configured and presented.

SE9970V- Single cabinet, logic boards in front, disk drives in rear.


Max 32GB cache, 48 host ports, 128 disk drives.
up to 8092 logical devices can be configured and presented.

SE9980V- One DKC logic cabinet, one to four DKU cabinets. Added same as 9960.
Max 64GB cache, 64 host ports, 1024 disk drives.
up to 8092 logical devices can be configured and presented.

All use the concept of "storage clusters" redundant combinations of cache boards, host adapter boards (CHA)
and disk adapters boards (DKA). All array transactions run through the cache.

Drives are set up in either RAID 5 or RAID 1 (1+0).

Basic building block is called the B4, which is 4 trays of disks (HDUs). In 9910 and 9970 B4 is all 4 HDUs of
disks, in 9960 and 9980 a B4 is 4 (of 8) HDUs in a cabinet (bottom 4 or top 4). HDUs will be numbered in N
shape. The same 4 drives in a B4 are a parity group, which is where the RAID level is set. A parity group will
always be 4 drives. In 9970 and 9980 parity groups can span 2 B4's.

B4's are numbered 1 through 12; 1 and 2 are in cabinet R1, 3 and 4 are in L1, 5 and 6 are in R2 etc. Disk drives
in each 9910 and 9960 HDU are numbered 0 through B (11), thus 12 drives. Disk drives in each 9970 and 9980
HDU are numbered 00 thru 0F and 10 thru 1F. Accessing drives 10 thru 1F requires an additional card in the
HDU.

Each parity group is set to an emulation mode, the system then divides that parity group into the appropriate
number of LDEV's based on the emulation mode sizing. LDEV's can be presented on the host ports as LUN's
as is or combined to create larger LUNs.

In 9910 and 9960 drive B (top last drive on left) in each HDU in the L1 and R1 DKU's is used as a universal
spare, the bottom B4 drive B will always be a spare if installed, the top B4 drive B may be designated as spares
or may be a normal parity group. In a 9910 any drives installed in slot B will be spares. In 9970 and 9980, drive
0F will be the spare (top left drive next to center cards). Same rules apply for slot 0F as B in 9910 and 9960.

In 9970 the HDU can be "split" using special cards to create two B4's.

Service Processor (SVP):

Windows PC mounted in array. 9970 and 9980 have optional second SVP mounted in cold standby.
Two modes of operation, View and Modify, View will come on when the Remote Console is connected.
Disconnect Remote Console or reboot SVP to go back to Modify mode.

Page 77
Hitachi StorEdge 99X0 Arrays: (cont...)

Switches on the SVP Main Panel:


Information- allows review of messages (SIMs)
Maintenance- Select a component for replacement
Diagnosis-
FD Copy- Create a configuration floppy disk
Install- Initial Setup, microcode upgrades, etc.

Default remote console login: USER USER

Passwords:
raid-initialsetup
raid-install
raid-online
horc-forcibly

SVP FUNCTION tabs:


LDEV: format initialize drives/parity groups
HORC or Open TruCopy: Copy between subsystems
LUN Manager: map LUNs to ports
DCR: Dynamic Cache Residency aka Flash Access LUN is mapped into cache
Shadow Image: copy data within subsystem
CVS/Virtual LUN: (small volumes) smaller than emulation mode size... use wasted space,
make small volumes for DCR
On Demand/Just In Time: add additional space
LDEV Security SANtinel: LUN Masking

MAINTENANCE: lots of jumpers on boards, must be carefully checked. All changes must be made thru
modify mode on the svp, carefully following the procedures. Repair procedures have
a pre change section, a change section and a post change section, follow all steps.
USE THE MANUALS (on CD comes with the firmware) !!

SunFire forgotten password: (SRDB 26846) This procedure works with firmware version 5.11.3 and higher.

If the platform administrator's password is lost, the following procedure can be used to
clear the password.

1. Reboot the System Controller (SC). You won't be able to do this by logging into the platform shell.
You'll need to hit the reset button on the SC to do this.
2. The normal sequence of a System Controller rebooting is for SCPOST to run, then ScApp. You'll need
to wait for ScApp to start loading, then hit Control-A to spawn a vxWorks shell. SCPOST is done running
when you see the message 'POST Complete'. At this point, ScApp will begin to load. When you see
the copyright message 'Copyright 2001 Sun Microsystems, Inc. All rights reserved.', Hit CONTROL-A.
You should see the following:

Task not found


spawning new shell.
->

Page 78
SunFire forgotten password: (cont...)

This last line is the vxWorks prompt. Keep in mind that ScApp will still continue to load all the way
to the point of giving you the menu to enter the platform/domain shells. To make it less confusing,
wait for the ScApp menu to display on your screen, then hit return. You should see the
vxWorks prompt -> again.

3. Make a note of the current boot flags settings. This will be used to restore the boot flags to the original value.

-> getBootFlags()

value = 48 = 0x30 = '0' (Save the 0x number for # 8 below.)

4. Change the boot flags to disable autoboot.

-> setBootFlags (0x10)

5. Reboot the System Controller (CONTROL-X or reboot ). Once reset, it will stop at the -> prompt.

6. If you are running firmware 5.17.x or above, enter the following commands, otherwise, go to step 7:
-> ld 1,0,"/sc/flash/vxAddOn.o"
If you are running firmware 5.17.x or 5.18.x, enter the following command at the prompt
-> uncompressJVM("/sc/flash/JVM.zip", "/sc/flash/JVM");
If you are running firmware 5.19.x or later, enter the following command at the prompt
-> uncompressFile("/sc/flash/JVM.zip", "/sc/flash/JVM");

7. Enter the following commands at the -> prompt.


-> kernelTimeSlice 5
-> javaConfig
-> javaClassPathSet "/sc/flash/lib/scapp.jar:/sc/flash/lib/jdmkrt.jar"
-> javaLoadLibraryPathSet "/sc/flash"
-> java "-Djava.compiler=NONE -Dline.separator=\r\n sun.serengeti.cli.Password"

Wait for the following System Controller messages to display. Your prompt will come back right away,
but it'll take about 10 seconds for these messages to show up:

Clearing SC Platform password...

Done. Reboot System Controller.

8. After the above messages are displayed, restore the bootflags to the original value using the
setBootFlags() command.

-> setBootFlags (0x30) (Use the value returned from #3 above. )

9. Reboot the System Controller using CONTROL-X or the reboot command. Once rebooted,
the platform administrator's password will be cleared.

Default Storage switch passwords: (telnet to the switches in the san.)

Sun 1GB switch: user: root passwd: ma31_glw


Sun 2GB switch: user: admin passwd: password
Brocade Switch: user: admin passwd: silkworm

Page 79
StorEdge Network FC Switch:

The StorEdge Network FC Switches are replacing the fibre hubs. When you receive them they
are configured similar to a hub (all ports in one zone). The switch will initially get its IP address
by RARPing (though it has a default IP of 10.0.0.1). You cannot telnet to the switch; you must use
the GUI to configure it (this may change with future firmware).

Remember: each array in a zone must have a unique tag address or box id...

Setup: (on server)


- load San Foundation Kit (SUNWsan packages) http://storage.east/san
– load and patch SanSurfer GUI (pkgadd -d SUNWsmgr) EIS CD /sun/patch/SAN/8/
– add ethernet address and switch_name to /etc/ethers
– add IP address and switch_name to /etc/hosts
– check in.rarpd is up: ps -eaf | grep in.rarpd (start if not up /usr/sbin/in.rarpd -a &)
– turn on FC switch
– ping IP address of switch
– bring up GUI SanSurfer ( java -jar /usr/opt/SUNWsmgr/bin/Sun.jar) or
( /usr/opt/SUNWsmgr/bin/esm_smgr)
– login (default login: su, password: su) (can't login? add patch 110696)
– Click on IP Address and enter switch IP
– Configure the switch as needed. (rate field >20 scan rate for app to get stats)

To set up zoning: (from Fabric Screen)


- click on IP address of switch / zoom / zoning / add zone / click on port / apply

To edit network config: (from Fabric Screen)


- double click on `Fabric Name' of switch

To view zone config: (from Fabric Screen)


- click on IP address of switch / zoom / zoning / zone index 1,2,3 etc...

To clear all zones: (from Fabric Screen)


- click on IP address of switch / zoom / zoning / clear all zones

Useful SAN commands:

luxadm fcode -p (lists SUN/QLOGIC HBAs and firmware on each).


luxadm -e port (Here you would be looking for a connected status for your device in question.)
luxadm -e dump_map /devices/pci@1f,4000/SUNW,qlc@4/fp@0,0:devctl (path is from above command)
luxadm probe
luxadm display <path> (path from above or WWN)
ls -l /dev/cfg (This will show you paths to controller mapping.)
cfgadm -al (View what fabric devices are seen and configured and their condition)
cfgadm -c configure c# (to configure a device ex: cfgadm -c configure c5::50020f2300000cab)
cfgadm -o show_FCP_dev -al (list luns under each device. very handy when troubleshooting lun issues).
ls -l /dev/fc (give you fp to path mappings)
prtconf -vp|grep -i wwn (will give you the wwn of all configured HBAs on the system; this is a snapshot of
what the prom saw at boot).

Page 80
Hitachi Lightning 9900V notes:
also see: http://storage.east/hitachi

DKC - Disk (subsystem) Control Unit


DKU - Disk only frame: up to 4 DKUs: Left 1 (L1), Right 1 (R1), Left 2 (L2), Right 2 (R2) (9980V)
SVP - Supervisor Console: 1/DKC standard, optional: 2nd SVP/DKC (NOTHING EXTRA loaded on SVP!!)
ACP / DKA - Array Control Processor / Disk Adapter: same thing; connects to FSWs
CHA- Channel Adapter: contains fiber ports to connect to server
SM - Shared Memory: located on Cache bds, contains subsystem metadata
MDL - Maintenance Documentation Library
PDL - Product Documentation Library ( includes User Guide -theory)
SIM - System Information Message (message led blinking means it cannot talk to SVP) reference numbers
can be looked up in SIMRC.PDF manual (on m/c CD) Action code points to a work ID (USE
MANUALS!!!)
SSID - SubSystem ID: assigned number associated with: mainframes, 'True Copy', 'Shadow Image'
HDD - Hard Disk Drive
HDU - Hard Disk Unit: up to 32 HDDs in a HDU: slot 0f is spare
B4 - Group of 4 HDUs (N shaped numbering, 0,2 on bottom: 1, 3 on top) 9970 has (1) B4 unless it has
FSW 'c' cards then 2. 9980 B4 numbering: (R1) 1,2 (L1) 3,4 (R2) 5,6 (L2) 7,8 (lowest # on bottom)
FSW - Fiber Channel Interface Switch: PCB in HDU. Connects to DKA. (3) types A, B, C (switches)
SC - Single Cabinet (9970)
MC - Multi Cabinet (9980)

Cluster - set of boards in a subsystem. 2 clusters: CL1, CL2. Mirror config across clusters
Emulations - Lun Specifications (what type of disk drive do you want the lun to appear to be?)

Cannot Hot SWAP: Backplane, FSW 'B' boards

Available Raid Types: Raid 5, Raid 10

CU - Control Unit - an addressable list of Ldevs in shared memory. Rule of thumb: use the same
type of Ldev in a CU. If using another type of Ldev in the system, put them in another CU.
Max 32 CUs, 256 Ldev/CU

LUSE - Lun Size Expansion: Make a large Lun from Ldevs (concatenate)
CVS/VLL - Make smaller Luns from free space 35gb and lower (must be smaller than emulation size selected)

Parity Group (aka: Array Group): 4 disks only. Select physical disks, select emulation (this will give
you a number of Ldevs depending on emulation), assign Ldevs to a CU

Lun Mapping: Map a Ldev to ports on the CHAs. Done thru Storage Navigator.
Host mode 0 is standard, host mode 9 for Solaris, host mode C for Windows

Host Groups: When Lun Security is on, up to 128 host groups/port. Can config host mode and have a lun0 per
group. Need to know WWN of HBA
High Speed Mode: All the processors on a CHA will be working 1 port: 1 port, 4 procs (other 3 ports
disabled)
Standard Speed Mode: 1 processor per port on a 4-port CHA, 1 proc/2 ports on an 8-port CHA

Offline SVP: Software (m/c CD) to load on your PC. Use to configure without SVP. Requires config floppy

DCI - Define Configure Install: a DCI operation destroys customer data; use for new installs only.
Use 'Change Configuration' on existing subsystems. (Shift ctl i raid-initialsetup)

Page 81
Hitachi Lightning 9900V notes: Cont.

How to figure needed disk capacity: (but don't forget spares)

Customer wants (10) 500gb luns. How many HDDs do you need?
1. (1) 500gb lun = (14) 36gb open-L Ldevs      (500/36 = 13 r32, round up to 14)
2. (10) luns = 140 Ldevs                       (14 x 10 = 140)
3. 24 parity groups                            (6 Ldevs/parity group: 140/6 = 23 r2, round up to 24)
4. 96 HDDs required                            (24 parity groups x 4 disks/group = 96 disks)
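
The same math can be scripted for other requests. A minimal Python sketch, assuming the example's open-L
emulation (~36gb/Ldev) and 6 Ldevs per parity group (adjust for the emulation actually in use):

import math

lun_size_gb  = 500           # example numbers from above, not a sizing tool
lun_count    = 10
ldev_size_gb = 36            # open-L emulation assumption
ldevs_per_pg = 6             # Ldevs per parity group assumption
disks_per_pg = 4             # a parity group is always 4 disks

ldevs_per_lun = math.ceil(lun_size_gb / ldev_size_gb)    # 14
total_ldevs   = ldevs_per_lun * lun_count                 # 140
parity_groups = math.ceil(total_ldevs / ldevs_per_pg)     # 24
hdds_needed   = parity_groups * disks_per_pg              # 96 (plus spares)
print(ldevs_per_lun, total_ldevs, parity_groups, hdds_needed)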

Spare Disk Drives:


Are available to any array group
Spares install in slot 0f of each HDU
Mandatory: B4-1, B4-3
Optional: B4-2, B4-4

Adding Frames: Watch HDU Jumper locations when adding frames.

Microcode CD:
- Read ECN (engineering Change Notice) comes with m/c CD
- Includes Manuals (use them)
- Includes Offline SVP software

M/C Upgrade Sequence:


- SVP
- Everything but DKU
- DKU

If the message led is on, check subsystem status: (if it is blinking, there is a communication problem with the SVP)
- Maintenance button on SVP

Special Key strokes:


shift-ctl i 'raid-initialsetup' used for DCI
alt-shift > update config diskette
shift-ctl m 'mode' puts you in mode mode for m/c upgrades
raid-install used in disk replacement

Storage Navigator - Allows you to do Lun mapping, LUSE, CVS, DCR, True Copy, Shadow Image
from a client thru the lan to the SVP. Make sure the SVP is not in 'modify' mode so
you can get write access. Default login: root pwd: root
http://ipaddress-main-SVP//cgi-bin/utility/sjc0000.cgi

DCR/Flashaccess - Dynamic Cache Residency: Will keep a Ldev resident in cache, save on transfer time.
If purchased set it up on install, will save downtime later

Page 82
Hitachi Lightning 9900V notes: Cont.

HDLM - Hitachi Dynamic Link Manager: Loaded on the server, similar to DMP.
/opt/dynamiclinkmanager/log /bin
Defaults:
                          Sun      Windows    Setting
  Path Health Check       off      off        15 - 1444 min
  auto failback           none     off

HDLM commands:
# dlnkmgr view (-path), (-sys),
offline (-path)
online (-path)
set -ellv log-level, -elfs log-size, -systflv trace-level, -pchk, -s
clear
help

True Copy: Remote copy to another disk subsystem (9900 to 9900). Mainly used for disaster recovery.
You configure it on each subsystem using Storage Navigator. One will be the Master (MCU)
and the other Remote (RCU).

2 transfer methods:

SYNC: Data that is transferred to the MCU is in turn sent to the RCU thru a dedicated port.
When the data is acknowledged at the RCU, the MCU sends an acknowledgement back to the HBA.

ASYNC: Data sent to the MCU is acknowledged to the HBA before the MCU receives
acknowledgement from the RCU.

The dedicated port has to be configured as 'initiator' on the MCU and 'RCU target' on the RCU.
This port is a point to point connection between the disk subsystems.
The PVOL is the primary volume (Ldev) the data is sent to it from the server.
The SVOL is the secondary volume (Ldev) on the RCU that True Copy copies to.

True Copy Volume States:

SMPL - simplex volume prior to any pair operation or result of 'pairsplit -s' command
COPY - (initial copy in progress) a result of a 'paircreate' command
PAIR - initial copy complete and doing updates as data changes on pvol
PSUS - pair operations suspended as a result of a 'pairsplit' command
PSUE - pair operations suspended as a result of a failure

To setup True Copy:


– Decide on PVOL and SVOL
– SSID (need to know, get from customer)
– Serial number of each disk subsystem
– setup path between the subsystems (ports, cables, etc...)
– Define MCU to RCU path
– ASYNC only: define Consistency Groups (order in which you want data sent to the svol)
– True Copy (create pairs)

Page 83
Hitachi Lightning 9900V notes: Cont.

Shadow Image: A local copy within a disk subsystem. Configured using Storage Navigator.

The PVOL is the primary volume (Ldev) the data is sent to from the server. The SVOL is the secondary
volume (Ldev) that Shadow Image copies to. You can have a max of 9 copies (svols); this includes
(3) level 1 SVOLs and (6) level 2 SVOLs (cascade)

            Level1            Level2
                            ______ S
               ____ S -----<
              |             ______ S
              |             ______ S
      P ------+---- S -----<
              |             ______ S
              |             ______ S
               ____ S -----<
                            ______ S

Shadow Image Commands:

paircreate: starts an initial copy and results in a PVOL SVOL pair


pairsplit: splits the pair. quick or steady options. Level1 must be split before level2;
a split will synchronize data btwn the PVOL and SVOL before the split.
pairresync: Will resynchronize a suspended pair.

Quick Functions:
quicksplit : makes it possible to read and write SVOLs immediately after split
quickresync: reduces the resync time considerably
quickrestore: reduces restore time considerably

Minnow StorEdge 3300 Series array: (also see page 110 for disk replacement)
OEM'd from Dot Hill. Small, cheap array. SCSI hardware RAID and JBOD. Fiber array soon.
Raid levels 0, 1, 3, 5, 1+0, and 0+1 supported.
Up to 12 drives per box, 2 redundant RAID controllers.

Model 3310 Ultra 160 LVD SCSI (will work Single Ended as well).
Use new LVD card and SUNWqus driver.

Logical Disk or Group- the raid setup from the disks.


Logical Volume- a raid of logical disks (how they do 1+0 and 0+1).
Partitioning- may map a chunk of LD or LV.
Local spares assigned to particular LD or LV
Global spares assigned within array.

Luns are created and owned by one controller, other is failover for it. Controllers can be active/active or
active/passive. All interface to array is done thru the master controller.

Parts are raid controllers (2), event monitor units (emus) (2), power supplies (2), terminator board (1), io
board (1), disks (12). All hot swappable. Replacing the terminator or io board will interrupt io.

Page 84
Minnow StorEdge 3300 Series array: cont...

Cabling can be complex, refer to manual. 4 channels within box, two are for host, 2 for drives
Single bus- all drives same channel.
Dual bus- split drives between two channels (split drives 1-6 & 7-12, channels 0 &2).

Any box combination, maximum of 16 drives per any one channel.

IO Board
Channels 0 and 2 are drive channels
Channel 1 and 3 are host channel ports
SB and DB ports are “jumper” ports: Single bus jumper cable from channel 0 to SB port.
Dual bus jumper cable from channel 2 to DB port.

Expansion unit (JBOD) has no controllers, has 4 port IO Board (A, Aterm, B, Bterm).
Aterm and Bterm are self terminating ports, need to be at end of chain.
Single bus in expander jumper cable from B to Aterm.
Dual bus in expander no jumper cable installed.
If adding an expander to a “controller” box run the cable to the “non term” ports.

Box Management thru serial port or GUI (GUI doesn't work well yet).

If using the network, connect both controllers to the same subnet; only the master controller has an ip address. IP
assigned by DHCP or static thru serial port connection.
Standard RS232 null modem (9 pin female) serial cable to either controller. Settings are 38400 baud, 8N1.

control-l refreshes screen (if just connected to running array hit control-l choose VT100 mode)
control-w switches between the controllers.
control-acbd reset to factory defaults, password “oemmaint”

Config tool is a text based menu, common to all arrays, main selections are: (use Return and ESC to navigate)

view & edit logical drives (create, expand, delete, raid configs, partition, set spares)
view & edit logical volumes (create, delete logical volumes)
view & edit host luns (assign lun id's and map host channels)
view & edit scsi drives (view drive status,flash drive leds, set global spares, clone drives)
view & edit scsi channels (status, properties, set controller target id)
view & edit config parameters (controller settings, set baud and ip address)
view & edit peripheral devices (set expansion box, secondary controller, array status (emu))
view system information (cache size, firmware revision, etc...)
system functions (reset, shutdown, fw upgrade)
event logs

Create LUNs: (in general, example does not use logical volumes so no “+” raid levels)
setup global spares- v/e scsi drives–select disk- add global spare- yes
setup logical drive- v/e logical drives–select LG–create logical drive-yes-raid-select disks-capacity-ESC
partition logical drive- v/e logical drives-select logical drive-partition-select partition(arrow)-size-yes
map luns to host- v/e host luns-select controller-select lun#-select logical drive-select partition-map(y)

Modify /kernel/drv/sd.conf: (must do for all lun #'s other than 0)


create the following 2 line entry for each lun: name="sd" class="scsi"
target=# lun=#; (change target and lun)
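
For example, hypothetical entries for luns 1 and 2 on target 0 (adjust target and lun numbers to your mapping)
would look like this; a reconfiguration boot is typically needed before the new luns are seen:

name="sd" class="scsi"
target=0 lun=1;
name="sd" class="scsi"
target=0 lun=2;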

Page 85
Tuning ecache scrubber scan rate:

See FIN I0755-1.


The following procedure can help on UltraSparc II servers that experience ecache failures. Best used on
servers where mirrored ecache is not an option.
The procedure increases the scan rate from 100 times a second to 1000 times a second.
It will increase the system utilization by about 1%.

To adjust ecache_scan_rate:
1. As root, run the following command to adjust ecache_scan_rate.

# echo 'ecache_scan_rate/W 0t1000' | adb -kw

NOTE: This does not require downtime. Be very careful, though, as mis-typing the command could
result in downtime.

2. To make the change permanent, add the parameter setting to /etc/system. It is best to insert all
3 parameters together into /etc/system if the settings are not already there:

set ecache_scrub_enable=1
set ecache_scan_rate=1000
set ecache_calls_a_sec=100

To check a system's current setting use the following command.


This does not modify the setting in any way:

# echo 'ecache_scan_rate/D' | adb -k

VxWorks (serengeti SC): Use when you cannot get into scapp or to recover a failed SC flashupdate

- Reset the SC using the reset button on the front of the SC.
- when “ Copyright 2001-2002 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms. “ appears hit CTRL A

->setBootFlags(0x0) CTRL X (will reboot and stop booting at the -> prompt)
->setBootFlags(0xd) (then “reboot” to change the boot settings back so the SC automatically boots ScApp)

- to configure a netmask: -> ifMaskSet("eri0", 0xffffff00) (example will set it to 255.255.255.0)

- to configure an IP address: -> ifAddrSet("eri0", "129.146.232.222")

- to enable the network interface: -> ifFlagSet("eri0", 0x8063)

- to configure a default router: -> routeAdd("0.0.0.0", "129.146.232.10")

- to Register the name/address of a server: -> hostAdd("myhost", "129.146.240.105")

- to test the network interface : -> ping "myhost",1 or ping “129.146.240.105”,1

- to Update the ScApp flashprom in Vxworks:


updateBootFlashURL("ftp://login:password@myhost/path_to/sgrtos.flash")
updateScAppFlashURL("ftp://login:password@myhost/path_to/sgsc.flash")

Page 86
LVD PCI Adapter: (ultra scsi-3 375-3057)

Code named jasper, it is a Low Voltage Differential card. Mainly supports the S1, D2 and Minnow
(SE 3310) arrays.
The LVD drivers are not on any Solaris CD yet (8 02/02 or 9). You will have to either make a temp boot disk
and patch it, or boot net from a patched image, to see the disks on a LVD adapter, until a bootable CD is
released that has driver support for the LVD.

do the following to see disks on a LVD adapter:(drivers and patches available on EIS cd sun/progs, sun/patch)

- add_install_server (create the solaris 8 image)


- download "QUS" drivers from www.sun.com(all four) SUNWqus, SUNWqusu, SUNWqusx and SUNWqusux.
- download patch 112697-02 from sunsolve
- pkgadd -R /boot_dir/Solaris_8/Tools/Boot -d . (add drivers to boot image)
- patchadd -R /boot_dir/Solaris_8/Tools/Boot 112697-02 (patch boot image)
- add_install_client (enable client to boot net from server)
- boot net (client)

Once loaded you can install Solaris on the LVD disks. But you have to select 'manual
boot' so you can then patch the install image before reboot as follows:

- cd /net/ipaddress_of_install_server/shared_dir_where_pkgs_located/
- pkgadd -R /a -d . (add all four pkgs 32 and 64 bit)
- patchadd -R /a 112697-02
- reboot

see doc 816-2156-11.pdf StorageEdge PCI Dual Ultra3 SCSI Host Adapter Install Guide.

CP1500 - (15k SC) replacement: (see fin I0761-1)

The Nordica bd is used both in the netra line and the SC of a 15k. When replacing the Nordica
bd (501-5473) in a 15k you have to upgrade the OBP so you will have all the SC functionality.
The info doc says you should do the procedure on rev -12 and below. We had to do the procedure on
a -13 board to get it to work(without it we could not see the 'man' network interfaces).
In general you have to: (see fin and download readme for specifics)

– download the CP1500 OBP image


– run the downloaded script (updates CP1500 OBP to 3.14.6)
– flash the SC (flashupdate)
– reset OBP parameters

You can find “The current Nordica OBP firmware image available for download” at :
http://pts-americas.west/esg/hsg/starcat/patches.html

Serengeti/15k Dynamic Reconfiguration: Min requires Solaris 8 (02/02 u7), SC 5.12.6
(also see 15k dr examples page 109)
(Solaris commands)
To get a list of component NAMES: # cfgadm -al
To remove a bd from a domain: # cfgadm -o unassign,nopoweroff -c disconnect NAME (ex: N0.SB1)
To add a bd into a domain: # cfgadm -v -c configure NAME (ex: N0.SB1)
To see if board has perm mem: # cfgadm -val | grep permanent

Page 87
To clean up non-root disk “controller” numbers: (see info docs 15019, 27756)

# mv /etc/path_to_inst /etc/path_to_inst.orig
# rm /etc/path_to_inst.old
# cd /dev/dsk
# rm c1* c2* c3* c4* (do not remove your boot device)
# cd /dev/rdsk
# rm c1* c2* c3* c4* (do not remove your boot device)
# rm -rf /dev/cfg/* (new on solaris 8)

If boot disk is under Sun StorEdge Volume Manager, search for "rootdev:" in /etc/system.
ex: rootdev: /pseudo/vxio@0:0 (Write down this device name exactly, you will use it on boot.)

# init 0
ok boot -ar (take the default through all prompts except: “Do you want to rebuild this file [n]?” y )
(and if you have the boot disk under StorEdge Volume Manager, when asked for)
( the physical root device, enter the device name you found above)

Set network parameters at boot:

ok> boot net:speed=100,duplex=full (no spaces)

Starcat Portid cheat sheet:


Decimal:
------------------------------------------------------------------
| Exp| cpu0| cpu1| cpu2| cpu3| max0| max1| pci0| pci1| axq0| axq1|
------------------------------------------------------------------
| 0| 0| 1| 2| 3| 8| 9 | 28 | 29 | 30 | 31 |
| 1| 32 | 33 | 34 | 35 | 40 | 41 | 60 | 61 | 62 | 63 |
| 2| 64 | 65 | 66 | 67 | 72 | 73 | 92 | 93 | 94 | 95 |
| 3| 96 | 97 | 98 | 99 | 104 | 105 | 124 | 125 | 126 | 127 |
| 4 | 128 | 129 | 130 | 131 | 136 | 137 | 156 | 157 | 158 | 159 |
| 5 | 160 | 161 | 162 | 163 | 168 | 169 | 188 | 189 | 190 | 191 |
| 6 | 192 | 193 | 194 | 195 | 200 | 201 | 220 | 221 | 222 | 223 |
| 7 | 224 | 225 | 226 | 227 | 232 | 233 | 252 | 253 | 254 | 255 |
| 8 | 256 | 257 | 258 | 259 | 264 | 265 | 284 | 285 | 286 | 287 |
| 9 | 288 | 289 | 290 | 291 | 296 | 297 | 316 | 317 | 318 | 319 |
| 10 | 320 | 321 | 322 | 323 | 328 | 329 | 348 | 349 | 350 | 351 |
| 11 | 352 | 353 | 354 | 355 | 360 | 361 | 380 | 381 | 382 | 383 |
| 12 | 384 | 385 | 386 | 387 | 392 | 393 | 412 | 413 | 414 | 415 |
| 13 | 416 | 417 | 418 | 419 | 424 | 425 | 444 | 445 | 446 | 447 |
| 14 | 448 | 449 | 450 | 451 | 456 | 457 | 476 | 477 | 478 | 479 |
| 15 | 480 | 481 | 482 | 483 | 488 | 489 | 508 | 509 | 510 | 511 |
| 16 | 512 | 513 | 514 | 515 | 520 | 521 | 540 | 541 | 542 | 543 |
| 17 | 544 | 545 | 546 | 547 | 552 | 553 | 572 | 573 | 574 | 575 |
------------------------------------------------------------------

In Hex:
------------------------------------------------------------------
| Exp| cpu0| cpu1| cpu2| cpu3| max0| max1| pci0| pci1| axq0| axq1|
------------------------------------------------------------------
| 0| 0| 1| 2| 3| 8| 9 | 1c | 1d | 1e | 1f |
|1 | 20 | 21 | 22 | 23 | 28 | 29 | 3c | 3d | 3e | 3f |
| 2| 40 | 41 | 42 | 43 | 48 | 49 | 5c | 5d | 5e | 5f |
| 3| 60 | 61 | 62 | 63 | 68 | 69 | 7c | 7d | 7e | 7f |
| 4| 80 | 81 | 82 | 83 | 88 | 89 | 9c | 9d | 9e | 9f |
| 5| a0 | a1 | a2 | a3 | a8 | a9 | bc | bd | be | bf |
| 6| c0 | c1 | c2 | c3 | c8 | c9 | dc | dd | de | df |
| 7| e0 | e1 | e2 | e3 | e8 | e9 | fc | fd | fe | ff |
| 8 | 100 | 101 | 102 | 103 | 108 | 109 | 11c | 11d | 11e | 11f |
| 9 | 120 | 121 | 122 | 123 | 128 | 129 | 13c | 13d | 13e | 13f |
| 10 | 140 | 141 | 142 | 143 | 148 | 149 | 15c | 15d | 15e | 15f |
| 11 | 160 | 161 | 162 | 163| 168 | 169 | 17c | 17d | 17e | 17f |
| 12 | 180 | 181 | 182 | 183| 188 | 189 | 19c | 19d | 19e | 19f |
| 13 | 1a0 | 1a1 | 1a2 | 1a3 | 1a8 | 1a9 | 1bc | 1bd | 1be | 1bf |
| 14 | 1c0 | 1c1 | 1c2 | 1c3 | 1c8 | 1c9 | 1dc | 1dd | 1de | 1df |
| 15 | 1e0 | 1e1 | 1e2 | 1e3 | 1e8 | 1e9 | 1fc | 1fd | 1fe | 1ff |
| 16 | 200 | 201 | 202 | 203 | 208 | 209 | 21c | 21d | 21e | 21f |
| 17 | 220 | 221 | 222 | 223 | 228 | 229 | 23c| 23d | 23e | 23f |
------------------------------------------------------------------
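
The tables follow a simple pattern: portid = 32 x expander number + a fixed offset per device. A small Python
sketch (not an official tool) that regenerates the values above:

offsets = {"cpu0": 0, "cpu1": 1, "cpu2": 2, "cpu3": 3,
           "max0": 8, "max1": 9, "pci0": 28, "pci1": 29,
           "axq0": 30, "axq1": 31}

for exp in range(18):                                  # expanders 0 - 17
    base = 32 * exp
    row = ["%s=%d/0x%x" % (name, base + off, base + off)
           for name, off in offsets.items()]
    print("Exp %2d: %s" % (exp, "  ".join(row)))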
Page 88
Starcat SC: clean the slate: (bring down domains)

Clean dump and post files in /var/opt/SUNWSMS/adm/A-R


Remove all boards from domains: ex: # deleteboard SB0 SB1 IO0 IO1 etc...
Stop SMS both SCs: /etc/init.d/sms stop
# mv /etc/opt/SUNWSMS/SMS1.3/config/MAN.cf /etc/opt/SUNWSMS/SMS1.3/config/MAN.old
# sys-unconfig
Without the MAN.cf file it is as though smsconfig -m has never been run.

Starcat redx info: check out : http://pts-americas.west/esg/hsg/starcat/tools/xcredx.html

#redx -l (will put you in local mode to look at dumps. redxl.csh for non SC analysis)
redxl>dumpf load dump-file-name (will load dump and give you a brief summary)
redxl>dumpf types (will list the domain board configuration)
redxl> wfail (will give you failure info “1E”= 1st error “1E+”= accumulated errors)
SB (slot 0) redx commands:
redxl> shproc 0 0 3 (show PROC. 0 0 3 = exb0 slot0 cpu 3 shproc connects to DCDS, SDC, AR, SBBC)
redxl> shdcds 0 0 1 (show DCDS. 0 0 1= exb 0 slot0 dcds 1 shdcds connects to PROC, DX)
redxl> shdx 0 0 3 (show DX. 0 0 3= exb 0 slot0 dx 3 shdx connects to SDI(exb) DCDS)
redxl> shar 0 0 (show AR. 0 0 = exb 0 slot0 shar connects to AQX(exb) SDI 0(exb) PROCs)
redxl> shbbc 0 0 1 (show SBBC. 0 0 1 = exb 0 slot0 sbbc 1 shbbc connects to SDC, PROCs)
redxl> shsdc 0 0 (show SDC. 0 0 = exb 0 slot0 shsdc connects to SBBC, PROCs)
I/O(slot1) redx commands:
redxl> shioc 0 1 1 (show IOC. 0 1 0= exb0 slot1 ioc 1 shioc connects to SDC, DXs, AR)
redxl> shar 0 1 (show AR. 0 1 = exb 0 slot1 shar connects to AQX(exb) SDI 0 (exb) IOCs)
redxl> shdx 0 1 1 (show DX. 0 0 1= exb 0 slot1 dx 1 shdx connects to SDI(exb) IOCs)
redxl> shsdc 0 1 (show SDC. 0 1 = exb 0 slot1 shsdc connects to SBBC, IOCs)
redxl> shbbc 0 1 (show SBBC. 0 0 1 = exb 0 slot1 shbbc connects to SDC, IOCs)
Expander (exb) redx commands:
redxl> shaxq 0 (show AXQ. 0 = exb 0 shaxq connects to AMXs(cp) ARs, SDCs, SDI 0)
redxl> shcbr axq 0 (show CBR AXQ. 0 = exb 0 )
redxl> shsdi 0 0 (show SDI. 0 0 = exb 0 sdi 0 shsdi connects to DARBs (cp) DMXs(cp) ARs
SDCs, SDIs(exb) AXQ(exb) (6 SDIs/exb)
redxl> shcbr exb 0 (show CBR EXB. 0 = exb 0)
CenterPlane (cp) redx commands:
redxl> shamx 0 1 (show AMX. 0 1 = cp 0 amx 1 shamx connects to AXQs (exbs)
redxl> shrmx 1 (show RMX. 1 = cp 1 shrmx connects to AXQs (exbs)
redxl> shdmx 0 (show DMX. 0 = cp 0 shdmx connects to SDIs (exbs) port 0-3, 1-2, 2-1, 3-0, 4-4, 5-5
redxl> shdarb 1 (show DARB. 1 = cp 1 shdarb connects to SDI 0 (exbs) shows domain configs)

Terms:
AR Address Repeater (1 per SB, IO, max CPU)
AMX Address MultipleXer (2 per centerplane bus C0, C1)
AXQ Address controller (1 per expander board)
DARB Data ARBiter (1 per centerplane bus C0,C1)
DCDS Dual CPU Data Switch (2 per SB, 1 per Max CPU. 1/DCDS for 2 PROCs)
DMX Data MultipleXer (6 per centerplane bus C0,C1 connects to SDI exbs)
DX Data Switch (4 per slot0, 2 per slot1 bd)
RMX Response MultipleXer (1 per centerplane bus C0,C1)
SBBC System Boot Bus Controller (2 per slot0, 1 per slot1 bd)
SDC System Data path Controller (1 per SB, IO, max CPU)
SDI System Data Interface (6 per EXB, 0 is master connects to DMXs)

Page 89
StorADE:

Has diagnostics included in it that are supposed to replace Storetools.


A lot of the new arrays and fiber channel backplanes are supported.

You can bring up the GUI by typing (in a browser window, any server):
http://hostname:7654 (default login: ras password: agent)
(I found the cli diags to be more useful than the GUI)

New cli storage diagnostics located in : /opt/SUNWstade/Diags/bin


listing below.

6120test -tests the functionality of disks in a 6120 array (minnow)


a5ktest - tests the functionality of disks in the Sun StorEdge A5000 and A5200 array
a5ksestest - tests Sun StorEdge A5000 and A5200 arrays
a3500fctest - verifies functionality of Sun StorEdge A3500FC disk tray
brocadetest - diagnose Brocade Silkworm Fibre Channel switches
d2disktest - tests the functionality of the Internal Sun StorEdge D2 Array disk
daksestest - tests Sun Fire V880 FC-AL disk backplanes
daktest - tests the Sun Fire V880 FC-AL disk
dex - Device Exerciser for Sun StorEdge arrays
discman - discovery manager
disk_inquiry - disk-only version of the inquiry program
disktest - No manual entry
enc_inquiry - No manual entry
fcdisktest - tests the functionality of internal fibre channel disk
fctapetest - tests the functionality of Fibre Channel tape drives
ifptest - tests functionality of the PCI FC-100 Fibre Channel-Arbitrated loops (FC-AL) card
lbf - A loop back frame diagnostic utility program that tests Fibre Channel-Arbitrated loops (FC-AL)
linktest - diagnose Sun StorEdge network passive Fibre Channel components
linktest2 - No manual entry .
ofdg - No manual entry
ondg - No manual entry
qlc_hba - displays stats on qlc hba
qlctest - tests the functions of the 1gb and 2 gb PCI and cPCI Fibre Channel Network Adapter boards.
socaltest - tests the SOC+ host adapter card
stresstest - Checks for possible SAN errors.
switchtest - diagnose Sun StorEdge Network Fibre Channel switch-8 and switch-16 switches
t3test - tests the functionality of the Sun StorEdge T3 and T3+ array LUNs
vediag - Runs virtualization engine diagnostics through SLICD
veluntest - tests the functionality of the virtualization engine by accessing the VLUNs.
volverify - No manual entry

Get fru info from a serengeti: (prtfru does not work on serengeti; explorer must be loaded)

#cd /opt/SUNWexplo/bin
# LD_LIBRARY_PATH=/opt/SUNWexplo/lib
# export LD_LIBRARY_PATH
# CLASSPATH=/opt/SUNWexplo/java/fruid-scappclient.jar:/opt/SUNWexplo/java/libfru.jar
# export CLASSPATH
# ./rprtfru.sparc -b sc_ip_address:password >/tmp/fruid(must use password. will put output in file /tmp/fruid)

Page 90
SWAP

What is the recommended (2003) swap size for servers with gigabytes of physical memory?
(http://docs.sun.com/db/doc/817-0798/6mgisnqfi?a=view)

System Type (physical memory)                        Swap Space Size    Dedicated Dump Device Size

Workstation (4 Gb of physical memory)                1 Gbyte            1 Gbyte
Mid-range server (8 Gb of physical memory)           2 Gbytes           2 Gbytes
High-end server (16 to 128 Gb of physical memory)    4 Gbytes           4 Gbytes

Performance considerations:
How much and how often?
# swap -s (command to monitor swap resources)
# swap -l (command to determine if your system needs more swap space)

How do you get an estimate of needed swap/app?


# pmap -r pid# (sol 8, 9) (shows heap used/process. Add up heap to get an idea)
# pmap -Sa pid# (sol 9) (will show all reservations by each process)

How to tell how much swapping? (if too much should consider adding more physical memory)
# vmstat 5 5 (look at sr column, also note po, page out column. non-zero numbers
- page scanner looking for pages to mark as free, po - we're sending stuff out.)
# iostat -npxc 5 5 (check for kw/s on the swap partition - non-zero and the page outs from
vmstat are really writes to swap partition(s).
(http://docs.sun.com/db/doc/816-4553/6maop1hik?a=view)

Dump considerations:
How much memory do you want dumped? all, kernel, kernel + active process

# dumpadm
Dump content: kernel pages
Dump device: /dev/dsk/c0t3d0s1 (swap)
Savecore directory: /var/crash/pluto ***(large enough to hold core)
Savecore enabled: yes

# dumpadm -c all -d /dev/dsk/c0t1d0s1 -m 10%


Dump content: all pages
Dump device: /dev/dsk/c0t1d0s1 (dedicated)
Savecore directory: /var/crash/pluto (minfree = 77071KB)
Savecore enabled: yes

savecore -L (live core dump, WATCH OUT, do not do a savecore -L to a dumpslot under volume
manager control)

DR considerations:
How much physical memory on most populated System board?
Nonpermanent Memory (currently 32gb physical mem max per bd). Before you can delete
a board, the environment must vacate the memory on that board. Vacating a board means
flushing its nonpermanent memory to swap space.

http://education.central/AliasArchive/Archives/ILT/ses_systemadmin-ext/msg08612.html
http://education.central/AliasArchive/Archives/ILT/ses_systemadmin-ext/msg05509.html

Page 91
from /net/cores.central/cores/dir5/
(REAL DATA: looked at explorer for ram size and examined the core to check size)

RAM Core size type Solaris


24gb 1.7gb k 8
20gb 984mb k 8
18gb 1gb k 8
16gb 900mb k 8
16gb 884mb k 8
16gb 2.4gb k 8
10gb 800mb k 8
8gb 1.2gb k 8
6gb 518mb k 2.6
4gb 300mb k 8
4gb 594mb k 8
4gb 305mb k 2.6
4gb 435mb k 7
4gb 2mb a 8
2gb 155mb k 2.6
2gb 243mb k 8
2gb 234mb k
2gb 374mb k 8
2gb 263mb k 7
1.5gb 220mb k 8
1gb 997mb k 7
1gb 138mb k 8

Maserati Notes- StorEdge 6320 and 6120:

Two models: 6120- standalone, desk side or rack, like T3 WG or PP. 6320- rack solution like the 3900 (Indy),
includes service processor, management net. Next generation T3, just don't call it the T4. Very much like the T3.
Drives in front, two power supplies in back on top, one controller, two loop cards. Components are similar to the
T3 but are physically enclosed differently, not swappable between T3 and T4. Units are 3U high. On back,
controller in middle, loop cards on each side. Loop cables are different (use RJ-45 type connector). All fiber
connections use the LC style connector.

Arrays are 2GB capable on the front end using the Qlogic 2300 chipset. Internally run at 1GB using the Qlogic
2200 chipset.

Model marketing designation is a 'YxZ' config: where Y=# of controllers, Z= # of trays.


Each controller can have 1 to 3 disk trays associated with it. One tray will have the controller in it, the other 2
will have no controller. Trays are joined via the loop cards. Min config is a 1x1 (1 controller, 1 box), max config
is 2x6 (2 controllers, 6 boxes). Controller redundancy is done thru a partner pair type config, just like the T3
except with the expansion trays factored in.
Up to 14 drives per tray, 7 is minimum supported number (though only 4 will work). Drive slot 14 is the hot
spare location. Like T3 don't have to have a spare, but if you do it must be slot 14. Drive sizes are 36GB, 73GB
and 146GB drives.

All commands are the same as a T3 with 2.1 and above firmware. Max luns per array is 64, max luns per volume
is 32. Each tray is still limited to two volumes, using contiguous disks. If you have min config (7 disks) and
build two volumes, you will need to remove/create a volume to add more disks.

Note- internally the brick terminology is the same as the T3 (volslice, volume), although the maserati manuals refer to
them as pools (volumes on T3) and volumes (volslices on T3).
Page 92
Maserati Notes- StorEdge 6320 and 6120: cont.

6120 LED indicators:


Green- Normal
Yellow- Service action required
Blue- Safe to remove (hot swap)
White- Locator beacon
6320 -rack has a V100 service processor, an integrated patch panel and a SPAT (service processor accessory
tray). V100 has cdrom, optional usb flash memory card to save config. Patch panel consolidates connections for
service components and fiber connections. SPAT has a 4 port terminal concentrator (NTC) with a built in
modem, a firewall/router, an ethernet hub and future usb power management sequencer. (Customers are
encouraged to use remote services by Sun thru the provided modem. During initial release of the
product 5/03 thru 9/03 install is free.)

FC switches may be mounted in the rack but are no longer monitored or controlled via the SP.

6320 has 3 LANs set up:


Internal- for components only
SP LAN- remote services net (behind the firewall)
User LAN- one customer net port.
6320 default logins and passwords and roles:
Service Processor root/!root
Firewall root/sun1 user firewall access
NTC rss/sun1rss NTC user
NTC su/sun1rss NTC admin
6120 array root/!root array admin
GUI passwords
config service admin/!admin full access
storage/!storage storage set up only
guest/!guest observe only

Login to sp from external system using ssh


ssh -l root <ip>
(sp does not have menu to make changes to config, like 3900 Indy)
Use web based GUI
https://<ip>:9443/se6000ui/login.do (GUI is similar to storade)
Use sscs CLI: (from external system with packages installed.)
Commands located in /opt/se6x20/cli/bin
sscs login, sscs list, sscs add, sscs create, sscs modify, sscs delete.

Flash Archive interactive install: (saves time on multi domain installs)(see info doc 40131)

Create a flash image from a patched server: (load patches and packages before creating image)
# cd /
# flarcreate -S -n image_name /path_to/image_file (~2.2gb - can use -c compress, 2x longer, only 1/5 smaller)
# share -F nfs -o ro,anon=0 /path_to/image_file (share image file) (/etc/init.d/nfs.server start)
Boot new server and load from image: (if boot from CD best to use same release as flash ex :sol9 04/04)
(note: you need network connectivity btwn image server and new server to download image)
- On server to be loaded: boot cdrom or boot net (if you have created a install server or 12/15K)
- Answer all install questions until you get to “F2 Standard” “F4 Flash” select “F4”
- Select NFS
- NFS Location: ip_address:/path_to/image_file (ex: 192.148.220.113:/var/tmp/flash)
- Continue answering install questions as you would on a regular interactive install
- Server will load Solaris from the image you specified/created
Page 93
UltraSPARC III CPU Diagnostic Monitor (CDM): ( see Sun Alert ID: 55081 )

CDM is supported only on UltraSPARC III processor-based platforms with Solaris 8 or Solaris 9 releases.
CDM contains 3 packages with total size less than 1MB.

To download packages: http://diagnostics.sfbay/cdm/

EIS-CD 29JUL03 will also have packages on it

Download consists of three Sun Packages:


Install order
SUNWcdiam 3
SUNWcdiar 2
SUNWcdiax 1

To start CDM, add packages and boot server. Will run at `default' settings without modifications
to /etc/cpudiagd.conf. To change settings modify /etc/cpudiagd.conf. See cpudiagd man pages for log files
and config info.
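
For example, assuming the three packages have been downloaded to the current directory, they could be added
in the install order shown above with:

# pkgadd -d . SUNWcdiax SUNWcdiar SUNWcdiam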

To remove CDM :
# /etc/init.d/cpudiag stop
# pkgrm SUNWcdiam SUNWcdiar SUNWcdiax

(note: log files in /var/cpudiag/log/ remain after CDM is removed)

SunFire Service Mode Password Generator: (for info see http://acts.ebay/bulletins/index.cgi?bulletin=159)

Generator url: https://sfservicepass.sfbay/

(Generator will ask for hostid of main SC, ScApp version, RTOS version. If you type 'service' (return, return)
in the platform shell the SC will list the needed info)

To enter service mode type 'service' and enter password in the platform shell.
To exit service mode type 'service'
ex: setchs -s ok, suspect, faulty -r "reason for status" -c /N0/SB2/p2

V440 : (Chalupa) Solaris 8 7/03 beta

ALOM: ('#.' to enter, default login admin admin1)


poweron power on server, fru. Turns off ok-2-remove led
poweroff power off server
removefru will move a FRU into a state whereby it is ready to be removed
reset resets the managed system
break causes the SC to send a break to the managed system OS
bootmode provides control over the OBP firmware behavior during system initialization
console connect this user session to the managed system's OS console stream #. to return
consolehistory displays the contents of the selected OS console output buffer
showlogs displays the contents of the managed system eventlog
setlocator cause SC to turn the managed system locator indicator on or off
showlocator display the managed system locator indicator current state
showenvironment displays the environmental status to the SC for the managed system.
showfru prints out the FRUID data stored in the FRU PROM
showplatform displays the hardware configuration of the platform
showsc displays the details of the SC software configuration and firmware version information.
Page 94
shownetwork displays the current SC network configuration parameters
setsc allows the user to individually configure SC parameters
setupsc interactively configures the SC parameters
showdate displays the current SC date and time
setdate allows the user to set the current SC date and time
resetsc resets the SC
flashupdate download a new firmware image to the active SC
setdefaults set all the user settable SC configuration parameters to their default value
useradd add a new user to the SCs user database
userdel remove an existing user from the SCs user database
usershow displays the configuration details for a user account, or all accounts (w/o argument)
userpassword allows an administrator to set/change a users password
userperm sets the permissions for the specified user
password allows a user to change their own login password
showusers display a list of users currently logged into the SC
logout logs the current user out from his alom session
help [command] provides assistance to the user of the CLI by listing the commands

raidctl: solaris command (V440 hardware raid command, mirror within a controller only)

raidctl -h Help text, no man pages


raidctl -c Create mirror (note: raid volume will use original disks ctd#) ex: raidctl -c c1t1d0 c1t2d0
raidctl -d Delete mirror ex: raidctl -d c1t1d0
raidctl -F Update controller firmware ex: raidctl -F image 1
raidctl -l List raid controller status ex: raidctl -l 1

Adding Locales to Solaris: (S8 see infodoc 44626, S7 infodoc 44505 )


There are 3 ways to add locales to a server.

Initial install select locales while installing


Upgrade select locales while Upgrading
pkgadd pkgadd from Solaris Media kit Languages CD (about 100meg/locale)
(/cdrom/Sol_8_1001_lang_sparc/components/<product>)

Finding Solaris release and distribution loaded:

# more /etc/release (to find the Solaris version loaded)


# more /var/sadm/system/admin/CLUSTER (to find the distribution loaded)

SUNWCXall - Full Distribution + OEM Support


SUNWCall - Full Distribution
SUNWCprog - Developer
SUNWCuser - End User
SUNWCreq - Core

Find local NIS servers (see infodoc:4736)


% rpcinfo -b ypserv 2
(systems that respond are running ypserv, and thus NIS servers)
Are they serving your NIS domain?
% yppoll -h responding_server passwd.byname

Page 95
Network troubleshooting:
Commands:
arp -a display entries in the arp table
dmesg check status of interface at boot time
ifconfig allows you to add/modify/delete interface parameters (see page 48,75)
kstat -n interface kernel stats for interface (good info)
kstat -p kstat -p | grep interface gives speed and duplex information
ndd -set /dev/eri instance 0 sets view to eri0
ndd /dev/eri \? shows what eri parameters are modifiable
ndd -get /dev/tcp tcp_status displays tcp parameter value 'tcp_status' also ndd -get /dev/eri link_status
netstat -i gives you interface details # of packets, collisions, errors etc...
netstat -Pn protocol protocol info, no name resolution
netstat -rnv routing info, no name resolution, local view
netstat -k interface same info as kstat -p but not well formatted
ping 192.168.47.2 command contacts and reports status of 192.168.47.2
rup 192.168.47.2 contacts and reports up time for 192.168.47.2
route (add, get, flush, delete) command allows you to add, get, delete, flush, entries in the routing table
snoop monitors network traffic use -v ,-d ,interface, ipaddress to filter view
spray 192.168.47.2 will send packets to 192.168.47.2 report on transfer rate and number received
traceroute 192.168.47.2 maps and times route from your server to 192.168.47.2

Files:

/etc/defaultdomain - servers domain name


/etc/dhcp.interface - touch file for dhcp boot ex: /etc/dhcp.hme0 (hme0 will boot dhcp)
/etc/hosts - list of hosts (local file) is linked to /etc/inet/hosts
/etc/hosts.equiv - trusted remote hosts and users
/etc/hostname.xxx - contains interface name and/or config at boot time
/etc/protocols - contains configured protocol names and protocol numbers
/etc/services - contains services configured and default port number
/etc/notrouter - touch file if server has multiple interfaces and should NOT route
/etc/defaultrouter - contains ip address of servers router (needed to reach other subnets)
/etc/gateways - file contains static route entries
/etc/ftpusers - contains a list of users that can NOT ftp login (Solaris 8 and 9)
/etc/ftpd/ftpusers - contains a list of users that can NOT ftp login (Solaris 9)
/etc/netconfig - network config File
/etc/nsswitch.conf - contains config of named services on server
/etc/netmasks - contains a list of base addresses and netmasks
.rhosts - trusted remote hosts and users

Daemons:

dhcpagent - implements client half of the DHCP


in.dhcpd - dhcp daemon run with the -d -v switch for diagnostic output
in.ftpd - in.ftpd is the Internet FTP server process.
in.mpathd - IPMP process. Started by the 'group' option of ifconfig command
in.routed - the routing daemon (only present on router servers) -s -q
in.rdisc - implements the ICMP router discovery protocol
in.telnetd - in.telnetd is a server that supports TELNET virtual terminal protocol
xntpd - ntp daemon

Page 96
How to find your way around a B1600... (min O/S Sol8 12/02, Sol9 04/03)
Default login sc: admin (no passwd) sw: admin:admin

SC commands:
console console connection to switch or blade (use showplatform name. #. to return)
help lists available commands
showplatform -v platform and blade config and status information
setupsc initial sc setup...
showsc lists config data provided to setupsc command
poweroff s# Poweroff blade number s# (console to blade & shutdown first)
poweron s# Poweron blade number s#
SW commands:
help lists available commands
? command ? will list available syntax
show vlan listing and ports assigned to vlans
show running-config current switch configuration
show startup-config Config used at boot time
show mac-address-table mac addresses learned by ports
show system platform wide config information
show interface Shows status/config of selected interface
show spanning-tree displays spanning-tree info

Sun Blade Management GUI:


http://switch_IP_address:80 (ip address from 'show running-config' command; 'show system' for port address)

switch ports:
NETPn ports are external uplink switch ports. There is no correlation of NETPn port to blade number.
SNPn ports are internal downlink switch ports that are connected to the blades ce interfaces.
There is a 1 to 1 correlation of SNPn port to blade number ( ce0 to ssc0/swt, ce1 to ssc1/swt)
Setting up Vlans:
Vlans are assigned to ports and can be designated as tagged or untagged. A tagged vlan is
one that uses tagged communication to a vlan-aware interface. An untagged vlan passes
all untagged traffic. Ports that have the same vlan assigned to them can communicate together.

The formula for determining a Solaris interface number for a tagged vlan (VID) is:
1000 * VID + device PPA = Vlan logical PPA
vlan 15 on ce0 : 1000 * 15 + 0 (for ce0) = ce15000
vlan 15 on ce1 : 1000 * 15 +1 (for ce1) = ce15001
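
A quick Python check of the same formula (hypothetical VID and PPA values):

vid, ppa = 15, 0                      # vlan 15 on ce0
print("ce%d" % (1000 * vid + ppa))    # -> ce15000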

Ex: to assign blade s0 and blade s1 interface ce0 to vlan 15 you would do the
following:
on S0 and S1:
# ifconfig ce15000 plumb
# ifconfig ce15000 inet ip_address netmask + broadcast + up
create/add hostname to /etc/hostname.ce15000
add ip_address(es) and hostnames to /etc/hosts
on switch:
Console# config
Console(config)#vlan database
Console(config-vlan)#vlan 15 name VLAN15 media ethernet
Console(config-vlan)#end
Console#config
Console(config)#interface ethernet SNP0 (s0 ce0 is connected to SNP0 port)
(continued on next page)
Page 97
b1600 cont...

Console(config-if)#switchport allowed vlan add 15 tagged


Console(config-if)#end
Console#
Console#config
Console(config)#interface ethernet SNP1(s1 ce0 is connected to SNP1 port)
Console(config-if)#switchport allowed vlan add 15 tagged
Console(config-if)#end
Console#

(you would follow the same procedures if creating untagged vlans, only the interface would remain
ce0 and the switch command would not have 'tagged' at the end. ALSO: if you want the vlan to
be seen outside the chassis you must allow it on an external port NETPn)

Trunking: (ports grouped together to act as one)


to create a static trunk (external ports NETP2 and NETP3 are put into trunk 2): ports must be connected to a
static trunk on another switch.

Console#config
Console(config)#interface port-channel 2
Console(config-if)#exit
Console(config)#interface ethernet netp2
Console(config-if)#channel-group 2
Console(config-if)#exit
Console(config)#interface ethernet netp3
Console(config-if)#channel-group 2
Console(config-if)#end
Console#show interface status port-channel 2

to create an LACP (Link Aggregation Control Protocol) trunk: ports must be connected to LACP-enabled
trunk ports on another switch

Console(config)#interface ethernet netp4


Console(config-if)#lacp
Console(config-if)#exit
Console(config)#interface ethernet netp5
Console(config-if)#lacp
Console(config-if)#exit

(The trunk is automatically activated if LACP is enabled on the connected port of the
target switch. A trunk formed with another switch using LACP is automatically assigned the
next available trunk ID)

Spanning tree:
Where two bridges are used to connect the same two computer network segments, a spanning
tree configuration occurs. Because spanning trees have multiple paths to the same destination,
a condition called 'bridge loop' is created. 'Spanning tree protocol' is communications between
bridges designed to eliminate the loop path. Caution should be used if you are configuring the
switch for spanning tree protocol. In that it will effect switches in the customers network.

Page 98
b1600 cont...

Full list of commands:

sc commands:
bootmode reset_nvram|diag|skip_diag| normal|bootscript= string sn {sn} This command allows you to specify a
boot mode for a blade. You need to use it to boot Linux blades for the first time
break -y s# Command causes blade to drop from Solaris into either kadb or OBP
console -f -r Access console of a switch or blade. (ssc#/swt,s#) type #. to return to the sc> prompt
consolehistory -b -e -g Displays the contents of the switch or blade consoles buffer. (boot|run ssc#/swt|s#)
flashupdate -s IPaddress -f path -v ssc# s# Enables you to upgrade firmware to a System Controller or to a blade
help [command] Provides help text for specified command
logout
password command allows a user to change his or her own password
poweroff -f -y -s -r Powers off components (ch,ssc#,s#)
poweron -f -y -s -r Powers on components. (ch,ssc#,s#)
removefru -f -y Powers down components (ch,ssc#,s#)
reset -y -x Resets components (s#,ssc#/swt,ssc#/sc,ssc#)
resetsc -y Resets the active System Controller.
setdate set the time of day on the System Controller, switches, and server blades.
setdefaults -y Returns the active System Controller (but not its switch) to the factory default settings.
setfailover Tells you which System Controller is the active and standby System Controller.
setlocator on off Turn on/of blade locator
setupsc Enables you to configure the active System Controller interactively.
showdate Displays the current date and time
showenvironment -v Displays environmental sensors status in components of the chassis. (ssc#,psn,s#)
showfru Displays the contents of component (s) FRUID database (ssc#,s#,ch,psn)
showlocator Tells you whether the locator LED is on or off.
showlogs -b -e -g -v Displays the events (s#, ssc#)
showplatform -v -p Displays the status of each component. (ssc#,ssc#/swt,psn,s#,ch)
showsc [-v] Displays a summary of the configuration of the active System Controller.
showusers Shows the users currently logged into the System Controller.
standbyfru -f -y Powers down components (ch, ssc#, s#)
u Gives user administration privileges
useradd username Adds a named user to the list of permitted System Controller users.
userdel username Deletes a user from the list of permitted System Controller users.
userpassword username allows a user with a-level permissions to alter another user's password.
userperm username aucr specifies the named user's permission levels.
usershow username Shows details of the specified users login account.

switch commands: (use ? and help commands for assistance)

switch Exec commands:

clear counters Clears statistics on an interface


logging Clears messages from the logging buffer
mac-addresstable dynamic Removes any learned entries from the forwarding database
config Activates global configuration mode
copy Copies a code image or a switch configuration to or from Flash memory or a TFTP server
file Copy from file system
running-config Copy from current system configuration
startup-config Copy from startup configuration
tftp Copy from tftp server

Page 99
b1600 cont...

debug Debugging functions


delete Deletes a file or code image
dir Displays a list of files in Flash memory
disable Returns to normal mode from privileged mode
exit Returns to the previous configuration mode, or exits the CLI
flowcontrol Enables flow control on a given interface
garp timer Sets the GARP timer for the selected function
help Description of the interactive help system
? Shows options for command completion (context sensitive)
hostname Specifies or modifies the host name for the device
ip dhcp restart Submits a BOOTP or DHCP client request
login Enables password checking at login
password Specifies a password on a line
password-thresh Sets the password intrusion threshold, which limits the number of failed logon attempts
ping Sends ICMP echo request packets to another node on the network
port monitor Configures a mirror session
security Configures a secure port IC
quit Exits a CLI session
reload Restarts the system
show bridge-ext Shows bridge extension configuration
bridge multicast Shows the IGMP snooping MAC multicast list
gvrp configuration Displays GVRP configuration for selected interface
garp timer Shows the GARP timer for the selected function
interfaces status Displays status for the specified interface
port-channel Shows information about a particular aggregated link.
vlan Displays status for the specified VLAN interface
counters Displays statistics for the specified interface
switchport Displays the administrative and operational status of an interface
ip interface Displays the IP settings for this device
redirects Displays the default gateway configured for this device
filter Displays filter rules or captured packets
igmp snooping Shows the IGMP snooping configuration
mrouter Shows multicast router ports
line Displays a terminal line's parameters
logging Displays the state of logging
mac-addresstable Displays entries in the bridge-forwarding database
aging-time Shows the aging time for the address table
map ip precedence Shows the IP precedence map
dscp Shows the IP DSCP map
port monitor Shows the configuration for a mirror port
queue bandwidth Shows round-robin weights assigned to the priority queues
cos-map Shows the class-of-service map
radius-server Shows the current RADIUS settings
running-config Displays the configuration data currently in use
snmp Displays the status of
spanning-tree Shows the spanning tree configuration
startup-config Displays the contents of the start up configuration
system Displays system information
tacacs-server Shows the current TACACS settings
users Shows all active console and Telnet sessions,
version Displays version information for the system

Page 100
b1600 cont...

vlan Shows VLAN information


shutdown Disables an interface
silent-time time the management console is inaccessible after unsuccessful logon attempts are exceeded
spanning-tree protocol-migration Re-checks the appropriate BPDU format
whichboot Displays the files booted

switch Configure commands:

authentication login Defines logon authentication method and precedence


boot system Specifies the file or image used to start up the system
bridge-ext gvrp Enables GVRP globally for the switch
capabilities Advertises the capabilities of a given interface for use in auto-negotiation
channel-group Adds a port to an aggregated link
description Adds a description to an interface configuration
enable [level] Use this command to activate Privileged Exec mode.
password Sets a password to control access to the Privileged Exec level
end Returns to Privileged Exec mode
exec-timeout Sets the interval that the command interpreter waits until user input is detected
exit Exit from global configure mode
help Description of the interactive help system
hostname Specifies or modifies the host name for the device
interface Configures an interface type and enters interface configuration mode
ethernet Ethernet IEEE 802.3
port-channel Configures an aggregated link and enters interface configuration mode for the aggregated link
vlan Enters interface configuration mode for a specified VLAN
ip filter Blocks specified IP packets from entering the internal management port (NETMGT)
http port Specifies the port to be used by the Web browser interface
server Allows the switch to be monitored or configured from a browser
address Command to set the IP address for this device
dhcp restart Submits a BOOTP or DHCP client request
client-identifier Specifies the DHCP client identifier for the switch
default-gateway Defines the default gateway
igmp snooping Enables IGMP snooping
vlan static Adds an interface as a member of a multicast group
version Configures the IGMP version for snooping
querier Allows this device to act as the querier for IGMP snooping
query-count Configures the query count
query-max-responsetime Configures the report delay
router-port-expiretime Configures the query timeout
vlan mrouter Adds a multicast router port
jumbo-frame Enables support for jumbo frames
lacp Configures LACP for the current interface
line Identifies a specific line for configuration and starts the line configuration mode
logging on Controls logging of error messages
history Limits syslog messages saved to switch memory based on severity
mac-address-table aging-time Sets the aging time of the address table
static Maps a static address to a port in a VLAN
map ip precedence Enables IP precedence class-of-service mapping
map ip precedence Maps IP precedence value to a class of service
map ip dscp Enables IP DSCP class-of-service mapping
map ip dscp Maps IP DSCP value to a class of service

Page 101
b1600 cont...

negotiation Enables auto-negotiation of a given interface


no Negate a command or set its defaults
queue bandwidth Assigns round-robin weights to the priority queues
queue cos map Assigns class-of-service values to the priority queues
radius-server host Specifies the RADIUS server
port Sets the RADIUS server network port
key Sets the RADIUS encryption key
retransmit Sets the number of retries

timeout Sets the interval between sending authentication requests


snmp-server contact Sets the system contact string
location Sets the system location string
host Specifies the recipient of an SNMP notification operation
enable traps Enables the device to send SNMP traps (SNMP notifications)
spanning-tree Enables the spanning tree protocol
spanning-tree mode Configures STP or RSTP mode
forward-time Configures the spanning tree bridge forward time
hello-time Configures the spanning tree bridge hello time
maxage Configures the spanning tree bridge maximum age
priority Configures the spanning tree bridge priority
pathcost method Configures the path cost method for RSTP
transmission-limit Configures the transmission limit for RSTP
cost Configures the spanning tree path cost of an interface
port-priority Configures the spanning tree priority of an interface
edgeport Enables fast forwarding for edge ports
linktype Configures the link type for RSTP
speed-duplex Configures the speed and duplex operation of a given interface
switchport broadcast packetrate Configures the broadcast storm control threshold
mode Configures VLAN membership mode for an interface
acceptable-frame-types Configures frame types to be accepted by an interface
ingress-filtering Enables ingress filtering on an interface
native vlan Configures the PVID (native VLAN) of an interface
allowed vlan Configures the VLANs associated with an interface
gvrp Enables GVRP for an interface
forbidden vlan Configures forbidden VLANs for an interface
priority default Sets a port priority for incoming untagged frames
tacacs-server host Specifies the TACACS server
port Sets the TACACS server network port
key Sets the TACACS encryption key
username Establishes user name authentication
vlan database Enters VLAN database mode to add, change, and delete VLANs
vlan Configures a VLAN, including VID, name and state
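
A minimal sketch of a configuration session using the commands above (the 'configure' command to enter global
configuration mode, the VLAN number, the addresses, and the Console prompts shown are illustrative assumptions):
Console# configure
Console(config)# interface vlan 1
Console(config-if)# ip address 192.168.1.253 255.255.255.0
Console(config-if)# exit
Console(config)# ip default-gateway 192.168.1.254
Console(config)# end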

Page 102
Cluster 3.x: http://suncluster.eng http://cluster.central (Installation Information)

Introduction: Sun Cluster 3 is the first integrated release of Sun's next generation
Full Moon clustering technology. Sun Cluster 3 extends Solaris with the
Full Moon cluster framework, enabling the use of core Solaris services such
as file systems, devices, and networks seamlessly across a tightly coupled
cluster and maintaining full Solaris compatibility for existing applications.

Key Benefits: Higher / near continuous availability of existing applications based on
Solaris services such as highly available file system and network services.

Integrates/extends the benefits of Solaris scalability to dotCOM application
architectures by providing scalable and available file and network services for
horizontal applications.

Ease of management of the cluster platform by presenting a simple unified management
view of shared system resources.

General: The configuration guide is located at suncluster.eng; there is too much information to show here.
Below are some highlights.

Up to 8 nodes in a cluster, including single-node clusters.
Sun and EMC storage supported, with others starting in May 04.
Failover, Scalable and OPS/RAC services.
Supports Solaris 8 and 9.
PNM is supported for 3.x and IPMP for 3.1 for the public net.
Supports QFE, Gigabit, Wildcat and SCI for the private net.
Supports different types of server nodes in the cluster.
DMP not supported; have to use STMS or PowerPath, which overrides it.
Terminal concentrator isn't mandatory; can use RSC or system controllers.

Admin w/s: Admin workstation not mandatory. Management GUI is now web based.
Good to install the Sun console software on a Sun machine to have access to the double-window GUI.

Server: Requires the End User distribution. However, server storage and some software
may require more. Best to install at least the full (Entire) distribution.

Topologies: Clustered Pair
N+1
Pair + N
N to N scalable
Diskless Cluster
Single-node Cluster

Hardware Notes:
Must change the initiator id on one node if using SCSI arrays between 2 nodes
See info Doc 20704 for scsi initiator change procedure.

When a disk is replaced, the cluster needs to be made aware through the
scdidadm command.
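A minimal sketch (the disk name and DID instance number are hypothetical):
# scdidadm -l | grep c1t0d0   (find the DID instance mapped to the replaced disk, e.g. instance 4)
# scdidadm -R 4               (run the repair procedure to update the device ID for that instance)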

Page 103
Cluster 3.x: (cont...)
Wiring Diagrams - See the configuration guide on internal site: suncluster.eng.

Commands:

boot -x Bring server up w/o cluster


ccp Used to run the cluster control panel software
#ccp clustername

scstat Used to get a status of the whole or part of the cluster.


-D Shows status for all disk device groups.
-g Shows status for all resource groups.
-i Shows status for all IP Network Multipathing groups.
-n Shows status for all nodes.
-p Shows status for all components in the cluster. Use with -v[v] to display more verbose
output.
-q Shows status for all device quorums and node quorums.
-v[v] Shows verbose output.
-W Shows status for cluster transport path.
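
For example, using the flags above (a quick sketch):
# scstat -pvv   (verbose status of all cluster components)
# scstat -n     (node status only)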

scrgadm manage registration and unregistration of resource types, resource groups, and resources

Show Current Configuration:


-pv [v] -t resource_type_name -g resource_group_name -j resource_name
Resource Type Commands: (add, change, remove)
-a -t resource_type_name -h RT_installed_node_list -f registration_file_path
-c -t resource_type_name -h RT_installed_node_list
-r -t resource_type_name
Resource Group Commands: (add, change, remove)
-a -g RG_name -h nodelist -y property
-c -g RG_name -h nodelist -y property -y property
-r -g RG_name
Resource Commands: (add, change, remove)
-a -j resource_name -t resource_type_name -g RG_name -y property -x extension_property
-c -j resource_name -y property -x extension_property
-r -j resource_name
Logical Host Name Resource Commands: (add)
-a -L -g RG_name -j resource_name -l hostnamelist -n netiflist -y property
Shared Address Resource Commands: (add)
-a -S -g RG_name -l hostnamelist -j resource_name -n netiflist -X auxnodelist -y property
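
A minimal sketch of building a failover resource group with the syntax above (SUNW.nfs is a standard Sun Cluster
data service resource type; the group, resource, node, and host names are hypothetical):
# scrgadm -a -t SUNW.nfs                             (register the resource type)
# scrgadm -a -g nfs-rg -h node1,node2                (create the resource group)
# scrgadm -a -L -g nfs-rg -j nfs-lh-rs -l nfs-host   (add a logical host name resource)
# scrgadm -a -j nfs-rs -t SUNW.nfs -g nfs-rg         (add the data service resource)
# scswitch -Z -g nfs-rg                              (bring the resource group online)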

scconf Update the cluster software configuration. Recommend running scsetup, which will print out the
scconf command it used, so you can note and reuse the commands you run repetitively.
-pv[v] Prints out the configuration.

scinstall Install Sun Cluster software and initialize new cluster nodes.
-pv[v] Print out packages and versions installed.

scsetup Interactive cluster configuration tool similar to vxdiskadm in Veritas.

scdidadm The scdidadm utility administers the device identifier (DID) pseudo device driver did
-C Removes references to nonexistent devices on the cluster nodes.
-l Lists the local devices in the DID configuration file.
-L Lists all the paths, including those on remote hosts, of the devices in the DID config file.
-r Reconfigures the database.
-R Performs a repair procedure on a particular device instance.

Page 104
Cluster 3.x: (cont...) Commands:

scshutdown Shut down a cluster

scvxinstall The scvxinstall utility provides automatic VxVM installation and optional root-disk encapsulation
for Sun Cluster nodes.

scgdevs Global devices namespace administration script

scswitch Perform ownership and state change of resource groups and disk device groups in Sun Cluster
configurations. Below are some examples:

Misc Procedures:
Device Groups:
Register a new disk group:
scconf -a -D type=vxvm,name=new_disk_group,nodelist=nodex:nodex
Sync device group info after adding a volume:
scconf -c -D name=diskgroup,sync
Getting registered device group information:
scstat -D
Switch a device group off a node:
scswitch -z -D device_group -h node
Switch a device group offline (must be quiescent and unmounted)
scswitch -F -D device_group
Switch a device group into maintenance state (must be quiescent and unmounted)
scswitch -m -D device_group
Switch a device group online:
scswitch -z -D device_group -h node
Resource Groups:
Get current resource group status:
scstat -g
Switch a resource group to another node:
scswitch -z -g resource_group -h node
Switch all resource and device groups off a node:
scswitch -S -h node
Take a resource group offline on all nodes:
scswitch -F -g resource_group
Bring a resource group online on all nodes:
scswitch -Z -g resource_group
View configured resource groups:
scrgadm -p[v][v]

Removing a resource group: Before a resource group may be removed, all resources within the group
must be removed. The steps required are:
1) take the resource group offline
scswitch -F -g resource_group
2) disable the resources within the group
scswitch -n -j name_of_resource
3) remove the resources within the group
scrgadm -r -j name_of_resource
4) remove the resource group
scrgadm -r -g resource_group

Page 105
SMS upgrade 1.4.1: (see SMS 1.4.1 install guide http://www.sun.com/servers/highend/sms.html)

Download your SMS packages: http://www.sun.com/servers/highend/sms.html (make sure to run cksum and compare)
(also on EIS CD3 starting Apr-27-04)
- unzip file and note location

Prepare for Upgrade:


- switch user to sms-svc
- Make sure SCs are stable, no data syncs, DR, hw changes in progress
- Turn off failover on main SC (SC0) sc0:sms-svc:>setfailover off
- Stop SMS on the spare SC (SC1) sc1:#/etc/init.d/sms stop
- Backup SMS on spare (optional) sc1:#smsbackup (can add UFS dest dir. default: /var/tmp)

Upgrade Solaris Operating environment (optional)


sms 1.4.1 will work with sol8 and sol9. There is a different SMS package for each O/S version.
Sol8 02/02
Sol9 04/04
(if you upgrade O/S add all patches and reboot. stop sms again if rebooted)

Upgrade SMS software packages using smsupgrade: (spare sc first SC1)


- cd to download directory sc1:# cd /download_dir/sms_1_4_1_sparc_System_Management_Services_1.4.1/Tools
- smsupgrade sc1:# ./smsupgrade /download_dir/sms_1_4_1_sparc_System_Management_Services_1.4.1/Product

Switch control to spare SC (SC1)


- stop SMS on main SC (SC0) sc0:# /etc/init.d/sms stop
- bringdown spare (SC1) sc1:#init 0
- boot spare (SC1) to activate pkgs and become main OK> boot -rv

Update the SC and CPU flash PROMs on the new main SC (SC1)
- switch user to sms-svc
- flash SC: sc1:sms-svc:> flashupdate -f /opt/SUNWSMS/firmware/SCOBPimg.di sc1/fp0
sc1:sms-svc:> flashupdate -f /opt/SUNWSMS/firmware/nSSCPOST.di sc1/fp1 CP1500 only
sc1:sms-svc:> flashupdate -f /opt/SUNWSMS/firmware/oSSCPOST.di sc1/fp1 SCV2(cp2140) only

- flash SBs: sc1:sms-svc:> flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash sb0 sb1 sb2 etc...


(must specify location for sms 1.4.1)
- bring down sc1 sc1:# init 0
- boot sc1 OK> boot -rv

Upgrade the former main SC (SC0)


- Download your SMS packages: www.sun.com/servers/sw (make sure to run cksum and compare)
- unzip file and note location
- stop SMS on the former main (SC0) sc0:# /etc/init.d/sms stop
- Backup SMS on former main (SC0) (optional) sc0:# smsbackup (can add UFS dest dir. default: /var/tmp)

Upgrade Solaris (optional)


sms 1.4.1 will work with sol8 and sol9. There is a different SMS package for each O/S version.
Sol8 02/02
Sol9 04/04
(if you upgrade O/S add all patches and reboot. stop sms again if rebooted)

Page 106
smsupgrade 1.4.1: (Cont...)

Upgrade SMS on former main (SC0)


- cd to download directory sc0:# cd /download_dir/sms_1_4_1_sparc_System_Management_Services_1.4.1/Tools
-smsupgrade sc0:# ./smsupgrade /download_dir/sms_1_4_1_sparc_System_Management_Services_1.4.1/Product

Reboot the former main SC (SC0)


- bringdown former main (SC0) sc0:#init 0
- boot (SC0) to activate pkgs and become main OK> boot -rv

Update the SC PROMs on the former main SC (SC0)


- switch user to sms-svc
- flash SC: sc0:sms-svc:> flashupdate -f /opt/SUNWSMS/firmware/SCOBPimg.di sc0/fp0
sc0:sms-svc:> flashupdate -f /opt/SUNWSMS/firmware/nSSCPOST.di sc0/fp1 CP1500 only
sc0:sms-svc:> flashupdate -f /opt/SUNWSMS/firmware/oSSCPOST.di sc0/fp1 SCV2(cp2140) only
- bring down sc0 sc0:# init 0
- boot sc0 OK> boot -rv

Verify chassis serial number on main SC (SC1)


- switch user to sms-svc
- check chassis serial # sc1:sms-svc:>showplatform -p csn
- record serial # sc1:sms-svc:>setcsn -c serial_number

Enable failover on main SC (SC1) sc1:sms-svc:>setfailover on
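
A quick verification sketch after re-enabling failover (same commands used elsewhere in this handbook):
sc1:sms-svc:> showfailover -v   (confirm failover state is ACTIVE)
sc1:sms-svc:> showdatasync      (confirm data propagation to the spare SC)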

Solaris 9 SVM (sds) disk replacement: (also see infodoc ID73132 )

Beginning with Solaris 9, SVM uses a new feature called Device-ID


which identifies each disk not only by its c#t#d# name, but by
a unique ID generated by the disk's WWN or serial number.

Mirrored disk replacement: (use when a submirror shows “State: Needs maintenance” in metastat output)
On failing disk: (If you can access the disk, if not start at the cfgadm -c unconfigure step)
# umount filesystem (unmount any non-svm open filesystems on failed disk)
# metadb -d c1t0d0s7 (if replicas on this disk, remove them)
# metadb | grep c1t0d0s0 (verify there are no existing replicas left on the disk)
# cfgadm -c unconfigure c1::dsk/c1t0d0 (might not complete command if busy, remove failed disk)

Insert a new disk :


# cfgadm -c configure c1::dsk/c1t0d0 (configure new disk)
# prtvtoc /dev/rdsk/c0t0d0s2 > /tmp/firstdisk (get format for new disk)
# fmthard -s /tmp/firstdisk /dev/rdsk/c1t0d0s2 (format disk same as mirror)
# metadevadm -u c1t0d0 (will update the New DevID)
# metadb -a c1t0d0s7 (if necessary, recreate any replicas)
# metareplace -e d0 c1t0d0s0 (do this for each submirror on the disk)
# metastat -i (will change unavailable state of devices to Okay)
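
For example, if the replaced disk held submirror slices of two mirrors (the metadevice names d10/d20 and the
slice layout are hypothetical), repeat the metareplace step per mirror:
# metareplace -e d10 c1t0d0s0
# metareplace -e d20 c1t0d0s1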

Raid-5 disk replacement: (use when a raid unit shows “State: Needs maintenance” in metastat output)
On failing disk:(If you can access the disk, if not start at the cfgadm -c unconfigure step)
# umount filesystem (unmount any open non-svm filesystems on this disk)
# metadb -d c1t0d0s7 (any replicas on this disk, remove them)
# metadb | grep c1t0d0 (verify there are no existing replicas left on the disk)
# cfgadm -c unconfigure c1::dsk/c1t0d0 (might not complete command if busy, remove the failed disk)

Page 107
Solaris 9 SVM (sds) disk replacement: (cont...)

Insert a new disk :


# cfgadm -c configure c1::dsk/c1t0d0
Run 'format' or 'prtvtoc' to put the desired partition table on the new disk
# metadevadm -u c1t0d0 (will update the New DevID)
# metadb -a c1t0d0s7 (if necessary, recreate any replicas)
# metareplace -e <raid5-md> c1t0d0s0 (do this for each raid on the disk)
# metastat -i (will change unavailable state of devices to Okay)

SC rebuild after total disk failure: (Sun Fire 12k/15k)


Use this procedure after disk replacement to rebuild an SC that experienced a total disk failure.
This is a modified version of the `Fresh Installed SCs' portion of the 12k/15k & 20k/25k EIS checklist.
http://sunweb.germany/EIS/Web/inst-support/checkl.html.
Note: A smsbackup from the other SC on the platform should not be restored on the failed SC. The smsbackup file
must come from the same SC that failed.
Items needed:

Solaris OE CDs (operating system install)


SMS Software (http://www.sun.com/servers/highend/sms.html)
EIS CDs
smsbackup file (from failed SC or ID-PROMs from service call)
explorer output file (from failed SC http://proactive.central)

On Main SC as user sms-svc: setfailover off


On Failed SC at ok prompt:
check OBP settings: setenv auto-boot? false
setenv diag-level pmax-epvmax
setenv input-device ttya
setenv output-device ttya
setenv local-mac-address? true
setenv diag-switch? true
setenv post-on-sir? true
setenv diag-device <same as boot-device>
- Initial SC bootup: boot cdrom.
- Get Solaris install info from explorer output (/etc/nodename, /etc/hosts, /etc/nsswitch.conf, /disks/prtvtoc etc...)
you can also reference install docs and customer supplied info.
- Install SC as per EIS "Install Spec". Entire Distribution is required.
- Install Solaris & select manual reboot.
- Fix the "No SOF Interrupt" problem. Append to /a/etc/system: exclude: drv/ohci (Makes booting much faster)
- Touch /a/etc/notrouter to disable routing.
- Reboot SC.
- Log in as user root.
- Insert EIS-CD-ONE Copy the EIS-CD to the system disc: cd /cdrom/...sun/install; sh copy-cd2sun.sh
- Insert EIS-CD-TWO. Copy the EIS-CD to the system disc: cd /cdrom/...sun2/install; sh add-cd2sun.sh
- Edit /etc/dfs/dfstab Share directory /sun
- Run setup-standard as user root: cd /sun/install; sh setup-standard.sh
(Do NOT select option to install SAN Foundation Suite.)
(PTS recommends activation of alternate break sequence on SCs)
- Log out & back in to set environment. Or enter: . $HOME/.profile
- Ensure that NIS is not configured. (If NIS/NIS+ used "files" must be first in /etc/nsswitch.conf.)
- Install Solaris patches: Recommended Cluster and Additional Solaris Patches (/sun/patch/<SolarisVn>)
- Solaris 8: Verify entry in /etc/system set TS:ts_sleep_promote=1 (EIS-ALERT#22)
- Fix sendmail messages "My unqualified host name unknown" (/etc/hosts append <hostname>.somewhere.com)
- Reboot
Page 108
SC rebuild after total disk failure: (cont...)

- Install SDS/SVM software.


- Patch the SDS software (Solaris 8 only). /sun/patch/sds/<Vn>
- Install the SMS software on failed SC. (web release: http://www.sun.com/servers/highend/sms.html)
- Patch SMS software /sun/patch/SMS/<Vn>
- As root run smsrestore on failed SC. Use file from smsbackup or install the IDPROM files obtained
via the service call.
- Reboot SC.
- Mirror the system disk. See scripts on EIS-CD in /sun/tools/SF15K (SDS Infodoc 28196)
- Set boot-device & diag-device to both sides of the mirror. (SDS: sds-disk, sds-mirror) (See Infodoc 11854)
- If NVRAM editor (nvedit) was used ensure to setenv use-nvramrc? true
- Set up UFS-ACLs for user sms-svc on SC. As root run script sms-svc-setup.sh (EIS-CD: sun/tools/SF15K)
- As user sms-svc: touch $HOME/.hushlogin
- Append "share cdrom* -o ro,anon=0" to /etc/rmmount.conf
- Share /export/install if not already. (/etc/dfs/dfstab)
- Set up /etc/defaultrouter according to customer requirements.
- Verify connectivity to defaultrouter (eg via ping).
- Execute smsconfig -m on failed SC. Use data from the explorer output of the failed SC and customer-supplied info for reference.
(if you restored the smsbackup for the failed SC, select 'Edit Network Settings'. All the IP hostnames
will be populated and you will only have to supply the IP addresses and save the settings. smsconfig will
populate your host, netmasks and hostname files.)

(if you did not have the smsbackup file, and restored the IDPROM files, you will have to Set platform name
and change base ip addresses if necessary. Use explorer output from failed SC, Customer supplied info for
reference. Also see infodoc ID71490)
- The smsconfig -m command modifies the hosts file. Check it to be sure things are as they should be.
- Verify auto-boot?=true, watchdog-reboot?=false (eeprom auto-boot?, eeprom watchdog-reboot?)
- Shutdown newly loaded SC and do hard reset. (Press reset button on SC).
On MAIN SC as user sms-svc: setfailover on Wait 5 minutes.....
On MAIN SC as user sms-svc: Verify setfailover (showfailover -v) and showdatasync are "ACTIVE" to propagate
changes to spare SC.

- Run explorer and SunCheckup on both SCs, compare outputs and correct any errors.
- When datasync is completed: On Main and spare SC, make a backup copy of sms files (smsbackup)

15K DR examples: (also see serengetti/15k dr commands page 87, infodoc 76795 How to DR a Single PCI Card)
(cfgadm commands run from domain)

# cfgadm -val (get the attachment point name (ap_id) of the board, for use with the cfgadm -c disconnect or configure command)
# cfgadm -val | grep permanent (see what SB has perm memory)
# cfgadm -c disconnect SB0 (removes SB0)
# cfgadm -c configure SB0 (adds SB0 back into domain)
# cfgadm -c disconnect IO1 ( removes IO1 and all pci adapters on it)
# cfgadm -c configure IO1 (configures IO1 back into domain) IO PCI slot #s
# cfgadm -c disconnect pcisch5:e01b1slot0 (removes pci card in IO1 slot 0) |3|1|
# cfgadm -c disconnect pci_pci0:e00b1slot1 (removes pci card in IO0 slot1) |2|0|
# cfgadm -c configure pcisch5:e01b1slot0 (configures pci card in IO1 slot 0 into domain)
15/25K hpost:
sms-svc> hpost -d r -l127 (run hpost on domain R level 127)
.postrc (/etc/opt/SUNWSMS/adm/config/platform or A-R)
level 64 (run level 64)
dash_H_level 127 (run level 127 when DRing a board into domain)
no_ioadapt_ok (test SB only. Good when you create a test domain w/o IO)
no_obp_handoff (when testing SB only don't attempt to load obp)
Page 109
SMSbackup: (how to manually expand backup file) also see infodoc 77357

- Copy backup file to /tmp


- sms-svc> cpio -icvdum < /tmp/sms_backup.1.4.1.cpio.0

3310/3510 Disk replacement: (also see infodoc 78432 and page 84)

- save nvram info: system functions, Controller maintenance, Save NVRAM to disks, yes
- Identify bad disk: view and edit scsi device, look for BAD or FAILED status, note Chl, Id and LG_DRV #s,
select bad drive, Identify scsi drive, flash all But Selected drive, Flash Drive Time, yes (go find the disk)

disk ID #s(single bus 3310) disk ID #s (dual bus 3310 ) disk ID#s (3510)
Chl 0 Chl 2 Chl 0 Ch 0 / Ch2
0 3 8 11 0 3 0 3 0 3 6 9
1 4 9 12 1 4 1 4 1 4 7 10
2 5 10 13 2 5 2 5 2 5 8 11

- Physically unseat bad disk, let spin down 20 sec, then remove
- Install replacement disk
- view and edit scsi device, look for NEW_DRV or USED_DRV status.
If not seen: select a disk, Scan scsi drive, select Chl (use noted #), select Id# (of replacement), yes
- Is replacement to be new local or global spare? If not skip to copy and replace step
if so: view and edit scsi device, select replacement disk, add Global spare drive or add Local spare drive, yes
- If the replaced disk cannot be made a spare: view and edit logical drives, select logical drive, select PREVIOUS spare
disk, copy and replace drive, yes (when the copy is completed, assign the PREVIOUS spare back in the step above)

How to mount a CD image file (.iso) as a filesystem: (see SRDB 50566)

# lofiadm -a /export/install/sol-10-b72-sparc-v1.iso (must use absolute path to iso file)


/dev/lofi/1
# mkdir /cd1 (create a mount point)
# mount -F hsfs -o ro /dev/lofi/1 /cd1 (mount /dev/lofi/# on the mount point)
# df -k /cd1
Filesystem kbytes used avail capacity Mounted on
/dev/lofi/1 239904 239904 0 100% /cd1

To mount a slice of an .iso image (like s1 when doing a 'setup_install_server')


# mkdir /s1 (create the mountpoint)
# dd if=sol-10-b72-sparc-v1.iso of=vtoc bs=512 count=1 (make a copy of the vtoc)
# od -D -j 452 -N 8 < vtoc (starting cyl and block length for s1 is 452 bytes into vtoc and is 8 bytes long)
0000000 0000000750 0000857600 (slice1 starts at cyl 750 and is 857600 blks long)
0000010
# echo 750*640 | bc (Starting cyl750 *blks/cyl always 640 = s1 starting blk is 480000)
480000
# dd if=sol-10-b72-sparc-v1.iso of=sol-10-b72-sparc-v1-s1.iso bs=512 skip=480000 count=857600
# lofiadm -a /export/install/sol-10-b72-sparc-v1-s1.iso
/dev/lofi/2
# mount -F ufs -o ro /dev/lofi/2 /s1
# df -k /cd1 /s1
Filesystem kbytes used avail capacity Mounted on
/dev/lofi/1 239904 239904 0 100% /cd1
/dev/lofi/2 402086 397100 0 100% /s1

Page 110
Removing the top cover on a V20z: (very tricky :-)

Keep top button down, pull cover forward until click, slide to the rear.

Explorer -w scextended from cron:

- Add IP address and password (if used) of SC to the /etc/opt/SUNWexplo/scinput.txt file.


- run crontab -e and add -w default,scextended to the explorer entry
ex: 0 0 * * 1 /opt/SUNWexplo/bin/explorer -q -e -w default,scextended

Useful COD commands: (to obtain a license: www.sun.com/licensing; for 5.14.0 and up see InfoDoc 81531)
showcodlicense (-r)
addcodlicense sc> addcodlicense 01:80d8a855:000000000:0201010100:c:00000000:BLqg5Ko
deletecodlicense
enablecodboard <sb#> Used to replace a COD sb (need service passwd on Sun Fires)
showcodusage
showplatform -p cod (addcodlicense will populate this area)
setupplatform -p cod
showboards

ALOM4v: Niagara (Ontario, Erie) (initial login/password admin/admin1) also see ALOM commands on page 94

Removed in ALOM4v: Reduced managed system interface:


Solaris 'scadm', Solaris 'locator', 'prtfru' cannot access DFRUID PROMs, 'prtdiag'/'prtpicl' no environmentals.
ALOM alerts not forwarded to host syslog.
ALOM 'setupsc' questions related to the managed system interface removed.
Removed ALOM environment variables:
sys_eventlevel, sys_hostname (ALOM cannot detect a hung OS)
Removed ALOM variables:
sys_autorestart, sys_xirtimeout, sys_wdttimeout (no CPU Signature (OBP and OS Status) support;
ALOM 'showplatform' cannot display the Booting/OS Running state, stops at "running")
sys_bootrestart, sys_bootfailrecovery, sys_maxbootfail, sys_boottimeout

New in ALOM4v:
Password recovery (procedure on page 113)
If the admin password is lost/forgotten, can reset the NVRAM to factory defaults, including clearing all users.
Requires physical access to the machine to unplug power cords and connect to the ALOM serial port.
Flashupdate protection
ALOM flash is in two segments with a persistant switch.
'flashupdate' always operates on the non-running segment. Segments are only switched after flashupdate
completes and image is CRC verified. A jumper can also switch the segments.
Ex: sc> flashupdate -s 129.148.173.99 -f /tmp/122430-01/System_Firmware-6_1_2-Sun_Fire_T2000.bin-latest
Supports new LED States:
White locator LED flashes at 4Hz when activated.
Green LED states:
Standby blink: 0.1sec on, 2.9sec off. When system is on standby power
Slow blink: 0.5 sec on, 0.5sec off: When system is in transition (running POST, powering down, etc)
Steady ON: system is running
Amber LED states:
Off: No faults.
On: Service required.
Amber slow blink to indicate unacknowledged faults not supported.
Page 111
ALOM4v: (cont)

New in ALOM4v: ALOM handles the fault by:


Lighting the Fault LED(s)
Logging the fault to DFRUID of the indicted FRU(s)
Alerting the user using ALOM alerting mechanisms:
To logged-in ALOM users
To an email address (if configured)

New ALOM commands:

showfaults Prints any faults: environmental faults, faulty FRUs, POST-detected faults (which result in ASR disable),
and FMA-detected faults; also prints the time and status of the last POST run.
clearfault <UUID> to manually clear an FMA-diagnosed fault. (get UUID from showfaults output)
ASR commands:
showcomponent View and manage the list of blacklisted (ASR-disabled) devices (see the sketch after this list)
enablecomponent Re-enables a component; the disabled state is stored on the actual FRU, such as the DIMM itself.
disablecomponent Disables (blacklists) a component; a FRU disabled on one system will remain disabled when inserted in another system
clearasrdb
setkeyswitch
normal: System can be used normally.
stby: Powers off the system and prevents 'poweron' command or button from operating.
diag: Forces the system to run servicemode diagnostics at next reset.
locked: Prevents 'flashupdate' and 'break' commands, system can power on/off and reset normally.
showkeyswitch
showfru command prints both static and dynamic sections
setfru command to set Customer_Data in all FRUs
showhost version command to print the software versions contained in the Host flash prom.
obpupdate command to update the Host flash prom (POST, OBP, etc). 'obpupdate' and 'flashupdate' will be merged
into a single command which will update both ALOM and the Host flash from a single master image
flash host prom
Servicemode commands: Be sure to set sc_servicemode to false when done!
setsc sc_servicemode true Warning: misuse of this mode may invalidate your warranty.
showplatform -v will print CPU #Cores and version information.
ping <ipaddress> - test network connectivity
clearnvramlog - erases persistent 'showlogs -v'
frucapture - offload a FRU's DFRUID image via FTP
fruupdate - update (overwrite) a DFRUID image via FTP
setcsn - set the chassis serial number, required when replacing the PDB board.
Can only be executed one time and only with a blank (new) PDB
fmagentconfupdate - field update FMA agent via FTP
showfmfaults - show current FMA faults stored on the DOC (Disk-on-chip)
showfmerptlog1 - show the first 40 ereports on DOC
showfmerptlog2 - show the last 40 ereports on DOC
clearereports - clear the ereport logs from DOC
docftpput - FTP a DOC file off of ALOM. Note: the above command names may change by product ship!
spdiag consists of the following commands:
i2ctest - run a single pass of the i2c test
envtest - run a single pass of the environmental test
sptest - run a single pass of the SP diag tests
setdiagopt - set diag test options used by 'rundiag'
rundiag - start diagnostics in the background
stopdiag - stop any running background diagnostic tests
showdiagstatus - show the status of background tests
resetdiagstatus - reset the diagnostic status (Servicemode: spdiag suite)
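
A minimal ASR sketch using the showcomponent/disablecomponent/enablecomponent commands listed above (the DIMM
component path is a hypothetical example and varies by platform):
sc> showcomponent                        (list components and their ASR state)
sc> disablecomponent MB/CMP0/CH0/R0/D0   (blacklist a DIMM; the disabled state is stored on the FRU itself)
sc> enablecomponent MB/CMP0/CH0/R0/D0    (re-enable the component after service)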
Page 112
ALOM4v (cont...)

diagnostics run environment variables:


diag_trigger: when POST runs. Valid triggers: none, power-on-reset, user-reset, error-reset, all-resets
diag_verbosity: verbosity level of POST, one of: none, min, normal, max, or debug
diag_level: level of testing performed, one of: none, min, or max.
diag_mode: POST mode, one of: off, normal, service, or menu
sys_autorunonerror: Controls if the system should continue boot if POST finds an error. Set to true or false.

Forgotten password ALOM4v: Niagara (Ontario, Erie)


1. Connect to the ALOM serial port
2. Power cycle the server by unplugging both PSU cords and re-plugging
3. Hit "esc", the Escape key, during ALOM boot at the point: Return to Boot Monitor for Handshake
4. After hitting "esc", the ALOM boot escape menu will be printed:
ALOM <ESC> Menu
e - Erase ALOM NVRAM. m - Run POST Menu.
R - Reset ALOM. r - Return to bootmon. Your selection:
Enter "e" to erase the ALOM NVRAM and then 'r' to resume ALOM boot. ALOM will now boot and reset
all NVRAM settings. You will automatically be logged on as user 'admin' with no password and
no permissions, and all ALOM NVRAM settings will be reset to the factory defaults.

Solaris to Linux cross-reference: ( http://www.unixporting.com/quickguide.html and Linux overview for Solaris users
817-3341-10)

Solaris Linux Description

System Administration Tools


/usr/bin/admintool /bin/linuxconf system administration tasks
/usr/sbin/useradd /usr/sbin/useradd adds a new user
Kernel Configuration
/etc/system /usr/src/linux
Processes
/usr/bin/ps -ef /bin/ps -ef active processes
/bin/truss /usr/bin/strace trace of the system
/usr/ucb/users /usr/bin/users users currently on the system
/usr/ucb/ps -aux /bin/ps -aux active processes sorted by %cpu
/usr/bin/prstat /usr/bin/top active processes, reports statistics
Physical Memory
/usr/sbin/dmesg | grep mem grep MemTotal /proc/meminfo memory size
Hardware Status/Information
/usr/bin/dmesg /bin/dmesg system buffer diagnostic messages
/usr/bin/arch -k /bin/uname -m application architecture of host system
Host ID
/usr/bin/hostid /usr/bin/hostid lists host id
Hostname
/usr/bin/hostname /bin/hostname lists hostname
/usr/bin/uname -a /bin/uname -a lists hostname
Swap
/usr/sbin/swap -a /sbin/swapon -a add swap space
/usr/sbin/swap -l /usr/bin/free lists swap info
vmstat vmstat virtual memory statistics
System Files
/etc/vfstab /etc/fstab filesystem default info
/etc/inet/hosts /etc/hosts network hosts file
Page 113
Solaris to Linux cross-reference: (cont...)

Solaris Linux Description

The X Window System


/usr/openwin/bin/xterm /usr/X11R6/bin/xterm terminal emulator for x windows
/usr/openwin/bin/xhost /usr/X11R6/bin/xhost allowed connections to the X server
Networking
/usr/sbin/showmount /sbin/showmount clients that remotely mounted a filesystem
/etc/dfs/dfstab /etc/exports sharing resources
/usr/sbin/route /sbin/route manipulate the routing tables
/usr/bin/netstat /bin/netstat show network status
/usr/sbin/ifconfig /sbin/ifconfig configure network interface parameters
/usr/sbin/snoop /usr/sbin/tcpdump displays network packets and their contents
Copies
/usr/bin/cpio /bin/cpio copy files
/usr/sbin/tar /sbin/tar copy files
Software
/usr/sbin/pkgadd /bin/rpm -i[U]vh add software pkg
/usr/sbin/pkginfo /bin/rpm -qa displays software pkg info
/usr/sbin/pkgrm /bin/rpm -e removes software pkg
Disk Formatting
/usr/sbin/format /sbin/mke2fs formats a disk / creates a file system
Disk Partitioning/info
/usr/sbin/format /sbin/fdisk creates partition
/usr/sbin/format /sbin/fdisk -l lists partition info
Disk Space and Information
/usr/sbin/df /bin/df displays mounted file systems
/usr/sbin/df -k /bin/df -k displays disk space of file systems
/usr/sbin/mount /bin/mount mounts a file system
/usr/bin/du /usr/bin/du displays disk usage
Log Files
/var/adm/messages /var/log/messages system Log file
Miscellaneous
/usr/ucb/whoami /usr/bin/whoami displays current user name
/usr/bin/fdformat /usr/bin/fdformat floppy disk format
/usr/bin/tip /usr/bin/minicom terminal connect thru serial port
/usr/bin/find /usr/bin/locate find a file
/usr/bin/who -r /sbin/runlevel displays current run level

SSH - Secure Shell :


SSH (Secure Shell/Secure socket shell) is a secure Unix command interface and protocol that enables the user to have
remote access to a device located on a network. SSH is built of three different utilities, slogin, ssh, and scp - these are all
secure versions of existing Unix ultilities, rlogin, rsh and rcp. All SSH commands and sessions are encrypted to enhance
security during a remote session. In most cases, if you have to connect via ssh to a server, ICMP (ping) will be disabled.
In other words you will not be able to ping the server.

Commands for ssh users:


ssh hostname connect to hostname using ssh ex: # ssh -l root 129.148.173.230
slogin hostname you can use ssh and slogin interchangeably
ssh hostname command run command remotely on hostname
ssh -v hostname connect in verbose mode for debugging
ssh -V determine version number for your copy of ssh
Page 114
Commands for ssh users: (cont.)

ssh-keygen generate a new public/private key pair


ssh-keygen -c myuserid-ssh2@pha generate new key pair with identifying comment
sftp hostname copy files interactively between hosts (requires SSH2). Commands for an sftp session are similar to
standard ftp.
scp filename hostB:filename copy file from current computer to hostB
scp1 filename hostB:filename copy file from current computer to hostB (use if hostB only supports SSH1)
scp hostA:filename hostB:filename copy file between two computers
scp -r hostA:dirname1 hostB:dirname2 copy directory (and its contents) between two computers
scp hostA:fn1 hostB:fn2 copy and rename file between two computers
scp fn1 fn2 fn3 hostB:directoryname copy multiple files into hostB's directory
ssh-agent command run command (usually a shell) under control of ssh-agent
ssh-add add local identity to list maintained in memory by ssh-agent
ssh-add filename add identity whose private key is stored in filename to list in memory
ssh-add -l list keys stored in memory
ssh-add -D delete all keys stored in memory
Commands for ssh maintainers
ssh-keygen -P /etc/ssh2/hostkey generate & store a new host key
SSH with SMS 1.5
smsinstall command will automatically harden your SC, smsupgrade will not. (Bug ID: 5079760)
to undo hardening: (pg 50, SMS 1.5 Installation Manual)
1. login to SC as superuser
2. Type at sc:# prompt: /opt/SUNWjass/bin/jass-execute -u
3. The system will prompt you with an `undo' menu
4. Select `run' number you want to undo
5. type q to exit
6. reboot system

To manually harden a SC with SMS 1.5: (note: telnet, rlogin, ftp, and vold will not work, so make sure you have
serial console access before you harden it) infodoc 83763
# /opt/SUNWjass/bin/jass-execute -q -d sunfire_15k_sc-secure.driver

Galaxy ILOM: (default login/password root/changeme)

ILOM (Integrated Lights Out Manager) (Motorola MPC8248 Service Processor):


Provides RKVMS functionality (Remote Keyboard, Video, Mouse and Storage; not enabled for LAN by default).
Provides ability to boot from virtual devices.
CLI through serial connection or SSH.
Environmental monitoring (voltage, fan speeds, temperatures, etc. and will send alert messages.)
Allows for LOM.
Embedded Web Server w/ SSL encryption. (connect to web GUI by: https://ipaddress)
Flash memory for built-in Linux OS.
Connects to all components via JTAG connection.
IPMI v2.0 command interface
SNMP v1, v2c and v3 interface.
CLI, Web GUI or ILOM Remote Console to manage.

To Power on:
To turn on main power mode (all components powered on), press and release the small Power button on the server
front panel. When main power is applied to the full server, the Power/OK LED next to the Power button lights and
remains lit. or
Page 115
Galaxy ILOM: cont...
(Connect a serial cable from the RJ-45 Serial Mgt port on your ILOM SP to laptop)
-> start /SYS

To Power off: press and release the small Power button on the server front panel
or -> stop /SYS

Configuring the SP: (Serial Port default: 9600/8/1/none )


cd /SP/network
set /SP/network pendingipaddress=192.168.0.1
set /SP/network pendingipnetmask=255.255.255.0
set /SP/network pendingipgateway=192.168.0.10
set commitpending=true
show /SP/network

To start the serial console: (Connect a serial cable from the RJ-45 Serial Mgt port on your ILOM SP to laptop)
-> cd /SP/console
start (type `esc (' to return to the SP)
eeprom default is screen and keyboard. Use solaris eeprom command to
get serial console in solaris (ssh to host or see remote console below)
eeprom input-device=ttya
eeprom output-device=ttya
BIOS: You need to change the BIOS setting to have serial port control
after POST. (this will not override the eeprom setting in solaris)
to change setting:
F2 (ctl-E) on reset, Advanced, Remote access Configuration,
Redirect after POST [always]
(Some OSs may not work if set to always)
CLI
<verb><options><target><properties>
VERBS:
See Sun Fire X4100 and X4200 Servers System Management Guide for guidance on
CLI commands.
cd Navigate the object namespace.
create Set up an object in the namespace
delete Remove an object from the namespace.
exit Terminate a session to the CLI.
help Displays help information about commands and targets.
load Transfers a file from an indicated source to an indicated target.
reset Resets the state of the target.
set Sets target properties to the specified value.
show Displays information about targets and properties.
start Starts the target
stop Stops the target.
version Displays the version of service processor firmware running.
Options: short-cuts
-default n/a Causes the verb to perform only its default functions.
-destination n/a Specifies the location of a destination for data.
-display -d Shows the data the user wants to display.
-examine -x Examines the command but does not execute it.
-force -f Causes an immediate shutdown, instead of an orderly shutdown.
-help -h Displays help information.
-level -l Executes the command for the current target and all targets contained through the level specified.
-output -o Specifies the content and form of command output.
-resetstate n/a Resets the state of the target to its default.
-script n/a Skips warnings or prompts normally associated with the command.
-source n/a Indicates the location of a source image.
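
A minimal sketch combining the verbs, options, and targets above (an illustrative sketch; exact option support
varies by firmware):
-> stop -f /SYS   (immediate shutdown instead of an orderly one)
-> reset /SP      (reset the service processor)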
Page 116
Galaxy ILOM: cont...

Contents of /SYS and /SP


-> cd SYS
/SYS
-> show
/SYS
Targets: FIOBD FT0 FT1 MB PDB PS0 PS1 SASBP
Properties:
ACT = standby_blink
FAN_FAULT = off
LOCATE = off
POWERSTATE = off
PSU_FAULT = off
SERVICE = off
TEMP_FAULT = off
Commands: cd reset show start stop

-> cd ../SP
-> show
/SP
Targets:
alert cli clients clock console logs network serial
services sessions users

Properties:

Commands: cd reset show version

Web Gui allows you to: (To log on, use https://ipaddress)
redirect graphical console to remote host.
connect a virtual floppy or CD-ROM drive.
monitor and manage fans remotely.
monitor BIOS messages, OS messages and system status remotely.
interrogate NICs for MAC remotely.
Power on, off and reset remotely

Remote Console (RKVMS): (requires Java 5.0 or higher)


You can use Remote Console to get remote console, keyboard, mouse access to the server and to install s/w from local
CD drive. Open a browser https://SP_ipaddress
From Remote Console, choose Redirection->Start Redirection->Devices->Mouse/Keyboard/CD-ROM

USERs:
Can't delete the following accounts: root/anonymous/ldapproxy
Can create an additional 7 accounts.

Send break: When logged into the SP using ssh with a console session running: ESC + Shift-b

Page 117
Revision History:

First release 01/17/00


Corrections:
02/14/00 page 30 punzip to gunzip
06/21/00 page 19 d = on bd soc+ (was in wrong place)
Additions:
02/28/00 page 39 Uncompressing files
03/14/00 page 40 - 43 T300
03/27/00 page 28 * #TERM=vt100; export TERM
03/27/00 page 44-45 ACT
05/18/00 page 46-48 Advantages of Splitting a Drive into Multiple
File Systems
05/19/00 page 48-49 How to configure a system to run on a network
05/19/00 page 49-51 SEVM - How to recover a primary boot disk.
05/23/00 page 51 Disable DMP
06/16/00 page 52 Memory Scrubbing
07/20/00 Page 13 metastat command added to Disk Suite sec.
07/20/00 Page 16 raidutil commands
07/21/00 Page 52 Display remote App GUI locally
08/26/00 Page 53 Cluster 2.x
10/13/00 Page 41 T300 Pgroup secondary disk addressing failover note
10/16/00 Page 31 mpstat command added
10/17/00 Page 40 T300 Pgroup, 2 fiber path data transfer usage note
11/09/00 Page 21 isainfo - v command added
11/09/00 Page 42 T300 tftp boot (examples added)
11/09/00 Page 56 Encapsulating root after using Environmental CD to
load O/S:
11/20/00 Page 42 Warning added (:/: sys blocksize (n)k should be
set to correct value before 'vol add')
11/20/00 Page 56 Adding a second network interface (without boot)
11/20/00 Page 56 Adding a default gateway
11/27/00 Page 53 OPS general description
12/12/00 Page 57 Volume Manager
12/28/00 Page 60 FTPing to and from sunsolve
02/06/01 Page 56 /etc/name_to_major (cluster warning added)
02/06/01 Page 56 /etc/defaultrouter added
04/24/01 Page 56 /etc/notrouter (warning added to 2nd interface)
04/24/01 Page 61 Serengeti added
04/24/01 Page 30 info on new explorer added
04/24/01 Page 67 Mounting CD without vold
05/20/01 Page 66 Notes added
06/20/01 Page 16 Update A3500 info and rm6 commands
06/20/01 Page 42 modify Enable/Disable command descriptions
06/20/01 Page 43 modify disk and lpc download descriptions
06/20/01 Page 62 modified repeater bd info (removed 3800 4800
warning on dual partitions)
06/20/01 Page 67 mailx: send messages/files
06/20/01 Page 59 take -g out of vxdg import and export example
06/20/01 Page 60 no longer able to create directories on ftp sunsolve
06/20/01 Page 43 Warning added on controller firmware upgrade
06/20/01 Page 61 * when available (added)
07/23/01 Page 28 VTS description change (removing "on-line")
11/06/01 Page 67 T3 forgotten password
11/06/01 Page 67 T3 logging
11/06/01 Page 21 -k added to netstat command
11/06/01 Page 33 dd, added a disk to disk quick copy example
11/08/01 Page 1 note added 'or disk@n for PCI'
12/03/01 Pages 68 -73 StarCat 15k notes
12/12/01 Page 73 local-mac-address
12/12/01 Page 73 SDS- How to mirror root
01/28/02 Page 16 raidutil command switches fixed ( -B and -R)
02/21/02 Page 68 added fin I0771-1 information
02/21/02 Page 68 added SC console port pinout
05/08/02 Page 55 # scconf -N (to change a node ethernet address )
05/08/02 Page 8 (7-127) added (E10K hpost levels)
05/08/02 Page 73 smsconfig -m (added IPMP info)
05/08/02 Page 68,69 smsconfig -m info added
05/08/02 Page 75 IPMP
05/20/02 Page 11 Added SSP3.4 information
06/06/02 Page 76 T3B or T3+ Firmware Rev 2.1 New Functions:
06/10/02 Page 77 Hitachi StorEdge 99X0 Arrays:
06/13/02 Page 78 SunFire forgotten password
06/17/02 Page 64 Sun Fire setfailover, showfailover cmds added
07/16/02 Page 60 Updated 'ftp to sunsolve' with rftp
07/17/02 Page 80 StorEdge Network FC Switch
07/25/02 Page 80 Added to FC switch info
07/29/02 page I added: http://webhome.east/boston/ to disclaimer
08/09/02 Page 31 added 'top' command
10/01/02 page 81 9900v notes added
10/07/02 Page 84 Minnow info added
10/25/02 Page 72 flashupdate-f opt/SUNWSMS/hostobjs/sgcpu.flash
10/28/02 page 86 Tuning ecache scrubber scan rate
10/30/02 Page 86 VxWorks commands (serengeti)
11/04/02 Page 11 Add syntax for share cdrom for VTS
11/08/02 Page 87 LVD adapter information (ultra scsi-3 375-3057)
11/12/02 Page 54 ccdadm command added for ccd.database recovery
11/12/02 Page 87 changed step sequence for booting image
11/20/02 Page 65 added (-x) to domain reset command
11/20/02 Page 10 redlist definition added
11/27/02 Page 87 Replacing a nordica bd in a 15K SC
12/04/02 Page 66 remove firmware bugs add firmware matrix
12/04/02 Page 87 Add serengetti DR commands
12/06/02 Page 85 Added to Minnow info
12/11/02 Page 66 add to firmware matrix
01/14/03 Page 66 modified logging information
01/21/03 Page 64 added `service' and `testinterconnect' commands
02/03/03 Page 88 Clean up non-root disk “controller” numbers
02/10/03 Page 88 Set network parameters at boot:
03/04/03 Page 80 Useful SAN commands
03/04/03 Page 79 Default Storage switch passwords
03/05/03 Page 66 Modified firmware matrix (5.14.4)
03/17/03 Page 59 added /opt/VRTS/bin/vea
03/17/03 Page 88 Starcat Portid cheat sheet
03/17/03 page 62 6800 partition info added
03/21/03 Page 90 StorADE info added
04/11/03 Page 89 Starcat SC: clean the slate
04/11/03 Page 89 Starcat redx info
04/11/03 Page 75 rm + after depreciated under /etc/hostname.qfe0 :
04/18/03 Page 90 get FRU info from serengetti
05/15/03 Page 91 SWAP
06/02/03 Page 92 Maserati Notes- StorEdge 6320 and 6120
06/19/03 Page 11 removed 'slot' from sbus numbering formula
07/08/03 Page 93 Flash Archive interactive install
07/09/03 Page 94 UltraSPARC III CPU Diagnostic Monitor (CDM)
07/10/03 Page 89 add lines to Starcat SC: clean the slate.
07/10/03 Page 94 SunFire Service Mode Password Generator
07/14/03 Page 94 added : To removeCDM
07/21/03 Page 94 V440 ALOM, raidctl
09/03/03 Page 60 update ftp info
10/24/03 Page 94 added setchs -s command to service mode
11/06/03 Page 28 added navigation keys to sunvts
11/06/03 Page 95 Finding Solaris release and distribution loaded
01/20/04 Page 96 Network troubleshooting command, files, daemons
01/26/04 Page 42, 76 volslice note added
01/28/04 Page 72 Added SMS1.4 commands
02/10/04 Page 97 How to find your way around a B1600...
02/12/04 Page 97 added default login and console info
03/01/04 Page 87 added # cfgadm -val | grep permanent
03/09/04 Page 64 updated platform commands 5.16.0
03/09/04 Page 66 updated firmware matrix
04/06/04 Page 72 add SB1 to flashupdate command
04/06/04 Page 87 add 15k to dr command
04/27/04 Page 103 Cluster 3.x
05/31/04 Page 96 added to fileinfo /etc/dhcp.interface
06/11/04 Page 106 added smsupgrade 1.4.1 info
06/28/04 Page 93 flasharch info added (use same release, ex: sol9 04/04)
07/07/04 Page 60 suncore password change
07/27/04 Page 106 Solaris 9 SVM (sds) disk replacement
08/19/04 Page 108 SC rebuild after total disk failure
08/27/04 Page 73 simplified sds mirror procedure
08/27/04 Page 106 Made SVM replacement more universal
09/16/04 Page 60 added password url to ftp sunsolve info
09/23/04 Page 109 15K DR / hpost examples
10/11/04 Page110 smsbackup: manually check a backup file
11/11/04 Page110 3310/3510 Disk replacement:
12/08/04 Page 64 3800-6800 navigation (ssh) #. added
02/03/05 Page 64 added setchs showchs cmds
02/03/05 Page 110 How to mount a CD image file (.iso) as a filesystem
02/03/05 Page 31 Added iostat (disk thruput test)
04/19/05 Page111 Removing the top cover on a v20z
04/26/05 Page111 Explorer -w scextended with cron
08/03/05 Page111 Useful COD commands:
08/23/05 Page 111 ALOM4v Ontario/Erie (Niagara)
08/23/05 Page113 Forgotten password (ALOM4v)
09/13/05 Page 68 add details to 15k serial pinout
09/26/05 Page 113 Solaris to Linux cross reference
10/18/05 Page114 SSH information
10/18/05 Page115 Galaxy ILOM info
10/28/05 Page 96 kstat -p, netstat -k added
10/28/05 Page 95 Find local NIS servers
12/08/05 Page109 Made 15k dr clearer (cfgadm -val)
01/09/06 Page93 added -S to flarcreate example for faster archive
01/17/06 Page 65 updated “remote logging”
03/22/06 page 79 updated serengeti password reset
03/28/06 Page 111 added Niagara flashupdate example
04/04/06 Page116 added x4100 console information
07/24/06 Page115 SSH with SMS 1.5
