What is fastbooting?
How to prevent a server from booting automatically?
What is a LOM? What is the key sequence to switch
between console and LOM?
What is the shutdown command in Solaris?
What are the reboot commands in Solaris?
$4
$?
$#
$*
$0
$@
A="this.is.a.string"; echo ${A%%.*}
A user can’t login to a Solaris server. Talk through the
troubleshooting steps.
What is psio?
History of BSD?
Signals in Solaris?
> /dev/null
Tells the kernel to run the script with /bin/sh
253 is not contiguous; that is, it has a hole in it.
multicast
SIGTERM, TERM, 15, or -15
To discover the current runlevel, use "who -r".
Highest: 7
Lowest: 0 on old, narrow SCSI; 8 on wide SCSI
"init 0" will bring the server down from the current runlevel to the eeprom level.
"init 5" will bring the server down from the current runlevel to eeprom and power off the hardware.
Identifies the services that are started by inetd as well as the manner in which they are started
It contains logical device names, which are symbolic links to device files in /devices
/kernel contains platform-independent kernel modules whereas /platform contains platform-dependent kernel
modules
/dev/fd
/var/run
/etc/syslog.conf
elonxapdcsu1-508 # cat .bash_logout
# ~/.bash_logout
clear
Any commands in this file are executed when the user logs out.
username:password:lastchg:min:max:warn:inactive:expire:flag
Passwd: a 13-character encrypted user password; the string *LK*, which indicates an inaccessible account; or
the string NP, which indicates no password for the account.
Lastchg: Indicates the number of days between January 1, 1970, and the last password modification date.
Min: Contains the minimum number of days required between password changes.
Max: Contains the maximum number of days the password is valid before the user is prompted to specify a
new password.
Inactive: Contains the number of days a user account can be inactive before being locked.
Expire: Contains the absolute date when the user account expires. Past this date, the user cannot log in to the
system.
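For illustration, a hypothetical entry matching the fields above (the username jdoe and all values are made up):
jdoe:aX8PqM2zEXAMPLE:13000:7:91:7:30:13360:
Here the password was last changed 13000 days after 1/1/1970, must be kept at least 7 days, expires after 91 days with a 7-day warning, the account locks after 30 days of inactivity, and the account expires absolutely on day 13360.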
Check /etc/shadow - if the user has not logged in for a certain duration, the account might have expired.
16TB
/etc/services
auto_remote_inf 5281/tcp # AutoSys INF Instance
/etc/inet/inetd.conf
auto_remote_app stream tcp nowait root /opt/autotree/autosys/bin/auto_remote auto_remote_app
Increase the number of pseudo-ttys. Edit /etc/system, add set pt_cnt = <num>, then halt and boot -r.
From Solaris 8 onwards, this number increases dynamically.
ok boot cdrom -s
During the boot, press Stop + N.
/proc is a memory image of each process; it’s a virtual file system that occupies no disk space. /proc
is used for programs such as ps and top and all the tools in /usr/proc/bin that can be used to
examine process state.
Set following in /etc/system: set maxuprc = <num>
This happens when a process has its file opened with a link count of zero (a file with open file
descriptor unlinked) and that file has been deleted. The ways to troubleshoot are:
1. Run lsof -a +L1 /var to find out the culprit
2. find /proc/*/fd -links 0 -type f -ls
3. find /proc/*/fd -links 0 -type f -size +2000 -ls
4. find /var -type f | xargs du -k | sort -n | tail -5 > topfive.txt
Tmpfs takes on the permissions of the underlying mount point. In order to fix /tmp, you need to boot
single-user and change the permissions as below:
#chmod 1777 /tmp
#chown root:sys /tmp
Set ngroups_max=32 in /etc/system (the maximum is 32; values above 16 can cause problems with NFS because it only passes 16 groups)
External cache is a secondary cache designed as staging between the CPU's primary cache (very
small, but lightning fast) and the main RAM.
Using Solaris shutdown command
Sending shutdown/poweroff command from LOM
Sending shutdown/poweroff command from On/Standby switch
Change "-T" in /etc/inittab to the required <termtype>: -T sun or -T xterm
/etc/default/init (CMASK=value). Default is 022. This prevents daemons from creating 666 files.
Either by modifying /etc/nodename, /etc/hosts and related files
OR
by running /usr/sbin/sys-unconfig
/proc contains lots of files. This may cause problems with some binaries. In such cases, run find on /
without proc, as below:
#find `ls / | egrep -v '(proc|any_nfs_mount)'` -name core
Sun hardware released after Solaris 8 no longer supports 32 bit booting. You can only run 64 bit
kernels on those. This applies to all Ultra-III systems as well as the Sun Blade 100 and other
UltraSPARC-IIe systems.
#ulimit -a
1. Freeware named “Patch Check Advanced (pca)”
2. Traffic Light Patch management (TLP) - Run explorer on the client which needs to be patched.
Send the output file to the TLP server, where a script is run to check for new patches. Once the new
patches are identified, the script generates an install script. Move that file back to the client and apply
the patches using the script.
3. Solaris patch manager
4. If you have a software service agreement with Sun, you can use Sun’s “SunSolve ONLINE” service
to obtain patches.
5. Sun recommended patches can be obtained from sun via anonymous ftp to sunsolve1.sun.com.
Sun's boot PROM expects a 512-byte first sector. When a third-party CD-ROM drive uses 1024- or
2048-byte sectors, it causes the SCSI driver to see a data overrun. This can be amended by setting a
jumper, cutting a trace, or using a software command.
#/etc/init.d/volmgt stop/start
If a process is holding a file open and that file is removed, the space belonging to the file is not
freed until the process either exits or closes the file. This space is counted by df but not by du. It
commonly happens in /var/log or /var/adm, where syslog holds files open.
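A minimal sketch reproducing this (the file path is illustrative):
# mkfile 100m /var/tmp/bigfile
# tail -f /var/tmp/bigfile &          (a process now holds the file open)
# rm /var/tmp/bigfile                 (directory entry gone, space still allocated)
# df -k /var                          (still counts the 100 MB)
# du -sk /var                         (does not)
# kill %1                             (process closes the file; df and du now agree)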
By adding the soft limit and hard limit entries (rlim_fd_cur and rlim_fd_max) in /etc/system
elonsapcore2# ls -l /etc/rc3.d
total 86
-rwxr--r-- 6 root sys 2124 Apr 6 2002 S13kdc.master
-rwxr--r-- 6 root sys 2769 Apr 6 2002 S15nfs.server
-rwxr--r-- 6 root sys 621 Apr 6 2002 S34dhcp
Netcon is opened from the SSP and can read and write to the host console. Multiple simultaneous
consoles may be opened, but only one can have write permission.
The firmware FORTH programming language used to control hardware diagnostics, booting, etc.
To run Sun hardware diagnostics, perform the following at the ok> prompt:
ok> setenv auto-boot? false
ok> setenv diag-switch? true
ok> setenv diag-level max
ok> setenv diag-device disk net (if appropriate)
ok> reset
(watch results of diagnostic tests)
If devices appear to be missing, you can also run the following tests:
ok> probe-scsi-all
ok> probe-sbus
ok> show-sbus
ok> show-disks
ok> show-tapes
ok> show-nets
ok> show-devs
In addition, the following commands can be used to examine the CPUs or switch to another CPU:
ok> module-info
ok> processor_number switch-cpu
{ok} devalias
Confirm NVRAMRC is enabled:
{ok} printenv use-nvramrc?
Edit the contents of nvramrc:
{ok} nvedit
Add the devalias alias:
0: devalias mlboot /sbus/whatever/8000,0f@blah:0,0
^C
Save the contents:
{ok} nvstore
{ok} reset
/platform/`arch -k`/ufsboot
Red Light:
sr (page scan rate, e.g. from vmstat) higher than 200.
Or sar -r
prstat -u root
The command top -icmt works best. -u can be used to monitor processes belonging to a specific user.
#vmstat 5 5
Important fields under Procs and CPU are:
r - in run queue
b - blocked for resources
w - swapped
us - percent user time
sy - percent system time
id - percent idle time
Red Light:
ps auxwww shows %CPU and %memory used, whereas ps -elf shows the TTY and parent PID.
#iostat -xnmpz (shows activities for disks)
Red Light
OpenSSH is a FREE version of the SSH connectivity tools. It encrypts all traffic to effectively
eliminate eavesdropping, connection hijacking, and other attacks. RSA keys are used by protocol
versions 1.3 and 1.5; DSA keys are used by protocol version 2.0.
RSA key in $HOME/.ssh/identity (private) & $HOME/.ssh/identity.pub (public)
DSA key in $HOME/.ssh/id_dsa (private) & $HOME/.ssh/id_dsa.pub (public)
ssh -v -v -v -v hostname
You need to set "PermitRootLogin" to "yes" in /etc/ssh/sshd_config.
Copy either $HOME/.ssh/identity.pub to $HOME/.ssh/authorized_keys OR
$HOME/.ssh/id_dsa.pub to $HOME/.ssh/authorized_keys2 on remote machine.
Copy RSA or DSA public keys from the local box to authorized_keys or authorized_keys2 on the remote
box. When you connect from the local box, the remote side encrypts a random number using the copied
public key and sends it to the local side to decrypt. The local system decrypts it using the private key
(identity or id_dsa) and sends the number back to the remote system. This grants access.
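A minimal sketch of setting this up for protocol 2 DSA keys (user and host names are placeholders):
local$ ssh-keygen -t dsa                  (creates ~/.ssh/id_dsa and id_dsa.pub)
local$ cat ~/.ssh/id_dsa.pub | ssh user@remote 'cat >> ~/.ssh/authorized_keys2'
local$ ssh user@remote                    (should now log in without a password prompt)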
sh/ksh/bash: TERM=vt100; export TERM
CSH: setenv TERM vt100
123
Each NTP node has a stratum. Stratum is an integer between 0 and 16, inclusively; stratum 0 means
a physical clock, never a computer. Examples of physical clocks include:
Stratum 16 is reserved for devices that are not synchronized. The stratum of any NTP-synchronized
device is the stratum of the device it is synchronized to, plus 1. Thus:
ntpq -p
A driftfile /etc/ntp.driftfile will be used to store the clock drift. It contains the latest estimate of clock
frequency error. This enables faster synchronization on restart of the xntpd daemon. Many machines'
clocks drift on their own, so a check every hour or day is generally a good idea. It contains
something like
0.0
OR
24.305
Because of network latency between master and clients, CPU execution delay, and other variables.
One may try to bring the time forward whereas the other wants to bring it backward. This causes split
brain. Let NTP do it: stop hardware time management by adding the following to the /etc/system file:
set dosynctodr=0
901
Edit /etc/inet/services file and
Insert
netbios-ns 137/udp #samba nmbd
netbios-ssn 139/tcp #samba smbd
After
sunrpc 111/tcp #rpcbind
------------
Insert
swat 901/tcp #swat
After
ldaps 636/udp #LDAP
nmbd - name registration and resolution requests. Used for network browsing, it should be started
first.
smbd - handles all TCP/IP-based connections for file and print operations. It manages
authentication. Should start after nmbd.
winbindd - starts when Samba is a member of an ADS domain. It is also needed when Samba has trust
relationships with another domain.
If Samba is not running as a WINS server, there will be one single instance of nmbd running. If it is
running as WINS, there will be 2 instances of nmbd: one handles WINS requests and the second
services name-server message requests. smbd handles all connection requests; it spawns a new
process for each client connection made. winbindd will run as one or 2 daemons.
List the shares on a foreign host: #smbclient -L <hostname> -U%
To mount samba mount: #smbmount //hostname/public /mnt/samba
To change passwd for smb user: #smbpasswd -a local_user
It is /etc/samba/smb.conf (or /usr/local/samba/lib/smb.conf). You can locate it using #smbd -b |
grep smb.conf. To test it, use #testparm /etc/samba/smb.conf.
Check the share using smbclient. Also, check the log file /var/log/smb/samba.%m.
There is no configuration required on a Windows client to use a share from a Unix server. Just open the share via Start | Run.
There is no configuration required on a Unix client to use shares from NT/2K/2K3 servers; however,
the share is mounted differently.
CLI: smbmount //<windows machine name>/<shared folder> /<mountpoint> -o
username=<user>,password=<pass>,uid=1000,umask=000
/etc/fstab:
//<windows machine name>/<shared folder> /<mountpoint> smbfs
auto,username=<user>,password=<pass>,uid=1000,umask=000,user 0 0
Create a separate password file for Samba based on your existing /etc/passwd file:
#cat /etc/passwd | /usr/bin/mksmbpasswd.sh > /etc/samba/smbpasswd
The script does not copy user passwords to the new file. To set each Samba user's password, use
the command smbpasswd username. A Samba user account will not be active until a Samba
password is set for it.
Enable encrypted passwords in smb.conf. Verify that the following lines are not commented out:
encrypt passwords = yes
smb passwd file = /etc/samba/smbpasswd
Common Internet File System (CIFS) is an enhancement of the SMB protocol for sharing data across platforms.
gunzip can uncompress both .Z and .gz files, whereas uncompress can only uncompress .Z files.
On boot the OS checks for the existence of the file /etc/hostname.interface, which contains the
hostname. This hostname is compared with /etc/hosts to lookup the IP address. This IP is matched
against /etc/netmasks to work out the netmask. The interface card is plumbed, the IP assigned and
the netmask set. The interface is brought up onto the network.
One way of achieving this is:
# ifconfig hme1 plumb (if not currently plumbed in)
# ifconfig hme1 [inet] 192.10.10.10 netmask 255.255.255.0 up
Solaris allows up to 256 IP addresses to be assigned against one physical network interface card.
This is achieved using virtual (software) NICs. A virtual NIC is denoted by interface:[0-255], e.g.
hme0:0.
One way of achieving this is:
# ifconfig hme1:1 [inet] 192.10.10.20 netmask {255.255.255.0|0xffffff00} up
State indicates whether the interface has made a connection with the switch to which it is patched.
Speed indicates bit rate at which the interface communicates, usually 10 or 100Mbit/sec.
Duplex indicates whether the interface is synchronous (full duplex) or asynchronous (half duplex),
i.e. whether the interface can send and receive packets at the same time.
kstat bge:<instance> | grep <parameter> (e.g. kstat bge:1 | grep ifspeed)
link_duplex
1 (half)
2 (full)
ifspeed
10000000 - 10 mbps
100000000 - 100 mbps
1000000000 - 1000 mbps
kstat -m ce -i 1
link_duplex = 1 (half), 2 (full)
link_speed = 10, 100, 1000
link_speed = 0 (10), 1 (100), 1000 (1000)
link_mode = 0 (half), 1 (full), * (None)
# ndd -set /dev/hme instance 1
# ndd -get /dev/hme link_status
# ndd -get /dev/hme link_speed
# ndd -get /dev/hme link_mode
To force the above settings at boot time, you could either make an rc.d script to call the above
commands for each interface individually, or set all interfaces of a given type en masse in /etc/system.
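For example, a minimal rc script forcing hme0 to 100 Mbit full duplex via ndd (these are the standard hme parameter names; the script name is illustrative and other drivers use different parameters):
#!/sbin/sh
# /etc/rc2.d/S99netforce - force hme0 to 100FDX
ndd -set /dev/hme instance 0
ndd -set /dev/hme adv_100fdx_cap 1
ndd -set /dev/hme adv_100hdx_cap 0
ndd -set /dev/hme adv_10fdx_cap 0
ndd -set /dev/hme adv_10hdx_cap 0
ndd -set /dev/hme adv_autoneg_cap 0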
You can configure boot services using the add_install_client script. The add_install_client script
allows you to specify all of the information required in the files that support boot services. This script
also creates the required files in the /tftpboot directory and appropriately modifies the inetd service
configuration to support tftp requests.
JumpStart clients require support from a server to automatically get the answers to system
identification questions that the client systems issue. The identification service is often provided by a
boot server, but the service can be provided by any network server configured to provide
identification.
The information can be provided by NIS/LDAP, a sysidcfg file, or a combination of both. The sysidcfg
file supersedes everything; it must be edited manually.
JumpStart clients require support from a server to obtain answers for system configuration questions
that they issue. A system that provides this service is called a configuration server.
A configuration server provides information that specifies how the Solaris Operating System
installation proceeds on the JumpStart client. Configuration information can include:
- Installation type
- System type
- Disk partitioning and file system specifications
- Configuration cluster selection
- Software package additions or deletions
On the configuration server, files known as profile files store the configuration information. A file
called rules.ok on the configuration server allows JumpStart clients to select an appropriate profile
file.
rules file - it associates a group of clients with specific installation profiles. The groups are identified
using predefined keywords that include hostname, arch, domainname, memsize, and model. Clients
select a profile by matching their own characteristics against an entry in the rules file (see the sample
after this list).
profile file - it specifies how the installation is to proceed and what software is to be installed. A
separate profile file may exist for each group of clients.
check script - this script is run after creating the rules and profile files. It verifies the syntax and
creates the rules.ok file.
rules.ok file - the JumpStart program reads this file during automatic installation (the rules file itself is not read)
begin and finish scripts - to carry out pre- and post-installation tasks
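A minimal illustrative pair (the profile name basic_prof is made up; the keywords are standard JumpStart ones):
rules entry:   arch sparc && karch sun4u  -  basic_prof  -
basic_prof contents:
install_type   initial_install
system_type    standalone
partitioning   default
cluster        SUNWCreq
Running the check script against these produces the rules.ok file that the JumpStart program actually reads.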
JumpStart clients require support from a server to find an image of the Solaris OS to install. A
system that provides this service is called an install server. An install server shares a Solaris OS
image from a CD-ROM, DVD, or local disk. JumpStart clients use the NFS service to mount the
installation image during the installation process.
The image could be served from a CD/DVD or a spooled image or flash archive. A spooled image will
be the one which is spooled on the server from the CD using setup_install_server and
add_to_install_server script. setup_install_server -b will spool only the boot image on a boot server.
Boot server will then direct the client to separate install server for the installation image.
Flash archive is an archive/image created from master server which is then distributed to hosts
using jumpstart for cloning purpose.
1. Connect the new host to the network and run #boot net - install.
2. Using ARP/RARP, the host gets its IP address from the boot server, which runs the in.rarpd daemon.
The boot server checks /etc/ethers for the hostname matching the MAC address and then checks
/etc/hosts for the IP address matching the hostname.
3. Host gets bootimage from boot server using tftp request (sent by OBP). Boot server holds boot
image in /tftpboot directory.
4. After getting boot image, client requests identification, software and configuration information
from boot server. Boot server has this information stored in /etc/bootparams and the daemon
running is rpc.bootparamd.
5. After mounting the root file system, client connects to configuration server (known from
/etc/bootparams file), carries out the installation and configuration. Configuration server holds the
necessary information for the client to identify itself (sysidtool) and run a proper installation
(suninstall).
This file cannot have other names. A generic sysidcfg for many clients can reside in the /export/config
dir, but a client-specific sysidcfg should reside in the /export/config/<hostname> dir. This location can
be passed on to the client via the bootparams file.
Use DHCP for both or use DHCP for x86 and /etc/ethers for SPARC
/cdrom/0/s0/Solaris_2.8/Tools/setup_install_server (copy cdrom contents into the install directory)
/cdrom/0/s0/Solaris_2.8/Tools/setup_install_server -b (installs software for booting the client)
/export/install/Solaris_2.8/Tools/add_install_client (to add the client and its related information
such as MAC, jumpstart dir path, sysidcfg path etc; see the example below)
/usr/sbin/flarcreate - to create a flash archive
/usr/sbin/flar - archive command to extract information from an archive
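A typical add_install_client invocation (MAC address, server names and paths are placeholders):
# cd /export/install/Solaris_2.8/Tools
# ./add_install_client -e 8:0:20:ab:cd:ef \
    -s installserv:/export/install \
    -c configserv:/export/config \
    -p configserv:/export/config client1 sun4u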
ARP/RARP can’t cross the subnet. Check boot server is in the same subnet as client. Check
/etc/ethers and /etc/hosts on boot server.
Naming services provide a managed hostname/IP lookup service, e.g. DNS.
Information services provide the above plus other items, such as username/password, homedir
locations, and phone directories, e.g. NIS, NIS+, LDAP, DCE.
A NIS master manages and distributes the maps for a given domain. The principal copies of the NIS
maps are held on the master.
A NIS slave receives copies of the maps from the master and provides the information service.
A NIS client uses the information provided by the master or slave, rather than having to keep local
copies of the data.
To set up a client:
Enter NIS server information in /etc/hosts
Set the domain name: # domainname <nisdomain>
Start the yp client: # ypinit -c OR /usr/lib/netsvc/yp/ypbind -broadcast
Enter NIS server information in /etc/hosts.
Set the domainname
Edit /var/yp/binding/`domainname`/ypservers file
Reboot (or /etc/init.d/rpc start)
ypbind (to itself, usually)
ypserv
ypxfrd
rpc.yppasswd, rpc.ypupdated
/etc/rc2.d/S71rpc
ps -ef | grep ypserv
ypserv, ypbind
NIS usually works by broadcast, hence the NIS server ought to be in the same subnet. However, if it
is in a different subnet, initialize the client with the -c flag (ypinit -c) or set the server using ypset.
The rpc.yppasswd daemon is probably running but not pointing to the directory containing the NIS
maps. By default it looks in /var/yp. If the maps are in /var/yp/maps, start rpc.yppasswd as below:
/usr/lib/netsvc/yp/rpc.yppasswd -D /var/yp/maps
Master looks at ypservers map.
The addition of files before compat is accepted in nsswitch.conf but should not be necessary on a
"neat" server. "compat" makes /etc/passwd be read, and the entries in /etc/passwd play a major
role in resolving the name; the lines are checked in the order in which they are encountered. So, if
DB tokens (e.g. +@<netgroupname>) that refer to NIS-netgroup-style entries are found BEFORE a
line containing the local "files" configuration, they will be checked before those later lines in the file.
Adding "files" before "compat" forces the /etc/passwd file to be read first as a plain file (non-NIS-
style) before compat reads it again in the NIS-compatible manner.
It is used alongside ypbind: ypset tells ypbind which ypserv to talk to. Use ypset if the network doesn't
support broadcasting, supports broadcasting but does not have an NIS server on it, or accesses a map
that exists only on a particular NIS server.
An alternative to using ypset is the /var/yp/binding/<domainname>/ypservers file. This file contains a
list of NIS servers to attempt to bind to, one server per line. If ypbind can't bind to any of the
servers from this file, it will attempt to use the server specified by ypset. If that fails, it will broadcast
on the subnet for an NIS server.
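For example, to pin a client to specific servers (the server names are placeholders):
# cat > /var/yp/binding/`domainname`/ypservers <<EOF
nisserver1
nisserver2
EOF
Then restart ypbind (e.g. /usr/lib/netsvc/yp/ypstop; /usr/lib/netsvc/yp/ypstart) so it picks up the list.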
111
Perhaps because the slaves don't have initial maps. In this case, first make the maps on the master
without pushing them: #cd /var/yp; #make NOPUSH=1 mapname.byname mapname.bynumber. Copy the
maps over to the slaves. The next time you run make, it should push the maps.
Create /var/yp/securenets. ypserv and ypxfrd will respond only to hosts that are listed in this file.
Check /var/yp/ypxfr.log. Touch it if it doesn’t exist.
NIS+ clients do not hard bind to NIS+ servers (as in NIS). Clients have a list of NIS+ servers within
the cold-start file. When they need to do a lookup, they do a type of broadcast called a manycast
and talk to the first server that responds.
You can’t ypcat on netgroup. You can only ypmatch.
Name Service Caching Daemon. Its cache can contain stale information, which hinders troubleshooting.
The DNS daemon is named; the package name contains bind. The main file is /etc/named.conf, which
specifies the zone directory (/var/named), name servers, zone names, IP addresses of hosts, etc. A zone
section specifies master, slave or stub, allow-update, allow-transfer, etc.
Zone files contain forward/reverse lookups and different kinds of records such as SOA, NULL, RP, PTR, A,
NS, MX, CNAME.
It means that the name service should be authoritative. If it's up and it says such a name doesn't
exist, believe it and return instead of continuing to hunt for an answer.
Network File System is a methodology that allows a machine to manipulate files held on a remote server
as if they were local. NFS2/3 were designed by Sun. NFS4 was drafted by Sun but later given to the IETF
to make it an industry standard. There is no NFS1.
An NFS server exports/shares directories to a subset of hosts on the network.
An NFS client mounts these shares onto a mountpoint, and offers the filesystem like any other
(assuming correct authentication, permissioning, etc.)
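In Solaris terms (paths and host names are illustrative):
server# share -F nfs -o rw /export/data   (or add the line to /etc/dfs/dfstab and run shareall)
client# mount -F nfs server:/export/data /mnt/data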
While NFS3 was an upgrade to NFS2, NFS4 is a complete rewrite of protocol. NFS2/3 are stateless,
NFS4 is stateful.
NFS version 3 (NFSv3) has more features, including variable size file handling and better error
reporting, but is not fully compatible with NFSv2 clients.
NFS version 4 (NFSv4) includes Kerberos security, works through firewalls and on the Internet, no
longer requires portmapper, supports ACLs, and utilizes stateful operations.
No. NFS doesn't transmit the size of the underlying file systems. There might be trouble with du and df,
but normal file system use is just fine.
Major number – which device driver should be used to access a particular device
Minor number – a number serving as a flag to device driver
For example, there would be a different major number for hard drives and serial terminals. All IDE hard
drives will have the same major number (indicating the same device driver). Each partition on each hard
drive will have a different minor number.
Since NFS is cross-platform protocol, it needs a way to uniquely identify files. Typically, this is done
using NFS file handles. It is made by combining the following:
• Major number of the block device holding the file system
• Minor number of the block device holding the file system
• Inode number of the file on the file system
By combining these numbers, the server can assign a value that uniquely identifies a file.
On an NFS cluster, the major/minor numbers of a file system may not match on two machines. This may
cause stale file handles. In such a case, override the use of major/minor numbers with the
fsid= export option on the server. This assumes that all cluster nodes have a consistent file
system ID.
/var/share/icons *(async,rw,fsid=X) where X is any 32 bit number that can be used but must be
unique amongst all the exported file systems.
The automounter is a daemon process able to mount/unmount NFS shares without user intervention.
Once properly configured, it greatly reduces administrative overhead by removing the need for a
root user to run the commands. It also reduces the risk of NFS issues (e.g. hangs) because the NFS
filesystems are only mounted when necessary and are unmounted shortly after they have been idle
for a while (default is 5 minutes).
The auto_master map is looked-up in /etc/nsswitch.conf, usually “files nis”. This would look to the
/etc/auto_master first.
A direct map explicitly states the directory on which the NFS filesystem is to be mounted. It explicitly
indicates the NFS share to be mounted. Think of it as mounting a known directory on a known
directory. An advantage is that direct maps are uncomplicated and quick.
An indirect map can only imply the mountpoint and the NFS sharename. Think of it as mounting an
unknown directory into a directory, e.g. mount server1:/export/home/implicit-username on
client1:/home/implicit-username. An advantage is not having to explicitly list all possible actual
mount points (useful for homedirs), and it is not necessary to restart (or signal) automountd when a new
implied share is created on the NFS server.
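Illustrative map entries (the server name is a placeholder):
/etc/auto_master:
/-      auto_direct        (direct map)
/home   auto_home          (indirect map)
/etc/auto_direct (the full mount point is given):
/usr/share/man   server1:/export/man
/etc/auto_home (the key looked up replaces the & on the right):
*   server1:/export/home/&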
It can run in 2 modes: multiple daemons and single daemon. MD has a separate daemon for each
site. This is used when each site's pages/files are to be kept separate from each other and you have
enough resources: a separate httpd installation for each virtual host. SD has a single daemon for all
sites; this is used in the rest of the conditions: a single httpd installation.
DNS directs all names to single IP and apache identifies name in HTTP request header.
for VAR in value1 value2 value3 …
do
# statements here
done
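For example, checking a few hosts with this construct (the host names are placeholders):
for HOST in web1 web2 web3
do
    ping $HOST 5 || echo "$HOST not responding"
done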
A temporary disk space where process pages are held when they are moved out of physical memory.
It is used when the system's memory requirements exceed the size of available RAM. The default
page size is 8KB.
Solaris defines swap space as the sum of total physical memory not otherwise used and physical
swap slice/file. This means swap is not just the physical swap space.
swap -s shows the size of virtual swap (physical swap slice + part of physical memory)
It is usually larger than physical memory because when the system crashes, it dumps all its memory
contents to the swap space. If the swap size is smaller than physical memory, the system will not be
able to dump the memory.
Tmpfs is a filesystem that takes memory from the available swap space (swap slice + part of RAM).
What it lists as size of swap is the sum of the space currently taken by the file system and the
available swap space unless the size is limited with the size=xxxx option in vfstab.
Solaris will "page out" VM pages of memory that haven't been accessed recently when more memory
is needed (Least Recently Used); that activity is called "paging".
Solaris will swap out entire processes when a critical low point in memory is reached, which is a less
efficient way to handle memory and is there only for memory emergencies. That is called
"swapping". Swapping is very unusual in Solaris and indicates a very severe memory shortage. For
swapping to occur, you must have either some idle processes, or a lot of processes.
# swap -d /dev/dsk/c1t0d0s3
# swap -d /export/data/swapfile
# rm /export/data/swapfile
Tmpfs file system is a FS that takes the memory from virtual memory pool. What it lists as size of
swap is the sum of the space currently taken by the FS and available swap space unless the size is
limited with the size=xxxx option. In other words, size of a tmpfs filesystem has nothing to do with
the size of swap; at most with the available swap.
Solaris defines swap as the sum total of phys memory not otherwise used and physical swap. This is
confusing to some who believe that swap is just the physical swap space.
The swap -l command will list the swap devices and files configured and how much of them is
already in use.
The swap -s command will list the size of virtual swap (physical swap plus physical memory). On a system
with plenty of memory, swap -l will typically show little or no swap space use but swap -s will show
a lot of swap space used.
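Adding swap is the reverse of the swap -d commands shown earlier (the file path is illustrative):
# mkfile 512m /export/data/swapfile
# swap -a /export/data/swapfile
# swap -l     (verify)
Add an entry to /etc/vfstab to make it permanent across reboots.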
Before:
# uname -a
SunOS homer 5.10 SunOS_Development sun4u sparc SUNW,Ultra-5_10
Run the script:
#!/usr/sbin/dtrace -s
/* on entry to uname(2), save the address of the caller's utsname buffer */
syscall::uname:entry
{
self->addr = arg0;
}
/* on return, overwrite each 257-byte utsname field in the user buffer */
syscall::uname:return
{
copyoutstr("SunOS", self->addr, 257);
copyoutstr("PowerPC", self->addr+257, 257);
copyoutstr("5.5.1", self->addr+(257*2), 257);
copyoutstr("gate:1996-12-01", self->addr+(257*3), 257);
copyoutstr("PPC", self->addr+(257*4), 257);
}
After:
# uname -a
SunOS PowerPC 5.5.1 gate:1996-12-01 PPC sparc SUNW,Ultra-5_10
First way:
Do a "prtdiag -v". If you get something like :
PCI 8 A 0 66 66 1,0 ok SUNW,emlxs-pci10df,fc00/fp (fp) LP10000-S
Then the "S" at the end of the card model tells you that you have a SUN branded HBA.
Second way:
Install EMLXemlxu package and run /opt/EMLXemlxu/bin/emlxdrv. It lets you install Sun emlx driver
or lpfc driver.
Sun branded Emulex cards can only use Sun emlxs driver.
1. Verify which disk drive corresponds with which logical device name and physical device name. Listed
below is the table for the v440 disk devices:
2. Verify that a hardware disk mirror does not exist. If it does, see infodoc 73040.
#raidctl
No RAID volumes found.
5. Verify that the device has been removed from the device tree
#cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t3d0 unavailable connected unconfigured unknown
c2 scsi-bus connected configured unknown
c2::dsk/c2t2d0 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
*NOTE that c1t3d0 is now unavailable and unconfigured. The disk's blue OK-to-Remove LED is lit.
*NOTE that the green activity LED flashes as the new disk at c1t3d0 is added to the device tree.
9. Verify that the new disk drive is in the device tree
#cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 scsi-bus connected configured unknown
c1::dsk/c1t0d0 disk connected configured unknown
c1::dsk/c1t3d0 disk connected configured unknown
c2 scsi-bus connected configured unknown
c2::dsk/c2t2d0 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
Using lsof -i shows incorrect mapping of TCP ports to processes that have sockets open: they show
as using port 65535. e.g.:
sshd 8005 root 8u IPv4 0x60007ebdac00t0 TCP *:65535 (LISTEN)
sendmail 1116 root 5u IPv4 0x60007ecce000t0 TCP *:65535 (LISTEN)
If you have a separate /var, this operation will happen after /var is unmounted and init complains:
INIT: failed write of utmpx entry:"s6"
INIT: failed write of utmpx entry:"rb"
You can safely ignore these messages
It is Solaris ps with an additional column showing I/O per process. It is a tool developed by Brendan
Gregg at http://www.brendangregg.com/psio.html.
The TOD clock or its battery might have gone bad. You have to replace the motherboard because the
TOD chip is soldered directly onto the motherboard.
/usr/sbin/lpfc/dfc> nodeinfo - displays the target number and all FC devices on the network
Use tcpdump, which has rotation of the output built in with the -C switch (-s 0 captures whole packets):
root@box# tcpdump -i <foo> -w something.pcap -C <number of megabytes> -s 0 <capture spec>
You can use Kingston, but Sun will not provide hardware support until you remove the third-party RAM.
Also, it can cause problems if you run SunVTS on the machine.
Command iostat -En gives the serial number of the disk. From there you can locate the disk.
Because the client didn't send a "FIN" to close the connection and went down abruptly. On the server,
that connection will remain in the ESTABLISHED state until the service is restarted or the connection
is closed manually.
It should contain a network entry instead of a subnet entry, e.g.
172.31.215.0 255.255.254.0 (wrong)
172.31.0.0 255.255.254.0 (right)
The disadvantage is that you can't list 2 subnets from the same network. In such cases, use the scripts
in /etc/rc.d to manually set the IP and netmask.
Less than 5% free space (minfree) forces space optimization - overhead for the system. The FS can
either try to minimize the time spent allocating blocks, or it can attempt to minimize space fragmentation.
Earlier Solaris versions had /usr/platform/`uname -i`/lib/libc_psr.so.1, but it is replaced with
/usr/sbin/ftpconfig in Solaris 10. It creates an anonymous ftp user and sets up its environment.
Either edit /etc/snmp/conf/snmpd.conf and comment out the private and public lines, or disable
/etc/rc3.d/S99ucd-snmp and /etc/rc3.d/S76snmpdx.
A reboot is required; otherwise only new processes will see the new timezone files. Any process that
was launched before the patches will have the old data in memory.
You can remove that directory without the problem.
Add set moddebug=0x80000000 to /etc/system. This may help reboot the server in case it is
stuck loading a particular driver; you can then exclude that driver, e.g.: exclude: drv/qus
luxadm probe - shows the logical/multipathed disks
luxadm display <path_from_above_command> - shows real disk names and which paths are primary and secondary
/etc/powermt display dev=all
isainfo -b
/etc/release
#showrev
same but prtconf -b shows product, banner, family, model etc
ls -l shows a large size but ls -s shows very little, because ls -s shows the actual blocks consumed (a sparse file)
find . -size +400 -print
halt -d
It doesn't shut down all processes or unmount any remaining file systems.
SunOS (Berkley), Solaris (Sys V)
Bill Joy prepared 1BSD, 2BSD, vi, and the C shell in 1977/78 at UCB. He was a cofounder of Sun Microsystems.
OS Based on
SunOS 1.0 4.1BSD (1982)
SunOS2.0 4.2BSD (1985)
SunOS3.0 4.3BSD (1986)
SunOS 4.0 4.3BSD (1989) + a bit of Sys V (renamed Solaris 1)
Solaris 2 No BSD - all Sys V Rel 4 - 1992
SunOS 4.1.4 (Solaris 1.1.2) - 1994
The core of the Solaris OS is identified as SunOS 5. SunOS 5 (SVR4) is different from the original
SunOS 4.x (BSD).
Solaris 2.4 Incorporated SunOS5.4
Solaris 2.6 Incorporated SunOS5.6
Solaris 2.7 Incorporated SunOS5.7
Solaris 2.10 Incorporated SunOS5.10
1BSD came in 1977, 4.4BSD came in 1994. CSRG (Computer Systems Research Group) at UCB
developed it all the way. After 4.4 it was dissolved. Now FreeBSD and OpenBSD (focussing on
security) are available.
AT&T and Sun formed a company, Unix International, to develop SVR4 (Solaris 2). Sun left UI
after the release of SVR4. USL (Unix System Laboratories) at AT&T continued development of SVR4 and
was bought by Novell. HP, IBM and others formed OSF (Open Software Foundation) to oppose UI. This
was a big failure. Many vendors formed a consortium called X/Open Company Ltd to limit the number of
Unix flavors and devise the standards. UI merged with OSF in 1996. OSF then merged with X/Open
Company to form The Open Group. TOG worked with IEEE to set a single standard. TOG now sets
Unix standards and releases the Single Unix Specification. POSIX are IEEE standards, but IEEE is
expensive, hence industry preferred the Single Unix Specification.
Solaris 2.6 (SunOS 5.6) - Included Kerberos, PAM, TrueType fonts, WebNFS, large file support
Solaris 7 (SunOS 5.7) - First 64-bit UltraSPARC release, UFS logging
Solaris 8 - Multipath I/O, IPv6, IPSec, RBAC; last update was Solaris 8 2/04
Solaris 9 - iPlanet Directory Server, Resource Manager, Solaris Volume Manager, Linux compatibility
added, OpenWindows dropped
Solaris 10 - includes 64-bit x86 (AMD64) support, DTrace, Solaris Containers, Service Management
Facility, NFSv4, iSCSI, GNOME-based Java Desktop System as the default desktop, ZFS, GRUB for x86 systems
SPARC is big endian. 4A3B2C1D is stored at memory location with lowest address at 100. 100 (4A),
101 (3B), 102 (2C), 103 (1D). 4A is most significant byte and is stored at lowest address
no
in.mpathd
Yes, but it can't have different types such as Ethernet and ATM.
number of inodes = FS size / nbpi (bytes per inode)
FS Size    nbpi (bytes per inode)
<=1GB      2048
<2GB       4096
<3GB       6144
<1TB       8192
>1TB       1048576
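Worked example: a 4 GB file system at the default nbpi of 8192 gets about 524288 inodes:
# echo "4 * 1024 * 1024 * 1024 / 8192" | bc
524288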
No. Whole FS has to be recreated.
16TB
32767
Space efficient or time efficient
     on a file           on a directory
r    read the file       list the directory contents
w    edit the file       create/delete files in the directory
x    execute the file    access a file of a given name in the directory, but does not allow a directory listing
t    (sticky bit)        although you have rights on the directory, you can remove only your own files
r and no x on a dir - you can list the contents of the dir but can't access them in any way; ls will work
but ls -l will not.
x and no r on a dir - you can't list the contents of the dir, but you can cd into it; if you know a file's
name, you can access it.
setuid on a directory is usually ignored by the OS but honored by FreeBSD: any new file/dir created in
that dir uses the directory's user id as its user id, and new items have setuid turned on.
setgid on a directory: new files are created with the directory's group id, with setgid set.
It can be executed with the userid/gid of the owner of the file. "s" means the file also has the execute
bit set; "S" means the file doesn't have the execute permission.
New: SVM, Solaris Resource Manager, Solaris Secure Shell, IPSec with Internet Key Exchange (IKE),
soft disk partitions, patch management software
Enhanced: system crash dump utility replaced with mdb, IPMP, NFS, mkfs, Linux compatibility
Removed: devconfig (x86), Kerberos client version 4, crash utility
New: Zones, PostgreSQL, Webmin, PDA support, Service Management Facility, Solaris ZFS file system,
Cluster Volume Manager, iSCSI, Java Web Console, NFSv4, SATA support, Solaris Dynamic Tracing
(DTrace), kernel module debugger, Solaris IP Filter firewall, 64-bit AMD64 support, 10Gb Ethernet
Enhanced: tasks, projects, accounting, SSH, IPSec, TCP Wrappers, 64-bit computing, DHCP, SNMP,
IPv6
Removed: admintool, swmtool, DNS - bind8 replaced by bind9, System V Release 3 support, SVM -
transactional volume (trans metadevice) replaced by UFS logging
ls -l /dev/cfg
When the system panics, it writes out the contents of physical memory to a predetermined dump
device. On reboot, a startup script (/etc/init.d/savecore) calls the savecore utility if enabled. It will
make sure the crash dump corresponds to the running OS and then copy the crash dump from the dump
device to the savecore directory in 2 files, unix.n and vmcore.n (n increasing sequentially). dumpadm
configs are stored in the /etc/dumpadm.conf file. Crash dumps are usually 35% of physical RAM, but in
some cases they may go up to 80% to 90%.
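Typical dumpadm usage to inspect and change these settings (the device name is a placeholder):
# dumpadm                              (show current configuration)
# dumpadm -d /dev/dsk/c0t1d0s1         (set the dump device)
# dumpadm -s /var/crash/`hostname`     (set the savecore directory)
# dumpadm -c kernel                    (dump kernel pages only)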
Faulty hardware, or a software bug in drivers or modules.
It is created when an application crashes. It is a snapshot of the RAM allocated to a process. Its
configuration is saved in /etc/coreadm.conf.
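Example coreadm usage (the naming pattern shown is just one common choice):
# coreadm                                      (show current settings)
# coreadm -g /var/core/core.%f.%p -e global    (global cores named after program and PID)
# coreadm -e process                           (also keep per-process core files)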
who -r -> . run-level 3 Dec 13 10:10 3 0 S. The current run level is 3 (since Dec 13 10:10); 0 is the
number of times the system has previously been at this run level since the last reboot; S is the previous run level.
In single user only few file systems are mounted whereas in run level 1, all available file systems are
accessible but user logins are disabled.
/etc/system - scsi_options
Defunct (zombie) processes are processes that have exited but whose exit status has not yet been
collected by their parent process.
It collects many /etc files, details of storage, disk firmware levels, and showrev -p and pkginfo -l output.
This output can then be fed into the patchdiag tool for patch analysis. It generates its output in /opt.
SSH is a recently designed, high-security protocol. It uses strong cryptography to protect your connection against
eavesdropping, hijacking and other attacks. Telnet and Rlogin are both older protocols offering minimal security.
* SSH and Rlogin both allow you to log in to the server without having to type a password. (Rlogin's method of
doing this is insecure, and can allow an attacker to access your account on the server. SSH's method is much
more secure, and typically requires the attacker to have gained access to your actual client machine.)
* SSH allows you to connect to the server and automatically send a command, so that the server will run that
command and then disconnect. So you can use it in automated processing
A signal is a message which can be sent to a running process. Signals can be initiated by programs, users, or
administrators. For example, the proper method of telling the Internet daemon (inetd) to re-read its configuration
file is to send it a SIGHUP signal. There are 45 signals in total in Solaris.
Where do you get Disksuite for Solaris 9?
Equivalent of vxprint
How do you clear metadevice configurations?
Why will you use SVM instead of VxVM for boot disk?
VxVM DRL
what is NDU?
How are APM and NDU related?
metastat
#metaclear
1. No advantage in using VxVM except mirroring, and SVM also provides mirroring (without encapsulation),
so this advantage of VxVM is overcome.
2. Free, as compared to an expensive VxVM license.
3. Upgrades are simple as compared to VxVM.
You can grow but not shrink a UFS. You can grow a UFS only if you can increase the size of the
partition it lives on, using the following command:
/usr/lib/fs/ufs/mkfs -G -M /current/mount /dev/rdsk/cXtYdZsA newsize_in_512byte_blocks
This can be done online while the filesystem is mounted and in use.
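Worked example, growing a file system to 2 GB (the device and mount point are placeholders); 2 GB is
4194304 512-byte blocks:
# echo "2 * 1024 * 1024 * 1024 / 512" | bc
4194304
# /usr/lib/fs/ufs/mkfs -G -M /data /dev/rdsk/c0t1d0s5 4194304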
VxVM steals several cylinders from swap (in case there are no free cylinders left) to create the private
region. This causes the following problems:
1. No protection for private region because it is in the middle of the disk and pub region encompasses
the whole disk. [VxVM finds a way around by creating rootdiskPriv subdisk.]
2. Reduced flexibility of configuration because pub area is divided into a before and after private region
3. Protection of VTOC (block zero) from being overwritten. This is achieved by creating rootdisk-B0
subdisk.
All of VxVM utilities are located in /usr. The only VxVM components located in root are kernel drivers
and vxconfigd. If ever /usr can’t mount, there is very little that can be done with only root mounted.
An overlay partition includes the disk space occupied by root mirrors (rootvol, swapvol, varvol, usrvol).
During boot, before these volumes are fully configured, the default volume configuration uses the
overlay partition to access the data on the disk.
Using vxresize:
#vxresize -g datadg -F vxfs -b app 10g c3t0d0 c4t1d0
Using vxassist:
#vxassist -g datadg -F vxfs -b growto app 10g
#vxassist -g datadg -F vxfs -b growby app 5g
Using vxvol:
#vxvol -g datadg -F vxfs -b set len=1024658 app
While shrinking the volume, do not shrink below the size of the file system: first shrink the file
system, then shrink the volume. vxresize also resizes the file system, whereas vxassist doesn't.
Using vxresize:
#vxresize -g datadg -F vxfs -b app 10g c3t0d0 c4t1d0
Using vxassist:
#vxassist -g datadg -F vxfs -b shrinkto app 10g
#vxassist -g datadg -F vxfs -b shrinkby app 5g
Deport the disk group from the first system with the -h option (new host name)
Import the disk group on the new system
#vxedit -g datadg rename olddiskname newdiskname
1. Using vxdiskadm to replace a failed disk:
vxdiskadm command requires two attempts to replace a failed disk. The first attempt can fail with a message of the form -
/usr/lib/vxvm/voladm.d/bin/disk.repl: test: argument expected
The command is not completed and the disk is not replaced. If you rerun the command using option 5, the replacement
successfully completes.
5. patchadd of the vxfs 4.1 MP1 patch 119302-02 fails if 119254-24 is installed on Solaris 10. The reason: a
few variables are not set in the pkginfo and pkgmap files, so it tries to unset them when it installs the patch (they're set to true).
Modify the pkginfo file to include variable=true. It might give an error about the patch being corrupt; modify
the pkgmap file entries for the pkginfo file to match the new size and chk values. Both variables have to do
with zones.
ftp ftp.veritas.com
login: anonymous
passwd: your email address
cd pub/support
( Note: this is a blind directory, you will not be able to see any files here )
bin
get vxexplore.tar.Z
bye
Once you get this, uncompress & untar the file and follow the instructions in the README to generate the explorer
output. IMPORTANT: When asked to Restart VxVM Configuration Daemon? [y,n] (default: n) type n.
ftp ftp.veritas.com
login: anonymous
passwd: your email address
cd /incoming
bin
put <filename>. 290-174-344
bye.
#vxvol rdpol
#vxmend off|on plexname
#vxmend fix clean plexname
#ssaadm -t 1|2|3 stop|start controller
#vxrecover -s or #vxrecover -s volname
#vxmake plex plexname sd=subdiskname
#vxmake sd sdname diskname,starting_block,total_number_of_blocks
#vxdctl mode
As it is a kernel thread, you can't see it with ps; hence you have to use the vxiod command to see
whether it is running.
#vxprint -vl OR #vxprint -l volname OR #vxinfo vol-name
#vxprint -pl OR #vxprint -l plexname
#vxprint -st OR #vxprint -l sdname
#vxprint -vpshm > file
#vxmake -d file
#vxmend -g dg fix stale plexname
#vxmend -g dg fix clean plexname
#vxvol -g dg startall
#vxdctl initdmp
#vxdctl add disk c0t0d0s6
#vxdctl add disk c1t0d0s6
#vxassist -b make volname size layout=stripe mirror=yes disknames
#vxassist -b make volname size mirror=yes disknames
• Verify link is up
[root@elonxvcdbmsd1 /]# cat /proc/scsi/lpfc/0
Emulex LightPulse FC SCSI 7.1.14
Emulex LightPulse LP10000 2 Gigabit PCI Fibre Channel Adapter on PCI bus 07 device 48 irq 40
SerialNum: VM51733449
Firmware Version: 1.90A4 (T2D1.90A4)
Hdw: 1001206d
VendorId: 0xfa0010df
Portname: 10:00:00:00:c9:46:83:1f Nodename: 20:00:00:00:c9:46:83:1f
Link Up - Ready:
PortID 0xf0c00
Fabric
Current speed 2G
lpfc0t00 DID 063c00 WWPN 50:06:04:84:4a:37:2d:12 WWNN 50:06:04:84:4a:37:2d:12
Link is up and card zoned in correctly.
• Verify the disk doesn’t exist using fdisk, inq, vxdisk list
• Run lun_scan to locate the luns on the system
• Get LUN number of new disk using vxinq, inq.linux, fdisk
• cat /proc/scsi/scsi to see the LUNs available on the system
• Write the labels on newly detected disks as follow
• Make VxVM detect new disks using vxdctl enable. If you get the following error, then proceed as mentioned below
[root@elonxvcdbmsd1 /]# vxdctl enable
If you get this message... VxVM vxdctl ERROR V-5-1-307 vxconfigd is not running, cannot enable
[root@elonxvcdbmsd1 /]# vxconfigd -m disable
[root@elonxvcdbmsd1 /]# vxdctl init
[root@elonxvcdbmsd1 /]# rm -f /etc/vx/reconfig.d/state.d/install-db
[root@elonxvcdbmsd1 /]# vxdctl enable
[root@elonxvcdbmsd1 /]# vxdisk -o alldgs list
• Initialize the disk, add it to diskgroup and it is ready for use
Serial number of Symmetrix/DMX from where the LUN is coming, type of RAID on LUN (defined by
storage team).
vxdisk list diskname, format, inq should show two paths.
vxdctl enable, devfsadm
lputil
QIO releases POSIX locking for the files under QIO control, making writes execute concurrently and thus faster.
The other advantage is that QIO will stop doing the file system buffering for those files, thus freeing up more
memory for the database system's internal buffers. The CPU processing-time overhead in the I/O path is
minimal when QIO is enabled.
Trial version means that VxVM will fail at boot without the license keys.
The packages are the same for all the features of VxVM; it is driven by what license key you install.
#vxedit -g dgname -v rename old_volname new_volname
Restarting vxconfigd should solve the problem.
You can use vxdiskadm option 7 or manually do what it will do by creating a /etc/vx/disks.exclude file
that lists luns.
Check with /etc/vx/diag.d/vxdmpinq /dev/pathname to see the output. Next try below:
# cfgadm -o show_FCP_dev -al - check the output to see any "unusable" disk
# cfgadm -o unusable_FCP_dev -c unconfigure c3::50001fe15005e90a - to remove the disk path. If it
doesn't remove it, use -f (force)
# devfsadm -C - to remove any device files if the devices are gone
Solaris 8 might need a reboot, 9 may not; it also depends on the drivers whether a reboot is needed.
Edit the file /usr/lib/vxvm/bin/vxroot. Around line 138 you'll see code like this:
if [ $? -eq 0 -a -n $bus_drivers ] ;
Add quotes around $bus_drivers so it looks like this:
if [ $? -eq 0 -a -n "$bus_drivers" ] ;
To recover from this (if this is the problem) without reinstalling you can either boot to network or media
and add these lines to /etc/system:
rootdev:/pseudo/vxio@0:0
set vxio:vol_rootdev_is_volume=1
When you remove a disk, its plexes are marked as bad. When you plug it back in without removing the
other disk, VxVM starts up, sees it has 2 disks, both have rootvol, but one is marked as bad on the
other disk. If you physically pull out a disk, you must never have both disks in at the same time during
a subsequent boot. You can put it back in *after* boot and reinitialize it after everything is good, but it
will be forever tainted in a dual-boot situation, without further action.
FS must be mounted.
ASL - Array Support Library - they allow DMP to properly claim a device, identify what type of array it
sits in, and basically tell DMP which sets of procedures to use to manage the paths to that device.
APM - Array Policy Module - these are dynamically loaded kernel modules that implement the sets of
procedures and commands that DMP must issue to an array to manage the paths to it. The base DMP
code comes with a set of default APMs for Active/Active or Active/Passive arrays. These APMs are
"generic" in nature. For arrays that require specific handling (and the Clariion is a perfect example
of that), DMP relies on array-specific APMs that implement procedures and commands that are specific
to that array.
'vxdmpadm listenclosure all', because that will show which enclosures DMP has identified and how it
claimed them (from the array_type column).
CLR-A/PF tells you that Clariion was claimed with 'explicit failover mode' (Clariion Failovermode 1).
A Clariion configured to Failovermode 2 would get claimed with array_type 'CLR-A/P'.
The ASL really gets used at device discovery, so anytime vxdisk scandisks or vxdctl enable (more involved)
gets called. Once the device is claimed, the ASL doesn't do anything; the APM effectively takes over.
Updating an APM online should also work. The commands that are specifically in the APM tend to relate
to path state management (i.e. how to trigger a LUN trespass, what to do following an I/O failure, how to
interpret array-specific sense data) and typically are not related to I/O load balancing.
http://www.symantec.com/enterprise/stn/index.jsp
https://forums.symantec.com/syment/blog/article?blog.id=Ameya
Non-Disruptive Upgrade - NDU is EMC's way of upgrading array firmware while the system is up.
The APM, analogous to its user-land counterpart the ASL, is tailored to handle array-specific problems
such as initiating failover and supporting array-specific technologies such as NDU (Non-Disruptive
Upgrade) from EMC.
#vxvol -g dgname rdpol prefer volname plexname
It indicates no need to create a file system on a volume and FS will not synch up if volume is mirrored
vxconfigd refers to the /dev/vx/config file. All VM changes occur through this interface.
5
vxdg list dgname
It is the size of the smallest private region in the disk group
vm operates properly but conf changes are not allowed.
it reads the kernel log to determine the current status of vm components and updates the config db
# vxdctl mode - displays vxconfigd status
# vxdctl enable - enables
# vxdctl disable - disables
# vxdctl stop - stops
# vxdctl -k stop - sends kill -9
# vxdctl license - checks licensing
# vxconfigd - starts
/etc/vx/volboot contains hostid used to determine the ownership of disks for importing, values of
defaultdg/bootdg
# vxdctl init hostid
It is a temporary subvolume created during volume layout
• vxassist relayout: For non layered volume to non layered volume
• vxassist convert: For non layered to layered or vice versa
It invokes devfsadm to ensure the OS recognises the disks, then invokes vxdctl enable, which rebuilds
the volume and plex device node directories.
# vxdisk scandisks new
or
#luxadm -e forcelip /dev/cfg/c2 (2nd controller)
FAILING: public region has uncorrectable I/O failures but vm can still access private region
FAILED: vm can't access private region or public region
#vxconfigd -m disable - starts vxconfigd in disabled mode
#vxconfigd -m boot - handles boot time start up of vm. Starts rootdg and root volumes
#vxconfigd -m enable - starts vxconfigd in enabled mode. It loads rootdg, scans all
known disks for disk groups and imports those DG. sets up entries in /dev/vx/dsk and
/dev/vx/rdsk
The default installation of VM rootability places the private region somewhere in swap by stealing a few
cylinders. Because of this, the private region can't be represented by a Sun slice (a slice has a start
cylinder and length). VM maps the entire disk to the public region and creates the private region slice
in its middle. The private region is now in the address space of both the public and private regions. To
prevent data volumes from being created out of the space occupied by the private region, VM creates a
special subdisk 'on top of' the private region section, called 'rootdiskPriv'. It exists solely to mask
off the private region.
Every disk has a VTOC at the first addressable sector of the disk, block zero. So that this sector is
protected and not overwritten, VM creates a special subdisk rootdisk-B0 on top of the VTOC, which
persists even if rootvol is removed.
The volumes on the root disk can't use dirty region logging.
CLEAN
• “EMPTY” state of plex is achieved only by creating a new volume using vxmake.
• “CLEAN” state of plex means plex is good, volume is not started (no I/O)
• “STALE” state indicates that the plex is not synchronized with data in the CLEAN plex (could be
because of taking the plex offline, disk failure, etc.).
• “OFFLINE” state indicates that plex is not participating in any I/O.
• ”NODEVICE” indicates that disk drive below the plex has failed.
• “REMOVED” means sys admin has requested the device to appear as if it has failed.
• “IOFAIL” indicates that IO has failed but VxVM is unsure whether the disk has failed or not
(NODEVICE).
• “SYNC” state of volume indicates that the plexes are involved in read-writeback or RAID5
synchronization.
• “NEEDSYNC” state of volume is same as SYNC but internal read thread has not been started.
If VM can still access the private region on the disk, it marks the disk as FAILING. The plex with the
affected SD is set to IOFAIL. Hot relocation relocates the affected subdisk. If VM can't access the
private region, it marks the disk as FAILED. All plexes using the disk are changed to the NODEVICE
state. Hot relocation occurs.
#vxrecover -sn
Use command vxvol -g dgname -f start volname to force start only on non-redundant volumes. If
used on redundant volumes, data can be corrupted unless all mirrors have the same data.
To manually reset or change the state of a plex or volume. Volume must be stopped to run it.
Start using vxrecover -s instead of vxvol start because it starts both the top-level volumes and the
subvolumes. Vxvol start starts only top level volume.
It was in the ACTIVE state prior to the failure.
mount and access a volume (using one plex at a time). Offline/online plex using vxmend.
Offline all but one plex and set the plex to CLEAN. Run vxrecover -s. Verify data on the volume. Mount
the file system as read-only so you do not have to run a FS check. Run vxvol stop. Repeat for each plex
until you identify the plex with the good data.
o plex vol01-01 is RECOVER and vol01-02 is STALE.
o Because the state of plex vol01-01 is RECOVER, it was in the ACTIVE state prior to the failure.
o Because state of plex vol01-02 is STALE, vol01-01 was the plex with good data prior to failure.
o Set all the plexes to STALE
#vxmend fix stale vol01-01
#vxmend fix stale vol01-02
o Set the good plex to CLEAN
#vxmend fix clean vol01-01
o Run vxrecover
#vxrecover -s vol01
o Offline all but one plex and set that plex to CLEAN, run vxrecover and verify the data on volume.
#vxmend off vol01-02 (bring second plex offline)
#vxmend fix clean vol01-01 (set first plex CLEAN)
#vxrecover -s vol01 (verify data after this step)
#vxvol stop vol01 (stop volume to bring 1st pl offline & 2nd pl online)
#vxmend -o force off vol01-01 (bring first plex offline)
#vxmend on vol01-02 (bring second plex online)
#vxmend fix clean vol01-02 (set second plex CLEAN)
#vxrecover -s vol01 (verify data after this step)
If vol01-01 has the correct data - stop the volume, mark the 2nd plex stale, bring the 1st online and set it CLEAN, then recover:
#vxvol stop vol01 (stop the volume to change plex status)
#vxmend fix stale vol01-02
#vxmend on vol01-01
#vxmend fix clean vol01-01
#vxrecover -s vol01
If you have only one partition free, then select the CDS disk layout. If you have 2 free partitions, you
can use sliced. The disk must contain an s2 slice that represents the full disk (the s2 slice can't contain
a FS), plus 2048 sectors of unpartitioned free space, either at the beginning or at the end of the disk, for
the private region.
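To pick the layout explicitly when initializing a disk (a sketch; the device name c1t1d0 is illustrative):
#/etc/vx/bin/vxdisksetup -i c1t1d0 format=cdsdisk (CDS layout, the default)
#/etc/vx/bin/vxdisksetup -i c1t1d0 format=sliced (sliced layout, needed e.g. for boot disks)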
Same as data disks. In addition, it requires 2 free partitions for the public and private regions. The private
region is created at the beginning of the swap area, and the swap partition begins one cylinder from its
original location.
Never expand or change the layout of boot volumes. No volume in bootdg should be expanded or
shrunk, because they map to physical underlying partitions on the disk and must be contiguous.
These volumes must be located in a contiguous area on the disk, as required by the OS, which means they
can't use striped, RAID-5, concatenated-mirrored or striped-mirrored layouts.
The first swap volume must be contiguous and meet the same conditions as the rest of the OS volumes. A
second swap volume can be non-contiguous and use any layout.
A boot disk can't be a CDS disk.
Though both disks contain the same data, it is not necessarily placed at the exact same location on each disk.
vxunroot
All but one plex of rootvol, swapvol, usr, var, opt and home must be removed using vxedit or vxplex.
One disk in addition to the boot disk must exist in the boot disk group.
To boot from the physical system partition.
If you are upgrading only the VM packages, including the VEA package.
VMSA doesn’t run with VM 3.5 and above
S25vxvm-sysboot - determines whether root/usr are volumes, starts vm restore daemon, starts
vxconfigd in boot mode, creates disk access records for all devices, starts rootvol and usr volumes.
S30rootusr - mounts /usr read-only and checks it for any problems
S35vxvm-startup1 - starts special volumes such as swap and /var, sets up the dump device
S40standardmounts - mounts /proc, adds physical swap devices, remounts root and /usr
S50devfsadm - configures the /dev and /devices trees
S70buildmnttab - mounts file systems such as /var, /var/adm and /var/run
S85vxvm-startup2 - starts vxiod, changes vxconfigd to enable, imports disk groups, initializes DMP,
reattaches drives that were inaccessible when vxconfigd first started using vxreattach, starts all volumes
using vxrecover -n -s without recovering them
S86vxvm-reconfig - Performs operations defined by vxinstall and vxunroot, uses flag files to determine
actions, adds new disks, performs encapsulation
link_down timeout - the time for which the HBA waits before reporting a link as down. Should be the same as
dmp_failed_io_threshold.
link_retry interval - the number of retries before reporting the link as down.
The FS makes I/O requests to the OS SCSI driver, which reformats them and passes them to an HBA driver.
HBA drivers treat I/O requests as messages which they send between source and destination without
interpretation.
FS I/O requests to virtual volumes are actually fielded by VxVM. It creates equivalent requests to
physical disks or LUNs and issues them to the OS drivers.
The FS makes I/O requests to VM. VM makes its I/O requests to metadevices presented by DMP. DMP
selects an I/O path for each request and issues the request to the OS.
One way is to reset the DDI_NT_BLOCK_WWN indicator for all paths except one and create a metanode for
that path; ATF from EMC is an example.
The other way is to represent devices with their own metanodes with a distinct name pattern such as
cXtWWWdXsX, with the DDI_NT_BLOCK_WWN indicator left on; Sun's MPxIO is an example. DMP can create its
own metanodes linked to a path-suppressing path manager's pseudo-devices and co-exist with the other PM
for most purposes. But because each pseudo-device appears to DMP as a single-path device, DMP
performs no useful function.
One way is to leave the sub-paths detected by the OS unmodified and add their own metanodes to the
device tree - effectively three device entries for each path.
The other way is to leave the OS sub-paths intact and insert their metanodes in a separate file system
directory. DMP can't coexist with such path managers when no APIs are available. EMC PowerPath behaves
in this way, and DMP can coexist with PowerPath because of API availability.
DMP discovers both the sub-paths and the metanodes of non-path-suppressing path managers, so DMP
and the other PM might both attempt to manage access to the same devices, with obvious conflicts. Use the
foreign device concept to avoid this: the vxddladm addforeign command declares a device to be foreign.
DMP does not control path access to foreign devices, but VM can still incorporate them in disk groups
and use them as volume components.
3.0.2 - 1 TB, 3.5 - 32 TB, 4.x - 256 TB. This assumes an 8K block size. If you want >32 TB, you need a
Storage Foundation license key; standalone VxFS won't turn it on.
vxconfigd -k
It accepts and executes I/O requests to a single LUN on two or more ports simultaneously.
An A/P disk array accepts and executes I/O requests to a LUN on one or more ports of one array controller
(the primary) but is able to switch access to the LUN to alternate ports (secondary) on other controllers. In
addition there are 3 sub-categories:
Multiple Primary Paths (A/PC): accepts and executes I/O requests to a LUN on 2 or more ports of the same
array controller.
Explicit Failover: all primary I/O paths fail over to secondary I/O paths either on receiving an explicit
command, or when an I/O request is sent to a secondary path explicitly. Useful in clusters using A/P arrays.
LUN Group Failover (A/PG): a group of LUNs can fail over from primary to secondary simultaneously.
/etc/vx/diag.d/vxdmpinq /dev/vx/rdmp/HDS9970V0_4s2
When DMP finds only one path for a LUN, it links its metanode with its OS device tree entry. This path is
called the fast path, and I/O is sent down it.
VxFS 24K and UFS 8K
vxio, vxspec, vxdmp
/usr/lib/vxvm/bin and /etc/vx/bin
/etc/init.d/isisd start/stop/restart
vxsvc -k, kill `cat /var/vx/isis/vxisis.lock`
qlc.conf, fcaw.conf, lpfc.conf all in /kernel/drv
In order to create more manageable file systems or partition sizes, disks or logical volumes may need to be
subdivided into more than eight partitions. This is achieved by SVM's soft partitions.
You should build a volume on top of disk slices, then build soft partitions on top of the volume. This
strategy allows you to add components to the volume later and then expand the soft partitions as needed.
For example, you could create 1000 soft partitions on top of a RAID-1 or RAID-5 volume so that each of
your users can have a home directory on a separate file system. If a user needs more space, you could
simply expand the soft partition, as sketched below.
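A minimal sketch of that scenario, assuming a mirror volume d10 already exists:
#metainit d100 -p d10 1g (create a 1 GB soft partition d100 on top of volume d10)
#newfs /dev/md/rdsk/d100 (build a file system on the soft partition)
#metattach d100 1g (later, expand the soft partition by another 1 GB)
#growfs -M /export/home/user1 /dev/md/rdsk/d100 (grow the mounted file system into the new space)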
You can do this with ZFS and SVM.
What is MBR?
What is SMT?
What is hyper-threading?
What is SMP?
mii-tool -v. If mii-tool doesn't work, then try dmesg | grep eth0
/sbin/pump -i eth0 --status, OR netstat
Edit /etc/sysconfig/network-scripts/route-eth0
GATEWAY0=10.10.0.1
NETMASK0=255.0.0.0
ADDRESS0=10.0.0.0
/etc/sysconfig/network
#route add default gw <gateway-ip>
Using mii-tool
mii-tool -F 100baseTx-FD eth0
OR ethtool:
ethtool -s eth0 speed 100 duplex full autoneg off
Edit /etc/sysconfig/network-scripts/ifcfg-eth0 and add the following:
ETHTOOL_OPTS="speed 100 duplex full autoneg off"
Or netcfg OR netconfig
RedHat: /etc/sysconfig/network-scripts/ifcfg-eth*
SuSE: /etc/sysconfig/networks/ifcfg-interface
#redhat-config-network (prompts GUI if in graphical mode or text if in cli mode)
Edit /etc/xinetd.d/telnet (set disable = no) and restart xinetd
echo 1 > /proc/sys/net/ipv4/ip_forward
/etc/sysctl.conf
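To make IP forwarding persist across reboots, the matching /etc/sysctl.conf entry and reload would look like this (a minimal sketch):
net.ipv4.ip_forward = 1
#sysctl -p (reload the settings from /etc/sysctl.conf)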
rpm -i kernel.package.rpm (use the "-i" option rather than the "-U" option, so the old kernel is kept as a fallback)
Boot loaders
Initrd is a temporary root file system that is mounted during system boot to support the 2nd stage of the two-
stage boot process. It contains various executables and drivers that permit the real root file system to be
mounted, after which the initrd RAM disk is unmounted and its memory freed.
It is the first 512 bytes of bootable media: 446 bytes of boot loader, 64 bytes of partition table (4 partitions),
and 2 bytes of magic number (integrity check).
It can store the boot record of only one OS; hence, for multiple OSs, boot loaders are used.
GRUB knows about file systems; LILO doesn't. LILO uses raw sectors on the disk, whereas GRUB can load a
Linux kernel from an ext2/3 file system.
GRUB also lets you amend the parameters of the selected kernel before booting it.
Yes, it can only be increased (not reduced) using ext2online, and only while the file system is
mounted. The FS size can only be increased up to 1000 times its original size (100 MB x 1000 = 100000 MB).
The initrd file is a compressed cpio archive of the temporary root FS. To view its contents, copy it to a
directory and name it with a .gz extension, then use gunzip to decompress it and extract the contents with
the cpio command. The file contains a nash script that is used to load kernel modules.
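A sketch of that procedure (the initrd file name depends on the installed kernel version):
#mkdir /tmp/initrd; cd /tmp/initrd
#cp /boot/initrd-2.6.18-8.el5.img initrd.gz (copy it and name it with a .gz extension)
#gunzip initrd.gz (decompress it)
#cpio -idv < initrd (extract the temporary root FS, including the nash init script)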
Create physical volumes from the hard drives
Create volume groups from the physical volumes
Create logical volumes from the volume groups and assign the logical volumes mount points
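A minimal sketch of those three steps (device names, sizes and the mount point are illustrative):
#pvcreate /dev/sdb1 /dev/sdc1 (create physical volumes from the partitions)
#vgcreate vg01 /dev/sdb1 /dev/sdc1 (create a volume group from the physical volumes)
#lvcreate -L 10G -n lv01 vg01 (create a 10 GB logical volume from the volume group)
#mkfs -t ext3 /dev/vg01/lv01 (build a file system on the logical volume)
#mount /dev/vg01/lv01 /data (assign it a mount point)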
Download the required kernel package and run rpm -ivh.
#tcpdump -i eth0 -w /var/tmp/network_traffic or
#tcpdump -i eth0
Comment out the following line in the /etc/pam.d/login file:
#auth required pam_securetty.so
1. Install packages – ypbind, portmap, yp-tools
2. Edit /etc/yp.conf to add domainname and NIS server name
3. Edit /etc/hosts to add entry for NIS server
4. Edit /etc/nsswitch.conf to specify the NIS order
5. Edit /etc/sysconfig/network to add NISDOMAIN name
6. Restart portmap
7. Restart ypbind
OR the same thing can be achieved through the command authconfig.
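The command-line side of steps 6-7, plus a quick verification, would look like this (a sketch, assuming the NIS domain is called mydomain):
#domainname mydomain (set the NIS domain for the running system)
#service portmap restart
#service ypbind restart
#ypwhich (verify which NIS server the client is bound to)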
5
It is run by init and it performs required low-level setup tasks such as setting the system clock, checking the
disks for errors and subsequently mounting file systems.
#service --status-all
Using chkconfig
The script must have the following format in order to be managed by chkconfig:
1. The first line indicates which shell is used to run the script
2. The second line is just a blank comment
3. The third line should be a comment that indicates in which runlevels the service should be started, as well
as the start/stop priority (the chkconfig line)
4. The fourth line should be a description of the service
The example is as below:
#!/bin/sh
#
# chkconfig: - 91 35
# description: Starts and stops the Samba smbd and nmbd daemons \
# used to provide SMB network services.
chkconfig doesn't make immediate changes to a service; instead it makes the changes persistent for the next
reboot. The service command is exactly the opposite: it takes effect immediately but is not persistent.
Write a script with chkconfig and description header lines and start, stop, restart and status parameters, then
manage it using chkconfig later on, as shown below.
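Managing such a script with chkconfig would then look like this (a sketch for a hypothetical service named myservice):
#chkconfig --add myservice (register the script placed in /etc/init.d)
#chkconfig --level 35 myservice on (enable it for runlevels 3 and 5)
#chkconfig --list myservice (verify the per-runlevel settings)
#service myservice start (start it now; chkconfig alone only affects the next reboot)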
RHEL uses lock files to indicate the status of a service. The RC script for a service touches a lock file
when it is started and removes the lock file when it is stopped.
/var/lock/subsys/<service-name>
This file indicates that the service should be running. #service <name> status checks the PID as well as the
file. If the PID is not found but the file is found (locked), it gives a message:
<service> dead but subsys locked
This lock file is useful when changing the run level, to bring the services down gracefully.
SMT - Simultaneous Multithreading (hyper-threading for the Intel P4). It permits multiple independent threads of
execution to better utilize the resources provided by the processor.
Temporal multithreading allows multiple processes and threads to utilize the processor one at a time, giving
exclusive ownership to a particular thread for a time slice (cycle) on the order of milliseconds. Quite often the
exclusive owner process will wait for some external resource, and those cycles are left unutilized.
Super-threading allows the processor to execute instructions from a different thread each cycle; cycles left
unused by one thread can be used by another that is ready to run.
SMT allows multiple threads to execute different instructions in the same clock cycle, using the execution units
that the first (owner) thread left idle.
It is officially called Hyper-Threading Technology (HTT). It is Intel's trademark for their implementation of
SMT on the P4. It debuted on the Intel Xeon and was later added to the P4.
An OS should have SMP (Symmetric Multiprocessing) support. SMP presents logical processors (created by
SMT) as standard separate processors to the OS and scheduler.
It may cause cache misses (data not found in the cache), branch mispredictions (incorrect prediction of the next
instruction following the execution of the current conditional statement), or data dependencies (instructions
referring to the results of preceding instructions that have not yet completed).
There is also a security threat wherein a malicious thread operating with limited privileges can monitor
the execution of another thread, allowing the possible theft of cryptographic keys.
It is an architecture in which each instruction can execute several low-level operations, such as a load from
memory, an arithmetic operation, and a memory store, all in a single instruction. Example is X86.
RISC favors a simpler set of instructions that all take about the same amount of time to execute. Example is
SPARC.
It refers to the byte ordering used in a one-dimensional addressing system. There are 2 types:
Big-endian - the big end goes in first - most significant byte first - SPARC
Little-endian - the little end goes in first - least significant byte first - x86
Big-Endian (32-bit value 0x4A3B2C1D stored at byte addresses 100-103):
As bytes:
| 100 | 101 | 102 | 103 |
... | 4A | 3B | 2C | 1D | ...
As 16-bit words (byte-addressed, at 100 and 102): ... | 4A 3B | 2C 1D | ...
As 16-bit words (word-addressed, at 100 and 101): ... | 4A 3B | 2C 1D | ...
Little-Endian (same value):
As bytes:
| 100 | 101 | 102 | 103 |
... | 1D | 2C | 3B | 4A | ...
As 16-bit words (byte-addressed, at 100 and 102): ... | 2C 1D | 4A 3B | ...
As 16-bit words (word-addressed, at 100 and 101): ... | 2C 1D | 4A 3B | ...
Persistent Resource:
Relation between parent/child resources?
What are the attributes of mount resources?
Symmetric cluster (active/active) - SGs run on both servers. Upon failure, an SG moves over to
the other server. 0% redundant hardware cost, but the performance of multiple SGs on a single
node is affected.
N-to-1 cluster - multiple servers, each with an individual connection to storage, and one spare
server with a connection to all storage. Failback is manual when the original server comes
back.
N+1 cluster - multiple servers, each running an SG, are connected to storage through a SAN
along with one redundant server. Failback is not an issue when a server comes back online,
because of the SAN storage; when the server comes back online, it becomes the spare.
N-to-N cluster - multiple servers, each running SGs, are connected to storage through a SAN
WITHOUT a redundant server. When a server faults, its SGs are failed over to the rest of the servers.
Stop HAD (use -force to keep services running)
#hastop -local
#/etc/rc2.d/S70llt start
#/etc/rc2.d/S92gab start
The LLT warning level specifies how much information is written to the console or syslog. The default value is 20
(LLT works silently and reports only timeout problems such as delayed heartbeats).
Warning level 0 means no reporting at all. It can be changed by:
An SG whose resources don't online or offline needs a Phantom agent to report the correct status.
A group containing a MultiNICA resource is one such group, since that resource doesn't go offline or online.
An SG which is online on more than one node at the same time.
A Proxy resource in a service group replicates the state of the resource it represents, reducing the
additional monitoring and hence the system load.
1. Create one group with MultiNICA resource to monitor the devices
2. In each of the other groups, proxy resource will represent above created MultiNICA
resource
3. In each of the other groups, an IPMultiNIC resource will use the MultiNICResName attribute to point to the
above MultiNICA resource. This will attach the additional virtual IPs to the same device for all the
service groups.
Change the “Device” attribute from global to local and assign individual values to that
attribute:
#hares -local MultiNICA1 Device
#hares -modify MultiNICA1 Device hme0 "15.48.56.3" qfe2 "10.10.10.10" -sys node1
#hares -modify MultiNICA1 Device hme0 "15.48.56.5" qfe2 "10.10.10.20" -sys node2
group NFS_group1 (
SystemList = { Server1, Server2 }
AutoStartList = { Server1 }
)
DiskGroup DG_shared1 (
DiskGroup = shared1
)
IP IP_nfs1 (
Device = hme0
Address = "192.168.1.3"
)
Mount Mount_home (
MountPoint = "/export/home"
BlockDevice = "/dev/vx/dsk/shared1/home_vol"
FSType = vxfs
FsckOpt = "-y"
MountOpt = rw
)
NFS NFS_group1_16 (
Nservers = 16
)
NIC NIC_group1_hme0 (
Device = hme0
NetworkType = ether
)
Share Share_home (
PathName = "/export/home"
)
IP_nfs1 requires Share_home
IP_nfs1 requires NIC_group1_hme0
Mount_home requires DG_shared1
Share_home requires NFS_group1_16
Share_home requires Mount_home
group grp1 (
SystemList = { node1, node2 }
AutoStartList = { node1 }
Parallel = 1
)
MultiNICA MultiNICA1 (
Device @node1 = { hme0 = "152.48.56.3", qfe2 = "152.48.56.3" }
Device @node2 = { hme0 = "152.48.56.4", qfe2 = "152.48.56.4" }
)
Phantom grp1phantom (
)
group grp2 (
SystemList = { node1, node2 }
AutoStartList = { node2, node1 }
)
IPMultiNIC IPMulti2 (
Address = "152.48.56.5"
MultiNICResName = MultiNICA1
)
Proxy MultiNICproxy (
TargetResName = MultiNICA1
)
IPMulti2 requires MultiNICproxy
Modify AutoStartList
#hagrp -modify grp1 AutoStartList node1
#hagrp -modify grp2 AutoStartList node2 node1
Change parallel attribute so that it is online on all the nodes at the same time
#hagrp -modify grp1 Parallel 1
Change the Device attribute from global to local to allow differing entries per node
#hares -local MultiNICA1 Device
Enable the resource:
#hares -modify grp1phantom Enabled 1
Add the virtual IP address to the Address attribute of the IPMulti2 resource:
#hares -modify IPMulti2 Address 152.48.56.5
Add the name of the MultiNICA resource to which the virtual IP address will be assigned; note
that it is the MultiNICA resource from grp1:
#hares -modify IPMulti2 MultiNICResName MultiNICA1
Add the name of the resource whose state the Proxy resource will replicate. In this case it
is the MultiNICA resource from grp1:
#hares -modify MultiNICproxy TargetResName MultiNICA1
hastatus -sum
Can't connect to server -- Retry later -> communication problem
gabconfig -a
Port a not listed? GAB problem - Check seed number in /etc/gabtab, start GAB /etc/gabtab.
Port h not listed? HAD problem - Verify main.cf file (hacf -verify config_dir)
lltconfig -a list
llt not running? LLT problem - Check console and log for missing or misconfigured LLT files,
Check LLT configuration files (llttab, llthosts, sysname), Start LLT (lltconfig -c), Ensure all
systems can see each other (lltstat -nvv), Verify physical connections as below:
Use /opt/VRTSllt/getmac /dev/ce:1 to get mac
Start server on one node: #/opt/VRTSllt/dlpiping -s /dev/hme:1
Ping server from other node: #/opt/VRTSllt/dlpiping -c /dev/hme:1 first_node_mac
hastatus -sum
Service groups/resource offline -> groups/resource problem
If all systems are in STALE_ADMIN_WAIT or ADMIN_WAIT, first validate the configuration file and
then enter #hasys -force node1. The other systems will perform a remote build automatically.
If VCS is started on a system with stale config file, and all other systems are in
STALE_ADMIN_WAIT state (INITING=>STALE_DISCOVER_WAIT=>STALE_ADMIN_WAIT)
hastatus -sum
Systems in WAIT state -> startup problem
STALE_ADMIN_WAIT
Visually inspect main.cf file and restore if necessary
Check main.cf file for syntax errors, and fix them (hacf -verify config_dir)
Start VCS (hasys -force system_name)
ADMIN_WAIT
Check main.cf file for syntax errors, and fix them (hacf -verify config_dir)
Start VCS (hasys -force system_name)
This file is typically left behind if VCS is stopped while configuration is still open. The .stale file
is deleted automatically if changes are correctly saved and will therefore not force the
relevant node into an ADMIN state. The file can be removed safely if the main.cf file is ok.
Start the other system with hastart without the -stale option. This will prompt it to build the cluster
configuration in memory from its old main.cf file on disk. The first node then builds its
configuration from in-memory configuration on second node, moves its main.cf to
main.cf.previous and then writes the old configuration that is now in memory to main.cf.
#hastop -all
#vi main.cf
#hacf -verify config_dir
#hastart
#hastatus -sum
#hastart -stale (on other systems)
This prevents data corruption (split brain) in a situation where a 3rd node is still running and its SG is
still online, but its LLT heartbeats are not arriving. If other nodes try to bring the same SG
online, it will cause a split brain.
Features of jeopardy membership (if a system is in JM and then loses its final LLT link):
1. SGs running in jeopardy membership are autodisabled in the regular cluster membership
2. SGs running in the regular membership are autodisabled in the jeopardy membership
3. Failover due to a resource fault is still effective
4. Switchover of an SG at operator request is still effective
Fix and reconnect the link. GAB detects the link is back online and removes the JM.
If the last LLT link fails:
- A new regular cluster membership is formed that includes only Sys1 and Sys2. This is
referred to as a 2 node mini-cluster
- A new separate membership is created for system 3, which is a mini-cluster with a single
system
- SGs from each cluster can’t failover to each other – network partition
- Since the two clusters can't communicate, each maintains and updates only its own version
of the cluster configuration, and the systems on different sides of the network partition have
different cluster configurations.
1. On the cluster with fewest systems, stop VCS and leave services running
2. Recable or fix LLT
3. Restart VCS. VCS autoenables all SGs so that failover can occur
GAB automatically stops HAD in each of the following scenarios:
- In a 2-node cluster, system with lowest LLT node number continues to run VCS and VCS is
stopped on other node
- In multi-node cluster, mini-cluster with most systems running continues to run VCS. It is
stopped on systems in the smaller mini-clusters
- If a multinode cluster is split into two equal size mini-clusters, the cluster containing the
lowest node number continues to run VCS
When one LLT link fails, the system enters jeopardy.
When both LLT links fail simultaneously:
- The cluster partitions into 2 separate clusters
- Each cluster assumes that the other systems are down and tries to start the SGs
- Both clusters try to start the SGs, causing data corruption (split brain)
A PENP (pre-existing network partition) occurs if LLT links fail while a system is down. If the system comes
back up and starts running services without being able to communicate with the rest of the cluster, a split
brain can occur.
VCS prevents the system on one side of the partition from starting HAD. When the system reboots,
the network failure prevents GAB from communicating with any other cluster systems, and
therefore the system can't be seeded.
1. VCS can't distinguish between a system failure and an interconnect (LLT) failure.
2. When a system is very busy, it may appear to be hung and seem to have failed.
3. On systems where the hardware supports a break and resume function: if the system is
dropped to the command-prompt level with a break and subsequently resumed, the system can
appear to have failed and the cluster reforms; the system then recovers and begins writing to
shared storage again.
Loss of heartbeats leads to the creation of a network partition, with multiple nodes racing for control of
the coordinator disks.
1. LLT on node 1 informs GAB that it has not received a heartbeat from node 2 within the timeout period.
2. GAB notifies the fencing drivers about the cluster membership change on both nodes. Both
nodes begin racing to gain control of the CO (coordinator) disks. Node 1 reaches the first CO disk and ejects
node 2's keys. The nodes can't knock each other out simultaneously, because SCSI command
tag queuing creates a stack of commands to process, so there is no chance of these 2 ejects
occurring at the same time. This means only one system can win.
3. Node 1 also wins the race for the second disk. Because node 2 lost the first race, the fencing
driver algorithm causes node 2 to reread the CO disk keys a number of times before it tries to
eject the other system's keys. This favours node 1, the winner of the first disk, to win the
remaining coordinator disks 2 and 3. Node 2 loses the race, calls a kernel panic to shut down
immediately and reboots.
4. Node 1 removes node 2's keys from the data drives with multiple kernel threads.
5. When ejection is complete, the fencing driver hands off the GAB membership change to
HAD.
6. HAD then performs whatever failover operations are defined for the SGs that were running
on the departed system.
1. Node 2 fails. LLT finds out via heartbeat timeouts, LLT informs GAB, GAB informs HAD, and HAD
informs the fencing driver vxfen.
2. Node 1 races to win all 3 CO disks and the data disks by ejecting node 2's keys.
3. vxfen informs VxVM to import the required disk group.
A PENP occurs when the cluster interconnect is severed and a node subsequently reboots to
attempt to form a new cluster. After the node starts up, it is prevented from gaining control
of the shared disks.
1. The cluster interconnect is severed. Node 1 remains up with its keys registered with the CO disks.
2. Node 2 starts up. Because LLT is severed, GAB doesn't know about node 1.
3. vxfen initializes on node 2; vxfen receives a list of the current nodes in the GAB membership (no
node 1) and also reads the keys present on the CO disks (node 1).
4. After comparing the two, vxfen determines that a PENP exists and prints an error message
to the console. The fencing driver prevents HAD from starting, which in turn prevents VxVM
disk groups from coming online.
To enable node 2 to rejoin the cluster, repair the interconnect and restart node 2.
I/O fencing uses the same key for all paths from a host. A single pre-empt and abort ejects a
host from all paths to storage.
1. Disks must support SCSI-3 persistent reservations
2. Must be within a separate disk group used only for fencing
3. Do not store data on CO disks
4. Deport this DG from all the nodes permanently
5. Use the smallest possible LUN
6. Configure HW mirrors of CO disks, because they can't be replaced without stopping the cluster
1. Create a DG for CO disks with 3 CO disks. Initialize and add them to DG.
2. Verify that the array is configured properly and supports SCSI3.
vxfenadm -i disk_dev_path
3. Verify that the disk groups support SCSI-3
vxfentsthdw -g CO_DG
vxfentsthdw -rg DATADG
The vxfentsthdw utility overwrites and destroys existing data on the disks by default; use -r
(read-only).
4. Deport the CO_DG permanently
vxdg deport co_dg
vxdg -t import co_dg
vxdg deport co_dg
5. Create /etc/vxfendg: echo "vxfencoorddg" > /etc/vxfendg
6. Start the fencing driver on each system using /etc/init.d/vxfen start. It creates the vxfentab file
with a list of all paths to each CO disk.
7. Stop VCS on all systems. Do not use the -force option; stopping VCS deports the DGs.
8. Set the UseFence attribute to the value SCSI3 (UseFence=SCSI3) in main.cf. You can't set this
dynamically while the cluster is running.
9. Start VCS on each system.
1. vxdisk -o alldgs list no longer shows the DGs that are imported on other systems
2. The format command shows disks with a SCSI-3 reservation as type unknown
3. vxdg -C import DGNAME to determine if a DG is imported on the local system; the
command fails if the DG is deported
4. vxdisk list shows the imported disks
The fencing driver must be stopped and restarted to populate the vxfentab file with the updated
paths to the replaced CO disks. This works as follows:
1. vxfen reads the vxfendg file to obtain the name of the CO DG
2. Runs grep to create a list of each device name (path) in the CO DG
3. For each disk device in this list, runs vxdisk list diskname and creates a list of each device
that is in the enabled state
4. Writes the list of enabled devices to the vxfentab file
This ensures that any time a system is rebooted, the fencing driver reinitializes the vxfentab
file with an up-to-date list of all paths to the CO disks.
PIT - all nodes are fenced off: node 1 fails first; node 0 fails before node 1 is repaired; node 1
is repaired and boots while node 0 is down; node 1 can't access the CO disks because node 0's keys
are still on the disks. To recover:
1. Verify node 0 is actually down, to prevent possible corruption
2. Verify the systems currently registered with the CO disks
vxfenadm -g all -f /etc/vxfentab
3. The output of this command identifies the keys registered with the CO disks
4. Clear all keys on the CO disks in addition to the data disks
/opt/VRTSvcs/vxfen/bin/vxfenclearpre
5. Repair the faulted system.
6. Reboot all systems in the cluster.
Atomic means either all systems receive the update or all revert to the previous state.
8
- Agents communicate with had
- The had processes on each node communicate status information by way of GAB
- GAB determines cluster membership by monitoring heartbeats transmitted from each
system over LLT
Both show information about interfaces up to 32 even though they are not physically present. To
remove them from the output, use exclude directives in the llttab file.
/etc/llttab - sets the node id, cluster id and links; a sample file is in the /opt/VRTSllt directory. It
will be different on each node because of the different node id.
/etc/llthosts - maps node ids to the system names mentioned in the llttab and main.cf files. It
will be the same on all nodes.
/etc/VRTSvcs/conf/sysname - if this file isn't present, VCS determines the local host name
using uname, which might be an FQDN or slightly different from the one mentioned in main.cf and
llthosts. Its presence removes the VCS dependency on Unix for the system name.
During initial startup, VCS autodisables an SG until all its resources are probed on all systems
in the SystemList that have GAB running. This prevents the SG from starting on any system. It
protects against a situation where enough systems are running LLT and GAB to seed the
cluster, but not all systems have HAD running: VCS doesn't know whether a service is
running on a system where HAD is not running.
It carries only heartbeat traffic for cluster membership and link state maintenance. The
frequency of heartbeats is reduced by half to minimize network overhead.
#hastop -all -force
#gabconfig -U
#lltconfig -U
#vi llttab, llthosts, sysname, gabtab files
#lltconfig -c
#gabconfig -c -n # (# is the number of systems)
#hastart
http://eval.symantec.com/mktginfo/products/White_Papers/High_Availability/agent_dev_by_example.pdf
A parallel service will still need different IP addresses on each node. It is better to put a load balancer in
front of the cluster; it will distribute the load between the parallel nodes.
By setting OnlineRetryLimit and ConfInterval using hatype.
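For example (a sketch; the Oracle resource type and values are illustrative):
#hatype -modify Oracle OnlineRetryLimit 2 (retry the online entry point twice before faulting the resource)
#hatype -modify Oracle ConfInterval 300 (seconds a resource must stay online before the retry counters reset)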
libvcsagfw.sl
# ls -al /usr/lib/libvcsagfw*
-rwxr-xr-x 1 bin bin 4584048 Aug 16 13:28 /usr/lib/libvcsagfw.1
lrwxr-xr-x 1 bin bin 21 Aug 25 14:31 /usr/lib/libvcsagfw.sl -> /usr/lib/libvcsagfw.1
An entry point is something an agent uses to perform its 4 functions: online, offline, monitor, clean. Offline is
systematic; clean is abrupt.
had is the agent manager. It checks agents to learn the status of resources. hashadow monitors HAD and
vice versa, and restarts it if it is not running.
A failover SG runs only on one node; a parallel SG runs on multiple nodes at the same time (e.g. Oracle RAC).
Data corruption is not a danger for a parallel SG.
A persistent resource can't be brought online or offline; it is always needed and hence online, e.g. NIC.
A parent resource (e.g. mount) depends upon a child resource (e.g. volume).
MountPoint, BlockDevice, FSType, FsckOpt (used when the mount fails: it runs fsck with the -y parameter and
mounts again), MountOpt (options like -ro)
LLT - handles kernel-to-kernel communication over the LAN heartbeat links
GAB - handles shared disk communication and messaging between cluster members
VCS - handles the management of services. It is started once nodes can communicate via LLT/GAB
/etc/llttab looks like:
set-node 0
link hme1 /dev/hme:1 - ether - -
link-lowpri hme0 /dev/hme:0 - ether - -
set-cluster 0
start
/etc/llthosts (links nodeid to systemnames) looks as below:
0 node1
1 node2
svcadm -t
boot -v
/var/svc/log
Fault Management Resource Identifier
There are 7: application, device, network, milestone, platform, site, system
On first boot, services listed in /etc/inetd.conf are automatically converted into SMF services.
The syntax for a converted inetd service is network/<service-name>/<protocol>; for the rpc protocol
it is network/rpc-<service-name>/rpc-<protocol>. Here the service name is the name defined in
/etc/inetd.conf.
degraded - enabled, but running with limited capacity
disabled - instance is disabled and not running
legacy_run - legacy services are not managed but are only observed by SMF. This state is used only for
legacy services
maintenance - service instance has encountered an error
offline - service instance is enabled, but the service is not running yet
online - service instance is enabled, and has started successfully
uninitialized - the initial state for all services before their config has been read
It is an xml file that contains a complete set of properties that are associated with a service or a
service instance. These files are stored in /var/svc/manifest. Manifests are read into service
configuration repository which is the authoritative source of configuration information. Manifest files
should not be edited directly.
It is an xml file that lists a set of services that are enabled when a system is booted. It is stored
in /var/svc/profile.
generic_open.xml - enables most of the standard internet services that were enabled by
default in earlier Solaris releases. It is the default profile.
generic_limited_net.xml - disables many of the standard internet services. The sshd service and
NFS services are started, but most of the rest of the internet services are disabled.
It stores persistent configuration information as well as SMF runtime data for services. It is
distributed among local memory and local files.
It is the service configuration repository daemon, svc.configd.
1. A boot backup is taken immediately before changes to the repository are made during each system startup.
2. A manifest_import backup occurs after svc:/system/manifest-import:default completes, if it imported any new
manifests or ran any upgrade scripts.
Four backups of each type are maintained by the system. They are stored as
/etc/svc/repository-<type>-YYYYMMDD_HHMMSS.
Use the /lib/svc/bin/restore_repository command.
The data in a service includes snapshots (data about each service) as well as a configuration that can
be edited. The standard snapshots are:
initial - taken on the first import of the manifest
running - used when the service methods are executed
start - taken at the last successful start
A service always executes with the running snapshot, which is automatically created if it does not exist.
svccfg
inetadm - observe and configure services controlled by inetd
svcadm - perform common service management tasks such as enabling/disabling/restarting
svccfg - display/manipulate the contents of the service configuration repository
svcprop - retrieves property values from the service configuration repository with an output format
appropriate for use in shell scripts
svcs - gives a detailed view of the service state of all service instances
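Typical usage of these commands (a sketch, using the ssh and telnet service instances):
#svcs -l ssh (detailed view of the ssh service instance)
#svcadm restart ssh (restart the service)
#svcprop -p restarter/state ssh (retrieve a single property value)
#inetadm -l telnet (list the properties of an inetd-controlled service)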
Master restarter daemon (svc.startd) and delegated restarters
svc.startd is the master process starter and restarter. It is responsible for managing service
dependencies for the entire system and does what init used to do (starting the appropriate /etc/rc*.d
scripts at the appropriate run levels). First it retrieves the information in the service configuration
repository; next, it starts services when their dependencies are met. It also restarts
services that have failed and shuts down services whose dependencies are no longer satisfied.
Delegated restarters take responsibility for managing services that have common behaviour. They can
be used to provide more complex or application-specific restarting behaviour. A current example is inetd,
which starts services on demand rather than having them always running.
#svcadm milestone all
boot -m verbose
/etc/svc/repository.db
svc.startd and svc.configd
It is a logical entity that owns a subset of the system resources, like CPU and memory. These subsets are
known as resource sets. Currently there is only one type of resource set - a processor set. So if you
want to give a pool its own unique CPUs, you need to define the processor set, the number of
processors it contains, and associate it with a pool.
All CPUs are initially part of the default resource pool. They are taken out of it when they are
allocated to other, dynamically created resource pools. You must leave at least one CPU in the default pool.
Resource sets contain processors.
Resource sets are attached to resource pools.
Resource pools are attached to zones.
A container combines a zone with its resource pools.
FSS is used when a single resource pool is shared by more than one zone. It allocates CPU
resources proportionally to guarantee a minimum requirement, as sketched below.
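A sketch of putting FSS into effect and assigning shares (the zone name and share value are illustrative):
#dispadmin -d FSS (make FSS the default scheduling class from the next boot)
#priocntl -s -c FSS -i all (move existing processes to FSS without rebooting)
#prctl -n zone.cpu-shares -v 20 -r -i zone zone1 (give zone1 20 CPU shares until next boot)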
There are 2 types: the global zone and non-global zones. The global zone is the original Solaris OS instance. It
has access to the physical hardware and can control all processes. It also has the authority to create and
control new zones, called non-global zones, in which applications run. Non-global zones do not run inside the
global zone - they run alongside it - yet the global zone can look inside non-global zones to see how they are
configured, and to monitor and control them.
Files and directories from the global zone are not writable from non-global zones. They have to be mounted in
different ways to be writable from within a non-global zone.
#zoneadm list -vc or
#zoneadm -z zonename info
There are 4 steps involved - creation, configuration, installation, reboot (the zoneadm commands follow the example below)
a. #zonecfg -z zonename (enter zone configuration mode)
b. zonecfg:zonename> create
> set zonepath=/zone/1
> set autoboot=true
> add net
c. zonecfg:zonename:net> set address=192.1.1.1
> set physical=hme1
> end
d. zonecfg:zonename> info
> verify
> commit
> ^D
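The remaining installation and boot steps use zoneadm:
#zoneadm -z zonename install (install the zone under its zonepath)
#zoneadm -z zonename boot (boot the zone for the first time)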
#zlogin -C zonename (log in to the zone console; zonename appears as the hostname. Type ~. to disconnect from the console.)
Create 2 default gateway files in the global zone (to set 2 default gateways).
/etc/zones
# metaset -s diskset-name -t -f
-s diskset-name Specifies the name of a disk set to take.
-t Specifies to take the disk set.
-f Specifies to take the disk set forcibly.
host1# metaset -s blue -t
# metaset -s diskset-name -r
# metaimport -r -v (verify the diskset is available for import)
# metaimport -s diskset-name disk-name (import the available diskset)
What is the equivalent of a heartbeat?
private interconnect
VM
What are the different disk types: auto/sliced/cds/simple/none?
What is the read-writeback procedure?
Talk through the steps of Veritas Volume Manager
installation for a rootdisk and rootmirror.
What are the requirements for root disk encapsulation?