This document covers Solaris 10, RHEL 5.3, and some AIX, including advanced topics such as LDOMs, Live
Upgrades with SVM Mirror Splitting, FLAR Booting, Security Hardening, the VCS Application Agent for Non-Global
Zones, and IO Fencing. Many procedures are my own, some are from scattered internet sites, and some are from vendor
documentation.
You are welcome to use this document; however, be advised that several sections are copied from vendor documentation
and various web sites, and therefore there is a high possibility of plagiarism. In general, this document is a collection
of notes gathered from a number of sources and experiences. In most cases it is accurate, but you should expect
typos, along with some issues where command line and file output extends beyond the format of this document.
<legalnotice>
THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND
NON-INFRINGEMENT. FURTHERMORE, YOU MAY NOT USE THIS DOCUMENT AS A MEANS OF PROFIT, OR FOR CORPORATE
USAGE, WITHOUT EXPLICIT CONSENT FROM THE AUTHOR.
</legalnotice>
Table of Contents
1. Security Overview .......................................................................................................... 1
Definitions and Concepts ............................................................................................. 1
2. Project Life Cycle .......................................................................................................... 7
General Project Overview ............................................................................................ 7
Pre Test Data Collection .............................................................................................. 8
Scripting Test Cases ................................................................................................... 9
3. RAID Overview ............................................................................................................ 12
Purpose and basics .................................................................................................... 12
Principles ................................................................................................................ 13
Nested levels ............................................................................................................ 13
Non-standard levels ................................................................................................... 14
4. Solaris Security ............................................................................................................. 15
BSM C2 Auditing ..................................................................................................... 15
BSM Secure Device Control ....................................................................................... 17
General Hardening .................................................................................................... 19
Destructive DTrace Examples ..................................................................................... 19
IPFilter Overview ..................................................................................................... 20
IPSec with Shared Keys ............................................................................................. 23
IPSec With 509 Certs ................................................................................................ 26
Apache2 SSL Configuration with Self-Signed Certs ........................................................ 29
RBAC and Root As a ROLE ...................................................................................... 31
Secure Non-Global Zone FTP Server ........................................................................... 32
Trusted Extensions .................................................................................................... 35
5. Solaris Virtualization ..................................................................................................... 39
Logical Domains ...................................................................................................... 39
Socket, Core and Thread Distribution ................................................................... 39
Install Domain Manager Software ........................................................................ 39
Configure Primary Domain ................................................................................. 40
Create DOM1 .................................................................................................. 40
Adding RAW Disks and ISO Images to DOM1 ...................................................... 40
Bind DOM1 and set up for booting ...................................................................... 40
Install OS Image and Clean up DOM1 ................................................................. 41
Create LDOM #2 .............................................................................................. 41
Backup or Template LDOM Configurations ........................................................... 41
Add one virtual disk to two LDOMs .................................................................... 41
Grouping VCC Console ..................................................................................... 43
LDOM Automation Script .................................................................................. 43
VCS and LDOM Failover, Features and Start and Stop ............................................ 45
VCS LDOM with ZPool Configuration ................................................................. 47
Manual LDOM and Zpool Migration .................................................................... 48
xVM (XEN) Usage on OpenSolaris 2009.06 .................................................................. 49
Quick Create for Solaris 10 HVM ....................................................................... 49
Solaris 10 Non-Global Zones ...................................................................................... 49
Comments on Zones and Live Upgrade ................................................................ 49
Comments on Zones and Veritas Control .............................................................. 51
Basic Non-Global Zone Creation SPARSE ............................................................ 52
Scripting Basic Non-Global Zone Creation SPARSE ............................................... 53
Using Dtrace to monitor non-global zones ............................................................. 54
Setup a Non-Global Zone for running Dtrace ......................................................... 55
Using Dtrace to trace an application in a non-global zone ......................................... 55
Using Dtrace to monitor non-global zones ............................................................. 55
List of Tables
1.1. Identifying Threats ....................................................................................................... 1
1.2. Orange Book NIST Security Levels ................................................................................. 2
1.3. EAL Security Levels ..................................................................................................... 3
1.4. EAL Security Component Acronyms ............................................................................... 5
4.1. Common IPFilter Commands ........................................................................................ 22
5.1. Coolthreads Systems ................................................................................................... 39
5.2. Incomplete IO Domain Distribution ............................................................................... 39
5.3. VCS Command Line Access - Global vs. Non-Global Zones .............................................. 59
6.1. Wanboot Server Client Details ...................................................................................... 65
10.1. esxcfg-commands .................................................................................................... 128
12.1. ASM View Table .................................................................................................... 146
13.1. PowerPath CLI Commands ....................................................................................... 152
13.2. PowerPath powermt commands .................................................................................. 152
17.1. Summary of SCSI3-PGR Keys .................................................................................. 196
19.1. Sun Cluster Filesystem Requirements .......................................................................... 217
Chapter 1. Security Overview
Definitions and Concepts
1. Vulnerability
Is a software, hardware, or procedural weakness that may provide an attacker the open door he is looking
for to enter a computer or network and have unauthorized access to resources within the environment.
Vulnerability characterizes the absence or weakness of a safeguard that could be exploited.
2. Threat
Is any potential danger to information or systems. The threat is that someone or something will identify
a specific vulnerability and use it against the company or individual. The entity that takes advantage
of a vulnerability is referred to as a threat agent. A threat agent could be an intruder accessing the
network through a port on the firewall, a process accessing data in a way that violates the security
policy, a tornado wiping out a facility, or an employee making an unintentional mistake that could
expose confidential information or destroy a file's integrity.
3. Risk
Is the likelihood of a threat agent taking advantage of a vulnerability and the corresponding business
impact. If a firewall has several ports opened there is a higher likelihood that an intruder will use one
to access the network in an unauthorized method. Risk ties the vulnerability, threat, and likelihood of
an exploitation to the resulting business impact.
4. Exposure
Is an instance of being exposed to losses from a threat agent. A vulnerability exposes an organization
to possible damages. If a company does not have its wiring inspected and does not put proactive fire
prevention steps into place, it exposes itself to a potentially devastating fire.
5. Countermeasures or Safeguards
A standard from the US Government National Computer Security Council (an arm
of the U.S. National Security Agency), "Trusted Computer System Evaluation Criteria, DOD standard
5200.28-STD, December 1985" which defines criteria for trusted computer products. There are four
levels, A, B, C, and D. Each level adds more features and requirements.
Levels B and A provide mandatory control. Access is based on standard Department of Defense
clearances.
Orange Book n. The U.S. Government's (now obsolete) standards document "Trusted Computer
System Evaluation Criteria, DOD standard 5200.28-STD, December, 1985", which characterizes secure
computing architectures and defines levels A1 (most secure) through D (least secure). Modern Unixes are
roughly C2.
The Evaluation Assurance Level (EAL1 through EAL7) of an IT product or system is a numerical grade
assigned following the completion of a Common Criteria security evaluation, an international standard
in effect since 1999. The increasing assurance levels reflect added assurance requirements that must
be met to achieve Common Criteria certification. The intent of the higher levels is to provide higher
confidence that the system's principal security features are reliably implemented. The EAL level does
not measure the security of the system itself; it simply states at what level the system was tested to see if
it meets all the requirements of its Protection Profile. The National Information Assurance Partnership
(NIAP) is a U.S. Government initiative by the National Institute of Standards and Technology (NIST)
and the National Security Agency (NSA).
To achieve a particular EAL, the computer system must meet specific assurance requirements. Most
of these requirements involve design documentation, design analysis, functional testing, or penetration
testing. The higher EALs involve more detailed documentation, analysis, and testing than the lower
ones. Achieving a higher EAL certification generally costs more money and takes more time than
achieving a lower one. The EAL number assigned to a certified system indicates that the system
completed all requirements for that level.
Although every product and system must fulfill the same assurance requirements to achieve a particular
level, they do not have to fulfill the same functional requirements. The functional features for each
certified product are established in the Security Target document tailored for that product's evaluation.
Therefore, a product with a higher EAL is not necessarily "more secure" in a particular application than
one with a lower EAL, since they may have very different lists of functional features in their Security
Targets. A product's fitness for a particular security application depends on how well the features listed
in the product's Security Target fulfill the application's security requirements. If the Security Targets
for two products both contain the necessary security features, then the higher EAL should indicate the
more trustworthy product for that application.
Acronym Description
TCSEC Trusted Computer System Evaluation Criteria
LSPP Labelled Security Protection Profile
CAPP Controlled Access Protection Profile
RBAC Role Based Access Control Protection Profile
9. Bell-LaPadula model
a. A security level is a (c, s) pair:
   - c = classification, e.g., unclassified, secret, top secret
   - s = category set, e.g., Nuclear, Crypto
c. Subjects and objects are assigned security levels:
   - level(S), level(O): security level of the subject/object
   - current-level(S): a subject may operate at a lower level
   - f = (level, level, current-level)
• Most people are familiar with discretionary access control (DAC):
   - Example: Unix user-group-other permission bits
   - A user might set a file private so only the group "friends" can read it
• Discretionary means anyone with access can propagate information:
   - Mail sigint@enemy.gov < private
Chapter 2. Project Life Cycle
General Project Overview
Projects typically are manifested through a self-initiated, top-down, or bottom-up direction. In a top-
down project, there is a pre-stated goal and an identified problem; details of the solution typically get resolved at
lower levels so long as the overall stated goal is met. Bottom-up is operations driven and generally has an end
result goal in mind. The solution may need additional approval, however the general project already has
management backing. Bottom-up projects can also come from general meetings with operational group personnel
and therefore need review by their management.
Should the project be the result of a self-initiated direction, several additional steps are needed, including
getting management and operations buy-in, identifying budget and time allocation, and budget approval,
including vendor negotiations where needed.
The most important parts of any project are getting management/group buy-in and defining components
such as scope, success criteria, and timelines.
4. Are there existing solutions in place that need to be adapted, or is this a new problem?
1. Audit problem
1. Brainstorming sessions
2. Are there known vendor solutions - if so, who are the major players?
3. Expected results from solution - will time be saved? will a major problem be avoided?
6. What metrics will be needed and collected for the pre/post project analysis?
• Kickoff meeting
1. Define scope - what options and solutions are needed, what are the priorities, and what items are must-have
vs. nice-to-have. Also identify what is related but out of scope. If the project is to be broken down into
phases, that should be identified, and the second phase and beyond needs to be "adapted for" but not
made part of the success criteria of the initial phase. It is good, when multiple groups are involved, to have each
report back with their weighted options list (RFE/RFC).
5. Make sure there are next steps and meeting notes posted.
1. Should vendor solutions be needed, create a weighted requirements list. Should a vendor not be needed,
the same items should be identified for cross-team participation, or with the impacted group.
3. Develop the weighted list, usually 1-10 plus N/A. Information about a feature that is only included
in the next release may be presented separately; however, it should have no weight.
5. Correlate answers based on weight and identify the optimal product for evaluation. Should more than
one be close in score, there is a potential for a bake-off between products.
• Write a script to copy off key files - should be written based on test type
Example BART Data Collection: run the copy against all necessary directories; in this example that would
include /etc and /zone. If milestones are involved, then frequent BART collections may be necessary to
track overall changes within different environment stages. Just name the manifest based on the stage.
# mkdir /bart-files
# bart create -R /etc > /bart-files/etc.control.manifest
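A control manifest is only half the picture; it becomes useful when a later snapshot is compared against it.
A minimal sketch, assuming the control manifest created above and a hypothetical post-test manifest name:
# bart create -R /etc > /bart-files/etc.posttest.manifest
# bart compare /bart-files/etc.control.manifest \
    /bart-files/etc.posttest.manifest > /bart-files/etc.changes
The resulting diff lists every file whose attributes or checksum changed between the two stages.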
• Separate tests unique to a specific cluster type - RAC, Oracle DB Failover, Apache, etc.
• Include any special details such as ownership changes; largefiles; qio; ufs
• Recommend scripting templates using XML into minor tasks - example shows using DITA to define
a task to create a vote volume for RAC
<task id = "vote_vol_reation"
xmlns:ditaarch = "http://dita.oasis-open.org/architecture/2005/">
<taskbody>
<prereq><p>The cvm_CVMVolDg_scrsdg resource needs to be online.
And all volume creation commands for CVM run on the CVM master:
&CVMMaster;</p></prereq>
<steps>
<step><cmd>Create Vote Volume on scrsdg disk group </cmd>
<stepxmp>
<screen>
ssh &CVMMaster;
vxassist -g scrsdg make vote 1G group=dba user=oracle mode=664
mkfs -V vxfs -o largefiles /dev/vx/rdsk/scrsdg/vote
</screen>
</stepxmp>
</step>
<step><cmd>Create Directories on both &Node0; and &Node1;</cmd>
<stepxmp>
<screen>
# On &Node0; and &Node1;
mkdir -p /oracle/dbdata/vote
chown -R oracle:dba /oracle/dbdata
chmod 774 /oracle/dbdata
chmod 774 /oracle/dbdata/vote
</screen>
</stepxmp>
</step>
</steps>
</taskbody>
</task>
• This could be broken down even further with the right processing script
• Tasks could be templated to execute in sequence as a procedure - a DITA Map is good for this, but
the example is just off-the-cuff XML
<procedure id = "P001">
<title>Create Volume, Filesystem and add into VCS</title>
<task id = "T1001"/>
<task id = "T1002"/>
<task id = "T1003"/>
<return>1</return>
</procedure>
<certification id="C001">
<title>SFRAC 5.0 MP3 Certification</title>
<procedure id= "P001"/>
<procedure id= "P002"/>
<procedure id= "P003"/>
<return>1</return>
</certification>
• Execution code for tasks/procedures should be able to pass back a return code for each task; it is probably
best to return the time to execute as well. These numeric return codes and times would be best placed into a
database with a table similar in concept to cert (id, procedure, task, results) and cross linked to a cert_info
(id, description, owner, participants, BU, justification) table.
• If all is done well, then the certification tasks are re-usable for many certifications and only need to be
written once, the process is defined and can be reproduced, and every command executed is logged and
could be used to generate operational procedures.
Chapter 3. RAID Overview
Purpose and basics
Note
Information collected from Wikipedia
Redundancy means that extra data is written across the array, organized so that the failure
of one (sometimes more) disk in the array will not result in loss of data. A failed disk may be replaced
by a new one, and the data on it reconstructed from the remaining data and the extra data. A redundant
array allows less data to be stored. For instance, a 2-disk RAID 1 array loses half of the total capacity that
would have otherwise been available using both disks independently, and a RAID 5 array with several
disks loses the capacity of one disk. Other RAID level arrays are arranged so that they are faster to write
to and read from than a single disk.
There are various combinations of these approaches giving different trade-offs of protection against
data loss, capacity, and speed. RAID levels 0, 1, and 5 are the most commonly found, and cover most
requirements.
• RAID 0 (striped disks) distributes data across several disks in a way that gives improved speed and full
capacity, but all data on all disks will be lost if any one disk fails.
• RAID 1 (mirrored settings/disks) duplicates data across every disk in the array, providing full
redundancy. Two (or more) disks each store exactly the same data, at the same time, and at all times.
Data is not lost as long as one disk survives. Total capacity of the array is simply the capacity of one
disk. At any given instant, each disk in the array is simply identical to every other disk in the array.
• RAID 5 (striped disks with parity) combines three or more disks in a way that protects data against loss
of any one disk; the storage capacity of the array is reduced by one disk.
• RAID 6 (striped disks with dual parity) (less common) can recover from the loss of two disks.
• RAID 10 (or 1+0) uses both striping and mirroring. "01" or "0+1" is sometimes distinguished from
"10" or "1+0": a striped set of mirrored subsets and a mirrored set of striped subsets are both valid, but
distinct, configurations.
(RAID 3 and RAID 4 differ in striping granularity: RAID 3 uses byte-level striping with a dedicated
parity disk, while RAID 4 uses block-level striping with a dedicated parity disk.)
RAID can involve significant computation when reading and writing information. With traditional "real"
RAID hardware, a separate controller does this computation. In other cases the operating system or simpler
and less expensive controllers require the host computer's processor to do the computing, which reduces
the computer's performance on processor-intensive tasks (see "Software RAID" and "Fake RAID" below).
Simpler RAID controllers may provide only levels 0 and 1, which require less processing.
RAID systems with redundancy continue working without interruption when one, or sometimes more,
disks of the array fail, although they are then vulnerable to further failures. When the bad disk is replaced
by a new one the array is rebuilt while the system continues to operate normally. Some systems have to be
shut down when removing or adding a drive; others support hot swapping, allowing drives to be replaced
without powering down. RAID with hot-swap drives is often used in high availability systems, where it is
important that the system keeps running as much of the time as possible.
RAID is not a good alternative to backing up data. Data may become damaged or destroyed without harm
to the drive(s) on which they are stored. For example, part of the data may be overwritten by a system
malfunction; a file may be damaged or deleted by user error or malice and not noticed for days or weeks;
and of course the entire array is at risk of physical damage.
Principles
RAID combines two or more physical hard disks into a single logical unit by using either special hardware
or software. Hardware solutions often are designed to present themselves to the attached system as a single
hard drive, so that the operating system would be unaware of the technical workings. For example, you
might configure a 1TB RAID 5 array using three 500GB hard drives in hardware RAID; the operating
system would simply be presented with a "single" 1TB disk. Software solutions are typically implemented
in the operating system and would present the RAID drive as a single drive to applications running upon
the operating system.
There are three key concepts in RAID: mirroring, the copying of data to more than one disk; striping,
the splitting of data across more than one disk; and error correction, where redundant data is stored to
allow problems to be detected and possibly fixed (known as fault tolerance). Different RAID levels use
one or more of these techniques, depending on the system requirements. RAID's main aim can be either to
improve reliability and availability of data, ensuring that important data is available more often than not
(e.g. a database of customer orders), or merely to improve the access speed to files (e.g. for a system that
delivers video on demand TV programs to many viewers).
The configuration affects reliability and performance in different ways. The problem with using more
disks is that it is more likely that one will go wrong, but by using error checking the total system can
be made more reliable by being able to survive and repair the failure. Basic mirroring can speed up
reading data as a system can read different data from both the disks, but it may be slow for writing if the
configuration requires that both disks must confirm that the data is correctly written. Striping is often used
for performance, where it allows sequences of data to be read from multiple disks at the same time. Error
checking typically will slow the system down as data needs to be read from several places and compared.
The design of RAID systems is therefore a compromise and understanding the requirements of a system is
important. Modern disk arrays typically provide the facility to select the appropriate RAID configuration.
Nested levels
Many storage controllers allow RAID levels to be nested: the elements of a RAID may be either individual
disks or RAIDs themselves. Nesting more than two deep is unusual.
As there is no basic RAID level numbered larger than 10, nested RAIDs are usually unambiguously
described by concatenating the numbers indicating the RAID levels, sometimes with a "+" in between.
For example, RAID 10 (or RAID 1+0) consists of several level 1 arrays of physical drives, each of which
is one of the "drives" of a level 0 array striped over the level 1 arrays. It is called RAID 10, not RAID 01,
to avoid confusion with RAID 0+1. When the top array is a RAID 0 (such as in RAID 10 and
RAID 50) most vendors omit the "+", though RAID 5+0 is clearer.
• RAID 0+1: striped sets in a mirrored set (minimum four disks; even number of disks) provides fault
tolerance and improved performance but increases complexity. The key difference from RAID 1+0 is
that RAID 0+1 creates a second striped set to mirror a primary striped set. The array continues to operate
with one or more drives failed in the same mirror set, but if drives fail on both sides of the mirror the
data on the RAID system is lost.
• RAID 1+0: mirrored sets in a striped set (minimum four disks; even number of disks) provides fault
tolerance and improved performance but increases complexity. The key difference from RAID 0+1 is
that RAID 1+0 creates a striped set from a series of mirrored drives. In a failed disk situation, RAID
1+0 performs better because all the remaining disks continue to be used. The array can sustain multiple
drive losses so long as no mirror loses all its drives.
• RAID 5+1: mirrored striped set with distributed parity (some manufacturers label this as RAID 53).
Non-standard levels
Many configurations other than the basic numbered RAID levels are possible, and many companies,
organizations, and groups have created their own non-standard configurations, in many cases designed to
meet the specialised needs of a small niche group. Most of these non-standard RAID levels are proprietary.
• Storage Computer Corporation uses RAID 7, which adds caching to RAID 3 and RAID 4 to improve
I/O performance.
• EMC Corporation offered RAID S as an alternative to RAID 5 on their Symmetrix systems (which is
no longer supported on the latest releases of Enginuity, the Symmetrix's operating system).
• The ZFS filesystem, available in Solaris, OpenSolaris, FreeBSD and Mac OS X, offers RAID-Z, which
solves RAID 5's write hole problem.
• NetApp's Data ONTAP uses RAID-DP (also referred to as "double", "dual" or "diagonal" parity),
which is a form of RAID 6, but unlike many RAID 6 implementations, does not use distributed parity
as in RAID 5. Instead, two unique parity disks with separate parity calculations are used. This is a
modification of RAID 4 with an extra parity disk.
• Accusys Triple Parity (RAID TP) implements three independent parities by extending RAID 6
algorithms on its FC-SATA and SCSI-SATA RAID controllers to tolerate three-disk failure.
• Linux MD RAID10 (RAID10) implements a general RAID driver that defaults to a standard RAID 1+0
with 4 drives, but can have any number of drives. MD RAID10 can run striped and mirrored with only
2 drives with the f2 layout (mirroring with striped reads, normal Linux software RAID 1 does not stripe
reads, but can read in parallel).[4]
• Infrant (Now part of Netgear) X-RAID offers dynamic expansion of a RAID5 volume without having
to backup/restore the existing content. Just add larger drives one at a time, let it resync, then add the next
drive until all drives are installed. The resulting volume capacity is increased without user downtime.
(It should be noted that this is also possible in Linux when utilizing the mdadm utility. It has also been
possible on the EMC CLARiiON for several years.)
• BeyondRAID created by Data Robotics and used in the Drobo series of products, implements both
mirroring and striping simultaneously or individually dependent on disk and data context. BeyondRAID
is more automated and easier to use than many standard RAID levels. It also offers instant expandability
without reconfiguration, the ability to mix and match drive sizes and the ability to reorder disks. It is
a block-level system and thus file system agnostic although today support is limited to NTFS, HFS+,
FAT32, and EXT3. It also utilizes Thin provisioning to allow for single volumes up to 16TB depending
on the host operating system support.
Chapter 4. Solaris Security
BSM C2 Auditing
1. Fundamentals
The fundamental reason for implementing C2 auditing is as a response to potential security violations
such as Nimda, SATAN, or other attempts to compromise the integrity of a system. Secondary to that
reason, it can be used to log changes to a system and to track down questionable actions.
BSM C2 will not prevent the server from being compromised, however it does provide a significant
resource in determining if a server has been breached. Standard utilities such as “acct” cannot, nor
are they intended to, identify modifications or connections to a server. Through the limited examples
described within this document it should be clear that the C2 module is capable of allowing Fidelity
Investments to clearly and quickly identify any potential compromise.
2. Tradeoffs
One tradeoff with running C2 as a consistent and active process is disk space consumption. The audit
trail itself contains status, date and time, and server within the filename, and the auditreduce command
allows for specifying a server name, which can be based on filename or directory structure. This
identification within the file itself allows for placing a rotating copy of all audit trails on a central
repository server and for historical queries to be run which would not require logging in to a system,
except for currently written data. Properly deployed, this can aid in meeting certain S.E.C. security
requirements by historically keeping audit trails on read-only media once moved off of a system. Unlike
"acct", which tracks a process with some arguments, CPU cycles used per user, and logged-in accounts,
C2 is designed to log all arguments, processes, and connections, but not CPU cycles - although this
information can be gathered through auditing. In addition to login information, C2 can be used to track
user commands.
3. Audit Classes
In order to reduce the amount of logging, not all classes are automatically enabled. The current C2
build module logs all users for lo, ex, and ad. However, the audit trail can be changed. Settings are
configured in the audit configuration file /etc/security/audit_control and include success
& failure, success only, and failure only setting options. Each class, however, does not include, by
default, arguments or environmental variables.
#!/bin/sh
auditconfig -conf            # change runtime kernel
                             # event-to-class mappings
auditconfig -setpolicy +argv # add command line arguments
auditconfig -setpolicy +arge # add environmental variables
auditconfig -setpolicy +cnt  # count how many audit records
                             # are dropped if > 20% free
# auditconfig -lspolicy
You can add the following as class attributes; be aware that more logging means more file system space
used. In many cases this should be custom-tailored depending on the server function, such as database,
application, or firewall.
In addition, each user can have their own audit trail custom fit. This is handled through the
/etc/security/audit_user file, which has the following format:
# username:always-audit-flags:never-audit-flags
root:lo:no
Individual users can have their audit trail adjusted to collect all possible data, but testing on each change
is vital. Any typo in /etc/security/audit_user can, and will, result in that user's inability to
log in.
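Once trails are being written, they are reviewed with auditreduce and praudit. A minimal review sketch,
assuming BSM has been enabled with bsmconv and the trails live in the default /var/audit directory
(the account name and date below are examples only):
## merge every trail for one user and print one record per line
# auditreduce -u secuser /var/audit/* | praudit -l
## pull only login/logout class events for a given day (yyyymmdd)
# auditreduce -c lo -d 20100131 | praudit -s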
Integrated within the BSM auditing module is the ability to allocate and restrict specific, user-definable
devices. The purpose of this level of restriction is the following:
b. Prevent a user from reading a tape just written to by another user, before the first user has removed
the tape from the tape drive.
c. Prevent a user from gleaning any information from the device’s or the driver’s internal storage after
another user is finished with the device
All descriptions below are with the default configuration. The devices configured by default can be
added to or removed from control via the device_allocate and device_maps file, however adding new
devices is a bit more complicated and will not be covered here.
Files: /etc/security/device_allocate
/etc/security/device_maps,
/etc/security/dev/*
/etc/security/lib/*
audio;audio;reserved;reserved;solaris.device.allocate;\
/etc/security/lib/audio_clean
fd0;fd;reserved;reserved;solaris.device.allocate;\
/etc/security/lib/fd_clean
sr0;sr;reserved;reserved;solaris.device.allocate;\
/etc/security/lib/sr_clean
/etc/security/device_maps is a listing of devices \
with alias names such as:
audio:\
audio:\
/dev/audio /dev/audioctl /dev/sound/0 /dev/sound/0ctl:\
fd0:\
fd:\
/dev/diskette /dev/rdiskette /dev/fd0a /dev/rfd0a /dev/fd0b
/dev/rfd0b /dev/fd0c /dev/fd0 /dev/rfd0c /dev/rfd0:\
sr0:\
sr: /dev/sr0 /dev/rsr0 /dev/dsk/c0t2d0s0 \
/dev/dsk/c0t2d0s1 /dev/dsk/c0t2d0s2 \
/dev/dsk/c0t2d0s3 /dev/dsk/c0t2d0s4 \
/dev/dsk/c0t2d0s5 /dev/dsk/c0t2d0s6 \
/dev/dsk/c0t2d0s7 /dev/rdsk/c0t2d0s0 \
/dev/rdsk/c0t2d0s1 /dev/rdsk/c0t2d0s2 \
/dev/rdsk/c0t2d0s3 /dev/rdsk/c0t2d0s4 \
/dev/rdsk/c0t2d0s5 /dev/rdsk/c0t2d0s6 \
/dev/rdsk/c0t2d0s7
Fundamentals - login as a user and assume root; then modify the root account as type role and add the
root role to a user; test with fresh login before logging out
$ su -
# usermod -K type=role root
# usermod -R root useraccount
Allocation is done by running specific commands, as well as deallocating the same device. Here are
a few examples.
# allocate -F device_special_filename
# allocate -F device_special_filename -U user_id
# deallocate -F device_special_filename
# deallocate -I
# list_devices -U username
When combined, a user with the RBAC authorization solaris.device.allocate can allocate the fd0, sr0, and audio
devices, in essence hogging the device for themselves. The scripts referenced in the device_allocate
file are used to deallocate the device in the event of a reboot; this way no allocation is persistent.
Since these files are customizable, it is possible to remove vold-related devices such as the cdrom
mounting by just deleting that section.
Remember that device allocation is not needed for auditing to work, and can be set to allocate “nothing”
by stripping down the device_maps and device_allocate files – however more testing should be done
in this case.
General Hardening
1. IP Module Control. The IP module can be tuned to prevent forwarding and redirecting of packets and requests
for information from the system. These parameters can be set using ndd with the given value to limit
these features (see the ndd sketch after the next item).
2. Prevent buffer overflows Add the following lines to /etc/system file to prevent the buffer
overflow in a possible attack to execute some malicious code on your machine.
set noexec_user_stack=1
set noexec_user_stack_log=1
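A minimal sketch of the ndd tuning referenced in item 1 above. These are standard /dev/ip tunables on
Solaris 10, but verify each parameter against your release, and remember that ndd settings do not survive
a reboot, so they must also be placed in an init script:
## do not forward or redirect packets
# ndd -set /dev/ip ip_forwarding 0
# ndd -set /dev/ip ip_ignore_redirect 1
## do not answer broadcast echo or timestamp probes
# ndd -set /dev/ip ip_respond_to_echo_broadcast 0
# ndd -set /dev/ip ip_respond_to_timestamp 0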
#!/usr/sbin/dtrace -w -s
syscall::uname:entry { self->a = arg0; }
syscall::uname:return
{
  copyoutstr("Windows", self->a, 257);
  copyoutstr("PowerPC", self->a+257, 257);
  copyoutstr("2010.b17", self->a+(257*2), 257);
  copyoutstr("fud:2010-10-31", self->a+(257*3), 257);
  copyoutstr("PPC", self->a+(257*4), 257);
}
#!/usr/sbin/dtrace -ws
syscall::uname:entry
{
self->addr = arg0;
}
syscall::uname:return
{
copyoutstr("SunOS", self->addr, 257);
copyoutstr("PowerPC", self->addr+257, 257);
copyoutstr("5.5.1", self->addr+(257*2), 257);
copyoutstr("gate:1996-12-01", self->addr+(257*3), 257);
copyoutstr("PPC", self->addr+(257*4), 257);
}
# uname -a
SunOS homer 5.10 SunOS_Development sun4u sparc SUNW,Ultra-5_10
# uname -a
SunOS PowerPC 5.5.1 gate:1996-12-01 PPC sparc SUNW,Ultra-5_10
#cat read.d
#!/usr/sbin/dtrace -ws
ufs_read:entry
/ stringof(args[0]->v_path) == $$1 /
{
printf("File %s read by %d\n", $$1, curpsinfo->pr_uid);
raise(SIGKILL);
}
# more /etc/passwd
Killed
# ./read.d /etc/passwd
dtrace: script './read.d' matched 1 probe
dtrace: allowing destructive actions
CPU ID FUNCTION:NAME
0 15625 ufs_read:entry File /etc/passwd read by 0
IPFilter Overview
1. Background. With the release of Solaris 10, IPFilter is now supported. Before Solaris 10, SunScreen EFS or
SunScreen Lite was the default firewall. IPFilter is a mature product traditionally found in BSD-derived
operating systems.
# /etc/ipf/ippool.conf
# IP range for China
3. Configuring IPF First, you will need an ipf ruleset. The Solaris default location for this file is /etc/
ipf/ipf.conf. Below is the ruleset I used for a Solaris 10 x86 workstation. Note that the public NIC
is called elx10. Simply copy this ruleset to a file called /etc/ipf/ipf.conf, and edit to your needs.
# /etc/ipf/ipf.conf
#
# IP Filter rules to be loaded during startup
#
# See ipf(4) manpage for more information on
# Internal Hosts
pass in quick from pool/7 to 192.168.15.78
# Blocked due to showup in IDS
block in log quick from pool/6 to any
# Block Asia APNIC Inbound
block in log quick on bge0 proto tcp/udp from pool/5 to any
# Block Asia APNIC Outbound
block out log quick on bge0 proto tcp/udp from any to pool/5
#
# Known information stealers
block in log quick from pool/8 to any
block out log quick from any to pool/8
# Allow outbound state related packets.
pass out quick on bge0 proto tcp/udp from any to any keep state
#
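Once the ruleset and pools are in place they have to be loaded and verified. A short sketch using the
standard Solaris 10 IPFilter commands (the service name and flags are standard; adjust the paths if you
keep the rules elsewhere):
## enable the IPFilter service
# svcadm enable network/ipfilter
## load the address pools referenced above as pool/N
# ippool -f /etc/ipf/ippool.conf
## flush all rules, then load the ruleset
# ipf -Fa -f /etc/ipf/ipf.conf
## list active inbound/outbound rules and watch logged packets
# ipfstat -io
# ipmon -a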
Creating Keys
Using the ipsecalgs command we can see the available algorithms, including DES, 3DES, AES, Blowfish,
SHA and MD5. Different algorithms require different key lengths; for instance, 3DES requires a 192-bit
key, whereas Blowfish can use a key anywhere from 32 bits up to 448 bits.
For interoperability reasons (such as OSX or Linux), you may wish to create keys that are both ASCII and
hex. This is done by choosing a string and converting it to hex. To know how long a string should be,
divide the number of bits required by 8; this is the number of ASCII chars you need. The hex value of
that ASCII string will be double the number of ASCII chars. Using the od utility we can convert ASCII
to hex. Here I'll create 2 keys, one for AH which is a SHA1 160-bit key (20 ASCII chars) and another for
ESP which is a Blowfish 256-bit key (32 ASCII chars):
my short ah password
6d792073686f72742061682070617373776f7264
74686973206973206d79206c6f6e6720626c6f77666973682065737020706173
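The ASCII-to-hex conversion itself is a one-liner; a minimal sketch using od (printf is used rather than
echo so that no trailing newline byte ends up in the key):
# printf '%s' "my short ah password" | od -A n -t x1 | tr -d ' \n'
6d792073686f72742061682070617373776f7264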
IPsec policies are rules that the IP stack uses to determine what action should be taken. Actions include:
• bypass: Do nothing; skip the remaining rules if the datagram matches.
• drop: Drop if the datagram matches.
• permit: Allow if the datagram matches, otherwise discard. (Only for inbound datagrams.)
• ipsec: Apply IPsec protection if the datagram matches.
As you can see, this sounds similar to a firewall rule, and to some extent can be used that way, but you
will ultimately find IPFilter much better suited to that task. When you plan your IPsec environment, consider
which rules are appropriate in which place.
IPsec policies are defined in the /etc/inet/ipsecinit.conf file, which can be loaded/reloaded using the
ipsecconf command. Let's look at a sample configuration:
# Ignore SSH
{ lport 22 dir both } bypass { }
Our first policy explicitly bypasses connections in and out ("dir both", as in direction) for the local port
22 (SSH). Do I need this here? No, but I include it as an example. You can see the format, the first curly
block defines the filter, the second curly block defines parameters, the keyword in between is the action.
The second policy is what we're interested in, its action is ipsec, so if the filter in the first curly block
matches we'll use IPsec. "raddr" defines a remote address and "rport" defines a remote port, therefore
this policy applies only to outbound connections where we're telnet'ing (port 23) to 8.11.80.5. The second
curly block defines parameters for the action, in this case we define the encryption algorithm (Blowfish),
encryption authentication algorithm (SHA1), and state that the Security Association is "shared". This is
a full ESP connection, meaning we're encrypting and encapsulating the full packet, if we were doing AH
(authentication only) we would only define "auth_algs".
Now, on the remote side of the connection (8.11.80.5) we create a similar policy, but rather than "raddr"
and "rport" we use "laddr" (local address) and "lport" (local port). We could even go so far as to specify
the remote address such that only the specified host would use IPsec to the node. Here's that configuration:
# Ignore SSH
{ lport 22 dir both } bypass { }
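The telnet policy itself is only described in prose above; a hedged reconstruction of what the matching
ipsecinit.conf entries would look like follows (standard ipsecconf(1M) syntax; the address 8.11.80.5 and
the Blowfish/SHA1/shared choices come from the description above).
On the initiating host (8.15.11.17 in the SAs below):
{ raddr 8.11.80.5 rport 23 } ipsec { encr_algs blowfish encr_auth_algs sha1 sa shared }
On 8.11.80.5 itself:
{ laddr 8.11.80.5 lport 23 } ipsec { encr_algs blowfish encr_auth_algs sha1 sa shared }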
To load the new policy file you can refresh the ipsec/policy SMF service like so: svcadm refresh ipsec/
policy. I recommend avoiding the ipsecconf command except to (without arguments) display the active
policy configuration.
So we've defined policies that will encrypt traffic from one node to another, but we're not done yet! We
need to define a Security Association that will associate keys with our policy.
Security Associations (SAs) can be manually created either by using the ipseckey command or by directly
editing the /etc/inet/secret/ipseckeys file. I recommend the latter; I personally find the
ipseckey shell very intimidating.
add esp spi 1000 src 8.15.11.17 dst 8.11.80.5 auth_alg sha1 \
authkey 6d792073686f72742061682070617373776f7264 encr_alg \
blowfish encrkey 6d792073686f72742061682070617373
add esp spi 1001 src 8.11.80.5 dst 8.15.11.17 auth_alg sha1\
authkey 6d792073686f72742061682070617373776f7264 encr_alg \
blowfish encrkey 6d792073686f72742061682070617373
It looks more intimidating than it is. Each line is "add"ing a new static Security Association; both are for
ESP. The SPI, the "Security Parameters Index", is a simple numeric value that represents the SA, nothing
more; pick any value you like. The src and dst define the addresses to which this SA applies; note that you
have two SAs here, one for each direction. Finally, we define the encryption and authentication algorithms
and full keys.
I hope that looking at this makes it more clear how policies and SAs fit together. If the IP stack matches
a datagram against a policy whose action is "ipsec", it takes the packet and looks for an SA whose address
pair matches, and then uses those keys for the encryption.
Note that if someone obtains your keys you're hosed. If you pre-share keys in this way, change the keys
from time to time or consider using IKE, which can negotiate keys (and thus SAs) on your behalf.
To apply your new SAs, flush and then load using the ipseckey command:
$ ipseckey flush
$ ipseckey -f /etc/inet/secret/ipseckeys
All this is for nothing if you don't verify that the packets are actually encrypted. Using snoop, you should
see packets like this:
$ snoop -d e1000g0
Using device e1000g0 (promiscuous mode)
ETHER: ----- Ether Header -----
ETHER:
ETHER: Packet 1 arrived at 9:52:4.58883
ETHER: Packet size = 90 bytes
ETHER: Destination = xxxxxxxxxxx,
ETHER: Source = xxxxxxxxxx,
ETHER: Ethertype = 0800 (IP)
ETHER:
And there you go. You can now encrypt communication transparently in the IP stack. It's a little effort to get
going, but once it's running you're done... just remember to rotate those keys every so often!
2. Okay, we don't want manual keying or some stinking pre-shared keys. Thus we need to create keys.
Log in to gandalf and assume the root role:
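A hedged sketch of the usual ikecert invocation for this step (the DN fields match the IKE config shown
later in this section; the 1024-bit key size and rsa-md5 type are assumptions):
# ikecert certlocal -ks -m 1024 -t rsa-md5 \
    -D "C=de, O=moellenkamp, OU=moellenkamp-vpn, CN=gandalf"
The command prints the self-signed certificate, the block below beginning with
-----BEGIN X509 CERTIFICATE-----, which is pasted into the peer's certificate database in a later step.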
MIICOzCCAaSgAwIBAgIFAJRpUUkwDQYJKoZIhvcNAQEEBQAwTzELMAkGA1UEBhMC
[ ... some lines omitted ... ]
oi4dO39J7cSnooqnekHjajn7ND7T187k+f+BVcFVbSenIzblq2P0u7FIgIjdlv0=
-----END X509 CERTIFICATE-----
4. Okay, now we have to tell both hosts to use IPsec when they talk to each other:
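A hedged sketch of the /etc/inet/ipsecinit.conf entries this step refers to, matching the description in the
next item (the hostnames are the ones used throughout this example; "any" lets IKE negotiate the
algorithms). On gandalf:
{ raddr theoden } ipsec { encr_algs any encr_auth_algs any sa shared }
and the mirror-image entry on theoden:
{ raddr gandalf } ipsec { encr_algs any encr_auth_algs any sa shared }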
5. This translates to: When I'm speaking to theoden, I have to encrypt the data and can use any negotiated
and available encryption algorithm and any negotiated and available authentication algorithm. Such
a rule is only valid in one direction. Thus we have to define the opposite direction on the other host
to enable bidirectional traffic:
6. Okay, the next configuration file is a little bit more complex. Go into the directory /etc/inet/ike and
create a file named config with the following content:
cert_trust "10.211.55.200"
cert_trust "10.211.55.201"
p1_xform
{ auth_method preshared oakley_group 5 auth_alg sha encr_alg des }
p2_pfs 5
{
label "DE-theoden to DE-gandalf"
local_id_type dn
local_id "C=de, O=moellenkamp, OU=moellenkamp-vpn, CN=theoden"
remote_id "C=de, O=moellenkamp, OU=moellenkamp-vpn, CN=gandalf"
local_addr 10.211.55.200
remote_addr 10.211.55.201
p1_xform
7. Okay, we are almost done. But there is still one missing and essential step when you want to use
certificates: we have to distribute the certificates of the systems.
$ ikecert certdb -l
Certificate Slot Name: 0 Key Type: rsa
(Private key in certlocal slot 0)
Subject Name:
Key Size: 1024
Public key hash: 28B08FB404268D144BE70DDD652CB874
At the beginning there is only the local key in the system. We have to import the key of the remote
system. Do you remember the output beginning with -----BEGIN X509 CERTIFICATE----- and ending
with -----END X509 CERTIFICATE-----? You need this output now.
8. The next command won't come back after you hit return. You have to paste in the key. On gandalf
you paste the output of the key generation on theoden. On theoden you paste the output of the key
generation on gandalf. Let's import the key on gandalf:
$ ikecert certdb -a
-----BEGIN X509 CERTIFICATE-----
MIICOzCCAaSgAwIBAgIFAIRuR5QwDQYJKoZIhvcNAQEEBQAwTzELMAkGA1UEBhMC
UHJ4P6Z0dtjnToQb37HNq9YWFRguSsPQvc/Lm+S9cJCLwINVg7NOXXgnSfY3k+Q=
-----END X509 CERTIFICATE-----
[root@gandalf:/etc/inet/ike]$
9. After pasting, you have to hit Enter once and after this you press Ctrl-D once. Now we check for the
successful import. You will see two certificates now.
$ ikecert certdb -l
Certificate Slot Name: 0 Key Type: rsa
(Private key in certlocal slot 0)
Subject Name:
Key Size: 1024
Public key hash: 28B08FB404268D144BE70DDD652CB874
10.Okay, switch to theoden and import the key from gandalf on this system.
$ ikecert certdb -l
Certificate Slot Name: 0 Key Type: rsa
(Private key in certlocal slot 0)
Subject Name:
Key Size: 1024
Public key hash: 76BE0809A6CBA5E06219BC4230CBB8B8
$ ikecert certdb -a
-----BEGIN X509 CERTIFICATE-----
MIICOzCCAaSgAwIBAgIFAJRpUUkwDQYJKoZIhvcNAQEEBQAwTzELMAkGA1UEBhMC
oi4dO39J7cSnooqnekHjajn7ND7T187k+f+BVcFVbSenIzblq2P0u7FIgIjdlv0=
-----END X509 CERTIFICATE-----
$ ikecert certdb -l
Certificate Slot Name: 0 Key Type: rsa
(Private key in certlocal slot 0)
Subject Name:
Key Size: 1024
Public key hash: 76BE0809A6CBA5E06219BC4230CBB8B8
su -
# cp /etc/apache2/httpd.conf-example /etc/apache2/httpd.conf
3. Edit /etc/apache2/httpd.conf
4. Enable Apache2
5. Enable SSL Service Property if necessary. Log in as root and issue the following command:
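A hedged sketch of the SMF property change this step refers to. The httpd/ssl property exists in the stock
Solaris 10 apache2 manifest, but check it with svcprop -p httpd/ssl apache2 before relying on it:
# svccfg -s apache2 setprop httpd/ssl = true
# svcadm refresh apache2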
# mkdir /etc/apache2/ssl.crt
# mkdir /etc/apache2/ssl.key
/etc/apache2/ssl.key/server.key
> /etc/apache2/ssl.crt/server.csr
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter ‘.’, the field will be left blank.
-----
Country Name (2 letter code) [US]:US
State or Province Name (full name) [Some-State]:OR
Locality Name (eg, city) []:Blodgett
Organization Name (eg, company) [Unconfigd OpenSSL Installation]:DIS
Organizational Unit Name (eg, section) []:IT
Common Name (eg, YOUR name) []:Big Cheese
Email Address []:meljr@meljr.com
Please enter the following ‘extra’ attributes
to be sent with your certificate request
A challenge password []: ********
An optional company name []: Live Free or Die
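The fragments above (/etc/apache2/ssl.key/server.key and > /etc/apache2/ssl.crt/server.csr) belong to the
key and CSR generation. A hedged sketch of the usual commands, using the same openssl path the later
steps use (the 1024-bit key size and DES3 pass phrase protection are assumptions):
# /usr/local/ssl/bin/openssl genrsa -des3 -out /etc/apache2/ssl.key/server.key 1024
# /usr/local/ssl/bin/openssl req -new -key /etc/apache2/ssl.key/server.key \
    > /etc/apache2/ssl.crt/server.csr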
9. Install a Self-Signed Certificate. If you are going to install a certificate from an authoritative source,
follow their instructions and skip this step.
> /etc/apache2/ssl.key/server.key \
> /etc/apache2/ssl.crt/server.crt
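A hedged sketch of the self-signing command these redirect fragments come from (a standard openssl
x509 invocation; the 365-day validity period is an assumption):
# /usr/local/ssl/bin/openssl x509 -req -days 365 \
    -in /etc/apache2/ssl.crt/server.csr \
    -signkey /etc/apache2/ssl.key/server.key \
    -out /etc/apache2/ssl.crt/server.crt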
10.Edit the ssl.conf and change the line that begins with “ServerAdmin” to reflect an email address or alias
for the Server’s Administrator.
# cd /etc/apache2/ssl.key
# cp server.key server.key.org
# /usr/local/ssl/bin/openssl rsa -in server.key.org -out server.key
Enter pass phrase for server.key.org: ********
writing RSA key
# chmod 400 server.key
# svcadm enable apache2
# svcs | grep -i apache2
online 4:29:01 svc:/network/http:apache2
Audit Control:suser:cmd:::/etc/security/bsmconv:uid=0
Audit Control:suser:cmd:::/etc/security/bsmunconv:uid=0
root::::type=role;auths=solaris.*,solaris.grant;\
profiles=All;\
lock_after_retries= no;min_label=admin_low;\
clearance=admin_high
Fundamentals - login as a user and assume root; then modify the root account as type role and add the
root role to a user; test with fresh login before logging out
# rm /etc/rc3.d/S76snmpdx
# rm /etc/rc3.d/S90samba
# Review /etc/rc2.d/S90* for deletion
# ftpconfig -d /zones/ftp-root
# mkdir /zones/ftp-root/incoming
# chmod go-r /zones/ftp-root/incoming
# mkdir /export/home
# groupadd -g 2000 secadm
# useradd -d /export/home/secuser -m secuser
# passwd secuser
# roleadd -u 2000 -g 2000 -d /export/home/secadm -m secadm
# passwd secadm
# rolemod -P "Primary Administrator","Basic Solaris User" secadm
# usermod -R secadm secuser
# svcadm restart system/name-service-cache
## logout of root, login as secuser
# su - secadm
Fundamentals - login as a user and assume root; then modify the root account as type role and add the
root role to a user; test with fresh login before logging out
$ su -
# usermod -K type=role root
# useradd -d /home/padmin -m -g 2000 padmin
# passwd padmin
# usermod -R root padmin
# cd /etc/security
## edit audit_control and change the dir:/var/audit to /bsm/audit
## Run the following command, you will need to reboot.
# ./bsmconv
# zonecfg -z secftp
secftp: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:secftp> create
zonecfg:secftp> set zonepath=/zones/secftp
zonecfg:secftp> set autoboot=false
zonecfg:secftp> add fs
zonecfg:secftp:fs> set type=zfs
zonecfg:secftp:fs> set special=zones/ftp-root
zonecfg:secftp:fs> set dir=/ftp-root
zonecfg:secftp:fs> end
zonecfg:secftp:net> end
zonecfg:secftp> verify
zonecfg:secftp> commit
zonecfg:secftp> exit
# zlogin -C secftp
[Connected to zone 'secftp' ]
Enter Requested Setup Information
# rm /etc/rc3.d/S76snmpdx
# rm /etc/rc3.d/S90samba
## Review /etc/rc2.d/S90* for deletion
[create same accounts and role changes as in global - you can set these to different names if you like]
/etc/passwd:
secxfr:x:2002:1::/ftp-root/./incoming:/bin/true
# pwconv
# passwd secxfr
# set ot secxfr
Trusted Extensions
1. Fundamentals
TX places classification and compartment wrappers around non-global zones and defines what
systems can communicate with those zones.
a. Classification vs Compartment
2. Basic TX Configuration
Make sure no non-global zones are configured or installed. Non-global zones need to be mapped to a
clearance and category before installation. These example content files will configure a host for three
non-global zones: one for public "web like" features, one for internal host-to-host traffic from non-labeled
systems, and one for secure TX-to-TX systems; the labels are public, confidential and restricted.
a. Check /etc/user_attr to make sure your root and root role account has the following access levels:
min_label=admin_low;clearance=admin_high
CLASSIFICATIONS:
INFORMATION LABELS:
WORDS:
REQUIRED COMBINATIONS:
COMBINATION CONSTRAINTS:
SENSITIVITY LABELS:
WORDS:
REQUIRED COMBINATIONS:
COMBINATION CONSTRAINTS:
CLEARANCES:
WORDS:
REQUIRED COMBINATIONS:
COMBINATION CONSTRAINTS:
CHANNELS:
WORDS:
PRINTER BANNERS:
WORDS:
ACCREDITATION RANGE:
*
* Local site definitions and locally configurable options.
*
LOCAL DEFINITIONS:
COLOR NAMES:
# netservices limited
e. Update /etc/security/tsol/tnrhtb to define CIPSO connections and force a label for non-labeled host
connections
Note that this file uses "\" to shorten the lines for pdf output; remove them before using.
#
global:ADMIN_LOW:1:111/tcp;111/udp;515/tcp;\
631/tcp;2049/tcp;6000-6003/tcp:6000-6003/tcp
pub-tx01:0x0002-08-08:0::
restricted-tx01:0x000a-08-08:0::
g. Enable TX Services
# txzonemgr
TX places classification, and compartment wrappers around Non-Global Zones and defines what
systems can communicate with those zones
a. Allowing user upgrade information - should the labeled zone allow it. Information stored in /etc/
user_attr
auths=solaris.label.file.upgrade
defaultpriv=sys_trans_label,file_upgrade_sl
b. Allowing user downgrade information - should the labeled zone allow it. Information stored in /
etc/user_attr
auths=solaris.label.file.downgrade
defaultpriv=sys_trans_label,file_downgrade_sl
c. Preventing a user from seeing processes beyond the user's ownership. Information stored in /etc/
user_attr
defaultpriv=basic,!proc_info
user::::auths=solaris.label.file.upgrade,\
solaris.label.file.downgrade;type=normal;\
defaultpriv=basic,!proc_info,sys_trans_label,\
file_upgrade_sl,file_downgrade_sl;\
clearance=admin_high;min_label=admin_low
e. Pairing priv limitations and expansion of features with non-global zone configuration
zonecfg -z zone-name
set limitpriv=default,file_downgrade_sl,\
file_upgrade_sl,sys_trans_label
exit
Chapter 5. Solaris Virtualization
Logical Domains
Socket, Core and Thread Distribution
Table 5.1. Coolthreads Systems
System | Processor | Max Threads | Memory (GB) | RU
Sun SPARC Enterprise T5140 Server | 2x UltraSPARC T2 Plus | 128 | 128 | 1
Sun SPARC Enterprise T5240 Server | 2x UltraSPARC T2 Plus | 128 | 256 | 2
Sun SPARC Enterprise T5440 Server | 4x UltraSPARC T2 Plus | 256 | 512 | 4
Sun SPARC Enterprise T5120 Server | 1x UltraSPARC T2 | 64 | 128 | 1
Sun SPARC Enterprise T5220 Server | 1x UltraSPARC T2 | 64 | 128 | 2
Sun Blade T6340 Server Module | 2x UltraSPARC T2 Plus | 128 | 256 | Blade
Sun Blade T6320 Server Module | 1x UltraSPARC T2 | 64 | 128 | Blade
Sun Blade T6300 Server Module | 1x UltraSPARC T1 | 32 | 32 | Blade
Sun SPARC Enterprise T1000 Server | 1x UltraSPARC T1 | 32 | 32 | 1
Sun SPARC Enterprise T2000 Server | 1x UltraSPARC T1 | 32 | 64 | 2
Sun Fire T1000 Server | 1x UltraSPARC T1 | 32 | 32 | 1
Sun Fire T2000 Server | 1x UltraSPARC T1 | 32 | 64 | 2
Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Installation of <SUNWldm> was successful.

# pkgadd -n -d "/export/home/rlb/LDoms_Manager-1_1/Product" -a pkg_admin SUNWjass

Copyright 2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Installation of <SUNWjass> was successful.

Verifying that all packages are fully installed.  OK.
Enabling services: svc:/ldoms/ldmd:default
Solaris Security Toolkit was not applied. Bypassing the use of the Solaris
Security Toolkit is _not_ recommended and should only be performed when
alternative hardening steps are to be taken.

You have new mail in /var/mail/root
Create DOM1
# svcadm enable vntsd
# ldm add-domain dom1
# ldm add-vcpu 8 dom1
# ldm add-memory 2048m dom1
# ldm add-vnet pub0 primary-vsw0 dom1
# ldm add-vnet isan0 primary-vsw1 dom1
Create LDOM #2
# ldm add-domain dom2
# ldm add-vcpu 8 dom2
# ldm add-memory 2048m dom2
# ldm add-vnet pub0 primary-vsw0 dom2
# ldm add-vdiskserverdevice /dev/rdsk/c1t66d0s2 vol2@primary-vds0
# ldm add-vdisk vdisk0 vol2@primary-vds0 dom2
# ldm add-vdisk iso iso@primary-vds0 dom2
# ldm set-variable auto-boot\?=false dom2
# ldm bind dom2
# ldm start dom2
LDom dom2 started
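The guest console is reached through the virtual network terminal server (vntsd); a typical connection,
assuming the console port reported by ldm list for dom2 is 5000, is:
# telnet localhost 5000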
Trying 127.0.0.1...
Connected to localhost. Escape character is '^]'.
Connecting to console "dom2" in group "dom2" ....
A virtual disk backend can be exported multiple times and assigned to different guest domains. When
a virtual disk backend is exported multiple times, it should not be exported with the exclusive (excl)
option; specifying the excl option allows the backend to be exported only once.
Caution - When a virtual disk backend is exported multiple times, applications running on guest domains
and using that virtual disk are responsible for coordinating and synchronizing concurrent write access to
ensure data coherency.
Export the virtual disk backend twice from a service domain by using the following commands. Note
the "-f" that forces the second device to be defined. Without the "-f", the second command will fail,
reporting that the share must be "read only".
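A sketch of the two export commands, assuming a backend of /dev/dsk/c2t1d0s2 and the default
primary-vds0 disk service (names are placeholders):
# ldm add-vdiskserverdevice /dev/dsk/c2t1d0s2 vol-a@primary-vds0
# ldm add-vdiskserverdevice -f /dev/dsk/c2t1d0s2 vol-b@primary-vds0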
Assign the exported backend to each guest domain by using the following commands.
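For example, continuing the sketch above with hypothetical domain names:
# ldm add-vdisk vdisk1 vol-a@primary-vds0 dom1
# ldm add-vdisk vdisk1 vol-b@primary-vds0 dom2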
Example: note that SVM metadevices were tested as backends, but LDOMs would not recognize the disks
# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 8 3968M 0.2% 47m
vsrv1 bound ------ 5000 4 2G
vsrv2 bound ------ 5001 4 2G
Script Assumptions
The script assumes that there is an initial LDOM created on a ZFS-resident disk image called LDOM/dom3/
vdisk0.img, and that all potential domains will be in DOM0's local hosts table. Note that this script was
written on Solaris 10 Update 4, with LDOM Manager 1.0. The basic process is to clone a known good
image, mount it through lofi, update key boot files, then create the LDOM constraints through command-
line execution, and finally bind and boot the LDOM. The entire process from a known good image takes
about 7 seconds.
# ./autodom.sh dom4
Mon May 14 20:51:47 EDT 2007
Starting AutoDom
Mon May 14 20:51:53 EDT 2007
#
#!/bin/sh
DOM=$1
date
echo "Starting AutoDom"
date
# Done Script
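The body of the script did not survive the document formatting; the following is only a minimal sketch of
the steps the assumptions paragraph describes (clone, lofi mount, boot-file edits, constraints, bind and
boot), assuming a ZFS dataset named LDOM with a pre-created @gold snapshot of the dom3 image and a
flat UFS image file:
# Clone the known-good disk image for the new domain (assumed dataset layout)
zfs clone LDOM/dom3@gold LDOM/${DOM}
# Mount the cloned image through lofi to update key boot files
LOFI=`lofiadm -a /LDOM/${DOM}/vdisk0.img`
mount ${LOFI} /mnt
# ... edit /mnt/etc/nodename, /mnt/etc/hosts, sysidcfg, etc. ...
umount /mnt
lofiadm -d ${LOFI}
# Build the domain constraints from the command line
ldm add-domain ${DOM}
ldm add-vcpu 8 ${DOM}
ldm add-memory 2048m ${DOM}
ldm add-vnet pub0 primary-vsw0 ${DOM}
ldm add-vdiskserverdevice /LDOM/${DOM}/vdisk0.img vol-${DOM}@primary-vds0
ldm add-vdisk vdisk0 vol-${DOM}@primary-vds0 ${DOM}
# Bind and boot the new domain
ldm bind ${DOM}
ldm start ${DOM}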
Copy the XML file to all systems that will support the failover of this LDOM. In this example they are
stored in a custom /etc/ldoms/ directory. It may, however, make sense to put it on shared storage.
/etc/VRTSvcs/conf/config/main.cf:
group dom2 (
SystemList = { primary-dom1 = 0 }
)
LDom ldom_dom2 (
LDomName = dom2
CfgFile = /etc/ldoms/dom2.xml
)
View of ldm list when VCS LDOM Agent has been started
View of ldm list when VCS LDOM Agent has been stopped
# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 8 4092M 0.4% 18m
dom1 inactive ------ 8 2G
dom2 inactive ------ 8 1904M
# haconf -makerw
# hares -modify ldom_dom1 NumCPU 4
# haconf -dump -makero
# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 8 4092M 0.4% 18m
dom1 inactive ------ 8 2G
dom2 inactive ------ 8 1904M
# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 8 4092M 0.4% 18m
dom1 active -t---- 5000 4 2G 25% 1s
dom2 inactive ------ 8 1904M
# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 8 4092M 0.4% 31m
dom1 inactive ------ 2G
dom2 inactive ------ 8 1904M
# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 8 4092M 0.4% 32m
dom1 active -t---- 5000 4 2G 25% 12s
dom2 inactive ------ 8 1904M
Warning
When an LDOM uses a ZFS raw volume instead of a mkfile image on a ZFS filesystem, the Zpool
agent for VCS will attempt to mount and check the volume. Being a raw volume, this causes the
agent to fail. To avoid this, set the ChkZFSMounts attribute to 0.
Note
The LDOM XML file is generated by the # ldm ls-constraints -x dom1 >/etc/ldoms/dom1.xml
command. Make the /etc/ldoms directory on both servers first, create the XML file, then copy it to
both servers.
# ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv- SP 4 1G 0.3% 2h 49m
wanboot active -n---- 5000 4 1G 0.2% 3h 51m
b. Shutdown LDOM
c. Bind LDOM
d. Start Domain
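The matching ldm invocations (a sketch, assuming the domain shown above is named wanboot):
# ldm stop wanboot
# ldm bind wanboot
# ldm start wanboot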
Warning
Documentation on the OpenSolaris website uses different options to the virt-install command.
Options displayed on the website will not work, and are not available, on 2009.06
# export DISPLAY=123.456.789.10:0.0
Live Upgrade is the recommended program to upgrade and to add patches. Other upgrade programs might
require extensive upgrade time, because the time required to complete the upgrade increases linearly with
the number of installed non-global zones. If you are patching a system with Solaris Live Upgrade, you
do not have to take the system to single-user mode and you can maximize your system's uptime. The
following list summarizes changes to accommodate systems that have non-global zones installed.
• A new package, SUNWlucfg, is required to be installed with the other Solaris Live Upgrade packages,
SUNWlur and SUNWluu. This package is required for any system, not just a system with non-global
zones installed.
• Creating a new boot environment from the currently running boot environment remains the same as in
previous releases with one exception. You can specify a destination disk slice for a shared file system
within a non-global zone. For more information, see Creating and Upgrading a Boot Environment When
Non-Global Zones Are Installed (Tasks).
• The lumount command now provides non-global zones with access to their corresponding file systems
that exist on inactive boot environments. When the global zone administrator uses the lumount command
to mount an inactive boot environment, the boot environment is mounted for non-global zones as well.
See Using the lumount Command on a System That Contains Non-Global Zones.
• Comparing boot environments is enhanced. The lucompare command now generates a comparison
of boot environments that includes the contents of any non-global zone. See To Compare Boot
Environments for a System With Non-Global Zones Installed.
• Listing file systems with the lufslist command is enhanced to list file systems for both the global zone
and the non-global zones. See To View the Configuration of a Boot Environment's Non-Global Zone
File Systems.
Solaris 10 8/07 adds the ability to use Live Upgrade tools on a system with Containers. This makes it
possible to apply an update to a zoned system, e.g. updating from Solaris 10 11/06 to Solaris 10 8/07. It
also drastically reduces the downtime necessary to apply some patches.
The latter ability requires more explanation. An existing challenge in the maintenance of zones is patching
- each zone must be patched when a patch is applied. If the patch must be applied while the system is
down, the downtime can be significant.
Fortunately, Live Upgrade can create an Alternate Boot Environment (ABE) and the ABE can be patched
while the Original Boot Environment (OBE) is still running its Containers and their applications. After
the patches have been applied, the system can be re-booted into the ABE. Downtime is limited to the time
it takes to re-boot the system.
An additional benefit can be seen if there is a problem with the patch and that particular application
environment. Instead of backing out the patch, the system can be re-booted into the OBE while the problem
is investigated.
The Solaris Zones partitioning technology is used to virtualize operating system services and provide an
isolated and secure environment for running applications. A non-global zone is a virtualized operating
system environment created within a single instance of the Solaris OS, the global zone. When you create a
non-global zone, you produce an application execution environment in which processes are isolated from
the rest of the system.
Solaris Live Upgrade is a mechanism to copy the currently running system onto new slices. When non-
global zones are installed, they can be copied to the inactive boot environment along with the global zone's
file systems.
• In this example of a system with a single disk, the root (/) file system is copied to c0t0d0s4. All non-
global zones that are associated with the file system are also copied to s4. The /export and /swap file
systems are shared between the current boot environment, bootenv1, and the inactive boot environment,
bootenv2. The lucreate command is the following:
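A sketch of the likely command, assuming UFS and the BE names above:
# lucreate -c bootenv1 -m /:/dev/dsk/c0t0d0s4:ufs -n bootenv2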
• In this example of a system with two disks, the root (/) file system is copied to c0t1d0s0. All non-global
zones that are associated with the file system are also copied to s0. The /export and /swap file systems are
shared between the current boot environment, bootenv1, and the inactive boot environment, bootenv2.
The lucreate command is the following:
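A sketch of the likely command:
# lucreate -c bootenv1 -m /:/dev/dsk/c0t1d0s0:ufs -n bootenv2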
• In this example of a system with a single disk, the root (/) file system is copied to c0t0d0s4. All non-
global zones that are associated with the file system are also copied to s4. The non-global zone, zone1,
has a separate file system that was created by the zonecfg add fs command. The zone path is /zone1/
root/export. To prevent this file system from being shared by the inactive boot environment, the file
system is placed on a separate slice, c0t0d0s6. The /export and /swap file systems are shared between
the current boot environment, bootenv1, and the inactive boot environment, bootenv2. The lucreate
command is the following:
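A sketch of the likely command (the zone-specific slice is named with the trailing :zone1 option):
# lucreate -c bootenv1 -m /:/dev/dsk/c0t0d0s4:ufs \
-m /export:/dev/dsk/c0t0d0s6:ufs:zone1 -n bootenv2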
• In this example of a system with two disks, the root (/) file system is copied to c0t1d0s0. All non-global
zones that are associated with the file system are also copied to s0. The non-global zone, zone1, has
a separate file system that was created by the zonecfg add fs command. The zone path is /zone1/root/
export. To prevent this file system from being shared by the inactive boot environment, the file system is
placed on a separate slice, c0t1d0s4. The /export and /swap file systems are shared between the current
boot environment, bootenv1, and the inactive boot environment, bootenv2. The lucreate command is
the following:
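A sketch of the likely command:
# lucreate -c bootenv1 -m /:/dev/dsk/c0t1d0s0:ufs \
-m /export:/dev/dsk/c0t1d0s4:ufs:zone1 -n bootenv2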
• When you install or upgrade VCS using the installer program, all zones are upgraded (both global and
non-global) unless they are detached and unmounted.
• If you install VCS on Solaris 10 systems that run non-global zones, you need to make sure that non-
global zones do not inherit the /opt directory. Run the following command to make sure that the /opt
directory is not in the inherit-pkg-dir clause:
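The command in question (zone_name is a placeholder for the non-global zone's name):
# zonecfg -z zone_name info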
pool: yourpool
inherit-pkg-dir:
dir: /lib
inherit-pkg-dir:
dir: /platform
inherit-pkg-dir:
dir: /sbin
inherit-pkg-dir:
dir: /usr
• Veritas Upgrading when the zone root is on Veritas File System shared storage
The following procedures are to make one active non-global zone upgradeable with the zone root on
shared storage. The corresponding non-global zones on the other nodes in the cluster are then detached
from shared storage. They are detached to prevent them from being upgraded one at a time.
# hastop -all
2. On nodeA, bring up the volumes and the file systems that are related to the zone root.
Note
For a faster upgrade, you can boot the zones to bring them into the running state.
# hastop -all
# patchadd nnnnnn-nn
# patchadd xxxxxx-xx
.
.
Use a mount point as a temporary zone root directory. You then detach the non-global zones in the
cluster that are in the installed state. Detach them to prevent the operating system from trying to
upgrade these zones and failing. This is from the Veritas docs; I am not sure about the process. I
recommend detaching on the alternate global zones, but I don't think the fake filesystem is needed as
long as the non-global zone is patched on the original host. More work is needed should zone failover
be a requirement for rolling upgrades; this could be a possible "upgrade on attach" condition, which is
not supported by the VCS Zone Agent yet.
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> add inherit-pkg-dir
zonecfg:myzone:inherit-pkg-dir> set dir=/platform
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> add inherit-pkg-dir
zonecfg:myzone:inherit-pkg-dir> set dir=/sbin
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> add inherit-pkg-dir
zonecfg:myzone:inherit-pkg-dir> set dir=/usr
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> add inherit-pkg-dir
zonecfg:myzone:inherit-pkg-dir> set dir=/opt/sfw
zonecfg:myzone:inherit-pkg-dir> end
zonecfg:myzone> verify
zonecfg:myzone> export
create -b
set zonepath=/zones/myzone
set autoboot=true
add inherit-pkg-dir
set dir=/lib
end
add inherit-pkg-dir
set dir=/platform
end
add inherit-pkg-dir
set dir=/sbin
end
add inherit-pkg-dir
set dir=/usr
end
add inherit-pkg-dir
set dir=/opt/sfw
end
add net
set address=192.168.1.7/24
set physical=hme0
end
4. Boot the zone, then run through the sysidcfg prompts on the non-global zone console
The zlogin -e option allows for changing the ~. break sequence; I commonly change this due to layers
of login sessions where ~. would drop the connection on other terminals.
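A sketch, assuming the zone configured above (myzone) and an alternate escape character of @:
# zoneadm -z myzone boot
# zlogin -e @ -C myzone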
/* zone_status */
typedef enum {
ZONE_IS_UNINITIALIZED = 0,
ZONE_IS_READY,
ZONE_IS_BOOTING,
ZONE_IS_RUNNING,
ZONE_IS_SHUTTING_DOWN,
ZONE_IS_EMPTY,
ZONE_IS_DOWN,
ZONE_IS_DYING,
ZONE_IS_DEAD
} zone_status_t;
DTrace code - can be run via cron with output to a monitored file
#!/usr/sbin/dtrace -qs
BEGIN
{
state[0] = "Uninitialized";
state[1] = "Ready";
state[2] = "Booting";
state[3] = "Running";
state[4] = "Shutting down";
state[5] = "Empty";
state[6] = "Down";
state[7] = "Dying";
state[8] = "Dead";
}
zone_status_set:entry
{
printf("Zone %s status %s\n", stringof(args[0]->zone_name),
state[args[1]]);
}
# ./zonestatus.d
Zone aap status Ready
Zone aap status Booting
Zone aap status Running
Zone aap status Shutting down
Zone aap status Down
Zone aap status Empty
Zone aap status Dying
# zonecfg -z myzone
zonecfg:myzone> set limitpriv=default,dtrace_proc,dtrace_user
zonecfg:myzone> ^D
# ./zonestatus.d
Zone aap status Ready
Zone aap status Booting
Zone aap status Running
Zone aap status Shutting down
Zone aap status Down
Zone aap status Empty
Zone aap status Dying
Zone aap status Ready
Zone aap status Dead
Zone aap status Booting
Zone aap status Running
Zone aap status Shutting down
Zone aap status Empty
Zone aap status Down
Zone aap status Dead
a. Force Attachment
Used when a zone will not attach due to manifest incompatibilities such as missing patches. Buyer
beware.
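A sketch, with a hypothetical zone name:
# zoneadm -z myzone attach -F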
d. Dry Run to see if a non-global zone can be moved from one system to another
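A sketch of the dry-run form, assuming a zone named myzone and a target host named node2:
# zoneadm -z myzone detach -n | ssh node2 zoneadm -z myzone attach -n -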
e. Update on Attach
Can be used during round-robin upgrades or when moving from one architecture to another.
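A sketch, with a hypothetical zone name:
# zoneadm -z myzone attach -u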
# zonecfg -z webzone
webzone: No such zone configured
Use 'create' to begin configuring a new zone
zonecfg:webzone> create
zonecfg:webzone> set zonepath=/zones/webzone
zonecfg:webzone> exit
5. Exclusive IP Mode
After using that command, when that Container boots, Solaris: removes a CPU from the default pool;
assigns that CPU to a newly created temporary pool; and associates that Container with that pool, i.e.
only schedules that Container's processes on that CPU. Further, if the load on that CPU exceeds a
default threshold and another CPU can be moved from another pool, Solaris will do that, up to the
configured maximum of three CPUs. Finally, when the Container is stopped, the temporary pool is
destroyed and its CPU(s) are placed back in the default pool.
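The command referred to above is the zonecfg dedicated-cpu resource; a sketch of the start of that
session (the end/exit lines below close it), assuming a range of one to three CPUs:
# zonecfg -z myzone
zonecfg:myzone> add dedicated-cpu
zonecfg:myzone:dedicated-cpu> set ncpus=1-3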
zonecfg:myzone:dedicated-cpu> end
zonecfg:myzone> exit
a. Primary system -
iii. Export the zfs pool used for the non-global zone
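A sketch, assuming a hypothetical pool name:
# zpool export zonepool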
b. Failover System -
This will check the following: checks if the service group where the local zone resides is compliant; checks
if the systems hosting the service group have the required operating system to run local zones; checks if
the dependencies of the Zone resource are correct.
# hazoneverify <SG>
Table 5.3. VCS Command Line Access - Global vs. Non-Global Zones
Common Commands Global Zone Non-Global Zone
hastatus -sum yes yes
hares -state yes yes
# Monitor Code
VCSHOME="${VCS_HOME:-/opt/VRTSvcs}"
. $VCSHOME/bin/ag_i18n_inc.sh
ZONE=$1
SYS=`cat /var/VRTSvcs/conf/sysname`
INDEX=/etc/zones/index
ZONE_XML=/etc/zones/${ZONE}.xml
if [ ! -f $ZONE_XML ] ; then
VCSAG_LOG_MSG "N" "ZONE: $ZONE Configuration file: \
$ZONE_XML not found on $SYS. \
Must run failover test before being considered \
production ready" 1 "$ResName"
fi
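# Derive the zone state from the zones index file (assumed reconstruction;
# the original assignment did not survive the document formatting)
STATE=`grep "^${ZONE}:" $INDEX | awk -F: '{ print $2 }'`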
if [ -z $STATE ] ; then
VCSAG_LOG_MSG "N" "ZONE: $ZONE is not in $INDEX, and \
was never imported on $SYS. \
Must run failover test before being considered production\
ready" 1 "$ResName"
# Exit offline
exit 100
fi
case "$STATE" in
running)
60
Solaris Virtualization
# Zone is running
exit 110
configured)
# Zone Imported but not running
exit 100
installed)
# Zone had been configured on this system, but is not
# imported or running
exit 100
*)
esac
#########################
## StartProgram
#########################
VCSHOME="${VCS_HOME:-/opt/VRTSvcs}"
. $VCSHOME/bin/ag_i18n_inc.sh
ZONE=$1
ZONE_HOME=$2
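SYS=`cat /var/VRTSvcs/conf/sysname`
# Attach the zone on this node (assumed; the attach call itself was lost in
# formatting, but its exit status is tested below)
zoneadm -z $ZONE attach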
S=$?
if [ $S -eq 0 ] ; then
# Creation was a success, starting zone boot
VCSAG_LOG_MSG "N" \
"ZONE: $ZONE Success in attaching to system $SYS" 1 "$ResName"
VCSAG_LOG_MSG "N" \
"ZONE: $ZONE Starting Boot sequence on $SYS" 1 "$ResName"
zoneadm -z $ZONE boot
ZB=$?
if [ $ZB -eq 0 ] ; then
VCSAG_LOG_MSG "N" \
"ZONE: $ZONE Boot command successful $SYS" 1 "$ResName"
else
VCSAG_LOG_MSG "N" \
"ZONE: $ZONE Boot command failed on $SYS" 1 "$ResName"
fi
else
# Creation Failed
VCSAG_LOG_MSG "N" \
"ZONE: $ZONE Attach Command failed on $SYS" 1 "$ResName"
fi
##########################
## StopProgram
##########################
VCSHOME="${VCS_HOME:-/opt/VRTSvcs}"
. $VCSHOME/bin/ag_i18n_inc.sh
SYS=`cat /var/VRTSvcs/conf/sysname`
ZONE=$1
ZONE_HOME=$2
VCSAG_LOG_MSG "N" \
"ZONE: $ZONE Detach In Progress on $SYS" 1 "$ResName"
VCSAG_LOG_MSG "N" \
"ZONE: $ZONE Detach Is Complete $SYS" 1 "$ResName"
exit
Chapter 6. Solaris WANBoot
General Overview for Dynamic Wanboot POC
This proof of concept is designed to show how, through the use of JumpStart dynamic profiles and
client-id wanboot parameters, client-specific configurations can be pre-defined and used in a way that
allows the administrator to "fire and forget", avoiding the need to input frequent, redundant system
configuration information during the installation process. The intent of this lightweight proof of concept is
to use a methodology that can be integrated into new builds, capturing and leveraging information on the
current host during clean upgrades, and to include the ability to pre-define administration and default
product install tasks, such as selecting a Veritas product and creating a Veritas response file for configuration.
POC Goals
• Simple, extendable, flexible
• Admin's ability to pre-select the OS install disk (secondary mirror) and/or to set it based on script
conditions
• Adaptable to allow for additional install scripts and products; including configuration tasks for those
products
• Ability to define and pass variables set during the wanboot client definition process throughout different
stages of the install.
• Methodology that allows for 'collection' of configuration information from an existing server (can be
used to upgrade to new OS version while preserving existing scripts and configurations)
• Methodology that allows for additional products to be installed and configured - selection prior to install
time.
• The configuration information entered during different stages of the install process is the same as the
previous stage.
• The sysidcfg information is not passed from one stage to the next
• SI_ variables are defined as needed and only during latter stages of the install
• Because information must be re-entered at different stages, the install cannot currently be "fire and
forget"
• Use dynamic profile that sources a boot.env file specific to each host - allows for definition of hard
drive to install to
• Wanboot process should be dynamic and not needing frequent check rules generation.
Next Steps
1. Develop a Client Management Interface for Product Selection and Configuration
2. Create script collections for various products selected through Client Management Interface
Configuration Steps
Table 6.1. Wanboot Server Client Details
Server Value
Wanboot Server 192.168.15.89
Target Client Hostname dom2
Target Client Host ID 84F8799D
Target Client Install Disk c0d0
# cd /etc/apache2
# cp httpd.conf-example httpd.conf
# svcadm enable apache2
# cd /var/apache2/htdocs
# mkdir config
# mkdir flar
# mkdir wanboot10
# mkdir /var/apache2/htdocs/config/client-sysidcfg/dom2
# ./setup_install_server -w /var/apache2/htdocs/wanboot10/wpath \
/var/apache2/htdocs/wanboot10/ipath
# cd /mnt/Solaris_10/Tools/Boot/platform/sun4v/
# cp wanboot /var/apache2/htdocs/wanboot10/sun4v.wanboot
# cd /var/apache2/htdocs/wanboot10/wpath
# cp miniroot ..
# lofiadm -a /var/apache2/htdocs/wanboot10/miniroot
/dev/lofi/1
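The miniroot can then be mounted through the lofi device and modified as needed before being
unmounted (a sketch; the specific edits were not captured here):
# mount /dev/lofi/1 /mnt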
# umount /mnt
# lofiadm -d /dev/lofi/1
File Contents
/etc/netboot/192.168.15.0/84F8799D/system.conf
SsysidCF=http://192.168.15.89/config/js-rules/dom2
SjumpsCF=http://192.168.15.89/config/js-rules
/etc/netboot/192.168.15.0/84F8799D/wanboot.conf
boot_file=/wanboot10/sun4v.wanboot
root_server=http://192.168.15.89/cgi-bin/wanboot-cgi
root_file=/wanboot10/miniroot
server_authentication=no
client_authentication=no
system_conf=system.conf
boot_logger=http://192.168.15.89/cgi-bin/bootlog-cgi
/var/apache2/htdocs/config/js-rules/rules
/var/apache2/htdocs/config/js-rules/dynamic_pre.sh
#!/bin/sh
HOST_NAME=`hostname`
/usr/sfw/bin/wget -P/tmp/install_config/ \
http://192.168.15.89/config/js-rules/${HOST_NAME}/boot.env
sleep 2
. /tmp/install_config/boot.env
/var/apache2/htdocs/config/js-rules/$HOSTNAME/boot.env
DY_ROOTDISK=c0d0
dy_install_type=flash_install
dy_archive_location=http://192.168.15.89/flar/sun4v_sol10u6.flar
/var/apache2/htdocs/config/js-rules/$HOSTNAME/sysidcfg
Chapter 7. Solaris 10 Live Upgrade
Solaris 8 to Solaris 10 U6 Work A Round
This article describes the process for using Solaris Live Upgrade to upgrade from Solaris 8 to Solaris 10
05/08 or later releases.
The Solaris 10 05/08 release media (and subsequent Solaris 10 Updates) were compressed using a different
compression utility than previous Solaris 10 Releases, which all used bzip2 compression. As a result of this,
in order to upgrade to Solaris 05/08 (or later Solaris Releases) using Solaris Live Upgrade, the live system
(on which luupgrade is actually running), must have p7zip installed. p7zip was backported to Solaris 9 in
patch format, but for Solaris 8 there is no similar patch available.
To upgrade from Solaris 8 to Solaris 10 05/08 (or later Solaris Releases) using Live Upgrade, a special
download (s8p7zip.tar.gz) has been made available. This file is attached to this solution (see below).
The download consists of 3 Sun FreeWare packages, a wrapper script and an installer script.
# gunzip s8p7zip.tar.gz
# tar xvpf s8p7zip.tar
s8p7zip/
s8p7zip/install.sh
s8p7zip/p7zip
s8p7zip/README
s8p7zip/SMClgcc.Z
s8p7zip/SMCmktemp.Z
s8p7zip/SMCp7zip.Z
s8p7zip/LEGAL_LICENSE.TXT
3. When s8p7zip.tar.gz is unpacked, change in to the s8p7zip directory and run the install.sh script
# cd s8p7zip ; ./install.sh
installing SMCp7zip
installing SMClgcc
installing SMCmktemp
Testing p7zip utility ...
Test successful.
p7zip utility has been installed successfully.
* SMClgcc
* SMCmktemp
* SMCp7zip
Should the following result in error, check to make sure the packages are installed correctly.
# metastat -c
# df -h | grep md
/ (/dev/md/dsk/d100 ):13535396 blocks 1096760 files
/var (/dev/md/dsk/d103 ): 6407896 blocks 479598 files
/export (/dev/md/dsk/d104 ):20641888 blocks 1246332 files
/zones (/dev/md/dsk/d105 ):19962180 blocks 1205564 files
/dev/md/dsk/d101 - - swap - no -
# lustatus
Boot Environment Is Active Active Can Copy
Name Complete Now On Reboot Delete Status
-------------------------- -------- ------ --------- ------ ----------
svn110 yes yes yes no -
os200906 yes no no yes -
2. Create an OS image with the same FS layout; have lucreate split the mirror for you (this assumes the
target disk has the same partition layout and has been labeled).
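A sketch of the mirror-splitting form of lucreate, assuming the new BE is named abe, d200 is a new
mirror metadevice for the ABE root, and c0t1d0s0 is the submirror being split off:
# lucreate -n abe -m /:/dev/md/dsk/d200:ufs,mirror \
-m /:/dev/dsk/c0t1d0s0:detach,attach,preserve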
Warning
When adding patches to the ABE, bad patch script permissions could prevent a patch from being
added; look for permission errors such as: /var/sadm/spool/lu/120273-25/postpatch - a
simple chmod will fix this and allow the patch installation; recommend scripting a check before adding
patches
1. PATCHING - For Solaris 10 '*' works out patch order - otherwise patch_order file can be passed to it.
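A sketch of the patching invocation, assuming the ABE is named abe (as in the surrounding commands)
and patches staged in a hypothetical /var/tmp/patches directory:
# luupgrade -t -n abe -s /var/tmp/patches '*'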
# luumount abe
7. Enable ABE
# luactivate abe
# lustatus
Instead of using the preceding command to create the alternate boot environment so it matches the
current boot environment, the following command joins / and /usr, assuming that c0t3d0s0 is partitioned
with sufficient space:
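A sketch of the likely lucreate form, using the special "merged" keyword to fold /usr into its parent:
# lucreate -m /:/dev/dsk/c0t3d0s0:ufs -m /usr:merged:ufs -n Solaris_9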
# luupgrade -t -n "Solaris_9" \
-s /install/data/patches/SunOS-5.9-sparc/recommended -O \
"-M /install/data/patches/SunOS-5.9-sparc/recommended patch_order"
This next example would instead split /opt off of /, assuming that c0t3d0s5 is partitioned with sufficient
space:
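A sketch of the likely command:
# lucreate -m /:/dev/dsk/c0t3d0s0:ufs -m /opt:/dev/dsk/c0t3d0s5:ufs -n Solaris_9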
This next example shows how to upgrade from the existing Solaris 8 alternate boot environment to
Solaris 9 by means of an NFS-mounted JumpStart installation. First create a JumpStart installation
from CD-ROM, DVD, or an ISO image as covered in the Solaris 9 Installation Guide. The JumpStart
installation in this example resides in /install on the server js-server. The OS image itself resides in /
install/cdrom/SunOS-5.9-sparc. The profiles for this JumpStart installation dwell in /install/jumpstart/
profiles/ in a subdirectory called liveupgrade. Within this directory, the file js-upgrade contains the
JumpStart profile to upgrade the OS and additionally install the package SUNWxwice:
install_type upgrade
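# package addition described above (assumed line; lost in formatting)
package SUNWxwice add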
On the target machine, mount the /install partition from js-server and run luupgrade, specifying the
Solaris_9 alternate boot environment as the target, the OS image location, and the JumpStart profile:
# mkdir /install
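A sketch of the likely mount and luupgrade invocation, given the paths described above:
# mount js-server:/install /install
# luupgrade -u -n Solaris_9 -s /install/cdrom/SunOS-5.9-sparc \
-j /install/jumpstart/profiles/liveupgrade/js-upgrade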
Chapter 8. Solaris and Linux General
Information
Patch Database Information
1. Linux RPM Commands
Upgrade RPM
# rpm -Uvh ems-1.0-2.i386.rpm
Install RPM
# rpm -ivh ems-2.0-4.i386.rpm
# pkgchk -l -p /path/to/file
Pathname: /kernel/misc/sparcv9/diaudio
Pathname: /kernel/misc/sparcv9/mixer
SSH Keys
Common issues:
1. Permissions on .ssh
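The usual fix is to tighten the directory and key file modes, for example:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys2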
ssh-keygen -t dsa
scp ~/.ssh/id_dsa.pub burly:.ssh/authorized_keys2
ssh-agent sh -c 'ssh-add < /dev/null && bash'
1. Edit /etc/yp.conf
2. Update authconfig
3. Update /etc/nsswitch.conf
OS: RHEL 5.3 iSCSI target; Solaris 10 U6 LDOM initiator. Configuring an iSCSI Target
Server on RHEL 5.3 - original doc located at http://pitmanweb.com/blog/index.php?
blog=2&title=linux_serve_iscsi_from_redhat_el5_rhel5&more=1&c=1&tb=1&pb=1
Side Note: the RHEL 5.3 knowledge base indicates the existence of the TGT framework and a tgtadm
command. This is part of the "RHEL Cluster-Storage" channel, which I do not have access to; therefore
I ended up using the iscsitarget-0.4.15.tar.gz source.
# cd /usr/local/src
# wget \
easynews.dl.sourceforge.net/sourceforge/iscsitarget/\
iscsitarget-0.4.15.tar.gz
# tar zxvf iscsitarget-0.4.15.tar.gz
# cd iscsitarget-0.4.15
# make
# make install
/etc/ietd.conf
iSNSServer IP_OF_INTERFACE_TO_SHARE_OVER
Target iqn.2008-02.com.domain:storage.disk2.host.domain
    Lun 0 Path=/dev/sdb,Type=blockio
    MaxConnections 2
/etc/initiators.deny
ALL ALL
/etc/initiators.allow
iqn.2008-02.com.domain:storage.disk2.host.domain \
HOST_ONE_IP, HOST_TWO_IP
# /etc/init.d/iscsi-target start
# chkconfig --level 345 iscsi-target on
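On the Solaris initiator, the target is typically discovered before running devfsadm; a sketch, assuming
the RHEL target's address is 192.168.15.90:
# iscsiadm add discovery-address 192.168.15.90:3260
# iscsiadm modify discovery --sendtargets enable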
# devfsadm -c iscsi
mode= — Specifies one of the policies allowed for the bonding module. Acceptable values for this
parameter are:
# 1 — Sets an active-backup policy for fault tolerance. Transmissions are received and sent out via the
first available bonded slave interface. Another bonded slave interface is only used if the active bonded
slave interface fails.
# 2 — Sets an XOR (exclusive-or) policy for fault tolerance and load balancing. Using this method,
the interface matches up the incoming request's MAC address with the MAC address for one of the
slave NICs. Once this link is established, transmissions are sent out sequentially beginning with the
first available interface.
# 3 — Sets a broadcast policy for fault tolerance. All transmissions are sent on all slave interfaces.
# 4 — Sets an IEEE 802.3ad dynamic link aggregation policy. Creates aggregation groups that share the
same speed and duplex settings. Transmits and receives on all slaves in the active aggregator. Requires
a switch that is 802.3ad compliant.
# 5 — Sets a Transmit Load Balancing (TLB) policy for fault tolerance and load balancing. The outgoing
traffic is distributed according to the current load on each slave interface. Incoming traffic is received
by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed
slave.
# 6 — Sets an Active Load Balancing (ALB) policy for fault tolerance and load balancing. Includes
transmit and receive load balancing for IPV4 traffic. Receive load balancing is achieved through ARP
negotiation.
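The bonding driver itself is typically declared in /etc/modprobe.conf; a sketch assuming active-backup
(mode 1) with MII link monitoring:
alias bond0 bonding
options bond0 mode=1 miimon=100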
3. Update /etc/sysconfig/network-scripts/
ifcfg-bond0
ifcfg-eth0
ifcfg-eth1
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
MII_NOT_SUPPORTED=yes
The DEVICE= section should reflect the interface the file relates
to (ifcfg-eth1 should have DEVICE=eth1). The MASTER= section should
indicate the bonded interface to be used. Assign both e1000 devices
to bond0. The bond0 file contains the actual IP address information:
DEVICE=bond0
IPADDR=192.168.1.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
MII_NOT_SUPPORTED=yes
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
net.ipv4.tcp_wmem = 4096 16384 131072
net.ipv4.tcp_rmem = 4096 87380 174760
The default and max settings in net.ipv4.tcp_wmem would be overridden by 262144, and the default and
max settings in net.ipv4.tcp_rmem would likewise be overridden by 262144. So the net.ipv4 settings are
not needed unless you want to define higher TCP settings than what you defined in the net.core settings.
This may explain why Oracle does not recommend them under normal circumstances.
1. /proc/sys/net/ipv4/tcp_wmem - net.ipv4.tcp_wmem
net.ipv4.tcp_wmem deals with per socket memory usage for autotuning. The first value is the minimum
number of bytes allocated for the socket's send buffer. The second value is the default (overridden by
wmem_default) to which the buffer can grow under non-heavy system loads. The third value is the
maximum send buffer space (overridden by wmem_max)
2. /proc/sys/net/ipv4/tcp_rmem - net.ipv4.tcp_rmem
net.ipv4.tcp_rmem refers to receive buffers for autotuning and follows the same rules as tcp_wmem,
meaning the second value is the default (overridden by rmem_default) The third value is the maximum
(overridden by rmem_max)
3. /proc/sys/net/ipv4/ip_local_port_range - net.ipv4.ip_local_port_range
Defines the local port range that is used by TCP and UDP to choose the local port. The first number
is the first, the second the last local port number. The default value depends on the amount of memory
available on the system: > 128MB 32768 - 61000, < 128MB 1024 - 4999 or even less.
This number defines the number of active connections that this system can issue simultaneously to
systems not supporting TCP extensions (timestamps). With tcp_tw_recycle enabled, the range 1024 - 4999
is enough to issue up to 2000 connections per second to systems supporting timestamps.
cat /sys/class/scsi_host/host*/state
cat /sys/class/fc_host/host*/port_state
cat /sys/class/fc_host/host*/port_name
cat /sys/class/fc_remote_ports/rport*/node_name
cat /sys/class/fc_remote_ports/rport*/port_id
Manually add and remove SCSI disks by echoing the /proc or /sys filesystem
You can use the following commands to manually add and remove SCSI disk.
Note
In the following command examples, H, B, T, L, are the host, bus, target, and LUN IDs for the
device.
You can unconfigure and remove an unused SCSI disk with the following command:
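A sketch of the usual form, with H B T L as described in the note above:
# echo "scsi remove-single-device H B T L" > /proc/scsi/scsi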
If the driver cannot be unloaded and loaded again, and you know the host, bus, target and LUN IDs for
the new devices, you can add them through the /proc/scsi/scsi file using the following command:
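For example (same placeholders):
# echo "scsi add-single-device H B T L" > /proc/scsi/scsi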
For Linux 2.6 kernels, devices can also be added and removed through the /sys filesystem. Use the
following command to remove a disk from the kernel’s recognition:
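A likely form, assuming the device appears as H:B:T:L under sysfs:
# echo 1 > /sys/class/scsi_device/H:B:T:L/device/delete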
or, as a possible variant on other 2.6 kernels, you can use the command:
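For example:
# echo 1 > /sys/bus/scsi/devices/H:B:T:L/delete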
Note
The Linux kernel does not assign permanent names for the fabric devices in the /dev directory.
Device file names are assigned in the order in which devices are discovered during the bus
scanning. For example, a LUN might be /dev/sda. After a driver reload, the same LUN might
become /dev/sdce. A fabric reconfiguration might also result in a shift in the host, bus, target and
LUN IDs, which makes it unreliable to add specific devices through the /proc/scsi/scsi file.
# Check all pids for this port, then list that process
for f in $pids
do
/usr/proc/bin/pfiles $f 2>/dev/null \
| /usr/xpg4/bin/grep -q "port: $ans"
if [ $? -eq 0 ] ; then
echo "$line\nPort: $ans is being used by PID: \c"
# cd /etc/sysconfig/
# vi network
HOSTNAME=newhostname
# hostname newhostname
# ethtool eth0
# mii-tool -F 100baseTx-HD
# mii-tool -F 10baseT-HD
Hardening Linux
1. Restrict SU access to accounts through PAM and Group Access
# groupadd rootmembers
# groupadd oraclemembers
# groupadd postgresmembers
/etc/pam.d/su
root
oracle
postgres
oracle
postgres
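One common way to wire this up for the root case (a sketch, assuming the rootmembers group created
above) is a pam_wheel line in /etc/pam.d/su:
auth required pam_wheel.so use_uid group=rootmembers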
# netstat -tulp
The inittab file /etc/inittab also describes which processes are started at bootup and during normal
operation. For example, Oracle uses it to start cluster services at bootup. Therefore, it is recommended
to ensure that all entries in /etc/inittab are legitimate in your environment. I would at least remove the
CTRL-ALT-DELETE trap entry to prevent accidental reboots:
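On RHEL the entry in question looks like the following and can be commented out or removed:
ca::ctrlaltdel:/sbin/shutdown -t3 -r now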
The default runlevel should be set to 3 since in my opinion X11 (X Windows System) should not be
running on a production server. In fact, it shouldn't even be installed.
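In /etc/inittab that corresponds to:
id:3:initdefault: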
4. TCP Wrappers - deny all connections by default in /etc/hosts.deny:
ALL: ALL
To accept incoming SSH connections from e.g. nodes rac1cluster, rac2cluster and rac3cluster, add the
following line to /etc/hosts.allow:
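For example (a sketch using the node names above):
sshd: rac1cluster rac2cluster rac3cluster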
To accept incoming SSH connections from all servers from a specific network, add the name of the
subnet to /etc/hosts.allow. For example:
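A sketch, assuming the subnet is .subnet.example.com:
sshd: .subnet.example.com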
To accept incoming portmap connections from IP address 192.168.0.1 and subnet 192.168.5, add the
following line to /etc/hosts.allow:
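A sketch:
portmap: 192.168.0.1, 192.168.5.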
To accept connections from all servers on subnet .subnet.example.com but not from server
cracker.subnet.example.com, you could add the following line to /etc/hosts.allow:
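A sketch:
ALL: .subnet.example.com EXCEPT cracker.subnet.example.com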
Here are other examples that show some features of TCP wrapper: If you just want to restrict ssh
connections without configuring or using /etc/hosts.deny, you can add the following entries to /etc/
hosts.allow:
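A sketch using the extended ALLOW/DENY options:
sshd: rac1cluster rac2cluster rac3cluster : ALLOW
sshd: ALL : DENY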
The version of TCP wrapper that comes with Red Hat also supports the extended options documented
in the hosts_options(5) man page. Here is an example how an additional program can be spawned in
e.g. the /etc/hosts.allow file:
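A sketch (the log file path is a placeholder):
sshd: ALL : spawn (/bin/echo `/bin/date` access from %c to %d >> /var/log/sshd_access.log)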
For information on the % expansions, see "man 5 hosts_access". The TCP wrapper is quite flexible. And
xinetd provides its own set of host-based and time-based access control functions. You can even tell
xinetd to limit the rate of incoming connections. I recommend reading various documentations about
the Xinetd super daemon on the Internet.
A "SYN Attack" is a denial of service attack that consumes all the resources on a machine. Any
server that is connected to a network is potentially subject to this attack. To enable TCP SYN Cookie
Protection, edit the /etc/sysctl.conf file and add the following line:
net.ipv4.tcp_syncookies = 1
ICMP redirects are used by routers to tell the server that there is a better path to other networks than
the one chosen by the server. However, an intruder could potentially use ICMP redirect packets to alter
the hosts's routing table by causing traffic to use a path you didn't intend. To disable ICMP Redirect
Acceptance, edit the /etc/sysctl.conf file and add the following line:
net.ipv4.conf.all.accept_redirects = 0
IP spoofing is a technique where an intruder sends out packets which claim to be from another host by
manipulating the source address. IP spoofing is very often used for denial of service attacks. For more
information on IP Spoofing, I recommend the article IP Spoofing: Understanding the basics.
To enable IP Spoofing Protection, turn on Source Address Verification. Edit the /etc/sysctl.conf file
and add the following line:
net.ipv4.conf.all.rp_filter = 1
If you want or need Linux to ignore ping requests, edit the /etc/sysctl.conf file and add the following
line: This cannot be done in many environments.
net.ipv4.icmp_echo_ignore_all = 1
Chapter 9. Solaris 10 Notes
Link Aggregation
1. Show all the data-links
# dladm show-link
vsw0 type: non-vlan mtu: 1500 device: vsw0
e1000g0 type: non-vlan mtu: 1500 device: e1000g0
e1000g1 type: non-vlan mtu: 1500 device: e1000g1
e1000g2 type: non-vlan mtu: 1500 device: e1000g2
# dladm show-linkprop
LINK PROPERTY VALUE DEFAULT POSSIBLE
vsw0 zone -- -- --
e1000g0 zone -- -- --
e1000g1 zone -- -- --
e1000g2 zone -- -- --
Note
Link aggregation, or IEEE 802.3ad, is a term which describes using multiple Ethernet network
cables/ports in parallel to increase the link speed beyond the limits of any one single cable or
port, and to increase the redundancy for higher availability. Here is the syntax to create aggr
using dladm. You can use any number of data-link interfaces to create an aggr.
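A sketch of the create command, assuming the three e1000g interfaces above, an L4 policy, passive
LACP and key 1 (which yields aggr1); the IP address is a placeholder:
# dladm create-aggr -P L4 -l passive -d e1000g0 -d e1000g1 -d e1000g2 1
# ifconfig aggr1 plumb 192.168.1.5 netmask 255.255.255.0 up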
Now this creates an aggregate called "aggr1". You can plumb this, using "ifconfig plumb",
and assign an IP address to it. The link aggregation must be configured on the network switch
as well; the policy and aggregated interfaces must be configured identically on the other end
of the Ethernet cables. The example creates Link Aggregation Control Protocol (LACP) in
passive mode to control simultaneous transmission on multiple interfaces. Any single stream
is transmitted completely on an individual interface, but multiple simultaneous streams can
be active across all interfaces.
# dladm show-aggr
key: 1 (0x0001) policy: L4 address: XX:XX:XX:XX:XX (auto)
device address speed duplex link state
e1000g0 XX:XX:XX:XX:XX 0 Mbps half unknown standby
e1000g1 <unknown> 0 Mbps half unknown standby
e1000g2 <unknown> 0 Mbps half unknown standby
# dladm show-aggr -s
key: 1 ipackets rbytes opackets obytes %ipkts %opkts
Total 0 0 0 0
e1000g0 0 0 0 0 - -
e1000g1 0 0 0 0 - -
e1000g2 0 0 0 0 - -
# dladm show-link -s
ipackets rbytes ierrors opackets obytes oerrors
vsw0 225644 94949 0 44916 29996 0
e1000g0 0 0 0 0 0 0
e1000g1 0 0 0 0 0 0
e1000g2 0 0 0 0 0 0
IPMP Overview
1. Preventing Applications From Using Test Addresses
After you have configured a test address, you need to ensure that this address is not used by applications.
Otherwise, if the interface fails, the application is no longer reachable because test addresses do not
fail over during the failover operation. To ensure that IP does not choose the test address for normal
applications, mark the test address as deprecated.
IPv4 does not use a deprecated address as a source address for any communication, unless an application
explicitly binds to the address. The in.mpathd daemon explicitly binds to such an address in order to
send and receive probe traffic.
Because IPv6 link-local addresses are usually not present in a name service, DNS and NIS applications
do not use link-local addresses for communication. Consequently, you must not mark IPv6 link-local
addresses as deprecated.
IPv4 test addresses should not be placed in the DNS and NIS name service tables. In IPv6, link-local
addresses are not normally placed in the name service tables.
The standby interface in an IPMP group is not used for data traffic unless some other interface in the
group fails. When a failure occurs, the data addresses on the failed interface migrate to the standby
interface. Then, the standby interface is treated the same as other active interfaces until the failed
interface is repaired. Some failovers might not choose a standby interface. Instead, these failovers might
choose an active interface with fewer data addresses that are configured as UP than the standby interface.
You should configure only test addresses on a standby interface. IPMP does not permit you to add a
data address to an interface that is configured through the ifconfig command as standby. Any attempt
to create this type of configuration will fail. Similarly, if you configure as standby an interface that
already has data addresses, these addresses automatically fail over to another interface in the IPMP
group. Due to these restrictions, you must use the ifconfig command to mark any test addresses as
deprecated and -failover prior to setting the interface as standby. To configure standby interfaces, refer
to How to Configure a Standby Interface for an IPMP Group.
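A sketch of marking a test address deprecated/-failover and configuring a standby interface in one step
(the interface, group name and address are placeholders):
# ifconfig hme1 plumb 192.168.85.22 netmask + broadcast + deprecated -failover standby group testgroup1 up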
The in.mpathd daemon performs probe-based failure detection on each interface in the IPMP group
that has a test address. Probe-based failure detection involves the sending and receiving of ICMP probe
messages that use test addresses. These messages go out over the interface to one or more target systems
on the same IP link. For an introduction to test addresses, refer to Test Addresses. For information on
configuring test addresses, refer to How to Configure an IPMP Group With Multiple Interfaces.
The in.mpathd daemon determines which target systems to probe dynamically. Routers that are
connected to the IP link are automatically selected as targets for probing. If no routers exist on the link,
in.mpathd sends probes to neighbor hosts on the link. A multicast packet that is sent to the all hosts
multicast address, 224.0.0.1 in IPv4 and ff02::1 in IPv6, determines which hosts to use as target systems.
The first few hosts that respond to the echo packets are chosen as targets for probing. If in.mpathd
cannot find routers or hosts that responded to the ICMP echo packets, in.mpathd cannot detect probe-
based failures.
You can use host routes to explicitly configure a list of target systems to be used by in.mpathd. For
instructions, refer to Configuring Target Systems.
To ensure that each interface in the IPMP group functions properly, in.mpathd probes all the targets
separately through all the interfaces in the IPMP group. If no replies are made in response to five
consecutive probes, in.mpathd considers the interface to have failed. The probing rate depends on the
failure detection time (FDT). The default value for failure detection time is 10 seconds. However,
you can tune the failure detection time in the /etc/default/mpathd file. For instructions, go to How to
Configure the /etc/default/mpathd File.
For a failure detection time of 10 seconds, the probing rate is approximately one probe every two
seconds. The minimum repair detection time is twice the failure detection time, 20 seconds by default,
because replies to 10 consecutive probes must be received. The failure and repair detection times apply
only to probe-based failure detection.
Note
In an IPMP group that is composed of VLANs, link-based failure detection is implemented
per physical-link and thus affects all VLANs on that link. Probe-based failure detection is
performed per VLAN-link. For example, bge0/bge1 and bge1000/bge1001 are configured
together in a group. If the cable for bge0 is unplugged, then link-based failure detection will
report both bge0 and bge1000 as having instantly failed. However, if all of the probe targets
on bge0 become unreachable, only bge0 will be reported as failed because bge1000 has its
own probe targets on its own VLAN.
You can accomplish probe-based failure detection by setting up host routes in the routing table as probe targets.
Any host routes that are configured in the routing table are listed before the default router. Therefore, IPMP
uses the explicitly defined host routes for target selection. You can use either of two methods for directly
specifying targets: manually setting host routes or creating a shell script that can become a startup script.
Consider the following criteria when evaluating which hosts on your network might make good targets.
• Make sure that the prospective targets are available and running. Make a list of their IP addresses.
• Ensure that the target interfaces are on the same network as the IPMP group that you are configuring.
• The netmask and broadcast address of the target systems must be the same as the addresses in the IPMP
group.
• The target host must be able to answer ICMP requests from the interface that is using probe-based
failure detection.
1. Log in with your user account to the system where you are configuring probe-based failure detection
Replace the values of destination-IP and gateway-IP with the IPv4 address of the host to be used as a
target. For example, you would type the following to specify the target system 192.168.85.137, which
is on the same subnet as the interfaces in IPMP group testgroup1.
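A sketch of the likely command:
# route add -host 192.168.85.137 192.168.85.137 -static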
case "$1" in
'start')
/usr/bin/echo "Adding static routes for use as IPMP targets"
for target in $TARGETS; do
/usr/sbin/route add -host $target $target
done
;;
'stop')
/usr/bin/echo "Removing static routes for use as IPMP targets"
for target in $TARGETS; do
/usr/sbin/route delete -host $target $target
done
;;
esac
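The script assumes a TARGETS variable defined near its top (not shown), for example:
TARGETS="192.168.85.137 192.168.85.138"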
After a typical software installation, there can be a half dozen or more processes that need to be started
and stopped during system startup and shutdown. In addition, these processes may depend on each other
and may need to be monitored and restarted if they fail. For each process, these are the logical steps
that need to be done to incorporate these as services in SMF:
a. Create a service manifest file.
b. Create a methods script file to define the start, stop, and restart methods for the service.
Create the manifest file according to the description in the smf_method(5) man page. For clarity, this
file should be placed in a directory dedicated to files related to the application. In fact, the service
will be organized into a logical folder inside SMF, so having a dedicated folder for the files related
to the application makes sense. However, there is no specific directory name or location requirement
enforced inside SMF.
In the example, the OMR service will be organized in SMF as part of the SAS application folder.
This is a logical grouping; there is no physical folder named sas associated with SMF. However,
when managing the service, the service will be referred to by application/sas/metadata. Other SAS-
related processes can later be added and identified under application/sas as well. For the example,
the file /var/svc/manifest/application/sas/metadata.xml should be created containing the following:
<?xml version="1.0"?>
<!DOCTYPE service_bundle
SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<dependency
name='multi-user-server'
grouping='optional_all'
type='service'
restart_on='none'>
<service_fmri value='svc:/milestone/multi-user-server'/>
</dependency>
<exec_method
type='method'
name='start'
exec='/lib/svc/method/sas/metadata %m'
timeout_seconds='60'>
<method_context>
<method_credential user='sas' />
</method_context>
</exec_method>
<exec_method
type='method'
name='restart'
exec='/lib/svc/method/sas/metadata %m'
timeout_seconds='60'>
<method_context>
<method_credential user='sas' />
</method_context>
</exec_method>
<exec_method
type='method'
name='stop'
exec='/lib/svc/method/sas/metadata %m'
timeout_seconds='60' >
<method_context>
<method_credential user='sas' />
</method_context>
</exec_method>
<template>
<common_name>
<loctext xml:lang='C'>
SAS Metadata Service
</loctext>
</common_name>
<documentation>
<doc_link name='sas_metadata_overview' uri=
'http://www.sas.com/technologies/bi/appdev/base/metadatasrv.html'
/>
<doc_link name='sas_metadata_install' uri=
'http://support.sas.com/rnd/eai/openmeta/v9/setup'/>
</documentation>
</template>
</service>
</service_bundle>
The manifest file basically consists of two tagged stanzas that have properties that define how the
process should be started, stopped, and restarted and also define any dependencies. The first tag,
<service_bundle> defines the name of the service bundle that will be used to group services and
as part of the parameters in svcs commands (svcs, svcmgr, and so on). The interior tag, <service>,
defines a specific process, its dependencies, and how to manipulate the process. Please see the man
page for service_bundle(4) for more information on the format of manifest files.
Create the methods scripts. This file is analogous to the traditional rc scripts used in previous versions
of the Solaris OS. This file should be a script that successfully starts, stops, and restarts the process.
This script must be executable for all the users who might manage the service, and it must be placed
in the directory and file name referenced in the exec properties of the manifest file. For the example
in this procedure, the correct file is /lib/svc/method/sas/metadata, based on the manifest file built in
Step 1. See the man page for smf_method(5) for more information on method scripts.
#!/sbin/sh
# Start/stop client SAS MetaData service
#
. /lib/svc/share/smf_include.sh
SASDIR=/d0/sas9-1205
SRVR=MSrvr
CFG=$SASDIR/SASMain/"$SRVR".sh
case "$1" in
'start')
$CFG start
sleep 2
;;
'restart')
$CFG restart
sleep 2
;;
'stop')
$CFG stop
;;
*)
echo "Usage: $0 { start | stop }"
exit 1
;;
esac
exit $SMF_EXIT_OK
Validate and import the manifest file into the Solaris service repository to create the service in SMF
and make the service available for manipulation. The following commands shows the correct file
name to use for the manifest in this example.
# svccfg
svc:> validate /var/svc/manifest/application/sas/metadata.xml
svc:> import /var/svc/manifest/application/sas/metadata.xml
svc:> quit
d. Enable Service
Enable the service using the svcadm command. The -t switch allows you to test the service definition
without making the definition persistent. You would exclude the -t switch if you wanted the
definition to be a permanent change that persists between reboots.
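A sketch, using the service name defined above:
# svcadm enable -t application/sas/metadata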
e. Verify Service
Verify that the service is online and verify that the processes really are running by using the svcs
command.
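For example:
# svcs -p application/sas/metadata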
Now, in the example, both the OMR process (above) and the Object Spawner process were to be
configured. The Object Spawner is dependent on the OMR. The remainder of this document describes
configuring the dependent Object Spawner process.
The manifest file for the Object Spawner service is similar to the manifest file used for the OMR
service. There are a few small changes and a different dependency. The differences are highlighted
in bold in the following:
<?xml version="1.0">
<!DOCTYPE service_bundle
SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<exec_method
type='method'
name='restart'
exec='/lib/svc/method/sas/objectspawner %m'
timeout_seconds='60'>
<method_context>
<method_credential user='sas' />
</method_context>
</exec_method>
<exec_method
type='method'
name='stop'
exec='/lib/svc/method/sas/objectspawner %m'
timeout_seconds='60' >
<method_context>
<method_credential user='sas' />
</method_context>
</exec_method>
<template>
<common_name>
<loctext xml:lang='C'>
SAS Object Spawner Service
</loctext>
</common_name>
<documentation>
<doc_link name='sas_metadata_overview' uri=
'http://www.sas.com/technologies/bi/appdev/base/metadatasrv.html'
/>
<doc_link name='sas_metadata_install' uri=
'http://support.sas.com/rnd/eai/openmeta/v9/setup'/>
</documentation>
</template>
</service>
</service_bundle>
#!/sbin/sh
# Start/stop client SAS Object Spawner service
#
. /lib/svc/share/smf_include.sh
SASDIR=/d0/sas9-1205
SRVR=ObjSpa
CFG=$SASDIR/SASMain/"$SRVR".sh
case "$1" in
'start')
$CFG start
sleep 2
;;
'restart')
$CFG restart
sleep 2
;;
'stop')
$CFG stop
;;
*)
echo "Usage: $0 { start | stop }"
exit 1
;;
esac
exit $SMF_EXIT_OK
Validate and import the manifest file in the same manner as was used for the OMR service. Note
that "application" is shortened to "appl" below for documentation formatting reasons.
# svccfg
svc:> validate /var/svc/manifest/appl/sas/objectspawner.xml
svc:> import /var/svc/manifest/appl/sas/objectspawner.xml
svc:> quit
d. Enable Service
Enable the new service in the same manner as was used for the OMR service:
Finally, verify that the service is up and running in the same manner as was used for the OMR service:
MPXIO
1. Solaris 10 Configuration - CLI
# stmsboot -e
/kernel/drv/fp.conf
mpxio-disable="no";
# stmsboot -L
non-STMS device name STMS device name
------------------------------------------------------
/dev/rdsk/c1t50060E801049CF50d0 \
/dev/rdsk/c2t4849544143484920373330343031383130303030d0
/dev/rdsk/c1t50060E801049CF52d0 \
/dev/rdsk/c2t4849544143484920373330343031383130303030d0
kernel/drv/qlc.conf:
7. Display Properties
# luxadm display \
/dev/rdsk/c6t600C0FF000000000086AB238B2AF0600d0s2
/dev/rdsk/c6t600C0FF000000000086AB238B2AF0600d0s2
/devices/scsi_vhci/ssd@g600c0ff000000000086ab238b2af0600:c,raw
Controller /devices/pci@9,600000/SUNW,qlc@1/fp@0,0
Bugs/Features:
1. New GUI based Network utility is buggy and probably should not be used with this device. Instead
use a wificonfig profile
2. If attached during boot and shutdown, I get a flood of debugging output and it will not properly start or
stop. I have to detach before halting and keep disconnected during the boot.
Problems during initial configuration beyond the bugs above: I had to track down the device alias and
assign it to the rum driver; this did not happen automatically.
# prtconf -v >/tmp/prtconf.out
# vi /tmp/prtconf.out
[-cut-]
value='Cisco-Linksys'
[-cut-]
name='usb-product-id' type=int items=1
value=00000020
name='usb-vendor-id' type=int items=1
value=000013b1
[-cut-]
2. Combine these two numbers with the device type to form the mapping entry in the /etc/driver_aliases file:
rum "usb13b1,20"
# init 6
6. Start an IP on your device, or replace dhcp with an appropriate IP address and configuration
$ cat /etc/hostname.fjgi0
whpsedwdb2 netmask + broadcast + group ipmp0 up
$ cat /etc/hostname.fjgi1
group ipmp0 standby up
MultiNICB mnicb (
Critical = 0
UseMpathd = 1
MpathdCommand = "/usr/lib/inet/in.mpathd"
Device = { fjgi0, fjgi1 }
ConfigCheck = 0
GroupName = ipmp0
IgnoreLinkStatus = 0
)
# /usr/sbin/if_mpadm -d ce0
Feb 13 14:47:31 oraman in.mpathd[185]: Successfully
failed over from NIC ce0 to NIC ce4
AWKSCRIPT='
NF == 0 {getline line;}
$1 == "obytes64" { obytes = $2; }
$1 == "rbytes64" { rbytes = $2; }
$1 == "snaptime" {
time = $2;
obytes_curr = obytes - prev_obytes;
rbytes_curr = rbytes - prev_rbytes;
elapse = (time - prev_time)*1e6;
elapse = (elapse==0)?1:elapse;
printf "Outbound %f MB/s; Inbound %f MB/s\n", \
obytes_curr/elapse, rbytes_curr/elapse;
prev_obytes = obytes;
prev_rbytes = rbytes;
prev_time = time;
}
'
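One way to drive the AWKSCRIPT above is to feed it a repeating kstat sample; the interface name here is an assumption:
# kstat -n fjgi0 1 | nawk "$AWKSCRIPT"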
userName=
password=
hostName=
subscriptionKey=
portalEnabled=false
proxyHostName=
proxyPort=
proxyUserName=
proxyPassword=
NFS Performance
nfsstat -s reports server-side statistics. In particular, the following are important:
• nullrecv: Number of times an RPC call was not available even though it was believed to have been
received.
• badlen: Number of RPC calls with a length shorter than that allowed for RPC calls.
• xdrcall: Number of RPC calls whose header could not be decoded by XDR (External Data
Representation).
• null: Null calls are made by the automounter when looking for a server for a filesystem.
Sun recommends the following tuning actions for some common conditions:
• writes > 10%: Write caching (either array-based or host-based, such as a Prestoserv card) would speed
up operation.
• badcalls >> 0: The network may be overloaded and should be checked out. The rsize and wsize mount
options can be set on the client side to reduce the effect of a noisy network, but this should only be
considered a temporary workaround.
• readlink > 10%: Replace symbolic links with directories on the server.
• getattr > 40%: The client attribute cache can be increased by setting the actimeo mount option. Note that
this is not appropriate where the attributes change frequently, such as on a mail spool. In these cases,
mount the filesystems with the noac option.
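As an illustration of the client-side options mentioned above (server, path, and values are assumptions, not recommendations):
# mount -F nfs -o rsize=32768,wsize=32768,actimeo=120 server:/export/data /mnt/data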
nfsstat -c reports client-side statistics. The following statistics are of particular interest:
• retrans: Total number of retransmissions. If this number is larger than 5%, the requests are not reaching
the server consistently. This may indicate a network or routing problem.
• badxid: Number of times a duplicate acknowledgement was received for a single request. If this number
is roughly the same as badcalls, the network is congested. The rsize and wsize mount options can be set
on the client side to reduce the effect of a noisy network, but this should only be considered a temporary
workaround. If on the other hand, badxid=0, this can be an indication of a slow network connection.
• timeout: Number of calls that timed out. If this is roughly equal to badxid, the requests are reaching the
server, but the server is slow.
• wait: Number of times a call had to wait because a client handle was not available.
• null: A large number of null calls indicates that the automounter is retrying the mount frequently. The
timeo parameter should be changed in the automounter configuration.
• srtt: Smoothed round-trip time. If this number is larger than 50ms, the mount point is slow.
To change this on SXCE, update the /lib/svc/method/svc-iscsitgt file and replace the /usr/
sbin/iscsitgtd invocation with the following:
Then restart the iscsitgtd process via svcadm restart iscsitgt. Note that OpenSolaris, Solaris 10
U6, and SXCE b110 all handle the start of this process differently.
Performance
• iSCSI performance can be quite good, especially if you follow a few basic rules
• Ensure that you are using the performance guidance listed in bug #6457694 on opensolaris.org
• Increase send and receive buffers, disable the nagle algorithm and make sure TCP window scaling
is working correctly
• Ttcp and netperf are awesome tools for benchmarking network throughput, and measuring the impact
of a given network tunable
• As with security, performance is a complete presentation in and of itself. Please see the references if
you're interested in learning more about tuning iSCSI communications for maximum throughput.
• The base directory is used to store the iSCSI target configuration data, and needs to be defined prior
to using the iSCSI target for the first time
• The backing store contains the physical storage that is exported as a target
• Flat files
• Physical devices
• To create a backing store from a ZFS volume, the zfs utility can be run with the create subcommand,
the create zvol option ("-V"), the size of the zvol to create, and the name to associate with the zvol:
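A minimal sketch of that command (the pool, path, and size are assumptions):
# zfs create -V 10g tank/iscsi/vol0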
3. Once a backing store has been created, it can be exported as an iSCSI target with the iscsitadm "create"
command, the "target" subcommand, and by specifying the backing store type to use:
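A sketch of exporting the zvol created above as a target (the target name is an assumption):
# iscsitadm create target -b /dev/zvol/rdsk/tank/iscsi/vol0 mytarget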
Or
• Access control lists (ACLs) can be used to limit the node names that are allowed to access a target
• To ease administration of ACLs, the target allows you to associate an alias with a node name (you can
retrieve the node name of a Solaris initiator by running the iscsiadm utility with the “list” command,
and “initiator-node” subcommand):
• After an alias is created, it can be added to a target’s ACL by passing the alias to the “target”
subcommands “-l” option:
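A sketch of these steps; the initiator node name, alias, and target name are all assumptions:
# iscsiadm list initiator-node
# iscsitadm create initiator -n iqn.1986-03.com.sun:01:0003ba0e0795 solhost
# iscsitadm modify target -l solhost mytarget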
• The iscsiadm utility can be used to configure a discovery method and the discovery parameters
• Prior to using newly discovered targets, the devfsadm utility needs to be run to create device entries:
• Once the device nodes are created, the format utility can be used to label the new targets, and your
favorite file system management tool (e.g., mkfs, zpool, etc) can be used to convert the target(s) into
file systems:
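A sketch of the initiator side (the discovery address is an assumption):
# iscsiadm add discovery-address 192.168.1.20:3260
# iscsiadm modify discovery --sendtargets enable
# devfsadm -i iscsi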
1. The first step is to recreate the same slice arrangement on the second disk:
2. You can check both disks have the same VTOC using prtvtoc command
# prtvtoc /dev/rdsk/c1t0d0s2
3. Now we have to create state database replicas on slice 7. We will be adding two replicas to each slice:
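A sketch of the commands for this step, using two replicas per slice as described (the later full-run example uses three):
# metadb -a -f -c2 /dev/dsk/c1t0d0s7
# metadb -a -c2 /dev/dsk/c1t1d0s7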
4. Since the database replicas are in place we can start creating metadevices. The following commands
will create metadevice d31 from slice c1t0d0s3, and metadevice d32 from slice c1t1d0s3. Then we
create mirror d30 with d31 attached as a submirror. Finally we will attach submirror d32 to mirror d30.
Once d32 is attached, the mirror d30 will automatically start syncing.
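The commands corresponding to this step, using the device names from the description:
# metainit -f d31 1 1 c1t0d0s3
# metainit -f d32 1 1 c1t1d0s3
# metainit d30 -m d31
# metattach d30 d32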
5. The procedure is the same for all other mirrors you might want to create. Root filesystem is slightly
different. First you will have to create your submirrors. Then you will have to attach submirror with
existing root filesystem, in this case d11, to the new mirror metadevice d10. Then you will have to
run metaroot command. It will alter / entry in /etc/vfstab. Finally, you flush the filesystem using lockfs
command and reboot.
# metaroot d10
# lockfs -fa
# init 6
6. When the system reboots, you can attach the second submirror to d10 as follows:
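For example, assuming the second root submirror was created as d12:
# metattach d10 d12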
7. You can check the sync progress using metastat command. Once all mirrors are synced up the next
step is to configure the new swap metadevice, in my case d0, to be crash dump device. This is done
using dumpadm command:
# dumpadm
Dump content: kernel pages
Dump device: /dev/dsk/c1t0d0s0 (dedicated)
Savecore directory: /var/crash/ultra
# dumpadm -d /dev/md/dsk/d0
8. Next is to make sure you can boot from the mirror - SPARC ONLY
a. The final step is to modify PROM. First we need to find out which two physical devices c1t0d0
and c1t1d0 refer to
# ls -l /dev/dsk/c1t0d0s1
lrwxrwxrwx 1 root root 43 Mar 4 14:38 /dev/dsk/c1t0d0s1 ->
../../devices/pci@1c,600000/scsi@2/sd@0,0:b
# ls -l /dev/dsk/c1t1d0s1
lrwxrwxrwx 1 root root 43 Mar 4 14:38 /dev/dsk/c1t1d0s1 ->
../../devices/pci@1c,600000/scsi@2/sd@1,0:b
b. The physical device path is everything starting from /pci…. Please make a note of sd towards the
end of the device string. When creating device aliases below, sd will have to be changed to disk.
Now we create two device aliases called root and backup_root. Then we set boot-device to be root
and backup_root. The :b refers to slice 1(root) on that particular disk.
# eeprom "use-nvramrc?=true"
# eeprom "nvramrc=devalias root /pci@1c,600000/scsi@2/disk@0,0 \
devalias backup_root /pci@1c,600000/scsi@2/disk@1,0"
# eeprom "boot-device=root:b backup_root:b net"
9. Next is to make sure you can boot from the mirror - Intel/AMD ONLY
# /sbin/installgrub /boot/grub/stage1 \
/boot/grub/stage2 /dev/rdsk/c0d0s0
10.If you are mirroring just the two internal drives, you will want to add the following line to /etc/
system to allow it to boot from a single drive. This will bypass the SVM Quorum rule
set md:mirrored_root_flag = 1
Example full run on an AMD system. Submirrors are named d[drive number][partition number], and the
mirror metadevices are named d[boot environment number]0[partition number]; for example, d10 is drive 1
partition 0, and metadevice d100 is partition 0 of the 1st boot environment (Live Upgrade BE). If applying
the split-mirror alternate boot environment, I would have the split-off ABE as d200.
# format c1t1d0
# prtvtoc /dev/rdsk/c1t0d0s2 \
| fmthard -s - /dev/rdsk/c1t1d0s2
# format
# metadb -a -f -c3 /dev/dsk/c1t0d0s7
# metadb -a -f -c3 /dev/dsk/c1t1d0s7
# metainit -f d10 1 1 c1t0d0s0
# metainit -f d20 1 1 c1t1d0s0
# metainit -f d11 1 1 c1t0d0s1
# metainit -f d21 1 1 c1t1d0s1
# metainit -f d13 1 1 c1t0d0s3
# metainit -f d23 1 1 c1t1d0s3
# metainit d100 -m d10
# metainit d101 -m d11
# metainit d103 -m d13
# metaroot d100
# echo 'set md:mirrored_root_flag = 1' \
>>/etc/system
# installgrub /boot/grub/stage1 \
/boot/grub/stage2 /dev/rdsk/c1t1d0s0
# lockfs -fa
# init 6
/dev/dsk/c1t0d0s1 - - swap
/dev/md/dsk/d100 /dev/md/rdsk/d100 / ufs
/dev/dsk/c1t0d0s3 /dev/rdsk/c1t0d0s3 /zone ufs
# lockfs -fa
# init 6
Many modern RAID arrays just require you to take out the bad drive and plug in the new one, while
everything else is taken care of automatically. It’s not quite that easy on a Sun server, but it’s really just a
few simple steps. I just had to do this, so I thought I would write down the procedure here.
Let’s look at each step individually. In my case, c1t0d0 has failed, so first, I take a look at the status of my
meta databases. Below we can see that the replicas on that disk have write errors:
# metadb -i
flags first blk block count
Wm p l 16 8192 /dev/dsk/c1t0d0s3
W p l 8208 8192 /dev/dsk/c1t0d0s3
a p luo 16 8192 /dev/dsk/c1t1d0s3
a p luo 8208 8192 /dev/dsk/c1t1d0s3
The replicas on c1t0d0s3 are dead to us, so let’s wipe them out!
# metadb -d c1t0d0s3
# metadb -i
The only replicas we have left are on c1t1d0s3, so I'm all clear to unconfigure the device. I run cfgadm
to get the c1 path:
# cfgadm -al
Now that the drive is configured and visible from within the format command, we can copy the partition
table from the remaining mirror member:
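As in the earlier example, copy the VTOC from the surviving disk (c1t1d0) onto the replacement (c1t0d0):
# prtvtoc /dev/rdsk/c1t1d0s2 \
| fmthard -s - /dev/rdsk/c1t0d0s2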
And finally, I'm ready to replace the metadevices, syncing up the mirror and making things as good as
new. Repeat for each mirrored partition.
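A sketch of one such replacement (the mirror and slice names are assumptions):
# metareplace -e d30 c1t0d0s3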
1. The first step is to recreate the same slice arrangement on the second disk:
2. You can check both disks have the same VTOC using prtvtoc command
# prtvtoc /dev/rdsk/c1t0d0s2
3. Now we have to create state database replicas on slice 7. We will be adding two replicas to each slice:
4. Since the database replicas are in place we can start creating metadevices. The following commands
will create metadevice d31 from slice c1t0d0s3, and metadevice d32 from slice c1t1d0s3. Then we
create mirror d30 with d31 attached as a submirror. Finally we will attach submirror d32 to mirror d30.
Once d32 is attached, the mirror d30 will automatically start syncing.
5. The procedure is the same for all other mirrors you might want to create. Root filesystem is slightly
different. First you will have to create your submirrors. Then you will have to attach submirror with
existing root filesystem, in this case d11, to the new mirror metadevice d10. Then you will have to
run metaroot command. It will alter / entry in /etc/vfstab. Finally, you flush the filesystem using lockfs
command and reboot.
# metaroot d10
# lockfs -fa
# init 6
6. When the system reboots, you can attach the second submirror to d10 as follows:
7. You can check the sync progress using metastat command. Once all mirrors are synced up the next
step is to configure the new swap metadevice, in my case d0, to be crash dump device. This is done
using dumpadm command:
# dumpadm
Dump content: kernel pages
Dump device: /dev/dsk/c1t0d0s0 (dedicated)
Savecore directory: /var/crash/ultra
Savecore enabled: yes
# dumpadm -d /dev/md/dsk/d0
8. Next is to make sure you can boot from the mirror - SPARC ONLY
a. The final step is to modify PROM. First we need to find out which two physical devices c1t0d0
and c1t1d0 refer to
# ls -l /dev/dsk/c1t0d0s1
lrwxrwxrwx 1 root root 43 Mar 4 14:38 /dev/dsk/c1t0d0s1 ->
../../devices/pci@1c,600000/scsi@2/sd@0,0:b
# ls -l /dev/dsk/c1t1d0s1
lrwxrwxrwx 1 root root 43 Mar 4 14:38 /dev/dsk/c1t1d0s1 ->
../../devices/pci@1c,600000/scsi@2/sd@1,0:b
b. The physical device path is everything starting from /pci…. Please make a note of sd towards the
end of the device string. When creating device aliases below, sd will have to be changed to disk.
Now we create two device aliases called root and backup_root. Then we set boot-device to be root
and backup_root. The :b refers to slice 1(root) on that particular disk.
# eeprom "use-nvramrc?=true"
# eeprom "nvramrc=devalias root /pci@1c,600000/scsi@2/disk@0,0 \
devalias backup_root /pci@1c,600000/scsi@2/disk@1,0"
# eeprom "boot-device=root:b backup_root:b net"
9. If you are mirroring just the two internal drives, you will want to add the following line to /etc/
system to allow it to boot from a single drive. This will bypass the SVM Quorum rule
set md:mirrored_root_flag = 1
10.Enable the mirror disk to be bootable - used by both sparc and x64 systems; on x64 will update grub
host:# format
(choose fdisk)
(create 100% Standard Solaris Partition over the full Disk)
1. Example
2. Option Description
• -S tells flarcreate to skip its size checks; normally it will estimate the size of the archive prior to
creating it, which can take a really long time, so this argument speeds up the process.
• -R specifies the root directory; by default it is /, but I often supply it for completeness.
• -x specifies a directory to exclude from the archive; supply one -x per directory to exclude (i.e., -x /
opt -x /export). NFS mounted filesystems are excluded by default, but again for completeness I tend
to put them in there anyway.
• (archivename).flar is the actual name of the output archive file. You can name it whatever you want,
but typically it's wise to put the hostname, archive creation date, and a .flar extension in the filename
just to help identify it. The filename should be an absolute pathname, so since we've mounted our NFS
archive repository to /flash, we'll specify that path.
# flarcreate -n Snapshot -S -R / \
-x /export/home/flar /export/home/flar/Snapshot.flar
2. Add FLAR Image to Jumpstart - /etc/bootparams - add_client.sh
./add_install_client -e 0:14:4f:23:ab:8f \
-s host:/flash/boot/sol10sparc \
-c host:/flash/boot/Profiles/Solaris10 \
-p host:/flash/boot/Sysidcfg/smro204 \
smro204.fmr.com sun4u
3. Recover Script - recover.pl
#!/usr/bin/perl
use Getopt::Long ;
$arch_location='/flasharchives/flar';
$boot_base='/flasharchives/boot';
GetOptions(
"list" => \$list,
"archive=s" => \$archive,
"configured" => \$configured,
"add" => \$addboot,
"remove=s" => \$rmboot
);
sub _list {
if ($archive) { &_details ; } else {
system("/flasharchives/bin/list_archives.pl");
exit ;
}
}
sub _details {
&_info_collection;
&_print_details;
}
sub _info_collection {
$addto = ();
@archinfo = ();
$ih = ();
chomp $archive;
next if $archive =~ /lost/;
next if $archive =~ /list/;
next if $archive =~ /boot/;
sub _build {
&_info_collection ;
$dump_sysidcfg .= "network_interface=ce4 \
{hostname=$inventory{$archive}{creation_node}.fmr.com \
default_route=172.26.21.1 \
ip_address=$inventory{$archive}{creation_node_ip}\
protocol_ipv6=no netmask=255.255.255.0}\n";
$dump_sysidcfg .= `cat $sysid_stock`;
open(SYSIDOUT, ">$sysidcfg");
print SYSIDOUT $dump_sysidcfg;
close SYSIDOUT;
# Add flar statement into custom rules file
sub _print_details {
sub _list_existing {
open(BOOTP, "/etc/bootparams")
|| die "Bootparams does not exist, no systems set up for boot from flar\n";
while (<BOOTP>) {
($node, @narg) = split(/\s+/,$_);
($n1,@rest) = split(/\W+/,$node);
chomp $rmboot;
chomp $n1;
if ($rmboot =~ /$n1/) {
foreach $i (@narg) {
if ($i =~ /root=/)
{
($j1, $path) = split(/:/, $i);
# Filter out Boot
($ipath,$Boot) =split(/Boot/, $path);
chomp $ipath;
print "cd $ipath \; ./rm_install_client $n1\n";
}
}
}
}
print "\n\n";
close BOOTP;
exit;
}
print "\n\n";
#!/usr/bin/perl
$arch_location='/flasharchives/flar';
@archive_list=`ls $arch_location`;
print "\n\n";
foreach $archive (@archive_list) {
$addto = ();
@archinfo = ();
$ih = ();
chomp $archive;
next if $archive =~ /lost/;
next if $archive =~ /list/;
next if $archive =~ /boot/;
@archinfo = `flar -i $arch_location/$archive` ;
chomp @archinfo;
foreach $x (@archinfo) {
($item, $value ) = split(/=/,$x);
chomp $value;
if ($item =~ /creation_node/) {
$inventory{$archive}{creation_node} = $value; }
if ($item =~ /creation_date/) {
$inventory{$archive}{creation_date} = $value; }
if ($item =~ /creation_release/) {
$inventory{$archive}{creation_release} = $value;}
if ($item =~ /content_name/) {
$inventory{$archive}{content_name} = $value;}
}
}
format STDOUT=
@<<<<<<<<<<<<< @<<<<<< @<<<< @<<<<<<<<<< @<<<<<<<<<<<<<<<<
$key, $creation_node, $creation_release, $fid, $content_name
.
while (($key, $content) = each(%inventory)) {
$creation_node = $inventory{$key}{creation_node};
$creation_date = $inventory{$key}{creation_date};
$creation_release = $inventory{$key}{creation_release};
$content_name = $inventory{$key}{content_name};
$fid = $inventory{$key}{fid};
write;
}
print "\n\n";
# mount -o remount,rw /
# cfgadm -c unconfigure c1
# cfgadm -c unconfigure c2
# devfsadm
# for dir in rdsk dsk
do
cd /dev/${dir}
disks=`ls c3t*`
for disk in $disks
do
newname="c1`echo $disk | awk '{print substr($1,3,6)}'`"
mv $disk $newname
done
done
ZFS Notes
Quick notes for ZFS commands
1. Take a snapshot
3. Rollback a snapshot
# cat ~user/.zfs/snapshot/mybackup_comment/ems.c
5. Create a clone
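A minimal sketch of these three operations, assuming a dataset named tank/home/user and the snapshot name used above:
# zfs snapshot tank/home/user@mybackup_comment
# zfs rollback tank/home/user@mybackup_comment
# zfs clone tank/home/user@mybackup_comment tank/home/user_clone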
9. Comments on Clones
A clone is a writable volume or file system whose initial contents are the same as the dataset from which
it was created. As with snapshots, creating a clone is nearly instantaneous, and initially consumes no
additional disk space
Clones can only be created from a snapshot. When a snapshot is cloned, an implicit dependency is
created between the clone and snapshot. Even though the clone is created somewhere else in the dataset
hierarchy, the original snapshot cannot be destroyed as long as the clone exists. The origin property
exposes this dependency, and the zfs destroy command lists any such dependencies, if they exist.
Clones do not inherit the properties of the dataset from which they were created. Rather, clones inherit
their properties based on where the clones are created in the pool hierarchy. Use the zfs get and zfs set
commands to view and change the properties of a cloned dataset. For more information about setting
ZFS dataset properties, see Setting ZFS Properties.
Because a clone initially shares all its disk space with the original snapshot, its used property is initially
zero. As changes are made to the clone, it uses more space. The used property of the original snapshot
does not consider the disk space consumed by the clone.
10.Creating a clone
To create a clone, use the zfs clone command, specifying the snapshot from which to create the clone,
and the name of the new file system or volume. The new file system or volume can be located anywhere
in the ZFS hierarchy. The type of the new dataset (for example, file system or volume) is the same type
as the snapshot from which the clone was created. You cannot create a clone of a file system in a pool
that is different from the pool where the original file system snapshot resides.
In the following example, a new clone named tank/home/ahrens/bug123 with the same initial contents
as the snapshot tank/ws/gate@yesterday is created.
In the following example, a cloned workspace is created from the projects/newproject@today snapshot
for a temporary user as projects/teamA/tempuser. Then, properties are set on the cloned workspace.
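The clone commands follow directly from the names in the text; the two property settings on the cloned workspace are assumptions for illustration:
# zfs clone tank/ws/gate@yesterday tank/home/ahrens/bug123
# zfs clone projects/newproject@today projects/teamA/tempuser
# zfs set sharenfs=on projects/teamA/tempuser
# zfs set quota=5G projects/teamA/tempuser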
11.Destroying a clone
ZFS clones are destroyed by using the zfs destroy command. Clones must be destroyed before the parent
snapshot can be destroyed. For example:
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
zfzones 33.5M 7.78G 33.3M /zfzones
zfzones/zone1 24.5K 7.78G 24.5K /zfzones/zone1
zfzones/zone1@preid 0 - 24.5K -
zfzones/zone2 0 7.78G 24.5K /zfzones/zone2
zfzones/zone3 0 7.78G 24.5K /zfzones/zone3
zfzones/zone4 0 7.78G 24.5K /zfzones/zone4
zfzones/zone5 0 7.78G 24.5K /zfzones/zone5
zfzones/zone6 0 7.78G 24.5K /zfzones/zone6
zfzones/zone7 0 7.78G 24.5K /zfzones/zone7
zfzones/zone8 0 7.78G 24.5K /zfzones/zone8
ZFS ACL's
Quick notes for ZFS ACL commands
$ ls -v file.1
-r--r--r--   1 root     root      206663 May  4 11:52 file.1
     0:owner@:write_data/append_data/execute:deny
     1:owner@:read_data/write_xattr/write_attributes/write_acl
         /write_owner:allow
     2:group@:write_data/append_data/execute:deny
     3:group@:read_data:allow
     4:everyone@:write_data/append_data/write_xattr/execute
         /write_attributes/write_acl/write_owner:deny
     5:everyone@:read_data/read_xattr/read_attributes/read_acl
         /synchronize:allow
• Remove Permissions
# ls -dv test.dir
     5:everyone@:list_directory/read_data/read_xattr/execute
         /read_attributes/read_acl/synchronize:allow
• Approximately 64 Kbytes of memory is consumed per mounted ZFS file system. On systems with
1,000s of ZFS file systems, we suggest that you provision 1 Gbyte of extra memory for every 10,000
mounted file systems including snapshots. Be prepared for longer boot times on these systems as well.
• Because ZFS caches data in kernel addressable memory, the kernel sizes will likely be larger than
with other file systems. You may wish to configure additional disk-based swap to account for this
difference for systems with limited RAM. You can use the size of physical memory as an upper
bound to the extra amount of swap space that might be required. In any case, you should monitor the
swap space usage to determine if swapping is occurring.
The ZFS adaptive replacement cache (ARC) tries to use most of a system's available memory to cache
file system data. The default is to use all of physical memory
except 1 Gbyte. As memory pressure increases, the ARC relinquishes memory. Consider limiting the
maximum ARC memory footprint in the following situations:
• When a known amount of memory is always required by an application. Databases often fall into
this category.
• On platforms that support dynamic reconfiguration of memory boards, to prevent ZFS from growing
the kernel cage onto all boards.
• A system that requires large memory pages might also benefit from limiting the ZFS cache, which
tends to break down large pages into base pages.
• Finally, if the system is running another non-ZFS file system, in addition to ZFS, it is advisable to
leave some free memory to host that other file system's caches.
The trade-off is that limiting this memory footprint means that the ARC is unable to cache
as much file system data, and this limit could impact performance. In general, limiting the ARC is
wasteful if the memory that now goes unused by ZFS is also unused by other system components.
Note that non-ZFS file systems typically manage to cache data in what is nevertheless reported as
free memory by the system. For information about tuning the ARC, see the following section: http://
www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Limiting_the_ARC_Cache
The ARC is where ZFS caches data from all active storage pools. The ARC grows and consumes
memory on the principle that no need exists to return data to the system while there is still plenty of
free memory. When the ARC has grown and outside memory pressure exists, for example, when a
new application starts up, then the ARC releases its hold on memory. ZFS is not designed to steal
memory from applications. A few bumps appeared along the way, but the established mechanism works
reasonably well for many situations and does not commonly warrant tuning. However, a few situations
stand out.
• If a future memory requirement is significantly large and well defined, then it can be advantageous
to prevent ZFS from growing the ARC into it. So, if we know that a future application requires 20%
of memory, it makes sense to cap the ARC such that it does not consume more than the remaining
80% of memory.
• If the application is a known consumer of large memory pages, then again limiting the ARC prevents
ZFS from breaking up the pages and fragmenting the memory. Limiting the ARC preserves the
availability of large pages.
For these cases, it can be desirable to limit the ARC. This will, of course, also limit the
amount of cached data and this can have adverse effects on performance. No easy way exists to
foretell if limiting the ARC degrades performance. If you tune this parameter, please reference
this URL in shell script or in an /etc/system comment. http://www.solarisinternals.com/wiki/
index.php/ZFS_Evil_Tuning_Guide#ARCSIZE You can also use the arcstat script available at http://
blogs.sun.com/realneel/entry/zfs_arc_statistics to check the arc size as well as other arc statistics
This syntax is provided starting in the Solaris 10 8/07 release and Nevada (build 51) release. For
example, if an application needs 5 GBytes of memory on a system with 36-GBytes of memory,
you could set the arc maximum to 30 GBytes, (0x780000000 or 32212254720 bytes). Set the
zfs:zfs_arc_max parameter in the /etc/system file:
/etc/system:
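For example, using the 30-GByte value from the text above:
set zfs:zfs_arc_max = 0x780000000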
#!/bin/perl
use strict;
my $arc_max = shift @ARGV;
if ( !defined($arc_max) ) {
print STDERR "usage: arc_tune <arc max>\n";
exit -1;
}
$| = 1;
use IPC::Open2;
my %syms;
my $mdb = "/usr/bin/mdb";
open2(*READ, *WRITE, "$mdb -kw") || die "cannot execute mdb";
print WRITE "arc::print -a\n";
while(<READ>) {
my $line = $_;
Chapter 10. VMWare ESX 3
Enable iSCSI Software Initiators
1. Enables the software iSCSI initiator.
# esxcfg-swiscsi -e
2. Configures the ESX Service Console firewall (iptables) to allow the software iSCSI traffic.
# esxcfg-firewall -e swISCSIClient
3. Sets the target discovery address for the vmhba40 adapter (the software iSCSI initiator), then rescans the adapter.
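A sketch of the discovery-address step; the portal address, and the use of vmkiscsi-tool for this, are assumptions:
# vmkiscsi-tool -D -a 192.168.1.20 vmhba40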
# esxcfg-rescan vmhba40
# /usr/bin/vmware-cmd
Connection Options:
-H <host> specifies an alternative host
(if set, -U and -P must also be set)
-O <port> specifies an alternative port
-U <username> specifies a user
-P <password> specifies a password
General Options:
-h More detailed help.
-q Quiet. Minimal output
-v Verbose.
Server Operations
# /usr/bin/vmware-cmd -l
VM Operations
# /usr/bin/vmware-cmd <cfg> getconnectedusers
# /usr/bin/vmware-cmd <cfg> getstate
# /usr/bin/vmware-cmd <cfg> getconfigfile
# /usr/bin/vmware-cmd <cfg> getheartbeat
# /usr/bin/vmware-cmd <cfg> getuptime
# /usr/bin/vmware-cmd <cfg> gettoolslastactive
# /usr/bin/vmware-cmd <cfg> hassnapshot
# /usr/bin/vmware-cmd <cfg> revertsnapshot
# /usr/bin/vmware-cmd <cfg> answer
Common Tasks
Expand a VM Disk to 20GB
Register/Un-register a VM
Start/Stop/Restart/Suspend a VM
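Sketches of these tasks, in order; the datastore path and VM name are assumptions:
# vmkfstools -X 20G /vmfs/volumes/datastore1/myvm/myvm.vmdk
# vmware-cmd -s register /vmfs/volumes/datastore1/myvm/myvm.vmx
# vmware-cmd -s unregister /vmfs/volumes/datastore1/myvm/myvm.vmx
# vmware-cmd /vmfs/volumes/datastore1/myvm/myvm.vmx start
# vmware-cmd /vmfs/volumes/datastore1/myvm/myvm.vmx stop
# vmware-cmd /vmfs/volumes/datastore1/myvm/myvm.vmx suspend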
# esxcfg-mpath -l
# esxcfg-vmhbadevs
vmhba0:0:0 /dev/sda
#vmhba0:0:1 /dev/sdb#
vmhba0:0:2 /dev/sdc
#vmhba0:0:3 /dev/sdd#
vmhba2:0:0 /dev/sde
#vmhba2:1:0 /dev/sdf
# esxcfg-vmhbadevs -m
# esxcfg-route
# esxcfg-route 100.100.100.1
• Create an empty folder on your hard disk where you will place your virtual disks.
• Disk size 20 GB, or whatever size is required (do not allocate the disk now).
• Define your destination path as created previously + name your disk DATA-SHARED
• Select the advanced options: set the virtual device node to "SCSI 1:0" and the mode to
"Independent" and "Persistent".
Go to the bottom of the vmx file. There you will see the following lines:
scsi1.present = "TRUE"
scsi1.sharedBus = "none"
scsi1.virtualDev = "lsilogic"
scsi1:0.present = "TRUE"
scsi1:0.fileName = "D:\Virtual Machines\Shared Disk\SHARED-DISK.vmdk"
disk.locking = "FALSE"
diskLib.dataCacheMaxSize = "0"
#!/usr/bin/perl
# vmclone.pl
if ( $< + $> != 0 ) {
print "Error: $0 needs to be run as the root user.\n";
exit 1;
}
if ( ! -d "$source" ) {
print "Error: Source directory '$source' does not exist.\n
Please specify a relative path to CWD or the full path\n";
exit 2;
}
if ( -d "$dest" ) {
print "Error: Destination directory '$dest' already exists.\n
You cannot overwrite an existing VM image with this tool.\n";
exit 3;
}
my $regexwarn = 0;
foreach (@ARGV) {
if ( ! /^s\/[^\/]+\/[^\/]+\/$/ ) {
$regexwarn = 1;
warn "Error: Invalid regex pattern in: $_\n";
}
}
exit 4 if $regexwarn == 1;
if ( ! mkdir "$dest" ) {
print "Error: Failed to create destination dir '$dest': $!\n";
exit 4;
}
exit 0;
sub copy_file_regex {
my $src = shift;
my $dst = shift;
my @regexs = @_;
my $buf = '';
my $regex = '';
sub copy_file_bin {
my ($src, $dst) = @_;
my $buf;
sub is_vmtextfile {
my $file = shift;
my $istxt = 0;
$istxt = 1 if ( $file =~ /\.(vmdk|vmx|vmxf|vmsd|vmsn)$/ );
$istxt = 0 if ( $file =~ /-flat\.vmdk$/ );
$istxt = 0 if ( $file =~ /-delta\.vmdk$/ );
return $istxt;
}
sub listdir {
my $dir = shift;
my @nfiles = ();
opendir(FH, $dir) || warn "Can't open $dir: $!\n";
@nfiles = grep { (-f "$dir/$_" && !-l "$dir/$_") } readdir(FH);
closedir(FH);
return @nfiles;
}
sub usage {
print <<EOUSAGE;
$0: Tool to "quickly" clone a VMware ESX guest OS
e.g.
# vmclone "winxp" "uscuv-clone" \
's/memsize = "512"/memsize = "256"/'
# cp -ax vsol01 vsol02
4. Rename virtual machine config and change disk image name in this config file
# mkdir /vmfs/volumes/myvmfs3/deki
a. Fully-allocated (“zeroed-thick”):
b. Allocate-on-use (“thin”):
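Sketches of the two allocation styles with vmkfstools; the size and file name are assumptions, the directory is the one created above:
# vmkfstools -c 20G -d zeroedthick /vmfs/volumes/myvmfs3/deki/deki.vmdk
# vmkfstools -c 20G -d thin /vmfs/volumes/myvmfs3/deki/deki.vmdk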
scsi0:0.fileName = "SourceVM.vmdk"
FC 10:3.0 210000e08b89a99b<->5006016130221fdd
vmhba2:1:1 On active preferred
FC 10:3.0 210000e08b89a99b<->5006016930221fdd
vmhba2:3:1 Standby
• Canonical name
FC 10:3.0 210000e08b89a99b<->5006016930221fdd
vmhba2:3:4 Standby
This is the canonical device name the ESX Server host used to refer to the LUN.
Note
When there are multiple paths to a LUN, the canonical name is the first path that was
detected for this LUN.
• In vmhba2:1:4, the 1 is the second storage target (numbering starts at 0) that was detected by this
HBA.
• In vmhba2:1:4, the 4 is the number of the LUN on this storage target. For multipathing to work
properly, each LUN must present the same LUN number to all ESX Server hosts.
If the vmhba number for the HBA is a single digit number, it is a physical adapter. If the
address is vmhba40 or vmhba32, it is a software iSCSI device for ESX Server 3.0 and ESX
Server 3.5 respectively.
• Linux device name, Storage Capacity, LUN Type, WWPN, WWNN, in order of highlights.
This is the associated Linux device handle for the LUN. You must use this reference when using
utilities like fdisk.
Chapter 11. AIX Notes
Etherchannel
• Create etherchannels in backup mode not aggregation mode.
• Identify two cards, ideally on separate PCI buses or in different PCI drawers if possible.
• All of the Cisco CATALYST switches are paired up for resilience, so the VLAN spans both.
• Aggregation mode is not preferred, as it only works with both cards connected to the same CAT switch, which
is a SPOF.
# smitty etherchannel
Add An Etherchannel
Select only the first adapter to be added
into the channel
2. Backup Adapter
The default gateway should be supplied by data networks. The key entry here is the declaration of a
backup adapter. This will create the next available ethernet card definition i.e. ‘ent3’. This is a logical
device but is also the device on which the IP address will be bound
smitty chinet
en3
3. Edit /etc/hosts
Edit /etc/hosts and set up an entry for the newly configured IP address. The format is '<hostname>en*',
in this case nac001en3. Check that the IP label is being resolved locally via netstat -i. The interface
card 'en3' will now be available, as shown via ifconfig -a.
Use the etherchannel interface en3 as the Device for the NIC
resource. An IP resource will depend on this NIC resource.
Chapter 12. Oracle 10g with RAC
Oracle General SQL Quick Reference
Start DB Console
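A minimal sketch, assuming ORACLE_HOME and ORACLE_SID are set for the target database:
$ emctl start dbconsole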
Alter table
ALTER TABLE
cust_table
ADD
(
cust_sex char(1) NOT NULL,
cust_credit_rating number
)
create table
/etc/system:
set semsys:seminfo_semvmx=32767
set semsys:seminfo_semmns=1024
/etc/system:
set udp:udp_xmit_hiwat=65536
set udp:udp_recv_hiwat=65536
# projadd -U oracle -K \
"project.max-shm-memory=(privileged,21474836480,deny);\
project.max-shm-ids=(privileged,1024,deny);\
process.max-sem-ops=(privileged,4000,deny);\
process.max-sem-nsems=(privileged,7500,deny);\
project.max-sem-ids=(privileged,4198,deny);\
process.max-msg-qbytes=(privileged,1048576,deny);\
process.max-msg-messages=(privileged,65535,deny);\
project.max-msg-ids=(privileged,5120,deny)" oracle
IPMP Public
All four public IP addresses need to reside on the same network subnet. The following is the list of IP
addresses that will be used in the following example.
- Physical IP : 146.56.77.30
- Test IP for ce0 : 146.56.77.31
- Test IP for ce1 : 146.56.77.32
- Oracle VIP : 146.56.78.1
/etc/hostname.ce0
/etc/hostname.ce1
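A sketch of the interface files using the addresses above; the host name, group name, and exact keyword layout are assumptions:
/etc/hostname.ce0:
oraprod netmask + broadcast + group orapub up \
addif 146.56.77.31 deprecated -failover netmask + broadcast + up
/etc/hostname.ce1:
146.56.77.32 netmask + broadcast + deprecated group orapub -failover standby up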
The VIP should now be configured to use all NIC's assigned to the same public IPMP group. By doing
this Oracle will automatically choose the primary NIC within the group to configure the VIP, and IPMP
will be able to fail over the VIP within the IPMP group upon a single NIC failure.
When running VIPCA: At the second screen in VIPCA (VIP Configuration Assistant, 1 of 2), select all
NIC's within the same IPMP group where the VIP should run. If it is already running, execute the following:
Make sure IPMP is configured prior to install, with Private IP up on both nodes.
The recommended solution is not to configure any private interface in oracle. The following steps need
to be done to use IPMP for the cluster interconnect:
1. If the private interface has already been configured delete the interface with 'oifcfg delif'
oifcfg getif
oifcfg delif -global <if_name>
# more /var/opt/oracle/oratab
+ASM2:oracle_home_path
ASM_DISKSTRING
When an ASM instance initializes, ASM is able to discover and look at the contents of all of the disks in
the disk groups that are pointed to by the ASM_DISKSTRING initialization parameter. This saves you
from having to specify a path for each of the disks in the disk group.
Disk group mounting requires that an ASM instance doing disk discovery be able to access all the disks
within the disk group that any other ASM instance having previously mounted the disk group believes are
members of that disk group. It is vital that any disk configuration errors be detected before a disk group
is mounted.
export ORACLE_SID=+ASM
sqlplus "/ as sysdba"
Check status of DB
5 from dba_temp_files
6 /
TABLESPACE_NAME FILE_NAME
---------------- ------------------------------------
EXAMPLE +ORADATA/esxrac/datafile/example.263.620732791
SYSAUX +ORADATA/esxrac/datafile/sysaux.261.620732767
SYSTEM +ORADATA/esxrac/datafile/system.259.620732719
TEMP +ORADATA/esxrac/tempfile/temp.262.620732779
UNDOTBS1 +ORADATA/esxrac/datafile/undotbs1.260.620732753
UNDOTBS2 +ORADATA/esxrac/datafile/undotbs2.264.620732801
USERS +ORADATA/esxrac/datafile/users.265.620732817
7 rows selected.
This script will give you information of the +ASM1 instance files:
18 rows selected.
This script will give you more detailed information about the +ASM1 instance files:
Chapter 13. EMC Storage
PowerPath Commands
Table 13.1. PowerPath CLI Commands
Command Description
powermt Manages a PowerPath environment
powercf Configures PowerPath devices
emcpreg -install Manages PowerPath license registration
emcpminor Checks for free minor numbers
emcpupgrade Converts PowerPath configuration files
Pseudo name=emcpower6a
Symmetrix ID=000184503070
Logical device ID=0021
state=alive; policy=SymmOpt; priority=0; queued-IOs=0
- Host Bus Adapters --- ---- Storage System --- - I/O Paths -
### HW Path ID Interface Total Dead
Disable PowerPath
1. Please ensure that LUNS are available to the host from multiple paths
# powermt display
2. Stop the Application so that there is no i/o issued to Powerpath devices If the application is under VCS
control , please offline the service on that node
3. Unmount filesystems and Stop the volumes so that there is no volumes under i/o
# umount /<mount_point>
# vxvol -g <dgname> stop all
4. Stop CVM and VERITAS Fencing on the node ( if part of a VCS cluster) NOTE: All nodes in VCS
cluster need to be brought down if CVM / fencing are enabled.
#vxclustadm stopnode
# /etc/init.d/vxfen stop
#touch /etc/vx/reconfig.d/state.d/install-db
6. Reboot host
# shutdown -y -i6
#pkgrm EMCpower
#/etc/emcp_cleanup
# vxconfigd -m enable
#rm /etc/vx/reconfig.d/state.d/install-db
An example syminq output is below. Your output will differ slightly as I'm creating a table from a book
to show this; I don't currently have access to a system where I can get the actual output just yet.
Using the first and last serial numbers as examples, the serial number is broken out as follows:
So, in the first example, device 009 is mapped to director 15, processor A, port 0, while in the second example
has device 01A mapped to director 12, processor B, port 0. Even if you don't buy any of the EMC
software, you can get the inq command from their web site. Understanding the serial numbers will
help you get a better understanding of which ports are going to which hosts. Understanding this and
documenting it will circumvent hours of rapturous cable tracings.
Brocade Switches
1. Brocade Configuration Information
DS8B_ID3:admin> switchshow
switchName: DS8B_ID3
switchType: 3.4
switchState: Online
switchRole: Principal
switchDomain: 3
switchId: fffc03
switchWwn: 10:00:00:60:69:20:50:a9
switchBeacon: OFF
port 0: id Online F-Port 50:06:01:60:20:02:f5:a1 - SPA
port 1: id Online F-Port 50:06:01:68:20:02:f5:a1 - SPB
port 2: id Online F-Port 10:00:00:00:c9:28:3a:fc - cdb-lpfc0
port 3: id Online F-Port 10:00:00:00:c9:28:3d:21 - cdb-lpfc1
port 4: id Online F-Port 10:00:00:00:c9:28:3d:0a - cmn-lpfc0
port 5: id Online F-Port 10:00:00:00:c9:26:ac:16 - cmn-lpfc1
port 6: id No_Light
port 7: id No_Light
DS8B_ID3:admin>
DS8B_ID3:admin> cfgshow
Defined configuration:
cfg: CFG CSA_A_PATH; CSA_B_PATH
zone: CSA_A_PATH
CSA_SPA; DB1_LPFC0; MN1_LPFC0
zone: CSA_B_PATH
CSA_SPB; DB1_LPFC1; MN1_LPFC1
alias: CSA_SPA
50:06:01:60:20:02:f5:a1
alias: CSA_SPB
50:06:01:68:20:02:f5:a1
alias: DB1_LPFC0
10:00:00:00:c9:28:3a:fc
alias: DB1_LPFC1
10:00:00:00:c9:28:3d:21
alias: MN1_LPFC0
10:00:00:00:c9:28:3d:0a
alias: MN1_LPFC1
10:00:00:00:c9:26:ac:16
Effective configuration:
cfg: CFG
zone: CSA_A_PATH
50:06:01:60:20:02:f5:a1
10:00:00:00:c9:28:3a:fc
10:00:00:00:c9:28:3d:0a
zone: CSA_B_PATH
50:06:01:68:20:02:f5:a1
10:00:00:00:c9:28:3d:21
10:00:00:00:c9:26:ac:16
DS8B_ID3:admin>
2. Brocade Configuration Walkthrough
a. Basic SwitchShow
DS8B_ID3:admin> switchshow
switchName: DS8B_ID3
switchType: 3.4
switchState: Online
switchRole: Principal
switchDomain: 3
switchId: fffc03
switchWwn: 10:00:00:60:69:20:50:a9
switchBeacon: OFF
port 0: id Online F-Port 50:06:01:60:20:02:f5:a1
port 1: id Online F-Port 50:06:01:68:20:02:f5:a1
port 2: id Online F-Port 10:00:00:00:c9:28:3a:fc
port 3: id Online F-Port 10:00:00:00:c9:28:3d:21
port 4: id No_Light
port 5: id No_Light
port 6: id No_Light
port 7: id No_Light
b. Create Aliases
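A sketch of the alias creation, using the WWNs shown in the switchshow output above:
DS8B_ID3:admin> alicreate "CSA_SPA", "50:06:01:60:20:02:f5:a1"
DS8B_ID3:admin> alicreate "CSA_SPB", "50:06:01:68:20:02:f5:a1"
DS8B_ID3:admin> alicreate "DB1_LPFC0", "10:00:00:00:c9:28:3a:fc"
DS8B_ID3:admin> alicreate "DB1_LPFC1", "10:00:00:00:c9:28:3d:21"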
c. Create Zones
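A sketch of the zone and configuration creation that produces the zoneshow output below:
DS8B_ID3:admin> zonecreate "CSA_A_PATH", "CSA_SPA; DB1_LPFC0"
DS8B_ID3:admin> zonecreate "CSA_B_PATH", "CSA_SPB; DB1_LPFC1"
DS8B_ID3:admin> cfgcreate "CFG", "CSA_A_PATH; CSA_B_PATH"
DS8B_ID3:admin> cfgenable "CFG"
DS8B_ID3:admin> cfgsave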
DS8B_ID3:admin> zoneshow
Defined configuration:
cfg: CFG CSA_A_PATH; CSA_B_PATH
zone: CSA_A_PATH
CSA_SPA; DB1_LPFC0
zone: CSA_B_PATH
CSA_SPB; DB1_LPFC1
alias: CSA_SPA
50:06:01:60:20:02:f5:a1
alias: CSA_SPB
50:06:01:68:20:02:f5:a1
alias: DB1_LPFC0
10:00:00:00:c9:28:3a:fc
alias: DB1_LPFC1
10:00:00:00:c9:28:3d:21
Effective configuration:
cfg: CFG
zone: CSA_A_PATH
50:06:01:60:20:02:f5:a1
10:00:00:00:c9:28:3a:fc
zone: CSA_B_PATH
50:06:01:68:20:02:f5:a1
10:00:00:00:c9:28:3d:21
Chapter 14. Dtrace
Track time on each I/O
iotime.d
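/*
 * The io:::start clause below is assumed from the standard iotime.d example;
 * it records the start timestamp that the io:::done clause consumes. A BEGIN
 * clause printing the DEVICE/FILE/RW/MS header is also part of that example.
 */
io:::start
{
        start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}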
io:::done
/start[args[0]->b_edev, args[0]->b_blkno]/
{
this->elapsed = timestamp - start[args[0]->b_edev, args[0]->b_blkno];
printf("%10s %58s %2s %3d.%03d\n", args[1]->dev_statname,
args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W",
this->elapsed / 1000000, (this->elapsed / 1000) % 1000);
start[args[0]->b_edev, args[0]->b_blkno] = 0;
}
# dtrace -s ./iotime.d
DEVICE FILE RW MS
cmdk0 /kernel/drv/scsa2usb R 24.781
cmdk0 /kernel/drv/scsa2usb R 25.208
cmdk0 /var/adm/messages W 25.981
cmdk0 /kernel/drv/scsa2usb R 5.448
cmdk0 <none> W 4.172
cmdk0 /kernel/drv/scsa2usb R 2.620
cmdk0 /var/adm/messages W 0.252
cmdk0 <unknown> R 3.213
cmdk0 <none> W 3.011
cmdk0 <unknown> R 2.197
cmdk0 /var/adm/messages W 2.680
cmdk0 <none> W 0.436
cmdk0 /var/adm/messages W 0.542
cmdk0 <none> W 0.339
cmdk0 /var/adm/messages W 0.414
cmdk0 <none> W 0.344
cmdk0 /var/adm/messages W 0.361
cmdk0 <none> W 0.315
cmdk0 /var/adm/messages W 0.421
cmdk0 <none> W 0.349
cmdk0 <none> R 1.524
cmdk0 <unknown> R 3.648
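A minimal sketch of the whowrite.d script invoked below, consistent with its WHO/WHERE/COUNT output (the exact output formatting is an assumption):
#pragma D option quiet
io:::start
/args[0]->b_flags & B_WRITE/
{
        /* tally writes by issuing process and target directory */
        @[execname, args[2]->fi_dirname] = count();
}
END
{
        printf("%16s %32s %8s\n", "WHO", "WHERE", "COUNT");
        printa("%16s %32s %@8d\n", @);
}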
# dtrace -s ./whowrite.d
^C
WHO WHERE COUNT
su /var/adm 1
fsflush /etc 1
fsflush / 1
fsflush /var/log 1
fsflush /export/bmc/lisa 1
fsflush /export/bmc/.phoenix 1
vi /var/tmp 2
vi /etc 2
cat <none> 2
bash / 2
vi <none> 3
Chapter 15. Disaster Recovery
VVR 5.0
VVR Configuration
Setting up replication in a global cluster environment involves the following tasks:
Create the Storage Replicator Log (SRL), a volume in the Replicated Volume Group (RVG). The RVG
also holds the data volumes for replication.
• The data volume on the secondary site has the same name and the same size as the data volume on
the primary site.
• The SRL on the secondary site has the same name and the same size as the SRL on the primary site.
• The data volume and the SRL should exist in the same disk group.
After determining the size of the SRL volume, create the volume in the shared disk group for the Oracle
database. If hardware-based mirroring does not exist in your setup, use the nmirror option to mirror the
volume. In this example, the Oracle database is in the oradatadg shared disk group on the primary site and
the size required for the SRL volume is 1.5 GB:
1. On the primary site, determine the size of the SRL volume based on the configuration and use.
2. Determine whether a node is the master or the slave: (if on CFS Cluster)
# vxdctl -c mode
3. From the master node, issue the following command: (after disk group created). Make sure that the data
disk has a minimum of 500M of free space after creating the SRL volume.
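A sketch of creating the mirrored SRL in the example disk group (the SRL volume name is an assumption; size and nmirror come from the text above):
# vxassist -g oradatadg make rac1_srl 1500M nmirror=2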
4. Start the SRL volume by starting all volumes in the disk group:
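For example:
# vxvol -g oradatadg startall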
Before creating the RVG on the primary site, make sure the replication objects are active and online.
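A sketch of creating the primary RVG (the RVG and SRL names are assumptions; the disk group and data volume names are from the example):
# vradmin -g oradatadg createpri rac1_rvg rac1_vol rac1_srl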
The command creates the RVG on the primary site and adds a Data Change Map (DCM) for each data
volume. In this case, a DCM exists for rac1_vol.
To create objects for replication on the secondary site, use the vradmin command with the addsec option.
To set up replication on the secondary site:
• Create a disk group on the secondary storage with the same name as the equivalent disk group on the primary
site, if you have not already done so.
• Create volumes for the database and SRL on the secondary site.
• Ensure there are resolvable virtual IP addresses that set up the network RLINK connections as host names
of the primary and secondary sites.
1. In the disk group created for the Oracle database, create a volume for data; in this case, the rac_vol1
volume on the primary site is 6.6 GB:
2. Create the volume for the SRL, using the same name and size of the equivalent volume on the primary
site. Create the volume on a different disk from the disks for the database volume:
Editing the /etc/vx/vras/.rdg file on the secondary site enables VVR to replicate the disk group from
the primary site to the secondary site. On each node, VVR uses the /etc/vx/vras/.rdg file to check the
authorization to replicate the RVG on the primary site to the secondary site. The file on each node in the
secondary site must contain the primary disk group ID, and likewise, the file on each primary system must
contain the secondary disk group ID.
1. On a node in the primary site, display the primary disk group ID:
# vxprint -l diskgroup
2. On each node in the secondary site, edit the /etc/vx/vras/.rdg file and enter the primary disk group ID
on a single line.
3. On each cluster node of the primary cluster, edit the file and enter the primary disk group ID on a
single line.
Creating objects with the vradmin command requires resolvable virtual IP addresses that set network
RLINK connections as host names of the primary and secondary sites.
1. For each RVG running on each cluster, set up a virtual IP address on one of the nodes of the cluster.
These IP addresses are part of the RLINK. The example assumes that the public network interface
is eth0:1, the virtual IP address is 10.10.9.101, and the net mask is
255.255.240.0 for the cluster on the primary site:
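For example, using the values above:
# ifconfig eth0:1 10.10.9.101 netmask 255.255.240.0 up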
2. Use the same commands with appropriate values for the interface, IP address, and net mask on the
secondary site. The example assumes the interface is eth0:1, virtual IP address is 10.11.9.102, and the
net mask is 255.255.240.0 on the secondary site.
3. Define the virtual IP addresses to correspond to a virtual cluster host name on the primary site and a
virtual cluster host name on the secondary site. For example, update /etc/hosts file on all nodes in each
cluster. The examples assume rac_clus101_priv has IP address 10.10.9.101 and rac_clus102_priv has
IP address 10.11.9.102.
Create the replication objects on the secondary site from the master node on the primary site, using the
vradmin command.
1. Issue the command in the following format from the cluster on the primary site:
• dg_pri is the disk group on the primary site that VVR will replicate. For example: oradatadg
• pri_host is the virtual IP address or resolvable virtual host name of the cluster on the primary site.
For example: 10.10.9.101 or rac_clus101_priv
• sec_host is the virtual IP address or resolvable virtual host name of the cluster on the secondary site.
For example: 10.11.9.102 or rac_clus102_priv
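Putting these together, a sketch of the command using the example values above (the RVG name is an assumption):
# vradmin -g oradatadg addsec rac1_rvg rac_clus101_priv rac_clus102_priv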
• Creates an RVG within the specified disk group using the same name as the one for the primary site
• Associates the data and SRL volumes that have the same names as the ones on the primary site with
the specified RVG
• Creates cluster RLINKS for the primary and secondary sites with the default names; for example, the
“primary” RLINK created for this example is rlk_rac_clus102_priv_rac1_rvg and the “secondary”
RLINK created is rlk_rac_clus101_priv_rac1_rvg.
3. Verify the list of RVGs in the RDS by executing the following command.
From the primary site, automatically synchronize the RVG on the secondary site:
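Sketches of these two steps, reusing the example names (the RVG and secondary host names are assumptions):
# vradmin -g oradatadg -l printrvg
# vradmin -g oradatadg -a startrep rac1_rvg rac_clus102_priv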
Config Errors:
Secondary:
Host name: 162.111.101.196
RVG name: hubrvg
DG name: hubdg
Data status: consistent, up-to-date
Replication status: replicating (connected)
Current mode: asynchronous
Logging to: SRL
Timestamp Information: behind by 0h 0m 0s
Make sure that there is enough disk space on both the Production and Disaster Recovery Clusters
From the production cluster run the vradmin resizesrl command against the RVG and disk group whose SRL
is to be expanded. [+]Size grows the SRL, [-]Size shrinks the SRL, and no [-|+] sets the SRL to
that size.
Make sure that there is enough disk space on both the Production and Disaster Recovery Clusters
From the production cluster run the vradmin resizevol command against the RVG, disk group, and volume
to be expanded. [+]Size grows the volume, [-]Size shrinks the volume, and no [-|+] sets the
volume to that size.
Here's how to resynchronize the old Primary once you bring it back up (5.0):
1. use the migrate option with vradmin
2. If the command reports back primary out of sync, use the fbsync option
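Sketches of the two commands; the disk group and RVG names are reused from the earlier example and are assumptions here:
# vradmin -g oradatadg migrate rac1_rvg
# vradmin -g oradatadg fbsync rac1_rvg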
# cd /etc/VRTSvcs/conf/sample_vvr
# ./addVVRTypes.sh
# haconf -dump -makero
2. On a node in the primary site, start the global clustering configuration wizard: or use #3 for manual
configuration.
# /opt/VRTSvcs/bin/gcoconfig
a. After discovering the NIC devices on the local node, specify or confirm the device for the cluster
joining the global cluster environment.
b. Indicate whether the NIC you entered is for all cluster nodes. If you enter n, enter the names of NICs
on each node.
d. When the wizard discovers the net mask associated with the virtual IP address, accept the discovered
value or enter another value. With NIC and IP address values configured, the wizard creates a
ClusterService group or updates an existing one. After modifying the VCS configuration file, the
wizard brings the group online.
3. Modifying the global clustering configuration using the main.cf on the primary cluster
include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"
include "OracleTypes.cf"
include "VVRTypes.cf"
cluster rac_cluster101 (
UserNames = { admin = "cDRpdxPmHpzS." }
ClusterAddress = "10.10.10.101"
Administrators = { admin }
CounterInterval = 5
UseFence = SCSI3
)
group ClusterService (
SystemList = { galaxy = 0, nebula = 0 }
AutoStartList = { galaxy, nebula }
OnlineRetryLimit = 3
OnlineRetryInterval = 120
)
Application wac (
StartProgram = "/opt/VRTSvcs/bin/wacstart"
StopProgram = "/opt/VRTSvcs/bin/wacstop"
MonitorProcesses = { "/opt/VRTSvcs/bin/wac" }
RestartLimit = 3
)
IP gcoip (
Device = eth1
Address = "10.10.10.101"
NetMask = "255.255.240.0"
)
NIC csgnic (
Device = eth1
)
4. Define the remotecluster and its virtual IP address. In this example, the remote cluster is rac_cluster102
and its IP address is 10.11.10.102:
5. Complete step 3 and step 4 on the secondary site using the name and IP address of the primary cluster
(rac_cluster101 and 10.10.10.101).
6. On the primary site, add the heartbeat object for the cluster. In this example, the heartbeat method is
ICMP ping.
# haclus -list
rac_cluster101
rac_cluster102
remotecluster rac_cluster102 (
ClusterAddress = "10.11.10.102"
)
heartbeat Icmp (
ClusterList = { rac_cluster102 }
Arguments @rac_cluster102 = { "10.11.10.102" }
)
system galaxy (
)
remotecluster rac_cluster101 (
ClusterAddress = "10.190.88.188"
)
heartbeat Icmp (
ClusterList = { rac_cluster101 }
Arguments @rac_cluster101 = { "10.190.88.188" }
)
system galaxy
Setting up the RLINK IP addresses for primary and secondary in their respective clusters results in a main.cf
similar to the following:
2x IP for GCO (one per cluster), 2x IP for VVR RLINK (one per cluster)
include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"
include "VVRTypes.cf"
cluster primary003 (
UserNames = { haadmin = xxx }
ClusterAddress = "162.111.101.195"
Administrators = { haadmin }
UseFence = SCSI3
HacliUserLevel = COMMANDROOT
)
remotecluster remote003 (
ClusterAddress = "167.138.164.121"
)
heartbeat Icmp (
ClusterList = { remote003 }
Arguments @remote003 = { "167.138.164.121" }
)
system primary003a1 (
)
system primary003b1 (
)
group ClusterService (
SystemList = { primary003a1 = 0, primary003b1 = 1 }
AutoStartList = { primary003a1, primary003b1 }
OnlineRetryLimit = 3
OnlineRetryInterval = 120
)
Application wac (
StartProgram = "/opt/VRTSvcs/bin/wacstart"
StopProgram = "/opt/VRTSvcs/bin/wacstop"
MonitorProcesses = { "/opt/VRTSvcs/bin/wac" }
RestartLimit = 3
)
IP gcoip (
Device @primary003a1 = bond0
Device @primary003b1 = bond0
Address = "162.111.101.195"
NetMask = "255.255.254.0"
)
NIC csgnic (
Device = bond0
)
NotifierMngr ntfr (
SmtpServer = "smtp.me.com"
SmtpRecipients = { "sunadmin@me.com" = Warning }
)
group HUBDG_RVG (
SystemList = { primary003a1 = 0, primary003b1 = 1 }
Parallel = 1
AutoStartList = { primary003a1, primary003b1 }
)
CVMVolDg HUB_DG (
CVMDiskGroup = hubdg
CVMActivation = sw
)
RVGShared HUBDG_CFS_RVG (
RVG = hubrvg
DiskGroup = hubdg
)
group Myappsg (
SystemList = { primary003a1 = 0, primary003b1 = 1 }
Parallel = 1
ClusterList = { remote003 = 1, primary003 = 0 }
Authority = 1
AutoStartList = { primary003a1, primary003b1 }
ClusterFailOverPolicy = Auto
Administrators = { tibcoems }
)
Application foo (
StartProgram = "/opt/tibco/vcs_scripts/foo start &"
StopProgram = "/opt/tibco/vcs_scripts/foo stop &"
MonitorProgram = "/opt/tibco/vcs_scripts/monitor_foo"
)
CFSMount foomnt (
MountPoint = "/opt/foo"
BlockDevice = "/dev/vx/dsk/hubdg/foo"
)
RVGSharedPri hubrvg_pri (
RvgResourceName = HUBDG_CFS_RVG
OnlineRetryLimit = 0
)
group cvm (
SystemList = { primary003a1 = 0, primary003b1 = 1 }
AutoFailOver = 0
Parallel = 1
AutoStartList = { primary003a1, primary003b1 }
)
CFSfsckd vxfsckd (
ActivationMode @primary003a1 = { hubdg = sw }
ActivationMode @primary003b1 = { hubdg = sw }
)
CVMCluster cvm_clus (
CVMClustName = primary003
CVMNodeId = { primary003a1 = 0, primary003b1 = 1 }
CVMTransport = gab
CVMTimeout = 200
)
CVMVxconfigd cvm_vxconfigd (
Critical = 0
CVMVxconfigdArgs = { syslog }
)
group rlogowner (
SystemList = { primary003a1 = 0, primary003b1 = 1 }
AutoStartList = { primary003a1, primary003b1 }
OnlineRetryLimit = 2
)
IP vvr_ip (
Device @primary003a1 = bond1
Device @primary003b1 = bond1
Address = "162.111.101.196"
NetMask = "255.255.254.0"
)
NIC vvr_nic (
Device @primary003a1 = bond1
Device @primary003b1 = bond1
)
RVGLogowner logowner (
RVG = hubrvg
DiskGroup = hubdg
)
include "types.cf"
include "CFSTypes.cf"
include "CVMTypes.cf"
include "VVRTypes.cf"
cluster remote003 (
UserNames = { haadmin = xxx }
ClusterAddress = "167.138.164.121"
Administrators = { haadmin }
UseFence = SCSI3
HacliUserLevel = COMMANDROOT
)
remotecluster primary003 (
ClusterAddress = "162.111.101.195"
)
heartbeat Icmp (
ClusterList = { primary003 }
Arguments @primary003 = { "162.111.101.195" }
)
system remote003a1 (
)
system remote003b1 (
)
group ClusterService (
SystemList = { remote003a1 = 0, remote003b1 = 1 }
AutoStartList = { remote003a1, remote003b1 }
OnlineRetryLimit = 3
OnlineRetryInterval = 120
)
Application wac (
StartProgram = "/opt/VRTSvcs/bin/wacstart"
StopProgram = "/opt/VRTSvcs/bin/wacstop"
MonitorProcesses = { "/opt/VRTSvcs/bin/wac" }
RestartLimit = 3
)
IP gcoip (
Device @remote003a1 = bond0
Device @remote003b1 = bond0
Address = "167.138.164.121"
NetMask = "255.255.254.0"
)
NIC csgnic (
Device = bond0
)
NotifierMngr ntfr (
SmtpServer = "smtp.me.com"
SmtpRecipients = { "sunadmin@me.com" = Warning }
)
group HUBDG_RVG (
SystemList = { remote003a1 = 0, remote003b1 = 1 }
Parallel = 1
AutoStartList = { remote003a1, remote003b1 }
)
CVMVolDg HUB_DG (
CVMDiskGroup = hubdg
CVMActivation = sw
)
RVGShared HUBDG_CFS_RVG (
RVG = hubrvg
DiskGroup = hubdg
)
group Tibcoapps (
SystemList = { remote003a1 = 0, remote003b1 = 1 }
Parallel = 1
ClusterList = { remote003 = 1, primary003 = 0 }
AutoStartList = { remote003a1, remote003b1 }
ClusterFailOverPolicy = Auto
Administrators = { tibcoems }
)
Application FOO (
StartProgram = "/opt/tibco/vcs_scripts/foo start &"
StopProgram = "/opt/tibco/vcs_scripts/foo stop &"
MonitorProgram = "/opt/tibco/vcs_scripts/monitor_foo"
)
CFSMount foomnt (
MountPoint = "/opt/foo"
BlockDevice = "/dev/vx/dsk/hubdg/foo"
)
RVGSharedPri hubrvg_pri (
RvgResourceName = HUBDG_CFS_RVG
OnlineRetryLimit = 0
)
group cvm (
SystemList = { remote003a1 = 0, remote003b1 = 1 }
AutoFailOver = 0
Parallel = 1
AutoStartList = { remote003a1, remote003b1 }
)
CFSfsckd vxfsckd (
ActivationMode @remote003a1 = { hubdg = sw }
ActivationMode @remote003b1 = { hubdg = sw }
)
CVMCluster cvm_clus (
CVMClustName = remote003
CVMNodeId = { remote003a1 = 0, remote003b1 = 1 }
CVMTransport = gab
CVMTimeout = 200
)
CVMVxconfigd cvm_vxconfigd (
CVMVxconfigdArgs = { syslog }
)
group rlogowner (
SystemList = { remote003a1 = 0, remote003b1 = 1 }
AutoStartList = { remote003a1, remote003b1 }
OnlineRetryLimit = 2
)
IP vvr_ip (
Device @remote003a1 = bond1
Device @remote003b1 = bond1
Address = "167.138.164.117"
NetMask = "255.255.254.0"
)
NIC vvr_nic (
Device @remote003a1 = bond1
Device @remote003b1 = bond1
)
RVGLogowner logowner (
RVG = hubrvg
DiskGroup = hubdg
)
VVR 4.X
Pre-5.0 VVR does not use vradmin as much; this section is kept here to show the underlying commands. Note
that with 4.0 and earlier you need to detach the SRL before growing it, while in 5.x that is no longer needed.
This won't do much, as the RLINK on hostB (the Primary) should still be detached, preventing the
Secondary from connecting.
Giving the -a flag to vxrlink tells it to run in autosync mode. This will automatically resync the
secondary datavolumes from the Primary. If the Primary is being updated faster than the Secondary
can be synced, the Secondary will never become synced, so this method is only appropriate for certain
implementations.
Once synchronization is complete, follow the instructions above (the beginning of section 6) to transfer
the Primary role back to the original Primary system.
The second case is when your Primary goes down in flames and you need to get your Secondary up as
a Primary.
a. First, you'll need to turn off your applications, umount any filesystems on your datavolumes,
and stop the rvg:
# /etc/rc3.d/S99start-app stop
# umount /filesysA
# vxrvg stop rvgA
b. Once you've stopped the RVG, you need to detach the rlink, disassociate the SRL volume (you can't
edit the PRIMARY RVG attribute while an SRL is associated), change PRIMARY to false, and bring
everything back up:
i. First you need to stop the RVG, detach the rlink, disassociate the SRL, and turn the PRIMARY
attribute on:
ii. Veritas recommends that you use vxedit to reinitialize some values on the RLINK to make sure
you're still cool:
iii. Before you can attach the rlink, you need to change the PRIMARY_DATAVOL attribute on
both hosts to point to the Veritas volume name of the NEW Primary:
iv. Now that you have that, go back to the new Primary, attach the RLINK, and start the RVG:
a. First you'll need to bring up the secondary as a primary. If your secondary datavolume is inconsistent
(this is only likely if an SRL overflow occurred and the secondary was not resynchronized before
the Primary went down) you will need to disassociate the volumes from the RVG, fsck them if
they contain filesystems, and reassociate them with VVR. If your volumes are consistent, the task
is much easier:
On the secondary, first stop the RVG, detach the RLINK, and disassociate the SRL:
d. If the old Primary is still down, all you need to do is start the RVG to be able to use the datavolumes:
This will allow you to keep the volumes in VVR so that once you manage to resurrect the former
Primary, you can make the necessary VVR commands to set it up as a secondary so it can
resynchronize from the backup system. Once it has resynchronized, you can use the process listed at
the beginning of section 6 (above) to fail from the Old Secondary/New Primary back to the original
configuration.
Before configuring, you need to make sure two scripts have been run from /etc/rc2.d: S94vxnm-host_infod
and S94vxnm-vxnetd. VVR will not work if these scripts don't get run AFTER VVR
licenses have been installed. So if you install VVR licenses and don't reboot immediately after,
run these scripts to get VVR to work.
b. Before the Primary can be set up, the Secondary must be configured.
First, use vxassist to create your datavolumes. Make sure to specify the logtype as DCM (Data
Change Map, which keeps track of data changes if the Storage Replicator log fills up) if your
replicated volumes are asynchronous.
c. Then create the SRL (Storage Replicator Log) for the volume. Carefully decide how big you want
this to be, based on available bandwidth between your hosts and how fast your writes happen.
See pages 18-25 of the SRVM Configuration Notes for detailed (excruciatingly) notes on selecting
your SRL size.
Use synchronous=off only if you can stand to lose some data. Otherwise, set synchronous=override
or synchronous=fail. override runs as synchronous (writes aren't committed until they reach
the secondary) until the link dies, then it switches to asynchronous, storing pending writes to the
secondary in the SRL. When the link comes back, it resyncs the secondary and switches back to
synchronous mode. synchronous=fail fails new updates to the primary in the case of a downed link.
In any of the above cases, you'll lose data if the link fails and, before the secondary can catch up
to the primary, there is a failure of the primary data volume. This is why it's important to have
both redundant disks and redundant network paths.
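As a hedged example, the mode is an attribute of the RLINK and can be set with vxedit; the disk group and rlink names here are illustrative:
# vxedit -g datadg set synchronous=override rlk_hostB
# vxprint -g datadg -Pl rlk_hostB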
e. Now make the RVG, where you put together the datavolume, the SRL, and the rlink:
b. Make the RVG for the primary. Only the last option is different:
When we created the secondary, brain-dead Veritas figured the volume on the Secondary and the
Primary would have the same name, but when we set this up, we wanted to have the Primary
datavolume named sampleA and the Secondary datavolume be sampleB. So we need to tell the
Secondary that the Primary is sampleA:
4. Now you can attach the rlink to the RVG and start the RVG. On the Primary:
and then the secondary. You always need to make sure the Secondary is larger than or as large as the
Primary, or you will get a configuration error from VVR.
You may need to grow an SRL if your pipe shrinks (more likely if your pipe gets busier) or the
amount of data you are sending increases. See pages 18-25 of the SRVM Configuration Notes for
detailed (excruciatingly) notes on selecting your SRL size.
1. To grow an SRL, you must first stop the RVG and disassociate the SRL from the RVG:
2. From this point, you can grow your SRL (which is now just an ordinary volume):
3. Once your SRL has been successfully grown, reassociate it with the RVG, reattach the RLINK, and
start the RVG:
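A hedged sketch of the underlying 4.x command sequence for the three steps above (disk group, RVG, RLINK, and SRL names are illustrative):
# vxrvg -g datadg stop rvgA
# vxrlink -g datadg det rlk_hostB
# vxvol -g datadg dis srl_volA
# vxassist -g datadg growby srl_volA 2g
# vxvol -g datadg aslog rvgA srl_volA
# vxrlink -g datadg att rlk_hostB
# vxrvg -g datadg start rvgA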
2. Then stop the RVG on the primary and then the secondary:
4. If you want to keep the datavolumes, you need to disassociate them from the RVG:
Chapter 16. VxVM and Storage Troubleshooting
How to disable and re-enable VERITAS Volume Manager at boot time when the boot disk is encapsulated
At times it may be necessary for debugging and/or other reasons to boot a system without starting
VERITAS Volume Manager (VxVM). This is sometimes referred to as "manually unencapsulating" if the
boot disk is involved. The following are the basic steps needed to disable VxVM with an encapsulated
boot disk:
IMPORTANT: If rootvol, usr, or var volumes are mirrored, all mirrors except for the one on the boot disk
will have to be disabled before enabling VxVM once again (see below for details). Failure to do so may
result in file system corruption.
1. Boot system from CD ROM or net and mount the root file system to /a
# cp /a/etc/vfstab /a/etc/vfstab.disable
• Use the preserved copy of the vfstab file from before encapsulation as base for the new file:
# cp /a/etc/vfstab.prevm /a/etc/vfstab
• Verify that the Solaris file system partitions listed in /a/etc/vfstab are consistent with the current boot
drive and that the partitions exist.
Note: Usually the partition for the /opt file system will not be present. It is not needed to bring the
system up to single user mode.
# cp /a/etc/system /a/etc/system.disable
• Remove (or comment out) the following lines from /a/etc/system:
rootdev:/pseudo/vxio@0:0
set vxio:vol_rootdev_is_volume=1
• The force loads for VxVM drivers (vxio, vxspec, and vxdmp) may also be deleted, but that is not
usually necessary.
5. Create a file called /a/etc/vx/reconfig.d/state.d/install-db. This prevents VxVM from starting during the
boot process.
# touch /a/etc/vx/reconfig.d/state.d/install-db
7. Once the system is booted in at least single-user mode, VxVM can be started manually with the
following steps.
# vxiod set 10
# vxconfigd -d
c. Enable vxconfigd:
# vxdctl enable
d. IMPORTANT: If the boot disk contains mirrored volumes, one must take all the mirrors offline for
those volumes except for the one on the boot disk. Offlining a mirror prevents VxVM from ever
performing a recovery on that plex. This step is critical in preventing data corruption.
# vxrecover -ns
# vxrecover -bs
Once any debugging actions and/or any other operations are completed, VxVM can be re-enabled again
with the following steps.
a. Undo the steps in the previous section that were taken to disable VxVM (steps 2-4):
# cp /etc/vfstab.disable /etc/vfstab
# cp /etc/system.disable /etc/system
# rm /etc/vx/reconfig.d/state.d/install-db
c. Once the system is back up and it is verified to be running correctly, online all mirrors that were
offlined in step 6 in the previous section. For example,
# vxrecover -bs
# vxdisk list
2. Run vxdctl with the enable option on pre-4.0 versions and vxdisk scandisks on newer versions of VxVM
# vxdisk scandisks
# /etc/vx/bin/vxreattach -c c2t21d220
# /etc/vx/bin/vxreattach -c c2t21d41
When provisioning storage and creating volumes, there are times when you create a volume for a specific
workload, and things change after the fact. Veritas Volume Manager can easily deal with changing
requirements, and allows you to convert between volume types (e.g., convert a RAID5 volume to a striped-mirrored
volume) on the fly. Veritas performs this operation in most cases with layered volumes, and
requires a chunk of free space to complete the relayout operation. The VxVM user's guide describes the
supported relayout operations, and also provides disk space requirements.
To illustrate just how useful the relayout operation is, let's say your manager just finished reading a Gartner
report that criticizes RAID5. He comes over to your desk and asks you to convert the Oracle data volume
from a 4-column RAID5 volume to a 2-column striped-mirror volume. Since you despise software RAID5,
you set down UNIX File systems and run vxassist(1m) with the "relayout" keyword, the "layout" to convert
to, and the number of columns to use (the ncols option is only used with striped volumes):
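A hedged example of that command line, assuming the volume lives in a disk group named oradg:
# vxassist -g oradg relayout oravol01 layout=stripe-mirror ncol=2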
The relayout operation requires a temporary region to copy data to (marked with a state of TMP in vxprint)
prior to migrating it to its final destination. If sufficient space isn't available, vxassist will display an
error similar to the following and exit:
Once the relayout begins, the vxrelayout(1m) and vxtask(1m) utilities can be used to monitor the progress
of the relayout operations:
$ vxtask list
TASKID PTID TYPE/STATE PCT PROGRESS
2125 RELAYOUT/R 14.45% 0/41943168/6061184 RELAYOUT oravol01 oof
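A hedged companion example using vxrelayout against the same (assumed) disk group:
$ vxrelayout -g oradg status oravol01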
Veritas Resize
When shrinking a volume/filesystem, note that you cannot use a negative size; specify -s together with the
non-negative amount that you want to reduce by.
The most common example is in a two disk stripe as below. Here the volume is striped across disk 01 and
02. An attempt may be made to use another disk in the disk group (DG) to grow the volume and this will
fail since it is necessary to grow the stripe equally. Two disks are needed to grow the stripe.
Another disk is then added into the configuration so there are now two spare disks. Rerun the maxgrow
command, which will succeed. The resize will also succeed.
Under normal circumstances, it is possible to issue the resize command and add (grow) the volume across
disks 3 and 4. If only one spare disk exists, it is possible to use it. Grow the volume to use the extra space.
The only option is a relayout. In the example below, the volume is on disk01/02 and the intention is to
incorporate disk 03 and convert the volume into a 3 column stripe. However, the relayout is doomed to fail:
This has failed because the size of the subdisks is exactly the same as that of the disks (8378640 blocks).
For this procedure to work, resize (shrink) the volume by about 10% (10% of 8 gigabytes = 800 megabytes)
to give VERITAS Volume Manager (VxVM) some temporary space to do the relayout:
The only other way to avoid having to shrink the volume (in the case of a UNIX File System (UFS) file
system) is to add a fourth disk to the configuration just for the duration of the relayout, so VxVM would
use the fourth disk as temporary space. Once the relayout is complete, the disk will be empty again.
UDID_MISMATCH
Volume Manager 5.0 introduced a unique identifiers for disks (UDID) which allow source and cloned
(copied) disks to be differentiated. If a disk and its clone are presented to Volume Manager, devices will
be flagged as udid_mismatch in vxdisk list. This typically indicates that the storage was originally cloned
on the storage array; possibly a reassigned lun, or is a bcv
• If you want to remove the clone attribute from the device itself and use it as a regular diskgroup
with the newly imported diskgroup name:
1. Verify that the cloned disk, EMC0_27, is in the "error udid_mismatch" state:
2. Split the BCV device that corresponds to EMC0_27 from the disk group mydg:
# vxdisk scandisks
4. Check that the cloned disk is now in the "online udid_mismatch" state:
5. Import the cloned disk into the new disk group newdg, and update the disk's UDID:
6. Check that the state of the cloned disk is now shown as "online clone_disk":
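The import command behind step 5 is not shown above; a hedged sketch using the disk group names from this example (the useclonedev and updateid options are what allow the clone to be imported under a new name with a fresh UDID):
# vxdg -n newdg -o useclonedev=on -o updateid import mydg
# vxdisk -o alldgs list | grep EMC0_27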
1. Dump the private region of one drive that was in the disk group
# /etc/vx/diag.d/vxprivutil dumpconfig \
/dev/rdsk/cXtYdZs2 > /var/tmp/config.out
2. Process the config.out file through vxprint to get list of disk names included in that disk group
Note
This will not delete existing data on the disks. All commands in this procedure interact with
the private region header information and do not re-write data.
5. Continue through the list of disks by adding them into the disk group
6. After all disks are added into the disk group generate the original layout by running vxmake against
the /var/tmp/maker file
7. At this point all volumes will be in a DISABLED ACTIVE state. Once all volumes are enabled you will
have full access to the original disk group.
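A hedged sketch of the commands behind steps 2, 6, and 7 (the disk group and volume names are illustrative; vxprint reads the dumped configuration from stdin with -D -):
# vxprint -D - -ht < /var/tmp/config.out
# vxmake -g mydg -d /var/tmp/maker
# vxvol -g mydg init active v01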
sdb state=enabled
# vxdisk list sdal |grep "state=enabled"
sdax state=enabled
sds state=enabled
# vxdmpadm getsubpaths dmpnodename=sdal
Solution
# rm /etc/vx/disk.info ; rm /etc/vx/array.info
# vxconfigd -k
Note
In newer versions of vxvm there is a vxsplit command that can be used for this process.
Recover vx Plex
# vxprint|grep DETA
pl vol01-02 vol01 DETACHED 204800 - IOFAIL - -
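A hedged recovery sketch for a plex left in the DETACHED/IOFAIL state shown above, once the underlying disk is healthy again (disk group name is illustrative):
# /etc/vx/bin/vxreattach
# vxrecover -g mydg -s vol01
If vxrecover does not pick the plex up, it can be attached explicitly:
# vxplex -g mydg att vol01 vol01-02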
2. If you have separate volumes for opt, export, and home on the root disk, it is required to define the partitions
for those volumes using vxmksdpart
# /usr/lib/vxvm/bin/vxmksdpart
Usage: vxmksdpart [-f] [-g diskgroup] subdisk sliceno [tag flags]
4. Edit the following files to make the root mirror disk bootable without VERITAS Volume Manager
5. Change the c#t#d# numbers in the above file to ensure the correct partitions will be referenced in the vfstab
file:
# touch /mnt/etc/vx/reconfig.d/state.d/install-db
Before changes:
rootdev ..
set vxio ..
After changes:
* rootdev ..
* set vxio ..
# umount /mnt
7. If the upgrade or patching was successful, attach back mirror plex to root disk:
- Boot system
2. Completely remove the partitions having tags 14 and 15 from the mirror disk using format. Do not just change
the tag type; zero out these partitions and the label before exiting format.
3. Manually start up vxconfigd to allow for the encapsulation of the root mirror:
# vxiod set 10
# vxconfigd -m disable
# vxdctl init
# vxdisk -f init c1t0d0
# vxdctl enable
# rm /etc/vx/reconfig.d/state.d/install-db
# vxdiskadm => option 2 Encapsulate one or more disks
=> choose c1t1d0 (old rootmirror) => put under rootdg
# shutdown -i6 -g0 -y
Chapter 17. Advanced VCS for IO Fencing and Various Commands
General Information
1. Port Definitions
Port A - This is node-to-node communication. As soon as GAB starts on a node, it will look for other
nodes in the cluster and establish port "a" communication
Port B - This is used for IO fencing. If you use RAC or VCS 4.x, you can use IO fencing to protect data
disks. In RAC, as soon as the gab port membership changes, we will have a race for the coordinator
disks, and some nodes will panic when they lose the race
Port D - In RAC, the different Oracle instances need to talk to each other. GAB provides port "d" for
this. So, port "d" membership will start when Oracle RAC starts
Port F - This is the main communications port for cluster file system. More than 1 machine can mount
the same filesystem, but they need to communicate to not update the metadata (like inodes, super-block,
free inode list, free data block list, etc.....) at the same time. If they do it at the same time, you will get
corruption. There is always a primary for any filesystem that controls the access to the metadata. This
control (locking) is done via port "f"
Port H - GAB. The different nodes in the cluster need to know what is happening on other nodes (and
on themselves). They need to know which service groups and resources are online, offline, or faulted. The program
that knows all this info is the "main" VCS program called "had". So on each machine, had needs to talk
to GAB. This is done via port "h"
Port O - This is a port used specifically in RAC, and specifically for ODM. Let's start by saying what
ODM is, and then why it is needed. Oracle (like most other database managers) will try to cache IO
before writing it out to disk (raw volumes or data files on a filesystem). The biggest problem comes
in when Oracle tries to write to a filesystem. Each filesystem has its own cache. As you can imagine, the
general purpose filesystem cache is not the same as the very specific Oracle cache. The strategy used
is very different between Oracle and the filesystem. A while ago, Veritas had a close look at how the
Oracle cache works and how it sends IO to the filesystem. Veritas then wrote an extension for their
filesystem (called Quick IO - QIO). With QIO, they got performance very close to the performance
Oracle got on raw volumes. The rest of the filesystem community (read SUN UFS, IBM JFS, .....) thought
that Oracle gave the information to Veritas and complained about it. Oracle then sat down and actually
wrote a specification. This specification allows everyone to write their own library, and Oracle will
call this library to do IO. Oracle called this specification ODM (Oracle Disk Manager). The best part is that
only Veritas ever wrote their own libraries for ODM. So, getting back to port "o". Port "o" is used for
ODM to ODM communication in a RAC cluster. (wow, QIO, ODM and port "o" in one go!)
Port Q - This is another port used in Cluster Filesystem. VxFS is a journaled filesystem. This means
that it keeps a log which it will write to before making changes to the metadata on the filesystem (like
Oracle keeps redo logs). Normally this log is kept on the same filesystem. This means that for each
access, the log has to be updated, then the metadata, and then the data itself. Thus VxFS has to access
the same disk three times. Normally the metadata is kept close to the file, but the log is always kept in
a static place (normally close to the beginning of the filesystem). This could mean that there will be a
lot of seeking (for the beginning of the filesystem, then again to the metadata and data). As we all know,
disk access time is about 100 times slower than memory, so we have a slowdown here. Veritas made a
plan and developed quicklog. This allows you to have the filesystem log on a different disk. This helps
in speeding things up, because most disk operations can happen in parallel. OK, so now you know what
quicklog is. You can have quicklog on cluster filesystems as well. Port "q" is used to coordinate access
to quicklog (wow, that was a loooong one)
Port U - Not a port you would normally see, but just to be complete, let's mention it here. When a
Cluster Volume Manager is started, it will need to do a couple of things. The access to changing the
configuration of volumes, plexes, subdisks and diskgroups, needs to be coordinated. This means that
a "master" will always need to be selected in the cluster (can be checked with the "vxdctl -c mode"
command). Normally the master is the first one to open port "u". Port "u" is an exclusive port for
registering with the cluster volume manager "master". If no master has been established yet, the first
node to open port "u" will assume the role of master. The master controls all access to changes of the
cluster volume manager configuration. Each node that tries to join the cluster (CVM) will need to open
(exclusively) port "u", search for the master, and make sure that the node and the master see all the
same disks for the shared diskgroups.
Port V - OK, now that we've established that there is a master, we need to mention the fact that each
instance of volume manager running (thus on each node) keeps the configuration in memory (regardless
of whether it is part of a cluster or not). This "memory" is managed by the configuration daemon (vxconfigd).
We will get to vxconfigd in a minute, but first port "v". So, port "v" is actually used to register
membership for the cluster volume manager (once the node got port "u" membership, the "permanent"
membership is done via port "v"). Only members of the same cluster (the cluster volume manager cluster,
that is) are allowed to import and access the (shared) disks
Port W - The last port in cluster volume manager. This is the port used for the vxconfigd on each node
to communicate with the vxconfigd on all the other nodes. The biggest issue is that a configuration
change needs to be the same across the whole cluster (does not help that 1 node thinks we still have a
mirrored volume and the others don't know a thing about the mirror)
Note below that all paths to a LUN should have keys on them.
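A hedged way to verify that, using the vxfenadm key-read form against a file listing every /dev/rdsk path to the LUN (the paths here are illustrative):
# echo "/dev/rdsk/c2t13d0s2" > /tmp/disklist
# echo "/dev/rdsk/c3t13d0s2" >> /tmp/disklist
# vxfenadm -g all -f /tmp/disklist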
The vxfenmode file controls how the vxfen module will manage the coordinator disks only. The data
disks are managed by dmp exclusively, and dmp works in concert with the vxfen module for PGR
iofencing arbitration. Once the coordinator disk race is decided by vxfen module (expected to be
extremely fast), a message is sent over to DMP to complete the PGR preemption of data disks (could
take several minutes if customer has thousands of disks).
2. Does the dmp policy have any impact on registrations or just reservations? If so, what's the impact?
If the policy is set to DMP, vxfen will operate upon /dev/vx/rdmp/* dmpnodes instead of /dev/rdsk/
c_t_d devices. The number of registered keys may be slightly different for some active/passive arrays
when using DMP versus using native (depends on the implementation of the relevant array policy
module that is servicing those dmpnodes). Coordinator disks are not reserved, only registrations are
used for PGR fencing arbitration -- no data lives on them. The removal of registrations on coordinator
disks during vxfen race is merely the arbitration mechanism used to determine who won the fence race.
Contrasting, data disks are both registered and reserved -- whereby the reservation is the protection
mechanism that mandates all initiators who wish to write to those disks must first be registered. As
stated above, once the coordinator disk race is decided -- dmp will receive notification from vxfen of
the outcome and accordingly preempt the registrations from the node(s) that lost the race. The removal
of the registration on data disks protects the disk from rogue writes, but this is done only after the
underlying coordinator disk vxfen race has been decided.
3. Since the reservation keys are written on the sym and not the LUN,
Registrations are managed in memory of the array controller, as is also the reservation mode.
Irrespective of the use of dmp or raw for coordinator disks, or data disks which are always managed by
dmp -- registrations (and the reservation mode) are not written to the LUN. Those requests are serviced
by the array, and the array controller tracks those in its memory. "Persistent" means persistent across
SCSI bus resets and host reboots, but these keys do NOT persist across array reboots (which in practice
almost never happen).
4. Is it possible that a downed path during reservation writing could fail on a specific path?
Reservations only happen to data disks. Data disks are exclusively managed by dmp, and if the installed
array policy module (APM) is working correctly (bug free), registrations will be made to all active
paths. If a new path is added, or a dead path is restored, dmp must register a key there before sending
any IO to that newly added/restored path. We have seen a few Active/Passive array APM's to have bugs
in this area, but in your case of a Symmetrix (mentioned above) I am not aware of any problems with
path restoration with that APM (dmpaa).
Registrations on coordinator disks (remember coordinator disks are never reserved) happen at host boot
time. If you're using the "raw" policy, there is no mechanism to add keys to new/restored paths after
the reboot. Due to this deficiency, it was decided to leverage the capabilities of dmp by telling vxfen
module to use dmpnodes instead of raw paths. This avoided reinventing the wheel of adding APM-like
code to the vxfen module.
If a registration fails down a particular path, dmp *should* prevent that path from going to an online
state -- but I know that we've seen a few problems with this in the past (path goes online but the
registration failed, leaving the particular subpath keyless).
5. If so, does scsi3_disk_policy=dmp result in the key being written on the bad path when it comes back
online? If the dmp policy does not interact with the vxfen module and allow for placement of the keys
on the previously bad path – then what is the benefit of the dmp node?
Using dmp policy instructs vxfen to use dmpnode instead of raw path. When the registration is made
on the dmpnode, dmp keeps track of that registration request, and will gratuitously make the same
registration for any subsequent added/restored path that arrives after the original registration to the
dmpnode was made -- at least that's what is supposed to happen (see above about corner case bugs that
have been identified and addressed over times past).
6. Can this setting be adjusted on the fly with the cluster up?
The /etc/vxfentab file is (re)created each time the vxfen start script runs. Once the file is built,
"vxfenconfig -c" reads the file upon initialization only. With 50mp3 and later, there is a way to go
through a "replace" procedure to replace one device with another. With a bit of careful testing, that
method could be used to replace the /dev/rdsk/c_t_d with the corresponding dmpnode if desired.
7. Last, why does the registration key on a data drive only have one key when there are multiple paths?
Reservations have a key per path. Is the registration written to the LUN instead of the Symm?
It's the other way around, actually: there are multiple registrations (one per path), and only one reservation. The
reservation is not really a key itself (it's a mode setting) but is made through a registration key. If you
unregister the hosting key, the reservation mode is lost. But if you preempt that key using some other
registration, the spec says that the preempting key will inherit the reservation. Our dmp code is paranoid
here, and we try the reservation again anyway. As a result, it is expected to see failed reservations
coming from CVM slave nodes given it is the CVM master that makes the initial reservation through one
of its paths to the LUN and the slave's attempt to re-reserve is expected to fail if one of the paths from the
CVM master still holds the reservation. If for some reason the master lost its reservation (should never
happen) our extra try for reservation from all joining slaves is something like an extra insurance policy.
Also note that the *PGR0001 Key Value increments each time you deport and re-import the same
shared DG several times:
The port_b IOFencing driver is configured at boot time via the /etc/rc2.d/S97vxfen start script. This
script performs several steps:
• reads /etc/vxfendg to determine name of the diskgroup (DG) that contains the coordinator disks
• performs a "vxdisk list diskname" for each to determine all available paths to each coordinator disk
The easiest way of clearing keys is the /opt/VRTSvcs/rac/bin/vxfenclearpre script, but this requires all
IO to stop to ALL diskgroups, and a reboot to immediately follow running the script (to safely re-apply
needed keys). Failure to reboot results in VxVM performing shared IO without keys. If an event arises
that mandates fencing, winning nodes will attempt to eject the keys from losing nodes, but won't find
any. VxVM will silently continue. Worse yet, because the RESERVATION isn't present, the losing
nodes still have the ability to write to the data disks, thereby bypassing IOfencing altogether.
If a node wants to perform IO on a device which has a RESERVATION, the node must first
REGISTER a key. If the RESERVATION is inadvertently cleared, there is no requirement to maintain
a REGISTRATION. For this reason, keys should never be manipulated on disks actively imported in
shared mode.
Manually stepping through this document 3-4 times using a spare disk on your cluster is the only way to
become familiar with fencing and quickly resume normal production operation after a fence operation
occurs. Otherwise, you must use vxfenclearpre or call VERITAS Support at 800 342 0652, being
prepared to provide your VSN contract ID. Reading over the logic of the vxfentsthdw and vxfenclearpre
shell scripts is also valuable training.
Registration Usage
A------- VXFEN for coordinator disks
APGR0003 VXVM for data disks **
VERITASP vxfenclearpre temp keys to preempt other keys
A7777777 VXVM temp keys during shared import
ZZZZZZZZ VXVM temp keys during shared import
A1------ used by VERITAS support to preempt other keys
a. If activation is set to off, these are common errors when trying to mount the filesystem
# mount -o cluster,largefiles,qio \
/dev/vx/dsk/orvol_dg/orbvol /shared
mount: /dev/vx/dsk/orabinvol_dg/orabinvol is not this fstype.
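If the activation mode really is off, a hedged fix is to set shared-write activation on the disk group (name taken from the mount line above) and retry the mount:
# vxdg -g orvol_dg set activation=sw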
# which vxfsckd
/opt/VRTSvxfs/sbin/vxfsckd
# /opt/VRTSvxfs/sbin/vxfsckd
# ps -ef|grep vxfsckd
root 5547 1 0 23:04:43 ? 0:00 /opt/VRTSvxfs/sbin/vxfsckd
This results in the keys from the rebooted node remaining on the disks and prevents vxfen from
starting. The easy way to fix this is a reboot with init 6.
7. Data Disk example with keys - should have both Reservation and Registration set.
# vxfenadm -i /dev/rdsk/c2t13d0s2
Vendor id : EMC
Product id : SYMMETRIX
Revision : 5567
Serial Number : 42031000a
Use disk list to show keys - example only showing one disk
In VERITAS Volume Manager 3.2 and later versions, there are two detach policies for a shared disk
group, global and local. The default policy, and the way VERITAS Cluster Volume Manager (CVM) has
always worked, is global. The policy can be selected for each disk group with the vxedit set command.
The global policy will cause the disk to be detached throughout the cluster if a single node experiences
an I/O failure to that disk.
The local policy may be preferred for unmirrored volumes or in cases where availability is preferred
over redundancy of the data. It allows a disk that experiences an I/O failure to remain available if other
nodes in the cluster are still able to access it. After an I/O failure occurs, a message will be passed around
the cluster to determine if the failure is disk related or path related. If the other nodes can still write
to the disk, the mirrors are kept in sync by other nodes. The original node will fail writes. Something
similar is done for reads, but the read will succeed.
The state is not persistent. If a node has a local I/O failure, it does not remember. Any following read
or write that fails will go through the same process of passing messages around the cluster to check for
path or disk failure and repair the mirrored volume.
Disk Detach Policy has no effect on the Master node, as any IO failure will result in the plex detaching
regardless of policy. In any case, slaves that can't see the disk will still be unable to join the cluster.
diskdetpolicy
- global
- local
http://support.veritas.com/docs/258677
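A hedged example of selecting the policy per disk group with the vxedit set command mentioned above (disk group name is illustrative):
# vxedit set diskdetpolicy=local mydg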
15.Example walk through of adding SCSI3-PGR Keys Manually
Note
Even though the reservation is not a key, you must use the registration key to RESERVE
(see note above).
# vxfenadm -n -f /tmp/data_disk
VXFEN:libvxfen:1118: Reservation FAILED for: /dev/rdsk/c2t0d1s2
VXFEN:libvxfen:1133: Error returned: Error 0
f. A1 Key Removal
No keys...
My use of LDOMs here is for testing. Veritas Cluster Server can be used to fail over LDOMs; however,
it is not recommended to run VCS within an LDOM as though it were a non-virtualized system.
TARGET SERVER
Target: jbod/iscsi/lun0
iSCSI Name: \
iqn.1986-03.com.sun:02:b3d446a9-683b-615d-b5db-ff6846dbf758
Connections: 0
Target: jbod/iscsi/zlun1
iSCSI Name: \
iqn.1986-03.com.sun:02:633bdd37-1dfa-e1df-ee5e-91b8d29f410d
Connections: 0
INITIATOR SERVER
Manual Configuration – Static Entry (no auto-discover): Execute the following on LDOM#0 and LDOM#1
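The static-config commands themselves are a hedged sketch here; the IQN is the first target from the listing above, and the portal address 192.168.15.10 is an assumption (use the address of the target server):
$ iscsiadm modify discovery --static enable
$ iscsiadm add static-config \
iqn.1986-03.com.sun:02:b3d446a9-683b-615d-b5db-ff6846dbf758,192.168.15.10:3260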
$ devfsadm -c iscsi
$ format
c1t010000144F3B8D6000002A004987CB2Cd0: \
configured with capacity of 16.00GB
LABEL Drive #1
Creation of a zpool and non-global zone, followed by export/import and detach/attach, for testing migration
prior to the failover configuration.
LDOM#0 Only
$ zonecfg -z p1
zonecfg:p1> create
zonecfg:p1> set zonepath=/zones/p1
zonecfg:p1> add net
zonecfg:p1:net> set physical=vnet0
zonecfg:p1:net> set address=192.168.15.77/24
zonecfg:p1:net> end
zonecfg:p1> exit
$ zoneadm -z p1 install
$ zoneadm -z p1 boot
$ zlogin -C p1
// Config the system's sysidcfg
$ zoneadm -z p1 halt
$ zoneadm -z p1 detach
$ zpool export zones
LDOM#1 Only
$ zoneadm -z p1 halt
$ zoneadm -z p1 detach
$ zpool export zones
Note the lack of running zonecfg -z p1 create -a /zones. This is not necessary once the zone.xml and index.xml
are updated with the p1 zone information. Should this script be automated, you may want to consider adding
the force configuration into the script, just in case.
Moving Configuration of Zone and ZFS Pool on iSCSI Storage into Veritas Cluster Server 5.0MP3.
Note
The Zpool agent is only included with VCS starting in 5.0MP3 for Solaris. There are a number of
configuration variations that could be used here, including legacy mounts with the Mount agent.
Below is a simple layout that uses ZFS automounting when the zpool is imported through VCS.
Example VCS 5.0MP3 main.cf configuration for Zpool and Zone Failover
$ haconf -makerw
$ hagrp -add ztest
$ hagrp -modify ztest SystemList dom2 0 dom1 1
$ hagrp -modify ztest AutoStartList dom2 dom1
$ haconf -makerw
$ hares -link zone_p1 zpool_zones
$ haconf -dump -makero
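The zone_p1 and zpool_zones resources referenced by the hares -link command above have to exist first; a hedged sketch of adding them (attribute values taken from the main.cf below):
$ hares -add zpool_zones Zpool ztest
$ hares -modify zpool_zones PoolName zones
$ hares -modify zpool_zones AltRootPath "/"
$ hares -modify zpool_zones Enabled 1
$ hares -add zone_p1 Zone ztest
$ hares -modify zone_p1 ZoneName p1
$ hares -modify zone_p1 Enabled 1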
include "types.cf"
cluster LDOM_LAB (
UserNames = { admin = eLMeLGlIMhMMkUMgLJ,
z_zone_p1_dom2 = bkiFksJnkHkjHpiMji,
z_zone_p1_dom1 = dqrRrkQopKnsOooMqx }
Administrators = { admin }
)
system dom1 (
)
system dom2 (
)
group ztest (
SystemList = { dom1 = 0, dom2 = 1 }
AutoStartList = { dom2, dom1 }
Administrators = { z_zone_p1_dom2, z_zone_p1_dom1 }
)
Zone zone_p1 (
ZoneName = p1
)
Zpool zpool_zones (
PoolName = zones
AltRootPath = "/"
)
1. On Node A
/opt/VRTSllt/getmac /dev/hme:0
/opt/VRTSllt/dlpiping -vs /dev/hme:0
2. On Node B
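The client-side command is not shown above; a hedged sketch (the MAC address is whatever getmac reported on Node A, shown here as a placeholder):
/opt/VRTSllt/dlpiping -vc /dev/hme:0 00:14:4F:xx:xx:xx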
Note
This process has only been used on CONCAT volumes. You will need to convert layout to
CONCAT for each volume if striped.
Migration Workflow
1. Have new SAN storage allocated to target host, and the same new storage LUN Masked/Zoned to
source host
4. Break Mirror and remove new LUNs from Source host vxvm configuration
5. Re-create new disk group on target host using modified vxvm database dump
Migration Walkthrough
1. Identify source and target LUNs, and the difference in device names on source and target. Also record
mount points and disk sizes
target_lun0 = c2t600144F04A2E74170000144F3B8D6000d0
source_lun0 = c2t600144F04A2E74150000144F3B8D6000d0
# df -h
Filesystem size used avail capacity Mounted on
/dev/vx/dsk/demo_orig/v01 4.0G 18M 3.7G 1% /v01
/dev/vx/dsk/demo_orig/v02 4.0G 18M 3.7G 1% /v02
/dev/vx/dsk/demo_orig/v03 2.0G 18M 1.9G 1% /v03
/etc/vfstab:
/dev/vx/dsk/demo_orig/v01 /dev/vx/rdsk/demo_orig/v01 /v01 vxfs 2 yes -
# vxprint
Disk group: demo_orig
2. Add disks from destination to source server and mirror to new disks
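A hedged sketch of this step, using the device and VxVM disk names from this example (new_disk matches the maker.out excerpt below; repeat the mirror for each volume):
# /etc/vx/bin/vxdisksetup -i c2t600144F04A2E74170000144F3B8D6000d0
# vxdg -g demo_orig adddisk new_disk=c2t600144F04A2E74170000144F3B8D6000d0
# vxassist -g demo_orig mirror v01 new_disk
# vxassist -g demo_orig mirror v02 new_disk
# vxassist -g demo_orig mirror v03 new_disk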
Back up the files before editing, specifically before removing the subdisk and plex information pointing toward the
source disk.
Since plex v01-01 and subdisk orig_disk-01 were the original mirrors, delete references for those items
in the maker.out file. Here they are highlighted. Only the v01 volume is shown; continue for all volumes.
vol v01
use_type=fsgen
fstype="
comment="
putil0="
putil1="
putil2="
state="ACTIVE
writeback=on
writecopy=off
specify_writecopy=off
pl_num=2
start_opts="
read_pol=SELECT
minor=54000
user=root
group=root
mode=0600
log_type=REGION
len=8388608
log_len=0
update_tid=0.1081
rid=0.1028
detach_tid=0.0
active=off
forceminor=off
badlog=off
recover_checkpoint=16
sd_num=0
sdnum=0
kdetach=off
storage=off
readonly=off
layered=off
apprecover=off
recover_seqno=0
recov_id=0
primary_datavol=
vvr_tag=0
iscachevol=off
morph=off
guid={7251b03a-1dd2-11b2-ad16-00144f6ece3b}
inst_invalid=off
incomplete=off
instant=off
restore=off
snap_after_restore=off
oldlog=off
nostart=off
norecov=off
logmap_align=0
logmap_len=0
inst_src_guid={00000000-0000-0000-0000-000000000000}
cascaded=off
plex=v01-01,v01-02
export=
plex v01-01
compact=on
len=8388608
contig_len=8388608
comment="
putil0="
putil1="
putil2="
v_name=v01
layout=CONCAT
sd_num=1
state="ACTIVE
log_sd=
update_tid=0.1066
rid=0.1031
vol_rid=0.1028
detach_tid=0.0
log=off
noerror=off
kdetach=off
stale=off
ncolumn=0
raidlog=off
guid={7251f842-1dd2-11b2-ad16-00144f6ece3b}
mapguid={00000000-0000-0000-0000-000000000000}
sd=orig_disk-01:0
sd orig_disk-01
dm_name=orig_disk
pl_name=v01-01
comment="
putil0="
putil1="
putil2="
dm_offset=0
pl_offset=0
len=8388608
update_tid=0.1034
rid=0.1033
guid={72523956-1dd2-11b2-ad16-00144f6ece3b}
plex_rid=0.1031
dm_rid=0.1026
minor=0
detach_tid=0.0
column=0
mkdevice=off
subvolume=off
subcache=off
stale=off
kdetach=off
relocate=off
sd_name=
uber_name=
tentmv_src=off
tentmv_tgt=off
tentmv_pnd=off
plex v01-02
compact=on
len=8388608
contig_len=8388608
comment="
putil0="
putil1="
putil2="
v_name=v01
layout=CONCAT
sd_num=1
state="ACTIVE
log_sd=
update_tid=0.1081
rid=0.1063
vol_rid=0.1028
detach_tid=0.0
log=off
noerror=off
kdetach=off
stale=off
ncolumn=0
raidlog=off
guid={3d6ce0f2-1dd2-11b2-ad18-00144f6ece3b}
mapguid={00000000-0000-0000-0000-000000000000}
sd=new_disk-01:0
sd new_disk-01
dm_name=new_disk
pl_name=v01-02
comment="
putil0="
putil1="
putil2="
dm_offset=0
pl_offset=0
len=8388608
update_tid=0.1066
rid=0.1065
guid={3d6d2076-1dd2-11b2-ad18-00144f6ece3b}
plex_rid=0.1063
dm_rid=0.1052
minor=0
detach_tid=0.0
column=0
mkdevice=off
subvolume=off
subcache=off
stale=off
kdetach=off
relocate=off
sd_name=
uber_name=
tentmv_src=off
tentmv_tgt=off
tentmv_pnd=off
2. Create Disk Group on Target from Disks that were a mirror on source: Get the value of X from the
first drive listed in "list"
4. Start Volumes
Chapter 18. OpenSolaris 2009.06 COMSTAR
Installation
1. Install COMSTAR Server Utilities
2. Disable iscsitgt and physical:nwam Service - itadm gets confused with multiple physical instances; this
assumes not using nwam.
3. Reboot Server
## OR
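The package and service commands behind steps 1-3 are not shown; a hedged sketch for OpenSolaris 2009.06 (the "## OR" above presumably refers to enabling the services by hand instead of rebooting; package and service FMRI names are the stock ones, adjust to your repository):
# pkg install storage-server SUNWiscsit
# svcadm disable iscsitgt
# svcadm disable svc:/network/physical:nwam
# svcadm enable stmf
# svcadm enable -r svc:/network/iscsi/target:default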
In this case, the t1000_primary host group will contain a list of my T1000 primary domain iSCSI IQNs generated
by iscsiadm on each remote host.
# itadm list-target -v
TARGET NAME STATE
iqn.1986-03.com.sun:02:2be6d243-0ff9-6981-f157-eea00338d1d4 online
alias: -
auth: none (defaults)
targetchapuser: -
targetchapsecret: unset
tpg-tags: nge0 = 2
iqn.1986-03.com.sun:02:1a6416d2-a260-ebe4-bbf7-d28643276f65 online
alias: -
auth: none (defaults)
targetchapuser: -
targetchapsecret: unset
tpg-tags: nge1 = 2
4. Mapping each LUN to both the Target TG access list, and the remote host HG Access list
Found LU(s)
GUID SIZE
-------------------------------- ----------------
600144f0c312030000004a3b8068001c /dev/zvol/rdsk/npool/COMSTAR_LUN5
600144f0c312030000004a3b8068001b /dev/zvol/rdsk/npool/COMSTAR_LUN4
600144f0c312030000004a3b8068001a /dev/zvol/rdsk/npool/COMSTAR_LUN3
600144f0c312030000004a3b80680019 /dev/zvol/rdsk/npool/COMSTAR_LUN2
600144f0c312030000004a3b80680018 /dev/zvol/rdsk/npool/COMSTAR_LUN1
600144f0c312030000004a3b80680017 /dev/zvol/rdsk/npool/COMSTAR_LUN0
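A hedged example of step 4 for a single LU, using a GUID from the listing above (the host group t1000_primary is from this chapter; the target group name tg_nge0 is an assumption):
# stmfadm add-view -h t1000_primary -t tg_nge0 600144f0c312030000004a3b80680017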
## Repeat below for each LUN to be shared over iFA1 (nge1) to the remote
## iSCSI initiators defined in HG t1000_primary
Chapter 19. Sun Cluster 3.2
Preparation
This section covers a walkthrough configuration for Sun Cluster. General requirements include the
following:
Warning
ZFS is not supported for the /globaldevices filesystem; therefore, unless you are being creative,
avoid installing Solaris 10 with the ZFS root option. If you do not allocate a UFS filesystem
and partition for /globaldevices then a LOFI device will be used, which will reduce boot
performance.
3. Network Configuration
Installation
This section covers a walkthrough configuration for Sun Cluster. General installation includes the
following:
Warning
Either untar the software on both servers under /tmp or run installer from a shared directory
such as NFS. Sun Cluster must be installed on both systems
/swdepot/sparc/suncluster/Solaris_sparc
$ ./installer
Installation Type
-----------------
1. Yes
2. No
Ready to Install
----------------
The following components will be installed.
1. Install
2. Start Over
3. Exit Installation
What would you like to do [1] {"<" goes back, "!" exits}? 1
Basic Configuration
This section covers a walkthrough configuration for Sun Cluster. General configuration includes the
following:
Warning
Interfaces configured for heartbeats must be unplumbed and have no /etc/hostname.dev file.
Warning
During the scinstall configuration process the nodes will be rebooted
1. Product Configuration
# /usr/cluster/bin/scinstall
Option: 1
Option: 1
You must use the Java Enterprise System (JES) installer to install the
Sun Cluster framework software on each machine in the new cluster
before you select this option.
This tool supports two modes of operation, Typical mode and Custom.
For most clusters, you can use Typical mode. However, you might need
to select the Custom mode option if not all of the Typical defaults
can be applied to your cluster.
For more information about the differences between Typical and Custom
modes, select the Help option from the menu.
1) Typical
2) Custom
?) Help
q) Return to the Main Menu
Option [1]: 1
Each cluster has a name assigned to it. The name can be made up of any
characters other than whitespace. Each cluster name should be unique
within the namespace of your enterprise.
Please list the names of the other nodes planned for the initial
cluster configuration. List one node name per line. When finished,
type Control-D:
sysdom0
sysdom1
You must identify the cluster transport adapters which attach this
node to the private cluster interconnect.
You must identify the cluster transport adapters which attach this
node to the private cluster interconnect.
1) bge2
2) bge3
3) Other
Option: 1
1) bge2
2) bge3
3) Other
Option:
You have chosen to turn on the global fencing. If your shared storage
devices do not support SCSI, such as Serial Advanced Technology
Attachment (SATA) disks, or if your shared disks do not support
SCSI-2, you must disable this feature.
Cluster Creation
Rebooting ...
General Commands
This section covers a walkthrough configuration for Sun Cluster. General resource configuration:
Note
The DID ID's are under /dev/did/dsk and /dev/did/rdsk on each node in the cluster. These paths
are to be used for creating failover filesystems, zpools and storage access.
cldevice list -v
DID Device Full Device Path
---------- ----------------
d1 sysdom1:/dev/rdsk/c0t0d0
d2 sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B80680017d0
d2 sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B80680017d0
d3 sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B80680018d0
d3 sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B80680018d0
d4 sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B80680019d0
d4 sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B80680019d0
d5 sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B8068001Ad0
d5 sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B8068001Ad0
d6 sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B8068001Bd0
d6 sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B8068001Bd0
d7 sysdom1:/dev/rdsk/c1t600144F0C312030000004A3B8068001Cd0
d7 sysdom0:/dev/rdsk/c1t600144F0C312030000004A3B8068001Cd0
d8 sysdom1:/dev/rdsk/c1t600144F0C312030000004A4518A90001d0
d8 sysdom0:/dev/rdsk/c1t600144F0C312030000004A4518A90001d0
d9 sysdom1:/dev/rdsk/c1t600144F0C312030000004A4518BF0002d0
d9 sysdom0:/dev/rdsk/c1t600144F0C312030000004A4518BF0002d0
d10 sysdom0:/dev/rdsk/c0t0d0
clquorum list
d2
sysdom1
sysdom0
3. Register the HAStoragePlus agent and add it to the apache-rg resource group
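A hedged sketch of step 3 (the zpool name apachepool and resource name apache-hasp-rs are illustrative):
# clresourcetype register SUNW.HAStoragePlus
# clresource create -g apache-rg -t SUNW.HAStoragePlus \
-p Zpools=apachepool apache-hasp-rs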
Update the httpd.conf file to point to storage under /apache on both servers.
Make sure IP/Hostname is in both servers /etc/hosts file. In this case the server vsrvmon has an IP of
192.168.15.95
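A hedged sketch of plumbing that address as a LogicalHostname resource in the same group (the resource name is illustrative):
# clreslogicalhostname create -g apache-rg -h vsrvmon vsrvmon-rs
# clresourcegroup online -M apache-rg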
# ifconfig -a
bge0:1: flags=1001040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED
,IPv4,FIXEDMTU> mtu 1500 index 2
inet 192.168.15.95 netmask ffffff00 broadcast 192.168.15.255
# scstat -i
-- IPMP Groups --
8. Update the httpd.conf on both systems to use the floating IP as the ServerName
10.Status the Apache resource group, and switch the resource group through all systems
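A hedged example of step 10 (node names taken from the cldevice listing earlier in this section):
# clresourcegroup status apache-rg
# clresourcegroup switch -n sysdom1 apache-rg
# clresourcegroup switch -n sysdom0 apache-rg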
1. Create a NGZ for each server using the following command from one server
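A hedged sketch of creating the zone cluster that the clzc prompts below operate on; the zone path comes from later in this chapter, while the hostnames, address, and vnet device are assumptions:
# clzonecluster configure sczone
clzc:sczone> create
clzc:sczone> set zonepath=/localzone/sczone
clzc:sczone> add node
clzc:sczone:node> set physical-host=vsrv3
clzc:sczone:node> set hostname=sczone-a
clzc:sczone:node> add net
clzc:sczone:node:net> set address=192.168.15.90
clzc:sczone:node:net> set physical=vnet0
clzc:sczone:node:net> end
clzc:sczone:node> end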
3. Add the physical host information and network information for the zone on each host
4. From documents - still working on what this means - in this case, the IPs are those of vsrv3 and vssrv4
in that order
clzc:sczone> verify
clzc:sczone> commit
clzc:sczone> exit
This set of examples is configured within two LDOMs on one server; therefore the network devices are
in vnet# form. Replace the vnet# with your appropriate network devices and all commands should function
properly on non-virtualized hardware.
This is needed because the CRS processes are started as root and therefore will not be impacted by the
oracle project definition later on in this writeup. It is possible to make these part of a unique project
and prefix the CRS start scripts with a newtask command, or to define a system or root project. The
choice is up to you.
/etc/system:
set shmsys:shminfo_shmmax=SGA_size_in_bytes
3. Download and install the SUN QFS Packages on all nodes in the cluster
Warning
Make sure that /var/run/nodelist exists on both servers. I've noticed that it might not. If not the
-M metaset command will fail. Content of the file is: Node# NodeName PrivIP
cat /var/run/nodelist
1 vsrv2 172.16.4.1
2 vsrv1 172.16.4.2
# metaset
vsrv2 Yes
vsrv1 Yes
/etc/opt/SUNWsamfs/mcf:
/etc/opt/SUNWsamfs/samfs.cmd:
fs=RAC
sync_meta=1
/etc/opt/SUNWsamfs/hosts.RAC:
6. Create QFS Directory on both nodes and make filesystem just from one node
# mkdir -p /localzone/sczone/root/db_qfe/oracle
# /opt/SUNWsamfs/sbin/sammkfs -S RAC
sammkfs: Configuring file system
sammkfs: Enabling the sam-fsd service.
sammkfs: Adding service tags.
Warning: Creating a new file system prevents use with 4.6 or earlier
releases.
7. Mount, test, and remove mount point, otherwise clzonecluster install will fail.
# mount RAC
# umount RAC
# rm -rf /localzone/sczone
9. Add sysid Information - there are more options than listed here
10.Add the physical host information and network information for the zone on each host
clzc:sczone> add fs
clzc:sczone:fs> set dir=/db_qfs/oracle
clzc:sczone:fs> set special=RAC
clzc:sczone:fs> set type=samfs
clzc:sczone:fs> end
Initially add the storage to the storage group with metaset -s zora, then add into the zone configuration
- short example provided, repeat for each device
# metastat -c -s zora
zora/d500 m 980MB zora/d50
zora/d50 s 980MB d5s0
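A hedged sketch of adding that metadevice into the zone cluster configuration (the match path follows the zora metaset shown above):
# clzonecluster configure sczone
clzc:sczone> add device
clzc:sczone:device> set match=/dev/md/zora/rdsk/d500
clzc:sczone:device> end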
clzc:sczone> verify
clzc:sczone> commit
clzc:sczone> exit
3. Add an instance of the SUNW.rac_framework resource type to the resource group that you created in
Step 2.
5. Add an instance of the SUNW.rac_udlm resource type to the resource group that you created in Step 2.
6. Bring online and in a managed state the RAC framework resource group and its resources.
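A hedged sketch of steps 2-6 (resource group and resource names are illustrative; node names come from the nodelist earlier in this section):
# clresourcetype register SUNW.rac_framework SUNW.rac_udlm
# clresourcegroup create -n vsrv1,vsrv2 -p maximum_primaries=2 \
-p desired_primaries=2 -p RG_mode=Scalable rac-framework-rg
# clresource create -g rac-framework-rg -t SUNW.rac_framework rac-framework-rs
# clresource create -g rac-framework-rg -t SUNW.rac_udlm \
-p resource_dependencies=rac-framework-rs rac-udlm-rs
# clresourcegroup online -emM rac-framework-rg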
Chapter 20. Hardware Notes
SunFire X2200 eLOM Management
SP General Commands
• To power on the host, enter the following command:
• To reboot and enter the BIOS automatically, enter the following command:
start /SP/AgentInfo/console
• To terminate a server console session started by another user, enter this command:
stop /SP/AgentInfo/console
System console
• Use the Esc-Shift-9 key sequence to toggle back to the local console flow. Enter Ctrl-b to terminate the
connection to the serial console
3. In the /boot/grub/menu.lst file, edit the splashimage and kernel lines to read as follows:
# splashimage /boot/grub/splash.xpm.gz
kernel /platform/i86pc/multiboot -B console=ttyb
4. Change the login service to listen at 115200 by making the following edits to /var/svc/manifest/system/
console-login.xml:
# reboot -- -r
Configure ELOM/SP
Change IP Address from DHCP to Static
Properties:
HWVersion = 0
FWVersion = 3.20
MacAddress = 00:16:36:5B:97:E4
IpAddress = 10.13.60.63
NetMask = 255.255.255.0
Gateway = 10.13.60.1
DhcpConfigured = disable
root , changeme
show /SP/cli/commands
delete /SP/users/fred
236