Workshop
ST-350
Student Guide With Instructor Notes
Copyright 2002 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, California 94303, U.S.A. All rights reserved.
This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and
decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of
Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Sun, Sun Microsystems, the Sun Logo, Solaris, Ultra, SunSolve Online, Sun Explorer Data Collector, Sun Enterprise Ultra, SunSpectrum,
BigAdmin, Sun System Configuration Check, Sun Blade, Sun Fire, Sun4U, Solaris Management Console, SunSolve Online, SunVTS,
AnswerBook2, and OpenWindows are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and
other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Adobe is a registered trademark of Adobe Systems, Incorporated.
Federal Acquisitions: Commercial Software Government Users Subject to Standard License Terms and Conditions
Export Laws. Products, Services, and technical data delivered by Sun may be subject to U.S. export controls or the trade laws of other
countries. You will comply with all such laws and obtain all licenses to export, re-export, or import as may be required after delivery to
You. You will not export or re-export to entities on the most current U.S. export exclusions lists or to any country subject to U.S. embargo
or terrorist controls as specified in the U.S. export laws. You will not use or provide Products, Services, or technical data for nuclear, missile,
or chemical biological weaponry end uses.
DOCUMENTATION IS PROVIDED AS IS AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND
WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE
OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE
LEGALLY INVALID.
Table of Contents
About This Course
  Course Goals
  Course Map
  Topics Not Covered
  How Prepared Are You?
  Introductions
  How to Use Course Materials
  Conventions
    Icons
    Typographical Conventions
Introducing the Fault Analysis and Diagnosis Methodology
  Objectives
  Relevance
  Additional Resources
  Describing the Fault Analysis and Diagnosis Methodology
    Stating the Problem Clearly
    Listing Facts
    Documenting Each Item Carefully
  Introducing the Fault Diagnosis Methodology
    Prioritizing Planned Tests
    Verifying the Corrective Action
    Documenting Each Item
  Identifying the Basic Layers and Error Types in Sun Systems
    Overview of the Four Basic Layers of a Sun System
    Introducing Types of Faults in Sun Systems
    Identifying Error-Reporting Mechanisms
  Exercise: Performing Fault Analysis and Diagnosis
    Preparation
    Tasks
  Fault Analysis and Diagnosis Worksheet Template
    Analysis Phase
    Diagnosis Phase
Copyright 2002 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, Revision E
Preface
Course Map
The following course map enables you to see what you have
accomplished and where you are going in reference to the course goals.
[Figure: course map. Surviving labels: "Introducing the OBP Device Tree and BOOT Sequence"; fragments "Sun Software", "Performing Solaris OE", "Diagnosing Faults", "Diagnostics".]
Software installation
Printer management
Network configuration
Refer to the Sun Educational Services catalog for specific information and
registration.
If any students indicate they cannot do the above, meet with them at the first break to decide how to
proceed with the class. Do they want to take the class at a later date? Is there some way to get the
extra help needed during the week?
It might be appropriate here to recommend resources from the Sun Educational Services catalog that
provide training for topics not covered in this course.
Introductions
Now that you have been introduced to the course, introduce yourself to
the other students and the instructor, addressing the items shown on the
overhead.
Visual aids: The instructor might use several visual aids to convey a
concept, such as a process, in a visual form. Visual aids commonly
contain graphics and summarized text.
Conventions
The following conventions are used in this course to represent various
training elements and alternative learning resources.
Icons
Additional resources: Indicates other references that provide additional
information on the topics described in the module.
This is an instructor-only icon. Use the visual aid icon to indicate which
slide to present.
Note: Indicates additional information that can help students but is not
crucial to their understanding of the concept being described. Students
should be able to understand the concept or complete the task without
this information. Examples of notational information include keyword
shortcuts and minor system adjustments.
Caution: Indicates that there is a risk of personal injury from a
nonelectrical hazard, or risk of irreversible damage to data, software, or
the operating system. A caution indicates that a hazard is possible
(as opposed to certain), depending on the action of the
user.
Typographical Conventions
Courier is used for the names of commands, files, directories,
programming code, and on-screen computer output, such as:
The /etc/hostname.hme0 file on the faulty system does not have
the Internet Protocol (IP) address defined on it.
Courier bold is used for characters and numbers that you type, such as:
To check the architecture of your system, use the following
command:
# uname -m
Objectives
Relevance
References
Lab exercises
Emphasize that the main purpose of this course is to develop an approach to fix system errors. The
performance of students does not depend on the number of faults they fix but on the approach they use to
debug faults.
To enable you to follow this structure, the following supplementary materials are provided with this course:
Relevance
These questions or scenarios set the context of the module. It is suggested that you ask these questions and
discuss the answers with students. The answers are provided only in the instructor guide.
Course map
The course map allows the students to get a visual picture of the course. It also helps students know the
status. The course map is presented in the About This Course section of the student guide.
Lecture overheads
Small-group discussion
After the lab exercises, it is a good idea to debrief the students. You can gather them back into the classroom
and have them discuss their discoveries, problems, and issues in programming the solution to the problem in
small groups of four or five, one-on-one, or one-on-many.
Each module contains a Relevance section after the course map. This section may present a scenario
relating to the content presented in the module, or it may present questions that stimulate students to think
about the content that will be presented. Engage the students in relating experiences or posing possible
answers to the questions. Spend no more than 10 to 15 minutes on this section.
Module     Lecture (Minutes)   Lab (Minutes)   Total Time (Minutes)
Preface    30                  NA              30
Module 1   60                  90              150
Module 2   90                  90              180
Module 3   90                  90              180
Module 4   90                  90              180
Module 5   120                 90              210
Module 6   90                  60              150
Module 7   45                  45              90
Module 8   45                  90              135
Module 1
Relevance
Present the following questions to stimulate students and get them thinking about the issues and topics
presented in this module. While they are not expected to know the answers to these questions, the answers
should be of interest to them and inspire them to learn the material presented in this module.
Relevance on page OH 1-3
List some of the faults that you encounter in your day-to-day work.
Ask students to recount their experiences, and list the faults on a white board or a flip chart. If the faults
encompass a wide spectrum, use them to highlight the differences in problems that occur in different
circumstances. Alternatively, provide examples of diverse problems that students might encounter at their
work places. Use these examples to explain why it is essential to follow a consistent Fault Analysis and
Diagnosis methodology.
Additional Resources
Additional resources: The following references can provide additional
information on the topics discussed in this module:
Watchdog-resets.pdf
(http://sunsolve.sun.com/kmsattachments/41107.watchdogresets.pdf), accessed 25 February 2002.
Fault analysis
Fault diagnosis
Inform students that the first part of the module describes the Fault Analysis methodology and the second
part focuses on the Diagnosis methodology.
Introducing the Fault Analysis Methodology on page OH 1-4
Figure 1-1
Identifying the fault correctly is critical to the success of the Fault Analysis
and Diagnosis methodology. Incorrect fault analysis can cause complete
system failure. Many faults become critical because the initial
identification of the fault is incorrect.
Caution: While creating a problem statement, do not assume the cause of
the fault.
Listing Facts
Listing Facts on page OH 1-5
The next step is to list the facts about the problem. This helps to establish
the possible causes of the fault. Figure 1-2 shows the recommended order
of steps to arrive at a list of facts.
Figure 1-2
Note: The questions to be asked vary, depending on the fault and the
user. Use your judgment, and ask questions that help you to analyze the
fault.
If required, expand each bullet by presenting an example from student experiences.
System crash dumps: Help identify the potential causes of the fault
by analyzing the /var/crash/`uname -n`/unix.n and
/var/crash/`uname -n`/vmcore.n dump files, if they exist.
Log files: Evaluate the messages recorded in system log files, such
as the /var/adm/messages file, for information about the fault.
After you identify the sources of information, you should start collecting
the information.
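As a rough illustration of the crash dump check described above, the following Python sketch pairs the unix.n and vmcore.n files found in a crash directory. The function name find_crash_dumps and the choice to pass the directory as an argument are assumptions made for this example; on a live system the directory would be /var/crash/`uname -n`.

```python
import os
import re
import tempfile


def find_crash_dumps(crash_dir):
    """Return the sorted dump numbers n for which both unix.n and vmcore.n exist."""
    if not os.path.isdir(crash_dir):
        return []
    names = set(os.listdir(crash_dir))
    numbers = []
    for name in names:
        match = re.fullmatch(r"unix\.(\d+)", name)
        # A dump is only useful for analysis when both halves are present.
        if match and f"vmcore.{match.group(1)}" in names:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


if __name__ == "__main__":
    # Demonstration against a throwaway directory, not a real system.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("unix.0", "vmcore.0", "unix.1", "bounds"):
            open(os.path.join(demo_dir, name), "w").close()
        print(find_crash_dumps(demo_dir))
```

In the demonstration, only dump 0 is reported because unix.1 has no matching vmcore.1 file.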
Are the two systems that are being compared similar, in terms of
hardware architecture, OE revisions, patch levels, and application
versions?
Does the known functional system display the same symptoms and
conditions as the faulty system?
Identify the differences between the faulty system and the nonfaulty
system.
Analyze the comparative facts about the systems for any similarities.
Note: While the similarities between systems do not directly identify the
source of the fault, they can help to eliminate most of the potential
sources. Therefore, you can easily isolate the possible sources of the
system fault.
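The comparison of a faulty system with a known functional system can be sketched in Python as follows. The fact names and values (architecture, os_release, remon_installed) are invented for illustration and are not taken from a real system; the point is only that matching facts are set aside while differing facts become candidates for the source of the fault.

```python
def compare_facts(faulty, functional):
    """Split collected facts into those that differ and those that match."""
    differences = {}
    similarities = {}
    for key in sorted(set(faulty) | set(functional)):
        left = faulty.get(key)
        right = functional.get(key)
        if left == right:
            similarities[key] = left
        else:
            differences[key] = (left, right)
    return differences, similarities


# Hypothetical facts gathered from the two systems.
faulty = {"architecture": "sun4u", "os_release": "5.8", "remon_installed": True}
functional = {"architecture": "sun4u", "os_release": "5.8", "remon_installed": False}

differences, similarities = compare_facts(faulty, functional)
print("Differences:", differences)
print("Similarities:", sorted(similarities))
```

Facts that are identical on both systems are unlikely to explain the fault, so the differences list is the place to look first.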
Problem Statement
When attempting to run the patchadd -p command, the system
generates a core dump in the /var/sadm/patch directory and terminates
the command.
Resources
Problem Statement: The patchadd -p command is not executing
correctly, and it terminates with the error message listed in Table 1-1.
Problem Description
Table 1-1 shows the problem description.
Table 1-1 Problem Description
Error Messages            patchadd: Program unexpectedly terminates with signal 11.
Symptoms and Conditions   The patchadd -p command is terminated on execution.
Recent Changes            The remon package was recently installed on the system.
Comparative Facts
Ask students to recall the original problem. Compare it with the responses they provide for this question, and
highlight the amount of details they now have on the fault.
Ask students to explain the logic behind their proposed solutions. Capture student responses, and use them
to introduce the Fault Diagnosis methodology.
Figure 1-3
After evaluating all the data gathered in the Analysis phase, you generate
a list of probable causes for the fault. Next, you prioritize the probable
causes and various feasible test methods, according to the steps shown in
Figure 1-4 on page 1-12.
Figure 1-4
Formulating Hypotheses
A hypothesis states the most probable cause of the fault and is based on
the data collected in the Analysis phase. Although multiple hypotheses
exist, each hypothesis is tested separately, starting with the most probable
hypothesis.
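The idea of testing hypotheses one at a time, most probable first, can be sketched as follows. The hypothesis texts, the likelihood numbers, and the run_test callback are all hypothetical; in practice the ranking comes from the data collected in the Analysis phase and each test is a manual procedure.

```python
def first_confirmed(hypotheses, run_test):
    """Test hypotheses separately, most probable first; stop at the first confirmed one."""
    ordered = sorted(hypotheses, key=lambda h: h["likelihood"], reverse=True)
    tested = []
    for hypothesis in ordered:
        tested.append(hypothesis["cause"])
        if run_test(hypothesis):
            return hypothesis["cause"], tested
    return None, tested


# Hypothetical list of probable causes with rough likelihood estimates.
HYPOTHESES = [
    {"cause": "patchadd binary is damaged", "likelihood": 0.2},
    {"cause": "pkginfo file of a recently installed package is malformed", "likelihood": 0.7},
    {"cause": "insufficient space in /var", "likelihood": 0.1},
]

# Stand-in for a real test: here only the pkginfo hypothesis is "confirmed".
confirmed, tested = first_confirmed(HYPOTHESES, lambda h: "pkginfo" in h["cause"])
print(confirmed)
print(tested)
```

Because the most probable hypothesis is tested first, a good ranking keeps the number of tests small.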
Tests
Results
Verification
Execute the truss patchadd command on the system.
# truss patchadd
...
open("/var/sadm/pkg/SUNWremon/pkginfo", O_RDONLY) = 4
fstat(4, 0xEFFFDA98)                            = 0
ioctl(4, TCGETA, 0xEFFFDA24)                    Err#25 ENOTTY
read(4, " C L A S S E S = S T A T".., 8192)     = 1296
lseek(4, 0xFFFFFF45, SEEK_CUR)                  = 1109
close(4)                                        = 0
    Incurred fault #6, FLTBOUNDS %pc = 0x00014B30
      siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000B
    Received signal #11, SIGSEGV [default]
      siginfo: SIGSEGV SEGV_MAPERR addr=0x0000000B
        *** process killed ***
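One way to read a trace like the one above is to look for the last file that was opened successfully before the fatal signal. The following Python sketch does that for a shortened copy of the trace; the function name last_open_before_signal and the regular expression are assumptions made for this example and cover only the line shapes shown here, not every form of truss output.

```python
import re

# Shortened copy of the trace shown above.
TRUSS_SAMPLE = """\
open("/var/sadm/pkg/SUNWremon/pkginfo", O_RDONLY) = 4
fstat(4, 0xEFFFDA98) = 0
read(4, " C L A S S E S = S T A T".., 8192) = 1296
close(4) = 0
    Incurred fault #6, FLTBOUNDS %pc = 0x00014B30
    Received signal #11, SIGSEGV [default]
"""


def last_open_before_signal(truss_text, signal_name="SIGSEGV"):
    """Return the path of the last successful open() seen before the named signal."""
    last_path = None
    for line in truss_text.splitlines():
        # Only opens that return a file descriptor count; failed opens show Err#.
        opened = re.match(r'\s*open\("([^"]+)",.*\)\s*=\s*\d+\s*$', line)
        if opened:
            last_path = opened.group(1)
        elif "Received signal" in line and signal_name in line:
            return last_path
    return None


print(last_open_before_signal(TRUSS_SAMPLE))
```

For this trace the result points at the pkginfo file of the SUNWremon package, which makes that file a reasonable first suspect. It is a hint for the next test, not proof of the cause.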
Corrective Action
Table 1-3 shows the corrective action that you must take for the preceding
problem.
Table 1-3 Corrective Action
Final Repair
Communication
Documentation
What are your inferences about the fault, based on the Diagnosis
section of the Fault Analysis and Diagnosis worksheet?
Ask students to recall the original problem statement. Compare it with their responses for this question, and
highlight the process by which they achieved the result. Next, explain that the same methodology is
applicable to all the problems that the student might encounter in their day-to-day work.
Remind students that while it might not be necessary to complete the entire Fault Analysis and Diagnosis
methodology, they should not miss any of the steps. Actions taken on a system without proper analysis and
diagnosis often do more harm than good.
Figure 1-5
Figure 1-6 shows the types of faults that might occur in a Sun system. The
corrective action depends on the type of the fault.
Figure 1-6
Error Categories
Software Errors
All errors that do not originate in the hardware are known as software
errors. The system processor detects and reports these errors. Examples of
software errors include programming errors in applications and bugs in
kernel code.
Hardware Errors
A hardware interrupt can indicate a hardware error. Examples of
hardware errors include corrupt disks and failures of power supply and
fan trays.
Critical Errors
Critical errors require immediate attention. You must shut down the
system immediately. Examples of critical errors include the following:
Fatal Errors
A fatal error corresponds to an error in which you cannot guarantee
system recovery. Examples of fatal errors include the following:
System Panics
A system panic occurs when the system detects a fatal error that can
corrupt data. The system responds by halting all processes and calling the
panic() kernel function. The panic() kernel function is not an error
condition but a protective reaction to an error condition that is designed
to safeguard system data. The panic() kernel function performs the
following:
Performs a stack trace and lists routines that led to the panic
You analyze the crash dump files generated during a system panic to
determine the cause of the panic.
Bus errors
Interrupts
Watchdog resets:
CPU
System
Bus Errors
A bus error occurs when a process receives a signal indicating that it
attempted to perform input/output (I/O) operations on a device that is
either restricted or does not exist.
Interrupts
An interrupt is a signal that the device driver of a hardware component
sends to the CPU. This signal requires a response to an event. An example
of such an event can be a completed I/O request or a hardware-error
condition. Interrupts are categorized as hardware or software interrupts.
Hardware interrupts are generated by I/O devices, and software
interrupts are established through a call to the kernel add_softintr()
function.
When the CPU receives an interrupt, it stops processing the instructions
of the current process, locates the interrupt in the trap table, and then
branches off to a special kernel code, known as the Interrupt Service
Routine, to manage the interrupt. After managing the interrupt
successfully, the process resumes its activity.
Each hardware component provides different services to the system in
different ways. Therefore, each Interrupt Service Routine is uniquely
tailored for the supporting device.
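The sequence described above (stop the current process, look up the interrupt in the trap table, run the matching Interrupt Service Routine, resume the process) can be pictured with a small conceptual model. This is plain Python written for illustration, not kernel code; the interrupt names and handler functions are invented.

```python
def make_dispatcher(trap_table):
    """Build a dispatcher that maps an interrupt to its service routine."""
    def dispatch(interrupt, current_process):
        handler = trap_table[interrupt]   # locate the interrupt in the trap table
        result = handler()                # branch to the Interrupt Service Routine
        return result, current_process    # the interrupted process then resumes
    return dispatch


# Hypothetical service routines, one per device, each tailored to its device.
def disk_isr():
    return "disk I/O request completed"


def network_isr():
    return "network packet received"


dispatch = make_dispatcher({"disk": disk_isr, "network": network_isr})
print(dispatch("disk", "process 123"))
```

The table lookup is what lets each device have its own routine while the dispatch path stays the same.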
Watchdog Resets
A watchdog reset occurs when you reset the CPU. In such a situation, the
system immediately drops to the programmable read-only memory
(PROM) monitor without creating a system crash dump. The absence of a
crash dump makes the watchdog reset condition difficult to analyze.
Hardware or software faults cause watchdog resets. The following are the
types of watchdog resets:
Note: You can use the ok .traps command to view the types of traps.
Preparation
Based on the size of the class, the instructor will divide students into
groups of two or three. In this exercise, the instructor plays the role of a
customer and provides answers to the questions asked by students.
Provide students with the following guidelines to perform the exercise:
Write down the steps for solving the fault to ensure that all the steps are completed.
Discuss the fault in a group to clarify the thinking process and validate assumptions.
Summarize and document the steps followed and the solution of the fault when you complete the
exercise.
Tasks
Consider the following fault descriptions:
When attempting a telnet connection from the Instructor1 system to the
Host1 system, the command fails. The following error message is
displayed:
Telnet: Unable to connect to remote host: Connection refused
However, when attempting to reverse the telnet connection from the
Host1 system to the Instructor1 system, the command is successful. In
addition, when attempting to open a File Transfer Protocol (FTP)
connection to the Instructor1 system, the connection is refused. The
following error message is displayed:
> ftp: connect: Connection refused
The same error is reported when attempting an FTP connection to the
Host1 system.
telnet
ftp
Use the methods described in this module to analyze and diagnose the
fault. Document the observations in the Fault Analysis and Diagnosis
worksheet. A template for the Fault Analysis and Diagnosis worksheet is
provided in the following pages.
Analysis Phase
Document the observations made during the analysis phase.
Problem Statement
Resources
Problem Description
Table 1-4 describes the problem.
Table 1-4 Problem Description
Error Messages
Symptoms and
Conditions
Recent Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Table 1-6 lists the corrective action taken.
Table 1-6 Corrective Action
Final Repair
Communication
Documentation
Exercise Summary
Manage the discussion based on the time allowed for this module, which was provided in the About This
Course module. If you do not have time to spend on discussion, highlight just the key concepts students
should have learned from the lab exercise.
Experiences
Ask students what their overall experiences with this exercise have been. Go over any trouble spots or
especially confusing areas at this time.
Interpretations
Ask students to interpret what they observed during any aspect of this exercise.
Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.
Exercise Solution
You use the Fault Analysis and Diagnosis worksheet to log the faults and
observations made during the Fault Analysis and Fault Diagnosis phases.
You can modify the worksheet to suit the requirements of your
organization.
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Two systems in the network are not accepting connections, as shown in
Table 1-7.
Table 1-7 Problem Statement
System Name
Fault
Host1
Instructor1
Resources
The following are the available resources:
Customer interviews
Problem Description
Table 1-8 describes the problem.
Table 1-8 Problem Description
Error Messages
Symptoms and Conditions
Recent Changes and/or History
Telnet: Unable to connect to remote host: Connection refused
Modifications were made to the /etc/services file.
An entry is missing for the telnet service in the /etc/services file on the system Host1.
> ftp: connect: Connection refused
Modifications were made to the /etc/services file.
An entry is missing for the FTP service in the /etc/services file on the systems Host1 and Instructor1.
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
The error is replicated.
Table 1-9 Test and Verification (Continued)
Probable Causes
An entry is missing for the FTP service in the /etc/services file on the systems Host1 and Instructor1.
Tests
Results
Verification
The error is replicated.
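The probable cause in this solution, a missing entry in the /etc/services file, can be checked mechanically. The following Python sketch reports which required service names have no entry in a services-style text. The sample text is a made-up excerpt, and the function name missing_services is an assumption for this example; on a real system you would read the contents of /etc/services instead.

```python
def missing_services(services_text, required):
    """Return the required service names that have no entry in the text."""
    defined = set()
    for line in services_text.splitlines():
        # Drop comments, then split "name port/protocol [aliases...]".
        entry = line.split("#", 1)[0].split()
        if len(entry) >= 2:
            defined.add(entry[0])       # official service name
            defined.update(entry[2:])   # aliases, if any
    return [name for name in required if name not in defined]


# Hypothetical excerpt with the telnet and ftp lines removed.
SAMPLE = """\
# Network services, Internet style
ftp-data    20/tcp
smtp        25/tcp      mail
time        37/tcp      timserver
"""

print(missing_services(SAMPLE, ["telnet", "ftp"]))
```

Running the same check on both Host1 and Instructor1 and comparing the results mirrors the comparative step of the Analysis phase.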
Corrective Action
Table 1-10 lists the corrective action taken.
Table 1-10 Corrective Action
Final Repair
Communication
None
None
Documentation
Module 2
Relevance
Present the following questions to stimulate students and get them thinking about the issues and topics
presented in this module. While they are not expected to know the answers to these questions, the answers
should be of interest to them and inspire them to learn the material presented in this module.
Relevance on page OH 2-3
OpenBoot firmware is the resident firmware in all Sun systems, which provides basic hardware testing and
initialization operations before the system boots.
You can perform the basic functionality test of hardware components in the OpenBoot environment. The OBP
diagnostic commands include the test, watch, and probe commands.
Additional Resources
Additional resources: The following references provide additional
information on the topics described in this module:
OBP Components on page OH 2-4
Boot PROM
Figure 2-1 OBP Components
Feature
Description
Programmable user interface
FCode interpreter
Facilities for dynamically constructing a device tree structure in nonpageable memory
Plug-in device drivers
Diagnostic commands
Boot PROM Revisions on page OH 2-5
Each Sun system supports a minimum revision of the boot PROM. There
are four generations of Sun boot PROMs, and 4.x is the latest revision.
Table 2-2 shows the PROM revision numbers and examples of the
corresponding Sun platforms.
Table 2-2 PROM Revision Numbers and Examples of the Corresponding
Sun Platforms
PROM Revision
Platform
SPARCstation 2, SPARCstation 5, SPARCstation 10, and SPARCstation 20 systems
Note: The flash update feature enables you to upgrade the revision of
software within the OBP without actually replacing the boot PROM.
FPROM Upgrades on page OH 2-6
             Firmware Revision
Ultra 1      3.11.1
Ultra 2      3.11.2
Ultra 450    3.7.107
             3.2.17
If students ask why Ultra 5, Ultra 10, and other Sun workstations are not listed in Table 2-3, inform them that
the Ultra 1, Ultra 2, Ultra 450, and all Enterprise Ultra servers are the only Ultra workstations that have a
minimum OBP revision requirement to support the 64-bit architecture. If students want further details, refer
them to collection document #21434 on the sunsolve.sun.com Web site.
Inform students that they can use the eeprom command at the OS level to save the configuration settings in
a file. The eeprom command is discussed later in this module.
Caution: You must power off the system before changing the state of the
FPROM jumper.
Note: To check the OBP firmware revision on your system, use either the
banner or .version command at the ok prompt or the prtconf -V
command at the shell prompt.
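Once you have the revision string, comparing it with the firmware revisions listed for the Ultra 1, Ultra 2, and Ultra 450 earlier in this module is a matter of comparing version numbers part by part. The following Python sketch shows one way to do this. It assumes the revision appears in the text as "OBP" followed by a dotted number, for example "OBP 3.11.1 1997/12/03 15:53"; that sample string is illustrative, and the exact output on your system may differ.

```python
import re

# Revisions copied from the firmware revision table in this module.
MINIMUM_REVISION = {
    "Ultra 1": (3, 11, 1),
    "Ultra 2": (3, 11, 2),
    "Ultra 450": (3, 7, 107),
}


def parse_obp_revision(text):
    """Extract the dotted OBP revision as a tuple of integers, or None if absent."""
    match = re.search(r"OBP\s+(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(platform, text):
    """Report whether the revision found in text is at least the listed revision."""
    revision = parse_obp_revision(text)
    return revision is not None and revision >= MINIMUM_REVISION[platform]


# Assumed sample of the text printed by prtconf -V.
sample = "OBP 3.11.1 1997/12/03 15:53"
print(parse_obp_revision(sample))
print(meets_minimum("Ultra 1", sample), meets_minimum("Ultra 2", sample))
```

Comparing tuples of integers avoids the mistake of comparing revisions as strings, where "3.7.107" would sort after "3.11.1".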
Introducing NVRAM
NVRAM is a pluggable chip on the main system board. The NVRAM chip
uses a battery-backed complementary metal-oxide semiconductor
(CMOS) chip to store customized system configuration variables, macros,
and device aliases. NVRAM also contains the Time of Day (TOD) chip,
which provides the date and time to the system. A single lithium battery
provides the backup for the NVRAM chip and the clock.
The NVRAM chip includes the following information:
Note: The host ID on the NVRAM chip forms the basis for a number of
software licenses. You must retain the chip if a new system board is
installed. If the chip fails, Sun replaces it with a chip containing the same
host ID and Ethernet address.
OBP variables provide you with the flexibility to modify the default
behavior of different aspects of the OBP firmware. You can use various
OBP commands to query and set OBP variables. Table 2-4 displays the
common OBP variables on a Sun4U system along with their
descriptions.
Table 2-4 OBP Variables on a Sun4U System
OBP Variable
Description
auto-boot?
diag-device
diag-switch?
sbus-probe-list
pcia-probe-list
pcib-probe-list
security-mode
tpe-link-test?
watchdog-reboot?
Inform students that several OBP variables exist in Sun systems. However, Table 2-4 explains only the
common OBP variables. For information on all the OBP variables, refer students to the OpenBoot 3.x Command
Reference Manual and the OpenBoot 4.x Command Reference Manual on the docs.sun.com Web site.
Note: If you run the printenv command on a new system, all the OBP
variables with their corresponding default values are displayed.
Ask students to discuss instances in which they run the printenv command.
Value         Default Value
true          true
screen        screen
boot          boot
true          true
false         false
net           net
disk net      disk net
false         false
true          true
ok
You can also use the printenv command to display a single OBP variable
and its value. For example, to display the value of the boot-device
variable, you type the following at the ok prompt:
ok printenv boot-device
boot-device =         disk
Table 2-5 shows the commands that you use to modify OBP variables and
the locations from where you run each command.
Table 2-5 Commands to Modify OBP Variables
Commands to Modify OBP Variables
setenv
set-default
set-defaults
ok prompt
Keyboard at power on
stop-n
eeprom
Note The Stop-N key sequence is not supported on Universal Serial Bus
(USB)-equipped systems, such as Sun Fire servers. The functionality
of the Stop-N key sequence is simulated by using the Safe NVRAM mode.
Provide the following information to students about the Safe NVRAM mode: During the boot process, if you
lose access to the system console due to a failed NVRAM configuration change, use the Safe NVRAM mode
to restore access to the console. The settings of the Safe NVRAM mode are temporary and ensure a
successful recovery boot.
Inform students that they can set the firmware security level at the ok prompt. For information on various
security levels and steps to set the security level, refer students to the documentation at the docs.sun.com
Web site.
Use
# eeprom
# eeprom parameter
Note When you use the eeprom command on variables with a question
mark, either enclose the variable in quotes or precede the question mark
with an escape character (\). You do this to prevent the shell from
interpreting the question mark.
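The effect of an unquoted question mark can be reproduced in any Bourne-style shell without touching NVRAM. The following minimal sketch uses echo in place of eeprom and a scratch directory that contains a hypothetical file named auto-bootX. Both are assumptions made for illustration only.

```shell
# Work in a scratch directory so that nothing else matches the pattern.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical file whose name happens to match the pattern auto-boot?
touch auto-bootX

# Unquoted: the shell expands ? to any single character before the
# command runs, so the command receives the file name instead.
unquoted=$(echo auto-boot?)

# Quoted or escaped: the command receives the literal variable name.
quoted=$(echo 'auto-boot?')
escaped=$(echo auto-boot\?)

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
echo "escaped:  $escaped"
```

On a Solaris system, the corresponding forms would be eeprom 'auto-boot?' and eeprom auto-boot\?, as the preceding note describes.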
Purpose
Output
probe-ide
probe-scsi
probe-scsi-all
Identifies SCSI
devices by their
target addresses
To test the hardware devices attached to the system, use the test
commands, as shown in Table 2-8.
Table 2-8 The test Commands
Command
Purpose
test-all
test floppy
test net
While running the test-all or the test floppy command to test the
removable media drives, such as a diskette or a CD-ROM drive, ensure
that the media is inserted in the drive.
Note Before running the test and watch commands, you must reset the
system once after dropping to the ok prompt. This helps to clear all
buffers and registers and ensures that the system does not hang.
To view sample outputs of the test commands, refer to Appendix B,
Additional Information, on page B-4.
To monitor the network traffic and clock function of the system, use the
watch commands, as shown in Table 2-9.
Table 2-9 The watch Commands
Command
Purpose
watch-net
watch-net-all
watch-clock
:
:
:
:
440.00MHz
110.00MHz
33MHz
33MHz
Preparation
To complete system diagnostics, perform a shutdown procedure to access
the OBP environment and run the OBP commands at the ok prompt.
Note Due to different PROM revisions, the syntax for the OBP
commands can vary slightly. For more information, refer to the OpenBoot
3.x Quick Reference Card or the OpenBoot 4.x Command Reference Manual.
Tasks
Perform the following tasks:
1. Access the ok prompt, and set the appropriate variable so that the
system does not boot automatically.
2.
3. Modify the boot-device variable so that the system boots from the
disk disk0.
4.
5.
Exercise Solutions
The following are solutions for the exercise steps:
1. Access the ok prompt, and set the appropriate variable so that the
system does not boot automatically.
ok printenv auto-boot?
ok setenv auto-boot? false
2.
3. Modify the boot-device variable so that the system boots from the
disk disk0.
ok setenv boot-device disk0
4.
5.
Explain to students that the set of questions in this exercise facilitates the revision of the content in the
module. Instruct students to perform tasks and attempt questions in the sequence that they appear. Inform
them that they can refer to the lecture notes to attempt the exercise.
Preparation
Perform a shutdown procedure to access the OBP environment, and run
the OBP commands at the ok prompt to perform system diagnostics.
If the systems in the lab do not have a SCSI disk, arrange at least one
system with a SCSI drive. You can connect this system to the overhead
projector to enable students to observe the output of the probe-scsi and
probe-scsi-all diagnostics commands.
Tasks
Perform the following tasks:
1.
2.
b.
Set the appropriate variable so that the system does not boot
automatically.
c.
Installed memory
b.
c.
MAC address
d.
Host ID
3.
4.
Use the probe commands to display the list of IDE and SCSI devices
that are attached to your system.
5.
6.
7.
Use the watch command to check the clock function of the system.
8.
9.
Exercise Summary
Manage the discussion based on the time allowed for this module, which was provided in the About This
Course module. If you do not have time to spend on discussion, highlight just the key concepts students
should have learned from the lab exercise.
Experiences
Ask students what their overall experiences with this exercise have been. Go over any trouble spots or
especially confusing areas at this time.
Interpretations
Ask students to interpret what they observed during any aspect of this exercise.
Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.
Exercise Solutions
The following are the solutions to the exercise tasks:
1.
b.
Set the appropriate variable so that the system does not boot
automatically.
ok setenv auto-boot? false
c.
2.
Installed memory
b.
c.
MAC address
d.
Host ID
ok banner
3.
4.
Use the probe commands to display the list of IDE devices and SCSI
devices that are attached to your system.
ok probe-ide
ok probe-scsi
ok probe-scsi-all
5.
6.
Run the test command to test the disk drive of your system.
ok test-all
7.
Use the watch command to check the clock function of the system
and monitor the system network traffic.
ok watch-clock
8.
9.
Module 3
Copyright 2002 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, Revision E
Relevance
Present the following questions to stimulate the students and get them to think about the issues and topics
presented in this module. While they are not expected to know the answers to these questions, the answers
should be of interest to them and inspire them to learn the material presented in this module.
Relevance on page OH 3-3
Did the POST messages displayed on the screen during the boot
process help you to troubleshoot problems with your systems?
Allow students to share their work experiences, and ask them to list the hardware problems that they
encountered while working on their systems.
Additional Resources
Additional resources The following references can provide additional
information on the topics discussed in this module:
Note OpenBoot Command Reference and Quick Reference Card are now part
of the software supplement for the Solaris 9 OE CD.
POST is a binary program, written for the SPARC processor, that resides
within OBP and executes automatically when the system is powered on.
You use POST to initialize and test the hardware that is part of the system.
POST performs a series of diagnostic tests on the hardware components of
the system board to verify that all the components are functioning
properly. POST also helps to determine which components failed and
must be replaced. The error messages displayed during the POST
sequence help administrators and support personnel to determine if
hardware problems exist on the system.
CPU modules
Memory, such as the Init Memory and Block Memory Addr Tests
parameters
Interrupts
Inform students that Ecache RAM Addr, Ecache Tag Addr, Ecache RAM, and Ecache Tag are part of
Ecache Tests. In addition, Dcache RAM, Dcache Tag, Icache RAM, Icache Tag, Icache Next, and
Icache Predecode are part of Basic Cache Tests.
Register tests
POST does not perform extensive tests on any components of the main
logic board, such as SBus or PCI cards and associated I/O devices. These
tests are performed using OBP.
Description
Normal
diagnostic
Extended
diagnostic
If required, describe a tty-type terminal. You can provide the following definition:
A tty-type terminal is the serial port for the system console. To define the communication parameters on the
serial port, you set the configuration variables for the port.
Inform students that the diag-device parameter is described later in the module.
Different modes of the system key on page OH 3-5
You use the system key switch to control the power on mode of a system.
For example, on a Sun Enterprise 250 server, the system key has four
key-switch positions. Table 3-2 describes the function of each key-switch
position.
Table 3-2 System Key-Switch Settings and Functions
Name of
Switch Position
Description
Power-On
Diagnostics
Locked
Standby
Inform students that locked mode prevents users from suspending system operations and accessing the ok
prompt. This prevents you from modifying the OBP parameters that are stored in the NVRAM chip from the
console unless you are logged in as a superuser.
Inform students that when the key switch is in the Standby position, the keyboard power switch is disabled.
Inform students that on a Sun Fire server, the Standby mode is known as the off mode. For more information
on Sun Fire servers, refer students to the Sun Fire 280R Server Owner's Guide, available at
http://www.sun.com/products-n-solutions/hardware/docs/pdf/806-4806-10.pdf.
Figure 3-1
tip An OS-level command run from a second system that you use
to establish a serial connection to the first system
DTE devices use pin 2 to transmit data and pin 3 to receive data while
DCE devices use pin 2 to receive data and pin 3 to transmit data. This pin
setup works well for terminal-to-modem communication (DTE to DCE).
However, when you set up a tip connection between two terminals, the
pins of a pass-through modem cable do not allow a terminal-to-terminal
communication (DTE to DTE) because both devices are transmitting on
pin 2 and receiving on pin 3.
Invoke the tip command manually, and provide the baud rate as an
option. In addition, provide the serial port to configure as an
argument. For example:
# tip -9600 /dev/term/a
The tip command establishes a serial connection at 9600 baud
through the /dev/term/a port.
Note Workstations, such as the Ultra 5 and Ultra 10, have two serial
ports: serial port A, a 25-pin female port, and serial port B, a 9-pin male
port. Most null modem cables are 25-pin male to 25-pin female cables.
Therefore, to use the hardwire entry as an argument to the tip command,
you must either edit the hardwire entry in the /etc/remote file or use a
9-pin female to 25-pin male crossover cable.
To manage the tip session, you can use the following tilde commands:
2.
Attach one end of a null modem cable to serial port A of the test
system and the other end to the remote Sun system.
3.
2.
When the ok prompt is displayed, ensure that you set the following
OBP variables that control extended POST:
ok setenv diag-switch? true
ok setenv diag-level max
3.
4.
5.
Note When POST completes execution on the test system, you can
terminate the tip connection from the remote system by using the ~.
command.
The sample POST output in this module was generated on Ultra 10
workstations. Refer to the Architecture of the Ultra 5 and Ultra 10
Workstations section on page B-7 of Appendix B for the graphic displaying
the architecture of the Ultra 5 and Ultra 10 workstations.
Inform students that this is a partial output of the POST sequence.
Diagnostic information
Description
-v
-l
When you execute the prtdiag command, the following exit values are
returned:
        Interlv.  Socket  Size
Bank    Group     Name    (MB)   Status
----    -----     ------  ----   ------
0       0         1901    256    OK
0       0         1902    256    OK
0       0         1903    256    OK
0       0         1904    256    OK
1       0         1801    256    OK
1       0         1802    256    OK
1       0         1803    256    OK
1       0         1804    256    OK
The preceding output shows that the system has two banks of memory,
fully populated with eight memory modules containing 256 Mbytes of
memory each.
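The total follows from simple arithmetic: 8 modules x 256 Mbytes = 2048 Mbytes. As a hedged sketch, the same check can be scripted with awk. The rows in the here-document below are copied by hand from the sample memory section and stand in for live prtdiag output. The function name memory_rows is an assumption for illustration.

```shell
# Rows copied from the sample memory section:
# bank, interleave group, socket name, size (MB), status
memory_rows() {
cat <<'EOF'
0 0 1901 256 OK
0 0 1902 256 OK
0 0 1903 256 OK
0 0 1904 256 OK
1 0 1801 256 OK
1 0 1802 256 OK
1 0 1803 256 OK
1 0 1804 256 OK
EOF
}

# Count the modules whose status is OK and sum their sizes.
summary=$(memory_rows | awk '$5 == "OK" { n++; total += $4 }
    END { printf "%d modules, %d MB", n, total }')
echo "$summary"
```

A module with a status other than OK would be left out of both the count and the total, which makes a failed module easy to spot.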
The fourth section displays the names and model numbers of all the
peripheral cards installed on the system.
2.
3.
4.
5.
Cpu0-OK=P
FailCode=0
Cpu1=P
FHC=P
SRAM=P
FPROM=P
LabCon=Not
Bank1=0
DTag0=P
DTag1=P
JTAG=P
Cpu1-OK=P
Bank1=Not
DC=ff
Cpu0-OK=P
FailCode=0
Cpu1=P
FHC=P
SRAM=P
FPROM=P
LabCon=Not
Bank1=0
DTag0=P
DTag1=P
JTAG=P
Bank1=Not
DC=ff
Sysio0=P
Sysio1=P
FEPS=P
FEPSFC=0
Sbus0=P
Sbus1=P
Sbus2=P
AC=P
FHC=P
SRAM=P
FPROM=P
TODC=P
JTAG=P
CntrPl=P
DC=ff
Slot 16 - Status=Fail, Type: Clock
Clock=P
Serial=P
AC=P
ACFan=P
V5-P=P
V12-P=P
V5-PC=P
RckFan=***
3.3V=P
P
***
Not
Cpu1-OK=P
Cpu0=P
FailCode=0
AC=P
Ovtemp=Not
Bank0=0
CntrPl=P
Bank0=P
Slot
LabCon=Not Ovtemp=Not
Cpu0=P
FailCode=0
AC=P
Ovtemp=Not
Bank0=0
CntrPl=P
Bank0=P
Slot
SOC=P
SOC=P
LabCon=Not Ovtemp=Not
KbdMse=P
KeyFan=P
V5-Aux=P
PPS-DC=P
PSFail=0
V5P-PC=P
5.0V=P
Triger=P
DCReg0=P
DCReg1=P
Ovtemp=Not
TODC=P
V12-PC=P
V3-PC=P
Coolng=P
AC-REV=P
= Present or Passed
= Failed Component
= Not present
Enabling and Monitoring POST Diagnostics
Preparation
The instructor divides the class into small groups. Two systems and a null
modem cable are required for each group. The students can review the
OBP variables diag-switch?, ttya-mode, and diag-device, which are
set or referenced within the remote diagnostic procedure.
Use a monitor or an ASCII terminal for remote diagnostic sessions. You
can perform this lab procedure on both the Sun4m and Sun4U
architectures.
The steps to insert a fault in the system are provided in the classroom setup file located at the
education.central Web site.
Note Before you begin, make sure that the functional system has the
Solaris OE running in multiuser mode and a remote terminal window is
attached.
POST tests just enough of the electronic circuitry to ensure that you can perform the boot command
and execute the instruction.
If POST fails, students take appropriate action, according to their company policy. The POST examples
in the Student Guide provide visual examples for students who have never viewed POST diagnostics.
Tasks
Use the following procedure for setting up a remote diagnostic session:
1.
2.
Connect the other end of the cable to port A of the faulty system.
3.
4.
5.
Turn off the faulty system to prevent blowing the keyboard fuse.
6.
Disconnect the keyboard from the rear of the faulty system. Send the
output to serial port A, ttya.
7.
Note The hardwire argument for ttya-mode specifies that the tip
command requires 9600 baud, 8 data bits, and 1 stop bit at port A on the
CPU board. These parameters are set for port A when a system is
powered on without using a keyboard.
8.
9.
11. Power on the faulty system. You can observe the power-on
diagnostic messages in the terminal window of the second system. If
not, the following might be the reasons:
12. When the POST diagnostic tests are complete, record all the errors.
Exercise Summary
Manage the discussion here based on the time allowed for this module, which was given in the About This
Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts
students should have learned from the lab exercise.
Experiences
Ask students what their overall experiences with this exercise have been. You may want to go over any
trouble spots or especially confusing areas at this time.
Interpretations
Ask students to interpret what they observed during any aspects of this exercise.
Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.
Exercise Solutions
The following are the solutions for the tasks listed in this exercise:
1.
2.
Connect the other end of the cable to port A of the faulty system.
3.
5.
Turn off the faulty system to prevent blowing the keyboard fuse.
6.
Disconnect the keyboard from the rear of the faulty system. Send the
output to the serial port A, ttya. Remember to turn off the power
when you reconnect the keyboard.
7.
8.
9.
10. In the terminal window, type the tip hardwire command.
Type the following command:
# tip hardwire
The workstation should respond with a message, connected. If it
does not, check the following:
The selected port is already active. (Invoke the SMC and ensure
that the port is disabled.)
11. Power on the faulty system. You can observe the power-on
diagnostic messages in the terminal window of the second system. If
not, the following might be the reasons:
12. When the POST diagnostic tests are complete, record all the errors.
15. Return the faulty system to a running state. Remember to power off
the system when you reconnect the keyboard.
To return the faulty system to a running state:
a.
b.
c.
Module 4
Relevance
Present the following questions to stimulate the students and get them to think about the issues and topics
presented in this module. While they are not expected to know the answers to these questions, the answers
should be of interest to them and inspire them to learn the material presented in this module.
Relevance on page OH 4-3
A device tree organizes the devices that are attached to the system. Each node in the device tree represents
a device or firmware service on the system.
List the commands that you use to navigate and examine the OBP
device tree.
Ask students which commands they use to navigate the OBP device tree.
Allow students to share their work experiences and note the inputs on a white board or a flip chart.
Additional Resources
Additional resources The following references provide additional
information on the topics discussed in this module:
Sun hardware uses the concept of a device tree to organize the devices
that are attached to the system. The OpenBoot firmware builds the device
tree from the information generated during POST and loads the device
tree into memory.
The kernel refers to the device tree during the boot process to determine
the hardware configuration of the system. For example, to identify the
card and slot configuration on your system, map the driver names, unit
addresses, and device arguments to the physical devices and their
locations on the system. You can examine the device path on a system by
using the following:
Introducing the OBP Device Tree on page OH 4-4
Figure 4-1
device-name@unit-address:device-arguments
Table 4-1 describes each parameter in a device path name.
Table 4-1 Parameters of a Device Path Name
Path Name
Parameter
Description
device-name
unit-address
device-arguments
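To see how the three parameters fit together, the following sketch splits one node of a device path by using POSIX shell parameter expansion. The function name split_node and the sample nodes sd@0,0:a and pci@1f,0 are assumptions chosen for illustration. They are not taken from a particular system, and the function is not an OBP or Solaris command.

```shell
# split_node NODE
# Splits device-name@unit-address:device-arguments into its three parts.
# Hypothetical helper for illustration only.
split_node() {
    node=$1
    # Everything after the first colon is the device arguments.
    case $node in
        *:*) args=${node#*:}; node=${node%%:*} ;;
        *)   args="" ;;
    esac
    # Everything after the @ sign is the unit address.
    case $node in
        *@*) addr=${node#*@}; name=${node%%@*} ;;
        *)   addr=""; name=$node ;;
    esac
    echo "device-name=$name unit-address=$addr device-arguments=$args"
}

split_node 'sd@0,0:a'
split_node 'pci@1f,0'
```

A node that omits the unit address or the device arguments yields an empty value for that part.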
Figure 4-2
For example, the Sun Ultra 250 UPA/PCI workstation has four PCI
plug-in slots that are distributed across a single PCI bus. Table 4-2
displays the two NVRAM configuration variables that control the probing
order of slots for the PCI buses attached to an Ultra 250 UPA/PCI
workstation.
Table 4-2 NVRAM Configuration Variables for PCI Probing on an Ultra
250 UPA/PCI Workstation
Variable
Default Value
Description
pci0-probe-list
3,2,4,5
pci-slot-skip-list
none
Default Value
Description
pcia-probe-list
1,2,3,4
pcib-probe-list
1,2,3
Note You can also specify dashes while defining the probe order of the
PCI slots.
The ls command
Note You can use the cd command to select a node as the current node
in a device tree.
fe000000
82011010 00000000 e1000000
82011018 00000000 e2000000
00000000 00001000
aty,fcode
1.60
aty,card#
109-41900-00
aty,rom#
113-41901-104
model
ATY,GT-C
name
SUNW,m64B
............<output truncated>
remove
install
draw-rectangle
set-colors
Note The custom device aliases are not saved after a system reset or
power cycle. To create permanent aliases, you must manually store the
alias names in the nvramrc variable of the NVRAM chip or use the
nvalias and nvunalias commands.
You use the following commands to examine, create, and change device
aliases:
show-tapes Displays a list of device paths for the installed SCSI tape controllers
show-displays Displays a list of device paths for the installed display devices
To make the mydisk alias the default boot device, use the following
command:
ok setenv boot-device mydisk
ok boot
Note You must use the reset-all command to save the changes made
by the nvunalias command.
Boot Sequence
You use the boot command at the ok prompt to boot the Solaris OE.
When you power on the system, the system invokes the POST diagnostic
tests. POST tests the hardware and memory of the system. If no errors are
detected, the system begins the automatic boot process.
Phases of the Boot Sequence on page OH 4-6
Figure 4-3
Figure 4-4 shows the events that occur during the boot PROM phase.
Figure 4-4
2.
The boot command determines the device from which the system
boots. In this step, the boot command reads the value specified in
the boot-device variable.
3.
The boot command loads the bootblk program from its location on
the boot device into memory.
A copy of the bootblk program is available in the
/usr/platform/`uname -i`/lib/fs/ufs directory.
The Boot Programs Phase on page OH 4-8
Figure 4-5 shows the events that occur during the boot programs phase.
Figure 4-5
This section discusses the concept of a disk boot. If required, explain to students the concept of a network
boot. During a network boot, the bootblk program is not read from the disk; instead, the inetboot program is
read from the network by using tftp. In addition, no ufsboot program is required.
2.
The Kernel Initialization Phase on page OH 4-9
Figure 4-6 shows the events that occur during the kernel initialization
phase.
Figure 4-6
Note Before you edit the /etc/system file, make a copy. If you specify
incorrect values in the /etc/system file, the system might not boot.
2.
Depending on the experience of students, explain the following types of module subdirectories in the
/kernel and /usr/kernel directories:
sys Contains system calls, which are defined interfaces used by applications
Note The kernel uses the ufsboot program to load kernel modules.
When the kernel loads enough modules to mount the root file system,
the kernel unmaps the ufsboot program and proceeds to the next step.
3.
The init phase is the final phase in the boot process. Figure 4-7 shows the
events that occur during the init phase.
Figure 4-7
2.
The init daemon scans the inittab file for the sysinit and
initdefault entries, executing the sysinit entries as found and
recording the initdefault value.
3.
5.
The final entries in the default inittab file specify the sac and
console login session.
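The scan for sysinit and initdefault entries described in the preceding steps can be approximated with awk. This is a minimal sketch, not the way the init daemon is implemented. The here-document is an abridged, hypothetical sample in the inittab format (id:rstate:action:process), and the function name sample_inittab is an assumption.

```shell
# Abridged, hypothetical sample in the inittab format
# id:rstate:action:process
sample_inittab() {
cat <<'EOF'
ap::sysinit:/sbin/autopush -f /etc/iu.ap
fs::sysinit:/sbin/rcS sysinit
is:3:initdefault:
s2:23:wait:/sbin/rc2
s3:3:wait:/sbin/rc3
EOF
}

# Print the process field of every sysinit entry, in file order.
# Assumes that the process field itself contains no colon.
sysinit_cmds=$(sample_inittab | awk -F: '$3 == "sysinit" { print $4 }')

# Print the run level recorded by the initdefault entry.
default_level=$(sample_inittab | awk -F: '$3 == "initdefault" { print $2 }')

echo "sysinit commands:"
echo "$sysinit_cmds"
echo "default run level: $default_level"
```

To inspect a real system, the here-document could be replaced with the contents of the inittab file on that system.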
1.
2.
3.
4.
5.
6.
7.
8.
Note The preceding commands cause the system to boot from the disk
defined as disk in the list of device aliases.
Consider a scenario in which the system boots from the wrong disk. For
example, you have more than one disk in your system. You want the
system to boot from the disk disk2. However, the system boots from the
disk disk1.
The possible cause for the preceding scenario is that the boot-device
parameter is not set to the correct disk.
To set the boot-device parameter to the disk disk2, interrupt the
boot process with Stop-A, and type the following command at the
ok prompt:
ok setenv boot-device disk2
ok boot
The system will now boot from the disk disk2.
The Solaris OE has eight run levels that determine various modes of
system operation. These run levels are described in Table 4-4.
Table 4-4 Run Levels of the Solaris OE
Run Level
Function
s or S
Function
Mar 21 15:25
Figure 4-8
Executing the scripts in debug mode helps you identify the command
that is causing the error. You can then check the system
configuration that relates to the command for possible errors.
To execute the script in debug mode, add the line set -xv to the script:
#!/sbin/sh
set -xv
This prints each line of the executed script. Next, by viewing the last lines
executed before the error, you can track the errors that occur during the
boot process.
Note Some scripts, specifically those with the .sh extension, execute in
the context of the calling shell. Therefore, the set -xv command affects
the calling shell and any subsequent processing by that shell. If
this occurs, add the set +xv command to the end of the script to
rectify the problem.
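The behavior of the set -xv and set +xv commands can be tried safely outside the boot process. The following sketch writes a small, hypothetical stand-in for a run control script to a scratch file and runs it with sh. The echo lines represent the real work of a script, and the sketch invokes sh directly rather than relying on the #!/sbin/sh path.

```shell
# Hypothetical stand-in for a run control script; the echo lines
# represent the real work that the script would do.
demo=$(mktemp)
cat > "$demo" <<'EOF'
set -xv
echo "mounting file systems"
set +xv
echo "tracing is off"
EOF

# Trace output goes to standard error, so capture it separately
# from the normal output of the script.
trace=$(sh "$demo" 2>&1 >/dev/null)
output=$(sh "$demo" 2>/dev/null)
rm -f "$demo"

echo "--- trace (standard error) ---"
echo "$trace"
echo "--- normal output ---"
echo "$output"
```

With the -x option, the shell prints each command after expansion, prefixed with a plus sign. With the -v option, the shell echoes the input lines as it reads them. The exact trace text can differ slightly between shells.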
2.
3.
4.
s Boots the system to single-user mode and prompts for the root
password.
Note Interactive booting enables you to test the changes made during
the booting process and recover from system problems quickly. This
procedure assumes that the system is already shut down.
Enter filename [kernel/sparcv9/unix]:
Exercise: Introducing the OBP Device Tree and the Boot Sequence
Preparation
Perform a shutdown procedure to access the OBP environment and run
OBP commands at the ok prompt to perform system diagnostics. Ensure
that the systems are already at the ok prompt.
To complete the tasks for restoring the corrupt boot program files, ensure that the lab is set up with a system,
Host1, which has a corrupt file system.
Note Due to different PROM levels and architectures, the syntax for
OBP commands can vary slightly. For more information, refer to OpenBoot
3.x Quick Reference Card and OpenBoot 4.x Quick Reference Card.
Tasks
To navigate the OBP device tree, complete the following steps:
1.
2.
3.
4.
5.
Use the appropriate command to display the full device path name
for the disk alias. Note the device path name.
2.
3.
4.
5.
6.
7.
8.
9.
Reset your system, and then check whether the mydisk alias still
exists in the list of device aliases.
10. Set the OBP variables to their default values, and boot the system
from the default boot device.
11. Verify that the system boots.
To restore the corrupt boot program files from the CD-ROM device,
complete the following steps:
1.
2.
3.
4.
1.
2.
3.
4.
5.
6.
7.
8.
Exercise Summary
Manage the discussion here based on the time allowed for this module, which was given in the About This
Course module. If you find you do not have time to spend on discussion, then just highlight the key concepts
students should have learned from the lab exercise.
Experiences
Ask students what their overall experiences with this exercise have been. You may want to go over any
trouble spots or especially confusing areas at this time.
Interpretations
Ask students to interpret what they observed during any aspects of this exercise.
Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.
Exercise Solutions
To navigate the OBP device tree, complete the following steps:
1.
2.
3.
4.
5.
Use the appropriate command to display the full device path name
for the disk alias. Note the device path name.
ok devalias disk
2.
Note Select the device path that relates to the disk from step 1.
3.
4.
5.
6.
7.
8.
9.
Reset your system and then check if the mydisk alias still exists in
the list of device aliases.
ok devalias mydisk
10. Set the OBP variables to their default values and boot the system
from the default boot device.
ok set-defaults
ok boot
11. Verify that the system boots.
To restore the corrupt boot program files from the CD-ROM, complete the
following steps:
1.
2.
3.
4.
To boot the system in interactive mode, complete the following steps:
1.
2.
3.
4.
5.
6.
7.
8.
Module 5
Relevance
Present the following questions to stimulate the students and get them to think about the issues and topics
presented in this module. While they are not expected to know the answers to these questions, the answers
should be of interest to them and inspire them to learn the material presented in this module.
Relevance on page OH 5-3
Which commands and tools of the Solaris OE do you find the most
useful?
Allow students to share their work experiences, and list the commands and tools that they find useful.
Can you describe a recent system problem and the commands and
tools that you used to solve the problem?
Ask students to share their work experiences and describe a recent system problem and the commands and
tools that they used to solve the problem.
Additional Resources
Additional resources The following references can provide additional
information on the topics discussed in this module:
Note In the Solaris 9 OE, only the root user can run the device
management commands.
Note You need not run the devfsadm command interactively because
the devfsadm daemon automatically detects the changes in device
configuration.
Description
-C
-c device_class
-i driver_name
-n
-s
-t table_file
-r root_dir
-v
Description
drvconfig
disks
tapes
devlinks
ports
Note In the Solaris 9 OE, only the root user can run the disk and file
system management commands.
Note You cannot use the format command on diskette drives, CD-ROM
drives, or tape drives.
Note The control structures of the UFS file system that are repaired by
the fsck command include the superblock, the boot block, the inode
block, and the inode count.
When you use the -m option, the fsck command verifies whether the file
system is ready for mounting. If the file system is ready for mounting, the
fsck command displays the following message:
ufs fsck: sanity check: /dev/rdsk/c0t3d0s1 okay
If time permits, ask students to run the fsck command with the -m option and check whether the file system
on their respective systems is ready for mounting.
Consider a scenario in which you reboot a Sun system. When you start
the system, it reports the following error:
THE FOLLOWING FILE SYSTEM(S) HAD AN UNEXPECTED
INCONSISTENCY: /dev/rdsk/c0t0d0s7 (/export/home)
WARNING - Unable to repair one or more filesystems.
Run fsck manually (fsck filesystem...).
Exit the shell when done to continue the boot process.
Type control-d to proceed with normal startup,
(or give root password for system maintenance):
Caution You can run the fsck -y command to check and repair all the
file systems on the system without user intervention. However, this action
might cause serious damage to the file system and should be used only as
a last resort. The best method is to run the fsck command without the -y
option.
In the preceding scenario, the corrupt file system is located in the
/dev/rdsk/c0t0d0s7 (/export/home) disk slice. To repair the file system,
you use the following command:
# fsck -F ufs /dev/rdsk/c0t0d0s7
Caution Before running the fsck command, ensure that the system is in
single-user mode. This is essential for the fsck command to repair
damaged file systems.
r/s
3.1
4.0
2.0
4.3
3.3
w/s
2.1
3.1
1.8
4.1
3.0
extended
kr/s kw/s
8.0 4.2
9.3 6.7
6.0 6.0
8.0 5.0
9.0 6.0
device statistics
wait actv svc_t
3.0 4.0
65.8
3.2 3.0
22.3
2.1 3.0
33.0
4.1 2.0
46.0
3.7 4.0
39.7
%w
4
5
8
8
7
tty
%b tin tout
9
2
2
8
7
8
6
Field
Representation
r/s
w/s
kr/s
kw/s
wait
actv
svc_t
%w
%b
tin
tout
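As a hedged illustration of how these fields might be used, the following sketch filters device rows by the svc_t field, which reports the average service time in milliseconds. The rows are hypothetical: the device names sd0 through sd2 are invented, and the data is laid out in the column order of the iostat -x report (device, r/s, w/s, kr/s, kw/s, wait, actv, svc_t, %w, %b). The limit of 30 is an arbitrary value for the sketch, not a recommended threshold.

```shell
# Hypothetical sample in the column order of iostat -x:
# device r/s w/s kr/s kw/s wait actv svc_t %w %b
sample_iostat() {
cat <<'EOF'
sd0 3.1 2.1 8.0 4.2 3.0 4.0 65.8 4 9
sd1 4.0 3.1 9.3 6.7 3.2 3.0 22.3 5 8
sd2 2.0 1.8 6.0 6.0 2.1 3.0 33.0 8 7
EOF
}

# Report devices whose average service time (svc_t, column 8)
# exceeds the limit. The limit is an arbitrary value for this sketch.
slow=$(sample_iostat | awk -v limit=30 '$8 + 0 > limit { print $1, $8 }')
echo "$slow"
```

The same filter could be pointed at another column, such as %b in column 10, to look for devices that are busy for a large share of the interval.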
Note In the Solaris 9 OE, only the root user can run the management
commands of a software package.
The contents or attributes of both the objects that are described in the
specified pkgmap file
The first set of the pkginfo syntax displays information about the
software packages that are installed on the system.
Note You cannot use the -r, -n, and -a options when transferring a
package to a spool directory.
Note The pkgrm command first searches the current working directory
for the admin file. If the specified admin file is not in the current working
directory, the pkgrm command searches the /var/sadm/install/admin
directory for the admin file.
The following is the syntax of the pkgrm command:
pkgrm [-nv] [-a admin] [ [-A | -M] -R root_path] [-V fs_file]
[pkginst... | -Y category[,category...]]
pkgrm -s spool [pkginst... | -Y category[,category...]]
By default, the pkgrm command runs in interactive mode. You use the -n
option to change to noninteractive mode.
Consider a scenario in which one or more files within the SUNWjunk
package are corrupted. You use the following pkgrm command to remove
all files and directories associated with the SUNWjunk package:
# pkgrm SUNWjunk
2.
3.
Caution When reading two or more files, if you redirect the output of
the cat command to one of the files, original data is lost.
For example, you run the sum command on a patch that you want to
install on your system. Then, you compare the checksum value generated
against the checksum value reported in the SunSolve Online service and
verify whether the patch was successfully downloaded.
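A minimal sketch of this comparison follows. The verify_sum function and its arguments are hypothetical, and the expected value must come from a checksum listing that was produced with the same algorithm as the local sum command.

```shell
# Hypothetical helper: compare the checksum that sum reports for a file
# with the value published for it. Returns 0 on a match, 1 otherwise.
verify_sum() {
    file=$1
    expected=$2
    # The first field of the sum output is the checksum itself.
    actual=$(sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "$file: checksum matches"
        return 0
    fi
    echo "$file: checksum mismatch (expected $expected, got $actual)"
    return 1
}
```

You call the function with the downloaded file and the published checksum as arguments. A mismatch usually means that the download is incomplete or corrupted and should be repeated.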
The ps command
Depending on the path name that you use, the command options and the
output of the command differ.
TIME  CMD
0:15  sched
0:00  /etc/init
0:00  pageout
1:41  fsflush
0:00  /usr/lib/saf/sac -t 300
0:00  /usr/lib/utmpd
0:00  /usr/sbin/syslogd
0:00
Table 5-4 lists the fields of the preceding output and their descriptions.
Table 5-4 Fields and Descriptions of the Output of the ps Command
Field
Description
UID
PID
PPID
STIME
TTY
TIME
CMD
Check process status You use the status (S) field of the output to
check the status of different processes.
Inform students that they should refer to the mpstat command for information on the statistics of each
processor. The mpstat command displays information about CPU usage and the frequency of occurrence for
events, such as interrupts, page faults, and locking. Ask students to execute the mpstat command and view
the output displayed on the screen.
If you do not specify any options with the vmstat command, the
command displays a one-line summary of the virtual memory activity
that occurred since system startup.
The following is a sample output of the vmstat command:
# vmstat
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr f0 s1 s2 s3   in   sy   cs us sy id
 0 0 0  15020  4304   9  ...                              86 1173   46 24 30 46
Description
kthr
memory
page
disk
faults
cpu
page
disk
pi po fr de sr s0 s1 s4 --
faults
in
sy
cpu
cs us
32 11 11
0 25
24 27 27
2 35
588 1900
946
51
0 21
0 10
423 1496
901
In the preceding output, observe the values in the r field of the kthr
section and the id field of the cpu section. These values indicate a large
number of processes in the run queue and a low value for CPU idle time,
respectively. This information suggests that the system is overloaded, with
multiple user processes waiting to run at the same time. Additional
investigation is required to determine whether this is actually the case.
The sr field, which represents the scan rate of the system, also helps you
to identify any problems with memory. A value for this field that stays
well above zero indicates that the system is short of memory.
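The checks described above can be sketched as a small filter for vmstat output. The check_vmstat function and its default thresholds (a run queue above 4 and idle time below 10 percent) are assumptions chosen for illustration, not values from this course. The filter uses the -v option of awk; on a Solaris system, that typically means nawk or /usr/xpg4/bin/awk.

```shell
# Hypothetical helper: read vmstat output on standard input and print a
# warning for each sample whose run queue, idle time, or scan rate looks
# suspicious. The thresholds are illustrative assumptions only.
check_vmstat() {
    awk -v max_r="${1:-4}" -v min_id="${2:-10}" '
        # The column-name header starts with "r b"; remember where sr is.
        $1 == "r" && $2 == "b" {
            for (i = 1; i <= NF; i++) if ($i == "sr") sr_col = i
            next
        }
        # Data rows start with a number; id is always the last column.
        $1 ~ /^[0-9]+$/ && sr_col {
            if ($1 > max_r)    print "run queue high: r=" $1
            if ($NF < min_id)  print "CPU idle low: id=" $NF
            if ($sr_col > 0)   print "page scanner active: sr=" $sr_col
        }
    '
}
```

A typical invocation is vmstat 5 5 | check_vmstat. The first data line that vmstat prints is an average since system startup, so judge the system by the later samples.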
Note Use the -s option when you use the psrinfo command in shell
scripts.
Refer to the online man pages for information on the operand supported by the psrinfo command.
intr ithr
435 323
759 548
730 531
535 434
533 432
usr
1
57
59
79
83
sys wt idl
1
1 97
30
1 11
32
2
7
21
0
0
17
0
0
In the preceding output, observe that the usr and sys fields have large
values. In addition, the idl field has a low value. This information
indicates that the CPUs are busy and spend most of their time executing
user and system code.
Table 5-6 lists the fields of the preceding output and their descriptions.
Table 5-6 Fields and Descriptions of the mpstat Command
Field
Description
CPU
minf
mjf
xcal
intr
ithr
csw
icsw
migr
smtx
srw
Description
syscl
usr
sys
wt
idl
Field
Description
Id
Loadaddr
Size
Info
Module-specific information.
Rev
Module Name
Note While you can view information about the status or configuration
of the network with user rights, you must have superuser access to make
changes to the parameters of network management commands.
Refer to the online man pages for information on the options supported by the ping command.
Address family
IP address
Netmask
Broadcast address
Refer students to the online man pages for information on the options supported by the ifconfig command.
Note The boot scripts that execute the ifconfig command reside in the
/etc/rcS.d and /etc/rc2.d directories.
Note Only a root user can view the Ethernet address of a network
interface and has the permission to modify the configuration of a network
interface.
You can also use the ifconfig command to configure a network
interface. To configure an interface, you specify the interface name using
the plumb keyword.
Note The plumb keyword opens the devices associated with the physical
interface name and sets up the streams that IP requires to use the
device.
For example, to configure the hme0 interface, you type the following
command:
# ifconfig hme0 plumb
You also use the plumb keyword when troubleshooting the interfaces that
you add and configure manually. Often, an interface reports that it is
functional, but a snoop session from another host shows that no traffic is
flowing out of that interface. The plumb keyword helps to resolve this
communication problem.
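As a sketch of how the plumb keyword fits into manual configuration, the following function prints the commands that plumb an interface, assign an address, and mark the interface up. The plumb_commands function is hypothetical, the address and netmask are placeholders, and nothing is executed.

```shell
# Hypothetical helper: print the ifconfig commands that plumb an
# interface, assign an address, and mark the interface up. Nothing is
# executed; review the output and run the commands as root.
plumb_commands() {
    ifname=$1
    addr=$2
    mask=$3
    echo "ifconfig $ifname plumb"
    echo "ifconfig $ifname $addr netmask $mask up"
    # Display the resulting configuration to confirm the change.
    echo "ifconfig $ifname"
}

# The address and netmask below are placeholders, not course values.
plumb_commands hme0 192.0.2.10 255.255.255.0
```

Printing the commands first lets you check the interface name and address before you change a live configuration.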
Refer to the online man pages for information on the options supported by the arp command.
Inform students that they can use the Internet dot notation to specify the host by either name or number.
Mask             Flags  Phys Addr
---------------  -----  -----------------
255.255.255.255         00:a0:c9:36:54:3b
255.255.255.255         00:00:21:27:bb:90
255.255.255.255         00:01:30:51:b0:00
255.255.255.255         00:10:5a:9f:af:21
255.255.255.255  SP     08:00:20:f9:12:25
255.255.255.255         00:10:5a:9f:ae:dc
255.255.255.255         00:50:ba:89:d8:3c
255.255.255.255         00:80:48:d6:de:b7
255.255.255.255         00:50:ba:d7:27:b8
255.255.255.255  SP     08:00:20:f9:12:25
Inform students that they can refer to the example for the definitions of the flags in the arp table.
Each entry in the ARP table might have the following flags associated
with it:
Publish (P) Includes the entries that you use to respond to ARP
requests for this address.
Static (S) Includes the entries that are manually inserted and are
not defined by the ARP protocol.
Mapping (M) Includes the entries that you use to map to the
Ethernet multicast MAC addresses in the range of 01:00:5e:00:00:00
through 01:00:5e:ff:ff:ff.
You can use the arp utility when attempting to locate network problems
that relate to duplicate IP address usage. For example, you need to
complete the following steps to determine if a system is responding:
1.
Determine the Ethernet address of the target host. To do this, use the
banner utility at the ok prompt or the ifconfig utility at a shell
prompt on a Sun system.
2.
Determine if you can reach the target host by using its IP address
with the ping command.
3.
Use the arp utility and verify that the arp table reflects the correct
Ethernet (MAC) address.
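Step 3 can be sketched as a filter for arp -a output. The check_arp_entry function is hypothetical. It assumes that the second column of the table holds the IP address or host name and that the last column holds the physical address. Because some commands print address octets without leading zeros, the sketch pads each octet before it compares the values. It uses awk functions and the -v option; on a Solaris system, that typically means nawk or /usr/xpg4/bin/awk.

```shell
# Hypothetical helper: read "arp -a" output on standard input and report
# whether the entry for a given host carries the expected Ethernet address.
check_arp_entry() {
    awk -v host="$1" -v mac="$2" '
        # Pad single-digit octets so that 8:0:20 and 08:00:20 compare equal.
        function norm(m,    n, p, i, out) {
            n = split(tolower(m), p, ":")
            out = ""
            for (i = 1; i <= n; i++) {
                if (length(p[i]) == 1) p[i] = "0" p[i]
                out = out (i > 1 ? ":" : "") p[i]
            }
            return out
        }
        # Column 2 is the host; the physical address is the last column,
        # whether or not the Flags column is filled in.
        $2 == host {
            found = 1
            if (norm($NF) == norm(mac)) print host ": MAC matches"
            else print host ": expected " mac ", table has " $NF
        }
        END { if (!found) print host ": no ARP entry" }
    '
}
```

A typical invocation is arp -a | check_arp_entry host2 8:0:20:f9:12:25, where host2 and the address are example values. A mismatch after a successful ping is a hint that another system is answering for the same IP address.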
Routing tables
STREAMS statistics
The netstat command for the Solaris 9 OE can also generate reports for
the IPv6 protocols. The -i option with the netstat command shows the
state of the interfaces that the system uses for IP traffic.
The following is a sample output of the netstat -i command:
# netstat -i
Name  Mtu   Net/Dest     Address      Ipkts   Ierrs  Opkts   Oerrs  Collis  Queue
lo0   8232  loopback     localhost    17367   0      17367   0      0       0
hme0  1500  SUN          SUN          150518  84     161769  42     1532    0
qfe0  1500  router-qfe0  router-qfe0  6821    0      4658    0      115     0
qfe1  1500  router-qfe1  router-qfe1  4241    0      1156    0      0       0
qfe2  1500  router-qfe2  router-qfe2  0       0      952     0      0       0
qfe3  1500  router-qfe3  router-qfe3  902     0      1989    0      0       0
........
Description
Name
Mtu
Net/Dest
Address
Ipkts
Ierrs
Opkts
Oerrs
Collis
Queue
In the preceding output, the Queue field should have a value of zero, and
the value in the Collis field should not be greater than five percent of the
Opkts field. In addition, the value of the Ierrs field should be zero or, at
most, less than one percent of the Ipkts field.
The information generated by the netstat command is critical for tuning
your network because the netstat command reports data on network
usage and network traffic. However, it is difficult to interpret the output
of the netstat command for a system with many network interfaces
because of the complexity of the output.
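On a system with many interfaces, a short filter can make the netstat -i output easier to read. The following sketch flags interfaces on which collisions exceed 5 percent of output packets, input errors reach 1 percent of input packets, or packets are queued. The check_netstat_i function is hypothetical, the thresholds are rules of thumb, and the ten-column layout shown in the sample output is assumed.

```shell
# Hypothetical helper: read "netstat -i" output on standard input and
# flag interfaces whose counters exceed rule-of-thumb thresholds.
# Assumes the ten-column IPv4 layout:
#   Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
check_netstat_i() {
    awk '
        # Skip the header line and any line with fewer than ten columns.
        $1 == "Name" || NF < 10 { next }
        {
            name = $1; ipkts = $5; ierrs = $6
            opkts = $7; collis = $9; queue = $10
            if (opkts > 0 && collis * 100 > opkts * 5)
                print name ": collisions exceed 5% of output packets"
            if (ipkts > 0 && ierrs * 100 >= ipkts)
                print name ": input errors are 1% or more of input packets"
            if (queue > 0)
                print name ": packets are queued"
        }
    '
}
```

A typical invocation is netstat -i | check_netstat_i. No output means that every interface is within the thresholds.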
You use the netstat -s command to display the statistics related to the
Transmission Control Protocol (TCP), IP, ICMP, and Internet Group
Management Protocol (IGMP).
Ask students to execute the netstat -s command on their machines and read the statistics related to
various protocols.
Note You can use the snoop command to display network packets while
they are received or save them to a file for later analysis.
Inform students that the file in which you save the captured network packets must be RFC 1761-compliant.
Provides accurate time stamps for checking the response time of the
network Remote Procedure Call (RPC)
Refer to the online man pages for information on the options supported by the snoop command.
The nm command
When you use the -a option, the script command does not overwrite the
file but appends the output to it.
Note The script command records all the output, including the
prompts displayed on the screen, in the file.
The following is the syntax for the script command:
script [-a] [<file-name>]
For example, to store the location of the dtterm files to a script file called
dtterm_location, you type the following:
$ script dtterm_location
Script started, file is dtterm_location
$ find / -name dtterm -print
...
Script done, file is dtterm_location
If the file appears to be a text file, the file command examines the first
512 bytes and tries to determine the type of the file. To do this, the file
command uses the control file /etc/magic, which contains magic
numbers and sequences for different file formats and built-in rules for
natural languages. If the file is a symbolic link, by default, the file
command follows the link and tests the file referred to by the symbolic
link.
The following is the syntax for the file command:
file <file name>
In the following example, the file command displays the type of the
hosts file.
# file /etc/hosts
/etc/hosts:     ascii text
Note The file command does not use the name of the file to determine
the file type.
The following is the syntax of the tail command with the -f option:
tail -f <filename>
-n Prints the node name. The node name is the name by which the
system is known to a communications network.
To display the kernel architecture, run the uname command with the -m
option.
# uname -m
sun4u
To display the node name, run the uname command with the -n option.
# uname -n
sun-sparc-1
To display system information, run the uname command with the -a
option.
# uname -a
SunOS sun 5.9 Beta sun4u sparc SUNW,Ultra-5_10
In the preceding output, information is displayed in the following order:
<system name> <node name> <release> <version> <kernel
architecture> <processor type> <hardware platform>
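To make the field order easier to follow, the following sketch labels each field of a uname -a line. The label_uname function is hypothetical and assumes that every field is a single word, as in the sample output.

```shell
# Hypothetical helper: label the fields of one "uname -a" line in the
# order described above. Assumes that every field is a single word.
label_uname() {
    # Split the line into positional parameters on white space.
    set -- $1
    echo "system name: $1"
    echo "node name: $2"
    echo "release: $3"
    echo "version: $4"
    echo "kernel architecture: $5"
    echo "processor type: $6"
    echo "hardware platform: $7"
}

# Sample line taken from the uname -a output shown above.
label_uname "SunOS sun 5.9 Beta sun4u sparc SUNW,Ultra-5_10"
```

On a live system, you pass the output of the command itself: label_uname "$(uname -a)". Other operating systems print different fields, so the labels apply only to the layout described here.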
Note The showrev -p command displays both the current and obsolete
patches on the system.
If no patches are installed on the system, the showrev -p command
displays the following output:
# showrev -p
No patches are installed
Ask students to run the showrev -p command on their systems and view the information about the patches
installed on the systems.
Note In the Solaris 9 OE, the showrev command is obsolete and the
patchadd command is used instead.
Inform students that the device tree information displayed using this option is a snapshot of the initial
configuration and might not accurately reflect the reconfiguration events that occur later.
-D Displays the name of the device driver that you use to manage
a peripheral for each system peripheral in the device tree.
Inform students that some existing platforms might require a firmware upgrade to run the 64-bit kernel of the
system.
For example, to locate a symbol in the symbol table of the kernel, you
type the following command:
# /usr/ccs/bin/nm /dev/ksyms | more
/dev/ksyms:
[Index]   Value      Size    Type  Bind  Other Shndx   Name
[677]   |  16987860|       0|NOTY |LOCL |0    |ABS    |$done
[668]   |  16987880|       0|NOTY |LOCL |0    |ABS    |$done1
[669]   |  16987900|       0|NOTY |LOCL |0    |ABS    |$done2
[670]   |  16987916|       0|NOTY |LOCL |0    |ABS    |$done3
[675]   |  16987796|       0|NOTY |LOCL |0    |ABS    |$nowalgnd
[671]   |  16987728|       0|NOTY |LOCL |0    |ABS    |$s1algn
[672]   |  16987740|       0|NOTY |LOCL |0    |ABS    |$s2algn
[673]   |  16987784|       0|NOTY |LOCL |0    |ABS    |$s3algn
.........<output truncated>
-l Lists the status of all swap areas. The list displays the total
number of blocks and the number of free blocks for each swap area.
Note The output from the swap -s command includes the portion of
physical memory that is available for general programs and for all swap
spaces. However, the output from the swap -l command does not
include physical memory.
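Because the swap -l command reports sizes in 512-byte blocks, it can help to convert the counts to megabytes. The following sketch assumes the column layout swapfile, dev, swaplo, blocks, free. The swap_mb function is hypothetical.

```shell
# Hypothetical helper: read "swap -l" output on standard input and print
# the size and free space of each swap area in megabytes.
# Assumes the columns: swapfile dev swaplo blocks free, with blocks and
# free counted in 512-byte blocks.
swap_mb() {
    awk '
        # Skip the header line.
        $1 == "swapfile" { next }
        NF >= 5 {
            # 2048 blocks of 512 bytes make up one megabyte.
            printf "%s: %d MB total, %d MB free\n", $1, $4 / 2048, $5 / 2048
        }
    '
}
```

A typical invocation is swap -l | swap_mb. The values are truncated to whole megabytes.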
For example, if the output of the vmstat command indicates a shortage of
RAM, you can increase the swap space or upgrade RAM. If you increase
the swap space, the swap command displays the following information
about both available and free swap space:
Ask students to execute the swap command on their respective machines to view the amount of available
swap space.
swap -l
swapfile
/dev/dsk/c0t0d0s1
Note The Solaris OE supports the concept of the swapfs file system,
which enables a swap area to be either a file residing on a file system
or a logical or physical partition on a device.
You also use the truss command to analyze the stale or sick processes on
a system.
The following is the syntax of the truss command:
truss [-fcaeildD] [- [tTvx] [!] syscall , ...] [- [sS] [!]
signal , ...] [- [mM] [!] fault , ...] [- [rw] [!] fd ,
...] [- [uU] [!] lib , ... : [:] [!] func , ...] [-o
outfile] command | -p pid ...
Refer students to the online man pages for information on the options supported by the truss command.
coreadm -u
Only a superuser can run the coreadm command with the preceding
option. You use this option to update all system-wide core file
options. The system-wide core file options are based on the contents
of the /etc/coreadm.conf file. The startup script
/etc/init.d/coreadm uses the -u option only on system reboot.
Refer students to the online man pages for more information on the options supported by the coreadm
command.
core
disabled
enabled
disabled
disabled
disabled
Ask students to run the coreadm command in their respective systems to view the configuration information
about their respective systems.
To set the name pattern for a per-process core file, type the following at
the command line:
# coreadm -i $HOME/corefiles/%f.%p
The preceding command writes core files to the corefiles subdirectory of
the home directory and names each file after the executable file (%f) and
the process ID (%p).
To set the name pattern for a global core file, type the following at the
command line:
# coreadm -g /var/corefiles/%f.%p
If time permits, provide the following information to students. You can also ask students to perform the
following steps as part of an exercise.
# coreadm -e process
Run the following command to display the core file path for the current process to verify the
configuration:
# coreadm $$
To enable a global core file, complete the following steps:
# coreadm
To display the name pattern of the per-process core file for one or more
processes, run the coreadm command at the command line with a list of
PIDs.
$ coreadm 278 5678
278: core.%f.%p
5678: /home/george/cores/%f.%p.%t
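To see what a name pattern produces, the following sketch expands the %f (executable file name) and %p (process ID) tokens for a given process. The expand_core_pattern function is hypothetical, and it handles only these two tokens; the other tokens that coreadm supports are left unchanged.

```shell
# Hypothetical helper: show the file name that a coreadm pattern would
# produce for a given executable name and PID. Only the %f and %p tokens
# are expanded. The executable name must not contain "|" or "&".
expand_core_pattern() {
    pattern=$1
    exe=$2
    pid=$3
    printf '%s\n' "$pattern" | sed -e "s|%f|$exe|g" -e "s|%p|$pid|g"
}

# Example values: the pattern from the global example above, an
# illustrative executable name, and an illustrative PID.
expand_core_pattern '/var/corefiles/%f.%p' sendmail 278
```

Including the PID in the pattern keeps core files from successive failures of the same program from overwriting each other.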
Preparation
A standard Solaris 9 OE installation with access to the man pages is
required for this exercise.
Tasks
Complete the following tasks to perform diagnostics on the system:
1.
Log in as the root user, and open a terminal window. Use the
ifconfig command to display basic configuration information
about the network interfaces on the system.
Record the information for the following attributes.
Attribute
Value
IP address
Ethernet address
Netmask
Interface
up/down
2.
3.
Use the appropriate command to verify that your system can contact
the network interface on another system in the network. Does the
output of the snoop command contain requests and replies (yes or
no)?
5.
On the system whose interface remains up, use the ping command
to contact the system whose interface is down. What does the ping
command report?
6.
7.
On the system whose interface remained up, again attempt to use the
ping command to contact the other system.
_____________________________________________________
Does the snoop command report a reply from the target host?
_____________________________________________________
8.
Use the appropriate command to list the driver modules that are
loaded on your system.
9.
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
12. Use the appropriate command to identify the system calls made by
the ls command.
13. Use the appropriate command to display information about the
active processes running on the system.
14. Use the appropriate option with the netstat command to display a
list of active sockets for each protocol on the system.
15. Use the appropriate command to append a session record to the file
myfile1.
Exercise Summary
Manage the discussion based on the time allowed for this module, which was provided in the About This
Course module. If you do not have time to spend on discussion, highlight just the key concepts students
should have learned from the lab exercise.
Experiences
Ask students what their overall experiences with this exercise have been. Go over any trouble spots or
especially confusing areas at this time.
Interpretations
Ask students to interpret what they observed during any aspect of this exercise.
Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.
Exercise Solutions
The solutions for the tasks listed in this exercise are:
1.
Log in as the root user, and open a terminal window. Use the
ifconfig command to display basic configuration information
about the network interfaces on the system.
# ifconfig -a
Record the information for the following attributes.
Attribute
Value
IP address
Ethernet address
Netmask
Interface
up/down
2.
3.
Use the appropriate command to verify that your system can contact
the network interface on another system in the network. Does the
output of the snoop command contain requests and replies (yes or
no)?
# ping host 2
The output of the snoop command contains both requests and replies.
4.
5.
On the system whose interface remains up, use the ping command
to contact the system whose interface is down.
# ping host
What does the ping command report?
After a time-out period, the ping command reports no answer from
host.
6.
7.
On the system whose interface remained up, again attempt to use the
ping command to contact the other system.
# ping host2
Does the snoop command report a reply from the target host?
Yes
8.
Use the appropriate command to list the driver modules that are
loaded on your system.
# modinfo
9.
12. Use the appropriate command to identify the system calls made by
the ls command.
# truss ls
13. Use the appropriate command to display information about the
active processes running on the system.
# ps -e
14. Use the appropriate option with the netstat command to display a
list of active sockets for each protocol on the system.
# netstat -an
15. Use the appropriate command to append a session record to the file
myfile1.
# script -a myfile1
16. Use the appropriate option with the uname command to display the
name of the hardware platform on the system.
# uname -i
17. Use the appropriate prtconf command to display information about
the name of the device driver that manages a peripheral device on a
SPARC processor.
# /usr/sbin/prtconf -D
Module 6
Copyright 2002 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, Revision E
Relevance
Present the following questions to stimulate students and get them thinking about the issues and topics
presented in this module. While they are not expected to know the answers to these questions, the answers
should be of interest to them and inspire them to learn the material presented in this module.
Relevance on
page OH 6-3
Which online diagnostic tools do you find effective and easy to use
for solving problems in the Solaris OE?
Allow students to share their experiences. Provide examples of diverse problems that students might
encounter at their workplaces. Use these examples to highlight the significance of the online man pages in
the Solaris OE, in the SunSolve Online service, and on the docs.sun.com Web site.
Additional Resources
Additional resources The following references provide additional
information on the topics described in this module:
Explain to students that they can run the man command with any options provided in the preceding syntax.
The choice of an option depends on the subject of the search. Inform students that the -M, -s, -l, and -k
options for the man command are described in the following pages.
-M /usr/man
-M /usr/man
The preceding output lists two man pages along with the section numbers
that contain information about the crypt command. You can then search
the particular section in the manual.
Note If you do not specify a section, the man command searches each
directory in the search path and displays the first matching man page.
# man -s3C crypt
Standard C Library Functions                              crypt(3C)
NAME
     crypt - string encoding function
SYNOPSIS
...
...<output truncated>
The section name at the command line limits the search for the crypt
function to section 3C. The first line of the output specifies the section of
the crypt library function.
Note The windex database file is similar to an index file, which lists the
keywords and their corresponding reference pages.
For example, the following output is displayed when you run the man
command with the -k option:
# man -k passwd
d_passwd        d_passwd (4)         - dial-up password file
getpw           getpw (3c)           - get passwd entry from UID
kpasswd         kpasswd (1)          - change a user's Kerberos password
nispasswd       nispasswd (1)        - change NIS+ password information
nispasswdd      rpc.nispasswdd (1m)  - NIS+ password update daemon
...<output truncated>
The preceding output shows one-line summaries for all the entries in the
windex database file that contain the keyword passwd.
The keyword look-up feature for searching the man pages is not enabled
by default. To enable it, you must create the windex database file on the
system. If you run the man command with the -k option without a windex
database file on the system, an error is displayed.
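Before you rely on keyword searches, you can check whether the windex file exists. The following sketch tests a man page directory and, if the file is missing, prints a catman command that would build it. The check_windex function is hypothetical, and /usr/man is assumed as the default directory.

```shell
# Hypothetical helper: report whether a man page directory has a windex
# file, and if not, print the catman command that would build one.
# Returns 0 if the file exists, 1 otherwise.
check_windex() {
    mandir=${1:-/usr/man}
    if [ -f "$mandir/windex" ]; then
        echo "$mandir/windex exists; man -k can search it"
        return 0
    fi
    echo "$mandir/windex is missing; as root, run: catman -w -M $mandir"
    return 1
}
```

The -w option limits catman to creating the windex file, which is faster than reformatting every man page.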
Inform students that to create a windex database file, they must use the catman command. This command is
described in the following section.
Keyword
The catman command indexes each page of the manual. If you make any
changes to the man pages, you must run the catman command to recreate
the windex database file.
Note Only the whatis command and the man -f and -k options use the
windex database file to perform search operations.
Refer students to the man pages for more information on the whatis command and the man -f and -k
options.
http://sunsolve.sun.com
http://sunsolve1.sun.com
http://docs.sun.com
You must have a valid SunSpectrum(SM) contract ID before you register for
a SunSolve Online account.
The SunSolve Online service contains several tools and utilities that help
you to diagnose and troubleshoot faults in your system. Figure 6-1
displays the main contents of this database.
Figure 6-1
Patches
Ask students to define a patch. Note student responses on the white board, and identify the close-to-correct
and correct responses. Add your inputs to the correct responses to define a patch as a bug fix, a firmware
upgrade, or a software revision upgrade.
A patch contains a set of files and directories that corrects known bugs
in the system or adds product enhancements. You can download the
recommended and security patches provided by Sun without logging in
to the SunSolve Online service. However, to download the
product-related and operating system patches, you must be a registered
user of this database.
Diagnostic Tools
The SunSolve Online service provides a set of diagnostic tools and
utilities. In addition, it provides links to related tools that help to diagnose
the problem in the system.
Ask students if they referred to the SunSolve Online service to perform system diagnosis. If yes, ask them
which tools and utilities they used to resolve the problems in the Solaris OE.
Description
PatchDiag tool
Online Support
Center
If students are logged in to the SunSolve Online service, ask them to access the diagnostic tools from the
SunSolve Online home page to know more about the tools. Depending on the available time, decide on the
time to be spent on this exercise.
Collection Documents
Documents containing related or similar information are grouped as
collections in the SunSolve Online service. This database provides several
collection documents that help you to perform system diagnosis.
The following lists some of the collection documents available in the
SunSolve Online service:
Bug Reports
FAQs
Early Notifiers
Info Docs
Note If you are a registered user of the SunSolve Online service, you can
mark a collection document for downloading or for receiving a
notification whenever the document is modified.
Inform students that to access a collection, they must click the Searchable Collections link on the
SunSolve Online home page.
Security Information
The SunSolve Online service provides information about the bugs and
security issues of various products. The SunSolve Online service also
provides information about the security patches to resolve
security-related bugs in the system. Table 6-2 displays the security-related
information items available in the SunSolve Online service.
Table 6-2 Security Information
Security Information
Item
Description
Security bulletin
archive
Security t-patches
BigAdmin(SM) Services
The SunSolve Online service provides a link to the BigAdmin(SM) service,
which is a web-based, community-driven repository of resources for
system administrators. The BigAdmin service enables users to receive and
post information, resources, and tips.
FAQs, documentation, education resources, patches, scripts, software, and
services and support form an integral part of the BigAdmin service. The
service also includes discussion groups and technical guidance on shell
commands.
You can also access the BigAdmin service by visiting
www.sun.com/bigadmin/.
Operator
Name
Description
Verbatim
[]
AND
{}
OR
()
Near
Suffix
1.
Select the collections that you want to search. Figure 6-2 displays the
selected Info Docs collection.
Figure 6-2
Figure 6-3
Search Criteria
Figure 6-4 displays the search results for the Info Docs.
Figure 6-4
Search Results
The SunSolve Online service contains the following tools and utilities that
are related to patches, as shown in Table 6-4.
Table 6-4 Patch Support Tools
Patch Utility
Description
Patch Check
Recommended
and Security
Patches
PatchPro
Checksum file
Automate
Downloads
Patch Finder
Solaris Patches
Product Patches
Note You can access the Automate Downloads, Solaris Patches, and
Product Patches utilities only if you log in to the SunSolve Online service.
To access a patch tool or utility, click the corresponding link on the
SunSolve Online home page.
Diagnosing Faults Using Online Tools
Figure 6-5 displays a section of the patch report for the Solaris 8 OE.
Figure 6-5
Solaris OE Patches
You can view the patches that are currently installed in your Solaris OE
by running the patchadd -p command.
Latest revisions
Recommended patches
Security patches
Y2K patches
Note The PatchDiag tool is a compiled Perl script. The Perl source is in
the patchdiag.pl file of the installation directory.
To install and run the PatchDiag tool, complete the following steps:
1.
2.
3.
4.
Note The patchdiag.xref file contains the latest data about all the
patches. All users must have access to the patchdiag.xref file to run the
PatchDiag tool successfully.
Check that the following files are present in the Solaris OE. The
PatchDiag tool uses these files to generate the patch report:
6.
7.
Description
-l
-s <sfile>
<os_ver>
<arch>
-p <pfile>
<sfile>
<os_ver>
<arch>
-x <xref>
-h | -?
If you do not specify an option, the PatchDiag tool runs the showrev -p
command in the Solaris OE and prints the standard audit report. The
audit report contains information about the installed patches, the security
patches, and the uninstalled recommended patches.
To identify the patches that you must download and install on your
system, review the audit report generated by the PatchDiag tool.
Obtaining Explorer
You download Explorer from the www.sunsolve.sun.com Web site.
Installing Explorer
To unpack and install Explorer, complete the following steps:
1.
2.
3.
Note If you want to run Explorer on a cluster, select one node at a time
instead of selecting all the nodes simultaneously.
Figure 6-6
Subject Categories
Collection Titles
Product Categories
Subject Categories
The docs.sun.com Web site contains several books that are grouped
according to the subject matter. These subjects include the following
categories:
System Administration
Programming
Desktop Manuals
Hardware
Manpages
Figure 6-7
Collection Titles
A collection is a set of books that are grouped if they fulfill the following
requirements:
Collections help you to browse Sun documentation because you can easily
track information on any subject if you know the type of content. For
example, to locate information on managing users in the Solaris 9 OE, you
can select the Solaris 9 System Administration collection.
Ask students to locate the OpenBoot collection title and observe how books with similar subject matter are
grouped in the OpenBoot collection.
Figure 6-8
Collection Titles
Product Categories
This category groups and displays books according to the product
described by the books. The books are categorized under hardware and
software products, which are further categorized according to specific
products.
Figure 6-9 shows the product categories at the docs.sun.com Web site.
Figure 6-9
Product Categories
Ask students to access the docs.sun.com Web site. Ask them which document structure they prefer for
locating the man pages for the Solaris 9 OE. Note student responses. Highlight the fact that using the
Manpages subject category is the most convenient way of locating the required man pages because the
subject is known.
Search In functionality
Search Syntax
Table 6-6 shows the options provided by the Search In drop-down menu.
Table 6-6 Search In Options
Search Option
Description
All Books
Subject or Product
category
This collection
This Book
Search book titles only Performs the search only in the titles of
books.
Ignore old editions Performs the search only in the latest editions
of published books. For example, while searching for the Solaris
Advanced User's Guide, the result shows the Advanced User's
Guide for the Solaris 9 OE only. This is because the latest edition of
the book is published for the Solaris 9 OE.
Description
Sample Syntax
Words
Openboot firmware
Phrases
ok setenv
ok printenv
AND
ok setenv AND
OpenBoot
OR
Sun Enterprise OR
Ultra Enterprise
Click the link to the book that you want to print, as shown in
Figure 6-12.
Use the Print function in Adobe Acrobat Reader to print the file.
1. Move the mouse pointer over the PDF file that you want to download,
and note the URL of the file displayed on the status bar, as shown in
Figure 6-13.
2. Open a terminal window in the Solaris OE, and change to the directory
in which you want to download the PDF file.
3. Run the ftp command with the host address from the URL that you noted
in Step 1. For example, if the URL of the PDF file is
ftp://192.18.99.138/802-1958/802-1958.pdf, run the
following ftp command:
$ ftp 192.18.99.138
4.
After downloading the pdf file, you can view it in the browser by using
Adobe Acrobat Reader.
Introducing Icons
Icon Legends on page OH 6-7
Control Legends on page OH 6-8
The control symbols enable you to show or hide content as required. You
can click a control symbol to expand or collapse a collection or a
group of documents.
Figure 6-15 shows various control symbols and their associated
descriptions.
Indicator Legends on page OH 6-9
To locate information on IP network multipathing, select the Solaris 9 System Administration collection. Next,
select the System Administration Guide: IP Services book. Type multipathing in the Search For text field on
the search bar. Identify the indicator legend to select the most relevant search result.
By a section name
By a keyword
Preparation
Boot the Solaris OE, and log in as the root user if necessary.
Tasks
Perform the following tasks to display information from the online
reference manual:
1.
2.
Run the man command with an appropriate option to list all the man
pages that contain information about the passwd command.
3.
4.
Exercise Solutions
The following are the solutions for the tasks listed in the exercise:
1.
2.
Run the man command with an appropriate option to list all the
manual pages that contain information about the passwd command.
$ man -l passwd
passwd (1)      -M /usr/man
passwd (4)      -M /usr/man
3.
4.
Preparation
For the purpose of this exercise, divide students into manageable groups, depending on the strength of
the class.
If Internet access is available, log in students to the SunSolve Online service on one system in each
group. The login information is provided in the setup file located at the education.central Web site.
If the Internet is not accessible, ask students to write the steps for performing the exercise tasks. They
can refer to the figures in the module, which display the screens from the SunSolve Online service.
Tasks
Perform the following tasks:
1.
2.
4.
Use the appropriate utility to find the patch with the patch ID
112552-01 on the SunSolve Online service. Note the bug IDs that are
fixed by the patch.
5.
b.
c.
d.
Exercise Summary
Manage the discussion based on the time allowed for this module, which was provided in the About This
Course module. If you do not have time to spend on discussion, highlight just the key concepts students
should have learned from the lab exercise.
Experiences
Ask students what their overall experiences with this exercise have been. Go over any trouble spots or
especially confusing areas at this time.
Interpretations
Ask students to interpret what they observed during any aspect of this exercise.
Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.
Exercise Solutions
The following are the solutions for the tasks listed in the exercise:
1.
2.
3.
b.
c.
Type sun system security in the Synopsis text field, and click
Go.
Use the appropriate utility to find the patch having the patch ID
112552-01 on SunSolve Online. Note the bug IDs that are fixed by the
patch.
Use the Patch Finder utility to locate the patch on SunSolve Online.
Patch 112552-01 fixes the bugs 4607337 and 4624965 in the Solaris 9 OE.
5.
c.
# directory-path/patchdiag-1.0.4/patchdiag_setup
d.
# perl directory-path/patchdiag-1.0.4/patchdiag.pl -l
Copyright 2002 Sun Microsystems, Inc. All Rights Reserved. Enterprise Services, Revision E
Module 7
Relevance
Present the following questions to stimulate students and get them thinking about the issues and topics
presented in this module. While they are not expected to know the answers to these questions, the answers
should be of interest to them and inspire them to learn the material presented in this module.
Relevance on page OH 7-3
Ask students what they understand by a core dump. Use the following description to explain a core dump.
A core dump is a file that contains the memory image of a process that the kernel terminated under
abnormal circumstances. When application code attempts to perform an illegal action, the kernel terminates
the application and creates a core dump file on the disk that represents the process memory. Core
dumps enable you to perform a postmortem analysis of the offending application. By viewing the process
memory image at the exact moment of termination, you can determine the cause of the problem within the
source code. The process that dumps the core file is the only one that is affected.
Ask students what they understand by a system crash dump. Use the following description to explain a
system crash dump.
A system crash dump is the kernel's conceptual equivalent of a core dump. It is generated when kernel code
performs an illegal action that jeopardizes data integrity. When this happens, the kernel notes the
disparity and calls a special kernel routine, known as panic, to manage the situation. By default, the panic
routine saves the memory image and symbol table of the kernel to the swap space and then
forces the system to reboot. When the kernel generates a crash dump, all the applications are affected.
A system crash occurs when either the computer stops working or an application aborts unexpectedly. A
system crash signifies either a hardware fault or a critical software bug.
When a system crashes in such a way that it does not respond to any inputs from the keyboard, the mouse
device, or any other program, it is known as a system hang.
Additional Resources
Additional resources: The following references provide additional
information on the topics described in this module:
Drake, Chris, and Kimberley Brown. Panic! UNIX System Crash Dump
Analysis. Upper Saddle River, New Jersey: Prentice Hall PTR, May
1995.
Goodheart, Berny, and James Cox. The Magic Garden Explained. Upper
Saddle River, New Jersey: Prentice Hall Books, January 1994.
Note: Before the release of the Solaris 2.6 OE, system crash dumps were
stored in the first dump device, as defined in the /etc/vfstab file. In the
Solaris 9 OE, you use the dumpadm command to specify the location where
the system stores the crash dumps.
Ask students to discuss the system failures that they experience at their work places. Note the responses on
the white board, and inform students that by the end of this module, they will be categorizing the list of
system failures as different types of system crashes.
Note: The administrator can also use the savecore utility to generate a
system crash dump without causing the system to reboot.
Performs a stack trace to list the routines that caused the panic
Saves a core dump image of the system memory in the dump device
Type number
Cause
Data fault
Memory alignment
Illegal software
instruction
Unrecognized instruction
If the Solaris OE reboots after a bad trap, the trap messages are saved in
the /var/adm/messages file. However, if the trap messages are not saved,
the system administrator must run the dmesg command immediately after
system reboot. The dmesg command displays the messages generated by
the system crash.
Inform students that line #1 of the output indicates the trap type and the date and time of the trap. Line #2
indicates the name of the trap.
For more information on the dmesg command, refer students to the online man pages or the docs.sun.com
Web site.
When a bad trap occurs, the stack traceback of the process or thread that
caused the bad trap enables you to determine the cause of the trap. A
stack traceback provides the history of the thread that caused the trap.
The traceback also enables you to identify the sequence in which routines
were called before the trap.
Note: You can compare the stack trace with the traces in bug reports to
verify whether the trap occurred because of a known bug.
2.
3.
Run the sync command at the ok prompt to force a system panic and
a reboot.
4.
5.
If required, inform students that the .registers and .locals commands are discussed later in the module.
Process of System Crash Dump Generation on page OH 7-4
This section describes the process of generating a system crash dump. The
first two phases are preparatory steps for generating a crash dump, and
the remaining phases are a part of the process of crash dump generation.
Figure 7-1 lists the steps in the process of generating a system crash
dump.
Figure 7-1
Note: The swap devices configured on the system are listed in the
/etc/vfstab file. The first entry in the file corresponds to the primary
swap device.
You can configure the system to contain a single primary swap partition
and multiple secondary swap partitions. If the dump is too large for the
primary swap partition, the system writes the core dump to the secondary
swap partitions.
Ask students to open the /etc/vfstab file and identify various swap devices configured on their systems.
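The check described in the preceding instructor note can be sketched in the shell. This is a minimal sketch and not part of the course material. It assumes the standard vfstab layout, in which the fourth field of each entry is the file system type. The function name list_swap_devices is hypothetical.

```shell
#!/bin/sh
# list_swap_devices [FILE]
# Print the device of every swap entry in a vfstab-style file, in file order.
# Assumption: field 4 of each entry is the file system type (standard vfstab layout).
# The function name is hypothetical; FILE defaults to /etc/vfstab.
list_swap_devices() {
    # Skip comment lines, then keep only entries whose type is "swap".
    # A tmpfs entry such as /tmp has "swap" in field 1 but "tmpfs" in field 4,
    # so it is not reported.
    awk '$1 !~ /^#/ && $4 == "swap" { print $1 }' "${1:-/etc/vfstab}"
}
```

On a Solaris system, calling list_swap_devices without an argument reads the /etc/vfstab file. The first device that it prints corresponds to the primary swap device described in the preceding note.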
Note: If the aggregate size of all the swap partitions is less than the
size of the system crash dump, the kernel does not create a system crash
dump.
Each system crash dump contains a header, which the system always
writes to the end of the primary swap partition. The header contains
information about the size and location of the dump. This information
enables the system to locate and save the dump when the system reboots.
You can configure the system to save a system crash dump either partially
or completely. A partial dump contains the crash dump header and a
copy of a part of physical memory. A complete dump contains the dump
header and a copy of the entire physical memory.
Options of the dumpadm Command on page OH 7-5
Description
-c content-type
-d dump-device
-m mink|minm|min%
-n
-r root-dir
-s savecore-dir
-y
If necessary, remind students that the savecore utility is enabled by default when the system reboots.
      Dump content: kernel pages
       Dump device: /dev/dsk/c0t0d0s3 (swap)
Savecore directory: /var/crash/SunSparc1
  Savecore enabled: yes
Ask students to run the dumpadm command to view the current dump configuration on their systems.
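When the dump configuration is needed in a script, a single value can be picked out of the dumpadm output. The following is a minimal sketch under stated assumptions: it expects each output line to have the form "label: value", and the label names used in the usage note (such as Savecore directory) follow the usual dumpadm output layout and should be checked on the system in use. The function name dumpadm_field is hypothetical.

```shell
#!/bin/sh
# dumpadm_field LABEL
# Read dumpadm-style output on standard input and print the value that follows LABEL.
# Assumption: each line has the form "<label>: <value>", with optional leading spaces.
# Assumption: LABEL contains no characters that are special in a sed pattern.
# The function name is hypothetical.
dumpadm_field() {
    sed -n "s/^ *$1: *//p"
}
```

For example, on a Solaris system the pipeline dumpadm | dumpadm_field 'Savecore directory' would print only the savecore directory path.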
Ask students to locate the default savecore directory on their respective systems. The default savecore
directory is /var/crash/hostname, where hostname is the name of the system. For more information on the
dumpadm utility, refer students to the relevant man pages.
Moves the system crash dump and a copy of the kernel files from the
dump device to the file system specified by the dumpadm command.
The savecore utility does this only if the free space on the file
system is greater than the value specified by the minfree variable.
Note: The savecore command saves the unix.n and vmcore.n files.
The variable n is incremented each time a system saves a crash dump.
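Because n grows with every saved dump, it is often useful to find the most recent complete pair of files. The following is a minimal sketch, not a Sun-supplied tool. It relies only on the unix.n and vmcore.n naming described in the note. The function name latest_dump_index is hypothetical.

```shell
#!/bin/sh
# latest_dump_index DIR
# Print the highest n for which both DIR/unix.n and DIR/vmcore.n exist.
# Prints nothing if DIR holds no complete pair.
# The function name is hypothetical.
latest_dump_index() {
    dir=$1
    latest=""
    for f in "$dir"/vmcore.*; do
        [ -e "$f" ] || continue              # glob matched nothing
        n=${f##*.}                           # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;         # ignore non-numeric suffixes
        esac
        [ -e "$dir/unix.$n" ] || continue    # require the matching unix.n file
        if [ -z "$latest" ] || [ "$n" -gt "$latest" ]; then
            latest=$n
        fi
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "$latest"
    fi
}
```

If the function prints 1 for the crash directory, the matching pair is unix.1 and vmcore.1, which can then be opened with the debugger, for example with mdb unix.1 vmcore.1.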
Variable    Definition
%p          PID
%u          Effective UID
%g          Effective group ID
%f          Executable file name
%n          System node name (uname -n)
%m          Machine name (uname -m)
%t          Decimal value of time(2)
%%          Literal percentage
To display the name pattern of the per-process core file for one or more
processes, run the coreadm command at the command line with a list of
PIDs.
$ coreadm 278 5678
278: core.%f.%p
5678: /home/george/cores/%f.%p.%t
Refer students to Module 5, Performing Solaris OE Diagnostics, for more information on the coreadm
command.
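To see what a name pattern turns into for a given process, the expansion can be imitated in the shell. This is a minimal sketch that handles only the %f, %p, and %% variables. It is not how the kernel performs the expansion, and the function name expand_core_pattern and the process name nm assumed for PID 278 are illustrative assumptions.

```shell
#!/bin/sh
# expand_core_pattern PATTERN EXEC_NAME PID
# Expand %f, %p, and %% in a coreadm-style name pattern. Other variables are left as-is.
# Assumption: EXEC_NAME and PID contain no "|", "&", or "\" characters.
# The function name is hypothetical.
expand_core_pattern() {
    pattern=$1
    exec_name=$2
    pid=$3
    marker=$(printf '\001')    # placeholder that protects a literal "%%"
    printf '%s\n' "$pattern" |
        sed -e "s|%%|${marker}|g" \
            -e "s|%f|${exec_name}|g" \
            -e "s|%p|${pid}|g" \
            -e "s|${marker}|%|g"
}

# Hypothetical example: PID 278 is assumed to be running a program named nm.
expand_core_pattern 'core.%f.%p' nm 278    # prints: core.nm.278
```

The %% pattern is replaced by a placeholder first so that a sequence such as %%p stays a literal %p instead of being expanded as a PID.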
To ensure that the obpsym module is loaded across system resets, add the
following entry in the /etc/system file:
forceload:misc/obpsym
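The entry can also be added with a small script that does nothing if the line is already present. The following is a minimal sketch, and the function name ensure_line is hypothetical. Because a damaged /etc/system file can prevent the system from booting, it is prudent to try the function on a copy of the file first.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Append LINE to FILE unless FILE already contains exactly that line.
# The function name is hypothetical.
ensure_line() {
    file=$1
    line=$2
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example, ensure_line /etc/system 'forceload:misc/obpsym', run as the root user, adds the entry once. Repeated runs leave the file unchanged.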
Preparation
Boot the Solaris OE.
Tasks
Answer the following questions:
1.
2.
3.
4.
5.
In which files of the default crash directory does the savecore utility
save the core dump?
6.
Which command do you use to set the path name for a global core
file to include the PID and the name of the executable file? Use the
default crash directory path for the core file.
7.
Which is the command that you use to assign the swap device in the
Solaris OE as the dump device?
8.
Exercise Summary
Manage the discussion based on the time allowed for this module, which was provided in the About This
Course module. If you do not have time to spend on discussion, highlight just the key concepts students
should have learned from the lab exercise.
Experiences
Ask students what their overall experiences with this exercise have been. Go over any trouble spots or
especially confusing areas at this time.
Interpretations
Ask students to interpret what they observed during any aspect of this exercise.
Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.
Exercise Solutions
The following are the solutions for the questions in the exercise:
1.
2.
System panics
Bad traps
System hangs
3.
4.
a.
Use the Stop-A key sequence to switch the system to the ok prompt.
b.
c.
d.
e.
Study the core dump to determine the cause of the system hang.
Copying data from the dump device to the crash dump directory
5.
In which files of the default crash directory does the savecore utility
save the core dump?
The savecore utility saves the core dump data in the vmcore.n file and
the corresponding kernel files in the unix.n file.
6.
Which command do you use to set the path name for a global core
file to include the PID and the name of the executable file? Use the
default crash directory path for the core file.
# coreadm -g /var/crash/%f.%p
7.
Which is the command that you use to assign the swap device in the
Solaris OE as the dump device?
To check the swap dump device in the Solaris OE, run the following
command:
# dumpadm
To configure the swap dump device as the dedicated dump device, run the
following command:
# dumpadm -d swap
8.
Module 8
Relevance
Present the following questions to stimulate the students and get them to think about the issues and topics
presented in this module. While they are not expected to know the answers to these questions, the answers
should be of interest to them and inspire them to learn the material presented in this module.
Relevance on page OH 8-3
Allow students to share their work experiences and describe how they analyzed the problem of a system
panic. Ask them to list the steps they performed to reach a solution.
Allow students to list the steps they performed at their work places to configure a system that helps them to
process core dumps successfully.
Additional Resources
Additional resources: The following references provide additional
information on the topics discussed in this module:
Note: The syntax of the mdb utility is compatible with the syntax of the
kadb and adb utilities. The mdb utility can execute all the macros of the
kadb and adb utilities.
You use the following command to launch the mdb utility:
# mdb
Loading modules: [ unix krtld genunix ip ufs_log nfs random
ptm lofs ipc logindmux cpc ]
>
After you launch the mdb utility, you can change the default prompt:
> $P"mdb: "
You can also invoke help to find out which options are available for a
particular command. For example, you use the following command to
find out the options that are available with the ps command:
mdb: ::help ps
::ps [-fltTP] - list processes (and associated thr,lwp)
ADDR NAME
00000000014393b8 sched
00000300004e6008 fsflush
00000300004e6a20 pageout
00000300004e7438 init
0000030001b7a060 sendmail
0000030001b8d488 devfsadm
0000030001b8ca70 snmpXdmid
0000030001b6f478 dmispd
0000030001c5a088 dtterm
0000030001cbf4c0 mdb
In the preceding output, you identify the thread that was running when
the system panicked by looking for the thread whose state is TS_ONPROC.
Inform students that they can use the mdb utility to debug existing software programs and develop their own
modules. This helps them to debug the drivers and applications in the Solaris OE.
Command-line editing
Command history
Online help
Command pipe-lining
Examining process core dumps: The mdb utility does not provide
support for examining the process core dumps generated on the
Solaris 2.4 OE to Solaris 2.6 OE versions. The runtime link editor
debugging interface (librtld_db) might not be initialized if you
examine the core dump on one Solaris OE version from another
Solaris OE version. Therefore, the symbol information for shared
libraries is not available. In addition, if the mappings for the shared
libraries are not available in the user core dumps, the text section
and the read-only data of the shared libraries might not be the same
as the data in the core dump.
Examining crash dumps: The mdb utility uses the libkvm library
routine from the corresponding operating system release to examine
the crash dumps that are generated on the Solaris 2.4 OE to Solaris 7
OE versions. If you use debugger modules (dmods) from one Solaris
OE version to examine a crash dump on another Solaris OE version,
the changes in kernel implementation might prevent some debugger
commands (dcmds) or walkers from functioning properly.
A debugger command or dcmd (pronounced dee-command) is a routine in the debugger that can access any
properties of the current target. The mdb utility parses commands from the standard input and executes the
corresponding dcmds. Each dcmd can also accept a list of string or numerical arguments.
A debugger module or dmod (pronounced dee-mod) is a dynamically loaded library that contains a set of
dcmds and walkers. During initialization, the mdb utility attempts to load the dmods that correspond to the
load objects in the target. You can subsequently load or unload dmods any time while running the mdb utility.
A walker is a set of routines that describe how to iterate through the elements of a particular program data
structure. A walker encapsulates the implementation of a data structure, hiding it from dcmds and from the
mdb utility. You can use walkers interactively or use them to build other dcmds or walkers.
Note: The mdb utility might not provide support for examining the core
and crash dumps on an Intel platform from a SPARC platform or on a
SPARC platform from an Intel platform.
A macro file is a text file that contains a set of commands. Macro files
automate the process of displaying commonly referenced programming
structures. For example, the proc macro displays the process structure,
the thread macro displays the thread structure, and the inode macro
displays the inode structure. Macros annotate the output, which helps
you to interpret the information in these structures.
Inform students that macros facilitate working with the debugger.
Ask students to assign the /usr/lib/adb directory as the current directory and run the ls command to
view the list of available macros.
The following are some of the frequently used header files with their
associated macros:
o6 Stack pointer or sp
o7 Program counter or pc
In a system dump analysis, the most important registers are the program
counter and the stack pointer. The program counter, <o7 or pc, contains
the address of the current instruction, and the stack pointer, <o6 or sp,
points to the current stack frame, which holds local variables and
return addresses.
The thread that was running at the time of the system panic.
The process that was running at the time of the system panic.
To use the debugger utility for analyzing the crash dump, switch to the
directory in which the dump is located. To switch to the crash directory,
use the following syntax:
# cd /var/crash/`uname -n`
where /var/crash/`uname -n` is the default crash directory.
In this example, the crash directory is located in the
/var/crash/sun-sparc-1 directory. You can use the dumpadm command
to determine the current crash directory.
Consider a scenario in which one of the Sun systems panics. To solve this
problem, you must first understand the cause of the panic.
Discuss with students what you can achieve from an administrator's perspective after a system panic.
Identify the address of the thread that was running during the panic
Identify the name and arguments of the processes that were running
during the panic
2.
Introduce a bug into the ksyms driver that the system uses to access
the symbol table of the kernel.
Device drivers run in a fully privileged state, and any critical
problem within the driver code causes the kernel to panic the
system.
To invoke the mdb utility on the live kernel, complete the following steps:
1.
Type the following command to invoke the mdb utility on the live
kernel:
# mdb -kw /dev/ksyms /dev/mem
Loading modules: [ unix krtld genunix ip usba ufs_log
logindmux ptm isp cpc ipc random nfs ]
2.
<kmem_free>
<kmem_alloc>
4.
0x0
5.
<kmem_free>
<kmem_alloc>
Note: The kernel displays a stack traceback on the console to show the
routines that led to the panic and also displays the source of the panic.
After the kernel panics the system, the system reboots. Next, you use the
mdb utility to examine the offending address and thread that caused the
system to panic. To examine the cause of the panic, complete the
following steps:
1.
2.
Use the mdb utility to dump the values of the registers at the time of
the crash:
> $r
Note: The mdb utility automatically pages the output to prevent scrolling.
3.
Press the space bar once, and look for the following register:
%pc = 0x00000000780ff8c0 ksyms_open+0x14
The %pc register contains the address of the instruction that the
processor was executing when the exception or error condition
occurred.
Note: The mdb utility formats the output to display the hexadecimal
address (0x00000000780ff8c0) of the instruction that the processor was
executing. The address is followed by the symbolic name (ksyms_open)
associated with the routine and the hexadecimal offset (+0x14) from the
beginning of that routine.
0x14
The executed instruction was an illegal trap. It resided at an offset of
20 (decimal) bytes, or 0x14, from the beginning of the ksyms_open
routine.
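The conversion from the hexadecimal offset to the decimal byte count can be checked in the shell. The following is a minimal sketch that assumes its argument has the symbol+0xoffset form shown in the %pc line. The function name split_pc_symbol is hypothetical.

```shell
#!/bin/sh
# split_pc_symbol SYMBOL+0xOFFSET
# Print the routine name and the offset converted to decimal.
# Assumption: the argument contains a "+" followed by a hexadecimal offset.
# The function name is hypothetical.
split_pc_symbol() {
    symbol=${1%%+*}        # text before the first "+"
    offset_hex=${1#*+}     # text after the first "+"
    printf '%s %d\n' "$symbol" "$offset_hex"
}

split_pc_symbol 'ksyms_open+0x14'    # prints: ksyms_open 20
```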
5.
Display the pointer to the thread that was executing when the system
panicked:
> panic_thread/K
panic_thread:
panic_thread:   3000178fa80
6.
Run the thread macro along with the pointer to the data structure
address of the thread that was running when the system panicked.
Search for the procp pointer, which is the address of the proc
structure of the process that contains the thread.
> 3000178fa80$<thread
...........<output truncated>
0x3000178fb30:  lpl             intr            did
                142d3b8         0               42363
0x3000178fb50:  tnf_tpdp        tid             waitfor
                30000922490     1               -1
0x3000178fb60:  sigqueue        sig             hold
                0               0               0
0x3000178fb78:  forw            back            thlink
                3000178fa80     3000178fa80     0
0x3000178fb90:  lwp             procp           audit_data
                300017a0e10     300017c0060     0
0x3000178fba8:  next            prev            trace
                3000178ed60     30000d542e0     0
0x3000178fbc0:  whystop         whatstop        dslot
                0               0               0
0x3000178fbc8:  pollstate       pollcache       cred
                0               0               300001cbce8
0x3000178fbe0:  start           lbolt           stoptime
                3cb4b228        7bfcb91         0
0x3000178fbf8:  pctcpu          sysnum          delay_cv
                100000          5               0
0x3000178fc00:  delay_lock
0x3000178fc00:  owner
                0
............<output truncated>
7.
Run the proc2u macro to search the user data structure for the process
that was executing when the system panicked. In the following
output, the psargs field contains the name of the process and any
associated arguments:
> 300017c0060$<proc2u
                auxv
                300017c0398
0x300017c04c8:  start.tv_sec    start.tv_nsec
                3cb4b228        343daa11
0x300017c0390:  execsw          ticks
                140e620         7bfcb89
0x300017c04f1: psargs /usr/ccs/bin/sparcv9/nm
/dev/ksyms\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
0\0\0\0\0
\0\0\0
0x300017c04e0:  comm    nm\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
0x300017c0544:  argc            argv                envp
                2               ffffffff7ffffca8    ffffffff7ffffcc0
0x300017c0558:  cdir            rdir            mem
                300003ebdb8     0               28
0x300017c0570:  cmask           acflag          systrap
                022             0               0
                entrymask
                300017c0578
                exitmask
                300017c059c
0x300017c05c0:  signodefer      sigonstack      sigresethand
                0               0               0
0x300017c05d8:  sigrestart
                0
........<output truncated>
8.
2.
Identify the address of the thread that was running during the panic.
panic_thread:   3000178fa80
3.
Identify the name of the process that was running during the panic.
0x300017c04e0: comm nm
4.
Preparation
Consult the instructor to access the files required for the lab. While
performing the lab exercise, refer to the examples in this module, the
online header files, and the online man pages.
Tasks
Answer the following questions:
1.
2.
List the tasks for which the mdb utility enables you to formulate
complex queries.
3.
4.
5.
2.
3.
Run the $r command to display the registers at the time of the panic.
4.
From the displayed registers, use the %pc (the program counter)
value to display the instruction that caused the system to fail.
6.
7.
8.
Use the address from the output of the ::ps -lt command to
display the thread structure.
9.
Use the address under the procp field with the proc2u macro to
view the command name and arguments that caused the panic.
Exercise Summary
Manage the discussion based on the time allowed for this module, which was provided in the About This
Course module. If you do not have time to spend on discussion, highlight just the key concepts students
should have learned from the lab exercise.
Experiences
Ask students what their overall experiences with this exercise have been. Go over any trouble spots or
especially confusing areas at this time.
Interpretations
Ask students to interpret what they observed during any aspect of this exercise.
Conclusions
Have students articulate any conclusions they reached as a result of this exercise experience.
Applications
Explore with students how they might apply what they learned in this exercise to situations at their workplace.
Exercise Solutions
The following are the solutions for the questions in the exercise:
1.
2.
List the tasks for which the mdb utility enables you to formulate
complex queries.
The mdb utility enables you to formulate complex queries for the following
tasks:
3.
Command-line editing
Command history
Online help
Command pipe-lining
4.
5.
Does not provide support for examining the process core dumps that
are generated in the Solaris 2.4 OE version to the Solaris 2.6 OE
version
2.
3.
Run the $r command to display the registers at the time of the panic.
> $r
4.
From the displayed registers, use the %pc (the program counter)
value to display the instruction that caused the system to fail.
5.
6.
7.
8.
Use the address from the output of the ::ps -lt command to
display the thread structure.
The answer depends on the system crash dump, which varies from system
to system.
9.
Use the address under the procp field with the proc2u macro to
view the command name and arguments that caused the panic.
The answer depends on the system crash dump, which varies from system
to system.
Appendix A
Sample Outputs
This appendix provides sample outputs for the following:
Appendix B
Additional Information
This appendix provides additional information for the following:
Disk
Note: The reset-all command resets the SCSI bus and memory to
ensure an effective probe of the devices.
Figure B-1
Field
Description
Cpu0/Cpu1
CPU{0,1}-OK
FailCode
FHC
SRAM
Static RAM
FPROM
Flash PROM
LabCon
Lab Console
Ovtemp
Overtemp
Bank0
Bank1
DTag0
DTags0 status
DTag1
DTags1 status
JTAG
JTAG status
CntrPl
Centerplane status
DC
Sysio0
SysIO 0 status
Sysio1
SysIO 1 status
FEPS
FEPSFC
SOC
FFB
Description
Sbus0
Sbus1
Sbus2
AC
Address Controller
TODC
Disk0
Disk1
Disk0P
Disk0 Present
Disk1P
Disk1 Present
VDDOK
Fan
Clock
Clock running
Serial
Serial Port
KBytes
PPS-DC
AC
AC power status
ACFan
KeyFan
PSFail
Ovtemp
Overtemp
V5-P
Peripheral 5V
V12-P
Peripheral 12V
V5-Aux
Auxiliary 5V
V5P-PC
Peripheral 5V Precharge
V12-PC
V3-PC
Field
Description
V5-PC
RKFan
3.3V
5.0V
2.
Click Register.
Figure B-2 displays the SunSolve Online home page.
Figure B-2
If the students have access to the SunSolve Online Web site, instruct them to follow steps 1-3 and open the
registration form used to create a SunSolve Online account. Ask students who do not have Internet access to
refer to the figures provided with each step.
Figure B-3
Note: If you are a registered user of the SunSolve Online service, you can
click the Edit hyperlink to modify your current user profile.
4.
Inform students that the registration form is displayed in two parts. Figure B-4 on page B-13 displays the first
part of the form, and Figure B-5 on page B-14 displays the remaining part of the form.
Figure B-4
Figure B-5
Your contract information is checked for authenticity, and you are notified
through an email message when your SunSolve account is activated.
Appendix C
Workshop Exercises
Introduction
The Analysis and Diagnosis Worksheet templates, provided in this
appendix, are similar to those presented in Module 1, Introducing the
Fault Analysis and Diagnosis Methodology. In workshop groups, you
apply the Fault Analysis and Diagnosis methodology described earlier in
Module 1, Introducing the Fault Analysis and Diagnosis Methodology,
and record key observations about the analysis and diagnosis for each
problem.
You are not required to complete any particular number of workshops.
However, it is important to apply a logical fault analysis and diagnosis
methodology to the workshops that you complete.
A worksheet template is provided with each fault in the appendix. You do
not have to complete each field in the worksheet. The amount of
information that you record might vary for each problem.
Preparatory Tasks
If a non-root account does not exist on your system, create one during
your first workshop session. You require the student account for
troubleshooting faults in some workshops and for comparing faults in
other workshops.
Try to use all the troubleshooting tools in your fault analysis and
diagnosis workshops, and explore the use of new utilities.
This appendix contains the following fault worksheets:
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-1 to document the problem description.
Table C-1
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-3 to document the corrective action.
Table C-3
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-4 to document the problem description.
Table C-4
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-6 to document the corrective action.
Table C-6
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-7 to document the problem description.
Table C-7
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-9 to document the corrective action.
Table C-9
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-10 to document the problem description.
Table C-10
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-12 to document the corrective action.
Table C-12
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-13 to document the problem description.
Table C-13
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-15 to document the corrective action.
Table C-15
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-16 to document the problem description.
Table C-16
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-18 to document the corrective action.
Table C-18
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-19 to document the problem description.
Table C-19
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-21 to document the corrective action.
Table C-21
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-22 to document the problem description.
Table C-22
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-24 to document the corrective action.
Table C-24
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-25 to document the problem description.
Table C-25
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-27 to document the corrective action.
Table C-27
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-28 to document the problem description.
Table C-28
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-30 to document the corrective action.
Table C-30
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-31 to document the problem description.
Table C-31
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-33 to document the corrective action.
Table C-33
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-34 to document the problem description.
Table C-34
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-36 to document the corrective action.
Table C-36
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-37 to document the problem description.
Table C-37
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-39 to document the corrective action.
Table C-39
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-40 to document the problem description.
Table C-40
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-42 to document the corrective action.
Table C-42
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-43 to document the problem description.
Table C-43
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-45 to document the corrective action.
Table C-45
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-46 to document the problem description.
Table C-46
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-48 to document the corrective action.
Table C-48
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-49 to document the problem description.
Table C-49
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-51 to document the corrective action.
Table C-51
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-52 to document the problem description.
Table C-52
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-54 to document the corrective action.
Table C-54
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-55 to document the problem description.
Table C-55
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-57 to document the corrective action.
Table C-57
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-58 to document the problem description.
Table C-58
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-60 to document the corrective action.
Table C-60
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-61 to document the problem description.
Table C-61
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-63 to document the corrective action.
Table C-63
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-64 to document the problem description.
Table C-64
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-66 to document the corrective action.
Table C-66
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-67 to document the problem description.
Table C-67
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-69 to document the corrective action.
Table C-69
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-70 to document the problem description.
Table C-70
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-72 to document the corrective action.
Table C-72
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-73 to document the problem description.
Table C-73
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-75 to document the corrective action.
Table C-75
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-76 to document the problem description.
Table C-76
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-78 to document the corrective action.
Table C-78
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-79 to document the problem description.
Table C-79
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-81 to document the corrective action.
Table C-81
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-82 to document the problem description.
Table C-82
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-84 to document the corrective action.
Table C-84
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Customer call #1
The customer complains that the Time-of-Day-Clock checksum value
is destroyed when the machine is power cycled. The system displays
the message Fatal Error Reset and, sometimes, the wrong year. The
customer has an Ultra 3000 workstation running the Solaris 7 OE.
Customer call #2
The customer notices problems with the at and cron utilities on the
Solaris 2.6 OE. Audit records are not properly generated, and the
date 2/29/2000, in particular, causes errors with the at utility.
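When you analyze customer call #2, it helps to confirm that 2/29/2000 is a valid calendar date, because 2000 is a leap year under the Gregorian rule (century years are leap years only when they are divisible by 400). The following Python sketch shows that check. It is a minimal illustration and does not represent how the at utility parses dates; the function names is_leap_year and is_valid_date are hypothetical.

```python
from datetime import date


def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_valid_date(year: int, month: int, day: int) -> bool:
    """Return True if the standard library accepts the calendar date."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


print(is_leap_year(2000))          # True: divisible by 400
print(is_leap_year(1900))          # False: century year not divisible by 400
print(is_valid_date(2000, 2, 29))  # True: the date from customer call #2
```

A date-handling routine that applies only the century exception, and omits the 400-year rule, would wrongly reject this date. Whether that is the cause in this workshop is for you to determine during the Diagnosis phase.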
Problem Statement
Resources
Problem Description
Use Table C-85 to document the problem description.
Table C-85
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-87 to document the corrective action.
Table C-87
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-88 to document the problem description.
Table C-88
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-90 to document the corrective action.
Table C-90
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-91 to document the problem description.
Table C-91
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-93 to document the corrective action.
Table C-93
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-94 to document the problem description.
Table C-94
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-96 to document the corrective action.
Table C-96
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-97 to document the problem description.
Table C-97
Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Likely Causes
Tests
Results
Verification
Corrective Action
Use Table C-99 to document the corrective action.
Table C-99
Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-100 to document the problem description.
Table C-100 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-102 to document the corrective action.
Table C-102 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-103 to document the problem description.
Table C-103 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-105 to document the corrective action.
Table C-105 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-106 to document the problem description.
Table C-106 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-108 to document the corrective action.
Table C-108 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-109 to document the problem description.
Table C-109 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-111 to document the corrective action.
Table C-111 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-112 to document the problem description.
Table C-112 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-114 to document the corrective action.
Table C-114 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-115 to document the problem description.
Table C-115 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-117 to document the corrective action.
Table C-117 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-118 to document the problem description.
Table C-118 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-120 to document the corrective action.
Table C-120 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-121 to document the problem description.
Table C-121 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-123 to document the corrective action.
Table C-123 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-124 to document the problem description.
Table C-124 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-126 to document the corrective action.
Table C-126 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-127 to document the problem description.
Table C-127 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-129 to document the corrective action.
Table C-129 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-130 to document the problem description.
Table C-130 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-132 to document the corrective action.
Table C-132 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-133 to document the problem description.
Table C-133 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-135 to document the corrective action.
Table C-135 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-136 to document the problem description.
Table C-136 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-138 to document the corrective action.
Table C-138 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-139 to document the problem description.
Table C-139 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-141 to document the corrective action.
Table C-141 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-142 to document the problem description.
Table C-142 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-144 to document the corrective action.
Table C-144 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-145 to document the problem description.
Table C-145 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-147 to document the corrective action.
Table C-147 Corrective Action
Final Repair
Communication
Documentation
Analysis Phase
Document the observations made during the Analysis phase.
Problem Statement
Resources
Problem Description
Use Table C-148 to document the problem description.
Table C-148 Problem Description
Error Messages
Symptoms and Conditions
Relevant Changes
Comparative Facts
Diagnosis Phase
Document the observations made during the Diagnosis phase.
Tests
Results
Verification
Corrective Action
Use Table C-150 to document the corrective action.
Table C-150 Corrective Action
Final Repair
Communication
Documentation
Appendix D
Workshop Exercises
This appendix contains the following fault worksheets:
Probable Causes
The following are the probable causes:
Faulty monitor
Disconnected cable
Fault Insertion
Use the setenv command to modify the pcib-probe-list variable to an
invalid value.
The following is an example for a Sun4u PCI-based system:
ok printenv pcib-probe-list
ok setenv pcib-probe-list 1,3
ok reset
The system restarts with a blank monitor. Students might use the Stop-N
key sequence during power on to set the default values of the
environment variables. However, encourage students to debug the system
and analyze the cause of the problem.
Possible Fixes
Complete the following steps:
1.
2.
3.
Use the setenv command to set the probe list variable to its default
value.
Learning
Set up a remote diagnostic session with the tip connection to perform
diagnostics on a remote system.
Probable Causes
The following are the probable causes:
Fault Insertion
Complete the following steps:
1.
Modify the /etc/system file to reflect a root device that does not
exist.
2.
3.
Possible Fixes
Complete the following steps:
1.
2.
3.
Learning
Learn one of the many uses of the /etc/system file. In this example, you
use the /etc/system file to change the root device after loading the
initial boot and the kernel from a different device.
Probable Causes
The following are the probable causes:
Operator error
Fault Insertion
Modify the /proc entry option field in the /etc/vfstab file.
Complete the following steps to modify the /etc/vfstab file:
1.
2.
/proc - /proc proc - no -
to
/proc - /proc proc - no suid
3.
4.
Possible Fixes
The modification in the /etc/rcS.d directory causes the root file system
to be mounted in read-only mode. Therefore, the edit session becomes
more complex because students cannot edit files on the root file system.
To debug the problem, students must boot the system from a CD-ROM.
To boot the system from a CD-ROM, complete the following steps:
1.
2.
3.
4.
5.
Learning
Learn about the files required for system operations, and review the
contents of the /etc/vfstab file.
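When reviewing the /etc/vfstab file, it can help to print only the mount options field of a single entry. The following is a minimal sketch, assuming the standard seven-field vfstab layout (device to mount, device to fsck, mount point, file system type, fsck pass, mount at boot, mount options). The function name vfstab_options is hypothetical and is not part of the Solaris OE.

```shell
# vfstab_options MOUNT_POINT [FILE]
# Hypothetical helper: print the mount options field (field 7) of the
# entry whose mount point (field 3) matches MOUNT_POINT.
# FILE defaults to /etc/vfstab; comment lines are skipped.
vfstab_options() {
    awk -v mp="$1" '$1 !~ /^#/ && $3 == mp { print $7 }' "${2:-/etc/vfstab}"
}
```

For example, vfstab_options /proc prints the options of the /proc entry. A hyphen means that no options are set.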
Probable Causes
The probable cause is an operator error.
Fault Insertion
Complete the following steps:
1.
2.
3.
4.
5.
2.
3.
4.
5.
6.
Use the vi editor to open the /etc/inittab file, and restore the
original settings.
2.
3.
4.
Use the vi editor to open the /etc/inittab file, and restore the
original settings.
Learning
Learn about the /etc/inittab file that you use during the boot sequence.
Probable Causes
The following are the probable causes:
System overload
Fault Insertion
Modify the /etc/passwd file to reflect either an improper shell or no shell
for the new user, and then reboot the system.
Complete the following steps to edit the /etc/passwd file:
1.
2.
Possible Fixes
Log in as the root user, and correct the invalid shell entry in the
/etc/passwd file.
Learning
Learn about the /etc/passwd file and the significance of the parameters
specified in the file.
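The shell field is the seventh colon-separated field of each /etc/passwd entry. The following is a minimal sketch of a check for entries with a missing or non-executable shell. The function name bad_shells is hypothetical, and the sketch assumes a plain local passwd file without name service entries.

```shell
# bad_shells [FILE]
# Hypothetical helper: report every account whose shell field (field 7)
# is empty or names a file that is not executable on this system.
# FILE defaults to /etc/passwd.
bad_shells() {
    while IFS=: read -r name _pw _uid _gid _gecos _home shell; do
        if [ -z "$shell" ]; then
            echo "$name (no shell set)"
        elif [ ! -x "$shell" ]; then
            echo "$name $shell"
        fi
    done < "${1:-/etc/passwd}"
}
```

Run the function as the root user against the suspect file, and compare the reported accounts with the account that fails to log in.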
Probable Causes
The following are the probable causes:
Security software
Fault Insertion
Complete the following steps:
1.
2.
Possible Fixes
Complete the following steps:
1.
2.
Learning
When the CONSOLE=/dev/console parameter is enabled in the
/etc/default/login file, the root user can log in only at the system
console. Commenting out the parameter allows remote root logins.
Probable Causes
The probable cause is the incorrect execution of the install, ifconfig,
or sys-unconfig command.
Fault Insertion
Edit the /etc/hosts file to change the digit 1 in each IP address to the
lowercase letter l.
Complete the following steps:
1.
2.
4.
Possible Fixes
To restore the /etc/hosts file, type the following command:
# cp /var/tmp/.hosts /etc/hosts
Learning
Learn about the files that you must check for configuration errors when
network operations are faulty. Use fault analysis techniques to isolate
problems.
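A letter l in place of the digit 1 is hard to see on screen, so a mechanical check of the address field is useful. The following is a minimal sketch, assuming that the hosts file holds only dotted-decimal IPv4 addresses. The function name check_hosts is hypothetical.

```shell
# check_hosts FILE
# Hypothetical helper: print every entry whose address field contains a
# character other than a digit or a period. Blank lines and comment
# lines are skipped. IPv6 entries, if present, are also reported.
check_hosts() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }
        $1 !~ /^[0-9.]+$/ { printf "%s:%d: suspect address %s\n", FILENAME, NR, $1 }
    ' "$1"
}
```

For example, check_hosts /etc/hosts prints nothing for a clean file and one line for each suspect entry.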
Probable Causes
The probable cause is the incorrect or missing
/etc/rcS.d/S40standardmounts.sh file.
Fault Insertion
Move the /etc/rcS.d/S40standardmounts.sh file to a different
location or rename the file by typing the following commands:
# cd /etc/rcS.d
# mv S40standardmounts.sh S40standardmounts
Possible Fixes
Complete the following steps to repair the fault:
1.
Boot the system in single-user mode, and log in as the root user.
2.
3.
mount -o rw,remount /
cd /etc/rcS.d
mv S40standardmounts S40standardmounts.sh
uadmin 2 1
Learning
Learn about the various startup scripts and their significance.
Probable Causes
The following are the probable causes:
Corrupt software
Fault Insertion
Modify the entry in the /etc/nsswitch.conf file.
1.
Change the lines for the passwd and group file entries in the
/etc/nsswitch.conf file:
Before alteration:
passwd: files
group: files
After alteration:
passwd: dns
group: dns
2.
Note: When a user logs in, the system authenticates the user information
from the passwd and shadow files. If the passwd and group entries in the
/etc/nsswitch.conf file no longer list files as a source, the system
cannot look up the user and fails to authenticate the user.
Possible Fixes
You must boot the system from the CD-ROM to fix the preceding fault.
Complete the following steps to boot the system from the CD-ROM:
1.
2.
3.
Type the following commands to boot the system from the CD-ROM
in single-user mode:
ok boot cdrom -s
4.
Repair the file system by typing the following fsck command:
# fsck /dev/dsk/c0t0d0s0
where c0t0d0s0 is the root file system.
5.
Mount the root file system onto the /a directory by typing the
following command:
# mount /dev/dsk/c0t0d0s0 /a
6.
7.
Note: If the system does not boot, you might have to reinstall the Solaris
OE.
Learning
When a user cannot log in to the system, first check the settings in the
/etc/nsswitch.conf file.
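The check of the /etc/nsswitch.conf file can be scripted. The following is a minimal sketch that prints the sources configured for a database and tests whether files is among them. The function names nss_sources and nss_uses_files are hypothetical.

```shell
# nss_sources DATABASE [FILE]
# Hypothetical helper: print the sources listed for DATABASE
# (for example, passwd or group). FILE defaults to /etc/nsswitch.conf.
nss_sources() {
    awk -v db="$1:" '$1 == db { $1 = ""; sub(/^ +/, ""); print }' "${2:-/etc/nsswitch.conf}"
}

# nss_uses_files DATABASE [FILE]
# Succeed (exit status 0) only if "files" is one of the listed sources.
nss_uses_files() {
    nss_sources "$@" | awk '
        { for (i = 1; i <= NF; i++) if ($i == "files") found = 1 }
        END { exit !found }
    '
}
```

When the system is booted from the CD-ROM with the root file system mounted on the /a directory, pass /a/etc/nsswitch.conf as the second argument.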
Probable Causes
The following are the probable causes:
Network problem
Fault Insertion
Disable the entry for the ftp service in the /etc/inetd.conf file by
completing the following steps:
1.
2.
Edit the lines related to the ftp service to disable the ftp service:
Before edit:
ftp stream tcp nowait root /usr/sbin/in.ftpd in.ftpd
After edit:
#ftp stream tcp nowait root /usr/sbin/in.ftpd in.ftpd
3.
Possible Fixes
Complete the following steps:
1.
2.
Learning
Learn about the files that are essential to provide network services. In
addition, learn how to restrict network services by editing the appropriate
files.
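Whether a service is enabled in the /etc/inetd.conf file depends on whether its entry is commented out. The following is a minimal sketch that reports the state of one service. The function name inetd_state is hypothetical, and the sketch recognizes only entries that are disabled by a # character placed directly before the service name.

```shell
# inetd_state SERVICE [FILE]
# Hypothetical helper: print "enabled", "disabled", or "absent" for
# SERVICE, based on the first matching entry. FILE defaults to
# /etc/inetd.conf.
inetd_state() {
    awk -v svc="$1" '
        $1 == svc         { print "enabled";  found = 1; exit }
        $1 == ("#" svc)   { print "disabled"; found = 1; exit }
        END               { if (!found) print "absent" }
    ' "${2:-/etc/inetd.conf}"
}
```

For example, inetd_state ftp prints disabled after the fault in this exercise is inserted.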
Probable Causes
The following are the probable causes:
Fault Insertion
Remove the group and other access permissions from the /tmp directory by
typing the following commands:
# init 0
At the ok prompt, type the following:
ok boot -s
# chmod 1700 /tmp
The /tmp directory provides read and write permissions to all users by
default. A number of commands generate temporary files during
execution. Any command that creates temporary files fails because of
modified access rights.
Possible Fixes
Reset the permissions on the /tmp directory. The sys user and group must
own the /tmp directory, and the directory must have access rights of 1777
with the sticky bit enabled. Type the following commands to reset access
permissions on the /tmp directory:
# init 0
ok boot -s
# chmod 1777 /tmp
Note: The sticky bit ensures that only the owner of a file (or the root
user) can delete or rename that file in the /tmp directory.
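The difference between the faulty mode and the correct mode is visible in the output of the ls -ld command. The following is a minimal sketch that applies both modes to a scratch directory rather than to the real /tmp directory. It assumes that the mktemp -d command is available, and the variable names are illustrative.

```shell
# Apply the faulty mode (1700) and the correct mode (1777) to a scratch
# directory, and capture the first ten characters of the ls -ld output.
scratch=$(mktemp -d)
chmod 1700 "$scratch"
faulty=$(ls -ld "$scratch" | cut -c1-10)
chmod 1777 "$scratch"
fixed=$(ls -ld "$scratch" | cut -c1-10)
echo "1700 -> $faulty"
echo "1777 -> $fixed"
rmdir "$scratch"
```

A trailing t in the mode string shows that the sticky bit is set together with execute permission for other users. A capital T shows that the sticky bit is set but other users have no execute permission.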
Learning
You can locate the SunSolve documents related to the preceding fault
because a relevant bug exists in an earlier release of the Solaris OE.
Probable Causes
The following are the probable causes:
Network problem
Fault Insertion
Complete the following steps:
1.
2.
Possible Fixes
To fix the fault, restore the correct IP address in the /etc/hosts file.
Learning
Learn about the files in which you must check for configuration errors
when network operations are faulty.
Probable Causes
The probable cause is that the user corrupted or accidentally deleted a
device file or installed the device file incorrectly.
Fault Insertion
Corrupt the /devices/pseudo/conskbd@0:kbd device file, and then
reboot the system.
To modify the /devices/pseudo/conskbd@0:kbd file, type the following
commands:
# cd /devices/pseudo
# mv conskbd@0:kbd conskbd@0:kbd.old
# ln pts2l@ttyrf conskbd@0:kbd
The /devices/pseudo/conskbd@0:kbd file is the console keyboard device
file, which the CDE environment requires. If you move or corrupt the file,
the system fails to start the CDE environment.
Possible Fixes
Compare the faulty system with a functional system, especially the
directory trees in the /dev and /devices directories. A reconfiguration
reboot fixes the problem. However, students must try to determine the
corrupt file.
Restore the correct device file by typing the following command:
# devfsadm -C
Learning
Determine which device files are required for proper console operation,
including the CDE and OpenWindows environments.
Probable Cause
The probable cause is that the administrator accidentally deleted the
startup file for the desktop environment.
Fault Insertion
Type the dtconfig command with the -d option to disable the daemon.
This disables the S99dtlogin script in the /etc/rc2.d directory.
2.
Possible Fixes
Complete the following steps:
1.
Either copy the S99dtlogin file from another system, or type the
following command to enable the daemon:
# /usr/dt/bin/dtconfig -e
2.
Learning
Learn about CDE configuration and administration, including the default
settings and the command interface.
Probable Causes
The following are the probable causes:
Fault Insertion
Edit the /etc/passwd file, and modify the name of the root login.
Complete the following steps:
1.
2.
3.
Possible Fixes
Complete the following steps:
1.
2.
Repair the root file system by typing the following fsck command:
# fsck /dev/dsk/c0t0d0s0
where /dev/dsk/c0t0d0s0 is the root file system.
3.
Mount the root file system onto the /a directory by typing the
following command:
# mount /dev/dsk/c0t0d0s0 /a
4.
5.
6.
Learning
The Solaris OE cannot run without a valid root account.
Probable Causes
The following are the probable causes:
Fault Insertion
Complete the following steps:
1.
2.
3.
Possible Fixes
Verify the network hardware connections, and check the network files to
ensure that you specify correct hosts and IP addresses.
In this exercise, remove the tape from pins 1, 2, and 3 of the RJ connector.
Learning
Use diagnostic checks to determine the cause of the problem.
Probable Causes
The following are the probable causes:
Defective cables
Fault Insertion
Complete the following steps:
1.
2.
3.
4.
Possible Fixes
Restore the account information for the lp account in the /etc/passwd
file.
Learning
The lp account in the /etc/passwd file is necessary to run a network
printer successfully.
Probable Cause
The probable cause is that the system administrator accidentally deleted
the entry for the system name in the /etc/hosts file.
Fault Insertion
Remove or modify the host name of the system in the /etc/hosts file.
Complete the following steps to edit the /etc/hosts file:
1.
2.
localhost
forward loghost
hammer
After edit:
l27.0.0.l localhost
l29.l50.28.39 forward loghost
129.150.182.68 hammer11
# touch -am 01121234 *
3.
Possible Fixes
Restore the correct settings in the /etc/hosts file or replace the correct
/etc/hosts file by typing the following command:
# cp /var/tmp/.hosts /etc/hosts
# init 6
Learning
Learn about system files for network operations. Use fault analysis
techniques to isolate problems.
Probable Cause
The following are the probable causes:
Corrupt rc scripts
Fault Insertion
Change the rcS script to point to the /etc/fstab file instead of the
/etc/vfstab file. The substitute fstab file mounts the root file system in
read-only mode.
Note: Set the TERM parameter. The TERM parameter enables you to edit
files by using the vi editor.
To corrupt the rcS script, complete the following steps:
1.
2.
Edit the /etc/fstab file to make the root file system read-only:
# vi /etc/fstab
Before edit:
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no
After edit:
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no ro
Possible Fixes
1.
To repair the preceding fault, you first boot the system in single-user
mode from the CD-ROM, and then repair the rc scripts by using the
following commands:
ok boot cdrom -s
# fsck /dev/dsk/c0t0d0s0
# mount /dev/dsk/c0t0d0s0 /a
2.
Learning
Insert echo statements in the rc scripts to determine the source of the
problem. This is similar to single-stepping through the rc script
execution.
Probable Cause
The probable cause is that the user corrupted or accidentally deleted the
device file or installed the device file incorrectly.
Fault Insertion
Corrupt the /devices/pseudo/consms@0:mouse device file, and reboot
the system.
To modify the /devices/pseudo/consms@0:mouse file, type the
following commands:
# cd /devices/pseudo
# mv consms@0:mouse consms@0:mouse.old
# touch consms@0:mouse
Possible Fixes
The /devices/pseudo/consms@0:mouse file is the device file that you
use to activate the mouse. If you move or corrupt the file, the system fails
to start the CDE environment.
Restore the correct device file by typing the following command:
# devfsadm -C
Compare the faulty system with a functional system, especially the
directory trees in the /dev and /devices directories. A reconfiguration
reboot fixes the problem. However, students must try to determine the
location of the corrupt file.
Learning
Determine the device files required for console operation, including the
CDE and OpenWindows environments.
Probable Cause
The probable cause is that the system administrator modified the network
files incorrectly.
Fault Insertion
Corrupt the /etc/nsswitch.conf file to address the wrong services.
Complete the following steps:
1.
2.
3.
4.
Possible Fixes
Select the correct name services and related files, and restore the modified
files. Restart the system.
Learning
Learn the types of problems that occur if you specify the wrong services
in the nsswitch.conf file.
Probable Causes
The following are the probable causes:
Hardware error
Fault Insertion
Complete the following steps:
1.
2.
Possible Fixes
In this exercise, locate and remove the K99.dtdown file, and restart the
system.
Learning
Familiarize yourself with the ifconfig command and the rc scripts.
Probable Cause
The probable cause is that the system administrator inadvertently
renamed a file.
Fault Insertion
Rename the /var/sadm/system/admin/INST_RELEASE file as
/var/sadm/system/admin/inst_release.
Possible Fixes
Rename the inst_release file to INST_RELEASE.
Learning
Use the truss command to trace applications about which you have no
prior knowledge.
Probable Causes
The probable cause is a corrupt boot block, boot file (/ufsboot), or kernel
(/kernel/unix).
Fault Insertion
Corrupt the boot block or boot file, or move the file to a different location.
Type the following commands to move the boot file to another location:
# mkdir /saved
# mv /platform/`uname -i`/ufsboot /saved
# reboot
Possible Fixes
Provide an alternative boot block to students for booting the system.
Boot the system in single-user mode from the CD-ROM, and then type the
following commands:
# fsck /dev/dsk/c0t0d0s0
# mount /dev/dsk/c0t0d0s0 /a
# cd /platform/`uname -i`
2.
3.
4.
Learning
Learn about the files related to the boot sequence and how to restore the
files using a CD-ROM.
Probable Causes
The following are the probable causes:
Corrupt kernel
Operator error
Fault Insertion
Modify the /etc/system file.
Complete the following steps to corrupt the /etc/system file:
1.
2.
3.
Possible Fixes
Complete the following steps:
1.
2.
When the system prompts for the system file during the boot
sequence, type the following:
/var/tmp/.system
3.
Learning
Learn about the /etc/system file. As a system administrator, you can
modify kernel parameters during the boot sequence in the /etc/system
file.
Probable Causes
The following are the probable causes:
Network problem
Fault Insertion
Complete the following steps:
1.
2.
3.
Possible Fixes
Complete the following steps:
1.
2.
Learning
Learn about the files that you must check for configuration errors when
network operations are faulty.
Error Symptoms/Conditions/Messages
None recorded.
Fault Insertion
In this exercise, the students insert the fault.
Probable Cause
The following are the probable causes:
Resource shortage
Diagnostic Steps
Use the following procedure to determine the reason why your system
hangs.
1.
2.
3.
4.
5.
6.
7.
After the system reboots, log in, and type the following command in
a shell window:
# cd <default dump directory>
8.
Note: The variable n increments each time a system saves a crash dump.
9.
Expected Fix
A workaround is to not run the guilty_party program until CPU resources
are available. Determine whether this process usually consumes this much
CPU time, or run the guilty_party program as a timesharing process to see
whether the problem still occurs or whether it is a function of real-time
scheduling.
Learning
The students learn how to determine the cause of a hung system.
Note: The preceding steps generate a system crash dump that varies from
system to system.
Probable Cause
The following are the probable causes:
A Trojan Horse
A cron job
A faulty rc script
A faulty at script
Fault Insertion
Start an at process that calls the init 5, halt, or reboot command. An
email message is sent with an indication of the problem.
1.
Possible Fix
The at -l command lists the queued at jobs. After you locate the job, you
can read and remove its execution script. You must also examine the rc
scripts.
Use the following commands to repair the fault:
ok boot -s (remain in single-user mode)
# cd /bin
Remove the tst file.
# rm /bin/tst
# reboot
Learning
Learn to trace rc scripts and the cron and at (at -l) commands. In
addition, read the email messages sent to the root user. These messages
are an often overlooked source of information because each at job sends
an email message to the root user.
Customer call #1
The customer complains that the Time-of-Day-Clock checksum value
is destroyed during the process of power cycling the system. The
message Fatal Error Reset and, sometimes, the wrong year is
displayed. The customer has an Ultra 3000 workstation running the
Solaris 7 OE.
Customer call #2
The customer notices problems with the at and cron utilities on the
Solaris 2.6 OE. Audit records are not properly generated, and the
date 2/29/2000, in particular, causes errors with the at utility.
Probable Cause
The probable cause is the need for a flash PROM update, patch numbers
103346-08 and 103346-02 or patch numbers 105393-07 and 105621-04 or
above.
Possible Fix
Locate the patch and bug report information, and insist that the customer
install the relevant patches.
Learning
Use available resources to efficiently diagnose problems for which
solutions already exist.
Probable Causes
The probable cause is a corrupt file system.
Fault Insertion
Complete the following steps to insert the fault:
1.
2.
3.
2.
Possible Fixes
1.
2.
Boot the system from the CD-ROM, and restore the superblock.
Learning
Locate and use an alternative superblock by using the newfs -N command
with the fsck command.
Probable Causes
The following are the probable causes:
Fault Insertion
Add an entry for the /home directory as an /auto_home mount in the
/etc/auto_master file. To edit the /etc/auto_master file, complete the
following steps:
1.
Open the /etc/auto_master file, and add an entry for the /home
directory:
# vi auto_master
/home auto_home
2.
Possible Fixes
Remove the entry for the /home directory from the /etc/auto_master
file, and restart the daemon.
Learning
Learn the types of problems that are generated by the automount entries.
Probable Causes
The following are the probable causes:
Fault Insertion
Use the route -f command to flush the routing table. Try connecting to
other systems in some other network.
Possible Fixes
Check for the default route entry by using the netstat -r command.
If the entry is not present, add the route by using the
route add default <ipaddress> command.
Create the /etc/defaultrouter file, if it does not exist, and add the IP
address of the default router.
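The check for a default route entry can be scripted by filtering the routing table listing. The following is a minimal sketch that reads the output of the netstat -rn command on standard input. The function name has_default_route is hypothetical. The sketch treats a destination of default or 0.0.0.0 as the default route, which is an assumption about the listing format.

```shell
# has_default_route
# Hypothetical helper: read a routing table listing on standard input,
# and succeed (exit status 0) only if a default route entry is present.
has_default_route() {
    awk '
        $1 == "default" || $1 == "0.0.0.0" { found = 1 }
        END { exit !found }
    '
}
```

For example, netstat -rn | has_default_route || echo "no default route" prints the message only when the entry is missing.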
Learning
Learn about the files that you must check for configuration errors when
network operations are faulty.
Probable Causes
The following are the probable causes:
Operator error
File corruption
Fault Insertion
Ensure that the /.dtprofile file of the user exists within the root
directory. Modify this file to insert the fault.
If any student is working in the OpenWindows environment rather than
in CDE, make a similar modification in the /.profile or /.login file,
depending on the shell.
To edit the /.dtprofile file of the user, complete the following steps:
1.
2.
Edit the /.dtprofile file to append the exit command in the file.
3.
Possible Fixes
Log in as the root user, and edit the /.dtprofile file of the user to
remove the exit command. Alternatively, you can use the command-line
login to edit the /.dtprofile file.
If the fault occurs for the root user, you must boot the system from the
CD-ROM and edit the file.
Learning
Learn about the files that affect the login sequence, and reinforce the
procedure for examining and fixing problems.
Fault Insertion
The students learn to analyze a system crash dump generated by
Fault #27. This workshop is slightly different from the others in
Appendix D because here you use the mdb utility to examine the offending
address and thread that caused the system to panic.
Use the mdb utility to achieve the following:
Identify the address of the thread that was running at the time of the
panic
Identify the name and arguments of the process that was running at
the time of the panic
Diagnostic Steps
Use the following procedure for determining the reason for your hung
system.
1.
2.
> $r
%g0 = 0x0000000000000000                      %l0 = 0x0000000001400000 cpu0
%g1 = 0x000000000103931c prom_enter_mon+0x2c  %l1 = 0x000000000142a2c8 cpu
%g2 = 0x0000000000000000                      %l2 = 0x000000000140c000
%g3 = 0x0000000000000001                      %l3 = 0x0000000000000001
%g4 = 0x00000000014ade58 keyindex_s4          %l4 = 0x0000000000000016
%g5 = 0x0000000000007000                      %l5 = 0x000000000000001e
%g6 = 0x0000000000000000                      %l6 = 0x0000000000000016
%g7 = 0x000002a10007dd40                      %l7 = 0x0000000000000000
%o0 = 0x0000000001000000 scb                  %i0 = 0x00000000f0066d2c
%o1 = 0x0000000000000016                      %i1 = 0x000002a10007d6e8
%o2 = 0x00000000f0000000                      %i2 = 0x00000000f0066d2c
%o3 = 0x0000000000000000                      %i3 = 0x0000000000000006
%o4 = 0x000000000142ac00 cp_list_head         %i4 = 0x0000000000000000
%o5 = 0x0000000001437800 p0+0x8c8             %i5 = 0x000003000004f270
%o6 = 0x000002a10007cd81                      %i6 = 0x000002a10007ce31
%o7 = 0x00000000010077cc client_handler+0x2c  %i7 = 0x000000000103931c prom_enter_mon+0x2c
%ccr = 0x88 xcc=Nzvc icc=Nzvc
%fprs = 0x00 fef=0 du=0 dl=0
%asi = 0x00
%y = 0x0000000000000000
%pc = 0x00000000f0050c7c
%npc = 0x00000000f0050c80
%sp = 0x000002a10007cd81 unbiased=0x000002a10007d580
%fp = 0x000002a10007ce31
%tick = 0x0000000000000000
%tba = 0x0000000000000000
%tt = 0x17f
From the displayed registers, use the %pc (the program counter)
value to display the instruction that caused the system to fail.
5.
> ::status
debugging crash dump vmcore.3 (64-bit) from mako
operating system: 5.9 Generic (sun4u)
panic message: sync initiated
dump content: kernel pages only
>
6.
> ::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R      0      0      0      0      0 0x00000019 0000000001436f38 sched
        T  t0 <TS_STOPPED>
        L  lwp0 ID: 1
R      3      0      0      0      0 0x00020019 0000030000576008 fsflush
        T  0x300005737c0 <TS_RUN>
        L  0x300005714a8 ID: 1
R      2      0      0      0      0 0x00020019 0000030000576a20 pageout
        T  0x30000573a60 <TS_SLEEP>
        L  0x30000571818 ID: 1
R      1      0      0      0      0 0x00004008 0000030000577438 init
        T  0x30000573d00 <TS_SLEEP>
        L  0x30000571b88 ID: 1
R    439      1    412    412      0 0x00014008 0000030000ce4a98 guilty_party
        T  0x30000cead20 <TS_ONPROC>
        L  0x30000dc1190 ID: 1
R    426      1    426    426      0 0x10010008 0000030000aa9468 sendmail
        T  0x30000aacfc0 <TS_RUN>
        L  0x30000a8f4e0 ID: 1
R    424      1    424    424     25 0x10010008 0000030000ce4080 sendmail
        T  0x30000dfd7c0 <TS_SLEEP>
        L  0x30000dfb508 ID: 1
..........<Output truncated>
> 0x30000cead20$<thread
0x30000cead20:  link            stk             startpc
                0               2a10055daf0     0
0x30000cead38:  bound_cpu       affinitycnt     bind_cpu
                0               0               -1
0x30000cead44:  flag            proc_flag       schedflag
                1000            4               3
0x30000cead4a:  preempt         preempt_lk      state
                0               0               4
0x30000cead50:  pri             epri
                100             0
0x30000cead58:  pc              sp
                1007254         2a10055d2f1
0x30000cead68:  wchan0          wchan           sobj_ops
                0               0               0
0x30000cead80:  cid             clfuncs         cldata
                4               1480d08         30000e3e640
0x30000cead98:  ctx             lofault         onfault
                0               0               0
0x30000ceadb0:  ontrap          swap            lock
                0               2a10055a000     ff
0x30000ceadc2:  pil             pi_lock         cpu
                0               0               1400000
0x30000ceadd0:  lpl             intr            did
                142cff0         0               1258
0x30000ceadf0:  tnf_tpdp        tid             waitfor
                30000d08050     1               -1
0x30000ceae00:  sigqueue        sig             hold
                0               0               2000000000000
0x30000ceae18:  forw            back            thlink
                30000cead20     30000cead20     0
0x30000ceae30:  lwp             procp           audit_data
                30000dc1190     30000ce4a98     0
0x30000ceae48:  next            prev            trace
                30000dfcd40     30000ceb260     0
.....<Output truncated>
Use the address under the procp field with the proc2u macro to
view the command name and arguments that caused the panic.
Note: The address under the procp field is the address of the proc
structure.
For example:
> 30000ce4a98$<proc2u
                auxv
                30000ce4dd0
0x30000ce4f00:  start.tv_sec    start.tv_nsec
                3cda4a8d        28d191b4
0x30000ce4dc8:  execsw          ticks
                140e228         86b1
0x30000ce4f29: psargs /bin/csh -f
/tmp/guilty_party\0\0\0\0\0\0\0\0\0\0\0\0\0\
0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
0\0\0\0\0\0
0x30000ce4f18:  comm            guilty_party\0\0\0\0\0
0x30000ce4f7c:  argc            argv            envp
                3               ffbff6e4        ffbff6f4
0x30000ce4f90:  cdir            rdir            mem
                30000a5eed8     0               23ef7
0x30000ce4fa8:  cmask           acflag          systrap
                022             02              0
                entrymask
                30000ce4fb0
                exitmask
                30000ce4fd4
0x30000ce4ff8:  signodefer        sigonstack      sigresethand
                8000000000000001  0               8000000000000001
0x30000ce5010:  sigrestart
                2000000000000
..........<Output truncated>
You can cross-check the preceding information by displaying the message
buffer from the time of the panic.
For example:
> $<msgbuf
SunOS Release 5.9 Version Generic 64-bit
0x3000006d8a3: Copyright 1983-2002 Sun Microsystems, Inc. All rights reserved.
pseudo-device: vol0
vol0 is /pseudo/vol@0
sd0 at uata0: target 2 lun 0
sd0 is /pci@1f,0/pci@1,1/ide@3/sd@2,0
fd0 at ebus0: offset 14,3023f0
fd0 is /pci@1f,0/pci@1,1/ebus@1/fdthree@14,3023f0
se0 at ebus0: offset 14,400000
se0 is /pci@1f,0/pci@1,1/ebus@1/se@14,400000
pseudo-device: pm0
pm0 is /pseudo/pm@0
panic[cpu0]/thread=2a10007dd40:
sync initiated
0x300003f54a0: sched:
0x300003f5c20: software trap 0x7f
0x300004004e0: pid=0, pc=0xf0050c7c, sp=0x2a10007cd81,
tstate=0x8800001402, context=0x8c0
0x30000400c60: g1-g7: 103931c, 0, 1, 14ade58, 7000, 0, 2a10007dd40
0x300004013e0:
0x30000401b63: 00000000fffa9d00 unix:sync_handler+12c (fff9b840,
1000000, 1412d55, fffe0000, f003bda6, 1437800)
0x3000006c363:
%l0-3: 0000000000000001 000000000103931c
00000000f0000000 00000000fffe0000
%l4-7: 00000000f0050c28 00000000f006729c 00000000fffefd28
00000000fffeef98
0x3000006cae3: 00000000fffa9de0 unix:vx_handler+8c (fff9b840,
2a10007d6e8, f0066d2c, 6, 0, 3000004f270)
0x3000006d263:
%l0-3: 000000000102768c 0000000000000080
00000000014173b8 00000000f0000000
%l4-7: 0000000000000016 000000000000001e 0000000000000016
0000000000000000
0x3000006d9e3: 00000000fffa9e90 unix:callback_handler+20 (fff9b840,
fffde280, 0, 0, 0, 0)
0x300007de123:
%l0-3: 0000000000000016 00000000fffa9741
000000000004a238 00000000ffbff187
%l4-7: 0000000000000000 0000000000000000 0000000000000000
0000000000000000
0x30000e97d60:
0x30000e97ae3: syncing file systems...
0x30000e97863:
done
0x30000e975e3: dumping to /dev/dsk/c0t0d0s1, offset 107479040, content:
kernel
0x30000e97360: WARNING: timeout: reset bus chno = 0 targ = 0
Learning
The students learn to analyze a system dump. This familiarizes them with
the kernel structures that they must examine when analyzing a system
crash or a hung system.
Probable Causes
The following are the probable causes:
Fault Insertion
Enter the user name in the /etc/ftpd/ftpusers file.
Note: The ftpusers file contains the names of users who are not
authorized to use the ftp service.
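One quick way to confirm this fault is to test whether a login name appears in the file. The following is a minimal sketch, assuming a POSIX-style shell. The function name ftp_user_denied is hypothetical, and the default path is the one named in this exercise.

```shell
# Sketch only: ftp_user_denied is a hypothetical helper, not a Solaris command.
# Returns 0 if the login name is on a line by itself in the ftpusers file.
ftp_user_denied() {
    user=$1
    file=${2:-/etc/ftpd/ftpusers}    # path taken from the exercise text
    # Match the whole line so that "guest" does not match "guest1".
    grep "^${user} *\$" "$file" >/dev/null 2>&1
}
```

For example, ftp_user_denied guest1 && echo "guest1 cannot use ftp" prints the message only when guest1 is listed in the file.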
Possible Fixes
The following are the possible fixes:
1.
2.
Learning
Learn about the files that you must check for configuration errors when
network operations are faulty.
Probable Causes
The following are the probable causes:
Bad procedure
Fault Insertion
Complete the following steps to share the CD-ROM in the wrong way:
1.
2.
3.
4.
Possible Fix
Complete the following steps to share the CD-ROM properly:
1.
2.
3.
4.
Learning
Learn to share the CD-ROM files correctly.
Error Symptoms/Conditions/Messages
The pg command hangs.
Probable Cause
The probable cause is either a bad command or a bad device.
Fault Insertion
Change the major and minor numbers for the tty drivers.
Complete the following steps:
1.
su:x:0:1::/usr/su:/sbin/sh
guest1:x:12:10::/export/home/guest1:/bin/csh
b.
root:YX4pytcVVZF2k:9555::::::
su::9555::::::
guest1:oWU/elsH4pe6E:::::::
2.
Note: Do not reboot now. The students eventually reboot and then notice
that they cannot log in. You can then tell them about the su user and let
them figure out that the su user does not require a password.
Possible Fix
Fix the tty device entry in the /devices/pseudo directory. Use the truss command for this
analysis.
Complete the following steps to fix the bug:
A reconfiguration boot fixes the problem. However, this should not
be accepted as a solution unless students locate the fault and
associated file specifically.
1.
2.
Learning
What appears to be a minor problem can actually be something quite
disastrous. The truss command is useful on both the passwd and pg
commands.
Error Symptoms/Conditions/Messages
None recorded.
Probable Cause
The following are the probable causes:
File corruption
Fault Insertion
Complete the following steps to insert the fault:
1.
Ensure that the .dtlogin file exists within the root directory.
2.
Edit the /.dtlogin file using the vi editor by adding a line at the
end of the file that invokes the exit command. There are many
comments in the file. Scroll to the bottom of the file, and type the
following on one line:
exit
3.
Possible Fix
Press Control-C to stop the login script from executing. Then, boot from
the CD-ROM to edit the correct files.
You cannot log in as the root user; therefore, you must boot the faulty
system from the CD-ROM or from an available server. Mount the root
partition in this environment, and edit the .dtlogin file in the root
directory by removing the line that invokes the exit command.
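The edit described above can also be scripted. The following is a minimal sketch, assuming a POSIX-style shell in the CD-ROM environment. The function name strip_exit_lines is hypothetical, and /a is only an assumed mount point for the root partition.

```shell
# Sketch only: strip_exit_lines is a hypothetical helper.
# Keeps a copy of the file as FILE.orig, then rewrites FILE without any
# line that consists solely of the word "exit".
strip_exit_lines() {
    file=$1
    cp "$file" "$file.orig" || return 1
    grep -v '^[[:space:]]*exit[[:space:]]*$' "$file.orig" > "$file"
    # grep returns 1 when every line was removed; only 2 or more is an error.
    [ $? -le 1 ]
}
```

With the root partition mounted on /a, strip_exit_lines /a/.dtlogin would remove the inserted line and leave the original file in /a/.dtlogin.orig for comparison.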
Learning
Students become acquainted with files that affect the login sequence, and
reinforce the procedure for examining and fixing problems by booting the
CD-ROM environment.
Error Symptoms/Conditions/Messages
Systems cannot talk to system C by the system name. System C cannot
talk to other systems on the same subnet.
Probable Cause
The probable cause is an oversight by the system administrator.
Fault Insertion
This is fundamentally the same fault as Fault 18.
Edit the /etc/hostname.hme0 file to refer to an incorrect host.
Complete the following steps on system C:
1.
2.
4.
Possible Fix
Fix the /etc/hosts or /etc/hostname.hme0 file.
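To locate the mismatch, you can compare the name stored in the interface file with the names in the hosts file. The following is a minimal sketch, assuming a POSIX-style shell. The function name hostname_in_hosts is hypothetical.

```shell
# Sketch only: hostname_in_hosts is a hypothetical helper.
# Returns 0 if the first word of the hostname file appears as a host name
# or alias in the hosts file.
hostname_in_hosts() {
    hostname_file=$1
    hosts_file=${2:-/etc/hosts}
    name=
    read name rest < "$hostname_file" || :
    [ -n "$name" ] || return 1
    # Drop comments, then compare every name after the address field.
    sed 's/#.*//' "$hosts_file" | (
        while read addr names; do
            for n in $names; do
                [ "$n" = "$name" ] && exit 0
            done
        done
        exit 1
    )
}
```

For example, hostname_in_hosts /etc/hostname.hme0 || echo "name not found in /etc/hosts" reports the fault inserted in this exercise.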
Learning
Isolate problems using fault analysis techniques.
Error Symptoms/Conditions/Messages
could not grant slave pty
Probable Cause
The probable cause is that the file permissions or ownership of the
/usr/lib/pt_chmod file are set incorrectly.
Fault Insertion
Complete the following steps to insert the fault:
1.
2.
Possible Fix
Change the ownership and permissions of the /usr/lib/pt_chmod file to
the correct values.
For example:
# chown root:bin /usr/lib/pt_chmod
The file permissions should be as follows:
# ls -la /usr/lib/pt_chmod
---s--x--x 1 root bin
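The permission check can be scripted as well. The following is a minimal sketch, assuming a POSIX-style shell. The function name has_mode is hypothetical. The octal mode 4111 used in the usage note is derived from the ---s--x--x listing (setuid plus execute for owner, group, and other); it is not stated in the original exercise.

```shell
# Sketch only: has_mode is a hypothetical helper.
# Returns 0 if the permission string that ls prints for FILE equals WANT.
has_mode() {
    file=$1
    want=$2
    # The first 10 characters of "ls -ld" output are the type and mode bits.
    actual=$(ls -ld "$file" | cut -c1-10)
    [ "$actual" = "$want" ]
}
```

For example, has_mode /usr/lib/pt_chmod '---s--x--x' || chmod 4111 /usr/lib/pt_chmod resets the mode only when it differs from the expected string.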
Learning
Learn about the files, and their correct ownership, that
pseudo-terminals require.
Probable Cause
The probable cause is that the file system contains many small files,
exceeding the limit for inodes (file information nodes).
Fault Insertion
Complete the following steps to insert the fault:
1.
3.
Mount the newly constructed file system on the /test mount point.
# mount -F ufs <raw file partition> /test
4.
Copy some files from another file system to the /test directory.
# cp /usr/bin/* /test
You will not be able to store more than 192 files on this file system.
5.
Possible Fix
To repair this fault, reconstruct the file system with the newfs -i
command to increase the inode density, and restore the file system from
the backup.
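The value passed to newfs -i is the number of bytes per inode, so a smaller value yields more inodes for the same file system size. The following is a minimal sketch of that arithmetic, assuming a POSIX-style shell. The function name estimate_inodes is hypothetical, and the result is only a rough estimate because newfs adjusts the final count when it lays out the file system.

```shell
# Sketch only: estimate_inodes is a hypothetical helper.
# Prints a rough inode count for a file system of SIZE_KB kilobytes built
# with NBPI bytes per inode (the value passed to newfs -i).
estimate_inodes() {
    size_kb=$1
    nbpi=$2
    echo $(( $size_kb * 1024 / $nbpi ))
}
```

For a 1024-Kbyte file system, estimate_inodes 1024 8192 prints 128, and estimate_inodes 1024 2048 prints 512, which shows how lowering the bytes-per-inode value raises the number of files the file system can hold.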
Learning
Learn about the parameters used to construct a new file system and why
they matter.
Probable Cause
The following are the probable causes:
Fault Insertion
Complete the following steps to insert the fault:
1.
2.
Use the vi editor to edit the /etc/dfs/dfstab file, and add the
following entries:
share -F nfs -o rw -d test NFS /test1
share -F nfs rw -d test NFS /test2
3.
4.
Possible Fix
To repair this fault, mount the file system on different mount points.
Learning
Learn how to use NFS shares.
Probable Cause
The following are the probable causes:
Fault Insertion
To insert the fault, kill the inetd daemon.
For example:
# pkill -9 inetd
Possible Fix
To repair this fault, reboot the system, or use the following command to
restart the inetd daemon:
# /usr/sbin/inetd -s
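Before restarting the daemon, you can confirm that it is really missing. The following is a minimal sketch, assuming a POSIX-style shell and that the pgrep command is available. The function name is_running is hypothetical.

```shell
# Sketch only: is_running is a hypothetical helper built on pgrep.
# Returns 0 if at least one process has exactly the given name.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}
```

For example, is_running inetd || /usr/sbin/inetd -s starts the daemon only when no inetd process is found.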
Learning
Learn about the importance of the inetd daemon.
Probable Cause
The probable cause is that the dtlogin program could not find or create
files and directories to initiate the CDE for the user.
Fault Insertion
Complete the following steps:
1.
2.
Possible Fix
To repair this fault, create the user's home directory with the proper
permissions.
Learning
Learn about the files, and their locations, that the user's CDE
environment requires to work properly.
Probable Cause
The probable cause is that the Master Internet services daemon inetd
could not locate the TCP service specified after the first colon.
Fault Insertion
Complete the following steps to insert the fault:
1.
2.
Possible Fix
To repair this fault, edit the /etc/nsswitch.conf file to correct the
settings.
Learning
Learn to ensure proper settings in the /etc/nsswitch.conf file.
Probable Cause
The probable cause is that the init program is missing or corrupted.
Fault Insertion
Complete the following steps to insert the fault:
1.
Use the following command to remove the /etc/init symbolic link:
# unlink /etc/init
2.
Possible Fix
To repair this fault, complete the following steps:
1.
2.
3.
4.
6.
7.
8.
Learning
Learn about the importance of the init program, and know more about
the system initialization files. The /etc/init file is a symbolic link to
the /sbin/init file.
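A check for the missing link can be sketched as follows, assuming a POSIX-style shell. The function name link_points_to is hypothetical, and /a in the usage note is only an assumed mount point for the root partition when the system is booted from the CD-ROM. The target string stored on a given system may be a relative path, so adjust the expected value accordingly.

```shell
# Sketch only: link_points_to is a hypothetical helper.
# Returns 0 if LINK is a symbolic link whose stored target equals WANT.
link_points_to() {
    link=$1
    want=$2
    [ -h "$link" ] || return 1
    # "ls -l" prints "name -> target" for a symbolic link.
    target=$(ls -l "$link" | sed 's/.* -> //')
    [ "$target" = "$want" ]
}
```

For example, link_points_to /a/etc/init /sbin/init || ln -s /sbin/init /a/etc/init attempts to re-create the link when the check fails.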
Probable Causes
The probable cause is a corrupt boot block, boot file (/ufsboot), or kernel
(/kernel/unix).
Fault Insertion
1.
2.
Possible Fixes
Provide an alternative boot block to students for booting the system.
To repair this fault, complete the following steps:
1.
2.
3.
5.
6.
Learning
Learn about the files related to the boot sequence and how to restore a
corrupt boot block.
Probable Cause
The following are the probable causes:
Invalid login shell is substituted in the entry for the login ID in the
/etc/passwd file
Fault Insertion
To insert the fault, change the permissions of the /usr/sbin/in.rlogind
file.
For example:
# chmod 444 /usr/sbin/in.rlogind
Possible Fix
To repair this fault, check the permissions of the in.rlogind daemon, and
reset them to the default.
For example:
# chmod 555 /usr/sbin/in.rlogind
Learning
Learn about the in.rlogind daemon and the significance of the
permissions set on the /usr/sbin/in.rlogind file.
Probable Cause
The following are the probable causes:
The fsck command of the /usr file system in preen mode failed
Fault Insertion
1.
Use the dd command to corrupt the boot block of the /usr file
system:
For example:
# dd if=/dev/dsk/c0t0d0s7 of=/dev/dsk/c0t0d0s6 count=31
Possible Fixes
Provide an alternative boot block to students for booting the system.
To repair this fault, complete the following steps:
1.
2.
3.
4.
Learning
Learn how to restore a corrupt file system and the importance of the
/usr/sbin/fsck utility.
Fault Insertion
Probable Cause
Possible Fix
Learning
Glossary/Acronyms
A
Admintool
A system administration utility with a graphical interface that enables
administrators to maintain system database files, printers, serial ports,
user accounts, and hosts.
B
backup
A copy of file system data that is stored separately from the disk drive
on which the data resides.
baud rate
The unit in which the signalling rate of a communication channel is
measured. In addition, baud rate is the measure of the speed at which
the communication channel can transmit and receive information.
boot
The boot command is used to start the system kernel or a standalone
program.
boot block
A 15-sector disk block that contains information used to boot a system.
Block numbers point to the location of the ufsboot program on the
disk. The boot block directly follows the disk label.
C
checksum
A number that is calculated from the binary bytes of the file. You can
use the checksum to determine if the file contents have changed.
console
The console device is the main input/output device that is used to
access a system and display all system messages. A console can either be
a display monitor and keyboard or a shell window.
coreadm command
The coreadm command specifies the name or location of core files
produced during a core dump.
Cumulative residence-length product
The total time for which a process remains in the queue during its
complete life cycle.
D
debugger command (dcmd)
A debugger command or dcmd (pronounced dee-command) is a routine
in the debugger that can access any of the properties of the current
target. The mdb utility parses commands from standard input and
executes the corresponding dcmds. Each dcmd can also accept a list of
string or numerical arguments.
debugger module (dmod)
A debugger module or dmod (pronounced dee-mod) is a dynamically
loaded library containing a set of dcmds and walkers. During
initialization, the mdb utility attempts to load dmods corresponding to the
load objects present in the target. You can subsequently load or unload
dmods at any time while running the mdb utility.
devfsadm
The devfsadm command configures devices and updates the /dev and
/devices directories.
device
A hardware component or a physical device, such as a printer or disk
drive, that acts as a unit to perform a specific function.
device driver
A program that the kernel uses to communicate with devices.
directory
A location for files and other directories. The Solaris OE file system or
directory structure enables you to create files and directories that can be
accessed through a hierarchy of directories.
domain name
The name assigned to a group of systems on a local network that share
administrative files. The domain name is required for the network
information service database to work properly.
dumpadm
The dumpadm utility manages the configuration of the crash dump
facility on the Solaris OE.
dumping
Dumping is the process of copying files and directories for offline
storage.
E
encryption
Encryption is used to protect account passwords, data, and other pieces
of information. When a password is encrypted, it appears as a series of
numerals and uppercase and lowercase letters unrelated to the actual
password. This means that not even the superuser can read the
password; only the system can read the special code.
error checking and correcting (ECC)
ECC logic is used by memory chips and processing units for correction
of single-bit errors and detection of double-bit and multiple-bit errors.
ECC logic uses a part of the system memory to store parity information.
With full parity memory, a memory error alert is sent and the system
halts. With no parity memory, in case of an error, the system experiences
random results, such as system crashes and data corruption. However,
in case of minor memory errors, ECC handles the error without causing
any damage to the system.
Ethernet
A local area network (LAN) that employs a bus topology in which all
the workstations are connected to a single physical medium. Ethernet is
a broadcast network, which means that all of the workstations on the
network receive all transmissions.
Ethernet address
The physical address of an individual Ethernet controller board. It is
called the hardware address or media access control (MAC) address.
The Ethernet address of every Sun workstation is unique and coded into
a chip on the motherboard.
F
firmware
Firmware includes programs that are permanently installed in a chip.
The programmable read-only memory (PROM) and non-volatile
random access memory (NVRAM) chips are examples of firmware.
fsck
The fsck command checks the integrity of a file system and repairs any
damage found.
ftp command
The ftp command is used to transfer files to and from a remote network
site using the File Transfer Protocol (FTP) service.
G
group
A group identifies the users associated with a file. A user group is a set
of users who have access to a common set of files. User groups are
defined in the /etc/group file and are granted the same sets of
permissions.
H
heuristic
Heuristic is the process of describing an approach to learning by trying
rather than by following some pre-established formula or organized
hypothesis. A heuristic program is a mathematical program, consisting
of a complex set of functions.
host
A computer system in a network computing environment.
host name
A unique name identifying a host machine connected to a network. The
host name must be unique on the network.
I
inetd
The inetd server daemon listens to service requests and executes the
server program associated with the service.
K
kernel
The master program of the Solaris OE. It manages devices, memory,
swap, processes, and daemons. The kernel also controls the functions
between system programs and the system hardware.
kernel STREAM
A kernel mechanism that supports development of network services
and data communications drivers. STREAMS define interface standards
for character input/output within the kernel and between the kernel
and user level. The STREAMS mechanism includes integral functions,
utility routines, kernel facilities, and a set of structures.
L
login
The login is used to sign on to the system. A login consists of a login ID
or user name and a valid password.
M
man page
Manual pages or man pages are online references that are available as
part of the Solaris OE.
multiuser
A feature of the Solaris OE that enables more than one user to access the
same system resources.
N
network
A connection between machines that enables an exchange of
information between the machines. Two main types of networks are
local area networks (LANs) and wide area networks (WANs).
O
OEM
An original equipment manufacturer (OEM) is a supplier who builds
parts for systems.
P
partition
A logical subdivision of a physical disk drive that is treated as an
individual device. A partition consists of a range of physical disk
cylinders. Partitions are defined in the disk label. Partitions can contain
file systems or can be treated as raw devices, such as swap.
patch
A collection of files and directories that replace or update existing files
and directories that prevent the proper execution of software. The main
purpose of patches is to correct application bugs and provide product
enhancements.
peripheral device
A piece of hardware, such as a mouse or a printer, that performs a
specific function and is connected to a workstation.
port
A pathway used to connect computers. A port can be made up of both
hardware, such as pins and connectors, and software, such as a device
driver. Types of ports include serial, parallel, small computer system
interface (SCSI), network, and Ethernet.
Power-On Self-Test (POST)
A series of diagnostic checks to check the system hardware. POST is
invoked each time the system is powered on.
Programmable read-only memory (PROM)
A chip containing permanent, nonvolatile memory and a limited set of
commands used to test the system and start the boot process.
Q
quad card
A quad card is a card having four Ethernet ports plugged into the
motherboard.
R
remote host
A system other than the local system on which the user is working.
residence time
The time a process is in the queue at any particular instance.
rlogin
A service that enables users of one system to connect to other systems
across the intranet as if they were connected directly.
root
The user name of the superuser account. The superuser is a privileged
user with complete system access. The terms superuser and root can be
used interchangeably.
run control (rc) script
A script that is executed during system initialization and when
changing run levels. Commands executed by the run control scripts
determine which file systems are mounted, which daemon processes are
running, and other environment configuration.
run level
One of the eight initialization states in which a system runs. A system
can run in only one initialization state at a time. The default run level for
each system is specified in the /etc/inittab file.
S
SBus
A proprietary bus system used in many Sun systems.
serial port
A serial port is used to transfer data one bit at a time. It is usually an RS-232 port, but 25-pin and 9-pin connectors are also used.
single-user
A feature of the Solaris OE that ensures that the system runs minimal
processes and services and regular users cannot log in. The single-user
mode is often referred to as the maintenance mode. You require the root
password to switch to single-user mode on a system.
Small Computer System Interface (SCSI)
A high-speed interface that can connect to computer devices, such as
hard drives, CD-ROM drives, diskette drives, tape drives, scanners, and
printers.
superblock
A block on the disk that contains information about a file system, such
as its name and size in blocks. Each file system has its own superblock.
A block is also defined as space on a physical hard disk where you can
write a unit of data.
superuser
The superuser is a privileged user with total system access. For example,
only the superuser can change the password file and edit system
administration files in the /etc directory. The user name for the
superuser account is root.
T
telnet
A service that enables users of one system to connect to other systems
across the intranet as if they were connected directly.
typescript file
A file that is used to record user action during a session. It is a form of
log generation that records user activities during a session.
U
UNIX file system (ufs)
The default disk-based file system for SunOS.
V
vfstab
The configuration file for the file systems that defines which file systems
are mounted at boot time.
W
walker
A set of routines that describe how to walk or iterate through the
elements of a particular program data structure. A walker encapsulates
the implementation of a data structure from dcmds and the mdb utility.
You can use walkers interactively or use them to build other dcmds or
walkers.
Index
/var/sadm/install/admin
directory 5-17
boot commands 4-30
mdb and adb utilities
relationship 8-9
mdb utility
examining system
dumps 8-14
limitations 8-8
Symbols
.bss section 5-30
.data section 5-30
.enet-addr command 2-20
.locals command 7-9, 7-20
.properties command 4-11
.registers command 7-9, 7-20
.speed command 2-20
.version command 2-19
/ 5-35, 6-20
/dev/cua directory 5-6
/dev/dsk directory 5-6
/dev/kmem file 5-52
/dev/rdsk directory 5-6
/dev/rmt directory 5-6
/dev/term directory 5-6
/devices directory 4-4
/etc directory 1-8
/etc/coreadm.conf file 5-57
/etc/init.d/sysetup
script 7-14
/etc/minor_perm file 5-6
/etc/rc2.d/S50devfadm
script 5-4
/etc/system file 4-21
/etc/vfstab file 7-4, 7-11
/kernel directory 4-22
/sbin/ifconfig
command 5-35
/sbin/init process 4-22
/sbin/rc2 boot script 5-4
/usr/kernel directory 4-22
/usr/local/man directory 6-6
/usr/sbin/ directory 4-19
/usr/share/man directory 6-4
/usr/ucb/ps command 5-23
/var/adm/messages file 7-7
/var/adm/messages log 1-7
/var/sadm/install/contents
file 5-15
/var/spool/pkg directory 5-16
A
About This Course xvii
adb utility 8-4
Address Resolution Protocol
(ARP) 5-37
admin file 5-17
ALP (Assembly Language
Programming) 5-30
American Standard Code for
Information Interchange
(ASCII) 2-9
B
bad traps 7-6
banner command 2-19
Basic layers and error types in Sun
systems
identifying 1-17
Berkeley Software Distribution
(BSD) 5-23
BigAdmin portal 6-13
BigAdmin services 6-13
boot command 4-17
Boot Programs phase 4-19
Boot PROM
description 2-5
features 2-6
phase 4-18
boot sequence 4-17
bootblk program 4-18
boot-device variable 3-9
BSD (Berkeley Software
Distribution) 5-23
bus errors 1-20
C
cat command 5-19
catman -w option 6-7
causes of system panics 7-4
cd command 4-10
choosing the test methodology
factual approach 1-13
realistic approach 1-13
result-oriented approach 1-13
CMOS (complementary metal-oxide
semiconductor) 2-9
cmp command 5-21
collecting error messages 1-7
collection documents 6-11
commands
.enet-addr 2-20
.locals 7-9, 7-20
.properties 4-11
.registers 7-9, 7-20
.speed 2-20
.version 2-19
/sbin/ifconfig 5-35
/usr/ucb/ps 5-23
arp 5-37
banner 2-19
boot 4-17
cat 5-19
cd 4-10
cmp 5-21
coreadm 5-56, 7-16
ctrace 7-20
dev 4-10
devalias 4-13
devfsadm 5-4
device-end 4-10
devlinks 5-6
diff 5-21
disks 5-6
dmesg 7-7
drvconfig 5-6
dumpadm 7-12
eeprom 2-8, 2-14
file 5-45
find 5-43
format 5-7
fsck 5-8
fstyp 5-11
ifconfig 5-35
installboot 4-19
iostat 5-11
ls 4-12
modinfo 5-29
mpstat 5-27
netstat 5-39
nm 5-52
nvalias 4-12, 4-16
nvunalias 4-12, 4-16
pgrep 5-31
ping 5-32
pkgadd 5-16
pkgchk 5-14
pkginfo 5-15
pkgrm 5-17
ports 5-6
printenv 2-11
probe 2-16
probe-ide 2-16
probe-scsi 2-16
probe-scsi-all 2-16
prtconf 5-49
prtconf -vp 4-4
prtdiag 3-14
ps 5-23
psrinfo 5-27
reset-all 4-16
savecore 7-14
script 5-44
see 2-21
set-default 2-13
set-defaults 2-13
setenv 2-13
show-devs 4-14
show-disks 4-15
show-nets 4-15
show-post-results 3-22
showrev 5-48
sifting 2-20
snoop 5-42
stop-n 2-13
sum 5-22
swap 5-53
sysdef 5-51
tail 5-45
tapes 5-6
test 2-17, B-4
test floppy 2-17
test net 2-17
test-all 2-17
tip 3-10
traceroute 5-33
truss 5-55
uname 5-46
vi 5-18
vmstat 5-25
watch 2-18
watch-clock 2-18
watch-net 2-18
watch-net-all 2-18
whatis 6-8
words 4-11
common OBP variables 2-10
comparison results 1-8
complementary metal-oxide
semiconductor (CMOS) 2-9
configuring and executing Explorer 6-21
controlled comparisons 1-8
Copying 7-16
coreadm command 5-56, 7-16
course goals xvii
Course Map xviii
CPU and memory management
commands 5-23
CPU watchdog reset 1-21
crash utility 8-7
ctrace command 7-20
custom device aliases 4-12
D
Data Communication Equipment
(DCE) 3-10
Data Terminal Equipment (DTE) 3-10
DCE (Data Communication
Equipment) 3-10
dcmds 8-8
debugger commands (dcmds) 8-8
debugger modules (dmods) 8-8
E
eeprom A-2
eeprom command 2-14
eeprom Command on a Sun4u Enterprise
Server A-2
ELF object file 5-52
enable extended POST diagnostics 3-5
error checking and correcting (ECC) 1-18
errors in a boot sequence 4-25
ex editor 5-18
F
factual approach 1-13
Failed Field Replaceable Units
(FRUs) 3-14
fault analysis and diagnosis
methodology 1-1
fault diagnosis methodology 1-11
FIFO (first-in first-out) 5-45
file command 5-45
file-checking commands 5-18
find command 5-43
first-in first-out (FIFO) 5-45
forceload module 4-21
format command 5-7
formulating hypotheses 1-12
FPROM jumper J2003 2-8
FPROM Upgrades 2-8
FRUs (Failed Field Replaceable
Units) 3-14
fsck command 5-8
fstyp command 5-11
G
general-purpose commands 5-43
generating system crash dump 7-10
genunix file 4-20
global core file path 7-17
group file 1-8
H
hardwire argument 3-11
header files 8-11
I
ICMP (Internet Control Message
Protocol) 5-32
Icons xxii
IDE (Integrated Drive Electronics) 2-16
Identifying 4-26, 7-20
identifying error-reporting
mechanisms 1-20
bus errors 1-20
Interrupts 1-20
Watchdog Resets 1-21
identifying magnitude of the fault 1-9
identifying patch support tools 6-17
Checksum file 6-17
Patch Check 6-17
PatchPro 6-17
Recommended and Security
Patches 6-17
Solaris Patches 6-17
Sun Alert Patch Report 6-17
identifying the basic layers and error types
in Sun systems 1-17
IEEE (Institute of Electrical and Electronics
Engineers) 2-4
ifconfig command 5-35
IGMP (Internet Group Management
Protocol) 5-41
impacts of the methodology chosen 1-13
in.ftpd daemon 1-9
information sources 1-6
init phase 4-23
installboot command 4-19
Installing Explorer 6-21
Institute of Electrical and Electronics
Engineers (IEEE) 2-4
Instruction Unit (IU) 7-6
Integrated Drive Electronics (IDE) 2-16
Internet Control Message Protocol
(ICMP) 5-32
Internet Group Management Protocol
(IGMP) 5-41
Internet Protocol version 4 (IPv4)
protocol 5-34
Internet Protocol version 6 (IPv6)
protocol 5-34
Interrupts 1-20
Introducing 3-4
introducing OBP components, features,
and diagnostics 2-1
introducing system panics 7-6
introduction to types of faults in Sun
systems 1-18
critical errors 1-19
fatal errors 1-19
hardware errors 1-18
software errors 1-18
system panics 1-19
iostat command 5-11
IPv4 (Internet Protocol version 4)
protocol 5-34
IPv6 (Internet Protocol version 6)
protocol 5-34
ISCDA script 8-14
IU (Instruction Unit) 7-6
K
kadb utility 8-4
Kernel Initialization phase 4-21
kernel STREAM 8-6
L
latest security bulletin 6-12
listing facts about the problem 1-6
local-mac-address? variable 2-12
ls command 4-12
M
macro file 8-10
man -k option 6-6
man -l option 6-5
man -M option 6-6
man -s option 6-5
man.cf file 6-4
MANPATH variable 6-4
manual OBP diagnostic commands
preparing 2-15
using 2-15
mdb 8-4
mdb utility
command formats 8-9
features 8-6
identifying register references 8-12
Macros 8-10
Registers 8-10
misc/obpsym kernel module 7-21
moddir module 4-21
modinfo command 5-29
modules 4-21
mounting 5-9
mpstat command 5-27
multicast 5-38
N
netstat command 5-39
Network File System (NFS) 4-26
network management commands 5-32
NFS (Network File System) 4-26
nm command 5-52
Nonvolatile random access memory
(NVRAM) 2-4
nvalias command 4-12, 4-16
NVRAM 2-9
NVRAM (Nonvolatile random access
memory) 2-4
NVRAM chip 4-12
nvramrc variable 2-9, 4-12
nvunalias command 4-12, 4-16
O
object file
.txt section 5-30
OBP
components 2-1
diagnostics 2-1
features 2-1
OBP Device tree
examining 4-9
introducing 4-4
navigating 4-9
P
panic() kernel function 1-19
panic() system call 7-6
passwd file 1-8
Patch Finder 6-17
patchadd command 1-9
PatchDiag cross-reference file 6-19
PatchDiag tool
description 6-11
installation 6-19
sample report A-4
patchdiag.xref file 6-19
patchdiag_setup script 6-20
Patches 6-10
patchk.pl script 6-19
path_to_inst database 5-4
PCI (peripheral component
interconnect) 2-6
PCI probing 4-8
pcia-probe-list variable 2-10
pcib-probe-list variable 2-10
performing search operations in the
SunSolve Online service 6-14
performing Solaris OE diagnostics 5-1
peripheral component interconnect
(PCI) 2-6
per-process core file 5-56, 7-17
pgrep command 5-31
Phases in the boot process
boot programs phase 4-17
boot PROM phase 4-17
init phase 4-17
kernel initialization phase 4-17
ping command 5-32
pkgadd command 5-16
pkgchk command 5-14
pkginfo command 5-15
pkgmap file 5-15
R
realistic approach 1-13
Registers 8-12
repairing a corrupt superblock 5-10
reset-all command 4-16
restoring the bootblk or ufsboot
programs 4-29
result-oriented approach 1-13
reviewing Explorer output 6-22
rootdev module 4-21
rootfs module 4-21
run level 4-26
S
savecore command 7-14, 7-16
sbus-probe-list variable 2-10
script command 5-44
SCSI (small computer system
interface) 2-16
Security bulletin archive 6-12
security bulletin archive 6-12
security information
latest security bulletin 6-12
Security Pretty Good Privacy (PGP)
key 6-12
security t-patches 6-12
security-mode variable 2-10
see command 2-21
set module 4-21
set-default command 2-13
set-defaults command 2-13
setenv command 2-13
setting up a tip connection 3-12
show-devs command 4-14
show-disks command 4-15
show-nets command 4-15
show-post-results command 3-22
showrev command 5-48
sifting command 2-20
single-user mode 4-30
small computer system interface
(SCSI) 2-16
snoop command 5-42
software package management
commands 5-14
SSP (System Service Processor) 3-14
stating the problem 1-5
Stop-D key sequence 3-7
stop-n command 2-13
sum command 5-22
Sun alert notifications 6-12
Sun Explorer Data Collector utility 6-11,
6-21
Sun Validation Test Suite (SunVTS) 6-11
SunSolve Online database documents 6-11
SunSolve Online service 6-9
swap command 5-53
sysdef command 5-51
T
tail command 5-45
tapes command 5-6
tar.Z package 6-19
TCP (Transmission Control
Protocol) 5-41
test command 2-17
test floppy command 2-17
test floppy option B-4
test -memory option B-4
test net command 2-17
test net option B-5
test-all command 2-17
test-all option B-4
testing the hypothesis 1-13
Time of Day (TOD) 2-9
tip command 3-10
TOD (Time of Day) 2-9
Topics Not Covered xix
tpe-link-test? variable 2-10
traceroute command 5-33
Transmission Control Protocol
(TCP) 5-41
troubleshooting scripts using shell
options 4-28
truss command 5-55
types of system failures 7-1
typescript file 5-44
Typographical Conventions xxiii
U
ufsboot program 4-20
Ultra 5 and Ultra 10 architecture B-7
Ultra Port Architecture (UPA) 4-8
uname command 5-46
V
variables
auto-boot? 2-10, 4-32
boot-device 3-9
diag-device 2-10, 3-7
diag-level 3-6
diag-switch 3-5
diag-switch? 2-10
local-mac-address? 2-12
MANPATH 6-4
nvramrc 2-9, 4-12
pcia-probe-list 2-10
pcib-probe-list 2-10
sbus-probe-list 2-10
security-mode 2-10
tpe-link-test? 2-10
use-nvramrc? 2-9
watchdog-reboot? 2-10
vi command 5-18
Viewing 3-10
viewing extended diagnostics during
POST 3-10
viewing the current patch report 6-18
vmstat command 5-25
W
walkers 8-8
watch B-6
watch command 2-18, B-6
watch-clock command 2-18, B-6