
Core Dump Analysis

ST-370

Student Guide With Instructor Notes

Sun Educational Services


MS UMIL07-14
2550 Garcia Avenue
Mountain View, CA 94043
U.S.A.

Part Number 802-6058-02


Revision B, March 1996
© 1996 Sun Microsystems, Inc. All rights reserved.
2550 Garcia Avenue, Mountain View, California 94043-1100 U.S.A.
This product and related documentation are protected by copyright and distributed under licenses restricting its use,
copying, distribution, and decompilation. No part of this product or related documentation may be reproduced in any
form by any means without prior written authorization of Sun and its licensors, if any.
Portions of this product may be derived from the UNIX® and Berkeley 4.3 BSD systems, licensed from UNIX System
Laboratories, Inc., a wholly owned subsidiary of Novell, Inc., and the University of California, respectively. Third-party
font software in this product is protected by copyright and licensed from Sun’s font suppliers.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the United States Government is subject to the
restrictions set forth in DFARS 252.227-7013 (c)(1)(ii) and FAR 52.227-19.
The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending
applications.
TRADEMARKS
Sun, the Sun logo, Sun Microsystems, SunSoft, the SunSoft logo, Solaris, SunOS, OpenWindows, DeskSet, ONC, ONC+,
NFS and Java are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and certain other countries.
UNIX is a registered trademark in the United States and other countries, exclusively licensed through X/Open Company,
Ltd. OPEN LOOK is a registered trademark of Novell, Inc. PostScript and Display PostScript are trademarks of Adobe
Systems, Inc. All other product names mentioned herein are the trademarks of their respective owners.
All SPARC trademarks, including the SCD Compliant Logo, are trademarks or registered trademarks of SPARC
International, Inc. SPARCstation, SPARCserver, SPARCengine, SPARCstorage, SPARCware, SPARCcenter, SPARCcluster,
SPARCdesign, SPARC811, SPARCprinter, UltraSPARC, microSPARC, SPARCworks, and SPARCompiler are licensed
exclusively to Sun Microsystems, Inc. Products bearing SPARC trademarks are based upon an architecture developed by
Sun Microsystems, Inc.
The OPEN LOOK® and Sun™ Graphical User Interfaces were developed by Sun Microsystems, Inc. for its users and
licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or
graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical
User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with
Sun’s written license agreements.
X Window System is a trademark of the X Consortium.
THIS PUBLICATION IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
THIS PUBLICATION COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES
ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN
NEW EDITIONS OF THE PUBLICATION. SUN MICROSYSTEMS, INC. MAY MAKE IMPROVEMENTS AND/OR
CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS PUBLICATION AT ANY TIME.

Contents

About the Course
    Course Prerequisites
    Course Objectives
    Course Content
    Typographical Conventions and Symbols
System Crashes and Hangs
    System Crashes
    Cause of Panics
    Panic Sequence
    Core Dumps
    savecore
    Use of savecore
    Hangs
    Forcing Core Dumps
    System Information
    Managing Core Dumps
    Lab Exercises for Module 1
Initial Analysis
    Basic Information
    Basic Utilities
    Crash
    Lab Exercises for Module 2
adb Basics
    Debuggers
    adb Requirements
    Running adb
    kadb
    adb Commands
    adb Command Examples
    adb Expressions
    Data Formats

Copyright 1997 Sun Microsystems, Inc. All Rights Reserved. SunService March 1996
    Display Format
    Locations and Sizes
    Variables
    Additional Commands
    Process Control
    Basic adb Analysis
    Lab Exercises for Module 3
Header Files
    Use and Location
    Lab Exercises for Module 4
Symbol Tables
    Symbol Tables
    Kernel Name List
    Example for nm
    nm Output
    Lab Exercises for Module 5
adb Macros (Part I)
    Availability of adb Macros
    Invoking adb Macros
    Review of adb Commands
    Example Macro
    Lab Exercises for Module 6
adb Macros (Part II)
    The Message Buffer
    The msgbuf Macro
    Counts Used in msgbuf
    The msgbuf.wrap Macro
    Flow Control
    adbgen
    adbgen in Use
    Including Standard Macros
    Lab Exercises for Module 7
SPARC Assembler
    Assembler Characteristics
    CISC and RISC
    Basic SPARC Characteristics
    SPARC Instructions
    SPARC Registers
    Register Windows
    SPARC Instruction Types
    Synthetic Instructions
    SPARC Transfer of Control
    Sources of Problems

    Lab Exercises for Module 8
Stacks
    Use of Stacks
    A Generic Stack in Use
    SPARC Stack Frames
    SPARC Registers and Argument Passing
    Saving and Restoring
    Window Overflows and Underflows
    SPARC Argument-Passing Example
    SPARC Frame Pointer and Stack Pointer
    Registers From adb
    Passing More Than Six Arguments
    Lab Exercises for Module 9
Kernel Internals
    Kernel Mode
    Programs and Processes
    Threads
    Scheduling
    Virtual Memory
    Kernel Data Structures
    Lab Exercises for Module 10
Device Drivers
    Use of Device Drivers
    Loadable Drivers
    Driver Functions
    Installing Drivers
    Drivers and Crashes
    Lab Exercises for Module 11
STREAMS
    The STREAMS Facility
    STREAMS Queues
    Lab Exercises for Module 12
Traps
    Trap Usage
    Trap Types
    Trap Sequence
    Trap Base Register
    Trap Handling
    Trap Types
    Interrupts
    SPARC Interrupts
    Interrupt Tracebacks

    Lab Exercises for Module 13
Watchdog Resets
    Watchdog Reset Cause and Response
    OpenBoot PROM Commands
    obpsym
    Sun-4d Systems
    Lab Exercises for Module 14
Locks
    Multithreading
    Race Conditions
    Critical Sections
    Mutex Locks
    The ldstub Instruction
    Semaphores
    Readers Writer Locks
    Condition Variables
    Waiters
    Lab Exercises for Module 15
Additional Exercises
The crash Utility
    The crash Utility
    crash Commands
    Example Use of crash

About the Course

Overview
This course discusses the methods used to determine the causes of
system panics and hangs. The course is applicable to SPARC™
machines running the Solaris™ 2.5 operating environment.

The audience for this course includes system administrators who are
responsible for maintaining Solaris machines.

Extensive cross references are made to the SunSoft™ Press book, Panic!
by Chris Drake and Kimberley Brown, published by Prentice Hall,
ISBN 0-13-149386-8.

Course Prerequisites

To fully succeed in this course, you should be able to:

● Use the OpenWindows™ graphical user interface.

● Administer Solaris machines.

● Develop applications in a structured language (preferably C).

● Decipher internal data structures.


✓ The skills of the attendees at any class may affect the running order of this class. In
particular, it may be helpful to present some material relating to kernel internals earlier
than scheduled.

Course Objectives

Upon completion of this course, you will be able to:

● Set up savecore to capture kernel core dumps.

● Gather kernel core dumps and trace them to a probable cause using
adb and kadb macros.

● Create adb macros to aid and simplify kernel core dump analysis.

● Follow an organized approach in analyzing kernel core dumps.

Course Content

● System Crashes and Hangs

● Initial Analysis

● adb Basics

● Header Files

● Symbol Tables

● adb Macros (Part I)

● adb Macros (Part II)

● SPARC Assembler

● Stacks

● Kernel Internals

● Device Drivers

● STREAMS

● Traps

● Watchdog Resets

● Locks

Typographical Conventions and Symbols

The following table describes the type changes and symbols used in
this book.

Typeface or Symbol   Meaning                              Example

AaBbCc123            The names of commands, files, and    Edit your .login file.
                     directories; on-screen computer      Use ls -a to list all files.
                     output                               system% You have mail.

AaBbCc123            What you type, contrasted with       system% su
                     on-screen computer output            Password:

AaBbCc123            Command-line placeholder; replace    To delete a file, type rm
                     with a real name or value            filename.

AaBbCc123            Book titles, new words or terms,     Read Chapter 6 in User’s
                     or words to be emphasized            Guide. These are called
                                                          class options.
                                                          You must be root to do this.

Code samples are included in boxes and may display the following:

Prompt                                       Symbol

C shell prompt                               system%
Superuser prompt, C shell                    system#
Bourne and Korn shell prompt                 $
Superuser prompt, Bourne and Korn shells     #

Instructor Notes

Lab Setup
This course has been written specifically for the Solaris 2.5
environment running on SPARC workstations.

The workstations must have full installations of the operating system
with a large amount of free space in /var for crash dumps. The actual
amount of space varies and can be reduced by removing unwanted
dumps, but a /var partition of 100 Mbytes for a workstation with
32 Mbytes of memory is an approximate requirement. The /var partition
must be on a local disk, not an NFS™ file system.

Lab files must be installed on each machine, ideally on local UFS
partitions. The lab files include C source code, which is compiled
during the course, so a C compiler must be available.

At the top level of the Labs directory there is a file sourceMe which
can be run with . sourceMe to set the paths, if required.

The course involves crashing the operating system, which can result in
corrupt file systems. The ability to rapidly reinstall is a wise
precaution. A spare preinstalled root partition can be very useful.

Module 1: System Crashes and Hangs

Objectives
Upon completion of this module, you will be able to:

● Describe how and when system crash files are generated.

● Enable your system to save core files created from system panics
using savecore.

● Identify core sizes with respect to memory size and usage.

● Describe what is in memory and what the kernel is.

● Force your system to panic and create a core file.

● Describe system and program hangs.

Note – This module presents material covered in Chapters 1 to 5 of
Panic!.


System Crashes

System Panics
● Controlled abort of system activities

Watchdog Resets
● Automatic detection of loss of control

Dropping Out
● Sudden loss of control


System Crashes

System Panics
A panic is induced from kernel routines when some unexpected and
inexplicable event or state is detected. To minimize the chance of
further errors, the kernel saves its state and aborts. After a panic, the
machine usually attempts to reboot, and can save information to
enable core dump analysis.

Watchdog Resets
A watchdog reset occurs when a trap arrives while the kernel has
traps disabled and should be immune from further interruption. The
kernel cannot handle a trap in this state, and so stops. The kernel state
is not saved, but the CPU registers hold some information that may
assist in identifying the cause. Watchdog resets are explored later in
this course.

Dropping Out
Occasionally, catastrophic failures will cause the kernel to just stop.
Fortunately, these conditions are rare.


Cause of Panics

Integrity Safeguards
● Response to unexpected conditions

● Prevention of further damage

● panic() called from kernel routines

Bad Traps
● System trap at inappropriate time


Cause of Panics

Integrity Safeguards
Panics only happen when some unexpected conditions are detected.
Any error may be symptomatic of a larger-scale problem, and one
error may propagate and cause widespread data corruption. The
surest way of preventing further damage is to stop processing. Since
the kernel is running, code can be executed to save the state of the
CPU and memory for later analysis.

The Solaris 2 operating system offers kernel programmers two
methods of causing panics. Although there exists a kernel routine
named panic, the preferred method is to call cmn_err(9F) with
CE_PANIC as the specified level of error. Both routines ultimately call
the kernel code to complete the panic.

It is not possible to cause a panic from user code. The panic and
cmn_err routines can only be called from within the kernel.

Bad Traps
Traps occur to handle hardware interrupts, system calls, and memory
accesses. Most traps are safe, normal traps. Some traps are bad; they
are normal traps that occur at inappropriate times.

Bad traps are sometimes the kernel equivalent of a segmentation
violation in a user program. Both terminate execution and can
generate core dumps. Bad traps are more serious, because they stop
the whole system rather than a single process.

A common bad trap is a data fault from within the kernel, but there
are others, which are explored later in the course.

When a trap does occur at a bad time, the kernel automatically invokes
panic.

✓ ASSERT(9F) may also cause a panic if the assertion fails. ASSERT only works if DEBUG is
defined, which is unlikely in production code. Panics may turn up due to drivers having
been compiled with DEBUG defined.


Panic Sequence

Panic Messages
● Display cause and details of panic

● Are dependent on the kernel programmer

Stack Traceback
● Lists addresses and parameters of kernel routines leading to panic

● Lists most recent first

● Does not list user routines

Dump
● Lists use of memory

● Lists total amount dumped

● Lists destination of dump

Reboot
● Automatic attempt at recovery


Panic Sequence

Panic Messages
The panic message is the text string supplied by the kernel
programmer to the panic or cmn_err routine. Some panic messages
are lengthy, and some are much shorter. The first stage of a panic is to
briefly identify what has happened.

Stack Traceback
Following the panic message, the kernel displays the values found on
the current stack, indicating which routines have been called to lead to
this panic. At this stage the kernel cannot inspect a symbol table, so it
cannot give symbolic names for the routines called.

Dump
Following the stack traceback, the kernel syncs the disks to save any
cached file-system information and then saves the memory
information. The number of pages of memory used for each purpose is
shown. The kernel then states the destination of the dump, in terms of
the vnode pointer, which identifies a particular device.

Reboot
Once the dump is committed to disk, the system attempts to reboot. If
the panic was due to a serious failure, the reboot may fail or may
result in another panic. The system may panic and reboot repeatedly.

Usually the system reboots normally, not failing unless the same
conditions are encountered.


Core Dumps

Memory Contents
● Kernel text

● Kernel data

● State of the system

● State of processes

● Process text and data

Core Contents
● Interesting chunks

● Not all of memory

● Kernel state

● Current process state

Dump Device
● Usually swap


Core Dumps

Memory Contents
At any time, memory holds all of the active kernel, active processes,
and cached data from file systems. Similar to user programs, the
kernel has routines stored in text segments and data stored in static
and dynamically allocated memory. Part of the data is used to describe
individual processes, while some is used to describe the system as a
whole.

Core Contents
As described in /usr/include/sys/dumphdr.h, crash dumps contain
only interesting chunks. The panic routine does not save the entire
contents of memory. Some data is old, and some is unrelated to the
panic.

Dump Device
The panic routine saves the core dump to the end of the dumpfile,
which is usually the primary swap partition.

The dump is saved at the end of the dumpfile to maximize the length
of time the dump will be valid once the system reboots. The dump is
written to the dump device with a header at the start and a duplicate
of that header at the end, which enables checking for a valid dump.


savecore

Purpose
● Saves core image

● Copies from dump device to file system

● Creates vmcore.n and unix.n

Location of Files
● Configured at boot

● Usually found under /var/crash/hostname

● Directory must exist

Size of Files
● No more than memory size

● Frequently less

● Dependent on interesting chunks


savecore

Purpose
The savecore command searches the dump device for a core
generated by a panic. If a dump with intact headers is found, the data
is copied to two files within the specified directory. Subsequent panics
followed by savecore can save to the same directory, since each is
numbered according to the sequence number held in the bounds file.

The file vmcore.n is the image of memory, and unix.n is a copy of the
kernel text at the time of panic.
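
The numbering scheme can be sketched in Bourne shell. This is an illustration only, not the savecore source: the helper name next_dump_names is hypothetical, and it assumes that the bounds file holds a single decimal number and that numbering starts at 0 when no bounds file exists.

```shell
#!/bin/sh
# Sketch only: mimics how a sequence number kept in a "bounds" file
# could select the names unix.n and vmcore.n.
# next_dump_names is a hypothetical helper, not part of savecore.
next_dump_names() {
    dir=$1
    n=0
    # Reuse the stored sequence number if a bounds file already exists.
    if [ -f "$dir/bounds" ]; then
        n=`cat "$dir/bounds"`
    fi
    echo "unix.$n vmcore.$n"
    # Record the number that the next dump would use.
    echo `expr $n + 1` > "$dir/bounds"
}
```

Calling next_dump_names twice on the same directory yields a different pair of names each time, which is why repeated panics do not overwrite earlier dumps.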

The savecore command can be run successfully from the command
line at any time if there is still a core on the dump device that has not
been overwritten.

Location of Files
Code within sysetup invokes savecore, specifying the directory to
which the core is saved. The default location is /var/crash/hostname,
but this can be changed to any existing directory on a local partition
with enough capacity.

Size of Files
The size of the vmcore file generated is dependent on the system
activity at the time. If the problem is due to a kernel memory leak,
then the core file is likely to approach memory size. If the panic
occurred during or soon after the boot sequence, before much
system activity, it is likely to be much smaller.

Note – Some SPARC machines (such as the SPARCstation-1) produce
core files with holes, which appear to be much larger than they actually
are. Copying such files will fill the holes, requiring more disk space for
the copy than the original.
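
One way to spot a file with holes is to compare its length with the disk space it actually occupies. The following is a minimal sketch; the helper names apparent_kb and allocated_kb are hypothetical, and it assumes the usual ls -l column layout and a du that accepts -k.

```shell
#!/bin/sh
# Sketch only: compare a file's apparent size with the space allocated
# to it. apparent_kb and allocated_kb are hypothetical helpers.
apparent_kb() {
    # Column 5 of "ls -l" is the length in bytes, holes included;
    # round up to whole Kbytes.
    ls -l "$1" | awk '{ print int(($5 + 1023) / 1024) }'
}
allocated_kb() {
    # "du -k" reports the Kbytes actually allocated on disk.
    du -k "$1" | awk '{ print $1 }'
}
```

A vmcore file whose allocated_kb value is well below its apparent_kb value probably contains holes, so a copy may need up to the apparent size.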


Use of savecore

Enabling savecore
/etc/init.d/sysetup

##
## Default is to not do a savecore
##
if [ ! -d /var/crash/`uname -n` ]
then mkdir -m 0700 -p /var/crash/`uname -n`
fi
echo 'checking for crash dump...\c '
savecore /var/crash/`uname -n`
echo ''

minfree
● ASCII file in core directory

● Specifies number of kilobytes to keep free

Reasons for Failure


● No hard dump device

● Dump device too small

● savecore not enabled

● minfree threshold

● Corrupt data on dump device

● Unsuccessful panic

● System has not panicked


Use of savecore

Enabling savecore
By default, savecore is not usually run. The code to invoke savecore
is within /etc/rc2.d/S20sysetup (actually a link to
/etc/init.d/sysetup). To save cores automatically, this code must
be uncommented before the crash.

Before saving the core, savecore attempts to read the file minfree in
the target directory. The minfree file specifies the minimum number
of kilobytes that must remain free on the file system. If there is less
space available, the core will not be saved.
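The minfree test described above can be sketched as simple arithmetic. This is only an illustration under one reading of the rule (that the space left after the save must be at least minfree); it is not savecore's actual code, and the function name would_save is hypothetical.

```shell
#!/bin/sh
# would_save FREE_KB DUMP_KB MINFREE_KB
# Prints "save" if writing a dump of DUMP_KB kilobytes would still leave
# at least MINFREE_KB kilobytes free, otherwise prints "skip".
# Assumption: this mirrors the minfree rule only approximately.
would_save() {
    free_kb=$1 dump_kb=$2 minfree_kb=$3
    if [ $((free_kb - dump_kb)) -ge "$minfree_kb" ]; then
        echo save
    else
        echo skip
    fi
}

# To reserve 20000 kilobytes, the minfree file could be created with:
#   echo 20000 > /var/crash/`uname -n`/minfree
```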


Hangs

Types of Hangs
● Program hangs

● Terminal hangs

● System hangs

● System slowdown

Causes of Hangs
● Deadlocks

● Resource shortage

● Hardware problems

No Panic
● Must force a core dump


Hangs

Types of Hangs
A system hang is a subjective impression. When a user reports the
system is hung, it could be that the application being used has hung,
the terminal has hung, the system has hung, or that the user has
become impatient with the speed of the system.

The severity of the hang can be monitored by trying to access the
system in different ways. If a terminal has hung, check to see if other
terminals are working. Also check if telnet can reach the system in
question. Sometimes ping will obtain a response whereas other
methods may not.

Depending on the system type, it may be possible to identify the cause
of the hang by observing external hardware (or even listening to
internal disks on desktop systems).

Often a system hang will be the end result of severe system
slowdowns, whereas on other occasions the system may hang without
warning.

Causes of Hangs
Hangs can be caused by overloading, errors, or bad configuration.
Deadlocks occur when two or more processes are each waiting for
another to free a resource before they can continue. Resource
shortages can occur in poorly tuned systems or in buggy code that
does not free resources after use. Hardware problems can
manifest themselves in many ways, including hanging the system.

No Panic
If the system has hung, it is unlikely you will see anything describing
why it has hung. To determine the cause, you must generate a core
dump file for analysis.


Forcing Core Dumps

Enter PROM Monitor


● Abort sequence

● L1-a

● STOP-a

● break

● Stops all execution

● Operating system is frozen in memory

● Can be resumed

● go

● c

Force Panic
● Execute illegal instructions

● sync

● g0

● Synchronize disks

● Important to attempt

● Always performed at panic


Forcing Core Dumps

Enter PROM monitor


At any time, the PROM monitor program can be entered by typing the
abort sequence. With a Sun™ keyboard the default abort sequence is
to hold down the Stop key and the A key simultaneously. Older Sun
keyboards labeled the Stop key L1. L1-a is commonly used to refer to
typing Stop and A at the same time.

On systems without a Sun keyboard, by default, the abort key is the
Break key.

Note that the abort key sequence can be remapped.

In some situations, the system may not respond to the first press of the
abort sequence, so extra presses may be required. At other times, the
first abort will be recognized, but only when there are more keystrokes
after the abort sequence.

When the abort sequence is recognized, the operating system is frozen
so that no processes will execute. The operating system can be
resumed from exactly where it was stopped by typing go (or c if the
PROM prompt is >).

Force Panic
If a core dump is required, the system can be made to panic by
executing an illegal instruction. If you type sync (or g0 at the >
prompt), the system will attempt to execute where it may not, and so
panic.

The sync command is sometimes used to synchronize disks even if a
core dump is not required, because an early stage of the panic
sequence is to synchronize the disks.


System Information

System History
● Date and time installed

● Date and time last booted

● Date and time last configured

● Previous problems

● /var/adm/messages

System Configuration
● Hardware details

● Additional devices

● Operating system version

● Patches applied

● Additional device drivers

● System tuning

System Activity
● Usual workload

● Workload at time of crash

● Applications in use

Anything Else?
● As much relevant information as possible


System Information

To assist in the analysis process, save as much information as possible
relating to the system and the state of the machine at the time of the
crash. It is usually advisable to list all the information as
early as possible. If you expect to send the core files to a help desk for
analysis, this supporting information is invaluable to those who do not
know the system.


Managing Core Dumps

Disk Requirements
● Compress core files

● Save to tape

● Disable savecore

Identifying Core Files


● Keep good records


Managing Core Dumps

Disk Requirements
Core dumps can consume a lot of disk space. Core dumps cannot be
analyzed while compressed, but you should compress and archive
them until they are needed for analysis.

Multiple core files can be useful to confirm analysis, but if the system
repeatedly panics with the same symptoms, it is usually unnecessary
to save every core file.
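Compressing a saved dump and checking the space recovered could look like the following. This is a sketch only: the function name archive_dump and the COMPRESS variable are illustrative, and the dump number and directory are supplied by the caller.

```shell
#!/bin/sh
# archive_dump DIR N
# Compress vmcore.N and unix.N in DIR and report the directory's disk
# usage in kilobytes before and after.
# COMPRESS names the compression command; it defaults to compress.
archive_dump() {
    dir=$1 n=$2
    before=`du -sk "$dir" | awk '{ print $1 }'`
    for f in "$dir/vmcore.$n" "$dir/unix.$n"; do
        # Skip files that are missing (for example, already compressed).
        [ -f "$f" ] && ${COMPRESS:-compress} "$f"
    done
    after=`du -sk "$dir" | awk '{ print $1 }'`
    echo "before=${before}K after=${after}K"
}

# Typical use (not run here):
#   archive_dump /var/crash/`uname -n` 0
```

Remember to uncompress both files again before starting an analysis session.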


Lab Exercises for Module 1

1. Configure your system so that kernel crash files will be saved.

a. Use prtconf to determine how much memory is configured
within your system. Estimate how large crash files will be.

b. Determine the inode number of /etc/init.d/sysetup, and
ensure that a hard link to the same inode exists in the
/etc/rc2.d directory.

c. If necessary, edit the sysetup file to ensure savecore will save
crash files to a directory on a local disk with sufficient space.

2. Run the whiz and byebye programs.

a. Change into the Lab1 directory, and ensure both whiz and
byebye exist as executable files.

b. Become superuser.

c. Run ./whiz, and observe the effect on your system.

The whiz program will seriously degrade the performance of
your system by locking down a large amount of private
memory and then looping through it repeatedly.

d. Press Control-C to terminate whiz.

e. Run ./byebye, and observe the effect on your system.

It is likely that your system has hung. byebye starts whiz
running in the real-time scheduling class. It is unlikely you will
be able to terminate byebye using Control-C. Why?
✓ Because the windowing system is running in IA/TS class, and the STREAMS scheduler is
in SYS.

f. To free your system, abort the operating system by pressing
Stop-a.

Unfortunately, because keyboard input is handled by the
kernel STREAMS threads, multiple keystrokes will be required.
After the initial Stop-a, you can use any key.


g. Once your system has entered the PROM monitor and echoed
ok, type sync to force a panic. Wait for the system to reboot,
and watch the console messages as the crash dump is saved.

h. Write down the relevant information relating to the crash
dump (as discussed in class).

3. Once the system has successfully rebooted, do the following to
create another set of core files, this time due to a kernel panic. This
is as described in Chapter 5 of Panic!.

# adb -kw /dev/ksyms /dev/mem


physmem 1e6d
rootdir/X
rootdir: 0fc103e40
rootdir/W 0
rootdir: 0fc103e40 = 0
$q

If your system does not panic immediately, use ls to list your root
directory. If your system still does not panic, ask your instructor
for assistance.

After your system reboots, note the information relating to this
panic.

4. Change into the directory containing the crash dumps
(/var/crash/hostname), and list the files found.

# cd /var/crash/hostname
# ls -ls

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________


5. What is contained within the bounds file?

________________________________________________________

6. Compare the size of the vmcore files with the amount of memory
your machine has.

________________________________________________________

7. Compress the vmcore and unix files. How much disk space has
been recovered?

________________________________________________________

Initial Analysis 2

Objectives
Upon completion of this module, you will be able to:

● Perform an initial analysis of a system crash dump file using basic
commands.

● List the basic commands that are useful in doing an initial analysis
on system crash dump files.

● Describe the extent of the output to be expected from using each of
the basic commands when performing an initial analysis of a
system crash dump file.

Note – This module presents material covered in Chapter 6 of Panic!.


Basic Information

Operating System Release

Hardware Architecture

Crash Messages

Kernel Statistics


Basic Information

Before starting to perform any crash dump analysis, it is useful to
collect as much preliminary information as possible.

Operating System Release


Knowing the OS release is essential. Core dumps must be analyzed
using the tools for the release that produced them. If an incorrect
release is assumed, analysis is likely to give incorrect results. Between
releases of the operating system, kernel algorithms and structures may
remain constant, change slightly, or change enormously.

Hardware Architecture
Different architectures require different kernel structures. Also,
analysis utilities such as adb depend on knowing the kernel layout,
which varies between architectures.

Crash Messages
Crash messages are the initial indication as to what went wrong.

Kernel Statistics
Occasionally, unusual kernel statistics indicate a problem in a
particular area. System hangs are commonly due to overused
resources, such as kernel memory.


Basic Utilities

strings
● Most useful

● Identify system, OS release, and architecture

● Retrieve system messages

netstat
● Network statistics

ipcs
● Interprocess communication facilities status


Basic Utilities

strings
The core dump file contains an image of memory, including current
text messages. By running the strings command on the core file, the
system message buffer can be displayed, including boot and crash
messages. From these messages, the hardware and software can be
identified.

Note that the messages buffer is a ring buffer, meaning the messages
may not be in chronological order.
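Because strings produces a large amount of output, it helps to filter it for the lines that identify the system and the panic. The following is a minimal sketch; the function name key_lines and the three patterns are illustrative choices, not a complete list of what may be worth searching for.

```shell
#!/bin/sh
# key_lines: read `strings` output on stdin and keep only the lines that
# mention the OS release, patches, or a panic message.
# Lines are printed in the order found, which (because the message buffer
# is a ring buffer) is not necessarily chronological.
key_lines() {
    awk '/SunOS|Patch|panic/'
}

# Typical use against a saved dump (not run here):
#   strings vmcore.1 | key_lines
```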

netstat

ipcs
The netstat and ipcs commands can be used to extract information
from the running system or from system crash dumps.

Note – Check the man pages carefully to determine which of the
options to these commands will work with core dump files.


Crash

Analyzing Crash Dumps


# crash -n unix.3 -d vmcore.3
dumpfile=vmcore.3,namelist=unix.3,outfile=stdout
>

Inspect Live Kernel


# crash -n /dev/ksyms -d /dev/mem
dumpfile=/dev/mem,namelist=/dev/ksyms,outfile=stdout
>

# crash
dumpfile=/dev/mem,namelist=/dev/ksyms,outfile=stdout
>

Built-in Help System


> help

> help kmastat

> ?


Crash

The crash program is designed to simplify the analysis of crash
dumps. The program can also be used to inspect the running system.

The crash program has an extensive set of functions, all of which have
brief help messages available to describe which options are available.

Many of the crash functions take options to specify that all entries or
just a specific entry of a particular type should be displayed.

This course does not use crash to analyze core files. Understanding
the material from this course and reading the crash manual page
should enable you to use crash successfully. An example crash
session is shown in Appendix B.


Lab Exercises for Module 2

1. Change into the /var/crash/hostname directory. If the crash files
that were created from clearing rootdir are still compressed,
uncompress the vmcore file and the associated unix file.

2. Run strings on the core file, and count the number of lines of
output. You should use wc for this.

# strings vmcore.1 | wc -l

_________________________________________________________

3. Run strings on the core file, and redirect the output into a
temporary file.

4. Use grep to search through the strings output for important
information:

SunOS ___________________________________________________

Patch ____________________________________________________

panic:____________________________________________________

_________________________________________________________

_________________________________________________________

5. Determine whether netstat and ipcs should work against kernel
core files according to the man pages. Then run each against the
live kernel and your crash file. If the same information is
displayed in both cases, devise a method of changing the live
information, and test again.

a. netstat

_________________________________________________________

_________________________________________________________

b. ipcs

_________________________________________________________

_________________________________________________________


6. Read the man page description of the crash command. Start
crash against the running kernel, and list the processes.

# crash
dumpfile=/dev/mem,namelist=/dev/ksyms,outfile=stdout
> proc

7. In another window, run ps -le, and compare the output.

_________________________________________________________

_________________________________________________________

_________________________________________________________

8. Exit the crash session running on the current kernel, and start
crash against a core file. Investigate running different crash
commands including proc, defthread, user, and kmastat. Refer
to Appendix B for examples, and the man page for full details.

> quit
# crash -n unix.n -d vmcore.n
> kmastat

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________


_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

_________________________________________________________

adb Basics 3

Objectives
Upon completion of this module, you will be able to:

● Explain how basic values are represented in adb.

● Perform the following functions from within an adb session:

● Convert a value from decimal to hexadecimal and vice versa.

● Evaluate expressions.

● Perform masking and shifting operations.

● Specify an expression for testing a particular bit.

● List the basic adb macros, and describe briefly how they are used.

● Execute a stack backtrace from within an adb session.

● Dereference a pointer from within an adb session.

Note – This module presents material covered in Chapters 7 to 9 of
Panic!.


Debuggers

adb
● Absolute debugger

kadb
● Kernel resident absolute debugger

dbx and debugger


● User program debuggers


Debuggers

adb
The adb utility is the absolute debugger. It is intended to work with
compiled code, and has no mechanism to use source code. The adb
utility is simple and straightforward, which are advantages when
attempting to analyze a core dump from a system crash. It is generally
accepted as the kernel debugger of choice.

kadb
The kadb program is the memory resident version of adb. adb is run as
a normal SunOS process. The kadb program is run before the
operating system, and itself loads the SunOS™ operating system.
Booting kadb before the operating system enables kadb to run when
the operating system itself stops (due to a Stop-a key sequence or a
panic).

dbx and debugger


The dbx and debugger utilities are two of many available user-level
debuggers designed to work with program source code. Most such
debuggers require the program to be compiled in such a way as to
enable debugging and are not capable of working with kernel virtual
memory.


adb Requirements

Same Kernel Architecture as Core File


# uname -m

Files
● /usr/bin/adb

● /usr/platform/platform-name/lib/adb

● /usr/lib/adb

Use of kas
● As supplied on Panic! CD-ROM.


adb Requirements

Same Kernel Architecture as Core File


adb analysis requires the same architecture and release of the
operating system as was in use when the core file was generated. See
Panic!, page 53 for example mismatches.

The platform of the core file, if not known, may be found using
strings on the core file. The current machine’s kernel architecture is
found using uname -m, and the platform is found using uname -i.

Files
The adb executable is stored within the /usr/bin directory. The
standard adb macros are located in /usr/lib/adb and
/usr/platform/platform-name/lib/adb.

By default, savecore leaves core files readable by anybody. If you are
conscious of potential security problems, then changing the
permissions on the core files will require running adb as the
appropriate user. Running adb on the live kernel always requires root
privileges (unless the permissions are changed on /dev/mem, which is
not recommended). Since Solaris 2.5, the sysetup script will create the
crash save directory with permissions 0700, but does not change the
permissions if the directory already exists.
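Where the crash directory already exists with looser permissions, it can be tightened by hand. The following is a minimal sketch; the function name secure_crash_dir is hypothetical, and on a real system it would be run as root against the crash directory.

```shell
#!/bin/sh
# secure_crash_dir DIR
# Create DIR with mode 0700 if it is missing, and restrict an existing
# DIR to mode 0700 (the step that sysetup does not perform).
secure_crash_dir() {
    dir=$1
    [ -d "$dir" ] || mkdir -m 0700 -p "$dir"
    chmod 0700 "$dir"
}

# Typical use (not run here):
#   secure_crash_dir /var/crash/`uname -n`
```

After this change, adb must be run by the directory's owner (normally root) to read the saved files.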

Use of kas
Through careful manipulation of directories and files, it is possible to
analyze core files on architectures other than where they were created.
Be careful to avoid misinterpreting the results. The Panic! CD-ROM
includes a utility, kas, to assist in the process.


Running adb

Object File
● Executable file

● Kernel or program

● Contains code and symbol table

Core File
● Memory

● Image of memory

User Program Core Files


$ adb executable corefile

$ adb – corefile

$ adb executable –

$ adb

Kernel Core Files


# adb -k unix.n vmcore.n

Live Systems
# adb -k /dev/ksyms /dev/mem

# adb -k


Running adb

Object File
adb expects an object file to allow it to use text (code), and also to use
the symbol table embedded in (unstripped) executables. If an
executable has been stripped (using strip), debugging becomes
difficult. The kernel modules are not stripped, so they are useful for
kernel debugging even though the text is not required (since in the
Solaris 2 software all kernel text is loaded into memory).

Core File
Memory is occasionally nostalgically referred to as core. Core files are
images of memory, either of a particular process or of the kernel.

User Program Core Files


adb can be used to inspect user programs with both, either, or neither
of the object file and core file specified. The default object file is a.out
and the default core file is core.

Kernel Core Files


Using the -k option, adb can be used to inspect kernel core files.

Live Systems
The -k option can also be used to inspect the live kernel. The object file
and core file can be specified, although the -k option will change the
defaults correctly to /dev/ksyms and /dev/mem.
✓ The -P option can be used to specify a prompt for adb. By default there is no prompt.

✓ The -I option is discussed in Module 6.

✓ New in 2.5 is the -V option to specify the register display and disassembly mode.


kadb

Booting kadb
● From PROM

ok boot kadb

● From the operating system

# reboot kadb

Memory Resident
● Stop-a straight to kernel debugger

SunOS Under kadb Control


● Breakpoints

● Single stepping

● Data inspection and modification

Differences From adb


● Macros built in

● Supplies a prompt


kadb

Booting kadb
The kadb program must be loaded before the SunOS operating system,
and so must be booted from the PROM.

Memory Resident
When booted, kadb is always in memory, ready to run. When the
kernel stops, such as due to the abort sequence, kadb is entered.

SunOS Under kadb Control


Since the operating system and all processes are frozen when kadb is
entered, debugging actions such as setting breakpoints, single
stepping, and inspecting and changing data are possible. After
changing data or setting breakpoints, the operating system and all
processes can be continued.

Differences From adb


kadb is similar to adb; it uses the same syntax as adb. However, since
the operating system is not running when kadb is entered, the file
system cannot be used. This means that kadb must be self-contained,
and, unlike adb, cannot access external macros. A set of macros is
built into kadb, but these cannot be extended.


adb Commands

General Command Syntax


address, count command format

address
● Numeric

● Symbolic

● Expression

count
● Defaults to 1

Display Commands
● ?

Display from address in object file

● /

Display from address in core file

● =

Print the address

format
● Amount of data

● Representation

● Pretty printing


adb Commands

General Command Syntax


address, count command format

address
The address can be specified as a numeric virtual address, as a
symbol recognized by the object file, or as an expression.
Address expressions can use numeric values, symbols, and the
current address, referred to as dot (.). If the address is omitted, dot is
assumed.

count
The repetition count for the command defaults to 1, but it can be
specified as a constant in hexadecimal or decimal, or as the result of an
expression.

Display Commands
Data can be displayed from locations specified by the address in both
the object and the core file. The address can also be used as a constant
and can be displayed in a different format using the equal sign (=).

format
The format expression describes how much data is to be displayed and
in what format. It also enables the use of special formatting characters
to make the display more readable. Formatting is used extensively in
adb macros.


adb Command Examples

f00baa00/X
strioctl+0x4d4: deafbabe

f00baa00,2/X
strioctl+0x4d4: deafbabe deadbeef

*panicstr/s
cpr_info+0x7ac: zero

utsname/257c
utsname: SunOS

utsname+0t257/s
utsname+0x101: yoyo

utsname+(2*0t257)/4t"release:"16ts
utsname+0x202: release: 5.5

fork+4?i
fork+4: clr %o0

2*0t257=X
202


adb Command Examples

f00baa00/X

Requests 4 bytes from address f00baa00 in the core file to be
displayed in hexadecimal.

f00baa00,2/X

Requests two lots of 4 bytes from address f00baa00 in the core file to
be displayed in hexadecimal.

*panicstr/s

Requests the data pointed to by the symbol panicstr in the core file
be displayed as a string.

utsname/257c

Requests up to 257 characters to be displayed from the core file
starting at the address referred to as utsname.

utsname+0t257/s

Requests the data starting 257 (decimal) bytes, which is 101
(hexadecimal) bytes, after the address referred to as utsname be
displayed as a string.

utsname+(2*0t257)/4t"release:"16ts

Requests a tab to position 4, the text release: and another tab to
position 16 be displayed, followed by the string that is 2 times 257
(decimal) bytes after the address referred to by utsname.

fork+4?i

Requests the data starting at 4 bytes after the address referred to by
fork be displayed as an instruction.

2*0t257=X

Requests that 2 times 257 (decimal) be displayed as 4 bytes
hexadecimal.
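The decimal and hexadecimal offsets used in these examples can be cross-checked outside adb. The following is a minimal sketch assuming a POSIX shell with arithmetic expansion (for example ksh); the function names hex and dec are illustrative. Note that the shell's default radix is decimal, whereas adb's is hexadecimal (hence the 0t prefix for decimal in adb).

```shell
#!/bin/sh
# hex EXPR: evaluate a decimal expression and print it in hexadecimal,
# comparable to adb's "EXPR=X" with 0t-prefixed operands.
hex() { printf '%x\n' "$(($1))"; }

# dec HEXVALUE: print a hexadecimal value (given without 0x) in decimal,
# comparable to adb's "HEXVALUE=D".
dec() { printf '%d\n' "$((0x$1))"; }

hex 257        # offset used in utsname+0t257
hex '2*257'    # offset used in utsname+(2*0t257)
dec 202
```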


adb Expressions

Binary Operators
cpus+8

end-ptr

count*0t4

val%10000

val&ffff

addr|e0000000

acebabe#4

*(p+4)

Unary Operators
*panicstr

-1

~0

#(end-ptr)


adb Expressions

Binary Operators
+ Addition

- Subtraction

* Multiplication

% Integer division

& Bitwise AND

| Bitwise OR

# Round the value on the left up to the next multiple of the
value on the right.

( ) Grouping

Unary Operators
* Pointer dereferencing through core file

% Pointer dereferencing through object file

- Unary negation

~ Bitwise one’s complement

# Logical negation. If # is applied to a zero-valued
expression, the result is 1. If it is applied to a nonzero-valued
expression, the result is 0. This is useful for the
count.
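The rounding and masking operators can be mimicked with shell arithmetic to check what a given adb expression should yield. This is a sketch assuming a POSIX shell with arithmetic expansion; the function names are illustrative, and the shell's >> operator is used for the bit test only because it is convenient here (it is not one of the adb operators listed above).

```shell
#!/bin/sh
# round_up VALUE MULTIPLE: VALUE rounded up to the next multiple of
# MULTIPLE, printed in hexadecimal (compare adb's VALUE#MULTIPLE).
round_up() { printf '%x\n' "$(( ($1 + $2 - 1) / $2 * $2 ))"; }

# mask VALUE MASK: bitwise AND, printed in hexadecimal (adb's VALUE&MASK).
mask() { printf '%x\n' "$(( $1 & $2 ))"; }

# bit_set VALUE BIT: prints 1 if bit BIT (0 = least significant) is set
# in VALUE, otherwise 0.
bit_set() { echo "$(( ($1 >> $2) & 1 ))"; }

round_up 0xacebabe 4
mask 0xdeadbeef 0xffff
bit_set 0x8 3
```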


Data Formats

-0t1161901314=X
babecafe

-1=U
4294967295

ffffffff=O
037777777777

ffffffff=Q
-01

-1=B
ff

-1=b
0377

ffffffff=f
+NaN

ffffffff=c
ÿ

0=Y
1970 Jan 1 00:00:00

utsname/257c257c
utsname:
utsname: SunOSyoyo

f000baaa=xx
baaa baaa


Data Formats

X 4 bytes hexadecimal

D 4 bytes decimal

U 4 bytes unsigned decimal

O 4 bytes unsigned octal

Q 4 bytes signed octal

B Single-byte hexadecimal

b Single-byte octal

F Double precision floating point (64 bits)

f Single precision floating point (32 bits)

s A string, up to first null found

S As s above, but displays nonprinting characters

c A single character

C As c above, but displays nonprinting characters

i An instruction

Y Display as date

x, d, u, o, q
As X, D, U, O, Q, but 2 bytes, not 4
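Several of the integer formats can be reproduced with printf to confirm how a value will appear. This is a sketch that assumes a POSIX shell whose arithmetic is at least 64 bits wide, so that 4-byte values can be masked without overflow; the fmt_ function names are illustrative.

```shell
#!/bin/sh
# u32 VALUE: reduce VALUE to an unsigned 4-byte quantity.
u32() { echo "$(( $1 & 0xffffffff ))"; }

fmt_X() { printf '%x\n'  "$(u32 "$1")"; }        # 4 bytes hexadecimal
fmt_U() { printf '%u\n'  "$(u32 "$1")"; }        # 4 bytes unsigned decimal
fmt_O() { printf '0%o\n' "$(u32 "$1")"; }        # 4 bytes unsigned octal
fmt_B() { printf '%x\n'  "$(( $1 & 0xff ))"; }   # single-byte hexadecimal

fmt_X -1161901314
fmt_U -1
fmt_O 0xffffffff
fmt_B -1
```

Negative values appear in two's complement form, which is why -1 shows as all ones in the unsigned formats.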


Display Format

="Printing nothing but text"


Printing nothing but text

="split"n"over"n"four"n"lines"
split
over
four
lines

="Stretching"16t"a"30t"point"
Stretching a point

fork,6?ai
fork: fork: save %sp, -0x60, %sp
fork+4: clr %o0
fork+8: call cfork
fork+0xc: clr %o1
fork+0x10: mov %o1, %i1
fork+0x14: ret

+4/X

-0t60/X


Display Format

"text"
Prints literal text from within the quotes.

n
Prints a single new line for each n. adb fits four 4-byte hexadecimal
values on one line.

nt
Tabs to the next tab stop that is a multiple of n (for example, 16t
moves to the next 16-character tab stop). adb uses 16 character
positions to print a 4-byte hexadecimal value.

a
Prints the address before each value shown.

+n
Moves the current address forward n bytes.

-n
Moves the current address back n bytes.


Locations and Sizes

Dot (.)
● Known as "dot"

● Current address

Plus Sign (+)


● Automatically advances dot

● Uses the amount of data displayed in last command

Repeat Commands
● Reuse last format

● Advance dot and reuse last format


Locations and Sizes

Dot (.)
Dot (.) refers to the current address. This is useful when you want to
apply specific formatting to adb’s current address. It is used
extensively in adb macros.

Plus Sign (+)


The plus sign (+) enables sequences of data to be displayed without
having to move dot manually. This is used extensively within adb
macros.

Repeat Commands
If only an address is supplied (with no format), the previous format is
used. This is useful when you want to display many different variables
in the same format.

Pressing Return is equivalent to reusing the previous format string at
the next sequential address.


Variables

Single Letter or Digit Names


● a to z

● A to Z

● 0 to 9

$v
● Displays current variable usage

$r
● Registers

● CPU registers accessed as if variables

Assigning and Retrieving Values


● > and <

● Can be used wherever an expression can be used

Example
aba000>a
dab>b
2>c
<a+<b,<c="y"X"doo"n


Variables

Single Letter or Digit Names


Some adb variables have dedicated uses, but approximately fifty
remain available for use in adb expressions.

$v
Displays the variables and their values. Variables that have not been
explicitly set by commands are not displayed.

$r
The CPU has many internal registers, of which a subset is accessible at
any time. The register set can be displayed using $r. Individual
registers can be read and assigned to.

Assigning and Retrieving Values


The right angle bracket (>) followed by the variable name is used to
assign a value to a variable or register. The left angle bracket (<)
followed by the variable name is used to retrieve values.


Additional Commands

$c, $C
● Display stack backtrace

● $C shows saved frame pointer and program counter values

$x, $X
● Display floating point register values

$<
● Read commands from specified file until end of file

$>
● Write output to specified file, or to standard output if no file is
specified

$M
● Display macros (kadb only)

$q or Control-d
● Exit adb

$Ptext
● Set a prompt


Additional Commands

$c and $C
The $c and $C commands display the stack backtrace, including six
arguments to each function call. Stack backtraces are central to crash
dump analysis.

$<
Reading commands from an external file is how adb makes use of
macros. $< is sometimes regarded as "executing the specified macro."
When using kadb, there is no access to external files, so $< does mean
execute the specified macro.

$>
adb output can be saved into an external file. While writing to an
external file, error messages are still written to the screen. When
analyzing core files, writing to an external file is useful because you
can use additional utilities (such as vi and grep) to interpret the
output.

$M
Lists the names of the kadb built-in macros.

$q or Control-d
Either $q or Control-d will exit adb. Exiting kadb is only possible
using $q, and will return to the boot PROM prompt.

$Ptext
The prompt is set to text, including any whitespace.


Process Control

Breakpoints
address:b Sets breakpoint at address.

$b Displays all breakpoints.

address:d Deletes a breakpoint.

:z Removes all breakpoints.

Running
:r Run the program.

:c Continue after a breakpoint.

Stepping
:s Step a single instruction.

:e Step, but across routines.

:u Continue, and stop at next function call.


Process Control

The adb utility can be used to interactively control the execution of
user programs, and kadb can be used to control the execution of the
operating system. See the man pages for adb and kadb for more
information.


Basic adb Analysis

Variables
● hw_provider

● architecture

● srpc_domain

● time

● lbolt

Macros
● utsname

● msgbuf


Basic adb Analysis

Variables
● hw_provider – Manufacturer of system.

● architecture – CPU architecture.

● srpc_domain – Secure RPC domain.

● time – Time stored in core file. Time of crash for crash cores, or
current time if inspecting live kernel.

● lbolt – Number of ticks since system boot. On most Sun
workstations there are 100 ticks per second.

Macros
● utsname – Displays system information contained within
utsname structure.

● msgbuf – Displays latest system messages.


Lab Exercises for Module 3

The following exercises are intended to give you some basic
practice using adb and number representations. Run adb, without
any arguments, to find the answers to the following.

1. What is your machine’s IP address? (Use the ifconfig -a
command.)

___ . ___ . ___ . ___ (for example 192.29.98.4)

2. Use adb to convert the IP address you have just found into a single
hexadecimal number. For example:

0t192=x
c0
0t29
1d
0t98
62
0t4
4

Thus the IP address 192.29.98.4 is actually the hexadecimal
number c01d6204. (Note the 04 at the end.)

3. Convert your hexadecimal IP address into decimal. For example:

c01d6204=U
3223151108

4. Most SPARCstations use a 4 Kbyte page size. 4 Kbytes are actually
4096 bytes.

What is 4096 in hexadecimal? 1000
What is 512 in hexadecimal? 200
1 Kbyte (1024)? 400
1 Mbyte (1024 * 1024)? 100000
1 Mbyte in decimal? 1048576

5. Print the result of 1024 decimal * 1024 decimal in both decimal and
hexadecimal on the same line:

0t1024*0t1024=DX
1048576 100000

6. What is the unsigned decimal equivalent of a full 32-bit word?

0xffffffff = U
4294967295

7. What is the hexadecimal representation of -1?

-1 = ffffffff or ffff

8. Use adb expressions to determine the following in hexadecimal.

a. How many seconds are there in one minute? 3c

b. Use the answer from 8a to calculate the number of seconds in
one hour: e10

c. Use the answer from 8b to calculate the number of seconds in
four hours: 3840

d. Use the answer from 8c to calculate the number of seconds in
eight hours: 7080

e. Use the answers from 8c and 8d to calculate the number of
seconds in twelve hours: a8c0

f. Use the answer from 8e to calculate the number of seconds in
one day: 15180

9. The Y display format of adb treats the value as the number of
seconds since the UNIX® epoch. Using 0 as the value, when was
the epoch?

0=Y
1970 Jan 1 00:00:00

10. The time given is in Greenwich Mean Time (GMT). California can
be eight hours behind GMT. Use the value found in 8d above to
discover the time of the UNIX epoch in California.

0-(value from 8d) = Y
1969 Dec 31 17:00:00

Does the time look correct? It’s an hour wrong

11. What time does the operating system think deadbeef represents?

deadbeef = Y
1952 Apr 14 15:27:43

12. What does adb think the signed and unsigned decimal answer to
Hamlet’s question is?

2b|~2b = DU
-1 4294967295

13. Which byte does =B print?

Most significant or least significant? least

14. Can shifting of hexadecimal numbers be done by using %? Try
shifting the ff in 0xff0000 by dividing by 0x10000.

0xff0000%0x10000=X
ff

15. Try shifting the ff in 0xff000000 by dividing by 0x1000000.

0xff000000%0x1000000=X
ffffffff

16. Does adb preserve the sign when doing division using %?

-0t76%0t10=d
-7 (yes, preserves sign)

17. Print just the bee from deafbee using & as a mask.

0xdeafbee&0xfff=X
bee

18. Print just the deaf from deafbee using a combination of & and %.

deafbee&ffff000%1000=X

19. Print just the feed from feedcafe.

feedcafe&ffff0000%10000=x

20. adb uses # to perform logical negation. What are the values of:

#0                1
#1                0
#f00baa           0
#(#1)             1
#(#f00baa)        1
##f00baa          1
###f00baa         0
########f00baa    1

21. adb uses variables with single letter names. Store 4096 decimal
into the variable p and the value of 1 Mbyte into the variable m.
Then find how many pages are in 1 Mbyte.

0t4096>p
0t1024*0t1024>m
<m%<p=D
256

22. Using m and p, construct an expression to show how many pages
are in 32 Mbytes.

________________________________________________________
8192

23. When adb is run on the kernel, it reports the availability of
memory in pages. How many Mbytes is 1e6d pages? 30

24. Now exit adb and change into the directory where your crash files
are stored. Run adb specifying the core files from the corrupt
rootdir crash. For example, if that was your second crash:

adb -k unix.1 vmcore.1

25. Use the utsname macro to confirm the origin of the core file:

$<utsname
utsname:
utsname: sys _______________________
utsname+0x101: node _______________________
utsname+0x202: release _______________________
utsname+0x303: version _______________________
utsname+0x404: machine ___________________________

26. Now print the values of system information strings.

hw_provider/s
hw_provider: ____________________________________
architecture/s
architecture: ____________________________________
srpc_domain/s
srpc_domain: ____________________________________

27. Determine when your system crashed, and how long it had been
running since it was booted.

time/Y
time: ____________________________________
lbolt/D
lbolt: ____________________________________

28. Store the crash time and number of seconds of uptime into adb
variables, and then use the variables to find the boot time.

*time>t
*lbolt%0t100>l

<t-<l=Y ____________________________________

29. View the panic string to see the reason for the crash.

*panicstr/s
some_label: ____________________________________

30. Dump the message buffer with the msgbuf macro. Then redirect
the output to a new file, and save the msgbuf output in that file.
Remember to redirect the output back to the screen.

$<msgbuf
$>msgbuf.2
$<msgbuf
$>

31. Retrieve the stack backtrace, and store the output to another file.

$c
$>stack.2
$c
$>

Header Files 4

Objectives
Upon completion of this module, you will be able to:

● Explore the /usr/include directory, noting the number of files
and subdirectories.

● Describe the sizes of different basic types in the C programming
language.

● Write simple C programs to determine sizes of specific C
structures.

● Count bytes within C structures to determine offsets.

● Describe how to determine the location of global definitions and
declarations.

Note – This module presents material covered in Chapter 10 of Panic!.


Use and Location

Definitions
● Types

● Macros

Declarations
● Functions

● Variables

Directories
● /usr/include

● /usr/include/sys

● /usr/include/vm

● /usr/platform/platform-name/include

Inclusion
● <subdirspec/filename>


Use and Location

Definitions
To improve readability, programmers define their own types based on
the standard predefined C data types. View the header files to
determine the actual data types and sizes.

Macros are also defined to improve readability. Macros are used for
symbolic constants and also to simplify expressions.

Declarations
Declarations do not define anything new but merely declare that the
functions or variables are available for access. Although the source
code containing the actual definitions of their types will not usually be
supplied, the declarations will declare the types.

Directories
The standard location for header files is within the /usr/include
directory.

The /usr/include directory has several subdirectories, as discussed
in Panic!, Chapter 10. The most important subdirectories for kernel
debugging are /usr/include/sys and /usr/include/vm.

The /usr/platform/platform-name/include directory contains
platform-specific definitions and declarations.

Inclusion
To use the declarations and definitions from header files, they must be
included into program code. By default, angle brackets (< > ) refer to
files within /usr/include, so only the subdirectory specification (if
any) and the file name are required.


Lab Exercises for Module 4

The following exercises are intended to give you experience in
working with the operating system header files and to determine
the sizes of some common types and structures. Work in teams.

1. Change into the Lab4 directory, and run the count script to
display the number of files and directories stored
beneath the /usr/include directory.

2. Search the include files to determine the types of as many as
possible of the following kernel variables. Note their type and the
file you found the declaration in.

panicstr    systm.h: extern char *
t0          thread.h: extern struct _kthread
p0          proc.h: extern proc_t
ncsize      dnlc.h: extern int
kas         as.h: extern struct as
pt_cnt      ptms.h: extern int
dumpfile    bootconf.h: extern struct bootobj
moddebug    modctl.h: extern int

3. Find which header file contains the declaration of the following
kernel structures:

kernel thread (kthread_t)         thread.h
lwp (klwp_t)                      klwp.h
cpu (cpu_t)                       cpuvar.h
proc (proc_t)                     proc.h
pid (struct pid)                  proc.h
user (user_t)                     user.h
seg (struct seg)                  seg.h
segvn_data (struct segvn_data)    seg_vn.h
as (struct as)                    as.h
page (page_t)                     page.h
mutex (kmutex_t)                  mutex.h
label_t                           machtypes.h

4. The supplied program sizzle tells you the size of some basic
types. By counting the bytes or by extending sizzle, determine
the sizes of the following common data structures. If you prefer
not to take this challenge, use the supplied program cheat.

Size of thread structure __________________________________


Size of proc structure ____________________________________
Size of lwp structure______________________________________
Size of page structure ____________________________________
Size of lrusage structure _________________________________
Size of itimerval structure _______________________________

5. Assuming you did not use the cheat program, determine how
many bytes the following fields are into their structures:

Offset of thread in cpu structure __________________________


Offset of dispthread in cpu structure ______________________
Offset of state in thread structure ________________________
Offset of pri in thread structure __________________________
Offset of stack pointer (sp) in thread structure _____________
Offset of procp in thread structure ________________________
Offset of next in thread structure _________________________
Offset of pidp in proc structure ___________________________
Offset of pid in pid structure______________________________
Offset of tlist in proc structure__________________________
Offset of user in proc structure ___________________________
Offset of psargs in user structure _________________________
Offset of psargs in proc structure _________________________

6. If you have not run cheat, do so now to check your answers.

Optional – a bit of light relief.


7. Complete the crossword puzzle below. The answers can be found
in the header files.

(Crossword grid)

Across

1. Protects the p_selock.
6. Who has error codes 135-142?
8. Created for each entry in /etc/system.
9. Holds the giant length error count.
12. There are 2 of them, to keep it simple.
14. Its direction is hidden from ELF.
16. Hold the number of blocks per partition.
18. The name of the transmit interrupt mask register.
19. The sort of MCP stuff that should be protected by a high priority lock.

Down

1. Type of clicks given by maxclick.
2. asm instruction used by the swtch macro.
3. Sets the thread into the ONPROC state.
4. Starts warning comments.
5. ’q’ shifted left 8 places.
7. qb_frtn is for it.
10. The cache enabled by ENA_CACHE.
11. Mutex type of zs_excl.
13. MMU used in Sun4m.
15. The entry in -uid or gid.
17. Could allow "halkword" transfers.

Symbol Tables 5

Objectives
Upon completion of this module, you will be able to:

● Run the nm command on various kernel modules, and note the
output generated.

● Describe how to use the dump command on object files, and
explain several of the more commonly used options to the dump
command.

● Explain which symbols are included in an executable image.

● Compare the different ways to display data within an adb session.

Note – This module presents material covered in Chapter 11 of Panic!.


Symbol Tables

Use
● Symbolic debugging

Generation
● Generated by compiler

● In all ELF files

Display
● nm

● dump

Removal
● strip


Symbol Tables

Use
Symbol tables (or namelists) enable symbols to be used to refer to
variables and functions when debugging the kernel or user programs.
Without symbol tables, all input to adb would have to be numeric and
all labels displayed by adb would be numeric.

Generation
When source code is compiled, global symbols are extracted from the
source code and placed into the object file. The Solaris 2 operating
system uses the executable and linking format, ELF. All ELF files have
the capability of holding symbol tables although some, such as core
files, may not use the facility.

Display
The nm utility is the name list printer, which will display the symbols
in an object file. Various options to nm modify the data displayed. See
nm(1) for more details; note that there are two versions of nm supplied
in release 5.5 of the operating system.

The dump utility can dump selected parts of an object file including the
symbol table, if requested. dump has many more capabilities than nm.
See dump(1) for more details.

Removal
Symbol tables can be removed using strip(1) to reduce the disk
storage requirements. However, when an object file is stripped, the file
cannot be used for symbolic debugging.

The kernel modules are typically not stripped, but utilities supplied
with the operating system (such as ls, grep, nm, dump and strip
itself) will be stripped. The saving can be large, especially for complex
programs.


Kernel Name List

Kernel Modules
● /kernel

● /platform/platform-name/kernel

● /usr/kernel

/dev/ksyms
● Symbols from currently loaded modules


Kernel Name List

Kernel Modules
The Solaris 2 operating system uses a dynamically loaded kernel. The
kernel is split into multiple modules that need not be loaded into
memory until required, thus saving memory.

By default, kernel modules are located in three directories. Those
under /kernel are common to all platforms within a particular
instruction set and must be available before /usr is mounted. The
/platform/platform-name/kernel directory contains platform-specific
kernel components. The /usr/kernel directory is used for
modules that can be shared across platforms within a particular
instruction set and are not required until after /usr is mounted.

Each module is an ELF object file, containing the code and symbols for
a particular kernel function.

/dev/ksyms
The symbols for loaded kernel modules are grouped together and
made accessible through the pseudo device, /dev/ksyms.


Example for nm

tiny.h
/* tiny.h */

#define BESTYEAR 66

struct mustang {
float ragtop;
int leather;
int candyapple; } dreamcar;

struct highway {
int speedlimit;
int smokey; };

float beach_factor = 123.5;

tiny.c
/* tiny.c*/

#include "tiny.h"

int whistles = 10;

main()
{
int tickets = 6;
dreamcar.ragtop = 123.5;
dreamcar.leather = 6;
dreamcar.candyapple = 10;
}


Example for nm

The example presented here is directly from Panic!, page 96.


nm Output

Symbols from tiny:

[Index]    Value     Size   Type  Bind  Other Shndx  Name

[1]     |        0|       0|FILE |LOCL |0    |ABS   |tiny
[32]    |        0|       0|FILE |LOCL |0    |ABS   |tiny.c
[47]    |   133372|      12|OBJT |GLOB |0    |18    |dreamcar
[57]    |    67352|      76|FUNC |GLOB |0    |8     |main
[58]    |   133360|       4|OBJT |GLOB |0    |17    |beach_factor
[60]    |   133364|       4|OBJT |GLOB |0    |17    |whistles

Value is the virtual address, Size is the size in bytes, and Bind is
the scope.


nm Output

The value column can be used to display offsets within a section if nm
is used on a relocatable file (for example, a kernel module) rather
than on an executable as shown here.

The type column indicates objects such as variables, functions, files, or
section symbols (used for all ELF files).

The bind column describes the scope of the symbols, whether LOCAL,
GLOBAL, or WEAK (which is similar to GLOBAL, but can be overridden
by GLOBALs of the same name).

The other column is reserved for future use and is currently set to
zero.

The shndx is the section header index or

● ABS (indicating a value that cannot be relocated).

● COMMON (indicating alignment constraints).

● UNDEF (for undefined symbols).

See Panic!, page 97, for a more complete nm output, and see the man
page for more details of the columns.


Lab Exercises for Module 5

1. Change into the /kernel directory, and list the contents. You will
see several subdirectories plus the generic base kernel located in
the file genunix. The directories (with more under /usr/kernel
and /platform/platform-name/kernel) contain loadable kernel
modules.

2. Count the number of lines of output produced by running nm
against /kernel/genunix. (It is probably quicker to use wc than to
count the lines yourself.)

3. Run modinfo to determine which kernel modules are currently
loaded. Note how many are loaded and the name of the last
module. Also note the Id of the ksyms module if it is loaded.

________________________________________________________

ksyms: __________________________________________________

4. If ksyms is loaded, run modunload -i id to unload it.

5. Count the number of symbols currently loaded using nm
/dev/ksyms.

________________________________________________________

6. Is ksyms now loaded?

________________________________________________________

7. Now count the number of lines of output from running nm against
/kernel/misc/ipc.

________________________________________________________

8. Run nm -s /kernel/misc/ipc. Consult the man page of nm to
help explain this and other options. Try some of the other nm
options.

9. Look up dump in the man pages.

10. Try running dump -vs /kernel/misc/ipc.

11. Try some other options with dump.

12. Use nm or dump with grep to determine which kernel module
contains the symbol rstchown.

/platform/platform-name/unix

13. Which kernel module contains the symbol shminfo_shmmni?

/kernel/sys/shmsys

14. Now change into the Lab5 directory where you will find the files
penguin.h and penguin.c. View the files. Note which lines are
declarations and which are definitions of symbolic constants,
types, variables, and functions.

Declarations: errno, appeal
Symbolic constant definition: FEATHERS_MCGRAW, TRUE, FALSE
Type definition: who_t, what_t, bool_t, struct crime
Local variable definition: suspect
Global variable definition: convictions, pervicacious, sentence, fines_total
Function definition: appeal, main

15. Compile penguin if necessary, using make or
cc penguin.c -o penguin.

16. Using nm on penguin, determine which symbols from 14 above are
in the executable file, penguin.

Declarations: _____________________________________________
Symbolic constant definition: ______________________________
Type definition: __________________________________________
Local variable definition: __________________________________
Global variable definition: convictions, pervicacious, sentence, fines_total
Function definition: appeal, main

A similar process occurs when the kernel is built. Consequently,
many of the symbols in the header files will not be present in the
kernel or in crash dumps.

17. adb can be used to inspect user programs. Run adb using penguin
as the object file. You need not specify a core file. (Do not expect a
prompt.)

$ adb penguin

18. Set a breakpoint at the entry to main, and then run the program
under adb’s control.

main:b
:r

19. List some of the assembly code, using the two commands below.

main?20i
main,20?i

The first requests 20 instructions from the object file, and the
second requests an instruction 20 times. Why do they give
different amounts of output?

20. Display the contents of the sentence variable, this time from
memory (using / instead of ?). Since sentence is a structure of a
bool_t (really an int), an int, and a float, the following is
suitable:

sentence/DDf
sentence: 0 0 +0.0000000e+00

21. Display the value of fines_total (again from memory) as a
float:

fines_total/f
fines_total: +1.2340000e+02

22. Display the value of convictions (from memory) as a decimal.

convictions/D
convictions: 547

23. Attempt to display the value of pervicacious and suspect (both
actually ints) from memory:

pervicacious/D
pervicacious: 1
suspect/D
symbol not found (it’s on the stack)

24. Why is only one found? (Where is the value of the other stored?)

25. Single step through the code using :s to see an instruction
involving the value 0x4d2. What is 0x4d2 in decimal? How does
this relate to the program?

0x4d2=D
1234
The instruction is setting up suspect.

26. Repeatedly single step through the code using :s until you notice
a comment involving sentence. (Assembler comments are
preceded by !) Single step two more instructions, and then display
the contents of sentence as above.

sentence: 1 0 +0.0000000e+00

27. Which line of the program have you just executed?

sentence.custodial = TRUE;

28. Continue single stepping, and observe when variables are actually
changed.

29. Exit adb, and ensure floats.c is compiled into floats. Run
make, or use cc floats.c -o floats.

30. Run adb floats.

31. Set a breakpoint at main and run floats.

main:b
:r

32. Observe the first 14 values stored in the floats array, using the
following commands:

farray/14f
farray,e/f

33. Now observe all 15 elements of the array using similar commands
to the above. What do you think causes the output?

________________________________________________________
It seems to be used within adb, so cannot be used as a count.
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________

adb Macros (Part I) 6

Objectives
Upon completion of this module, you will be able to:

● Describe where adb macros are located.

● Explain how to read and interpret various adb macros.

● Given a header file containing the definition of specific C
structures, describe the correlation between certain adb macros
and the defined structures.

Note – This module presents material covered in Chapter 12 of Panic!.


Availability of adb Macros

Standard Macros
● /usr/lib/adb

● /usr/platform/platform-name/lib/adb

● Kernel architecture-specific

adbgen
● Generates adb scripts (macros)

Hand-Written Macros
● Supplement existing

● Targeted at own code

Available From Current Directory


● First place adb looks

adb -I dirspec
● Alternative locations

● Standard macros not accessible


Availability of adb Macros

Standard Macros
The Solaris operating system contains a set of standard macros located
under /usr/lib/adb and /usr/platform/platform-name/lib/adb.
Over 200 macros are supplied to display many different kernel data
structures.

adbgen
adb macros refer to the fields within a data structure by means of
offsets from the start of the structure. The macro must print out values
using the correct number of bytes. To assist in the generation of
correctly formatted adb macros, adbgen can be used to compile
adbgen scripts. The header file defining the data structure of interest is
included in the adbgen script, and the fields can be referred to by
name. adbgen requires a compiler to create adb scripts, and is
restricted to generating macros that display the contents of a single
structure.

Hand-Written Macros
adb macros can be written to extend the set supplied with the
operating system. This might be appropriate if the structure of interest
does not have a suitable macro, or if the standard macros do not
display the needed data in the correct format.

Available From Current Directory


The current directory is always searched first when adb macros are
invoked. If a file is not found in the current directory, the standard
location is then searched.

adb -I dirspec
A colon-separated list of directories to be used instead of the standard
locations can be specified on the adb command line.


Invoking adb Macros

$< file
● Run macro

● Execute commands from named file

● Return to user at end

$<< file
● Include macro into execution of current

● Return to current at end

Start Address
● Embedded into macro

● Supplied when invoked

● From dot (.)


Invoking adb Macros

$< file
The $< syntax executes the commands (macro) in the named file. adb
will open the file and source the commands as if directly typed. When
all the commands within the file have been executed, adb returns to
standard input to read the next command.

$<< file
From within one macro, another may be called. If $< is used, then at
the end of the included macro, control will be passed back to standard
input rather than continuing the original macro.

$<< nests a macro, so when the called macro ends, the current macro is
resumed.

Start Address
Macros are usually used to format data that is at a particular address
within memory or the core file. Some macros are for specific kernel
structures that have global names, which are embedded into the
macro. An example is the msgbuf macro. If the macro defines its own
starting point, it is unnecessary to supply an address to the $< command.

Other macros are used to format data structures that can occur in
many different locations. For these, the starting address is usually
supplied. However, $< or $<< can be used with no address if dot, the
current location, is already correct.


Review of adb Commands

symbolname/ Change the current address.

+/ Advance the current address.

/"printed text" Print text.

/t Tab over.

/c Print byte as character (if displayable).

/C Print byte as character (or ^@ if not displayable).

/X Print full word as hexadecimal value.

/n Print a new line.


Review of adb Commands

The commands illustrated are commonly used within adb macros.


Example Macro

# cat /usr/lib/adb/bootobj

./"fstype"16t16C
+/"name"16t128C
+/"flags"16t"size"16t"vp"n3X

# adb -k unix.47 vmcore.47

physmem 1e6d
rootfs$<bootobj

First Line
● Start at current address; supply address.

● Print fstype, tab to position 16, and print 16 characters.

Second Line
● Move past last data.

● Print name, tab to position 16, and print 128 characters.

Third Line
● Print flags.

● Tab to position 16.

● Print size.

● Tab further 16 positions.

● Print vp.

● Print a new line.

● Print three full words in hexadecimal under printed text.


Example Macro

The bootobj macro is used to print the contents of a struct
bootobj, defined in the /usr/include/sys/bootconf.h file.

The bootobj macro is simple, but it illustrates the use of formatting
commands.


Lab Exercises for Module 6

1. Count the number of adb macros installed on your machine.

# ls /usr/lib/adb | wc -w
# ls /usr/platform/`uname -i`/lib/adb | wc -w

View the contents of the /usr/lib/adb/utsname macro.

2. In another window, change into the directory where your crash
files are stored. Run adb on any core file you have, and use the
utsname macro as earlier.

# adb -k unix.n vmcore.n


physmem 1e6e
$<utsname
utsname:
utsname: sys SunOS
utsname+0x101: node yoyo
utsname+0x202: release 5.5
utsname+0x303: version generic
utsname+0x404: machine sun4m

3. The utsname structure is declared in <sys/utsname.h> as:

#define SYS_NMLN 257

struct utsname {
char sysname[SYS_NMLN];
char nodename[SYS_NMLN];
char release[SYS_NMLN];
char version[SYS_NMLN];
char machine[SYS_NMLN];
};

extern struct utsname utsname;

The lines of the macro are executed as if entered directly. Type the
first line of the macro into your adb session.

utsname/"sys"8t257c

This is requesting that the data at the symbol utsname be printed
as "sys"8t257c (the text sys, a tab, and then 257 characters).
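The addresses in the macro output (utsname+0x101, utsname+0x202, and so on) follow from the declaration: each member is 257 (0x101) bytes long. The following C sketch checks this. It uses a local copy of the declaration under the assumed name my_utsname, so that it does not clash with the system header, and the function name print_utsname_offsets is likewise only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SYS_NMLN 257            /* 257 decimal is 0x101 */

/* Local copy of the declaration above. The name my_utsname is an
 * assumption made so the sketch does not clash with <sys/utsname.h>. */
struct my_utsname {
    char sysname[SYS_NMLN];
    char nodename[SYS_NMLN];
    char release[SYS_NMLN];
    char version[SYS_NMLN];
    char machine[SYS_NMLN];
};

/* Print the byte offset of each member from the start of the structure. */
void print_utsname_offsets(void)
{
    /* char arrays need no padding, so the members are packed back to back */
    assert(sizeof(struct my_utsname) == 5 * SYS_NMLN);

    printf("sysname   0x%zx\n", offsetof(struct my_utsname, sysname));
    printf("nodename  0x%zx\n", offsetof(struct my_utsname, nodename));
    printf("release   0x%zx\n", offsetof(struct my_utsname, release));
    printf("version   0x%zx\n", offsetof(struct my_utsname, version));
    printf("machine   0x%zx\n", offsetof(struct my_utsname, machine));
}
```

Calling print_utsname_offsets() from a small main should list the same offsets that adb shows beside each field.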


4. Try the following commands to see how the macro uses the
formatting capabilities of adb.

utsname/X _________________________________________
utsname/c _________________________________________
utsname/3c ________________________________________
utsname/257c ______________________________________
utsname/"sys" _____________________________________
utsname/"sys"257c _________________________________
utsname/"sys"40t257c ______________________________
utsname/"sys"8t257c _______________________________

5. Note how the next line of the adb macro starts with +. The +
ensures the next formatting is applied to the data after the
previous line.

6. Try replacing the tab (t) format character by a new line symbol, n.

utsname/"sys"n257c
__________________________________________________
__________________________________________________

7. The kernel contains the global symbols for p0 and t0. p0 is the
address of the proc structure for process 0 (to which all system
kernel threads refer), and t0 is the kernel thread for sched. Using
the proc and thread adb macros, display the contents of p0 and
t0, and compare with the declarations of the proc structure in
<sys/proc.h> and thread structure in <sys/thread.h>.

p0 $< proc
t0 $< thread

The proc and thread macros are long and do involve some syntax
not yet covered, but they should be mostly understandable.

8. Exit adb, change into the root directory, and run ls. Run adb
against the live kernel, and attempt to display the proc structure of
p0 and the thread structure of t0.

p0 $< proc This will attempt to use /proc as a macro.


t0 $< thread

Can you explain the result?

adb Macros (Part II) 7

Objectives
Upon completion of this module, you will be able to:

● Compose complex adb expressions, such as a conditional
expression.

● Explain the use of logical negation (#) in adb.

● Describe how flow control is implemented within an adb macro.

● Create your own adb macro using adbgen.

● Describe how to include predefined adb macros in your adb
macros.


The Message Buffer

From /usr/include/sys/msgbuf.h
#define MSG_MAGIC 0x8724786

struct msgbuf {
    struct msgbuf_hd {
        long msgh_magic;
        long msgh_size;
        long msgh_bufx;
        long msgh_bufr;
        u_longlong_t msgh_map;
    } msg_hd;
};

[Figure: two views of the message buffer. The header holds msgh_magic (0x8724786), msgh_size (0x1fe8), msgh_bufx (byte offset to end of latest message), msgh_bufr (byte offset to start of latest message), and msgh_map, followed by the message text. In one view the latest message lies in one piece; in the other it wraps around the end of the buffer.]


The Message Buffer

The most recent system messages are held in a ring buffer, defined as
shown. There is one msgbuf in the system, at the location specified by
the symbol msgbuf.

The msgbuf structure acts as a header for the actual buffer, defining
the size of the buffer and where the latest message started and ended.
Messages are added after the end of the previous message, and will
wrap around to overwrite earlier messages once the end of the buffer
has been used. Thus it is possible that the start of the message may be
before or after the end of the message within the buffer.
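The wrap-around behavior can be sketched in C. This is only an illustration, not the kernel's code: the function name copy_latest_message and its parameters are assumptions, with start and end playing the roles of msgh_bufr and msgh_bufx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the latest message out of a ring buffer into out and terminate it
 * with NUL. buf and size describe the message area; start and end are
 * byte offsets within it. Returns the number of characters copied. */
size_t copy_latest_message(const char *buf, size_t size,
                           size_t start, size_t end,
                           char *out, size_t outsize)
{
    size_t len;

    assert(start <= size && end <= size);

    if (end >= start) {
        /* Not wrapped: the message is one contiguous piece. */
        len = end - start;
        assert(len < outsize);
        memcpy(out, buf + start, len);
    } else {
        /* Wrapped: first the tail of the buffer, then its beginning. */
        size_t tail = size - start;

        len = tail + end;
        assert(len < outsize);
        memcpy(out, buf + start, tail);
        memcpy(out + tail, buf, end);
    }
    out[len] = '\0';
    return len;
}
```

The two branches correspond to the two cases described above: a message whose start lies before its end, and one whose start lies after its end.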


The msgbuf Macro

msgbuf
msgbuf/"magic"16t"size"16t"bufx"16t"bufr"n4X
+,(*(msgbuf+0t8)-*(msgbuf+0t12))&80000000$<msgbuf.wrap
.+*(msgbuf+0t12),(*(msgbuf+0t8)-*(msgbuf+0t12))/c

First Line
● Starts at location specified by msgbuf

● Prints text labels, new line and four hexadecimal values

Second Line
● Moves across previously printed data

● Calls msgbuf.wrap macro without returning

Third Line
● Prints characters from current location plus offset specified 12
bytes into msgbuf (msgh_bufr)

● Only executed if msgbuf.wrap is not called


The msgbuf Macro

msgbuf
The use of the msgbuf macro has already been described. Here the
macro is broken down line by line to illustrate how complex macros
can be understood.

First Line
The msgbuf macro does not require a starting address to be specified
since it is intended to display the only msgbuf on the system.

Second Line
The expansion of the count portion of the second line is described on
the following pages. Depending on the count, another macro,
msgbuf.wrap, is invoked.

Third Line
The count is explained on the following pages.

Macro References

Address          Contains
msgbuf + 0t8     Offset to end of latest message
msgbuf + 0t12    Offset to start of latest message
msgbuf + 0t16    Start of message area


Counts Used in msgbuf

msgbuf
msgbuf/"magic"16t"size"16t"bufx"16t"bufr"n4X
+,(*(msgbuf+0t8)-*(msgbuf+0t12))&80000000$<msgbuf.wrap
.+*(msgbuf+0t12),(*(msgbuf+0t8)-*(msgbuf+0t12))/c

Calls to the msgbuf.wrap Macro (Second Line)


● (*(msgbuf+0t8)-*(msgbuf+0t12)) & 80000000

● Zero times or 0x80000000 times?

● Dependent on whether difference is negative

Character Printed (Third Line)


● (*(msgbuf+0t8)-*(msgbuf+0t12))

● Difference between end and start


Counts Used in msgbuf

Calls to the msgbuf.wrap Macro (Second Line)


The number of times the msgbuf.wrap macro is called depends on the
two operands to the &. The expression on the left computes the
difference between the end and the start of the last message. If this
difference is negative (meaning the message has wrapped around the end
of the buffer), the top bit of the 32-bit word is set. Only in that case
is the result of the & nonzero. Consequently, the line appears to invoke
msgbuf.wrap either 0 times or 2147483648 times. If the count is 0,
the macro proceeds to the third line. If the count is nonzero,
msgbuf.wrap is invoked with $<, which never returns to this macro,
so it is run just once.

Character Printed (Third Line)


The third line is run only if the count from the second line was zero,
that is, only if the message has not wrapped. In that case the
difference between the end and the start is simply the number of
characters in the message, so they are all printed.
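The two counts can be mimicked in C. This sketch assumes the arithmetic is done in 32 bits, as the description of the top bit implies; the function names wrap_count and unwrapped_length are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count used on the second line: (end - start) & 0x80000000.
 * The result is nonzero only when end is less than start. */
uint32_t wrap_count(uint32_t bufx, uint32_t bufr)
{
    return (uint32_t)(bufx - bufr) & UINT32_C(0x80000000);
}

/* Count used on the third line: the message length, which is
 * meaningful only when the message has not wrapped. */
uint32_t unwrapped_length(uint32_t bufx, uint32_t bufr)
{
    assert(wrap_count(bufx, bufr) == 0);
    return (uint32_t)(bufx - bufr);
}
```

Masking with 0x80000000 keeps only the sign bit of the difference, which is how a count of zero or nonzero stands in for a test on whether the message wrapped.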

Macro References

Address          Contains
msgbuf + 0t8     Offset to end of latest message
msgbuf + 0t12    Offset to start of latest message
msgbuf + 0t16    Start of message area


The msgbuf.wrap Macro

msgbuf.wrap
.+*(msgbuf+0t12),(*(msgbuf+0t4)-*(msgbuf+0t12))/c
msgbuf+0t16+0,*(msgbuf+0t8)/c

First Line
● Prints from msgh_bufr to end of buffer.

Second Line
● Prints from start of buffer to msgh_bufx.


The msgbuf.wrap Macro

msgbuf.wrap
If the message has wrapped around, then to print it correctly requires
two separate prints.

First Line
Prints the first portion of the message.

Second Line
Prints the second portion of the message.

Macro References

Address          Contains
msgbuf + 0t8     Offset to end of latest message
msgbuf + 0t12    Offset to start of latest message
msgbuf + 0t16    Start of message area


Flow Control

No Test Statements
● adb has no flow control constructs

Count
● Used for conditional printing

.,<a="The value of a is not zero"

● Used to conditionally execute a different macro

.,<a$<macro

● Used to conditionally terminate the execution of macro

.,<a$<

Logical Negation (#)


● It is useful to change the sense of a test.

.,#(<a)="The value of a is zero"

● It is useful to limit to 1 or 0.

.,##(<a)="The value of a is not zero"


✓ Point out the difference between this and the first use of a if a is greater than 1.

✓ If required, give examples other than using dot.


Flow Control

No Test Statements
Since there are no test or decision-making constructs in adb, the count
must be used — usually in conjunction with an additional macro, as in
the case of msgbuf.

Count
The count field can be used to test whether extra lines should be
executed.

Logical Negation (#)


By negating the count field, the test can be inverted. A double
negation ensures that the count always evaluates to 0 or 1, rather than
using the count to indicate zero or nonzero.

In the example, if a is not zero, the text is printed just once, regardless
of the actual value of a.
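C's ! operator behaves in the same way as the logical negation described here, so the three forms of count can be compared in a short sketch. The function names are hypothetical, and a stands for the value of the adb variable a.

```c
#include <assert.h>

/* Count written as <a: the command runs a times. */
unsigned long count_plain(unsigned long a)
{
    return a;
}

/* Count written as #(<a): 1 when a is zero, otherwise 0. */
unsigned long count_not(unsigned long a)
{
    return !a;
}

/* Count written as ##(<a): 1 when a is nonzero, otherwise 0. */
unsigned long count_notnot(unsigned long a)
{
    return !!a;
}
```

With a equal to 5, the plain count would repeat the command five times, the single negation would skip it, and the double negation would run it exactly once.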


adbgen

adb Scripts
● No structure offset dependencies

● One structure per script

● Requires C compiler

● Generates adb macros

Script Format
● Header lines

● Blank line

● Structure name

● adb script

Example adb Script


#include <sys/thread.h>

_kthread
./"flags"8t"state"n{t_flag,x}{t_state,X}
+/"pri"8t"epri"8t"procp"n{t_pri,d}{t_epri,d}{t_procp,X}


adbgen

adb Scripts
adbgen compiles scripts into adb macros. Unlike adb macros, adbgen
scripts do not require absolute offsets into data structures. Instead,
symbolic references can be made to the needed structure members.

Script Format
All text before a blank line is taken as header information, used to
interpret the following script. After the blank line, a single structure
from an included file may be referenced. The adb script that follows
uses member names to reference the elements within the structure,
rather than offsets. Each member name is paired with an adb format
character and enclosed in braces ({}). See the adbgen man page for
additional allowed constructs.

Example adb Script


The example script shown displays fields from a kernel thread.

Note – adbgen requires . (the current directory) to be in your directory
search path.


adbgen in Use

Macro Generation
● Store the script in a file with the suffix .adb.
# vi tstate.adb
● Run adbgen.
# /usr/lib/adb/adbgen tstate.adb
● The generated file has no suffix.
# cat tstate
./"flags"8t"state"n20+x6+X
+/"pri"8t"epri"8t"procp"n2d132+X

Macro Use
● Refer to the new script within adb as normal.
fc2b8360$<tstate
0xfc2b8360: flags state
0 4
0xfc2b8380: pri epri procp
59 0 fc2b3328


adbgen in Use

Macro Generation
adbgen input file names must end in .adb. adbgen creates a C
program from the script, and compiles and runs the program, which
then writes the adb macro. The C program is then removed.
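The program that adbgen generates is not shown in this guide. The following C sketch only illustrates the idea of turning member names into offsets. The structure demo_thread is hypothetical (it is not the real kernel thread), and it is laid out so that its two fields sit at byte offsets 20 and 28; the function names are also assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical structure, NOT the real kernel thread. */
struct demo_thread {
    char           pad0[20];
    unsigned short t_flag;      /* 2 bytes, displayed with x */
    char           pad1[6];
    unsigned int   t_state;     /* 4 bytes, displayed with X */
};

/* Append one field to the macro text: an optional "<skip>+" to step over
 * unused bytes, then the format character. Updates *pos to the offset
 * just past the field and returns the number of characters written. */
static int emit_field(char *out, size_t outsize, size_t *pos,
                      size_t offset, size_t size, char fmt)
{
    int n;

    assert(offset >= *pos);
    if (offset > *pos)
        n = snprintf(out, outsize, "%zu+%c", offset - *pos, fmt);
    else
        n = snprintf(out, outsize, "%c", fmt);
    assert(n > 0 && (size_t)n < outsize);
    *pos = offset + size;
    return n;
}

/* Build one macro line for the two fields of demo_thread into out. */
void build_macro_line(char *out, size_t outsize)
{
    size_t pos = 0;             /* bytes of the structure consumed so far */
    int n = snprintf(out, outsize, "./\"flags\"8t\"state\"n");

    assert(n > 0 && (size_t)n < outsize);
    n += emit_field(out + n, outsize - (size_t)n, &pos,
                    offsetof(struct demo_thread, t_flag),
                    sizeof(unsigned short), 'x');
    emit_field(out + n, outsize - (size_t)n, &pos,
               offsetof(struct demo_thread, t_state),
               sizeof(unsigned int), 'X');
}
```

The point of the design is that the offsets come from the compiler through offsetof, so the script author never has to count bytes by hand.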

Macro Use
Subject to the usual directory searching of adb, the macro can then be
used by name as shown.


Including Standard Macros

User-Written Macro to Show Second CPU


# cat cpu2
.>c
*(.+24)>n
<n,##(<c-<n)$<cpu
.="Only one CPU"

Example use within adb


*cpu$<cpu2
Only one CPU

✓ Note that cpu2 is NOT a standard macro. The standard macro cpus is a better (but more
involved) macro.

✓ cpu IS a standard macro.


Including Standard Macros

The example given illustrates a macro that calls one of the standard
macros. For this to work, one of the following must hold: the cpu2
macro is in the current directory; cpu2 and the standard macros are in
the same directory; or the -I option is used with adb to specify the
directories to search.


Lab Exercises for Module 7

1. Review the msgbuf macro as already covered by your instructor
and in Chapter 13 of Panic!.

2. Start adb against one of your core files, and invoke the
threadlist macro. Expect to see many lines of output; the
threadlist macro displays a stack backtrace of every thread on
the system.

3. In another window, view the contents of the threadlist macro.
The threadlist macro references another, more complicated,
macro, threadlist.nxt. Do not inspect threadlist.nxt yet.

4. The allthreads variable contains the address of one active thread
within the doubly linked list of all threads. Enter the first two lines
of the macro directly to adb, which will store the address of this
first thread in both of the variables t and e.

*allthreads>t
<t>e

5. e is used to indicate the end of the list. Why do you think the
address of the first thread is used as the end?
To cater for circular lists
________________________________________________________

6. When is the last line of the threadlist macro ever executed?
If the allthreads pointer is NULL. Hopefully never!
________________________________________________________

7. What address is supplied to the threadlist.nxt macro?
The address of the first thread indicated by allthreads
________________________________________________________

8. How many times will the threadlist.nxt macro be called from
threadlist for your core file?

#(#(<t))=D _____________________________________________


9. Now view the threadlist.nxt macro.

Using the cheat program from Lab4, what is stored in the
variables n and s?
Address of next thread
n _______________________________________________________
State of this thread
s _______________________________________________________

10. The longest line in the threadlist.nxt macro determines the
number of times to apply the threadlist.nxt macro to the value
stored in n. The command count is set by the following expression
(#(<s))&(#(#(<n)))&(#(#(<n-<e)))
which can be broken down into three subexpressions. All three are
combined using the bitwise AND, &. Consequently, the
threadlist.nxt macro is only applied by this line if all three are
nonzero.

a. When does (#(<s)) evaluate to 1? (See <sys/thread.h>.)
When the thread is "at loose ends" – TS_FREE
________________________________________________________

b. When does (#(#(<n))) evaluate to 1?
When there IS another thread
________________________________________________________

c. When does (#(#(<n-<e))) evaluate to 1?
When the macro has not returned to the beginning
________________________________________________________

The expression causes threadlist.nxt to be executed if there are
more threads to look at but the current thread is free. If the current
thread is not free, the remainder of the macro is executed.

11. Lower in the macro you will see the $c command being applied at
another address. Again, using the cheat program, what is $c
being applied to?

________________________________________________________
The thread’s stack pointer

12. Applying $< without a file name indicates adb should read from
standard input again. When does threadlist.nxt terminate?

________________________________________________________
When the next thread is the original thread (addr stored in e)
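The traversal that threadlist and threadlist.nxt perform can be sketched as a loop in C. The node structure and the function name count_active are hypothetical; the variables t and e are named after the adb variables used in the macros, and the free flag stands in for a thread state of TS_FREE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element standing in for a kernel thread. */
struct node {
    int          free;   /* nonzero when the element is free */
    struct node *next;   /* next element in the circular list */
};

/* Walk a circular list from first, counting the elements that are not
 * free. The walk stops when the next element is NULL or is the one the
 * walk started from. */
size_t count_active(const struct node *first)
{
    const struct node *t = first;   /* current element */
    const struct node *e = first;   /* marks the end of the walk */
    size_t active = 0;

    while (t != NULL) {
        if (!t->free)
            active++;               /* the macro would display this one */
        t = t->next;
        if (t == e)
            break;                  /* back at the start: finished */
    }
    return active;
}
```

Remembering the first element in e is what allows a circular list, which has no NULL terminator, to be walked exactly once.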


13. Exit from adb, and change into the Lab7 directory where you will
find a simple C program involving a linked list, ll.c. Compile the
program, if necessary, and then run it. Look at the source code to
ensure you understand its operation.

14. Run ll under adb.

a. Load adb, and place a breakpoint on the donowt function.

# adb ll
donowt:b

b. Run the program. When the breakpoint is reached, print out
the value referenced by the head pointer, and store the value of
the next field (4 bytes on) in the adb variable n.

:r
breakpoint donowt: save %sp, -0x60, %sp
*head/D
a:
a: 1
*(.+4)>n

c. Display what is found at the value in n:

<n/D
b:
b: 2

15. Repeat the last two lines until adb cannot find an address.
(This is when you have reached the end of the list.) Can you
formulate an expression that would test for the end of the list? It
would involve checking that the value stored in n is not the NULL
pointer (which is equal to 0).

________________________________________________________
##<n
________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________


16. Try modifying the adb macro almost.adb (in the Lab7 directory),
to use your expression. When working, the macro should print the
list as below.

# adb ll
donowt:b
:r
breakpoint donowt: save %sp, -0x60, %sp
*head $< almost.adb
a:
a: key 1
b:
b: key 2
c:
c: key 3
d:
d: key 4

17. Write a macro called thr, which produces the following output
when supplied a kernel thread address. The pc and sp are the two
elements within the t_pcb member of the thread structure.

t_state: decimal
t_pri: decimal
t_pc,sp: hex, hex
t_clfuncs: hex
t_forw: hex

18. Try the exercises as described in Chapter 14 of Panic!.

19. If you have previously completed the exercises in Panic!, write
macros to do the following:

a. Scan all processes and print their PID. (Use practive as the
head of the list of processes.)

b. Scan all the LWPs of a process; for each one, print whether it is
in kernel mode or user mode, the accumulated system time
and user time, and the resource usage.

SPARC Assembler 8

Objectives
Upon completion of this module, you will be able to:

● Define and compare RISC and CISC architectures.

● With regard to SPARC assembler, describe the use of registers,
indirection, and branches and jumps.

● Explain how to find bad memory references in compiler-generated
code.

● Compile a C program into assembler code.

● Compare optimized and non-optimized assembler code.

Note – This module presents material covered in Chapters 15 and 16
of Panic!

✓ This module covers the SPARC version 8 architecture.


Assembler Characteristics

High-Level Languages
● C, Fortran, C++, Java™, BASIC, COBOL

● Independent of target hardware

● Compiler may reorganize code

Low-Level Languages
● Assembly language

● Specific to architecture

● Direct mapping to executable

● Possible performance gains

● Complicated to write

● Necessary for some hardware manipulation


Assembler Characteristics

High-Level Languages
Statements within programs written in high-level languages do not
directly correspond to the instructions used to implement the
program. A single statement can result in many instructions, or some
statements may result in no instructions at all if an optimizing
compiler detects redundant code.

When using a high-level language, the statements need not take
account of the actual processor on which the program will be run. That
is the responsibility of the compiler.

Low-Level Languages
Low-level languages are symbolic or mnemonic representations of the
actual instructions to be performed by the hardware.

Writing code in a low-level language usually takes more programmer
effort than using a high-level language, but it can result in
performance gains when the architecture is well understood. For some
specific operations, high-level languages cannot be used, since they do
not support the direct hardware manipulation required.


CISC and RISC

Complex Instruction Set Computers (CISC)


● Rich instruction set

● Single instruction for much work

● Complex decoders

● Difficult to use power

Reduced Instruction Set Computers (RISC)


● Smaller number of instructions.

● Most instructions have a single task.

● Simpler instruction decoders.

● Power is obtained by rapid processing.


CISC and RISC

Complex Instruction Set Computers (CISC)


CISC architectures are those where a large number of different
operations are available, some of which perform complex operations
that could take multiple statements even in a high-level language.

Although CISC architectures enable efficient execution of particular
operations, it is difficult for compilers to recognize when the particular
operations can be used.

Reduced Instruction Set Computers (RISC)


A RISC architecture streamlines the architecture by reducing the
operations to those that compilers can easily use. By reducing the
number and complexity of instructions, all instructions become
similar, which enables rapid processing and advanced pipelining
techniques.

Older Architectures
An older, third class of architecture exists, which is not as complex as
CISC, but is old, simple, and slow.


Basic SPARC Characteristics

RISC Architecture
● Few, fast instructions

● More operations are required than CISC

Register-Intensive
● Many registers per CPU.

● Almost all instructions are register-based.

Memory Alignments
● Bytes (8 bits) – anywhere

● Short words (16 bits) – 2-byte boundary

● Long words (32 bits) – 4-byte boundary

● Double words (64 bits) – 8-byte boundary


Basic SPARC Characteristics

RISC Architecture
The SPARC architecture was designed to fully exploit the benefits
offered by RISC architectures. As a result, fewer instructions are
available than on a CISC architecture, but each one executes rapidly.
To perform any task, a large number of SPARC operations may be
required, but each of them is handled quickly.

Register Intensive
Memory speed has not increased at the same rate as CPU speed, so
frequent use of memory locations causes the processor to stall while
waiting for data. To reduce the number of loads from and stores to
memory, SPARC processors provide many general purpose registers.
With many registers, data can be held within the processor longer,
increasing the likelihood that an operation will not need to access
memory so frequently.

Instructions that reference registers are usually simpler than those that
reference memory.

Memory Alignments
To simplify the requirements of memory access whenever instructions
or data do need to be fetched, the SPARC architecture imposes
restrictions on how data must be stored. All data types must reside on
their natural boundaries.
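The natural-boundary rule amounts to a simple test on the address. The C sketch below is illustrative only: the function name is_naturally_aligned is an assumption, and the sizes accepted are those of bytes, short words, long words, and double words.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero if addr lies on the natural boundary for an object of
 * the given size in bytes (1, 2, 4, or 8). */
int is_naturally_aligned(uintptr_t addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);

    /* For a power-of-two size, the low bits of an aligned address are 0. */
    return (addr & (uintptr_t)(size - 1)) == 0;
}
```

For example, address 0x1004 is acceptable for a long word but not for a double word, which would need 0x1000 or 0x1008.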


SPARC Instructions

Alignment Constraints
● All instructions are one full word.

● All instructions start on 4-byte boundary.

Instruction Pipelining

Fetch Decode Resources Execute Results

Register Indirect Memory Addressing


● No direct addressing

● Multiple instructions required:

● Load address into register

● Access memory using register


SPARC Instructions

Alignment Constraints
Just as with all data that is stored in memory, SPARC instructions must
be aligned to their natural boundary.

All SPARC instructions are 4 bytes, so all instructions must reside at a
4-byte boundary within memory. Thus the address of an instruction
always ends with binary 00. If the code attempts to execute data from
memory at an address that is not 4-byte aligned, a memory alignment
fault will occur.

Instruction Pipelining
Rapid processing of instructions is obtained by simultaneously
performing the stages of execution on multiple instructions. This is
possible due to the similarity and constant size of instructions. Compilers
for pipelined architectures will attempt to take advantage of the
pipeline by ordering the instructions in the most suitable manner. A
particular challenge is to order the instructions around conditional
branches, since the partial processing of instructions may be unused if
the branch causes the following code to be skipped.

Memory Access Always Register Indirect


Since every instruction is one 32-bit word, a single instruction
cannot also contain a full 32-bit address. Instead, a register must
first be set up with the address wanted, and then the memory is
accessed using the address held in the register.
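
In disassembled code this usually appears as a sethi instruction that loads the upper 22 bits of the address (written %hi(addr) in assembler source), followed by a load or store that adds the remaining 10 bits (%lo(addr)) as an immediate offset. The sketch below models that split in C; the function names are illustrative assumptions, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 22 bits of a 32-bit address: the value sethi places in a register. */
uint32_t sparc_hi(uint32_t addr)
{
    return addr >> 10;
}

/* Lower 10 bits: small enough to fit in the immediate field of ld or st. */
uint32_t sparc_lo(uint32_t addr)
{
    return addr & 0x3ffu;
}

/* The address the memory access finally uses: (hi << 10) + lo. */
uint32_t sparc_effective(uint32_t hi22, uint32_t lo10)
{
    assert(hi22 < (1u << 22) && lo10 < (1u << 10));
    return (hi22 << 10) + lo10;
}
```

Recombining the two halves always gives back the original address, which is a handy way to work out which symbol a sethi and ld pair refers to.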


SPARC Registers

Processor Status Register (PSR)


● Contains CPU state and condition codes.

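Bits    31-28  27-24  23-20  19-14     13  12  11-8  7  6   5   4-0
Field   IMPL   VER    ICC    reserved  EC  EF  PIL   S  PS  ET  CWP

The ICC field holds the N, Z, V, and C condition codes.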

Program Counter (PC)


● Instruction currently in execution

Next Program Counter (nPC)


● Used for instruction prefetch

Partial Arithmetic Result (Y)

Large Number of General-Purpose Registers


● Varies between implementation

● Only 32 of total available at any time

Specific Control Registers


● Additional registers for trap processing and other tasks


SPARC Registers

As previously stated, as an aid to performance, the SPARC architecture
is register-intensive. In addition to a large number of general purpose
registers, there are several control registers.

Processor Status Register (PSR)


The PSR contains the CPU state for the currently executing task. When
a context switch occurs, the PSR is saved to enable the task to be
resumed in the same state as when it was switched out.

The set of condition codes is heavily used to implement various
branch instructions.
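
When adb prints a raw psr value, the fields have to be picked apart by hand. The sketch below does the same job in C, assuming the Version 8 field positions (IMPL in bits 31-28 down to CWP in bits 4-0). The structure and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded PSR fields; names follow the slide, the struct itself is illustrative. */
struct psr_fields {
    unsigned impl, ver;
    unsigned n, z, v, c;        /* integer condition codes */
    unsigned ec, ef, pil, s, ps, et, cwp;
};

/* Extract bits hi..lo (inclusive) of a 32-bit value. */
static uint32_t bits(uint32_t value, unsigned hi, unsigned lo)
{
    assert(lo <= hi && hi < 32);
    return (value >> lo) & (uint32_t)((1ull << (hi - lo + 1)) - 1);
}

struct psr_fields psr_decode(uint32_t psr)
{
    struct psr_fields f;
    f.impl = bits(psr, 31, 28);
    f.ver  = bits(psr, 27, 24);
    f.n    = bits(psr, 23, 23);
    f.z    = bits(psr, 22, 22);
    f.v    = bits(psr, 21, 21);
    f.c    = bits(psr, 20, 20);
    f.ec   = bits(psr, 13, 13);
    f.ef   = bits(psr, 12, 12);
    f.pil  = bits(psr, 11, 8);
    f.s    = bits(psr, 7, 7);   /* supervisor mode */
    f.ps   = bits(psr, 6, 6);   /* previous supervisor mode */
    f.et   = bits(psr, 5, 5);   /* traps enabled */
    f.cwp  = bits(psr, 4, 0);   /* current window pointer */
    return f;
}
```

For example, a value with the S bit set and the Z bit set describes kernel-mode code whose last condition-code-setting instruction produced zero.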


Register Windows

32 Available General Purpose Registers


● Different registers are available for each function.

8 Global    %g0 to %g7
8 Input     %i0 to %i7
8 Local     %l0 to %l7
8 Output    %o0 to %o7

Reserved Registers
● %g0 always zero

● %o6, %o7 stack pointer and return pc

● %g7 current thread

Argument Passing
● Stores arguments in output registers

● Redefines window

● Retrieves arguments from input registers


✓ The next module, "Stacks" has more information on using register windows.


Register Windows

32 Available General Purpose Registers


At any one time, there are 32 registers available to the current task.
Compilers attempt to keep the data required for the program in the set
of registers, to avoid costly memory accesses.

When a function call is made, another set of 32 registers can be made
available, allowing for a new set of local variables.

Reserved Registers
Global register %g0 is always zero. Reading from %g0 will return zero,
and writing to %g0 will not change its value. This enables %g0 to be
used as a destination register in certain operations where no output is
wanted. This is particularly useful for setting the integer condition
codes for the benefit of a subsequent conditional branch.

Argument Passing
Although different functions have different registers available, by
arranging that the new set of registers overlap with the old set of
registers, arguments can be passed without having to write to the
stack.

The next module covers argument passing in detail.


SPARC Instruction Types

Load and Store


● Between memory and registers

ld [address], reg

st reg, [address]

ld [%g7 + 0xa0], %l7

st %i1, [%fp + 0x48]

Arithmetic and Logical


● Only using registers and immediate values

Transfer of Control
● Jumps, conditional branches, and traps

Control Register Manipulation


● Kernel only

Coprocessor and Floating Point Instructions


● Control of coprocessors

Miscellaneous
● Flush instruction cache


SPARC Instruction Types

There are several different classes of SPARC instructions. For more
details, see Appendix B of Panic! or the SPARC Architecture Manual,
Version 8, published by Prentice Hall, ISBN 0-13-825001-4.

The SPARC Architecture Manual, Version 9, also published by Prentice
Hall (ISBN 0-13-099227-5), describes the SPARC architecture as used in
UltraSPARC™ workstations.


Synthetic Instructions

● Convenience instructions

● Implement common operations

● %g0 often used

The following are example synthetic instructions:

Synthetic Instruction     Actual Instruction

tst %g6                   orcc %g0, %g6, %g0

mov 0x10, %o0             or %g0, 0x10, %o0


Synthetic Instructions

Synthetic instructions are instructions available to the assembler
language programmer as shorthand for common operations. When
disassembling with adb, some instructions are recognized and given
their synthetic mnemonic.

The instructions shown are used to set the condition codes (as a result
of performing a bitwise OR with 0), and for loading an immediate
value into a register.
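
To see why tst needs no instruction of its own, it helps to model what orcc does with %g0 as both a source and the destination. The sketch below is an illustration in C with made-up names; it assumes the Version 8 behavior in which a logical operation sets N and Z from the result and clears V and C.

```c
#include <stdint.h>

/* The four integer condition codes. */
struct icc {
    unsigned n, z, v, c;
};

/*
 * Models "orcc rs1, rs2, %g0": OR the operands, set the condition
 * codes from the result, and discard the result (writes to %g0 are lost).
 */
struct icc orcc_flags(uint32_t rs1, uint32_t rs2)
{
    uint32_t result = rs1 | rs2;
    struct icc f;
    f.n = result >> 31;         /* negative: top bit set */
    f.z = (result == 0);        /* zero */
    f.v = 0;                    /* logical operations clear overflow */
    f.c = 0;                    /* and carry */
    return f;
}

/* "tst reg" is "orcc %g0, reg, %g0"; %g0 always reads as zero. */
struct icc tst_flags(uint32_t reg)
{
    return orcc_flags(0, reg);
}
```

A following conditional branch such as be or bneg then tests exactly these bits.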


SPARC Transfer of Control

Unconditional
call label

jmpl address, reg

Conditional
ba

bleu

bpos

Delay Instruction
● Load output register

nop

Annul Bit
● Only execute delay instruction if branch is taken

bne,a

Trap on Integer Condition Code


ta

tleu

tpos


SPARC Transfer of Control

There are many different variants of control transfer instructions,
some of which are actually synthetic instructions.

Since the SPARC architecture uses a pipeline to decode instructions,
the instruction following a branch is usually ready and waiting to be
executed long before the first instruction at the target address can be
fetched and decoded. The instruction immediately after the branch,
known as the delay instruction, is therefore normally executed before
the first instruction at the target of the branch. The delay
instruction is often used to load an output register; when nothing
useful can be placed there, it is simply filled with a nop (no
operation) instruction.

The trap instructions generate software traps when their condition is
met (see the "Traps" module), and never execute a delay instruction.


Sources of Problems

Load and Stores


● Data faults

● Alignment faults

Branches
● Jumps to incorrect address

Illegal Instructions
● Jumps to data


Sources of Problems

When debugging crash dumps, the dump can often identify the
instruction that caused the problem. However, that instruction is a
single assembler instruction, which may simply have been using data
that it was incorrectly handed.

It is often necessary not only to isolate the faulting instruction, but
also to determine why the registers it used contain incorrect data.


Lab Exercises for Module 8

1. Change into the Lab8 directory, and view the file cards.c. Ensure
you understand the operation of the program.

2. Compile the program using make cards or cc -o cards cards.c.
Run the program to verify the output is as you expected.

3. Now compile the C source code to assembly language into the
cards.s file.

# cc -S cards.c

4. View the assembler file and use the assembler comments
(preceded by !) to relate this code to the C source.

5. Compile from the assembler to an executable using the following
command:

# cc -o cards cards.s

6. Notice how the variables point1 and point2 have their values
stored at a location relative to the frame pointer, %fp. The values
are then retrieved into local registers and copied into output
registers before the call to brucey.

Edit the cards.s file and remove some of the unnecessary register
transfers. To test your hand optimization, compile and run the
edited assembler.

# cc -o hand cards.s
# ./hand

7. Count the number of lines of actual assembler instructions once
you have performed some manual optimization.

8. Use the compiler optimization. The Makefile supplied will create
executables and assembler files for the standard optimization
options.

# make

9. Inspect the optimized assembler and see where you could have
done better.


Lab Exercises for Module 8

10. The file sumem.c adds two numbers and prints the result. Compile
the file and run the executable to verify the program functions
correctly.

# cc sumem.c -o sumem
# ./sumem
Sum is 7

11. lostem.s is an assembler version of sumem, which has lost
instructions. Assemble and run lostem.

# cc lostem.s -o lostem
# ./lostem
Sum is 0

12. Edit lostem.s, and replace the missing lines in the three blocks as
described by the comments within the file.

13. Compare your solution with the assembler version of sumem.

# cc -S sumem.c

14. woops.s is the assembler version of woops.c. Compile and
execute woops.s.

# cc woops.s -o woops
# ./woops
Before swap, M=5, N=10
Segmentation Fault(coredump)

15. By viewing woops.s, see if you can spot where the bug is, and
relate this back to woops.c.
✓ diff woops.c swoop.c will identify the bug.

Stacks 9

Objectives
Upon completion of this module, you will be able to:

● Describe SPARC register windows and how they are used.

● Backtrace through a stack within an adb session.

● Display registers and stacks within an adb session.

● Describe how functions with more than six arguments are
handled.

Note – This module presents material covered in Chapters 17 and 18
of Panic!.


Use of Stacks

Function Calling
● Nested functions

● Unknown level of nesting

Generic Frame

fr_savfp    Saved frame pointer

fr_savpc    Saved program counter

fr_arg[]    Array of calling arguments

Pushing
● Creating a new frame

Popping
● Returning to use previous frame


Use of Stacks

Function Calling
In programming languages such as C, functions may be called in any
order, and may themselves call other functions. At the time of writing
the program, the order of function execution and the degree of
function call nesting are not usually known. Since each function
requires some storage space, a dynamically sized area is required.

Stack Frames
Each time a function is called, the called function must be able to
return to the caller at the point of call to carry on executing the
original function. The return address is stored on the stack, along
with space allocated for any locally defined variables. The return
information and the local storage together are known as the stack
frame. A generic stack frame is shown, which includes space for
storing the previous extent of the stack, the point at which the
function was called, and the arguments.

Pushing
Whenever a new stack frame is allocated it is pushed onto the stack,
growing it in size.

Popping
Returning from a function enables function data to be discarded, and
so the stack frame is popped off the stack, leaving the caller’s frame
back at the top of the stack.


A Generic Stack in Use

main()
{
    fred(1, 2, 3);
}

int fred(int a, int b, int c)
{}

Frame             Address        Contents       Meaning

fred()’s frame    0xefff ffd8    0xefff ffec    Saved fp
                  0xefff ffdc    main+1c        Saved pc
                  0xefff ffe0    1              Argument
                  0xefff ffe4    2              Argument
                  0xefff ffe8    3              Argument

main()’s frame    0xefff ffec    0
                  0xefff fff0    0
                  0xefff fff4    0
                  0xefff fff8    0
                  0xefff fffc    0


A Generic Stack in Use

In the example program (taken from Panic!, Chapter 17), main is
automatically called by the operating system, so a stack frame is
created. main then calls fred, requiring a stack frame to be pushed.

The frame contains the previous frame pointer, so that when main is
resumed, the location to store any further stack frames will be known.
The saved program counter is an indication of where execution should
continue after returning from fred. On a SPARC processor, the
address of the instruction that called fred is stored, so execution will
resume after that address.

Note that the stack usually grows from high addresses to lower
addresses. The analogy of pushing something onto the stack works
best if you think of a diagram with high addresses at the bottom.


SPARC Stack Frames

From <sys/stack.h>
/*
* A stack frame looks like:
*
* %fp->| |
* |-------------------------------|
* | Locals, temps, saved floats |
* |-------------------------------|
* | outgoing parameters past 6 |
* |-------------------------------|-\
* | 6 words for callee to dump | |
* | register arguments | |
* |-------------------------------| > minimum stack frame
* | One word struct-ret address | |
* |-------------------------------| |
* | 16 words to save IN and | |
* %sp->| LOCAL register on overflow | |
* |-------------------------------|-/
*/

From <sys/frame.h>
/*
* Definition of the sparc stack frame (when it is pushed on the stack).
*/
struct frame {
int fr_local[8]; /* saved locals */
int fr_arg[6]; /* saved arguments [0 - 5] */
struct frame *fr_savfp; /* saved frame pointer */
int fr_savpc; /* saved program counter */
char *fr_stret; /* struct return addr */
int fr_argd[6]; /* arg dump area */
int fr_argx[1]; /* array of args past the sixth */
};


SPARC Stack Frames

From <sys/stack.h>
The /usr/include/sys/stack.h header file contains a few symbolic
constant definitions, but most usefully it contains a diagram of how
SPARC processors use stacks. The diagram shows high addresses at
the top: %fp, the higher address, is drawn above %sp.

From <sys/frame.h>
The actual definition of a stack frame is contained within
/usr/include/sys/frame.h. Since the first elements will be at the
lower addresses when using structures, it is necessary to remember the
required inversion when relating the structure to the diagram on the
previous page.

The 16 words mentioned in the diagram are declared as the first 4
elements in the structure. The 8 local registers in use at the time
of the call are stored first. Next are the 8 registers used to pass
arguments to functions, as seen by the called function (its input
registers). Although there are 8 of these registers, only 6 are
available for argument passing. The seventh is used as the saved
frame pointer, and the eighth as the saved program counter.

The location of the saved frame pointer and program counter means
that the fifteenth and sixteenth words of a stack frame can normally be
inspected to determine where the function was called from.
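
Picking out the fifteenth and sixteenth words is simple array indexing once the frame has been dumped. The sketch below applies it to the 16 words that the <sp/16X command prints in the "Registers From adb" example later in this module. The structure and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* The 16 words shown by <sp/16X in the adb example in this module. */
const uint32_t sample_frame[16] = {
    0x044000c0, 0x00000000, 0x00000001, 0xf0152400,   /* %l0 - %l3 */
    0xfc32f6c0, 0xfc32f6e4, 0xf0172f54, 0xf05b0ae4,   /* %l4 - %l7 */
    0xf0048b38, 0xf05b0c74, 0xf05b0b00, 0x000306ce,   /* %i0 - %i3 */
    0x00000000, 0x00000001, 0xf05b0b68, 0xf0048718    /* %i4, %i5, fp, pc */
};

/* The two words that link one frame to its caller. */
struct saved_link {
    uint32_t savfp;     /* fifteenth word: caller's stack pointer */
    uint32_t savpc;     /* sixteenth word: address of the calling instruction */
};

struct saved_link frame_link(const uint32_t *frame_words, unsigned nwords)
{
    struct saved_link link;
    assert(nwords >= 16);
    link.savfp = frame_words[14];
    link.savpc = frame_words[15];
    return link;
}
```

For the sample frame this yields a saved frame pointer of 0xf05b0b68 and a saved program counter of 0xf0048718, which adb shows as do_panic+0x1c. Dumping 16 words at the saved frame pointer and repeating the step walks one level further up the stack.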

The structure return address is for functions that return structures. The
structure will be placed elsewhere, with the stack containing its
address.

The arg dump area reserves temporary space into which the called
function can copy its register arguments.

Beyond the minimum stack frame from the diagram, more memory
can be used to store extra arguments and local variables. Since it is
unknown how many there will be, the stack frame is not of fixed size.
It is always at least 24 words.
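
The offsets that appear in disassembled code (0x44, 0x5c, 0x60) follow directly from the structure layout. The sketch below restates struct frame with every member as a 32-bit word, so that the layout is the same on any host; the name frame32 and the helper are assumptions made for this illustration. One way to account for the 24-word minimum is 16 + 1 + 6 = 23 words, rounded up so that the stack pointer stays on an 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit model of struct frame from <sys/frame.h>; pointers become uint32_t. */
struct frame32 {
    uint32_t fr_local[8];   /* saved locals */
    uint32_t fr_arg[6];     /* saved arguments [0 - 5] */
    uint32_t fr_savfp;      /* saved frame pointer */
    uint32_t fr_savpc;      /* saved program counter */
    uint32_t fr_stret;      /* struct return addr */
    uint32_t fr_argd[6];    /* arg dump area */
    uint32_t fr_argx[1];    /* array of args past the sixth */
};

/* Round a word count up to a whole number of 8-byte units, in bytes. */
size_t frame_bytes(size_t words)
{
    size_t bytes = words * sizeof(uint32_t);
    assert(words >= 23);            /* save area + struct-ret + arg dump */
    return (bytes + 7) & ~(size_t)7;
}
```

With this layout, fr_argd starts at offset 0x44 (the first store in fred in the example later in this module), fr_argx starts at offset 0x5c, and frame_bytes(23) gives 0x60, the value subtracted from %sp by the save instructions shown in this module.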


SPARC Registers and Argument Passing

Register Windows

Processor Registers

Each row is a block of 8 registers. A block shared by two columns is
the in registers of one window and the out registers of the next.

                Window N-3   Window N-2   Window N-1   Window N
8 registers     out
8 registers     locals
8 registers     in           out
8 registers                  locals
8 registers                  in           out
8 registers                               locals
8 registers                               in           out
8 registers                                            locals
8 registers                                            in
8 registers     Globals (shared by every window)

Registers available from current function: the out, locals, and in
registers of Window N, plus the globals.

Call function: the current window moves from Window N toward
Window N-1. Return from function: it moves back.


SPARC Registers and Argument Passing

Register Windows
To increase the speed of argument passing, SPARC processors store the
arguments to a function in output registers of their current window.
When a function call is made, a new register window can be made
available by shifting the current window pointer (CWP). Since the shift
overlaps the out and in registers, the arguments to the called function
are available from the in register set of the new window.

Arguments to functions are only copied from the registers to the stack
when the registers must be flushed. Registers are flushed on a context
switch, when a window overflow trap is handled, and by the panic
sequence.
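
The overlap can be modeled as one circular array of registers in which each window owns 16 registers and borrows 8 from its neighbor. The sketch below is a simplified model, not a description of any particular processor: the number of windows (8) is an assumption, the globals are left out, and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NWINDOWS 8      /* assumption: the real number varies by implementation */

struct regfile {
    uint32_t windowed[NWINDOWS * 16];   /* outs and locals of every window */
    unsigned cwp;                       /* current window pointer */
};

enum bank { BANK_OUT, BANK_LOCAL, BANK_IN };

/* Address of register n (0-7) of the given bank in the current window. */
uint32_t *win_reg(struct regfile *rf, enum bank b, unsigned n)
{
    unsigned w = rf->cwp;
    assert(n < 8 && w < NWINDOWS);
    switch (b) {
    case BANK_OUT:
        return &rf->windowed[16 * w + n];
    case BANK_LOCAL:
        return &rf->windowed[16 * w + 8 + n];
    default:
        /* The ins of window w are the outs of window w + 1. */
        return &rf->windowed[16 * ((w + 1) % NWINDOWS) + n];
    }
}

/* save decrements the window pointer; restore increments it. */
void win_save(struct regfile *rf)
{
    rf->cwp = (rf->cwp + NWINDOWS - 1) % NWINDOWS;
}

void win_restore(struct regfile *rf)
{
    rf->cwp = (rf->cwp + 1) % NWINDOWS;
}
```

A value written to %o0 before win_save is read back as %i0 afterwards without being copied, while the locals of the two windows stay separate.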


Saving and Restoring

Saving
save %sp, -0x60, %sp

● Issued by called function

● Decrement window pointer

● Set new stack pointer value

● Can cause window overflow trap

● Leaf routines need not call save

Restoring
restore

● Issued by called function

● Increments window pointer

● Can cause window underflow trap

Stack Pointer (%sp)


● Stored in %o6

Return Address
● Stored in %o7


Saving and Restoring

Saving
A called function shifts the register window to create its own set of
registers. Atomically with shifting the window pointer, the new %sp is
set to be the value of the old %sp, plus whatever increment is specified
(-0x60 in the example shown).

If the newly called routine does not require any registers other than
the output registers of the calling function and (itself) does not call any
other functions, then there is no need to call save. Such a routine is
known as a leaf routine.

Restoring
restore is usually used as a synthetic instruction with no arguments.
(See the SPARC instruction set for the additional arguments.)

Calling restore does not return to the calling function, but makes the
caller’s register set current.


Window Overflows and Underflows

Current Window Pointer (cwp)


● Within processor status register (psr)

● Decremented by save

● Incremented by restore

Window Invalid Mask (wim)


● 32-bit register

● Single bit is set

Saving to the Stack


● Registers to stack on overflow

● Registers from stack on underflow


Window Overflows and Underflows

Current Window Pointer (cwp)


The current window pointer indicates which register set is appropriate
for the current routine. The version 8 SPARC architecture allows for up
to 32 register windows, requiring 5 bits for the window pointer.

Window Invalid Mask (wim)


A single bit is set in the window invalid mask to mark one window as
invalid. A save or restore that would make that window the current
window causes a window overflow or underflow trap.
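
The trap test amounts to looking up one bit of the mask. The sketch below shows the check as it is understood here for the Version 8 architecture: compute the window that save or restore would move to, and test the corresponding bit. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if a save issued in window cwp would cause a window overflow trap. */
int save_would_overflow(unsigned cwp, uint32_t wim, unsigned nwindows)
{
    unsigned next;
    assert(nwindows >= 2 && nwindows <= 32 && cwp < nwindows);
    next = (cwp + nwindows - 1) % nwindows;     /* save decrements cwp */
    return (int)((wim >> next) & 1);
}

/* Nonzero if a restore issued in window cwp would cause a window underflow trap. */
int restore_would_underflow(unsigned cwp, uint32_t wim, unsigned nwindows)
{
    unsigned next;
    assert(nwindows >= 2 && nwindows <= 32 && cwp < nwindows);
    next = (cwp + 1) % nwindows;                /* restore increments cwp */
    return (int)((wim >> next) & 1);
}
```

The trap handler, not shown here, writes or reads the 16 window registers at the stack pointer and then moves the invalid bit along.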


SPARC Argument-Passing Example

Simple program
main()
{
    fred(1, 2, 3);
}

int fred(int a, int b, int c)
{}

Disassembled Code

main:       save %sp, -0x60, %sp
main+4:     mov 0x1, %l0
main+8:     mov 0x2, %l1
main+0xc:   mov 0x3, %l2
main+0x10:  mov %l0, %o0
main+0x14:  mov %l1, %o1
main+0x18:  mov %l2, %o2
main+0x1c:  call fred
main+0x20:  nop
main+0x24:  ret
main+0x28:  restore

fred:       save %sp, -0x60, %sp
fred+4:     st %i2, [%fp + 0x4c]
fred+8:     st %i1, [%fp + 0x48]
fred+0xc:   st %i0, [%fp + 0x44]
fred+0x10:  ret
fred+0x14:  restore


SPARC Argument-passing example

The program shown here is from Panic!, Chapters 17 and 18. The first
stage of this module’s exercises will use this program to work through
the presentation in Panic!.


SPARC Frame Pointer and Stack Pointer

Use of General Purpose Registers


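[Figure: overlapping register windows N-1 and N, each with Outs,
Locals, and Ins. In the block of registers that the two windows share,
the same register is %o6 (%sp) in one window and %i6 (%fp) in the
other.]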

Relationship to Stack Memory

[Figure: stack memory holding the frames for Window N-2, Window N-1,
and Window N. Each window's %sp points to its own frame, and its %fp
points to the frame of the window it was called from.]


SPARC Frame Pointer and Stack Pointer

SPARC processors do not have a dedicated stack pointer, but use one
of the general purpose registers (%o6) instead.

After a register window shift, the output registers become the next
window’s input registers. Consequently, the previous frame’s stack
pointer is always available. It is referred to as the frame pointer (%fp).


Registers From adb

All Registers of Current Set


$r
g0 0x0 l0 0x44000c0
g1 0x0 l1 0x0
g2 0x0 l2 0x1
g3 0x0 l3 0xf0152400 backfs+0x50
g4 0x0 l4 0xfc32f6c0
g5 0x0 l5 0xfc32f6e4
g6 0x0 l6 0xf0172f54 panic_regs
g7 0x0 l7 0xf05b0ae4
o0 0x0 i0 0xf0048b38 complete_panic+0xf8
o1 0x0 i1 0xf05b0c74
o2 0x0 i2 0xf05b0b00
o3 0x0 i3 0x306ce
o4 0x0 i4 0x0
o5 0x0 i5 0x1
sp 0xf05b0b00 fp 0xf05b0b68
o7 0xf0048b38 complete_panic+0xf8
i7 0xf0048718 do_panic+0x1c
y 0x0
psr 0x0
pc 0xf0048b38 complete_panic+0xf8
npc 0x0
complete_panic+0xf8: unimp 0x0

Individual Registers
<o7/i
complete_panic+0xf8: call setjmp

<sp/16X
0xf05b0b00: 44000c0 0 1 f0152400
fc32f6c0 fc32f6e4 f0172f54 f05b0ae4
f0048b38 f05b0c74 f05b0b00 306ce
0 1 f05b0b68 f0048718

fr_savfp fr_savPC


Registers From adb

All Registers of Current Set


$r displays all the current registers.

Individual Registers
Individual registers are accessed and used in adb as if they were
variables.

Displaying 16 words from sp shows the start of a stack frame.

%l0 – local0    %l1 – local1    %l2 – local2    %l3 – local3
%l4 – local4    %l5 – local5    %l6 – local6    %l7 – local7
%i0 – arg0      %i1 – arg1      %i2 – arg2      %i3 – arg3
%i4 – arg4      %i5 – arg5      %i6 – *savfp    %i7 – savPC


Passing More Than Six Arguments

Only Six Output Registers Available


● %o0 to %o5

● %o6 for caller’s stack pointer

● %o7 for saved program counter

Space Reserved on Stack


● Beyond arg dump area

Using Caller’s Frame


● Called function’s frame available only after call


Passing More Than Six Arguments

Only Six Output Registers Available


Functions taking six or fewer arguments are handled easily by %o0 to
%o5.

Since two of the eight output registers are always used for the stack
pointer and the saved program counter, passing more than six
arguments requires another method.

Space Reserved on Stack


Functions that take more than six arguments have the seventh and
beyond passed on the stack. The stack frame definition contains the
fr_argx array to contain an arbitrary number of arguments.

Using Caller’s Frame


Unfortunately, the extra arguments cannot be placed on the called
function’s stack frame, since the frame only exists after the save
instruction is issued, within the called function. Therefore, the extra
arguments are passed at the end of the caller’s frame.

The called function references these arguments through %fp, which is
equal to the caller’s stack pointer.


Lab Exercises for Module 9

1. The programs discussed in class and in Panic! are in the Lab9
directory. Change into Lab9 and review the use of stacks using
little.c and nine-args.c.

2. Also in the Lab9 directory is spaghetti. Compile and run
spaghetti.

# cc spaghetti.c -o spaghetti
# spaghetti
Segmentation Fault(coredump)

3. Use adb to view the stack of the core file.

# adb spaghetti core


core file = core -- program ``spaghetti''
SIGSEGV 11: segmentation violation

$c
___________________________________________________
___________________________________________________
___________________________________________________

4. Display the stack and the calling instructions. Start by dumping
the stack frame from the stack pointer and noting the 15th and 16th
words.

<sp/16X
___________________________________________________

5. Display the calling instruction and the previous stack frame, again
noting the 15th and 16th words from the stack frame.

16th_word?i _____________________________________________
15th_word/16X___________________________________________

6. Repeat the display for the next frame.

16th_word?i _____________________________________________
15th_word/16X___________________________________________


Lab Exercises for Module 9

7. Display the last instruction before the program terminated (which
caused the segmentation violation).

<pc?i
________________________________________________________

8. If the instruction is not a load (ld) instruction within the incr_pv
function, ask your instructor for assistance.

9. Determine the location being loaded from. Does it seem correct?


Attempting to read from location 0
________________________________________________________

10. View the assembly code for incr_pv.

incr_pv?10i

11. Where is the value for the faulting load coming from? What do
you think is happening?

________________________________________________________
Calling function passes value instead of reference
________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

Kernel Internals 10

Objectives
Upon completion of this module, you will be able to:

● Describe user mode and kernel mode.

● Explain what a thread is and how it relates and compares to
processes.

● Describe how system calls are implemented.

● Define virtual memory, and explain how it is used.

● Describe process scheduling and the various thread states.

● Trace through kernel data structures.

Note – This module presents material covered in Chapters 19 to 22 of
Panic!.


Kernel Mode

User mode      Processes (user code, library code)

--------------- System call interface ---------------

Kernel mode    Process control
               Scheduling
               Interprocess communication
               Memory management
               I/O handling
               Device drivers

--------------------- Hardware ----------------------


Kernel Mode

The kernel exists to provide various kernel services, to manage system
resources, and to control and protect the system hardware. To
manipulate any kernel resources or to access system resources
(hardware or software), the system must run in kernel, or supervisor,
mode.

On a SPARC processor, kernel mode is entered only in response to a
trap. Traps result from user mode execution of a system call, or they
are generated by hardware conditions such as device interrupts.


Programs and Processes

Programs
● Written

● Compiled

● Stored on disk

Processes
● In some state of execution

● Instances of programs

● Require resources

● Memory

● CPU


Programs and Processes

Programs
Programs are written, compiled, and stored on disk. They may exist
for long periods of time, but they cannot change any state of the
machine or of other programs.

Processes
A process must be created to execute the program. At startup of the
process, the kernel allocates a collection of resources. Thereafter, the
process carries out the instructions of the program and may request
further resources or kernel services.

At any one time, several processes can run the same program.


Threads

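[Figure: user-level threads in user space run on LWPs, which sit at
the boundary between user space and the kernel. Each LWP corresponds
to a kernel thread inside the kernel.]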

Kernel Schedules Threads


● Not processes


Threads

All processes will carry out a sequence of instructions from the
program. The Solaris 2 operating system enables processes to request
the resources to enable multiple sequences of execution to be
performed concurrently. Each sequence of execution is represented by
a thread.

Internally, programs can request that they have multiple sequences
available as user-level threads. Additionally, they may request that
more than one internal thread be executable at any one time, requiring
multiple kernel threads.

From the user address space, each kernel execution capability is
represented as a light-weight process (LWP). The kernel assigns a
kernel thread to each LWP and also creates kernel threads to perform
internal kernel housekeeping operations.

The Solaris 2 kernel schedules kernel threads rather than the more
traditional UNIX scheduling of processes.


Scheduling

Scheduling of Kernel Threads

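From         To           Transition

TS_RUN       TS_ONPROC    Resume (thread is given the cpu)
TS_ONPROC    TS_RUN       Preempt
TS_ONPROC    TS_SLEEP     Block
TS_SLEEP     TS_RUN       Wake up

TS_STOPPED is shown as a separate state.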

Preemption
● Time-sliced

● Higher-priority thread


Scheduling

Scheduling of Kernel Threads


There are three major scheduling states: TS_ONPROC, TS_RUN, and
TS_SLEEP. Each processor can be executing a single thread, so there
will be, at most, one thread in the ONPROC state per CPU. If there are
more threads that are to run, they will be queueing on dispatch
queues, waiting for their turn to use the CPU.

Once a thread is running, it may leave the processor either voluntarily
or involuntarily. Should the current thread make a blocking system
call, the thread voluntarily transfers to a sleep queue. The current
thread may also be forced to leave the processor through a
preemption. Preemption occurs either because a higher-priority thread
can be run, or because the current thread is subject to a time-sharing
scheduling strategy and has reached the end of its time slice.

Whenever a processor becomes vacant, the highest-priority thread that
can be run resumes.

When a thread that was sleeping is awakened, the thread is removed
from the sleep queue and added to a run queue. If it wakes at a higher
priority than the currently executing thread, then a preemption occurs.

A typical reason for a thread to block is a page fault. This
happens when the data the thread requires is not in memory, but is
resident on disk. While the disk driver is retrieving the data, the
thread sleeps. The disk controller generates an interrupt when the
requested data is available, causing an interrupt service routine to run.
The service routine wakes up the thread that requested the data and
places it on a dispatch queue.
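
The state changes described above can be sketched as a small transition
table. This is an illustrative Python model, not kernel code: the state
names come from the slide, while the event names and the next_state and
replay helpers are assumptions made for the example.

```python
# Illustrative model of kernel thread scheduling states (not kernel code).
# State names follow the slide; event names and helpers are assumptions.
TRANSITIONS = {
    ("TS_RUN", "resume"): "TS_ONPROC",    # taken from a dispatch queue
    ("TS_ONPROC", "block"): "TS_SLEEP",   # voluntary: blocking system call
    ("TS_ONPROC", "preempt"): "TS_RUN",   # involuntary: back to a dispatch queue
    ("TS_SLEEP", "wakeup"): "TS_RUN",     # moved from sleep queue to run queue
}


def next_state(state, event):
    """Return the state a thread enters after the given event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}")


def replay(state, events):
    """Apply a sequence of events and return every state visited."""
    visited = [state]
    for event in events:
        state = next_state(state, event)
        visited.append(state)
    return visited


# A thread that takes a page fault: it is resumed, blocks while the disk
# driver fetches the data, is woken by the interrupt service routine, and
# is later resumed on a processor.
print(replay("TS_RUN", ["resume", "block", "wakeup", "resume"]))
```

Note that a woken thread goes to TS_RUN, not directly to TS_ONPROC; it
still has to wait on a dispatch queue for a processor.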


Virtual Memory

Process Address Space

[Figure: process address space, from the highest address (0xffffffff)
downward: Stack, shared library, Data, Text.]

Memory Management Unit

[Figure: the CPU sends a virtual address to the MMU, which supplies the
physical address in physical memory. A page fault brings data in from
secondary memory.]


Virtual Memory

Process Address Space


Every process has access to the full range of the processor’s address
space. The address space is populated by segments, each representing
a particular file or area of private memory.

Memory Management Unit


Whenever the CPU executes instructions on behalf of a process, the
addresses for the instruction and data are sent from the CPU as virtual
addresses and translated to the actual physical address in memory by
the memory management unit (MMU).

If the MMU does not have a valid record of the requested virtual
address, a fault occurs, and the kernel runs the appropriate
routines to make the data available and to load the MMU with the data’s
location. If the data is successfully loaded, the CPU reissues the
request, which is translated to the appropriate physical address in
memory.
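
The translate, fault, and retry sequence can be sketched as follows. This
is a toy Python model under stated assumptions: the 4096-byte page size,
the ToyMMU class, and the load_page callback are all hypothetical and do
not describe any particular MMU hardware.

```python
# Toy model of virtual-to-physical translation (illustrative only).
PAGE_SIZE = 4096  # assumed page size; real systems vary by architecture


class PageFault(Exception):
    """Raised when the MMU has no valid record of a virtual page."""


class ToyMMU:
    def __init__(self):
        self.table = {}  # virtual page number -> physical page frame number

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.table:
            raise PageFault(vpn)
        return self.table[vpn] * PAGE_SIZE + offset


def access(mmu, vaddr, load_page):
    """Translate vaddr; on a fault, load the page and reissue the request."""
    try:
        return mmu.translate(vaddr)
    except PageFault as fault:
        vpn = fault.args[0]
        mmu.table[vpn] = load_page(vpn)  # kernel makes the data available
        return mmu.translate(vaddr)      # the CPU reissues the request


mmu = ToyMMU()
# The callback stands in for the kernel choosing page frame 7.
paddr = access(mmu, 0x12345, load_page=lambda vpn: 7)
print(hex(paddr))
```

The offset within the page is unchanged by translation; only the page
number is replaced by a page frame number.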


Kernel Data Structures

Kernel Representation and Control of a Process

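[Figure: a proc structure points to an as structure, which links the
Text, Data, Library, and Stack seg structures. The proc structure also
leads to the lwp, thread, and tsproc structures of each thread.]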


Kernel Data Structures

Kernel Representation and Control of a Process


Each process has an address space plus at least one kernel thread.
There is one proc structure for each process, which contains pointers
to other structures describing the state and context of the process.

The address space is described by an as structure pointing to a linked
list of seg structures, each relating to a segment of memory of the
process.

Each thread within the process has a kernel thread structure.
Accompanying the kernel threads are lwp structures and scheduling
structures, shown here as tsproc structures for the timesharing
scheduling class.
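
The relationships between these structures can be sketched with a few
Python data classes. This is a simplified illustration only: the class
and field names are hypothetical and do not match the members of the real
kernel structures, and the addresses and priorities are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Seg:
    name: str
    base: int
    size: int


@dataclass
class AddressSpace:
    segs: List[Seg] = field(default_factory=list)  # stands in for the seg list


@dataclass
class KThread:
    pri: int
    sched_class: str = "TS"  # class-specific data, e.g. tsproc for timesharing


@dataclass
class Lwp:
    thread: KThread  # the kernel assigns one kernel thread to each LWP


@dataclass
class Proc:
    as_: AddressSpace
    lwps: List[Lwp] = field(default_factory=list)

    def threads(self):
        """Return the kernel threads backing this process's LWPs."""
        return [lwp.thread for lwp in self.lwps]


# Made-up example: a process with two segments and two LWPs.
demo = Proc(
    as_=AddressSpace(segs=[Seg("text", 0x10000, 0x2000),
                           Seg("stack", 0xEFFFC000, 0x4000)]),
    lwps=[Lwp(KThread(pri=59)), Lwp(KThread(pri=59))],
)
print(len(demo.threads()), [seg.name for seg in demo.as_.segs])
```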


Lab Exercises for Module 10

These exercises lead you through the analysis of a core dump after
the system performs very poorly. The techniques can be applied to
a crash dump obtained from hung systems.

1. Ensure that there is sufficient space in the crash directory to save
another core file, and that savecore is enabled. Exit the window
system to reduce the number of active threads.

2. Go to the Lab10 directory, and run the StartMeUp script.

3. Attempt to force a core dump. If your computer hangs at the ok
prompt, try again; if it still hangs, ask your instructor for
assistance.

4. When the computer has successfully rebooted and saved the core
files, go to the crash directory and perform initial analysis.

Use strings and grep to verify that the core file was obtained
forcefully by checking for the panic: zero message.

5. Use the cpus macro within adb to determine the number of
runnable threads.

________________________________________________________

6. Use the threadlist adb macro to save the state of all the kernel
threads to a file.

# adb -k unix.n vmcore.n
physmem 1e6d
$> /tmp/threads
$<threadlist
$q

7. View the file you have saved the list of threads into, and look for
any threads that may be of interest.

In general, threads that have short stack traces ending with
cv_wait or cv_wait_sig_swap are common. Threads with stack
traces longer than most or involving unknown kernel routines
(perhaps from a third-party device driver) may be of interest. In
this case, you should see a typical set of normal threads.

8. The StartMeUp script ran a program named rattle.

Use crash to list the processes at the time of the panic. Save the
listing to a file, and use the full listing of the rattle process to
determine the threadp of rattle. Use the slot number for the
rattle process.

# crash -n unix.n -d vmcore.n
dumpfile=vmcore.n, namelist=unix.n, outfile=stdout
> p
> p -w /tmp/procs
> p -f rattle_slot
> q

Note the value of threadp:

________________________________________________________

9. Display the thread at the address above using the thread adb
macro. Note the values for the fields below.

kthread_t
state
pri
pc
sp
clfuncs
cpu
forw
procp

10. Supply the address of procp to the proc2u macro, and look at the
ps_args fields to verify that you are inspecting the rattle
process.

11. Supply the procp address to the proc macro to determine the
number of LWPs in the rattle process.

________________________________________________________

12. Use the forward pointer (forw) from the thread displayed in step 9
to trace through the next kernel thread of the rattle process.
Continue using the forward pointers until you have noted the
details of all the kernel threads of the process. Use the thr macro
you wrote in module 7, or the thr macro provided.
forw from step 9
kthread_t kthread_t kthread_t kthread_t
state
pri
pc
sp
clfuncs
forw

a. Compare the number of kernel threads with the number of
LWPs determined in step 11.

b. Are all the threads running at the same priority? ____________

c. For each thread, use the address you noted for sp and the $c
adb function, and record the function name at the top.
sp_value$c_______________________________________________
sp_value$c_______________________________________________
sp_value$c_______________________________________________
sp_value$c_______________________________________________

d. Using the address you noted for the pc for each thread,
determine the last instructions they called.
pc_value/i_______________________________________________
pc_value/i_______________________________________________
pc_value/i_______________________________________________
pc_value/i_______________________________________________

e. To determine the scheduling class of each thread, dump the
value at the address against clfuncs, and note the label
displayed at the left of the returned value.
clfuncs_value/X___________________________________________
clfuncs_value/X___________________________________________
clfuncs_value/X___________________________________________
clfuncs_value/X___________________________________________

✓ The next exercise is easier to do than to explain!

13. The StartMeUp script started some other processes running an
infinite loop at very low priority. At least one of them will be on
the dispatch queues of the CPU.

Display the cpu structure using the address noted in step 9.

The first dispatch queue’s address is listed by the cpu macro under
the queue heading. The queues are at the following addresses,
each taking three 4-byte words as described in <sys/disp.h>.

Using the address under queue in the CPU structure, dump three
words at a time until you find a queue with runnable threads.

queue_value/3X
Return

[Figure: struct cpu, whose queue field points to the dispatch queues.
Each queue entry holds dq_first, dq_last, and dq_sruncnt.]

14. The addresses within the dispatch queues are those of threads
waiting to run. Use the first nonzero address found to display the
thread, note the procp value, and use the proc2u macro to determine
the name of the process.

___________________________________________________________
___________________________________________________________

Continue inspecting threads on the dispatch queues until you find
a roll process. Write down which queue you find the thread on,
the address of the thread, and the address of its proc structure.

Queue _____________________________________________________
Address of thread ___________________________________________
Address of proc_____________________________________________
✓ The roll process is likely to be the first one found, since it is run with a very low priority.

15. Using the thread macro, determine the state, priority, program
counter, and scheduling class of the thread you have found. What
do the state and clfuncs fields indicate?

kthread_t
state
pri
pc
clfuncs

16. What was the last instruction this thread executed?

___________________________________________________________

17. The dispatch queues are in priority order. Does the first queue
represent the lowest or the highest priority?

___________________________________________________________

You have now traced through several important kernel data
structures. The diagram below shows the picture so far.

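[Figure: each proc has threads, and each thread records the cpu it last
ran on. Each cpu points to its current thread and to its dispatch queues
(first, last, cnt). The first pointer of a queue leads to the first
waiting thread, which points back to its proc.]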
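
The queue scan from step 13 can be sketched in Python. The only facts
taken from the exercise are that each queue occupies three 4-byte words
(dq_first, dq_last, dq_sruncnt) and that the queues follow one another in
memory. The helper names, the base address, and the sample data are
made up for illustration, and big-endian byte order is assumed because
the dump comes from a SPARC system.

```python
import struct

DISPQ_SIZE = 3 * 4  # three 4-byte words per queue, as stated in step 13


def dispq_address(queue_base, index):
    """Address of the index-th dispatch queue after the first one."""
    return queue_base + index * DISPQ_SIZE


def first_nonempty(raw, queue_base):
    """Scan raw queue entries; return (index, address, dq_first) or None."""
    for index in range(len(raw) // DISPQ_SIZE):
        dq_first, dq_last, dq_sruncnt = struct.unpack_from(
            ">IIi", raw, index * DISPQ_SIZE)
        if dq_first != 0:
            return index, dispq_address(queue_base, index), dq_first
    return None


# Made-up sample: two empty queues followed by one holding a single thread.
sample = (struct.pack(">IIi", 0, 0, 0) * 2
          + struct.pack(">IIi", 0xF5A1B2C0, 0xF5A1B2C0, 1))
index, address, thread = first_nonempty(sample, 0xF0400000)
print(index, hex(address), hex(thread))
```

Pressing Return after queue_value/3X in adb has the same effect as
advancing the index by one: the next three words are displayed.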

18. Using the roll process you found waiting on the dispatch queue,
trace through the relevant data structures to determine the
composition of its address space.

a. Use the procp field from step 14 to redisplay the proc
structure, and note the value shown under as.
as_____________________________________________

b. Supply the above address to the as macro and record the value
given under segs (and the number of segs).
segs __________________________________________
nsegs _________________________________________

c. Supply the segs address from above to the seglist macro
and, using the returned base and size fields, complete the
diagram below to show the address space of roll. On the
right, draw the address space approximately to scale.

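[Worksheet: two address-space columns, each running from 0 at the
bottom to 0xffffffff at the top. One column is marked NOT TO SCALE; use
the other for your scale drawing. Record each segment from the top
down.]

Size = ________ Base = ________
Size = ________ Base = ________
Size = ________ Base = ________
Size = ________ Base = ________
Text Size = ________ Base = ________
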
The roll program is statically linked. Dynamically linked
processes normally have more segments.

19. The segment with the highest base address is the stack of the
process. Beneath the stack is a segment from an mmapped file.

a. Check that adb has displayed the ops field for the seg
under the stack as segvn_ops, then use the segvn macro on the
data value. Continue to fill out the diagram below to discover
which inode represents the file mapped in.

Use the vnode, inode, and page macros to display the
required data. Note that the pagenum is the number of the
physical page frame being used.

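[Worksheet: follow seg (data) to segvn (vnode) to vnode (data, pages).
The vnode data field leads to the inode (ino); the vnode pages field
leads to the page (vnode, offset, pagenum). Record each address.]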

b. Which page of the file is mapped in? (Look at offset) _____

c. How many pages are in memory for this file? (Use the page’s
vpprev and vpnext pointers to help.) ____________________

20. The roll process also has the mmapped file still open. Again using
the procp field from step 14, display the user structure (see
<sys/user.h>) using the proc2u macro.

At the end of the user structure is the start of the process’s open
file table. The first three entries are for standard input, output, and
error. The next entry is the file opened. Supply the address of the
fourth entry to the file macro and complete the diagram.
[Worksheet: the open file table in the user structure has four entries,
each with uf_ofile, uf_pofile, and refcnt fields. The first three lead
to /dev/tty. For the fourth, follow file (vnode) to vnode (data) to
inode (ino), and record each address.]

Note – In this case the open file has been mapped in, and so is also
referenced through the segments. Commonly, opened files are not
mapped in, and so do not necessarily have segments associated.

You have now traced through several more important kernel data
structures. The diagram below shows their relationship.

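[Figure: proc leads to its lwp and thread structures, to the as
structure with its seg list, and to the user structure. A seg leads
through segvn to a vnode; the user structure leads through file to a
vnode. Each vnode leads to an inode and to its pages.]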

Device Drivers 11

Objectives
Upon completion of this module, you will be able to:

● List the requirements to install device drivers.

● Describe when and how device drivers are called.

● List and describe the major device driver functions.

Note – This module presents material covered in Chapter 23 of Panic!.


Use of Device Drivers

Code for Specific Hardware


● Device-specific control functions

● Device-specific data formats

● Device-specific access mechanisms

Required for New Hardware


● Add hardware, require extra driver

● Upgrade hardware, upgrade driver

User Requests
● Translating system calls into device-specific actions

Device Service Requests


● Ensuring hardware requirements are met

● Interrupt service routines


Use of Device Drivers

Code for Specific Hardware


Every hardware device has its own characteristics and peculiarities. To
control each device, a particular set of routines must be written that
takes account of the device-specific control functions, data formats,
and access restrictions. The collection of routines is known as the
device driver and is added to the kernel as a single module.

Required for New Hardware


Since new versions of hardware require different (or additional)
control sequences or provide data in a different format, a new device
driver is usually required for each device and each new version of
each device.

Some devices present a standard interface while changing the internal
organization or structure, in which case the previous driver routine
still functions correctly.

User Requests
Device drivers are responsible for servicing user requests resulting
from system calls. The standard system call interfaces will require
translation into the device-specific operations.

Device Service Requests


Drivers must be able to service any requests and requirements the
device hardware may have. In particular, the driver must respond to
interrupts generated by the device, determining which device
interrupted and why. Having claimed the CPU interrupt, the driver
must apply whatever operations are required to service the device and
cancel the interrupt.


Loadable Drivers

Loadable Kernel Modules

[Figure: loadable kernel modules (ufs, sd, sbus, scsi, dma, mydrv, TS,
specfs) attached to the core kernel (unix, genunix).]

Loaded When Required


● On first access to device

● Manually loaded using the modload command


Loadable Drivers

Loadable Kernel Modules


Device drivers in the Solaris 2 operating system are all loadable kernel
modules. Thus, when a device is not in use, there is no need for the
kernel to contain the code to service and use the device. When a driver
is installed, its presence is made known to the kernel, so that when the
associated hardware is accessed, the kernel has a record of which
driver to load.

Loaded When Required


On first reference to the device, the kernel loads the driver module into
memory and links the routines into the kernel. Thereafter, the module
remains in memory until the system is shut down, or until an extreme
memory shortage causes the kernel to attempt to unload modules.

Driver modules can be loaded into memory directly by issuing a
modload command. This is usually only used for testing the driver.


Driver Functions

Configuration and Initialization


● XXprobe()

● XXattach()/XXdetach()

● XXidentify()

User Level
● XXopen()

● XXclose()

● XXstrategy()

● XXprint()

● XXread()

● XXwrite()

● XXioctl()

● XXmmap()

● XXsegmap()

● XXchpoll()

Interrupt Service Routine


● XXintr()


Driver Functions

Configuration and Initialization


A set of functions is required to manage the device driver itself. These
functions are used to associate the driver with the particular device
and to ensure the device is really functioning.

User Level
Device driver routines conventionally have standard names prefixed
by a short identifier of the device.

Driver Function    Use

XXopen()           Gain access to a device
XXclose()          Relinquish access to a device
XXstrategy()       Perform block I/O
XXprint()          Display a driver message on system console
XXread()           Read data from a device
XXwrite()          Write data to a device
XXioctl()          Control a character device
XXmmap()           Check virtual mapping
XXsegmap()         Map device memory into user space
XXchpoll()         Poll entry point

Interrupt Service Routine


● XXintr()


Installing Drivers

Compile and Link


# cc -D_KERNEL -Dsparc -O -c driver.c
# ld -r -o driver driver.o

Install
# cp driver /usr/kernel/drv
# add_drv driver

Configuration Files
● Required for non-self-identifying devices
# cp driver.conf /usr/kernel/drv


Installing Drivers

Compile and Link


A driver module must be compiled with the appropriate defines to
indicate the generation of kernel code for a particular architecture. The
final executable must be created as a relocatable object with no main()
entry point.

Install
Kernel modules must reside in a location known to the kernel at boot
time. The /etc/system file enables the definition of the module path,
which defaults to the directories /platform/platform-name/kernel,
/kernel, and /usr/kernel. Under these module path directories, the
kernel will expect to find various subdirectories including drv for
device drivers.

The driver module must be copied to a driver module directory and
the kernel notified of the new driver.

Configuration Files
Some devices, known as self-identifying, have a small amount of code
on the device itself, which can be executed to identify the
characteristics of the device. Other devices require configuration files,
which must reside in the same directory as the driver and be named
after the driver module with a .conf extension.


Drivers and Crashes

Monitor Driver Usage


● Initial access?

● Heavy use?

The /etc/system File


● Driver options

● moddebug
set moddebug | 0x80000000

Core Analysis
● Save multiple cores

● Use threadlist macro


Drivers and Crashes

Although driver correctness is strictly the responsibility of the writer
of the device driver software, the complexity of some driver code means
that bugs may surface only after the driver has been installed. Any
stack traceback containing the name of a driver routine is a good
indication that there may be problems with the driver.

Monitor Driver Usage


If crashes occur only when the driver is first used or when the driver is
placed under a heavy load, the driver itself should be suspected of
containing bugs.

The /etc/system File


Drivers often contain debug variables, which may have their values set
in the /etc/system file. Such variables can cause debug messages to
appear on the console, so that the execution of driver routines can be
traced.

The moddebug kernel variable can be used to affect how kernel
modules are loaded and unloaded. Setting the top bit of the moddebug
variable causes loading and unloading messages to be written to the
console. More settings for the moddebug variable can be found at the
end of the /usr/include/sys/modctl.h file.
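
The value 0x80000000 on the slide is the top bit of a 32-bit word, and
the | operator in the set line is read here as a bitwise OR, so any bits
already set in moddebug are preserved. The arithmetic can be checked
with a short Python sketch; the helper names are hypothetical and the
starting value 0x5 is made up.

```python
MODDEBUG_TOP_BIT = 0x80000000  # value from the slide: bit 31 of a 32-bit word


def set_top_bit(moddebug):
    """OR in the top bit, leaving any other bits as they were."""
    return (moddebug | MODDEBUG_TOP_BIT) & 0xFFFFFFFF


def top_bit_set(moddebug):
    """Report whether the top bit is set in a moddebug value."""
    return bool(moddebug & MODDEBUG_TOP_BIT)


print(hex(set_top_bit(0x5)), top_bit_set(0x5), top_bit_set(set_top_bit(0x5)))
```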

Core Analysis
If multiple core files all contain stack backtraces involving the driver
routines, or if multiple threads within one core seem to be using the
same driver routines, there is a strong case to suspect the driver. Also,
if a newly installed driver is corrupting kernel memory in
unpredictable ways, then a single core file may produce misleading
results.


Lab Exercises for Module 11

In the Lab11 directory, there is a slightly modified form of one of
the sample Sun device drivers. The modifications cause messages
to be printed to the console whenever any of the driver functions
are invoked.

In these exercises you will build, install, and use the device driver
to gain experience of the procedure and to discover when the
various functions are called.

Part I
1. Go to the Lab11 directory, and read the README file.

2. Ensure your console window is visible, and run the following
scripts within the Lab11 directory to build and use the driver.
Watch the commands being issued and the messages displayed on
the console.

a. Compile the driver.

# ./build

b. Place the driver in the correct directory, and notify the system.

# ./install

c. Make a file system, and mount the ram disk.

# ./mkfsmnt

3. Now run some basic commands (while still watching the console).

# ls /mnt
# echo hiya > /mnt/text
# sync
# umount /mnt
# modinfo
# modunload -i ramdisk_id

Part II
4. Ensure savecore is enabled, and run the mkfsmnt command from
the Lab11 directory again. Now exercise the driver by running the
soak script.

# ./soak

5. While soak is running, exercise your system by running various
commands or a CPU-intensive application such as texteroids.

6. If the system performance becomes unsatisfactory, press Stop-a
and force a core dump.

7. Make notes relating to the details of the core dump, and retain for
later examination.

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________
✓ The core file should be much larger than the others so far collected; the driver has a
kernel heap core leak. The threadlist macro should show threads waiting for pages.
crash/kmastat should show an excessive amount of memory used.

STREAMS 12

Objectives
Upon completion of this module, you will be able to:

● Describe the composition of a Stream.

● Explain the purpose of STREAMS modules.

● Describe message passing and the message structures used in
STREAMS.

● Describe the use of queues and the queue structure used in
STREAMS.

Note – This module presents material covered in Chapter 25 of Panic!.


The STREAMS Facility

Purpose
● Efficient character-based I/O

Structure
● Stream head

● Queues

● STREAMS modules

● Device driver

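[Figure: a stream crosses from user space into the kernel, where it
passes through the Stream head, a Module, and the Driver. Downstream
runs from the head toward the driver; upstream runs from the driver
toward the head.]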


The STREAMS Facility

Purpose
The STREAMS facility was introduced into the UNIX system to
provide an efficient yet uniform and flexible method of handling
character-based communications.

STREAMS are used within the Solaris 2 operating system for terminal
and network handling; STREAMS device drivers can be written for
additional character-based devices.

Structure
Each stream is a connection between the user-level interface (known as
the stream head) and the device driver. Characters to be sent to the
device are passed downstream, while those incoming from the device
travel upstream to the user process. Between the stream head and the
device driver, additional processing of the data can be performed by
loadable STREAMS modules. Each module implements a different
function; by selecting appropriate modules, the stream can be
used to implement various protocols.

As each message of zero or more characters is added to the stream,
each module performs whatever processing is required and adds the
message to the queue of the next module. Each module, the head, and
the device have queues to buffer data coming downstream and going
upstream.
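
The downstream half of this arrangement can be sketched as a chain of
queues. The Python model below is a loose illustration and not the
STREAMS interface: the Module class, its put and service methods, and the
uppercasing module are all assumptions made for the example, and only the
downstream direction is modeled.

```python
from collections import deque


class Module:
    """One stage of a toy stream: a queue plus a processing function."""

    def __init__(self, name, process):
        self.name = name
        self.process = process  # applied to each message passing through
        self.queue = deque()    # buffers messages travelling downstream
        self.next = None        # next module (or the driver) downstream

    def put(self, message):
        """Add a message to this module's queue."""
        self.queue.append(message)

    def service(self):
        """Process queued messages and pass them to the next module."""
        while self.queue:
            out = self.process(self.queue.popleft())
            if self.next is not None:
                self.next.put(out)


def build_stream(*modules):
    """Link the modules in downstream order and return the head."""
    for upstream, downstream in zip(modules, modules[1:]):
        upstream.next = downstream
    return modules[0]


head = Module("head", lambda m: m)
upper = Module("upper", bytes.upper)  # hypothetical processing module
driver = Module("driver", lambda m: m)
build_stream(head, upper, driver)

head.put(b"hello")
head.service()
upper.service()
print(list(driver.queue))
```

A message stays on a queue until that stage is serviced, which is why a
stalled stage shows up in a dump as a queue with a nonzero first pointer.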


STREAMS Queues

Queue Data Structures

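[Figure: the stdata structure of the Stream head points to a queue.
Each queue is associated with a qinit (which refers to module_info and
module_stat), qband structures, and a chain of mblk message blocks. The
queues are linked to one another along the stream.]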


STREAMS Queues

Queue Data Structures


STREAMS involves many data structures to describe the individual
modules and to handle the queues of messages. The messages are
handled by message blocks, which are linked onto the appropriate
queue.

Queue-Processing Functions
There are a number of standard functions to process STREAMS
queues. Stack traces involving functions such as putnext, putq,
qprocson, and qprocsoff indicate STREAMS activity. See section 9F
of the man pages for more functions.


Lab Exercises for Module 12

These exercises follow through the analysis of a Stream as
presented in Chapter 25 of Panic!.

1. Start by ensuring you have a window running the Bourne shell, sh,
or the Korn shell, ksh.

2. In that shell, change to the Lab12 directory, and run the program
watneys, which prints ad nauseam to the screen. Run the program
in the background, and note its process ID. Soon after starting it,
suspend the terminal output in that window by pressing ^S.

# ./watneys&
^S
PID ____________________________________________________

3. In another window, run adb on the kernel.

# adb -k /dev/ksyms /dev/mem

4. As in module 10, access the open file table of the process, and
display the vnode. This time use standard output, which is the
second file descriptor listed by the proc2u macro.

0tPID_of_watneys$<setproc
pid xxx
$<proc2u
2nd_address_under_ofile$<file
address_under_vnode$<vnode

5. Display the stream data at the address under stream.

address_under_stream$<stdata

6. Apply the queue macro to the address under wrq.

address_under_wrq$<queue

7. If the address under first is 0, then this queue is empty, and the
address under next can be supplied to the queue macro to inspect
the next queue. Continue doing so until you arrive at a queue with
data, indicated by a nonzero value for first. Note the address
under qinfo.

________________________________________________________

8. When you have a nonzero value for first, you can inspect the
message blocks pointed to by first.

address_under_first$<mblk

9. The difference between the values under rptr and wptr indicates
the number of characters in this block. Calculate the difference,
then dump that many characters from the address under rptr.

addr_under_rptr,(addr_under_wptr - addr_under_rptr)/c
________________________________________________________

10. Using the address you noted from under qinfo (step 7), display
the data and look at the label adb prints at the left. This should
indicate on which STREAMS module you have data.

addr_under_qinfo/X
________________________________________________________

11. Are the data and the module as you would expect?

________________________________________________________

________________________________________________________
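
The calculation in step 9 can be checked with a short Python sketch. The
helper names and the two addresses are made up for illustration. The
count is formatted in hexadecimal on the assumption that adb reads
unprefixed numbers as hex, which is consistent with the 0t prefix used
for the decimal PID in step 4.

```python
def mblk_length(rptr, wptr):
    """Number of characters in a message block: wptr minus rptr."""
    if wptr < rptr:
        raise ValueError("wptr precedes rptr")
    return wptr - rptr


def adb_dump_command(rptr, wptr):
    """Format an adb command that dumps the block's characters."""
    return f"{rptr:x},{mblk_length(rptr, wptr):x}/c"


# Made-up addresses standing in for the values shown under rptr and wptr.
rptr, wptr = 0xF5C3A100, 0xF5C3A128
print(mblk_length(rptr, wptr), adb_dump_command(rptr, wptr))
```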

Traps 13

Objectives
Upon completion of this module, you will be able to:

● Explain the difference between synchronous and asynchronous traps,
and give examples of each.

● List the various trap types and give a brief description of each.

● Describe how traps are handled.

● Describe the trap base register and trap stack frames.

● Explain what an interrupt is on a SPARC system and how interrupts
are handled.

Note – This module presents material covered in Chapters 26 and 28
of Panic!.

✓ The SPARC Version 9 architecture, as used in UltraSPARC, specifies a different trap
handling model, including nested traps. This course does not include coverage of SPARC
Version 9. Those interested should refer to the Version 9 architecture manual.


Trap Usage

SPARC Traps
● Immediate jump to kernel code

Causes
● System calls

● Page faults

● Hardware interrupts

CPU Responsibilities
● Identify trap type

● Set control registers

● Execute handler code

● Resume normal execution (if possible)


Trap Usage

SPARC Traps
A trap causes an immediate jump to kernel code from wherever the
CPU was currently executing. Traps can occur while executing user
programs or kernel code. The sequence is interrupted and, hopefully,
resumed after processing the trap. On some occasions it is not possible
to continue from where the trap was caused.

Causes
Any operation that requires kernel code to be executed must invoke a
trap. A common example is system calls, where a trap instruction is
part of the system call wrapper function.

Page faults are also handled as traps. When a program attempts to
access text or data not currently held in memory, the memory
management unit causes the kernel to run code to bring the required
data into memory from disk, if possible.

A third example of traps is when an external device requires attention.
The device raises an interrupt, which causes a specific type of trap.

CPU Responsibilities
Whenever a trap occurs, the kernel must identify why the trap
occurred and take the appropriate action. If the trap can be
satisfactorily handled, the CPU must resume whatever was being
executed.


Trap Types

Synchronous
● Caused during instruction execution

● Trap instructions

● Hardware errors

● Current instruction stopped

● Handled before changing processor state

Asynchronous
● Requested at any time

● Handled after instruction completes

● Typically caused by interrupts

Good Traps and Bad Traps


● Traps occur frequently

● Bad traps cause panics

● die supplied with trap type and address of saved registers


Trap Types

Synchronous
Synchronous traps occur because of the executed instruction. System
calls fall into this type of trap — where a trap instruction is used as the
method to transfer execution to the kernel functions that implement
the system call.

Hardware errors, such as bus errors, also cause synchronous traps.

When a synchronous trap is handled, the faulting instruction is
stopped before it changes any state in the processor. If the trap is
handled successfully, the instruction can be re-executed if necessary, as
in the case of page faults.

Asynchronous
Asynchronous traps can be requested at any time, independently of
whatever instructions are executing. The trap is handled after the
completion of the current instruction, so as not to disrupt the
unrelated current operation.

Good Traps and Bad Traps


During normal execution there are many traps handled and processed.
Without traps, the operating system could not provide any system
services to user programs and could not handle external hardware.
Without traps, the CPU cannot run the operating system. Most traps
occur and are handled with no ill results.

At times, traps occur when they should not. In such situations, the
kernel may be unable to continue — either by design, where the
operating system recognizes a fatal error, or due to inability of the
hardware to resolve problems. When a trap occurs that cannot be
handled, a bad trap is reported and the system panics. The trap
function calls panic by way of the die routine, supplying the trap
type and the address of the saved registers as the first and second
arguments. The registers can be inspected using the regs macro
within adb.


Trap Sequence

Recognize the Trap


● Clear the Trap Enable bit

● Set or clear the previous supervisor bit

● Set the supervisor bit

Get a New Window


● Implicit save

● Decrement the current window pointer

● Window overflows not trapped

● Save PC and nPC into local registers

Set Trap Base Register (TBR)


● Set the tt field of TBR

Force a Branch to Trap Instructions


● Set PC to the address held in TBR

● Set nPC to TBR+4


Trap Sequence

Recognize the Trap


On recognizing that a trap has occurred, the Trap Enable bit within the
processor status register (PSR) is cleared. This prevents further traps
from being handled, since they could overwrite registers holding critical
information for this trap. If a further synchronous trap occurs while
traps are disabled, a watchdog reset occurs. Events that usually generate
traps must either be avoided, or their traps must be delayed.
Trap-handling code must execute quickly, since interrupts cannot be
handled while traps are disabled.

The value of the supervisor bit within the PSR is transferred into the
previous supervisor bit. This enables the trap-handling code to
determine where the trap originated. For example, page faults from
supervisor mode are somewhat more serious than those from user
mode.

Traps are handled by the kernel in supervisor mode.

Get a New Window


Since the trap code requires registers to execute, and the code running
at the time of the trap should not be affected, a new register window
must be obtained. However, since traps are disabled, window overflows
cannot be handled, so save cannot be used. Furthermore, since the
validity of the window is uncertain, the full window cannot be used.
Trap code uses only the local registers and is often recognizable
because registers %l1 and %l2 contain the addresses of adjacent
instructions (the saved PC and nPC).

Set Trap Base Register (TBR)


The trap type is placed in the trap base register, which is used to
indicate the code to execute.

Force a Branch to Trap Instructions


The program counter is set to the newly defined trap code location.
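
The four steps above can be modeled in a few lines of C. This is only a sketch under stated assumptions: the PSR bit positions follow the SPARC Version 8 architecture manual, while the cpu_model structure, the trap_enter function, and the choice of eight register windows are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* PSR fields as laid out in the SPARC V8 architecture manual. */
#define PSR_CWP_MASK 0x0000001fu   /* current window pointer */
#define PSR_ET       0x00000020u   /* enable traps */
#define PSR_PS       0x00000040u   /* previous supervisor */
#define PSR_S        0x00000080u   /* supervisor */

#define NWINDOWS 8                 /* assumed; implementation dependent */

/* Hypothetical model of the state touched on trap entry. */
struct cpu_model {
    uint32_t psr, tbr, pc, npc;
    uint32_t l1[NWINDOWS];         /* %l1 of each register window */
    uint32_t l2[NWINDOWS];         /* %l2 of each register window */
};

/* Apply the trap-entry sequence for trap type tt (0 to 0xff). */
static void trap_enter(struct cpu_model *c, uint32_t tt)
{
    assert(tt <= 0xff);
    assert(c->psr & PSR_ET);       /* with ET clear the CPU enters error mode instead */

    /* Recognize the trap: ET <- 0, PS <- S, S <- 1. */
    c->psr &= ~PSR_ET;
    if (c->psr & PSR_S)
        c->psr |= PSR_PS;
    else
        c->psr &= ~PSR_PS;
    c->psr |= PSR_S;

    /* Get a new window: decrement CWP modulo NWINDOWS, no overflow check. */
    uint32_t cwp = ((c->psr & PSR_CWP_MASK) + NWINDOWS - 1) % NWINDOWS;
    c->psr = (c->psr & ~PSR_CWP_MASK) | cwp;

    /* Save PC and nPC into the new window's %l1 and %l2. */
    c->l1[cwp] = c->pc;
    c->l2[cwp] = c->npc;

    /* Set the tt field of TBR, then branch into the trap table. */
    c->tbr = (c->tbr & 0xfffff000u) | (tt << 4);
    c->pc  = c->tbr;
    c->npc = c->tbr + 4;
}
```

The model asserts that ET is set on entry because a trap arriving with ET clear is not handled at all; that case leads to error mode and a watchdog reset.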


Trap Base Register

Set at System Initialization


● Points to page boundary

Format

Trap Base Address    Trap Type    Zero

20 bits              8 bits       4 bits, all 0

T_TEXT_FAULT 0x01

T_DATA_FAULT 0x09

T_SOFTWARE_TRAP 0x80


Trap Base Register

Set at System Initialization


At boot time, the trap base register is set to the start of a page when
the trap handling code is installed.

Format
The trap base register always has the bottom four bits cleared. The
next eight bits are set by the processor (in hardware) when a trap
occurs, giving an offset into the page defined by the top 20 bits.

This implies that the trap-handling code for each trap type is 16 bytes
away from the previous and next trap’s code. With four bytes for each
instruction, this allows four instructions per trap type.
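
The field layout described above translates directly into shifts and masks. The following sketch assumes only the 20/8/4-bit split given in this module; the function names are hypothetical and the TBR value used in the note below is an arbitrary illustration, not one read from a real system.

```c
#include <assert.h>
#include <stdint.h>

/* Top 20 bits: page-aligned base address of the trap table. */
static uint32_t tbr_base(uint32_t tbr)
{
    return tbr & 0xfffff000u;
}

/* Bits 11..4: trap type written by the hardware when a trap occurs. */
static uint32_t tbr_tt(uint32_t tbr)
{
    return (tbr >> 4) & 0xffu;
}

/* Each trap type owns 16 bytes (four instructions) in the table. */
static uint32_t trap_handler_addr(uint32_t tbr, uint32_t tt)
{
    assert(tt <= 0xff);
    return tbr_base(tbr) + tt * 16u;
}
```

With an illustrative TBR value of 0xf0040800, tbr_tt() yields 0x80 and trap_handler_addr() places the code for T_DATA_FAULT (0x09) at 0xf0040090. The same arithmetic applies when locating handler code by hand in the lab exercises.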

Hardware-generated traps have values in the range 0 to 0x7f, defined
in /usr/include/v7/sys/machtrap.h. Traps generated by software
occupy the range 0x80 to 0xff, but /usr/include/sys/trap.h defines
them with the prefix ST_ as software trap numbers in the range 0 to 0x7f.
When a trap is generated by software (using a ticc instruction),
T_SOFTWARE_TRAP (0x80) is added to the software trap number.


Trap Handling

Register Usage
● Address of trapping instruction in register %l1

Handler Code
● Four instructions available

● Usually jump to another routine

● fault(), trap() or xxintr()

Returning
● rett


Trap Handling

Register Usage
Due to the restricted availability of registers during trap handling (no
explicit save, no overflows detected), the program counter is saved in
local register %l1. The next program counter is held in %l2. During
handling of some traps, the processor status register is also stored. The
trap code generally has five local registers available.

Handler Code
Since only four instructions are available at the address specified by
the trap base register, they usually consist of saving some state
followed by a jump to a routine located in memory without such tight
space restrictions.

The routine jumped to is often a trap routine, a fault routine, or an
interrupt routine.

Returning
The special instruction to return from a trap is rett, undoing all that
was done to get to the trap handling code and setting the program
counter to the value saved in %l1.
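
As a rough model of what rett undoes, the sketch below reverses the trap-entry changes to the PSR and restores the program counters. The PSR bit positions and the two return forms (retry the trapped instruction, or skip it) follow the SPARC Version 8 architecture manual; the trap_state structure, the trap_return function, and the window count of eight are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_CWP_MASK 0x0000001fu   /* current window pointer */
#define PSR_ET       0x00000020u   /* enable traps */
#define PSR_PS       0x00000040u   /* previous supervisor */
#define PSR_S        0x00000080u   /* supervisor */
#define NWINDOWS     8             /* assumed; implementation dependent */

/* Hypothetical view of the state a trap handler sees before returning. */
struct trap_state {
    uint32_t psr, pc, npc;
    uint32_t l1, l2;               /* saved PC and nPC of the trapped code */
};

/*
 * Model of a return from trap.
 * retry != 0: resume at the trapped instruction (PC <- %l1, nPC <- %l2).
 * retry == 0: skip it (PC <- %l2, nPC <- %l2 + 4), as after a system call.
 */
static void trap_return(struct trap_state *t, int retry)
{
    assert(!(t->psr & PSR_ET));    /* handler runs with traps disabled */
    assert(t->psr & PSR_S);        /* and in supervisor mode */

    /* Step back to the window that was current before the trap. */
    uint32_t cwp = ((t->psr & PSR_CWP_MASK) + 1) % NWINDOWS;
    t->psr = (t->psr & ~PSR_CWP_MASK) | cwp;

    /* Restore the previous privilege level and re-enable traps. */
    if (t->psr & PSR_PS)
        t->psr |= PSR_S;
    else
        t->psr &= ~PSR_S;
    t->psr |= PSR_ET;

    if (retry) {
        t->pc  = t->l1;
        t->npc = t->l2;
    } else {
        t->pc  = t->l2;
        t->npc = t->l2 + 4;
    }
}
```

The retry form corresponds to the page fault case, where the faulting instruction is executed again once the data is in memory.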


Trap Types

Common Trap Types

Trap Type    Trap Description

1            Illegal instruction access (text fault)
2            Illegal instruction
3            Privileged instruction
4            Floating point disabled
5            Window overflow
6            Window underflow
7            Memory address alignment
8            Floating point exception
9            Data access exception (data fault)
17           Interrupt level 1
18 – 30      Interrupt levels 2 – 14
31           Interrupt level 15
128          Software trap 0
136 – 254    Software traps 8 – 126
255          Software trap 127


Trap Types

Of the 256 traps available, half are used for hardware traps and half
for software traps. Software traps are generated by the SPARC
assembler ticc instructions, and are used, for example, inside the
system call wrapper functions. Software traps are identified by the
software trap number, which always has 128 added before use as the
tt field in the trap base register.
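
The relationship between a software trap number and the tt value can be written out as a small helper. This sketch uses the T_SOFTWARE_TRAP value shown on the Trap Base Register slide; the function names are hypothetical and are not taken from the Solaris headers.

```c
#include <assert.h>

#define T_SOFTWARE_TRAP 0x80   /* value shown on the Trap Base Register slide */

/* Convert a software trap number (0 to 0x7f) into its tt value. */
static int software_tt(int st)
{
    assert(st >= 0 && st <= 0x7f);
    return T_SOFTWARE_TRAP + st;
}

/* Nonzero if tt falls in the software half of the 256 trap types. */
static int is_software_trap(int tt)
{
    assert(tt >= 0 && tt <= 0xff);
    return tt >= T_SOFTWARE_TRAP;
}
```

For example, software trap 8 yields trap type 136, matching the table, while trap type 9 (data fault) is classified as a hardware trap.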

Although the previous table does not show all traps used within the
Solaris environment, most types are unused.


Interrupts

Delivery
● External hardware requires attention

● Hardware signal to CPU

Asynchronous
● CPU checks before each instruction fetch

● Time criticality

Response
● Trap

● Interrupt Service Routine (ISR)


Interrupts

Delivery
Interrupts occur when hardware external to the processor needs
attention. The external hardware notifies the processor that data
requires collection. The data may have been requested previously, or it
may be incoming data from a source external to the system. An interrupt
can also signal that a condition has changed, or simply that a condition
has occurred.

Asynchronous
External devices generate interrupts asynchronously to the current
execution stream. Even if they are in response to data requests, it is
likely that the requesting thread is asleep and another is running in its
place. Consequently, the current instruction is unlikely to be related to
the source of the interrupt and is allowed to complete before the
interrupt is handled.

After completing each instruction and before fetching the next, the
processor checks for outstanding interrupts.

Interrupt-handling code is typically fast, so that the interrupting
device can be placated and the processor can get back to the
current thread.

Response
When an interrupt is noticed, the standard trap handling sequence is
used, with the trap handler branching to the interrupt service routine.


SPARC Interrupts

Cause Traps
● Traps 17 to 31

Interrupt Service Routines


● Kernel code

Prioritized
● 15 levels

CPU Priority Interrupt/Device

Level 0 (Spurious)
Level 1 (Softclock)
Level 2 SBus/VME level 1
Level 3 SBus/VME level 2
Level 4 On-board SCSI
Level 5 SBus/VME level 3
Level 6 On-board Ethernet
Level 7 SBus/VME level 4
Level 8 On-board video (retrace)
Level 9 SBus/VME level 5
Level 10 Normal clock (100Hz)
Level 11 SBus/VME level 6, floppy drive
Level 12 Serial I/O (zs)
Level 13 SBus/VME level 7, audio device
Level 14 High-level clock (kernel profiling)
Level 15 Asynchronous error (memory errors)


SPARC Interrupts

Cause Traps
An interrupt is delivered to the processor as a trap, where the trap
type indicates the level of interrupt. Each level is either dedicated to
an individual device or shared, in which case the interrupt handler
polls several devices to determine which one raised the interrupt.

Interrupt Service Routines


An interrupt service routine is typically part of a device driver and is
responsible for stopping the interrupt. This may require reading data or
just clearing a condition flag in a device register. The service routine
must be fast enough to ensure that any further interrupts from the same
device are not lost.

Prioritized
SPARC allows fifteen levels of interrupt, which are allocated according
to how urgently devices require attention. The lower the level of
interrupt, the lower priority it has. For example, a device interrupting
at level 6 is always serviced before a device interrupting at level 2.
When any interrupt is serviced, the processor interrupt level (PIL) field
of the PSR is set to prevent any lower-level interrupts being serviced.

The higher priority interrupt levels are used for devices that require
urgent attention. This is often due to the amount of buffering available
on the device. For example, a disk with a large amount of buffering
will interrupt at a lower level than the serial lines (zs), which have a
two-byte buffer.

Level 0 indicates that all interrupts are enabled. User code runs at level
0. If an interrupt does occur at level 0, it is reported as a spurious
interrupt.
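
The priority rule can be expressed as a small predicate. This is a sketch only: the PIL and ET field positions, and the rule that level 15 is not masked by PIL, come from the SPARC Version 8 architecture manual rather than from this course text, and the function names are hypothetical. The trap type mapping follows the Common Trap Types table (level 1 is trap 17, level 15 is trap 31).

```c
#include <assert.h>
#include <stdint.h>

#define PSR_ET        0x00000020u  /* enable traps */
#define PSR_PIL_MASK  0x00000f00u  /* processor interrupt level */
#define PSR_PIL_SHIFT 8

/* Nonzero if an interrupt request at this level would be taken now. */
static int interrupt_taken(uint32_t psr, int level)
{
    assert(level >= 1 && level <= 15);
    if (!(psr & PSR_ET))
        return 0;                  /* traps disabled: request stays pending */
    unsigned pil = (psr & PSR_PIL_MASK) >> PSR_PIL_SHIFT;
    return level == 15 || (unsigned)level > pil;
}

/* Trap type delivered for an interrupt at the given level. */
static int interrupt_tt(int level)
{
    assert(level >= 1 && level <= 15);
    return 0x10 + level;
}
```

With PIL set to 6, a level 2 request is held off while a level 10 request is taken, which matches the ordering described above.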


Interrupt Tracebacks

Trap
● Trap recognition code

● Trap frame

Interrupt Polling
● Determine which device

● Often named after level

Interrupt Service Routine


● Specific to interrupting device


Interrupt Tracebacks

Interrupt stack back traces are distinguishable by the routines
contained on the stack.

To service an interrupt, the trap must be handled, so a trap frame will
be visible. If the same interrupt level is used for multiple devices, the
processor executes a routine to determine which device actually
caused the interrupt. Once the polling routine has determined the
source, the appropriate service routine is called.

Interrupt service routines are usually named xxintr, where xx
indicates the device causing the interrupt.
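
The polling step can be pictured as a loop over the service routines registered at one interrupt level. The sketch below is not the Solaris implementation; the intr_handler table, the fake_dev structure, and the convention that a routine returns 1 when it claims the interrupt are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A service routine returns 1 if its device had raised the interrupt. */
typedef int (*intr_fn)(void *arg);

/* One entry per device sharing the interrupt level (hypothetical). */
struct intr_handler {
    intr_fn handler;
    void   *arg;
};

/* Stand-in for a device with a pending-interrupt flag. */
struct fake_dev {
    int pending;
    int serviced;
};

/* Example xxintr routine: claim and clear the interrupt if it is ours. */
static int xxintr(void *arg)
{
    struct fake_dev *d = arg;
    if (!d->pending)
        return 0;
    d->pending = 0;                /* stop the device interrupting */
    d->serviced++;
    return 1;
}

/* Poll every handler at this level; a result of 0 means spurious. */
static int poll_level(const struct intr_handler *tab, size_t n)
{
    int claimed = 0;
    for (size_t i = 0; i < n; i++) {
        assert(tab[i].handler != NULL);
        claimed += tab[i].handler(tab[i].arg);
    }
    return claimed;
}
```

In a stack traceback this shape shows up as the trap frame, then the polling routine for the level, then the device-specific xxintr routine.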


Lab Exercises for Module 13

Caution – These exercises will cause a reset and a panic.

1. In preparation for Part II of this exercise, edit the file
/etc/name_to_sysnum and insert the following line:

meltdown 180

Part I
2. Reboot your system to run kadb.

# reboot kadb

3. When the system has fully rebooted, press Stop-a to enter kadb.

4. Examine the value stored in the trap base register, tbr.

kadb[0] <tbr=X
________________________________________________________

5. Dump the trap-handling code for the current trap type.

kadb[0] <tbr/4i
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________

6. Add the address within the %hi() instruction to the offset
supplied to the jmp instruction.

________________________________________________________

7. Display the first instruction at the address you have just
calculated.

________________________________________________________

Does the label make sense?________________________________


8. Take the value of the register found in step 4, and extract the two
hexadecimal digits indicating the trap type:

        Trap base address    Trap type
   tbr: _________________    _________    0

9. View the <sys/trap.h> header file, and find the symbolic name
of this trap type. (Remember that software trap types are all 0x80 or
above.)

________________________________________________________

10. What generates this trap? Inspect the stack backtrace.

kadb[0] $c

11. To receive the Stop-a sequence, the keyboard uses the serial line
driver to send an interrupt through the zsintr routine. Why is
there no keyboard interrupt (zsintr) in the backtrace?

________________________________________________________

12. From the <v7/sys/machtrap.h> and <sys/trap.h> header files,
find the trap numbers for data faults (T_DATA_FAULT), system calls
(T_SYSCALL), and SunOS 4.x system calls (T_OSYSCALL).

T_DATA_FAULT___________________________________________
T_SYSCALL ______________________________________________
T_OSYSCALL _____________________________________________


13. From these trap numbers and the value already found from the
trap base register, determine the location of the trap-handling code
for each.

T_DATA_FAULT___________________________________________
T_SYSCALL ______________________________________________
T_OSYSCALL _____________________________________________

14. Using the same method as in steps 5 to 7, determine the routines
that are called from these traps.

T_DATA_FAULT___________________________________________
T_SYSCALL ______________________________________________
T_OSYSCALL _____________________________________________

________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________


15. Now attempt to single step through the trap-handling code. Set a
breakpoint at the first instruction for the T_SYSCALL trap handling
code found in step 13.

Stop-a
kadb[0] addr:b

16. Continue UNIX, then attempt to single step through the trap code
to determine what trap occurred.

kadb[0] :c

17. What happens?

________________________________________________________

18. Why?

________________________________________________________

19. Reboot the system by whatever method is effective.

Part II
20. Change into the Lab13 directory where there is the code for a
loadable system call. Build and install the loadable module, plus a
program to utilize it, by running the make command.

# make

21. Ensure you have savecore enabled, then run the runme program
and wait a short time. Note what happens.

________________________________________________________


22. When the system reboots, change into the crash file directory and
start adb on the core dump. (Ensure you are using a scrollable
window.)

# cd /var/crash/hostname
# adb -k unix.n vmcore.n

23. Perform initial analysis. Display the message buffer, find the crash
time, and use the current thread from the cpu structure to find the
name of the process that caused the crash.

________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________

24. According to the output of the $c command, what was the
function called just before a trap occurred?

$c

die(0x7, addr
trap
sys_trap(?)
________________________________________________________
meltdown_sys

25. The panic was caused by a loadable system call, meltdown. In
another window view the source file meltdown.c, and determine
if meltdown_sys calls the above function.

26. Remember that traps use the stack in a different fashion from
function calls.

Use the second argument to the die function (from above) as
input to the regs macro.

addr$<regs
addr psr pc npc
_______


27. Display the instruction that caused the trap.

addr_under_pc?i
________________________________________________________

28. Is the faulting instruction within the function found in step 24?

29. Display a more complete stack traceback using the stacktrace
macro:

<sp$<stacktrace

30. Look for a stack frame where %l1 (second value in frame)
corresponds to the instruction found in step 27.

31. Look at the stack frame before (printed after) the frame containing
the instruction from step 27. From that frame, write down the
value stored in the register the instruction uses. Remember that the
first value shown in a stack frame is that stored in %l0.

32. Given the instruction that was attempted, why did it fault?

________________________________________________________

33. Look at the source code for meltdown.c and runme.c to
determine where the bug is. (There is a big hint in a comment in
meltdown.c!)

Watchdog Resets 14

Objectives
Upon completion of this module, you will be able to:

● Define a watchdog reset, and list the reasons why one occurs.

● List and describe the basic PROM commands.

● Explain the limitations for debugging watchdog resets.

Note – This module presents material covered in Chapter 27 of Panic!.


Watchdog Reset Cause and Response

Cause
● Trap while traps disabled

● During trap handling

Response
● OpenBoot PROM commands

● kadb is of no use

● Core files unobtainable

● Reboot


Watchdog Reset Cause and Response

Cause
Watchdog resets occur to return control of the system after the
processor has entered an error mode. The error mode is entered if a
trap is received while traps are disabled. The SPARC version 8
architecture cannot handle nested traps, so while processing one trap,
the ET bit in the PSR is cleared to indicate traps are unwelcome.
Asynchronous traps such as those due to interrupts are delayed, but
synchronous traps such as data faults cannot be delayed.
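
The rule in this paragraph reduces to a three-way decision, sketched below. The ET bit position follows the SPARC Version 8 architecture manual; the enumerations and the classify_trap function are hypothetical names used only to make the cases explicit.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_ET 0x00000020u         /* enable traps bit in the PSR */

enum trap_kind {
    TRAP_SYNCHRONOUS,              /* for example, a data fault */
    TRAP_ASYNCHRONOUS              /* for example, an interrupt */
};

enum trap_outcome {
    TRAP_TAKEN,                    /* normal trap handling */
    TRAP_DEFERRED,                 /* held pending until ET is set again */
    TRAP_ERROR_MODE                /* processor halts; watchdog reset follows */
};

/* Decide what happens to a trap request given the current PSR. */
static enum trap_outcome classify_trap(uint32_t psr, enum trap_kind kind)
{
    assert(kind == TRAP_SYNCHRONOUS || kind == TRAP_ASYNCHRONOUS);
    if (psr & PSR_ET)
        return TRAP_TAKEN;
    return kind == TRAP_ASYNCHRONOUS ? TRAP_DEFERRED : TRAP_ERROR_MODE;
}
```

Only the last case produces a watchdog reset, which is why the faulting code is almost always found inside a trap handler.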

Response
After a watchdog reset, there is very little that can be done to analyze
the system. Due to the catastrophic nature of a reset, it is impossible to
execute any sophisticated code, such as that required by the panic()
routine or kadb.

The only possibilities available after a reset are offered by the boot
PROM. The system can be booted in an attempt to salvage the
operating system. In addition, a few OpenBoot PROM commands can be
used to retrieve a limited amount of information from the processor.


OpenBoot PROM Commands

The .registers Command


● Display CPU registers

The .locals Command


● Current window of registers

The .psr Command


● Processor Status Register

The ctrace Command


● Stack backtrace

The wd-dump Command


● watchdog information

● Sun-4d systems only


OpenBoot PROM Commands

The commands shown can be executed immediately after a reset and
the information written down. The information may be of use when
searching the SunSolve™ database for similar reset conditions.

After booting the system, the hexadecimal addresses from the ctrace
command can be used to identify which functions were called prior to
the fatal trap. Unfortunately, adb does not always identify the
routines correctly. Because the order in which kernel modules are
loaded is unpredictable, the same addresses can be occupied by
different kernel routines after the reboot. The only routines guaranteed
to be at the same location are those present within the
/platform/platform_name/kernel/genunix module.


obpsym

Kernel Symbols for OpenBoot PROM


● Symbolic names displayed by ctrace

● Symbolic names for dis

Loadable Kernel Module


● forceload in /etc/system

● modload

● ramforth may be required


obpsym

Kernel Symbols for OpenBoot PROM


When the obpsym module is loaded, the OpenBoot™ PROM has kernel
symbols available. ctrace will display kernel modules and symbols,
and kernel symbols can be supplied to OpenBoot commands such as
dis.

Loadable Kernel Module


obpsym can be loaded either at boot time with a forceload entry in
/etc/system, or later by using the modload command. In either case
the commands must reference the kernel module directory misc.

forceload: misc/obpsym

modload -p misc/obpsym

On some systems, the OpenBoot program must first be loaded into
RAM using the ramforth OpenBoot command. See obpsym(1M) for
more details.


Sun-4d Systems

CPU Watchdog Reset


● Trap during trap handling

System Watchdog Reset


● Major hardware fault

● Automatic reboot

The prtdiag Command


● Information regarding last system watchdog reset


Sun-4d Systems

CPU Watchdog Reset


The CPU watchdog reset is the same reset that occurs on other SPARC
machines when a trap arrives while traps are disabled.

System Watchdog Reset


The Sun-4d architecture monitors the system as a whole to enable
automatic hardware recovery. Should subsystems fail, the operating
system automatically reboots (to reconfigure) to avoid the failed
hardware, if possible.

Since a system watchdog reset causes an immediate reboot, there is no
possibility of executing PROM commands. However, the cause of the
reset is logged in nonvolatile RAM. Since these resets are due to
hardware failures, the inability to run PROM commands is not
necessarily significant.

The prtdiag Command


The prtdiag command gives detailed information about the last
system watchdog reset, which is valuable for identifying the hardware
problem.


Lab Exercises for Module 14

Caution – These exercises cause the system to reset.

1. Change into the lab14 directory, and ensure zap is compiled and
executable.

# make zap

2. Run zap.

# ./zap

3. What happens?

________________________________________________________
________________________________________________________

4. Gather together as much information as you can, using the PROM
monitor commands discussed in class (such as .registers,
.locals, .psr and ctrace).

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

________________________________________________________

5. Attempt to generate a core dump. What happens? Why?

ok sync
________________________________________________________
________________________________________________________
________________________________________________________

6. Reboot the system and load the obpsym module. Run zap and use
ctrace. Use the value indicated for PC as the input to the dis
OpenBoot command to view the code executed.

ok addr dis


7. From the label and instruction displayed at the PC address, why
did the system reset?

________________________________________________________
________________________________________________________
________________________________________________________

8. Reboot the system, view the file zap.c, and verify your answer.

Locks 15

Objectives
Upon completion of this module, you will be able to:

● Explain the concept of race conditions and critical sections.

● Describe mutual exclusion and mutex locks.

● Describe the use of other types of locks, such as semaphores,
readers/writer locks, and condition variables.

● Explain what waiters are and how they are used.

Note – This module presents material covered in Chapter 29 of Panic!.


Multithreading

Thread of Execution
● Sequence of instructions being executed

Concurrency
● Measure of number of simultaneous operations

Multithreaded Kernel
● Multiple threads simultaneously executing kernel code

● Multiple threads simultaneously accessing kernel data


Multithreading

Thread of Execution
A thread is a single sequence of execution. All processes have at least
one thread.

Concurrency
Concurrency is a measure of how many threads of execution can be
performed at any one time.

Multithreaded Kernel
A multithreaded kernel allows more than one sequence of operation to
be carried out inside the kernel at any one time. This is particularly
important for multiprocessor systems. If a multiprocessor system runs
a single threaded kernel, then only one sequence can be executed in
the kernel at any one time, meaning that only one CPU can be
executing kernel code at any time.

A major challenge in writing a multithreaded kernel is to ensure that
no thread changes data that another thread assumes will remain
constant.


Race Conditions

Test and Set


● Two threads executing same code

● Both see resource is free

Thread 1                                         Thread 2

Sees not in use     if ( ! InUse ) {             Sees not in use
Too late!               InUse = True;            Too late!
CLASH!                  UseResource();           CLASH!
                        InUse = False;
                    }

Incomplete Operation
● One thread sets up to use structure

● Second thread deletes structure


Race Conditions

Race conditions are those where the result of an operation is
unpredictable, depending on the order of execution of two or more
threads. Typically, race conditions do not cause problems every time
the operation is performed and so can be difficult to discover during
testing.

Test and Set


A simple race condition can occur if a test and set operation is not
atomic.

Incomplete Operation
A more difficult race condition to avoid is when one thread initializes
a structure that is then used by a second thread before initialization is
complete. System V semaphores suffer from this type of race condition
because the creation of a semaphore and the initialization of its value
are two separate operations.

Critical Sections

Definition
● Code segment executable by only one thread at any time

Serialization
● No parallel access

Mutual Exclusion
● If any one thread in critical section, all other excluded

[Figure: Parallel Execution, Critical Section, Serial Access]

Critical Sections

Serialization
To avoid problems during critical sections, it is necessary to restrict
access to the critical section to a single thread at any time. Thus, if
multiple threads attempt the operation in parallel, the operations must
be serialized.

Mutual Exclusion
To enforce safety in a critical section such as a test and set operation, a
method of mutual exclusion must be provided. When any thread
enters the critical section, all other threads must be excluded.

Mutex Locks

Implement Mutual Exclusion


● Only one holder at a time

Variables Associated With Critical Sections


● Obtain lock before entering critical section

Code Locking
● Mutex for functions

Data Locking
● Mutex for variables

int global;
mutex_t mutex;

mutex_enter(&mutex);
global++;                /* Critical section */
mutex_exit(&mutex);

Mutex Locks

Implement Mutual Exclusion


A lock is a data structure associated with a critical section. Mutex locks
are either unlocked or have a single owner.

Variables Associated With Critical Sections


Before entering a critical section, the thread should attempt acquisition
of the lock. If the lock is currently free, the acquisition will succeed,
and the thread will become the owner of the lock and can enter the
section. If the lock is already held when acquisition is attempted, the
thread cannot enter the critical section as another thread is already in
the section. With the standard mutex acquisition function,
mutex_enter(), a calling thread blocks until the mutex can be
acquired.

Code Locking
Code locking occurs when only one thread may be executing a portion
of code, regardless of the data being operated upon. To implement
code locking, a mutex is associated with the section of code.

Data Locking
Data locking enables multiple threads to execute the same code but
not on the same data. To implement data locking, each shared variable
requires an associated lock.

Data locking usually results in a higher degree of concurrency but
requires a larger number of locks. Solaris 2 uses data locking
extensively.

The ldstub Instruction

Used for Mutex Acquisition


[Figure: the lock byte in memory is loaded into a register and replaced with 0xff]

Example Use
ldstub  [%o0 + 3], %g6    ! Get and set lock byte
orcc    %g0, %g6, %g0     ! Test register
bne     already_held      ! If lock byte had 0xff,
                          ! it was already held

Atomic Operation
● Loading previous contents and storing new are indivisible

The ldstub Instruction

A logical sequence for lock acquisition is to determine if the lock is set,
and if not, set it. If such steps are carried out literally, there would be a
race condition, as previously illustrated, where two threads could see
the lock as being free. Testing the lock and setting the lock must be
done in a single indivisible (atomic) operation.

The ldstub instruction is a load and store of an unsigned byte. The
ldstub instruction is available on all SPARC processors and is used as
the basis for mutex acquisition. The instruction atomically transfers the
contents of a byte in memory into a register and replaces the value in
memory with 0xff. Because the operation is atomic, two threads cannot
retrieve the old value of the byte simultaneously.

Using the ldstub instruction effectively replaces the test and set
metaphor with a set and test. By following the ldstub instruction
with a test on the retrieved value, it is still possible to determine
if the lock was previously set. If the lock was already held, the
memory location already contained 0xff, so writing a new 0xff does
not change the lock status. If the previous value was 0, the lock was
free, but has now been acquired.

Semaphores

Resources Counter
● Initialized to count of resources

sema_p
● Atomic test and decrement

sema_v
● Increment count

Semaphores

Resources Counter
Semaphores are used when a count of resources must be atomically
tested and decremented. If the value of the semaphore is greater than
zero when tested, a resource is available and the count is decremented.

Semaphores were first proposed for computer systems by Dijkstra, a
Dutch computer scientist. The names sema_p and sema_v reflect the
Dutch words prolaag (short for probeer te verlagen, meaning try to
decrease) and verhogen (meaning increase).

Readers Writer Locks

Read Mostly Data


● Sometimes modified

● Usually read

[Figure: many concurrent readers (Read, Read, ...) OR a single writer (Write)]

rw_rdlock()
● Add another reader

● Block, if writer holding lock or writer waiting

rw_wrlock()
● Become writer

● Block, if any readers or another writer

Readers Writer Locks

Readers writer locks are an optimization for access to shared data that
on occasion is modified, but is usually just read. Since the data is
sometimes written, it must be locked to prevent two or more threads
from trying to change the value simultaneously. However, acquiring a
mutex results in only a single thread reading the data at a time. Such
serialization can have an adverse effect on performance.

Readers writer locks can be locked for read access or for write access.
If an attempt is made for read access, it succeeds if the lock is not held,
or being waited for, by a writer. Attempts to acquire the lock for
writing only succeed if the lock is not held by any other thread for
reading or writing.

The Solaris kernel gives writers preference in that writers waiting will
prevent more readers from obtaining the lock. If both writers and
readers are waiting when a writer releases the lock, the waiting writer
gains the lock. Thus, if such a lock is used when writing is frequent,
readers may be starved of access to the critical section.

Condition Variables

cv_wait()
● Sleep until awakened

cv_signal()
● Wake up one sleeper

int busy;
mutex_t lock;
cond_t cv;

Waiting thread                  Releasing thread
mutex_enter(&lock);             mutex_enter(&lock);
while (busy)                    busy = FALSE;
    cv_wait(&cv, &lock);        cv_signal(&cv);
busy = TRUE;                    mutex_exit(&lock);
mutex_exit(&lock);

Condition Variables

Condition variables are used as a rendezvous point. If a thread cannot
continue because a resource is not free or not yet ready, then the thread
can sleep on a condition variable. Any thread making the resource
available has the responsibility to wake up waiting threads. Wakeups
can be sent to a single thread waiting using cv_signal() or to all
threads waiting on the condition variable using cv_broadcast().

The cv_wait() call is the basic call to wait on a condition variable,
but a thread calling cv_wait cannot be awakened by a UNIX system
signal. Thus, processes whose threads remain inside a cv_wait cannot
be killed. It is often better for a driver routine to use cv_wait_sig(),
which enables sleeping threads to be awakened by signals.

The code shown is the standard construct for using condition
variables. If a thread tests a condition that is set by another thread, the
test must be made a critical section through the use of a mutex. The
cv_wait routine releases the mutex when going to sleep, and does not
return (after being awakened) until it can reacquire the mutex.

Waiters

● Indication of waiting threads

● Unable to acquire mutex, join queue of waiters

● Release mutex, wake up one waiter

Waiters

To keep track of which threads are waiting for a mutex or condition
variable, the object contains a waiters field, indicating the queue of
waiting threads.

Lab Exercises for Module 15

✓ Target objectives here are to illustrate mutexes and ldstub, note leaf routines, and to
demonstrate how threads stuck in drivers cannot be terminated.

1. If kadb is not already running, reboot your workstation to run
kadb.

# reboot kadb

2. Go to the Lab15 directory, and ensure the necessary files are
compiled and installed.

# make install

This builds and installs another version of the ramdisk driver.

3. Run the openrd command in the background.

# ./openrd &

4. Wait for the console message to inform you "Mutexes entered",
then press Stop-a to enter kadb.

5. Set a breakpoint on rd_open, then resume the operating system
to run a second instance of openrd.

kadb[0] rd_open:b
kadb[0] :c
# ./openrd &

6. When kadb is entered, patch the kernel to prevent the clock
interrupt from interfering with the kernel tracing you are to perform.
Note the existing value at clock before replacing it as below.

kadb[0] clock/X ___________________________________

kadb[0] clock/W 81c3e008

Caution – This replaces the save instruction of the clock routine with
a return, effectively disabling any clock processing. Do not expect the
operating system to multitask until you have restored the correct
instruction.

7. Display the instruction at clock, and verify that it corresponds to
that shown below. If not, carry out step 6 again.

kadb[0] clock/i
clock:
clock: jmp %o7+0x8

8. Display the instructions for rd_open, and note the address of the
first call to mutex_enter.

kadb[0] rd_open,10/ai

_______________ call mutex_enter

9. Set a breakpoint at the address you have just determined.

kadb[0] rd_open+<offset>:b

10. Continue the operating system, then single step into mutex_enter.

kadb[0] :c
kadb[0] :s
kadb[0] :s

11. Display instructions for mutex_enter, and note the absence of a
save instruction. What does this mean about mutex_enter?

✓ It is a leaf routine.

kadb[0] mutex_enter,6/i

__________________________________________________
__________________________________________________

12. Register %o0 holds the address of the mutex to be locked. Display
the value, and format the mutex.

kadb[0] <o0=X
kadb[0] <o0$<mutex

If the lock is not clear, ask your instructor for assistance.

13. Single step past the ldstub instruction, and display the contents
of the destination register (second argument) of the ldstub. This
now holds the previous value of the lock byte.

14. View the mutex again.

15. Continue single-stepping past the st instruction and back to
rd_open, then display the mutex structure once more.

The owner field should now be set to the current thread. Confirm
this by comparing the owner with the value stored in %g7. Note
that, due to the way the mutex macro has been written, the lock
field may appear clear.

16. Continue stepping into the next call to mutex_enter.

kadb[0] :s
kadb[0] :s
kadb[0] :s
stopped at mutex_enter:

17. Display the mutex, and write down the owner.

kadb[0] <o0$<mutex
owner______________________________________________

18. Single-step through the ldstub, and display the contents of
ldstub’s destination register. This time the lock should be set.

19. Now, set breakpoints on the following routines, and decide the
order in which they will be called.

kadb[0] disp:b ____________________________________
kadb[0] swtch:b ___________________________________
kadb[0] sleepq_insert:b ___________________________
kadb[0] t_block:b _________________________________
kadb[0] ts_sleep:b ________________________________
kadb[0] resume:b __________________________________

✓ Order of calls: t_block (1), ts_sleep (2), sleepq_insert (3), swtch (4),
disp (5), resume (6).

20. Use :c to continue the operating system until you have entered
resume. Note the order in which the routines are called.

21. Remove all breakpoints, and restore the first instruction of the
clock routine to the correct value as noted in step 6.

22. Continue the operating system.

kadb[0] :c

23. Use the ps command to find the process IDs of the two openrd
processes, then attempt to kill them. Will they die?

__________________________________________________

Additional Exercises A


By the end of this course, you should have several core files in your
crash directory. During the course, your instructor has introduced
methods of analysis that you can now apply to these core files.

Follow the steps below to create another crash with which you can
work. Start with the initial analysis, and determine as much
information as possible. Try to identify what causes the crash and why.
If you have SunSolve available, search for any related bug reports.

1. Ensure you have OpenWindows running, that savecore is
enabled, and that you have sufficient space to save another core
file.

2. Use the swap command to determine the current swap usage of
your computer.

3. Use mkfile to create a swap file on a local disk large enough to
accommodate the current usage.

4. Add the file to the total swap using swap -a.

5. Remove the original swap partition using swap -d.

6. Now, attempt to increase the size of your swap file using mkfile.
If this does not cause a panic, ask your instructor for assistance.

The crash Utility B
This appendix contains a brief illustration of the crash utility. See the
crash man page for full details.


Command Line Syntax


/usr/sbin/crash [ -d dumpfile ] [ -n namelist ] [ -w output-file ]

Dumpfile
Defaults to /dev/mem

vmcore.n

Namelist
Defaults to /dev/ksyms

unix.n

Output File
Defaults to standard output

crash Commands

Command Syntax
function [ argument ]

Functions Available
> ?
as              hment           prnode          strstat
b (buffer)      kfp             proc            t (trace)
base            kmastat         pte             trace
buf (bufhdr)    l (lck)         pty             thread
buffer          lck             q (quit)        ts
bufhdr          linkblk         qrun            tsdptbl
c (callout)     lwp             queue           tsproc
callout         m (vfs)         quit            tty
class           major           rd (od)         u (user)
cpu             map             redirect        user
ctx             mblock          rtdptbl         ui (uinode)
dblock          mode            rtproc          uinode
defproc         mount (vfs)     rwlock          v (var)
defthread       mutex           s (stack)       var
dispq           mutextable      search          vfs
ds              nfsnode         sema            vfssw
f (file)        nm              size            vnode
file            od              sment           vtop
findaddr        p (proc)        smgrp           ?
findslot        page            snode           !cmd
fs (vfssw)      pcb             stack
hat             pcfsnode        status
help            pmgrp           stream

Function Arguments
-e          Display every entry
-f          Display full structure
-p          Interpret all addresses as physical addresses
-s slot     Use specified process slot
-w file     Redirect output to named file

Example Use of crash

# crash -d vmcore.50 -n unix.50


dumpfile = vmcore.50, namelist = unix.50, outfile = stdout
> help status
status [-w filename]
system status
alias:
acceptable aliases are uniquely identifiable initial substrings
> stat
system name: SunOS
release: 5.4
node name: yoyo
version: Generic_Patch
machine name: sun4m
time of crash: Thu Feb 1 17:11:41 1996
age of system: 2 min.
panicstr: zero
panic registers:
pc: f0048b38 sp: f03e59f8
> help var
var [-w filename]
system variables
alias: v
acceptable aliases are uniquely identifiable initial substrings
> var
v_buf: 100
v_call: 0
v_proc: 490
v_nglobpris: 110
v_maxsyspri: 99
v_clist: 0
v_maxup: 485
v_hbuf: 64
v_hmask: 63
v_pbuf: 0
v_sptmap: 0
v_maxpmem: 0
v_autoup: 30
v_bufhwm: 620
> class
SLOT CLASS INIT FUNCTION CLASS FUNCTION

0 SYS f00cdc0c f0156074


1 TS fc078034 fc07a71c

> help proc


proc [-e] [-f] [-l] [-w filename] [([-p] [-a] tbl_entry | #procid)... | -r]
tbl_entry = slot number | address | symbol | expression | range
process table
alias: p
acceptable aliases are uniquely identifiable initial substrings
> p
PROC TABLE SIZE = 490
SLOT ST PID PPID PGID SID UID PRI CPU NAME FLAGS
0 r 0 0 0 0 0 98 63 sched load sys lock
1 r 1 0 0 0 0 98 33 init load
2 r 2 0 0 0 0 98 0 pageout load sys lock nowait
3 r 3 0 0 0 0 98 21 fsflush load sys lock nowait
4 r 258 1 258 258 0 98 16 sac load jctl
5 r 108 1 108 108 0 98 80 rpcbind load
6 r 129 1 129 129 0 98 66 inetd load
7 r 194 1 194 194 0 98 2 utmpd load
8 r 259 1 259 259 0 98 62 ksh load
10 r 110 1 110 110 0 98 34 keyserv load
11 r 118 1 118 118 0 98 80 rpc.nisd load
12 r 116 1 116 116 0 98 12 nis_cachemgr load
13 r 120 1 120 120 0 98 16 kerbd load
14 r 132 1 132 132 0 98 13 statd load
15 r 134 1 134 134 0 98 35 lockd load
16 r 153 1 153 153 0 98 14 automountd load
17 r 186 1 186 186 0 98 5 sendmail load
18 r 157 1 157 157 0 98 28 syslogd load nowait
19 r 177 1 177 177 0 98 30 lpsched load nowait
20 r 167 1 167 167 0 98 26 cron load
21 r 185 177 177 177 0 98 12 lpNet load nowait jctl
22 r 202 185 177 177 0 98 16 lpNet load nowait
23 r 214 1 214 0 0 98 80 vold load jctl
26 r 262 258 258 258 0 98 22 listen load nowait jctl
28 r 255 1 255 255 0 98 19 mountd load
29 r 253 1 253 253 0 98 13 nfsd load
31 r 263 258 258 258 0 98 21 ttymon load jctl

> p -f 23
PROC TABLE SIZE = 490
SLOT ST PID PPID PGID SID UID PRI CPU NAME FLAGS
23 r 214 1 214 0 0 98 80 vold load jctl

Session: sid: 0, ctty: -


Process Credentials: uid: 0, gid: 0, real uid: 0, real gid: 0
as: fc010f90
wait code: 0, wait data: 0
sig: effff708 link 0
parent: fc0b5998 child: 0
sibling: fc31dcd0 threadp: fc1ce5c0
utime: 12 stime: 154 cutime: 1 cstime: 16
trace: 0 sigmask: effff708 class: 0
lwptotal: 3 lwpcnt: 2 lwprcnt: 2
lwpblocked: 1

> p -p #118
PROC TABLE SIZE = 490
SLOT ST PID PPID PGID SID UID PRI CPU NAME FLAGS
11 r 118 1 118 118 0 98 80 rpc.nisd load
> help defthread
defthread [-p] [-r] [-w filename] [-c address]
set default thread
alias:
acceptable aliases are uniquely identifiable initial substrings
> defthread -r fc1ce5c0
Current Thread = fc1ce5c0
> help defproc
defproc [-w filename] [-c | -r | slot]
set default process slot
alias:
acceptable aliases are uniquely identifiable initial substrings
> defproc
Procslot = 23

> u
PER PROCESS USER AREA FOR PROCESS 23
PROCESS MISC:
command: vold, psargs: /usr/sbin/vold
start: Thu Feb 1 17:10:53 1996
mem: 146, type: exec
vnode of current directory: fc3121a8
OPEN FILES, POFILE FLAGS, AND THREAD REFCNT:
[0]: F 0xfc30c0e0, 0, 0 [1]: F 0xfc30cc70, 0, 0
[2]: F 0xfc30cc70, 0, 0 [3]: F 0xfc30cd60, 0, 0
[4]: F 0xfc30c3b0, 1, 0 [5]: F 0xfc30c180, 0, 0
[6]: F 0xfc30c7c0, 1, 0 [7]: F 0xfc30c1a8, 1, 0
[8]: F 0xfc30c810, 0, 0 [9]: F 0xfc30cef0, 0, 0
[10]: F 0xfc30c748, 0, 0 [11]: F 0xfc30c360, 1, 0
cmask: 0000
RESOURCE LIMITS:
cpu time: unlimited/unlimited
file size: unlimited/unlimited
swap size: 2147479552/2147479552
stack size: 8388608/2147479552
coredump size: unlimited/unlimited
file descriptors: 1024/1024
address space: unlimited/unlimited
SIGNAL DISPOSITION:
1: ef77d67c 2: ef77d67c 3: ignore 4: default
5: default 6: default 7: default 8: default
9: default 10: default 11: default 12: default
13: default 14: default 15: ef77d67c 16: ef77d67c
17: ef77d67c 18: ef77d67c 19: default 20: default
21: default 22: default 23: default 24: default
25: default 26: ignore 27: default 28: default
29: default 30: ignore 31: ignore 32: ignore
33: ef77d67c 34: default 35: default 36: default
37: default 38: default 39: default 40: default
41: default 42: default 43: default
> file 0xfc30c0e0
ADDRESS RCNT TYPE/ADDR OFFSET FLAGS
fc30c0e0 1 SPEC/fc1dd404 0 read
> file 0xfc30cd60
ADDRESS RCNT TYPE/ADDR OFFSET FLAGS
fc30cd60 1 UFS /fc2d2a38 0 write appen

> vnode fc2d2a38


VCNT VFSMNTED VFSP STREAMP VTYPE RDEV VDATA VFILOCKS VFLAG
2 0 fc244b60 0 f - fc2d2a30 0 -
> help uinode
uinode [-d] [-e] [-f] [-l] [-r] [-w filename] [[-p] tbl_entry[s]]
tbl_entry = slot number | address | symbol | expression | range
inode table
alias: ui
acceptable aliases are uniquely identifiable initial substrings
> uinode -f -l fc2d2a30
UFS INODE TABLE SIZE = 600
SLOT MAJ/MIN INUMB RCNT LINK UID GID SIZE MODE FLAGS
- 32, 29 2169 2 1 0 0 35056 f---666 rf

i-rwlock: type 0 waiters 0 owner 0


writewanted 0 holdcnt 0
i-contents: type 0 waiters 0 owner 0
writewanted 0 holdcnt 0
i_tlock: waiters 0 lock 0
type: MUTEX_ADAPTIVE owner 0
Condition variable i_wrcv: 0
NEXTR
0

[ 0]: 14e8 [ 1]: 1870 [ 2]: 18c8 [ 3]: 2ee8


[ 4]: 30a0 [ 5]: 0 [ 6]: 0 [ 7]: 0
[ 8]: 0 [ 9]: 0 [10]: 0 [11]: 0
[12]: 0

VNODE :
VCNT VFSMNTED VFSP STREAMP VTYPE RDEV VDATA VFILOCKS VFLAG
2 0 fc244b60 0 f - fc2d2a30 0 -
mutex v_lock: waiters 0 lock 0
type: MUTEX_ADAPTIVE owner 0
Condition variable v_cv: 0

> kmastat
                    buf   buf   buf   memory #allocations
cache name         size avail total   in use succeed fail
----------        ----- ----- ----- -------- ------- ----
kmem_slab_cache      32    65   254     8192     189    0
kmem_bufctl_cache    12   238   762    12288     524    0
kmem_alloc_8          8   153  1524    12288    5411    0
kmem_alloc_16        16   210  1016    16384    2499    0
kmem_alloc_24        24   135   338     8192     912    0
kmem_alloc_32        32   132   381    12288    1019    0
kmem_alloc_40        40    44   202     8192     405    0
kmem_alloc_48        48    50    84     4096     885    0
kmem_alloc_56        56    60    72     4096     153    0
kmem_alloc_64        64    27   126     8192    3452    0
kmem_alloc_80        80    11   150    12288    2480    0
kmem_alloc_96        96    24   126    12288     361    0
kmem_alloc_112      112    14   144    16384     620    0
kmem_alloc_128      128     3    31     4096     251    0
kmem_alloc_144      144    11    84    12288     112    0
kmem_alloc_160      160    24    25     4096     213    0
kmem_alloc_176      176    16   138    24576     317    0
kmem_alloc_192      192    23    84    16384     537    0
kmem_alloc_208      208    18    19     4096      18    0
kmem_alloc_224      224     9    18     4096      20    0
kmem_alloc_240      240    14    16     4096      17    0
kmem_alloc_256      256     3    60    16384      75    0
kmem_alloc_320      320     2    12     4096      49    0
kmem_alloc_384      384     3    10     4096      20    0
kmem_alloc_448      448     4     9     4096      92    0
kmem_alloc_512      512     4    16     8192      28    0
kmem_alloc_576      576     8   105    61440     137    0
kmem_alloc_672      672     3    12     8192      22    0
kmem_alloc_800      800     5    10     8192      19    0
kmem_alloc_1024    1024     7    20    20480    7099    0
^C
> quit
#

Analysis Checklist C
Effective core dump analysis relies on many factors including
experience and being able to recognize unusual patterns. However,
there are some procedures that can be followed to ensure you have a
good chance of finding any problems.

The Panic! CD-ROM also contains a script, iscda, which can assist in
the initial analysis.

Checklist

● Determine the system background:

● Does it crash regularly? – When? How often?

● When did it first crash? – New hardware? New software?

● Look at the crash dump using strings and simple macros.

● OS Release

● Name of system

● Particular hardware?

● panic: zero?

● If the problem results in a hang:

● Look at the size of the run queues.

● Use threadlist.

● Look for unusual traces.

● Look for similar traces.

● Look for unusual functions.

● If the system panics:

● Use $c for backtrace.

● If a trap, get regs and use stacktrace.

● Determine running process(es).
