Vuwwrwwwwwwewwowwweevvewewweuwweuvwsd
Content
‘System Status ~ Memory
System Status - 10
System Status - CPU
Performance Trending with sar
‘Troubleshooting Basies: The Process
Troubleshooting Basics: The Tools
strace and irace
Common Problems
Troubleshooting Incorrect
Inability to Boot 7
‘Types in Configuration Fes
Corrupt Flesystems
Rescue Environment
Lab Tasks
1. Recovering Damaged MBR.
Permissions
Chapter
MONITORING &
TROUBLESHOOTING‘Memory Status
Linux provides several tools for examining the memory usage on
running systems as shown in the following examples:
4 free
4 cat /proc/senintfo
‘Swap usage can be summarized by either of the folowing:
4 swapon ~3
4 cat /proc/sueps
Field [Meaning
fF [umber of processes waiting to run
[b__ [amber of processes uninterruptedlysleeping|
spd [amount of swap used, in kilobytes
ree [amount of memory free, in Kilobytes
System Status - Memory
free
Quick summary of memory usoge
Displays swap usage on per swap fle / pation basis
Process Fes
* Tpcoc/neninfo
® 7proc/swape
vastat
*rastat 2~ plays unring stats every? seconds
[buff [amount of memory used for butfers, in Kilobytes
5i__|memory being swapped from disk, in klobytes/sec
[so_| memory being swapped to disk, in Klobytes/soe
fbi [Blocks writen to disk, in bloeksise
[bo _ [blocks read from disk, in blocksisee
Fin number of interupts per second
[es [number of context switches per second
jus [percent of total CPU consumed by user processes,
[sy _ [percent of total CPU consumed by eystem processes
percent of total CPU spent waiting on WO
fia [percent of total CPU ile time
Linux also provides the vastat command, Commonly used on Unix systems to examine memory usage, mastat produaes output Ike:
# metat
procs. ~
nemory—
sap
cb swpd free buff cache si so bi bo ince us sy id va
6 8 "8 31106 413726 155988 6 «817142
15.2
ssystem= ----cpu-
450038 16 4 783
A A EYVewwE wer ewwUEUerowuwwwowwwowed
Vo Status
‘Several commands are also available to examine UO statistics on
Unux systems. Some basic V0 information is provided by mastat. In
‘addition, stat can be run witha few aifferent options to abtain
‘moce detailed states. This command presents summary
information regarding VO:
System Status ~
astat
Toate
+ Shows detailed VO statistics on 8 per dive basis
Process Files
Yoroe/ interrupts
1 /proe/stat
yo
‘More detailed information about specific devices can be found by:
f dostat -a -x /dev/eda
+ output omitted...
iets [Description
[Device _|name, in devi format, where # is the major number
4 dostat -a 5 7) "
Device: tps slkread/s Blkwrtn/s Blkread slivrtn — [F£90/s [merged read requests per secon:
sa 093 aS 35.68 4433955 3456508 [argn/s [merged write requests per second
fas 8.08 8 32 ® 7 7
add 8.12 36 2:51 35283 ageng [2/8 __ toad requests per second
are 8.64 oa 1st dea 146162 [rs write requests per second
Field [Description [rsec/s | sectors read per second
[Device _[name, in devé format, where # is the majornumber| _[wsec/s [sectors written per second
ps 17/0 requests por second [avarg-sz average sizo of requests, in sectors
BUK_read/s [aumber of blocks read per second
[avaqu-sz average quove length of requests
BTK wrtn/s [pumber of blocks writen per second
[avait [average wait ime until 170 request served
UK read [total numberof blocks read
favctm [average time to service 17 O requests in seconds
UK wrta [total number of Blocks writen
[tatit [percentage of CPU time used servicing VO requesta
183(CPU Details
Feirly detailed information about the processors} on the systom can
be obtained from the /proc/cpuinfo file as shown in the following
example:
5 cat /proc/epuinto
processor °
vendor_id Genuinerntet
cpu family 1 6
rodel PB
rodel nane
ee and
15-4
System Status - CPU
Process Files
* Iproclepuinto
‘Shows dealed info about physical CPUs)
estat -<
"= provides 8 summery of CPU uilization
pstat
‘provides detailed CPU uttization
+ Including forced sleep by Xen hypervisor
(CPU Status (unl-processor)
‘Several tools provide information about the system CPU configuration
‘Futilzation. This command displays an average, for all processors:
4 fostat -<
avg-opu: Buser nice tsys Nidle
8.81 93.96 8.83 6.64
See
Se Ses ae ere
fi [acon oa eee
aaa care
RRA RRR RAR ARERR RR ARAAA RR RAR RAOVeer www wuWwwwUwWuwwwweuwwwwwwed
(CPU Status (multi-processor)
Slightly more detailed information can be obtained with mpstat,
‘which fs usually run with the =P ALL option to display information
‘bout all processors on the system
f mpstat -P ALL
ad 8.48
a7
a
Field [Description
is avorage of all CPUs; a number is a specie CPU
tnice faye Hovait —tirg
Asoft. tsteal
8.
[taser [percentage of CPU used for user processes
[tnice [percentage of CPU used for niced user processes
[says [percentage of CPU used for system processes:
siowait [percentage of CPUICPUs time spent waiting for isk UO
[srg [percentage of CPUICPUs time spent servicing interupts
[ssoft | percentage of CPUICPUs time spent servicing softwar
inerrupts
fisteat [percentage of ime spent in involuntary wait by virtual
ICPUICPUs while the hypervisor was servicing another
vital CPU
fbiate_|percontage of CPU time spent idle
jintr7s_[number of interrupts per second processed
tidte
99.27
98.99
intr/s
335.15
15-5‘System Activity Reporter
‘The System Activity Reporter (sar) is flexible tool which can be
sed to observe realtime performance data or take samples over
time to show performance trends.
‘The basic syntax ofthe sar command is:
sar [option] [count]
‘The option indicates what output is desired, and the optional count
‘says how many samples are desiod. Useful options include:
[Option Result
=A__ [display (almost) all avaliable statistics
a
display disk VO statistics
=| display system paging staisties
=e __| display process creation statistics
=4___|sioplay perclisk activity statistics
Gisplay interrupt statistics: irq is the number of the IRQ to
sisplay: SUM displays the total numberof interupts; PROC
sisplays the number of interupts per processor: ALL
sisplays tho fist 16 intorupts: XALL displays all
4
Bega;
sar
Performance Trending
sade
* system process located at /usr/Iib/sa/sade
‘an rom /ete/eron.d/sysstat
1 loge ectiy to ves tog/sa/
Shows system activity overtime based on sade logs
cafes to clplaying data from 10 minute inteals
devices: EDEV displays network errors: SOCK displays
| soce | statistics on network sockets; FULL displays all network
[rus | statistics
-4__ | aisplays queve lengths and load averages
mz [displays memory and swap utilization statistics
=a [displays memory VO statistics
=u [aisplays CPU valization statistics
=v [displays kernel file table statistics
ma |alsplays switching statistics
x _|alsplays ewepping statistics
lsplays process staisics, pid is the PID of the process to
monitor, SELP indicates the sar process itself; SUM displays
system processes
major and minor fait statistics; ALL cispays statistics for al
cispiays stats about
fof the process whose children are monitored: SELF
ingicates children of the sar process itslf; ALL displays
statistics for child processes of all system processes
156
[=7 [display staistios about TTY activity
RAR RARE ERR RAR RAR RANA ARANewww wwwwewwwwuwowwwwwwwwwowed
Investigate
In many ways, troubleshooting is lke reading a good detective novel
‘One must not stop atthe fist possible explanation. Instead, evidence
‘must be gathered and analyzed. The investigation can't end until
theres enough proof to cstinguish truth from fiction. Sometimes the
{ruth is exactly as expected, and the challenge is proving it. Other
times, the ruth wall be quite cifferent from what was originally,
suspected.
Dont form an opinion too quickly when troubleshooting. A change
‘that might corect one problem could make another problem worse
‘A superficial ix that only addresses @ surface problem, without
Correcting the root cause, may hide the problem for awhile, but
‘eventualy the problem will return, perhaps making the problem
worse!
Without proper understanding of the problem, itis rare to antive at 8
{900d solution. Be patent. Make sure the problem is understood
betore trying to solve it. Such obvious advice may seem ridiculous,
but itis surprising the numberof times its ignored.
Document Discoveries
IT knowing is half the battle, what's the other hall? Remembering
While it may be possible to keep everything in one's head during 3
‘short troubleshooting session, the longer the session, ofthe higher
‘the stress of the situation, the more important it becomes to keep a
record of everything learned during the process. Dorit make things,
Troubleshooting Ba:
lovestigato
‘Decument Discoveries
Ferm Multiple Hypotheses
‘Test Each Hypothesis
Consider Al Solutions (Then Pick The Bost)
Document Changes,
Fepoat the Process (H Needed)
too complicated. Paper and a pen is enough to get the job done, but
ifthe voubleshooting session is lang or the problem reluns, 8 few
basic notes taken during the troubleshooting process can be
invaluable. Good notes can also be the basis of future
documentation
Form Multiple Hypotheses.
Eventually, enough information willbe gathered to support at least
‘one hypothesis. At this point itis tempting to jump in and attempt to
solve the problem. Resist this temptation! Take a moment to
‘consider and rank other possible hypotheses. Many problems are
‘easy to solve, but only alter one stops focusing on the wrong
‘hypothesis.
‘Test Each Hypothesis
Even after forming hypotheses, continue to collect information. If
possible, perform one or more experiments to confirm the most likely
‘exlanation. Such confirmation is especially important when the fx is
potentially dangecous or time consuming. A good experiment may
‘ao reveal 8 more serious underlying problem. Without careful
confirmation, one may end up repeatedly addressing surface
problems without ever artving at the root cause.
Consider All Solutions (Then Pick The Best)
‘Many problems allow multiple solutions. Instead of immediately
choosing the first solution that comes to mind, consider alternatives,
Fecusing too much on one solution may resull in ignoring a faster,
167safer solution, Taking time to consider the consequences of 2
Solution is @ertieal part of not making the problem worse.
Document Changes
{All changes made during the troubleshooting process should be
‘documented. ideal, any and al changes to @ system should always
be documented. It a change is later discovered to be inappropriate,
something as simple as a chronological ist of chenges on a piece of
‘baper can save significant time when reversing unintended damage.
Repent the Process (W Needed)
‘Sometimes resolving one problem can reveal another. In these cases,
itis tempting to assume the new problem is just an extension of the
‘ld. While that may often be true, care should stil be takon to follow
proper troubleshooting procedure, not just slap @ quick fix on and
hope forthe best.
15-8
POR ARERR RRR ARR RRR AAR NROCOCONUT U wwe wwwwwwwwwwwod
‘Troubleshooting Basics: The Tools
Every Tool Is Useful
‘Analyzing Log Files
‘kc, doesg. grep. head, tail
Reviewing System Status
‘cat fete/*vrelease, chkconfig, date, 4, dig, free, getfacl,
1s, tsar, mount, netstat, ps, rpm, service, top
Reviewing Network Settings
‘cat /ete/resolv.cont dig, ethtol, ip addr, ip route
Every Too! Is Useful
‘There is no replacement for a deep knowledge of a system's services and subsystems, and the tools that can be used to explore and control
ther. Under the right conditions, any command is
Potential troubleshooting tool. However, some tools are more commonly used than
‘everyting else. These most common tools are presented inthe next severl tables.
‘Analyzing Log Files
Log files should always be examined earl in the troubleshooting process. Alot can be learned quickly using basic tools to examine the
systems logfiles.
[Command [Description
fark | The eblity to rapidly create simple reports from logs by fiterng cerlain columns, summarizing, and applying formatting can reveal
things that would otherwise be lost inthe noise ofthe fogs.
[While some Kemel events are recorded in system logfiles, others may only be present in the Kernel fing butfr.
Log files are often filed with a mixture of useful and lrelevant information, While ile tempting to attempt mental tering, is
better to use software to do it
Wile things sometimes bresk as a result of long past events, most often a recent change Is atthe root of new problems.
Monitoring logs as new information fs added can accelerate the woubleshooting proc
159Reviewing System Status
‘Minor misconfiguations can have surprising side effects. Early in the troubleshooting process iti important to review the general
system
[commana
[cat /ete/*-release
Description
[When working in on environment with multiple Linux versions, Knowing how To quicky determine whats running on &
system can be helpful
‘ctkconfig list
[service ane status|
pore daemon
pe ef
[Sometimes things break when a service is running that shouldnt be. More offen things break when @ service should
be running but isnt. If service depends on multiple daemor's, make sure that al are runing.
te
[As services becoming increasingly time sensitive, systems can break in unexpected ways when thelr clocks are wrong
ath Running out of disk space can cause many applications to exhibit stvange behavior. Just as It is possible to run out of
jaf ~ih data blocks, itis also possible to run out of inodes. Some Linux filesystems can automatically allocate more inades, but
Jotners can not
fees When Linux rons low on available RAM, its forced to page data out to swap, resulting in poor performance. Running
low on memory can also force the kerel to activate the OOM Kil
avaliable but actualy unusable,
‘This can even happen when memory is technically
Improper file permissions are a common cause of unexpected behavior In adalion to wadlional Unix permissions,
[consider other options. like SELinux contexts, extended attibutes, and FACLs.
Pay attention to not only what is mounted, but which options the flesystoms are mounted with, Unfortunately, when
the root filesystem is in readonly mode, the /etc/mtab fle ean not be Updated and tho output of the mount command
may be incorrect. For this reason It may be necessary to query the kernel directly
netstat -tulpa [A sovvice may be running but not Iistening tothe right pant or address. Even if a service is configured to listen to the
sof =i [correct pon and adress, it will not be able to if another process has already claimed it.
pa Va Teis normal for configuration fles to change. lis not narmal for binaries and Worries to change, Changes may indiate
fling storage, bad memory, or human action
fruntevel [When a system is brought up manually, tis easy accidentally leave itn an incorrect runlevel. Knowing the current
runlevel is also important when confirming the correct services are running.
top In addition tots default view, top Includes several useful commands to tweak what information lt shows and how. ITs
also a simple interface for certain types of process management,
o [When woubleshooting iis important to know wo is using the system.
15-10
RRA RARER RR AREA A RRA ARANwww Uwwewwwwwwowwowwwowwwwuwwud
Reviewing Network Status
Linux has a very network oriented design. Even services only listening on localhost can break in unexpected ways when the network is
‘misconfigured or misbehaving,
[Command [Description
aig Many services depend on working name resolution,
ost
cat /ete/resolv.cont!
fethtoot [When physical indicators are not visible, the ethtool command can be used to check lnk status, It can also be used
Inti-tool to correct link level setings when autonegotation fails,
ip addr [Although ist possible to view most of a systems network configuration using ifconfig the ip command wil
ip route provide more correct and more complete information.
feepeump While i may be tedious to get spectic information about network behavior using tepdump, ils helpful for gating @
general idea of whats happening on the network
8.11‘The strace Command
‘The stace command can be used to intercept and record the
system calls made, and the signals received by a process. Ths allows,
‘examination of the boundary layer between the user and Kernel space
‘which can be very useful fr identiving wy a process is failing
Using strace to analyze how a program interacts with the system is
‘especially useful when the source code ls not ready avaliable. In
‘addition to its importance in troubleshooting, strace can provide
‘deep insight into how the system operates,
‘Any user may trace their own running processess; additionally the
root user may trace any running processess. For exemple,
fallowing could be used to attach to and trace the running slapd
‘daemon:
4 strace -p $(pgrep slapd)
+ + + output omitted.
‘trace Output
‘Output from strace will correspond to either a system cal or signal
‘Output from a system calls comprised of three components: The
‘system cal, any arguments surrounded by parenthesis, and the result
(ofthe call folowing an equal sign. An exit status of =1 usually
Indicates an error. For example:
§ strace 1s enterprise
exceve("/bin/ls*, {"le", “enterprise'}, [/*
brk(@) ‘= éxdaa2000
16-12
vars */]) = 6
strace and Itrace
Tacos systems call and signals.
1 Sp planus atoch to 3 running process
222 Hlenane— output te
1 We tracerenpe— spociy 2 fiter for whats esplayed
“i Tolow fd wace) forked chid processes
222 show counts,
teeaee
very sia to steaes in functonslity and options but shows
‘Syoe Roery calle nade by Uns proouns
25 show system als
1 -a 3 indent output for nested cals
access(*/ete/Id.so.preload", ROK) = -1 BNOENT (Wo such >
file or directory)
open(*/ete/d.so.cacie", 0 ROOMY) = 3
esnips ss
Curly braces are used to Indicate dereferenced C structures. Square
braces are used to indicate simple pointers or an array of values.
Since steace often creates a large amount of output, it's often
Convenient to redirect i to fil, For example, the following could be
Used to launch the Z shell, trace any forked child processes, and
record all le access tothe fies. trace file:
4# strace ~f -0 files.trace -e trace=file 25h
+ output enitted .
un the 1s command counting the number of times each system c
‘was made and print tolls showing the number and time spent in
‘each call (useful for basi profling or botleneck isolation
4 strace -c le
‘output omitted .
‘The following example shows the three config files that OpenSSH's
sshd reads as it starts. Note that strace sends its output to STOERR
by default, so if you want to pipe it to other commands like grep for
further altering you must redirect the output appropristely:
4 strace ~f ~copen /asr/sbin/sshd 41 | orep ssh
+ + + output omitted. .
Trace just the network related system calls as Netcat attempts to
Se ea een ea ree eo canWCC WUWOUWT EWU oUUNUwwUwewowwNd
‘connect to a local telnet service:
| steace -e tracemnetwork ne localhost 23
+ output omitted...
‘The trace Command
‘The Ltrace command can be used to intercept and record the
dynamic eals made to shared libraries. The amount of output
‘generated by the trace command can be overwhelming for some
‘commands (especialy ifthe ~8 option is used to also showy system
calls). You can focus the output to just the interection between the
program and some list of libraries. For example, to execute the id -2
‘command and show the calls made to the Libselinux.so module,
execute:
§ Verace -L /Lib/Libselinux.so.1 id -2
is_selinux enabled(@xcicTas, 8:9£29168, Sxclaffc, 8, -l)>
=
getconléx804c2c8, exfee80Et4, exB64b179, 8840828, 8) >
user_ussysten_runcontined_t
Remember that you can see what libraries @ program is inked against
using the 1dé command,
15-13Common Problems
Incorrect flo permissions
Inability to boot
Imeorrest bootloader installation
Corrupt ile systems,
Typos in configuration fles
Disks fu
oro fre blocks
+ no tres inoges
Running out of (virtual) memory
Runaway processes
Problems and Solutions in Linux
When prablems aceu in Linus, they manifest thameslves via
‘symptoms. Sometimes there is 3 stong correlation between the
‘symptom and problem such that determining the root problem is very
‘2887. Other times, the problem manifests itsell in symptoms that are
‘ube or otherwise non-obvious. These latter types of problems can
'be especialy dificult to solve the frst time you encounter them,
Experience isa large i not the largest, factor in speedy
troubleshooting. The upcoming pages document common problems,
thor symptoms and solutions, Hopeful this wil sve you Ue i 1
tue.
18-14
Pe es ee ee en eewwwwwwwwwwwwwwwwwwwwvvvvwwowwswuud
Troubleshooting Incorrect File Permissions
‘very strict permissions
‘tap? not world writable and sticky (1777)
Filesystom is «hierarchy
What are the permissions on /
What UID & GID Is deomon running as?
= What are the permissions on te lest
« Vina ae the permissions on parent directories?
Incoract SELinux fle context
Possible Solutions
Incorrect fle permissions can be very dificult to debug. For example if permissions on /tmp/ are incorrect, xfs wil ceport that it started
‘successtully, but when making an attempt to write a socket in /tmp, wil fall and silently ex.
strace and Ltrace ar invaluable for debugging opaque problems lke these. Use strace on the daemon or program to See it itis getting any
"permission denied’ error messages. Then take the appropriate action to fix the permission problem.
ne problem which can be dificult to diagnose is incorrect permissions on the root directory, /. To see the permissions on /, use the 1s ld
1/eomerand. On Linux, you should see:
§ is “ta /
drwix-xx-x 22 root root 1024 Sep 18 67:25 /
(On RHEL, the root directory does not have write access for the owner, i.e. 0585 mode,
Incorrect SELinux Security Context
‘On RHELS, probloms with @ weong SELinux context can be fixed with the cheon or restorecon commands. For example, inorder for Apache to
be able to read a filet needs to have the ht tpd_sys_content_t SELinux context. Here isan Mlustration of a common problem:
4 cd /var/inea/htal/; Is -alt
Grwxr-or-x. Toot root systen_uzobject_r:httpa_sys_content_tist .
rwnr-xr-x. root root systenusobject_rshttpd_sys_content_trs®
root root unconfined vrobject_r:httpd_sys_content_t:s@ index.html
sritist logo.pag
has the incorrect context. It can be corrected with
‘The ogo. fi
# chcon -t hetpd_sys_content_t Logo,png
15-15X86 Linux Boot Process
Bios
2% Quick Hardware Check
2€ ail Hardware configuration
Selects and passes contol to Master Boot Record on boot
‘device
LILO / GRUB first stage in MBR.
2 Executes 2nd stage
Boot loader 2nd stage
der
28 Starts execution of kernal
Linux kemet
2 nitiaizes and configures hardware
48 Finds and decompresses intramis image(s)
Loads drivers
4% Creates foot device, mounts root patton read-only. Location of
‘oot filesystem is usually passed to kernel by boot loader
14 Executes /sbin/init (can be overidden with init=/sex/ex
option)
Int (inital process)
Reads /etc/inittab, Determines defauit runlevel: if not
15-16
Inability to Boot
Understanding exact stops performed during booting wil greatly aid
‘roubleehooting
‘108, hardware, and boot loader problems
Kernel issues
= Does Komal have divers fr device that has rot filesystem, and
forthe (00 fisystem type?
Flesystom Corruption or Label erors
“= Flsystom is corupted enough that the kemelcennot mount i
readonly
Configuration tle errors exe! flow
= Tete/inittab, /ete/tstab, bot leader contig
‘defined, prompts.
1 Executes a variety of scripts to bring up core system services.
Scripts executed by init on a RHELG system include:
1 Processes seLinux and enforcing parameters passed to kernel
Uses Libsetinax to mount the /selinux flesystem. Loads
SELinux policy.
1 /ete/ze.4/re-sysinit which performs inal tasks that ae the
‘same regardless of the runlevel being entered.
28 fete/ze.d/re which runs all SysV iit scripts referenced by the
‘symbolic inks in the corresponding /ete/rcx.d/ directory for
tho runiovel being entered.
Starts mingetty’s attached to 9 set of local vitual terminals,
28Starts a display manager such 2s gam if entering runlevel 5.
RAR ARR RA RAR RRR RRA ARAMA ARRAYwe wewewwewwewwwwwwwewwYwwwrwwwwwwd
‘Typos in Configuration Files
‘Typos in ortan configuration fle can prevent succesful boot
= boot loader con es
2 fetc/initias
1 Jeve/fatab
Cheek /var/tog)*
(Cheek documentation for proper syntax
Essontial Configuration Flos
Typographical errors in essential configuration files can prevent Linux
from booting correctly. This can occur when mistakes are made in
the boot loader configuration files, In order to recover from an
incorrectly installed boot loader, the system usually needs to be
booted off of rescue media. To fx mistakes in other important files,
such as fstab or inittab, you can usually do a minimal boot off of
the local installation, either by booting with a rescue shell instead of
int (init=/bin/bash atthe bootloader command line) or by booting
into runlevel 1 appending 1 to the kernel parameters from the GRUB
boot loader.
(On RHEL, the GRUB configuration file that defines the menu items.
displayed on boot isthe boot /grub/grub.coaf fe. The
‘/etc/grub.cont and /boot /grub/nens.1st file are symbolic inks to
‘this contig file.
1617Solutions:
‘At boottime, all flesystems are checked to see if they ae in clean
state. Flesystems are marked clean as the final step of being
‘unmounted. If @ fllesystem is not clean, fsek is run against it. fsck Is
2 front-end program wich calls an appropriate program, (89
(e2fsckiB),reiserfsck(8), xfs_check(S) to check the filesystom.
If there is enough damage to the filesystem that data might
potentially be lost, the automatic filesystem check halts, and you are
prompted to manually tun feck. You wall be prompted to press at
tech repair step. This can become quite tedious, 50 you can use feck
=y to tell Fsck to always assume "yes"
Files recovered by fsck wil be stored inthe lost+found/ directory at
the root ofthe filesystem from which they were recovered.
‘ost#found/ is not a normal directory file, but rather one wihich has a
pool of prellocated data blocks avallable fr its use. Ifit gets
Femoved it ean be recreated with the most +found command.
15-18
Corrupt Filesystems
Filesystoms need to be clean before they are mounted roed-write
Unioss major damage has occurred, uncloan Mesystems ean still be
‘mounted read-only
‘Flosystem could be unclean because of hardware falure
{ck will return flesystom to consistent stato
= Flesystem stscturesrepaved, not Mlesystom dats
1 Can take @ LONG tine on vary logo flesystems
Journaling filesystems partially prevent need to run fsck
"Burs, Ela, JFS, ResorFS, ES
AVe wwwewwewwewwwUwwwwewwrwwewwwwwsd
‘The RHELG Rescue Environment
Using the rescue environment is @ last-stop efor, nat a first step.
‘Most problems do not require its use, and can be solved more easily
ina more fulfeatured envionment, such as single user mode. The
rescue environment has 2 lrvted set of tities, One useful lity is
the chroot command:
| chroot /ant/sysinage
chroot will change / (the roo filesystem) to begin from
/ant/sysinage/. From there you can run grub-install, vi, ena
{and everything else as you would normaly
W Auto-mounting Filesystems Fail
Use mknod to create device nodes, use fdisk -1 to view partitions.
run fsck against filesystems, then mount if needed) the root
filesystem at /mnt/sysinage/ (and other filesystems under
‘Innt/sysinage/ 2s needed).
4 nknod /dev/héa
4 mknod /dev/hdat
1 mknod /dev/hda2
i taisk -L /aevinsa
4 edfsck -y /dev/ndat
4 edfsck -y /dev/nda2
Wrount /dev/ndax /ant/sysinage
Rescue Environment
‘Some problems require use of a vseue mode
= Boot of install sk 1 and to Linur Fescue atthe boot prompt
+ Fesystems wil be mounted undo fat /eysiage/ if possilo
* After fing the problem, type exit et the promt to reboot the
system
4 chroot /ant/sysinage
(On RHEL, the rescue environment provides several options for
detecting and mounting Linux fiesystems. One option is for the
rescue environment to scan all gives on the system and
automatically mount detected Lux filesystems in read-urite mode.
nether option will mount detected Linux filesystems in read-only
‘mode. The last option is to skip the detection of Linux flesystems
completely.‘Task 1: Recovering Damaged MBR.
Page: 1521 Time: 18 minutes
Requirements: & (1 station) @ (classroom server)
18.20
Lab 15
PET CTT LTee CwewwwwrwwwwrUEUwwworwerwweswuved
oie th damaged MOR T: ie k1
Jae te rescve environment to recover a damag
as!
Requirements Recovering Damaged MBR
'B.(1 station) (classroom server) Zaee aS TE eee
Relevance
Certain problems are severe enough that they render the system
Lunbootable by normal means. In these cases, you must boot the system
sing an alternate boot medi and possibly aitenate roo filesystem to
repair the system. Ths task teaches how to enter and use the rescue
‘environment
1) The following actions require administrative privileges. Switch to root login
shall:
8 su -1
Password: makeitso E
for any disk or filesystem disaster includes having 2 copy of
3nd what each partion contains. At a minimum you should
know where your /boot, /, use, and /var filesystems are located. Use df to view
‘where they are currently located
tae
= output onitted .
3) Record which partition contains /boot:
Result:
4) Record which parttion contains /:
Result
55) Record which parttion contains Just:
Resut:
16-2166) Record which partion contains /var:
Result:
7), Overwrite your MBR by running the command listed below. Please double-check
your syntax before running the command, as simple typos here can easly render
‘the machine inoperable (beyond the scope of this course to recover}
4d ifs/dev/zero of=/dev/ sda countel be=446 ‘Carey tye this command as any typos coud
«output onitted . ‘campleteydesboy you Lux stalin,
reboot.
£8) Your instructor will provide instructions on whether PXE is available, and if so how
to PXE boot your lab system. I PXE isnot avilable, skip to the next step.
PXE boot your workstation,
Most Intl systems do tis with the key after POST.
From the PXE menu, select Rescue Node by typing XEx (replace with appropiate selection from description in menu).
‘This wil bot tothe Rascue environment. Skip to 10,
9) I you are using the installation DVD, the Rescue CO, or boot. iso mede CD,
complete the fllowing:
Boot from the optical media provided by your instructor.
‘At the menu, select Rescue installed systen
10) After a moment the rescue environment will ask about language and keyboard
preferences, Select as appropriate,
11) I you are doing @ network rescue with no DHCP server, supply the following
networking parameters tothe first stage ofthe installer. using DHCP, only the
NPS Server and Directory settings will be necessary. Be sure to replace distro
withthe proper value forthe version of Linux you are using:
15.22
RRR RR RAR RAR RRR AR RAR ARTA AAASVe ewe ewww wEUNWwYUwUUWNweNWWwed
12)
13)
IP address
Netmask
Dotautt gateway 0 254
Primary nameserver 254
INFS sever 254
Directoy /export/actinstall/aistro|
Iwill then ask i t should attempt to locate and automatically mount the Linux
filesystems. in many situations this will work fine and you would choose
Continue. I would then mount the root filesystem at /ant/sysinage/ and mount
child flesystoms as wl
In order to become versed with the steps required in the worst case
ot have it attempt automatic mounting, Select Skip to go deectly to
command prompt, Select Shell Start Shell from the menu:
nai. do
Starting shell.
Dash-4.i8 fdisk —t * Not the parton ruber for your Bat pation
Ifthe system boing rescued is using LVM based devices, it may be necessary to
siscover, import, activate, of create the necessary device files;
1 lndiskscan ‘ln some cass, misconiwed extemal storage targets,
tae eM es + such a5 SCSI o FE, alto automaticaly detect Note
idev/sd (74.51 GiB) int casa tat 1 LV phil volume is dtecta,
6 disks
17 partitions
@ Lin physical volune whole disks
L WM physical volume
# lum vgs “ther is an x inte atts then the Volume Group
We tp HL ASW attr Size Free has teen exgertd and wil eed tobe inported wing
vgstationk 15 8 ig-n~ 34.189 25.689 ‘mwpimort vy station.
1 lum vochange -a y vg_stationx
5 ogical volune(s) in volune group "vg.stationX nov active
Is /dev/vg_stationt/
Waroot Wwisvap Wtemp Wiser Wvar
15-2314)
15)
16)
15-24
‘Mount the normal filesystems under /ant/sysinage/:
f mkdir /ant/sysinage
oust /dev/vg_stationt/Lv_soot /ant/sysinage/ + Once the oot lesystem i mounted is possible to
look at fant /sysinage/ete/ fstab to see the somal
f mount -o bind /dev /ant/sysinage/dev/ ‘mounts. LABEL or UUID rlrences ae used instead of
4 aout -o bind /sys /ant/sysinage/sys/ the device flonome, use te fds or bIkLd
4 out 0 bind /proc /ant/sysinage/proc/ cmmards to ientty the device lenmes
# mount /dev/vg_stationt/tv_usr /ant/sysinage/usr/
4 aout /dev/vg_stationx/Iv_var /ant/sysinage/var/
4 mount /dev/vg_stationX/lv_tap /ant/sysinage/tnp/
# mount /dev/edal /ant/sysinage/boot/
4 chroot /nnt/sysinage/ +The choot command may not prove an inéeaton
that you are curently in a pseudo soo.
Repair the master boot record (MBR) on the primary dive:
4 grub ‘The hit and bX parameter passed to grub shud be
‘grub> root (ho, 0) the haré lve ond partion te BOS toot ram
oot (hd8, 8)
Filesystem type is ext2fs, partition type 6x83
‘grub> setup (hab)
snips
Done
grub> quit,
GRUB should find its stagot,stagel_S, and stage2 images, and report succeeded
in anything run after the checks. If there are problems, ask your instructor.
xt chroot, exit rescue mode and reboot the system. Once the system has
‘rebooted, remove any boot media you used and start a normal boot process. I
the steps above were performed correctly your system should boot in @ normal
fashion
fb exit ‘teint the cvooted ste
exit “esminte th seve envoment
Soloct reboot Reboot from the menu.
RRA RRR AR RAR RAR RAR AAA AAA RATA AAD