
IBM DB2 for Linux, UNIX, and Windows
Best practices
Troubleshooting DB2 servers
Nikolaj Richers
Information Architect
IBM
Amit Rai
Advisory Software Engineer
IBM
Serge Boivin
Senior Writer
IBM
Issued: January 201
IBM
Table of Contents
Troubleshooting DB2 servers
Executive summary
Introduction
Knowing when to contact IBM for help
Exchanging information with IBM through EcuRep
Be prepared: configure your data server ahead of time
Redirect diagnostic data away from the DB2 installation path
For greater diagnostic logging resilience, configure an alternate diagnostic path
Redirect core file dumps and FODC data to a different directory path
Configure for rotating diagnostic and administration notification logs
Regularly archive and delete diagnostic data
Provide enough free space to store diagnostic data
First steps for troubleshooting
First occurrence data capture (FODC)
db2diag and administration notification logs
DB2 tools
Operating system tools and log files
Monitoring infrastructure
Configuring in-memory metrics for troubleshooting
Event monitor infrastructure
Text reports for monitoring data
Minimizing the impact of troubleshooting
Collect diagnostic data only where the problem is occurring
Collect only the diagnostic data you need
Avoid service delays due to transferring diagnostic data
Scenarios
Scenario: Troubleshooting high processor usage spikes
Identifying the problem
Diagnosing the cause
Resolving the problem
Scenario: Troubleshooting sort overflows
Identifying the problem
Diagnosing the cause
Resolving the problem
Scenario: Troubleshooting locking issues
Identifying the problem
Diagnosing the cause
Resolving the problem
Best practices
Further reading
Contributors
Contacting IBM
Notices
Trademarks
Executive summary
Even in a perfectly engineered world, things can break. Hardware that is not
redundant can fail, or software can encounter a condition that requires intervention.
You can automate some of this intervention. For example, you can enable your DB2
server to automatically collect diagnostic data when it encounters a significant
problem. Eventually, however, a human being must look at the data to diagnose and
resolve the issue. When the need arises, you can use several DB2 troubleshooting tools
that provide highly granular access to diagnostic data.
Introduction
The information and scenarios in this paper show how you can use the DB2
troubleshooting tools to diagnose problems on your server.
In large database environments, the collection of diagnostic data can introduce an
unwanted impact to the system. This paper shows how you can minimize this impact
by tailoring the values of a few basic troubleshooting configuration parameters such
as diagpath, DUMPDIR, and FODCPATH and by collecting data more selectively.
The result? When things do break, you are well prepared to make troubleshooting as
quick and painless as possible.
The following DB2 troubleshooting scenarios are covered in this paper:
  Troubleshooting high processor usage spikes
  Troubleshooting sort overflows
  Troubleshooting locking issues
For each scenario, this paper shows you how to identify the problem symptoms, how
to collect the diagnostic data with minimal impact to your database environment, and
how to diagnose the cause of the problem.
The target audience for this paper is database and system administrators who have
some familiarity with operating system and DB2 commands.
This paper applies to DB2 V10.1 FP2 and later, but many of the features that are
described here are available in earlier DB2 versions as well. For example, some of the
serviceability functionality for large database environments was introduced in DB2
V9.7 FP4, and user-defined threshold detection for problem scenarios was introduced
in DB2 V9.7 FP5. If you are not sure whether specific functionality is supported for
your DB2 version, check the information center for that version.
Knowing when to contact IBM for help
An important part of troubleshooting is knowing when you cannot fix a problem
yourself and you must ask for assistance. If you have a maintenance contract, you can
engage IBM Support when you think that a problem goes beyond the scope of what
you can or want to fix yourself. There might also be some problems that you most
likely cannot fix yourself, such as problems that require diagnostic tools that are not
generally available outside of IBM or problems that indicate a possible product defect.
For information about how to contact IBM and the available support options, see
"Contacting IBM Software Support" in the DB2 information center
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.trb.doc/doc/t0053716.html).
Exchanging information with IBM through EcuRep
IBM has set up a standard method that you can use to exchange diagnostic
information, called the Enhanced Customer Data Repository (EcuRep). EcuRep makes
it easy for you to associate the diagnostic data that you upload with a problem
management report (PMR), so that IBM Support personnel can find your data quickly.
To avoid delays, there are naming conventions to follow when you prepare your
diagnostic data for uploading.
For information about how to use EcuRep, see "Enhanced Customer Data Repository
(EcuRep)" (http://www-05.ibm.com/de/support/ecurep/index.htm).
Be prepared: configure your data server ahead of time
The purpose of configuring parameter and registry variable settings before you
encounter problems is to minimize the impact of diagnostic data collection and to
ensure that diagnostic data is available when you need it. Generally, you want to
control, not suppress, diagnostic data collection. Most importantly, you must control
where diagnostic data is stored, and there must be enough free space to store the
diagnostic data.
To see how your server is configured to behave during diagnostic data collection
when a critical error occurs, issue the db2pdcfg command. The output shows how
your data server responds to critical events such as trap conditions and what the
current state is. Significant events, such as critical errors, trigger automatic data
capture through first occurrence data capture (FODC, sometimes also referred to as
db2cos), which is described elsewhere in this paper.
Sample output from the db2pdcfg command is as follows:
    db2pdcfg
    Current PD Control Block Settings:
    All error catch flag settings cleared.
    db2cos is enabled for engine traps.
    PD Bitmap: 0x1000
    Sleep Time: 3
    Timeout: 300
    Current Count: 0
    Max Count: 255
    Current bitmap value: 0x0
    Instance is not in a sleep state
    Thread suspension is disabled for engine traps.
    DB2 trap resilience is enabled.
    Current threshold setting : 0 ( disabled )
    Number of traps sustained : 0
    Database Member 0
    FODC (First Occurrence Data Capture) options:
    Dump directory for large objects (DUMPDIR)= /home/hbrites2/sqllib/db2dump/
    Dump Core files (DUMPCORE)= AUTO
    Current hard core file size limit = Unlimited
    Current soft core file size limit = 0 Bytes
For more information about the db2pdcfg command, see "db2pdcfg - Configure DB2
database for problem determination behavior command"
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0023252.html).
You typically make configuration changes for troubleshooting in one of two places:
  The DB2 profile registry
  The database manager configuration
For DB2 profile registry variables, there is an important difference between the
methods that you can use to make configuration changes:
  You can make changes permanently by using the db2set command, which
    requires an instance restart for changes to become effective.
  You can make changes temporarily by using the db2pdcfg command.
    Changes are effective until you restart the instance.
To retrieve information about database manager configuration settings, you issue the
GET DATABASE MANAGER CONFIGURATION command or its abbreviated form, the
GET DBM CFG command.
The command output includes all database manager configuration values. The
following sample output has been abridged to show only values that are related to
problem determination:
    Diagnostic error capture level (DIAGLEVEL) = 3
    Notify Level (NOTIFYLEVEL) = 3
    Diagnostic data directory path (DIAGPATH) = /home/db2inst2/db2diag1
    Alternate diagnostic data directory path (ALT_DIAGPATH) = /home/db2inst2/db2diag2
    Size of rotating db2diag & notify logs (MB) (DIAGSIZE) = 0
The diagnostic error capture level (indicated by DIAGLEVEL) determines the
level of detail that is recorded in the db2diag log file, and the notify level (indicated by
NOTIFYLEVEL) determines the level of detail that is recorded in the notification log file.
The diagnostic data directory path (indicated by DIAGPATH) and the alternate
diagnostic data directory path (indicated by ALT_DIAGPATH) determine where
diagnostic data is stored.
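The full GET DBM CFG output is long, so a small filter can help when you only want the problem determination values. The following is a minimal sketch, assuming the output keeps the "description (NAME) = value" layout of the sample above; the function names pd_settings and dbm_cfg_value are hypothetical helpers, not DB2 commands.

```shell
#!/bin/sh
# Print only the problem determination settings from GET DBM CFG output.
# Reads the full command output on standard input.
pd_settings() {
    grep -E '\((DIAGLEVEL|NOTIFYLEVEL|DIAGPATH|ALT_DIAGPATH|DIAGSIZE)\)'
}

# Print the value of a single parameter, for example: dbm_cfg_value DIAGLEVEL
# Assumes the value follows "(NAME) =" on the same line.
dbm_cfg_value() {
    sed -n "s/^.*($1)[[:space:]]*=[[:space:]]*//p"
}

# Typical use on a DB2 server (not run here):
#   db2 get dbm cfg | pd_settings
#   db2 get dbm cfg | dbm_cfg_value DIAGLEVEL
```

Because both helpers read standard input, you can also run them against a saved copy of the configuration output instead of a live instance.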
Unless you are guided by IBM Support, do not change the default settings of
parameters or registry variables that are specific to problem determination but are not
described in this paper. For example, do not change the settings of the diaglevel
configuration parameter or other DB2FODC registry variable parameters. If you set the
level of detail for these parameters or registry variables too high, very large amounts
of diagnostic data can be generated in a very short time, which in turn can negatively
affect the performance of your data server. If you set the level of detail too low,
insufficient data to troubleshoot a problem might be available, requiring further
diagnostic data collection before you can diagnose and resolve a problem.
If you notice that the values for configuration parameters such as diaglevel and
notifylevel are not set to the defaults and you are not troubleshooting a problem,
you can use the UPDATE DBM CFG command to reset them to their defaults. In the
following example, with abridged output, the GET DBM CFG command shows that
the diaglevel parameter is set to the highest value possible:
    db2 get dbm cfg
    Diagnostic error capture level (DIAGLEVEL) = 4
The default value is 3, which captures all errors, warnings, event messages, and
administration notification messages. To reset the value for the diaglevel parameter
to the default, issue the following command:
    db2 update dbm cfg using DIAGLEVEL 3
For more information about the diaglevel configuration parameter, see
"diaglevel - Diagnostic error capture level configuration parameter"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.config.doc/doc/r0000298.html).
Redirect diagnostic data away from the DB2 installation path
The diagpath configuration parameter specifies the fully qualified primary path for
DB2 diagnostic data. By default, the DB2 installation path is used. When you
configure your data server, your first action is to point the diagpath parameter to a
separate file system, away from the DB2 installation path.
The reason for redirecting diagnostic data away from the installation path is that the
various types of diagnostic data can use significant amounts of space in the file
system. By default, the db2diag and administration notification logs, core dump files,
trap files, an error log, a notification file, an alert log file, and FODC packages are all
written to the installation path. These files can negatively affect data server
availability if the data fills up all the space in the file system.
Redirect diagnostic data away from the DB2 installation path by using the following
command, replacing /var/log/db2diag with a location on your system:
    db2 update dbm cfg using diagpath "/var/log/db2diag"
The different types of diagnostic files are described in more detail in the section
"db2diag and administration notification logs".
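Before you change the diagpath parameter, it can be useful to confirm that the new location really is on a different file system than the installation path. The following is a minimal sketch that compares mount points as reported by df -P; the function names and the example paths are hypothetical, both paths must already exist, and mount points that contain spaces are not handled.

```shell
#!/bin/sh
# Print the mount point of the file system that holds the given path.
mount_point_of() {
    # df -P prints a header line followed by one POSIX-format record;
    # the mount point is the sixth field of that record.
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Succeed (exit status 0) when both paths are on the same file system.
same_filesystem() {
    [ "$(mount_point_of "$1")" = "$(mount_point_of "$2")" ]
}

# Example with hypothetical locations; adjust both paths for your system:
#   if same_filesystem "$HOME/sqllib" /var/log/db2diag; then
#       echo "diagpath shares a file system with the installation path" >&2
#   fi
```

The same check applies to the alternate diagnostic path and to the directories that you choose for core files and FODC packages.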
For greater diagnostic logging resilience, configure an alternate diagnostic path
The alt_diagpath configuration parameter specifies an alternate path for storing
diagnostic information. Set this parameter and the diagpath parameter to different
paths. The path that you specify for the alt_diagpath parameter is used only when
the database manager fails to write to the path that you specified for the diagpath
parameter, which improves the likelihood that critical diagnostic information is not lost.
To see the value for the alt_diagpath configuration parameter, issue the following
command:
    db2 get dbm cfg
    Alternate diagnostic data directory path (ALT_DIAGPATH) = /home/db2inst2/db2diag2
To change the value of the alt_diagpath configuration parameter permanently,
enter the following command, replacing /var/log/db2diag_alt with a location on
your system:
    db2 update dbm cfg using ALT_DIAGPATH "/var/log/db2diag_alt"
For more information about the alt_diagpath configuration parameter, see
"alt_diagpath - Alternate diagnostic data directory path configuration parameter"
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.config.doc/doc/r0058822.html).
Redirect core file dumps and FODC data to a different directory path
Two types of diagnostic data are created only in response to specific events on your
data server. One type is the FODC package, which stores diagnostic data as a problem
is occurring, and the other is the core file, which preserves a memory image before a
process is terminated. Both FODC packages and core files can require significant
amounts of disk space. By default, both are sent to the directory path that you
specified for the diagpath configuration parameter or, if you did not set a value for
the diagpath parameter, the DB2 installation path.
To reduce the amount of diagnostic data that is sent to the directory path that you
specify for the diagpath parameter, redirect FODC packages and the core file to a
different directory path. You use the following DB2FODC registry variable settings to
change the setting for the FODC packages and core files:
  FODCPATH: Specifies the absolute path name for the FODC package. The size
    of a FODC package depends on the type of collection, the operating system,
    and the sizes of the files that are collected. The size can reach several
    gigabytes.
  DUMPDIR: Specifies the absolute path name of the directory for core file
    creation. A core file can become as large as the amount of physical memory of
    the machine where the core file is generated. For example, a machine with 64
    GB of physical memory requires at least 64 GB of space in the directory path
    where the core file will be stored. You can limit the size of the core file, but
    you should instead configure core file behavior to point to a file system with
    enough space to avoid lost or truncated diagnostic data.
You use the db2set command to make changes to these registry variable settings. For
example, to redirect core files to the /tmp path permanently, issue the following
db2set command, which takes effect after you restart the instance:
    db2set DB2FODC="DUMPDIR=/tmp"
You can also specify multiple registry variable settings for the db2set command,
separating the settings with a space. For example, to set both the FODCPATH and the
DUMPDIR registry variables at the same time, you can issue the following command,
replacing the variable values with values that apply to your own system:
    db2set DB2FODC="DUMPDIR=/home/testuser/mydumpdir FODCPATH=/home/testuser/myfodcdir"
For more information about the registry variables that are supported by the db2set
command, see "General registry variables"
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/index.jsp?topic=/com.ibm.db2.luw.admin.regvars.doc/doc/r0005657.html).
When you run the db2support command to collect environment data, it searches a
number of paths for FODC packages, including the path that is indicated by the
FODCPATH registry variable. You can specify an additional existing directory for the
db2support command to search for FODC packages by using the -fodcpath
command parameter. For more information about the parameters for the
db2support command, see "db2support - Problem analysis and environment
collection tool command"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0004503.html).
Configure for rotating diagnostic and administration notification logs
By default, a single DB2 diagnostic log file and a single notification log file are used.
These log files grow in size indefinitely, which can become problematic if the files fill
all the available space in the file system. A better approach is to use rotating
diagnostic and administration notification logs, configured to work for your particular
system.
When you specify that you want to use rotating diagnostic and notification logs, a
series of rotating diagnostic log files and a series of rotating administration
notification log files are used that fit into the size that you defined for the diagsize
parameter. As log files fill up, the oldest files are deleted, and new files are created.
To see the current diagnostic logging setting, use the GET DBM CFG command:
    db2 get dbm cfg
    Size of rotating db2diag & notify logs (MB) (DIAGSIZE) = 0
When the value of the diagsize parameter is the default of 0, as shown in the
output, there is only one diagnostic log file, called the db2diag.log file. There is also
only one notification log file, which is named after the instance and has a .nfy file
extension. If configured as in the above example, these files grow in size indefinitely.
To configure for rotating diagnostic and notification logs, set the diagsize
configuration parameter to a nonzero value. The value that you specify depends on
your system. Most importantly, you want to avoid losing information too quickly
because of rapid file rotation (the deletion of the oldest log file) before you can archive
the old files. Generally, set the diagsize parameter to at least 50 MB, and make sure
that there is enough free space in the directory path that you specify for the
diagpath parameter. Provide the same amount of space in the directory path that
you specify for the alt_diagpath parameter.
For example:
    db2 update dbm cfg using diagsize 50
After you configure for rotating diagnostic logs, spend some time observing the
rotation of these files. The DB2 diagnostic and notification log files should be rotated
by the system every seven to 14 days. If they are rotated out too often, increase the
value of the diagsize parameter. If they are rotated too infrequently, decrease the
value of the parameter.
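The tuning guideline of one rotation every seven to 14 days can be written down as a small decision helper. The following is a minimal sketch; the function name rotation_advice is hypothetical, and the number of days between rotations is something that you observe yourself, for example from the time stamps of the rotating log files.

```shell
#!/bin/sh
# Suggest a diagsize adjustment from the observed number of whole days
# that pass before a rotating log file is rotated out.
# The bounds of 7 and 14 days follow the guideline in the text.
rotation_advice() {
    days=$1
    if [ "$days" -lt 7 ]; then
        echo "increase diagsize"
    elif [ "$days" -gt 14 ]; then
        echo "decrease diagsize"
    else
        echo "keep diagsize"
    fi
}

rotation_advice 3     # prints "increase diagsize"
rotation_advice 10    # prints "keep diagsize"
rotation_advice 20    # prints "decrease diagsize"
```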
For more information about rotating diagnostic logs, see "DB2 diagnostic (db2diag)
log files"
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.trb.doc/doc/c0054462.html).
Regularly archive and delete diagnostic data
Configuring your data server to use rotating diagnostic and notification logs solves
the issue of log files that grow indefinitely. However, you must archive the contents of
older log files before they are rotated out and deleted, so that you can access them if
you need them for troubleshooting.
To archive the log files, use the db2diag -A command [1]. To avoid filling up the
diagnostic directory path with the archived diagnostic data, archive the diagnostic log
files to a different file system or to backup storage. After archiving a file, retain the
diagnostic data for two to four weeks, for example, by backing it up to a storage
solution. After this retention period has passed, you can automatically delete the
diagnostic data. If you do not archive the log files to the intended location by
specifying a directory path, make sure that you move the archived diagnostic data to a
different location to free up the space in the diagnostic path.
[1] On operating systems other than Windows operating systems, you can also archive
all contents of the diagnostic path into an archive path by using the db2support -A
command. If you use this command to archive everything, make sure that the target
directory path is on a file system that has sufficient free space, equivalent to the
amount of data that is in the diagnostic path.
The following example demonstrates how to archive. The directory listing shows
which db2diag log file is in use. You can tell that this is a rotating diagnostic log file
because a numerical identifier (0) is part of the file name.
    -rw-rw-rw- 1 testuser pdxdb2 15881799 Jun 14 14:09 db2diag.0.log
Now issue the db2diag -A command and include a destination path for the
archived logs:
    db2diag -A /home/testuser/archive/
    db2diag: Moving "/home/testuser/sqllib/db2dump/db2diag.log"
    to "/home/testuser/archive/db2diag.0.log_2012-06-14-15.12.47"
The following directory listing shows the archived version of the log file:
    ls -l /home/testuser/archive/ | grep -i diag
    -rw-rw-rw- 1 testuser pdxdb2 15881799 Jun 14 14:09 db2diag.0.log_2012-06-14-15.12.47
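The retention step can be automated with standard operating system tools. The following is a minimal sketch, assuming that archived files keep the db2diag.N.log_timestamp naming shown in the listing; the function names are hypothetical, the 28-day retention is the upper end of the two to four week guideline, and archived notification logs are not covered.

```shell
#!/bin/sh
# List archived db2diag log files that are older than a retention period.
# $1: archive directory, $2: retention period in days.
expired_archives() {
    find "$1" -type f -name 'db2diag*.log_*' -mtime +"$2" -print
}

# Delete the expired archives. On a real system, review the output of
# expired_archives first.
purge_expired_archives() {
    expired_archives "$1" "$2" | while IFS= read -r f; do
        rm -f -- "$f"
    done
}

# Example with a hypothetical archive location (not run here):
#   expired_archives /home/testuser/archive 28
#   purge_expired_archives /home/testuser/archive 28
```

Running the purge from a scheduler such as cron, on the file system that holds the archive, keeps the cleanup independent of the diagnostic path itself.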
Having a good policy for regularly archiving and deleting the diagnostic and
notification logs takes care of diagnostic data that is regularly generated, but it does
not take care of all types of diagnostic data. You might need to remove other types of
data after you no longer need it. For example, if you run the db2support command
to prepare for uploading diagnostic data to the IBM Support site, you end up with a
compressed archive that takes up space. Remember to remove this archive after your
problem report is resolved. You must also manually remove any additional data
dumps or FODC packages that are generated.
Provide enough free space to store diagnostic data
Diagnostic data can use substantial amounts of space, and you must ensure that
enough space is available to store this data. How much space is needed depends on
the type of diagnostic data.
Diagnostic and notification logs: For both the primary diagnostic path that you
specify for the diagpath parameter and the alternate diagnostic path that you
specify for the alt_diagpath parameter, provide at least 20% more free space than
the value of the diagsize parameter.
Minimum space for diagnostic and notification logs = value of the diagsize parameter × 1.2
Core file dumps and FODC data: For free space, provide at least twice the amount of
physical memory of the machine, plus 20%. Providing this much space ensures that
you can store at least two full core file dumps or several FODC packages without
running the risk of truncated diagnostic data.
Minimum space for core files and FODC packages = 2 × physical memory × 1.2
For example, if a machine has 64 GB of physical memory, provide a minimum of 154
GB of space for core files and FODC packages in the file system (64 GB × 2 × 1.2 =
153.6 GB, rounded up to 154 GB).
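The two sizing rules can be checked with a few lines of shell arithmetic. The following is a minimal sketch; the function names min_log_space_mb and min_core_space_gb are hypothetical, results are rounded up to whole units, and the inputs (a diagsize value of 50 MB and 64 GB of physical memory) are illustrative values only.

```shell
#!/bin/sh
# Shell arithmetic is integer only, so the factor 1.2 is applied as 12/10
# and 9 is added before dividing to round the result up.

# Minimum free space in MB for each diagnostic path: diagsize x 1.2.
# $1: value of the diagsize parameter in MB.
min_log_space_mb() {
    echo $(( ($1 * 12 + 9) / 10 ))
}

# Minimum free space in GB for core files and FODC packages:
# 2 x physical memory x 1.2.
# $1: physical memory in GB.
min_core_space_gb() {
    echo $(( ($1 * 2 * 12 + 9) / 10 ))
}

min_log_space_mb 50     # prints 60
min_core_space_gb 64    # prints 154
```

Rounding up rather than down errs on the side of having slightly more space than the formula requires, which matches the intent of avoiding truncated diagnostic data.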
Diagnostic data that you are uploading to the IBM Support site: If you run the
db2support command to prepare to upload diagnostic data to the IBM Support site,
make sure that enough space is available. The size of the db2support.zip file depends
on what parameters you specify for the db2support command and can range from
several megabytes to tens of gigabytes or more.
If you do not specify an output path, the resulting compressed archive is stored in the
directory path that you specified for the diagpath parameter.
First steps for troubleshooting
This section explains the initial steps for identifying and diagnosing apparent errors
and performance problems. The purpose of these troubleshooting steps is to
determine the following information:
  What information to collect from DB2 tools and logs and from operating
    system tools and logs and what environmental information to collect
  How to use this information in problem investigation
The first step is to characterize the issue by asking the following questions:
  What are the symptoms?
  Where is the problem happening?
  When does the problem happen?
  Under which conditions does the problem happen?
  Is the problem reproducible?
You might also ask whether there were any recent changes that might be
implicated in the problem. Some problems, such as performance problems or
problems that occur only intermittently or only after some time has elapsed, are much
more open ended and require an iterative approach to troubleshooting.
After you have characterized the issue, you can use a number of tools and logs. In the
sections that follow, the main tools and diagnostic logs are described. These tools
include FODC, DB2 diagnostic and administration notification logs, DB2 tools, and
operating system tools and logs.
The DB2 monitoring infrastructure can also provide a wealth of information about the
health and performance of DB2 servers. Using table functions, you can access a broad
range of real-time operational data (in-memory metrics) about the current workload
and activities, along with average response times. Using event monitors, you can
capture detailed activity information and aggregate activity statistics for historical
analysis.
You can also perform some of the troubleshooting and monitoring tasks that are
covered in this paper by using the IBM InfoSphere® Optim™ and IBM Data Studio
tools. Information about how to use all of these tools is outside the scope of this paper,
but you might consider the following Optim tools:
  IBM InfoSphere Optim Performance Manager (OPM): Provides easy-to-use
    performance monitoring. Alert mechanisms inform you of potential problems.
    Historical tracking and aggregation of metrics provide information about
    system performance trends. For more information, see "IBM InfoSphere Optim
    Performance Manager"
    (http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.idm.tools.doc/doc/c0057035.html).
  OPM Extended Insight: Measures end-to-end response time to detect issues
    outside your DB2 data server. For more information, see "IBM InfoSphere
    Optim Performance Manager Extended Insight"
    (http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.idm.tools.doc/doc/c005726.html).
  Optim Query Workload Tuner: Performs deep-dive analysis to identify and
    solve many types of query bottlenecks. For more information, see "IBM
    InfoSphere Optim Query Workload Tuner for DB2 for Linux, UNIX, and
    Windows"
    (http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.idm.tools.doc/doc/c0057033.html).
First occurrence data capture (FODC)
FODC is the built-in DB2 facility for detecting specific failure scenarios. FODC
automatically captures diagnostic data when a specific error condition occurs, and
you can also use it to manually capture data for a specific problem scenario that you
are observing. FODC minimizes the need to reproduce problem scenarios, because
diagnostic data is collected as the problem first occurs.
By default, FODC invokes a db2cos callout script to collect diagnostic data. The
db2cos callout script is located in the bin directory in the DB2 installation path (in the
sqllib/bin directory, for example). You can modify the db2cos callout script to
customize diagnostic data collection.
In DB2 V9.7 FP5 and later (excluding V9.8), FODC supports defining your own
threshold rules for detecting a specific problem scenario and collecting diagnostic data
in response. You define threshold rules by specifying the -detect parameter for the
db2fodc command. To detect a threshold condition and to trigger automatic
diagnostic data collection when the threshold condition is exceeded multiple times,
create an FODC threshold rule such as the following one:
    db2fodc -memory basic -detect free"<=10" connections">=1000"
    sleeptime="30" iteration="10" interval="10" triggercount="4"
    duration="5" -member 3
    "db2fodc": List of active databases: "SAMPLE"
    "db2fodc": Starting detection ...
The effect of this threshold rule is as follows. On member 3, detection is performed to
check whether the conditions that are specified by the threshold rules free<=10 and
connections>=1000 are met. These threshold rules specify that the size of the free list
must be 10 or less and the number of connections must be 1000 or more for the
threshold to be detected. FODC memory collection is triggered on member 3 when the
number of times that the threshold conditions are detected reaches the value that is
specified by the trigger count. In this example, for FODC collection to be triggered, the
trigger conditions must exist for 40 seconds (triggercount value of 4 x interval
value of 10 seconds = 40 seconds). The detection process sleeps for 30 seconds
between each iteration, and the total time that detection is enabled is 5 hours. When
FODC memory collection is triggered, a new directory whose name is prefixed with
FODC_Memory_ is created in the current diagnostic path.
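The timing arithmetic for this rule can be sketched in a few lines of Python. This is an illustrative calculation only and is not part of DB2: the function name min_seconds_to_trigger is hypothetical, and the sketch assumes that the threshold must be detected triggercount times in a row at interval-second spacing, as the description of the rule suggests.

```python
def min_seconds_to_trigger(triggercount: int, interval_seconds: int) -> int:
    """Return how long the threshold conditions must persist before a
    collection is triggered, under the assumption stated above."""
    if triggercount < 1 or interval_seconds < 0:
        raise ValueError("triggercount must be >= 1 and interval must be >= 0")
    return triggercount * interval_seconds


# Values taken from the example rule: triggercount="4", interval="10"
print(min_seconds_to_trigger(4, 10))
```

Raising triggercount or interval makes detection less sensitive to short spikes, at the cost of a longer delay before data is collected.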
When the threshold conditions that are defined by the -detect parameter are met
and FODC memory collection is triggered, a message similar to the following one is
displayed in the command window:
"db2fodc": 4 consecutive threshold hits are detected.
"db2fodc": Triggering collection 1.
Messages are also written in the db2diag.log file, as shown in the following example.
To get details about a triggered threshold, you can use tools and scripts to scan the
db2diag.log file and look for the string pdFodcDetectAndRunCollection,
probe:100.
2013-04-02-14.24.54.951556-240 I2341E790           LEVEL: Event
PID     : 13279                TID : 47188095859472 PROC : db2fodc
INSTANCE: inst1                NODE : 003
FUNCTION: DB2 UDB, RAS/PD component,
pdFodcDetectAndRunCollection, probe:100
CHANGE  :
Hostname: host11 Member(s): 3 Iteration: 1
Thresholds hit 0: free(8)<=10 connections(1010)>=1000
Thresholds hit 1: free(8)<=10 connections(1009)>=1000
Thresholds hit 2: free(9)<=10 connections(1001)>=1000
Thresholds hit 3: free(10)<=10 connections(1005)>=1000
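As one way to scan the db2diag.log file for these records, the following Python sketch extracts the "Thresholds hit" lines and the observed values. It is a minimal illustration, not an IBM-provided tool: the function name and the regular expressions are assumptions that are based only on the record layout shown in the example above.

```python
import re

# Matches lines such as: Thresholds hit 0: free(8)<=10 connections(1010)>=1000
HIT_LINE = re.compile(r"Thresholds hit (\d+):\s*(.*)")
# Matches one condition such as: free(8)<=10
CONDITION = re.compile(r"(\w+)\((\d+)\)(<=|>=|<|>|=)(\d+)")


def parse_threshold_hits(log_text):
    """Return a list of (hit_number, {name: (observed, operator, limit)})."""
    hits = []
    for line in log_text.splitlines():
        match = HIT_LINE.search(line)
        if not match:
            continue
        conditions = {
            name: (int(observed), operator, int(limit))
            for name, observed, operator, limit in CONDITION.findall(match.group(2))
        }
        hits.append((int(match.group(1)), conditions))
    return hits


sample = """\
FUNCTION: DB2 UDB, RAS/PD component, pdFodcDetectAndRunCollection, probe:100
Thresholds hit 0: free(8)<=10 connections(1010)>=1000
Thresholds hit 1: free(8)<=10 connections(1009)>=1000
"""
for number, conditions in parse_threshold_hits(sample):
    print(number, conditions)
```

In practice you would read the text from the db2diag.log file in your diagnostic path instead of using an inline sample.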
You can also gather diagnostic performance data selectively without defining
threshold rules by using the -cpu, -connections, or -memory parameter. These
parameters are alternatives to collecting diagnostic data more extensively and
expensively with the -perf and -hang parameters when you already have a
preliminary indication of where a problem might be occurring.
As of DB2 V9.7 FP4, FODC collects diagnostic data at the member level to provide
more granular access to diagnostic data. Member-level FODC settings provide greater
control than the instance-level or host-level settings that were supported in previous
releases and fix packs.
db2diag and administration notification logs
DB2 diagnostic and administration notification messages are both logged in the
db2diag log files, making the db2diag log files one of the first places to check if you
suspect a problem. Using the db2diag command, you can analyze the db2diag logs
to extract problem-specific information. For example, you can extract error messages
that are related to health indicators on a specific date by using a command such as the
following one:
db2diag -level Error -time 2012-06-13 -gi message:=health
2012-06-13-14.39.13.850508-240 E15877107A655       LEVEL: Error
PID     : 25821308             TID : 772            PROC : db2acd
INSTANCE: test99               NODE : 000
EDUID   : 772                  EDUNAME: db2acd
FUNCTION: DB2 UDB, Health Monitor, HealthIndicator::update, probe:500
MESSAGE : ADM10500E Health indicator "Log Filesystem Utilization"
("db.log_fs_util") breached the "upper" alarm threshold of "85 %"
with value "89 %" on "database" "mtmelo.SAMPLE ". Calculation:
"((os.fs_used/os.fs_total)*100);" = "((65226387456 / 73014444032) *
100)" = "89 %". History (Timestamp, Value, Formula): "()"
You can also use the db2diag command to merge multiple log files.
You should monitor the administration notification log to determine whether any
administrative or maintenance activities require manual intervention. For example, if
the directory where transaction logs are kept is full, this blocks new transactions from
being processed, resulting in an apparent application hang. In that situation, the DB2
process writes the error ADM1826E to the administration notification log, as shown in
this example:
2013-04-19-13.03.30.699042 Instance:shenli Node:000
PID:13045(db2sysc 0) TID:2970467744 Appid:none
data protection services sqlpgCallGIFL Probe:1540
ADM1826E DB2 cannot continue because the disk used for
logging is full.
The error condition is also written to the db2diag log file, as shown here:
2013-04-19-13.03.30.618650-240 E2656E381           LEVEL: Error
PID     : 13045                TID : 46912603274656 KTID : 13045
PROC    : db2sysc 0
INSTANCE: dbinst1              NODE : 000           DB   : SAMPLE
HOSTNAME: host1
EDUID   : 49                   EDUNAME: db2loggr (SAMPLE) 0
FUNCTION: DB2 UDB, data protection services, sqlpgCallGIFL,
probe:1540
MESSAGE : ADM1826E DB2 cannot continue because the disk used
for logging is full.
DB2 tools
Several DB2 commands are used regularly as troubleshooting tools. The commands
that are described in this section are the ones that you might use most often. The
scenarios in this paper provide examples of how to use these commands.
db2pd command: The db2pd command is a stand-alone command that you
can use to monitor and troubleshoot a DB2 instance from its database system
memory. The db2pd command collects information without using any engine
resources or acquiring any latches. Because the db2pd command does not
acquire any latches, you can normally retrieve information that is changing
while the db2pd command is collecting information. You can rerun the db2pd
command if results don't appear to be accurate.
On a slow or non-responsive database system, the db2pd command is one of
the most important tools that you can use. The db2pd command works on a
DB2 engine that might otherwise appear to be hung.
For more information, see "db2pd - Monitor and troubleshoot DB2 database
command"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0011729.html).
db2diag command: This command filters, formats, and archives the
diagnostic information in the db2diag log files. Filtering records in the db2diag
log files can reduce the time that you require to locate the records that you
need when troubleshooting problems. For example, you can use process
information that you obtain from the db2pd command to filter related
diagnostic information in the db2diag log files by using the db2diag
command.
You can archive rotating diagnostic log files to retain diagnostic data that
would otherwise be eventually overwritten and move the files to a different
location for storage.
For more information, see "db2diag - db2diag logs analysis tool command"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0011728.html).
db2top command (UNIX and Linux operating systems): This command uses
the snapshot monitor to provide a single-system view for partitioned database
environments. The db2top command can help you identify performance
problems across the whole database system or in individual partitions. You
can also use the db2top command on single-partition environments.
On large systems, the db2top command can require large amounts of
memory because the global snapshot buffer can grow large. Particularly for
large partitioned environments, you should carefully choose the update
interval so as not to generate excessive traffic and overtax the fast
communication manager (FCM) buffer shared memory. You specify the
update interval by using the -i command parameter.
For more information, see "db2top - DB2 monitoring tool command"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0025222.html).
db2support command: This command archives all the diagnostic data from
the directory that you specify for the diagpath configuration parameter into
a compressed file archive. You typically use the db2support command to
prepare to upload the data to the IBM Support site or to analyze the diagnostic
data locally. You can limit the amount of data that is collected to a specific time
interval by using the -history or -time parameter, and you can decompress
the compressed file archive.
For more information, see "db2support - Problem analysis and environment
collection tool command"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0004503.html).
db2caem command: This command automates the process of creating and
running an activity event monitor to collect detailed diagnostic and runtime
information about one or more SQL statements. The db2caem command
extracts and formats the information that is captured by the activity event
monitor. The db2support command includes a number of options to collect
the information that is generated by the db2caem command.
For more information, see "db2caem - Capture activity event monitor data
tool command"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0057282.html).
You typically run the following commands under the guidance of IBM Support
personnel. The commands help you gather the information that support personnel
need to assist you in diagnosing and correcting problems.
db2trc command: This command collects traces through the DB2 trace
facility. The process requires setting up the trace facility, reproducing the
error, and collecting the data. In V9.7 FP4 and later, the db2trcon and
db2trcoff scripts simplify using the db2trc command. The db2trc
command can have a significant performance impact unless you limit what
you trace to specific application IDs or top EDUs.
For more information, see "db2trc - Trace command"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0002027.html).
db2dart command: This command examines databases for architectural
correctness and reports any errors. You typically use it for the following
purposes:
To inspect an entire database, a table space, or a table for correctness
To repair a database
To change a database state
To dump formatted table data from a database, for example, if a database
was corrupted because of a hardware failure and a current backup is not
available
For more information, see "db2dart - Database analysis and reporting tool
command"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0003477.html).
INSPECT command: This command examines a database for architectural
integrity, checking the pages of the database for page consistency. The
INSPECT command checks that the structures of table objects and structures of
table spaces are valid. Cross-object validation conducts an online consistency
check between the index and the data. The INSPECT command can identify
logical corruption that the db2dart command might not detect.
For more information, see "INSPECT command"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0008633.html).
Operating system tools and log files
Diagnosing some problems, such as those related to memory, swap files, CPU, disk
storage, and other operating system resources, requires you to look at both DB2
diagnostic data and operating system diagnostic data.
On UNIX and Linux operating systems, the following system tools are available:
vmstat command: This is a good overall tool for showing whether processor
or memory bottlenecks exist. You can run it continuously.
iostat command: You can use this command to find out whether any disk
I/O bottleneck exists and what the I/O throughput is.
ps command: This command provides information about the processes that
use the most processor time.
svmon command (AIX operating systems only): This command provides an
in-depth analysis of memory usage. It provides more detailed information
than what the vmstat and ps commands provide, although the performance
impact is slightly higher.
On Windows operating systems, you can use the following system tools and
commands:
db2pd -vmstat command: Use the db2pd command with the -vmstat
parameter to show whether processor or memory bottlenecks exist.
db2pd -iostat command: Use the db2pd command with the -iostat
parameter to show whether any disk I/O bottleneck exists.
Task Manager: You can use this tool to show memory consumption and
processor usage. Alternatively, you can use the db2pd -edus command to
return similar information.
Process Explorer: This tool is similar to the Task Manager but provides
additional information and functionality for processes.
Windows Performance Monitor: You can use this tool to monitor system and
application performance in real time and historically. The tool supports data
collection that you can customize, and you can define automatic thresholds for
alerts and actions to take in response. The tool can also generate performance
reports.
The following operating system error logs, which differ by operating system, can
contain important diagnostic information:
AIX operating systems: /usr/bin/errpt -a
HP-UX operating systems: /var/adm/syslog/syslog.log and /usr/sbin/dmesg
Linux operating systems: /var/log/messages and /usr/sbin/dmesg
Solaris operating systems: /var/adm/messages and /usr/sbin/dmesg
Windows operating systems: Event logs and the Dr. Watson log
Monitoring infrastructure
The lightweight, metrics-based monitoring infrastructure that was introduced in DB2
Version 9.7 and is available in DB2 Version 10.5 provides pervasive and continuous
monitoring of both system and query performance. Compared to the older snapshot
and system monitor, the monitoring infrastructure provides real-time in-memory
aggregation and accumulation of metrics within the DB2 system at different levels,
with a relatively low impact on the system.
You can use the DB2 monitoring infrastructure to gain an understanding of the typical
workloads that your data server processes. Understanding your typical workloads is
an important step toward identifying atypical events that require further investigation
or perhaps troubleshooting.
The three focus areas for monitoring information are as follows:
System: Provides a complete perspective on the application work (database
requests) being performed by the database system, collected through the WLM
infrastructure.
Activities: Provides a perspective on work being done by specific SQL
statements, collected through the package cache infrastructure.
Data objects: Provides a perspective on the impact of application work on data
objects, collected through the data storage infrastructure.
The different kinds of information include the following ones:
Time-spent metrics that identify how the time that is spent breaks down into
time spent waiting (lock wait time, buffer pool I/O time, and direct I/O time)
and time spent doing processing.
In-memory metrics. SQL table functions provide highly granular access to
these metrics.
Section explains that show the access plan that was executed for a statement
without the need for recompiling the statement.
Section actuals, which can shorten the time to discover problem areas in an
access plan when you compare them to the estimated access plan values. You
use the db2caem command to get section actuals.
Configuring in-memory metrics for troubleshooting
The mon_req_metrics, mon_act_metrics, and mon_obj_metrics configuration
parameters control the collection of metrics. By default, these parameters are set to BASE
for new databases that you create in DB2 V9.7 and later. For most situations, the
information that is returned with the BASE setting is sufficient. When troubleshooting,
you might need to set the mon_req_metrics and mon_act_metrics configuration
parameters to EXTENDED to collect more granular, time-based information.
Additionally, there are other configuration parameters, such as the mon_uow_data
parameter, that are set to NONE by default. To collect some of the unit of work
information, you might need to set the values for these parameters. To see the current
values for monitoring-related configuration parameters, you can issue the following
command:
db2 get db cfg | grep MON
Request metrics                 (MON_REQ_METRICS) = BASE
Activity metrics                (MON_ACT_METRICS) = NONE
Object metrics                  (MON_OBJ_METRICS) = EXTENDED
Unit of work events                (MON_UOW_DATA) = NONE
Lock timeout events             (MON_LOCKTIMEOUT) = NONE
Deadlock events                    (MON_DEADLOCK) = WITHOUT_HIST
Lock wait events                   (MON_LOCKWAIT) = NONE
Lock wait event threshold         (MON_LW_THRESH) = 5000000
Number of package list entries   (MON_PKGLIST_SZ) = 32
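If you review these settings on many databases, a small script can summarize them. The following Python sketch parses output in the layout shown above and lists the parameters that are set to NONE. It is an illustration under stated assumptions: the function names are hypothetical, and the regular expression relies only on the "(PARAMETER_NAME) = value" pattern that appears in the sample output.

```python
import re

# Matches lines such as: Request metrics (MON_REQ_METRICS) = BASE
CFG_LINE = re.compile(r"\((MON_\w+)\)\s*=\s*(\S*)")


def parse_mon_settings(cfg_output):
    """Map each MON_* parameter name to its current value."""
    settings = {}
    for line in cfg_output.splitlines():
        match = CFG_LINE.search(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def not_collecting(settings):
    """Return the parameters whose value is NONE, sorted by name."""
    return sorted(name for name, value in settings.items() if value == "NONE")


sample = """\
Request metrics                 (MON_REQ_METRICS) = BASE
Activity metrics                (MON_ACT_METRICS) = NONE
Unit of work events                (MON_UOW_DATA) = NONE
"""
print(not_collecting(parse_mon_settings(sample)))
```

You could feed the function the captured output of the command shown above; how you capture that output is left to your environment.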
For more information about the available configuration parameters and the metrics
that are returned for each parameter setting, see "Configuration parameters"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/index.jsp?topic=/com.ibm.db2.luw.admin.config.doc/doc/c0004555.html).
Event monitor infrastructure
Event monitors in DB2 Version 9.7 and later use a highly scalable, lightweight
infrastructure. Highlights of this architecture are as follows:
It maximizes parallelism by using one thread per processor core.
It uses multiple threads for high volume event monitors.
It does not require a dedicated thread for low volume event monitors.
Fast-writer threads provide the infrastructure for supplementing event information
and output formatting. Formatting of data and output of data are performed
asynchronously from the processing agents. Multiple fast-writer threads make it much
less likely for a backlog to occur in the event monitor queues.
To further reduce the system impact of high-volume event monitors and to reduce
storage requirements, DB2 Version 9.7 introduced a new event monitor target type,
the unformatted event table. You can format the unformatted event table data as
follows:
Into XML data output by using the EVMON_FORMAT_UE_TO_XML table
function. An example follows:
SELECT evmon.* FROM TABLE (EVMON_FORMAT_UE_TO_XML (NULL, FOR EACH
ROW OF (select * from LOCK where EVENT_TYPE = 'LOCKWAIT' and
EVENT_TIMESTAMP >= CURRENT_TIMESTAMP - 5 hours
order by EVENT_TIMESTAMP))) AS evmon
Into relational table data output by using the EVMON_FORMAT_UE_TO_TABLES
procedure. An example follows:
call EVMON_FORMAT_UE_TO_TABLES ('UOW', NULL, NULL, NULL, NULL,
'IBMDB2SAMPLEXML', 'RECREATE_FORCE', -1, 'SELECT * FROM UOWTBL
ORDER BY event_timestamp')
Event monitors that use unformatted event tables to store their data include the
following ones:
Locking event monitor: This provides a consolidated mechanism for capturing
locking data you can use for in-depth analysis. This event monitor replaces the
deadlock event monitor and lock timeout reports. Information about
deadlocks, lock timeouts, and lock waits is returned. You can control the
granularity of the information that is returned at the workload or at the
database level. A statement history is also available. The following statement
creates a locking event monitor:
CREATE EVENT MONITOR MY_LOCKEVMON FOR LOCKING WRITE TO
UNFORMATTED EVENT TABLE (IN USERSPACE1)
Unit of work event monitor: This replaces the transaction event monitor. You
can control the granularity of the information that is returned at the workload
or at the database level. Data that is captured includes in-memory metrics. The
following statement creates a unit of work event monitor:
CREATE EVENT MONITOR UOWEVMON FOR UNIT OF WORK WRITE TO
UNFORMATTED EVENT TABLE (IN USERSPACE1)
Package cache event monitor: This captures both dynamic and static SQL
entries when they are removed from the package cache. Entries begin to be
captured as soon as the event monitor is activated. You can control the
granularity of the information that is returned by using the WHERE clause
when you define the event monitor. The WHERE clause for the event monitor
can include one or more of the following predicates (ANDed): the number of
executions, overall aggregate execution time, and evicted entries whose
metrics were updated since last boundary time set using the
MON_GET_PKG_CACHE_STMT function. You can also use the event monitor
definition to control the level of information that is captured; options include
BASE and DETAILED. The following statement shows an example of a package
cache event monitor:
CREATE EVENT MONITOR MY_PKGCACHE_EVMON FOR PACKAGE CACHE WRITE
TO UNFORMATTED EVENT TABLE (IN USERSPACE1)
Text reports for monitoring data
Similar in purpose to the GET SNAPSHOT command but available through any SQL
interface, monitoring reports are generated by the MONREPORT module. These reports
are provided in the form of result sets that are returned from stored procedures. These
reports can provide helpful monitoring information for pinpointing problem areas
when troubleshooting.
Consider the following example. The users of a new Java application are complaining
about slow performance. Expected results are not being returned within the expected
time frame.
As a first step, you can run the MONREPORT.DBSUMMARY summary report for six
minutes (as an example) to get a picture of the system and application performance
metrics for the database while the application is running. To run the summary report
for six minutes, issue the following command:
db2 "call monreport.dbsummary(360)"
After the report has finished running, you can look at the wait times that are included
in the report to see whether there are any significant delays that might indicate a
problem area. In this example, the summary report includes the following
information:
------------------------------------------------------------------------
-- Detailed breakdown of TOTAL_WAIT_TIME --

                              %    Total
                              ---------------------------------------------
TOTAL_WAIT_TIME               100  711546

I/O wait time
  POOL_READ_TIME              0    323
  POOL_WRITE_TIME             0    0
  DIRECT_READ_TIME            0    0
  DIRECT_WRITE_TIME           0    0
  LOG_DISK_WAIT_TIME          0    134
LOCK_WAIT_TIME                0    0
AGENT_WAIT_TIME               0    0
Network and FCM
  TCPIP_SEND_WAIT_TIME        96   684581
  TCPIP_RECV_WAIT_TIME        4    26510
  IPC_SEND_WAIT_TIME          0    0
  IPC_RECV_WAIT_TIME          0    0
  FCM_SEND_WAIT_TIME          0    0
  FCM_RECV_WAIT_TIME          0    0
WLM_QUEUE_TIME_TOTAL          0    0
------------------------------------------------------------------------
The output shows that most of the wait is happening while data is being sent back to
the client (the value in the TCPIP_SEND_WAIT_TIME row is high). Further
investigation with JDBC tracing reveals an improper override setting by the
Statement.setFetchSize method. After you set the fetch size (FetchSize
parameter) correctly, application performance not only improves but exceeds
expectations.
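When a report contains many wait-time rows, ranking them makes the dominant component obvious. The following Python sketch ranks the nonzero values from the report excerpt in this example by their share of TOTAL_WAIT_TIME. It is only an illustration: the function name rank_wait_times is hypothetical, and the values are copied by hand from the excerpt rather than read from DB2.

```python
def rank_wait_times(breakdown, total_wait_time):
    """Return (name, value, percent) tuples sorted by value, largest first."""
    ranked = sorted(breakdown.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, value, round(100 * value / total_wait_time) if total_wait_time else 0)
        for name, value in ranked
    ]


# Nonzero rows copied from the report excerpt in this example
breakdown = {
    "POOL_READ_TIME": 323,
    "LOG_DISK_WAIT_TIME": 134,
    "TCPIP_SEND_WAIT_TIME": 684581,
    "TCPIP_RECV_WAIT_TIME": 26510,
}
for name, value, percent in rank_wait_times(breakdown, 711546):
    print(f"{name:24} {percent:3d}% {value}")
```

The first row that is printed is the component to investigate first, which in this example is the time spent sending data to the client.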
Other monitoring reports that you might find useful in troubleshooting include the
MONREPORT.CONNECTION, the MONREPORT.CURRENTAPPS, the
MONREPORT.CURRENTSQL, the MONREPORT.PKGCACHE, and the
MONREPORT.LOCKWAIT reports.
Full coverage of the monitoring infrastructure is beyond the scope of this paper,
although the monitoring infrastructure is used in some of the scenarios. For more
information about DB2 monitoring, see "Database monitoring"
(http://publib.boulder.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.mon.doc/doc/c0001138.html)
in the DB2 Information Center.
Minimizing the impact of troubleshooting
A system already affected by performance issues might not be able to tolerate much, if
any, additional load, even to collect basic diagnostic data. Configuration as
recommended earlier does not reduce the performance impact of diagnostic data
collection, but it does help you control where diagnostic data is stored. There are
specific things you can do to reduce the performance impact, though.
Collect diagnostic data only where the problem is occurring
To avoid the overhead of unnecessary diagnostic data collection in large database
environments, several troubleshooting commands support options to specify where to
collect data. For example, you can collect data at the member level instead of the host
level if you know that a problem affects only a member, not the host machine. These
options speed up data collection by collecting only relevant information, which
reduces the performance impact of data collection on the system and can shorten the
time that is required to perform problem determination.
For example, to collect FODC data during a performance issue on members 10, 11, 12,
13, and 15, issue the following command:
db2fodc -perf -member 10-13,15
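To double-check which members a list such as the one in this command covers before you run a collection, you can expand the notation with a short script. The following Python sketch is a hypothetical helper, not a DB2 utility. It assumes that a hyphen denotes an inclusive range and a comma separates entries, which matches the members that are named in the sentence before the command.

```python
def expand_member_list(spec):
    """Expand a list such as '10-13,15' into [10, 11, 12, 13, 15]."""
    members = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A hyphen is assumed to mean an inclusive range of member numbers
            first, last = (int(value) for value in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid range: {part}")
            members.extend(range(first, last + 1))
        else:
            members.append(int(part))
    return members


print(expand_member_list("10-13,15"))
```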
Collect only the diagnostic data you need
You can use FODC collections introduced in V9.7 FP5 to collect diagnostic data only
for the specific type of problem that you encounter rather than collecting more
comprehensive performance data by using the db2fodc command -perf and -hang
parameters. The collections for problems that are related to processor usage, memory,
and connections are lightweight and minimize the performance impact. Use the
collections by specifying the -cpu, -memory, and -connections parameters with
the db2fodc command. For more information, see "Collecting diagnostic data for
specific performance problems"
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.trb.doc/doc/p0059590.html).
Avoid service delays due to transferring diagnostic data
Some problems might require you to contact IBM's technical support, and this means
that a service analyst must look at the diagnostic data that you collected on your
system. Typically, this means that you must use the db2support command and
upload the diagnostic data. If the volume of diagnostic data that you generated on
your system is large, uploading this volume of data to the IBM Support site for
analysis can introduce additional delays before the problem can be diagnosed. In V9.7
FP4 and later, you can use the db2support command with the -unzip parameter to
extract packages locally rather than uploading them to the IBM Support site. In
addition, some tools that service analysts use routinely are now installed when you
install the DB2 software.
The following command extracts packages from the file db2support.zip:
db2support -unzip db2support.zip
...
Extracting "DB2DUMP/db2diag.log" ...
Extracting "STMM/stmm.0.log" ...
Extracting "EVENTS/db2event.0.log" ...
Extracting "EVENTS/db2optstats.0.log" ...
Extracting "EVENTS/.db2optstats.rotate.lck" ...
Extracting "autopd.zip" ...
Extraction completes.
Scenarios
The following scenarios show how you can apply some of the troubleshooting best
practices. The list of scenarios represents only a small sample of possible scenarios
that you might encounter while troubleshooting a DB2 server. For links to additional
scenarios and recommendations, see the "Additional information" section of this
paper's web page in the DB2 best practices developerWorks community.
Scenario: Troubleshooting high processor usage spikes
Identifying the problem
Users of an application tell you that the response times for the application are
occasionally very slow. Every so often, when the application is waiting on the
dedicated database server, the application slows to a crawl for a while. You are asked
to investigate the cause.
You log on to the database server to do a preliminary investigation. You run some
operating system tools, such as the top command or the Windows Task Manager, to
see what the current processor usage is. As these tools are running, you observe that
the processor usage occasionally spikes to above 90% and remains high for several
minutes before dropping down to what seem to be more typical usage levels:
top - 15:59:54 up 7:27, 6 users, load average: 0.61, 0.19, 0.29
Tasks: 171 total, 1 running, 170 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.0%us, 2.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2063588k total, 1163576k used, 900012k free, 98580k buffers
Swap: 0k total, 0k used, 0k free, 741764k cached
 PID USER      PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
8806 db2inst1  20  0  600m 192m 151m S 96.6  9.5 21:05.45 db2sysc
7303 db2inst1  20  0 84588  13m   9m S  1.3  0.7  0:11.05 gnome-terminal
6394 root      20  0 49584  25m 7632 S  1.0  1.3  0:28.20 X
7161 db2inst1  20  0  116m  27m  20m S  0.7  1.4  0:22.44 nautilus
...
In this example, you can see that the user processor usage is 98%, with the bulk of the
high processor usage, 96.6%, coming from the db2sysc process. This process is for the
DB2 system controller (on Windows operating systems, look for db2syscs.exe). Without
further information, you know only that the problem occurs intermittently and lasts
several minutes. There doesn't seem to be a specific time at which it occurs.
Diagnosing the cause
Intermittent performance issues can be challenging to diagnose because of the
difficulty in collecting the data that is necessary to troubleshoot the cause. This type of
issue is unlikely to trigger automatic diagnostic data collection, and by the time that
you can manually collect data, the issue might have passed, leaving you with little or
no diagnostic data. For an intermittent performance problem, your first step is to set
up a method to capture the diagnostic data that you need, as the problem is occurring.
To capture diagnostic data for the intermittent processor usage spikes that you
observed, you define an FODC threshold rule. An FODC threshold rule is a tool that
waits for the resource conditions that you define to occur. In this case, you have some
preliminary information that points to high processor usage. If you don't know what
system resources are constrained, you can adapt this scenario to collect data about
additional system resources, such as connections and memory. After you set up the
FODC threshold rule, it triggers an FODC collection whenever the threshold
conditions that you specified for processor usage are exceeded, for as many
occurrences of the problem as you specified.
You define an FODC threshold rule for processor usage by using the db2fodc
-detect command. The db2fodc -detect command performs detection at regular
intervals for as long as you tell it to, if you specify a duration. If you do not specify a
duration, detection runs until the threshold conditions are triggered. The term
threshold conditions in this context refers to both a specific frequency of the problem
and a duration that must be met before a collection is triggered.
The following threshold rule is a good start for detecting processor usage spikes:
$ db2fodc -cpu basic -detect us_sy">=90" sleeptime="30"
iteration="10" interval="10" triggercount="4"
duration="5"
In this case, the threshold rule is used to detect a combined user and system processor
usage rate of 90% or higher. For FODC collection to be triggered, the threshold
conditions must exist for 40 seconds for each iteration (triggercount value of 4 x
interval value of 10 seconds = 40 seconds). The detection process sleeps for 30
seconds between each iteration. The total time that detection is enabled is 5 hours or
10 iterations of successful detection in total, whichever comes first. If FODC collection
is triggered, a new directory with a name that is prefixed with FODC_Cpu_ is created
in the current diagnostic path. Only the lighter-weight, basic collection of diagnostic
data is performed.
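To build intuition for how triggercount interacts with brief dips in processor usage, you can simulate the counting logic on a list of sampled usage values. The following Python sketch is a simplified model, not the db2fodc implementation: the function name first_trigger is hypothetical, and the sketch assumes that a collection is triggered once the threshold is met in triggercount consecutive samples and that any sample below the threshold resets the count.

```python
def first_trigger(samples, threshold, triggercount):
    """Return the index of the sample at which a collection would be
    triggered under the assumptions above, or None if it never is."""
    consecutive = 0
    for index, value in enumerate(samples):
        if value >= threshold:
            consecutive += 1
            if consecutive == triggercount:
                return index
        else:
            # A sample below the threshold is assumed to reset the count
            consecutive = 0
    return None


# Hypothetical combined user and system usage, one sample per interval
usage = [95, 97, 60, 92, 98, 99, 96]
print(first_trigger(usage, threshold=90, triggercount=4))
```

In this made-up series, the dip to 60 resets the count, so the trigger occurs only after four further samples at or above the threshold.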
Now, assume that detection has been running for a while and the threshold conditions
for processor usage spikes are met, which means that FODC collection is triggered.
During FODC collection, you might see output that is similar to this example:
"db2fodc": Starting detection ...
"db2fodc": "4" consecutive thresholds hits are detected.
"db2fodc": Triggering collection "1".
Script is running with following parameters
COLLECTION_MODE      : LIGHT
COLLECTION_TYPE      : CPU
COLLECTION_DURATION  : 5
COLLECTION_ITERATION : 10
DATABASE/MEMBER      : -alldbs
FODC_PATH            :
/var/log/db2diag/db2dump/FODC_Cpu_2013-07-14-11.09.51.739430_0000
db2pd_options        : -agent -apinfo -active -tran
-locks -bufferpools -dbptnmem -memset -mempool -sort -fcm hwm
-dyn
SNAPSHOT             : 2
STACKTRACE           : 2
TRACELIMIT           : 20
SNAPSHOT_TYPE        : ALL
This output specifies where to look for the diagnostic data for the particular FODC
package (/var/log/db2diag/db2dump/FODC_Cpu_2013-07-14-11.09.51.739430_0000).
The output also provides a bit of information about what types of data are collected.
After the collection is finished, the db2fodc -detect command either stops
running or executes the next iteration after sleeping for some time. The amount of
time to sleep is determined by the value of the sleeptime option if you specify it or 1
second if you do not specify a value. Whether detection continues depends on how often
the threshold trigger conditions have been met at this point and how much time has
passed (that is, the values of the iteration and duration options that you used).
As de+ined in the ,revious e&am,le0 detection and /1#2 collection continue until
either all 10 iterations o+ detection are com,lete or the end o+ the threshold duration is
reached$
Data collected
The diagnostic data that is collected is stored in an FODC package (a directory path).
This path is created inside the path that you specified for the FODCPATH parameter
when you configured your data server. If you did not configure the paths where
diagnostic data is stored ahead of time, see the section "Be prepared: configure your
data server ahead of time" to learn about how to configure your system.
The contents of the FODC package directory path might look like the following
example:
db2inst1@db2v10:~/sqllib/db2dump/FODC_Cpu_2013-07-14-15.15.40.604026_0000> ls -l
total 40
-rwxrwxrwx 1 db2inst1 db2grp1 1570 Jul 14 15:19 db2fodc.log
drwxrwxrwx 2 db2inst1 db2grp1 4096 Jul 14 17:19 DB2PD_2013-07-14.15.15.40.000000
drwxrwxrwx 5 db2inst1 db2grp1 4096 Jul 14 17:19 FODC_Perf_2013-07-14-15.15.44.474725_0000
drwxrwxrwx 2 db2inst1 db2grp1 4096 Jul 14 17:19 iostat_2013-07-14.15.15.40.000000
drwxrwxrwx 2 db2inst1 db2grp1 4096 Jul 14 17:19 memory_2013-07-14.15.15.40.000000
drwxrwxrwx 2 db2inst1 db2grp1 4096 Jul 14 17:19 netstat_2013-07-14.15.15.40.000000
drwxrwxrwx 2 db2inst1 db2grp1 4096 Jul 14 17:19 ps_2013-07-14.15.15.40.000000
drwxrwxrwx 2 db2inst1 db2grp1 4096 Jul 14 17:19 vmstat_2013-07-14.15.15.40.000000
Data analysis
An analysis of the output of the vmstat command
(FODC_Cpu_2013-07-14-15.15.40.604026_0000/vmstat_2013-07-14.15.15.40.000000/db2v10.vmstat.out)
shows a combined user and system processor usage rate of 100%, which in turn triggered the
FODC collection:
procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------
 r  b swpd   free  inact active si so bi  bo  in  cs  us sy id wa st
 1  0    0 826496 658380 507952  0  0  0   0 404 338  98  2  0  0  0
 1  0    0 826496 658380 507952  0  0  0   0 394 352  99  1  0  0  0
 1  0    0 826496 658380 507952  0  0  0   0 377 298 100  0  0  0  0
 1  0    0 826496 658380 507952  0  0  0   0 394 330  99  1  0  0  0
 1  0    0 826496 658380 507952  0  0  0   0 382 311 100  0  0  0  0
 2  0    0 851208 658380 483632  0  0  0   0 373 400  97  0  0  0  0
 3  0    0 851812 658420 483064  0  0  0   0 384 620  87 10  0  0  0
 2  0    0 852316 658460 482464  0  0  0   0 376 604  90 10  0  0  0
 2  0    0 852564 658504 481700  0  0  0   0 380 567  91  9  0  0  0
 2  0    0 843388 658556 491224  0  0  0  28 371 584  91  9  0  0  0
 2  0    0 844636 658616 490052  0  0  0 676 411 570  89 11  0  0  0
 1  0    0 852836 658632 481424  0  0  0   0 389 409  98  2  0  0  0
 2  0    0 852836 658636 481424  0  0  0   0 361 317  98  2  0  0  0
 1  0    0 852836 658632 481428  0  0  0   0 354 282 100  0  0  0  0
 1  0    0 852836 658632 481428  0  0  0   8 345 317 100  0  0  0  0
 1  0    0 852836 658636 481428  0  0  0   0 400 324 100  0  0  0  0
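If you want to scan a long vmstat capture instead of reading it by eye, a small script can add the user and system columns and look for sustained runs above the threshold. The following Python sketch is an illustration, not part of db2fodc; it assumes the 17-column vmstat layout shown in this example, where us and sy are the 13th and 14th fields, and the helper names are hypothetical.

```python
# Sketch: find runs of vmstat samples whose user + system CPU exceeds a limit.
# Assumes the column order of the vmstat output in this example.

US_INDEX = 12  # "us" column (0-based)
SY_INDEX = 13  # "sy" column (0-based)


def busy_percentages(lines):
    """Return us + sy for every line that looks like a vmstat sample."""
    totals = []
    for line in lines:
        fields = line.split()
        # Skip headers and anything that is not a full row of integers.
        if len(fields) <= SY_INDEX or not all(f.isdigit() for f in fields):
            continue
        totals.append(int(fields[US_INDEX]) + int(fields[SY_INDEX]))
    return totals


def longest_run_above(totals, limit):
    """Length of the longest run of consecutive samples strictly above limit."""
    best = current = 0
    for value in totals:
        current = current + 1 if value > limit else 0
        best = max(best, current)
    return best


sample = """\
r b swpd free inact active si so bi bo in cs us sy id wa st
1 0 0 826496 658380 507952 0 0 0 0 404 338 98 2 0 0 0
1 0 0 826496 658380 507952 0 0 0 0 394 352 99 1 0 0 0
2 0 0 851208 658380 483632 0 0 0 0 373 400 97 0 0 0 0
3 0 0 851812 658420 483064 0 0 0 0 384 620 87 10 0 0 0
""".splitlines()

totals = busy_percentages(sample)
print(totals)                         # [100, 100, 97, 97]
print(longest_run_above(totals, 90))  # 4
```

A run of four or more samples above 90% at a 10-second interval corresponds to the trigger condition that is used in this scenario.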
Now, investigate the cause of these high processor usage rates. To narrow down the
cause, you must use both the stack trace log and the output of the db2pd command.
Two stack trace logs are created during the FODC collection. These indicate the top
DB2 consumers of processor resources, in descending order over an interval of 30
seconds. The information that is given is for the top coordinator agents (db2agents),
which perform all database requests on behalf of the application.
Here are the top coordinator agents from the stack trace log
(FODC_Cpu_2013-07-14-15.15.40.604026_0000/FODC_Perf_2013-07-14-15.15.44.474725_0000/StackTrace.0075/StackTrace.log.0):
List of 20 top db2agent (db2ag*) EDUs:
54
57
53
73
77
66
72
70
67
69
75
100
103
98
111
106
105
101
99
104
Look for one or several coordinator agents that use significantly more processor
resources than other agents use, which gives you a clue for the next step. In this
output, db2agent EDU 54 looks promising, based on the amount of processor
resources that it used:
...
54 2860655792 9105 db2agent (SAMPLE) 0 100.200000 1.170000
57 2643454832 9193 db2agent (idle)   0   2.040000 1.070000
53 2864704368 9109 db2agent (idle)   0   1.880000 0.550000
...
You can see that db2agent EDU 54 uses far more resources than the next two
coordinator agents use. The other stack trace log (which is not shown but looks very
similar to the previous sample output) also shows db2agent EDU 54 at the top of the
list.
The db2agent number is only an intermediate bit of information and is not
useful by itself. You can use this information to gain additional insight into the
application that the db2agent is working for, though. Look at the output folder of the
db2pd command and see whether you can correlate the db2agent number with a
specific application ID (alternatively, use the snapshot output for the same purpose).
Several db2pd command output files are created during FODC, each showing similar
output; you need the information from only one of them. Searching for the
coordinator agent 54 in one of the db2pd command output files
(FODC_Cpu_2013-07-14-15.15.40.604026_0000/DB2PD_2013-07-14.15.15.40.000000) yields the
following result:
Address    AppHandl [nod-index] AgentEDUID Priority Type  State       ClientPid Userid   ClientNm ...
0x13A37F80 195      [000-00195] 54         0        Coord Inst-Active 8724      db2inst1 db2bp    ...
Note the process ID, 8724. This process ID is another important clue and gets you
closer to determining the query statement that is the likely culprit behind the spikes in
processor usage. All you have to do now is to search for additional occurrences of the
same process ID in the db2pd command output. The process ID leads you to the client
application that originated the query and the query statement.
Application :
  Address :                0x130060
  AppHandl [nod-index] :   195 [000-00195]
  TranHdl :                3
  Application PID :        8724
  Application Node Name :  db2v10
  IP Address:              n/a
  Connection Start Time :  (1373840099)Sun Jul 14 15:14:59 2013
  Client User ID :         db2inst1
  System Auth ID :         DB2INST1
  Coordinator EDU ID :     54
  Coordinator Member :     0
  Number of Agents :       1
  Locks timeout value :    NotSet
  Locks Escalation :       No
  Workload ID :            1
  Workload Occurrence ID : 1
  Trusted Context :        n/a
  Connection Trust Type :  non trusted
  Role Inherited :         n/a
  Application Status :     UOW-Executing
  Application Name :       db2bp
  Application ID :         *LOCAL.db2inst1.130714221459
  ClientUserID :           n/a
  ClientWrkstnName :       n/a
  ClientApplName :         CLP longquery.db2
  ClientAccntng :          n/a
  CollectActData:          N
  CollectActPartition:     C
  SectionActuals:          N
  List of active statements :
   *UOW-ID :               1
    Activity ID :          1
    Package Schema :       NULLID
    Package Name :         SQLC2J24
    Package Version :
    Section Number :       201
    SQL Type :             Dynamic
    Isolation :            CS
    Statement Type :       DML, Select (blockable)
    Statement :            SELECT COUNT(*) FROM
                           SYSCAT.TABLES, SYSCAT.TABLES, SYSCAT.TABLES, SYSCAT.TABLES,
                           SYSCAT.TABLES
The culprit is the application longquery.db2, which issues an expensive SELECT
statement whenever it is run. In this case, you could also have used the coordinator
agent number 54 to find the query statement directly, without looking up the process
ID. This works here because the coordinator EDU ID is the same as the db2agent EDU
ID or coordinator agent. There are likely cases where a direct look-up using only the
coordinator agent does not work, so it is useful to be able to correlate a coordinator
agent with a process ID in the db2pd command output.
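When the db2pd output is large, the same correlation can be scripted. The following Python sketch is an illustration under stated assumptions: it treats the application information as blocks of "Key : Value" lines that each start with an "Application :" line, as in the sample above, which is not a documented output contract. The function names are hypothetical.

```python
# Sketch: index "Key : Value" application blocks by coordinator EDU ID.
# The parsing rules are assumptions made for this example.

def parse_application_blocks(text):
    """Split the text into one dict per 'Application :' block."""
    blocks, current = [], None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # not a "Key : Value" line
        key, value = key.strip(), value.strip()
        if key == "Application" and not value:
            current = {}  # a new application block starts here
            blocks.append(current)
        elif current is not None:
            current.setdefault(key, value)  # keep the first value per key
    return blocks


def find_by_coordinator(blocks, edu_id):
    """Return the blocks whose coordinator EDU ID matches edu_id."""
    return [b for b in blocks if b.get("Coordinator EDU ID") == str(edu_id)]


sample = """\
Application :
  AppHandl [nod-index] : 195 [000-00195]
  Application PID : 8724
  Coordinator EDU ID : 54
  Application Name : db2bp
  ClientApplName : CLP longquery.db2
"""

for app in find_by_coordinator(parse_application_blocks(sample), 54):
    print(app["Application PID"], app["ClientApplName"])
# prints: 8724 CLP longquery.db2
```

Indexing by coordinator EDU ID mirrors the direct look-up; indexing by "Application PID" instead gives the process ID route that the text recommends as the more general approach.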
Resolving the problem
In this case, the users complaining of an intermittent performance slowdown are not
affected by an issue with the application that they are using. Instead, they are affected
by another query that is being run against the DB2 server from time to time. This
other query turns out to be so expensive that it impacts the response times for
everyone else.
How can you address the impact of this other query? You might be able to rewrite the
query so that it becomes less expensive to run. Alternatively, you can use some
standard DB2 workload management practices to run the query in a more controlled
fashion, without using excessive system resources.
There might be cases where it is not easy to determine the cause of a problem. Even if
you cannot resolve an issue yourself, you can set up an FODC threshold rule to collect
the required diagnostic data for different system resources, which you can then
provide to IBM Support for further analysis. IBM Support needs the diagnostic data to
be able to help, especially with intermittent problems. If you have the diagnostic data
ready, you can reduce the amount of time that it takes to diagnose the underlying
issue.
Scenario: Troubleshooting sort overflows
Identifying the problem
Users report a significant increase in query run times, which you are asked to
investigate.
A general performance slowdown that is perceived by users can have many different
causes. In this case, the focus is on the performance impact that a large number of sort
overflows, also known as sort spills, can cause. If you don't know whether sort
overflows are a problem on your system, use this scenario to find out.
Queries often require a sort operation. A sort is performed when no index exists that
would satisfy the sort order or when an index exists, but sorting is determined to be
more efficient. Sort overflows occur when an index is so large that it cannot be sorted
in the memory that is allocated for the sort heap. During the sort overflow, the data to
be sorted is divided into several smaller sort runs and stored in a temporary table
space. When sort overflows that are stored in the temporary table space also require
writing to disk, they can negatively impact the performance of your data server.
Diagnosing the cause
You can use the MON_GET_PKG_CACHE_STMT table function to determine whether a
sort overflow occurred. Try a query such as the following one, which returns not only
information about sorts but also specifies the related SQL statements:
DB2 SELECT TOTAL_SORTS, SORT_OVERFLOWS, TOTAL_SECTION_SORT_TIME,
SUBSTR(STMT_TEXT,1,30) AS STMT_TEXT FROM TABLE (MON_GET_PKG_CACHE_STMT('D',
NULL, NULL, -1))
TOTAL_SORTS          SORT_OVERFLOWS       TOTAL_SECTION_SORT_TIME STMT_TEXT
-------------------- -------------------- ----------------------- ------------------------------
                   0                    0                       0 SELECT POLICY FROM SYSTOOLS.P
                   0                    0                       0 UPDATE SYSTOOLS.HMON_ATM_INFO
                   0                    0                       0 SELECT COLNAME, TYPENAME FROM
                   1                    0                       0 SELECT IBM.TID, IBM.FID FROM
                   0                    0                       0 UPDATE SYSTOOLS.HMON_ATM_INFO
                   0                    0                       0 SELECT STATS_TIME, INDEXTYPE
                   0                    0                       0 SELECT TABNAME FROM SYSCAT.TAB
                   0                    0                       0 UPDATE SYSTOOLS.HMON_ATM_INFO
                   0                    0                       0 SELECT TRIGNAME FROM SYSCAT.
                   0                    0                       0 LOCK TABLE SYSTOOLS.HMON_ATM_
                   0                    0                       0 SELECT CREATE_TIME FROM SYSTO
                   1                    1                 1975260 select c1 from t1 order by c1
                   0                    0                       0 SELECT COUNT( * ) FROM SYSTOO
                   0                    0                       0 CALL SYSPROC.SYSINSTALLOBJECT
                   0                    0                       0 UPDATE SYSTOOLS.HMON_ATM_INFO
                   2                    0                       1 DELETE FROM SYSTOOLS.HMON_ATM
                   0                    0                       0 UPDATE SYSTOOLS.HMON_ATM_INFO
                   0                    0                       0 CALL SYSINSTALLOBJECTS( 'DB2A
                   0                    0                       0 SELECT STATS_LOCK, REORG_LOCK
                   0                    0                       0 SELECT CREATOR, NAME, CTIME F
                   0                    0                       0 UPDATE SYSTOOLS.HMON_ATM_INFO
                   0                    0                       0 SET CURRENT LOCK TIMEOUT 5
                   0                    0                       0 SELECT TABNAME FROM SYSCAT.TAB
                   0                    0                       0 SELECT total_sorts, sort_over
                   0                    0                       0 SELECT COUNT( * ) FROM SYSTOO
                   0                    0                       0 SELECT ATM.SCHEMA, ATM.NAME,
  26 record(s) selected.
Look at the SORT_OVERFLOWS column in the output; any nonzero value indicates
that a query performed a sort operation that spilled over to disk. In this example, there
is one SELECT statement on table T1 that resulted in a sort overflow. Also, the
TOTAL_SECTION_SORT_TIME column for the same statement indicates that the
section sort operation took a very long time. A section in this context is the compiled
query plan that was generated by the SQL statement that was issued. The unit of
measurement is milliseconds; when converted, the total sort time is around 32
minutes, which makes the query long running.
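The same check can be applied to a larger result set after you export it. The following Python sketch is an illustration only: it assumes that the rows were already fetched into tuples of (TOTAL_SORTS, SORT_OVERFLOWS, TOTAL_SECTION_SORT_TIME, STMT_TEXT), and the function name is hypothetical.

```python
# Sketch: flag statements whose sorts spilled and convert sort time to minutes.
# Each row is (TOTAL_SORTS, SORT_OVERFLOWS, TOTAL_SECTION_SORT_TIME in ms, text).

def spilled_statements(rows):
    """Return (text, overflow_ratio, sort_minutes) for rows with overflows."""
    report = []
    for total_sorts, overflows, sort_time_ms, text in rows:
        if overflows == 0:
            continue
        ratio = overflows / total_sorts if total_sorts else 0.0
        report.append((text, ratio, sort_time_ms / 60_000))
    return report


rows = [
    (0, 0, 0, "SELECT POLICY FROM SYSTOOLS.P"),
    (1, 1, 1975260, "select c1 from t1 order by c1"),
    (2, 0, 1, "DELETE FROM SYSTOOLS.HMON_ATM"),
]

for text, ratio, minutes in spilled_statements(rows):
    print(f"{text}: {ratio:.0%} of sorts overflowed, {minutes:.1f} min sorting")
```

The overflow ratio that this sketch computes per statement is the same ratio of sort overflows to total sorts that guides the sortheap tuning decision later in this scenario.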
The MON_GET_PKG_CACHE_STMT table function can return other useful columns that
you can include in your query. For example, the NUM_EXECUTIONS column can
show how often a statement was executed. It might also be useful to return more of
the statement text by modifying the sample query. For more information about
metrics, see "MON_GET_PKG_CACHE_STMT table function - Get SQL statement
activity metrics in the package cache"
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.sql.rtn.doc/doc/r0055017.html).
The db2pd command is also useful in this context, because you can investigate sort
performance while a perceived query performance problem is happening. You do not
have to wait for the monitoring information to be updated before you can begin
troubleshooting the problem. This feature is helpful if your queries are very long
running, as in the example that is used here.
To monitor sort performance with the db2pd command, use the parameters in the
following example:
db2pd -d sample -sort -app -dyn
The following sample output is abridged to highlight how you can determine whether
sort overflows are happening and what applications and SQL statements are involved:
Database Member 0 -- Database SAMPLE -- Active -- Up 2 days 01:36:58 -- Date 04/22/2013 20:34:29
AppHandl [nod-index]
950      [000-00950]
SortCB     MaxRowSize EstNumRows EstAvgRowSize NumSMPSorts NumSpills
0xE7750430 133        109155368  136           1           187711
KeySpec
CHAR:128
SMPSort# SortheapMem NumBufferedRows NumSpilledRows
0        16          331             62102041
Applications:
Address    AppHandl [nod-index] ... Status        C-AnchID ... Appid
0xF15F0060 950      [000-00950] ... UOW-Executing 542      ... *LOCAL.DB2.130420233443
Dynamic SQL Statements:
Address    AnchID StmtUID NumEnv ... NumRef NumExe Text
0xE711F560 542    2       1      ... 2      2      select c1 from t1 order by c1
In this case, the NumSpills column indicates that there are sort overflows. The
NumSpilledRows column shows that these sort overflows resulted in writing a large
number of rows to disk.
To determine the application and SQL statement, first use the AppHandl value (950
in this example) to locate the application information, and note the value in the
C-AnchID column, here 542. Next, locate the same anchor ID in the AnchID column of the
output for SQL statements to find the statement text.
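The two look-ups form a simple chain from application handle to anchor ID to statement text. The following Python sketch shows that chain with values copied by hand from the abridged sample output; it does not parse db2pd output, and the dictionary layout and function name are assumptions made for illustration.

```python
# Sketch: follow AppHandl -> C-AnchID -> statement text.
# Values are copied by hand from the sample output; nothing is parsed here.

applications = {950: {"status": "UOW-Executing", "c_anch_id": 542}}
dynamic_sql = {542: "select c1 from t1 order by c1"}


def statement_for(app_handle):
    """Return the current statement text for an application handle, or None."""
    app = applications.get(app_handle)
    if app is None:
        return None
    return dynamic_sql.get(app["c_anch_id"])


print(statement_for(950))  # select c1 from t1 order by c1
```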
Resolving the problem
If the ratio of sort overflows to total sorts is quite high, you might need to change the
value of the sortheap database configuration parameter to make more memory
available for sort operations. You might also consider enabling self-tuning memory to
allow the DB2 system to adjust the sortheap memory automatically. In DB2 Version
9.7 and later, the value of the sortheap parameter defaults to AUTOMATIC, which
enables automatic tuning of the memory that is required for sort operations. If you are
using an earlier DB2 version or if you determine by monitoring sort durations and
sort overflows that the automatic tuning is not always sufficient, you can tune the
value of the sortheap parameter manually. For example, sorting 446,000 records
with a record length of 128 bytes requires 57,088,000 bytes of insert-buffer memory
(446,000 x 128 bytes), equivalent to 14,000 4 KB pages. Insert-buffer memory makes
up approximately 50% of sort heap memory internally, so this figure of 14,000 4 KB
pages must be doubled to arrive at an approximate value for sort heap memory.
Doubling the number of pages gives a suggested sortheap parameter setting of
28,000 4 KB pages.
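The sizing arithmetic can be reproduced for other record counts and record lengths. The following Python sketch is an illustration of the calculation in the text, using 446,000 records of 128 bytes; the function name and constants are hypothetical, and the 50% insert-buffer share is the approximation that the text states, not a value that is read from the database.

```python
import math

PAGE_SIZE_BYTES = 4096        # sortheap is expressed in 4 KB pages
INSERT_BUFFER_FRACTION = 0.5  # approximate share of sort heap, per the text


def suggested_sortheap_pages(records, record_length_bytes):
    """Estimate a sortheap value (in 4 KB pages) for one in-memory sort."""
    insert_buffer_bytes = records * record_length_bytes
    insert_buffer_pages = math.ceil(insert_buffer_bytes / PAGE_SIZE_BYTES)
    return math.ceil(insert_buffer_pages / INSERT_BUFFER_FRACTION)


print(446_000 * 128)                           # 57088000
print(suggested_sortheap_pages(446_000, 128))  # 27876
```

The exact result, 27,876 pages, rounds to the 28,000 pages that the text suggests.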
The sortheap parameter has a relationship with the sheapthres and
sheapthres_shr parameters. If you modify the sortheap parameter setting, also
modify the value of the sheapthres parameter to maintain sufficient sort
parallelism. If you set both the sortheap and sheapthres_shr parameters to
automatic, the self-tuning memory manager (STMM) can keep these settings in tune
with the current workload. Alternatively, you can generally enable self-tuning
memory for the database by using the self_tuning_mem database configuration
parameter, which tunes several parameters affecting memory usage, including sort
heap memory. In partitioned database environments, some additional considerations
apply when you use self-tuning memory. For information, see "Self-tuning memory in
partitioned database environments"
(http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0023815.html).
There might be no way to avoid sort overflows by increasing the value of the
sortheap parameter, because of the amount of memory that is required. However,
you can still take some actions to minimize the impact of sort overflows. Ensure that
the buffer pool for temporary table spaces is large enough to minimize the amount of
disk I/O that sort overflows cause. Furthermore, to achieve I/O parallelism during the
merging of sort runs, you can define temporary table spaces in multiple containers,
each on a different disk. To assess how well temporary data is used in the buffer pool,
use the db2pd command with the -bufferpools parameter. A section of the output
shows the cache hit ratios of temporary table space data and indexes.
If more than one index is defined on a table, memory usage increases proportionally,
because the sort operation keeps all index keys in memory. To keep memory usage to
a minimum, create only the indexes that you need.
Scenario: Troubleshooting locking issues
This scenario illustrates how to use IBM InfoSphere Optim Performance Manager
(OPM) to investigate the causes of lock wait problems in a DB2 system.
Before you can use OPM to its full potential to investigate and diagnose the cause of
any performance issue, you must establish and save a performance baseline in OPM.
OPM can then compare the baseline with the current metrics and highlight potential
issues when certain metrics vary from the established baseline. For more information
about setting a baseline with OPM, see the OPM overview dashboard help.
Identifying the problem
You observe that your DB2 system is not processing the expected number of
transactions, even though there doesn't appear to be a bottleneck with the CPU or
disk. A typical symptom that you might encounter is a high average lock wait time
that is accompanied by low CPU usage.
Figure 1 illustrates such an example. The highlighted section of the OPM Workload
dashboard provides valuable information to help you determine whether the
performance problem is due to lock wait issues.
The Transaction Throughput and Statement Throughput graphs show that the system
was originally working fine with good throughput. At a specific time, the system
experienced a very significant drop in throughput, almost to zero. However, there
was still some activity on the system during that period, as shown by the Row
Throughput and Rows Read per Fetched Row graphs. They show that a high number
of rows were read even though almost no transactions were completed. These
symptoms suggest that one transaction might have held locks and blocked most other
transactions from executing. These symptoms might also indicate that a very large
query was reading a very large number of rows. To diagnose the cause, you must
investigate further.
Figure 1. Workload dashboard showing likely lock wait problem
You can configure OPM to monitor locking events and notify you when particular
events occur or exceed a threshold. You can use the Locking configuration dialog,
which is shown in Figure 2, to monitor certain conditions and control the level of
detail that is collected for lock events.
Figure 2. You can use the Locking configuration dialog to specify locking alerts and
the amount of detail to collect for locking events
Diagnosing the cause
You can use the OPM Overview dashboard to help investigate the symptoms more
closely. Figure 3 highlights important information that helps you understand the
problem better.
The Overview dashboard displays average values across the selected Time Slider
interval, for a wide range of metrics. Two are of particular interest in this scenario.
Both DB2 Lock Wait Time and Average Lock Wait Time metrics show significant
increases from the baseline. These increases provide further evidence of a locking
problem.
Figure 3. Overview dashboard highlighting large lock wait times (highlighted metrics:
DB2 Lock Wait time, %; Avg Lock Wait time, ms)
You can investigate the problem further with the Locking dashboard. You can access
the Locking dashboard from the Overview dashboard. The Locking dashboard
provides detailed information for all locking events on a separate tab. Figure 4
highlights a section of the Locking dashboard showing locking events.
The current Maximum Block Time and Lock Wait Alerts metrics are of particular
interest when you compare them with the baseline, which is shown in the dashed
border box in the figure. OPM has recorded a much higher number of lock wait alerts
than is typical. The baseline shows zero lock wait alerts and a very short maximum
block time.
Figure 4. Locking dashboard highlighting large values of Lock Wait Alerts and Block
Time (callout: Baseline comparison)
You can investigate the problem further by selecting an individual lock timeout event
from the dashboard and displaying detailed information for it. Figure 5 highlights
some of the key information that is displayed about the lock event after you
double-click to select it.
The lock timeout event details show information for both participants in the lock
event: the owner of the lock and its requestor. To see the lock owner's SQL statement,
select the Statements details. The information includes the complete text of the SQL
statement, the details of the lock that is being held, and the isolation level of the
transaction. In this example, the isolation level is repeatable read (RR). This is the
likely cause of the multiple lock timeout events and slowdown in transaction
throughput. A transaction using the RR isolation level can hold a large number of
locks during a unit of work (UOW) and cause many other transactions to be blocked,
waiting for locks to be released.
Resolving the problem
One possible resolution for the problem that is described in this scenario is to
determine whether you can modify the application so that it does not use the RR
isolation level. This change would reduce the number of locks that the application
holds at one time, therefore reducing the likelihood of lock contention with other
applications.
A DBA typically returns locking problems to the application team for resolution, but
often, the challenge is to prove that the delays are caused by locking issues. Now the
DBA can have proof of the locking delays and details for both the blocking application
and the blocked application and their SQL statements.
Figure 5. Analyzing the details of a lock timeout alert
An alternative way to diagnose a locking issue is based on alerts, if you configure
OPM to monitor locking events and report alerts, as shown in Figure 2. The health
summary indicates databases that have active alerts. For example, in Figure 6, you can
see that a locking alert was issued for the database named "Kepler" (highlighted with
a red box).
Figure 6. OPM health summary shows locking alerts
When you click the red icon, it opens the locking alert list for the database. If you
select a specific alert, you can see the details of the alert, as shown in the following
example.
Figure 7. Locking alert list and details
To drill down into the full details for this event, click Analyze.
The event details window provides more information about each participant in the
event. This information helps you pinpoint the cause of the problem and determine a
course of action to correct the issue.
Figure 8. Locking event details
Best practices
• Be prepared by configuring your data server before problems might occur:
  - Redirect diagnostic data away from the DB2 installation path.
  - For greater resilience, configure an alternate diagnostic path.
  - Redirect core file dumps and FODC data to a different directory.
  - Configure for rotating diagnostic and administration notification logs.
  - Set up a process to archive and delete diagnostic data regularly.
  - Provide enough free space to store diagnostic data.
• Minimize the impact of diagnostic data collection:
  - Collect data as locally to the problem as possible.
  - Collect only the diagnostic data that you need.
• Use the monitoring infrastructure to gain an understanding of the typical workloads
  that your data server processes, so that you can tell when atypical events are
  happening.
• Use the scenarios in this paper as examples for how you can use the various
  troubleshooting and monitoring tools.
• Know when you are faced with a problem that you cannot resolve on your own and
  therefore must engage with IBM for technical support.
Conclusion
The trend is toward more granular diagnostic data collection, especially on large
database systems. On these systems, end-to-end diagnostic data collection is often too
expensive and carries the risk of affecting database availability. To lessen the impact
of diagnostic data collection, DB2 tools such as FODC can collect data about ongoing
problems locally and selectively.
To prepare for a possible problem, it is important that you configure your data server
before problems might occur. Troubleshooting is much easier if the data is readily
available and the impact to the performance of the system is well controlled.
Part of being prepared also means knowing the typical workloads that your data
server processes. If you understand your typical workloads, you are much more likely
to know quickly when an atypical event might be happening that requires further
investigation.
Further reading
IBM Information Management Best Practices website
(www.ibm.com/developerworks/db2/bestpractices)
Tuning and Monitoring Database System Performance
(www.ibm.com/developerworks/data/bestpractices/systemperformance)
IBM DB2 Version 10.5 Information Center
(pic.dhe.ibm.com/infocenter/db2luw/v10r5/index.jsp)
Contributors
Dmitri Abrashkevich, DB2 Development, IBM
Albert Grankin, Senior Technical Staff Member, IBM
Bill Peck III, DB2 Advanced Support, IBM
Maira Teixeira De Melo, L2 Technical Support, IBM
Contacting IBM
To provide feedback about this paper, write to db2docs@ca.ibm.com.
To contact IBM in your country or region, see the IBM Directory of Worldwide
Contacts at http://www.ibm.com/planetwide.
To learn more about IBM Information Management products, go to
http://www.ibm.com/software/data/.
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and
services currently available in your area. Any reference to an IBM product, program, or
service is not intended to state or imply that only that IBM product, program, or service
may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's
responsibility to evaluate and verify the operation of any non-IBM product, program, or
service.
IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant you any
license to these patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
The following paragraph does not apply to the United Kingdom or any other country
where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS
MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF
ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain
transactions, therefore, this statement may not apply to you.
Without limiting the above disclaimers, IBM provides no representations or warranties
regarding the accuracy, reliability or serviceability of any information or
recommendations provided in this publication, or with respect to any results that may be
obtained by the use of the information or observance of any recommendations provided
herein. The information contained in this document has not been submitted to any
formal IBM test and is distributed AS IS. The use of this information or the implementation
of any recommendations or techniques herein is a customer responsibility and depends
on the customer's ability to evaluate and integrate them into the customer's operational
environment. While each item may have been reviewed by IBM for accuracy in a
specific situation, there is no guarantee that the same or similar results will be obtained
elsewhere. People attempting to adapt these techniques to their own environment do so
at their own risk.
This document and the information contained herein may be used solely in connection
with the IBM products discussed in this document.
This information could include technical inaccuracies or typographical errors. Changes
are periodically made to the information herein; these changes will be incorporated in
new editions of the publication. IBM may make improvements and/or changes in the
product(s) and/or the program(s) described in this publication at any time without
notice.
Any references in this information to non-IBM websites are provided for convenience only
and do not in any manner serve as an endorsement of those websites. The materials at
those websites are not part of the materials for this IBM product and use of those websites
is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.
Any performance data contained herein was determined in a controlled environment.
Therefore, the results obtained in other operating environments may vary significantly.
Some measurements may have been made on development-level systems and there is
no guarantee that these measurements will be the same on generally available systems.
Furthermore, some measurements may have been estimated through extrapolation.
Actual results may vary. Users of this document should verify the applicable data for their
specific environment.
Information concerning non-IBM products was obtained from the suppliers of those
products, their published announcements or other publicly available sources. IBM has not
tested those products and cannot confirm the accuracy of performance, compatibility
or any other claims related to non-IBM products. Questions on the capabilities of non-IBM
products should be addressed to the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or
withdrawal without notice, and represent goals and objectives only.
This information contains examples of data and reports used in daily business operations.
To illustrate them as completely as possible, the examples include the names of
individuals, companies, brands, and products. All of these names are fictitious and any
similarity to the names and addresses used by an actual business enterprise is entirely
coincidental.
COPYRIGHT LICENSE: © Copyright IBM Corporation 2013, 2014. All Rights Reserved.
This information contains sample application programs in source language, which
illustrate programming techniques on various operating platforms. You may copy,
modify, and distribute these sample programs in any form without payment to IBM, for
the purposes of developing, using, marketing or distributing application programs
conforming to the application programming interface for the operating platform for
which the sample programs are written. These examples have not been thoroughly
tested under all conditions. IBM, therefore, cannot guarantee or imply reliability,
serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. If these and
other IBM trademarked terms are marked on their first occurrence in this information with
a trademark symbol (® or ™), these symbols indicate U.S. registered or common law
trademarks owned by IBM at the time this information was published. Such trademarks
may also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at
www.ibm.com/legal/copytrade.shtml
Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of The Open Group in the United States and other
countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or
both.
Other company, product, or service names may be trademarks or service marks of
others.