NSD Analysis

OK, so here goes with a lengthy post for the admins amongst us on NSD analysis. It’s an area I feel I know quite well …. however, as you’ll remember from my last post, this is based on publicly available information.

NSD (or Notes System Diagnostic) is the name given to software bundled with Domino to give a snapshot of what the Domino system is doing. The tool produces text files with enormous amounts of information and can be run manually or will run automatically during a crash. Interested? …… (it may seem a bit dull but the information here could save you a lot of time!).

Memory

Without memory nothing works. In the operating system, memory is divided into Kernel and User Address Space. The Kernel looks after the OS, hardware drivers and communications with the hardware. The User Address Space is where our applications run, and this includes Domino. So when Domino crashes it happens in the User Address Space ….. this means that Domino won’t directly cause a blue screen of death! However Domino may, for example, be attempting to read or write to an area of disk, which could cause a kernel memory error, remembering that the kernel must deal with the disk.

As we know, Domino is made up of a number of individual processes (nserver, nreplica, nrouter etc). Each of these processes does its own little bit to make up the server. Each process is doing a number of tasks at any one time; these are called threads. And within each thread there is a specific set of individual actions. These are called function calls.

Crashes (in a paragraph!)

Yeah, yeah, blah, blah, so what does this mean for me? Well, Domino is a fairly complex beast. Now and again a thread will try to use some memory which is reserved or in use by another process. This is a memory exception and at this point everything will go a bit messy. A panic will be recorded in the thread, Domino will freeze everything that it is doing and the nsd task will run.
The nsd run captures a snapshot of the environment immediately before the crash, storing the important results in \data\ibm_technical_support on whichever client or server has crashed.

Hangs

Hangs are a different beast and I’ll not go into them much here. The easiest way to recognise a hang is to examine, in real time, the memory allocated to each Domino or Notes process. Remember from earlier that each process is made up of a number of threads. New threads are constantly starting and old threads are constantly stopping. So for each process you should see the memory allocated to that process changing with time. If you don’t see changes in the memory allocated to a process then you possibly have a hang.

A hung process may or may not prevent user sessions on a server, and a server can recover from a hang. To troubleshoot a hang you need to run nsd 3 times at 5 minute intervals and then engage IBM Support to help resolve the issue.

The file

Well, the file produced will always follow a common naming convention:

    type_platform_systemname_date@time.log

Each platform has its own format and, for the sake of not making this post a record length, I’m going to stick to Wintel.

Running NSD Manually

The important thing to remember when running NSD is that by default it will kill all the processes …. so if you want to run it without killing the PIDs, check out the extensions by running nsd -?. The normal advice is to run nsd -detach, as that leaves the processes alone after running.

Sections in Wintel NSDs

The first section is the header with system information.

Next we have the process table. From here you can see all processes on the server. Processes nsd recognises as Domino are indicated with “->”. The position of “[” denotes parent and child status, with indents denoting children. You’ll see nsd as a child of whichever process crashed.

You’ll see some strange entries of the form “Found X processes, matched Y”. If Y is one less than X then, providing you are running Domino as a Windows service, don’t worry! nsd examines all processes from nserver down, and nservice is the parent of nserver. nsd sees nservice is running but also sees it isn’t running under nserver, so it says, for example, found 22 matched 21.

OK, so this section helps gather a picture of what was running on the server. Below this section there is a dump of each process, what files the process was using, a list of each Domino instance and a list of the processes running therein, and then, importantly, a dump of each thread.

The best option once you have looked through the process table is to search for “FATAL”. On the thread which resulted in the crash the name will change from thread to “fatal thread”.

Fatal Thread

So once you’ve searched for fatal you may see something like this:

    ### FATAL THREAD 39/83 [ nSERVER:07c0: 2764]
    ### FP=0743f548, PC=60197cf3, SP=0743ebd0, stksize=2424
    Exception code: c0000005 (ACCESS_VIOLATION)
    ############################################################
    @[ 1] 0x60197cf3 nnotes._Panic@4+483 (7430016,…,496dace8)
    @[ 2] 0x600018a4 nnotes._OSBBlockAddr@8+148 (1153f38,…,1)
    @[ 3] 0x6000bd92 nnotes._CollectionNavigate@24+610 (0,…,0)
    @[ 4] 0x600626cc nnotes._ReadEntries@68+2860 (4c5440e8,…,1)
    @[ 5] 0x600b9f6f nnotes._NIFReadEntriesExt@72+351 (0,…,1)
    @[ 6] 0x10032d40 nserverl._ServerReadEntries@8+1424 (0,…,4ae46dd6)
    @[ 7] 0x100191fc nserverl._DbServer@8+2284 (41b0383,…,23696f8)
    @[ 8] 0x1002b8c8 nserverl._WorkThreadTask@8+1576 (4711d68,…,563fb10)
    @[ 9] 0x100016cb nserverl._Scheduler@4+763 (0,…,10ec334)
    @[10] 0x6011e5e4 nnotes._ThreadWrapper@4+212 (0,…,0)
     [11] 0x77e887dd KERNEL32.GetModuleFileNameA+465

So what does all this mean? Well, the header block is fairly obvious. Lines 1 through 11 are the function calls that the thread performed.

So what does each line mean? The @ sign means nsd has annotated the line and recognised the function as a Domino one (line 11, in KERNEL32, has no @). The 0x value I assume to be the address (but someone may correct me); note that the address on line 1 matches the PC value in the header. The bit before the full stop is the module (nnotes, nserverl etc). The bit after the full stop and before the @ sign is the function call. So here the function calls are _ThreadWrapper, _Scheduler, _WorkThreadTask etc.

These are in sequence. For Wintel, 1 is the event closest to the crash and 11 the event furthest from the crash. So the server performed 11, then 10 and 9, then ….., then crashed, and 1 shows the panic.

Call Stack

Listing all these functions we get the call stack:

    _Panic
    _OSBBlockAddr
    _CollectionNavigate
    _ReadEntries
    _NIFReadEntriesExt
    _ServerReadEntries
    _DbServer
    _WorkThreadTask
    _Scheduler
    _ThreadWrapper
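If you would rather not pick the call stack out by eye, the reading of each line described above can be sketched in a few lines of Python. This is only a rough sketch and not an IBM tool: the regular expression is my own guess at the Wintel line layout based purely on the example above, the names FRAME_RE, fatal_frames and call_stack are made up for illustration, and the sample leaves the bracketed argument lists off each line.

```python
import re

# One stack line, e.g. "@[ 1] 0x60197cf3 nnotes._Panic@4+483 (...)".
# The layout is assumed from the single example in this post.
FRAME_RE = re.compile(
    r"^\s*(?P<annotated>@?)\[\s*(?P<index>\d+)\]\s+"   # optional @, then [ n]
    r"(?P<address>0x[0-9a-fA-F]+)\s+"                   # the 0x address
    r"(?P<module>\w+)\."                                # bit before the full stop
    r"(?P<function>[A-Za-z_]\w*)(?:@\d+)?\+\d+"         # function, optional @n, +offset
)


def fatal_frames(log_lines):
    """Return the stack frames that follow the first FATAL THREAD marker."""
    frames = []
    in_fatal = False
    for line in log_lines:
        if "FATAL THREAD" in line:
            in_fatal = True
            continue
        if not in_fatal:
            continue
        match = FRAME_RE.match(line)
        if match:
            frames.append({
                "index": int(match["index"]),
                "annotated": match["annotated"] == "@",
                "address": match["address"],
                "module": match["module"],
                "function": match["function"],
            })
        elif frames:
            break  # assumption: the first non-frame line after the stack ends it
    return frames


def call_stack(frames):
    """Function names, oldest call first (highest line number down to 1)."""
    ordered = sorted(frames, key=lambda frame: frame["index"], reverse=True)
    return [frame["function"] for frame in ordered]


# The fatal thread from this post, with the argument lists left off.
SAMPLE = """\
### FATAL THREAD 39/83 [ nSERVER:07c0: 2764]
### FP=0743f548, PC=60197cf3, SP=0743ebd0, stksize=2424
Exception code: c0000005 (ACCESS_VIOLATION)
############################################################
@[ 1] 0x60197cf3 nnotes._Panic@4+483
@[ 2] 0x600018a4 nnotes._OSBBlockAddr@8+148
@[ 3] 0x6000bd92 nnotes._CollectionNavigate@24+610
@[ 4] 0x600626cc nnotes._ReadEntries@68+2860
@[ 5] 0x600b9f6f nnotes._NIFReadEntriesExt@72+351
@[ 6] 0x10032d40 nserverl._ServerReadEntries@8+1424
@[ 7] 0x100191fc nserverl._DbServer@8+2284
@[ 8] 0x1002b8c8 nserverl._WorkThreadTask@8+1576
@[ 9] 0x100016cb nserverl._Scheduler@4+763
@[10] 0x6011e5e4 nnotes._ThreadWrapper@4+212
 [11] 0x77e887dd KERNEL32.GetModuleFileNameA+465
""".splitlines()
```

Calling call_stack(fatal_frames(SAMPLE)) gives the function names in the order the server performed them, so the last entry is the panic. For a real file you would pass in the lines of the NSD log instead of SAMPLE.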

Finding the fault

Well, now is the point where you have some data which can be searched for in the IBM Knowledge Base. My only tip here is that, to ensure a good search, you should strip off the leading underscore and also add * to the beginning and end of the call stack. Take 2 items from the call stack list and search for them in turn, i.e. search for 11 and 10, then 10 and 9, then 9 and 8 …… and so on down to 2. You then need to compare your call stack with any call stacks listed in the knowledge base.

Reference material

• UNIX NSD Analysis : http://www-1.ibm.com/support/docview.wss?rs=0&uid=swg27003396
• Nash!com presentation : http://www.nashcom.de/nshweb/pages/lotusphere.htm
• Redbooks technote : http://www.redbooks.ibm.com/abstracts/tips0053.html?Open
• LDD Article : http://www-128.ibm.com/developerworks/lotus/library/domino-server-crashes/

REMEMBER IBM ARE THE EXPERTS

As a footnote, please remember that locked in a deep vault somewhere in IBM is a team of people who spend all day every day looking at NSDs (and even having fun). If you need to examine an NSD, I’d recommend that before you start you log the call with IBM. They are the experts. While you are waiting for them to get back to you, have a go at resolving the NSD yourself.
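Going back to the search tip in “Finding the fault”, here is a minimal Python sketch of how the search terms could be built from a call stack. It rests on my own assumptions: I have read “add * to the beginning and end” as wrapping each function name in wildcards, and search_term and search_pairs are made-up names for illustration, not anything supplied by IBM or nsd.

```python
def search_term(function_name):
    """Strip the leading underscore and wrap the name in * wildcards."""
    return "*" + function_name.lstrip("_") + "*"


def search_pairs(stack):
    """Pair each function with the next one in the call stack.

    `stack` is a list of function names, oldest call first, so the first
    pair corresponds to lines 11 and 10, the next to 10 and 9, and so on.
    """
    terms = [search_term(name) for name in stack]
    return [first + " " + second for first, second in zip(terms, terms[1:])]


# The three functions named in the post, oldest call first.
EXAMPLE_STACK = ["_ThreadWrapper", "_Scheduler", "_WorkThreadTask"]
```

With the three functions above, search_pairs(EXAMPLE_STACK) gives “*ThreadWrapper* *Scheduler*” followed by “*Scheduler* *WorkThreadTask*”, which you can then paste into the knowledge base search one at a time.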
