
System Monitoring.

A Unix system administrator monitors the system in between other jobs. To stay well
informed about the system at all times, the first requirement is control of the system.
If you work in a team, always tell your teammates about anything you do on the system
that is going to affect performance. The system utilities and commands that collect and
report vital statistics are:

Commands

• cron jobs.
• uptime reports the load average.
• ps reports the processes running on the system.
• iostat reports input and output statistics.
• sar and vmstat report memory, CPU and disk utilization (on BSD systems, use vmstat).
• netstat reports networking statistics.
• other self-written scripts.

SAR reports
Usage: sar -function
sar functions

• -a reports usage of file access system calls.
• -b reports buffer cache usage and hit rate.
• -c reports system calls.
• -d reports block device activity.
• -g reports paging activity (V.4 only).
• -k reports kernel memory allocation activity (V.4 only).
• -m reports message and semaphore activity.
• -p reports paging activity.
• -q reports the average length of the queue waiting for the CPU.
• -r reports unused memory pages and disk blocks.
• -u reports CPU utilization.
• -v reports the status of system tables.
• -w reports swapping and paging activity.
• -x reports RFS operations (V.4 only).
• -y reports terminal activity.
• -A reports all data (same as sar -udqbwcayvmprgkxSDC).
• -C reports RFS buffer caching overhead.
• -Db reports buffer cache usage for RFS and local activity.
• -Dc reports system calls separately for RFS and local activity.
• -Du reports CPU utilization by RFS and local activity.
• -S reports RFS server and request queue status.
The following is an example of sar -q (the system run queue: jobs that are ready to run
at any time):
dxi4 dxi4 3.2.0 V2.1.6 i386 01/02/98

00:00:01 runq-sz %runocc swpq-sz %swpocc


01:00:01 2.0 81
02:00:01 2.0 81
03:00:01 2.0 82
04:00:00 2.0 82
05:00:01 2.2 84
06:00:01 2.1 82
07:00:03 2.7 99
08:00:01 2.2 75
08:20:01 2.3 80
08:40:01 4.1 98
09:00:03 5.2 100
09:20:00 4.4 98
09:40:01 3.4 95
10:00:01 3.5 97
10:20:01 3.6 97
10:40:00 3.5 93

Average 2.6 86
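
As a rough illustration of how a report like the one above could be scanned automatically, the following Python sketch picks out the intervals whose run queue exceeds a limit. The function name busy_intervals and the limit of 4.0 are assumptions made for this example, not values recommended by sar; the sample rows are copied from the report above.

```python
def busy_intervals(sar_q_text, runq_limit=4.0):
    """Return (time, runq_sz, runocc) tuples whose run queue exceeds runq_limit."""
    busy = []
    for line in sar_q_text.splitlines():
        fields = line.split()
        # Data rows look like "08:40:01 4.1 98"; skip blanks and the Average row.
        if len(fields) < 3 or fields[0].count(":") != 2:
            continue
        try:
            runq, occ = float(fields[1]), float(fields[2])
        except ValueError:
            continue  # the column-header row also starts with a timestamp
        if runq > runq_limit:
            busy.append((fields[0], runq, occ))
    return busy


sample = """\
00:00:01 runq-sz %runocc swpq-sz %swpocc
08:20:01 2.3 80
08:40:01 4.1 98
09:00:03 5.2 100
09:20:00 4.4 98
Average 2.6 86
"""

for when, runq, occ in busy_intervals(sample):
    print(when, runq, occ)
```

A script like this could be run from cron against the daily sar output; the right limit depends on the number of CPUs and the workload.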


Cpu workload Management.


The most commonly used command for finding the system load average is uptime.

3:04pm up 4 day(s), 10:37, 16 users, load average: 0.11, 0.10, 0.12
This tells us that the current time is 3:04 pm, the system has been up for 4 days, 10
hours and 37 minutes, there are 16 users logged in, and the load average over the last
one, five and fifteen minutes was 0.11, 0.10 and 0.12 respectively.
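
If you want to use these figures in a script, the three numbers can be pulled out of the uptime line with a regular expression. This is only a sketch: the function name load_averages is made up for the example, and it assumes the conventional 1-, 5- and 15-minute averaging windows and an output layout like the line shown above.

```python
import re


def load_averages(uptime_line):
    """Extract the three load-average figures from one line of uptime output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: " + uptime_line)
    return tuple(float(value) for value in match.groups())


line = "3:04pm up 4 day(s), 10:37, 16 users, load average: 0.11, 0.10, 0.12"
one_min, five_min, fifteen_min = load_averages(line)
print(one_min, five_min, fifteen_min)
```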

PS command.
ps -el | more
 F S UID  PID PPID C PRI NI  ADDR   SZ  WCHAN TTY     TIME COMD
 3 S   0    0    0 0   0 20     0    0 482f60 ?       3:11 schet
 3 S   0    2    0 0   0 20 30810 4096 4900e4 ?      10:08 vhanh
 1 S   0  159    1 0  28 20 3081c   56 fcbc5c ?       0:00 strer
 1 S   0 1408    1 0  28 20 30b21  104 fcbb14 syscon  0:00 gettl
 1 S   0  167    1 0  40 20 30893   52 47da28 ?       0:00 volid
 5 S   0  171    1 0  26 20 30835  528 48bc30 ?       0:17 vold
41 S   0  321    1 0  26 20 308b1  108 1b66b4 ?       0:00 ktlod
 1 S   0  517    1 0  40 20 30904  100 4831ec ?       0:00 x25nd
 1 S   0  560    1 0  40 20 30941   92 4831ec ?       0:00 netd
 1 S   0  459    1 0  26 20 3094e   72 48bc30 ?       0:02 trlod

What it means:

• F is a set of flags that indicate the process's current state.
o 0 means the process has terminated.
o 1 means it is a system process and is always in memory.
o 2 means the process is being traced by its parent process.
o 4 means the process is being traced by its parent and has been stopped.
o 8 means the process cannot be awakened by a signal.
o 10 means the process is loaded into memory.
o 20 means the process cannot be swapped.
• S is a letter indicating the process state.
o O means the process is currently running on the processor.
o S means the process is sleeping, waiting for an event to complete.
o R means the process is runnable.
o I means the process is idle.
o Z means the process is a zombie: it has terminated but still has an entry in the
process table.
o T means the process has stopped because its parent is tracing it.
o X means the process is waiting for more memory.
• UID is the identification of the user who created this process.
• PID is the process identification number.
• PPID is the parent process identification number.
• C is the processor utilization figure used for scheduling.
• PRI is the process's scheduling priority: lower numbers indicate higher priority.
• NI is the process's nice number, which also affects scheduling priority.
• SZ is the amount of virtual memory, in pages, required by the process.
• TTY is the terminal that started the process.
• TIME is the total CPU time spent on the process so far.
• COMD is the actual command.
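
The field list above maps naturally onto one record per process. The following Python sketch turns ps -el style text into dictionaries keyed by the column names and picks the process with the largest SZ. The function name parse_ps is hypothetical, and the sketch assumes every column is filled in on every row (real ps output sometimes leaves ADDR or WCHAN blank, which would need fixed-width parsing instead). The two sample rows are taken from the listing above.

```python
def parse_ps(ps_text):
    """Split ps -el style output into one dict per process, keyed by column name."""
    lines = [ln for ln in ps_text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # Limit the split so a command containing spaces stays in the last field.
        values = ln.split(None, len(header) - 1)
        if len(values) == len(header):
            rows.append(dict(zip(header, values)))
    return rows


sample = """\
 F S UID  PID PPID C PRI NI  ADDR   SZ  WCHAN TTY     TIME COMD
 5 S   0  171    1 0  26 20 30835  528 48bc30 ?       0:17 vold
 1 S   0  560    1 0  40 20 30941   92 4831ec ?       0:00 netd
"""

rows = parse_ps(sample)
biggest = max(rows, key=lambda row: int(row["SZ"]))
print(biggest["COMD"], biggest["SZ"])
```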

To reduce the workload on the system, make sure you are running only the daemons you
need. Disabling unnecessary daemons can significantly reduce the load on the system.

Daemons

accounting: The accton command enables system-wide accounting services. If you are not
using accounting on your system, disabling accton will improve the performance of your
system.

biod: This daemon allows the system to access remote filesystems via NFS.

comsat: The program that prints "you have new mail" on your screen.

lpd or lpsched: The printer daemons.

mountd: This daemon listens for remote mount requests.

nfsd: This daemon services NFS requests from remote systems.

nntpd: This daemon supports USENET network news services.

quotas: /etc/rc quotaon enables disk quota checking.

rlogind: This daemon services the rlogin and rsh commands.

routed: This daemon routes network packets destined for other networks. If your local
network has only one gateway to the outside world, you can disable routed. Then make
sure that /etc/rc.local has a line such as route add default gateway 1

rwhod: This daemon provides information about users on other systems; the rwho and
ruptime commands use it.

sendmail: This provides e-mail services both internally and externally (between other
systems). sendmail uses a lot of memory.

talkd: This daemon supports the talk command.

timed: This daemon attempts to synchronize system clocks across a network. If you are
working across different systems, it is necessary.

ypbind: This daemon lets the system look up information in the NIS database. At least
one system must be running ypserv before you can run ypbind.

ypserv: This daemon makes a system act as an NIS server, a system that can send
information from the NIS database to other systems on the network. It must not be
running if the system is not an NIS server.

Memory Management.

Memory management and performance issues are probably the most important in any
system. If there is not enough physical memory installed, most Unix systems use swapping
and paging to make sure that adequate memory is available at all times. When you set up
a file system for the first time, you must also set up a separate swap area. Swapping
and paging can significantly reduce the performance of your system. The difference
between them is that swapping transfers a whole process to disk, while paging transfers
only part of a process to disk and leaves the rest in physical memory. There are two
utilities for monitoring memory: vmstat (on BSD and similar systems) and sar (on System V
and similar systems). Page-ins and page-outs are pages moved between physical memory and
disk; swap-ins and swap-outs are whole processes moved between memory and disk.

Estimating memory for a system (System V)

First find the size of your application with the size command, as follows; here the
application is a.out (a binary executable). If there are many applications, multiply the
size of each one by the number of times it will run on the system and add the results.
The size command shows the text, data and bss (uninitialized data) sizes of the
executable file.
size a.out

50884 + 9544 + 27604 = 88032

The file command then tells you more about this file, as follows.

file a.out
a.out: iAPX 386 executable not stripped

If this file had been a pure executable, we would have needed to account for the text
and data/stack segments separately. For each invocation, we need to allow 88032 bytes *
each invocation 2000 (or 1024 KB). Perform the same computation for each program to be
run on the system.
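
The arithmetic above can be written down as a short Python sketch. It assumes the simplest reading of the procedure: each invocation needs the full sum reported by size, and the totals for all programs are added together. The function name estimate_memory and the figure of 10 invocations are made up for the example; the segment sizes are the ones shown for a.out.

```python
def estimate_memory(programs):
    """Sum size-reported bytes times invocations over (segments, invocations) pairs."""
    total = 0
    for segments, invocations in programs:
        total += sum(segments) * invocations
    return total


a_out = (50884, 9544, 27604)            # figures reported by "size a.out"
per_invocation = sum(a_out)             # bytes needed by one invocation
total = estimate_memory([(a_out, 10)])  # 10 invocations is a hypothetical figure
print(per_invocation, total)
```

This ignores shared text, kernel overhead and buffer cache, so treat the result as a lower bound rather than a sizing recommendation.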

The vmstat command is available on System V and BSD systems; it reports virtual memory
statistics.
The syntax of vmstat is:
vmstat interval count
For example, to have vmstat report memory statistics every five seconds, three times,
use
vmstat 5 3

procs      memory               page                 disk      faults      cpu
r b w    swap   free re mf pi po fr de sr s0 s1 s2 s6  in  sy  cs us sy id
0 0 0  133136  13400  0 66 14  0  0  0  0  1  7  1  0 194 650 124  3  3 94
0 0 0 3557440 835184  0 22  1  0  0  0  0  0  1  0  0 164 155 151  0  2 98
0 0 0 3557440 835184  0 77  0  0  0  0  0  9  7  1  0 213 287  79  0  9 90
What it means:
• procs (processes)
o r is the number of runnable processes during the interval. It does not include
processes that are waiting or doing I/O.
o b is the number of processes that are blocked waiting for I/O or another event.
o w is the number of processes that are swapped out. A non-zero value means the
system was swapping.
• memory
o swap is the amount of swap space currently available.
o free is the size of the free memory list.
• page
o pi is the number of 1 KB pages per second that have been paged in.
o po is the number of 1 KB pages per second that have been paged out.
o de is the anticipated short-term memory shortfall.
• disk
o s0, s1, s2, s6 are the number of disk operations per second on each disk drive.
• faults
o in is the number of device interrupts per second.
o sy is the number of system calls per second.
o cs is the number of CPU context switches per second.
• cpu
o us is the percentage of total CPU time spent in user state.
o sy is the percentage of total CPU time spent in system state.
o id is the percentage of total CPU time that the CPU is idle.
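
One practical wrinkle when reading vmstat output in a script is that the column name sy appears twice, once under faults and once under cpu. The following Python sketch is one way to handle that, by numbering repeated names (the second sy becomes sy_2). The function name parse_vmstat_row and the renaming scheme are assumptions for this example; the header and data row are taken from the output above.

```python
def parse_vmstat_row(header_line, data_line):
    """Map one vmstat data row onto its column names, numbering duplicates."""
    seen = {}
    names = []
    for name in header_line.split():
        seen[name] = seen.get(name, 0) + 1
        # First occurrence keeps its name; later ones become e.g. "sy_2".
        names.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    values = [int(v) for v in data_line.split()]
    if len(values) != len(names):
        raise ValueError("column count does not match header")
    return dict(zip(names, values))


header = "r b w swap free re mf pi po fr de sr s0 s1 s2 s6 in sy cs us sy id"
first = "0 0 0 133136 13400 0 66 14 0 0 0 0 1 7 1 0 194 650 124 3 3 94"

row = parse_vmstat_row(header, first)
print("swapped out:", row["w"], "page-ins:", row["pi"], "idle %:", row["id"])
```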

sar -r shows how much free memory is available.


dxi4 dxi4 3.2.0 V2.1.6 i386 01/02/98

00:00:01 freemem freeswp


01:00:01 44571 1374268
02:00:01 43930 1367068
03:00:01 43224 1368316
04:00:00 43500 1374012
05:00:01 43831 1376500
06:00:01 44128 1373268
07:00:03 43349 1354548
08:00:01 43488 1364372
08:20:01 43078 1352500
08:40:01 42526 1350828
09:00:03 42261 1342652
09:20:00 42487 1349292
09:40:01 41296 1338516
10:00:01 41484 1331284
10:20:01 41368 1335316
10:40:00 40969 1326292
11:00:01 41208 1336340
11:20:01 41236 1347508
11:40:01 41439 1340748
12:00:00 40581 1332708
12:20:01 41221 1339964
12:40:01 41431 1338068
Average 42964 1350653

The freemem column reports how much free memory is available, in pages. The system
starts paging when free memory drops below the configuration constant GPGSLO; paging
then continues until the amount of free memory passes GPGSHI. GPGSLO and GPGSHI default
to 25 and 40 respectively. To look directly at swapping statistics, use sar -w:

dxifour:/u0/ssb>sar -w

dxi4 dxi4 3.2.0 V2.1.6 i386 01/02/98

00:00:01 swpin/s bswin/s swpot/s bswot/s pswch/s


01:00:01 0.00 0.0 0.00 0.0 122
02:00:01 0.00 0.0 0.00 0.0 119
03:00:01 0.00 0.0 0.00 0.0 117
04:00:00 0.00 0.0 0.00 0.0 123
05:00:01 0.00 0.0 0.00 0.0 157
06:00:01 0.00 0.0 0.00 0.0 134
07:00:03 0.00 0.0 0.00 0.0 135
08:00:01 0.00 0.0 0.00 0.0 151
08:20:01 0.00 0.1 0.00 0.1 183
08:40:01 0.00 0.0 0.00 0.0 408
09:00:03 0.00 0.0 0.00 0.0 510
09:20:00 0.00 0.0 0.00 0.0 390
09:40:01 0.00 0.0 0.00 0.0 359
10:00:01 0.00 0.1 0.00 0.1 377
10:20:01 0.00 0.0 0.00 0.0 451
10:40:00 0.00 0.1 0.00 0.1 440
11:00:01 0.00 0.0 0.00 0.0 478
11:20:01 0.00 0.0 0.00 0.0 317
11:40:01 0.00 0.0 0.00 0.0 247
12:00:00 0.00 0.0 0.00 0.0 249
12:20:01 0.00 0.0 0.00 0.0 176
12:40:01 0.00 0.0 0.00 0.0 350

Average 0.00 0.0 0.00 0.0 213


What it means:

• swpin/s is the average number of swapping transfers into memory per second during
the interval.
• bswin/s is the average number of 512-byte blocks transferred into memory per second.
• swpot/s is the average number of swap-outs per second during the interval; it should
be zero.
• bswot/s is the average number of 512-byte blocks swapped out of memory per second;
it should be zero.
• pswch/s is the number of process switches per second during the interval.
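
Since swpot/s and bswot/s should stay at zero, a report like the one above can be checked mechanically. The following Python sketch reads sar-style rows into dictionaries keyed by the column headings and lists the intervals in which blocks were swapped out. The function name parse_sar is hypothetical, and the sketch assumes the first timestamped line is the heading row, as in the output shown; the sample rows are copied from that output.

```python
def parse_sar(text):
    """Turn timestamped sar rows into dicts keyed by the column headings."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Only timestamped lines matter; this also skips the Average row.
        if len(fields) < 2 or fields[0].count(":") != 2:
            continue
        if header is None:
            header = fields[1:]  # the first timestamped line carries the headings
            continue
        values = [float(v) for v in fields[1:]]
        rows.append({"time": fields[0], **dict(zip(header, values))})
    return rows


sample = """\
00:00:01 swpin/s bswin/s swpot/s bswot/s pswch/s
08:00:01 0.00 0.0 0.00 0.0 151
08:20:01 0.00 0.1 0.00 0.1 183
08:40:01 0.00 0.0 0.00 0.0 408
Average 0.00 0.0 0.00 0.0 213
"""

swapping = [r["time"] for r in parse_sar(sample) if r["bswot/s"] > 0]
print(swapping)
```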

sar -p shows paging activity.


# sar -p (system V.3)

dxi4 dxi4 3.2.0 V2.1.6 i386 01/02/98


00:00:01 vflt/s pflt/s pgfil/s rclm/s
01:00:01 854.76 0.00 0.00 0.00
02:00:01 867.77 0.00 0.00 0.00
03:00:01 863.78 0.00 0.00 0.00
04:00:00 839.07 0.00 0.00 0.00
05:00:01 822.51 0.00 0.01 0.00
06:00:01 833.93 0.00 0.00 0.00
07:00:03 881.90 0.00 0.00 0.00
08:00:01 348.59 0.00 0.01 0.00
08:20:01 684.73 0.00 0.00 0.00
08:40:01 2727.91 0.00 0.03 0.00
09:00:03 1018.15 0.00 0.03 0.00
09:20:00 989.03 0.00 0.05 0.00
09:40:01 838.29 0.00 0.00 0.00
10:00:01 965.68 0.00 0.04 0.00
10:20:01 948.19 0.00 0.00 0.00
10:40:00 908.90 0.00 0.00 0.00
11:00:01 781.14 0.00 0.02 0.00
11:20:01 843.14 0.00 0.00 0.00
11:40:01 875.43 0.00 0.04 0.00
12:00:00 872.25 0.00 0.01 0.00
12:20:01 802.01 0.00 0.01 0.00
12:40:01 1192.86 0.00 0.01 0.00

Average 878.55 0.00 0.01 0.00


What it means:

• vflt/s is the number of address translation faults per second.
• pflt/s is the number of page faults per second. A page fault occurs when a process
references an invalid page.
• pgfil/s is the number of address translation faults per second that were satisfied
by a page-in.
• rclm/s is the average number of "page reclaims" per second. This is the number of
pages that have been reclaimed and added to the free list by page-out activity.


Disk Management.

Disk performance issues.

Per-process disk throughput: The speed at which a single process can read from or write
to a disk. You can measure it by timing a cp or mv command.

Total disk throughput: The total speed at which all the processes together can transfer
data to and from disks.

Disk storage efficiency: How efficiently the disk's storage space is used.

A rule of thumb is that a disk spends about 80% of its time seeking and only 20%
reading and writing data. This means that the lower the seek time of a disk, the better
its performance.

Other things to consider when buying a disk, such as rotational speed (most disks are
3600 RPM) and raw transfer rate, are not as important as seek time. Disk capacity
depends on the users' needs. System V divides each disk into many partitions. You should
always stick to your system's own disk tools when partitioning or defining a disk.

The problem of fragmentation can be kept to a minimum by running the fsck command
regularly. To do this, unmount the disk, run fsck diskname, and respond to the questions
that come up on the console. You can also reorganize the free list with the fsck -S
option, which helps keep fragmentation to a minimum, because the system uses this free
list at boot time to detect fragmentation.

iostat is a BSD tool that is also found on many System V systems. It prints a number of
I/O statistics that will help you balance the disk load. The syntax is
iostat drives interval count
where drives are the disk drives, interval is in seconds, and count is the number of
samples. For example, the following shows all disks at an interval of 2 seconds:
iostat 2

device bps sps msps

c1t6d0 0 0.0 1.0


c1t3d0 0 0.0 1.0
c1t4d0 0 0.0 1.0
c1t5d0 0 0.0 1.0
c0t1d0 0 0.0 1.0
c0t1d1 0 0.0 1.0
c0t1d2 0 0.0 1.0
c0t1d3 0 0.0 1.0
c0t1d4 0 0.0 1.0
c0t1d5 0 0.0 1.0
c0t1d6 0 0.0 1.0
c0t1d7 0 0.0 1.0

What it means:

• bps is the average number of kilobytes transferred per second during the previous
interval for the disk.
• sps is the average number of seeks per second.
• msps is the average number of milliseconds per seek.

For disk cache statistics, you can use the sar -b command.



Network Management.
Poor network performance can increase response time and frustrate users. To find out
whether your network is slowing things down, try the following. Say your system is named
apple. First open a session to apple from a regular terminal emulator and log in,
counting the seconds it takes. Then, from another system, say orange, rlogin to apple,
again counting the seconds it takes for the login prompt to appear. Compare the two
times: if rlogin is noticeably slower, your network may be reaching its maximum capacity.

The basic network tool is ping. To check that a system named oranges is reachable from
your system named apple, run the following from apple:
apple :> ping oranges

Pinging host oranges.com (oranges) : 38.152.119.3


oranges.com: is alive!

----oranges.com PING Statistics----


1 packets transmitted, 1 packets received, 0% packet loss
round-trip (ms) min/avg/max = 9.27/9.27/9.27

The output above tells us that the host oranges is reachable and that no packets were
dropped. Another command for investigating network problems is netstat.

To diagnose a networking problem, netstat -i can be used in the following way:

netstat -i

Name  Mtu   Network       Address          Ipkts    Opkts    Odrop
eg1   1500  204.89.162    dxi4.dxi.com     2275517  3783974  0
eg0   1500  38.254.211    dxifour.dxi.com  4716968  2862227  0
loop  1536  loopback-net  localhost        0        0        0
What it means:

• Name is the name of the interface. It identifies a particular Ethernet board.
• Mtu is the maximum transmission unit, that is, the maximum packet size for this
interface.
• Network is the network to which this interface is connected.
• Address is the address of this interface on the Internet.
• Ipkts is the number of input packets received by this interface since the system was
booted.
• Opkts is the number of output packets sent by this interface since the system was
booted.
• Odrop is the number of output packets that were dropped or discarded without
reaching their destination.
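
To turn the counters described above into something easier to compare between interfaces, you can compute dropped packets as a percentage of packets sent. The following Python sketch does this for netstat -i style text. The function names parse_netstat_i and drop_percent are made up for the example, and the sketch assumes the seven-column layout shown above with one interface per line.

```python
def parse_netstat_i(text):
    """Return one dict per interface from netstat -i style output."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, data = lines[0], lines[1:]
    interfaces = []
    for fields in data:
        row = dict(zip(header, fields))
        for key in ("Mtu", "Ipkts", "Opkts", "Odrop"):
            row[key] = int(row[key])  # counters are cumulative since boot
        interfaces.append(row)
    return interfaces


def drop_percent(row):
    """Dropped output packets as a percentage of output packets sent."""
    if row["Opkts"] == 0:
        return 0.0
    return 100.0 * row["Odrop"] / row["Opkts"]


sample = """\
Name Mtu Network Address Ipkts Opkts Odrop
eg1 1500 204.89.162 dxi4.dxi.com 2275517 3783974 0
eg0 1500 38.254.211 dxifour.dxi.com 4716968 2862227 0
loop 1536 loopback-net localhost 0 0 0
"""

interfaces = parse_netstat_i(sample)
for row in interfaces:
    print(row["Name"], f"{drop_percent(row):.2f}% dropped")
```

Because the counters run from boot time, comparing two snapshots taken some minutes apart gives a better picture of current behaviour than a single reading.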

netstat -rn shows the routing table of your system, for example:
netstat -rn
Routing tables (10 entries)
Destination Gateway Flags ttl Use Interface
default 200.89.161.216 UGP PERM 537225 eg1
193.9.4.1 200.89.161.223 UGHP PERM 61325 eg1
127.0.0.1 127.0.0.1 UHP PERM 0 loop
191.99.8.40 200.89.161.245 UGHD 29 937 eg1
For more information about netstat, type man netstat at the prompt.

Kernel Management.
The kernel is the heart of a Unix operating system. It manages memory, schedules
processes, manages I/O, and does all of the other low-level jobs. Because it does all
these important jobs, it is always resident in the physical memory of a Unix system.
Other programs and processes can be swapped or paged out, but the kernel always stays in
physical memory. That is why it should be as small as possible. To configure the kernel,
log in as root on the system console and use the utility provided by your system: HP-UX
uses sam, AIX uses smit, SCO uses scoadmin, and Dynix uses menu. What you can do is
check that every piece of software and every driver in the kernel is really needed; if
not, remove it, then rebuild and replace the current kernel.
