UNIX network performance analysis

Quick methods for finding UNIX performance problems
Skill Level: Intermediate

Martin Brown (mc@mcslp.com)
Professional writer, Freelance

08 Sep 2009

Knowing your UNIX® network layout will go a long way toward understanding your network and how it operates. But what happens when the performance of your UNIX network, and the speed at which you can transfer files or connect to services, suddenly drops? How do you diagnose the issues and work out where in your network the problems lie? This article looks at some quick methods for finding and identifying performance issues and the steps to start resolving them.

The performance of your network can have a significant impact on the general performance and reliability of the rest of your environment. If different applications and services are waiting for data over the network, or your clients are having trouble connecting or receiving the information, then you need to address these issues. Performance issues can also affect the reliability of your applications and environment: they can be triggered by network faults and, in some cases, can even be the cause of a network fault. To understand and diagnose network issues, you first need to understand the nature of the issue; usually the problem will be related either to latency or to bandwidth. In general, network performance issues are often tied to the underlying hardware; you cannot exceed the physical limits of the network environment. Performance issues are also usually specific to a particular protocol or system, such as NFS or Web access. But you can diagnose and identify the issues from within the operating system so that you can determine the correct course of action.
UNIX network performance analysis © Copyright IBM Corporation 2009. All rights reserved.


This article looks at the following steps involved in identifying performance issues:
• Getting a baseline performance level
• Determining where the problem lies
• Getting statistics
• Identifying the bottleneck

Understanding network metrics

To understand and diagnose performance issues, you first need to determine your baseline performance level. Let's first introduce two of the key concepts used in determining baseline performance: network latency and network bandwidth.

Network latency

The network latency is the time between sending a request to a destination and the destination actually receiving the sent packet. As a metric for network performance, increased latency is a good indicator of a busy network, as it indicates either that the number of packets being transmitted exceeds the capacity, or that the senders of data are having to wait before either transmission or re-transmission.

Network latency can also be introduced as the complexity of the network, and the number of hosts or gateways that a packet has to travel through, increases. The length of cable between points can also have an effect on the latency. For long distances, traditional copper cable will generally be slower than a fibre optic connection.

Network latency is also different from application latency. Network latency deals exclusively with the transmission of packets over the network, while application latency refers to the delay between the application receiving a request and its ability to respond.

Network bandwidth

Bandwidth is a measure of the amount of data (the number of packets) that can be transmitted over a network during a specific period of time. The bandwidth affects how much data can be transmitted: it either limits the transmission of data to one host to the practical maximum supported by the network connection, or limits the aggregate transmission rate when dealing with multiple simultaneous connections.

The network bandwidth should, in theory, never change, unless you change the networking interface and hardware. The major variable within network bandwidth is the number of hosts using the network at any given time. For example, a 1Gb Ethernet interface can talk at 1Gb/s to one other network host, at 100Mb/s to each of ten simultaneous hosts, or at 10Mb/s to each of 100 hosts. In reality, of course, the sustained bandwidth is not often required. There will be many hundreds of smaller requests from a number of different hosts over a period of time, and so the available bandwidth of a server can appear much greater than the sum of the client bandwidth.
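To see how latency and shared bandwidth combine, the following Python sketch estimates the time to move one payload as a fixed latency plus the time needed to push the bytes through an even share of the link. The even split, the 0.5 ms latency and the 100 MB payload are hypothetical, illustrative assumptions; real traffic is bursty and rarely divides so neatly.

```python
def transfer_time_s(size_bytes: int, link_bps: float, active_hosts: int, latency_s: float) -> float:
    """Rough time to move one payload: fixed latency plus serialization at an even share of the link."""
    if active_hosts < 1:
        raise ValueError("need at least one active host")
    share_bps = link_bps / active_hosts  # idealized: every active host gets an equal slice
    return latency_s + size_bytes * 8 / share_bps


LINK_BPS = 1e9          # 1Gb/s Ethernet, as in the example above
LATENCY_S = 0.0005      # hypothetical 0.5 ms network latency
PAYLOAD = 100 * 10**6   # hypothetical 100 MB transfer

for hosts in (1, 10, 100):
    seconds = transfer_time_s(PAYLOAD, LINK_BPS, hosts, LATENCY_S)
    print(f"{hosts:>3} hosts: {LINK_BPS / hosts / 1e6:g} Mb/s each, ~{seconds:.2f} s per 100 MB")
```

With these numbers the latency term is negligible next to the serialization time, which is why bulk transfers are bandwidth-bound. For many small requests the balance reverses: a 1 KB request takes only about 8 microseconds to serialize at 1Gb/s, so the fixed latency dominates.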

Getting statistics

Before you can identify whether there is a problem within your network, you first need to have a baseline performance on which to base your assumptions. To do this you must check the various parameters (latency, performance and any tests relevant to your network application environment) to determine the performance, and then monitor and compare this over time.

When performing the baseline networking tests, you should do them under controlled conditions. Ideally, you should perform them both in isolation (meaning with no other network traffic) and with typical network traffic, to give you two baselines:
• For the isolated monitoring, you should check the performance between the server and one or more clients when there is no other traffic on the network. This means either shutting down other services on both clients and servers, or, ideally, putting the server and client into an isolated network environment completely separate from (but identical to) your standard network environment.
• For the standard monitoring, you should have the clients and servers attached to your standard network, and have the normal background traffic working, but all application-specific traffic (such as e-mail, file serving, Web serving) disabled, except on the server that you are testing.

For the actual testing process, there are a number of standard tools and tests that you can perform to determine your baseline values.

Measuring latency

The ping tool is well known to all network administrators as a basic tool for checking the availability and latency of a network device. Ping should work with most machines, providing they have been configured to respond to the ICMP packets that the ping tool sends to the device. Essentially, ping sends an echo packet to the device, and expects the device to echo the packet contents back.
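To make the echo packet concrete, the following Python sketch builds (but does not send) an ICMP echo request: an 8-byte header holding type 8, code 0, a checksum, an identifier and a sequence number, followed by the data bytes. The identifier value is an arbitrary choice for illustration, and the checksum is the standard Internet checksum from RFC 1071; this shows the packet format, not how any particular ping binary is written.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold any carry back into the low 16 bits
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)  # checksum field zero for now
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


# 56 data bytes, the usual ping default, plus the 8-byte ICMP header
packet = build_echo_request(identifier=0x1234, sequence=0, payload=b"\x00" * 56)
print(len(packet), "bytes")  # 64 bytes
```

The 56 data bytes plus the 8-byte header give the "64 bytes from ..." that ping reports for each reply. Actually putting the packet on the wire needs a raw socket, which usually requires root privileges.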

0.pri 64 bytes from example. 0% packet loss round-trip (ms) min/avg/max/stddev = 0.103 ms ms ms ms ms ms ms ms ms ms ----example PING Statistics---10 packets transmitted. It is possible to switch off support for ping. For getting baseline figures. Using ping to determine latency $ ping example PING example.pri 64 bytes from example.example.019 The example in Listing 2 was made during a quiet period on the network.0.0. If the host being checked (or the network itself) was busy during the testing period.169 ms 64 bytes from 192. you need to use the -s option to send more than one echo packet and get the timing information.137/0.168. the ping times could be increased significantly. icmp_seq=9.168. icmp_seq=6. and so you should ensure that you can reach the host before using it as a verification that a host is available. You can then use this to extract the timing information automatically (see Listing 2).0.168. icmp_seq=8. icmp_seq=4.pri 64 bytes from example.168.2: icmp_seq=1 ttl=64 time=0.example.168.2): (192.163/0.example. ping can monitor the time it takes to send and receive the response. However. icmp_seq=2. Listing 2.168. you can use the -c option (on Linux®) to specify the count.pri 64 bytes from example.143 time=0.168/0. you must specify the packet size (the default is 56 bytes).168.168.pri 64 bytes from example.0.169/0.134 time=0.001 ms You need to use Control-C to stop the ping process.example.146 time=0.2: icmp_seq=0 ttl=64 time=0.pri (192.com/developerWorks During the process.168.example.2): (192. Listing 1. ping alone is not necessarily an indicator of a problem. which can be an effective method of measuring the response time of the echo process. UNIX network performance analysis Page 4 of 14 © Copyright IBM Corporation 2009.168.pri (192. Specifying the packet size when using ping on Solaris/AIX $ ping -s example 56 10 PING example: 56 data bytes 64 bytes from example.pri 64 bytes from example. On Solaris/AIX. icmp_seq=3. 
0% packet loss round-trip min/avg/max/stddev = 0.0.103/0.168.168. In the simplest form.107 time= (192.163 time=0. icmp_seq=5.0. and the number of packets to send so that you do not have to manually terminate the process. 2 packets received. . icmp_seq=1. but it can occasionally give you a quick idea if there is something that needs to be identified.0.142 time=0.example.2): (192. you can send an echo request to a host and find out the response time (see Listing 1). time=0.143 time=0.2): (192.pri ping statistics --2 packets transmitted.developerWorks® ibm.167/0.pri 64 bytes from example. 10 packets received.pri 64 bytes from example.167 ms ^C --.2): 56 data bytes 64 bytes from 192. All rights reserved.0.example.example.pri 64 bytes from example.2): (192.example.example.2): (192.2): (192.0.example.example.2): icmp_seq=0. On Solaris and AIX®.example. icmp_seq=7.151 time= time=0.2): (192.
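Once ping runs with a fixed count, its output can be processed automatically. The following Python sketch pulls the min/avg/max/stddev summary out of the output and also recomputes those figures from the individual replies. The regular expressions are assumptions fitted to the output format in Listings 1 and 2; Linux iputils ping labels the last field mdev rather than stddev, so the summary pattern would need adjusting there.

```python
import re
import statistics

# Summary line printed by BSD-style ping (Listing 1) and by Solaris ping -s (Listing 2)
SUMMARY = re.compile(r"min/avg/max/stddev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)")
# Per-reply round-trip field, for example "time=0.169 ms"
RTT = re.compile(r"time=([\d.]+) ms")


def parse_summary(text: str) -> dict:
    """Pull ping's own min/avg/max/stddev figures out of its output."""
    match = SUMMARY.search(text)
    if match is None:
        raise ValueError("no min/avg/max/stddev summary found")
    return dict(zip(("min", "avg", "max", "stddev"), map(float, match.groups())))


def summarize_replies(text: str) -> dict:
    """Recompute the same figures from the individual replies."""
    rtts = [float(t) for t in RTT.findall(text)]
    if not rtts:
        raise ValueError("no replies found")
    return {
        "min": min(rtts),
        "avg": statistics.fmean(rtts),
        "max": max(rtts),
        "stddev": statistics.pstdev(rtts),  # population standard deviation
    }


listing1 = """\
64 bytes from 192.168.0.2: icmp_seq=0 ttl=64 time=0.169 ms
64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.167 ms
round-trip min/avg/max/stddev = 0.167/0.168/0.169/0.001 ms
"""
print(parse_summary(listing1))
print({k: round(v, 3) for k, v in summarize_replies(listing1).items()})
```

For Listing 1 both functions agree on 0.167/0.168/0.169/0.001. Using the population standard deviation is an assumption about how your ping computes stddev; it happens to reproduce the figure shown here.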

ibm. you can time a simple file transfer test. That said. you can run spray specifying the hostname (see Listing 3). using spray can tell you whether there is a lot of traffic on the network. Using sprayd The sprayd daemon and the associated spray tool send a large stream of packets to a specified host and determine how many of those packets get a response. packets sent using connectionless transport are not guaranteed to reach their destination. so that you can track the average response times and then identify where to start looking. it should not be relied on as a performance metric because it uses a connectionless transport mechanism.. and then time how long it takes to transfer the file over a network to another machine (see Listing 4). 101 packets (8. All rights reserved. For example. Using simple network transfer tests The best method for determining the bandwidth performance of your network is to check the actual speed when transferring data to or from the machine. Listing 3. Using spray $ spray tiger sending 1162 packets of length 86 to tiger . You may need to enable the spray daemon (usually through inetd) to use it. and even continually. Timing the length of time to transfer a file over a network to another UNIX network performance analysis © Copyright IBM Corporation 2009.com/developerWorks developerWorks® Ideally. because if the connectionless transport (UDP) is dropping packets. then it probably means the network (or the host) is too busy to carry the packets. you should track the ping times between specific hosts over a period of time.. Once the sprayd daemon has been started. and some other UNIX platforms. By definition. and so dropped packets are allowed in the communication anyway. 2GB: $ mkfile 2g 2gbfile). the speed should not be relied upon.692%) dropped by tiger 70 packets/sec. but the dropped packet counts can be a useful metric. 6078 bytes/sec As already mentioned. 
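Using simple network transfer tests

The best method for determining the bandwidth performance of your network is to check the actual speed when transferring data to or from the machine. There are lots of different tools that you can use to perform the tests across a number of different applications and protocols, but usually the simplest method is the most effective one. For example, to determine the network bandwidth when transferring a file over the network using NFS, you can time a simple file transfer test. To create the test, create a large file using mkfile (for example, a 2GB file: $ mkfile 2g 2gbfile), and then time how long it takes to transfer the file over a network to another machine (see Listing 4).

Listing 4. Timing the length of time to transfer a file over a network to another machine
$ time cp /nfs/mysql-live/transient/2gbfile .
real    3m45.648s
user    0m0.010s
sys     0m9.840s

You should run the tests multiple times and then take the average of the transfer times to get an idea of the standard performance.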
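The elapsed time is easier to compare across tests once it is expressed as throughput. The following Python sketch converts the real line from Listing 4 into MiB/s and Mbit/s, assuming that mkfile's g suffix means 1024^3 bytes. The figure covers everything between the NFS server's disk and the local disk, so treat it as an application-level throughput rather than raw wire speed.

```python
import re


def parse_time_real(line: str) -> float:
    """Convert a `time` line such as 'real    3m45.648s' into seconds."""
    match = re.search(r"(\d+)m([\d.]+)s", line)
    if match is None:
        raise ValueError(f"unrecognised time format: {line!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def throughput(size_bytes: int, seconds: float) -> dict:
    """Average transfer rate in binary megabytes and in decimal megabits per second."""
    return {
        "MiB/s": size_bytes / seconds / 2**20,
        "Mbit/s": size_bytes * 8 / seconds / 1e6,
    }


size = 2 * 2**30  # assumes mkfile 2g creates 2 x 1024^3 bytes
elapsed = parse_time_real("real    3m45.648s")
for unit, value in throughput(size, elapsed).items():
    print(f"{value:.1f} {unit}")  # 9.1 MiB/s, then 76.1 Mbit/s
```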

You can automate the copy and timing process by using a Perl script, like the one in Listing 5.

Listing 5. Automate the copy and timing process with a Perl script
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;     # provides timeit()
use File::Copy;    # provides copy()

# Arguments: file name, source directory, optional number of copies (default 10)
my $file   = shift or die "Need a file to copy from\n";
my $srcdir = shift or die "Need a source directory to copy from\n";
my $count  = shift || 10;

# Copy the file from the source directory into the current directory $count times
my $t = timeit($count, sub {
    copy(sprintf("%s/%s", $srcdir, $file), $file) or die "Copy failed: $!\n";
});

# $t->[0] is the total elapsed (wall-clock) time; report the average per copy
printf("Time is %.2fs\n", $t->[0] / $count);

To execute the script, supply the name of the source file and the source directory, and an optional count of the number of copies to make. You can then execute the script and get a time (see Listing 6).

Listing 6. Executing the Perl script
$ ./timexfer.pl 2gbfile /nfs/mysql-live/transient 20
Time is 28.45s

You can use this both to create a baseline figure and during normal operations to check the transfer performance.
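To turn repeated runs into a baseline that later measurements can be checked against, one simple approach is to keep the mean and standard deviation of the quiet-period timings and flag anything far above them. The following Python sketch does that; the sample timings are hypothetical, and the three-standard-deviation threshold is an assumed rule of thumb, not something the tools above prescribe.

```python
import statistics


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of repeated baseline runs."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_slower_than_baseline(value: float, baseline: tuple[float, float], k: float = 3.0) -> bool:
    """Flag a measurement more than k standard deviations above the baseline mean."""
    mean, sd = baseline
    return value > mean + k * sd


# Hypothetical per-copy times (seconds) from repeated quiet-period runs of the script
baseline = build_baseline([28.1, 28.6, 28.4, 28.9, 28.3])
print(is_slower_than_baseline(28.7, baseline))  # False: within normal variation
print(is_slower_than_baseline(45.0, baseline))  # True: well outside it
```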

Diagnosing a problem

Typically, you will identify a network problem only when a network-related application fails for some reason. However, it is important to establish that the problem is network related and not a problem elsewhere. First, you should try to reach the machine using ping. If the machine does not respond to a ping request, and other network communication does not work, then your first option should be to check the physical cables and make sure everything is still connected.

If you can still connect to the machine, but the ping time is increased, then you need to determine where the problem lies. An increase in ping times can in rare cases be related to the load on the machine, but more often than not it indicates an issue with the network. Once you get a long ping time from one machine, you should run ping from another machine on the network, ideally on a different network switch, to find out whether the problem is related to the specific machine or to the network.

Checking network stats

If the ping times are higher than you expect, then you should start to get some basic statistics about the network interface you are using, to see if the problem is related to the network interface or to a specific protocol. Under Linux, you can get some basic network statistic information by using the ifconfig tool (see Listing 7).

Listing 7. Getting basic network statistic information using the ifconfig tool
$ ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:1a:ee:01:01:c0
          inet addr:192.168.3.2  Bcast:192.168.3.255  Mask:255.255.252.0
          inet6 addr: fe80::21a:eeff:fe01:1c0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7916836 errors:0 dropped:78489 overruns:0 frame:0
          TX packets:6285476 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:11675092739 (10.8 GiB)  TX bytes:581702020 (554.7 MiB)
          Interrupt:16 Base address:0x2000

The important rows are those beginning RX and TX, which show information about the packets received and sent. The packets value is a simple count of the packets transferred. The errors, dropped, and overruns figures show how many of the packets indicated some kind of fault. A high number of dropped packets in comparison to the packets transferred probably indicates that the network is busy.
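Whether a dropped count is "high" depends on how many packets were transferred, so it helps to express it as a ratio. The following Python sketch computes the drop percentage per direction from counter lines in the format of Listing 7. The pattern matches only that older ifconfig layout (newer versions of ifconfig print these counters differently), and because the counters are cumulative, in practice you would compare two snapshots taken some time apart.

```python
import re

# Matches the ifconfig counter lines in the format shown in Listing 7
COUNTERS = re.compile(r"(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+)")


def drop_rates(ifconfig_output: str) -> dict:
    """Percentage of packets dropped in each direction."""
    rates = {}
    for direction, packets, _errors, dropped in COUNTERS.findall(ifconfig_output):
        total = int(packets)
        rates[direction] = 100.0 * int(dropped) / total if total else 0.0
    return rates


listing7 = """\
          RX packets:7916836 errors:0 dropped:78489 overruns:0 frame:0
          TX packets:6285476 errors:0 dropped:0 overruns:0 carrier:0"""
for direction, pct in drop_rates(listing7).items():
    print(f"{direction}: {pct:.2f}% dropped")  # RX: 0.99% dropped, then TX: 0.00% dropped
```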

You can also get extended statistic information on all platforms by using the netstat tool. Under Linux, the tool provides more specific base protocol statistics, such as the packet transmissions for TCP/IP and UDP packet types (see Listing 8).

Listing 8. Using netstat
$ netstat -s
Ip:
    8437387 total packets received
    1 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    8437383 incoming packets delivered
    6820934 requests sent out
    6 reassemblies required
    3 packets reassembled ok
Icmp:
    502 ICMP messages received
    3 input ICMP message failed.
    ICMP input histogram:
        destination unreachable: 410
        echo requests: 82
        echo replies: 10
    1406 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 1313
        echo request: 11
        echo replies: 82
IcmpMsg:
        InType0: 10
        InType3: 410
        InType8: 82
        OutType0: 82
        OutType3: 1313
        OutType8: 11
Tcp:
    8361 active connections openings
    6846 passive connection openings
    1 failed connection attempts
    164 connection resets received
    33 connections established
    8305361 segments received
    6688553 segments send out
    640 segments retransmitted
    0 bad segments received.
    676 resets sent
Udp:
    126083 packets received
    1294 packets to unknown port received.
    0 packet receive errors
    130335 packets sent
UdpLite:
TcpExt:
    5 packets pruned from receive queue because of socket buffer overrun
    6792 TCP sockets finished time wait in fast timer
    5681 delayed acks sent
    Quick ack mode was activated 11637 times
    150861 packets directly queued to recvmsg prequeue.
    74333 bytes directly in process context from backlog
    9141882 bytes directly received in process context from prequeue
    3608274 packet headers predicted
    42627 packets header predicted and directly queued to user
    77132 acknowledgments not containing data payload received
    374105 predicted acknowledgments
    2 times recovered from packet loss by selective acknowledgements
    77 congestion windows recovered without slow start after partial ack
    1 TCP data loss events
    17 timeouts after SACK recovery
    2 fast retransmits
    8 retransmits in slow start
    236 other TCP timeouts
    1453 packets collapsed in receive queue due to low socket buffer
    11634 DSACKs sent for old packets
    2 DSACKs sent for out of order packets
    2 DSACKs received
    77 connections reset due to unexpected data
    50 connections aborted due to timeout
    TCPDSACKIgnoredNoUndo: 1
    TCPSackShiftFallback: 23
IpExt:
    InBcastPkts: 4126

Under Solaris and other UNIX variants, the information provided by netstat differs depending upon the platform. For example, under Solaris, you get detailed statistics for each protocol, and separate information for IPv4 and IPv6 connections (see Listing 9). The output in the listing has been truncated.

Listing 9. Using netstat on Solaris
$ netstat -s

RAWIP   rawipInDatagrams    =   440     rawipInErrors       =     0
        rawipInCksumErrs    =     0     rawipOutDatagrams   =    91
        rawipOutErrors      =     0

UDP     udpInDatagrams      = 15756     udpInErrors         =     0
        udpOutDatagrams     = 16515     udpOutErrors        =     0

TCP     tcpRtoAlgorithm     =     4     tcpRtoMin           =   400
        tcpRtoMax           = 60000     tcpMaxConn          =    -1
        tcpActiveOpens      =  1735     tcpPassiveOpens     =    54
        tcpAttemptFails     =     2     tcpEstabResets      =    35
        tcpCurrEstab        =     2     tcpOutSegs          =13771839
        tcpOutDataSegs      =13975728   tcpOutDataBytes     =1648876686
        tcpRetransSegs      = 90215     tcpRetransBytes     =130340273
        tcpOutAck           =151539     tcpOutAckDelayed    =  5570
        tcpOutUrg           =     0     tcpOutWinUpdate     =    31
        tcpOutWinProbe      =    86     tcpOutControl       =  3750
        tcpOutRsts          =    63     tcpOutFastRetrans   =     6
        tcpInSegs           =7548720
        tcpInAckSegs        =2882026    tcpInAckBytes       =1648874900
        tcpInDupAck         =4413016    tcpInAckUnsent      =     0
        tcpInInorderSegs    =415007     tcpInInorderBytes   =367832646
        tcpInUnorderSegs    =  7650     tcpInUnorderBytes   =10389516
        tcpInDupSegs        =   222     tcpInDupBytes       = 74649
        tcpInPartDupSegs    =     0     tcpInPartDupBytes   =     0
        tcpInPastWinSegs    =     0     tcpInPastWinBytes   =     0
        tcpInWinProbe       =     0     tcpInWinUpdate      =     2
        tcpInClosed         =    33     tcpRttNoUpdate      =   660
        tcpRttUpdate        =2880379    tcpTimRetrans       =  2262
        tcpTimRetransDrop   =    10     tcpTimKeepalive     =   630
        tcpTimKeepaliveProbe=   314     tcpTimKeepaliveDrop =    17
        tcpListenDrop       =     0     tcpListenDropQ0     =     0
        tcpHalfOpenDrop     =     0     tcpOutSackRetrans   = 69348

In all cases, you are looking for a high level of error packets, retransmissions, or dropped packets, all of which indicate that the network is busy. If the error rate is excessively high compared to the packets transmitted or received, then it may indicate a fault with the network hardware.
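One way to judge whether retransmissions are "high" is to compare them with the number of segments sent. The following Python sketch does this for both listings: it reads the Linux counters by their wording in Listing 8 (accepting both "send out" and "sent out") and collects the Solaris name = value pairs from Listing 9. Dividing tcpRetransSegs by tcpOutDataSegs on Solaris is an assumed choice of denominator, used here as a rule-of-thumb ratio.

```python
import re


def linux_retransmit_pct(netstat_s: str) -> float:
    """Retransmitted TCP segments as a percentage of segments sent (Linux netstat -s wording)."""
    sent = int(re.search(r"(\d+) segments sen[dt] out", netstat_s).group(1))
    retrans = int(re.search(r"(\d+) segments retransmitted", netstat_s).group(1))
    return 100.0 * retrans / sent


def solaris_counters(netstat_s: str) -> dict:
    """Collect every 'name = value' pair from Solaris netstat -s output."""
    return {name: int(value) for name, value in re.findall(r"(\w+)\s*=\s*(-?\d+)", netstat_s)}


linux = """    6688553 segments send out
    640 segments retransmitted"""
solaris = """        tcpOutDataSegs      =13975728   tcpOutDataBytes     =1648876686
        tcpRetransSegs      = 90215     tcpRetransBytes     =130340273"""

counters = solaris_counters(solaris)
print(f"Linux:   {linux_retransmit_pct(linux):.4f}% retransmitted")  # 0.0096%
print(f"Solaris: {100.0 * counters['tcpRetransSegs'] / counters['tcpOutDataSegs']:.2f}% retransmitted")  # 0.65%
```

Both figures are cumulative since the counters were last reset, so a recent problem shows up more clearly in the difference between two snapshots than in the totals.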

Checking NFS stats

When checking problems related to NFS connections, and indeed most other network applications, you should first ensure that the issue is not related to a problem on the machine, such as high load (which will obviously affect the speed at which requests can be processed). A simple check using uptime, and ps to identify the processes, will tell you how busy the machine is.

You can also check the statistics that are generated by the NFS service. The nfsstat command generates detailed stats for both the server and client side of the NFS service. For example, the statistics in Listing 10 show the detailed NFS v3 statistics for the server side of the NFS service, selected by using the -s command-line option, with -v to specify the NFS version.

Listing 10. nfsstat command with -s and -v command-line options
$ nfsstat -s -v3

Server rpc:
Connection oriented:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs
36118      0          0          0          0          410        0
Connectionless:
calls      badcalls   nullrecv   badlen     xdrcall    dupchecks  dupreqs
75         0          0          0          0          0          0

Server NFSv3:
calls     badcalls
35847     0
Version 3: (35942 calls)
null         getattr      setattr      lookup       access       readlink
15 0%        190 0%       83 0%        3555 9%      21222 59%    0 0%
read         write        create       mkdir        symlink      mknod
9895 27%     300 0%       7 0%         0 0%         0 0%         0 0%
remove       rmdir        rename       link         readdir      readdirplus
0 0%         0 0%         0 0%         0 0%         37 0%        20 0%
fsstat       fsinfo       pathconf     commit
521 1%       2 0%         1 0%         94 0%

Server nfs_acl:
Version 3: (0 calls)
null         getacl       setacl       getxattrdir
0 0%         0 0%         0 0%         0 0%

A high number of badcalls indicates that bad requests are being sent to the server, which may mean that a client is not functioning correctly and is submitting bad requests, either due to a software problem or faulty hardware.
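The following Python sketch reproduces the arithmetic behind Listing 10: each operation's share of the 35942 NFSv3 calls, and the badcalls ratio. The percentages in the listing appear to be truncated rather than rounded (lookup is 9.89% of calls but is shown as 9%), so the sketch truncates too. That observation, and the choice of counts copied from the listing, are the only inputs; the function names are illustrative.

```python
def nfs_op_percentages(counts: dict[str, int]) -> dict[str, int]:
    """Whole-number share of each operation, truncated as the percentages in Listing 10 appear to be."""
    total = sum(counts.values())
    return {op: (100 * n) // total for op, n in counts.items()}


def badcall_ratio(calls: int, badcalls: int) -> float:
    """Fraction of RPC calls that the server rejected as bad."""
    return badcalls / calls if calls else 0.0


# Non-zero NFSv3 operation counts from Listing 10 (35942 calls in total)
ops = {"null": 15, "getattr": 190, "setattr": 83, "lookup": 3555, "access": 21222,
       "read": 9895, "write": 300, "create": 7, "readdir": 37, "readdirplus": 20,
       "fsstat": 521, "fsinfo": 2, "pathconf": 1, "commit": 94}
shares = nfs_op_percentages(ops)
print(shares["access"], shares["read"], shares["lookup"])  # 59 27 9
print(badcall_ratio(36118, 0))  # 0.0 for the connection-oriented RPC calls
```

Tracking the badcalls ratio, rather than the raw count, makes a misbehaving client stand out even on a busy server.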

Ping times in larger networks

If you can ping the machine, but the network performance is still a problem, then you need to determine where in your network the performance problem is located. In a larger network where different segments of your network are separated by routers, you can use the traceroute tool to determine whether there is a specific point in the route between the two machines where there is a problem. Related to the ping tool, traceroute will normally provide you with the ping times for each router that the network packets travel through to reach their destination, which can help you isolate where the problem is.

This can also be used to identify potential problems when sending packets over the Internet, where different routers are used at different points to transmit packets between different Internet Service Providers (ISPs). For example, the trace shown in Listing 11 is between two offices in the UK that use two different ISPs.

Listing 11. traceroute between two offices in the UK
$ traceroute gendarme.example.com
traceroute to gendarme.example.com (82.70.138.102), 30 hops max, 40 byte packets
 1  voyager.example.pri (192.168.1.1)  14.998 ms  95.530 ms  4.922 ms
 2  dsl. ...
 3  rt-gw1. ... .vispa.net.uk ...
 4  195.50. ...
 5  ae-11-11.car1. ... .Level3.net ...
 6  PACKET-EXCH.car1. ... .Level3.net ...
 7  spinoza-ae2-0. ...
 8  galileo-fe-3-1-172.hq.zen.net.uk ...
 9  * * *
10  * * *
11  * * *
12  * * *

In this case, the destination machine cannot be reached due to a fault.
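When reading a trace, you are usually looking for the hop where round-trip times jump and stay high, or where replies stop altogether. The following Python sketch parses traceroute output into per-hop times, then reports the largest jump in best-case time between consecutive responding hops and the first hop that gave no reply. The sample trace uses hypothetical hosts and documentation addresses. Routers often rate-limit or deprioritize the replies traceroute relies on, so a jump at one hop that does not carry through to later hops is not necessarily the problem.

```python
import re

HOP = re.compile(r"\s*(\d+)\s+(.*)")   # hop number followed by the rest of the line
RTT = re.compile(r"([\d.]+) ms")       # each round-trip time, for example "14.998 ms"


def parse_traceroute(text: str) -> list:
    """Return (hop, [rtt_ms, ...]) per hop line; silent '* * *' hops get an empty list."""
    hops = []
    for line in text.splitlines():
        match = HOP.match(line)
        if match:
            hops.append((int(match.group(1)), [float(v) for v in RTT.findall(match.group(2))]))
    return hops


def locate_trouble(hops: list) -> tuple:
    """Largest rise in best-case RTT between consecutive responding hops, and the first silent hop."""
    biggest_jump, previous_best, first_silent = None, None, None
    for hop, rtts in hops:
        if not rtts:
            if first_silent is None:
                first_silent = hop
            continue
        best = min(rtts)  # the minimum is least distorted by transient queueing
        if previous_best is not None:
            jump = best - previous_best
            if biggest_jump is None or jump > biggest_jump[1]:
                biggest_jump = (hop, jump)
        previous_best = best
    return biggest_jump, first_silent


trace = """traceroute to host.example.com (203.0.113.10), 30 hops max, 40 byte packets
 1  gw.example.pri (192.168.1.1)  4.9 ms  5.1 ms  5.0 ms
 2  edge.isp-a.example (198.51.100.1)  30.2 ms  31.0 ms  30.5 ms
 3  core.isp-b.example (198.51.100.9)  92.4 ms  95.0 ms  93.1 ms
 4  * * *
 5  * * *"""

jump, silent = locate_trouble(parse_traceroute(trace))
print(f"largest jump at hop {jump[0]} (+{jump[1]:.1f} ms); first silent hop: {silent}")
```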

In a smaller network you are unlikely to have routers separating the networks, and so traceroute will not be of any help.

Summary

Identifying UNIX network performance issues can be difficult from a single machine, because the problem is usually spread across the network. It is usually possible, though, to use ping and/or traceroute to narrow down the problem machine by looking at the performance from different points within your network. Both ping and traceroute rely on being able to reach a host to determine the problem. Once you have some starting points, you can use the other network tools to get more detailed information about the protocol or application that is causing the problem.

This article looked at the basic methods to get baseline information and then the different tools that can be used to zero in on the issue. You are now armed with some knowledge and techniques to deal with UNIX network performance.


About the author

Martin Brown

Martin Brown has been a professional writer for over eight years. He is the author of numerous books and articles across a range of topics. His expertise spans myriad development languages and platforms (Perl, Python, Java, JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux, BeOS, Mac OS/X and more) as well as Web programming, systems management and integration. Martin is a regular contributor to ServerWatch.com, LinuxToday.com and IBM developerWorks, and a regular blogger at Computerworld, The Apple Blog and other sites, as well as a Subject Matter Expert (SME) for Microsoft. He can be contacted through his Web site at http://www.mcslp.com.
