
Deep-protocol analysis of UNIX networks

Martin Brown (mc@mcslp.com)


Freelance Writer
Author

08 June 2010

Whether you are monitoring your network to identify performance issues, debugging
an application, or have found an application on your network that you do not recognize,
occasionally you need to look deep into the protocols being used on your UNIX network to
understand what they are doing. Some protocols are easy to identify and understand, even
when used on non-standard ports. Others need more investigation to understand what they are
doing and what information they are exchanging. In this article, we will take a look at techniques
for performing detailed analysis of the protocols in use on your UNIX network.

Introduction
Networks have become so ubiquitous that in many cases we take the use of the network to
communicate with different machines, inside and outside of our network, for granted. Most of the
time this isn't an issue, but there are times when you need to take a closer look at your network
and find out what is going on.
There are a number of reasons to take a closer look at the contents of the network traffic. The first
is that you may simply be debugging an existing network application or one you are developing,
and want to monitor the traffic going past on your network. The second reason is to identify traffic
on your network that may be using up bandwidth and resources. For the former, you probably
already know the contents of the protocol, but you may want to get a more detailed look at the
actual data being transferred, for example, when using web services. For the latter, identifying the
contents of the packets requires some extensive knowledge of the protocols being used.
With both TCP/IP and UDP/IP communications, the key elements are the IP addresses used
to identify the hosts and the port number. The port number is used to provide additional
communication channels so that you can support multiple connections between two hosts. There
are some standards in the port definitions. For example, port 25 is for email (SMTP) traffic, and
most websites operate on port 80 (HTTP). These conventions are used to allow programs to
communicate with each other over a known channel in the same way as you would choose a
phone or fax number.
While these conventions exist, there is no limit or restriction on what ports you use. In fact, in many
cases, subversive network applications and some security methods will deliberately use
non-standard ports. For example, some will hide content by misusing a standard port with a different
protocol, like using HTTP over port 25. Other examples include using a different port from the
standard so that it is not obvious which port is being used for the traffic (like using port 99 for
HTTP), or by encapsulating specific protocol traffic within another protocol. This last method is
actually the one used by network tunneling and virtual private networks (VPNs).
Regardless of the reasons or complexities of the network traffic, the first step is always to start
recording the data.

Recording raw data


There are a number of different tools available if you want to record the raw network data so
that you can examine the information yourself. Most of the network sniffers will also decode and
decipher specific packet contents, which will help you when you want to study the content of a
recognized protocol.
Under Solaris you can use the snoop tool, or under AIX the iptrace tool. You can also try the cross-platform tcpdump tool, which is supported on most UNIX and Linux operating systems. These
tools both capture and decode content, often performing the bulk of
the protocol analysis for you. Note that with modern switches the Ethernet packets are not
echoed to every port, which often limits the information you can extract to the current host. Many
modern switches provide a management port that often carries a copy of all packets for exactly
this type of monitoring.
The primary complexity behind decoding network transmission is the levels of information that are
provided within the network packets. In addition, much of this information is also sent encoded in
binary format, and capturing pure raw packets off of the network requires a significant amount of
work to pick out the data that you need. By using a tool that provides some of the processing, you
can simplify the process of decoding network data.
To give an example, on an Ethernet network when looking at a typical TCP/IP protocol, the data
transmitted over the network will include:
• Ethernet packet headers, including the Ethernet source and destination address, packet size
  and the Ethernet packet type.
• IP header, consisting of the IP addressing (source and destination), protocol identity and IP
  flags. You will also get information about the fragmentation and packet sequence.
• TCP header, which includes information on the port, implied protocol, flags and sequencing
  numbers.
Even with all this information, we still haven't reached the actual content. Inside the TCP (or UDP)
payload will be further protocols: either standard data protocols (including HTTP, SMTP and FTP), or
encapsulating protocols such as Remote Procedure Call (RPC), which in turn carries protocols such as
NFS.


Often these tools rely on the protocol and/or port number to identify the content being transferred.
So, if traffic is being transferred on a non-standard port, the information may not be decoded
properly.
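As a rough illustration of the layering described above, the following sketch peels the Ethernet, IP and TCP headers off a raw frame. It is written in Python rather than this article's Perl so that it needs only the standard library. The function names and the hand-built sample frame are assumptions made for illustration; the addresses and ports are borrowed from the snoop listings in this article, and the checksums are simply left as zero.

```python
import socket
import struct


def parse_frame(frame: bytes) -> dict:
    """Peel the Ethernet, IPv4 and TCP headers off a raw Ethernet frame."""
    # Ethernet header: 6-byte destination, 6-byte source, 2-byte type.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    info = {
        "eth_dst": ":".join(f"{b:02x}" for b in dst),
        "eth_src": ":".join(f"{b:02x}" for b in src),
        "ethertype": ethertype,
    }
    if ethertype != 0x0800:            # not IPv4, nothing more to decode here
        return info

    ip = frame[14:]
    ip_header_len = (ip[0] & 0x0F) * 4  # header length is counted in 32-bit words
    info["protocol"] = ip[9]
    info["ip_src"] = socket.inet_ntoa(ip[12:16])
    info["ip_dst"] = socket.inet_ntoa(ip[16:20])
    if info["protocol"] != 6:           # not TCP
        return info

    tcp = ip[ip_header_len:]
    info["src_port"], info["dst_port"] = struct.unpack("!HH", tcp[:4])
    tcp_header_len = (tcp[12] >> 4) * 4  # data offset, also in 32-bit words
    info["payload"] = tcp[tcp_header_len:]
    return info


def build_sample_frame() -> bytes:
    """Build a synthetic frame (illustrative only; checksums are zero)."""
    payload = b"hello"
    eth = (bytes.fromhex("001aee0101c0") + bytes.fromhex("0021283cc061")
           + struct.pack("!H", 0x0800))
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40 + len(payload), 27460,
                     0x4000, 64, 6, 0,
                     socket.inet_aton("192.168.0.112"),
                     socket.inet_aton("192.168.0.2"))
    tcp = struct.pack("!HHIIBBHHH", 2049, 889, 0, 0, 5 << 4, 0x18, 32806, 0, 0)
    return eth + ip + tcp + payload


print(parse_frame(build_sample_frame()))
```

Only after all three headers have been stripped does the application payload appear, which is why tools that automate this step save so much work.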

Basic network analysis


Many of the network sniffing tools already mentioned in this article provide varying levels of
protocol decoding by looking at the port and content details to determine the protocol being used.
Both snoop and tcpdump decode a range of protocols carried over UDP and TCP, in varying levels of
detail. In snoop, for example, you can get detailed information
about NFS operations, from the top level of the protocol right down to the individual data blocks
transferred. To monitor NFS traffic with snoop, specify RPC
using the NFS protocol: $ snoop -v rpc nfs.
The output from this is quite detailed for each packet and deserves some closer investigation.
Listing 1 provides the Ethernet header data.

Listing 1. Ethernet header data


ETHER:  ----- Ether Header -----
ETHER:
ETHER:  Packet 64 arrived at 16:14:41.79434
ETHER:  Packet size = 238 bytes
ETHER:  Destination = 0:1a:ee:1:1:c0,
ETHER:  Source      = 0:21:28:3c:c0:61,
ETHER:  Ethertype = 0800 (IP)
ETHER:

The output here specifies that the Ethernet packet contains IP data, specifies the overall packet
size and time, and the destination and source Ethernet addresses for the packet.
Listing 2 shows the IP header. For identifying traffic, little of the IP data is useful beyond the
protocol and source/destination address information.

Listing 2. IP header
IP:   ----- IP Header -----
IP:
IP:   Version = 4
IP:   Header length = 20 bytes
IP:   Type of service = 0x00
IP:         xxx. .... = 0 (precedence)
IP:         ...0 .... = normal delay
IP:         .... 0... = normal throughput
IP:         .... .0.. = normal reliability
IP:         .... ..0. = not ECN capable transport
IP:         .... ...0 = no ECN congestion experienced
IP:   Total length = 224 bytes
IP:   Identification = 27460
IP:   Flags = 0x4
IP:         .1.. .... = do not fragment
IP:         ..0. .... = last fragment
IP:   Fragment offset = 0 bytes
IP:   Time to live = 64 seconds/hops
IP:   Protocol = 6 (TCP)
IP:   Header checksum = 4d11
IP:   Source address = 192.168.0.112, tiger.mcslp.pri
IP:   Destination address = 192.168.0.2, bear.mcslp.pri
IP:   No options
IP:
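To show where values such as "Header length = 20 bytes" and "do not fragment" come from, here is a minimal Python sketch (standard library only, used here instead of the article's Perl) that decodes the first ten bytes of an IPv4 header. The function name and the sample bytes are assumptions for illustration; the sample is packed from the values shown in Listing 2 rather than taken from a real capture.

```python
import struct


def decode_ip_basics(header: bytes) -> dict:
    """Decode the leading fixed fields of an IPv4 header."""
    ver_ihl, tos, total_len, ident, flags_frag = struct.unpack("!BBHHH", header[:8])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,         # IHL counts 32-bit words
        "type_of_service": tos,
        "total_length": total_len,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # offset counts 8-byte units
        "ttl": header[8],
        "protocol": header[9],
    }


# Ten bytes packed from the values in Listing 2 (illustrative, not captured).
sample = struct.pack("!BBHHHBB", 0x45, 0x00, 224, 27460, 0x4000, 64, 6)

for field, value in decode_ip_basics(sample).items():
    print(f"{field} = {value}")
```

The version and header length share a single byte (0x45), and the three flag bits share a 16-bit word with the fragment offset, which is why a decoder has to mask and shift rather than read whole bytes.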

In Listing 3 you can see the TCP header. Again, this information is usually useful only for the
source port and destination port numbers as these will either identify the expected protocol or
provide the information you need to further investigate traffic on this port.

Listing 3. TCP header


TCP:  ----- TCP Header -----
TCP:
TCP:  Source port = 2049
TCP:  Destination port = 889 (Sun RPC)
TCP:  Sequence number = 2834727685
TCP:  Acknowledgement number = 2654368001
TCP:  Data offset = 32 bytes
TCP:  Flags = 0x18
TCP:        0... .... = No ECN congestion window reduced
TCP:        .0.. .... = No ECN echo
TCP:        ..0. .... = No urgent pointer
TCP:        ...1 .... = Acknowledgement
TCP:        .... 1... = Push
TCP:        .... .0.. = No reset
TCP:        .... ..0. = No Syn
TCP:        .... ...0 = No Fin
TCP:  Window = 32806
TCP:  Checksum = 0x4852
TCP:  Urgent pointer = 0
TCP:  Options: (12 bytes)
TCP:    - No operation
TCP:    - No operation
TCP:    - TS Val = 34449495, TS Echo = 253458642
TCP:
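The Flags value in Listing 3 is a bit field, and snoop expands it one bit per line. A minimal Python sketch of the same expansion is shown below (Python rather than the article's Perl, to stay within the standard library); the table and function names are illustrative assumptions.

```python
# One entry per bit of the TCP flags byte, most significant bit first.
TCP_FLAGS = [
    (0x80, "CWR"), (0x40, "ECE"), (0x20, "URG"), (0x10, "ACK"),
    (0x08, "PSH"), (0x04, "RST"), (0x02, "SYN"), (0x01, "FIN"),
]


def tcp_flag_names(flags: int) -> list:
    """Return the names of the flags that are set in a TCP flags byte."""
    return [name for bit, name in TCP_FLAGS if flags & bit]


print(tcp_flag_names(0x18))  # ['ACK', 'PSH']
```

The value 0x18 therefore corresponds to the Acknowledgement and Push lines in the listing, the usual combination for a segment carrying data on an established connection.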

The penultimate section, Listing 4, shows the RPC header data.

Listing 4. RPC header data


RPC:  ----- SUN RPC Header -----
RPC:
RPC:  Record Mark: last fragment, length = 168
RPC:  Transaction id = 3041181596
RPC:  Type = 1 (Reply)
RPC:  This is a reply to frame 63
RPC:  Status = 0 (Accepted)
RPC:  Verifier   : Flavor = 0 (None), len = 0 bytes
RPC:  Accept status = 0 (Success)
RPC:
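When RPC runs over TCP, each record is preceded by a 4-byte record mark: the top bit flags the last fragment and the remaining 31 bits give the fragment length. The Python sketch below (standard library only, used in place of the article's Perl) decodes the record mark, transaction id and message type. The function name is an assumption, and the sample bytes are packed from the values in Listing 4 rather than captured.

```python
import struct

MESSAGE_TYPES = {0: "Call", 1: "Reply"}


def parse_rpc_prefix(data: bytes) -> dict:
    """Decode the record mark, transaction id and type of an RPC-over-TCP record."""
    mark, xid, msg_type = struct.unpack("!III", data[:12])
    return {
        "last_fragment": bool(mark & 0x80000000),   # top bit of the record mark
        "fragment_length": mark & 0x7FFFFFFF,       # remaining 31 bits
        "xid": xid,
        "type": MESSAGE_TYPES.get(msg_type, "Unknown"),
    }


# Twelve bytes packed from the values shown in Listing 4 (illustrative only).
sample = struct.pack("!III", 0x80000000 | 168, 3041181596, 1)
print(parse_rpc_prefix(sample))
```

Matching the transaction id of a reply against earlier calls is how a decoder can report which frame a reply belongs to.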

Finally, Listing 5 provides the content of the NFS packet, including the permissions (file mode),
file size, ownership and other information. In this case, the NFS operation requested is for the
filesystem statistics (triggered by the equivalent of an ls operation), hence the level of detail.

Listing 5. Content of NFS packet


NFS:  ----- Sun NFS -----
NFS:
NFS:  Proc = 18 (Get filesystem statistics)
NFS:  Status = 0 (OK)
NFS:  Post-operation attributes:
NFS:    File type = 2 (Directory)
NFS:    Mode = 0777
NFS:     Setuid = 0, Setgid = 0, Sticky = 0
NFS:     Owner's permissions = rwx
NFS:     Group's permissions = rwx
NFS:     Other's permissions = rwx
NFS:    Link count = 24, User ID = 502, Group ID = 10
NFS:    File size = 29, Used = 2560
NFS:    Special: Major = 4294967295, Minor = 4294967295
NFS:    File system id = 781684113418, File id = 4304616
NFS:    Last access time      = 28-Feb-10 15:49:51.042953989 GMT
NFS:    Modification time     = 25-Feb-10 09:39:07.965422590 GMT
NFS:    Attribute change time = 25-Feb-10 09:39:07.965422590 GMT
NFS:  Total space = 759567510016 bytes
NFS:  Available space = 659048374272 bytes
NFS:  Available space - this user = 659048374272 bytes
NFS:  Total file slots = 1288161604
NFS:  Available file slots = 1287203856
NFS:  Available file slots - this user = 1287203856
NFS:  Invariant time = 0 sec
NFS:
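The File type and Mode fields in Listing 5 map directly onto the familiar permission string shown by ls -l. As a small sketch (in Python, standard library only, rather than the article's Perl), the conversion could look like this; the function name and its boolean parameter are assumptions for illustration.

```python
import stat


def describe_mode(is_directory: bool, mode: int) -> str:
    """Combine a file type and permission bits into an ls-style string."""
    type_bits = stat.S_IFDIR if is_directory else stat.S_IFREG
    return stat.filemode(type_bits | mode)


print(describe_mode(True, 0o777))   # drwxrwxrwx
```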

In this case, we can see that the object being examined was in fact a directory (see the File type line).
Although we do not get the actual path, we could find the directory in question by using
find to look for the file/path with the corresponding inode number, shown as File id in the output (see Listing 6).

Listing 6. Looking for a file with the corresponding inode number


$ find /scratch -xdev -inum 4304616
/scratch/installed/mysql-6.0.11
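If you need the same lookup inside a script, the search can be sketched in Python (standard library only, used here in place of the article's Perl). This is an approximation of find -xdev -inum under stated assumptions: the function name is hypothetical, and entries that cannot be examined would raise an error rather than being skipped.

```python
import os


def find_by_inode(root: str, inode: int) -> list:
    """Return paths under root with the given inode number, on root's file system only."""
    root_stat = os.lstat(root)
    root_dev = root_stat.st_dev
    matches = [root] if root_stat.st_ino == inode else []

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories on other file systems, as -xdev does.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev]
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_ino == inode:
                matches.append(path)
    return matches


if __name__ == "__main__":
    # Example: the inode number reported as File id in Listing 5.
    print(find_by_inode(".", 4304616))
```

Inode numbers are only unique within a single file system, which is why both the find command and this sketch refuse to cross mount points.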

The best way to use these tools if you are trying to identify traffic is first to run them and collect as
much data as possible, and then manually examine the content looking for items that you don't
expect to see on your network.
Once you have identified suspicious traffic, you can then start to add specifications on the
command line to zero in on the detail of the traffic. For example, you can display only the
traffic to or from a given host using either of the commands shown in Listing 7.

Listing 7. Displaying only the traffic for a given host


$ snoop host 192.168.0.2
$ tcpdump host 192.168.0.2

To further restrict things, you could add the port or protocol details: $ snoop host 192.168.0.2 and port
25.

Parsing the raw data to understand the content


Another way to process the content from tcpdump is to save the raw network packet data to a file
and then process the file to find and decode the information that you want.
There are a number of modules in different languages that provide functionality for reading and
decoding the data captured by tcpdump and snoop. For example, within Perl, there are two

modules: Net::SnoopLog (for snoop) and Net::TcpDumpLog (for tcpdump). These will read the raw
data content. The basic interface for both of these modules is the same.
To start, first you need to create a binary record of the packets going past on the network by writing
out the data to a file using either snoop or tcpdump. For this example, we'll use tcpdump and the
Net::TcpDumpLog module: $ tcpdump -w packets.raw.
Once you have amassed the network data, you can start to process its contents
to find the information you want. The Net::TcpDumpLog module parses the raw network data saved by
tcpdump. Because the data is in its raw binary format, parsing the information requires processing
this binary data. For convenience, another suite of modules, NetPacket::*, provides decoding of
the raw data.
For example, Listing 8 shows a simple script that prints out the IP address information for all of the
packets.

Listing 8. Simple script that prints out the IP address info for all packets
use Net::TcpDumpLog;
use NetPacket::Ethernet;
use NetPacket::IP;

# Load the packets captured with "tcpdump -w packets.raw"
my $log = Net::TcpDumpLog->new();
$log->read("packets.raw");

foreach my $index ($log->indexes)
{
    # Raw bytes of the whole packet, starting with the Ethernet header
    my $packet = $log->data($index);
    my $ethernet = NetPacket::Ethernet->decode($packet);

    # 0x0800 is the Ethernet type for IP
    if ($ethernet->{type} == 0x0800)
    {
        my $ip = NetPacket::IP->decode($ethernet->{data});

        printf(" %s to %s protocol %s \n",
               $ip->{src_ip},$ip->{dest_ip},$ip->{proto});
    }
}

The first part is to extract each packet. The Net::TcpDumpLog module indexes the packets in the file, so
that we can read each packet by using its index. The data() method then returns the raw
data for the entire packet.

As with the output from snoop, we have to extract each of the blocks of data from the raw network
packet information. So in this example, we first need to extract the ethernet packet, including the
data payload, from the raw network packet. The NetPacket::Ethernet module does this for us.
Since we are looking for IP packets, we can check for IP packets by looking at the Ethernet packet
type. IP packets have an ID of 0x0800.
The NetPacket::IP module can then be used to extract the IP information from the data payload of
the Ethernet packet. The module provides the source IP, destination IP and protocol information,
among others, which we can then print.
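For readers who want to see what the Perl modules are doing underneath, the same steps can be sketched in Python using only the standard library. The sketch assumes the classic pcap file layout written by tcpdump -w (a 24-byte file header followed by a 16-byte header before each packet) and does not handle the newer pcapng or nanosecond-timestamp variants. The function names and the in-memory demo capture are assumptions for illustration.

```python
import io
import socket
import struct


def read_pcap(stream):
    """Yield the raw bytes of each packet in a classic pcap stream."""
    magic = stream.read(24)[:4]           # the file header is 24 bytes long
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"                      # written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"                      # written on a big-endian host
    else:
        raise ValueError("not a classic pcap file")
    while True:
        record = stream.read(16)          # per-packet header
        if len(record) < 16:
            break
        _sec, _usec, captured_len, _orig_len = struct.unpack(endian + "IIII", record)
        yield stream.read(captured_len)


def ip_summary(packet: bytes):
    """Return (source, destination, protocol) for IPv4 over Ethernet, else None."""
    if len(packet) < 34:
        return None
    (ethertype,) = struct.unpack("!H", packet[12:14])
    if ethertype != 0x0800:
        return None
    ip = packet[14:]
    return socket.inet_ntoa(ip[12:16]), socket.inet_ntoa(ip[16:20]), ip[9]


def build_demo_capture():
    """Build a one-packet capture in memory (illustrative values only)."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                     socket.inet_aton("192.168.0.112"),
                     socket.inet_aton("192.168.0.2"))
    frame = b"\x00" * 12 + struct.pack("!H", 0x0800) + ip
    file_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    record = struct.pack("<IIII", 0, 0, len(frame), len(frame))
    return io.BytesIO(file_header + record + frame)


for packet in read_pcap(build_demo_capture()):
    print(ip_summary(packet))
```

To run it against a real capture, pass open("packets.raw", "rb") to read_pcap() in place of the demo stream.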
Using this basic framework you can perform more complex lookups and decoding that do not rely
on the automated solutions provided by tcpdump or snoop. For example, if you suspect that there
is HTTP traffic going past on a non-standard port (i.e., not port 80), you could look for the string
HTTP on ports other than 80 from the suspected host IP using the script in Listing 9.

Listing 9. Looking for the string HTTP on ports other than 80


use Net::TcpDumpLog;
use NetPacket::Ethernet;
use NetPacket::IP;
use NetPacket::TCP;

my $log = Net::TcpDumpLog->new();
$log->read("packets.raw");

foreach my $index ($log->indexes)
{
    my $packet = $log->data($index);

    my $ethernet = NetPacket::Ethernet->decode($packet);

    # Only IP packets (Ethernet type 0x0800)
    if ($ethernet->{type} == 0x0800)
    {
        my $ip = NetPacket::IP->decode($ethernet->{data});

        # Only packets sent by the suspected host
        if ($ip->{src_ip} eq '192.168.0.2')
        {
            # Only TCP (IP protocol 6)
            if ($ip->{proto} == 6)
            {
                my $tcp = NetPacket::TCP->decode($ip->{data});

                # Note: only the source port is tested here
                if (($tcp->{src_port} != 80) &&
                    ($tcp->{data} =~ m/HTTP/))
                {
                    print("Found HTTP traffic on non-port 80\n");
                    printf("%s (port: %d) to %s (port: %d)\n%s\n",
                           $ip->{src_ip},
                           $tcp->{src_port},
                           $ip->{dest_ip},
                           $tcp->{dest_port},
                           $tcp->{data});
                }
            }
        }
    }
}

Running the script on a sample packet set returned the output shown in Listing 10.

Listing 10. Running the script on a sample packet set


$ perl http-non80.pl
Found HTTP traffic on non-port 80
192.168.0.2 (port: 39280) to 168.143.162.100 (port: 80)
GET /statuses/user_timeline.json HTTP/1.1
Found HTTP traffic on non-port 80
192.168.0.2 (port: 39282) to 168.143.162.100 (port: 80)
GET /statuses/friends_timeline.json HTTP/1

In this particular case we're seeing traffic from the host to an external website (Twitter). Note that
the script tests only the source port, so these ordinary requests to port 80 are reported as well; to
isolate HTTP on genuinely non-standard ports, test the destination port in the same way.
Obviously, in this example, we are dumping out the raw data, but you could use the same basic
structure to decode the data in any format using any public or proprietary protocol structure. If
you are using or developing a protocol, and know the protocol format, you could use this method to
extract and monitor the data being transferred.
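Listing 9 tests only the source port and matches the string HTTP anywhere in the payload. A slightly stricter check, sketched here in Python (standard library only, rather than the article's Perl), requires both ports to be non-standard and the payload to start with an HTTP request line or status line. The function name, the list of methods and the default port tuple are assumptions for illustration, not a complete HTTP detector.

```python
import re

# Matches the start of an HTTP request ("GET /x HTTP/1.1") or response ("HTTP/1.1 200").
HTTP_START = re.compile(
    rb"^(?:(?:GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE|CONNECT) \S+ HTTP/\d\.\d"
    rb"|HTTP/\d\.\d \d{3})"
)


def is_http_on_odd_port(src_port, dst_port, payload, standard_ports=(80,)):
    """Return True when the payload looks like HTTP and neither port is a standard one."""
    if src_port in standard_ports or dst_port in standard_ports:
        return False
    return HTTP_START.match(payload) is not None


# A request to port 8080 is flagged; the same request to port 80 is not.
print(is_http_on_odd_port(39280, 8080, b"GET /index.html HTTP/1.1\r\n"))
print(is_http_on_odd_port(39280, 80, b"GET /index.html HTTP/1.1\r\n"))
```

Anchoring the match at the start of the payload avoids false positives from packets that merely mention HTTP somewhere in their data, at the cost of missing segments that begin in the middle of a message.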

Using a protocol analyzer


Although, as already mentioned, tools like tcpdump, iptrace and snoop provide basic network
analysis and decoding, there are GUI-based tools that make the process even easier. Wireshark is
one such tool that supports a vast array of network protocol decoding and analysis.

One of the main benefits of Wireshark is that you can capture packets over a period of time (just
as with tcpdump) and then interactively analyze and filter the content based on the different
protocols, ports and other data. Wireshark also supports a huge array of protocol decoders,
enabling you to examine in minute detail the contents of the packets and conversations.
Figure 1 shows the basic Wireshark window, listing packets of all types. The window is divided
into three main sections: the list of filtered packets, the decoded
protocol details, and the raw packet data in hex/ASCII format.

Figure 1. Wireshark interface
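The hex/ASCII pane is straightforward to reproduce when you are working with raw packet data in your own scripts. The following Python sketch (standard library only, used in place of the article's Perl) prints an offset, the bytes in hex, and a printable rendering; the function name and the exact column layout are assumptions and only approximate what Wireshark displays.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values and printable ASCII, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Replace non-printable bytes with a dot, as packet analyzers do.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04x}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


print(hexdump(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n"))
```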

As an example of the level of information and decoding that is provided by the Wireshark tool,
while writing this article I noticed that there were some error packets being returned by one of the
MySQL servers on the network.
To zero in on the content, I first applied the MySQL filter to the output. You can do this
by typing an expression (like those provided to tcpdump, snoop or iptrace) into the Filter box.
Alternatively, you can click the Expression button and choose the filter from the built-in list. You can
see a sample of the filters available in Figure 2. Once you have chosen the filter, click Apply to
filter the packet list.


Figure 2. Choosing a Wireshark filter

By filtering on the MySQL protocol, I was able to identify the error packets. The MySQL protocol
returns a specific packet type with the error information. In this case, error 1242 (subquery returns
more than one row) means that the query execution failed because there was a problem in a subquery.
You can see the MySQL
protocol content details by expanding the MySQL protocol section of the Wireshark window, as
seen here in Figure 3.


Figure 3. Examining a MySQL error packet

Here we can see the detail of the error. By tracking back to the previous 'Request Query' packet, it
is possible to determine the query that triggered the error response (Figure 4).


Figure 4. The MySQL query that triggered the error response

By drilling down into the packets, I could identify a problem with the code I hadn't previously
noticed and identify both the error and the query that triggered the problem.
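The error packet examined above has a simple layout, so it can also be picked out of raw TCP payloads by hand. The Python sketch below (standard library only, rather than the article's Perl) assumes the layout used by the MySQL 4.1 and later client/server protocol: a 3-byte little-endian payload length, a sequence byte, a 0xFF marker, a 2-byte little-endian error code, then an optional '#' followed by a 5-character SQLSTATE and the message. The function name and the hand-built sample packet are assumptions for illustration.

```python
import struct


def parse_mysql_error(packet: bytes) -> dict:
    """Decode a MySQL error packet, including its 4-byte packet header."""
    length = int.from_bytes(packet[0:3], "little")
    sequence = packet[3]
    payload = packet[4:4 + length]
    if not payload or payload[0] != 0xFF:
        raise ValueError("not a MySQL error packet")
    (code,) = struct.unpack("<H", payload[1:3])
    rest = payload[3:]
    sqlstate = None
    if rest[:1] == b"#":                  # SQLSTATE marker
        sqlstate = rest[1:6].decode("ascii")
        rest = rest[6:]
    return {
        "sequence": sequence,
        "code": code,
        "sqlstate": sqlstate,
        "message": rest.decode("utf-8", "replace"),
    }


# Hand-built sample resembling the error discussed in the text (not captured data).
body = b"\xff" + struct.pack("<H", 1242) + b"#21000" + b"Subquery returns more than 1 row"
sample = len(body).to_bytes(3, "little") + b"\x01" + body
print(parse_mysql_error(sample))
```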
Wireshark supports such a wide variety of protocols and filters that you can get detailed
information on almost any traffic crossing your network. Another common use is to monitor the exact
content of detailed protocols, such as
web services. Figure 5 shows the detailed (and structured) content from a SOAP request used to
log status information.


Figure 5. Looking at the details of SOAP web service request

This kind of detail can be invaluable when trying to debug any network protocol that you are using.
Another useful feature is that Wireshark can work with live information and can also record
information for later filtering and processing. This means that you can use it to monitor specific
periods of suspicious traffic and then drill down into the information at your leisure to
find out exactly what was occurring on your network.

Summary
Protocol analysis of the information going across the wire of your UNIX network can be a
complex process. However, with the combination of some simple and widely available tools, you
can decode and examine the details of your network traffic from the basics of the source and
destination through to the specific protocol and data being exchanged.
As shown in this article, using tools like tcpdump, snoop or iptrace, you can extract a wide range
of data at the command line. With tools like Wireshark, you can go even deeper and get more

detailed information on a much wider range of protocols and content. For custom protocols and
data structures, you can use Perl to extract the raw data and get all the information you need.



About the author


Martin Brown
Martin Brown has been a professional writer for more than seven years. He is the
author of numerous books and articles across a range of topics. His expertise
spans myriad development languages and platforms -- Perl, Python, Java,
JavaScript, Basic, Pascal, Modula-2, C, C++, Rebol, Gawk, Shellscript, Windows,
Solaris, Linux, BeOS, Mac OS X and more -- as well as Web programming,
systems management, and integration. He is a Subject Matter Expert (SME) for
Microsoft and regular contributor to ServerWatch.com, LinuxToday.com, and IBM
developerWorks. He is also a regular blogger at Computerworld, The Apple Blog, and
other sites. You can contact him through his Web site.
Copyright IBM Corporation 2010
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)
