UNIX
Network Programming
The sockets Networking API
NETWORK PROGRAMMING
UNIT-I
Introduction:
A Simple Daytime Client, Protocol independence, Error Handling, A Simple
Daytime Server, OSI model, Unix Standards, 64 bit architectures.
The Transport Layer:
Introduction, User Datagram Protocol (UDP), Transmission Control Protocol
(TCP), Stream Control Transmission Protocol (SCTP), TCP Connection
Establishment and Termination, TIME_WAIT State, SCTP association
Establishment and Termination, Port Numbers, TCP Port Numbers and
Concurrent Servers, Buffer Sizes and Limitations, Standard Internet Services,
Protocol Usage
Sockets Introduction:
Introduction, Socket Address structures, Value-Result Arguments, Byte Ordering
Functions, inet_aton, inet_addr, and inet_ntoa Functions, inet_pton and inet_ntop
Functions, sock_ntop and Related Functions, readn, writen and readline
Functions
What is network?
The term network can refer to any interconnected
group or system.
A computer network is composed of multiple
computers connected together using a
telecommunication system.
“…communication system for connecting end-systems”
End-systems: “hosts”
◦ PCs, workstations
◦ dedicated computers
◦ network components
Interconnection may be any medium capable of
communicating information:
◦ Copper wire
◦ Lasers (optical fiber)
◦ Radio /Satellite link
◦ Cable (coax)
Example: Ethernet.
Why network?
Sharing resources
◦ Resources become available regardless of
the user’s physical location (server based,
peer2peer)
Load Sharing/utilization
◦ Jobs processed on least crowded machine
◦ Resource can be shared
High reliability
◦ Alternative source of supply (multiple copies)
Computer as a communication tool
Wide variety of types of networks
Circuit switched
◦ dedicated circuit per call
◦ performance (guaranteed)
◦ call setup required
◦ telephone system
Packet switched:
◦ data sent through the net in discrete “chunks”
◦ packets of users A and B share network resources
◦ resources used as needed
◦ store and forward: packets move one hop
at a time
◦ The Internet (TCP/IP)
Client - Server
A server is a process, not a machine!
A server waits for a request from a client.
A client is a process that sends a request to an
existing server and (usually) waits for a reply.
This organization into client and server is used by
most network-aware applications.
Clients normally communicate with one server at a
time, but it is common for a server to be communicating
with multiple clients at the same time.
Client and server on the same Ethernet communicating using TCP.
Client and server on different LANs connected through a WAN.
OSI MODEL
What’s a protocol?
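A human protocol and a computer network protocol:
[Figure: in the human protocol, one person says "Hi", the other replies "Hi", the first asks "Got the time?" and is told "2:00". In the network protocol, the client sends a TCP connection request, the server sends a TCP connection reply, the client sends "Get [Link]" and the server returns <file>.]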
Protocols define the format and order of messages sent and received among
network entities, and the actions taken on message transmission and receipt.
Functions of OSI Layers
7 Application All
6 Presentation People
5 Session Seem
4 Transport To
3 Network Need
2 Data Link Data
1 Physical Processing
(Mnemonic, top to bottom: "All People Seem To Need Data Processing")
1) Physical Layer
Concerned with the transmission of bits
How many volts for 0, how many for 1?
Number of bits per second to be transmitted
Two-way or one-way transmission
Standardized protocol dealing with electrical, mechanical and signaling
interfaces
Many standards have been developed, e.g. RS-232 (for serial
communication lines)
Example: X.21
(2) Data Link Layer
Handles errors in the physical
layer.
Groups bits into frames and
ensures their correct delivery.
Adds some bits at the
beginning and end of each
frame plus the checksum.
The receiver verifies the checksum; if the checksum is not correct,
it asks for retransmission (by sending a control message).
Consists of two sublayers:
Logical Link Control (LLC): defines how data is transferred over
the cable and provides data link service to the higher layers.
Medium Access Control (MAC): defines who can use the
network when multiple computers are trying to access it
simultaneously (e.g., token passing, Ethernet [CSMA/CD]).
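The checksum step above can be sketched in C. This is a minimal sketch, assuming the reflected CRC-32 (polynomial 0xEDB88320) that Ethernet uses for its frame check sequence; the function names `crc32_compute` and `frame_is_intact` are hypothetical, and real network interfaces compute the CRC in hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise (table-free) reflected CRC-32, polynomial 0xEDB88320.
 * Short rather than fast. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;                 /* initial value */

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;                   /* final XOR */
}

/* Receiver side: recompute the CRC over the received payload and compare
 * it with the value carried in the frame trailer. Returns 1 if they match. */
int frame_is_intact(const uint8_t *payload, size_t len, uint32_t received_crc)
{
    return crc32_compute(payload, len) == received_crc;
}
```

The sender appends the computed value as the frame trailer; the receiver recomputes it over the bytes it received and treats any mismatch as a damaged frame.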
(3) Network Layer
Concerned with the transmission of
packets
Choose the best path to send a packet
(routing)
It may be complex in a large network (e.g.,
the Internet)
Shortest (distance) route vs. route with
least delay
Static (long term average) vs. dynamic
(current load) routing
Two protocols are most widely used:
X.25 - Connection Oriented
Public networks, telephone, European PTT
Send a call request at the outset to the destination
If destination accepts the connection, it sends a connection identifier
IP (Internet Protocol) - Connectionless
Part of Internet protocol suite
An IP packet can be sent without a connection being established
Each packet is routed to its destination independently
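Choosing the shortest-distance or least-delay route, as described above, is commonly done with Dijkstra's shortest-path algorithm, which link-state routing protocols such as OSPF use. The sketch below is illustrative only: it assumes a small, hypothetical network stored as a cost matrix in which each entry is a link cost (distance or measured delay) and 0 means "no direct link"; `shortest_paths` is a made-up name.

```c
#include <limits.h>

#define NNODES  5
#define NO_LINK 0       /* 0 in the cost matrix means "no direct link" */

/* Dijkstra's algorithm over a cost matrix. Fills dist[] with the least total
 * cost from src to each node and prev[] with the previous hop on that path
 * (-1 for src itself and for unreachable nodes). */
void shortest_paths(int cost[NNODES][NNODES], int src,
                    int dist[NNODES], int prev[NNODES])
{
    int done[NNODES] = {0};

    for (int i = 0; i < NNODES; i++) {
        dist[i] = INT_MAX;
        prev[i] = -1;
    }
    dist[src] = 0;

    for (int iter = 0; iter < NNODES; iter++) {
        /* Choose the unfinished node with the smallest known distance. */
        int u = -1;
        for (int i = 0; i < NNODES; i++)
            if (!done[i] && dist[i] != INT_MAX && (u == -1 || dist[i] < dist[u]))
                u = i;
        if (u == -1)
            break;                      /* the rest are unreachable */
        done[u] = 1;

        /* Relax every link leaving u. */
        for (int v = 0; v < NNODES; v++) {
            if (cost[u][v] != NO_LINK && !done[v] &&
                dist[u] + cost[u][v] < dist[v]) {
                dist[v] = dist[u] + cost[u][v];
                prev[v] = u;
            }
        }
    }
}
```

Following `prev[]` backwards from a destination gives the chosen path. A static scheme would run this on long-term average costs, a dynamic one on costs that reflect the current load.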
(4) Transport Layer
Network layer does not deal with lost
messages
Transport layer ensures reliable service
Breaks the message (from the session layer)
into smaller packets, assigns sequence
numbers and sends them
Reliable transport connections are built on top of X.25 or IP
In the case of IP, lost packets must be retransmitted and packets arriving out of order must be reordered
TCP (Transmission Control Protocol): Internet connection-oriented transport protocol
TCP/IP widely used for network/transport layer (UNIX)
UDP (User Datagram Protocol): Internet connectionless transport layer
protocol
Application programs that do not need a connection-oriented protocol generally use
UDP
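The sequence-numbering and reordering duties listed above can be illustrated with a toy sender and receiver. This sketch assumes, as TCP does, that sequence numbers count bytes starting from an initial sequence number; the `struct segment` type and both function names are hypothetical, and retransmission of lost segments is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One toy segment: the sequence number of its first byte and a slice of
 * the original message (no copy is made). */
struct segment {
    uint32_t    seq;
    size_t      len;
    const char *data;
};

/* Sender: cut msg into segments of at most mss bytes, numbering each by its
 * byte offset from the initial sequence number isn. Returns the count. */
size_t segment_message(const char *msg, size_t msglen, size_t mss,
                       uint32_t isn, struct segment *out, size_t max_out)
{
    size_t n = 0;

    for (size_t off = 0; off < msglen && n < max_out; off += mss, n++) {
        out[n].seq  = isn + (uint32_t) off;
        out[n].len  = (msglen - off < mss) ? msglen - off : mss;
        out[n].data = msg + off;
    }
    return n;
}

/* Receiver: copy each segment to offset (seq - isn) of buf, so the order in
 * which segments arrive does not matter. Returns the number of bytes placed.
 * Assumes no gaps and no duplicate segments. */
size_t reassemble(const struct segment *segs, size_t nsegs, uint32_t isn,
                  char *buf, size_t buflen)
{
    size_t total = 0;

    for (size_t i = 0; i < nsegs; i++) {
        size_t off = (size_t) (segs[i].seq - isn);
        if (off + segs[i].len <= buflen) {
            memcpy(buf + off, segs[i].data, segs[i].len);
            total += segs[i].len;
        }
    }
    return total;
}
```

Delivering the segments in reverse order still rebuilds the original message, because each segment's position comes from its sequence number rather than from its arrival order.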
(5) Session Layer
Enhanced version of transport layer
Dialog control, synchronization facilities
Rarely supported (Internet suite does not)
(6) Presentation Layer
Concerned with the semantics of the bits
Define records and fields in them
Sender can tell the receiver of the format
Allows machines with different internal representations to
communicate
If implemented, the best layer for cryptography
(7) Application Layer
Collection of miscellaneous protocols for high level applications
Electronic mail, file transfer, connecting remote terminals, etc.
E.g. SMTP, FTP, Telnet, HTTP, etc.
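Protocol Stack: ISO OSI Model
Headers are added on the way down the stack (encapsulation) and stripped on the way up.
Layer          Data unit
Application    AH Data
Presentation   PH AH Data
Session        SH PH AH Data
Transport      TH SH PH AH Data
Network        NH TH SH PH AH Data
Data link      DH NH TH SH PH AH Data DT
Physical       DH NH TH SH PH AH Data DT (sent as bits)
(AH, PH, SH, TH, NH, DH: application, presentation, session, transport, network and data link headers; DT: data link trailer)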
ISO: the International Organization for Standardization
OSI: Open Systems Interconnection Reference Model (1984)
Communicating between End Hosts
[Figure: two hosts, each with all seven layers, connected through a router that has only the network, data link and physical layers. Peer layers on the two hosts communicate through the application, presentation, session and transport protocols, while the network, data link and physical layers operate hop by hop between each host and the router.]
Why layering?
Divide a task into pieces and then solve each piece
independently (or nearly so).
Establishing a well-defined interface between layers makes
porting easier.
Functions of each layer are independent of functions of other
layers
◦ Thus each layer is like a module and can be developed independently
Each layer builds on services provided by lower layers
◦ Thus there is no need to worry about the details of lower layers, which are transparent to
this layer
Major Advantages:
◦ Code Reuse
◦ Eases maintenance, updating of system
OSI Model
A common way to describe the layers in a network is to use the International Organization for Standardization
(ISO) open systems interconnection (OSI) model for computer communications. This
is a seven-layer model, shown here along with the approximate mapping to the Internet protocol
suite.
We consider the bottom two layers of the OSI model as
the device driver and networking hardware
that are supplied with the system.
The network layer is handled by the IPv4
and IPv6 protocols.
The transport layers that we can choose
from are TCP and UDP.
We show a gap between TCP and UDP in the
figure to indicate that it is possible for an
application to bypass the transport layer
and use IPv4 or IPv6 directly. This is called a
raw socket.
The upper three layers of the OSI model are
combined into a single layer called the
application.
◦ E.g., Web client (browser), Telnet client,
Web server, FTP server, or whatever
application we are using.
The sockets programming interfaces are interfaces from the upper three layers
(the "application") into the transport layer.
There are two reasons for this design:
◦ First, the upper three layers handle all the details of the
application (FTP, Telnet, or HTTP, for example) and know little about
the communication details. The lower four layers know little about the application, but
handle all the communication details: sending data, waiting for
acknowledgments, sequencing data that arrives out of order,
calculating and verifying checksums, and so on.
◦ The second reason is that the upper three layers often form what
is called a user process while the lower four layers are normally
provided as part of the operating system (OS) kernel. Unix
provides this separation between the user process and the
kernel, as do many other contemporary operating systems.
Therefore, the interface between layers 4 and 5 is the natural
place to build the API.
A Simple Daytime Client
Create TCP socket: get a file
descriptor
Prepare server address structure: fill in
IP address and port number
Connect to the server: associate the socket
with the remote server's address (this starts TCP's three-way handshake)
Read/write from/to server
Close socket
A Simple Daytime Server
Create TCP socket: get a file descriptor
Bind the socket with its local port
Listen: convert the socket to a
listening descriptor
Accept: the process blocks (sleeps) until a client connection arrives
Accept returns a connected descriptor
Read/write
Close socket
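/* TCP daytime client (IPv4): connects to port 13 and prints the reply. */
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>            /* bzero */
#include <unistd.h>             /* read, close */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAXLINE 4096            /* max text line length */

/* Print msg with the errno description and terminate. */
static void err_sys(const char *msg)
{
    perror(msg);
    exit(1);
}

int main(int argc, char **argv)
{
    int sockfd, n;
    char recvline[MAXLINE + 1];
    struct sockaddr_in servaddr;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <IPaddress>\n", argv[0]);
        exit(1);
    }
    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        err_sys("socket error");

    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_port = htons(13);  /* daytime server */
    if (inet_pton(AF_INET, argv[1], &servaddr.sin_addr) <= 0) {
        fprintf(stderr, "inet_pton error for %s\n", argv[1]);
        exit(1);
    }
    if (connect(sockfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0)
        err_sys("connect error");

    /* TCP is a byte stream, so the reply may arrive in more than one read. */
    while ((n = read(sockfd, recvline, MAXLINE)) > 0) {
        recvline[n] = 0;            /* null terminate */
        if (fputs(recvline, stdout) == EOF)
            err_sys("fputs error");
    }
    if (n < 0)
        err_sys("read error");
    close(sockfd);
    exit(0);
}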
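/* Iterative TCP daytime server (IPv4). Binding port 13 normally requires
   superuser privileges. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define MAXLINE 4096
#define LISTENQ 5               /* backlog passed to listen() */
int main(void)
{
    int listenfd, connfd;
    struct sockaddr_in servaddr;
    char buff[MAXLINE];
    time_t ticks;

    if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket");
        exit(1);
    }
    memset(&servaddr, 0, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    servaddr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    servaddr.sin_port = htons(13);                 /* daytime server */
    if (bind(listenfd, (struct sockaddr *) &servaddr, sizeof(servaddr)) < 0 ||
        listen(listenfd, LISTENQ) < 0) {
        perror("bind/listen");
        exit(1);
    }
    for ( ; ; ) {
        /* Sleeps until a client's three-way handshake completes. */
        if ((connfd = accept(listenfd, (struct sockaddr *) NULL, NULL)) < 0)
            continue;
        ticks = time(NULL);
        snprintf(buff, sizeof(buff), "%.24s\r\n", ctime(&ticks));
        if (write(connfd, buff, strlen(buff)) < 0)
            perror("write");
        close(connfd);          /* the server performs the active close */
    }
}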
TCP and UDP
The big picture:
Although the protocol suite is called "TCP/IP," there are more members of this
family than just TCP and IP.
Overview of TCP/IP protocols.
The leftmost application, tcpdump, communicates
directly with the datalink using either the BSD packet
filter (BPF) or the datalink provider interface (DLPI).
We mark the dashed line beneath the nine applications
on the right as the API, which is normally sockets or
XTI. The interface to either BPF or DLPI does not use
sockets or XTI.
IPv4: Internet Protocol version 4. It uses 32-bit
addresses. IPv4 provides packet delivery service for
TCP, UDP, SCTP, ICMP, and IGMP.
IPv6: Internet Protocol version 6. It uses larger
addresses of 128 bits. IPv6 provides packet
delivery service for TCP, UDP, SCTP, and ICMPv6.
TCP: Transmission Control Protocol. TCP is a connection-oriented
protocol that provides a reliable, full-duplex byte
stream to its users. TCP sockets are an example of stream
sockets. TCP takes care of details such as
acknowledgments, timeouts, retransmissions, and the like.
Most Internet application programs use TCP. Notice that
TCP can use either IPv4 or IPv6.
UDP: User Datagram Protocol. UDP is a connectionless
protocol, and UDP sockets are an example of datagram
sockets. There is no guarantee that UDP datagrams ever
reach their intended destination. As with TCP, UDP can use
either IPv4 or IPv6.
SCTP: Stream Control Transmission Protocol. SCTP is a
connection-oriented protocol that provides a reliable full-duplex
association. As with TCP and UDP, SCTP can use
either IPv4 or IPv6, but it can also use both IPv4 and IPv6
simultaneously on the same association.
ICMP: Internet Control Message Protocol. ICMP handles error
and control information between routers and hosts. These
messages are normally generated and processed by the
TCP/IP networking software itself, not by user processes.
IGMP: Internet Group Management Protocol. IGMP is used
with multicasting, which is optional with IPv4.
ARP: Address Resolution Protocol. ARP maps an IPv4 address
to a hardware address (such as an Ethernet address). ARP is
normally used on broadcast networks such as Ethernet, token
ring, and FDDI, and is not needed on point-to-point networks.
RARP: Reverse Address Resolution Protocol. RARP maps a
hardware address to an IPv4 address. It is sometimes used
when a diskless node is booting.
ICMPv6: Internet Control Message Protocol version 6. ICMPv6
combines the functionality of ICMPv4, IGMP, and ARP.
BPF: BSD packet filter. This interface provides access to the
datalink layer. It is normally found on Berkeley-derived kernels.
DLPI: Datalink provider interface. This interface also provides
access to the datalink layer. It is normally provided with SVR4.
Internet Protocol (IP)
Establishes a “virtual” network between
hosts, independent of the underlying
network topology. Provides “routing”
throughout the network, using IP
addressing. For example: [Link]
Features
◦ Best-effort packet delivery
◦ Connectionless (stateless)
◦ Unreliable
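IP addresses are normally written in text form and converted to the 32-bit (IPv4) or 128-bit (IPv6) binary form with `inet_pton`, and back with `inet_ntop`. A minimal sketch; `roundtrip_address` is a hypothetical helper, and the sample addresses in the usage are documentation addresses.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Convert a text address to binary with inet_pton, then back to text with
 * inet_ntop. family is AF_INET or AF_INET6. Returns the size in bytes of
 * the binary address (4 for IPv4, 16 for IPv6), or -1 on error. */
int roundtrip_address(int family, const char *text, char *out, socklen_t outlen)
{
    unsigned char bin[sizeof(struct in6_addr)];   /* big enough for either family */

    if (inet_pton(family, text, bin) != 1)        /* 0 means "not a valid address" */
        return -1;
    if (inet_ntop(family, bin, out, outlen) == NULL)
        return -1;
    return family == AF_INET ? (int) sizeof(struct in_addr)
                             : (int) sizeof(struct in6_addr);
}
```

For example, "192.0.2.1" with AF_INET yields 4 bytes and "2001:db8::1" with AF_INET6 yields 16 bytes, the 32-bit and 128-bit address sizes.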
User Datagram Protocol
(UDP)
UDP is a simple transport-layer protocol (RFC 768).
Application Interface to IP - Packet Oriented
Establishes a “port”, which allows the host to distinguish among
processes running on the same host
The application writes a message to a UDP socket, which
is then encapsulated in a UDP datagram, which is then
further encapsulated as an IP datagram, which is then
sent to its destination.
Features resemble IP semantics
◦ Connectionless
◦ Unreliable
◦ Checksums (optional with IPv4, mandatory with IPv6)
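The path described above starts with an application writing a message to a UDP socket. A minimal sketch, assuming a sender and a receiver in the same process talking over the loopback interface (127.0.0.1); the helper name `udp_loopback_roundtrip` is hypothetical. No connection is set up: each `sendto` names its destination.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one datagram from one UDP socket to another over loopback and receive
 * it. One sendto() produces one datagram and recvfrom() returns at most one
 * datagram, so the message boundary is preserved.
 * Returns the number of bytes received, or -1 on error. */
ssize_t udp_loopback_roundtrip(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof(addr);
    ssize_t n = -1;
    int rfd = socket(AF_INET, SOCK_DGRAM, 0);   /* receiver */
    int sfd = socket(AF_INET, SOCK_DGRAM, 0);   /* sender */

    if (rfd < 0 || sfd < 0)
        goto out;

    /* Receiver: bind to 127.0.0.1, port 0, so the kernel picks a free port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    if (bind(rfd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(rfd, (struct sockaddr *) &addr, &addrlen) < 0)
        goto out;

    /* Sender: the destination address travels with each datagram. */
    if (sendto(sfd, msg, strlen(msg), 0, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        goto out;
    n = recvfrom(rfd, buf, buflen, 0, NULL, NULL);
out:
    if (rfd >= 0)
        close(rfd);
    if (sfd >= 0)
        close(sfd);
    return n;
}
```

On loopback the datagram is practically always delivered; across a real network the receiver could wait forever, so a real UDP client would add a timeout and its own retransmission.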
Transmission Control Protocol (TCP) (RFC 793)
TCP provides connections between clients and
servers. A TCP client establishes a connection with
a given server, exchanges data with that server
across the connection, and then terminates the
connection.
TCP contains algorithms to estimate the round-trip
time (RTT) between a client and server dynamically.
Connection-oriented
Stream Data Transfer
Reliable
Flow-Control
Full-Duplex
Suited for critical data transfer applications
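The dynamic RTT estimation mentioned above is specified today in RFC 6298 (the Jacobson/Karels algorithm). A minimal sketch, assuming times in milliseconds and ignoring the clock-granularity term G from the RFC; the `struct rtt_est` type and `rtt_update` are hypothetical names.

```c
/* Smoothed RTT estimator in the style of RFC 6298. */
struct rtt_est {
    double srtt;         /* smoothed RTT (ms) */
    double rttvar;       /* RTT variation (ms) */
    double rto;          /* retransmission timeout (ms) */
    int    have_sample;  /* 0 until the first measurement */
};

#define RTT_ALPHA  0.125     /* gain for SRTT, 1/8 */
#define RTT_BETA   0.25      /* gain for RTTVAR, 1/4 */
#define RTO_MIN_MS 1000.0    /* RFC 6298: round RTO up to 1 second */

/* Feed one RTT measurement (ms) into the estimator and recompute RTO. */
void rtt_update(struct rtt_est *e, double sample_ms)
{
    if (!e->have_sample) {
        e->srtt = sample_ms;
        e->rttvar = sample_ms / 2;
        e->have_sample = 1;
    } else {
        double diff = e->srtt - sample_ms;
        if (diff < 0)
            diff = -diff;
        /* RTTVAR is updated first, using the old SRTT. */
        e->rttvar = (1 - RTT_BETA) * e->rttvar + RTT_BETA * diff;
        e->srtt   = (1 - RTT_ALPHA) * e->srtt + RTT_ALPHA * sample_ms;
    }
    e->rto = e->srtt + 4 * e->rttvar;
    if (e->rto < RTO_MIN_MS)
        e->rto = RTO_MIN_MS;
}
```

Because RTO = SRTT + 4 × RTTVAR, a path whose RTT fluctuates gets a longer timeout than a path with a steady RTT of the same mean.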
Stream Control Transmission
Protocol (SCTP)
It is a connection-oriented protocol that provides a full-duplex
association, i.e., it can transmit multiple streams of data
at the same time between two endpoints that have
established a connection.
SCTP makes it easier to establish a
reliable connection. A checksum detects
damaged or corrupted data, and sequence
numbers (TSNs) let SCTP detect discarded,
duplicate and reordered data.
It is similar to TCP, but because ordering
is enforced only within each stream, a lost
message on one stream does not delay
delivery on the other streams.
Unlike TCP, it does not allow half-closed associations.
Message boundaries are preserved, so the
application does not have to mark where one
message ends and the next begins.
It combines properties of both TCP (connection-oriented,
reliable) and UDP (message-oriented).
TCP Connection
establishment and
termination
Three-Way Handshake
The following scenario occurs when a TCP connection is
established:
1. The server must be prepared to accept an incoming
connection. This is normally done by calling socket,
bind, and listen and is called a passive open.
2. The client issues an active open by calling connect.
This causes the client TCP to send a "synchronize"
(SYN) segment, which tells the server the client's initial
sequence number for the data that the client will send
on the connection. Normally, there is no data sent with
the SYN; it just contains an IP header, a TCP header,
and possible TCP options (which we will talk about
shortly).
3. The server must acknowledge (ACK)
the client's SYN and the server must
also send its own SYN containing the
initial sequence number for the data
that the server will send on the
connection. The server sends its SYN
and the ACK of the client's SYN in a
single segment.
4. The client must acknowledge the
server's SYN.
The minimum number of packets
required for this exchange is three;
hence, this is called TCP's three-way
handshake.
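TCP three-way handshake.
• We show the client's initial sequence number as J and the server's initial sequence
number as K. The acknowledgment number in an ACK is the next expected
sequence number for the end sending the ACK.
• Since a SYN occupies one byte of the sequence number space, the acknowledgment
number in the ACK of each SYN is the initial sequence number plus one.
Similarly, the ACK of each FIN is the sequence number of the FIN plus one.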
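The sequence and acknowledgment numbers of the three segments can be written down directly. A toy sketch: the `struct hs_segment` type and `three_way_handshake` function are hypothetical, and real segments carry many more fields. `uint32_t` arithmetic is used because TCP sequence numbers are 32 bits wide and wrap around modulo 2^32.

```c
#include <stdint.h>

/* A toy view of a TCP segment during connection establishment. */
struct hs_segment {
    int      syn, ack;      /* SYN and ACK flag bits */
    uint32_t seq;           /* sequence number */
    uint32_t ack_num;       /* acknowledgment number (meaningful only if ack) */
};

/* Fill seg[0..2] with the three-way handshake for client ISN j and server
 * ISN k. Unsigned 32-bit arithmetic wraps modulo 2^32, as TCP's does. */
void three_way_handshake(uint32_t j, uint32_t k, struct hs_segment seg[3])
{
    /* 1. client -> server: SYN J */
    seg[0] = (struct hs_segment) { .syn = 1, .ack = 0, .seq = j, .ack_num = 0 };
    /* 2. server -> client: SYN K, ACK J+1 (the client's SYN used one number) */
    seg[1] = (struct hs_segment) { .syn = 1, .ack = 1, .seq = k, .ack_num = j + 1 };
    /* 3. client -> server: ACK K+1 */
    seg[2] = (struct hs_segment) { .syn = 0, .ack = 1, .seq = j + 1, .ack_num = k + 1 };
}
```

With J = 0xFFFFFFFF, the server acknowledges 0: the "plus one" wraps around like every other sequence number computation.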
TCP Options
Each SYN can contain TCP options:
◦ MSS option: With this option, the TCP sending the
SYN announces its maximum segment size, the maximum amount
of data that it is willing to accept in each TCP segment on this
connection. The sending TCP uses the receiver's MSS value as the
maximum size of a segment that it sends. We will see
how to fetch and set this TCP option with the
TCP_MAXSEG socket option.
◦ Window scale option: The maximum
window that either TCP can advertise to
the other TCP is 65,535, because the
corresponding field in the TCP header
occupies 16 bits. This option specifies that the
advertised window must be scaled (left-shifted)
by 0 to 14 bits, giving a maximum window of
almost one gigabyte (65,535 × 2^14 bytes). We will see how to
affect this option with the SO_RCVBUF
socket option.
◦ Timestamp option: This option is
needed for high-speed connections to
prevent possible data corruption caused
by old, delayed, or duplicated segments.
The Timestamp option can also be used to
measure the round-trip time (RTT) of
every packet that is acknowledged. This
is done by including a Timestamp Value
(TSval) in every segment that is sent; the receiver
echoes it back in the Timestamp Echo Reply (TSecr) field.
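Two of these options can be explored from C. The sketch below assumes a POSIX system with `<netinet/tcp.h>`: it sets SO_RCVBUF before any connect or listen (the window scale option travels in the SYN, so the buffer size must be known by then) and reads TCP_MAXSEG. On an unconnected socket the MSS returned is an implementation default, not a negotiated value. `wscale_for_buffer` and `tcp_socket_mss` are hypothetical helper names; the shift limit of 14 comes from the window scale specification (RFC 7323).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>     /* TCP_MAXSEG */
#include <sys/socket.h>
#include <unistd.h>

/* Smallest window-scale shift (0..14) such that the 16-bit window field,
 * shifted left by it, can advertise bufsize bytes. */
int wscale_for_buffer(unsigned long bufsize)
{
    int shift = 0;

    while (shift < 14 && (65535UL << shift) < bufsize)
        shift++;
    return shift;
}

/* Create a TCP socket, request a receive buffer of rcvbuf bytes with
 * SO_RCVBUF (before connect/listen), and read the current MSS with
 * TCP_MAXSEG. Returns the MSS, or -1 on error. */
int tcp_socket_mss(int rcvbuf)
{
    int mss = -1;
    socklen_t len = sizeof(mss);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0 ||
        getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, &len) < 0)
        mss = -1;
    close(fd);
    return mss;
}
```

For a 256 KB (262,144-byte) receive buffer the shift is 3: 65,535 × 4 = 262,140 is still 4 bytes short, so a shift of 2 is not enough.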
TCP Connection
Termination
While it takes three segments to establish a connection, it
takes four to terminate a connection.
1. One application calls close first, and we say that this end
performs the active close. This end's TCP sends a FIN
segment, which means it is finished sending data.
2. The other end that receives the FIN performs the passive
close. The received FIN is acknowledged by TCP. The receipt
of the FIN is also passed to the application as an end-of-file
(after any data that may have already been queued for the
application to receive), since the receipt of the FIN means the
application will not receive any additional data on the
connection.
3. Sometime later, the application that received the end-of-file
will close its socket. This causes its TCP to send a FIN.
4. The TCP on the system that receives this final FIN (the end
that did the active close) acknowledges the FIN.
Since a FIN and an ACK are required in each direction, four
segments are normally required.
Packets exchanged when a TCP connection is closed.
A FIN occupies one byte of sequence number space just like a SYN.
Therefore, the ACK of each FIN is the sequence number of the FIN plus one.
Between Steps 2 and 3 it is possible for data to flow from the end doing the
passive close to the end doing the active close. This is called a half-close.
The sending of each FIN occurs when a socket is closed.
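A half-close is what `shutdown(fd, SHUT_WR)` produces: the caller's TCP sends a FIN, but the socket stays open for reading. The sketch below assumes a client and a server in one process connected over loopback; the server reads until end-of-file and still sends a reply afterwards. `half_close_demo` is a hypothetical name and error handling is kept minimal.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Half-close over loopback, in one process. The "client" sends a request and
 * calls shutdown(SHUT_WR): its TCP sends a FIN, but it can still read. The
 * "server" reads until end-of-file, then replies.
 * Stores the reply in reply[] (replylen >= 1); returns 0, or -1 on error. */
int half_close_demo(char *reply, size_t replylen)
{
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof(addr);
    char req[64];
    size_t got = 0;
    ssize_t n = 0;
    int rc = -1, cli = -1, srv = -1;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                    /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *) &addr, &addrlen) < 0)
        goto out;
    if ((cli = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(cli, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        (srv = accept(lfd, NULL, NULL)) < 0)
        goto out;

    /* Client: send the request, then close only the sending direction. */
    if (write(cli, "request", 7) != 7 || shutdown(cli, SHUT_WR) < 0)
        goto out;

    /* Server: read until read() returns 0, i.e. until the FIN arrives. */
    while ((n = read(srv, req + got, sizeof(req) - 1 - got)) > 0)
        got += (size_t) n;
    if (n < 0 || got != 7 || memcmp(req, "request", 7) != 0)
        goto out;

    /* Half-close: the passive-close end can still send data. */
    if (write(srv, "reply", 5) != 5)
        goto out;
    close(srv);                                  /* the server's FIN */
    srv = -1;

    got = 0;
    while (got < replylen - 1 && (n = read(cli, reply + got, replylen - 1 - got)) > 0)
        got += (size_t) n;
    if (n < 0)
        goto out;
    reply[got] = '\0';
    rc = 0;
out:
    if (srv >= 0)
        close(srv);
    if (cli >= 0)
        close(cli);
    close(lfd);
    return rc;
}
```

Calling close instead of shutdown on the client would also send a FIN, but the client could then no longer read the server's reply.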
Like most connection-oriented transport
protocols, TCP supports two types of
connection release:
Graceful connection release –
In a graceful connection release,
the connection stays open until both
parties have closed their sides of the
connection.
Abrupt connection release –
In an Abrupt connection release,
either one TCP entity is forced to
close the connection or one user
closes both directions of data
transfer.
Graceful Connection Release
TCP State Transition
Diagram
The operation of TCP with regard to connection establishment and
connection termination can be specified with a state transition diagram.
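There are 11 different states defined for a connection,
and the rules of TCP dictate the transitions from one state to another,
based on the current state and the segment received in that state.
For example, if an application performs an active open in the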
CLOSED state, TCP sends a SYN and the new state is
SYN_SENT.
If TCP next receives a SYN with an ACK, it sends an
ACK and the new state is ESTABLISHED.
This final state is where most data transfer occurs.
The two arrows leading from the ESTABLISHED
state deal with the termination of a connection.
If an application calls close before receiving a FIN (an
active close), the transition is to the FIN_WAIT_1
state.
But if an application receives a FIN while in the
ESTABLISHED state (a passive close), the transition is
to the CLOSE_WAIT state.
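The transitions just described can be coded as a small state machine. This sketch covers only the states named above, not the full 11-state diagram; all enum and function names are hypothetical.

```c
/* States, inputs and outgoing segments covered by this sketch. */
enum tcp_state { TS_CLOSED, TS_SYN_SENT, TS_ESTABLISHED, TS_FIN_WAIT_1, TS_CLOSE_WAIT };
enum tcp_event { EV_APP_CONNECT, EV_RCV_SYN_ACK, EV_APP_CLOSE, EV_RCV_FIN };
enum tcp_seg   { SEG_NONE, SEG_SYN, SEG_ACK, SEG_FIN };

/* Return the next state for (state, event) and store in *send the segment
 * TCP transmits on that transition. Unlisted combinations leave the state
 * unchanged and send nothing. */
enum tcp_state tcp_next(enum tcp_state s, enum tcp_event ev, enum tcp_seg *send)
{
    *send = SEG_NONE;
    switch (s) {
    case TS_CLOSED:
        if (ev == EV_APP_CONNECT) { *send = SEG_SYN; return TS_SYN_SENT; }   /* active open */
        break;
    case TS_SYN_SENT:
        if (ev == EV_RCV_SYN_ACK) { *send = SEG_ACK; return TS_ESTABLISHED; }
        break;
    case TS_ESTABLISHED:
        if (ev == EV_APP_CLOSE) { *send = SEG_FIN; return TS_FIN_WAIT_1; }   /* active close */
        if (ev == EV_RCV_FIN)   { *send = SEG_ACK; return TS_CLOSE_WAIT; }   /* passive close */
        break;
    default:
        break;
    }
    return s;
}
```

Driving it with EV_APP_CONNECT and then EV_RCV_SYN_ACK reproduces the client's path CLOSED, SYN_SENT, ESTABLISHED.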
Watching the Packets
Consider the actual packet exchange that takes place for a
complete TCP connection: the connection establishment,
data transfer, and connection termination.
In this example, the acknowledgment of the client's request is sent with the
server's reply. This is called piggybacking.
TIME_WAIT State
One of the most misunderstood aspects of TCP
with regard to network programming is its
TIME_WAIT state.
The end that performs the active close goes
through this state. The duration that this
endpoint remains in this state is twice the
maximum segment lifetime (MSL), sometimes
called 2MSL (see the state transition diagram).
The MSL is the maximum amount of time that any
given IP datagram can live in the network. Every
implementation must choose a value for the MSL:
RFC 1122 recommends 2 minutes, while Berkeley-derived
implementations have traditionally used 30 seconds,
so the duration of the TIME_WAIT state is between 1
and 4 minutes.
This bound exists because every datagram contains an 8-bit
hop limit (the IPv4 TTL or IPv6 hop limit field) with a maximum
value of 255. Although this is a hop limit rather than a true
time limit, the assumption is that a packet with the maximum
hop limit of 255 cannot exist in a network for more than MSL seconds.
The usual way in which a packet gets lost in a network
is as the result of routing anomalies.
A router crashes or a link between two routers
goes down and it takes the routing protocols
seconds or minutes to stabilize and find an
alternate path. During that time period, routing
loops can occur and packets can get caught in
these loops. In the meantime, assuming the lost
packet is a TCP segment, the sending TCP times
out and retransmits the packet, and the
retransmitted packet gets to the final
destination by some alternate path.
But sometime later, the routing loop is
corrected and the packet that was lost in the
loop is sent to the final destination. This original
packet is called a lost duplicate or
a wandering duplicate. TCP must handle
these duplicates.
There are two reasons for the TIME_WAIT state:
◦ 1. To implement TCP's full-duplex connection termination
reliably (in case the final ACK is lost).
◦ 2. To allow old duplicate segments to expire in the network.
To understand the first reason, assume that final ACK is lost.
The server will resend its final FIN, so the client must maintain
state information, allowing it to resend the final ACK.
To understand the second reason, assume that we have a TCP
connection between [Link] port 1500 and
[Link] port 21. This connection is closed and then
sometime later, we establish another connection between the
same IP addresses and ports: [Link] port 1500 and
[Link] port 21. This latter connection is called
an incarnation of the previous connection since the IP
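addresses and ports are the same. TCP must prevent old
duplicates from a connection from reappearing at some later
time and being misinterpreted as belonging to a new incarnation
of the same connection. To do this, TCP will not initiate a new
incarnation of a connection that is currently in the TIME_WAIT state.
Since the duration of TIME_WAIT is twice the MSL, this allows MSL
seconds for a packet in one direction to be lost, and another MSL
seconds for the reply to be lost. By enforcing this rule, we are
guaranteed that when we successfully establish a TCP connection,
all old duplicates from previous incarnations of the connection
have expired in the network.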
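A practical consequence of TIME_WAIT for servers: if a server performs the active close (as the daytime server above does) and is then restarted, connections from its previous run may still be in TIME_WAIT, and `bind` on the well-known port can fail with EADDRINUSE. The usual remedy is to set the SO_REUSEADDR socket option before calling `bind`. A minimal sketch; `listen_reuseaddr` is a hypothetical helper, and passing port 0 lets the kernel choose a port when experimenting.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a listening TCP socket on the given port (0 = kernel's choice) with
 * SO_REUSEADDR set, so a restarted server can bind its port even while
 * connections from a previous run are still in TIME_WAIT.
 * Returns the listening descriptor, or -1 on error. */
int listen_reuseaddr(unsigned short port)
{
    const int on = 1;
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* Must be set before bind() to affect that bind. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 || listen(fd, 5) < 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}
```

SO_REUSEADDR does not shorten TIME_WAIT; the old incarnations still exist, the option only stops them from blocking the new listening socket's bind.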
AN SCTP ASSOCIATION
SCTP is connection-oriented like TCP, so it also has association
establishment and termination handshakes. However, SCTP's
handshakes are different than TCP's, so we describe them here.
Four-Way Handshake
The following scenario, similar to TCP, occurs when an SCTP association
is established:
1. The server must be prepared to accept an incoming association. This
preparation is normally done by calling socket, bind, and listen and is
called a passive open.
2. The client issues an active open by calling connect or by sending a
message, which implicitly opens the association. This causes the client
SCTP to send an INIT message (which stands for "initialization") to tell the
server the client's list of IP addresses, initial sequence number,
initiation tag to identify all packets in this association, number of
outbound streams the client is requesting, and number of inbound
streams the client can support.
3. The server acknowledges the client's INIT message with an INIT-
ACK message, which contains the server's list of IP
addresses, initial sequence number, initiation tag, number of
outbound streams the server is requesting, number of
inbound streams the server can support, and a state cookie. The
state cookie contains all of the state that the server needs to
ensure that the association is valid, and is digitally signed to
ensure its validity.
4. The client echoes the server's state cookie with a COOKIE-ECHO
message. This message may also contain user data
bundled within the same packet.
5. The server acknowledges
that the cookie was correct and that the association was
established with a COOKIE-ACK message. This message may
also contain user data bundled within the same packet. The
minimum number of packets required for this exchange is four;
hence, this process is called SCTP's four-way handshake.
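SCTP: Four-way Association setup
[Figure: the client sends INIT; the server replies with INIT-ACK carrying the state cookie and creates no TCB yet; the client sends COOKIE-ECHO carrying the state cookie; the server creates the TCB and replies with COOKIE-ACK; DATA transfer follows.]
Figure 16.19 Four-way handshaking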
Note
A connection in SCTP is called an
association.
Verification Tag
In TCP, a connection is identified by a
combination of IP addresses and port
numbers
◦ A blind attacker can send segments to a TCP
server using randomly chosen source and
destination port numbers
◦ A delayed segment from a previous connection
can show up in a new connection that uses the
same source and destination port addresses
(incarnation); TCP relies on the TIME-WAIT
timer to prevent this
In SCTP, two verification tags, one for each
direction, identify an association
Cookie (1)
In TCP
◦ Each time the server receives a SYN
segment, it sets up a TCB and allocates
other resources
In SCTP
◦ Postpone the allocation of resources until
the reception of the third packet, when
the IP address of the sender is verified
Cookie (2)
In SCTP
◦ The information received in the first
packet must somehow be saved until
the third packet arrives
◦ Solution: to pack the information and
send it back to the client (cookie)
◦ The above strategy works if no entity
can “eat” a cookie “baked” by the
server
◦ To guarantee this, the server creates a
digest from the information using its
own secret key
Note
No other chunk is allowed in a packet
carrying an INIT or INIT ACK chunk.
A COOKIE ECHO or a COOKIE ACK
chunk can carry data chunks.
Note
In SCTP, only data chunks consume TSNs;
data chunks are the only chunks that are
acknowledged.
Figure 16.20 Simple data transfer
Note
The acknowledgment in SCTP defines the
cumulative TSN, the TSN of the last data
chunk received in order.
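The cumulative TSN rule can be sketched as follows, assuming TSNs start at a known first value and ignoring 32-bit wraparound; `cumulative_tsn` is a hypothetical name. TSNs received beyond a gap are not covered by the cumulative TSN; SCTP reports them separately in the gap ack blocks of the SACK.

```c
#include <stddef.h>
#include <stdint.h>

/* Given the TSNs of the DATA chunks received so far (in any order), return
 * the cumulative TSN: the highest TSN such that every TSN from first_tsn up
 * to it has arrived. Returns first_tsn - 1 if first_tsn itself is missing.
 * Simple O(n^2) scan; a real receiver keeps a sorted map instead. */
uint32_t cumulative_tsn(const uint32_t *tsns, size_t n, uint32_t first_tsn)
{
    uint32_t cum = first_tsn - 1;
    int found = 1;

    while (found) {
        found = 0;
        for (size_t i = 0; i < n; i++) {
            if (tsns[i] == cum + 1) {    /* the next in-order TSN has arrived */
                cum++;
                found = 1;
                break;
            }
        }
    }
    return cum;
}
```

With TSNs 100, 101, 102, 104 and 105 received, the cumulative TSN is 102; 104 and 105 lie beyond the gap at 103.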
Multi-homing Data
Transfer
Primary address
◦ One of the peer's addresses is the primary address; the rest are alternative addresses
◦ Defined during association
establishment
◦ Determined by the other end
◦ The process can always override the
primary address (explicitly)
◦ A SACK is sent to the address from
which the corresponding SCTP packet
originated
Multi-stream
Delivery
An interesting feature of SCTP
◦ Distinction between data transfer and
data delivery
◦ Data transfer: TSN (transmission sequence number; used for error and flow control)
◦ Data delivery: SI (stream identifier) and SSN (stream sequence number)
Data delivery (in each stream)
◦ Ordered (default)
◦ Unordered: using the U flag, do not
consume SSNs (U flag with
fragmentation?)
Fragmentation
IP fragmentation vs. SCTP
◦ SCTP preserves the boundaries of the message
from process to process when creating a
DATA chunk from a message, if the size of
the message does not exceed the MTU of the
path
SCTP fragmentation
◦ Each fragment carries a different TSN
◦ All fragments of a message carry the same SI,
SSN, payload protocol ID, and U flag
◦ Combination of B and E flags: 11 (unfragmented message),
10 (first fragment), 00 (middle fragment), 01 (last fragment)
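The B/E combinations can be generated mechanically. This sketch assumes the flag bit values used in the DATA chunk header (B = 0x02, E = 0x01) and a maximum payload per chunk supplied by the caller; the function and macro names are hypothetical.

```c
#include <stddef.h>

/* B (beginning) and E (ending) flag bits of an SCTP DATA chunk. */
#define BE_FLAG_B 0x02u
#define BE_FLAG_E 0x01u

/* Flags for fragment i (0-based) of a message split into nfrags chunks:
 * 11 = unfragmented, 10 = first, 00 = middle, 01 = last. */
unsigned be_flags(size_t i, size_t nfrags)
{
    unsigned flags = 0;

    if (i == 0)
        flags |= BE_FLAG_B;
    if (i == nfrags - 1)
        flags |= BE_FLAG_E;
    return flags;
}

/* DATA chunks needed for a message of msglen bytes (msglen > 0 assumed)
 * when each chunk carries at most max_payload bytes of user data. */
size_t fragments_needed(size_t msglen, size_t max_payload)
{
    return (msglen + max_payload - 1) / max_payload;
}
```

For example, with a 1,500-byte MTU, subtracting 20 bytes of IPv4 header, 12 bytes of SCTP common header and 16 bytes of DATA chunk header leaves 1,452 bytes of user data per chunk, so a 3,000-byte message needs three DATA chunks flagged 10, 00 and 01.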
STATE TRANSITION DIAGRAM
To keep track of all the different events
happening during association establishment,
association termination, and data transfer, the
SCTP software, like TCP, is implemented as a
finite state machine. Figure 16.23 shows the
state transition diagram for both client and
server.
The transitions from one state to another in the state machine are
dictated by the rules of SCTP, based on the current state and the
chunk received in that state.
For example, if an application performs an active open in the
CLOSED state, SCTP sends an INIT and the new state is
COOKIE-WAIT. If SCTP next receives an INIT ACK, it sends a
COOKIE ECHO and the new state is COOKIE-ECHOED. If SCTP
then receives a COOKIE ACK, it moves to the ESTABLISHED
state. This final state is where most data transfer occurs, although
DATA chunks can be piggybacked on COOKIE ECHO and
COOKIE ACK chunks.
The two arrows leading from the ESTABLISHED state deal with the
termination of an association.
If an application calls close before receiving a SHUTDOWN (an
active close), the transition is to the SHUTDOWN-PENDING state.
However, if an application receives a SHUTDOWN while in the
ESTABLISHED state (a passive close), the transition is to the
SHUTDOWN-RECEIVED state.
Figure 16.23 State transition diagram
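The client-side path just described (CLOSED, COOKIE-WAIT, COOKIE-ECHOED, ESTABLISHED) and the two exits from ESTABLISHED can be coded as a small state machine. A sketch only: it covers the transitions named in the text, not the full diagram, and all names are hypothetical.

```c
/* Association states, inputs and outgoing chunks covered by this sketch. */
enum sctp_state { AS_CLOSED, AS_COOKIE_WAIT, AS_COOKIE_ECHOED, AS_ESTABLISHED,
                  AS_SHUTDOWN_PENDING, AS_SHUTDOWN_RECEIVED };
enum sctp_input { INP_APP_OPEN, INP_INIT_ACK, INP_COOKIE_ACK, INP_APP_CLOSE,
                  INP_SHUTDOWN };
enum sctp_chunk { CH_NONE, CH_INIT, CH_COOKIE_ECHO };

/* Return the next state and store in *send the chunk SCTP transmits on the
 * transition. Unlisted combinations leave the state unchanged. */
enum sctp_state sctp_next(enum sctp_state s, enum sctp_input in, enum sctp_chunk *send)
{
    *send = CH_NONE;
    switch (s) {
    case AS_CLOSED:
        if (in == INP_APP_OPEN) { *send = CH_INIT; return AS_COOKIE_WAIT; }
        break;
    case AS_COOKIE_WAIT:
        if (in == INP_INIT_ACK) { *send = CH_COOKIE_ECHO; return AS_COOKIE_ECHOED; }
        break;
    case AS_COOKIE_ECHOED:
        if (in == INP_COOKIE_ACK)
            return AS_ESTABLISHED;
        break;
    case AS_ESTABLISHED:
        if (in == INP_APP_CLOSE)
            return AS_SHUTDOWN_PENDING;     /* active close */
        if (in == INP_SHUTDOWN)
            return AS_SHUTDOWN_RECEIVED;    /* passive close */
        break;
    default:
        break;
    }
    return s;
}
```

Feeding it INP_APP_OPEN, INP_INIT_ACK and INP_COOKIE_ACK in turn walks the client through the four-way handshake into ESTABLISHED.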
Port Numbers
At any given time, multiple processes can be using any
given transport: UDP, SCTP, or TCP. All three transport
layers use 16-bit integer port numbers to differentiate
between these processes.
◦ FTP assigns the well-known port of 21 (decimal) to the
FTP server.
◦ Trivial File Transfer Protocol (TFTP) servers are assigned
the UDP port of 69.
Clients, on the other hand, normally use ephemeral
ports, that is, short-lived ports. These port numbers are
normally assigned automatically by the transport protocol
to the client. The transport protocol guarantees that the ephemeral port is unique on the client
host.
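Ephemeral-port assignment can be observed directly: binding a socket to port 0 asks the kernel to choose a port, and `getsockname` reports which one it picked. A client that calls connect without calling bind gets an ephemeral port the same way, implicitly. A minimal sketch; `ephemeral_port` is a hypothetical helper. The range the kernel draws from is implementation-dependent and often configurable (on Linux, /proc/sys/net/ipv4/ip_local_port_range).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to port 0 and ask the kernel which ephemeral port it
 * chose. Returns the port in host byte order, or -1 on error. */
int ephemeral_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port = -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);               /* 0 = "let the kernel choose" */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *) &addr, &len) == 0)
        port = ntohs(addr.sin_port);
    close(fd);
    return port;
}
```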
The Internet Assigned Numbers Authority (IANA) maintains
a list of port number assignments. Assignments were once
published as RFCs; RFC 1700 is the last in this series, and RFC 3232
gives the location of the online database that replaced RFC 1700:
[Link]
The port numbers are divided into three ranges:
1. The well-known ports: 0 through 1023. These
port numbers are controlled and assigned by the
IANA. When possible, the same port is assigned to a
given service for TCP, UDP, and SCTP. For example,
port 80 is assigned for a Web server, for both TCP
and UDP.
2. The registered ports: 1024 through 49151. These
are not controlled by the IANA, but the IANA
registers and lists the uses of these ports as a
convenience to the community. When possible, the
same port is assigned to a given service for both
TCP and UDP.
3. The dynamic or private ports: 49152 through
65535. The IANA says nothing about these ports.
These are what we call ephemeral ports. (The magic
number 49152 is three-fourths of 65536.)
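The three ranges, and the Unix notion of reserved ports described next, translate directly into code. A trivial sketch with hypothetical names; it assumes the argument is a valid 16-bit port (0 to 65535).

```c
/* IANA port number ranges. */
enum port_range { PORT_WELL_KNOWN, PORT_REGISTERED, PORT_DYNAMIC };

/* Classify a port number (assumed to be 0..65535). */
enum port_range classify_port(unsigned port)
{
    if (port <= 1023)
        return PORT_WELL_KNOWN;     /* 0-1023: controlled and assigned by IANA */
    if (port <= 49151)
        return PORT_REGISTERED;     /* 1024-49151: registered by IANA */
    return PORT_DYNAMIC;            /* 49152-65535: dynamic/private (ephemeral) */
}

/* Unix reserved ports (below 1024) can only be bound by a privileged process. */
int is_reserved_port(unsigned port)
{
    return port < 1024;
}
```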
Unix systems have the concept of a reserved port, which is any
port less than 1024. These ports can only be assigned to a socket
by an appropriately privileged process. All the IANA well-known
ports are reserved ports; hence, the server allocating this port
(such as the FTP server) must have superuser privileges when it
starts.
Historically, Berkeley-derived implementations (starting with
4.3BSD) have allocated ephemeral ports in the range 1024–5000.
This was fine in the early 1980s, but it is easy today to find a host
that can support more than 3977 connections at any given time.
Socket Pair
The socket pair for a TCP connection is the
four-tuple that defines the two endpoints of
the connection: the local IP address, local
port, foreign IP address, and foreign port. A
socket pair uniquely identifies every TCP
connection on a network.
The two values that identify each endpoint,
an IP address and a port number, are often
called a socket.
For example, bind lets the application
specify the local IP address and local port for
TCP, UDP, and SCTP sockets.
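The four values of a socket pair can be read back from a connected socket with `getsockname` (local end) and `getpeername` (foreign end). A minimal sketch for IPv4 TCP sockets; `socket_pair_string` is a hypothetical helper and its output format is arbitrary.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Format the socket pair {local IP:port, foreign IP:port} of a connected
 * IPv4 TCP socket. Returns 0 on success, -1 on error. */
int socket_pair_string(int fd, char *out, size_t outlen)
{
    struct sockaddr_in local, peer;
    socklen_t llen = sizeof(local), plen = sizeof(peer);
    char lip[INET_ADDRSTRLEN], pip[INET_ADDRSTRLEN];

    if (getsockname(fd, (struct sockaddr *) &local, &llen) < 0 ||
        getpeername(fd, (struct sockaddr *) &peer, &plen) < 0)
        return -1;
    if (inet_ntop(AF_INET, &local.sin_addr, lip, sizeof(lip)) == NULL ||
        inet_ntop(AF_INET, &peer.sin_addr, pip, sizeof(pip)) == NULL)
        return -1;
    snprintf(out, outlen, "{%s:%u, %s:%u}",
             lip, (unsigned) ntohs(local.sin_port),
             pip, (unsigned) ntohs(peer.sin_port));
    return 0;
}
```

On the two ends of one connection the halves appear in swapped order, since each side's local endpoint is the other side's foreign endpoint.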
TCP Port Numbers and Concurrent Servers
Buffer Sizes and Limitations
Maximum size of an IPv4 datagram: 65,535 bytes
(including the header), because of the 16-bit total length
field.
Maximum size of an IPv6 datagram: 65,575 bytes
(including the 40-byte IPv6 header), because of the 16-bit
payload length field. IPv6 has a jumbo payload option,
which extends the payload length field to 32 bits, but this
option is supported only on datalinks with a maximum
transmission unit (MTU) that exceeds 65,535.
MTU (maximum transmission unit): dictated by the
hardware. Ethernet MTU is 1,500 bytes; point-to-point
links have a configurable MTU.
◦ Minimum link MTU for IPv4: 68 bytes. This permits a
maximum-sized IPv4 header (20 bytes of fixed header,
40 bytes of options) and a minimum-sized fragment of
8 bytes (the fragment offset is in units of 8 bytes).
Path MTU: smallest MTU in the path between
two hosts. The path MTU need not be the same in
both directions between any two hosts because
routing in the Internet is often asymmetric.
Fragmentation is performed by both IPv4 and
IPv6 when the size of an IP datagram to be sent
out an interface exceeds the link MTU. The
fragments are not normally reassembled until
they reach the final destination.
◦ IPv4: hosts perform fragmentation on
datagrams that they generate and routers
perform fragmentation on datagrams that they
forward.
◦ IPv6: only hosts perform fragmentation on
datagrams that they generate; routers do not
fragment datagrams that they are forwarding.
The IPv4 header contains fields to handle
fragmentation; IPv6 carries the fragmentation
information in an extension (option) header.
The "Don't Fragment" (DF) bit in the IPv4 header
specifies that this datagram must not be
fragmented, either by the sending host or
by any router. A router that receives an
IPv4 datagram with the DF bit set whose
size exceeds the outgoing link's MTU
generates an ICMPv4 "destination
unreachable, fragmentation needed but
DF bit set" error message.
Since IPv6 routers do not perform
fragmentation, there is an implied DF bit
with every IPv6 datagram. When an IPv6
router receives a datagram whose size
exceeds the outgoing link's MTU, it
generates an ICMPv6 "packet too big"
error message.
TCP has a maximum segment
size (MSS) that announces to the
peer TCP the maximum amount of
TCP data that the peer can send
per segment. We saw the MSS
option on the SYN segments. The
goal of the MSS is to tell the peer
the actual value of the reassembly
buffer size and to try to avoid
fragmentation.
The MSS is often set to the
interface MTU minus the fixed
sizes of the IP and TCP headers.
On an Ethernet using IPv4, this would be 1,460, and
on an Ethernet using IPv6, this would be 1,440.
(The TCP header is 20 bytes for both, but the IPv4
header is 20 bytes and the IPv6 header is 40 bytes.)
◦ IPv4: The MSS value in the TCP MSS option is a
16-bit field, limiting the value to 65,535. The
maximum amount of TCP data in an IPv4
datagram is 65,495 (65,535 minus the 20-byte
IPv4 header and minus the 20-byte TCP header).
◦ IPv6: the maximum amount of TCP data in an IPv6
datagram without the jumbo payload option is
65,515 (65,535 minus the 20-byte TCP header).
The MSS value of 65,535 is considered a special
case that designates "infinity." This value is used
only if the jumbo payload option is being used,
which requires an MTU that exceeds 65,535.
The IP Protocol
The TCP Segment Header (figure: TCP header)
TCP Output
Every TCP socket has a send buffer, and we can
change the size of this buffer with the
SO_SNDBUF socket option. When an
application calls write, the kernel copies all the
data from the application buffer into the socket
send buffer. If there is insufficient room in the
socket buffer for all the application's data, the
process is put to sleep. This assumes the
normal default of a blocking socket. The kernel
will not return from the write until the final
byte in the application buffer has been copied
into the socket send buffer. Therefore, the
successful return from a write to a TCP socket
only tells us that we can reuse our application
buffer. It does not tell us that either the peer
TCP has received the data or that the peer
application has received the data.
TCP takes the data in the socket send
buffer and sends it to the peer TCP. The
peer TCP must acknowledge the data,
and as the ACKs arrive from the peer,
only then can our TCP discard the
acknowledged data from the socket
send buffer. TCP must keep a copy of
our data until it is acknowledged by the
peer.
TCP sends the data to IP in MSS-sized
or smaller chunks, prepending its TCP
header to each segment, where the
MSS is the value announced by the
peer.
IP prepends its header, searches the
routing table for the destination IP
address, and passes the datagram to
the appropriate datalink. IP might
perform fragmentation before passing
the datagram to the datalink, but one
goal of the MSS option is to try to
avoid fragmentation.
Each datalink has an output queue,
and if this queue is full, the packet is
discarded.
UDP Output
A UDP socket does not have a real socket send buffer,
since it does not need to keep a copy of the
application's data. It has a send buffer size
(which we can change with
the SO_SNDBUF socket option), but this is
simply an upper limit on the maximum-sized
UDP datagram that can be written to the
socket.
If an application writes a datagram larger than
the socket send buffer size, EMSGSIZE is
returned.
UDP simply prepends its 8-byte header and
passes the datagram to IP. IP determines the
outgoing interface by performing the routing
function, and then either adds the datagram to
the datalink output queue or fragments the
datagram and adds each fragment to the
datalink output queue.
Standard Internet Services
Several standard services are provided by most implementations of
TCP/IP.
Protocol Usage by Common Internet Applications
The first two applications, ping
and traceroute, are diagnostic
applications that use ICMP.
traceroute builds its own UDP
packets to send and reads ICMP
replies.
The three popular routing protocols
demonstrate the variety of transport
protocols used by routing protocols.
OSPF uses IP directly, employing a
raw socket, while RIP uses UDP and
BGP uses TCP.
The next five are UDP-based
applications,
followed by seven TCP applications
and four that use both UDP and TCP.
The final five are IP telephony
applications that use SCTP
exclusively or optionally UDP, TCP,
or SCTP.
APIs depend on platform:
UNIX – sockets (original Berkeley system calls)
     – TLI (Transport Layer Interface)
Apple Mac – MacTCP
MS Windows – WinSock (similar to sockets)
• UNIX TCP/IP APIs are kernel system calls.
• Mac and Windows APIs are extensions/drivers (+DLL).
Sockets introduction
What is a socket?
To the kernel, a socket is an endpoint of
communication.
To an application, a socket is a file descriptor that lets
the application read/write from/to the network.
Remember: All Unix I/O devices, including networks, are
modeled as files.
Socket address structures (described below) can be passed
in two directions: from the process to the kernel, and
from the kernel to the process.
Clients and servers communicate with each other by
reading from and writing to socket descriptors.
Socket Address Structure
Socket = IP address + TCP or UDP port
number
Passed to socket functions as an argument
(by pointer).
Contains the IP address, TCP or UDP port number, length
of the structure, and so on.
Each protocol defines its own socket
address structure (IPv4, IPv6, ...).
The names of these structures begin with
sockaddr_ and end with a unique suffix for
each protocol suite.
99 It Dept, RVR&JC 04/03/14
IPv4 Socket Address Structure
An IPv4 socket address structure, commonly called an "Internet socket address structure," is named
sockaddr_in and is defined by including the <netinet/in.h> header.
struct in_addr {
    in_addr_t s_addr;            /* 32-bit IPv4 address */
                                 /* network byte ordered */
};
struct sockaddr_in {
    uint8_t        sin_len;      /* length of structure (16) */
    sa_family_t    sin_family;   /* AF_INET */
    in_port_t      sin_port;     /* 16-bit TCP or UDP port number */
                                 /* network byte ordered */
    struct in_addr sin_addr;     /* 32-bit IPv4 address */
                                 /* network byte ordered */
    char           sin_zero[8];  /* unused */
};
Datatypes required by POSIX.1g
Datatype      Description                                  Header
int8_t        Signed 8-bit integer                         <sys/types.h>
uint8_t       Unsigned 8-bit integer                       <sys/types.h>
int16_t       Signed 16-bit integer                        <sys/types.h>
uint16_t      Unsigned 16-bit integer                      <sys/types.h>
int32_t       Signed 32-bit integer                        <sys/types.h>
uint32_t      Unsigned 32-bit integer                      <sys/types.h>
sa_family_t   Address family of socket address structure   <sys/socket.h>
socklen_t     Length of socket address structure,          <sys/socket.h>
              normally uint32_t
in_addr_t     IPv4 address, normally uint32_t              <netinet/in.h>
in_port_t     TCP or UDP port, normally uint16_t           <netinet/in.h>
The four socket functions that pass a socket address
structure from the process to the kernel, bind, connect,
sendto, and sendmsg, all go through the sockargs function
in a Berkeley-derived implementation.
This function copies the socket address structure from the
process and explicitly sets its sin_len member to the size of
the structure that was passed as an argument to these four
functions.
The five socket functions that pass a socket address structure
from the kernel to the process, accept, recvfrom, recvmsg,
getpeername, and getsockname, all set the sin_len
member before returning to the process.
The POSIX specification requires only three members in the
structure: sin_family, sin_addr, and sin_port.
Almost all implementations add the sin_zero member so that all
socket address structures are at least 16 bytes in size.
The s_addr, sin_family, and sin_port members use the POSIX
datatypes in_addr_t, sa_family_t, and in_port_t.
The in_addr_t datatype must be an unsigned integer type of
at least 32 bits, and in_port_t must be an unsigned integer type of
at least 16 bits.
Generic Socket Address Structure
A socket address structure is always
passed by reference when passed as an
argument to any socket function.
Because the socket functions predate ANSI C, which
introduced void * as the generic pointer type, a generic
socket address structure, sockaddr, is defined in the
<sys/socket.h> header:
struct sockaddr {
    uint8_t     sa_len;
    sa_family_t sa_family;     /* address family: AF_xxx value */
    char        sa_data[14];   /* protocol-specific address */
};
The socket functions are then defined as
taking a pointer to the generic socket
address structure, as shown here in the
ANSI C function prototype for the bind
function:
int bind(int, struct sockaddr *, socklen_t);
This requires that any calls to these
functions must cast the pointer to the
protocol-specific socket address structure
to be a pointer to a generic socket address
structure.
struct sockaddr_in serv;     /* IPv4 socket address structure */
/* fill in serv{} */
bind(sockfd, (struct sockaddr *) &serv, sizeof(serv));
IPv6 Socket Address Structure
struct in6_addr {
    uint8_t s6_addr[16];           /* 128-bit IPv6 address */
                                   /* network byte ordered */
};
#define SIN6_LEN                   /* required for compile-time tests */
struct sockaddr_in6 {
    uint8_t         sin6_len;      /* length of structure (28) */
    sa_family_t     sin6_family;   /* AF_INET6 */
    in_port_t       sin6_port;     /* transport layer port# */
                                   /* network byte ordered */
    uint32_t        sin6_flowinfo; /* flow label and reserved bits */
                                   /* network byte ordered */
    struct in6_addr sin6_addr;     /* IPv6 address */
                                   /* network byte ordered */
    uint32_t        sin6_scope_id; /* set of interfaces for a scope */
};                                 /* included in <netinet/in.h> */
Note the following points about this structure:
The SIN6_LEN constant must be defined if the system
supports the length member for socket address
structures.
The IPv6 family is AF_INET6, whereas the IPv4 family is
AF_INET.
The members in this structure are ordered so that if the
sockaddr_in6 structure is 64-bit aligned, so is the 128-bit
sin6_addr member. On some 64-bit processors, data
accesses of 64-bit values are optimized if stored on a 64-
bit boundary.
The sin6_flowinfo member is divided into two fields:
◦ the low-order 20 bits are the flow label;
◦ the high-order 12 bits are reserved.
The sin6_scope_id identifies the scope zone in which a
scoped address is meaningful, most commonly an
interface index for a link-local address.
New Generic Socket Address Structure
A new generic socket address structure was defined as part of the IPv6 sockets
API, to overcome some of the shortcomings of the existing struct sockaddr.
Unlike struct sockaddr, the new struct sockaddr_storage is large
enough to hold any socket address type supported by the system. POSIX
defines the sockaddr_storage structure in the <sys/socket.h>
header.
The sockaddr_storage type provides a generic socket address structure that
is different from struct sockaddr in two ways:
• If any socket address structures that the system supports have alignment
requirements, sockaddr_storage provides the strictest alignment
requirement.
• sockaddr_storage is large enough to contain any socket address
structure that the system supports.
Comparison of Socket Address Structures
Value-Result Arguments
When a socket address structure is passed to any socket function, it
is always passed by reference. That is, a pointer to the
structure is passed. The length of the structure is also passed
as an argument. But the way in which the length is passed
depends on which direction the structure is being passed: from
the process to the kernel, or vice versa.
Three functions, bind, connect, and sendto, pass a socket
address structure from the process to the kernel. One
argument to these three functions is the pointer to the socket
address structure and another argument is the integer size of
the structure, as in
struct sockaddr_in serv;
/* fill in serv{} */
connect(sockfd, (struct sockaddr *) &serv, sizeof(serv));
Since the kernel is passed both the pointer and the size of what the
pointer points to, it knows exactly how much data to copy from
the process into the kernel.
Four functions, accept, recvfrom, getsockname, and
getpeername, pass a socket address structure from the
kernel to the process, the reverse direction from the previous
scenario. Two of the arguments to these four functions are the
pointer to the socket address structure along with a pointer to
an integer containing the size of the structure, as in
struct sockaddr_un cli;     /* Unix domain */
socklen_t len;
len = sizeof(cli);          /* len is a value */
getpeername(unixfd, (struct sockaddr *) &cli, &len);
                            /* len may have changed */
The reason that the size changes from an integer to be a
pointer to an integer is because the size is both a value when
the function is called (it tells the kernel the size of the structure
so that the kernel does not write past the end of the structure
when filling it in) and a result when the function returns (it tells
the process how much information the kernel actually stored in
the structure). This type of argument is called a value-result
argument.
Figure: Socket address structure passed from process to kernel (bind, connect, sendto) and from kernel to process (accept, recvfrom, getsockname, getpeername).
Byte Ordering Functions
Consider a 16-bit integer that is made up of 2 bytes. There are two
ways to store the two bytes in memory: with the low-order byte at the
starting address, known as little-endian byte order, or with the high-
order byte at the starting address, known as big-endian byte order.
Little-endian byte order and big-endian byte order for a 16-bit integer.
In this figure, we show increasing memory addresses going from right to left in the top, and
from left to right in the bottom. We also show the most significant bit (MSB) as the leftmost bit
of the 16-bit value and the least significant bit (LSB) as the rightmost bit.
Summary of address conversion functions.
sock_ntop and Related Functions
A basic problem with inet_ntop is that it requires the
caller to pass a pointer to a binary address.
This address is normally contained in a socket address
structure, requiring the caller to know the format of the
structure and the address family.
That is, to use it, we must write code of the form
struct sockaddr_in addr;
char str[INET_ADDRSTRLEN];
inet_ntop(AF_INET, &addr.sin_addr, str, sizeof(str));      /* for IPv4 */
or
struct sockaddr_in6 addr6;
char str[INET6_ADDRSTRLEN];
inet_ntop(AF_INET6, &addr6.sin6_addr, str, sizeof(str));   /* for IPv6 */
To solve this, we write our own function named sock_ntop that takes a
pointer to a socket address structure, looks inside the structure, and
calls the appropriate function to return the presentation format of the
address:
char *sock_ntop(const struct sockaddr *sockaddr, socklen_t addrlen);
sockaddr points to a socket address structure whose length is
addrlen. The function uses its own static buffer to hold the result
and a pointer to this buffer is the return value.
readn, writen, and readline Functions
Stream sockets (e.g., TCP sockets) exhibit a
behavior with the read and write functions that
differs from normal file I/O.
A read or write on a stream socket might input
or output fewer bytes than requested, but this
is not an error condition. The reason is that
buffer limits might be reached for the socket in
the kernel.
We therefore use the following three functions
whenever we read from or write to a stream
socket:
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
ssize_t                         /* Read "n" bytes from a descriptor. */
readn(int fd, void *vptr, size_t n)
{
    size_t  nleft;
    ssize_t nread;
    char   *ptr;
    ptr = vptr;
    nleft = n;
    while (nleft > 0) {
        if ( (nread = read(fd, ptr, nleft)) < 0) {
            if (errno == EINTR)
                nread = 0;      /* and call read() again */
            else
                return (-1);
        } else if (nread == 0)
            break;              /* EOF */
        nleft -= nread;
        ptr += nread;
    }
    return (n - nleft);         /* return >= 0 */
}
ssize_t                         /* Write "n" bytes to a descriptor. */
writen(int fd, const void *vptr, size_t n)
{
    size_t      nleft;
    ssize_t     nwritten;
    const char *ptr;
    ptr = vptr;
    nleft = n;
    while (nleft > 0) {
        if ( (nwritten = write(fd, ptr, nleft)) <= 0) {
            if (nwritten < 0 && errno == EINTR)
                nwritten = 0;   /* and call write() again */
            else
                return (-1);    /* error */
        }
        nleft -= nwritten;
        ptr += nwritten;
    }
    return (n);
}
ssize_t readline(int fd, void *vptr, size_t maxlen)
{
    ssize_t n, rc;
    char    c, *ptr;
    ptr = vptr;
    for (n = 1; n < maxlen; n++) {
    again:
        if ( (rc = read(fd, &c, 1)) == 1) {
            *ptr++ = c;
            if (c == '\n')
                break;          /* newline is stored, like fgets() */
        } else if (rc == 0) {
            *ptr = 0;
            return (n - 1);     /* EOF, n - 1 bytes were read */
        } else {
            if (errno == EINTR)
                goto again;
            return (-1);        /* error, errno set by read() */
        }
    }
    *ptr = 0;                   /* null terminate like fgets() */
    return (n);
}