
Interprocess communication (IPC) covers communication between processes on the same host as well as between processes on different hosts over a network.

IPC has two forms:

Local IPC
Network IPC


Local IPC
Communication between local processes (on the same host):
Pipe
FIFO
System V IPC: message queues, semaphores, shared memory
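
As a minimal sketch of local IPC, the snippet below pushes a message through a pipe and reads it back. For brevity both ends live in a single process, whereas a real program would normally share the pipe with a child process. The function name `pipe_roundtrip` is illustrative and not from the slides, and the sketch assumes the message is small enough to fit in the kernel's pipe buffer.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Write message into a pipe and read it back from the other end."""
    read_fd, write_fd = os.pipe()       # kernel-managed one-way byte channel
    try:
        os.write(write_fd, message)     # producer side
        os.close(write_fd)              # closing the write end signals end-of-file
        write_fd = -1
        chunks = []
        while True:
            chunk = os.read(read_fd, 4096)
            if not chunk:               # an empty read means end-of-file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)
```

Calling `pipe_roundtrip(b"hello")` returns the same bytes that were written.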

Network IPC
Communication between processes on different hosts:
Socket


Pipe
FIFO

Message Queues
Shared Memory

Semaphores
Sockets


Client / Server

Figure 1.1 Network application: client and server (the client and the server are joined by a communication link)

Figure 1.2 Server handling multiple clients at the same time (several clients connected to one server)
UNIX Network Programming

Example: client and server on the same Ethernet communicating using TCP

Layer               Client side        Protocol between peers   Server side
Application layer   Web client         Application protocol     Web server        (user process)
Transport layer     TCP                TCP protocol             TCP               (protocol stack within kernel)
Network layer       IP                 IP protocol              IP                (protocol stack within kernel)
Datalink layer      Ethernet driver    Ethernet protocol        Ethernet driver

The actual flow between client and server goes down the client's protocol stack, across the Ethernet, and up the server's protocol stack.

Figure 1.3 Client and server on the same Ethernet communicating using TCP

Example: client and server on different LANs connected through a WAN

client application (host with TCP/IP) <-> LAN <-> router <-> WAN (several routers) <-> router <-> LAN <-> server application (host with TCP/IP)

Figure 1.4 Client and server on different LANs connected through a WAN

OSI model          Internet protocol suite
7 Application
6 Presentation     Application                    user process (application details)
5 Session
                   ---- Sockets, XTI ----
4 Transport        TCP, UDP
3 Network          IPv4, IPv6                     kernel (communication details)
2 Datalink         Device driver and hardware
1 Physical

Figure 1.14 Layers in the OSI model and the Internet protocol suite

First, the upper three layers handle all the details of the application, and the lower four layers handle all the communication details. Second, the upper three layers often form what is called a user process, while the lower four layers are usually provided as part of the operating system kernel.


POSIX
POSIX is an acronym for Portable Operating System Interface. POSIX is not a single standard, but a family of standards being developed by the Institute of Electrical and Electronics Engineers, Inc., normally called the IEEE. The POSIX standards have also been adopted as international standards by ISO and the International Electrotechnical Commission (IEC), called ISO/IEC.

The Open Group
The Open Group was formed in 1996 by the consolidation of the X/Open Company and the Open Software Foundation. It is an international association of vendors and end-user customers from industry, government, and academia.

IETF
The Internet Engineering Task Force (IETF) is a large, open, international community of network designers, operators, vendors, and researchers concerned with the evolution of the Internet architecture and the smooth operation of the Internet.
It is open to any interested individual.

TCP provides connections between clients and servers (it is a connection-oriented protocol).

TCP also provides reliability: it requires an acknowledgment for the data it sends and retransmits when none arrives in time.

TCP contains algorithms to estimate the round-trip time (RTT) between a client and server dynamically so that it knows how long to wait for an acknowledgment.
TCP also sequences the data by associating a sequence number with every byte that it sends.
TCP provides flow control: it always tells its peer exactly how many bytes of data it is willing to accept from the peer at any one time.
A TCP connection is full-duplex.
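
The slides do not say which RTT algorithm is meant. As a hedged illustration, the sketch below follows the smoothed estimator described in RFC 6298 (gain 1/8 for the smoothed RTT, 1/4 for the variance, timeout = SRTT + 4 x RTTVAR), ignoring clock granularity. The class name `RttEstimator` and the `min_rto` parameter are assumptions made for this example.

```python
class RttEstimator:
    """Smoothed RTT estimator in the style of RFC 6298 (illustrative only)."""

    ALPHA = 1 / 8   # gain applied to the smoothed RTT
    BETA = 1 / 4    # gain applied to the RTT variance

    def __init__(self, min_rto: float = 1.0):
        self.srtt = None        # smoothed round-trip time, in seconds
        self.rttvar = None      # round-trip time variance, in seconds
        self.min_rto = min_rto  # lower bound on the retransmission timeout

    def update(self, sample: float) -> float:
        """Feed one measured RTT and return the new retransmission timeout."""
        if self.srtt is None:
            # The first measurement initialises both values.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The variance is updated first, using the previous smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.rto()

    def rto(self) -> float:
        """How long to wait for an acknowledgment before retransmitting."""
        return max(self.min_rto, self.srtt + 4 * self.rttvar)
```

Feeding samples one at a time, for example `RttEstimator().update(0.1)`, shows how the timeout tracks both the average RTT and its variation rather than a fixed constant.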

UDP provides a connectionless service, as there need not be any long-term relationship between a UDP client and server.
UDP is a simple transport-layer protocol. The application writes a message to a UDP socket, which is then encapsulated in a UDP datagram, which is further encapsulated in an IP datagram, which is then sent to its destination. There is no guarantee that a UDP datagram will ever reach its final destination, that order will be preserved across the network, or that datagrams will arrive only once.
UDP provides no flow control. UDP supports multicasting.
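
To make the datagram model concrete, here is a small sketch that sends UDP datagrams over the loopback interface. There is no connection setup, and each receive call returns exactly one datagram. The function name `udp_exchange` and the 2-second timeout are assumptions for this example; the timeout stands in for the fact that a lost datagram is never reported.

```python
import socket


def udp_exchange(messages):
    """Send each message as one UDP datagram over loopback; return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: the kernel picks an ephemeral port
        receiver.settimeout(2.0)          # UDP itself gives no delivery guarantee
        address = receiver.getsockname()
        received = []
        for message in messages:
            sender.sendto(message, address)            # no connect, no handshake
            try:
                data, _peer = receiver.recvfrom(2048)  # one call, one datagram
            except socket.timeout:
                data = None                            # a lost datagram is simply never seen
            received.append(data)
        return received
    finally:
        sender.close()
        receiver.close()
```

On a loopback interface loss is rare, so `udp_exchange([b"one", b"two"])` normally returns both messages, each as a separate record.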

                                     TCP                          UDP
Binding between client and server    Yes (connection-oriented)    No (connectionless)
Data                                 Byte-stream                  Record
Reliability                          Yes (ack, timeout, retx)     No
Sequencing                           Yes                          No
Flow control                         Yes (window-based)           No
Full-duplex                          Yes                          Yes


The server must be prepared to accept an incoming connection. This is normally done by calling socket, bind, and listen, and is called a passive open.
The client issues an active open by calling connect. This causes the client TCP to send a "synchronize" (SYN) segment, which tells the server the client's initial sequence number for the data that the client will send on the connection. The server must acknowledge (ACK) the client's SYN, and the server must also send its own SYN containing the initial sequence number for the data that the server will send on the connection. The server sends its SYN and the ACK of the client's SYN in a single segment. The client must acknowledge the server's SYN.
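
The passive and active opens described above can be sketched with Python's socket module, which wraps the same socket, bind, listen, accept, and connect calls. This is a minimal illustration over loopback, not a production server; the function name `run_echo_once`, the use of a thread for the server side, and the 5-second timeouts are assumptions made for the example. The kernel performs the SYN, SYN+ACK, ACK exchange inside connect and accept.

```python
import socket
import threading


def run_echo_once(payload: bytes) -> bytes:
    """Serve one client on loopback and return the reply the client reads back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5.0)              # avoid hanging forever if the client fails
    listener.bind(("127.0.0.1", 0))       # port 0: the kernel picks a free port
    listener.listen(5)                    # passive open: socket, bind, listen
    port = listener.getsockname()[1]

    def serve():
        conn, _peer = listener.accept()   # returns once a handshake has completed
        with conn:
            conn.sendall(conn.recv(4096)) # echo the request back

    worker = threading.Thread(target=serve)
    worker.start()
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.settimeout(5.0)
            client.connect(("127.0.0.1", port))   # active open: sends the first SYN
            client.sendall(payload)
            reply = client.recv(4096)
    finally:
        worker.join()
        listener.close()
    return reply
```

For a short payload such as `b"ping"`, the client reads back the same bytes the server received.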

TCP Connection: Establishment

Three-way handshake
client                                                    server
socket                                                    socket, bind, listen
                                                          LISTEN (passive open)
connect (blocks) (active open)                            accept (blocks)
SYN_SENT                  ---- SYN j ---->                SYN_RCVD
ESTABLISHED               <--- SYN k, ack j+1 ----
connect returns           ---- ack k+1 ---->              ESTABLISHED
                                                          accept returns
                                                          read (blocks)

TCP options (in SYN): MSS (maximum segment size) option; window scale option (advertised window up to 65535 x 2^14, about 1 GB); timestamp option. The latter two are the long fat pipe options.
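
The "about 1 GB" figure follows from shifting the 16-bit window field left by the largest permitted scale factor. A short sketch of that arithmetic, with the helper name `scaled_window` chosen for this example:

```python
MAX_WINDOW_FIELD = 65535   # largest value of the 16-bit window field
MAX_WINDOW_SHIFT = 14      # largest shift count the window scale option allows


def scaled_window(window_field: int, shift: int) -> int:
    """Return the advertised window in bytes for a window field and a scale shift."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_WINDOW_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift   # same as window_field * 2**shift
```

With both values at their maximum, `scaled_window(65535, 14)` gives 1,073,725,440 bytes, just under 2^30.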

One application calls close first, and we say that this end performs the active close. This end's TCP sends a FIN segment, which means it is finished sending data.
The other end that receives the FIN performs the passive close. The received FIN is acknowledged by TCP. The receipt of the FIN is also passed to the application as an end-of-file, since the receipt of the FIN means the application will not receive any additional data on the connection.

Sometime later, the application that received the end-of-file will close its socket. This causes its TCP to send a FIN.
The TCP on the system that receives this final FIN acknowledges the FIN.
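
The end-of-file behaviour described above can be observed directly: after the active side closes, the passive side's read returns an empty result once all pending data has been delivered. The sketch below runs both ends over loopback in one process; the function name `close_demo` and the 5-second timeout are assumptions for this example.

```python
import socket


def close_demo(payload: bytes):
    """Return (data the passive side read before EOF, its final recv() result)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    active = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        active.connect(listener.getsockname())   # the kernel completes the handshake
        passive, _peer = listener.accept()
        with passive:
            passive.settimeout(5.0)
            active.sendall(payload)
            active.close()                        # active close: TCP sends a FIN
            received = b""
            while True:
                chunk = passive.recv(4096)
                if chunk == b"":                  # the FIN surfaces as end-of-file
                    break
                received += chunk
            return received, chunk
    finally:
        active.close()                            # harmless if already closed
        listener.close()
```

The data sent before the close still arrives in full; only afterwards does the read report end-of-file.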

TCP Connection: Termination

Four-way handshake
client                                                    server
close (active close)
FIN_WAIT_1                ---- FIN m ---->                CLOSE_WAIT (passive close)
                                                          read returns 0
FIN_WAIT_2                <--- ack m+1 ----
                                                          close
TIME_WAIT (1 to 4 mins)   <--- FIN n ----                 LAST_ACK
                          ---- ack n+1 ---->              CLOSED
CLOSED

The TIME_WAIT state allows old duplicate segments to expire and makes termination reliable (the end performing the active close might have to retransmit the final ACK).

State transition diagram

CLOSED (starting point)
  appl: passive open; send: <nothing>      -> LISTEN (passive open)
  appl: active open; send: SYN             -> SYN_SENT (active open)
LISTEN
  recv: SYN; send: SYN, ACK                -> SYN_RCVD
SYN_RCVD
  recv: RST                                -> LISTEN
  recv: ACK; send: <nothing>               -> ESTABLISHED
SYN_SENT
  recv: SYN; send: SYN, ACK                -> SYN_RCVD (simultaneous open)
  recv: SYN, ACK; send: ACK                -> ESTABLISHED
  appl: close or timeout                   -> CLOSED
ESTABLISHED (data transfer state)
  recv: FIN; send: ACK                     -> CLOSE_WAIT (passive close)
  appl: close; send: FIN                   -> FIN_WAIT_1 (active close)
CLOSE_WAIT
  appl: close; send: FIN                   -> LAST_ACK
LAST_ACK
  recv: ACK; send: <nothing>               -> CLOSED
FIN_WAIT_1
  recv: ACK; send: <nothing>               -> FIN_WAIT_2
  recv: FIN; send: ACK                     -> CLOSING (simultaneous close)
  recv: FIN, ACK; send: ACK                -> TIME_WAIT
FIN_WAIT_2
  recv: FIN; send: ACK                     -> TIME_WAIT
CLOSING
  recv: ACK; send: <nothing>               -> TIME_WAIT
TIME_WAIT
  2MSL timeout                             -> CLOSED

Figure 2.4 TCP state transition diagram
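
One way to check an understanding of the state diagram is to encode its transitions as a lookup table and replay a sequence of events. The sketch below does that; the event labels are copied from the figure, while the names `TRANSITIONS` and `walk` are hypothetical and the table deliberately omits the segments sent on each transition.

```python
# (current state, event) -> next state, following Figure 2.4
TRANSITIONS = {
    ("CLOSED", "appl: passive open"): "LISTEN",
    ("CLOSED", "appl: active open"): "SYN_SENT",
    ("LISTEN", "recv: SYN"): "SYN_RCVD",
    ("SYN_RCVD", "recv: RST"): "LISTEN",
    ("SYN_RCVD", "recv: ACK"): "ESTABLISHED",
    ("SYN_SENT", "recv: SYN"): "SYN_RCVD",             # simultaneous open
    ("SYN_SENT", "recv: SYN, ACK"): "ESTABLISHED",
    ("SYN_SENT", "appl: close or timeout"): "CLOSED",
    ("ESTABLISHED", "recv: FIN"): "CLOSE_WAIT",        # passive close
    ("ESTABLISHED", "appl: close"): "FIN_WAIT_1",      # active close
    ("CLOSE_WAIT", "appl: close"): "LAST_ACK",
    ("LAST_ACK", "recv: ACK"): "CLOSED",
    ("FIN_WAIT_1", "recv: ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "recv: FIN"): "CLOSING",            # simultaneous close
    ("FIN_WAIT_1", "recv: FIN, ACK"): "TIME_WAIT",
    ("FIN_WAIT_2", "recv: FIN"): "TIME_WAIT",
    ("CLOSING", "recv: ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2MSL timeout"): "CLOSED",
}


def walk(events, state="CLOSED"):
    """Replay events from the given state and return every state visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]   # KeyError means an invalid transition
        path.append(state)
    return path


# Typical client: active open, then active close.
client_path = walk([
    "appl: active open", "recv: SYN, ACK", "appl: close",
    "recv: ACK", "recv: FIN", "2MSL timeout",
])

# Typical server: passive open, then passive close.
server_path = walk([
    "appl: passive open", "recv: SYN", "recv: ACK",
    "recv: FIN", "appl: close", "recv: ACK",
])
```

Only the end that performs the active close passes through TIME_WAIT, which is visible in `client_path` but not in `server_path`.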



Watching the Packets

client                                                         server
socket                                                         socket, bind, listen
connect (blocks) (active open)                                 LISTEN (passive open)
SYN_SENT                  ---- SYN J, mss = 1460 ---->         accept (blocks)
                          <--- SYN K, ack J+1, mss = 1024 ----
ESTABLISHED               ---- ack K+1 ---->                   ESTABLISHED
connect returns                                                accept returns
<client forms request>                                         read (blocks)
write                     ---- data (request) ---->            read returns
read (blocks)                                                  <server processes request>
                                                               write
read returns              <--- data (reply), ack of request ---- read (blocks)
                          ---- ack of reply ---->
close (active close)
FIN_WAIT_1                ---- FIN M ---->                     CLOSE_WAIT (passive close)
                                                               read returns 0
FIN_WAIT_2                <--- ack M+1 ----
                                                               close
TIME_WAIT                 <--- FIN N ----                      LAST_ACK
                          ---- ack N+1 ---->                   CLOSED

Figure 2.5 Packet exchange for TCP connection


The end that performs the active close is the end that remains in the TIME_WAIT state, because that end is the one that might have to retransmit the final ACK.
The MSL (maximum segment lifetime) is the maximum amount of time that any given IP datagram can live in a network; the TIME_WAIT state lasts twice the MSL (2MSL).
There are two reasons for the TIME_WAIT state:
1. to implement TCP's full-duplex connection termination reliably
2. to allow old duplicate segments to expire in the network
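
The "1 to 4 minutes" duration quoted for TIME_WAIT follows from the 2MSL rule once an MSL is chosen. The sketch below assumes MSL values between 30 seconds and 2 minutes, which are typical implementation choices and are not stated on the slides; the function name is illustrative.

```python
def time_wait_seconds(msl_seconds: float) -> float:
    """Duration of the TIME_WAIT state: twice the maximum segment lifetime."""
    return 2 * msl_seconds


shortest = time_wait_seconds(30)    # assumed MSL of 30 seconds
longest = time_wait_seconds(120)    # assumed MSL of 2 minutes
```

These two assumed MSL values bracket the 1 to 4 minute range.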


TCP and UDP define a group of well-known ports to identify well-known services.
Clients normally use ephemeral ports, that is, short-lived ports. These port numbers are normally assigned automatically by the transport protocol to the client.
IANA maintains a list of port number assignments:
1. Well-known ports: 0 to 1023. These are controlled and assigned by IANA.
2. Registered ports: 1024 to 49151. These are not controlled by IANA.
3. Dynamic or private ports: 49152 to 65535.
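
The three ranges translate directly into a small classifier. This is only a sketch of the ranges listed above; the function name `classify_port` and the returned labels are chosen for illustration.

```python
def classify_port(port: int) -> str:
    """Return the IANA range a TCP or UDP port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit values")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic or private"
```

For example, port 21 falls in the well-known range, while port 1500 falls in the registered range.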


The socket pair for a TCP connection is the four-tuple that defines the two endpoints of the connection:
local IP address, local port, foreign IP address, foreign port.

A socket pair uniquely identifies every TCP connection on a network. The two values that identify each endpoint, an IP address and a port number, are often called a socket.
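
The four-tuple can be inspected on a live connection. In the sketch below, which runs client and server over loopback in one process, each side reports its local and foreign socket; the function name `socket_pairs` is an assumption for this example.

```python
import socket


def socket_pairs():
    """Return the socket pair seen by the client and by the server's connected socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        client.connect(listener.getsockname())
        connected, _peer = listener.accept()
        with connected:
            # Each view is (local IP, local port, foreign IP, foreign port).
            client_view = client.getsockname() + client.getpeername()
            server_view = connected.getsockname() + connected.getpeername()
        return client_view, server_view
    finally:
        client.close()
        listener.close()
```

The two views describe the same connection with local and foreign halves swapped.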

Figure 2.8 Connection request from client to server
client (198.69.10.2): {198.69.10.2.1500, 206.62.226.35.21}, sends a connection request to 206.62.226.35, port 21
server (206.62.226.35, 206.62.226.66): listening socket (*.21, *.*)

Figure 2.9 Concurrent server has child handle client
client (198.69.10.2): {198.69.10.2.1500, 206.62.226.35.21}
server (206.62.226.35, 206.62.226.66): listening socket (*.21, *.*)
server (child), created by fork: connected socket {206.62.226.35.21, 198.69.10.2.1500}, which carries the connection to the client



Figure 2.10 Second client connection with same server
client1 (198.69.10.2): {198.69.10.2.1500, 206.62.226.35.21}
client2 (198.69.10.2): {198.69.10.2.1501, 206.62.226.35.21}
server (206.62.226.35, 206.62.226.66): listening socket (*.21, *.*)
server (child1), created by fork: connected socket {206.62.226.35.21, 198.69.10.2.1500}
server (child2), created by fork: connected socket {206.62.226.35.21, 198.69.10.2.1501}
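
The situation in Figure 2.10 can be reproduced on one machine: two clients connect to the same server port, and the connections are told apart only by the client's port number. The sketch below accepts both connections in a single process instead of forking children, which is a simplification; the function name `two_clients_one_port` is hypothetical.

```python
import socket


def two_clients_one_port():
    """Connect two clients to one listening port and report the ports in use."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clients = [socket.socket(socket.AF_INET, socket.SOCK_STREAM) for _ in range(2)]
    connected = []
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(2)
        server_port = listener.getsockname()[1]
        for client in clients:
            client.connect(listener.getsockname())
            conn, _peer = listener.accept()     # one connected socket per client
            connected.append(conn)
        local_ports = [conn.getsockname()[1] for conn in connected]
        foreign_ports = [conn.getpeername()[1] for conn in connected]
        return server_port, local_ports, foreign_ports
    finally:
        for sock in connected + clients + [listener]:
            sock.close()
```

Both connected sockets share the server's local port, while the foreign ports differ because each client received its own ephemeral port.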


Maximum size of an IPv4 datagram: 65,535 bytes, including the IPv4 header.
Maximum size of an IPv6 datagram: 65,575 bytes, including the 40-byte IPv6 header.
Each link has an MTU (maximum transmission unit); a datagram larger than the MTU requires fragmentation.
The smallest MTU in the path between two hosts is called the path MTU. Today, the Ethernet MTU of 1,500 bytes is often the path MTU.
The path MTU need not be the same in both directions between any two hosts.
When an IP datagram is to be sent out an interface, if the size of the datagram exceeds the link MTU, fragmentation is performed by both IPv4 and IPv6.
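
As a rough illustration of what fragmentation does to a datagram that exceeds the link MTU, the sketch below computes IPv4 fragment sizes. It assumes a 20-byte IPv4 header with no options and relies on the rule that every fragment except the last carries a multiple of 8 data bytes; the function name `ipv4_fragment_sizes` is made up for this example.

```python
def ipv4_fragment_sizes(datagram_len: int, mtu: int, header_len: int = 20):
    """Return the total length (header plus data) of each IPv4 fragment."""
    if datagram_len <= mtu:
        return [datagram_len]                     # fits the link: no fragmentation
    per_fragment = (mtu - header_len) // 8 * 8    # data bytes per non-final fragment
    if per_fragment <= 0:
        raise ValueError("MTU too small to carry any data")
    remaining = datagram_len - header_len         # data bytes still to be sent
    sizes = []
    while remaining > 0:
        chunk = min(per_fragment, remaining)
        sizes.append(header_len + chunk)          # every fragment repeats the header
        remaining -= chunk
    return sizes
```

For a 4,000-byte datagram on a 1,500-byte Ethernet MTU this yields three fragments of 1,500, 1,500, and 1,040 bytes.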


The fragments are not normally reassembled until they reach the final destination. IPv4 hosts perform fragmentation on datagrams that they generate and IPv4 routers perform fragmentation on datagrams that they forward. But with IPv6, only hosts perform fragmentation on datagrams that they generate; IPv6 routers do not fragment datagrams that they are forwarding.

DF (don't fragment) bit
A router that receives an IPv4 datagram with the DF bit set whose size exceeds the outgoing link's MTU generates an ICMPv4 "destination unreachable, fragmentation needed but DF bit set" error message. In response, TCP decreases the amount of data it sends per datagram and retransmits.
TCP has a maximum segment size (MSS) option that announces to the peer TCP the maximum amount of TCP data that the peer can send per segment. The goal of the MSS is to tell the peer the actual value of the reassembly buffer size and to try to avoid fragmentation. The MSS is often set to the interface MTU minus the fixed sizes of the IP and TCP headers.
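
The "MTU minus fixed headers" rule can be written out as a one-line calculation. The header sizes below (20 bytes for IPv4, 40 for IPv6, 20 for TCP, all without options) are standard values, while the function name `mss_for_mtu` is illustrative.

```python
IPV4_HEADER = 20   # bytes, without options
IPV6_HEADER = 40   # bytes, fixed header only
TCP_HEADER = 20    # bytes, without options


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Return the MSS typically announced for an interface with the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER
```

For Ethernet with IPv4 this gives 1,460 bytes, a value commonly announced by hosts on Ethernet.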


Note: the TCP and UDP port numbers are the same.
