
Transport Layer

(3.1 - 3.6)
Instructor: Mohammad Mamun Elahi
Office: 5th Floor
Email: mmelahi@cse.uiu.ac.bd
Class Location: Room # 111
Lectures: M W 11:20 pm (Sec B), 12:45 pm (Sec D)

Notes derived from "Computer Networking: A Top Down Approach", 6th edition, Jim Kurose, Keith Ross, Addison-Wesley, March 2012.
Slides are adapted from the companion web site of the book, as modified by Mohammad Mamun Elahi.
Chapter 3: Transport Layer
our goals:
- understand principles behind transport layer services:
  - multiplexing, demultiplexing
  - reliable data transfer
  - flow control
  - congestion control
- learn about Internet transport layer protocols:
  - UDP: connectionless transport
  - TCP: connection-oriented reliable transport
  - TCP congestion control

Transport Layer 3-2


Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  - segment structure
  - reliable data transfer
  - flow control
  - connection management
3.6 principles of congestion control
3.7 TCP congestion control



Transport services and protocols
- provide logical communication between app processes running on different hosts
- transport protocols run in end systems
  - send side: breaks app messages into segments, passes to network layer
  - rcv side: reassembles segments into messages, passes to app layer
- more than one transport protocol available to apps
  - Internet: TCP and UDP

[Figure: two end hosts, each with an application/transport/network/data link/physical stack, joined by logical end-end transport]
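The send-side/receive-side split above can be sketched as a toy model in Python. This is not how a real TCP or UDP stack is written: `segment`, `reassemble`, and the byte-offset tag carried by each segment are hypothetical names chosen for illustration, and the segment size is an arbitrary parameter.

```python
def segment(message: bytes, max_size: int) -> list[tuple[int, bytes]]:
    """Send side (toy model): break an app message into (offset, chunk) segments."""
    return [(off, message[off:off + max_size]) for off in range(0, len(message), max_size)]


def reassemble(segments) -> bytes:
    """Receive side (toy model): put segments back in offset order and join them."""
    return b"".join(chunk for _offset, chunk in sorted(segments))


if __name__ == "__main__":
    segs = segment(b"hello transport layer", 5)
    print(segs[:2])  # [(0, b'hello'), (5, b' tran')]
    print(reassemble(reversed(segs)))  # b'hello transport layer'
```

The offset tag is what lets the receiver rebuild the message even when segments arrive out of order; real protocols carry comparable information in their headers.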
Transport vs. network layer
- network layer: logical communication between hosts
- transport layer: logical communication between processes
  - relies on, enhances, network layer services

household analogy:
12 kids in Ann's house sending letters to 12 kids in Bill's house:
- hosts = houses
- processes = kids
- app messages = letters in envelopes
- transport protocol = Ann and Bill who demux to in-house siblings
- network-layer protocol = postal service
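Ann's job in the analogy (handing each letter to the right sibling) can be sketched as a toy demultiplexer in which port numbers play the role of kids' names. The class and method names are hypothetical. It keys only on the destination port, which roughly matches connectionless (UDP) demultiplexing; TCP instead demultiplexes on the full (source IP, source port, dest IP, dest port) 4-tuple.

```python
class Demultiplexer:
    """Toy model of a host's transport layer: route each arriving segment to a socket by port."""

    def __init__(self) -> None:
        self.sockets: dict[int, list[bytes]] = {}  # dest port -> that socket's receive queue

    def open_socket(self, port: int) -> list[bytes]:
        """A process 'binds' a port; its queue is where delivered payloads land."""
        self.sockets[port] = []
        return self.sockets[port]

    def deliver(self, dest_port: int, payload: bytes) -> bool:
        """Demultiplex: hand the payload to the socket bound to dest_port, if any."""
        queue = self.sockets.get(dest_port)
        if queue is None:
            return False  # no process bound to this port, so the segment is dropped
        queue.append(payload)
        return True
```

For example, after `open_socket(5001)` and `open_socket(5002)`, a segment addressed to port 5002 lands only in the second queue, and one addressed to an unbound port is dropped.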



Internet transport-layer protocols
- reliable, in-order delivery (TCP)
  - congestion control
  - flow control
  - connection setup
- unreliable, unordered delivery: UDP
  - no-frills extension of "best-effort" IP
- services not available:
  - delay guarantees
  - bandwidth guarantees

[Figure: logical end-end transport between two end hosts (application/transport/network/data link/physical) across intermediate routers (network/data link/physical)]



UDP: User Datagram Protocol [RFC 768]
- "no frills," "bare bones" Internet transport protocol
- "best effort" service, UDP segments may be:
  - lost
  - delivered out-of-order to app
- connectionless:
  - no handshaking between UDP sender, receiver
  - each UDP segment handled independently of others
- UDP use:
  - streaming multimedia apps (loss tolerant, rate sensitive)
  - DNS
  - SNMP
- reliable transfer over UDP:
  - add reliability at application layer
  - application-specific error recovery!
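The connectionless behavior above can be seen with Python's standard `socket` module. In this sketch the loopback interface stands in for the network, and `send_datagrams` is a hypothetical helper: there is no handshake, each `sendto` addresses an independent datagram, and each `recvfrom` returns exactly one whole datagram. Loopback rarely drops or reorders packets, so this shows the API shape rather than UDP's loss behavior.

```python
import socket


def send_datagrams(messages: list[bytes]) -> list[bytes]:
    """Send each message as its own UDP datagram over loopback and collect what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)  # UDP may lose datagrams: raise instead of blocking forever
        dest = receiver.getsockname()
        for msg in messages:
            sender.sendto(msg, dest)  # no handshake: each datagram is simply addressed and sent
        received = []
        for _ in messages:
            data, _src = receiver.recvfrom(2048)  # one call returns exactly one datagram
            received.append(data)
        return received
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(send_datagrams([b"first", b"second"]))
```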



UDP: segment header
UDP segment format (32 bits wide):
  | source port #        | dest port #          |
  | length               | checksum             |
  | application data (payload)                  |
- length: in bytes of UDP segment, including header

why is there a UDP?
- no connection establishment (which can add delay)
- simple: no connection state at sender, receiver
- small header size
- no congestion control: UDP can blast away as fast as desired
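The four 16-bit header fields can be packed with Python's `struct` module. The checksum below is the standard Internet checksum (one's complement of the 16-bit one's-complement sum, as described in RFC 1071), but as a simplification it covers only the UDP header and payload: real UDP also includes an IP pseudo-header, which is omitted here. The function names are hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's-complement sum of 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length buffer with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def build_udp_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack the four 16-bit header fields in network byte order, then append the payload."""
    length = 8 + len(payload)  # length field counts header (8 bytes) + payload
    unchecked = struct.pack("!HHHH", src_port, dst_port, length, 0) + payload
    checksum = internet_checksum(unchecked)  # NOTE: real UDP also covers an IP pseudo-header
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def checksum_ok(segment: bytes) -> bool:
    """Receiver check: recomputing over everything, checksum included, must give 0."""
    return internet_checksum(segment) == 0
```

A 5-byte payload gives a 13-byte segment whose length field is 13; flipping any single bit makes `checksum_ok` return False.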



Principles of reliable data transfer
- important in application, transport, link layers
  - top-10 list of important networking topics!
- characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)



Pipelined protocols
pipelining: sender allows multiple, "in-flight", yet-to-be-acknowledged pkts
- range of sequence numbers must be increased
- buffering at sender and/or receiver
- two generic forms of pipelined protocols: go-Back-N, selective repeat
Pipelined protocols: overview
Go-back-N:
- sender can have up to N unacked packets in pipeline
- receiver only sends cumulative ack
  - doesn't ack packet if there's a gap
- sender has timer for oldest unacked packet
  - when timer expires, retransmit all unacked packets

Selective Repeat:
- sender can have up to N unack'ed packets in pipeline
- rcvr sends individual ack for each packet
- sender maintains timer for each unacked packet
  - when timer expires, retransmit only that unacked packet



GBN in action
sender window (N=4)

sender                                   receiver
send pkt0
send pkt1
send pkt2   (X loss)
send pkt3                                receive pkt0, send ack0
(wait)                                   receive pkt1, send ack1
                                         receive pkt3, discard, (re)send ack1
rcv ack0, send pkt4
rcv ack1, send pkt5                      receive pkt4, discard, (re)send ack1
ignore duplicate ACK                     receive pkt5, discard, (re)send ack1
pkt 2 timeout
send pkt2                                rcv pkt2, deliver, send ack2
send pkt3                                rcv pkt3, deliver, send ack3
send pkt4                                rcv pkt4, deliver, send ack4
send pkt5                                rcv pkt5, deliver, send ack5
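A toy Python model of the two GBN rules in this trace: a sender whose timeout resends every sent-but-unacked packet, and a cumulative-ACK receiver that discards out-of-order packets. Sequence numbers are unbounded integers (no wraparound) and the class and function names are hypothetical; this is a sketch, not a full protocol implementation.

```python
class GbnSender:
    """Toy Go-Back-N sender: window of n, one timer for the oldest unacked pkt."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.base = 0  # oldest unacked seq #
        self.next_seq = 0  # next seq # to use

    def can_send(self) -> bool:
        return self.next_seq < self.base + self.n

    def send(self) -> int:
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, ack: int) -> None:
        # cumulative ACK: everything up to and including `ack` has arrived
        if ack >= self.base:
            self.base = ack + 1
        # ack < base is a duplicate ACK and is ignored

    def on_timeout(self) -> list[int]:
        # go back N: retransmit every sent-but-unacked pkt
        return list(range(self.base, self.next_seq))


def gbn_receiver(arrivals: list[int]) -> tuple[list[int], list[int]]:
    """Return (delivered seq #s, ACK sent after each arrival)."""
    expected = 0
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)
            expected += 1
        # out-of-order pkts are discarded; always (re)ACK the last in-order pkt
        acks.append(expected - 1)  # -1 means "nothing received in order yet"
    return delivered, acks
```

Feeding `gbn_receiver` the arrival order from the trace, [0, 1, 3, 4, 5, 2, 3, 4, 5], yields the ACKs 0, 1, 1, 1, 1, 2, 3, 4, 5, and after ack1 the sender's timeout returns [2, 3, 4, 5].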



Selective repeat
sender
data from above:
- if next available seq # in window, send pkt
timeout(n):
- resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
- mark pkt n as received
- if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]:
- send ACK(n)
- out-of-order: buffer
- in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
- ACK(n)
otherwise:
- ignore
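The receiver rules above can be sketched as a small Python class. It is a toy under stated assumptions: unbounded sequence numbers (no wraparound), packets identified only by their seq #, and a hypothetical class name.

```python
class SrReceiver:
    """Toy selective-repeat receiver with window n (unbounded seq #s, no wraparound)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.rcv_base = 0  # smallest not-yet-received seq #
        self.buffered: set[int] = set()  # out-of-order pkts held for later delivery
        self.delivered: list[int] = []

    def receive(self, seq: int):
        """Handle pkt `seq`; return the ACK # sent, or None if the pkt is ignored."""
        if self.rcv_base <= seq < self.rcv_base + self.n:
            self.buffered.add(seq)
            # deliver the in-order run starting at rcv_base, advancing the window
            while self.rcv_base in self.buffered:
                self.buffered.remove(self.rcv_base)
                self.delivered.append(self.rcv_base)
                self.rcv_base += 1
            return seq
        if self.rcv_base - self.n <= seq < self.rcv_base:
            return seq  # already delivered: ACK again so the sender can advance
        return None  # otherwise: ignore
```

Receiving 0, 1, 3, 4, 5 delivers only 0 and 1; when 2 finally arrives, 2 through 5 are delivered together and `rcv_base` moves to 6.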



Selective repeat in action
sender window (N=4)

sender                                   receiver
send pkt0
send pkt1
send pkt2   (X loss)
send pkt3                                receive pkt0, send ack0
(wait)                                   receive pkt1, send ack1
                                         receive pkt3, buffer, send ack3
rcv ack0, send pkt4
rcv ack1, send pkt5                      receive pkt4, buffer, send ack4
record ack3 arrived                      receive pkt5, buffer, send ack5
pkt 2 timeout
send pkt2
record ack4 arrived
record ack5 arrived                      rcv pkt2; deliver pkt2, pkt3, pkt4, pkt5; send ack2

Q: what happens when ack2 arrives?
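A toy Python model of the SR sender rules can replay this trace, and running it suggests one answer to the question: once ack2 arrives, pkts 2 through 5 are all ACKed, so the window base jumps to 6 and pkts 6 to 9 become sendable. Sequence numbers are unbounded integers and the class name is hypothetical.

```python
class SrSender:
    """Toy selective-repeat sender: window of n, per-packet ACK tracking."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.send_base = 0  # smallest unACKed seq #
        self.next_seq = 0
        self.acked: set[int] = set()  # ACKed pkts above send_base

    def can_send(self) -> bool:
        return self.next_seq < self.send_base + self.n

    def send(self) -> int:
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_timeout(self, n: int) -> int:
        return n  # each pkt has its own timer: resend only pkt n

    def on_ack(self, n: int) -> None:
        if self.send_base <= n < self.send_base + self.n:
            self.acked.add(n)  # mark pkt n as received
            # if n was the smallest unACKed pkt, slide the base past every ACKed pkt
            while self.send_base in self.acked:
                self.acked.remove(self.send_base)
                self.send_base += 1
```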



Selective repeat: dilemma

example:
- seq #'s: 0, 1, 2, 3
- window size = 3
- receiver sees no difference in two scenarios!
- duplicate data accepted as new in (b)

Q: what relationship between seq # size and window size to avoid problem in (b)?

[Figure: sender window and receiver window (after receipt) over seq #s 0 1 2 3 0 1 2]
(a) no problem: pkt0, pkt1, pkt2 are received and ACKed; pkt3 is lost (X); the sender then sends new data as pkt0, and the receiver will accept packet with seq number 0
(b) oops! pkt0, pkt1, pkt2 are received but all their ACKs are lost (X); after a timeout the sender retransmits pkt0, and the receiver will accept packet with seq number 0
receiver can't see sender side. receiver behavior identical in both cases! something's (very) wrong!
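The usual textbook answer to the question is that the window size must be no larger than half the size of the sequence-number space. The hypothetical helper below checks scenario (b) directly: it compares the seq #s the sender might retransmit (a full old window) with the receiver's window after accepting that whole window, and reports whether they overlap.

```python
def sr_ambiguous(seq_space: int, window: int) -> bool:
    """Could a retransmitted old pkt be mistaken for a new one, as in scenario (b)?"""
    # seq #s of a full window the sender might retransmit if all ACKs are lost
    old = {s % seq_space for s in range(window)}
    # receiver's window after it accepted (and ACKed) that whole window
    new_window = {s % seq_space for s in range(window, 2 * window)}
    return bool(old & new_window)


def max_safe_window(seq_space: int) -> int:
    """Largest window size for which scenario (b) cannot occur."""
    return max(w for w in range(1, seq_space + 1) if not sr_ambiguous(seq_space, w))
```

With seq #s 0 to 3, a window of 3 is ambiguous (the sets share 0 and 1) while a window of 2 is safe, matching `seq_space // 2`.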
TCP: Overview    RFCs: 793, 1122, 1323, 2018, 2581

- point-to-point:
  - one sender, one receiver
- reliable, in-order byte stream:
  - no "message boundaries"
- pipelined:
  - TCP congestion and flow control set window size
- full duplex data:
  - bi-directional data flow in same connection
  - MSS: maximum segment size
- connection-oriented:
  - handshaking (exchange of control msgs) inits sender, receiver state before data exchange
- flow controlled:
  - sender will not overwhelm receiver



TCP segment structure
TCP segment format (32 bits wide):
  | source port #                  | dest port #              |
  | sequence number                                           |
  | acknowledgement number                                    |
  | head len | not used | U A P R S F | receive window        |
  | checksum                       | Urg data pointer         |
  | options (variable length)                                 |
  | application data (variable length)                        |

- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- sequence and acknowledgement numbers: counting by bytes of data (not segments!)
- receive window: # bytes rcvr willing to accept
- Internet checksum (as in UDP)
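The fixed 20-byte part of the header above can be decoded with Python's `struct` module. This is a sketch: the function and dictionary key names are hypothetical, only the six flags shown on the slide are decoded, and the sample packet in the usage note uses made-up values.

```python
import struct

# flag bits in the low 6 bits of the 16-bit "head len / not used / flags" word
FLAG_BITS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08, "RST": 0x04, "SYN": 0x02, "FIN": 0x01}


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header, then split options from application data."""
    (src_port, dst_port, seq, ack, off_flags,
     rwnd, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4  # "head len" counts 32-bit words
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,  # byte-stream number of first data byte
        "ack": ack,  # next byte expected (valid only if ACK flag set)
        "header_len": header_len,
        "flags": {name for name, bit in FLAG_BITS.items() if off_flags & bit},
        "rwnd": rwnd,
        "checksum": checksum,
        "urg_ptr": urg_ptr,
        "options": segment[20:header_len],
        "data": segment[header_len:],
    }
```

Packing a header with head len 5 (20 bytes) and flag bits 0x12 and parsing it back yields the flag set {"SYN", "ACK"}.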



TCP seq. numbers, ACKs
sequence numbers:
- byte stream "number" of first byte in segment's data
acknowledgements:
- seq # of next byte expected from other side
- cumulative ACK
Q: how receiver handles out-of-order segments
- A: TCP spec doesn't say, up to implementor

[Figure: an outgoing segment from sender and an incoming segment to sender (carrying ACK number A), each with source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer. Sender sequence number space, window size N: sent ACKed | sent, not-yet ACKed ("in-flight") | usable but not yet sent | not usable]
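Because the TCP spec leaves out-of-order handling to the implementor, the sketch below makes one common choice explicit as an assumption: out-of-order segments are buffered, and the cumulative ACK always names the next byte expected. Overlapping segments are not handled, and the function name is hypothetical.

```python
def cumulative_acks(initial_seq: int, segments: list[tuple[int, int]]) -> list[int]:
    """ACK # sent after each arriving (seq, length) segment.

    Assumption: out-of-order segments are buffered (TCP leaves this to the
    implementor); overlapping segments are not handled in this toy version.
    """
    next_expected = initial_seq  # seq # of next byte expected from the other side
    buffered: dict[int, int] = {}  # seq -> length of out-of-order segments
    acks = []
    for seq, length in segments:
        if seq == next_expected:
            next_expected += length
            # absorb buffered segments that are now contiguous
            while next_expected in buffered:
                next_expected += buffered.pop(next_expected)
        elif seq > next_expected:
            buffered[seq] = length  # gap before this segment: hold it
        # seq < next_expected: old duplicate, just repeat the cumulative ACK
        acks.append(next_expected)
    return acks
```

For segments (92, 8), (120, 15), (100, 20) arriving in that order, the ACKs are 100, 100, 135: the second ACK repeats 100 because of the gap, and the third jumps past the buffered bytes.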



TCP seq. numbers, ACKs
Host A                                      Host B
User types 'C'
    Seq=42, ACK=79, data = 'C'  ------>
                                            host ACKs receipt of 'C', echoes back 'C'
    <------  Seq=79, ACK=43, data = 'C'
host ACKs receipt of echoed 'C'
    Seq=43, ACK=80  ------>

simple telnet scenario



TCP round trip time, timeout
Q: how to set TCP timeout value?
- longer than RTT
  - but RTT varies
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss

Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt
  - ignore retransmissions
- SampleRTT will vary, want estimated RTT "smoother"
  - average several recent measurements, not just current SampleRTT
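The slide only says to average several recent measurements. One widely used way to do that is the exponentially weighted moving average standardized in RFC 6298 (the textbook's EstimatedRTT and DevRTT), with weights 1/8 and 1/4 and TimeoutInterval = EstimatedRTT + 4·DevRTT. The sketch follows that recipe but omits RFC 6298's clock-granularity term and its 1-second minimum timeout; times are in milliseconds and the class name is hypothetical.

```python
class RttEstimator:
    """EWMA RTT estimate and timeout, following RFC 6298's update order (times in ms).

    Simplification: no clock-granularity term and no 1-second minimum timeout.
    """

    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha  # weight of a new SampleRTT in EstimatedRTT
        self.beta = beta  # weight of a new deviation in DevRTT
        self.estimated_rtt = None
        self.dev_rtt = None

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (never from a retransmitted segment); return TimeoutInterval."""
        if self.estimated_rtt is None:  # first measurement
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # deviation uses the previous estimate, then the estimate itself is smoothed
            deviation = abs(self.estimated_rtt - sample_rtt)
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * deviation
            self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt
```

With samples of 100 ms and then 120 ms, the timeout goes from 300 ms to 272.5 ms: the estimate moves only 1/8 of the way toward the new sample, which is the "smoothing" the slide asks for.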



TCP reliable data transfer
- TCP creates rdt service on top of IP's unreliable service
  - pipelined segments
  - cumulative acks
  - single retransmission timer
- retransmissions triggered by:
  - timeout events
  - duplicate acks

let's initially consider simplified TCP sender:
- ignore duplicate acks
- ignore flow control, congestion control



TCP sender events:
data rcvd from app:
- create segment with seq #
- seq # is byte-stream number of first data byte in segment
- start timer if not already running
  - think of timer as for oldest unacked segment
  - expiration interval: TimeOutInterval

timeout:
- retransmit segment that caused timeout
- restart timer

ack rcvd:
- if ack acknowledges previously unacked segments
  - update what is known to be ACKed
  - start timer if there are still unacked segments
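The three events above can be sketched as a toy Python class. Simplifying assumptions: the single timer is modeled as a running/stopped flag (no real clock), duplicate ACKs are ignored, there is no flow or congestion control, and the class and method names are hypothetical.

```python
class SimpleTcpSender:
    """Toy simplified TCP sender: one timer, cumulative ACKs, dup ACKs ignored."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq = initial_seq  # seq # for the next new byte
        self.send_base = initial_seq  # oldest unacked byte
        self.unacked: dict[int, bytes] = {}  # seq # -> segment data
        self.timer_running = False
        self.transmissions: list[tuple[int, bytes]] = []  # log of every (seq, data) sent

    def data_from_app(self, data: bytes) -> None:
        seq = self.next_seq  # seq # = byte-stream number of first data byte
        self.unacked[seq] = data
        self.transmissions.append((seq, data))
        self.next_seq += len(data)
        if not self.timer_running:
            self.timer_running = True  # timer covers the oldest unacked segment

    def timeout(self) -> None:
        if not self.unacked:
            return
        seq = min(self.unacked)  # segment that caused the timeout (oldest unacked)
        self.transmissions.append((seq, self.unacked[seq]))
        self.timer_running = True  # restart timer

    def ack_received(self, ack: int) -> None:
        if ack <= self.send_base:
            return  # acknowledges nothing new: ignored by this simplified sender
        self.send_base = ack  # every byte before `ack` is now known to be ACKed
        for seq in [s for s, d in self.unacked.items() if s + len(d) <= ack]:
            del self.unacked[seq]
        self.timer_running = bool(self.unacked)  # keep timing only while data is unacked
```

Replaying the lost ACK scenario (send 8 bytes at Seq=92, time out, then receive ACK=100) logs two transmissions of Seq=92 and leaves SendBase at 100 with the timer stopped.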



TCP: retransmission scenarios
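lost ACK scenario (Host A, Host B):
- SendBase=92; A sends Seq=92, 8 bytes of data
- B's ACK=100 is lost (X)
- A times out and resends Seq=92, 8 bytes of data
- B sends ACK=100; when it arrives, SendBase=100

premature timeout (Host A, Host B):
- A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data
- B sends ACK=100 and ACK=120, but A's timer expires before they arrive
- A resends Seq=92, 8 bytes of data
- ACK=100 arrives (SendBase=100), then ACK=120 (SendBase=120)
- B answers the resent Seq=92 with ACK=120; SendBase stays 120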


TCP: retransmission scenarios
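cumulative ACK scenario (Host A, Host B):
- A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data
- B sends ACK=100 (lost, X) and ACK=120
- ACK=120 arrives before the timeout and cumulatively acknowledges both segments, so nothing is retransmitted
- A continues with Seq=120, 15 bytes of data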
TCP fast retransmit
- time-out period often relatively long:
  - long delay before resending lost packet
- detect lost segments via duplicate ACKs
  - sender often sends many segments back-to-back
  - if segment is lost, there will likely be many duplicate ACKs

TCP fast retransmit
if sender receives 3 additional (duplicate) ACKs for same data ("triple duplicate ACKs"), resend unacked segment with smallest seq #
- likely that unacked segment lost, so don't wait for timeout



TCP fast retransmit
Host A                                      Host B
    Seq=92, 8 bytes of data  ------>
    Seq=100, 20 bytes of data  ---X
    <------  ACK=100
    <------  ACK=100
    <------  ACK=100
    <------  ACK=100
    Seq=100, 20 bytes of data  ------>   (resent before the timeout expires)

fast retransmit after sender receipt of triple duplicate ACK
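The duplicate-ACK counting in this figure can be sketched as a small detector: the first ACK=100 is new, and the next three are duplicates, so the third duplicate triggers a resend of the segment starting at byte 100. The class name and threshold parameter are hypothetical; a real sender combines this with its timer and window state.

```python
class DupAckDetector:
    """Count duplicate ACKs; signal a fast retransmit on the third duplicate."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.last_ack = None
        self.dup_count = 0

    def on_ack(self, ack: int):
        """Return the seq # to resend now, or None."""
        if ack == self.last_ack:
            self.dup_count += 1
            if self.dup_count == self.threshold:
                return ack  # ACK # = seq # of the smallest unacked byte
        else:
            self.last_ack = ack  # new ACK value: reset the duplicate counter
            self.dup_count = 0
        return None
```

Feeding it the four ACK=100 values from the figure returns None three times and then 100; further duplicates do not trigger a second resend.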
TCP flow control
application may remove data from TCP socket buffers ... slower than TCP receiver is delivering (sender is sending)

flow control: receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast

[Figure: receiver protocol stack: application process above the OS; TCP code holding the TCP socket receiver buffers; IP code below, receiving data from sender]



TCP flow control
- receiver "advertises" free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  - RcvBuffer size set via socket options (typical default is 4096 bytes)
  - many operating systems autoadjust RcvBuffer
- sender limits amount of unacked ("in-flight") data to receiver's rwnd value
- guarantees receive buffer will not overflow

[Figure: receiver-side buffering: RcvBuffer holds buffered data (passed on to the application process) and rwnd free buffer space (filled by TCP segment payloads)]
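The bookkeeping behind rwnd can be sketched with two small functions. The variable names LastByteRcvd and LastByteRead (and the sender-side LastByteSent and LastByteAcked) are assumed here as illustrative counters; the numbers in the usage note are made up.

```python
def advertised_rwnd(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Free space the receiver advertises: buffer size minus bytes not yet read by the app."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(rwnd: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """How much more the sender may send while keeping in-flight data within rwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rwnd - in_flight)
```

With a 4096-byte RcvBuffer holding 2000 unread bytes, the receiver advertises rwnd = 2096; a sender with 1000 bytes already in flight may then send 1096 more, and a sender already at or above rwnd may send nothing.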
TCP 3-way handshake

client state                                   server state
LISTEN                                         LISTEN
choose init seq num, x
send TCP SYN msg
SYNSENT        SYNbit=1, Seq=x  ------>
                                               choose init seq num, y
                                               send TCP SYNACK msg, acking SYN
                                               SYN RCVD
               <------  SYNbit=1, Seq=y
                        ACKbit=1; ACKnum=x+1
received SYNACK(x) indicates server is live;
send ACK for SYNACK;
this segment may contain client-to-server data
ESTAB          ACKbit=1, ACKnum=y+1  ------>
                                               received ACK(y) indicates client is live
                                               ESTAB
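The sequence and acknowledgement numbers in the handshake can be sketched as a function that builds the three segments as plain dictionaries (a hypothetical representation, not a real packet format). One detail not shown on the slide is included as a labeled fact: the SYN consumes one sequence number, so the client's final ACK carries Seq = x+1.

```python
def three_way_handshake(x: int, y: int) -> list[dict]:
    """Return the SYN, SYNACK and ACK segments for client ISN x and server ISN y."""
    # client chooses x, sends SYN, enters SYNSENT
    syn = {"from": "client", "SYNbit": 1, "Seq": x}
    # server chooses y, acks the SYN (which consumed seq # x), enters SYN RCVD
    synack = {"from": "server", "SYNbit": 1, "Seq": y, "ACKbit": 1, "ACKnum": syn["Seq"] + 1}
    # client acks the SYNACK and enters ESTAB; this segment may also carry data
    ack = {"from": "client", "ACKbit": 1, "Seq": x + 1, "ACKnum": synack["Seq"] + 1}
    return [syn, synack, ack]
```

For example, with x = 1000 and y = 5000 the SYNACK carries ACKnum 1001 and the final ACK carries ACKnum 5001.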



TCP: closing a connection
- client, server each close their side of connection
  - send TCP segment with FIN bit = 1
- respond to received FIN with ACK
  - on receiving FIN, ACK can be combined with own FIN
- simultaneous FIN exchanges can be handled



TCP: closing a connection
client state                                   server state
ESTAB                                          ESTAB
clientSocket.close()
FIN_WAIT_1     FINbit=1, seq=x  ------>
can no longer send but can receive data
                                               CLOSE_WAIT
               <------  ACKbit=1; ACKnum=x+1
                                               can still send data
FIN_WAIT_2
wait for server close
                                               LAST_ACK
               <------  FINbit=1, seq=y
                                               can no longer send data
TIME_WAIT      ACKbit=1; ACKnum=y+1  ------>
timed wait for 2*max segment lifetime
                                               CLOSED
CLOSED
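The closing sequence can be sketched as two transition tables replayed by a small helper. The event labels are hypothetical shorthand for the diagram's arrows, and the timed-wait state is written TIME_WAIT (the standard TIME-WAIT state).

```python
# (state, event) -> next state, following the closing diagram
CLIENT = {
    ("ESTAB", "close(), send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "rcv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "rcv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2*MSL elapsed"): "CLOSED",
}
SERVER = {
    ("ESTAB", "rcv FIN, send ACK"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "close(), send FIN"): "LAST_ACK",
    ("LAST_ACK", "rcv ACK"): "CLOSED",
}


def walk(table: dict, start: str, events: list[str]) -> list[str]:
    """Replay events through a transition table; raises KeyError on an illegal event."""
    state, path = start, [start]
    for ev in events:
        state = table[(state, ev)]
        path.append(state)
    return path
```

Walking the client table through its four events visits ESTAB, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED; an event the diagram does not allow in the current state raises KeyError rather than silently changing state.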



Principles of congestion control
congestion:
- informally: "too many sources sending too much data too fast for network to handle"
- different from flow control!
- manifestations:
  - lost packets (buffer overflow at routers)
  - long delays (queueing in router buffers)
- a top-10 problem!



Approaches towards congestion control

two broad approaches towards congestion control:

end-end congestion control:
- no explicit feedback from network
- congestion inferred from end-system observed loss, delay
- approach taken by TCP

network-assisted congestion control:
- routers provide feedback to end systems
  - single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  - explicit rate for sender to send at



TCP congestion control: additive increase multiplicative decrease
- approach: sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
  - additive increase: increase cwnd by 1 MSS every RTT until loss detected
  - multiplicative decrease: cut cwnd in half after loss

[Figure: congestion window size (cwnd) of a TCP sender vs. time; AIMD saw tooth behavior, probing for bandwidth: additively increase window size ... until loss occurs (then cut window in half)]
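The saw tooth can be reproduced with a few lines of Python that track cwnd in whole MSS units, one step per RTT. The function name, the starting window, and the RTTs at which loss occurs are hypothetical inputs chosen for illustration, and integer halving is a simplification.

```python
def aimd(cwnd: int, rtts: int, loss_rtts: set[int]) -> list[int]:
    """cwnd (in MSS) at the start of each RTT under AIMD."""
    trace = []
    for t in range(rtts):
        trace.append(cwnd)
        if t in loss_rtts:
            cwnd = max(1, cwnd // 2)  # multiplicative decrease: halve after loss
        else:
            cwnd += 1  # additive increase: +1 MSS per RTT
    return trace
```

Starting at 8 MSS with a loss in the fourth RTT gives 8, 9, 10, 11, then a drop to 5 and a renewed climb to 6: one tooth of the saw.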
TCP Congestion Control: details
[Figure: sender sequence number space: last byte ACKed, then up to cwnd bytes that are sent, not-yet ACKed ("in-flight"), ending at last byte sent]

- sender limits transmission:
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
- cwnd is dynamic, function of perceived network congestion

TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKs, then send more bytes
  $\text{rate} \approx \dfrac{\text{cwnd}}{\text{RTT}}$ bytes/sec
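As a worked example of the rate approximation, with assumed numbers chosen only for illustration (cwnd = 10 segments of 1500 bytes, RTT = 100 ms):

```latex
\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}
  = \frac{10 \times 1500\ \text{bytes}}{0.1\ \text{s}}
  = 150{,}000\ \text{bytes/sec}
  = 1.2\ \text{Mbps}
```

Doubling cwnd or halving RTT doubles the rate, which is why the sender can control its rate simply by adjusting cwnd.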



TCP Slow Start
- when connection begins, increase rate exponentially until first loss event:
  - initially cwnd = 1 MSS
  - double cwnd every RTT
  - done by incrementing cwnd for every ACK received
- summary: initial rate is slow but ramps up exponentially fast

[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart]
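The claim that adding 1 MSS per ACK doubles cwnd every RTT can be checked with a short simulation. Assumptions, labeled in the code: one ACK per segment, no loss, no delayed ACKs; cwnd is counted in MSS and the function name is hypothetical.

```python
def slow_start(rtts: int, initial_cwnd: int = 1) -> list[int]:
    """cwnd (in MSS) at the start of each RTT when every ACK adds 1 MSS."""
    cwnd = initial_cwnd
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        acks_this_rtt = cwnd  # assumes one ACK per segment, no loss, no delayed ACKs
        for _ in range(acks_this_rtt):
            cwnd += 1  # +1 MSS per ACK received
    return trace
```

Over four RTTs this gives 1, 2, 4, 8 segments, matching the one/two/four segments in the figure.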



TCP: detecting, reacting to loss
- loss indicated by timeout:
  - cwnd set to 1 MSS;
  - window then grows exponentially (as in slow start) to threshold, then grows linearly
- loss indicated by 3 duplicate ACKs: TCP Reno
  - dup ACKs indicate network capable of delivering some segments
  - cwnd is cut in half; window then grows linearly
- TCP Tahoe always sets cwnd to 1 (timeout or 3 duplicate acks)



TCP: switching from slow start to CA
Q: when should the exponential increase switch to linear?
A: when cwnd gets to 1/2 of its value before timeout.

Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
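The Reno and Tahoe reactions and the ssthresh switch-over can be combined into a simplified per-RTT model. Assumptions, stated here and in the code: units are MSS, slow start is capped at ssthresh, the initial ssthresh of 8 is arbitrary, Reno's fast-recovery details are omitted (cwnd is simply halved, as on the slides), and the function names are hypothetical.

```python
def next_cwnd(cwnd: int, ssthresh: int, event, variant: str = "reno") -> tuple[int, int]:
    """One step of a simplified per-RTT model (units: MSS).

    event is None (an RTT without loss), "timeout", or "3dupack".
    """
    if event is None:
        if cwnd < ssthresh:
            return min(2 * cwnd, ssthresh), ssthresh  # slow start, capped at ssthresh
        return cwnd + 1, ssthresh  # congestion avoidance: linear growth
    ssthresh = max(cwnd // 2, 1)  # 1/2 of cwnd just before the loss
    if event == "3dupack" and variant == "reno":
        return ssthresh, ssthresh  # Reno: cut cwnd in half, then grow linearly
    return 1, ssthresh  # timeout (both variants), or Tahoe on 3 dup ACKs


def run(events, variant: str, cwnd: int = 1, ssthresh: int = 8) -> list[int]:
    """cwnd after each event, starting from the initial window."""
    trace = [cwnd]
    for ev in events:
        cwnd, ssthresh = next_cwnd(cwnd, ssthresh, ev, variant)
        trace.append(cwnd)
    return trace
```

With seven loss-free RTTs followed by a triple duplicate ACK at cwnd = 12, Reno drops to 6 and keeps growing linearly (6, 7, 8, 9), while Tahoe restarts at 1 and slow-starts back up to the new ssthresh of 6 (1, 2, 4, 6).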



TCP throughput
- avg. TCP thruput as function of window size, RTT?
  - ignore slow start, assume always data to send
- W: window size (measured in bytes) where loss occurs
  - avg. window size (# in-flight bytes) is 3/4 W
  - avg. thruput is 3/4 W per RTT

  $\text{avg TCP thruput} = \dfrac{3}{4} \cdot \dfrac{W}{\text{RTT}}$ bytes/sec

[Figure: window oscillating in a saw tooth between W/2 and W]
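The 3/4 factor comes from averaging a window that climbs linearly from W/2 back up to W. The sketch below checks that numerically (assuming W is even and the window grows by one unit per RTT) and then applies the slide's formula; the function names and sample values are hypothetical.

```python
def avg_window(w: int) -> float:
    """Average of a window that grows by 1 per RTT from w/2 up to w (w even)."""
    windows = range(w // 2, w + 1)
    return sum(windows) / len(windows)


def avg_throughput(w_bytes: float, rtt_s: float) -> float:
    """Slide's approximation: 3/4 * W / RTT, in bytes/sec."""
    return 0.75 * w_bytes / rtt_s
```

For W = 100 the window runs through 50, 51, ..., 100, whose mean is exactly 75 = 3/4 W. With W = 100,000 bytes and RTT = 0.1 s, the formula gives about 750,000 bytes/sec.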



TCP Futures: TCP over "long, fat pipes"
- example: 1500 byte segments, 100ms RTT, want 10 Gbps throughput
- requires W = 83,333 in-flight segments
- throughput in terms of segment loss probability, L [Mathis 1997]:

  $\text{TCP throughput} = \dfrac{1.22 \cdot \text{MSS}}{\text{RTT}\,\sqrt{L}}$

  ➜ to achieve 10 Gbps throughput, need a loss rate of $L = 2 \cdot 10^{-10}$, a very small loss rate!
- new versions of TCP for high-speed
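Both numbers in the example can be reproduced with a short calculation: the window is rate × RTT divided by the segment size in bits, and the loss rate comes from solving the Mathis formula for L. The function names are hypothetical.

```python
def window_in_segments(rate_bps: float, rtt_s: float, seg_bytes: int) -> float:
    """In-flight segments needed to sustain rate_bps: rate * RTT / segment size."""
    return rate_bps * rtt_s / (seg_bytes * 8)


def mathis_loss_rate(rate_bps: float, rtt_s: float, mss_bytes: int) -> float:
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * rate_bps)) ** 2
```

For 10 Gbps, 100 ms, and 1500-byte segments, this gives about 83,333 segments in flight and a loss rate of roughly 2.1·10^-10, matching the slide.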

