TCP - TRANSMISSION CONTROL PROTOCOL
Introduction to TCP, the Internet's standard transport protocol

Peter R. Egli, indigoo.com
© Peter R. Egli 2015, Rev. 3.60


Contents
1. Transport layer and sockets
2. TCP (RFC793) overview
3. Transport Service Access Point (TSAP) addresses
4. TCP Connection Establishment
5. TCP Connection Release
6. TCP Flow Control
7. TCP Error Control
8. TCP Congestion Control RFC2001
9. TCP Persist Timer
10. TCP Keepalive Timer – TCP connection supervision
11. TCP Header Flags
12. TCP Header Options
13. TCP Throughput / performance considerations
14. A last word on "guaranteed delivery"

1. Transport layer and sockets

- The transport layer is made accessible to applications through the socket layer (API).
  The transport layer runs in kernel space (operating system) while application processes run
  in user space.

Layer            Function                        Data unit
OSI Layer > 4    Upper layers / application      Application message
                 Socket = API (TSAPs = sockets)
OSI Layer 4      Transport layer TCP/UDP         TCP segment
OSI Layer 3      Network layer IP                IP packet
OSI Layer 2      Data link layer                 DL frames
OSI Layer 1      Physical layer                  Bits

The application sits in user space; the transport layer below the socket API sits in kernel space (OS).

2. TCP (RFC793) overview
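- TCP is a byte stream oriented transmission protocol.

[Figure: the sending application writes chunks of various sizes through its socket interface, TCP carries the bytes in TCP segments of other sizes, and the receiving application reads chunks of yet other sizes from its socket interface.]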

N.B.: The size of the application data chunks (data units passed over the socket interface)
may differ between the sending and the receiving side; the segments sent by TCP may
have yet other sizes.
- TCP error control provides reliable transmission (preservation of byte order, retransmissions
  in case of transmission errors and packet loss).
- TCP flow control keeps the sender from overrunning the receiver's buffer (which would cause
  packet loss at the receiver) while still allowing bursts for good throughput.
- Congestion control mechanisms allow TCP to react to and recover from network congestion.
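The byte stream property can be tried out with a minimal sketch. It assumes that a connected pair of local stream sockets (`socket.socketpair()`) is an acceptable stand-in for the two ends of a TCP connection; the chunk sizes of 300, 500 and 150 bytes are chosen for illustration only.

```python
import socket

# A connected pair of stream sockets stands in for a TCP connection (assumption).
sender, receiver = socket.socketpair()

# The sending application writes two chunks of 300 and 500 bytes.
sender.sendall(b"a" * 300)
sender.sendall(b"b" * 500)
sender.close()  # end of stream for the reader

# The receiving application reads in chunks of at most 150 bytes.
read_sizes = []
received = b""
while True:
    chunk = receiver.recv(150)
    if not chunk:  # empty result means end of stream
        break
    read_sizes.append(len(chunk))
    received += chunk
receiver.close()

print(read_sizes)     # boundaries are chosen by the reader, not by the writer
print(len(received))  # all bytes arrive, in order
```

The write boundaries (300/500) are not visible to the reader; only the order and the content of the bytes are preserved.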
3. Transport Service Access Point (TSAP) addresses (1/3)

- TSAPs are identified by 16 bit port numbers. 65536 port numbers exist (0...65535), but
  port 0 is reserved and not used as a regular port.

[Figure: IP header followed by the TCP header, which carries the source and destination port numbers.]

3. Transport Service Access Point (TSAP) addresses (2/3)

- Each TSAP (TCP port) is bound to at most 1 socket, i.e. a TCP port cannot be opened multiple times.

[Figure: an application process (TSAP = port number 1208, NSAP = IP address 192.168.1.1) and a server process (TSAP = port number 31452, NSAP = IP address of the server host) communicate through the stack application layer / TCP transport layer / IP network layer / data link layer / physical layer. The IP header carries source and destination IP address, the TCP header carries source port 1208 and destination port 31452.]

3. Transport Service Access Point (TSAP) addresses (3/3)

- There are different definitions for the port ranges:

ICANN (former IANA)   Well-known ports       [1, 1023]       (standard ports, e.g. POP3 110)
                      Registered ports       [1024, 49151]
                      Dynamic/private ports  [49152, 65535]
BSD                   Reserved ports         [1, 1023]
                      Ephemeral ports        [1024, 5000]
                      BSD servers            [5001, 65535]
SUN Solaris 2.2       Reserved ports         [1, 1023]
                      Non-privileged ports   [1024, 32767]
                      Ephemeral ports        [32768, 65535]

ICANN: Internet Corporation for Assigned Names and Numbers
IANA: Internet Assigned Numbers Authority
BSD: Berkeley Software Distribution
Ephemeral means "short-lived" (not permanently assigned to an application).
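As a small illustration of the ICANN/IANA ranges listed above, the following sketch classifies a port number. The function name `iana_port_range` is made up for this example.

```python
def iana_port_range(port: int) -> str:
    """Return the ICANN/IANA range name for a 16 bit TCP port number."""
    if not 0 <= port <= 65535:
        raise ValueError("TCP port numbers are 16 bit values (0...65535)")
    if port == 0:
        return "reserved"
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"


for p in (110, 1208, 49152):
    print(p, iana_port_range(p))
```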

4. TCP Connection Establishment (1/4)

- A TCP connection is established with 3 TCP packets (segments) going back and forth.
- The 3-way handshake (SYN, SYN ACK, ACK) is used to synchronize sequence and acknowledge numbers; after the 3-way handshake the connection is established.
- Host1 performs an "active open" while host2 does a "passive open".
- A connection consists of 2 independent half-duplex connections. Each TCP peer can close its (outgoing) TCP half-duplex connection any time independently of the other.
- A connection can remain open for hours, days, even months without sending data, that is there is no heartbeat poll mechanism!
- The Ack-number is the number of the next byte expected by the receiver. The SYN occupies 1 number in the sequence number space (even though a SYN segment usually does not carry user data)!

4. TCP Connection Establishment (2/4)

- In case of a call collision only 1 connection is created. Usually Internet applications are client/server where clients do an active open and servers a passive open (thus no connection establishment collisions are possible). In peer-2-peer applications collisions are possible, e.g. BGP (Internet backbone routing) where BGP speakers establish TCP connections with each other.

4. TCP Connection Establishment (3/4)

- If no server is listening on the addressed port number TCP rejects the connection and sends back a RST (reset) packet (TCP segment where RST bit = 1).

[Figure: Host 1 sends SYN (SEQ=x), Host 2 answers with RST.]

Demo: Telnet connection to closed port (telnet <server IP> 12345).

4. TCP Connection Establishment (4/4)

- A TCP connection is identified by the quadruple source/destination IP address and source/destination port address. If only one of the addresses of this quadruple is different the quadruple identifies a different TCP connection.

[Figure: Host 1 (TCP ports 3544 and 3545) and Host 2 (TCP port 37659) are connected to Host 3 (TCP port 80).
Connection 1: Host 1 IP / 3544 / Host 3 IP / 80
Connection 2: Host 1 IP / 3545 / Host 3 IP / 80
Connection 3: Host 2 IP / 37659 / Host 3 IP / 80]
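How a quadruple keeps connections apart can be sketched with a lookup table keyed by the four values. The port numbers are the ones from the slide; the IP addresses are hypothetical documentation addresses, and `ConnectionId` and `lookup` are names invented for this sketch.

```python
from typing import NamedTuple, Optional


class ConnectionId(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Hypothetical addresses, for illustration only.
HOST1, HOST2, HOST3 = "192.0.2.1", "192.0.2.2", "198.51.100.3"

connections = {
    ConnectionId(HOST1, 3544, HOST3, 80): "connection 1",
    ConnectionId(HOST1, 3545, HOST3, 80): "connection 2",
    ConnectionId(HOST2, 37659, HOST3, 80): "connection 3",
}


def lookup(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> Optional[str]:
    """Find the connection a segment belongs to; None if there is none."""
    return connections.get(ConnectionId(src_ip, src_port, dst_ip, dst_port))


print(lookup(HOST1, 3544, HOST3, 80))  # same host pair, source port 3544
print(lookup(HOST1, 3545, HOST3, 80))  # only the source port differs
```

All three connections end on the same server port 80; they are still distinct because at least one element of the quadruple differs.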

5. TCP Connection Release (1/2)

- The 2 half-duplex connections are closed independently of each other.
- Host1 does an active close while host2 does a passive close.
- Connection closing is a 4-way handshake and not a 3-way handshake since the closing of the half-duplex connections is independent of each other.
- Half-close: only one half-duplex connection is closed (still traffic in the other direction).
- FIN segments occupy 1 number in the sequence number space (as do SYN segments)!
- Each host closes "its" half-duplex connection independently of the other (that means the closing of the 2 unidirectional connections is unsynchronized).

[Figure: both half-duplex connections are established. The application on Host 1 calls close(); Host 1 sends FIN, Host 2 answers with ACK and delivers EOF (End Of File) to its application. The half-duplex connection Host1 -> Host2 is closed, Host2 -> Host1 is still open. The application on Host 2 calls close(); Host 2 sends FIN, Host 1 answers with ACK. Both half-duplex connections are closed.]

5. TCP Connection Release (2/2)

- Different scenarios as to how both half-duplex connections are closed:

Normal 4-way close:   Host 1 FIN, Host 2 ACK, Host 2 FIN, Host 1 ACK.
3-way close:          Host 1 FIN, Host 2 FIN + ACK, Host 1 ACK.
Simultaneous close:   both hosts send FIN at the same time, both answer with ACK.
Half-close:           Host 1 FIN, Host 2 ACK of FIN, Host 2 Data, Host 1 ACK (Data), Host 2 FIN, Host 1 ACK of FIN.

Few applications use half-close, e.g. the UNIX rsh command:

    #rsh <host> sort < datafile

This command is executed remotely. The command needs all input from Host1. The closing of the connection Host1 -> Host2 is the only way to tell Host2 that it can start executing the command. The output of the command is sent back to Host1 through the still existing half-duplex connection Host2 -> Host1.
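The half-close pattern of the rsh/sort example can be sketched with the socket call `shutdown(SHUT_WR)`, which closes only the outgoing direction. The sketch assumes that a local `socket.socketpair()` behaves like a TCP connection for this purpose and that the data is small enough to fit into the socket buffers; the sorting "server" is a stand-in for the remote command.

```python
import socket

client, server = socket.socketpair()  # stand-in for a TCP connection (assumption)

# Client: send all input, then close only the outgoing direction.
client.sendall(b"pear\napple\nfig\n")
client.shutdown(socket.SHUT_WR)  # on TCP this sends the FIN

# Server: read until end of stream; only then can the command run.
data = b""
while True:
    chunk = server.recv(4096)
    if not chunk:  # EOF: the client has closed its direction
        break
    data += chunk
result = b"\n".join(sorted(data.splitlines())) + b"\n"
server.sendall(result)  # the reverse direction is still open
server.close()

# Client: read the command output from the still open direction.
output = b""
while True:
    chunk = client.recv(4096)
    if not chunk:
        break
    output += chunk
client.close()

print(output.decode(), end="")
```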

6. TCP Flow Control (1/10)

- Sliding window mechanism (ISN = Initial Sequence Number).
- Unlike lock-step protocols, TCP allows data bursts for maximizing throughput.
- The receiver advertises the size of its receive buffer (window size advertisement).
- The receiver's ack number is the number of the next byte it expects; this implicitly acknowledges all previously received bytes. Ack=X acknowledges all bytes up to and including X-1. Thus acks are cumulative.
- The sequence and acknowledge numbers are per byte (not per segment/packet).

[Figure: TCP header with the TCP fields involved in flow control.]

6. TCP Flow Control (2/10)

1  3-way handshake for connection establishment. Through corresponding socket calls (indicated by 'socket()...') host 'A' and 'B' open a TCP connection (host 'A' performs an active open while host 'B' listens for an incoming connection request). Host 'A' and 'B' exchange and acknowledge each other the sequence numbers (ISN Initial Sequence Number). Host 'A' has a receive buffer for 6000 bytes and announces this with Win=6000. Host 'B' has a receive buffer for 10000 bytes and announces this with Win=10000. Note that the SYNs occupy 1 number in the sequence number space thus the first data byte has Seq=ISN+1.

2  Application Process (AP) 'A' writes 2KB into TCP. These 2KB are stored in the transmit buffer (Tx buffer). The data remains in the Tx buffer until its reception is acknowledged by TCP 'B'. In case of packet loss TCP 'A' has the data still in the Tx buffer for retransmissions.

3  TCP 'A' sends a first chunk of 1500 bytes as one TCP segment. Note that Seq=1001 = ISN+1. These 1500 bytes are stored in TCP 'B's receive buffer.

4  TCP 'A' sends a chunk of 500 bytes. The sequence number is Seq=2501 = previous sequence number + size of previous data chunk. These 500 bytes again are stored in TCP 'B's receive buffer (along with the previous 1500 bytes). Note that the initial 2KB data are still in TCP 'A's Tx buffer awaiting to be acknowledged by TCP 'B'.

5  TCP 'B' sends an acknowledge segment acknowledging successful reception of 2KB. This is indicated through Ack=3001 which means that TCP 'B' expects that the sequence number of the next data byte is 3001 or, in other words, the sequence number of the last data byte successfully received is 3000. Upon reception of the acknowledge TCP 'A' flushes the Tx buffer.

6  AP 'B' reads out 2KB with 1 socket call. Application 'B' may have called 'receive()' much earlier and only now TCP 'B' (the socket interface) unblocked the call and returned the chunk of 2KB data. The 2KB data are deleted from host 'B's Rx buffer.

7  TCP 'B' sends a pure window update segment to signal to TCP 'A' that the receive window size is now 10000 again (Rx buffer). Note that a real TCP implementation would not send a window update if the Rx buffer still has reasonable free capacity. A real TCP implementation would wait for more data to arrive, acknowledge this next data and together with the acknowledge segment also signal the new receive window size. Only when the Rx buffer's capacity falls below a threshold is it advisable to send a TCP segment merely updating the window size.
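The sequence number arithmetic of steps 1 to 5 can be replayed with a few lines. This is a sketch; the helper `next_seq` is invented here and simply encodes the rule that each data byte, each SYN and each FIN occupies one sequence number (modulo 2^32).

```python
def next_seq(seq: int, data_len: int = 0, syn: bool = False, fin: bool = False) -> int:
    """Sequence number following a segment that starts at `seq`."""
    return (seq + data_len + int(syn) + int(fin)) % 2**32


isn_a = 1000                       # ISN of host 'A' in the example
seq = next_seq(isn_a, syn=True)    # the SYN occupies 1 number
first_data_seq = seq               # Seq of the first data byte (ISN+1)
seq = next_seq(seq, data_len=1500)
second_segment_seq = seq           # Seq of the 500 byte segment
seq = next_seq(seq, data_len=500)
ack_from_b = seq                   # next byte expected by 'B'

print(first_data_seq, second_segment_seq, ack_from_b)
```

With the numbers from the walkthrough this reproduces Seq=1001, Seq=2501 and Ack=3001.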

6. TCP Flow Control (3/10)

8  AP 'A' writes 4KB into TCP.

9  TCP 'A' sends a chunk of 1500 bytes as one TCP segment. Seq = 3001 = last sequence number + size of last data segment. Shortly after application 'B' writes 2KB into TCP.

10  TCP 'B' sends a chunk of 1500 bytes as one TCP segment. Seq = 4001 = ISN + 1 (since it is the first TCP segment with data). Ack = 4501 = sequence number of last received segment + size of data. Win = 8500 = Rx buffer size - size of data in buffer. TCP 'A' deletes the acknowledged 1500 bytes from the Tx buffer (they are successfully received by TCP 'B', even though not necessarily received by application 'B'; thus these 1500 bytes do not need to be kept in the Tx buffer for retransmissions and can be deleted).

11  AP 'A' writes another 2KB into TCP 'A'. These 2KB are stored in TCP 'A's Tx buffer along with the previous 2.5KB of data. Around the same time AP 'B' reads 1KB of data from its socket. These 1KB are immediately freed from the Rx buffer to make room for more data from TCP 'A'.

12  TCP 'B' sends a chunk of 500 bytes (Seq = 5501 = last sequence number + data size of last segment). The window update in this segment indicates that the Rx buffer has room for 9500 bytes.

13  TCP 'A' sends a segment with 1000 bytes. TCP 'B' writes this data into its Rx buffer there joining the previous 500 bytes. Shortly after AP 'B' reads 1KB from its socket interface leaving 500 bytes of data in the Rx buffer.

14  TCP 'A' sends an acknowledgment segment with Ack = 6001 = 4001 + sizes of the last 2 received data segments. This Ack segment makes TCP 'B' delete the 2KB in its Tx buffer since these are no longer needed for possible retransmissions.

15  TCP 'A' sends a segment with 1500 bytes. TCP 'B' writes this data into its Rx buffer there joining the previous 500 bytes. Shortly after AP 'B' reads 2KB from its socket interface thus emptying the Rx buffer.

6. TCP Flow Control (4/10)

16  TCP 'B' sends an acknowledge segment acknowledging successful reception of all data so far received. Since no data is in the Rx buffer the receive window size is at its maximum again (Win=10000). Around the same time AP 'A' reads out 2KB from the Rx buffer. This leaves 2KB of unsent data in TCP 'A's Tx buffer.

17  TCP 'A' sends a segment with 1500 bytes. TCP 'B' writes this data into its Rx buffer.

18  TCP 'A' sends a last data segment with 500 bytes. TCP 'B' writes this data into its Rx buffer there joining the previous 1500 bytes. Shortly after that AP 'B' reads out 2KB and thus empties the Rx buffer.

19  TCP 'B' sends an acknowledgment segment that acknowledges all data received from TCP 'A'.

20  AP 'A' is finished with sending data and closes its socket (close()). This causes TCP 'A' to send a FIN segment in order to close the half-duplex connection 'A' -> 'B'. TCP 'B' acknowledges this FIN and closes the connection 'B' -> 'A' with a FIN. Note well that FINs also occupy 1 number in the sequence number space. Thus the acknowledgment sent back by TCP 'B' has Ack = sequence number in 'A's FIN segment + 1.

Legend:
CTL  = Control bits in TCP header (URG, ACK, PSH, RST, SYN, FIN)
Seq  = Sequence number
Win  = Window size
Ack  = Acknowledgement number
Data = Size of application data (TCP payload)

[Figure: message sequence chart AP 'A' / TCP 'A' / TCP 'B' / AP 'B' with the Tx and Rx buffer fill levels, steps 1 to 5:
1  A -> B  <CTL=SYN><Seq=1000><Win=6000>
   B -> A  <CTL=SYN,ACK><Seq=4000><Ack=1001><Win=10000>
   A -> B  <CTL=ACK><Seq=1001><Ack=4001><Win=6000>
2  AP 'A' calls send(2KB)
3  A -> B  <CTL=-><Seq=1001><Win=6000><Data=1500B>
4  A -> B  <CTL=-><Seq=2501><Win=6000><Data=500B>
5  B -> A  <CTL=ACK><Seq=4001><Ack=3001><Win=8000>]

[Figure: message sequence chart continued, steps 6 to 10:
6   AP 'B' calls receive(2KB)
7   B -> A  <CTL=ACK><Seq=4001><Ack=3001><Win=10000>
8   AP 'A' calls send(4KB)
9   A -> B  <CTL=-><Seq=3001><Win=6000><Data=1500B>
    AP 'B' calls send(2KB)
10  B -> A  <CTL=ACK><Seq=4001><Ack=4501><Win=8500><Data=1500B>]

[Figure: message sequence chart continued, steps 11 to 15:
AP 'A' calls send(2KB); AP 'B' calls receive(1KB), receive(1KB) and receive(2KB) in the course of these steps.
B -> A  <CTL=-><Seq=5501><Win=9500><Data=500B>
A -> B  <CTL=-><Seq=4501><Win=4000><Data=1000B>
A -> B  <CTL=ACK><Seq=5501><Ack=6001><Win=4000>
A -> B  <CTL=-><Seq=5501><Win=4000><Data=1500B>]

[Figure: message sequence chart continued, steps 16 to 20:
16  B -> A  <CTL=ACK><Seq=6001><Ack=7001><Win=10000>
    AP 'A' calls receive(2KB)
17  A -> B  <CTL=-><Seq=7001><Win=6000><Data=1500B>
18  A -> B  <CTL=-><Seq=8501><Win=6000><Data=500B>
    AP 'B' calls receive(2KB)
19  B -> A  <CTL=ACK><Seq=6001><Ack=9001><Win=10000>
20  AP 'A' calls close()
    A -> B  <CTL=FIN><Seq=9001><Win=6000>
    AP 'B' calls close()
    B -> A  <CTL=FIN,ACK><Seq=6001><Ack=9002><Win=10000>
    A -> B  <CTL=ACK><Seq=9002><Ack=6002><Win=6000>]

6. TCP Flow Control (5/10)

- Sliding window (TCP) versus lock-step protocol:
  1. Lock-step protocol: the sender must wait for the Ack before sending the next data packet.
  2. Sliding window: the sender can send a (small) burst before waiting for an Ack.

[Figure: in the lock-step protocol the sender is blocked after every data packet until its Ack arrives. In the sliding window protocol the sender transmits several data packets back to back into the receiver's buffer and is blocked only when the window is used up.]

6. TCP Flow Control (6/10)

- Sliding window mechanism: window size, acknowledgments and sequence numbers are byte based, not segment based.
  The lower window edge is incremented as bytes are acknowledged; it is initialized to ISN+1.
  The upper window edge is the lower edge plus the number in the window field; it is initialized to ISN+1 + advertised window.

[Figure: send window along the time / sequence number axis (bytes 1 to 10). Left of the window: bytes sent and acknowledged. Inside the window: bytes waiting to be acknowledged, followed by bytes that can be sent anytime. Right of the window: bytes waiting to be sent in the send buffer. The sending application writes (to the socket) at the right end.]
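The two window edges can be modelled in a few lines. This is a sketch under simplifying assumptions (no sequence number wrap-around, acks never move backwards); the class `SendWindow` and its method names are invented for illustration. The numbers replay steps 1 to 5 of the flow control walkthrough.

```python
from dataclasses import dataclass


@dataclass
class SendWindow:
    lower: int     # lower edge: oldest byte not yet acknowledged
    next_seq: int  # sequence number of the next byte to send
    window: int    # window size last advertised by the receiver

    @property
    def upper(self) -> int:
        """Upper edge: first byte that must not be sent yet."""
        return self.lower + self.window

    @property
    def usable(self) -> int:
        """Number of bytes that can be sent anytime."""
        return max(0, self.upper - self.next_seq)

    def send(self, n: int) -> None:
        if n > self.usable:
            raise ValueError("window too small, sender is blocked")
        self.next_seq += n

    def ack(self, ack_no: int, window: int) -> None:
        self.lower = ack_no  # bytes below ack_no are acknowledged
        self.window = window


w = SendWindow(lower=1001, next_seq=1001, window=10000)  # ISN=1000, Win=10000
w.send(1500)
w.send(500)
print(w.usable)   # 8000 bytes may still be sent without waiting
w.ack(3001, 8000)
print(w.lower, w.upper, w.usable)
```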

6. TCP Flow Control (7/10)

- TCP layer and application interworking:
  ACK and window size advertisements are "piggy-backed" onto TCP segments.
  The TCP sender and receiver "processes" on a host are totally independent of each other in terms of sequence and acknowledge numbers.

[Figure: a TCP connection consists of 2 half-duplex connections. In each direction the AP writes into the Tx buffer of a sender process, the segments travel to the receiver process of the peer, and the peer AP reads from its Rx buffer.]

6. TCP Flow Control (8/10)

- Delayed acknowledgments for reducing the number of segments:
  The receiver does not send an Ack immediately after the receipt of an (error-free) packet but waits up to ~200ms/500ms (depends on host OS) to see if a packet is in the send buffer. If so it "piggybacks" the Ack onto the transmit packet; if no transmit packet is available the receiver sends an Ack at the latest after ~200ms/500ms.

[Figure: without delayed acks every data packet is answered by a separate Ack. With delayed acks the receiver waits up to 500ms and sends Data + ACK in one segment.]

Demo: Telnet connection with echo:
- Without 'delayed ack': 4 segments per character (character, echoed character, 2 acks).
- With 'delayed ack': 2 segments (character, ack).

6. TCP Flow Control (9/10)

- Nagle's algorithm for further optimization:
  In interactive applications (Telnet) where single bytes come in to the sender from the application, sending 1 byte per TCP segment is inefficient (98% overhead).
  When activated, Nagle's algorithm sends only the first byte and buffers all subsequent bytes until the first byte has been acknowledged. Then the sender sends all so far buffered bytes in one segment.
  This can save a substantial number of TCP segments (IP packets) when the user types fast and the network is slow.

[Figure: without Nagle's algorithm every byte written by the AP is sent as its own data segment. With Nagle's algorithm the first byte is sent immediately, the bytes written during the RTT are collected and sent as one segment once the ACK has arrived.]

RTT: Round Trip Time (time between sending a packet and receiving the response).
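The effect of Nagle's algorithm can be estimated with a small simulation. This is a simplified sketch, not a TCP implementation: it assumes 1 byte per application write, an ACK that arrives exactly one RTT after each segment, and no MSS limit. The function name `nagle_segments` is invented for this example.

```python
def nagle_segments(write_times, rtt):
    """Return the segments as (send_time, n_bytes) for 1 byte writes."""
    segments = []
    buffered = 0     # bytes held back while a segment is unacknowledged
    ack_due = None   # arrival time of the ACK for the outstanding segment
    for t in sorted(write_times):
        # Handle all ACKs that arrive before this write.
        while ack_due is not None and ack_due <= t:
            if buffered:
                # ACK received: send everything buffered so far as one segment.
                segments.append((ack_due, buffered))
                ack_due, buffered = ack_due + rtt, 0
            else:
                ack_due = None  # nothing outstanding any more
        if ack_due is None:
            segments.append((t, 1))  # idle connection: send immediately
            ack_due = t + rtt
        else:
            buffered += 1            # segment in flight: hold the byte back
    if buffered:
        segments.append((ack_due, buffered))  # flush on the last ACK
    return segments


# 10 key strokes, one every 10 ms.
print(nagle_segments(range(0, 100, 10), rtt=100))  # slow network: 2 segments
print(len(nagle_segments(range(0, 100, 10), rtt=5)))  # fast network: 10 segments
```

On the slow network the 10 bytes travel in 2 segments instead of 10; on the fast network the algorithm changes nothing because every ACK arrives before the next key stroke.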

6. TCP Flow Control (10/10)

- Silly window syndrome:
  This problem occurs when the receiver reads out only small amounts of bytes from the receive buffer and the sender sends relatively fast (and in large chunks). Then the flow-control mechanism becomes inefficient and ruins TCP performance.
  Solution: Clark's algorithm.
  - The sender must not send "tinygrams" (small segments).
  - The receiver only sends window updates when there is sufficient space in the receive buffer (sufficient = min(MSS, half buffer size)).

[Figure: IP packet = IP header + TCP header + payload; the TCP segment size is bounded by the MSS.]

MSS: Maximum Segment Size
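The receiver side rule of Clark's algorithm fits into one comparison. The sketch below assumes all sizes are given in bytes; the function name is made up for this example.

```python
def should_send_window_update(free_space: int, mss: int, buffer_size: int) -> bool:
    """Receiver side rule: advertise a re-opened window only if the free
    space is at least min(MSS, half of the receive buffer size)."""
    return free_space >= min(mss, buffer_size // 2)


print(should_send_window_update(free_space=100, mss=1460, buffer_size=10000))
print(should_send_window_update(free_space=1460, mss=1460, buffer_size=10000))
```

With a 10000 byte buffer and an MSS of 1460 bytes, 100 free bytes are not worth an update, while 1460 free bytes are.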

7. TCP Error Control (1/6)

- Out of sequence segments:
  Out of sequence segments are not dropped but buffered for later use. The receiver does not throw away an out-of-sequence segment but waits a little bit for the outstanding segment; the receiver then acknowledges all bytes correctly (error-free, in order) received.

[Figure: after SYN, SYN ACK, ACK (Win=4096) the sender transmits Seq=X, which the receiver acknowledges with ACK=X+512, Win=3584. The segment Seq=X+1024 arrives before the segment Seq=X+512; once both are there the receiver sends ACK=X+1536.]

7. TCP Error Control (2/6)

- Lost packets cause retransmissions triggered by the expiry of the retransmission timer.

[Figure: both half-duplex connections are established. Host 1 sends segment (1) with SEQ = X; Host 2 answers ACK=X+512 and writes (1) to RB; the retransmission timer is stopped. The same happens for the following segments up to (5). Segment (6) with SEQ = X+2560 is lost while the network is temporarily out of order. The retransmission timer expires and Host 1 sends (6) with SEQ = X+2560 again; Host 2 answers ACK=X+3072 and writes (6) to RB. Segment (7) with SEQ = X+3072 follows and is acknowledged with ACK=X+3584.
Legend: packet loss; retransmission timer expired; retransmission timer stopped (not expired).]

MSS: Max. segment size (=512 in example).
RB: Application's receive buffer

7. TCP Error Control (3/6)

- Lost packets / retransmissions, 3 acks ("fast retransmit"):

[Figure: both half-duplex connections are established. Segment (1) with SEQ = X is acknowledged with ACK=X+512 and written to RB. Segment (2) with SEQ = X+512 is lost. Segments (3) SEQ = X+1024, (4) SEQ = X+1536 and (5) SEQ = X+2048 arrive and are buffered; each of them is answered with ACK=X+512. When the 3rd ACK is received Host 1 retransmits (2) with SEQ = X+512 without waiting for the retransmission timer. Host 2 answers ACK=X+2560 and writes (2) (3) (4) (5) to RB; the retransmission timers are stopped. Segment (6) with SEQ = X+2560 follows.
Legend: packet loss; retransmission timer expired; retransmission timer stopped (not expired).]

MSS: Max. segment size (=512 in example).
RB: Application's receive buffer

7. TCP Error Control (4/6)

- Lost packets / retransmissions, lost ack:

[Figure: both half-duplex connections are established. Segments (1) to (5) are acknowledged and written to RB. Segment (6) with SEQ = X+2560 arrives and is written to RB, but its ACK=X+3072 is lost. The retransmission timer expires and Host 1 sends (6) with SEQ = X+2560 again. Host 2 discards the duplicate (6) and returns ACK=X+3072; the retransmission timer is stopped.
Legend: packet loss; retransmission timer expired; retransmission timer stopped (not expired).]

MSS: Max. segment size (=512 in example).
RB: Application's receive buffer

- TCP does not specify how often a specific segment has to be retransmitted (in case of repeated packet loss of the same segment). Typical values are max. 5 or 8 retransmissions of the same segment (if more retransmissions are necessary the connection is closed).

7. TCP Error Control (5/6)

- Retransmission timer (RTO = Retransmission Timeout) assessment:
  Problem:
  a. Single data link -> RTO = (2 + e) * RTT where e << 2 (RTT rather deterministic).
  b. Internet (multiple data links) -> RTT varies considerably, also during the lifetime of a TCP connection.
  Solution:
  Constantly adjust the RTO based on measurement of the RTT.
  For each segment that was sent start a (retransmission) timer.

  RTO = RTT + 4 * D

  where D is the mean deviation as per

  Dnew = α * Dold + (1 - α) * |RTT - M|

  M = observed RTT value for a specific ACK
  α = smoothing factor that determines how much weight is given to the old value (typ. α = 7/8)
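The RTO adjustment can be sketched as a small update function. The deviation and RTO formulas are taken from the slide; the smoothing of the RTT estimate itself (RTTnew = α * RTTold + (1 - α) * M, with the same α) is not shown on the slide and is an assumption of this sketch, as are the function name and the start values in milliseconds.

```python
ALPHA = 7 / 8  # smoothing factor (typical value from the slide)


def update_rto(rtt: float, dev: float, measured: float, alpha: float = ALPHA):
    """Return (new RTT estimate, new mean deviation D, new RTO)."""
    # Dnew = alpha * Dold + (1 - alpha) * |RTT - M|
    dev = alpha * dev + (1 - alpha) * abs(rtt - measured)
    # Assumption: the RTT estimate is smoothed with the same factor.
    rtt = alpha * rtt + (1 - alpha) * measured
    # RTO = RTT + 4 * D
    return rtt, dev, rtt + 4 * dev


rtt, dev = 100.0, 0.0  # hypothetical start values in ms
for m in (180.0, 100.0, 100.0):
    rtt, dev, rto = update_rto(rtt, dev, m)
    print(f"M={m:6.1f}  RTT={rtt:7.2f}  D={dev:6.2f}  RTO={rto:7.2f}")
```

A single slow sample raises the RTO noticeably through the deviation term; the following normal samples let it decay again.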

7. TCP Error Control (6/6)

- TCP header checksum calculation:
  TCP uses a 16-bit checksum for the detection of transmission errors (bit errors).
  The checksum field is the 16 bit one's complement of the one's complement sum of all 16 bit words in the pseudo IP header, TCP header and data (the checksum field is initialized with zeroes before the checksum calculation).
  By including the IP header in the checksum calculation TCP depends on IP; it may not run on anything other than IP (this is a violation of the layering principle!).

[Figure: pseudo header, TCP header and data; the checksum is calculated over the pseudo header, the TCP header and the data.]
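The checksum rule can be written down compactly for IPv4. This is a sketch: the pseudo header layout used here (source address, destination address, a zero byte, the protocol number and the TCP length) is the standard IPv4 one and is not spelled out on the slide; the server address 192.0.2.10 and the header values are made up for illustration.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """One's complement sum of all 16 bit words (odd length is zero padded)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over IPv4 pseudo header + TCP header + data."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


SRC, DST = "192.168.1.1", "192.0.2.10"  # DST is a hypothetical address

# Minimal 20 byte TCP header with the checksum field set to zero, plus data.
header = struct.pack("!HHIIBBHHH", 1208, 31452, 1001, 4001,
                     5 << 4, 0x18, 6000, 0, 0)
segment = header + b"hello"

csum = tcp_checksum(SRC, DST, segment)
# Insert the checksum into bytes 16..17 of the header.
checked = segment[:16] + struct.pack("!H", csum) + segment[18:]

print(hex(csum))
print(tcp_checksum(SRC, DST, checked))  # a segment with a valid checksum yields 0
```

The receiver performs the same calculation over the received segment including the checksum field; any result other than 0 indicates a transmission error.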

8. TCP Congestion Control RFC2001 (1/6)

- Congestion control is a control (feedback control) mechanism to prevent congestion (congestion avoidance).
  Problem (a): A fast network feeding a low capacity receiver (congestion in receiver).
  Problem (b): A slow network feeding a high-capacity receiver (congestion in network).
  Congestion can always occur (aggregate ingress bandwidth exceeds egress bandwidth).

[Figure: Router 1 and Router 2 feed Router 3. The aggregate bandwidth exceeds the available bandwidth of the output link, so packets are dropped by Router 3.]

8. TCP Congestion Control RFC2001 (2/6)

Prior to 1988 TCP did not define a congestion control mechanism. TCP stacks just sent as many TCP segments as the receiver's buffer could hold (based on the advertised window). With the growth of the Internet router links became congested and thus buffers got full which resulted in packet loss. Packet loss led to retransmissions which in turn aggravated the congestion problem.
-> It became necessary to avoid congestion in the first place (it's always better to avoid a problem in the first place rather than cure it!).

- How to avoid congestion efficiently (ideas of Van Jacobson):
  a. The sender can detect packet loss and interpret it as network congestion (this is in most cases true; packet loss can however also be due to bit errors).
  b. When a sender detects packet loss it should reduce the transmission rate quickly.
  c. When there is no (more) congestion (no more packet loss) the sender can slowly increase the transmission rate.
  d. At the very beginning of a TCP session a sender does not know the maximum possible transmission rate yet. Thus it should start slowly with sending segments ("slow start" procedure).
  e. If a sender receives an ACK packet it knows that a segment has been received and is no longer on the network (in transmission). This fact is used for the slow start algorithm (see below).
-> This procedure is "self-clocking".
-> Network stability is achieved through quickly reacting to bad news (packet loss) and moderately reacting to good news (no packet loss, all segments sent are acknowledged).

8. TCP Congestion Control RFC2001 (3/6)

- Congestion control window:
  A congestion control window is used in addition to the flow control window (both windows work in parallel).
  The max. number of segments that can be sent = min(Ws, Wc).
  Ws = Flow control window
  Wc = Congestion control window
  Heavily loaded networks -> flow of segments controlled by Wc.
  Lightly loaded networks -> flow of segments controlled by Ws.

[Figure: send window over bytes 1 to 10: bytes sent and acknowledged, bytes waiting to be acknowledged, bytes that can be sent anytime, bytes waiting to be sent in the send buffer.]

8. TCP Congestion Control RFC2001 (4/6)

- Normal congestion control window procedure:
  Slow start phase: Wc starts opening "slowly" from 1 segment until the threshold (SST) is reached (exponential growth of Wc).
    Transmission number:
    0 -> 1 segment sent
    1 -> 2 segments sent (burst)
    2 -> 4 segments sent (burst) etc.
  Congestion avoidance phase: Wc is increased by 1/Wc for each ACK received, i.e. by about 1 segment per RTT (linear growth), until the next threshold is reached.
  Constant phase: Wc remains constant.

[Figure: Wc (segments or kbytes) over time in RTTs. Slow start phase: Wc increases by 1 segment for each ACK received (exponential growth: 1, 2, 4, 8, 16, 32) up to the slow start threshold of 32. Congestion avoidance phase: Wc increases by 1 segment for each 32 ACKs received. Constant phase: Wc is fully open at 64 and remains constant.]

8. TCP Congestion Control RFC2001 (5/6)

- Light congestion (reception of 3 duplicate Acks):
  On receipt of 3 duplicate ACKs Wc is reduced by half (fast recovery).

[Figure: Wc (segments/kbytes) over time in RTTs, slow start threshold 32. When the first set of 3 duplicate ACKs is received at Wc=34, Wc drops to Wc=17 and grows linearly from there; a second set of 3 duplicate ACKs halves Wc again.]

8. TCP Congestion Control RFC2001 (6/6)

- Heavy congestion (no reception of Acks):
  When the retransmission timer expires Wc is immediately reset to 1. From there slow-start restarts normally.

[Figure: Wc (segments/kbytes) over time in RTTs with the slow start threshold. At the first, second and third RTO Wc falls back to 1 and slow start begins again.]
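The behaviour of Wc over the phases described in this chapter can be condensed into one update step per RTT. This is a simplified sketch in units of segments: the start threshold of 32 and the maximum of 64 come from the figures, while lowering the slow start threshold to half of Wc on every loss event follows the common RFC2001 description and is an assumption here. The function name `next_wc` and the event names are invented.

```python
def next_wc(wc: int, ssthresh: int, event: str, wc_max: int = 64):
    """Return (Wc, slow start threshold) after one RTT.

    event: "ack"      all segments of the last RTT were acknowledged
           "3dupacks" light congestion (3 duplicate ACKs)
           "timeout"  heavy congestion (retransmission timer expired)
    """
    if event == "timeout":
        return 1, max(wc // 2, 2)   # restart with slow start
    if event == "3dupacks":
        half = max(wc // 2, 2)
        return half, half           # continue with congestion avoidance
    if wc < ssthresh:
        wc = min(wc * 2, ssthresh)  # slow start: exponential growth
    else:
        wc += 1                     # congestion avoidance: linear growth
    return min(wc, wc_max), ssthresh


wc, ssthresh = 1, 32
trace = []
for event in ["ack"] * 7 + ["3dupacks", "ack", "timeout", "ack"]:
    wc, ssthresh = next_wc(wc, ssthresh, event)
    trace.append(wc)
print(trace)
```

The trace shows the doubling up to the threshold, the linear growth above it, the halving from 34 to 17 on 3 duplicate ACKs and the reset to 1 on a timeout.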

9. TCP Persist Timer (1/2)

- Problem: Possible deadlock situation in TCP flow control:
  The receiver buffer is full thus the receiver sends an ACK with Win=0 (window is closed).
  Thereafter data is read from the receiver buffer to the application. The receiver is ready again to receive data and sends an ACK with Win>0 (window is re-opened). This segment however is lost.
  Sender and receiver are now deadlocked: the sender waits for the window to be re-opened, the receiver waits for data.

[Figure: both half-duplex connections are established. Host 2 sends ACK=X+2048, Win=0. After writing (3) to RB, Host 2 sends ACK=X+2048, Win=512, which is lost (packet loss). Sender and receiver are deadlocked.]

9. TCP Persist Timer (2/2)

- Solution: The sender sends a probe segment containing 1 single byte of data to invoke the receiver to acknowledge previous bytes again.
  Persist timer values are ascertained by an "exponential backoff algorithm" that produces output values from 5s (min.) to 60s (max.).

[Figure: both half-duplex connections are established. Host 2 sends ACK=X+2048, Win=0, upon which Host 1 starts the persist timer (1). After writing (3) to RB, Host 2 sends ACK=X+2048, Win=512, which is lost (packet loss). The persist timer expires and Host 1 sends a window probe with SEQ=X+2048 (1 byte). Host 2 answers ACK=X+2049, Win=2048; the persist timer is stopped and Host 1 continues with SEQ=X+2049.]
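An exponential backoff bounded to the range 5s to 60s can be sketched as follows. The bounds are taken from the slide; starting at the minimum and doubling on every probe is an assumption of this sketch (real TCP stacks derive the base value from their RTO estimate), and the function name is made up.

```python
def persist_intervals(n: int, minimum: float = 5.0, maximum: float = 60.0):
    """Intervals in seconds before each of the first n window probes.

    Assumption: start at the minimum and double for every further probe,
    capped at the maximum.
    """
    return [min(minimum * 2 ** i, maximum) for i in range(n)]


print(persist_intervals(6))
```

Once the maximum is reached the sender keeps probing at that interval for as long as the window stays closed.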

10. TCP Keepalive Timer - TCP connection supervision (1/2)

- The keepalive timer is used to periodically check if TCP connections are still active.
  Case 1: The TCP connection is still alive (i.e. the client is still alive and the connection open).

[Figure: both half-duplex connections are established between Host 1 (client) and Host 2 (server). After a data segment and its ACK=Y the server starts the keepalive timer (2 hours). When the keepalive timer expires the server sends a keep alive probe (no data, SEQ=X-1) and waits up to 75s. The client answers with ACK=X; the keepalive timer is stopped and rearmed to 2h.]

10. TCP Keepalive Timer – TCP connection supervision (2/2)
 The keepalive timer is used to periodically check if TCP connections are still active.
Case 2: The TCP connection is dead.

[Figure: Host 1 (Client) and Host 2 (Server), both half-duplex connections established.
Data Segment, ACK=Y
After 2 hours the keepalive timer expires: Keep Alive Probe 1, no data, SEQ=X-1
After 75s the keepalive timer expires: Keep Alive Probe 2, no data, SEQ=X-1
After 75s the keepalive timer expires: Keep Alive Probe 3, no data, SEQ=X-1
...
Keep Alive Probe 10, no data, SEQ=X-1
After 75s the keepalive timer expires; the connection is deleted (not closed since it is considered dead)]
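The timer values in the two keepalive cases add up to the time a server needs to notice a dead peer. The sketch below only does that arithmetic with the values from the slides (2 hours idle, 75 s between probes, 10 probes); the function names are hypothetical and actual values are configurable per operating system.

```python
def keepalive_probe_times(idle_seconds=7200, probe_interval=75, probe_count=10):
    """Seconds after the last received segment at which each probe is sent."""
    return [idle_seconds + i * probe_interval for i in range(probe_count)]


def keepalive_detection_time(idle_seconds=7200, probe_interval=75, probe_count=10):
    """Seconds after the last received segment until the connection is deleted."""
    # The last probe also gets a full interval to be answered.
    return idle_seconds + probe_interval * probe_count


if __name__ == "__main__":
    print(keepalive_probe_times()[:3])
    print(keepalive_detection_time())
```

With these defaults a dead connection is deleted 7950 seconds (about 2 hours and 12 minutes) after the last segment was received.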

11. TCP Header Flags (1/2)
 PUSH flag:
Problem: Segment size and time of transmit are determined by TCP (flow control, congestion control, TCP does internal buffering).
This is unsuitable for interactive applications like X Windows, TELNET where data should be sent as soon as possible.
 Solution: With the PUSH flag data can be expedited:
Sender: Send data immediately without further buffering.
Receiver: When receiving the PUSH flag, „pushes“ all buffered data to the application (no further buffering).

[Figure: Sender and Receiver. The sender calls Send(data, PUSH) twice; each time a segment Data, PUSH=1 is sent, the receiver writes it immediately to the application process (no buffering in TCP) and returns an Ack.]

11. TCP Header Flags (2/2)
 URGENT flag:
This flag allows a sender to place urgent (=important) data into the send stream; the urgent pointer points to the first byte of urgent data, the receiver can then directly jump to the processing of that data.
URGENT=1 indicates that the urgent pointer in the header is valid.
The receiving application process may then process urgent data (e.g. abort command).
Example: Ctrl+C abort sequence after other data.

[Figure: Sender and Receiver. The sender calls Send(data), then Send(data, URGENT); the segment Data, URGENT=1 is sent. TCP segment: Seq.# = 2144, TCP header with URG=1 and URGPTR=2219, urgent data starting at Seq.# = 2219.]

12. TCP Header Options (1/3)
 Window scale option:
Problem: High-speed networks with long delay; the network (transmission path) then stores large amounts of data („long fat pipe“).
E.g. 155Mbps line with 40ms delay -> 775Kbytes are underway in the transmission path but the maximum window size is restricted to 65535 bytes (16 bit field).
Solution: With the window scaling option the window size can be increased (exponentially) thus making the window large enough to „saturate“ the transmission path.

Window scale option in TCP header: 1=nop | kind=3 | len=3 | shift count
The window size field remains unchanged, but is scaled by the window scale option:
E.g. shift count=3: -> Ws * 2^3 = Ws * 8
E.g. shift count=0: -> Ws * 2^0 = Ws * 1

[Figure: send window over the byte stream.
Lower window edge: initialized to ISN+1, incremented as bytes are acknowledged.
Upper window edge: initialized to ISN+1 + advertised window, incremented by the number in the window field.
Byte stream regions: bytes sent and acknowledged | bytes waiting to be acknowledged | can be sent anytime | bytes waiting to be sent in send buffer.
Max. window size = 65535 (16bit).
Required window size to fully utilize link bandwidth = Blink * Delaylink, e.g. long intercontinental fiberoptic line: 155Mbps * 40ms = 775Kbytes.]
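The 155Mbps / 40ms example can be worked through in a few lines. The sketch computes the amount of data underway (bandwidth times delay) and the smallest shift count whose scaled 16 bit window covers it. The function names are hypothetical; the upper limit of 14 for the shift count comes from the window scale specification (RFC1323), not from the slide.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16 bit window field
MAX_SHIFT_COUNT = 14         # upper limit defined in RFC1323


def bytes_in_flight(bits_per_second, delay_ms):
    """Bandwidth * delay, in bytes (integer arithmetic avoids rounding noise)."""
    return bits_per_second * delay_ms // 8000


def required_shift_count(window_bytes):
    """Smallest shift count so that 65535 * 2^shift >= window_bytes."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < window_bytes and shift < MAX_SHIFT_COUNT:
        shift += 1
    return shift


if __name__ == "__main__":
    needed = bytes_in_flight(155_000_000, 40)
    print(needed, required_shift_count(needed))
```

For the example link 775000 bytes are underway, so a shift count of 4 (65535 * 16 = 1048560 bytes) is needed; a shift count of 3 would only allow 524280 bytes.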

12. TCP Header Options (2/3)
 Timestamp option:
Problem: RTO calculation would require a recomputation for each TCP segment sent.
Some TCP implementations recompute the RTO only once per window (less processing power required).
This is ok for slower links where the sliding window contains only few TCP segments.
On faster links this behavior leads to inaccurate RTO computations (and thus less-than-optimal flow control).
The TCP timestamp option allows the RTT, and thus the RTO, to be ascertained accurately.

Timestamp option in TCP header: 1=nop | 1=nop | kind=8 | len=10 | timestamp value | timestamp echo reply

Enable timestamp and window scale option on Windows:
Registry: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts = 3

[Figure: Sender and Receiver.
Option negotiation: SYN, TS option request; SYN ACK, TS option ack.
T1: sender sends (1) SEQ=X, TS=T1
T2: sender sends (2) SEQ=X+512, TS=T2
Receiver stores TSrecent = T1, writes (1) (2) to RB and sends ACK=X+1024, TS=T1
T3: sender receives the ACK, RTT = T3 – T1
T4: sender sends (3) SEQ=X+1024, TS=T4; receiver stores TSrecent = T4]
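The timestamp echo in the exchange above can be modelled in a few lines. This is an illustrative sketch only: the class `TimestampReceiver` and its methods are hypothetical names, times are plain integers, and only in-order segments are handled. The receiver remembers the timestamp of the earliest segment it has not yet acknowledged and echoes it in the next ACK, so one ACK covering two segments still yields RTT = T3 – T1.

```python
class TimestampReceiver:
    """Receiver-side bookkeeping for the timestamp echo (simplified)."""

    def __init__(self, next_expected):
        self.next_expected = next_expected   # next in-order sequence number
        self.last_ack_sent = next_expected   # ACK value of the last ACK sent
        self.ts_recent = None                # timestamp to echo (TSrecent)

    def on_segment(self, seq, length, ts_value):
        # Keep the timestamp of the segment that contains the byte the
        # last ACK asked for, i.e. the earliest unacknowledged segment.
        if seq <= self.last_ack_sent < seq + length:
            self.ts_recent = ts_value
        if seq == self.next_expected:
            self.next_expected += length

    def send_ack(self):
        """Return (ACK number, echoed timestamp)."""
        self.last_ack_sent = self.next_expected
        return self.next_expected, self.ts_recent


if __name__ == "__main__":
    x = 1000
    receiver = TimestampReceiver(next_expected=x)
    receiver.on_segment(x, 512, ts_value=100)        # sent at T1 = 100
    receiver.on_segment(x + 512, 512, ts_value=110)  # sent at T2 = 110
    ack, echoed = receiver.send_ack()
    t3 = 180                                         # ACK arrives at T3 = 180
    print(ack - x, t3 - echoed)                      # acknowledged bytes, RTT
```

The sender needs no per-segment state for this measurement: it subtracts the echoed timestamp from its current clock when the ACK arrives.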

12. TCP Header Options (3/3)
 SACK permitted option (selective retransmissions):
The receiver tells the sender which blocks of data it has received, so that the sender can retransmit specifically the missing segments (due to loss).
The SACK does not change the meaning of the ACK field; if the SACK option is not supported by the sender, TCP will still function (less optimally though in case of retransmissions).

SACK-permitted option: 1=nop | 1=nop | kind=4 | len=2
SACK option: kind=5 | len | relative origin | block size

[Figure: Host 1 and Host 2.
SYN with SACK-permitted option; SYN ACK with SACK-permitted option; ACK.
(1) SEQ = X
(2) SEQ = X+512 (packet loss)
ACK=X+512
(3) SEQ = X+1024
(4) SEQ = X+1536 (packet loss)
ACK=X+512, SACK={X, X+511}, {X+1024, X+1535}
The receiver signals contiguous blocks of successfully received segments: {X, X+511} = {left edge of data block, right edge of data block}.
(5) SEQ = X+512 and (6) SEQ = X+1536: selective retransmissions of the missing segments.]
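How a sender might derive the segments to retransmit from such an ACK can be sketched as follows. The helper `missing_ranges` is hypothetical; it uses inclusive right edges as drawn on the slide (the SACK specification RFC2018 encodes the right edge as the sequence number following the block), and `next_seq`, the first byte not yet sent, is an assumed input.

```python
def missing_ranges(ack, sack_blocks, next_seq):
    """Return the byte ranges (inclusive) that are still missing at the receiver.

    ack:         cumulative ACK number (next byte the receiver expects)
    sack_blocks: list of (left_edge, right_edge) pairs, edges inclusive
    next_seq:    sequence number of the first byte not yet sent
    """
    missing = []
    cursor = ack
    for left, right in sorted(sack_blocks):
        if right < cursor:
            continue  # block is already covered by the cumulative ACK
        if left > cursor:
            missing.append((cursor, left - 1))  # hole in front of this block
        cursor = max(cursor, right + 1)
    if cursor < next_seq:
        missing.append((cursor, next_seq - 1))  # sent, but not reported yet
    return missing


if __name__ == "__main__":
    x = 0
    holes = missing_ranges(x + 512, [(x, x + 511), (x + 1024, x + 1535)], x + 2048)
    print(holes)
```

For the exchange on the slide (with X = 0) this yields the two 512 byte holes starting at X+512 and X+1536, which correspond to retransmissions (5) and (6).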

13. TCP Throughput / performance considerations („goodput“) (1/2)
 TCP is delay sensitive!
max. throughput = Ws / RTT [Padhye 98]
This is not an exact formula and it is valid only for average RTT values.
The maximum throughput is bound by the window size Ws and decreases with increased RTT (=delay).
TCP as such is NOT suited for networks with long delays (e.g. satellite and interplanetary links).
 TCP is not independent of the underlying network (as should be the case in theory)!
TCP was designed to run over wired networks (low Bit Error Rate BER, packet loss mostly due to congestion).
TCP performs badly on radio links (high BER, packet loss due to errors).
On wired networks the sender should slow down in case of packet loss (caused by congestion) in order to alleviate the problem.
In case of packet loss on a radio link the sender should try harder instead of slowing down. Slowing down just further decreases the throughput.
How to handle a TCP connection that spans a wired and a radio link?
 „Split TCP“: Effectively 2 separate TCP connections interconnected by the base station passing TCP payload data between the connections.
Each of the TCP connections is optimized for its respective use.
Example: TCP accelerator devices for satellite links.
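The Ws / RTT bound is easy to evaluate numerically. The sketch below applies it to an unscaled 65535 byte window; the RTT values of 40 ms and 500 ms are assumptions chosen for illustration, and `max_throughput_bps` is a hypothetical helper name.

```python
def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput in bit/s: one full window per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


if __name__ == "__main__":
    for rtt_ms in (40, 500):
        mbps = max_throughput_bps(65535, rtt_ms) / 1_000_000
        print(f"RTT {rtt_ms} ms: at most {mbps:.2f} Mbit/s")
```

With an RTT of 40 ms the bound is about 13.1 Mbit/s regardless of how fast the link is, and at 500 ms it drops to roughly 1 Mbit/s, which is why long delay paths need larger windows.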

13. TCP Throughput / performance considerations („goodput“) (2/2)
TCP adaptations for radio (satellite) links as per RFC2488, RFC3481:
a. Data link protocol:
Use data link protocols that do not do retransmissions and flow control, or make sure that the retransmission mechanism and flow control of the data link do not negatively affect TCP performance.
b. Path MTU discovery:
Path MTU discovery enables TCP to use maximum sized segments without the cost of fragmentation/reassembly.
Larger segments allow the congestion window to be more rapidly increased (because the slow start mechanism is segment and not byte based).
c. Forward Error Correction:
TCP assumes that packet loss is always due to network congestion and not due to bit errors.
On radio links (satellite, microwave) this is not the case – packet loss is primarily due to bit errors.
The congestion recovery algorithm of TCP is time consuming since it only gradually recovers from packet loss.
The solution is to use FEC to avoid false signals as much as possible.
FEC means that the receiver can detect and correct bit errors in the data based on a FEC code.
d. Selective ACKs (SACK):
The TCP algorithms „Fast Retransmit“ and „Fast Recovery“ may considerably reduce TCP throughput (because after a packet loss TCP gingerly probes the network for available bandwidth; in a radio environment this is the wrong strategy).
To overcome this use the selective ACKs (SACK) option to selectively acknowledge segments.
e. Window scale option:
Long delay in fast transmission links limits TCP throughput because large amounts of data are underway but TCP's sliding window only allows up to 65535 bytes to be „in the air“.
To overcome this use the window scaling option thus increasing the size of the sliding window.

14. A last word on „guaranteed delivery“
 TCP provides guaranteed transmission between 2 APs. That's it.
There are still zillions of reasons why things can fail!
Thus it is still up to the application to cater for application error detection and error handling (see RFC793 chapter 2.6 p.8).
TCP delivers data safely and error-free between APs (not less, not more).

[Figure: three scenarios between Host 1 and Host 2, both half-duplex connections established. In each, (1) SEQ = X is sent and acknowledged with ACK=X+512, yet the data is lost after TCP has delivered it:
Data gets lost/corrupted on its way to the application process (AP).
Data gets safely to the application's storage, but then is overwritten (by some other process, the app itself, hardware failure ...).
The application starts a transaction to store the data into a database, but the transaction fails.]
