TIME-WAIT Hack

For High-Performance Ephemeral Connections in the Linux TCP Stack
E A Faisal
eafaisal@nexoprima.com

$ whoami
Engku Ahmad Faisal



github.com/efaisal
twitter.com/efaisal
facebook.com/eafaisal
plus.google.com/u/0/+EAFaisal

Linux user since 1996/1997
Attempted to contribute to open source projects:
few accepted, most rejected ;-P

$ whoami
Worked with Nexo Prima Sdn Bhd
● Open Source Cloud Infrastructure

Virtualisation: oVirt/OpenStack
Storage: Gluster/Ceph

● High Availability & Scalability Infrastructure

Linux-based solutions

● System Performance Tuning & Profiling

Focusing on web-based applications on the Linux platform

TCP STATE MACHINE

TCP :: ACTIVE CLOSE
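[Diagram: TCP state transitions for an active close, labelled event/response]

(3-way handshake) -> ESTABLISHED
ESTABLISHED -> FIN_WAIT_1 : close()/fin
FIN_WAIT_1 -> FIN_WAIT_2 : ack/-
FIN_WAIT_1 -> CLOSING : fin/ack
FIN_WAIT_1 -> TIME_WAIT : fin+ack/ack
FIN_WAIT_2 -> TIME_WAIT : fin/ack
CLOSING -> TIME_WAIT : ack/-
TIME_WAIT -> CLOSED : 2MSL Timeout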

TCP :: ACTIVE CLOSE
● By the initiator of close()
● TIME-WAIT & 2MSL are there for good reasons:

due to the nature of the Internet: packets get lost, are retransmitted, or arrive late
to ensure the other end has closed properly

● RFC 793 takes MSL to be 2 minutes, so 2MSL should be 4 minutes
● 2MSL:

MS Windows - 4 minutes
Linux - 1 minute (hard coded)

TIME-WAIT is good for TCP communication over the Internet

TCP :: PASSIVE CLOSE
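[Diagram: TCP state transitions for a passive close, labelled event/response]

(3-way handshake) -> ESTABLISHED
ESTABLISHED -> CLOSE_WAIT : fin/ack
CLOSE_WAIT -> LAST_ACK : close()/fin
LAST_ACK -> CLOSED : ack/-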
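
The two close paths in the diagrams can be written down as a small transition table. The following is only an illustrative sketch of the diagrams, not kernel code; the event names (`close()`, `fin`, `ack`, `fin+ack`, `2msl_timeout`) and the `walk` helper are made up for this example.

```python
# (current state, event) -> (next state, segment sent in response or None)
# Event names are labels chosen for this sketch, mirroring the diagram edges.
TRANSITIONS = {
    # Active close: this side calls close() first
    ("ESTABLISHED", "close()"): ("FIN_WAIT_1", "fin"),
    ("FIN_WAIT_1", "ack"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "fin"): ("CLOSING", "ack"),
    ("FIN_WAIT_1", "fin+ack"): ("TIME_WAIT", "ack"),
    ("FIN_WAIT_2", "fin"): ("TIME_WAIT", "ack"),
    ("CLOSING", "ack"): ("TIME_WAIT", None),
    ("TIME_WAIT", "2msl_timeout"): ("CLOSED", None),
    # Passive close: the peer's FIN arrives first
    ("ESTABLISHED", "fin"): ("CLOSE_WAIT", "ack"),
    ("CLOSE_WAIT", "close()"): ("LAST_ACK", "fin"),
    ("LAST_ACK", "ack"): ("CLOSED", None),
}


def walk(events, state="ESTABLISHED"):
    """Return the list of states visited while applying events in order."""
    path = [state]
    for event in events:
        state, _sent = TRANSITIONS[(state, event)]
        path.append(state)
    return path
```

Only the active closer ever passes through TIME_WAIT; the passive closer goes from LAST_ACK straight to CLOSED.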

TCP :: PASSIVE CLOSE
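● By the receiver of the FIN (the peer called close() first)
● CLOSE-WAIT

lasts until the local application calls close()
the 60-second timeout configurable via tcp_fin_timeout applies to orphaned FIN_WAIT_2 connections, not to CLOSE-WAIT or TIME-WAIT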

● WARNING!
Some resources on the Web wrongly tell their readers to tweak
tcp_fin_timeout to tune TIME-WAIT

WEB APPLICATION OF TODAY

SIMPLIFIED WEB APP STACK
[Diagram: Client -> Load Balancer -> Web App -> Cache / Database / MQ / REST API]

WEB APP STACK
● Supporting services for the Web App layer typically use TCP as the transport protocol
● Web App layer is both:

TCP server listening for connections from the client
TCP client connecting to various supporting services

● Consider a LAMP stack + memcached server

Each HTTP request opens a TCP connection to memcached
At the end of the request, the connection is closed
OMG! Ephemeral connection!

If we have more supporting services (MQ, REST API, etc.), there may be more open/close operations for each request
HTTP is considered ephemeral by “nature”

IMPACT AND PROBLEMS

BUSY SERVER WITH EPHEMERAL CONNECTIONS
● Busy server, e.g. 1,000 HTTP requests/second
● Web App layer also opens TCP connections to backend services at that rate or more
● Within 1 minute, we have tens of thousands of lingering TCP connections in TIME_WAIT
● You can check using the netstat or ss command
$ ss -nt state time-wait
$ netstat -tn | grep TIME_WAIT
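
The same count can be taken programmatically. The sketch below assumes the usual `/proc/net/tcp` layout (one header line, connection state as a hex code in the fourth column, `06` meaning TIME_WAIT); the function names are illustrative. It also shows the back-of-the-envelope estimate behind the numbers above: connections closed per second times the 60-second TIME-WAIT period.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text):
    """Count connections per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


def expected_time_wait(closes_per_second, time_wait_seconds=60):
    """Steady-state TIME_WAIT count if this host is always the active closer."""
    return closes_per_second * time_wait_seconds
```

On a live system, pass in the contents of `/proc/net/tcp` (IPv4 only; IPv6 sockets are listed in `/proc/net/tcp6`).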

PROBLEMS: CONNECTION TABLE SLOT
A connection in TIME-WAIT state holds a local port for 1 minute
Local port range is finite: a port number is a 16-bit integer
In many distros, the default range has around 30,000 ports
Can be changed: net.ipv4.ip_local_port_range
If the local port range is exhausted, any connect() fails with EADDRNOTAVAIL
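
To see how quickly the range runs out, here is a rough calculation. It is a sketch under simplifying assumptions: every connection goes to the same destination address and port, this host is always the active closer, and each port stays unavailable for the full 60 seconds. The range `32768 60999` used in the usage note is one common default, not a universal one, and the helper names are made up.

```python
def parse_port_range(text):
    """Parse the contents of /proc/sys/net/ipv4/ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def max_sustained_rate(low, high, time_wait_seconds=60):
    """Highest connections/second that never runs out of local ports.

    Assumes one destination (address, port) and that every closed
    connection keeps its local port busy for time_wait_seconds.
    """
    ports = high - low + 1  # the range is inclusive at both ends
    return ports / time_wait_seconds
```

With a range of `32768 60999` there are 28,232 usable ports, so anything above roughly 470 new connections per second to a single backend eventually hits EADDRNOTAVAIL.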

PROBLEMS: ADDITIONAL MEMORY & CPU USAGE
● Memory usage to hold socket structures

Not really significant, but annoying enough

● Additional CPU usage

Searching for a free port uses CPU
CPU cycles are wasted iteratively purging large numbers of TIME_WAIT connections

EXISTING & POTENTIAL SOLUTIONS

SOLUTION 1: tcp_tw_reuse
From Linux doc:
“Allow to reuse TIME-WAIT sockets for new connections when it is safe from
protocol viewpoint. Default value is 0. It should not be changed without
advice/request of technical experts.”
Commonly recommended to be enabled
$ echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
Depends on another kernel parameter being enabled: net.ipv4.tcp_timestamps
Does it really work?
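
Since tcp_tw_reuse depends on tcp_timestamps, it is worth checking both before expecting any effect. A minimal sketch, reading the values from `/proc/sys`; the helper names and the `root` parameter (there only so the check can be pointed at a test directory) are assumptions of this example.

```python
import os


def read_sysctl(name, root="/proc/sys"):
    """Return the raw value of a dotted sysctl name, or None if unreadable."""
    path = os.path.join(root, *name.split("."))
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def tw_reuse_prerequisites_met(root="/proc/sys"):
    """True when both tcp_tw_reuse and tcp_timestamps are non-zero.

    Any non-zero value counts as enabled here; some kernels accept
    values other than 1 for these parameters.
    """
    reuse = read_sysctl("net.ipv4.tcp_tw_reuse", root)
    timestamps = read_sysctl("net.ipv4.tcp_timestamps", root)
    return reuse not in (None, "0") and timestamps not in (None, "0")
```

Timestamps also have to be in use on the connection itself, so this only confirms the local configuration.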

SOLUTION 2: TIME-WAIT NEGOTIATION
Proposed by Theodore Faber, Joe Touch & Wei Yue from the University of Southern California in 1999
No code available; the authors claimed to have experimental code written for SunOS 4.1.3
Involves modifying TCP by adding a new TCP option called TW-Negotiate, negotiated during the three-way handshake
Not a viable solution, simply a theoretical one

INTRODUCING LINUXTCPTW

LINUXTCPTW
Implementation of an old idea

Making TIME-WAIT tunable was once discussed on the kernel core dev mailing list
Rejected by the kernel core devs: TIME-WAIT is there for good reasons
Easily abused to make TCP non-compliant with the standard
Open source project to create a patch set for the kernel for configurable TIME-WAIT
● Introduce a new kernel param - tcp_timewait_len
● A new entry in proc fs - /proc/sys/net/ipv4/tcp_timewait_len
● Able to use sysctl for configuration - net.ipv4.tcp_timewait_len
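
A stock kernel does not have this parameter, so a deployment script may want to detect whether the patched kernel is running before relying on it. A hedged sketch: the function name and `root` parameter are made up for illustration, and the value is returned as a raw string because its unit is not stated here.

```python
import os


def read_timewait_len(root="/proc/sys"):
    """Return the raw tcp_timewait_len value, or None on an unpatched kernel."""
    path = os.path.join(root, "net", "ipv4", "tcp_timewait_len")
    if not os.path.exists(path):
        return None  # entry absent: the LINUXTCPTW patch is not applied
    with open(path) as handle:
        return handle.read().strip()
```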

THE PROJECT
Project lives at https://github.com/efaisal/linuxtcptw/
Binary releases available for CentOS 6 and 7 at https://github.com/efaisal/linuxtcptw/releases
Unfortunately not battle-tested in a production environment yet - any volunteers?
Currently working on Ubuntu 14.04 LTS kernel

THANK YOU
