
When an application puts a socket into LISTEN state using the listen syscall, it needs to specify a
backlog for that socket. The backlog is usually described as the limit for the queue of incoming
connections.
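
For reference, this is roughly what specifying the backlog looks like from the application side. It
is only a minimal sketch: the helper name open_listener, the choice of the loopback address and the
error handling are assumptions made for illustration, not something prescribed by the listen
syscall.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to the loopback interface and
 * put it into LISTEN state with the given backlog.
 * Returns the socket descriptor, or -1 on failure. */
int open_listener(unsigned short port, int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);    /* port 0 lets the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {  /* the backlog is passed here */
        close(fd);
        return -1;
    }
    return fd;
}
```

A call such as open_listener(9999, 5) would request a backlog of 5 on port 9999. What that number
actually limits depends on the TCP implementation.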

Because of the 3-way handshake used by TCP, an incoming connection goes through an intermediate
state SYN RECEIVED before it reaches the ESTABLISHED state and can be returned by the accept syscall
to the application (see the part of the TCP state diagram reproduced above). This means that a
TCP/IP stack has two options to implement the backlog queue for a socket in LISTEN state:

1. The implementation uses a single queue, the size of which is determined by the backlog argument
   of the listen syscall. When a SYN packet is received, it sends back a SYN/ACK packet and adds the
   connection to the queue. When the corresponding ACK is received, the connection changes its state
   to ESTABLISHED and becomes eligible for handover to the application. This means that the queue
   can contain connections in two different states: SYN RECEIVED and ESTABLISHED. Only connections
   in the latter state can be returned to the application by the accept syscall.
2. The implementation uses two queues, a SYN queue (or incomplete connection queue) and an accept
   queue (or complete connection queue). Connections in state SYN RECEIVED are added to the SYN
   queue and later moved to the accept queue when their state changes to ESTABLISHED, i.e. when the
   ACK packet in the 3-way handshake is received. As the name implies, the accept call is then
   implemented simply to consume connections from the accept queue. In this case, the backlog
   argument of the listen syscall determines the size of the accept queue.

Historically, BSD-derived TCP implementations use the first approach. That choice implies that when
the maximum backlog is reached, the system will no longer send back SYN/ACK packets in response to
SYN packets. Usually the TCP implementation will simply drop the SYN packet (instead of responding
with a RST packet) so that the client will retry. This is what is described in section 14.5, listen
Backlog Queue, in W. Richard Stevens' classic textbook TCP/IP Illustrated, Volume 3.

Note that Stevens actually explains that the BSD implementation does use two separate queues, but
they behave as a single queue with a fixed maximum size determined by (but not necessarily exactly
equal to) the backlog argument, i.e. BSD logically behaves as described in option 1:

    The queue limit applies to the sum of [...] the number of entries on the incomplete connection
    queue [...] and [...] the number of entries on the completed connection queue [...].

On Linux, things are different, as mentioned in the man page of the listen syscall:

    The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the
    queue length for completely established sockets waiting to be accepted, instead of the number of
    incomplete connection requests. The maximum length of the queue for incomplete sockets can be
    set using /proc/sys/net/ipv4/tcp_max_syn_backlog.

This means that current Linux versions use the second option with two distinct queues: a SYN queue
with a size specified by a system-wide setting and an accept queue with a size specified by the
application.
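
To make the two-queue option more concrete, here is a toy model of it. This is not kernel code: the
struct and function names are hypothetical, only queue lengths are tracked, and the exact boundary
conditions used by the real implementation may differ from the simple comparisons shown here.

```c
#include <stdbool.h>

/* Toy model of a listening socket with two queues (all names are hypothetical). */
struct listener_model {
    int syn_queue_len;     /* connections in state SYN RECEIVED */
    int syn_queue_max;     /* system-wide limit, cf. tcp_max_syn_backlog */
    int accept_queue_len;  /* ESTABLISHED connections not yet accepted */
    int accept_queue_max;  /* backlog argument of the listen syscall */
};

/* SYN received: the connection enters the SYN queue if there is room. */
bool model_on_syn(struct listener_model *l)
{
    if (l->syn_queue_len >= l->syn_queue_max)
        return false;
    l->syn_queue_len++;
    return true;
}

/* Final ACK of the handshake received: move one connection from the SYN queue
 * to the accept queue, but only if the accept queue has room. */
bool model_on_ack(struct listener_model *l)
{
    if (l->syn_queue_len == 0 || l->accept_queue_len >= l->accept_queue_max)
        return false;
    l->syn_queue_len--;
    l->accept_queue_len++;
    return true;
}

/* accept syscall: consume one connection from the accept queue. */
bool model_on_accept(struct listener_model *l)
{
    if (l->accept_queue_len == 0)
        return false;
    l->accept_queue_len--;
    return true;
}
```

In this model the two limits are independent of each other, which is the property that distinguishes
option 2 from the single-queue behavior of option 1.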

The interesting question is now how such an implementation behaves if the accept queue is full and a
connection needs to be moved from the SYN queue to the accept queue, i.e. when the ACK packet of the
3-way handshake is received. This case is handled by the tcp_check_req function in
net/ipv4/tcp_minisocks.c. The relevant code reads:

    child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL);
    if (child == NULL)
        goto listen_overflow;

For IPv4, the first line of code will actually call tcp_v4_syn_recv_sock in net/ipv4/tcp_ipv4.c,
which contains the following code:

    if (sk_acceptq_is_full(sk))
        goto exit_overflow;

We see here the check for the accept queue. The code after the exit_overflow label will perform some
cleanup, update the ListenOverflows and ListenDrops statistics in /proc/net/netstat and then return
NULL. This will trigger the execution of the listen_overflow code in tcp_check_req:

    listen_overflow:
        if (!sysctl_tcp_abort_on_overflow) {
            inet_rsk(req)->acked = 1;
            return NULL;
        }

This means that unless /proc/sys/net/ipv4/tcp_abort_on_overflow is set to 1 (in which case the code
right after the code shown above will send a RST packet), the implementation basically does nothing!
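
The decision taken when the final ACK arrives can be condensed into a few lines. The following is a
simplified restatement of the logic described above, not the kernel code itself. The enum and
function names are made up for this sketch.

```c
#include <stdbool.h>

/* Possible outcomes when the final ACK of the handshake is processed
 * (hypothetical names, for illustration only). */
enum final_ack_outcome {
    FINAL_ACK_ACCEPTED,  /* connection moves to the accept queue */
    FINAL_ACK_IGNORED,   /* nothing happens; the request stays in the SYN queue */
    FINAL_ACK_RESET      /* a RST packet is sent to the client */
};

/* accept_queue_full corresponds to the sk_acceptq_is_full check,
 * abort_on_overflow to the tcp_abort_on_overflow setting. */
enum final_ack_outcome handle_final_ack(bool accept_queue_full, bool abort_on_overflow)
{
    if (!accept_queue_full)
        return FINAL_ACK_ACCEPTED;
    return abort_on_overflow ? FINAL_ACK_RESET : FINAL_ACK_IGNORED;
}
```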

To summarize, if the TCP implementation in Linux receives the ACK packet of the 3-way handshake and
the accept queue is full, it will basically ignore that packet. At first, this sounds strange, but
remember that there is a timer associated with the SYN RECEIVED state: if the ACK packet is not
received (or if it is ignored, as in the case considered here), then the TCP implementation will
resend the SYN/ACK packet (with a certain number of retries specified by
/proc/sys/net/ipv4/tcp_synack_retries and using an exponential backoff algorithm).

This can be seen in the following packet trace for a client attempting to connect (and send data) to
a socket that has reached its maximum backlog:

      0.000  127.0.0.1 -> 127.0.0.1  TCP 74 53302 > 9999 [SYN] Seq=0 Len=0
      0.000  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
      0.000  127.0.0.1 -> 127.0.0.1  TCP 66 53302 > 9999 [ACK] Seq=1 Ack=1 Len=0
      0.000  127.0.0.1 -> 127.0.0.1  TCP 71 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
      0.207  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
      0.623  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
      1.199  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
      1.199  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 6#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
      1.455  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
      3.123  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
      3.399  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
      3.399  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 10#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
      6.459  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
      7.599  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
      7.599  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 13#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
     13.131  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
     15.599  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
     15.599  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 16#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
     26.491  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
     31.599  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=0
     31.599  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 19#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=0
     53.179  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
    106.491  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
    106.491  127.0.0.1 -> 127.0.0.1  TCP 54 9999 > 53302 [RST] Seq=1 Len=0

Since the TCP implementation on the client side gets multiple SYN/ACK packets, it will assume that
the ACK packet was lost and resend it (see the lines with TCP Dup ACK in the above trace). If the
application on the server side reduces the backlog (i.e. consumes an entry from the accept queue)
before the maximum number of SYN/ACK retries has been reached, then the TCP implementation will
eventually process one of the duplicate ACKs, transition the state of the connection from SYN
RECEIVED to ESTABLISHED and add it to the accept queue. Otherwise, the client will eventually get a
RST packet (as in the sample shown above).
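
The spacing of the SYN/ACK retransmissions follows from the exponential backoff. As a rough
illustration, the following sketch computes the retransmission times for a timeout that doubles
after every attempt. The function name and the initial timeout passed to it are assumptions. The
real timer in the trace above does not follow this idealized schedule exactly.

```c
#include <stddef.h>

/* Idealized exponential backoff: write the time offsets (in seconds, relative
 * to the first SYN/ACK) of up to `retries` retransmissions into times[] and
 * return how many entries were written. Hypothetical helper, not kernel code. */
size_t synack_retransmit_times(double initial_timeout, int retries,
                               double *times, size_t capacity)
{
    double timeout = initial_timeout;
    double elapsed = 0.0;
    size_t n = 0;

    for (int i = 0; i < retries && n < capacity; i++) {
        elapsed += timeout;   /* wait for the current timeout to expire */
        times[n++] = elapsed;
        timeout *= 2.0;       /* double the timeout for the next attempt */
    }
    return n;
}
```

With an assumed initial timeout of 1 second and 5 retries, this yields retransmissions at 1, 3, 7,
15 and 31 seconds, which is close to the SYN/ACK packets seen at 1.199, 3.399, 7.599, 15.599 and
31.599 in the trace.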

The packet trace also shows another interesting aspect of this behavior. From the point of view of
the client, the connection will be in state ESTABLISHED after reception of the first SYN/ACK. If it
sends data (without waiting for data from the server first), then that data will be retransmitted as
well. Fortunately TCP slow-start should limit the number of segments sent during this phase.

On the other hand, if the client first waits for data from the server and the server never reduces
the backlog, then the end result is that on the client side, the connection is in state ESTABLISHED,
while on the server side, the connection is considered CLOSED. This means that we end up with a
half-open connection!

There is one other aspect that we didn't discuss yet. The quote from the listen man page suggests
that every SYN packet would result in the addition of a connection to the SYN queue (unless that
queue is full). That is not exactly how things work. The reason is the following code in the
tcp_v4_conn_request function (which does the processing of SYN packets) in net/ipv4/tcp_ipv4.c:

    /* Accept backlog is full. If we have already queued enough
     * of warm entries in syn queue, drop request. It is better than
     * clogging syn queue with openreqs with exponentially increasing
     * timeout.
     */
    if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) {
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
        goto drop;
    }

What this means is that if the accept queue is full, then the kernel will impose a limit on the rate
at which SYN packets are accepted. If too many SYN packets are received, some of them will be
dropped. In this case, it is up to the client to retry sending the SYN packet and we end up with the
same behavior as in BSD-derived implementations.
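
Reduced to its essence, the condition for dropping a SYN packet looks as follows. This is only a
restatement of the check quoted above with a hypothetical function name. The interpretation of
"young" in the comment is our reading of the kernel code and should be treated as an assumption.

```c
#include <stdbool.h>

/* Returns true if an incoming SYN would be dropped.
 * young_requests stands for the value of inet_csk_reqsk_queue_young(sk):
 * presumably the number of SYN queue entries for which the SYN/ACK has not
 * been retransmitted yet (assumption). */
bool should_drop_syn(bool accept_queue_full, int young_requests)
{
    return accept_queue_full && young_requests > 1;
}
```

Note that a full accept queue alone is not sufficient: as long as at most one young request is
pending, new SYN packets are still added to the SYN queue.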

To conclude, let's try to see why the design choice made by Linux would be superior to the
traditional BSD implementation. Stevens makes the following interesting point:

    The backlog can be reached if the completed connection queue fills (i.e., the server process or
    the server host is so busy that the process cannot call accept fast enough to take the completed
    entries off the queue) or if the incomplete connection queue fills. The latter is the problem
    that HTTP servers face, when the round-trip time between the client and server is long, compared
    to the arrival rate of new connection requests, because a new SYN occupies an entry on this
    queue for one round-trip time. [...]

    The completed connection queue is almost always empty because when an entry is placed on this
    queue, the server's call to accept returns, and the server takes the completed connection off
    the queue.

The solution suggested by Stevens is simply to increase the backlog. The problem with this is that
it assumes that an application is expected to tune the backlog not only taking into account how it
intends to process newly established incoming connections, but also as a function of traffic
characteristics such as the round-trip time. The implementation in Linux effectively separates these
two concerns: the application is only responsible for tuning the backlog such that it can call
accept fast enough to avoid filling the accept queue; a system administrator can then tune
/proc/sys/net/ipv4/tcp_max_syn_backlog based on traffic characteristics.
