You are on page 1of 51

CHEETAH: A High-Speed Load-Balancer Design with

Guaranteed Per-Connection-Consistency
Tom Barbette, Chen Tang, Haoran Yao, Gerald Q. Maguire Jr, Dejan Kostic,
Panagiotis Papadimitratos & Marco Chiesa

LB
Load-balancer

LB

Datacenter

2020-03-09 2
Load-balancer

Per-Connection-
Consistency
(PCC)

LB

2020-03-09 3
Load-balancer

Per-Connection- !!!
Consistency
(PCC)

LB

2020-03-09 4
Load-balancer

Per-Connection- !!!
Consistency
(PCC)

LB

2020-03-09 5
Uniform Load
Load-balancer Balancing

Per-Connection-
Consistency
(PCC)

LB

2020-03-09 6
Uniform Load
Load-balancer Balancing

Per-Connection-
Consistency
(PCC)

Efficient LB

2020-03-09 7
Uniform Load
Load-balancer Balancing

Per-Connection-
Consistency
(PCC)

Efficient LB

Dynamicity

2020-03-09 8
The challenge: Ensuring PCC

For each packet of an existing connection, the LB asks itself

« Which server is handling this connection? »


Per-Connection-Consistency

2020-03-09 9
Today’s solutions cannot ensure all
requirements at the same time

Uniform Efficient

PCC Dynamic

2020-03-09 10
Stateless solutions
ECMP, WCMP [EuroSys’14]
Efficient
Uniform

Dynamic
PCC

2020-03-09 11
Stateless solutions
Consistent hashing [STOC’97], Beamer [NSDI’18], Faild [NSDI’18]
Efficient
Uniform

PCC
Dynamic

2020-03-09 12
Stateful solutions
Silkroad [SIGCOMM’17], Ananta [SIGCOMM’13], Maglev [NSDI’16], Katran
Uniform
Efficient

PCC Dynamic

2020-03-09 13
CHEETAH

PCC « Which server is handling this connection? »

CHEETAH: ask the user to remember for us

2020-03-09 14
CHEETAH

key idea: store information about the load balancing


decisions into a cookie

support any realizable load-balancing logic

high resilience

amenable to simple and fast implementation

2020-03-09 15
CHEETAH IMPLEMENTATIONS

100G with 4 cores


FastClick P4_Tofino P4_16

→ 5x faster software processing time while guaranteeing PCC


→ twice better tail latency compared to hashing thanks to more
uniform load spreading

2020-03-09 16
CHEETAH

2020-03-09 17
CHEETAH Overview

selected
server 2
LB-logic

Users Servers
Load-Balancer

2020-03-09 18
CHEETAH Overview

selected
server 2
LB-logic

set cookie get server id

Users Servers
Load-Balancer

2020-03-09 19
CHEETAH Overview
Where to store the cookie?
• TCP timestamp
selected • QUIC connection ID
server 2 • IPv6 address
LB-logic

set cookie get server id

extract server id
2
Users Servers
Load-Balancer

2020-03-09 20
CHEETAH Overview

selected
Allows attackers to target a server… server 2
LB-logic

set cookie get server id

extract server id
2 was
Users server 2 Servers
Load-Balancer

2020-03-09 21
Stateless CHEETAH

selected
server
LB-logic

set cookie
AE7B
cookie = id XOR hash(4-tuple) get server id

extract server id
AE7B previous
Users server Servers
id = cookie XOR hash (4-tuple)
Load-Balancer
2020-03-09 22
Stateless CHEETAH

selected
AE7B server
LB-logic

set cookie
AE7B
cookie = id XOR hash(4-tuple) get server id

AE7B extract server id


previous
Users server Servers
id = cookie XOR hash (4-tuple)
Load-Balancer
2020-03-09 23
Stateless CHEETAH

selected
server 1
LB-logic

set cookie

cookie = id XOR hash(4-tuple) get server id


3
extract server id
previous
Users server Servers
id = cookie XOR hash (4-tuple)
Load-Balancer
9
2020-03-09 24
Stateless CHEETAH

selected
server
LB-logic

set cookie

cookie = id XOR hash(4-tuple) get server id

extract
The server can server id
directly set up the cookie
to enable previous
Users server Servers
Direct Server Return (DSR)
id = cookie XOR hash (4-tuple)
Load-Balancer
2020-03-09 25
There are two CHEETAHS

Stateless CHEETAH Stateful CHEETAH


→ Keep per-connection state
on the LB

NAT
Statistics
Rate limiter
...
2020-03-09 https://www.goodfreephotos.com 26
Stateful CHEETAH

cookie Flow index

O(1) lookup

Per-connection state table


hash Server ID NAT Statistics
… …
Stateful CHEETAH FEFE
11D7 DIP_3 1.2.3.4:5678 9 packets
A7A7 DIP_6 9.0.1.2:3456 90 packets
… …

2020-03-09 https://www.goodfreephotos.com 27
Stateful CHEETAH

Fast in software
cookie Flow index
Entirely doable from
O(1) lookup hardware dataplane

Per-connection state table


Stack of hash Server ID NAT Statistics
Empty indexes
O(1) insertion … …
39F0
1AB2
... 11D7 DIP_3 1.2.3.4:5678 9 packets
39F0
A7A7 DIP_6 9.0.1.2:3456 90 packets
… O(1) deletion
… …

2020-03-09 https://www.goodfreephotos.com 28
EVALUATION
Stateful at the price of stateless

2020-03-09 30
Packet processing performance analysis

2020-03-09 Requests for a 8K file to 64 servers using HTTP, 640 000 req/s 31
Packet processing performance analysis

5x

2020-03-09 Requests for a 8K file to 64 servers using HTTP, 640 000 req/s 32
Packet processing performance analysis

We have the advantages


of stateful classification at
the price of stateless:

PCC 5x
Dynamicity

Uniformity

2020-03-09 Requests for a 8K file to 64 servers using HTTP, 640 000 req/s 33
Packet processing performance analysis

2 to 3 times faster 2.8x


classification than a
cuckoo hash table

2020-03-09 Requests for a 8K file to 64 servers using HTTP, 640 000 req/s 34
EVALUATION
Reducing load imbalances
with arbitrary LB mechanisms

2020-03-09 35
Uniform workload:
Variance across server load with hashing

20-30% variance with


medium to high load

In line with Maglev


[NSDI’16] which shows
up to a ~30%
overprovision

2020-03-09 Requests for a 8K file using HTTP to 64 servers, at 100 to 200 req/s per servers 36
Uniform workload:
Variance across server load with Round-Robin (RR)

By using round-robin,
CHEETAH can bring
down the variance to
reach a near-uniform
load balancing

2020-03-09 Requests for a 8K file using HTTP to 64 servers, at 100 to 200 req/s per servers 38
Uniform workload:
Tail latency

3x

CHEETAH lowers the tail


latency by 2 to 3x 2x

2020-03-09 Requests for a 8K file using HTTP to 64 servers, at 100 to 200 req/s per servers 39
Bimodal workload:
Round-Robin (RR) does not help

2020-03-09 100req/s per servers with 10% of heavy request and 90% of small requests 40
Bimodal workload:
Least loaded

2020-03-09 100req/s per servers with 10% of heavy request and 90% of small requests 41
Bimodal workload:
Average Weighted Round-Robin and Power-of-2-choice

CHEETAH allows to
use your preferred
advanced server
selection techniques

a factor of ~2X

2020-03-09 100req/s per servers with 10% of heavy request and 90% of small requests 42
In the paper

Large-scale simulations
• 468 servers, motivation, # broken connections when
ensuring uniformity with hashing

More experiments LB
• PCC w/ dynamicity, comparison with Beamer [NSDI’18]

Details about the cookie encoding and limitations LB LB LB


• Other possible cookie encodings !!!

Multi-tier load-balancing considerations


• Stateful load balancers still break connections at scale!

2020-03-09 43
CHEETAH: stateful at the price of stateless

Exploited a network cookie to: github.com/cheetahlb


> Guarantee PCC and support any
implementable LB mechanism
> Present a fast design to allow O(1) flow
insertion and lookup from the dataplane
Cookie as a standard? includes
dataset and
Implemented on both software switches and
experiments
programmable Tofino ASIC
Tom Barbette
Thank you!

2020-03-09 This work is supported by SSF and ERC 44


Backup slides

2020-03-09 45
TESTBED

2020-03-09 46
Uniform workload experiment

SRIOV Machine 5
CPU 2 CPU 3 CPU 4 CPU 5 CPU 6

CPU 1
Machine 1 CPU 7 CPU 8 CPU 9 CPU 10 CPU 11

CPU 12 CPU 13 CPU1 4 CPU 15 CPU 16

Machine 2
LB Machine 6
Machine 3 Machine 7

Machine 8
Machine 4
8K HTTP requests
64 servers

2020-03-09 47
FULL STATEFUL DESIGN

2020-03-09 48
client-side CHEETAH stateful load balancer server-side

ConnTable[0]
IP src, VIP
id hash DIP
TCP src, dst
ConnStack[0] … … …
payload 39EF FEFE DIP_6
cookie
39F0 39F0 - -
1D15 39F1 A7A7 DIP_4
hashS(c) % m … … … …
... ...
ConnTable[m-1]
ConnStack[m-1]
id hash DIP
cookie
… … …
1A2B
39EF - -
39EF
39F0 3434 DIP_1

39F1 5656 DIP_3
… … …
client-side CHEETAH stateful load balancer server-side

LB-logic DIP_3
ConnTable[0]
IP src, VIP 11D7
hashS(c) id hash DIP
TCP src, dst
ConnStack[0] … … …
payload 39EF FEFE DIP_6
cookie
39F0
1D15 39F0
… 39F1 A7A7 DIP_4
hashS(c) % m … … … …
... ...
ConnTable[m-1]
ConnStack[m-1]
id hash DIP
cookie
… … …
1A2B
39EF - -
39EF
39F0 3434 DIP_1

39F1 5656 DIP_3
… … …
client-side CHEETAH stateful load balancer server-side

LB-logic DIP_3
ConnTable[0]
IP src, VIP 11D7
hashS(c) id hash DIP
TCP src, dst
ConnStack[0] … … …
payload IP src,
cookie 39EF FEFE DIP_6 DIP_3
39F0
1D15 39F0 11D7 DIP_3
TCP src, dst
… 39F1 A7A7 DIP_4
cookie q
hashS(c) % m … … … …
... ... payload
ConnTable[m-1]
ConnStack[m-1]
id hash DIP
cookie
… … …
1A2B
39EF - -
39EF
39F0 3434 DIP_1

39F1 5656 DIP_3
… … …
TS PARSING

2020-03-09 52
Implementation: storing the cookie
Where to store the cookie?
• TCP timestamp, QUIC connection-id, IPv6 addresses
• quickly extracting the TCP timestamp is key to high performance

Based on a 1-
hour KTH
packet trace
2020-03-09 53

You might also like