
Bandwidth partitioning

(jointly with R. Pan, C. Psounis, C. Nair, B. Yang,


L. Breslau and S. Shenker)
High Performance Switching and Routing, Telecom Center Workshop, Sept 4.
The Setup

• A congested network with many users

• Problems:
– allocate bandwidth fairly
– control queue size and hence delay

Approach 1: Network-centric

• Network node: fair queueing


• User traffic: any type
– Pros: perfect (short-timescale) fairness
– Cons:
  – depending on the number of users/flows, queues need to be added and removed dynamically
  – incremental addition of queues is not possible
  – in data centers, we may also want to support VM migrations, etc.
Approach 2: User-centric

• Network node: FIFO


• User traffic: responsive to congestion (e.g. TCP)
– Problem: requires user cooperation, and there are many TCP implementations!
• For example, if the red (unresponsive) source blasts away, it will get all of the link's bandwidth
• Question: Can we prevent a single source (or a small number of sources) from hogging all the bandwidth, without explicitly identifying the rogue source?
• We will deal with full-scale bandwidth partitioning later
Active Queue Management

• Solves problems with tail-drop:

– proactively indicates congestion
– no bias against bursty flows
– no synchronization effect

• Provides better QoS at the router:

– low steady-state delay
– less packet dropping

Random Early Detection (RED)

Arriving packet:
– If AvgQsize ≤ Minth: admit the new packet.
– Else if AvgQsize ≤ Maxth: drop the new packet with probability p (given by the RED dropping curve on the next slide), otherwise admit it.
– Else (AvgQsize > Maxth): drop the new packet.

RED Dropping Curve

[Figure: drop probability vs. average queue size. The drop probability is 0 below minth, rises linearly to maxp at maxth, and is 1 beyond maxth.]
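
A minimal sketch of the decision described in the two slides above (names are illustrative; avg_qsize would be RED's EWMA estimate of the queue length):

```python
import random

def red_admit(avg_qsize, minth, maxth, maxp):
    """Return True to admit the arriving packet, False to drop it."""
    if avg_qsize <= minth:
        return True                       # lightly loaded: always admit
    if avg_qsize > maxth:
        return False                      # heavily loaded: always drop
    # between the thresholds: drop with the linear probability from the curve
    p = maxp * (avg_qsize - minth) / (maxth - minth)
    return random.random() >= p
```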

What QoS does RED Provide?

• Lower buffer delay: good interactive service


– qavg is controlled and small

• With congestion-responsive flows: packet dropping is reduced
– early congestion indication allows traffic to throttle back before congestion becomes severe

• With responsive flows: fair bandwidth allocation, approximately and over long time scales
Simulation Comparison: The setup

[Topology: TCP sources S(1)…S(m) and UDP sources S(m+1)…S(m+n) connect to router R1 over 10 Mbps links; R1 connects to R2 over a 1 Mbps bottleneck link; TCP sinks D(1)…D(m) and UDP sinks D(m+1)…D(m+n) connect to R2 over 10 Mbps links.]
1 UDP source and 32 TCP sources

[Plot: UDP throughput (Kbps, 0–1000) vs. UDP arrival rate (Kbps, 100–10,000, log scale) under RED and under CHOKe.]
A Randomized Algorithm: First Cut

• Consider a single link shared by 1 unresponsive (red) flow and k distinct responsive (green) flows
• Suppose the buffer gets congested

• Observe: It is likely there are more packets from the red (unresponsive)
source
• So if a randomly chosen packet is evicted, it will likely be a red packet
• Therefore, one algorithm could be:
When the buffer is congested, evict a randomly chosen packet
Comments

• Unfortunately, this doesn't work, because there is a small non-zero chance of evicting a green packet

• Since green sources are responsive, they interpret the packet drop as a congestion signal and back off

• This only frees up more room for red packets
Randomized algorithm: Second attempt

• Suppose we choose two packets at random from the queue and compare their flow ids; if the ids agree, it is quite unlikely that both packets are green

• This suggests another algorithm:


Choose two packets at random and drop them both if their ids agree

• This works: That is, it limits the maximum bandwidth the red source
can consume
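
A rough calculation behind this claim (assuming the k green flows hold roughly equal shares of the buffer): if the red flow holds a fraction f of the buffered packets, the probability that two randomly drawn packets carry the same flow id is approximately

$$f^2 + k\left(\frac{1-f}{k}\right)^2 = f^2 + \frac{(1-f)^2}{k},$$

so for moderately large k a matching pair almost always belongs to the red flow, and the matched drops fall mainly on the unresponsive source.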

CHOKe

(the RED flowchart is shown alongside for comparison)

Arriving packet:
– If AvgQsize ≤ Minth: admit the new packet.
– Otherwise, draw a packet at random from the queue:
  – If its flow id is the same as the new packet's id: drop both matched packets.
  – Else if AvgQsize ≤ Maxth: drop the new packet with probability p (as in RED), otherwise admit it.
  – Else (AvgQsize > Maxth): drop the new packet.
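
A minimal sketch of one CHOKe admission decision, assuming a list-like queue whose entries expose a flow_id field (names, and the reuse of the linear RED curve, are illustrative):

```python
import random

def choke_enqueue(pkt, queue, avg_qsize, minth, maxth, maxp):
    """Decide the fate of an arriving packet under CHOKe."""
    if avg_qsize <= minth or not queue:
        queue.append(pkt)                 # below minth (or empty queue): admit
        return
    victim = random.choice(queue)         # draw a packet at random from the queue
    if victim.flow_id == pkt.flow_id:
        queue.remove(victim)              # flow ids match: drop both packets
        return                            # (the arriving packet is not enqueued)
    if avg_qsize > maxth:
        return                            # above maxth: drop the new packet
    # between the thresholds: fall back to the RED drop probability
    p = maxp * (avg_qsize - minth) / (maxth - minth)
    if random.random() >= p:
        queue.append(pkt)
```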

Simulation Comparison: The setup

[Topology, as before: TCP sources S(1)…S(m) and UDP sources S(m+1)…S(m+n) connect to router R1 over 10 Mbps links; R1 connects to R2 over a 1 Mbps bottleneck link; the corresponding sinks D(1)…D(m+n) connect to R2 over 10 Mbps links.]
1 UDP source and 32 TCP sources

[Plot: UDP throughput (Kbps, 0–1000) vs. UDP arrival rate (Kbps, 100–10,000, log scale) under RED and under CHOKe.]
A Fluid Analysis

[Figure: the queue modeled as a permeable tube with leakage; discards from the queue leak out along the tube.]
Setup
[Figure: a flow-i packet crossing location t in the tube, with positions running from 0 to D; discards leak out of the queue along the way.]

– N: the total number of packets in the buffer
– λi: the arrival rate of flow i
– Li(t): the rate at which flow i packets cross location t
The Equation

$$L_i(t) - L_i(t+\Delta t) = \lambda_i \,\Delta t \,\frac{L_i(t)}{N}
\qquad\Longrightarrow\qquad
\frac{dL_i(t)}{dt} = -\,\lambda_i\,\frac{L_i(t)}{N}$$

Boundary conditions:

$$L_i(0) = \lambda_i\,(1 - p_i), \qquad L_i(D) = \lambda_i\,(1 - 2p_i), \qquad p_i = \frac{1}{N}\int_0^{D} L_i(t)\,dt$$
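
Solving the ODE gives L_i(t) = λi (1 − p_i) e^(−λi t / N); substituting either boundary condition then yields a closed form for p_i. A minimal sketch, treating N and D as given parameters (in the full analysis they are themselves determined by the link capacity and the arrival rates):

```python
import math

def choke_fluid(lam_i, N, D):
    """Fluid-model matching probability p_i and departure rate L_i(D)
    for a flow with arrival rate lam_i, buffer occupancy N, tube length D."""
    a = lam_i * D / N                                   # exponent in L_i(t) = L_i(0) * exp(-lam_i * t / N)
    p_i = (1.0 - math.exp(-a)) / (2.0 - math.exp(-a))   # from L_i(D) = lam_i * (1 - 2 * p_i)
    departure_rate = lam_i * (1.0 - 2.0 * p_i)          # flow i's throughput out of the queue
    return p_i, departure_rate
```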

Simulation Comparison: 1UDP, 32 TCPs

[Plot: throughput (0–350) vs. arrival rate (0.1–10), comparing the fluid-model prediction with the CHOKe ns simulation.]
Complete bandwidth partitioning

• We have just seen how to prevent a small number of sources from hogging all the bandwidth

• However, this is far from ideal fairness


– What happens if we use a bit more state?

Our approach: Exploit power laws
• Most flows are very small (mice), while most of the bandwidth is consumed by a few large (elephant) flows: so simply partition the bandwidth amongst the elephant flows

• New problem: quickly (and automatically) identify the elephant flows and allocate bandwidth to them
Detecting large (elephant) flows

• Detection:
– Flip a coin with bias p (= 0.1, say) for heads on each arriving packet, independently from packet to packet.
– A flow is “sampled” if one of its packets gets a head

T T T T H T H



• A flow of X packets has roughly a 0.1X chance of being sampled
– flows with fewer than 5 packets are sampled with probability below 0.5
– flows with more than 10 packets are sampled with probability close to 1
• Most mice will not be sampled, most elephants will be (see the sketch below)
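
A minimal sketch of this sampling step (names are illustrative; the flow table here is simply the set of flows already caught by a head):

```python
import random

SAMPLE_PROB = 0.1          # coin bias p for "heads"
flow_table = set()         # flows detected as elephants so far

def on_packet(flow_id):
    """Flip a biased coin for each arriving packet; the first head for a
    flow inserts it into the flow table. Returns True once the flow is
    being tracked as an elephant."""
    if flow_id not in flow_table and random.random() < SAMPLE_PROB:
        flow_table.add(flow_id)
    return flow_id in flow_table
```

A flow of X packets escapes detection with probability (1 − p)^X, so it is sampled with probability 1 − (1 − p)^X ≈ 0.1X for small X, which is where the rough 0.1X figure on the slide comes from.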

The AFD Algorithm
[Figure: the data buffer alongside a flow table that stores a drop probability Di for each tracked flow.]

• AFD is a randomized algorithm
– joint work with Rong Pan, Lee Breslau and Scott Shenker
– currently being ported onto Cisco’s core router (and other) platforms
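
The slides do not spell out AFD's drop rule; as a rough sketch of the differential-dropping idea behind AFD (assumed here from the published AFD work, not from these slides, and not Cisco's implementation): each flow i in the flow table has a measured arrival amount m_i per measurement interval, and its drop probability Di is set so that the admitted amount matches a common fair share m_fair, which is in turn adapted from the queue length.

```python
def afd_drop_prob(m_i, m_fair):
    """Differential dropping: pick D_i so that m_i * (1 - D_i) ~= m_fair.

    m_i    -- measured bytes of flow i in the last measurement interval
    m_fair -- current fair share, adapted over time from the queue length
    """
    if m_i <= m_fair:
        return 0.0                  # flows at or below the fair share are not dropped
    return 1.0 - m_fair / m_i       # heavier flows are clipped down to the fair share
```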

AFD vs. WRED
Test 1: TCP Traffic (1Gbps, 4 Classes)

[Figure: offered load per class over the 60-second test, one panel per class (weights 1, 2, 3 and 4); the per-class flow counts vary among 160, 400 and 1600 flows over time.]
TCP Traffic: Throughputs Under WRED

[Plot: per-class throughputs; throughput ratio 13:1.]

TCP Traffic: Throughputs Under AFD

[Plot: per-class throughputs; throughput ratio 2:1, as desired.]
AFD’s Implementation in IOS

• AFD is implemented as an IOS feature, directly on top of the IO driver
– Integration Branch: haw_t
– LABEL=V124_24_6

• Lines of code: 689
– Structure definition and initialization: 253
– Per-packet enqueue processing function: 173
– Background timer function: 263

• Tested on e-ARMS c3845 platform


Throughput Comparison vs. 12.5T IOS

[Plot: throughput (bps, up to 3.5e7) vs. time (sec) for precedence classes 1–5 under AFD and under the 12.5T IOS baseline.]
Scenario 2 (100Mbps Link) With Smaller Measurement Intervals
– the longer the measurement interval, the better the rate accuracy

[Plots: throughput (bps) vs. test time (sec) for precedence classes 1–5, comparing 12.5T and AFD with 48 ms and with 200 ms measurement intervals.]


AFD Tradeoffs
• There is no free lunch, and AFD does make a tradeoff
• AFD's tradeoff is as follows:
– by allowing bandwidth (rate) guarantees to be delivered over relatively larger time intervals, the algorithm achieves greater efficiency and a lower-cost implementation (e.g., lower-cost ASICs in hardware; lower instruction and memory-bandwidth overhead in software)
– what does "allowing bandwidth (rate) guarantees to be delivered over relatively larger time intervals" really mean?
– for example: if a traffic stream is guaranteed a rate of 10 Mbps, is that rate delivered over every time interval of 1 second, or over every interval of 100 milliseconds?
– if the time interval is larger, AFD is more efficient, but the traffic can be more bursty within the interval (see the worked example below)
– as link speeds go up, the time intervals over which AFD can be efficient become smaller and smaller
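
To make the slide's 10 Mbps example concrete, the amount of traffic the guarantee permits within one interval is

$$10\ \text{Mbps} \times 1\ \text{s} = 10\ \text{Mb} \approx 1.25\ \text{MB}, \qquad 10\ \text{Mbps} \times 100\ \text{ms} = 1\ \text{Mb} \approx 125\ \text{KB},$$

so with the 1-second interval the stream may deliver its share in bursts roughly ten times larger, which is cheaper for AFD to enforce but allows more short-term burstiness.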
