
Bandwidth partitioning

(jointly with R. Pan, C. Psounis, C. Nair, B. Yang,


L. Breslau and S. Shenker)
High Performance Switching and Routing, Telecom Center Workshop, Sept 4.
The Setup

• A congested network with many users

• Problems:
– allocate bandwidth fairly
– control queue size and hence delay

Approach 1: Network-centric

• Network node: fair queueing


• User traffic: any type
– Pros: perfect (short-timescale) fairness
– Cons:
  – depending on the number of users/flows, queues need to be added and removed dynamically
  – incremental addition of queues is not possible
  – in data centers, we may also want to support VM migrations, etc.
Approach 2: User-centric

• Network node: FIFO


• User traffic: responsive to congestion (e.g. TCP)
– Problem: requires user cooperation, and there are many TCP implementations!
• For example, if the red (unresponsive) source blasts away, it will get all of the link's bandwidth
• Question: Can we prevent a single source (or a small number of sources) from hogging all the bandwidth, without explicitly identifying the rogue source?
• We will deal with full-scale bandwidth partitioning later
Active Queue Management

• Solves problems with tail-drop:

– proactively indicates congestion
– no bias against bursty flows
– no synchronization effect

• Provides better QoS at the router:

– low steady-state delay
– less packet dropping

Random Early Detection (RED)

Arriving packet:
– If AvgQsize ≤ Minth: admit the new packet.
– Else if AvgQsize ≤ Maxth: drop the new packet with probability p (given by the RED dropping curve on the next slide), otherwise admit it.
– Else (AvgQsize > Maxth): drop the new packet.

RED Dropping Curve

[Figure: drop probability vs. average queue size. The drop probability is 0 below minth, rises linearly to maxp at maxth, and is 1 beyond maxth.]
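
A minimal sketch of the decision described in the two slides above (names are illustrative; avg_qsize would be RED's EWMA estimate of the queue length):

```python
import random

def red_admit(avg_qsize, minth, maxth, maxp):
    """Return True to admit the arriving packet, False to drop it."""
    if avg_qsize <= minth:
        return True                       # lightly loaded: always admit
    if avg_qsize > maxth:
        return False                      # heavily loaded: always drop
    # between the thresholds: drop with the linear probability from the curve
    p = maxp * (avg_qsize - minth) / (maxth - minth)
    return random.random() >= p
```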

What QoS does RED Provide?

• Lower buffer delay: good interactive service


– qavg is controlled and small

• With congestion-responsive flows: packet dropping is reduced
– early congestion indication allows traffic to throttle back before congestion becomes severe

• With responsive flows: fair bandwidth allocation, approximately and over long time scales
Simulation Comparison: The setup

[Topology: TCP sources S(1)…S(m) and UDP sources S(m+1)…S(m+n) connect to router R1 over 10 Mbps links; R1 connects to R2 over a 1 Mbps bottleneck link; TCP sinks D(1)…D(m) and UDP sinks D(m+1)…D(m+n) connect to R2 over 10 Mbps links.]
1 UDP source and 32 TCP sources

[Plot: UDP throughput (Kbps, 0–1000) vs. UDP arrival rate (Kbps, 100–10,000, log scale) under RED and under CHOKe.]
A Randomized Algorithm: First Cut

• Consider a single link shared by 1 unresponsive (red) flow and k distinct responsive (green) flows
• Suppose the buffer gets congested

• Observe: It is likely there are more packets from the red (unresponsive)
source
• So if a randomly chosen packet is evicted, it will likely be a red packet
• Therefore, one algorithm could be:
When the buffer is congested, evict a randomly chosen packet
Comments

• Unfortunately, this doesn't work, because there is a small non-zero chance of evicting a green packet

• Since green sources are responsive, they interpret the packet drop as a congestion signal and back off

• This only frees up more room for red packets
Randomized algorithm: Second attempt

• Suppose we choose two packets at random from the queue and compare their flow ids; if the ids agree, it is quite unlikely that both packets are green

• This suggests another algorithm:


Choose two packets at random and drop them both if their ids agree

• This works: That is, it limits the maximum bandwidth the red source
can consume
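
A rough calculation behind this claim (assuming the k green flows hold roughly equal shares of the buffer): if the red flow holds a fraction f of the buffered packets, the probability that two randomly drawn packets carry the same flow id is approximately

$$f^2 + k\left(\frac{1-f}{k}\right)^2 = f^2 + \frac{(1-f)^2}{k},$$

so for moderately large k a matching pair almost always belongs to the red flow, and the matched drops fall mainly on the unresponsive source.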

CHOKe

(the RED flowchart is shown alongside for comparison)

Arriving packet:
– If AvgQsize ≤ Minth: admit the new packet.
– Otherwise, draw a packet at random from the queue:
  – If its flow id is the same as the new packet's id: drop both matched packets.
  – Else if AvgQsize ≤ Maxth: drop the new packet with probability p (as in RED), otherwise admit it.
  – Else (AvgQsize > Maxth): drop the new packet.
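
A minimal sketch of one CHOKe admission decision, assuming a list-like queue whose entries expose a flow_id field (names, and the reuse of the linear RED curve, are illustrative):

```python
import random

def choke_enqueue(pkt, queue, avg_qsize, minth, maxth, maxp):
    """Decide the fate of an arriving packet under CHOKe."""
    if avg_qsize <= minth or not queue:
        queue.append(pkt)                 # below minth (or empty queue): admit
        return
    victim = random.choice(queue)         # draw a packet at random from the queue
    if victim.flow_id == pkt.flow_id:
        queue.remove(victim)              # flow ids match: drop both packets
        return                            # (the arriving packet is not enqueued)
    if avg_qsize > maxth:
        return                            # above maxth: drop the new packet
    # between the thresholds: fall back to the RED drop probability
    p = maxp * (avg_qsize - minth) / (maxth - minth)
    if random.random() >= p:
        queue.append(pkt)
```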

Simulation Comparison: The setup

[Topology, as before: TCP sources S(1)…S(m) and UDP sources S(m+1)…S(m+n) connect to router R1 over 10 Mbps links; R1 connects to R2 over a 1 Mbps bottleneck link; the corresponding sinks D(1)…D(m+n) connect to R2 over 10 Mbps links.]
1 UDP source and 32 TCP sources

[Plot: UDP throughput (Kbps, 0–1000) vs. UDP arrival rate (Kbps, 100–10,000, log scale) under RED and under CHOKe.]
A Fluid Analysis

[Figure: the queue modeled as a permeable tube with leakage; discards from the queue leak out along the tube.]
Setup
[Figure: a flow-i packet crossing location t in the tube, with positions running from 0 to D; discards leak out of the queue along the way.]

– N: the total number of packets in the buffer
– λi: the arrival rate of flow i
– Li(t): the rate at which flow i packets cross location t
The Equation

$$L_i(t) - L_i(t+\Delta t) = \lambda_i \,\Delta t \,\frac{L_i(t)}{N}
\qquad\Longrightarrow\qquad
\frac{dL_i(t)}{dt} = -\,\lambda_i\,\frac{L_i(t)}{N}$$

Boundary conditions:

$$L_i(0) = \lambda_i\,(1 - p_i), \qquad L_i(D) = \lambda_i\,(1 - 2p_i), \qquad p_i = \frac{1}{N}\int_0^{D} L_i(t)\,dt$$
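
Solving the ODE gives L_i(t) = λi (1 − p_i) e^(−λi t / N); substituting either boundary condition then yields a closed form for p_i. A minimal sketch, treating N and D as given parameters (in the full analysis they are themselves determined by the link capacity and the arrival rates):

```python
import math

def choke_fluid(lam_i, N, D):
    """Fluid-model matching probability p_i and departure rate L_i(D)
    for a flow with arrival rate lam_i, buffer occupancy N, tube length D."""
    a = lam_i * D / N                                   # exponent in L_i(t) = L_i(0) * exp(-lam_i * t / N)
    p_i = (1.0 - math.exp(-a)) / (2.0 - math.exp(-a))   # from L_i(D) = lam_i * (1 - 2 * p_i)
    departure_rate = lam_i * (1.0 - 2.0 * p_i)          # flow i's throughput out of the queue
    return p_i, departure_rate
```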

Simulation Comparison: 1UDP, 32 TCPs

[Plot: throughput (0–350) vs. arrival rate (0.1–10), comparing the fluid-model prediction with the CHOKe ns simulation.]
Complete bandwidth partitioning

• We have just seen how to prevent a small number of sources from hogging all the bandwidth

• However, this is far from ideal fairness


– What happens if we use a bit more state?

Our approach: Exploit power laws
• Most flows are very small (mice), while most of the bandwidth is consumed by a few large (elephant) flows: so simply partition the bandwidth amongst the elephant flows

• New problem: quickly (and automatically) identify the elephant flows and allocate bandwidth to them
Detecting large (elephant) flows

• Detection:
– Flip a coin with bias p (= 0.1, say) for heads on each arriving packet, independently from packet to packet.
– A flow is “sampled” if one of its packets gets a head

T T T T H T H



• A flow of X packets has roughly a 0.1X chance of being sampled
– flows with fewer than 5 packets are sampled with probability below 0.5
– flows with more than 10 packets are sampled with probability close to 1
• Most mice will not be sampled, most elephants will be (see the sketch below)
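
A minimal sketch of this sampling step (names are illustrative; the flow table here is simply the set of flows already caught by a head):

```python
import random

SAMPLE_PROB = 0.1          # coin bias p for "heads"
flow_table = set()         # flows detected as elephants so far

def on_packet(flow_id):
    """Flip a biased coin for each arriving packet; the first head for a
    flow inserts it into the flow table. Returns True once the flow is
    being tracked as an elephant."""
    if flow_id not in flow_table and random.random() < SAMPLE_PROB:
        flow_table.add(flow_id)
    return flow_id in flow_table
```

A flow of X packets escapes detection with probability (1 − p)^X, so it is sampled with probability 1 − (1 − p)^X ≈ 0.1X for small X, which is where the rough 0.1X figure on the slide comes from.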

The AFD Algorithm
[Figure: the data buffer alongside a flow table that stores a drop probability Di for each tracked flow.]

• AFD is a randomized algorithm
– joint work with Rong Pan, Lee Breslau and Scott Shenker
– currently being ported onto Cisco’s core router (and other) platforms
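
The slides do not spell out AFD's drop rule; as a rough sketch of the differential-dropping idea behind AFD (assumed here from the published AFD work, not from these slides, and not Cisco's implementation): each flow i in the flow table has a measured arrival amount m_i per measurement interval, and its drop probability Di is set so that the admitted amount matches a common fair share m_fair, which is in turn adapted from the queue length.

```python
def afd_drop_prob(m_i, m_fair):
    """Differential dropping: pick D_i so that m_i * (1 - D_i) ~= m_fair.

    m_i    -- measured bytes of flow i in the last measurement interval
    m_fair -- current fair share, adapted over time from the queue length
    """
    if m_i <= m_fair:
        return 0.0                  # flows at or below the fair share are not dropped
    return 1.0 - m_fair / m_i       # heavier flows are clipped down to the fair share
```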

AFD vs. WRED
Test 1: TCP Traffic (1Gbps, 4 Classes)

[Figure: offered load per class over the 60-second test, one panel per class (weights 1, 2, 3 and 4); the per-class flow counts vary among 160, 400 and 1600 flows over time.]
TCP Traffic: Throughputs Under WRED

[Plot: per-class throughputs; throughput ratio 13:1.]

TCP Traffic: Throughputs Under AFD

[Plot: per-class throughputs; throughput ratio 2:1, as desired.]
AFD’s Implementation in IOS

• AFD is implemented as an IOS feature, directly on top of the IO driver
– Integration Branch: haw_t
– LABEL=V124_24_6

• Lines of code: 689
– Structure definition and initialization: 253
– Per-packet enqueue processing function: 173
– Background timer function: 263

• Tested on e-ARMS c3845 platform


Throughput Comparison vs. 12.5T IOS

[Plot: throughput (bps, up to 3.5e7) vs. time (sec) for precedence classes 1–5 under AFD and under the 12.5T IOS baseline.]
Scenario 2 (100Mbps Link) With Smaller Measurement Intervals
– the longer the measurement interval, the better the rate accuracy

[Plots: throughput (bps) vs. test time (sec) for precedence classes 1–5, comparing 12.5T and AFD with 48 ms and with 200 ms measurement intervals.]


AFD Tradeoffs
• There is no free lunch, and AFD does make a tradeoff
• AFD's tradeoff is as follows:
– by allowing bandwidth (rate) guarantees to be delivered over relatively larger time intervals, the algorithm achieves greater efficiency and a lower-cost implementation (e.g., lower-cost ASICs in hardware; lower instruction and memory-bandwidth overhead in software)
– what does "allowing bandwidth (rate) guarantees to be delivered over relatively larger time intervals" really mean?
– for example: if a traffic stream is guaranteed a rate of 10 Mbps, is that rate delivered over every time interval of 1 second, or over every interval of 100 milliseconds?
– if the time interval is larger, AFD is more efficient, but the traffic can be more bursty within the interval (see the worked example below)
– as link speeds go up, the time intervals over which AFD can be efficient become smaller and smaller
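
To make the slide's 10 Mbps example concrete, the amount of traffic the guarantee permits within one interval is

$$10\ \text{Mbps} \times 1\ \text{s} = 10\ \text{Mb} \approx 1.25\ \text{MB}, \qquad 10\ \text{Mbps} \times 100\ \text{ms} = 1\ \text{Mb} \approx 125\ \text{KB},$$

so with the 1-second interval the stream may deliver its share in bursts roughly ten times larger, which is cheaper for AFD to enforce but allows more short-term burstiness.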
