
744 IEEE COMMUNICATIONS LETTERS, VOL. 24, NO. 4, APRIL 2020

A Dynamic Threshold Calculation for Congestion Notification in IEEE 802.1Qbb
Waqar Khurshid, Imran A. Khan, Miss L. M. Kiah, Osman Khalid, and Sajjad A. Madani

Abstract: Ethernet has been a backbone of data centers since the inception of cloud computing. The reason is its simplicity, cost effectiveness, wide existing base, and maturity. However, Ethernet is unreliable when it comes to providing guaranteed transmission. Consequently, the IEEE 802.1Qbb standard has been introduced to provide reliability based on the congestion notification (CN) mechanism of IEEE 802.3x. However, IEEE 802.1Qbb has some issues and limitations, such as higher buffer space reservation, lack of support for heterogeneous switching devices, and low throughput and efficiency of the network. This letter proposes a novel approach for a modified CN mechanism that provides 44 percent less space reservation for the CN mechanism, support for heterogeneity, higher throughput, and efficiency.

Index Terms: Congestion notification, IEEE 802.1Qbb, optimal threshold selection, reliable layer-2 infrastructure.

Manuscript received November 25, 2019; revised January 1, 2020; accepted January 1, 2020. Date of publication January 13, 2020; date of current version April 9, 2020. The associate editor coordinating the review of this letter and approving it for publication was M. Khabbaz. (Corresponding author: Waqar Khurshid.)
Waqar Khurshid is with the Department of Computer Science, COMSATS University Islamabad–Abbottabad, Abbottabad 22060, Pakistan, and also with HITEC University, Taxila 47080, Pakistan (e-mail: waqarkhurshid@cuiatd.edu.pk).
Imran A. Khan and Osman Khalid are with the Department of Computer Science, COMSATS University Islamabad–Abbottabad, Abbottabad 22060, Pakistan (e-mail: imran@cuiatd.edu.pk; osman@cuiatd.edu.pk).
Miss L. M. Kiah is with the Faculty of Computer Science and Technology, University of Malaya, Kuala Lumpur 50603, Malaysia (e-mail: misslaiha@um.edu.my).
Sajjad A. Madani is with the Department of Computer Science, COMSATS University Islamabad–Wah Cantt, Wah Cantt 47040, Pakistan (e-mail: madani@comsats.edu.pk).
Digital Object Identifier 10.1109/LCOMM.2020.2966198
1558-2558 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

In the past three decades, Ethernet has been the most dominant communication technology worldwide. Ethernet has also become increasingly ubiquitous as workplaces are more automated with Internet connectivity. Ethernet is usually considered unreliable for the majority of applications because the route between a source and a destination typically consists of multiple intermediate nodes, and there is inadequate feedback about buffer occupancy between intermediate hops [1], [2]. A reliable network has an inherent mechanism to provide reliability at the infrastructure level. However, upper layer protocols are responsible for reliability provision in Ethernet, which should not be the case, as the reliability mechanism in Ethernet should already exist at layer-2 [2], [3]. A network switch silently drops packets when it is faced with congestion, and the upper layers infer the packet drop incidence as network congestion. The sender's unawareness of such packet drops contributes towards the unreliability of Ethernet. Novel Ethernet faces problems in flows where reliability cannot be integrated in the upper layer protocols, like Fibre Channel over Ethernet (FCoE) or real-time applications that use User Datagram Protocol [2]. Hence, a hop-by-hop based congestion notification (CN) method is introduced in IEEE 802.1Qbb to provide reliability to such applications along with Priority-based Flow Control (PFC). This standard is not just responsible for reliability provision for real-time flows on a hop-by-hop basis, but it elevates the reliability of the infrastructure as well, and thus purges the need for upper layer protocols. In IEEE 802.1Qbb, layer-2 switches provide feedback regarding buffer conditions to the uplink switches when the buffer occupancy reaches a certain threshold limit. However, this threshold selection is based on worst-case delay scenarios, which leads to issues of higher buffer space reservation, limited heterogeneity of switching devices, and lower throughput and efficiency due to increased pause instigation [2].

This letter proposes a novel approach to a dynamic threshold estimation mechanism for IEEE 802.1Qbb instead of the worst-case threshold implementation. The following are the major contributions of the proposed work.
a) A dynamic buffer threshold calculation technique is devised that shows 44 percent less buffer space reservation for lossless operations
b) The proposed work provides support for heterogeneity of switching devices
c) Compared to existing work, higher throughput and greater efficiency are experienced in queuing and storing network packets

The efficiency of our approach in a real-world congested traffic scenario is extensively evaluated through simulations carried out in the widely used Network Simulator 2 (NS2) [4].

II. IEEE 802.3x AND IEEE 802.1Qbb

A CN mechanism can be either selective or non-selective. A selective CN technique stops only the flows that are causing congestion, while permitting the remaining flows to continue normally. On the contrary, a non-selective technique pauses all the traffic on the link when a pause signal is acknowledged by the sender [2].

In 1997, IEEE introduced a non-selective standard for CN to build a reliable layer-2 infrastructure, called IEEE 802.3x, where packet drops in the traditional IEEE 802.3 LANs could be tackled at layer-2 [5]. In this implementation, the receiver used a pause frame (a 64-byte MAC control frame) to convey the buffer occupancy feedback of the receiver to the upstream sender. This allows the sender to stop sending packets to the particular receiver [5]. The pause mechanism activates when the receiver's buffer occupancy reaches a certain limit, called the threshold. The pause frame contained the pause duration, which was defined in quanta. One quantum was equal to the time to transmit 512 bits at the current link speed onto the network. When the pause duration expires, the sender resumes


the transmission [1]. Moreover, the paused traffic can be resumed by sending a pause frame with a quanta value of zero (0). The popularity of IEEE 802.3x was short lived due to its inefficiency for real-time data and time-sensitive applications as compared to traditional IEEE 802.3 Ethernet. In IEEE 802.3x, once the receiver initiates a pause, the sender is not allowed to produce any traffic flow for that receiver, consequently stopping all the traffic, which might entail different Quality of Service (QoS) levels [1], [2].

In March 2008, IEEE authorized a CN standard with a selective CN mechanism along with priority realization. Consequently, IEEE 802.1Qbb (PFC) was proposed in 2010, and in June 2011 it became an active standard [2], [6]. IEEE 802.1Qbb was an improvement to the previous 802.3x pause frame. IEEE 802.1Qbb PFC supported class of service (CoS) for eight different traffic flows on a single link [6]. Moreover, 802.1Qbb PFC applied separate flow control for each CoS, and each class was prioritized on the basis of the flow represented by the specific class. Layer-2 switches maintained eight different queues, one for each CoS. When a queue reached the threshold, a pause frame was instigated for only that particular queue, and the initiating CoS was temporarily paused instead of all CoS. The threshold was selected on the basis of worst-case delay scenarios. The pause frame contained eight different values in quanta, one for each CoS. Each class had a 2-byte quanta value, so the maximum value was 65,535. Fig. 1 shows the format of the pause frame for IEEE 802.1Qbb PFC.

Fig. 1. IEEE 802.1Qbb pause frame format.

The concept of the initial constant congestion window (ICCW) was introduced in [7], based on injecting two different thresholds to reduce congestion. However, the work in [7] deals with Wireless Sensor Networks (WSN) and is based on end-to-end congestion control.

The selection of an optimal threshold has always been a challenge for CN mechanisms [2], [8]. Whenever a pause frame is sent by the receiver, it is most probable that the frame will face some delays before it can be received by the sender. During that delay, more traffic will be sent to the downstream receiver, which needs to be accommodated in the receiver's buffer. If the threshold for pause is set too high, then the receiver will not be able to accommodate the incoming traffic. On the contrary, if the threshold is set too low, then there is a possibility that some buffer space will not be utilized in an efficient manner [2]. To calculate the delay caused by the pause frame, the following delays should be considered in order to find the optimal threshold [8].
a) Receiver's maximum transmission unit delay
b) Receiver's transceiver latency
c) Pause frame's serialization delay
d) Pause frame's propagation delay
e) Sender's maximum transmission unit delay
f) Sender's transceiver latency
g) Sender's response time

The aforementioned delays can occur from pause instigation until the pause is actually initiated by the sender. During that time, more data will be sent to the receiver to be accommodated in the buffer. IEEE 802.1Qbb considers the worst cases of these delays. The delay value (DV) is as follows [2], [3].

$$\mathrm{DV} = 2(F_{\max}) + F_{\mathrm{PFC}} + 2(\mathrm{Prop}_c) + \mathrm{TX}_{dr} + \mathrm{RX}_{dr} + \mathrm{RX}_{ds} + \mathrm{TX}_{ds} + \mathrm{HD}_s. \tag{1}$$

Here, $2(F_{\max})$ is the MTU serialization delay of the sender and the receiver for the ongoing packet, $F_{\mathrm{PFC}}$ is the serialization delay of the pause frame, $2(\mathrm{Prop}_c)$ is the propagation delay of the cable for the sent pause frame and the response by the receiver, $\mathrm{TX}_{dr}$ and $\mathrm{RX}_{dr}$ are the transceiver latencies of the receiver, $\mathrm{RX}_{ds}$ and $\mathrm{TX}_{ds}$ are the transceiver latencies of the sender, and $\mathrm{HD}_s$ is the delay of the higher layer for pause frame creation.

In IEEE 802.1Qbb (PFC) there are 8 different CoS, and DV is calculated for each CoS separately because each CoS has its own buffer queue [3]. For any given device, the Interface Delay comprises both the receive and transmit paths (i.e., $\mathrm{ID} = \mathrm{RX}_d + \mathrm{TX}_d$) [3], resulting in:

$$\mathrm{DV} = 2(F_{\max}) + F_{\mathrm{PFC}} + 2(\mathrm{Prop}_c) + \mathrm{ID}_r + \mathrm{ID}_s + \mathrm{HD}_s. \tag{2}$$

As per IEEE 802.1Qbb, both the sender and the receiver are considered to be communicating on the same technology [3]. Therefore, we get:

$$\mathrm{DV} = 2(F_{\max}) + F_{\mathrm{PFC}} + 2(\mathrm{Prop}_c) + 2(\mathrm{ID}) + \mathrm{HD}_s. \tag{3}$$

The resulting DV in this scenario is calculated as:

$$\mathrm{DV} = 2(16160) + (672) + 2(5556) + 2(12288) + (25504) = 145384 \text{ bit times}. \tag{4}$$

This delay value is calculated as a worst-case scenario and is roughly equivalent to 18KB for each CoS [3]. IEEE 802.1Qbb assumes that the sender and the receiver are of the same technology; thus, the limitation for heterogeneity is not addressed. The current implementation of the standard in industry is still based on the DV derived from worst-case delays and is mainly employed by the Cisco Nexus series switches for datacenters.

Based on the calculations above, we are faced with three major concerns:
a) What will be the actual DV in a real-world scenario?
b) How to employ heterogeneity in a real-world scenario?
c) What are the effects of the actual DV on the performance and throughput of the network?
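As a brief illustration of (3) and (4), the following Python sketch expresses the worst-case delay value as a function and converts a delay value in bit times into buffer space. The function and constant names are hypothetical and are not part of IEEE 802.1Qbb. Note that evaluating (3) with the five operands exactly as they are printed in (4) gives 94,184 bit times, whereas the stated total of 145,384 bit times is the figure that corresponds to roughly 18 KB; the sketch therefore keeps the two values separate rather than assuming that they agree.

```python
def delay_value_bits(f_max, f_pfc, prop_c, interface_delay, hd_s):
    """Eq. (3): worst-case DV in bit times, same technology on both ends."""
    return 2 * f_max + f_pfc + 2 * prop_c + 2 * interface_delay + hd_s


def bits_to_kib(bits):
    """Convert a buffer size in bits to kibibytes (1 KiB = 1024 bytes)."""
    return bits / 8 / 1024


# Operands exactly as printed in (4), in bit times (names are hypothetical).
PRINTED_TERMS = dict(f_max=16160, f_pfc=672, prop_c=5556,
                     interface_delay=12288, hd_s=25504)

# Total stated in (4), taken as given rather than recomputed.
STATED_DV_BITS = 145384

if __name__ == "__main__":
    print("sum of printed operands:",
          delay_value_bits(**PRINTED_TERMS), "bit times")
    print("stated DV per CoS: %.2f KiB" % bits_to_kib(STATED_DV_BITS))
    print("stated DV, eight CoS: %.2f KiB" % bits_to_kib(8 * STATED_DV_BITS))
```

At 8 bits per byte, the stated total is about 17.7 KiB per CoS, in line with the 18KB quoted from [3].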


III. PROPOSED TECHNIQUE

To calculate the actual DV in a real-world scenario and to employ heterogeneity, we utilize dynamic calculation of delay values during the boot-up process of the network, as the spanning tree algorithm does. To calculate the actual delay value, we must first analyze each delay in (1) individually.

The parameter $2(F_{\max})$ in (1) is the serialization delay of the sender and the receiver. If the sender or the receiver has started transmitting a full-size frame, then the pause instigation has to wait for the serialization of the ongoing MTU, which cannot be predicted at network boot up. $F_{\mathrm{PFC}}$ is the time required by the receiver to create and serialize the pause frame, which also cannot be predicted at network boot up. However, $2(\mathrm{Prop}_c)$ is the delay of the cable to send the pause frame and receive the acknowledgement frame, and it is calculated dynamically at network boot up. The parameters $\mathrm{TX}_{dr}$, $\mathrm{RX}_{dr}$, $\mathrm{RX}_{ds}$, $\mathrm{TX}_{ds}$, and $\mathrm{HD}_s$ are computed at run time. The aforementioned delays are variable due to the length and speed of the link, the heterogeneity of network devices, and the switching fabrics of different devices, which makes such delays unavoidable and impossible to fix in advance. Therefore, to calculate the actual DV at network boot up, each switch creates and sends a pause frame to its upstream switch and waits for the response. Then, all the delays can be calculated from the round trip time (RTT) of the frame. So the RTT will be:

$$\mathrm{RTT}_{\mathrm{initial\,frame}} = 2(\mathrm{Prop}_c) + \mathrm{TX}_{dr} + \mathrm{RX}_{dr} + \mathrm{RX}_{ds} + \mathrm{TX}_{ds} + \mathrm{HD}_s. \tag{5}$$

Now the delay value can be calculated as:

$$\mathrm{DV} = 2(F_{\max}) + F_{\mathrm{PFC}} + \mathrm{RTT}_{\mathrm{initial\,frame}}. \tag{6}$$

Moreover, we can calculate the DV as per the IEEE 802.1Qbb maximum MTU and pause frame size as:

$$\mathrm{DV} = 2(16160) + (672) + (\mathrm{RTT}_{\mathrm{initial\,frame}} \times 512). \tag{7}$$

The maximum frame size in IEEE 802.1Qbb is 2000 octets + preamble (16160 bits). The PFC frame is 64 octets + preamble (672 bits), and there are 512 bits per quanta value for the RTT frame [3]. In (7), $\mathrm{RTT}_{\mathrm{initial\,frame}}$ is a variable that depends on the interface delays of the switching devices, the length of the wire, and the bandwidth of the network. The dynamic $\mathrm{RTT}_{\mathrm{initial\,frame}}$ calculation lifts the restriction that both switches employ the same technology, as each switch can contribute its own delay values when the initial frame is processed in real time. The DV is the total number of bits required in the buffer before initiating a pause frame. Therefore, the threshold for pause instigation should be calculated as given in (8).

$$\mathrm{Threshold} = \mathrm{BufferSpace}_{(\mathrm{PerLinkInterface})} - \mathrm{DV}. \tag{8}$$

Pseudocode for calculating $\mathrm{RTT}_{\mathrm{initial\,frame}}$ on each port of the switch is expressed in Algorithm 1.

Algorithm 1 Compute RTT of the Initial Frame
Output: RTT
    initialize: frame = 64 byte initial frame (min MTU);
    initialize: time = current system time;
    initialize: t = 0;
    initialize: q = time required to inject 512 bits on link;
    send frame to upstream link;
    while response frame not received do
        wait q (one quantum);
    end while
    t = current system time - time;
    RTT = t / q;
    return RTT;

IV. EXPERIMENTAL RESULTS

In this section, the performance of the proposed technique is analyzed. For the simulation environment, NS2 was used to investigate the effects of the proposed technique on buffer space reservation, end-to-end delay, throughput, and buffer utilization. The results are also compared with the previous standard, IEEE 802.3x, which employs a non-selective CN technique. The simulated environment consists of 10Gbps links, and a primary link is created as a bottleneck to create the back-pressure needed to trigger the congestion notification mechanism. The network was purposefully overwhelmed with traffic to depict the peak load scenario where this technique comes into play in order to achieve lossless operation.

The proposed technique utilized 44 percent less buffer space to achieve lossless operation as compared to the technique proposed in IEEE 802.1Qbb. On a single port, our technique reserved 81KB for each CoS as compared to IEEE 802.1Qbb, which reserves 144KB for each CoS. This sums up to <4MB of space reservation on a 48-port switch in comparison to IEEE 802.1Qbb, which reserves approx. 7MB of buffer space, as shown in Fig. 2(a). The results for IEEE 802.3x are not shown because it does not reserve buffer space, as it stops traffic for all flows with a fixed buffer value that can be set as required.

Fig. 2(b) shows the end-to-end delay comparison of all three techniques for 8 different CoS, as IEEE 802.1Qbb supports only up to 8 different CoS. It can be seen that the proposed technique has less end-to-end delay as compared to the others. Moreover, the end-to-end delay for CoS 4 and above is much higher in the proposed technique and IEEE 802.1Qbb because these flows are low priority flows. In order to accommodate the high priority flows, the low priority flows suffer, which results in higher end-to-end delay. IEEE 802.3x does not support multiple flows; therefore, all the flows, including high priority flows, have a uniform end-to-end delay. This results in 33 percent higher delays as compared to the proposed technique and IEEE 802.1Qbb.

Fig. 2(c) shows the average throughput of the bottleneck link. The proposed technique and IEEE 802.1Qbb both outperformed IEEE 802.3x because IEEE 802.3x stops all the flows at the same time, and when the flows are reinitiated, it encounters a delay that results in low utilization of the link during that delay. The proposed technique achieved 1 percent higher throughput because it reserves less buffer space as compared to IEEE 802.1Qbb, which leaves more space available at the receiver to accommodate more data instead of initiating the pause, and thus results in better throughput.

The results in Fig. 2(d) show that the proposed technique utilizes the buffer much more efficiently than the others. This is


due to the fact that the proposed technique does not reserve extra space for the implementation of the pause operation. Here, extra space refers to the space reserved by IEEE 802.1Qbb on the basis of worst-case delays, while the proposed technique reserves space based on the actual delay; thus, the difference between the space reserved by IEEE 802.1Qbb and by the proposed technique is referred to as extra space. The proposed technique and 802.1Qbb make use of the buffer more proficiently than IEEE 802.3x because the buffer space is divided into eight queues, one for each flow. Consequently, when the PFC mechanism is triggered, only the relevant data is paused, while the other flows transmit normally.

Fig. 2. Comparison of result parameters.

V. CONCLUSION

In this study, we proposed a novel technique to dynamically calculate the threshold for congestion notification for the IEEE 802.1Qbb standard. The current implementation of IEEE 802.1Qbb employs the congestion notification mechanism based on worst-case delays. To resolve the issue, an algorithm is devised to calculate the threshold dynamically in order to avoid the worst-case based delay threshold. Also, the proposed scheme was implemented in extensive simulations and was evaluated for its performance.

ACKNOWLEDGMENT

The authors would like to thank Dr. C. Desanti for his contribution towards a better understanding of IEEE 802.1Qbb.

REFERENCES

[1] Priority-Based Flow Control, IEEE Standard 802.1Qbb, 2011. Accessed: May 12, 2019. [Online]. Available: http://www.ieee802.org/1/pages/802.1bb.html
[2] W. Khurshid, M. L. M. Kiah, I. A. Khan, R. Salleh, A. T. Chronopoulos, and S. A. Madani, “Comparative study of congestion notification techniques for hop-by-hop-based flow control in data centre Ethernet,” IET Netw., vol. 7, no. 4, pp. 248–257, Jul. 2018.
[3] IEEE Draft Standard for Local and Metropolitan Area Networks–Virtual Bridged Local Area Networks–Amendment: Priority-Based Flow Control, Standard IEEE P802.1Qbb/D2.3, (DRAFT Amendment to IEEE Std 802.1Q-2005), Sep. 2010, pp. 1–40. [Online]. Available: https://ieeexplore.ieee.org/document/5570062
[4] T. Issariyakul and E. Hossain, “Introduction to network simulator 2 (NS2),” in Introduction to Network Simulator NS2. Boston, MA, USA: Springer, 2012, pp. 21–40.
[5] IEEE Standards for Local and Metropolitan Area Networks: Supplements to Carrier Sense Multiple Access With Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications–Specification for 802.3 Full Duplex Operation and Physical Layer Specification for 100 Mb/s Operation on Two Pairs of Category 3 Or Better Balanced Twisted Pair Cable (100BASE-T2), IEEE Standard 802.3-1997, 1997, pp. 1–324.
[6] IEEE Standard for Local and Metropolitan Area Networks–Media Access Control (MAC) Bridges and Virtual Bridged Local Area Networks–Amendment 17: Priority-based Flow Control, IEEE Standard 802.1Qbb-2011 (Amendment to IEEE Standard 802.1Q-2011 as Amended by IEEE Standard 802.1Qbe-2011 and IEEE Standard 802.1Qbc-2011), 2011, pp. 1–40.
[7] N. Aslam, K. Xia, A. Ali, and S. Ullah, “Adaptive TCP-ICCW congestion control mechanism for QoS in renewable wireless sensor networks,” IEEE Sensors Lett., vol. 1, no. 6, pp. 1–4, Dec. 2017.
[8] IEEE Standard for Local and Metropolitan Area Networks–Virtual Bridged Local Area Networks Amendment 13: Congestion Notification, IEEE Standard 802.1Qau-2010 (Amendment to IEEE Standard 802.1Q-2005), 2010, pp. 1–135.
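As an illustrative sketch of Algorithm 1 together with (7) and (8), the following Python code measures the round trip time in quanta and derives the pause threshold from it. It is a minimal sketch under stated assumptions, not the NS2 implementation used for the results: `send_frame` and `response_received` are hypothetical callables standing in for the switch port, and the clock and sleep functions are injectable so that the logic can be exercised without hardware. A software sleep cannot resolve a single quantum (51.2 ns at 10 Gb/s), so a real switch would presumably perform this wait in hardware.

```python
import time

QUANTUM_BITS = 512      # bits per pause quantum
MAX_FRAME_BITS = 16160  # 2000 octets + preamble, as used in (7)
PFC_FRAME_BITS = 672    # 64 octets + preamble, as used in (7)


def measure_rtt_quanta(send_frame, response_received, link_bps,
                       clock=time.monotonic, sleep=time.sleep):
    """Algorithm 1: RTT of the initial frame, expressed in quanta."""
    q = QUANTUM_BITS / link_bps  # seconds needed to inject 512 bits
    start = clock()
    send_frame()                 # send the initial frame upstream
    while not response_received():
        sleep(q)                 # wait one quantum, then poll again
    return (clock() - start) / q


def delay_value_bits(rtt_quanta):
    """Eq. (7): DV in bits from the measured RTT."""
    return 2 * MAX_FRAME_BITS + PFC_FRAME_BITS + rtt_quanta * QUANTUM_BITS


def pause_threshold_bits(buffer_bits, rtt_quanta):
    """Eq. (8): occupancy at which the pause frame should be instigated."""
    return buffer_bits - delay_value_bits(rtt_quanta)
```

Because the clock and sleep functions are parameters, the same code can be driven by a simulated clock, which is how the measurement loop can be checked without a physical link.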
