NE40E V800R010C00 Feature Description - QoS
V800R010C00
Issue 02
Date 2018-06-20
Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the
customer. All or part of the products, services and features described in this document may not be within the
purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information,
and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
Contents
3 DiffServ Overview
3.1 DiffServ Model
3.2 DSCP and PHB
3.3 Components in the DiffServ Model
9 MPLS QoS
9.1 MPLS QoS Overview
9.2 MPLS DiffServ
9.3 MPLS HQoS
9.3.1 Implementation Principle
9.3.2 Application
11 L2TP QoS
11.1 Introduction to L2TP QoS
11.2 Principles
11.2.1 Principles
Purpose
This document describes the QoS feature in terms of its overview, principles, and
applications.
Related Version
The following table lists the product versions related to this document.
Product | Version
U2000 | V200R017C60
eSight | V300R009C00
Intended Audience
This document is intended for:
- Network planning engineers
- Commissioning engineers
- Data configuration engineers
- System maintenance engineers
Security Declaration
- Encryption algorithm declaration
  The encryption algorithms DES/3DES/RSA (RSA-1024 or lower)/MD5 (in digital signature scenarios and password encryption)/SHA1 (in digital signature scenarios) have low security, which may bring security risks. If protocols allow, using more secure encryption algorithms, such as AES/RSA (RSA-2048 or higher)/SHA2/HMAC-SHA2, is recommended.
- Password configuration declaration
  – Do not set both the start and end characters of a password to "%^%#". This causes the password to be displayed directly in the configuration file.
  – To further improve device security, periodically change the password.
- Personal data declaration
  Your purchased products, services, or features may use some of users' personal data during service operation or fault locating. You must define user privacy policies in compliance with local laws and take proper measures to fully protect personal data.
- Feature declaration
  – The NetStream feature may be used to analyze the communication information of terminal customers for network traffic statistics and management purposes. Before enabling the NetStream feature, ensure that it is used within the boundaries permitted by applicable laws and regulations. Effective measures must be taken to ensure that information is securely protected.
  – The mirroring feature may be used to analyze the communication information of terminal customers for maintenance purposes. Before enabling the mirroring function, ensure that it is used within the boundaries permitted by applicable laws and regulations. Effective measures must be taken to ensure that information is securely protected.
  – The packet header obtaining feature may be used to collect or store some communication information about specific customers for transmission fault and error detection purposes. Huawei cannot offer services to collect or store this information unilaterally. Before enabling the function, ensure that it is used within the boundaries permitted by applicable laws and regulations. Effective measures must be taken to ensure that information is securely protected.
- Reliability design declaration
  Network planning and site design must comply with reliability design principles and provide device- and solution-level protection. Device-level protection includes dual-network and inter-board dual-link planning principles to avoid single points of failure. Solution-level protection refers to fast convergence mechanisms, such as FRR and VRRP.
Special Declaration
- This document serves only as a guide. The content is written based on device information gathered under lab conditions. The content provided by this document is intended to be taken as general guidance, and does not cover all scenarios. The content provided by this document may be different from the information on user device interfaces due to factors such as version upgrades and differences in device models, board restrictions, and configuration files. The actual user device information takes precedence over the content provided by this document. The preceding differences are beyond the scope of this document.
- The maximum values provided in this document are obtained in specific lab environments (for example, only a certain type of board or protocol is configured on a tested device). The actually obtained maximum values may be different from the maximum values provided in this document due to factors such as differences in hardware configurations and carried services.
- Interface numbers used in this document are examples. Use the existing interface numbers on devices for configuration.
- The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
- Changes in Issue 03 (2018-04-10)
  This issue is the third official release. The software version of this issue is V800R010C00SPC200.
- Changes in Issue 02 (2018-02-28)
  This issue is the second official release. The software version of this issue is V800R010C00SPC200.
- Changes in Issue 01 (2017-11-30)
  This issue is the first official release. The software version of this issue is V800R010C00SPC100.
2 What Is QoS
(Figure: diversified Internet services, such as IP calls, e-commerce, multimedia games, and online movies)
Diversified services enrich users' lives but also increase the risk of traffic congestion on the
Internet. In the case of traffic congestion, services can encounter long delays or even packet
loss. As a result, services deteriorate or even become unavailable. Therefore, a solution to
resolve traffic congestion on the IP network is urgently needed.
The most direct way to resolve traffic congestion is to increase network bandwidth. However, increasing bandwidth everywhere is impractical in terms of operation and maintenance costs.
Quality of service (QoS) technology, which uses policies to manage traffic congestion at a low cost, was therefore introduced. QoS aims to provide end-to-end service guarantees for differentiated services and has played a vital role on the Internet. Without QoS, service quality cannot be guaranteed.
QoS is measured using the following metrics:
- Bandwidth/throughput
- Delay
- Delay variation (jitter)
- Packet loss rate
Bandwidth/Throughput
Bandwidth, also called throughput, refers to the maximum number of bits that can be transmitted between two ends within a specified period (1 second), or the average rate at which specific data flows are transmitted between two network nodes. Bandwidth is expressed in bit/s.
NOTE
Two concepts, upstream rate and downstream rate, are closely related to bandwidth. The upstream rate
refers to the rate at which users can send or upload information to the network, and the downstream rate
refers to the rate at which the network sends data to users. For example, the rate at which users upload
files to the network is determined by the upstream rate, and the rate at which users download files is
determined by the downstream rate.
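For example, the following calculation, using assumed figures, shows how the downstream rate bounds download time:

File size: 500 MB = 500 x 8 Mbit = 4000 Mbit
Downstream rate: 20 Mbit/s
Minimum download time: 4000 Mbit / 20 Mbit/s = 200 seconds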
Delay
A delay refers to the period of time during which a packet is transmitted from a source to its
destination.
Use voice transmission as an example. A delay refers to the period during which words are
spoken and then heard. If a long delay occurs, voices become unclear or interrupted.
Most users are insensitive to a delay of less than 100 ms. If a delay ranging from 100 ms to
300 ms occurs, the speaker can sense slight pauses in the responder's reply, which can seem
annoying to both. If a delay greater than 300 ms occurs, both the speaker and responder
obviously sense the delay and have to wait for responses. If the speaker cannot wait but
repeats what has been said, voices overlap, and the quality of the conversation deteriorates
severely.
Jitter
If packets of the same flow traverse the network with different delays, the variation among those delays is called jitter.
(Figure: packets of one voice flow experiencing different one-way delays, for example D1 = 50 ms, D2 = 50 ms, D3 = 10 ms, D4 = 40 ms, D5 = 90 ms, and D6 = 90 ms, so that the spoken words arrive unevenly at the listener)
Jitter also affects protocol packet transmission. Some protocol packets are transmitted at fixed intervals. If jitter is high, such protocols flap between the Up and Down states, adversely affecting service quality.
Jitter exists on all networks. Service quality is not affected as long as jitter stays within a specific tolerance. Buffering can smooth out excess jitter, but at the cost of longer delays.
Table 2-1 Service performance requirements (Table I.1 and Table I.2 in ITU-T G.1010)

Medium | Application | Symmetry | Typical Rate | One-way Delay | Jitter | Information Loss
Voice | Voice phone | Two-way | 4-64 kbit/s | <150 ms preferred, <400 ms limit | <1 ms | <3% packet loss ratio (PLR)
Conversational/Real-time voice | Conversational voice | Two-way | 4-25 kbit/s | <150 ms (recommended), <400 ms (upper limit) | <1 ms | <3% FER
Conversational/Real-time video | Video phone | Two-way | 32-384 kbit/s | <150 ms (recommended), <400 ms (upper limit), <100 ms (voice and image synchronization) | ≤10 ms | <1% FER (generally, depending on terminal performance), <5% FER (upper limit)
Interactive voice | Voice messaging | Mainly one-way | 4-13 kbit/s | <1 s (playback), <2 s (recording) | <1 ms | <3% FER
3 DiffServ Overview
(Figure: DiffServ network model. User networks connect to DS domains through boundary nodes. Boundary nodes perform service classification and aggregation, interior nodes perform PHB-based forwarding, and different PHBs in different DS domains are coordinated based on the SLA/TCA.)
- DiffServ (DS) node: a network node that implements the DiffServ function.
- DS boundary node: connects to another DS domain or a non-DS-aware domain. The DS boundary node classifies and manages incoming traffic.
- DS interior node: connects to DS boundary nodes and other interior nodes in one DS domain. DS interior nodes implement simple traffic classification based on DSCP values, and manage traffic.
- DS domain: a contiguous set of DS nodes that adopt the same service policy and per-hop behavior (PHB). One DS domain covers one or more networks under the same administration. For example, a DS domain can be an ISP's networks or an organization's intranet. For an introduction to PHB, see the next section.
- DS region: consists of one or more adjacent DS domains. Different DS domains in one DS region may use different PHBs to provide differentiated services. The service level agreement (SLA) and traffic conditioning agreement (TCA) are used to allow for differences between PHBs in different DS domains. The SLA or TCA specifies how to maintain consistent processing of a data flow from one DS domain to another.
- SLA: the services that the ISP promises to provide for individual users, enterprise users, or adjacent ISPs that need intercommunication. The SLA covers multiple dimensions, including the accounting protocol. The service level specification (SLS) provides a technical description of the SLA. The SLS focuses on the traffic control specification (TCS) and provides detailed performance parameters, such as the committed information rate (CIR), peak information rate (PIR), committed burst size (CBS), and peak burst size (PBS).
(Figure: location of the DS field. In an IPv4 header, the 8-bit ToS field, which follows the Version and Header Length fields, originally carried a 3-bit Precedence field plus the D, T, R, and C bits. In an IPv6 header, the 8-bit Traffic Class field and the 20-bit Flow Label follow the Version field.)
In an IPv4 packet, the six left-most bits (0 to 5) in the DS field are defined as the DSCP value,
and the two right-most bits (6 and 7) are reserved bits. Bits 0 to 2 are the Class Selector Code
Point (CSCP) value, indicating a class of DSCP. Devices that support the DiffServ function
perform forwarding behaviors for packets based on the DSCP value.
In IPv6 packet headers, two fields are related to QoS: TC and Flow Label (FL). The TC field
contains eight bits and functions the same as the ToS field in IPv4 packets to identify the
service type. The FL field contains 20 bits and identifies packets in the same data flow. The
FL field, together with the source and destination addresses, uniquely identifies a data flow.
All packets in one data flow share the same FL field, and devices can rapidly process packets
in the same data flow.
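As a worked example, consider the standard EF code point:

DSCP for EF = 101110 (binary) = 46 (decimal)
CSCP (bits 0 to 2) = 101, that is, class 5
Bits 6 and 7 are reserved and are not part of the DSCP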
PHB
Per-hop Behavior (PHB) is a description of the externally observable forwarding treatment
applied at a differentiated services-compliant node to a behavior aggregate. A DS node
performs the same PHB for packets with the same DSCP value. The PHB defines some
forwarding behaviors but does not specify the implementation mode.
At present, the IETF defines four types of PHBs: Class Selector (CS), Expedited Forwarding
(EF), Assured Forwarding (AF), and best-effort (BE). BE PHB is the default.
PHB | Applications
CS6 and CS7 | CS6 and CS7 PHBs are used for protocol packets by default, such as OSPF and BGP packets. If these packets are not forwarded, protocol services are interrupted.
EF | EF PHB is used for voice services. Voice services require a short delay, low jitter, and a low packet loss rate, and are second only to protocol packets in terms of importance. NOTE: The bandwidth dedicated to EF PHB must be restricted so that other services can use the remaining bandwidth.
AF3 | AF3 PHB is used for BTV services of IPTV. Live programs are real-time services, requiring continuous bandwidth and a large throughput guarantee.
AF2 | AF2 PHB is used for VoD services of IPTV. VoD services require lower real-time performance than BTV services and allow delays or buffering.
AF1 | AF1 PHB is used for leased-line services, which are second to IPTV and voice services in terms of importance. Bank-based premium services, one type of leased-line service, can use the AF4 or even EF PHB.
BE | BE PHB applies to best-effort services on the Internet, such as email and Telnet services.
- Traffic classification and marking: Traffic marking refers to external re-marking, which is implemented on outgoing packets. Re-marking modifies the priority field of packets to relay QoS information to the next-hop device. Internal marking is used for internal processing and does not modify packets. Internal marking is implemented on incoming packets so that the device can process the packets based on the marks before forwarding them. The concept of internal marking is discussed later in this document.
- Policing and shaping: restricts the traffic rate to a specific value. When traffic exceeds the specified rate, traffic policing drops the excess traffic, whereas traffic shaping buffers it.
- Congestion management: places packets in queues for buffering when traffic congestion occurs and determines the forwarding order based on a specific scheduling algorithm.
- Congestion avoidance: monitors network resources. When network congestion intensifies, the device proactively drops packets to regulate traffic so that the network is not overloaded.
The four QoS components are performed in a specific order, as shown in the following figure.
The QoS components are performed at different locations on the network, as shown in the
following figure. In principle, traffic classification, traffic re-marking, and traffic policing are
implemented on the inbound user-side interface, and traffic shaping is implemented on the
outbound user-side interface (if packets of various levels are involved, queue scheduling and a
packet drop policy must be configured on the outbound user-side interface). Congestion
management and congestion avoidance are configured on the outbound network-side
interface.
(Figure: QoS deployment locations on a carrier network. Home devices such as PCs, VoIP phones, and STBs connect through the HG/ONT and the broadband access network (DSLAM, OLT, LSW) to the BRAS and PE/SR; enterprise CEs connect through leased lines; P/CR devices and the IGW connect the network to the Internet. Incoming traffic: traffic classification/marking and traffic policing. Outgoing traffic on the user side: congestion management, congestion avoidance, and traffic shaping. Outgoing traffic on the network side: congestion management and congestion avoidance.)
QoS provides the following three service models:
- Best-Effort
- Integrated service (IntServ)
- Differentiated service (DiffServ)
Best-Effort
Best-Effort is the default service model on the Internet and applies to various network
applications, such as FTP and email. It is the simplest service model. Without network
approval or notification, an application can send any number of packets at any time. The
network then makes its best attempt to send the packets but does not provide any guarantee
for performance.
The Best-Effort model applies to services that have low requirements for delay and reliability.
IntServ
Before sending a packet, IntServ uses signaling to apply for a specific level of service from
the network. The application first notifies the network of its traffic parameters and specific
service qualities, such as bandwidths and delays. After receiving a confirmation that sufficient
resources have been reserved, the application sends the packets. The network maintains a state
for each packet flow and executes QoS behaviors based on this state to fulfill the promise
made to the application. The packets must be controlled within the range described by the
traffic parameters.
IntServ uses the Resource Reservation Protocol (RSVP) as its signaling, which is similar to an Asynchronous Transfer Mode switched virtual circuit (ATM SVC), and adopts connection-oriented transmission. RSVP is a transport-layer protocol but does not transmit application-layer data. Like ICMP, RSVP functions as a network control protocol and transmits resource reservation messages between nodes.
When RSVP is used for end-to-end communication, the routers including the core routers on
the end-to-end network maintain a soft state for each data flow. A soft state is a temporary
state that refreshes periodically using RSVP messages. Routers check whether sufficient
resources can be reserved based on these RSVP messages. The path is available only when all
involved routers can provide sufficient resources.
IntServ uses RSVP to apply for resources over the entire network, requiring that all nodes on
the end-to-end network support RSVP. In addition, each node periodically exchanges state
information with its neighbor, consuming a large number of resources. More importantly, all
nodes on the network maintain a state for each data flow. On the backbone network, however,
there are millions of data flows. Therefore, the IntServ model applies to edge networks and
does not widely apply to the backbone network.
DiffServ
DiffServ classifies packets on the network into multiple classes for differentiated processing.
When traffic congestion occurs, classes with a higher priority are given preference. This
function allows packets to be differentiated and to have different packet loss rates, delays, and
jitters. Packets of the same class are aggregated and sent as a whole to ensure the same delay,
jitter, and packet loss rate.
In the DiffServ model, edge routers classify and aggregate traffic. Edge routers classify packets based on a combination of fields, such as the source and destination addresses of packets, the precedence in the ToS field, and the protocol type. Edge routers also re-mark packets with different priorities, which other routers can identify for resource allocation and traffic control. Therefore, DiffServ is a class-based QoS model.
(Figure: DiffServ model. Traffic from PCs, VoIP phones, and IPTV STBs is classified and marked on the boundary node of the DiffServ domain; interior nodes allocate resources and control traffic based on the marking, using PHBs.)
Compared with IntServ, DiffServ requires no signaling. In the DiffServ model, an application does not need to apply for network resources before transmitting packets. Instead, the application notifies the network nodes of its QoS requirements by setting QoS parameters in IP packet headers. The network does not maintain a state for each data flow, but provides differentiated services based on the QoS parameters carried in each packet.
DiffServ takes full advantage of network flexibility and extensibility and transforms
information in packets into per-hop behaviors, greatly reducing signaling operations.
Therefore, DiffServ not only adapts to Internet service provider (ISP) networks but also
accelerates IP QoS applications on live networks.
(Figure: board components and traffic directions. In the upstream direction, packets flow through the PIC, PFE, TM, and FIC; in the downstream direction, through the FIC, TM, PFE, and PIC or eTM.)
Abbreviations:
PIC: Physical Interface Controller
PFE: Packet Forward Engine
TM: Traffic Manager
FIC: Fabric Interface Controller
eTM: Extra Traffic Manager
Figure 5-2 Packet forwarding process when the PIC is not equipped with an eTM subcard
(Figure: on the upstream board, packets flow from the PIC to the upstream PFE (NP/ASIC), the upstream TM, and the upstream FIC, and then to the switched network. On the downstream board, packets flow from the downstream FIC to the downstream TM, the downstream PFE (NP/ASIC), and the PIC. Packets destined for the CPU are rate-limited by CP-CAR.)
Interface-based CAR does not apply to packets destined for the CPU, which prevents such packets from being dropped in the case of traffic congestion. For packets to be sent to the CPU, the upstream PFE independently implements CP-CAR.
f. The upstream PFE sends packets to the upstream TM.
g. The upstream TM processes flow queues (optional) based on the user-queue
configuration on the inbound interface or in the MF classification profile, and then
implements VOQ processing. After that, the upstream TM sends packets to the
upstream Flexible Interface Card (FIC).
h. The upstream FIC fragments packets and encapsulates them into micro cells before
sending them to the switched network.
NOTE
Similar to an ATM module, the switched network forwards packets based on a fixed cell
length. Therefore, packets are fragmented before being sent to the switched network.
- Packet forwarding process for downstream traffic
Micro cells are sent from the switched network to the downstream FIC.
a. The downstream FIC encapsulates the micro cells into packets again.
b. The downstream TM duplicates multicast packets.
c. The downstream TM processes flow queues based on the user-queue configuration
on the outbound interface (including the VLANIF interface) if needed, and
processes class queues (CQs) before sending them to the downstream PFE.
d. The downstream PFE searches the forwarding table for packet encapsulation information. For example, for an IPv4 packet, the PFE searches the forwarding table based on the next hop. For an MPLS packet, the PFE searches the MPLS forwarding table.
e. The downstream PFE implements MF classification based on the outbound
interface configuration and then BA traffic classification (only mapping from the
service class and drop precedence to the external priority).
f. The downstream PFE implements rate limiting for downstream traffic based on the CAR configuration on the outbound interface or in the MF traffic classification profile.
g. For packets to be sent to the CPU, the downstream PFE implements CP-CAR
before sending them to the CPU. For packets not to be sent to the CPU, the
downstream PFE sends them to the outbound interface processing module for an
additional Layer 2 header (Layer 2 header and MPLS header are added for an
MPLS packet). After that, these packets are sent to the PIC.
h. The PIC converts packets to optical/electrical signals and sends them to the physical
link.
Figure 5-3 shows how a packet is forwarded when the PIC is equipped with an eTM subcard. The operation for upstream traffic is the same as when the PIC is not equipped with an eTM subcard. For downstream traffic, the difference is that the downstream flow queues are processed on the eTM subcard when the PIC is equipped with one, and on the downstream TM when it is not. In addition, five-level scheduling (FQ -> SQ -> GQ -> VI -> port) is implemented for downstream flow queues when the PIC is equipped with an eTM subcard, whereas three-level scheduling plus two-level scheduling is implemented when it is not.
Figure 5-3 Packet forwarding process when the PIC is equipped with an eTM subcard
(Figure: same board layout as in Figure 5-2, with an eTM subcard between the downstream PFE and the PIC. The internal priority of a packet consists of its service class and color.)
- On the upstream TM:
a. The upstream TM processes flow queues based on the user-queue configuration. Packets are put into different flow queues based on the service class, and the WRED drop policy is implemented for flow queues based on the color if needed.
b. The upstream TM processes VOQs. VOQs are classified based on the destination
board. The information about the destination board is obtained based on the
outbound interface of packets. Then, packets are put into different VOQs based on
the service class.
c. After being scheduled in VOQs, packets are sent to the switched network and then
forwarded to the destination board on which the outbound interface is located.
d. Then, packets are sent to the downstream TM.
- On the downstream TM:
a. (This step is skipped when the downstream PIC is equipped with an eTM subcard)
The downstream TM processes flow queues based on the user-queue configuration
on the outbound interface. Packets are put into different flow queues based on the
service class, and WRED drop policy is implemented for flow queues based on the
color if needed.
b. (This step is skipped when the downstream PIC is equipped with an eTM subcard)
The downstream TM processes port queues (CQs). Packets are put into different
CQs based on the service class, and WRED drop policy is implemented for CQs
based on the color if WRED is configured.
c. Then, packets are sent to the downstream PFE.
- On the downstream PFE:
a. The downstream PFE implements MF traffic classification based on the outbound
interface configuration. MF traffic classification requires the downstream PFE to
obtain multiple field information for traffic classification. Behaviors, such as filter
and re-mark, are performed based on traffic classification results. If the behavior is
re-mark, the downstream PFE modifies the internal priority of packets (service class
and color).
b. The downstream PFE implements CAR for packets based on the outbound interface
configuration or MF traffic classification configuration. If both interface-based
CAR and MF traffic classification-based CAR are configured, MF traffic
classification-based CAR takes effect. In a CAR operation, a pass, drop, or pass+re-
mark behavior can be performed for incoming traffic. If the behavior is pass+re-
mark, the downstream PFE modifies the internal priority of packets (service class
and color).
c. The priorities of outgoing packets are set for newly added packet headers and are
modified for existing packet headers, based on the service class and color.
d. Then, packets are sent to the downstream PIC.
– When the PIC is not equipped with an eTM subcard, the PIC adds the link-layer CRC to the packets before sending them to the physical link.
– When the PIC is equipped with an eTM subcard, the PIC adds the link-layer CRC to the packets and performs a round of flow queue scheduling before sending the packets to the physical link. Downstream flow queues are processed based on the user-queue configuration on the outbound interface. Packets are put into different FQs based on the service class, and the WRED drop policy is implemented for FQs based on the color if WRED is configured. When the PIC is equipped with an eTM subcard, downstream packets are not scheduled on the downstream TM.
(Figure: packet format on the downstream TM. The packet consists of an L2 Header (14 bytes), Data (46-1500 bytes), and CRC (4 bytes), plus an NPtoTM field (4 bytes); one field exists only when the downstream PIC is equipped with an eTM.)
NOTE
- CAR calculates the bandwidth of packets based on the entire packet. For example, CAR counts the length of the frame header and CRC field, but not the preamble, inter-frame gap (IFG), or SFD of an Ethernet frame, in the bandwidth. The following illustrates a complete Ethernet frame (in bytes):
IFG (minimum 12) | Preamble (7) | SFD (1) | Destination MAC (6) | Source MAC (6) | Type (2) | Payload (46 to 1500) | CRC (4)
The bandwidth covers the CRC field but not the IFG field.
- The upstream PFE adds a Frame Header, which is removed by the downstream PFE. The Frame Header is used to transfer information between chips. The NPtoTM and TMtoNP fields are used to transfer information between the NP and TM.
- When the PIC is not equipped with an eTM subcard, the length of a packet scheduled on the downstream TM is different from that of the packet sent to the link. To perform traffic shaping accurately, you must run the network-header-length command to compensate the packet with a specific length.
On the downstream interface on the network side:
- When the downstream TM implements traffic shaping for packets, the TMtoNP and Frame Header field values of the packets are not counted. Therefore, compared with the packet sent to the link, the packet scheduled on the downstream TM does not contain the IFG, L2 Header (14 bytes), two MPLS labels, or CRC fields. A +26-byte compensation (including the L2 header, two MPLS labels, and CRC field, but not the IFG field) or a +46-byte compensation (also including the IFG field) can be performed for the packet.
- When the PIC is equipped with an eTM subcard, no packet length compensation is required.
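The +26-byte and +46-byte values follow from summing the missing fields:

L2 header (14) + two MPLS labels (2 x 4 = 8) + CRC (4) = 26 bytes
IFG (12) + preamble (7) + SFD (1) = 20 bytes
26 + 20 = 46 bytes (compensation including the IFG field)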
In Layer 2 Ethernet forwarding scenarios, a data frame can be VLAN-tagged, QinQ-tagged, or untagged. Use a VLAN-tagged frame as an example. In Layer 2 forwarding, both the Layer 2 Ethernet frame header and the VLAN tag of a packet are forwarded to the downstream TM, and only the CRC field is removed. When the downstream TM implements traffic shaping for packets, the TMtoNP and Frame Header field values are not counted. Therefore, compared with the packet sent to the link, the packet scheduled on the downstream TM does not contain the CRC field. A +4-byte compensation (not including the IFG field) or a +24-byte compensation (including the IFG field) can be performed for the packet.
For more details, see Incoming packet in sub-interface accessing L3VPN networking.
Figure 5-13 Outgoing L3VPN packet on the user side of the PE in QinQ interface accessing
L3VPN networking
Figure 5-14 Outgoing L3VPN packet on the user side of the PE in POS interface accessing
L3VPN networking
In VLAN mapping scenarios, both the Layer 2 Ethernet frame header and the VLAN tag of a packet are forwarded to the downstream TM, and only the CRC field is removed. The VLAN tag value is replaced with a new VLAN tag value.
When the downstream TM implements traffic shaping for packets, the TMtoNP and Frame Header field values are not counted. Therefore, compared with the packet sent to the link, the packet scheduled on the downstream TM does not contain the CRC field. A +4-byte compensation (not including the IFG field) or a +24-byte compensation (including the IFG field) can be performed for the packet.
For more details, see Incoming packet in sub-interface accessing L3VPN networking.
Figure 5-16 Outgoing packet in POS interface accessing VLL heterogeneous interworking
scenarios
NOTE
In VLL heterogeneous interworking scenarios, both the L2 header and the MPLS label of a packet are removed on the upstream TM.
- When the downstream TM implements traffic shaping for packets, the TMtoNP and Frame Header field values of the packets are not counted. Therefore, compared with the packet sent to the link, the packet scheduled on the downstream TM does not contain the PPP header. A +8-byte compensation can be performed for the packet.
- When the PIC is equipped with an eTM subcard, no packet length compensation is required.
Figure 5-17 Outgoing packet in POS interface accessing VLL homogeneous interworking
scenarios
NOTE
In VLL homogeneous interworking scenarios, both the L2 header and the MPLS label of a packet are removed on the upstream TM.
When the downstream TM implements traffic shaping for packets, the TMtoNP and Frame Header field values of the packets are not counted. Therefore, compared with the packet sent to the link, the packet scheduled on the downstream TM does not contain the PPP header. A +8-byte compensation can be performed for the packet.
After packets are classified at the DiffServ domain edge, internal nodes provide differentiated
services for classified packets. A downstream node can accept and continue the upstream
classification or classify packets based on its own criteria.
Traffic Behaviors
A traffic classifier is configured to provide differentiated services and must be associated with
a certain traffic control or resource allocation behavior, which is called a traffic behavior.
The following table describes the traffic behaviors that can be implemented individually or jointly for classified packets on an NE40E.

Traffic Behavior | Description
Traffic policing | Restricts the traffic rate to a specific value. When traffic exceeds the specified rate, the excess traffic is dropped.
Congestion management | Places packets in queues for buffering. When traffic congestion occurs, the device determines the forwarding order based on a specific scheduling algorithm and performs traffic shaping for outgoing traffic to meet users' network performance requirements.
Packet filtering | Functions as the basic traffic control method. The device determines whether to drop or forward packets based on traffic classification results.
URPF (Unicast Reverse Path Forwarding) | Prevents source address spoofing attacks. URPF obtains the source IP address and the inbound interface of a packet and checks them against the forwarding table. If the source IP address is not found, URPF considers it a pseudo address and drops the packet.
Flow mirroring | Allows a device to copy an original packet from a mirrored port and send the copy to the observing port.
Modifying the TTL value | Modifies the Time To Live (TTL) value in IP packet headers.
The EXP field is 3 bits long and indicates precedence. The value ranges from 0 to 7 with a
larger value reflecting a higher precedence.
The Precedence field in an IP header is also 3 bits long. Therefore, each precedence value in an IP header corresponds to exactly one precedence value in an MPLS header. However, the DSCP field in an IP header is 6 bits long, unlike the 3-bit EXP field, so multiple DSCP values map to one EXP value: the three left-most bits of the DSCP field (the CSCP value) correspond to the EXP value, regardless of what the three right-most bits are.
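For example, the AF4x code points share the same three left-most bits and therefore map to the same EXP value:

AF41 = 100010 (34), AF42 = 100100 (36), AF43 = 100110 (38)
Three left-most bits = 100 = 4, so all three map to EXP 4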
The PRI field is 3 bits long and indicates precedence. The value ranges from 0 to 7 with a
larger value reflecting a higher precedence.
Table 6-1 Mapping between the 802.1p/IP precedence value and applications

802.1p/IP Precedence | Typical Applications
5 | Voice streams
4 | Video conferencing
3 | Call signaling
6.3 BA Classification
Service-class
Service-class refers to the internal service class of packets. Eight service-class values are
available: class selector 7 (CS7), CS6, expedited forwarding (EF), assured forwarding 4
(AF4), AF3, AF2, AF1, and best effort (BE). Service-class determines the type of queues to
which packets belong.
The priority of queues with a specific service-class is calculated based on scheduling
algorithms.
- If the queues of all eight service-class values use priority queuing (PQ) scheduling, the queues are served in descending order of priority: CS7 > CS6 > EF > AF4 > AF3 > AF2 > AF1 > BE.
- If the BE queue uses PQ scheduling (rare on live networks) and the other seven queues use weighted fair queuing (WFQ) scheduling, the BE queue has the highest priority.
- If the queues of all eight service-class values use WFQ scheduling, priorities are irrelevant; bandwidth is allocated based on the WFQ weights.
NOTE
More details about queue scheduling are provided later in this document.
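The following minimal configuration sketch illustrates the scheduling combinations above (the template name fq-test and the rate parameters are hypothetical, and command formats may vary by version):

flow-queue fq-test
 queue ef pq                //EF traffic is always scheduled first.
 queue af1 wfq weight 10    //AF1 and BE share the remaining bandwidth
 queue be wfq weight 5      //in a 10:5 ratio.
#
interface GigabitEthernet1/0/0
 user-queue cir 10000 pir 20000 flow-queue fq-test outbound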
Color
Color, referring to the drop precedence of packets on a device, determines the order in which packets in one queue are dropped when traffic congestion occurs. As defined in the relevant IETF standards, the color of a packet can be green, yellow, or red.
Drop precedences are compared based on the configured parameters. For example, if a maximum of 100% of the buffer is allowed to store green packets whereas only 50% of the buffer is allowed to store red packets, red packets are dropped earlier; that is, the drop precedence of red packets is higher than that of green packets.
The IETF defines eight PHBs (CS7, CS6, EF, AF4, AF3, AF2, AF1, and BE) and further defines three drop precedences for each of the four AF PHBs. Therefore, the total number of PHB values is 16 (4 + 4 x 3 = 16). There are 64 DSCP values, allowing each PHB to correspond to a DSCP value. However, there are only eight 802.1p values, so some PHBs have no corresponding 802.1p value. Generally, the eight 802.1p values correspond to the eight scheduling precedences. IEEE 802.1ad defines the S-TAG and C-TAG formats; the S-TAG supports the Drop Eligible Indicator (DEI), whereas the C-TAG does not. IEEE 802.1ad provides a 3-bit Priority Code Point (PCP) field that applies to both the C-TAG and S-TAG to specify the scheduling and drop precedences. PCP allows an 802.1p value to indicate both the scheduling and drop precedences, and introduces the concepts of 8P0D, 7P1D, 6P2D, and 5P3D. The letter P indicates the scheduling precedence, and the letter D indicates the drop precedence. For example, 5P3D supports five scheduling precedences and three drop precedences.
The default and 5P3D domains exist by default and cannot be deleted, and only the default domain can be modified.
Table 6-2 Default mapping from the DSCP value to the service-class and color

DSCP | Service-class | Color
8 | AF1 | Green
9 | BE | Green
10 | AF1 | Green
11 | BE | Green
12 | AF1 | Yellow
13 | BE | Green
14 | AF1 | Red
15 | BE | Green
16 | AF2 | Green
17 | BE | Green
18 | AF2 | Green
19 | BE | Green
20 | AF2 | Yellow
21 | BE | Green
22 | AF2 | Red
23 | BE | Green
24 | AF3 | Green
25 | BE | Green
26 | AF3 | Green
27 | BE | Green
28 | AF3 | Yellow
29 | BE | Green
30 | AF3 | Red
31 | BE | Green
32 | AF4 | Green
33 | BE | Green
34 | AF4 | Green
35 | BE | Green
36 | AF4 | Yellow
37 | BE | Green
38 | AF4 | Red
39 | BE | Green
40 | EF | Green
41~45 | BE | Green
46 | EF | Green
47 | BE | Green
48 | CS6 | Green
49~55 | BE | Green
56 | CS7 | Green
57~63 | BE | Green
Table 6-3 Default mapping from the service-class and color to the DSCP value
Service-class | Color | DSCP
BE | Green | 0
AF1 | Green | 10
AF1 | Yellow | 12
AF1 | Red | 14
AF2 | Green | 18
AF2 | Yellow | 20
AF2 | Red | 22
AF3 | Green | 26
AF3 | Yellow | 28
AF3 | Red | 30
AF4 | Green | 34
AF4 | Yellow | 36
AF4 | Red | 38
EF | Green | 46
CS6 | Green | 48
CS7 | Green | 56
Table 6-4 Default mapping from the IP Precedence/MPLS EXP/802.1p to the service-class
and color
IP Precedence/MPLS EXP/802.1p | Service-class | Color
0 | BE | Green
1 | AF1 | Green
2 | AF2 | Green
3 | AF3 | Green
4 | AF4 | Green
5 | EF | Green
6 | CS6 | Green
7 | CS7 | Green
Table 6-5 Default mapping from the service-class and color to IP Precedence/MPLS EXP/
802.1p
Service-class | Color | IP Precedence/MPLS EXP/802.1p
BE | Green/Yellow/Red | 0
AF1 | Green/Yellow/Red | 1
AF2 | Green/Yellow/Red | 2
AF3 | Green/Yellow/Red | 3
AF4 | Green/Yellow/Red | 4
EF | Green/Yellow/Red | 5
CS6 | Green/Yellow/Red | 6
CS7 | Green/Yellow/Red | 7

Figure 6-6 PCP encoding and decoding defined in IEEE 802.1ad

Encoding (priority and drop_eligible to PCP):
Priority | 7 | 7DE | 6 | 6DE | 5 | 5DE | 4 | 4DE | 3 | 3DE | 2 | 2DE | 1 | 1DE | 0 | 0DE
8P0D (default) | 7 | 7 | 6 | 6 | 5 | 5 | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | 0 | 0
7P1D | 7 | 7 | 6 | 6 | 5 | 4 | 5 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | 0 | 0
6P2D | 7 | 7 | 6 | 6 | 5 | 4 | 5 | 4 | 3 | 2 | 3 | 2 | 1 | 1 | 0 | 0
5P3D | 7 | 7 | 6 | 6 | 5 | 4 | 5 | 4 | 3 | 2 | 3 | 2 | 1 | 0 | 1 | 0

Decoding (PCP to priority and drop_eligible):
PCP | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
8P0D (default) | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
7P1D | 7 | 6 | 4 | 4DE | 3 | 2 | 1 | 0
As shown in Figure 6-6, a number from 0 to 7 indicates the 802.1p value. A value in the format of a number x followed by DE indicates that the 802.1p priority is x and the drop_eligible value is true. If the drop_eligible value is false, the drop precedence can be ignored. If the drop_eligible value is true, the drop precedence cannot be ignored.
The 5P3D domain on an NE40E uses an IEEE 802.1ad-compliant priority mapping table by default, as shown in Figure 6-6.
The default mappings between the 802.1p value, service-class, and color for the 5P3D domain on an NE40E are shown in Table 6-7 and Table 6-8.
Table 6-7 Mapping from the 802.1p value to the service-class and color

802.1p | Service-class | Color
0 | BE | Yellow
1 | BE | Green
2 | AF2 | Yellow
3 | AF2 | Green
4 | AF4 | Yellow
5 | AF4 | Green
6 | CS6 | Green
7 | CS7 | Green
NOTE
The mapping from the 802.1p value to the service-class may apply to an inbound interface that belongs to a non-5P3D domain, which is why Table 6-7 lists eight 802.1p values. The outbound interface belongs to a 5P3D domain, which is why Table 6-7 lists only five service-classes: BE, AF2, AF4, CS6, and CS7.
Table 6-8 Mapping from the service-class and color to the 802.1p value

Service-class | Color | 802.1p
BE | Green | 1
BE | Yellow | 0
BE | Red | 0
AF1 | Green | 1
AF1 | Yellow | 0
AF1 | Red | 0
AF2 | Green | 3
AF2 | Yellow | 2
AF2 | Red | 2
AF3 | Green | 3
AF3 | Yellow | 2
AF3 | Red | 2
AF4 | Green | 5
AF4 | Yellow | 4
AF4 | Red | 4
EF | Green | 5
EF | Yellow | 4
EF | Red | 4
CS6 | Green/Yellow/Red | 6
CS7 | Green/Yellow/Red | 7
NOTE
In Table 6-8, the mapping from the service-class and color to the 802.1p value may apply to an inbound interface that uses a 5P3D domain or uses the DSCP, EXP, or IP precedence as a basis for mapping, which is why eight service-classes are listed. The outbound interface may use a non-5P3D domain, which is why eight 802.1p values are listed.
Table 6-10 Mappings from service types to DSCP values (Figure 3 in RFC)

Service Type | DSCP Name | DSCP Value | Application Examples
Table 6-11 Traffic classification recommendations from the 3GPP (Table 6.1.7 in 3GPP
TS23.203)
QCI | Resource Type | Priority | Packet Delay | Packet Error Loss Rate | Typical Service
The 3GPP does not provide any recommendations on mappings between QCIs and DSCP
values. For Huawei's recommendations, see Table 6-12.
IP Clock | - | 0x2E (46) | 5 | EF
Table 6-13 Mappings between traffic types and DSCP values (Table 6 in GSMA IR34)
Table 6-14 Mappings between service applications and DSCP values (Table 7 in GSMA
IR34)
VoIP | EF | Conversational
Best Effort | 0 (default) | HTTP, IM, X11 | Default service types, requiring only best-effort service quality
The MEF 23.1 standard also provides recommended mappings between CoS labels and DSCP
values. For details, see Table 6-17 and Table 6-18.
VoIP | H
Videoconf data | M
IPTV data | M
IPTV control | M
Streaming media | L
Database client/server | L
Financial/Trading | H
CCTV | H
Telepresence | H
Circuit Emulation | H
Mobile BH (H) | H
Mobile BH (M) | M
Mobile BH (L) | L
Table 6-17 Color IDs when the CoS ID type is only EVC or OVC EP (Table 3 in MEF 23.1)
CoS Label | CoS ID (C-Tag PCP) | CoS ID (PHB/DSCP) | Color ID
M | 3, 2 | AF31(26), AF32(28), AF33(30) | 3, 2, 3
L | 1, 0 | AF11(10), AF12(12), AF13(14), DF(0) | 1, 0, 1
Table 6-19 IP network QoS type definitions and network performance counters (Table 1 in
ITU-T Y.1541)
Network Performance Parameter | Performance Target | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 (Unspecified)
Figure 6-7 Multi-service classification based on a few QoS types (Figure 2 in ITU-T Y.1541)
MPLS DiffServ
On an MPLS network, EXP values are used to identify a maximum of eight service priorities.
If there are more than eight types of services, multiple types must be aggregated to one PHB.
Relevant standards reclassify services into four types and provide recommended DSCP and EXP values. For details, see Table 6-21.
Table 6-21 Treatment aggregate and MPLS EXP field usage (Figure 2 and Figure 3 in RFC)

Service Type | PHB | DSCP Binary (Decimal Notation)
Signaling | CS5 | 101000 (40)
Multimedia Conferencing | AF41 | 100010 (34)
Multimedia Conferencing | AF42 | 100100 (36)
Multimedia Conferencing | AF43 | 100110 (38)
Real-Time Interactive | CS4 | 100000 (32)
Broadcast Video | CS3 | 011000 (24)
Multimedia Streaming | AF33 | 011110 (30)
Low-Latency Data | AF21 | 010010 (18)
Low-Latency Data | AF22 | 010100 (20)
Low-Latency Data | AF23 | 010110 (22)
High-Throughput Data | AF11 | 001010 (10)
High-Throughput Data | AF12 | 001100 (12)
High-Throughput Data | AF13 | 001110 (14)
trust upstream: The board takes the BA action for inbound packets and the PHB action for outbound packets. Both the BA-symbol and the PHB-symbol are set to "Y".

diffserv-mode { pipe | short-pipe }: The board takes the BA action for inbound packets. The BA-symbol and the PHB-symbol remain unchanged.

diffserv-mode uniform: This command is the default configuration and does not affect the actions of the inbound and outbound boards. Both the BA-symbol and the PHB-symbol remain unchanged.

remark (inbound): The inbound board takes the BA action and re-marks the inbound packets, regardless of the BA-symbol and the PHB-symbol. The BA-symbol is set to "Y", and the PHB-symbol is not affected.

remark (outbound): The outbound board re-marks the outbound packets, regardless of the BA-symbol and the PHB-symbol.
For example, assume that the remark dscp 11 command is configured on the outbound interface and the service-class and color of a packet are <ef, green>. The DSCP of the packet is set to 11 directly, rather than to the value mapped from <ef, green> based on the downstream PHB mapping table. If the outbound packet has a VLAN tag, the 802.1p value of the VLAN tag is still set based on <ef, green> and the downstream PHB mapping table. If both the remark dscp 11 and remark 8021p commands are configured on the outbound interface, both the DSCP and the 802.1p value of the packet are modified directly according to the remark commands.

qos car { green | yellow | red } pass service-class color: The service-class and color of the packet are reset. Both the BA-symbol and the PHB-symbol remain unchanged.
- To set the BA-symbol to "N", configure the service-class class-value color color-value no-remark command, or do not configure any of the four commands stated above on the inbound interface.
- To set the PHB-symbol to "Y", configure the trust upstream or qos phb enable command on the outbound interface.
- To set the PHB-symbol to "N", configure the qos phb disable command, or do not configure the trust upstream command on the outbound interface.
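The following minimal sketch shows these settings together (interface numbers are examples, and command formats may vary by version):

interface GigabitEthernet1/0/0
 trust upstream default    //Inbound BA action: the BA-symbol is set to "Y".
#
interface GigabitEthernet2/0/0
 trust upstream default
 qos phb enable            //Outbound PHB action: the PHB-symbol is set to "Y".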
BA-symbol | PHB-symbol | PHB Action Taken on the Outbound Board
N | N | No
Y | N | No
Y | Y | Yes
trust upstream | trust 8021p | Packet Type | Trusted Field and Mapping
No | No | Any type | No field is trusted, and the packet is mapped to <BE, Green>.
No | Yes | Any type | No field is trusted, and the packet is mapped to <BE, Green>.
Yes | - | IPoPPP, IPoHDLC, IPoFR | DSCP
- | Yes | Non-VLAN and non-QinQ | No field is trusted, and the packet is mapped to <BE, Green>.
NOTE
"Other types" here indicates neither IP nor MPLS packets when the outer L2 header is removed.
Table 6-26 Rules for marking the 802.1p field of a newly added VLAN tag

PHB-symbol | Rule for marking the 802.1p field of the newly added VLAN tag
Y | The field is marked according to the <service-class, color> of the packet and the downstream priority mapping table.
N | The field is marked 0.
Table 6-27 Rules for marking the EXP field of a newly added MPLS header

PHB-symbol | Rule for marking the EXP field of the newly added MPLS header
Y | Both the inner and outer EXP are marked according to the <service-class, color> of the packet and the downstream priority mapping table.
N | The inner EXP of L3VPN or VLL is marked according to the <service-class, color> of the packet and the downstream priority mapping table.
The outer EXP is marked according to the service-class of the packet:
- if service-class = BE, then EXP = 0
- if service-class = AF1, then EXP = 1
- if service-class = AF2, then EXP = 2
- if service-class = AF3, then EXP = 3
- if service-class = AF4, then EXP = 4
- if service-class = EF, then EXP = 5
- if service-class = CS6, then EXP = 6
- if service-class = CS7, then EXP = 7
The inner EXP of VPLS is marked in the same way as the outer EXP.
6.4 MF Classification
MF classification can be based on the following items:
- 802.1p value in the inner VLAN tag
- Source MAC address
- Destination MAC address
- Protocol field encapsulated in Layer 2 headers
- Source IPv4 address (NOTE: The IP address pool is also supported.)
- Destination IPv4 address (NOTE: The IP address pool is also supported.)
- IPv4 fragments
- TCP/UDP source port number
- TCP/UDP destination port number
- Protocol number
- TCP synchronization flag
- Protocol number
- Source IPv6 address (NOTE: The IP address pool is also supported.)
- Destination IPv6 address (NOTE: The IP address pool is also supported.)
- TCP/UDP source port number
- TCP/UDP destination port number
- Source IPv4/IPv6 address
- Destination IPv4/IPv6 address
- IPv4 fragments
- TCP/UDP source port number
- TCP/UDP destination port number
- Protocol number
- TCP synchronization flag
- User-group
NOTE
In addition to the preceding items, an NE40E can perform MF classification based on VLAN IDs, but does not use the VLAN ID alone. Instead, the MF classification policy is bound to a VLAN ID (in the same way as it is bound to an interface). The MF classification modes shown in Table 1-1 support MF classification based on VLAN IDs.
In addition, an NE40E supports MF classification based on time periods for traffic control. This allows carriers to configure a policy for each time period so that network resources are optimized. For example, analysis of subscribers' usage habits shows that network traffic peaks from 20:00 to 22:00, during which large volumes of P2P and download services affect the normal use of other data services. Carriers can lower the bandwidths for P2P and download services during this time period to prevent network congestion.
Configuration example:
time-range test 20:00 to 22:00 daily
acl 2000
rule permit source 10.9.0.0 0.0.255.255 time-range test //Specify the time range during which the rule takes effect.
traffic classifier test
if-match acl 2000
interface xxx
traffic-policy test inbound
Figure 6-8 Relationships between an interface, traffic policy, traffic behavior, traffic
classifier, and ACL.
(Figure: an interface (port) references a traffic policy; the policy references classifier-behavior pairs; each classifier can reference ACLs, and each ACL contains rules)
(2) One or more classifier and behavior pairs can be configured in a traffic policy. One
classifier and behavior pair can be configured in different traffic policies.
(3) One or more if-match clauses can be configured for a traffic classifier, and each if-match
clause can specify an ACL. An ACL can be applied to different traffic classifiers and contains
one or more rules.
The if-match clauses in a traffic classifier are combined using the And or Or logic:
- And: Packets that match all the if-match clauses configured in a traffic classifier belong to this traffic classifier.
- Or: Packets that match any of the if-match clauses configured in a traffic classifier belong to this traffic classifier.
NOTE
If several ACL rules and if-match clauses are configured in a traffic classifier, the And logic requires that a packet match all the if-match clauses; matching any one rule in a referenced ACL satisfies that ACL's if-match clause.
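For example, the following sketch (classifier names are illustrative) shows both logics:

traffic classifier c-and operator and   //A packet must match ALL if-match clauses.
 if-match dscp ef
 if-match acl 2000
traffic classifier c-or operator or     //A packet matching ANY if-match clause matches the classifier.
 if-match dscp ef
 if-match acl 2000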
- If the traffic policy is in unshared mode, the two interfaces to which the traffic policy applies are restricted individually. On each interface, the bandwidths of TCP traffic, UDP traffic, and other traffic are restricted to 100 Mbit/s, 200 Mbit/s, and 300 Mbit/s, respectively.
- If the traffic policy is in shared mode, the two interfaces to which the traffic policy applies are restricted as a whole. The total bandwidths of TCP traffic, UDP traffic, and other traffic on the two interfaces are restricted to 100 Mbit/s, 200 Mbit/s, and 300 Mbit/s, respectively.
NOTICE
If a traffic policy works in shared mode, the interfaces to which the traffic policy is applied must be on the same network processor of the same board.
[Flowchart: the system gets the first traffic classifier and checks whether its if-match clauses are valid one by one; for an if-match clause that specifies an ACL, the rules are checked in order until a matching rule returns permit or deny. If no match is found, the next traffic classifier is checked.]
As shown in the figure, a packet is matched against traffic classifiers in the order in which
those classifiers are configured. If the packet matches a traffic classifier, no further match
operation is performed. If not, the packet is matched against the following traffic classifiers
one by one. If the packet matches no traffic classifier at all, the packet is forwarded with no
traffic policy executed.
If multiple if-match clauses are configured for a traffic classifier, the packet is matched
against them in the order in which they are configured. If an ACL or UCL is specified in an
if-match clause, the packet is matched against the multiple rules in the ACL or UCL. The
system first checks whether the ACL or UCL exists. (A non-existent ACL or UCL can be
applied to a traffic classifier.) If the packet matches a rule in the ACL or UCL, no further
match operation is performed.
A permit or deny action can be specified in an ACL for a traffic classifier to work with
specific traffic behaviors as follows:
l If the deny action is specified in an ACL, the packet that matches the ACL is denied,
regardless of what the traffic behavior defines.
l If the permit action is specified in an ACL, the traffic behavior applies to the packet that
matches the ACL.
NOTE
For traffic behavior mirroring or sampling, even if a packet matches a rule that defines a deny action, the
traffic behavior takes effect for the packet.
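The first-match logic and permit/deny handling described above can be modeled with a short Python sketch (illustrative names only, not NE40E code; each classifier here uses the And logic, and an ACL is an ordered list of rules):

def match_acl(acl_rules, packet):
    # Return "permit", "deny", or None; the first matching rule wins.
    for predicate, action in acl_rules:
        if predicate(packet):
            return action
    return None

def classify(classifiers, packet):
    # Classifiers are checked in configuration order; the first match wins.
    for name, clauses, behavior in classifiers:
        verdicts = []
        for clause in clauses:
            if isinstance(clause, list):          # clause specifies an ACL
                action = match_acl(clause, packet)
                if action == "deny":
                    return name, "discard"        # deny overrides the behavior
                verdicts.append(action == "permit")
            else:                                 # simple if-match predicate
                verdicts.append(clause(packet))
        if verdicts and all(verdicts):
            return name, behavior
    return None, None    # no classifier matched: forward with no policy executed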
One traffic policy (parent policy) can cascade over multiple traffic policies (child policies), and one traffic policy (child policy) can be cascaded by multiple traffic policies (parent policies). However, traffic policies cannot be cascaded circularly or nested further.
When a two-level traffic policy instance is formed and the action of the traffic behavior in the
parent policy is the same as that of the traffic behavior in the child policy, the action of the
traffic behavior in the child policy is implemented.
NOTE
The same action configuration refers to the same action type. Even if the parameters are different, the
actions of the same type are considered the same action configuration. In this case, the action of the
traffic behavior in the child policy is implemented.
When the traffic behaviors for the parent and child policies are both service-class, service-class in the
parent policy preferentially takes effect. However, if service-class in the parent policy carries no-
remark, service-class in the child policy preferentially takes effect.
Hierarchical CAR
When a two-level traffic policy instance is created and the actions of the traffic behaviors in
both policies are CAR, both CAR configurations take effect. In addition, the CAR of the child
policy is implemented before the CAR of the parent policy. This is hierarchical CAR.
For example, the overall rate of the 1.1.1/24 network segment is set to 5 Mbit/s, and the rates
of the IP addresses 1.1.1.1/32 and 1.1.1.2/32 on the 1.1.1/24 network segment need to be
separately restricted and are set to 1 Mbit/s and 3 Mbit/s, respectively.
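As a rough illustration of the evaluation order (hypothetical Python helper only; rates are in bit/s, and a real device meters rates with token buckets rather than instantaneous values):

HOST_CIR = {"1.1.1.1": 1_000_000, "1.1.1.2": 3_000_000}   # child CAR per host
SEGMENT_CIR = 5_000_000                                    # parent CAR for 1.1.1.0/24

def hierarchical_car(src_ip, host_rate_bps, segment_rate_bps):
    # The child (per-host) CAR is checked before the parent (per-segment) CAR.
    if host_rate_bps > HOST_CIR.get(src_ip, float("inf")):
        return "drop"
    if segment_rate_bps > SEGMENT_CIR:
        return "drop"
    return "forward"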
6.4.3 QPPB
QoS Policy Propagation on BGP (QPPB) is a special Multi-Field (MF) classification
application.
Background
The following example uses the network shown in Figure 6-10 to illustrate how QPPB is
introduced. In this networking, the AS 400 is a high priority network. All packets transmitted
across AS 400 must be re-marked with an IP precedence for preferential transmission. To
meet such requirements, edge nodes (Node-A, Node-B, and Node-C) in AS 100 must be
configured to re-mark the IP precedence of packets destined for or sent from AS 400. The
edge interface connecting to AS 400 on Node-C must be configured to re-mark packets.
Node-A or Node-B must be configured to perform traffic classification for packets destined for an IP address in AS 400. If a large number of IP addresses or address segments are configured in AS 400, Node-A and Node-B must perform an excessive number of traffic classification operations. In addition, if the network topology is prone to changes, a large number of configuration modifications are required.
Figure 6-10 QPPB networking (Node-A, Node-B, and Node-C are edge nodes in AS 100, which connects to AS 200, AS 300, and AS 400)
To simplify configuration on Node-A and Node-B, QPPB is introduced. QPPB allows packets
to be classified based on AS information or community attributes.
QPPB, as the name implies, applies QoS policies using Border Gateway Protocol (BGP). The primary advantage of QPPB is that BGP route attributes can be set for traffic classification by the route sender, while the route receiver only needs to configure an appropriate route receiving policy. The route receiver sets QoS parameters for packets matching the BGP route attributes and then implements corresponding traffic behaviors before data forwarding. When the network topology changes, the BGP route receiver does not need to modify local configurations as long as the route attributes of the advertised BGP routes do not change.
Implementation
As shown in Figure 6-11, Node-A and Node-C are IBGP peers in AS 100. Node-A is
configured to re-mark IP precedence for packets destined for or sent from AS 400. The QPPB
implementation is as follows:
Figure 6-11 QPPB implementation (1. Node-C configures BGP attributes for AS 400 routes; 2. the routes advertised to BGP peers carry the BGP attributes; 3. if received routes match the BGP attributes, Node-A sets behavior IDs in the FIB; 4. Node-A performs the traffic behavior associated with the behavior ID)
1. The BGP route sender (Node-C) sets specific attributes for BGP routes (such as the
AS_Path, community attributes, and extended community attributes).
2. Node-C advertises these BGP routes.
3. The BGP route receiver (Node-A) presets attribute entries. After receiving BGP routes
matching the attribute entries, the BGP route receiver sets a behavior ID identifying a
traffic behavior in the forwarding information base (FIB) table.
4. Before transmitting packets, Node-A obtains the behavior IDs of the routes from the FIB
for these packets and performs the corresponding traffic behaviors for these packets.
The preceding process demonstrates that QPPB does not transmit the QoS policy along with
the BGP route information. The route sender sets route attributes for routes to be advertised,
and the route receiver sets the QoS policy based on the route attributes of the destination
network segment.
Figure 6-12 Inter-AS packet classification using QPPB (Node-A, Node-B, and Node-C in AS 100 connect AS 200, AS 300, and AS 400; the route sender sets community attributes, and the route receiver classifies traffic using if-match community and applies CAR)
As shown in Figure 6-12, QPPB allows the edge devices in AS 100 to classify inter-AS
packets. For example, to configure rate limit on Node-C for packets transmitted between AS
200 and AS 400, perform the following operations:
l For packets from AS 200 to AS 400, apply source address-based QPPB on all Node-C's
interfaces that belong to AS 100.
l For packets from AS 400 to AS 200, apply destination address-based QPPB on the
Node-C's interface connecting to AS 400.
NOTICE
FIB-based packet forwarding applies to upstream traffic but not downstream traffic.
Therefore, QPPB is enabled on the upstream interface of traffic.
Figure 6-13 QPPB application in a BGP/MPLS VPN (CEs in VPN2 connect to PEs across the MPLS backbone)
As shown in Figure 6-13, PEs connect to multiple VPNs. A PE can set route attributes, such
as community, for a specified VPN instance before advertising any route. After receiving the
routing information, the remote peer imports the route and the associated QoS parameters to
the FIB table. This enables the traffic from CEs to be forwarded based on the corresponding
traffic behaviors. In this manner, different VPNs can be provided with different QoS
guarantees.
Figure 6-14 QPPB for user-to-ISP traffic accounting
As shown in Figure 6-14, QPPB is implemented as follows for user-to-ISP traffic accounting:
l BGP routes are advertised with community attributes.
l BGP routes are imported and the community attributes of the BGP routes are matched
against attribute entries. Behavior IDs are set in the FIB table for the routes matching the
attribute entries.
l A QPPB policy is configured. A corresponding traffic behavior (such as statistics
collection, CAR, and re-marking) is configured for the qos-local-id (Behavior ID).
l Destination address-based QPPB is enabled for incoming traffic.
l The QPPB policy is applied to incoming traffic on the user-side interface.
l During packet forwarding, the Behavior ID (qos-local-id) is obtained for packets based
on the destination IP address, and the corresponding traffic behavior is performed.
Figure 6-15 QPPB for ISP-to-user traffic accounting (user A connects through the local ISP at 1.1.1.11 with BGP community attribute 100:11; the inter-nation gateway is at 1.1.1.10; the ISP of nation X at 1.1.1.12 uses community attribute 100:12, and the ISP of nation Y at 1.1.1.13 uses 100:13; source address-based QPPB is enabled on the interfaces, and a QPPB policy is applied)
The destination-to-behavior-ID mapping is as follows:
Destination: Local ISP — Behavior ID: 10
Destination: Domestic inter-domain ISP — Behavior ID: 11
Destination: ISP of nation X — Behavior ID: 12
Destination: ISP of nation Y — Behavior ID: 13
As shown in Figure 6-15, QPPB is implemented as follows for ISP-to-user traffic accounting:
l BGP routes are advertised with community attributes.
l BGP routes are imported and the community attributes of the BGP routes are matched
against attribute entries. Behavior IDs are set in the FIB table for the routes matching the
attribute entries.
l A QPPB policy is configured. A corresponding traffic behavior (such as statistics
collecting, CAR, and re-marking) is configured for the qos-local-id (Behavior ID).
l Source address-based QPPB is enabled for incoming traffic.
l The QPPB policy is applied to outgoing traffic on the user-side interface.
l During packet forwarding, the Behavior ID (qos-local-id) is obtained for packets based
on the source IP address, and the corresponding traffic behavior is performed.
qppb local-policy policyA
qos-local-id 10 behavior b10
qos-local-id 11 behavior b11
qos-local-id 12 behavior b12
qos-local-id 13 behavior b13
7.1.1 Overview
Traffic policing controls the rate of incoming packets to ensure that network resources are properly allocated. If the traffic rate of a connection exceeds the specifications on an interface, traffic policing allows the interface to drop excess packets or re-mark the packet priority to maximize network resource usage and protect carriers' profits. An example of this process is restricting the rate of HTTP packets to 50% of the network bandwidth.
Traffic policing implements the QoS requirements defined in the service level agreement
(SLA). The SLA contains parameters, such as the Committed Information Rate (CIR), Peak
Information Rate (PIR), Committed Burst Size (CBS), and Peak Burst Size (PBS) to monitor
and control incoming traffic. The device performs Pass, Drop, or Markdown actions for the
traffic exceeding the specified limit. Markdown means that packets are marked with a lower
service class or a higher drop precedence so that these packets are preferentially dropped
when traffic congestion occurs. This measure ensures that the packets conforming to the SLA
can have the services specified in the SLA.
Traffic policing uses committed access rate (CAR) to control traffic. CAR uses token buckets
to meter the traffic rate. Then preset actions are implemented based on the metering result.
These actions include:
l Pass: forwards the packets conforming to the SLA.
l Discard: drops the packets exceeding the specified limit.
l Re-mark: re-marks the packets whose traffic rate is between the CIR and PIR with a
lower priority and allows these packets to be forwarded.
[Figure: tokens are added to the bucket at a set rate; tokens exceeding the token bucket capability (depth) overflow and are dropped.]
NOTE
A token bucket measures traffic but does not filter packets or perform any action, such as dropping
packets.
As shown in Figure 7-2, when a packet arrives, the device attempts to obtain enough tokens from the token bucket to transmit the packet. If the token bucket does not have enough tokens, the packet either waits for enough tokens or is discarded. This limits packets to being sent at a rate less than or equal to the rate at which tokens are generated.
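The following Python sketch (illustrative, not device code) models this behavior: tokens accumulate at a fixed rate up to the bucket depth, and a packet is sent only when enough tokens are available:

import time

class TokenBucket:
    """Tokens accumulate at `rate` bytes/s up to `depth` bytes; overflow is dropped."""
    def __init__(self, rate_bytes_per_s, depth_bytes):
        self.rate = rate_bytes_per_s
        self.depth = depth_bytes
        self.tokens = depth_bytes            # the bucket starts full
        self.last = time.monotonic()

    def try_send(self, pkt_len_bytes):
        now = time.monotonic()
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_len_bytes:
            self.tokens -= pkt_len_bytes
            return True                      # enough tokens: transmit the packet
        return False                         # not enough tokens: wait or discard

bucket = TokenBucket(rate_bytes_per_s=125_000, depth_bytes=2000)  # 1 Mbit/s, depth 2000 bytes
print(bucket.try_send(1500))   # True: the bucket starts full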
The token bucket mechanism widely applies to QoS technologies, such as the committed
access rate (CAR), traffic shaping, and Line Rate (LR).
NOTE
This section only describes how to meter and mark packets using token buckets.
[Figure: srTCM — tokens are added at the CIR to bucket C (depth CBS); tokens overflowing bucket C are added to bucket E (depth EBS).]
The srTCM uses two token buckets, C and E, which both share the common rate CIR. The
maximum size of bucket C is the CBS, and the maximum size of bucket E is the EBS.
When the EBS is 0, no tokens are added to bucket E, so only bucket C is used for srTCM. When only bucket C is used, packets are marked either green or red. When the EBS is not 0, both token buckets are used, and packets are marked green, yellow, or red.
l CIR: the rate at which tokens are put into a token bucket. The CIR is expressed in bit/s.
l CBS: the committed volume of traffic that an interface allows to pass through, also the
depth of a token bucket. The CBS is expressed in bytes. The CBS must be greater than or
equal to the size of the largest possible packet entering a device.
l PIR: the maximum rate at which an interface allows packets to pass and is expressed in
bit/s. The PIR must be greater than or equal to the CIR.
l PBS: the maximum volume of traffic that an interface allows to pass through in a traffic
burst.
[Figure: trTCM — tokens are added to bucket P at the PIR (depth PBS) and to bucket C at the CIR (depth CBS); overflow tokens are dropped.]
Tc and Tp refer to the numbers of tokens in buckets C and P, respectively. The initial values of
Tc and Tp are respectively the CBS and PBS.
In Color-Blind mode, the following rules apply when a packet of size B arrives at time t:
l If Tp(t) – B < 0, the packet is marked red, and the Tc and Tp values remain unchanged.
l If Tp(t) – B ≥ 0 but Tc(t) – B < 0, the packet is marked yellow, and Tp is decremented
by B.
l If Tc(t) – B ≥ 0, the packet is marked green and both Tp and Tc are decremented by B.
In Color-Aware mode, the following rules apply when a packet of size B arrives at time t:
l If the packet has been pre-colored as green, and Tp(t) – B < 0, the packet is re-marked
red, and neither Tp nor Tc is decremented.
l If the packet has been pre-colored as green and Tp(t) – B ≥ 0 but Tc(t) – B < 0, the
packet is re-marked yellow, and Tp is decremented by B, and Tc remains unchanged.
l If the packet has been pre-colored as green and Tc(t) – B ≥ 0, the packet is re-marked
green, and both Tp and Tc are decremented by B.
l If the packet has been pre-colored as yellow and Tp(t) – B < 0, the packet is re-marked
red, and neither Tp nor Tc is decremented.
l If the packet has been pre-colored as yellow and Tp(t) – B ≥ 0, the packet is re-marked
yellow, and Tp is decremented by B and Tc remains unchanged.
l If the packet has been pre-colored as red, the packet is re-marked red regardless of what
the packet length is. The Tp and Tc values remain unchanged.
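These Color-Aware rules can be condensed into a small Python function (a sketch only; all names are illustrative, and token replenishment at the CIR and PIR is omitted):

def trtcm_color_aware(pre_color, b, tp, tc):
    """Re-mark a pre-colored packet of b bytes; tp/tc are the tokens (bytes) in buckets P and C."""
    if pre_color == "red" or tp - b < 0:
        return "red", tp, tc                 # neither Tp nor Tc is decremented
    if pre_color == "yellow" or tc - b < 0:
        return "yellow", tp - b, tc          # only Tp is decremented
    return "green", tp - b, tc - b           # both Tp and Tc are decremented

print(trtcm_color_aware("green", 1500, 2000, 1000))   # -> ('yellow', 500, 1000)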
7.1.3 CAR
What Is CAR
In traffic policing, committed access rate (CAR) is used to control traffic. CAR uses token
buckets to measure traffic and determines whether a packet is conforming to the specification.
CAR has the following two functions:
l Rate limit: Only packets allocated enough tokens are allowed to pass so that the traffic
rate is restricted.
l Traffic classification: Packets are marked internal priorities, such as the scheduling
precedence and drop precedence, based on the measurement performed by token
buckets.
CAR Process
l When a packet arrives, the device matches the packet against matching rules. If the
packet matches a rule, the router uses token buckets to meter the traffic rate.
l The router marks the packet red, yellow, or green based on the metering result. Red
indicates that the traffic rate exceeds the specifications. Yellow indicates that the traffic
rate exceeds the specifications but is within an allowed range. Green indicates that the
traffic rate is conforming to the specifications.
l The device drops packets marked red, re-marks and forwards packets marked yellow,
and forwards packets marked green.
CAR supports srTCM with single bucket, srTCM with two buckets, and trTCM. This section
provides examples of the three marking methods in Color-Blind mode. The implementation in
Color-Aware mode is similar to that in Color-Blind mode.
l TrTCM
This example uses a CIR of 1 Mbit/s, a PIR of 2 Mbit/s, and a CBS and PBS both of 2000
bytes. Buckets C and P are initially full of tokens.
– If the first packet arriving at the interface is 1500 bytes long, the packet is marked
green because the number of tokens in both buckets P and C is greater than the
packet length. Then the number of tokens in both buckets P and C decreases by
1500 bytes, with 500 bytes remaining.
– Assume that the second packet arriving at the interface after a delay of 1 ms is 1500
bytes long. Additional 250-byte tokens are put into bucket P (PIR x time period = 2
Mbit/s x 1 ms = 2000 bits = 250 bytes) and 125-byte tokens are put into bucket C
(CIR x time period = 1 Mbit/s x 1 ms = 1000 bits = 125 bytes). Bucket P now has
750-byte tokens, which are not enough for the 1500-byte second packet. Therefore,
the second packet is marked red, and the number of tokens in buckets P and C
remain unchanged.
– Assume that the third packet arriving at the interface after a delay of 1 ms is 1000
bytes long. Additional 250-byte tokens are put into bucket P (PIR x time period = 2
Mbit/s x 1 ms = 2000 bits = 250 bytes) and 125-byte tokens are put into bucket C
(CIR x time period = 1 Mbit/s x 1 ms = 1000 bits = 125 bytes). Bucket P now has
1000-byte tokens, which equals the third packet length. Bucket C has only 750-byte
tokens, which are not enough for the 1000-byte third packet. Therefore, the third
packet is marked yellow. The number of tokens in bucket P decreases by 1000
bytes, with 0 bytes remaining. The number of tokens in bucket C remains
unchanged.
– Assume that the fourth packet arriving at the interface after a delay of 20 ms is 1500
bytes long. Additional 5000-byte tokens are put into bucket P (PIR x time period =
2 Mbit/s x 20 ms = 40000 bits = 5000 bytes), but excess tokens over the PBS (2000
bytes) are dropped. Bucket P has 2000-byte tokens, which are enough for the 1500-
byte fourth packet. Bucket C has 750-byte tokens left, and additional 2500-byte
tokens are put into bucket C (CIR x time period = 1 Mbit/s x 20 ms = 20000 bits =
2500 bytes). This time 3250-byte tokens are destined for bucket C, but excess tokens
over the CBS (2000 bytes) are dropped. Bucket C then has 2000-byte tokens, which
are enough for the 1500-byte fourth packet. Therefore, the fourth packet is marked
green. The number of tokens in both buckets P and C decreases by 1500 bytes, with
500 bytes remaining.
packet length always exceeds the CBS, causing the packets to be marked red or yellow
even if the traffic rate is lower than 100 Mbit/s. This leads to an inaccurate CAR
implementation.
The Bucket depth (CBS, EBS or PBS) is set based on actual rate limit requirements. In
principle, the bucket depth is calculated based on the following conditions:
1. Bucket depth must be greater than or equal to the MTU.
2. Bucket depth must be greater than or equal to the allowed burst traffic volume.
Condition 1 is easy to meet. Condition 2 is difficult to evaluate directly, so the following formula is introduced:
Bucket depth (bytes) = Bandwidth (kbit/s) x RTT (ms)/8, where RTT refers to the round trip time and is set to 200 ms.
The following formulas are used for NE40Es:
l When the bandwidth is lower than or equal to 100 Mbit/s: Bucket depth (bytes) =
Bandwidth (kbit/s) x 1500 (ms)/8.
l When the bandwidth is higher than 100 Mbit/s: Bucket depth (bytes) = 100,000 (kbit/s) x
1500 (ms)/8.
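For illustration, the two formulas can be combined into one hypothetical Python helper:

def bucket_depth_bytes(bandwidth_kbps):
    """Bucket depth per the NE40E formulas above: the bandwidth term is capped at 100 Mbit/s."""
    return min(bandwidth_kbps, 100_000) * 1500 / 8

print(bucket_depth_bytes(2_000))    # 2 Mbit/s   -> 375,000 bytes
print(bucket_depth_bytes(400_000))  # 400 Mbit/s -> 18,750,000 bytes (capped)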
NOTICE
CAR calculates the bandwidth of packets based on the entire frame. For example, CAR counts the lengths of the frame header and CRC field, but not the preamble, inter-frame gap (IFG), or SFD of an Ethernet frame, in the bandwidth. A complete Ethernet frame consists of the following fields (lengths in bytes):
IFG (minimum 12), preamble (7), SFD (1), destination MAC address (6), source MAC address (6), type/length (2), payload (46 to 1500), and CRC (4).
As shown in Figure 7-7, a NE40E connects a wide area network (WAN) and a local area
network (LAN). The LAN bandwidth (100 Mbit/s) is higher than the WAN bandwidth (2
Mbit/s). When a LAN user attempts to send a large amount of data to a WAN, the NE40E at
the network edge is prone to traffic congestion. Traffic policing can be configured on the
NE40E at the network edge to restrict the traffic rate, preventing traffic congestion.
[Figures: a LAN (high-speed link) connects to a WAN (low-speed link) through an interface with a bandwidth of 2 Mbit/s; a user network accesses the network at an SLA rate of 256 kbit/s.]
As shown in Figure 7-9, traffic from the three users at 1.1.1.1, 1.1.1.2, and 1.1.1.3 converges on a NE40E. The SLA defines that each user can send traffic at a maximum rate of
256 kbit/s. However, burst traffic is sometimes transmitted. When a user sends a large amount
of data, services of other users may be affected even if they send traffic at a rate lower than
256 kbit/s. To resolve this problem, configure traffic classification and traffic policing based
on source IP addresses on the inbound interface of the device to control the rate of traffic sent
from different users. The device drops excess traffic when the traffic rate of a certain user
exceeds 256 kbit/s.
[Figure 7-9: users 1.1.1.1, 1.1.1.2, and 1.1.1.3 connect to the device, where traffic policing is applied before traffic enters the IP backbone and the Internet.]
NOTE
Multiple traffic policies must be configured on the inbound interface to implement different rate limits for data flows sent from different source hosts. The traffic policies take effect in the configuration order: the traffic policy configured first is the first to take effect after data traffic reaches the interface.
Figure 7-10 shows how traffic policing works with congestion avoidance to control traffic. In
this networking, four user networks connect to a NE40E at the ISP network edge. The SLA
defines that each user can send FTP traffic at a maximum rate of 256 kbit/s. However, burst
traffic is sometimes transmitted at a rate even higher than 1 Mbit/s. When a user sends a large
amount of FTP data, FTP services of other users may be affected even if they send traffic at a
rate lower than 256 kbit/s. To resolve this problem, configure class-based traffic policing on
each inbound interface of the NE40E to monitor the FTP traffic and re-mark the DSCP values
of packets. The traffic at a rate lower than or equal to 256 kbit/s is re-marked AF11. The
traffic at a rate ranging from 256 kbit/s to 1 Mbit/s is re-marked AF12. The traffic at a rate
higher than 1 Mbit/s is re-marked AF13. Weighted Random Early Detection (WRED) is
configured as a drop policy for these types of traffic on outbound interfaces to prevent traffic
congestion. WRED drops packets based on the DSCP values. Packets in AF13 are first
dropped, and then AF12 and AF11 in sequence.
[Figure 7-10: multiple user networks connect to the device at the ISP network edge, where interface-based traffic policing is applied before traffic enters the Internet.]
l For upstream traffic, only statistics about packets that have undergone a CAR operation can be collected. Statistics about the actual offered traffic and the packets dropped during CAR are not provided.
l For downstream traffic, only statistics about packets that have undergone a CAR operation can be collected. Statistics about the forwarded and dropped packets are not provided.
Carriers require statistics about traffic that has undergone CAR to analyze user traffic beyond the specifications, which provides a basis for persuading users to purchase higher bandwidth. Using the interface-based CAR statistics collection function, NE40Es can collect and record statistics about the upstream traffic before a CAR operation (the actual access traffic of an enterprise user or an Internet bar), as well as statistics about the forwarded and dropped downstream packets after a CAR operation.
Figure 7-11 Data transmission from the high-speed link to the low-speed link (traffic shaping is performed between a 1 Gbit/s LAN and a 2 Mbit/s WAN)
As shown in Figure 7-12, traffic shaping can be configured on the outbound interface of an upstream device to smooth irregular traffic so that it is transmitted at an even rate, preventing traffic congestion on the downstream device.
[Figure 7-12: shaped traffic is smoothed so that its rate does not exceed the CIR over time.]
On the router, tokens for traffic shaping are added at an interval calculated using the formula CBS/CIR, and the quantity of tokens added each time equals the CBS.
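For example, with assumed values of CIR = 1 Mbit/s and CBS = 2000 bytes, tokens are added every (2000 x 8)/1,000,000 = 16 ms, and 2000 bytes' worth of tokens are added each time.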
NOTE
On the router, the lengths of the frame header and CRC field are counted in the bandwidth for packets to which CAR applies, but are not counted in the bandwidth for packets that have undergone traffic shaping. For example, if the traffic shaping value is set to 23 Mbit/s for IPoE packets, the IP packets are transmitted at a rate of 23 Mbit/s with the lengths of the frame header and CRC field not counted.
In addition, whether the CBS can be modified in traffic shaping is determined by the product model, product version, and board type.
Traffic shaping is implemented for packets that have undergone queue scheduling and are leaving the queues. For details about queues and queue scheduling, see Congestion Management and Avoidance.
There are two traffic shaping modes: queue-based traffic shaping and interface-based traffic
shaping.
l Queue-based traffic shaping applies to each queue on an outbound interface.
– When packets have undergone queue scheduling and are leaving queues, the packets that do not need traffic shaping are forwarded, and the packets that need traffic shaping are measured against token buckets.
– After the measurement, if packets in a queue are transmitted at a rate conforming to the specifications, the packets in the queue are marked green and forwarded. If packets in a queue are transmitted at a rate exceeding the specifications, the packet that is leaving the queue is forwarded, but the queue is marked unscheduled and can be scheduled again only after new tokens are added to the token bucket. While the queue is marked unscheduled, more packets can still be put into the queue, but excess packets over the queue capacity are dropped. Therefore, traffic shaping allows traffic to be sent at an even rate but does not provide a zero-packet-loss guarantee.
– Figure 7-13 Queue-based traffic shaping (after scheduling, packets leaving a queue that require traffic shaping are measured against the token bucket; complying packets are forwarded)
NOTE
Table 7-1 illustrates this process with the following columns: No., time, packet length, tokens in bucket C before and after packet processing, processing result, and queue status before and after packet processing.
l Interface-based traffic shaping, also called line rate (LR), is used to restrict the rate at
which all packets (including burst packets) are transmitted. Interface-based traffic
shaping takes effect on the entire outbound interface, regardless of packet priorities.
Figure 7-14 shows how interface-based traffic shaping is implemented:
– When packets have undergone queue scheduling and are leaving queues, all queues are measured together against the token bucket.
– After the measurement, if the total packet rate conforms to the specifications, the packets are forwarded. If the packet rate on the interface exceeds the specification, the interface stops packet scheduling and resumes scheduling when enough tokens are available.
[Figure 7-14: packets leaving queues 1 to N are scheduled and measured together against a token bucket; complying packets are forwarded. If the packet rate on the interface exceeds the specification, the interface stops packet scheduling and resumes scheduling when enough tokens are available.]
NOTE
The principle of traffic shaping on an interface is the same as that of traffic shaping for queues and is
not described here.
[Figures: in a hub-spoke network, spoke branches connect to the ISP network at 1 Gbit/s, and the hub (headquarters) connects at 1 Gbit/s (or at 100 Mbit/s in the second scenario), making the hub link a congestion point.]
[Figure: last-mile access — STBs, PCs, and IP phones in residential/enterprise networks connect through a home gateway (HG) or modem and an Ethernet DSLAM to the BRAS/SR on the IP MAN backbone network using IPoE/PPPoE; on the last mile, port scheduling applies a shaping rate and share-shaping to the ef (VoIP), af2 (IPTV multicast), and af3 (HSI) queues.]
Even if the link connecting the user and the DSLAM is also an Ethernet link, the encapsulation overhead of the packets sent between the user and the DSLAM can exceed that on the user side of the BRAS or SR. For example, the Ethernet packet encapsulated on the BRAS or SR does not carry a VLAN tag, but the packet sent between the user and the DSLAM carries a single VLAN tag or double VLAN tags due to VLAN or QinQ encapsulation.
To resolve this problem, last-mile QoS can be configured on the BRAS or SR. Last-mile QoS
allows a device to calculate the length of headers to be added to packets based on the
bandwidth purchased by users and the bandwidth of the downstream interface on the DSLAM
for traffic shaping.
The BRAS or SR cannot automatically infer the total length of the packets encapsulated on the DSLAM, so compensation bytes must be configured. After compensation bytes are configured, if the DSLAM connects to the CPE through an Ethernet link, the BRAS or SR can infer the total length of the packet encapsulated on the DSLAM based on the length of the forwarded packet and the configured compensation bytes, and adjust the shaping rate accordingly.
The header lengths used for compensation are as follows (in bytes): PPP header: 2; Ethernet header: 14; VLAN header: 4; QinQ header: 8.
Difference
The following table lists the differences between traffic policing and traffic shaping.
Traffic policing: drops excess traffic over the specifications or re-marks such traffic with a lower priority; packet loss may result in packet retransmission.
Traffic shaping: buffers excess traffic over the specifications; packet loss, and therefore packet retransmission, rarely occurs.
[Figure: traffic congestion scenarios — bandwidth mismatching, where a 10 Mbit/s Ethernet feeds a 2 Mbit/s E1 WAN link and creates a congestion point, and the aggregation problem, where multiple data flows converge onto one link.]
Traffic congestion results not only from limited link bandwidth but also from any resource shortage, such as a shortage of available processing time, buffer space, or memory. Traffic congestion also occurs when traffic is not properly controlled and exceeds the capacity of the available network resources.
Location
As shown in Figure 8-3, traffic can be classified into the following based on the device
location and traffic forwarding direction:
l Upstream traffic on the user side
l Downstream traffic on the user side
l Upstream traffic on the network side
l Downstream traffic on the network side
Figure 8-3 Upstream and downstream traffic on the user and network sides
Generally, upstream traffic is not congested because it is not subject to traffic rate mismatch, traffic aggregation, or forwarding resource shortage. Downstream traffic, in contrast, is prone to traffic congestion.
Impacts
Traffic congestion has the following adverse impacts on network traffic:
l Traffic congestion intensifies delay and jitter.
l Overlong delays lead to packet retransmission.
l Traffic congestion reduces the throughput of networks.
l Intensified traffic congestion consumes a large number of network resources (especially storage resources). Improper resource allocation may cause resources to be locked and the system to go down.
Therefore, traffic congestion is the main cause of service deterioration. Since traffic
congestion prevails on the PSN network, traffic congestion must be prevented or effectively
controlled.
Solutions
A solution to traffic congestion is a must on every carrier network. A balance between limited
network resources and user requirements is required so that user requirements are satisfied
and network resources are fully used.
Congestion management and avoidance are commonly used to relieve traffic congestion.
l Congestion management provides means to manage and control traffic when traffic
congestion occurs. Packets sent from one interface are placed into multiple queues that
are marked with different priorities. The packets are sent based on the priorities.
Different queue scheduling mechanisms are designed for different situations and lead to
different results.
l Congestion avoidance is a flow control technique used to relieve network overload. By
monitoring the usage of network resources in queues or memory buffer, a device
automatically drops packets on the interface that shows a sign of traffic congestion.
NOTE
The Traffic Manager (TM) on the forwarding plane houses high-speed buffers for which all interfaces compete. To prevent traffic interruptions caused by an interface failing to obtain buffer resources for a long time, the system allocates a small buffer to each interface and ensures that each queue on each interface can use the buffer.
The TM puts received packets into the buffer and allows these packets to be forwarded in time when traffic is not congested. In this case, the period during which packets are stored in the buffer is at the μs level, and the delay can be ignored.
When traffic is congested, packets accumulate in the buffer and wait to be forwarded, and the delay increases greatly. The delay is determined by the buffer size for a queue and the output bandwidth allocated to the queue. The formula is as follows:
Delay of a queue = Buffer size for the queue/Output bandwidth for the queue
Each interface on a NE40E stores eight downstream queues, which are called class queues
(CQs) or port queues. The eight queues are BE, AF1, AF2, AF3, AF4, EF, CS6, and CS7.
The first in first out (FIFO) mechanism is used to transfer packets in a queue. Resources used
to forward packets are allocated based on the arrival order of packets.
Scheduling Algorithms
The commonly used scheduling algorithms are as follows:
FIFO
FIFO does not need traffic classification. As shown in Figure 8-4, FIFO allows the packets
that come earlier to enter the queue first. On the exit of a queue, FIFO allows the packets to
leave the queue in the same order as that in which the packets enter the queue.
SP
SP schedules packets strictly based on queue priorities. Packets in queues with a low priority
can be scheduled only after all packets in queues with a high priority have been scheduled.
As shown in Figure 8-5, three queues with a high, medium, and low priority respectively are
configured with SP scheduling. The number indicates the order in which packets arrive.
[Figure 8-5: packets 1 to 6 arrive in the high-, medium-, and low-priority queues; when leaving the queues, all packets in the high-priority queue are sent first, then those in the medium-priority queue, and finally those in the low-priority queue.]
When packets leave queues, the device forwards the packets in the descending order of
priorities. Packets in the higher-priority queue are forwarded preferentially. If packets in the
higher-priority queue come in between packets in the lower-priority queue that is being
scheduled, the packets in the high-priority queue are still scheduled preferentially. This
implementation ensures that packets in the higher-priority queue are always forwarded
preferentially. As long as there are packets in the high-priority queue, no other queue is served.
The disadvantage of SP is that the packets in lower-priority queues are not processed until all
the higher-priority queues are empty. As a result, a congested higher-priority queue causes all
lower-priority queues to starve.
RR
RR schedules multiple queues in ring mode. If the queue on which RR is performed is not
empty, the scheduler takes one packet away from the queue. If the queue is empty, the queue
is skipped, and the scheduler does not wait.
WRR
Compared with RR, WRR can set the weights of queues. During the WRR scheduling, the
scheduling chance obtained by a queue is in direct proportion to the weight of the queue. RR
scheduling functions the same as WRR scheduling in which each queue has a weight 1.
WRR configures a counter for each queue and initializes the counter based on the weight
values. Each time a queue is scheduled, a packet is taken away from the queue and transmitted, and the counter decreases by 1. When the counter becomes 0, the device stops
scheduling the queue and starts to schedule other queues with a non-0 counter. When the
counters of all queues become 0, all these counters are initialized again based on the weight,
and a new round of WRR scheduling starts. In a round of WRR scheduling, the queues with
the larger weights are scheduled more times.
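The counter mechanism can be sketched in Python as follows (illustrative only; each queue is a list of packets, and a weight here is the number of scheduling chances per round):

def wrr_cycle(queues, weights):
    """Run one full WRR cycle and return the packets in scheduling order."""
    sent = []
    counters = list(weights)                 # counters initialized from the weights
    while any(c > 0 and q for c, q in zip(counters, queues)):
        for i, q in enumerate(queues):
            if counters[i] > 0 and q:        # empty queues are simply skipped
                sent.append(q.pop(0))        # one packet is taken per visit
                counters[i] -= 1
    return sent

# Weights 2:1:1 -- the first queue is scheduled twice as often per cycle.
print(wrr_cycle([["a1", "a2", "a3"], ["b1", "b2"], ["c1"]], [2, 1, 1]))
# -> ['a1', 'b1', 'c1', 'a2']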
In an example, three queues with the weight 50%, 25%, and 25% respectively are configured
with WRR scheduling.
Packet 3 is taken from queue 1, with Count[1] = 1. Packet 6 is taken from queue 2, with
Count[2] = 0. Packet 9 is taken from queue 3, with Count[3] = 0.
l Fourth round of WRR scheduling:
Packet 4 is taken from queue 1, with Count[1] = 0. Queues 2 and 3 do not participate in
this round of WRR scheduling since Count [2] = 0 and Count[3] = 0.
Then, Count[1] = 0; Count[2] = 0; Count[3] = 0. The counters are initialized again:
Count[1] = 2; Count[2] = 1; Count[3] = 1.
Statistically, the number of times the packets in each queue are scheduled is in direct proportion to the weight of the queue: the higher the weight, the more times the queue is scheduled. If the interface bandwidth is 100 Mbit/s, the queue with the lowest weight can obtain a minimum bandwidth of 25 Mbit/s, preventing packets in the lower-priority queue from being starved, which could occur if SP scheduling were implemented.
During the WRR scheduling, the empty queue is directly skipped. Therefore, when the rate at
which packets arrive at a queue is low, the remaining bandwidth of the queue is used by other
queues based on a certain proportion.
l WRR schedules packets based on the number of packets. Therefore, each queue has no
fixed bandwidth. With the same scheduling chance, a long packet obtains higher
bandwidth than a short packet. Users are sensitive to the bandwidth. When the average
lengths of the packets in the queues are the same or known, users can obtain expected
bandwidth by configuring WRR weights of the queues; however, when the average
packet length of the queues changes, users cannot obtain expected bandwidth by
configuring WRR weights of the queues.
l Services that require a short delay cannot be scheduled in time.
DRR
The scheduling principle of DRR is similar to that of RR: RR schedules packets based on the packet number, whereas DRR schedules packets based on the packet length.
DRR configures a counter, Deficit, for each queue. The counters are initialized to the maximum number of bytes (the Quantum, generally the MTU of the interface) allowed in a round of DRR scheduling. Each time a queue is scheduled, if the length of the head packet is smaller than or equal to the Deficit, the packet is taken away from the queue, and the Deficit decreases by the packet length. If the packet length is greater than the Deficit, the packet is not sent, the Deficit value remains unchanged, and the system continues to schedule the next queue. After each round of scheduling, the Quantum is added to the Deficit of each queue, and a new round of scheduling starts. Unlike SP scheduling, DRR scheduling prevents packets in low-priority queues from being starved. However, DRR scheduling cannot set weights of queues and cannot schedule services requiring a low delay (such as voice services) in time.
MDRR
Modified Deficit Round Robin (MDRR) is an improved DRR algorithm. MDRR and DRR
implementations are similar. Unlike MDRR, DRR allows the Deficit to be a negative so that
long packets can be properly scheduled. In the next round of scheduling, however, this queue
will not be scheduled. When the counter becomes 0 or a negative, the device stops scheduling
the queue and starts to schedule other queues with a positive counter.
In an example, the MTU of an interface is 150 bytes. Two queues, Q1 and Q2, use MDRR scheduling. Multiple 200-byte packets are buffered in Q1, and multiple 100-byte packets are buffered in Q2. Figure 8-8 shows how MDRR schedules packets in these two queues.
[Figure 8-8: across six rounds, Q1 sends a 200-byte packet whenever its Deficit is positive; the Deficit then becomes negative (for example -150, -100, or -50 after scheduling), so Q1 stops being scheduled in the following round. Q2 sends a 100-byte packet in each round while its Deficit (150, 100, or 50 before scheduling) remains positive. When all Deficits in a round are smaller than or equal to 0, the initial value is added to every Deficit.]
As shown in Figure 8-8, after six rounds of MDRR scheduling, three 200-byte packets in Q1 and six 100-byte packets in Q2 are scheduled. The output bandwidth ratio of Q1 to Q2 is actually 1:1.
DWRR
Compared with DRR, Deficit Weighted Round Robin (DWRR) can set the weights of queues. DRR scheduling functions the same as DWRR scheduling in which each queue has a weight of 1.
DWRR configures a counter (Deficit) for each queue, which records the number of bytes the queue may still send in a round. The counters are initialized to Weight x MTU. Each time a queue is scheduled, a packet is taken away from the queue, and the counter decreases by the packet length. When the counter reaches 0 or below, the device stops scheduling the queue and starts to schedule other queues with a positive counter. When the counters of all queues reach 0 or below, all these counters are increased by Weight x MTU, and a new round of DWRR scheduling starts.
In an example, the MTU of an interface is 150 bytes. Two queues, Q1 and Q2, use DWRR scheduling. Multiple 200-byte packets are buffered in Q1, and multiple 100-byte packets are buffered in Q2. The weight ratio of Q1 to Q2 is 2:1. Figure 8-9 shows how DWRR schedules packets.
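A rough Python sketch of one DWRR round (illustrative only; packet lengths are in bytes, and the counter decreases by the packet length):

def dwrr_round(queues, weights, mtu=150):
    """One DWRR round over queues of packet lengths (bytes); weights are integers."""
    sent = []
    deficits = [w * mtu for w in weights]    # counters initialized to Weight x MTU
    for i, q in enumerate(queues):
        while q and deficits[i] >= q[0]:     # enough deficit for the head packet
            deficits[i] -= q[0]              # the counter decreases by the packet length
            sent.append((i, q.pop(0)))
    return sent

# Q1 (weight 2) buffers 200-byte packets; Q2 (weight 1) buffers 100-byte packets.
print(dwrr_round([[200, 200, 200], [100, 100, 100]], [2, 1]))
# One round: Q1 sends one 200-byte packet, Q2 sends one 100-byte packet (2:1 in bytes).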
WFQ
WFQ allocates bandwidths to flows based on the weight. In addition, to allocate bandwidths
fairly to flows, WFQ schedules packets in bits. Figure 8-10 shows how bit-by-bit scheduling
works.
[Figure 8-10: packets from weighted queues (for example, queue 2 and queue 3, each with a 25% weight) are scheduled bit by bit and then reassembled into packets before leaving the queues.]
The bit-by-bit scheduling mode shown in Figure 8-10 allows the device to allocate
bandwidths to flows based on the weight. This prevents long packets from preempting
bandwidths of short packets and reduces the delay and jitter when both short and long packets
wait to be forwarded.
The bit-by-bit scheduling mode, however, is an ideal model. A NE40E performs WFQ scheduling at a certain granularity, such as 256 bytes or 1 KB. Different boards support different granularities.
Advantages of WFQ:
l Different queues obtain the scheduling chances fairly, balancing delays of flows.
l Short and long packets obtain the scheduling chances fairly. If both short and long
packets wait in queues to be forwarded, short packets are scheduled preferentially,
reducing jitters of flows.
l The lower the weight of a flow is, the lower the bandwidth the flow obtains.
In the actual application, best effort (BE) flows can be put into the LPQ queue. When the
network is overloaded, BE flows can be limited so that other services can be processed
preferentially.
WFQ, PQ, and LPQ can be used separately or jointly for eight queues on an interface.
Scheduling Order
SP scheduling is implemented between PQ, WFQ, and LPQ queues. PQ queues are scheduled
preferentially, and then WFQ queues and LPQ queues are scheduled in sequence, as shown in
Figure 8-11. Figure 8-12 shows the detailed process.
[Figure 8-11: PQ queues 1 to i, WFQ queues 1 to j, and LPQ queues 1 to k are each shaped; SP scheduling is applied within the PQ group and the LPQ group, WFQ scheduling within the WFQ group, and SP scheduling among the three groups before packets reach the destination port.]
[Figure 8-12: if the PQ queues are not empty, a round of PQ scheduling is performed; otherwise, if the WFQ queues are not empty, a round of WFQ scheduling is performed; otherwise, if the LPQ queues are not empty, a round of LPQ scheduling is performed.]
l Packets in PQ queues are preferentially scheduled, and packets in WFQ queues are
scheduled only when no packets are buffered in PQ queues.
l When all PQ queues are empty, WFQ queues start to be scheduled. If packets are added
to PQ queues afterward, packets in PQ queues are still scheduled preferentially.
l Packets in LPQ queues start to be scheduled only after all PQ and WFQ queues are
empty.
Bandwidths are preferentially allocated to PQ queues to guarantee the peak information rate
(PIR) of packets in PQ queues. The remaining bandwidth is allocated to WFQ queues based
on the weight. If the bandwidth is not fully used, the remaining bandwidth is allocated to
WFQ queues whose PIRs are higher than the obtained bandwidth until the PIRs of all WFQ
queues are guaranteed. If any bandwidth is remaining at this time, the bandwidth resources
are allocated to LPQ queues.
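The allocation order just described can be sketched in Python (an illustrative model only: each queue's demand is capped by its PIR, bandwidths are in Mbit/s, and the WFQ input values in the usage line are assumptions):

def allocate(total, pq, wfq):
    """pq: list of (input_mbps, pir_mbps); wfq: list of (input_mbps, pir_mbps, weight)."""
    pq_out = [min(inp, pir) for inp, pir in pq]        # PQ queues are served first, up to their PIRs
    remaining = total - sum(pq_out)
    wfq_out = [0.0] * len(wfq)
    active = list(range(len(wfq)))
    while remaining > 1e-9 and active:
        wsum = sum(wfq[i][2] for i in active)
        grants = {i: remaining * wfq[i][2] / wsum for i in active}
        remaining = 0.0
        for i, grant in grants.items():
            cap = min(wfq[i][0], wfq[i][1])            # demand is capped by the queue's PIR
            taken = min(grant, cap - wfq_out[i])
            wfq_out[i] += taken
            remaining += grant - taken                 # surplus is redistributed by weight
        active = [i for i in active if wfq_out[i] + 1e-9 < min(wfq[i][0], wfq[i][1])]
    return pq_out, wfq_out, remaining                  # leftover bandwidth goes to LPQ queues

# Third example below: 100 Mbit/s interface, CS7/CS6 in PQ, EF..AF1 in WFQ with
# weights 5..1 and PIRs of 10 Mbit/s (inputs of 40 Mbit/s assumed for illustration).
print(allocate(100, [(15, 25), (30, 10)],
               [(40, 10, 5), (40, 10, 4), (40, 10, 3), (40, 10, 2), (40, 10, 1)]))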
CS7 (PQ): input bandwidth 65 Mbit/s, PIR 55 Mbit/s
CS6 (PQ): input bandwidth 30 Mbit/s, PIR 30 Mbit/s
l Then the first round of WFQ scheduling starts. The remaining bandwidth after PQ
scheduling is allocated to WFQ queues. The bandwidth allocated to a WFQ queue is
calculated using this formula: Bandwidth allocated to a WFQ queue = Remaining bandwidth x Weight of this queue/Sum of weights = 15 Mbit/s x Weight/15.
– Bandwidth allocated to the EF queue = 15 Mbit/s x 5/15 = 5 Mbit/s = PIR. The
bandwidth allocated to the EF queue is fully used.
– Bandwidth allocated to the AF4 queue = 15 Mbit/s x 4/15 = 4 Mbit/s < PIR. The
bandwidth allocated to the AF4 queue is exhausted.
– Bandwidth allocated to the AF3 queue = 15 Mbit/s x 3/15 = 3 Mbit/s < PIR. The
bandwidth allocated to the AF3 queue is exhausted.
– Bandwidth allocated to the AF2 queue = 15 Mbit/s x 2/15 = 2 Mbit/s < PIR. The
bandwidth allocated to the AF2 queue is exhausted.
– Bandwidth allocated to the AF1 queue = 15 Mbit/s x 1/15 = 1 Mbit/s < PIR. The
bandwidth allocated to the AF1 queue is exhausted.
l The bandwidth is exhausted, and BE packets are not scheduled. The output BE
bandwidth is 0.
The output bandwidth of each queue is as follows:
CS7 (PQ): input bandwidth 65 Mbit/s, PIR 55 Mbit/s, output bandwidth 55 Mbit/s
CS6 (PQ): input bandwidth 30 Mbit/s, PIR 30 Mbit/s, output bandwidth 30 Mbit/s
CS7 (PQ): input bandwidth 15 Mbit/s, PIR 25 Mbit/s
CS6 (PQ): input bandwidth 30 Mbit/s, PIR 10 Mbit/s
l Packets in the PQ queue are scheduled preferentially to ensure the PIR of the PQ queue.
After PQ scheduling, the remaining bandwidth is 75 Mbit/s (100 Mbit/s - 15 Mbit/s - 10
Mbit/s).
l Then the first round of WFQ scheduling starts. The remaining bandwidth after PQ
scheduling is allocated to WFQ queues. The bandwidth allocated to a WFQ queue is
calculated using this formula: Bandwidth allocated to a WFQ queue = Remaining bandwidth x Weight of this queue/Sum of weights = 75 Mbit/s x Weight/15.
– Bandwidth allocated to the EF queue = 75 Mbit/s x 5/15 = 25 Mbit/s < PIR. The
bandwidth allocated to the EF queue is fully used.
– Bandwidth allocated to the AF4 queue = 75 Mbit/s x 4/15 = 20 Mbit/s > PIR. The
AF4 queue actually obtains the bandwidth 10 Mbit/s (PIR). The remaining
bandwidth is 10 Mbit/s.
– Bandwidth allocated to the AF3 queue = 75 Mbit/s x 3/15 = 15 Mbit/s > PIR. The AF3 queue actually obtains the bandwidth 10 Mbit/s (PIR). The remaining bandwidth is 5 Mbit/s.
– Bandwidth allocated to the AF2 queue = 75 Mbit/s x 2/15 = 10 Mbit/s < PIR. The
bandwidth allocated to the AF2 queue is exhausted.
– Bandwidth allocated to the AF1 queue = 75 Mbit/s x 1/15 = 5 Mbit/s < PIR. The
bandwidth allocated to the AF1 queue is exhausted.
l The remaining bandwidth is 15 Mbit/s, which is allocated to the queues, whose PIRs are
higher than the obtained bandwidth, based on the weight.
– Bandwidth allocated to the EF queue = 15 Mbit/s x 5/8 = 9.375 Mbit/s. The sum of
bandwidths allocated to the EF queue is 34.375 Mbit/s, which is also lower than the
PIR. Therefore, the bandwidth allocated to the EF queue is exhausted.
– Bandwidth allocated to the AF2 queue = 15 Mbit/s x 2/8 = 3.75 Mbit/s. The sum of
bandwidths allocated to the AF2 queue is 13.75 Mbit/s, which is also lower than the
PIR. Therefore, the bandwidth allocated to the AF2 queue is exhausted.
– Bandwidth allocated to the AF1 queue = 15 Mbit/s x 1/8 = 1.875 Mbit/s. The sum
of bandwidths allocated to the AF1 queue is 6.875 Mbit/s, which is also lower than
the PIR. Therefore, the bandwidth allocated to the AF1 queue is exhausted.
l The bandwidth is exhausted, and the BE queue is not scheduled. The output BE
bandwidth is 0.
CS7 (PQ): input bandwidth 15 Mbit/s, PIR 25 Mbit/s, output bandwidth 15 Mbit/s
CS6 (PQ): input bandwidth 30 Mbit/s, PIR 10 Mbit/s, output bandwidth 10 Mbit/s
CS7 (PQ): input bandwidth 15 Mbit/s, PIR 25 Mbit/s
CS6 (PQ): input bandwidth 30 Mbit/s, PIR 10 Mbit/s
l Packets in the PQ queue are scheduled preferentially to ensure the PIR of the PQ queue.
After PQ scheduling, the remaining bandwidth is 75 Mbit/s (100 Mbit/s - 15 Mbit/s - 10
Mbit/s).
l Then the first round of WFQ scheduling starts. The remaining bandwidth after PQ
scheduling is allocated to WFQ queues. The bandwidth allocated to a WFQ queue is
calculated using this formula: Bandwidth allocated to a WFQ queue = Remaining bandwidth x weight of this queue/sum of weights = 75 Mbit/s x weight/15.
– Bandwidth allocated to the EF queue = 75 Mbit/s x 5/15 = 25 Mbit/s > PIR. The EF
queue actually obtains the bandwidth 10 Mbit/s (PIR). The remaining bandwidth is
15 Mbit/s.
– Bandwidth allocated to the AF4 queue = 75 Mbit/s x 4/15 = 20 Mbit/s > PIR. The
AF4 queue actually obtains the bandwidth 10 Mbit/s (PIR). The remaining
bandwidth is 10 Mbit/s.
– Bandwidth allocated to the AF3 queue = 75 Mbit/s x 3/15 = 15 Mbit/s > PIR. The AF3 queue actually obtains the bandwidth 10 Mbit/s (PIR). The remaining bandwidth is 5 Mbit/s.
– Bandwidth allocated to the AF2 queue = 75 Mbit/s x 2/15 = 10 Mbit/s = PIR. The
bandwidth allocated to the AF2 queue is exhausted.
– Bandwidth allocated to the AF1 queue = 75 Mbit/s x 1/15 = 5 Mbit/s < PIR. The
bandwidth allocated to the AF1 queue is exhausted.
l The remaining bandwidth is 30 Mbit/s, which is allocated, based on the weight, to the AF1 queue, the only queue whose PIR is higher than the bandwidth it has obtained. Therefore, the AF1 queue is allocated an additional 5 Mbit/s, reaching its PIR.
l The remaining bandwidth is 25 Mbit/s, which is allocated to the BE queue.
CS7 (PQ): input bandwidth 15 Mbit/s, PIR 25 Mbit/s, output bandwidth 15 Mbit/s
CS6 (PQ): input bandwidth 30 Mbit/s, PIR 10 Mbit/s, output bandwidth 10 Mbit/s
Tail Drop
Tail drop is the traditional congestion avoidance mechanism used to drop all newly arrived
packets when congestion occurs.
Tail drop causes TCP global synchronization. When TCP detects packet loss, it enters the slow-start state and probes the network by sending packets at a lower rate, which speeds up until packet loss is detected again. In the tail drop mechanism, all newly arrived packets are dropped when congestion occurs, causing all TCP sessions to simultaneously enter the slow-start state and slow down their transmission. All TCP sessions then restart their transmission at roughly the same time, congestion occurs again, causing another burst of packet drops, and all TCP sessions enter the slow-start state again. This behavior cycles constantly, severely reducing network resource usage.
WRED
WRED is a congestion avoidance mechanism used to drop packets before the queue
overflows. WRED resolves TCP global synchronization by randomly dropping packets to
prevent a burst of TCP retransmission. If a TCP connection reduces the transmission rate
when packet loss occurs, other TCP connections still keep a high rate for sending packets. The
WRED mechanism improves the bandwidth resource usage.
WRED sets lower and upper thresholds for each queue and defines the following rules:
l When the length of a queue is lower than the lower threshold, no packet is dropped.
l When the length of a queue exceeds the upper threshold, all newly arrived packets are
tail dropped.
l When the length of a queue ranges from the lower threshold to the upper threshold,
newly arrived packets are randomly dropped, but a maximum drop probability is set. The
maximum drop probability refers to the drop probability when the queue length reaches
the upper threshold. Figure 8-13 is a drop probability graph. The longer the queue, the
larger the drop probability.
As shown in Figure 8-14, the maximum drop probability is a%, the length of the current queue is m, and the drop probability of the current queue is x%. WRED assigns a random value i to each arriving packet (0 < i < 100) and compares the random value with the drop probability of the current queue. If the random value ranges from 0 to x, the newly arrived packet is dropped; if the random value ranges from x to 100, the newly arrived packet is not dropped.
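A minimal Python sketch of this drop decision (illustrative only; the queue length is in packets, probabilities are fractions, and the linear ramp between the thresholds is one common choice):

import random

def wred_should_drop(queue_len, lower, upper, max_p):
    """Decide whether a newly arrived packet is dropped."""
    if queue_len < lower:
        return False                       # below the lower threshold: never drop
    if queue_len >= upper:
        return True                        # above the upper threshold: tail drop
    # The drop probability grows from 0 at `lower` to max_p at `upper`.
    drop_p = max_p * (queue_len - lower) / (upper - lower)
    return random.random() < drop_p        # compare a random value with the probability

print(wred_should_drop(75, 50, 100, 0.20))  # dropped with probability 10%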
[Figure 8-14: drop probability plotted against actual queue length — the probability rises from 0 at the lower threshold to the maximum drop probability a% at the upper threshold; at queue length m the drop probability is x%, and an arriving packet is dropped when the random value i (a random value in the range [0, a]) is smaller than x.]
As shown in Figure 8-15, the drop probability of the queue with the length m (lower
threshold < m < upper threshold) is x%. If the random value ranges from 0 to x, the newly
arrived packet is dropped. The drop probability of the queue with the length n (m < n < upper
threshold) is y%. If the random value ranges from 0 to y, the newly arrived packet is dropped.
The range of 0 to y is wider than the range of 0 to x. There is a higher probability that the
random value falls into the range of 0 to y. Therefore, the longer the queue, the higher the
drop probability.
[Figure 8-15: drop probability plotted against actual queue length, where x% is the drop probability when the queue length is m, y% is the drop probability when the queue length is n (m < n), and a% is the configured maximum drop probability, which determines the random value range.]
As shown in Figure 8-16, the maximum drop probabilities of two queues, Q1 and Q2, are a% and b%, respectively. When the lengths of Q1 and Q2 are both m, the drop probabilities of Q1 and Q2 are x% and y%, respectively. If the random value ranges from 0 to x, the newly arrived packet in Q1 is dropped; if the random value ranges from 0 to y, the newly arrived packet in Q2 is dropped. The range of 0 to y is wider than the range of 0 to x, so there is a higher probability that the random value falls into the range of 0 to y. Therefore, when the queue lengths are the same, the higher the maximum drop probability, the higher the drop probability.
Figure 8-16 Drop probability change with the maximum drop probability (Q1 with maximum drop probability a% and Q2 with maximum drop probability b%; at the same queue length, their drop probabilities are x% and y%, with random value ranges [0, x] and [0, y], respectively)
You can configure WRED for each flow queue (FQ) and class queue (CQ) on Huawei routers.
WRED allows the configuration of lower and upper thresholds and drop probability for each
drop precedence. Therefore, WRED can allocate different drop probabilities to service flows
or even packets with different drop precedences in a service flow.
WRED applies to WFQ queues. WFQ queues share bandwidth based on the weight and are
prone to traffic congestion. Using WRED for WFQ queues effectively resolves TCP global
synchronization when traffic congestion occurs.
[Figure: WRED thresholds per drop precedence — red packets have the lowest lower and upper thresholds, yellow packets higher ones, and green packets the highest; the drop probability of each color rises toward 100% as the actual queue length approaches that color's upper threshold.]
The queue length cannot be set too small. If the length of a queue is too small, the buffer is
not enough even if the traffic rate is low. As a result, packet loss occurs. The shorter the
queue, the less the tolerance of burst traffic.
The queue length cannot be set too large either. If the length of a queue is too large, the delay increases along with it, especially when a TCP connection is set up: one end sends a packet to the peer end and waits for a response, and if no response is received within the timeout period, the TCP sender retransmits the packet. If a packet is buffered for a long time, it is no different from a dropped one.
Setting the queue length to 10 ms x output queue bandwidth is recommended for high-priority
queues (CS7, CS6, and EF); setting the queue length to 100 ms x output queue bandwidth is
recommended for low-priority queues.
When traffic is congested, packets accumulate in the buffer and wait to be forwarded, and the delay increases greatly. The interval from the time when a packet enters the buffer to the time when the packet is forwarded is called the buffer delay or queue delay.
The buffer delay is determined by the buffer size for a queue and the output bandwidth allocated to the queue. The formula is as follows:
Buffer delay = Buffer size for the queue/Output bandwidth for the queue
The buffer size is expressed in bytes, and the output bandwidth (also called the traffic shaping rate) is expressed in bit/s. Therefore, the preceding formula can also be expressed as follows:
Buffer delay = (Buffer size for the queue x 8)/Traffic shaping rate for the queue
As the format indicates, the larger the buffer size, the longer the buffer delay.
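The formula is easy to check numerically. The following Python sketch (illustrative values only) computes the buffer delay for the recommended buffer size of 10 ms x output bandwidth for a high-priority queue:

def buffer_delay_ms(buffer_bytes, shaping_rate_bps):
    # Buffer delay = (buffer size x 8) / traffic shaping rate, in milliseconds.
    return buffer_bytes * 8 / shaping_rate_bps * 1000

# Recommended buffer for a 100 Mbit/s high-priority queue: 10 ms x bandwidth.
ef_buffer_bytes = 0.010 * 100_000_000 / 8             # 125,000 bytes
print(buffer_delay_ms(ef_buffer_bytes, 100_000_000))  # 10.0 (ms)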
Severe jitters are mainly caused by the following two scenarios: 1. Route status on IP
networks frequently changes, causing packets to be transmitted through different routes. 2.
Packets are buffered on various nodes during traffic congestion, resulting in different delays.
Scenario 2 is commonly seen on live networks.
Jitters increase when packet delays become increasingly varied. If packet delays are
controlled at lower levels, jitters are then also controlled. Therefore, you can control jitters by
controlling delays. For example, if delays are controlled below 5 us, delay variations (jitters)
are definitely below 5 us.
As described in Impact of Queue Buffer on Delay, large buffer sizes increase buffer delays.
Controlling buffer sizes means control over packet delays.
8.5 HQoS
Hierarchical Quality of Service (HQoS) is a technology that uses a queue scheduling
mechanism to guarantee the bandwidth of multiple services of multiple users in the DiffServ
model.
Traditional QoS performs 1-level traffic scheduling. The device can distinguish services on an
interface but cannot identify users. Packets of the same priority are placed into the same
queue on an interface and compete for the same queue resources.
HQoS uses multi-level scheduling to distinguish user-specific or service-specific traffic and
provide differentiated bandwidth management.
[Figure: Scheduler and scheduled objects.]
Scheduler attribute: scheduling algorithm based on priority or weight. Scheduler behavior: selecting queues.
Queue (scheduled object) attributes: 1) priority/weight; 2) traffic shaping rate (PIR); 3) drop policy (tail drop/WRED). Queue behaviors: 1) entering a queue: based on tail drop or WRED, packets are dropped or allowed to enter the queue tail; 2) leaving a queue: packets are shaped and sent.
[Figure: HQoS scheduling tree. A root scheduler schedules branch/transit-node schedulers, which in turn schedule leaf-node queues.]
A scheduler can schedule multiple queues or schedulers. The scheduler can be considered a
parent node, and the scheduled queue or scheduler can be considered a child node. The parent
node is the traffic aggregation point of multiple child nodes.
Traffic classification rules and control parameters can be specified on each node to classify
and control traffic. Traffic classification rules based on different user or service requirements
can be configured on nodes at different layers. In addition, different control actions can be
performed for traffic on different nodes. This ensures multi-layer/user/service traffic
management.
HQoS Hierarchies
In HQoS scheduling, a single layer of transit nodes can be used to implement a three-layer scheduling architecture, or multiple layers of transit nodes can be used to implement a multi-layer scheduling architecture. In addition, two or more hierarchical scheduling models can be combined by mapping the packets output from one scheduling model to a leaf node in another scheduling model, as shown in Figure 8-19. This provides flexible scheduling options.
[Figure 8-19: Combining scheduling models. On the left, a three-level scheduling model (a scheduler over transit nodes over leaf-node queues); its output is mapped to a leaf node of a second model consisting of a root node, transit nodes, and leaf-node queues.]
[Figure: HQoS queue and scheduler hierarchy from FQs up to the destination port.]
l Destination port: scheduler attribute: scheduling algorithm (SP/WFQ).
l CQ (CS7, CS6, EF, AF4, AF3, AF2, AF1, BE): attributes: 1) priority/weight; 2) traffic shaping rate (PIR); 3) drop policy (tail drop/WRED).
l GQ: scheduler attributes: 1) scheduling algorithm (DRR+SP); 2) PIR.
l SQ: virtual queue attributes: 1) CIR; 2) PIR; EIR = PIR - CIR. Scheduler attribute toward FQs: scheduling algorithm (SP/WFQ).
l FQ (FQ1 to FQ8 per SQ): attributes: 1) priority/weight; 2) traffic shaping rate (PIR); 3) drop policy (tail drop/WRED).
– Three FQs are configured for the three services (VoIP, IPTV, and HSI).
– Altogether 20 SQs are configured for the 20 residential users. A CIR and a PIR are configured for each SQ.
– One GQ is configured for the whole building and corresponds to the 20 residential users. The PIR of the GQ is the total bandwidth of the 20 residential users. Each of the 20 residential users uses services individually, but their total bandwidth is restricted by the PIR of the GQ.
The hierarchy model is as follows:
– FQs are used to distinguish services of a user and control bandwidth allocation
among services.
– SQs are used to distinguish users and restrict the bandwidth of each user.
– GQs are used to distinguish user groups and control the traffic rate of twenty SQs.
FQs enable bandwidth allocation among services. SQs distinguish each user. GQs enable
the CIR of each user to be guaranteed and all member users to share the bandwidth.
Bandwidth exceeding the CIR is not guaranteed because users have not paid for it. The CIR must be guaranteed because users have purchased it. As shown in Figure 8-21, the CIR of each user is marked, and bandwidth is preferentially allocated to guarantee the CIR. Therefore, the CIR bandwidth cannot be preempted by burst traffic exceeding the committed service rates.
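The CIR-first behavior can be sketched in code. The following Python fragment is a simplification under stated assumptions: every SQ first receives its CIR (capped by its demand), and the leftover bandwidth under the GQ PIR is then granted in list order up to each SQ's PIR (the router itself distributes the excess by DRR among SQs rather than in order).

def allocate(gq_pir, sqs):
    """sqs: list of dicts with 'cir', 'pir', and 'demand' in Mbit/s."""
    # Pass 1: guarantee each SQ its CIR (bounded by actual demand).
    alloc = [min(sq["cir"], sq["demand"]) for sq in sqs]
    remaining = gq_pir - sum(alloc)
    # Pass 2: hand out the excess (EIR) up to each SQ's PIR.
    for i, sq in enumerate(sqs):
        if remaining <= 0:
            break
        extra = min(sq["pir"], sq["demand"]) - alloc[i]
        grant = min(extra, remaining)
        alloc[i] += grant
        remaining -= grant
    return alloc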
On NE40Es, HQoS uses different architectures to schedule upstream and downstream queues.
[Figure: Upstream HQoS scheduling model toward the switch fabric unit (SFU).]
l CQ: COS0 (CS7, CS6, EF), COS1 (AF4, AF3), COS2 (AF2, AF1), and COS3 (BE); scheduler attribute: SP/WFQ; default configuration, not to be changed.
l VOQs per target blade (TB): scheduler attribute: DRR; queue attributes: default configuration, not to be changed.
l SQ: virtual queue attributes: 1) CIR; 2) PIR; EIR = PIR - CIR; SQs are scheduled by DRR.
l FQ (FQ1 to FQ8): scheduler attribute: scheduling algorithm (SP/WFQ); queue attributes: 1) priority/weight; 2) traffic shaping rate (PIR); 3) drop policy (tail drop/WRED).
The scheduling path of upstream HQoS traffic is FQ -> SQ -> GQ, and then joins the non-
HQoS traffic for the following two-layer scheduling:
l Target Blade (TB) scheduling
TB scheduling is also called Virtual Output Queue (VOQ) scheduling.
At the crossroads shown in the following figure, three vehicles (a car, a pallet truck, and a carriage truck) arrive at crossing A, bound for crossings B, C, and D respectively. If crossing B is jammed at this time, the car cannot move ahead and blocks the pallet truck and carriage truck, even though crossings C and D are clear.
[Figure: A single lane at crossing A. Congestion at crossing B blocks the car, which in turn blocks the pallet truck and carriage truck bound for C and D.]
If three lanes, one for each of crossings B, C, and D, are set up at crossing A, the problem is resolved.
[Figure: Three destination-specific lanes at crossing A. The jam at crossing B delays only the car; the pallet truck and carriage truck pass freely to C and D.]
Upstream multicast traffic has not yet been replicated and therefore has no single destination. For this reason, multicast traffic is put into a separate VOQ. For unicast traffic, one VOQ is configured per destination board.
DRR is implemented among unicast VOQs and then between unicast and multicast VOQs.
l Class queue (CQ) scheduling
Four CQs, COS0 for CS7, CS6, and EF services, COS1 for AF4 and AF3 services,
COS2 for AF2 and AF1 services, and COS3 for BE services, are used for upstream
traffic.
SP scheduling applies to COS0, which is preferentially scheduled. WFQ scheduling
applies to COS1, COS2, and COS3, with the WFQ weight of 1, 2, and 4 respectively.
Users cannot modify the attributes of upstream CQs and schedulers.
Non-HQoS traffic directly enters four upstream CQs, without passing FQs. HQoS traffic
passes FQs and CQs.
The process of upstream HQoS scheduling is as follows:
1. Entering a queue: An HQoS packet enters an FQ. When a packet enters an FQ, the
system checks the FQ status and determines whether to drop the packet. If the packet is
not dropped, it enters the tail of the FQ.
2. Applying for scheduling: After entering the FQ, the packet reports the queue status
change to the SQ scheduler and applies for scheduling. The SQ scheduler reports the
queue status change to the GQ scheduler and applies for scheduling. Therefore, the
scheduling request path is FQ -> SQ -> GQ.
3. Hierarchical scheduling: After receiving a scheduling request, a GQ scheduler selects an
SQ, and the SQ selects an FQ. Therefore, the scheduling path is GQ -> SQ -> FQ.
4. Leaving a queue: After an FQ is selected, the packet at the front of the FQ leaves the queue and enters the VOQ tail. The VOQ reports the queue status change to the CQ scheduler and applies for scheduling. After receiving the request, the CQ scheduler selects a VOQ. The packet at the front of the VOQ leaves the queue and is sent to the switch fabric.
Therefore, the scheduling process is (FQ -> SQ -> GQ) + (VOQ -> CQ).
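The four steps can be condensed into a toy model. The following Python sketch (a hypothetical structure, not router code) represents the GQ -> SQ -> FQ hierarchy as nested dictionaries; scheduling walks down the tree and pops the packet at the head of the first eligible FQ, which would then join a VOQ:

# GQ -> SQ -> FQ -> list of buffered packets.
hqos = {"GQ1": {"SQ1": {"FQ_EF": [], "FQ_BE": []}}}

def enqueue(pkt, gq, sq, fq):
    # Steps 1-2: the packet joins the FQ tail; this status change is what
    # the SQ and GQ schedulers react to when selecting queues.
    hqos[gq][sq][fq].append(pkt)

def schedule():
    # Steps 3-4: the GQ selects an SQ, the SQ selects an FQ, and the head
    # packet leaves the FQ. Here "first non-empty" stands in for the real
    # SP/WFQ/DRR decisions made at each level.
    for sqs in hqos.values():
        for fqs in sqs.values():
            for q in fqs.values():
                if q:
                    return q.pop(0)  # next stop: the VOQ tail
    return None

enqueue("voice-pkt", "GQ1", "SQ1", "FQ_EF")
print(schedule())  # voice-pkt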
This function applies only to incoming traffic of user queues. The application scenario is the
same as that of common HQoS.
This function is implemented by replacing FQ queues with CAR token buckets. The
scheduling at other layers is the same as that of common HQoS. In this case, the forwarding
chip notifies the eTM chip of the status of the token bucket, and then the eTM chip adds
tokens and determines whether to discard or forward the packets.
[Figure: Downstream HQoS scheduling model (GQ -> SQ -> FQ).]
l GQ: scheduler attributes: 1) scheduling algorithm (DRR+SP); 2) PIR.
l SQ: virtual queue attributes: 1) CIR; 2) PIR; EIR = PIR - CIR.
l Scheduler toward FQs: scheduling algorithm (SP/WFQ).
l FQ (FQ1 to FQ8 per SQ): attributes: 1) priority/weight; 2) traffic shaping rate (PIR); 3) drop policy (tail drop/WRED).
Downstream TM scheduling includes the scheduling paths FQ -> SQ -> GQ and CQ ->
port. There are eight CQs for downstream traffic, CS7, CS6, EF, AF4, AF3, AF2, AF1,
and BE. Users can modify the queue parameters and scheduling parameters.
The process of downstream TM scheduling is as follows:
a. Entering a queue: An HQoS packet enters an FQ.
b. Applying for scheduling: The downstream scheduling application path is (FQ -> SQ
-> GQ) + (GQ -> destination port).
c. Hierarchical scheduling: The downstream scheduling path is (destination port ->
CQ) + (GQ -> SQ -> FQ).
d. Leaving a queue: After an FQ is selected, the packet at the front of the FQ leaves the queue and enters the CQ tail. The CQ forwards the packet to the destination port.
Non-HQoS traffic directly enters eight downstream CQs, without passing FQs.
[Figure: Mapping from FQs to CQs. HQoS traffic passes through FQs (FQ1 to FQ8, with a PIR per queue) and then enters CQs; non-HQoS traffic enters the CQs directly. FQ/CQ attributes: 1) priority/weight; 2) traffic shaping rate (PIR); 3) drop policy (tail drop/WRED).]
SQ bandwidth share:
l TM scheduling: Multiple SQs in the same GQ on different physical interfaces share the bandwidth.
l eTM scheduling: Multiple SQs in the same GQ on different subinterfaces, but not on different physical interfaces, share the bandwidth.
interface gigabitethernet1/0/0
 port-queue ef shaping 100M
interface gigabitethernet1/0/0.1
 user-queue cir 50m pir 50m flow-queue FQ
 //The flow-queue template FQ is assumed to shape EF traffic to 10 Mbit/s (template not shown).
interface gigabitethernet1/0/0.2
 //Note: user-queue and qos-profile are not configured on gigabitethernet1/0/0.2.
– For downstream TM scheduling, the traffic shaping rate configured using the port-
queue command determines the sum bandwidth of both HQoS and non-HQoS
traffic. Based on the preceding configuration:
n The rate of EF traffic sent from GE 1/0/0.1 does not exceed 10 Mbit/s.
n The rate of EF traffic sent from GE 1/0/0 (including GE 1/0/0, GE 1/0/0.1, and
GE 1/0/0.2) does not exceed 100 Mbit/s.
– For downstream eTM scheduling, the traffic shaping rate configured using the port-
queue command determines the sum bandwidth of non-HQoS traffic (default SQ
bandwidth). Based on the preceding configuration:
n The rate of EF traffic sent from GE 1/0/0 and GE 1/0/0.2 (non-HQoS traffic)
does not exceed 100 Mbit/s.
n The rate of EF traffic sent from GE 1/0/0.1 does not exceed 10 Mbit/s.
n The rate of EF traffic sent from GE 1/0/0 can reach a maximum of 110 Mbit/s.
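The contrast boils down to which traffic the port-queue shaping rate covers. The following Python lines restate the arithmetic; the 10 Mbit/s figure assumes that the flow-queue template FQ shapes EF traffic on GE 1/0/0.1 to 10 Mbit/s, as noted in the configuration above.

port_queue_ef = 100   # port-queue ef shaping, Mbit/s
sq_ef         = 10    # assumed EF shaping of the HQoS SQ on GE 1/0/0.1, Mbit/s

tm_max_ef  = port_queue_ef          # TM: caps HQoS and non-HQoS EF traffic together
etm_max_ef = port_queue_ef + sq_ef  # eTM: port-queue caps only non-HQoS traffic
print(tm_max_ef, etm_max_ef)        # 100 110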
Packets enter an FQ based on the service class. The packet at the front of the FQ then leaves the queue and enters a CQ based on the mapping.
The mapping from FQs to CQs can be in Uniform or Pipe mode.
l Uniform: The system defines a fixed mapping. Upstream scheduling uses the uniform
mode.
l Pipe: Users can modify the mapping. The original priorities carried in packets will not be
modified in pipe mode.
By default, in the downstream HQoS, the eight priority queues of an FQ and eight CQs are in
one-to-one mapping. In the upstream HQoS, COS0 corresponds to CS7, CS6, and EF, COS1
corresponds to AF4 and AF3, COS2 corresponds to AF2 and AF1, and COS3 corresponds to
BE.
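Expressed as lookup tables (a minimal Python sketch of the default mappings just described; the dictionary names are illustrative):

# Upstream: fixed Uniform-mode mapping onto the four upstream CQs.
UPSTREAM_CQ = {
    "CS7": "COS0", "CS6": "COS0", "EF": "COS0",
    "AF4": "COS1", "AF3": "COS1",
    "AF2": "COS2", "AF1": "COS2",
    "BE":  "COS3",
}

# Downstream: by default the eight FQ priorities map one-to-one to CQs.
DOWNSTREAM_CQ = {c: c for c in
                 ("CS7", "CS6", "EF", "AF4", "AF3", "AF2", "AF1", "BE")}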
Share Shaping
Share shaping, also called flow group queue (FGQ) shaping, implements traffic shaping for a group formed by two or more flow queues (FQs) within a subscriber queue (SQ). This ensures that the other services in the SQ can still obtain bandwidth.
For example, a user has HSI, IPTV, and VoIP services, and the IPTV services include IPTV unicast and multicast services. To guarantee the CIR of IPTV services and prevent them from preempting the bandwidth reserved for HSI and VoIP services, you can configure four FQs, one each for HSI, IPTV unicast, IPTV multicast, and VoIP services. As shown in Figure 8-25, share shaping is implemented for the IPTV unicast and multicast services as a group, and HQoS is then implemented for all services.
[Figure 8-25: Share shaping. The IPTV unicast and IPTV multicast FQs form a group shaped at a shared PIR; the VoIP and HSI FQs are scheduled separately.]
Currently, a maximum of two share shaping configurations can be configured for eight FQs
on each SQ, as shown by the first two modes in Figure 8-26. The third share shaping mode
shown in Figure 8-26 is not available on the NE40E.
[Figure 8-26: Share shaping modes. Example: SQ PIR = 100 Mbit/s; the SP-scheduled AF2 and AF1 queues form a share shaping group.]
Example 1: Assume that SP scheduling applies to all queues and that the PIR (100 Mbit/s) is guaranteed for the SQ. The input rate of the EF queue is 10 Mbit/s, and that of each other queue is 70 Mbit/s. Share shaping allocates bandwidth to the queues in either of the following modes:
Queue | Scheduling Algorithm | Input Bandwidth (Mbit/s) | PIR (Mbit/s) | Output, Mode A (Mbit/s) | Output, Mode B (Mbit/s)
EF | SP | 10 | 90 | 10 | 10
AF3 | SP | 70 | 90 | 70 | 70
AF2 | SP | 70 | 90 | 20 | 10
AF1 | SP | 70 | 90 | 0 | 10
BE | SP | 70 | Not configured | 0 | 0
Example 2: Assume that, in the setup of example 1, WFQ scheduling applies to the EF, AF3, AF2, and AF1 queues with a weight ratio of 1:1:1:2 (EF:AF3:AF2:AF1), and LPQ scheduling applies to the BE queue. The PIR of 100 Mbit/s is guaranteed for the SQ. The input rate of the EF and AF3 queues is 10 Mbit/s each, and that of each other queue is 70 Mbit/s. Share shaping allocates bandwidth to the queues in either of the following modes:
l Mode A: The WFQ scheduling applies to all queues.
First-round WFQ scheduling:
– The bandwidth allocated to the EF queue is calculated as follows: 1/(1 + 1 + 1 + 2) x 100 Mbit/s = 20 Mbit/s. The input rate of the EF queue, however, is only 10 Mbit/s, so the EF queue obtains 10 Mbit/s, and the remaining bandwidth is 90 Mbit/s.
– The bandwidth allocated to the AF3 queue is calculated as follows: 1/(1 + 1 + 1 + 2) x 100 Mbit/s = 20 Mbit/s. The input rate of the AF3 queue, however, is only 10 Mbit/s, so the AF3 queue obtains 10 Mbit/s, and the remaining bandwidth is 80 Mbit/s.
– The bandwidth allocated to the AF2 queue is calculated as follows: 1/(1 + 1 + 1
+ 2) x 100 Mbit/s=20 Mbit/s. Therefore, the AF2 queue obtains the 20 Mbit/s
bandwidth, and the remaining bandwidth becomes 60 Mbit/s.
– The bandwidth allocated to the AF1 queue is calculated as follows: 2/(1 + 1 + 1 + 2) x 100 Mbit/s = 40 Mbit/s. Therefore, the AF1 queue obtains the 40 Mbit/s bandwidth, and the remaining bandwidth becomes 20 Mbit/s.
Second-round WFQ scheduling:
– The bandwidth allocated to the AF2 queue is calculated as follows: 1/(1 + 2) x 20
Mbit/s = 6.7 Mbit/s.
– The bandwidth allocated to the AF1 queue is calculated as follows: 2/(1 + 2) x 20
Mbit/s = 13.3 Mbit/s.
No bandwidth is remaining, and the BE queue obtains no bandwidth.
l Mode B: The AF3 and AF1 queues, as a whole, are scheduled with the EF and AF2
queues using the WFQ scheduling. The weight ratio is calculated as follows: EF:
(AF3+AF1):AF2 = 1:(1+2):1 = 1:3:1.
First-round WFQ scheduling:
– The bandwidth allocated to the EF queue is calculated as follows: 1/(1 +3 + 1) x
100 Mbit/s=20 Mbit/s. The input rate of the EF queue, however, is only 10 Mbit/s.
Therefore, the EF queue actually obtains the 10 Mbit/s bandwidth, and the
remaining bandwidth is 90 Mbit/s.
– The bandwidth allocated to the AF3 and AF1 queues, as a whole, is calculated as
follows: 3/(1 + 3 + 1) x 100 Mbit/s = 60 Mbit/s. Therefore, the remaining
bandwidth becomes 30 Mbit/s. The 60 Mbit/s bandwidth allocated to the AF3 and
AF1 queues as a whole are further allocated to each in the ratio of 1:2. The 20
Mbit/s bandwidth is allocated to the AF3 queue. The input rate of the AF3 queue,
however, is only 10 Mbit/s. Therefore, the AF3 queue actually obtains the 10 Mbit/s
bandwidth, and the remaining 50 Mbit/s bandwidth is allocated to the AF1 queue.
– The bandwidth allocated to the AF2 queue is calculated as follows: 1/(1 +3 + 1) x
100 Mbit/s=20 Mbit/s. Therefore, the AF2 queue obtains the 20 Mbit/s bandwidth,
and the remaining bandwidth becomes 10 Mbit/s.
Second-round WFQ scheduling:
– The bandwidth allocated to the AF3 and AF1 queues as a whole is calculated as
follows: 3/(3 + 1) x 10 Mbit/s=7.5 Mbit/s. The 7.5 Mbit/s bandwidth, not exceeding
the share shaping bandwidth, can be all allocated to the AF3 and AF1 queues as a
whole. The PIR of the AF3 queue has been ensured. Therefore, the 7.5 Mbit/s
bandwidth is allocated to the AF1 queue.
– The bandwidth allocated to the AF2 queue is calculated as follows: 1/(3 + 1 ) x 10
Mbit/s = 2.5 Mbit/s.
No bandwidth is remaining, and the BE queue obtains no bandwidth.
The following table shows the bandwidth allocation results.
Queue | Scheduling Algorithm | Input Bandwidth (Mbit/s) | PIR (Mbit/s) | Output, Mode A (Mbit/s) | Output, Mode B (Mbit/s)
EF | SP | 10 | 90 | 10 | 10
AF3 | SP | 70 | 90 | 10 | 70
BE | SP | 70 | Not configured | 0 | 0
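The round-by-round WFQ arithmetic above generalizes to a simple water-filling loop. The following Python sketch (illustrative, with hypothetical names) repeats WFQ rounds over the queues that still have unmet demand and reproduces the Mode A figures; for Mode B, treat AF3 and AF1 as a single entry with weight 3 and split its grant 1:2 internally.

def wfq_allocate(total, queues):
    """queues: {name: (weight, demand)}; returns {name: allocation}."""
    alloc = {q: 0.0 for q in queues}
    active = {q for q, (w, d) in queues.items() if d > 0}
    remaining = total
    while remaining > 1e-9 and active:
        wsum = sum(queues[q][0] for q in active)
        still_hungry = set()
        for q in active:
            share = queues[q][0] / wsum * remaining
            take = min(share, queues[q][1] - alloc[q])
            alloc[q] += take
            if queues[q][1] - alloc[q] > 1e-9:
                still_hungry.add(q)
        remaining = total - sum(alloc.values())
        active = still_hungry
    return alloc

# Mode A: weights EF:AF3:AF2:AF1 = 1:1:1:2, demands 10/10/70/70 Mbit/s.
print(wfq_allocate(100, {"EF": (1, 10), "AF3": (1, 10),
                         "AF2": (1, 70), "AF1": (2, 70)}))
# {'EF': 10.0, 'AF3': 10.0, 'AF2': 26.7, 'AF1': 53.3} (approximately)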
[Figure: Mobile bearer network. 2G, 3G, and LTE base stations connect through RTN devices to an L3 gateway, which connects through interface GE1/0/0 to the IP backbone and the Internet.]
[Figure: Class-based HQoS hierarchy on GE1/0/0. For each base station (2G, 3G, or LTE), three FQs (video, voice, and other) feed one SQ; the SQs of the three base stations behind one RTN feed one GQ (for example, GQ: RTN1); the GQs are scheduled on port GE1/0/0.]
You can configure class-based HQoS to meet the preceding requirements. An interface serves two user groups (two RTNs), each user group has three users (three base stations), and each base station runs multiple services. The hierarchical architecture is configured as port -> RTN -> base station -> base station services, corresponding to the scheduling path port -> GQ -> SQ -> FQ.
l Profile-based HQoS
Traffic that enters through different interfaces can be scheduled in the same SQ.
Profile-based HQoS implements QoS scheduling management for access users by
defining various QoS profiles and applying the QoS profiles to interfaces. A QoS profile
is a set of QoS parameters (such as the queue bandwidth and flow queues) for a specific
user queue.
Profile-based HQoS supports upstream and downstream scheduling.
As shown in Figure 8-29, the router, as an edge device on an ISP network, accesses a local area network (LAN) through Eth-Trunk 1. The LAN houses 1000 users that have VoIP, IPTV, and common Internet services. Eth-Trunk 1.1000 accesses VoIP services; Eth-Trunk 1.2000 accesses IPTV services; Eth-Trunk 1.3000 accesses other services. The 802.1p value in the outer VLAN tag identifies the service type (802.1p value 5 for VoIP services and 802.1p value 4 for IPTV services). The VID in the inner VLAN tag, ranging from 1 to 1000, identifies the user. The outer VIDs of Eth-Trunk 1.1000 and Eth-Trunk 1.2000 are 1000 and 2000, respectively. It is required that the total bandwidth of each user be restricted to 120 Mbit/s, that the CIR be 100 Mbit/s, and that the bandwidth allocated to the VoIP and IPTV services of each user be 60 Mbit/s and 40 Mbit/s, respectively. Other services are not provided with any bandwidth guarantee.
[Figure 8-29: The router connects the LAN through Eth-Trunk 1 to the IP backbone and the Internet.]
You can configure profile-based HQoS to meet the preceding requirements. Only traffic
with the same inner VLAN ID enters the same SQ. Therefore, 1000 SQs are created.
Traffic with the same inner VLAN ID but different outer VLAN IDs enter different FQs
in the same SQ.
9 MPLS QoS
When the traffic rate exceeds the specification, requirements for services that are sensitive to QoS are not satisfied. Therefore, MPLS TE alone cannot provide the QoS guarantee.
Scheme 1: E-LSP
The EXP-Inferred-PSC LSP (E-LSP) scheme uses the 3-bit EXP value in an MPLS header to
determine the PHB of the packets. Figure 9-1 shows an MPLS header.
The EXP value can be copied from the DSCP or IP precedence in an IP packet or be set by
MPLS network carriers.
The label determines the forwarding path, and the EXP determines the PHB.
The E-LSP is applicable to networks that support no more than eight PHBs. The precedence field in an IP header also has three bits, the same length as the EXP field, so each precedence value in an IP header corresponds exactly to one EXP value in an MPLS header. The DSCP field in an IP header, however, has six bits, so multiple DSCP values map to a single EXP value. As the IETF standards define, the three left-most bits of the DSCP field (the CSCP value) correspond to the EXP value, regardless of what the three right-most bits are.
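The bit relationship can be stated in one line of code. A minimal Python sketch of the default DSCP-to-EXP mapping just described (the function name is illustrative):

def dscp_to_exp(dscp):
    # The three left-most DSCP bits (the CSCP) become the 3-bit EXP value;
    # the three right-most bits are ignored.
    return (dscp >> 3) & 0b111

print(dscp_to_exp(46))  # EF, 101110b  -> 5
print(dscp_to_exp(40))  # CS5, 101000b -> 5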
During traffic classification, the EXP value in an MPLS packet is mapped to a scheduling precedence and a drop precedence. Apart from traffic classification, QoS operations on an MPLS network, such as traffic shaping, traffic policing, and congestion avoidance, are implemented in the same manner as on an IP network.
[Figure: E-LSP. Packets on the same LSP enter different queues (for example, the BE and EF queues) according to their EXP values.]
When the MPLS packet is leaving the LSR, the scheduling precedence and drop precedence
are mapped back to the EXP value for further EXP-based operations on the network.
NOTE
For more details about the default mapping between the EXP value, service class, and color on NE40Es,
see 6.3.2 QoS Priority Mapping.
Scheme 2: L-LSP
The Label-Only-Inferred-PSC LSP (L-LSP) scheme uses labels to transmit PHB information.
The EXP field has only three bits, and therefore cannot be used alone to identify more than
eight PHBs. Instead, only the 20-bit label in an MPLS header can be used to identify more
than eight PHBs. The L-LSP is applicable to networks that support more than eight PHBs.
During packet forwarding, the label determines the forwarding path and scheduling behaviors
of the packets; the EXP carries the drop precedence. Therefore, the label and EXP both
determine the PHB. PHB information needs to be transmitted during LSP establishment. An L-LSP can carry a single-PHB flow, or a multi-PHB flow whose packets share the same scheduling behavior but have different drop precedences.
E-LSP: The EXP determines the PHB (including the drop precedence). Each LSP supports up to eight behavior aggregates (BAs).
L-LSP: The label and EXP determine the PHB. Each LSP supports only one BA.
[Figure: At the edge between IP and MPLS networks, the carrier decides whether to trust the IP DSCP (for example, DSCP = 40) or the MPLS EXP (for example, EXP = 2) of an IP over MPLS packet.]
Carriers need to determine whether to trust the CoS information in an IP or MPLS packet that is entering or leaving an MPLS network. Relevant standards define three modes for processing the CoS: Uniform, Pipe, and Short Pipe.
Uniform Mode
When carriers determine to trust the CoS value (IP precedence or DSCP) in a packet from an
IP network, the Uniform mode can be used. The MPLS ingress LSR copies the CoS value in
the packet to the EXP field in the MPLS outer header to ensure the same QoS on the MPLS
network. When the packet is leaving the MPLS network, the egress LSR copies the EXP
value back to the IP precedence or DSCP in the IP packet.
[Figure: Uniform mode. At the ingress node, the IP->MPLS push copies the CoS (IP DSCP = 40) to the EXP fields (EXP = 5). The EXP value is preserved across push and swap operations and at the penultimate node's pop. At the egress node, the MPLS->IP pop copies the EXP value back, leaving IP DSCP = 40.]
As its name implies, Uniform mode ensures the same priority of packets on the IP and MPLS
networks. Priority mapping is performed for packets when they are entering or leaving an
MPLS network. Uniform mode has disadvantages. If the EXP value in a packet changes on an
MPLS network, the PHB for the packet that is leaving the MPLS network changes
accordingly. In this case, the original CoS of the packet does not take effect.
[Figure: Uniform mode with EXP remarking. The EXP is changed from 5 to 6 inside the MPLS network; at the egress node, the new EXP (6) is copied back to the IP header, changing the DSCP from 40 to 48.]
Pipe Mode
When carriers determine not to trust the CoS value in a packet from an IP network, the Pipe
mode can be used. The MPLS ingress delivers a new EXP value to the MPLS outer header,
and the QoS guarantee is provided based on the newly-set EXP value from the MPLS ingress
to the egress. The CoS value is used only after the packet leaves the MPLS network.
[Figure: Pipe mode. The ingress node sets a new EXP value (EXP = 1) instead of copying the DSCP; the EXP is carried across the MPLS network, and the IP DSCP (46) remains unchanged end to end.]
In Pipe mode, the MPLS ingress does not copy the IP precedence or DSCP to the EXP field
for a packet that enters an MPLS network. Similarly, the egress does not copy the EXP value
to the IP precedence or DSCP for a packet that leaves an MPLS network. If the EXP value in
a packet changes on an MPLS network, the change takes effect only on the MPLS network.
When a packet leaves an MPLS network, the original CoS continues to take effect.
NOTE
In Pipe mode, the egress implements QoS scheduling for packets based on the CoS value defined by
carriers. The CoS value defined by carriers is relayed to the egress using the outer MPLS header.
Short Pipe Mode
[Figure: Short Pipe mode. The ingress node sets new EXP values (outer EXP = 1, inner EXP = 2) instead of copying the DSCP; the IP DSCP (40) remains unchanged end to end.]
In Pipe or Short Pipe mode, carriers can define a desired CoS value for QoS implementation
on the carriers' own network, without changing the original CoS value of packets.
The difference between Pipe mode and Short Pipe mode lies in the QoS marking for the
outgoing traffic from a PE to a CE. In Pipe mode, outgoing traffic is scheduled based on a
CoS value defined by carriers, whereas outgoing traffic uses the original CoS value in Short
Pipe mode, as shown in Figure 9-8.
[Figure 9-8: QoS marking direction. Flow direction: CE -> PE (IP network) -> MPLS network -> PE -> CE (IP network).]
The tunnel to which a PW is iterated may change after the PW bandwidth is configured. If the PW bandwidth does not meet requirements or SQ resources are insufficient, the PW may fail to be iterated to the tunnel and goes Down.
Implementing HQoS (L2VPN) at the Public Network Side Based on VPN + Peer PE
In an MPLS VPN, a bandwidth agreement may need to be reached between an operator's PE devices so that traffic between two PEs is restricted or guaranteed according to the agreement. To this end, HQoS at the public network side based on VPN + peer PE can be adopted.
As shown in Figure 9-9, bandwidth and class of service are specified for traffic between PEs
at the MPLS VPN side. For example, in VLL1, the specified bandwidth for traffic between
PE1 and PE2 is 30 Mbit/s, and higher priority services are given bandwidth ahead of lower
priority services.
NOTE
If, however, you need to implement bandwidth restriction rather than bandwidth guarantee at the network side, simply set the CIR to 0 and the PIR to the desired bandwidth.
Figure 9-9 Implementing HQoS at the public network side based on VLL + peer PE
[Figure 9-9: Flows 1 to 8 from CE1 are classified at PE1 and scheduled onto the port based on VLL + peer PE. VLL1 (PW1, PE1 to PE2 via P2) carries 30 Mbit/s toward CE2; VLL2 (PW2, PE1 to PE3 via P3) carries 20 Mbit/s toward CE3, CE4, and CE5.]
[Figure: Queue hierarchy. Service flows 1 to 8 enter flow queues 1 to 8; flow queues are mapped to user queues based on the PW (user queue 1 = VLL1 + PE1, user queue 2 = VLL2 + PE2); user queues are mapped to user group queues; user group queues are mapped to port queue 1 based on the outbound interface.]
If traffic is load-balanced among TE tunnels on peer PEs, all traffic that is load-balanced
undergoes priority scheduling and bandwidth restriction according to the traffic
scheduling procedure as shown in Figure 9-11.
NOTE
In this scenario, it is recommended that the TE tunnel that is configured with bandwidth resources be
adopted to achieve PE-to-PE bandwidth guarantee for traffic.
9.3.2 Application
End-to-End MPLS HQoS Solution
Figure 9-12 shows the procedure for implementing end-to-end MPLS HQoS.
[Figure 9-12: End-to-end MPLS HQoS. CE1 and CE3 connect to PE1; PW1 (VLL1) runs from PE1 through P2 to PE2, and PW2 (VLL2) runs from PE1 through P3 to PE3 toward CE4. Interface-based QoS attributes for incoming/outgoing packets are configured on the CE-side interfaces of the PEs.]
On the CE-side interfaces of PEs, interface-based QoS policies are configured to implement
QoS enforcement on packets that are received from CEs or sent to CEs.
On the ingress PE (PE1), QoS policies are configured based on VLL/VLL instance + peer PE for packets sent to the public network side. In addition, to deliver end-to-end QoS guarantees, a TE tunnel with allocated bandwidth can be used to carry VLL traffic. On the PEs, QPPB can be configured to propagate QoS policies, and the MPLS DiffServ model can be configured so that, for MPLS VPN services, the private network and the public network, both configured with the DiffServ QoS model, can interwork.
On the P nodes, QoS policies are enforced based on the interface or TE tunnel, without distinguishing between VLL and non-VLL services.
10 Multicast Virtual Scheduling
10.1 Introduction
10.2 Principles
10.3 Applications
10.1 Introduction
Definition
Multicast virtual scheduling is a traffic scheduling mechanism for subscribers who demand
multicast programs. After a subscriber joins a multicast group, if multicast traffic needs to be
copied based on the multicast VLAN or if the replication point is a downstream device, the
bandwidth for the unicast traffic of the subscriber is adjusted accordingly. As a result,
bandwidths for the unicast traffic and multicast traffic of the subscriber are adjusted in a
coordinated manner.
Purpose
Multicast virtual scheduling is a subscriber-level traffic scheduling mechanism. It adjusts the bandwidths
for the unicast traffic and multicast traffic of a subscriber in a coordinated manner without
changing the total bandwidth of the subscriber, thus ensuring the quality of BTV services of
the subscriber.
As shown in Figure 10-1, a family views multicast programs (multicast data) through a Set
Top Box (STB) and browses the Internet (unicast data) through a PC. For example, the
maximum bandwidth for the family is 3 Mbit/s. The Internet service occupies all the 3 Mbit/s
bandwidth, and then the user demands a multicast program requiring a bandwidth of 2 Mbit/s
through the STB.
As the multicast data and unicast data require the bandwidth of 5 Mbit/s in total, data
congestion will occur in the access network, and some packets will be discarded. Therefore,
the quality of the multicast program cannot be ensured.
The multicast virtual scheduling can solve the problem shown in Figure 10-1. The router is
configured with the multicast virtual scheduling feature. When the sum of the multicast traffic
and unicast traffic received by a user is greater than the bandwidth assigned to the user, the
router reduces the bandwidth for unicast traffic of the user to 1 Mbit/s to meet the requirement
of bandwidth for multicast traffic. Therefore, the multicast program can be played normally.
10.2 Principles
As shown in Figure 10-2, the maximum bandwidth for traffic from the DSLAM to the subscriber is 3 Mbit/s. Assume that the subscriber uses up the 3 Mbit/s of bandwidth for the unicast traffic service and then demands a multicast program that requires 2 Mbit/s of bandwidth. In this case, the total traffic required by the subscriber is 5 Mbit/s, much higher than the allowed 3 Mbit/s. As a result, the link between the DSLAM and the LAN switch is congested, and packets begin to be dropped. Because the DSLAM does not provide QoS treatment, packets are discarded at random; multicast packets are among those discarded, so the subscriber cannot receive quality service for the requested multicast program.
To ensure quality service for the requested multicast program, the BRAS needs to be configured to dynamically adjust the bandwidth for unicast traffic according to the bandwidth for multicast traffic. The DSLAM sends the subscriber's IGMP Report message through the subscriber's VLAN to the BRAS. After receiving the IGMP Report message, the BRAS reduces the bandwidth for the subscriber's unicast traffic to 1 Mbit/s, leaving the remaining 2 Mbit/s for the subscriber's multicast traffic. In this manner, quality service is ensured for the requested multicast program.
[Figure 10-2: Multicast virtual scheduling setup. An STB and an Internet user (PC) connect through a DSLAM and LAN switch to the device (BRAS) at interface1 (100.1.1.1/24); interface2 connects to the Internet; a RADIUS server attaches to the BRAS.]
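The adjustment itself is simple arithmetic. A minimal Python sketch of the behavior described above (names and signature are illustrative):

def adjust_unicast(total_bw, unicast_bw, multicast_bw):
    # If the requested multicast bandwidth no longer fits beside the
    # current unicast bandwidth, shrink the unicast share so that the
    # total stays within the subscriber's bandwidth.
    if unicast_bw + multicast_bw > total_bw:
        unicast_bw = max(total_bw - multicast_bw, 0)
    return unicast_bw

print(adjust_unicast(3, 3, 2))  # unicast reduced from 3 Mbit/s to 1 Mbit/s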
10.3 Applications
When the replication point of multicast traffic is not on the BRAS, multicast virtual
scheduling can be applied in the following two typical scenarios.
[Figure: Typical scenario. The STB and Internet user connect through the DSLAM and LAN switch to the device (interface1, 100.1.1.1/24), whose interface2 connects to the Internet.]
bandwidth of the subscriber. Then, Device B forwards the requested multicast traffic through
the multicast VLAN, and the downstream device copies the multicast traffic to the subscriber.
In addition to forwarding multicast data to the subscriber, Device B also forwards the
multicast data to Device A. Device A measures the received multicast data, and implements
multicast virtual scheduling based on the measurement result.
11 L2TP QoS
Purpose
In the L2TP service wholesale scenario, an L2TP Access Concentrator (LAC) is responsible for service wholesale, whereas an L2TP Network Server (LNS) is the service control point. An L2TP tunnel runs between the LAC and the LNS. Traffic transmitted over the L2TP tunnel needs to be controlled in a refined manner on the LNS, to minimize the impact on service quality caused by uncoordinated resource competition between the LAC and the LNS. In addition, an ISP can control the traffic that enters different service tunnels on the ISP network, preventing bursts from different users from affecting one another.
L2TP HQoS provides QoS scheduling for the traffic of LNS-connected users so as to control traffic in the L2TP tunnel in a refined manner.
11.2 Principles
11.2.1 Principles
Figure 11-1 L2TP networking diagram
[Remote users and a remote branch connect over PSTN/ISDN to the NAS (acting as the LAC); an L2TP tunnel runs from the NAS to the LNS, which connects to the Internet-side server.]
As shown in Figure 11-1, users need to log in to a private network through a Layer 2
network.
The LAC is a Layer 2 network device that can process PPP packets and support L2TP
functions. Usually it is an access device on the local Internet Service Provider (ISP) network.
The LAC is deployed between an LNS and a remote system (a remote user or a remote
branch).
The LAC performs traffic rate limiting, flow queue mapping, and traffic scheduling.
The LNS is the receiving end of a PPP session. Users authenticated by the LNS can log in to
the private network to access resources.
The LNS can also perform traffic rate limiting, flow queue mapping, and traffic scheduling.
The LNS supports the following QoS scheduling modes:
l Tunnel-specific scheduling
In this mode, services of each user are not differentiated, and therefore there is no need
to allocate a Subscriber Queue (SQ) for each user. Instead, an L2TP tunnel is allocated
an SQ.
l Session-specific scheduling
In this mode, each user is allocated an SQ and an L2TP tunnel is allocated a Group
Queue (GQ). Each user has one to eight Priority Queues (PQs) on which Strict Priority
(SP) scheduling or Weighted Fair Queue (WFQ) scheduling can be performed.
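The difference between the two modes is essentially how queues are allocated. A minimal Python sketch under stated assumptions (the identifiers are illustrative, not LNS internals):

def allocate_queues(mode, tunnel_id, session_ids):
    if mode == "tunnel":
        # Tunnel-specific: all sessions in the tunnel share one SQ.
        return {"SQ": [tunnel_id]}
    # Session-specific: one SQ per session, one GQ for the whole tunnel;
    # each session's PQs are scheduled by SP or WFQ inside its SQ.
    return {"GQ": [tunnel_id],
            "SQ": [(tunnel_id, s) for s in session_ids]}

print(allocate_queues("session", "tunnel-1", ["user-a", "user-b"]))
# {'GQ': ['tunnel-1'], 'SQ': [('tunnel-1', 'user-a'), ('tunnel-1', 'user-b')]}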
3G 3rd Generation
BA Behavior Aggregation
BC Bandwidth Control
BE Best-Effort
CE Customer Edge
CQ Class Queue
CR Core Router
CS Class Selector
CT Class Type
DF Don't Fragment
DS Differentiated Service
EF Expedited Forwarding
FL Flow Label
FQ Flow Queue
FR Frame Relay
GQ Group Queue
HG Home Gateway
IP Internet Protocol
IPinIP IP in IP Encapsulation
LR Line Rate
MF Multiple Field
MP Merge Point
PC Personal Computer
PE Provider Edge
PQ Priority Queuing
RR Round Robin
SP Strict Priority
SQ Subscriber Queue
SR Service Router
TB Target Blade
TC Traffic Class
TE Traffic Engineering
TLV Type-Length-Value
TM Traffic Manager
TP Traffic Policing
VC Virtual Circuit
VE Virtual Ethernet
VI Virtual Interface
VP Virtual Path