Carrier Ethernet
Service Level Agreement Support Tools
Abstract
The growth in popularity of Business Ethernet services is closely linked to the level of maturity Ethernet technology has reached, enabling carriers to deliver and audit hard service level agreement (SLA) guarantees that satisfy exacting requirements from enterprise users. Carriers and service providers deploying Business Ethernet VPNs must be prepared to ensure measurable and enforceable SLAs that detail commitments for user traffic handling, availability and performance guarantees, among others. Focusing on a Layer 2 VPN use case, this application guide reviews the various service delivery and service assurance support mechanisms that carriers and telecom providers can utilize to ensure service reliability, measurable KPIs (key performance indicators) and SLA commitments.
Contents
1 Business Ethernet Services and the Evolution of Carrier SLAs
  1.1 Business Ethernet SLA support tools
2 SLA and Service Description
  2.1 Layer 2 VPN use case
  2.2 Service description
  2.3 Traffic Mapping
  2.4 Bandwidth Commitments
      Effective Throughput
  2.5 Performance Guarantees
  2.6 Layer 2 Control Protocol Processing
  2.7 Service Availability, Response and Repair Time
3 Service Delivery
  3.1 Classification
  3.2 Metering and Policing
  3.3 Hierarchical Scheduling Level 0
  3.4 Shaping
  3.5 Hierarchical Scheduling Level 1
  3.6 Packet Editing and Marking
4 Service Assurance
  4.1 Critical Service Test Points
  4.2 Service Validation Tests
      Connectivity verification
      Fault detection and diagnostic loopbacks
      Performance monitoring
      Throughput measurements (RFC 2544)
Conclusion
Naturally, service quality and assurance are prerequisites for the enterprise market, a fact that is clearly recognized by telecom providers, who consider service level agreements important to their strategy for winning corporate business [2]. Before they migrate all their corporate traffic to new Ethernet services, organizations need to be assured that they'll receive appropriate quality of service (QoS) and performance guarantees to support critical applications. In a Heavy Reading 2008 survey, over 87% of polled enterprise users indicated that service reliability was a key factor in choosing their provider. Enterprise users expect the same service consistency and reach that have been offered by legacy TDM, ATM and Frame Relay, a requirement that best effort Ethernet services were unable to fulfill. They also demand service differentiation to facilitate efficient operations and to meet their particular business needs, both current and future. Table 1 summarizes the must-have carrier-class service attributes of business Ethernet offerings.
Reliable
  Automatic fault isolation and quick troubleshooting; 24x7x365 support
  Minimal service disruptions due to link failures
  Quality of service priority guarantees per class of service (CoS)
  VPN and data security

Economical
  Low expenditures on customer located equipment and multi-site connectivity
  High throughput without heavy investments in infrastructure and equipment
  Scalable data rates, provisioned remotely, for pay as you grow flexibility
  Minimal down-time for servicing and repair

Accountable
  Differentiated SLA-based performance commitments for voice, video and data
  Clear network visibility, proactive service monitoring
  Real-time, on-demand reporting linked to OSS and billing systems
  SLA-defined penalties and credits based on performance targets

Limitless
  Data rates from 1 Mbps to 1 Gbps and beyond
  Consistent service over any infrastructure (fiber, PDH, SDH/SONET, xDSL)
  Versatile connectivity options (point-to-point, any-to-any)

Table 1: Business-grade attributes of Carrier Ethernet services
Source: The 2007 IBM Institute for Business Value and Economist Intelligence Telecom Industry Executive Survey
1.1 Business Ethernet SLA support tools
Carriers and service providers deploying business Ethernet VPNs must also be prepared to deliver measurable and enforceable SLAs that detail commitments for user traffic handling, bandwidth and performance guarantees, user control protocols processing and availability, as well as for response and repair times. This requires the installation of intelligent demarcation devices, or network termination units (NTUs), at the customer premises, to ensure end-to-end service control and efficient service provisioning from the service hand-off points. Such Ethernet demarcation devices are ideally equipped with Ethernet SLA support tools, including advanced service delivery and service assurance capabilities, as shown in Figure 2.
The following chapters explain the various functionalities and support mechanisms available to telecom providers for delivering business Ethernet SLAs, using a specific service scenario as an example. In this scenario, a service provider is delivering a managed Layer 2 VPN service to its business customer over a native Ethernet network with fiber, PDH and DSL access. The enterprise uses Ethernet virtual connections (EVCs) to transport various types of traffic between remote branches and company headquarters, as illustrated in Figure 3 below. Table 2 provides examples of services and applications matching the different traffic types. The L2 VPN conforms to a service level agreement, which specifies performance commitments for different QoS levels, depending on traffic type and application.
Figure 3: Managed Layer 2 VPN service over fiber, PDH and DSL access
Traffic Type         Typical Application Examples
Real-time (RT)       IP telephony (VoIP), IP video
Priority data (PD)   Critical data applications, storage and LAN-to-LAN connectivity between local enterprise routers
Best effort (BE)     Business Internet access
2.2 Service description
The managed Layer 2 VPN in this example is delivered between corporate headquarters and two branches, which are not only located in remote sites, but are also connected to the service provider's network by different technologies. Network access for Headquarters is fiber-based, whereas Branch A is connected over multiple bonded copper PDH circuits and Branch B over SHDSL.bis lines. To meet the particular networking needs of the enterprise, the L2 VPN service is deployed in a point-to-point EVPL (Ethernet Virtual Private Line) topology between Headquarters and the branches, using a different EVC for each branch-to-HQ connection.

The service provider installs intelligent Ethernet NTUs at the customer premises. These demarcation devices provide the service hand-off points (UNI: User-Network Interface) and support the particular capabilities required at each location, as well as the available access:

At Headquarters: A RAD ETX-202A Ethernet over fiber demarcation device provides a service multiplexed UNI, whereby all the EVCs share the same UNI for efficient utilization of available interfaces. The network connection rate is 100 Mbps via two redundant Fast Ethernet/Gigabit Ethernet ports, enabling future upgrades up to 1 Gbps to accommodate an anticipated increase in traffic volumes to and from this location.

At Branch A: A RICi Ethernet over bonded PDH demarcation device provides a non-multiplexed UNI that is dedicated to a single EVC and supports a network access rate of 32 Mbps.

At Branch B: An LA-210 Ethernet over DSL demarcation device provides a non-multiplexed UNI, supporting a line rate of up to 22.8 Mbps over four bonded pairs of SHDSL.bis links.
The NTUs perform traffic processing and SLA management to ensure consistent user experience and to maintain SLA metrics end-to-end, despite the difference in transport technologies and devices. Tables 3 and 4 summarize the different service parameters of the various UNIs and EVCs.
[Table 3: service parameters of the UNIs. Only one fragment survives: Branch B, 1-22.8]
EVC Service Attributes (only the values shown survive)
  EVC Type
  CE-VLAN ID Preservation (IEEE 802.1Q) [3]: EVC1: Yes
  CE-CoS Preservation (IEEE 802.1p)
  Unicast Frame Delivery
  Multicast Frame Delivery
  Broadcast Frame Delivery
  Max Frame Size (bytes) [4]

Table 4: EVC service attributes
[3] See section 2.3: Traffic Mapping
[4] Maximum frame size should correspond with the relevant burst size values (CBS and EBS). For further details, see section 2.4: Bandwidth Commitments
2.3 Traffic Mapping
There are two EVCs connecting HQ to the branches: EVC1 links Headquarters to Branch A, while EVC2 connects it to Branch B. Within the network, these EVCs are identified by service provider VLAN tags (SP-VLANs), which are added to customer frames by the local demarcation device upon entering the network and then stripped off at network egress (push and pop operations). Inband management traffic is allocated a dedicated SP-VLAN to separate it from user traffic.

The EVCs deliver real-time (RT), priority data (PD) and best effort (BE) traffic between locations, with each traffic type representing a different class of service within the EVCs (EVC.CoS). As each class of service requires its own QoS guarantees, it is marked differently so it can be distinguished by the enterprise's equipment and, more importantly, by the network. In EVC1, this is done by the three-bit priority field (P-bits) of a customer-assigned VLAN tag (CE-VLAN), while in EVC2 traffic classes are identified by different customer VLAN IDs (CE-VID). Since both EVCs are associated with multiple traffic types, a mapping plan of CE-VLANs and CE-P bits to EVCs is defined in advance to ensure efficient traffic delivery. Tables 5 and 6 detail the correlation between VLAN IDs (VIDs) and EVCs.
Table 5: Mapping CE-VLAN P-bits to EVC1
Service points: UNI H (Headquarters), UNI A (Branch A), Network (the Network column values did not survive)

Traffic              CE-VLAN   CE-P bit
RT Traffic           17        6
PD Traffic           17        4
BE Traffic           17        1
Management Traffic   N/A       N/A
Service points: UNI H (Headquarters), UNI B (Branch B), Network (only one set of tag values survives)

Traffic              CE-VLAN   CE-P bit
RT Traffic           42        x
PD Traffic           43        y
BE Traffic           44        z
Management Traffic   N/A       N/A

Table 6: Mapping CE-VLANs to EVC2; services are separated by customer VLAN tags
Because the customer's equipment in Branch A is capable of traffic differentiation based on P-bit values, all traffic is assigned a single VID with a separate P-bit per service. User equipment and IT considerations in Branch B, however, require that each class of service receive its own CE-VLAN tag. In this case, packets carrying the same CE-VID will be treated similarly by the network, regardless of their specific CE-P bit value. All traffic assigned to EVC1 carries an outer SP-VID 2,000, while traffic associated with EVC2 is double-tagged with SP-VID 2,001. The different classes of service within each EVC are marked with different SP-P bit values.

As can be seen in Tables 5 and 6, the classes of service in each EVC are tagged differently at the associated UNIs. In EVC1, both locations use CE-VLAN 17.6 (CE-VID 17, CE-P bit 6) to mark RT, 17.4 for PD and 17.1 for BE, and therefore ingress/egress CE-VLAN ID preservation is required between locations. This is not the case in EVC2, where the various service types are assigned different CE-VIDs at each location and the local demarcation devices must swap CE-VLAN tags in egress frames when the SP-VLAN tags are popped, for example, replacing CE-VID 42 with CE-VID 2 for RT traffic arriving at Branch B from Headquarters.
2.4 Bandwidth Commitments
The EVPL SLA contains throughput commitments, divided into the following bandwidth profile categories:

Committed Information Rate (CIR): The bandwidth that the service provider guarantees the enterprise, regardless of network conditions.

Excess Information Rate (EIR): The bandwidth allowance for best effort delivery, for which service performance is not guaranteed and traffic may be dropped if the network is congested. The sum of the CIR and EIR rates is typically referred to as PIR, or Peak Information Rate, which represents the total burstable bandwidth sold to the enterprise.

Committed Burst Size (CBS): The maximum size, expressed in bytes, of a burst of back-to-back Ethernet frames for guaranteed delivery.

Excess Burst Size (EBS): The maximum size of a burst of back-to-back Ethernet frames permitted into the network without performance guarantees. EBS frames may be queued or discarded if bandwidth is not available.

According to MEF (Metro Ethernet Forum) specifications, the bandwidth profile service attribute, which includes some or all of the above categories, can be defined per UNI, per EVC or per CoS identifier (CoS ID; EVC.CoS). For any given frame, however, only one such model can apply. The service provider meets the bandwidth guarantees by reserving appropriate network resources and employing a two-rate, three-color marker (trTCM) rate-limitation methodology as part of its traffic engineering policy to ensure compliance by user traffic. For the service discussed in this paper, the policing function is performed at EVC.CoS granularity, as described in further detail in Chapter 3: Service Delivery.
Tip: EIR as a Revenue Generator
EIR offerings enable carriers to generate more revenue from a given network capacity without compromising the quality of premium or real-time CIR services. As bandwidth consumption fluctuates throughout the day and the week, carriers and service providers can oversubscribe the network and monetize unused portions of it by selling best effort services, provided that the customer-located demarcation devices are equipped with reliable traffic management capabilities. This allows total bandwidth charges to exceed actual infrastructure rates. However, because EIR bandwidth is shared among users and applications, not all users are able to take advantage of the entire excess bandwidth simultaneously.
Table 7 lists the bandwidth commitments for each class of service within EVC1 and EVC2, which are applicable to all UNIs even though these support different access rates. To avoid delays in traffic delivery, the bandwidth profiles in each EVC should not exceed the lowest UNI speed in the service points connected by that EVC, i.e., 32 Mbps for EVC1 (UNI A) and 22.8 Mbps for EVC2 (UNI B). As can be seen in Table 7, the total CIR allowance for all classes of service in EVC1 is 25 Mbps, permitting a maximum of 7 Mbps EIR to meet UNI A's access connection speed limit. To better serve corporate operations, the enterprise purchases higher EIR rates for PD and BE traffic, allowing up to 10 Mbps for each of these classes of service if no other traffic is transmitted at the time. In EVC2, the total PIR bandwidth is 30 Mbps, of which 20 Mbps are CIR and 10 Mbps of EIR are divided between PD and BE traffic, allowing up to 5 Mbps for each, provided that no other traffic is transmitted simultaneously. RT applications are typically allocated CIR bandwidth only and BE applications EIR only, while PD bandwidth profiles are divided between CIR and EIR commitments.
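The arithmetic behind these limits can be sketched in a few lines of Python. The per-class CIR and EIR values are the ones quoted in this guide (5 and 20 Mbps of CIR in EVC1, 5 and 15 Mbps in EVC2); the dictionary layout and the function names `totals` and `cir_headroom` are illustrative assumptions, not part of any device configuration.

```python
# Per-class (CIR, EIR) in Mbps, as quoted in this guide for each EVC.
PROFILES = {
    "EVC1": {"RT": (5, 0), "PD": (20, 10), "BE": (0, 10)},
    "EVC2": {"RT": (5, 0), "PD": (15, 5), "BE": (0, 5)},
}

# Lowest UNI access rate (Mbps) among the service points of each EVC.
UNI_RATE = {"EVC1": 32.0, "EVC2": 22.8}


def totals(evc):
    """Return (total CIR, total EIR) in Mbps for one EVC."""
    cir = sum(c for c, _ in PROFILES[evc].values())
    eir = sum(e for _, e in PROFILES[evc].values())
    return cir, eir


def cir_headroom(evc):
    """Access bandwidth left for excess traffic once all CIR is in use."""
    cir, _ = totals(evc)
    return round(UNI_RATE[evc] - cir, 3)


for evc in PROFILES:
    cir, eir = totals(evc)
    print(f"{evc}: CIR {cir} Mbps, EIR {eir} Mbps, "
          f"headroom above CIR {cir_headroom(evc)} Mbps")
```

The sold EIR exceeds the headroom in both EVCs, which is why the text notes that each class can use its full EIR only when the other classes are idle.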
EVC    Class of service   CIR (Mbps)   EIR (Mbps)   CBS (Bytes)   EBS (Bytes)
EVC1   RT                 5            0            150           0
EVC1   PD                 20           10           5,000         5,000
EVC1   BE                 0            10           0             2,500
EVC1   Total              25           20           -             -
EVC2   RT                 5            0            150           0
EVC2   PD                 15           5            3,500         3,000
EVC2   BE                 0            5            0             2,500
EVC2   Total              20           10           -             -

Table 7: Effective bandwidth commitments per EVC.CoS

The CBS and EBS values should correspond with the frame sizes that typically make up each class of service, as well as with the maximum frame size allowed at the UNI. Here, for example, a CBS value of 5,000 bytes for PD traffic in EVC1 permits up to three frames of 1,522 bytes in each burst. A general rule of thumb relates CBS value and frame size to network delay: large frames transmitted in a service that receives a low CBS value are more prone to delays, since the burst allowance is exhausted quickly by a relatively low number of frames. In such cases, new frames must await subsequent bursts.
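As a quick check of the rule of thumb, the number of whole frames that fit in one burst allowance is an integer division. The helper name `frames_per_burst` is an assumption made for this sketch.

```python
def frames_per_burst(burst_bytes, frame_bytes):
    """Whole frames of a given size that fit in one burst allowance."""
    return burst_bytes // frame_bytes


# PD traffic in EVC1: CBS of 5,000 bytes and 1,522-byte frames.
print(frames_per_burst(5000, 1522))
# A 150-byte CBS holds two 68-byte frames but no 1,522-byte frame.
print(frames_per_burst(150, 68), frames_per_burst(150, 1522))
```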
Figures 5 and 6 illustrate CIR and EIR values in UNI A and UNI B, respectively.
Effective Throughput

The effective throughput is directly impacted by the frame size. For large Ethernet frames carrying, for example, 1,500 bytes of data payload at a line rate of 10 Mbps, the calculation is as follows:

1. Total frame size = 8B (Preamble + SFD) + 6B (DA) + 6B (SA) + 4B (SP-VLAN) + 4B (CE-VLAN) + 2B (T/L) + 1,500B (data payload) + 4B (FCS/CRC) + 12B (IFG) = 1,546 bytes
2. User data = 6B (DA) + 6B (SA) + 4B (CE-VLAN) + 2B (T/L) + 1,500B (data payload) + 4B (FCS/CRC) = 1,522 bytes
3. Ethernet overhead = {[8B (Preamble + SFD) + 4B (SP-VLAN) + 12B (IFG)] / 1,546 bytes (total frame size)} x 100% = 1.55%
4. Effective throughput = [1,522 bytes (user data) / 1,546 bytes (total frame size)] x 10 Mbps (line rate) = 9.84 Mbps

Smaller frames using the same line rate, however, are characterized by a lower effective throughput due to higher overhead relative to their size, as demonstrated by the following calculation for a frame with a 46-byte data payload:

1. Total frame size = 8B (Preamble + SFD) + 6B (DA) + 6B (SA) + 4B (SP-VLAN) + 4B (CE-VLAN) + 2B (T/L) + 46B (data payload) + 4B (FCS/CRC) + 12B (IFG) = 92 bytes
2. User data = 6B (DA) + 6B (SA) + 4B (CE-VLAN) + 2B (T/L) + 46B (data payload) + 4B (FCS/CRC) = 68 bytes
3. Ethernet overhead = {[8B (Preamble + SFD) + 4B (SP-VLAN) + 12B (IFG)] / 92 bytes (total frame size)} x 100% = 26%
4. Effective throughput = [68 bytes (user data) / 92 bytes (total frame size)] x 10 Mbps (line rate) = 7.39 Mbps

The actual throughput experienced by the enterprise therefore depends on the relative proportions of various applications in the traffic mix. A higher share of 68-byte user data packets, such as those used for most VoIP traffic, will result in lower throughput efficiency.

In addition to the Ethernet-related bandwidth penalties, the physical media used for transmission may require further overhead for framing and encapsulation. For example, Ethernet over DSL throughput is affected by the particular transport protocol being used: the traditional DSL protocol stack includes an ATM sub-layer, which imposes a heavy bandwidth penalty (cell tax) of 20% to 50%, whereas the more recent EFM (Ethernet in the First Mile) encoding, such as that used by the LA-210 demarcation device at UNI B, enables improved line utilization with a 5% overhead. Likewise, multi-circuit copper access that is powered by Ethernet over NG-PDH capabilities, as is the case for the RICi demarcation device at UNI A, can rely on constant, predictable and lower overhead with GFP (generic framing procedure), VCAT (virtual concatenation) and LCAS (link capacity adjustment scheme) encapsulation and bonding tools, compared to the less efficient HDLC, MLPPP and IMA methods.
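The two frame-size calculations can be reproduced for any payload size with a short Python sketch. The field sizes are the ones used in the calculations; the function name `effective_throughput` and the result keys are assumptions made for illustration, and media-level overhead (ATM, EFM, GFP) is deliberately left out.

```python
# Per-frame byte counts used in the calculations in the text.
PREAMBLE_SFD = 8
DA = 6
SA = 6
SP_VLAN = 4
CE_VLAN = 4
TYPE_LEN = 2
FCS = 4
IFG = 12


def effective_throughput(payload_bytes, line_rate_mbps):
    """Ethernet-level throughput for double-tagged frames of one payload size."""
    user = DA + SA + CE_VLAN + TYPE_LEN + payload_bytes + FCS
    overhead = PREAMBLE_SFD + SP_VLAN + IFG
    total = user + overhead
    return {
        "total_bytes": total,
        "user_bytes": user,
        "overhead_pct": 100 * overhead / total,
        "throughput_mbps": line_rate_mbps * user / total,
    }


for payload in (1500, 46):
    r = effective_throughput(payload, 10)
    print(f"{payload}-byte payload: {r['user_bytes']}/{r['total_bytes']} bytes, "
          f"overhead {r['overhead_pct']:.2f}%, "
          f"throughput {r['throughput_mbps']:.2f} Mbps")
```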
2.5 Performance Guarantees
A key element in the SLA defines the performance and QoS guarantees that the service provider commits to the enterprise, specifically, frame delay, delay variation and frame loss.

Frame Delay (Latency) is the time a transmitted frame takes to travel across the network until it is delivered. VoIP and real-time services require extremely low latency, as even the smallest delay has a dramatic effect on service quality. TCP applications are also affected by increased network delay, taxing the network resources with re-transmissions when session timeouts occur.

Frame Delay Variation (Jitter) is the difference in delay between consecutive frames, causing them to arrive at their destination at inconsistent intervals. Jitter is a critical performance parameter for real-time services.
Frame Loss Ratio is the percentage of undelivered frames out of all the frames that were transmitted within a certain time interval. Packet loss might lead to service degradation and can have a negative effect on throughput when dropped frames are re-transmitted, as is the case with TCP/IP applications.

The nominal values for the above performance commitments are specified in the SLA, together with qualifying parameters, such as the service direction (one-way or round-trip), the percentage of traffic and the time interval for which these commitments are valid. Table 8 details the performance metrics guaranteed by the service provider for the enterprise. These are presented per class of service and refer to both EVCs, in all locations.
Table 8: Performance metrics per class of service

Performance Attribute      Real-Time (VoIP)   Priority (LAN-to-LAN)   Best Effort
Frame Delay
  Value (ms)               <5                 5-15                    <30
  Percentile (%)           99                 99                      99
  Direction                One-way            One-way                 One-way
  Time Interval (Hrs)      1                  1                       1
Frame Delay Variation
  Value (ms)               <1                 N/A                     N/A
  Percentile (%)           99                 N/A                     N/A
  Direction                One-way            N/A                     N/A
  Time Interval (Hrs)      1                  N/A                     N/A
Frame Loss
  Value (%)                <0.001             0.05                    <0.5
  Direction                One-way            One-way                 One-way
  Time Interval (Hrs)      1                  1                       1
Service Restoration        0.2                0.2                     0.2
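A minimal Python sketch of how the three metrics might be computed from measurement samples and compared with the real-time targets in Table 8 follows. The sample values are invented for illustration, the function names are assumptions, the percentile uses the nearest-rank method, and delay variation is taken between consecutive frames as described in the text; an actual SLA may define these details differently.

```python
import math


def frame_loss_pct(sent, received):
    """Undelivered frames as a percentage of transmitted frames."""
    return 100.0 * (sent - received) / sent


def percentile(values, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def delay_variation(delays_ms):
    """Absolute delay difference between consecutive frames."""
    return [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]


# Hypothetical one-way delay samples (ms) for the real-time class.
delays = [3.1, 3.4, 3.2, 3.9, 3.3, 3.6, 3.2, 3.5, 3.4, 3.3]

rt_ok = (
    percentile(delays, 99) < 5                        # frame delay target
    and percentile(delay_variation(delays), 99) < 1   # delay variation target
    and frame_loss_pct(1_000_000, 999_995) < 0.001    # frame loss target
)
print("Real-time targets met:", rt_ok)
```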
Table 8 also specifies the service provider's commitment for service restoration in the event of network or equipment failure, a parameter that also affects the Service Availability performance attribute discussed in section 2.7.
2.6 Layer 2 Control Protocol Processing
Another aspect of the service that must be defined in advance is the handling of user Ethernet control protocols (L2CP: Layer 2 Control Protocols), to avoid duplication of user and provider bridge protocol data units (BPDUs). Table 9 lists the processing instructions for the enterprise's L2CP, according to MEF recommendations for an EVPL service.
Table 9: L2CP processing instructions

Layer 2 Control Protocol                           Processing
STP (Spanning Tree Protocol)                       Discard
RSTP (Rapid Spanning Tree Protocol)                Discard
MSTP (Multiple Spanning Tree Protocol)             Discard
Pause (IEEE 802.3x)                                Discard
LACP (Link Aggregation Control Protocol)           Discard
Authentication (IEEE 802.1X)                       Discard
GARP (Generic Attribute Registration Protocol)     Discard
2.7 Service Availability, Response and Repair Time
Finally, the service provider offers various SLA packages that differ in the service support, time to repair (TTR) and service availability commitments that they offer. In this case, the enterprise selects the Gold package as it reflects the level of support and reliability best suited for its needs. The relevant SLA metrics are listed in Table 10.
Service Level   Service Center Hours   Response Time   Repair Time   Availability
Standard        Mon-Fri 08:00-17:00                    12 Hours
Silver          Mon-Sat 08:00-20:00    3 Hours         10 Hours      99.9%
Gold            Mon-Sat 08:00-20:00    2 Hours         8 Hours       99.99%
Platinum        Mon-Sun 00:00-24:00    1 Hour          4 Hours       99.999%

Table 10: Service package parameters for availability, response and repair time

Service availability, or uptime, is typically calculated on a monthly basis, after measuring the number of minutes and seconds that the network or service was unavailable to the enterprise. To determine customer remedies for SLA breaches, unavailability instances include service outages and network downtimes associated with unscheduled maintenance events. This means that, in a 30-day month with no scheduled down-time, the enterprise should not experience service unavailability for more than 4 minutes and 19 seconds throughout the entire month [60 minutes x 24 hours x 30 days x (1 - 0.9999) unavailability threshold = 4.32 minutes]. According to the terms of the Gold service package, the service provider assures a maximum TTR of 8 hours from the moment the customer opens a Trouble Ticket.
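The downtime budget calculation can be repeated for the other availability levels with a small Python sketch. The function names are assumptions made for this example, and a 30-day month with no scheduled down-time is assumed, as in the text.

```python
def allowed_downtime_seconds(availability_pct, days=30):
    """Maximum unavailability per period for a given availability target."""
    total_seconds = days * 24 * 60 * 60
    return total_seconds * (100 - availability_pct) / 100


def as_min_sec(seconds):
    """Split a duration into whole minutes and seconds (nearest second)."""
    return divmod(round(seconds), 60)


for level, pct in (("Silver", 99.9), ("Gold", 99.99), ("Platinum", 99.999)):
    minutes, secs = as_min_sec(allowed_downtime_seconds(pct))
    print(f"{level} ({pct}%): {minutes} min {secs} s per 30-day month")
```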
3 Service Delivery
As per the enterprise's SLA terms, VoIP traffic requires different service quality than email communications and therefore must be handled separately by the network. By delivering multiple services from each UNI with differentiated, per-service QoS parameters, the service provider caters to the enterprise's needs, while lowering its own operational costs and improving its profit margins. To satisfy the SLA guarantees that are listed in Chapter 2, the Carrier Ethernet demarcation devices must support multi-priority, multi-flow traffic and ensure latency, jitter and packet delivery performance for each flow. These devices are therefore equipped with capabilities such as metering, policing and shaping of user traffic, as well as with a two-stage queuing mechanism that ensures predictable performance and creates scheduling fairness with better load distribution in the network.

The EVPL service is defined as CoS-aware, with both bursty and real-time traffic and VLAN-based EVCs. Accordingly, upstream user traffic undergoes the following processing steps by the ETX-202A, RICi, and LA-210 demarcation devices, to ensure that QoS and SLA commitments are met:

1. Classification
2. Metering and policing
3. Hierarchical scheduling (Level 0)
4. Shaping
5. Hierarchical scheduling (Level 1)
6. Marking and editing
The following sections describe in detail ingress traffic processing as performed at Headquarters (UNI H).
Tip: Rate Limitation of Downstream Traffic
In some cases, downstream traffic also requires rate-limiting in the form of metering, policing and shaping, to ensure that egress traffic does not exceed user equipment port limits. This is required when UNIs receive traffic from several sources simultaneously, such as in E-LAN services involving any-to-any connectivity between numerous remote branches and a company headquarters. In these cases, the aggregate traffic arriving from multiple sites may exceed the bandwidth limit of the customer's local equipment at a particular location. Asymmetric rate-limiting, i.e., different policies implemented for upstream and downstream traffic, is therefore often tasked to the local demarcation device.
3.1 Classification
Traffic arriving from customer equipment is first classified according to its type. The demarcation device's QoS engine groups incoming traffic into flows, which represent the various classes of service within a particular EVC (EVC.CoS), i.e., Real-Time, Priority Data and Best Effort. The demarcation devices sort traffic by the user port through which it arrives, together with different CoS ID selectors for each EVC. These are the customer's VLAN tag priority fields for EVC1 and user-assigned VLAN tags for EVC2, as per the mapping charts in Tables 5 and 6. Consequently, Flows 1, 2 and 3 make up EVC1, while Flows 4, 5 and 6 are delivered in the network as EVC2.
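The classification rule can be sketched as a simple lookup in Python. The tag values (CE-VID 17 with P-bits 6, 4 and 1 for EVC1; CE-VIDs 42 to 44 for EVC2) come from the mapping described in section 2.3. The flow numbering within each EVC is assumed to follow the RT, PD, BE order, the `Frame` type and `classify` function are hypothetical names, and the user-port selector is omitted for brevity.

```python
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    ce_vid: int    # customer VLAN ID
    ce_pbit: int   # customer 802.1p priority bits


# EVC1: a single CE-VID (17); the class of service is selected by P-bit.
EVC1_VID = 17
EVC1_FLOW_BY_PBIT = {6: 1, 4: 2, 1: 3}   # RT, PD, BE

# EVC2: the class of service is selected by CE-VID; the P-bit is ignored.
EVC2_FLOW_BY_VID = {42: 4, 43: 5, 44: 6}  # RT, PD, BE


def classify(frame: Frame) -> Optional[int]:
    """Return the flow number for a frame, or None if nothing matches."""
    if frame.ce_vid == EVC1_VID:
        return EVC1_FLOW_BY_PBIT.get(frame.ce_pbit)
    return EVC2_FLOW_BY_VID.get(frame.ce_vid)


print(classify(Frame(ce_vid=17, ce_pbit=4)))   # PD traffic in EVC1
print(classify(Frame(ce_vid=42, ce_pbit=0)))   # RT traffic in EVC2
```

Unmatched frames return `None` here; what a real device does with them (drop, or map to a default flow) is a provisioning choice that this guide does not specify.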
Tip: QoS Classification Criteria
High flexibility in traffic classification, manifested by the ability to support a wide variety of sorting criteria, allows service providers to identify various traffic types at fine granularity and ensure appropriate quality of service for each flow. In addition, it eliminates the limitation of by-VLAN-only classification, which is restricted to 4,096 unique IDs. Ideally, criteria alternatives will include such CoS ID selectors as VLAN ID, 802.1p, DSCP, IP precedence, EtherType, MAC address, IP address, and many others, as well as their combinations, depending on the capabilities of the demarcation devices.
3.2 Metering and Policing
Once the flows are established, a metering and policing function is applied for each flow to regulate traffic according to the contracted CIR, EIR, CBS, and EBS bandwidth profiles. Rate limitation is performed according to the Dual Token Bucket mechanism, using a trTCM algorithm, as seen in Figure 9.
Green: Frames admitted to network
Yellow: Frames admitted to network on a best effort basis
Red: Discarded frames
Let us take, for example, a flow containing Priority Data traffic in EVC1, for which the QoS parameters defined in the SLA are as follows: CIR = 20 Mbps, EIR = 10 Mbps, CBS = 5,000 bytes, and EBS = 5,000 bytes. Three Ethernet frames are sent by the user 2 microseconds apart; all three are 1,522 bytes in size and all are mapped to flow number 2, marked as 17.4 (CE-VLAN ID 17, P-bit 4). The first frame drains 1,522 bytes of the 5,000 CBS bytes, leaving 3,478 bytes. Since the frame size is smaller than the CBS limit, it is marked as Green and admitted forward. The token bytes are refilled at the CIR rate of 20 Mbps, or 2.5 megabytes per second, resulting in 5 additional bytes in the bucket when the second frame arrives (2.5 megabytes per second x 2 microseconds). Together, there are now 3,483 bytes available for the second frame (3,478 + 5). The second frame drains another 1,522 bytes, leaving an allowance of 1,961 bytes in the bucket, to which 5 bytes are added by the time the third frame arrives. The total of 1,966 bytes (1,961 + 5) is still enough for the 1,522 bytes of the third frame, but not for the one following it 2 microseconds later.
The fourth frame is also 1,522 bytes in size; however, at the time of its arrival there are only 449 bytes available, after the previous frame drained 1,522 bytes and 5 bytes were added between frames (1,966 - 1,522 + 5 = 449). The fourth frame is therefore checked against the excess bandwidth threshold and, as it is within the EBS limit of 5,000 bytes, it is marked as Yellow and passed forward on a best effort basis. Since non-conformant packets are discarded, rather than queued or buffered, the metering function is accompanied by a policing function. Another traffic engineering method, shaping, is used at a later stage to ensure that transmission is performed in a way that best utilizes network resources.
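The four-frame example can be reproduced with a short Python sketch of a dual token bucket. It follows the order described in the text (committed bucket first, then excess bucket) rather than any particular standard's pseudocode. The class name `DualTokenBucket` and its fields are illustrative assumptions, and both buckets are assumed to start full.

```python
from dataclasses import dataclass


@dataclass
class DualTokenBucket:
    cir_mbps: float   # committed information rate
    eir_mbps: float   # excess information rate
    cbs: int          # committed burst size, bytes
    ebs: int          # excess burst size, bytes

    def __post_init__(self):
        # Assumption: both buckets start full.
        self.c_tokens = float(self.cbs)
        self.e_tokens = float(self.ebs)
        self.last_us = 0.0

    def meter(self, arrival_us, frame_bytes):
        """Color one frame arriving at the given time (microseconds)."""
        elapsed = arrival_us - self.last_us
        self.last_us = arrival_us
        # 1 Mbps equals 0.125 bytes per microsecond; buckets never overflow.
        self.c_tokens = min(self.cbs, self.c_tokens + elapsed * self.cir_mbps / 8)
        self.e_tokens = min(self.ebs, self.e_tokens + elapsed * self.eir_mbps / 8)
        if frame_bytes <= self.c_tokens:
            self.c_tokens -= frame_bytes
            return "green"
        if frame_bytes <= self.e_tokens:
            self.e_tokens -= frame_bytes
            return "yellow"
        return "red"


# Priority Data flow in EVC1: four 1,522-byte frames, 2 microseconds apart.
bucket = DualTokenBucket(cir_mbps=20, eir_mbps=10, cbs=5000, ebs=5000)
for t in (0, 2, 4, 6):
    color = bucket.meter(t, 1522)
    print(f"t={t} us: {color}, committed tokens left: {bucket.c_tokens:.0f}")
```

Red frames are simply reported here; in the service described above the policer would discard them.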
3.3 Hierarchical Scheduling Level 0
The next traffic processing phase defines the order in which the various flows are forwarded, using a two-step scheduling mechanism so that each flow receives the desired scheduling priority. In level 0, different flows are assigned separate output queue blocks, each containing scheduling slots corresponding with CoS delivery priorities. Technically, each of the six flows in our EVPL service can be assigned a dedicated queue block; however, this is not necessary as the flows are already sorted by class of service. Instead, the three flows associated with EVC1 (17.6, 17.4, and 17.1) are assigned one queue block while the flows associated with EVC2 (CE-VLAN IDs 42-44) are assigned another, resulting in a total of two blocks. In the latter case, each of the three VLANs is permanently mapped to its designated CoS queue, regardless of the CE-P bit it carries.
As can be seen in Figure 11, each queue cluster contains up to eight slots, whereby CoS 7 is mapped to the highest priority queue, normally reserved for the service provider's management traffic, and CoS 0 to the lowest. The ETX-202A supports a combination of traffic scheduling techniques, whereby applications requiring low latency and jitter are mapped to Strict Priority queues, while other services are mapped to the remaining slots using weighted fair queuing (WFQ):

The Strict Priority queues ensure minimal latency and jitter for the RT traffic, even when a large amount of bursty data traffic is sent over the same uplink. Strict Priority traffic will always be processed first, while flows mapped to the WFQ slots are buffered until the Strict Priority queues are empty.

The WFQ technique avoids scheduling starvation of lower priority queues and ensures relatively fair allocation of bandwidth by sharing it among all flows. In this manner, packets belonging to lower classes of service are not penalized when higher priority queues are not empty and may still receive transmission time. QoS-conformant scheduling is handled by assigning different weights to the various queues instead of equally dividing overall bandwidth among all active flows.
Tip: Using Scheduling Queues to Deliver SLA Bandwidth Guarantees
To ensure adherence to SLA bandwidth guarantees, it is important to correlate weight distribution among the queues with the committed rates allocated for each service. In EVC1, for example, with a network access rate of 32 Mbps, the real-time traffic is mapped to one of the Strict Priority queues to ensure expedited delivery. According to the enterprise's SLA bandwidth commitments listed in Table 7, the RT flow requires a committed rate of 5 Mbps, leaving 27 Mbps to be divided between the other two classes of service that are mapped to the WFQ slots CoS 4 (PD traffic) and CoS 1 (best effort traffic). The recommended weight ratio between these queues is 22:5. This allows for 22 Mbps for CoS 4 to ensure it receives its 20 Mbps CIR value and some of the EIR bandwidth, leaving 5 Mbps of EIR for CoS 1. The same principle can be applied to EVC2, for which the access rate is 22.8 Mbps. After securing 5 Mbps of CIR for the RT traffic with Strict Priority queuing, the remaining 17.8 Mbps are divided at a ratio of 8:1 between the PD and BE flows, respectively. This ensures 15 Mbps CIR and about 0.8 Mbps EIR for CoS 4 and almost 2 Mbps EIR for CoS 1.
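The arithmetic in this tip can be checked with a short sketch. It assumes the simplest possible model, in which the Strict Priority flow consumes exactly its CIR and the remainder is shared among the WFQ queues in proportion to their weights. The function name and the queue labels are illustrative and do not come from the ETX-202A configuration.

```python
from fractions import Fraction

def split_access_rate(access_mbps, strict_cir_mbps, wfq_weights):
    """Share the bandwidth left after Strict Priority among WFQ queues.

    Simplified model: the Strict Priority flow takes exactly its CIR and
    the rest is divided in proportion to the WFQ weights.
    """
    # Fractions avoid rounding noise for rates such as 22.8 Mbps.
    remaining = Fraction(str(access_mbps)) - Fraction(str(strict_cir_mbps))
    total_weight = sum(wfq_weights.values())
    return {
        queue: float(remaining * weight / total_weight)
        for queue, weight in wfq_weights.items()
    }

# EVC1: 32 Mbps access rate, 5 Mbps RT CIR, weights 22:5
evc1 = split_access_rate(32, 5, {"CoS 4 (PD)": 22, "CoS 1 (BE)": 5})

# EVC2: 22.8 Mbps access rate, 5 Mbps RT CIR, weights 8:1
evc2 = split_access_rate(22.8, 5, {"CoS 4 (PD)": 8, "CoS 1 (BE)": 1})

for name, shares in (("EVC1", evc1), ("EVC2", evc2)):
    for queue, mbps in shares.items():
        print(f"{name} {queue}: {mbps:.2f} Mbps")
```

Under this model EVC1 yields 22 Mbps and 5 Mbps, while the 8:1 split for EVC2 yields roughly 15.8 Mbps for CoS 4 and just under 2 Mbps for CoS 1.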
While most of the management traffic is mapped to the highest SP queue, OAM (operations, administration and maintenance) traffic and performance measurement messages should be assigned the same queue slot as that of the data they test. In other words, OAM messages testing CE-VLAN 17.6 are mapped to the Strict Priority queue assigned to CoS 6.
As the queues fill up, new packets face a growing risk of being discarded due to lack of buffer space. When packets arriving at overrun queues are dropped indiscriminately, such as in a Tail Drop mechanism, differentiated QoS cannot be maintained and network performance is hindered by intermittent periods of flooding and underutilization. The ETX-202A's QoS engine solves such issues by employing a weighted random early detect (WRED) mechanism for intelligent queue management and congestion avoidance. The WRED algorithm monitors the state and size of each queue and determines whether an incoming packet should be buffered or dropped, based on statistical probabilities: Green-marked packets are directed to their respective queues, while yellow-marked packets are admitted in accordance with their WRED profile. Near-empty queues accept all incoming packets, but as they begin to fill, the drop probability for new packets increases. The different queues are allocated different occupancy thresholds, above which incoming packets are discarded at random at a growing rate as the queue fills, until the queue has reached a maximum threshold and all incoming packets are dropped. As can be seen in Figure 12, the various classes of service are assigned drop values that reflect their priorities. This way, packets of lower classes of service with lower QoS commitments will be dropped earlier and at a greater rate than those of a higher CoS.
The blue curve in Figure 12 represents the lowest priority queues for classes of service 0 and 1. Their random packet discarding begins early, for example, when the queues are only 20% full. As the queue reaches 40% occupancy it hits the 60% drop probability mark, after which packet dropping picks up pace until the WRED mechanism stops the random discard and drops all packets. By contrast, the highest priority queues for classes of service 6 and 7, represented by the purple curve, do not drop packets until they are almost full.
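The drop decision described above can be sketched as follows. This is a simplified model, not the ETX-202A's implementation: it assumes a linear ramp between a minimum and a maximum occupancy threshold and works on instantaneous occupancy, whereas the curves in Figure 12 may have a different shape. Apart from the 20% starting point and the 60% drop probability at 40% occupancy mentioned in the text, all profile values are illustrative assumptions.

```python
import random

def wred_drop_probability(occupancy, min_th, max_th, max_p):
    """Drop probability for a queue at the given occupancy (0.0 to 1.0).

    Assumes a linear ramp from 0 at min_th up to max_p just below max_th,
    and unconditional drop at or above max_th.
    """
    if occupancy < min_th:
        return 0.0
    if occupancy >= max_th:
        return 1.0
    return max_p * (occupancy - min_th) / (max_th - min_th)

def admit(occupancy, profile, rng=random.random):
    """Return True if a newly arriving packet should be buffered."""
    return rng() >= wred_drop_probability(occupancy, *profile)

# Hypothetical profiles as (min_th, max_th, max_p).
# LOW is chosen so the ramp starts at 20% and reaches 60% at 40% occupancy.
LOW_PRIORITY = (0.20, 0.50, 0.9)    # e.g. CoS 0 and 1
HIGH_PRIORITY = (0.90, 1.00, 1.0)   # e.g. CoS 6 and 7

for occupancy in (0.1, 0.3, 0.4, 0.6, 0.95):
    low = wred_drop_probability(occupancy, *LOW_PRIORITY)
    high = wred_drop_probability(occupancy, *HIGH_PRIORITY)
    print(f"occupancy {occupancy:.0%}: low-priority {low:.2f}, high-priority {high:.2f}")
```

Keeping one profile per class of service is what lets lower classes be dropped earlier and more aggressively than higher ones.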
3.4 Shaping
Traffic coming out of the level 0 queue blocks is shaped to smooth out bursts and avoid buffer overruns in subsequent network elements. At this stage, output packets from each buffer block undergo a shaping function so that the overall traffic volume from each block does not exceed a preset bandwidth value. Shaping is performed according to a Token Bucket algorithm, with a single rate bandwidth profile that is based on the accumulated CIR values of all the flows mapped to the relevant queue block and a certain allowance of excess rate that the service provider assigns to the enterprise to avoid congestion:
Shaper rate queue block 0 = 5 Mbps (CIR Flow 1) + 20 Mbps (CIR Flow 2) + 0 (CIR Flow 3) = 25 Mbps + excess allowance
Shaper rate queue block 1 = 5 Mbps (CIR Flow 4) + 15 Mbps (CIR Flow 5) + 0 (CIR Flow 6) = 20 Mbps + excess allowance
The shaping function also compensates for the network data packet overhead that is added at later stages, as well as for service provider OAM traffic. Packets exceeding the shaper value are delayed in the buffer until they can be transmitted to the network. The multiple shaping rates mechanism is an important tool to ensure that outgoing traffic volume is in line with the access connection of the remote service point: The shaper for queue block 0 matches the bonded PDH bandwidth capacity of UNI A, while the shaper for queue block 1 is set to meet UNI B's xDSL access rate. The shaping phase is illustrated in Figure 13.
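A minimal sketch of single-rate Token Bucket shaping for one queue block is shown below. The class, its parameters and the 3,000-byte burst size are assumptions made for illustration; the guide does not state the excess allowance or bucket depth, so the example uses only the 25 Mbps CIR sum for queue block 0.

```python
def shaper_rate_mbps(flow_cirs_mbps, excess_allowance_mbps=0.0):
    """Shaper rate = sum of the flows' CIR values plus an excess allowance."""
    return sum(flow_cirs_mbps) + excess_allowance_mbps

class TokenBucketShaper:
    """Single-rate token bucket that delays frames instead of dropping them."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)   # bucket starts full
        self.last_time = 0.0

    def release_time(self, arrival_time, frame_bytes):
        """Return the earliest time at which the frame may be transmitted."""
        # Frames are served in order, so never go back before the last release.
        now = max(arrival_time, self.last_time)
        elapsed = now - self.last_time
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_time = now
        if self.tokens >= frame_bytes:
            self.tokens -= frame_bytes
            return now
        # Not enough tokens: hold the frame until the deficit is refilled.
        wait = (frame_bytes - self.tokens) / self.rate_bytes_per_s
        self.tokens = 0.0
        self.last_time = now + wait
        return self.last_time

# Queue block 0: CIR of Flows 1-3, excess allowance left at 0 for simplicity.
block0_mbps = shaper_rate_mbps([5, 20, 0])
shaper = TokenBucketShaper(rate_bps=block0_mbps * 1_000_000, burst_bytes=3000)

# Four 1500-byte frames arriving back to back at t = 0.
times = [shaper.release_time(0.0, 1500) for _ in range(4)]
print([f"{t * 1000:.2f} ms" for t in times])
```

The first two frames fit in the initial burst and leave immediately; the later frames are spaced at the shaper rate, which is the smoothing effect the section describes.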
3.5
In the second step of the scheduling process, each queue cluster in level 0 receives a queue slot in level 1, with each slot corresponding to a different EVC. Allocating different scheduling priorities to the queues effectively sets the precedence each EVC receives at the network ingress, as it defines the priority with which the EVC data is transmitted. In this case, the scheduling is performed by WFQ with weights assigned at a ratio of 5:4 in favor of queue 1, in effect giving precedence to EVC1 traffic at the network entrance.
The 1 Gbps rate of the physical network interface can easily accommodate the total bandwidth consumption of both EVCs; however, the queue management and buffering system influences traffic delay and must therefore take into account the enterprise's SLA commitments.
By managing bandwidth consumption and transmission priorities with CoS granularity, multi-level hierarchical scheduling enables predictable, per-SLA latency and jitter performance across the network. In addition, it provides fair distribution of bandwidth among traffic classes and users over shared connections, by allocating excess bandwidth not required for critical applications to lower-priority traffic. Implementation of such capabilities at the service hand-off point reduces the risk of congestion at the network core, further facilitating the service provider's ability to meet delay and loss guarantees.
3.6
The final stage in preparing user traffic for network transmission involves packet editing and marking. This includes adding service provider VLAN tags (packet editing) according to the EVC mapping attributes listed in Tables 5 and 6. In this manner, packets belonging to Flows 1, 2 and 3 are stacked with an SP-VLAN tag whose ID value is 2000, while Flows 4, 5 and 6 receive SP-VLAN tag 2001. In addition, the packets are marked with service provider priority bits in the outer SP-VLAN tag to denote the priority each EVC.CoS receives while in the network.
Tip: Using P-bits for Color Marking
As the packets are already marked by their level of CIR/EIR conformance (green and yellow), metering continuity in the network can be achieved by using the P-bit field to signal a packet's color so that it has a greater chance of maintaining its status and priority throughout the transmission. This is especially useful in color-blind networks, as well as in 802.1Q color-aware networks with no discard eligible (yellow) marking.
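The tag stacking and P-bit marking described in this section can be illustrated with a short sketch. It assumes the outer tag uses the IEEE 802.1ad S-tag TPID 0x88A8 (a deployment may use a different TPID), and the table that maps class of service and color to a P-bit value is purely hypothetical; the actual mapping is a service provider policy that the guide does not specify.

```python
import struct

S_TAG_TPID = 0x88A8  # assumption: 802.1ad service tag TPID

# Hypothetical policy: signal color by using a lower P-bit for yellow frames.
PBIT_FOR = {
    ("PD", "green"): 4,
    ("PD", "yellow"): 3,
}

def build_sp_vlan_tag(vid, pcp, dei=0, tpid=S_TAG_TPID):
    """Return the 4-byte VLAN tag: 16-bit TPID followed by 16-bit TCI."""
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("priority (P-bits) must fit in 3 bits")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", tpid, tci)

def push_sp_vlan(frame, vid, pcp, dei=0):
    """Insert the outer tag right after the destination and source MACs."""
    return frame[:12] + build_sp_vlan_tag(vid, pcp, dei) + frame[12:]

# A dummy untagged frame: 12 bytes of MAC addresses, EtherType, payload.
frame = bytes(12) + b"\x08\x00" + b"payload"

# Flows 1-3 receive SP-VLAN 2000; a green PD frame gets P-bit 4 here.
tagged = push_sp_vlan(frame, vid=2000, pcp=PBIT_FOR[("PD", "green")])
print(tagged[12:16].hex())
```

In networks that do honor a discard-eligible bit, the same function could carry color in `dei` instead of spending a separate P-bit value on it.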
Figure 15 illustrates all the various stages included in service delivery processing as described above.
4 Service Assurance
A third, crucial piece of the L2 VPN service puzzle relates to the provider's ability to verify that the actual service performance and network availability experienced by the enterprise match SLA guarantees. This is done by performing remote, end-to-end OAM tests, preferably without affecting the service and in line with actual user traffic. In addition to meeting customer service expectations and optimizing network operations, service monitoring procedures and remote loopback testing contribute to the service provider's profit margins by minimizing the risk of penalties associated with SLA breaches.
4.1
Service validation and testing is required at the following points throughout the service lifecycle:
At initial service turn-up: Prior to handing off the L2 VPN service to the enterprise, the service provider must perform acceptance tests to verify that the service is running smoothly according to the SLA, per the pre-defined classes of service. Testing at this point also serves to generate a baseline for performance parameters, to which future test results will be compared. Specifically, KPI (key performance indicator) metrics for end-to-end throughput, packet delivery ratio, latency, and jitter are established. These results are recorded and archived for customer reporting, SLA comparison and future use as needed. It is advisable to perform burn-in stability tests over an interval of at least 24 hours to accurately establish service behavior.
Ongoing monitoring: KPI measurements are also performed on an ongoing basis, to monitor network health and ensure that QoS is maintained per class of service and in accordance with the contracted SLA. Continuous monitoring is required to detect service degradation and network congestion, prompting relevant alerts and advising when an increase in bandwidth is required. When service outages or connectivity faults are identified, Trouble Tickets are initiated and appropriate remedial actions taken. The collected data is used for billing purposes, while reports of network and service conditions are available to the enterprise periodically and on demand. OAM tests are performed at a frequency that balances the need to quickly detect and repair problems before they escalate against the service provider's desire to limit the toll such tests take on network and bandwidth resources.
Tip: OAM Bandwidth Calculation
RAD has developed a modeling tool to determine the network resources required for OAM procedures. In the enterprise's service topology, for example, periodic multicast OAM messages for connectivity fault management may require a bandwidth rate of 23,000 bytes per second and 34 FPS (frames per second) per demarcation device, if sent every 1,000 milliseconds. Under the same network conditions, bandwidth consumption per demarcation device climbs to over 40,000 bytes per second and 53 FPS, if the service provider increases testing frequency to 100 millisecond intervals. To receive a copy of RAD's OAM Calculator, please contact us at market@rad.com.
On-demand monitoring and troubleshooting: When a service outage is reported, a suite of tests is performed to remotely localize the fault prior to a technician dispatch. This reduces MTTR (mean time to repair) and minimizes the effect on users, while lowering operating expenses by eliminating unnecessary (and expensive) truck rolls and ensuring that technicians are sent to the right location.
4.2
The OAM tests performed by the ETX-202A, RICi and LA-210 at the various locations conform to the relevant industry standards:
IEEE 802.3-2005 (formerly 802.3ah): Ethernet Link OAM is part of the Ethernet in the First Mile, or EFM, set of standards. It relates to a single Ethernet link, typically the access connection between the customer premises and the network edge. Specific link monitoring procedures include auto-discovery, heartbeat, and fault notification messages; link statistics; MIB variable retrieval; and remote loopbacks.
IEEE 802.1ag: Ethernet Service OAM, also termed Connectivity Fault Management (CFM), enables Ethernet service monitoring over any path, whether a single link or end-to-end, allowing the service provider to manage each EVC separately regardless of the underlying transport. CFM partitions a network into maintenance domains and hierarchy levels that are allocated between users, service providers and third-party operators. It assigns maintenance end points, or MEPs, to the edges of each domain and maintenance intermediate points, or MIPs, to ports within domains. This helps define the relationships between all entities from a maintenance perspective and permits each entity to monitor the layers under its responsibility to easily localize problems. Service monitoring procedures include continuity check, link trace, loopback, and alarm indication signal. As can be seen in Figure 17, the EVPL service provided to the enterprise involves a single maintenance domain with one level.
Figure 17: Ethernet Service OAM maintenance domain levels
ITU-T Y.1731: The OAM Functions and Mechanisms for Ethernet-based Networks standard is used for Ethernet service performance monitoring, enabling the service provider to measure frame delay, delay variation and frame loss SLA parameters. It also includes fault management functionalities similar to those of CFM, such as continuity check, loopbacks, and link trace. Figure 18 displays the various network sections to which different OAM procedures apply.
Figure 18: Ethernet Link, Connectivity and Service layer OAM over different network segments
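The frame delay, delay variation and frame loss parameters mentioned for Y.1731 can be derived from OAM timestamps and counters along the following lines. This is a simplified sketch: the function and argument names are illustrative, the timestamps are assumed to be in microseconds, and delay variation is taken here as the difference between consecutive delay samples, which is one common convention rather than the only one.

```python
def two_way_frame_delay(tx_request, rx_request, tx_reply, rx_reply):
    """Round-trip delay with the responder's processing time removed.

    tx_request / rx_reply are stamped by the initiating MEP,
    rx_request / tx_reply by the responding MEP.
    """
    return (rx_reply - tx_request) - (tx_reply - rx_request)

def frame_loss_ratio(frames_sent, frames_received):
    """Fraction of frames lost over a measurement interval."""
    if frames_sent == 0:
        return 0.0
    return (frames_sent - frames_received) / frames_sent

def delay_variation(delays):
    """Absolute difference between each pair of consecutive delay samples."""
    return [abs(later - earlier) for earlier, later in zip(delays, delays[1:])]

# Illustrative values only (microseconds and frame counts).
delay = two_way_frame_delay(tx_request=1000, rx_request=1400,
                            tx_reply=1450, rx_reply=1900)
print(delay)                               # round trip minus responder dwell time
print(frame_loss_ratio(1000, 998))
print(delay_variation([850, 900, 870]))
```

Because the two-way calculation subtracts the responder's own dwell time, it does not require the clocks of the two devices to be synchronized.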
Table 11 summarizes the different OAM tools available to the service provider for SLA verification. These are performed either directly between sites or via an external test set probe:
Function
Y.1731/IEEE 802.1ag Continuity Check (Unicast/Multicast) Y.1731/IEEE 802.1ag Loopback (MAC Ping, Unicast/Multicast) on demand Y.1731/IEEE 802.1ag Link Trace (MAC Trace-route)
Fault Isolation
Y.1731/IEEE 802.1ag Loopback (MAC Ping, Unicast/Multicast) L3 Ping and Trace-route L1 IEEE 802.3ah Loopback, MIB variable retrieval
Fault Propagation
Subscriber port shutdown ITU-T Y.1731 Alarm Indication Signal ITU-T Y.1731 Remote Defect Indication IEEE 802.3ah Dying Gasp, SNMP Trap L1 physical interface Loopback
Fault Notification
Diagnostic Loopbacks
L1 IEEE 802.3ah Loopback L2/3 in-service and out-of-service Loopback at line-rate or lower, with MAC/IP swap, per EVC/VLAN, EVC.CoS, or MAC address flows ITU-T Y.1731 Packet Loss, Packet Delay, Packet Delay Variation with statistics collection per EVC.CoS (Unicast/Multicast)
Performance Management
RFC2544 Throughput measurements L3 Performance Measurements BER Testing
Table 11: OAM tools for SLA verification
Further details on selected OAM tests performed by the service provider are described in the following sections.
The receiving device LA-210 swaps the source and destination MAC addresses of incoming packets prior to looping them back, so as not to create a conflict in the switches or bridges along the path. End-to-end loopback tests can also be performed for Layer 3 services, in which case the receiving device swaps the IP addresses.
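The address swap performed before looping a frame back can be sketched in a few lines. The function name is illustrative and the sketch handles only the Layer 2 case; it assumes a raw Ethernet frame in which the destination MAC occupies the first six bytes and the source MAC the next six.

```python
def swap_mac_addresses(frame: bytes) -> bytes:
    """Return the frame with destination and source MAC addresses exchanged."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    destination, source = frame[0:6], frame[6:12]
    # Everything after the two addresses (tags, EtherType, payload) is unchanged.
    return source + destination + frame[12:]

# Illustrative frame: made-up addresses, IPv4 EtherType, short payload.
original = (bytes.fromhex("0a0000000001")      # destination MAC
            + bytes.fromhex("0a0000000002")    # source MAC
            + b"\x08\x00" + b"test")
looped = swap_mac_addresses(original)
print(looped[:12].hex())
```

Swapping the addresses means that bridges along the return path see each MAC address arriving from the side where the corresponding device is located, which is why the looped frames do not conflict with the switches' learned forwarding entries.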
The test setup procedure includes such user-defined parameters as the MAC address of the tested device, preferred testing standard, test run-time, and other relevant metrics. If a connectivity error is detected, i.e., the LA-210 does not respond within a specified period of time, the CSR would locate the failure point by attempting to loop another device in the path, or by sending a Link Trace request for hop-by-hop path tracking, to identify non-responsive maintenance intermediate points (MIPs). The test results detail loss and error rates (BER) for the loopback frames, as well as round trip duration and delays. These metrics help determine the service performance and connection quality, as described in further detail in section 4.2.3. The Link Trace test results display the responsive nodes, enabling the CSR to map the service path, pinpoint problematic MIPs and dispatch a technician to the right location for a quick repair. Alternatively, the CSR can use the 802.1ag Loopback test to isolate faulty MIPs, by looping successive intermediary points until the fault is identified. Table 12 summarizes the various Loopback methodologies and their main capabilities.
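The fault isolation procedure of looping successive points along the path can be expressed as a simple search. In this sketch, `responds` stands in for whatever mechanism actually issues the loopback request and waits for the reply within the configured timeout; it and the point names are hypothetical placeholders rather than a real management API.

```python
def locate_fault(path, responds):
    """Loop each point along the path in order, nearest first.

    Returns (last_responsive_point, first_unresponsive_point). The second
    element is None when every point on the path answers.
    """
    last_responsive = None
    for point in path:
        if not responds(point):
            return last_responsive, point
        last_responsive = point
    return last_responsive, None

# Illustrative service path from the central site toward the remote LA-210.
path = ["MIP-1", "MIP-2", "MIP-3", "LA-210"]

# Stand-in for real loopback results: only the first two points reply.
reachable = {"MIP-1", "MIP-2"}
last_ok, first_bad = locate_fault(path, lambda point: point in reachable)
print(f"Fault lies between {last_ok} and {first_bad}")
```

The pair that is returned brackets the segment where the fault lies, which is the information needed to send a technician to the right location.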
Method
L1 PHY
IEEE 802.1ag
ITU-T Y.1731
In-Service/Out-of-Service (OOS) Performed at Line Rate Performed on Actual Data Per Flow (Incl. CoS) Traverses L2 Bridged Networks Traverses L3 Routed Networks Mechanism
OOS
In-Service
In-Service
In-Service + OOS
In-Service + OOS
N/A
N/A
N/A
N/A
802.3ah LB
802.1ag LB
Y.1731 LB
Standard
management traffic. This enables both provider and customer to easily evaluate actual performance over time and compare it to SLA guarantees. The continuous monitoring of KPIs for multiple MEPs and flows simultaneously allows the service provider to detect degradation in service quality and to take remedial actions to quickly restore appropriate performance levels. When the counters of any of the tested parameters rise above or drop below pre-set thresholds within the specified sampling period, the demarcation devices send SNMP traps to notify the associated management station, and update the event log for future reference.
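The threshold check that triggers these notifications can be sketched as follows. The KPI names, threshold values and data structures are assumptions for illustration only, and the sketch returns a list of events rather than sending actual SNMP traps.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Threshold:
    kpi: str
    low: Optional[float] = None    # alert when the value drops below this
    high: Optional[float] = None   # alert when the value rises above this

def check_thresholds(samples, thresholds):
    """Compare one sampling period's KPI values against their thresholds.

    Returns a list of (kpi, direction, value) tuples, one per crossing.
    """
    events = []
    for threshold in thresholds:
        value = samples.get(threshold.kpi)
        if value is None:
            continue  # KPI not measured in this period
        if threshold.high is not None and value > threshold.high:
            events.append((threshold.kpi, "above", value))
        elif threshold.low is not None and value < threshold.low:
            events.append((threshold.kpi, "below", value))
    return events

# Hypothetical SLA thresholds for one EVC.CoS.
thresholds = [
    Threshold("frame_delay_ms", high=10.0),
    Threshold("frame_loss_ratio", high=0.001),
    Threshold("throughput_mbps", low=5.0),
]
samples = {"frame_delay_ms": 12.5, "frame_loss_ratio": 0.0002,
           "throughput_mbps": 4.2}

for event in check_thresholds(samples, thresholds):
    print(event)   # each event would be reported and written to the event log
```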
Conclusion
As Ethernet technology progresses, it presents telecom providers with opportunities to tap into a rapidly growing market and improve their competitive advantage by offering business clients new customized services at a higher speed and lower cost. Key enablers for achieving these goals are carrier-grade demarcation devices equipped with service delivery and service assurance capabilities. This application guide reviews the various tools carriers and service providers can utilize to ensure business-grade performance for Ethernet services and to meet enterprise expectations for service reliability, measurable KPIs and SLA guarantees. By executing sophisticated traffic management schemes and standardized testing procedures right off the user premises, carriers can also manage their network resources smartly and lower their spending on equipment and operations.
International Headquarters RAD Data Communications Ltd. 24 Raoul Wallenberg St. Tel Aviv 69719 Israel Tel: 972-3-6458181 Fax: 972-3-6498250 E-mail: market@rad.com www.rad.com
North America Headquarters RAD Data Communications Inc. 900 Corporate Drive Mahwah, NJ 07430 USA Tel: (201) 529-1100, Toll free: 1-800-444-7234 Fax: (201) 529-5777 E-mail: market@radusa.com www.radusa.com
The RAD name and logo are registered trademarks of RAD Data Communications Ltd. 2009 RAD Data Communications Ltd. All rights reserved. Subject to change without notice. Catalog no. 802436 Version 6/2009