
Bell Labs Technical Journal
Volume 24 | July 2019

Authorized licensed use limited to: UNIVERSIDAD CARLOS III MADRID. Downloaded on February 20,2020 at 07:32:52 UTC from IEEE Xplore. Restrictions apply.
The network OS: Carrier-grade SDN control of multi-domain, multi-layer networks
Marina Thottan, Catello Di Martino, Young-Jin Kim, Gary Atkinson, Nakjung Choi,
Nishok Mohanasamy, Lalita Jagadeesan, Veena Mendiratta, Jesse E. Simsarian, and
Bartek Kozicki
Nokia Bell Labs

Marina Thottan is the Group Leader for the Control Research Group in Nokia Bell Labs. She joined Bell Labs
Research in 1999 and has contributed to a wide variety of projects, including Online Gaming, Content Distribution,
Routing protocols, Data over Optical networks, High Speed Router Design, Network Management and Anomaly
Detection. Most recently she has been leading work on Reliable Software-Defined Networks (SDN). She has also
made significant contributions to the field of Smart Grid Communication Networks and Analytics. Her contributions
were recognized by two Bell Labs Teamwork Awards. Marina received a Ph.D. in Electrical and Computer Engineering
from Rensselaer in 2000. She has published over 60 papers in scientific journals, book chapters and conferences
and holds several patents in the areas of network management, interactive network applications, routing
algorithms, data analytics and network architectures. She is co-author of a recent book Communication Networks for Smart Grids: Making
Smart Grids Real and has also co-edited a book Algorithms for Next Generation Networks. Marina is an IEEE Fellow and a Bell Labs Fellow.

Catello Di Martino is a Member of Technical Staff at Nokia Bell Labs, Murray Hill, NJ, USA. He obtained an MSc ‘06 and
PhD ‘09, both in computer and control engineering from “Federico II” University of Naples. Currently, he is working
on the creation of resilient communication platforms and networks. Across different institutions, he has contributed
to several key research projects related to resiliency in the areas of high-performance computing, cloud computing,
SDNs, air traffic control management systems and sensor networks.

Young-Jin Kim was until recently a Member of Technical Staff of the Control Research Department at Nokia Bell
Labs, Murray Hill, NJ, USA. Dr. Kim received his B.S. and M.S. degrees in Computer Science from Yonsei University,
Korea and a Doctorate in Computer Science from University of Southern California. Since joining Bell Labs in 2010,
he has been contributing to data center networks, IP-optical networks and device-to-device communications such
as plug-in electrical vehicles and smart grids. His work typically involves the design of software platforms, security
measures and routing algorithms on aspects of information-centrism, software-defined network paradigm and
virtualization. Prior to joining Bell Labs, he had worked as senior software engineer at the Telecommunication R&D
center of Samsung Electronics, Korea. His research has been published in IEEE/ACM conference proceedings and
journals and has been distributed as publicly available software.

Gary Atkinson is a member of technical staff in the Network Control Research Department of Nokia Bell Labs.
Since joining Bell Labs, he has worked in the areas of wireless, optical and data network design and analysis, typically
cross-disciplinary and involving mathematical modeling and algorithm development for network planning, design
and tradeoff analysis, including research into and application of algorithmic methods for challenging network
combinatorial optimization and control problems. He holds a B.S. in Math and a B.S. in Physics from the University of
Texas, an M.S. in Physics from Northeastern University and a D.Sc. in Operations Research from George Washington
University, where he produced an award-winning doctoral dissertation in applied mathematical chemistry. He has
authored over 40 conference papers and journal articles in mathematics, physics, telecommunications and power
engineering. He is an inventor with over a dozen patents or filings.

Nakjung Choi is currently a member of technical staff in Nokia Bell Labs, at Murray Hill, NJ, USA, where he has
worked for Bell Labs since April 2010. He received his B.S. (magna cum laude) and Ph.D. at the School of Computer
Science and Engineering, Seoul National University in 2002 and 2009, respectively. Also, he has received several
awards such as Best Paper Awards and Awards of Excellence (formerly Bell Labs, Alcatel-Lucent). His research
focused on Future Internet, SDN/NFV/Cloud, 4G/5G/IoT and future converged services.

Nishok Mohanasamy is a Research Engineer specialized in Software Systems Research in the Control Research
Department at Nokia Bell Labs, Murray Hill, NJ, USA. Nishok received his M.S. degree in Computer Science from
Montclair State University, New Jersey. Currently, he is responsible for designing, developing and prototyping
new disruptive concepts, tools or systems and for implementing and demonstrating future SDN-based carrier-grade
networking platforms.

Lalita Jagadeesan is a Distinguished Member of Technical Staff in the Network Control Research Department
at Nokia Bell Labs. Her current research interests include software verification, security and programmable (SDN)
networks. Lalita has held a variety of technical and managerial positions at Bell Labs in computer science research,
software and security and is a co-inventor of a dozen patents (issued and pending). She is a senior member of the
IEEE and holds a Ph.D., S.M. and S.B. in computer science, all from the Massachusetts Institute of Technology.

Veena Mendiratta is a Consulting Member of Technical Staff in the Network Control Research Department at
Nokia Bell Labs, Naperville, IL, USA. Her work has focused on the reliability and performance analysis for
telecommunications systems products, networks and services to guide system architecture solutions. Her
research interests include architecture, system and network dependability analysis, software reliability engineering,
programmable networks (SDN) resiliency and telecom data analytics. Current research is focused on network
reliability and analytics — architecting and modeling the reliability of next-generation programmable networks
and development of analytics-based anomaly detection algorithms for improving network performance and
reliability. She is a member of IEEE, INFORMS and SIAM. She holds a B.Tech in engineering from the Indian Institute
of Technology, New Delhi, India and a Ph.D. in operations research from Northwestern University, USA.

Jesse Simsarian is a Member of Technical Staff at Nokia Bell Labs, Murray Hill, NJ, USA. Since joining Bell Labs in
2000, he has been researching optical networks and switching, including software-defined transport networks, fast-
switching coherent optical receivers and scalable optical packet routers. From 1998 to 2000 he had a National
Research Council Fellowship at the National Institute of Standards and Technology in Gaithersburg, MD, USA. He
received the Masters and Ph.D. degrees in physics from SUNY Stony Brook in 1995 and 1998, respectively, and is
a Senior Member of the OSA and IEEE. He has authored or co-authored 80 publications and holds several patents.

Bartek Kozicki is a Member of Technical Staff at Nokia Bell Labs, Antwerp, Belgium. He received his MSc from the
Technical University of Lodz, Poland, in 2004 and PhD degree from Osaka University, Japan, in 2008. From May
2008 to December 2010, he worked as an associate researcher at NTT Network Innovation Laboratories. Since
2011 he has been with Bell Labs focusing on design and modeling of high-speed electrical and optical networks.
His specialty is advanced modulation formats. Current research interests include access network architecture and
software-defined networking. Dr. Kozicki has published over 50 papers in international journals and conferences.
He has filed over 20 patents in the areas of optics and telecommunications.

Driven by the growth of cloud and virtualization technologies,
telecom services are under pressure to become increasingly
dynamic. Cloud services can be instantiated in a matter of
minutes on computing platforms, which can rapidly schedule
and allocate virtualized physical resources. A similar flexibility
to adapt to dynamic service requests remains a challenge with
network resources. Recently, centralized controllers have been
deployed in data center networks to adapt network
connectivity to computational needs. However, due to the
complexity and variety of carrier networking technologies,
carrier networks have been slow to move towards centralized
control. More recently, carriers are evolving to address
network complexity by pushing for the disaggregation of
network elements and the introduction of equipment with
more control flexibility. Carriers require a new control
paradigm to benefit from this new flexibility, which extends far
beyond today’s OpenFlow network controllers.
The lack of a global network controller with a consistent
network view slows down the development and deployment
of the more dynamic network services required by extensive
cloud technologies. To address the scaling requirements of the
carrier environment, the envisioned network controller would
be implemented as a distributed system. In this article, we
describe a global network controller designed to function as a
network operating system with multi-domain, multi-layer
capabilities that will transform carrier-grade telecom services.
We define the requirements for a carrier-grade network
operating system (OS) and describe a prototype system
(NetUNIX) that can control heterogeneous networks at large
scale and ensure high service reliability. We highlight the
advantages of the NetUNIX platform in the context of other
open source platforms and illustrate the use of the NetUNIX
platform with two use cases: a metro-distributed data center
and 5G next-generation fronthaul.

Introduction

All aspects of our modern lives have been impacted by innovations in data communications. The new
connected world offers a myriad of interfaces such as audio, video, text, and virtual and augmented
reality to create a rich, context-aware communication environment. Consumers of these digital
communication platforms continue to demand more data, higher quality and controllable priority. The
increase in digital services has largely been met by a vast data network comprised of equipment and
software from many thousands of different equipment providers, all expected to interwork at some
level of compatibility. However, future network services will be increasingly dynamic, making the
rapidly growing combination of software and hardware components difficult to control and,
consequently, less able to satisfy the needs of the consumers on a reasonable time scale with
reasonable costs.

Today, telecom operators are experiencing an economic squeeze created by the inexhaustible demand
for low-cost network bandwidth, and the demand for ubiquitous access to the cloud is forcing them to
think beyond merely adding capacity. Cloud services require the restructuring of networks following
similar principles to those in the computing industry [1]. Network functions, systems and individual
network elements must be disaggregated, realigned and reintegrated at different levels via new
software platforms and interfaces [2]. The essential elements of this transformation are:
• A new automation-optimized network fabric, with a rich and verifiable set of configuration
parameters that fuses data center (DC) fabric, scalable routing and flexible optical interconnection
• A network OS to control network functions and the network fabric, making them agile and adaptable
to varying services and user demands.

Today’s network architectures are distant from this vision. Customers requesting a network service
must still go through a lengthy provisioning process. The delay stems from insufficient automation
of the tools needed for efficient and reliable configuration of network resources. Typically, the
end-to-end communication service is established over multiple network segments managed by different
teams or operators through individual element configuration. Tools are optimized for specific
services, which are tightly coupled to the custom-built, vendor-specific network, making rapid,
end-to-end, reliable and secure service creation nearly impossible and providing almost no support
for dynamic reconfiguration.

In response, network hardware equipment providers have proposed more programmable network elements
with open control interfaces. One of the early efforts to simplify network elements was the Bell
Labs Soft Router [3]. More recent examples include

OpenFlow-enabled switches, flexible optical transponders and cross-connects, and fully virtualized
network functions. These programmable elements are the building blocks of the new network fabric.
However, to leverage the flexibility of this new hardware and to provide visibility of multiple
network segments at scale, a new control platform will be needed, one that ensures the same
carrier-grade service reliability as before.

Future control plane – the network operating system

The goal of the network operator is to have a future control plane that makes networks truly
programmable with rapid service provisioning that gives rise to a new network-automation paradigm.
The network operator would be able to run new services on the network as easily as installing a new
app on an iPhone, thus making it analogous to a computer OS that partitions resources (memory,
compute, etc.) to various applications/services.

In contrast with the computing paradigm, network operators today provision the network manually
with specific scripts and processes accessed from a command line interface and laboriously
configure individual ports and network layers. The provisioning process requires hours or even days
to accomplish complicated changes. To realize the vision of agile, seamless and efficient
operations the network needs an OS: a coordinated, scalable management platform with an intuitive
interface for clients and operators that gives the perception of the network as an adaptive
resource supporting service automation.

This programmable network control plane would be the equivalent of a computer OS for multi-layer,
multi-domain networks. Such a network OS would allow customers to precisely express their requested
network services and empower network operators to automatically and reliably provision those
network services. The role of the network OS is to satisfy all service level agreements (SLAs) for
new and already-deployed network services, and ensure efficient resource allocation across multiple
network layers and domains such as IP, MPLS, Layer 2, optical, and mobile and fixed access. To
fully realize the benefits of automation, control should be technology-agnostic and encompass all
network elements in the end-to-end service delivery chain. This multi-layer, multi-domain aspect of
a network OS is illustrated in figure 1.

Requirements for a multi-layer, multi-domain network OS

To fulfill the vision of a multi-layer, multi-domain control platform, the functionality of the
network OS must go beyond that of an OpenFlow network controller [4]. OpenFlow controllers
translate low-level packet flow decisions into flow table rules and install them in
OpenFlow-enabled switches. With an OpenFlow controller, the level of interaction between the
customer and the network is too granular and does not provide mechanisms to ensure reliable service
delivery or efficient network utilization.

The differentiating characteristics of a network OS compared with an OpenFlow software-defined
networking (SDN) controller are summarized here.
• Network abstraction: a multi-layer network modeling language enabling seamless operation across
heterogeneous network technologies and virtual network functions with consistent topology
representation across network layers, e.g., unified representation of heterogeneous resources in a
computing OS, such as a “file” in UNIX.
• Intent framework: a secure northbound interface that is bidirectional to express network service
requests in the form of intents and to monitor fulfillment of SLAs. It is a programmable interface
for controlling the abstracted network slices allocated to the authorized customer, e.g.,
high-level, easy-to-use APIs in a computing OS such as “POSIX C libraries” in UNIX. The intent
framework hides the multi-layer aspects of the network to allow expression of resource needs that
may span different network layers.

FIGURE 1. Network fabric controlled by a multi-layer, multi-domain network OS.

• Network resource manager (NRM): a real-time mapper of network service requests into hardware
configurations. The resource manager is a multi-layer path computation element that can optimize
goals without human intervention, and includes topology and hardware configuration compilers with
correctness verification. The resource manager also keeps track of active and pending network
services, their characteristics, performance and resources. It includes algorithms for predicting
future network demands and scheduling resources accordingly, analogous to the management of
resources (CPU, memory, etc.) in a computing OS, such as paging and process scheduling in UNIX.

These required network OS functionalities will be further described in what follows.

Related work

Two major SDN controller platforms attract the most attention from academia and industry:
OpenDaylight (ODL) [5] and Open Network Operating System (ONOS) [6]. Both are modular, open-source
SDN controller platforms to customize and automate SDN. At a high level, they share many
architectural concepts, such as high availability (HA) and model-driven, cluster-based intent
frameworks, but differ on some design principles. For example, the core of the ODL architecture is
the Model-Driven Service Abstraction Layer (MD-SAL), which enables the modeling of network
devices/applications using the industry-standard modeling language YANG [7] to interact in an
abstract way. The ONOS core has a simple graph abstraction using Java objects so that YANG-based
data models are translated into Java objects outside of ONOS using the YANG Management System
(YMS). In addition, with regard to southbound protocols, ODL has strong support for legacy network
devices using industry-standard protocols, such as NETCONF/YANG and TL1 [8], but ONOS supports
innovative SDN protocols such as ONF Transport API [9] and new programmable hardware forwarding
elements using P4 [10] [11].

From a use-case perspective, ODL addresses automated service delivery, such as transport SDN, cloud
and an SDN controller for the Open Platform for Network Functions Virtualization (OPNFV), and
network resource optimization, such as DC interconnect, which are major challenges for many
equipment providers. ONOS is more focused on the CORD (Central Office Re-architected as a
Datacenter) use cases with R (Residential)-CORD, E (Enterprise)-CORD, M (Mobile)-CORD and
A (Analytics)-CORD [12]. CORD use cases are more aligned with the operators’ evolution towards
using central offices as distributed data centers, which they can achieve by combining a data
center infrastructure controller, such as OpenStack, with an orchestrator, such as XOS. In this
sense, ODL is more equipment provider-centric while ONOS is more operator-centric. Some research
papers have investigated a performance

comparison between the two SDN controller architectures [13].

From an operating system perspective, the ONOS platform aims to provide more general features, such
as resource management and intent framework, and is therefore closer to NetUNIX. But ONOS still has
missing pieces from a network OS perspective, such as multi-layer, multi-domain awareness of
network resources represented as layered network abstractions and real-time SLA satisfaction.

Figure 2 captures the key differentiators of NetUNIX from other network OSes. In the following
sections, we present the architecture and further details on key components of NetUNIX.

Architecture

The software architecture of NetUNIX presented in figure 2 is aimed at providing an extensible,
distributed network OS. The key advancements that the proposed architecture brings to current
network controllers and operating systems are:
• Native support for network description through a multi-layer network abstraction layer
• An intent-based multi-layer service description Application Programming Interface (API)
• Dynamic, multi-layer network resource management
• Transparent support for advanced service, system and network resiliency features through novel
continuous testing and smart failover techniques
• Native security.

The proposed architecture is based on the Long-Term Support Linux kernel (LTS) and will run on
commodity server hardware. On top of the Linux kernel there is the southbound (SB) layer that is
used to communicate with the network devices. The SB implements an equipment provider-agnostic
interface through a specific network element abstraction layer (network drivers as discussed
below). The core of the architecture controls the network elements and discovers, manages and
allocates network resources (e.g., capacity, links, nodes and network topology). Above the core,
the system bus delivers data, commands and signals to other elements of the architecture.

Network abstractions

Network abstraction generalizes the operation of disparate network technologies and domains by
providing a comprehensive network view that enables unified, end-to-end network service control and
optimization. It also makes the network OS equipment provider-agnostic. One of the key requirements
of network abstractions is to provide a consistent view of the network resources and connectivity
to the resource manager of the OS. In the market, the de-facto standard for network models is
NETCONF/YANG [14]. The primary focus of these models is to configure and monitor the network
equipment. These data models lack flexibility for describing explicit network connectivity,
layering and layer encapsulation information and lack a defined mathematical structure for ensuring
consistent network representations.

For the multi-layer, multi-domain network OS, we have developed a formal network modeling language
to describe network equipment, topology, connectivity and performance that consistently represents
a comprehensive network view.¹ Our modeling language captures a snapshot across all layers and
domains in an equipment provider-agnostic manner. It is flexible enough to generate different
levels of abstraction while mathematically guaranteeing consistency of the abstracted network
views. More details on formal verification have been provided by Fortune [15].

¹ It is based on extensible markup language (XML) for automatic syntactic and semantic validation.
Also, it can be easily converted between JSON and XML formats by using many open source tools and
libraries, e.g., JSON.org.

In addition, the language semantic check and visualization programs aid in path validation over
multiple links for multi-layer networks

FIGURE 2. Network OS architecture.

and automate the presentation of network resources. Simsarian et al. describe an example of a
multi-layer IP and optical network representation using the NetUNIX modeling language (NetGraph)
[16]. Figure 3a shows a NetGraph visualization of two carrier routers interconnected by an optical
network with reconfigurable optical add-drop multiplexers (ROADMs). Figure 3b shows an abstracted
view of the network with only the optical network termination points. In addition, NetGraph can
optionally contain performance data, such as link utilization, to enable other modules to better
understand network states, for example, NRM for dynamic network optimization.

Intent framework

The envisioned control platform is intended for heterogeneous networks, covering technologies from
optical through IP, MPLS and L2 to wireless. An aggregation of explicit packet and flow handling
primitives specific to each of these technologies would lead to a very complex service-provisioning
interface. Moreover, it would require the network customer to predetermine the exact technology
over which the service is to be transported. This defeats the purpose of network abstraction and
would limit the ability to dynamically optimize network resources. Therefore, for the multi-layer
network OS, it is necessary to establish a set of service description primitives to reflect the
requirements of the service, rather than the underlying network elements. This should take the form
of a high-level intent language forming the interface between the application and the control
platform.

The intent language enables network customers to describe the characteristics of the requested
network service in absolute, technology-agnostic terms. The role of interpreting those
specifications, finding paths that satisfy service requirements and compiling them into network
element configurations is delegated to the control platform. When the requested service is
committed, it is the role of the controller to track the performance of the service and report it
to the customer. To express the requirements of most typical services (service SLAs) we have
defined the following set of parameters:

FIGURE 3. NetGraph visualization of carrier routers interconnected with an optical network: a) L1–L7 refer to different network layers and the
triangles are adaptations between the layers; b) shows an abstracted view of the network.
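The abstraction step between figure 3a and figure 3b can be illustrated with a minimal sketch, assuming a toy model in which each node carries a layer label and optical termination points are flagged explicitly. The names `Node`, `reach_through` and `abstract_view`, the layer labels and the sample topology are hypothetical and are not part of NetGraph; the sketch only shows how interior optical nodes can be hidden while connectivity between the remaining nodes is preserved.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    layer: str                 # illustrative labels: "ip" or "optical"
    termination: bool = False  # True for an optical network termination point


def reach_through(adj, start, hidden):
    """Return start plus every hidden node reachable from it via hidden nodes only."""
    seen, queue = {start}, deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt in hidden and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def abstract_view(nodes, links):
    """Hide optical nodes that are not termination points; keep connectivity."""
    adj = {n.name: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    hidden = {n.name for n in nodes if n.layer == "optical" and not n.termination}
    # Links whose two ends both survive the abstraction are copied unchanged.
    view = {frozenset(link) for link in links if not (set(link) & hidden)}
    # Termination points joined only through hidden nodes get one abstract link.
    terms = [n.name for n in nodes if n.termination]
    for t in terms:
        region = reach_through(adj, t, hidden)
        for u in terms:
            if u != t and adj[u] & region:
                view.add(frozenset((t, u)))
    return sorted(tuple(sorted(link)) for link in view)


# Hypothetical topology: two routers joined by an optical path with two interior nodes.
NODES = [Node("R1", "ip"), Node("T1", "optical", True), Node("X1", "optical"),
         Node("X2", "optical"), Node("T2", "optical", True), Node("R2", "ip")]
LINKS = [("R1", "T1"), ("T1", "X1"), ("X1", "X2"), ("X2", "T2"), ("T2", "R2")]

print(abstract_view(NODES, LINKS))
```

In this toy case the interior nodes X1 and X2 disappear and a single abstract link between T1 and T2 stands in for the optical path, which is the kind of reduced view a resource manager could work on without handling every optical element.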

• Connectivity (endpoints, point-to-point, point-to-multipoint, broadcast, path diversity, load
balance)
• Bandwidth symmetry
• Explicit routes
• Reliability (availability, protection, latency in recovering from failures)
• Security attributes
• Service scheduling and holding time
• Guaranteed bit rate
• Maximum bit rate (maximum bit-rate duty cycle)
• Maximum data-plane latency
• Fidelity (upper bound bit-error rate (BER)).

This intent-based approach enables network customers to define the requirements of interest while
delegating the configuration of network parameters entirely to the automated control platform. This
approach is similar to the concept of software-friendly networks [17] in its use of high-level,
service parameter types. We propose a more complete set of parameters to express a broad range of
network services. Ferguson et al. proposed delegating service characteristic parsing to the
controller, as well as the scheduling of future services [18]. In addition to the concepts
described in Ferguson, the multi-layer network OS platform performs the important role of
multi-layer traffic placement optimization as well as quality of service (QoS) awareness.

For example, consider a simple intent request to create an L3 virtual network topology among nodes
A, D, and C in the IP/optical network shown in figure 4 where the network connections have
bandwidth, latency, and mutual diversity requirements. This service intent could also have a
further requirement of path diversity. The intent framework translates this high-level intent
specification into three

FIGURE 4. Mapping an intent request for a virtual subnetwork among the nodes A, D, and C into mutually disjoint connections between those
nodes that are then configured onto the network.
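The mapping summarized in the figure 4 caption can be sketched as follows, under stated assumptions: the intent among A, D and C is expanded into one point-to-point intent per node pair, and each pair is routed with a breadth-first shortest path while links already used by earlier paths are removed, which is a simple greedy heuristic for mutual link-disjointness. The topology, the function names and the greedy strategy are illustrative assumptions only; the article does not state which algorithm NetUNIX uses, and a greedy search can miss disjoint routings that a joint optimization would find.

```python
from collections import deque
from itertools import combinations


def shortest_path(adj, src, dst):
    """Breadth-first search by hop count; returns a node list or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nxt in sorted(adj.get(cur, ())):
            if nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None


def expand_intent(endpoints):
    """Expand a multipoint intent into one point-to-point intent per node pair."""
    return list(combinations(endpoints, 2))


def place_disjoint(links, pairs):
    """Greedily route each pair, removing used links so paths stay link-disjoint."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    placed = {}
    for src, dst in pairs:
        path = shortest_path(adj, src, dst)
        if path is None:
            raise ValueError(f"no link-disjoint path left for {src}-{dst}")
        for a, b in zip(path, path[1:]):
            adj[a].discard(b)
            adj[b].discard(a)
        placed[(src, dst)] = path
    return placed


# Hypothetical five-node topology, not the network drawn in figure 4.
LINKS = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D"), ("A", "E"), ("E", "C")]
PAIRS = expand_intent(["A", "D", "C"])
PLACED = place_disjoint(LINKS, PAIRS)
for pair, path in PLACED.items():
    print(pair, "->", path)
```

The order in which pairs are routed affects the result of a greedy placement, so a real resource manager would more plausibly treat the three connections jointly together with their bandwidth and latency constraints.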

point-to-point intents that satisfy the service criteria. The service criteria are each captured in
a specific grammar as shown in figure 5. The intent framework compiles the set of grammars to
define the specific resource requests to the network OS.

Network resource manager (NRM)

The managed network is composed of standalone networking communications equipment distributed over
a (potentially large) geographic area. Efficient use of such resources has traditionally required
careful pre-deployment planning, first to understand the potential traffic and services, and then
to design network interconnections and capacities, including protection and restoration capacity,
using sophisticated combinatorial algorithms to find reasonable, cost-efficient designs. Once
deployed, the resulting network then operates with real-time traffic having a combination of
distributed protocols in the higher layers (e.g., MPLS and IP) for adapting to dynamic flows that
ultimately are transported over a quasi-static and/or pre-planned OTN and optical infrastructure.
Realizing the promise of programmable networks, however, requires the network resources themselves
to be managed much more dynamically and adaptively at all layers on a time scale much shorter than
for offline planning. Algorithms and associated decision logic are needed to enable efficient
management, and control of programmable network resources and services takes a more central role
than in traditional computer OSes.

Essential for network resource management are algorithms that produce and manage a mapping of
services to network resources. Generally, a user’s service request has components that require
fulfillment by network and non-network resources such as application servers, web servers, data
centers or computation farms. The focus here is only on the network component of a service request
(see figure 6). Thus, a network service (hereafter referred to as a service) is a delivery of
digital content or bandwidth by a service provider network to and/or from user-designated interface
locations according to user-required performance characteristics and attributes. Services are

FIGURE 5. Intent grammar processed by the Intent Framework for the example in figure 4.
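As a rough illustration of the service criteria that such a grammar has to carry, the sketch below models a point-to-point intent with two of the SLA parameters from the list given earlier (guaranteed bit rate and maximum data-plane latency) and checks a candidate path against them. The field names, units, sample values and the `satisfies` helper are hypothetical; they are not taken from the NetUNIX intent grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    src: str
    dst: str
    guaranteed_gbps: float   # guaranteed bit rate (assumed unit: Gb/s)
    max_latency_ms: float    # maximum data-plane latency (assumed unit: ms)


@dataclass(frozen=True)
class Link:
    a: str
    b: str
    free_gbps: float         # unallocated capacity on the link
    latency_ms: float        # one-way latency of the link


def satisfies(intent, path):
    """True if the ordered links connect src to dst and meet both SLA bounds."""
    if not path:
        return False
    node = intent.src
    for link in path:
        if link.a == node:
            node = link.b
        elif link.b == node:
            node = link.a
        else:
            return False  # links are not contiguous
    if node != intent.dst:
        return False
    bottleneck = min(link.free_gbps for link in path)
    latency = sum(link.latency_ms for link in path)
    return bottleneck >= intent.guaranteed_gbps and latency <= intent.max_latency_ms


# Hypothetical request and two candidate paths between A and D.
INTENT = Intent("A", "D", guaranteed_gbps=10.0, max_latency_ms=5.0)
VIA_B = [Link("A", "B", 40.0, 2.0), Link("B", "D", 10.0, 2.5)]
VIA_C = [Link("A", "C", 100.0, 3.0), Link("C", "D", 100.0, 3.0)]

print(satisfies(INTENT, VIA_B), satisfies(INTENT, VIA_C))
```

Keeping the intent free of any technology-specific field is the point of the exercise: the same record could be checked against an IP path or an optical light-path, as long as each candidate exposes capacity and latency.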

assumed to have associated SLAs specifying their performance requirements.

The foregoing definition allows for a broad range of services, from high-level content or cloud
interconnection to lower-level intersystem federation or point-to-point transport wavelength
services. Resources are installed network capacities that can be assigned tasks to support
fulfillment of service requests. Resources can be:
• Elemental, such as wavelengths on specified point-to-point optical links
• IP or optical ports on specified devices at a node
• A bandwidth fraction of an IP port or composite, such as a light-path, depending on which objects
an algorithm is designed to operate on
• Radio channels or codes
• Timeslots in fixed access systems.

Importantly, resources of like kind are fungible as far as the services are concerned. As long as a
service’s SLA is being met, it does not matter exactly which network resources are being used or
whether they remain fixed throughout the service fulfillment.

All requests for services by network users or third parties are assumed to enter the network
through a logically centralized point via the northbound interface (NBI) or the Intent API (see
figure 4). A request entering the NBI will have already been separated out by a higher layer
application interface into the network-only intent, which will be converted into (if not already so
expressed in) a standardized service description or intent language. The requests are then mapped
to the network in three stages.

In the first stage, as shown in figures 4 and 5, the Intent Framework translates the network
service intent requests into a set of valid resource-consuming demands. Demands associated with a
service are a set of network interface points that must be interconnected in accordance with the
service’s SLA to provide for its fulfillment. They are data objects that comprise inputs suitable
for algorithms. Demands are expressed in connection units consistent with the network’s transport

FIGURE 6. Overall service request is decomposed into network and non-network components. In stage 1, the network component is presented to the NetOS, where the Intent Framework (IF) translates the network intents into valid network resource-consuming requests, as shown in figures 4 and 5.
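
As a rough illustration of stage 1, the sketch below turns a connectivity intent into demand data objects that carry SLA and policy attributes. The intent fields, the `Demand` class and the `intent_to_demands` helper are hypothetical names chosen for this example, and the full-mesh interconnection of endpoints is an assumption; the article does not specify the NetUNIX intent grammar or demand format.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List


@dataclass
class Demand:
    """Hypothetical demand object: two interface points plus SLA attributes."""
    src: str
    dst: str
    bandwidth_gbps: float
    max_delay_ms: float
    attributes: Dict[str, str] = field(default_factory=dict)


def intent_to_demands(intent: dict, policy: dict) -> List[Demand]:
    """Validate an intent and compile it into pairwise demands.

    Assumption: every pair of endpoints must be interconnected (full mesh).
    """
    endpoints = intent["endpoints"]
    sla = intent["sla"]
    if len(endpoints) < 2:
        raise ValueError("an intent needs at least two interface points")
    if sla["bandwidth_gbps"] <= 0:
        raise ValueError("requested bandwidth must be positive")

    # Operator policy may tighten, but never loosen, the requested delay bound.
    max_delay = min(sla["max_delay_ms"], policy.get("max_delay_ms", float("inf")))
    attrs = {"priority": policy.get("priority", "normal")}

    return [
        Demand(a, b, sla["bandwidth_gbps"], max_delay, dict(attrs))
        for a, b in combinations(endpoints, 2)
    ]
```

The point of the sketch is only that SLA characteristics and operator policy end up as attributes on algorithm-ready objects, so that later stages do not need to parse the intent again.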

resources and capabilities, such as units of bandwidth for Label-Switched Paths (LSP) or units of wavelengths (at maximum line rate), and they can include other attributes and parameters needed for algorithm selection and constraint instantiation. The intent framework, with inputs from the policy engine, performs the conversion of service requests from the standardized service description language form into algorithm-understandable demand data objects. The SLA characteristics and operator policies are also compiled into the demands as attributes, such as performance and constraint parameters. The result of the first stage is a validated resource allocation problem that needs to be solved for the service requests to be realized.

The second stage of the mapping, pictured in figure 7, consists of assigning demands to network resources consistent with the demands’ attributes. Using the demands as inputs, this stage is performed by a family of algorithms available in NetUNIX. They operate on abstractions of actual network resources and their associated capabilities to produce, as output, assignments of demands to suitable network resource abstractions. The resource allocation problems to be solved are typically challenging network optimization problems of varying types.

Some of these may be most efficiently solved by leveraging new or existing specialized or purpose-built algorithms best suited to address the problem type. To exploit such potential efficiency gains, demands present in a provisioning cycle would be classified according to pre-defined provisioning problem types for which specialized algorithms are available in NetUNIX. The order of execution of each sub-problem would be based on logic that accounts for considerations such as demand type prevalence, priorities and sub-rate containment. Demands with additional performance characteristics (such as delay tolerance) could request end-to-end connections over full wavelength or sub-wavelength LSPs. Based on traffic classification by the policy engine, the demands are mapped to corresponding network resources using special-purpose algorithms for the traffic type.

The third stage of the process first verifies that the abstracted allocation plan of the intents can be realized on the network. It then translates the abstract plan into a realizable resource allocation. This latter step is addressed by the southbound interface (SBI) described below.

Data analytics in all its forms, both classical methods as well as cognitive-based methods, are a key part of the functioning of the system and will serve to provide key decision support throughout the NetOS. For example, to reduce the impact of some of the more time-consuming control and decision algorithms, data analytics will allow for the anticipation of service requests or network issues. Although analytics can be used to trigger

FIGURE 7. In Stage 2, the resource allocation problem defined by the Intent Framework (IF) is solved by the Network Resource Manager (NRM).
The result is a mapping of the demands required for the network services to network resources.
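
To make the stage 2 mapping concrete, here is a minimal sketch that assigns bandwidth demands to abstract links with a simple greedy heuristic (largest demand first, fewest-hop path with enough residual capacity). This is a stand-in chosen for brevity, not the NetUNIX algorithm family, which the article does not detail; `find_path`, `assign_demands` and the data layout are hypothetical.

```python
from collections import deque


def find_path(links, src, dst, bw):
    """Fewest-hop path (BFS) using only links whose residual capacity >= bw."""
    adj = {}
    for (u, v), cap in links.items():
        if cap >= bw:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no feasible path at this bandwidth


def assign_demands(links, demands):
    """Map each demand to a path; returns (plan, residual capacities).

    links:   {(node_a, node_b): capacity} for undirected abstract links
    demands: {name: (src, dst, bandwidth)}
    """
    residual = dict(links)
    plan = {}
    # Greedy ordering assumption: serve the largest demands first.
    for name, (src, dst, bw) in sorted(demands.items(), key=lambda kv: -kv[1][2]):
        path = find_path(residual, src, dst, bw)
        plan[name] = path
        if path:
            for u, v in zip(path, path[1:]):
                key = (u, v) if (u, v) in residual else (v, u)
                residual[key] -= bw
    return plan, residual
```

A demand that cannot be placed is recorded with `None` rather than raising, so a caller could retry it in a later provisioning cycle or hand it to a more specialized algorithm.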

service reconfiguration, the changes are still provisioned as “demands”. Analytics may also be used to support demand classification by the policy engine, for instance, the use of historical traffic monitoring data to help classify the most appropriate algorithm family to use for realizing a service request. Cognitive-based methods such as various forms of machine learning will be exploited for their pattern recognition abilities, such as to identify recurrent traffic request patterns or to recognize emerging resource usage patterns that should be avoided. To obtain the data to employ such analytics capabilities, it is assumed that there will be a scalable performance and traffic-monitoring infrastructure architected and designed. It will harvest, filter and feed essential IP, optical and service performance and traffic information into the data analytics system. Then fast, scalable analytics and machine-learning algorithms will be developed to mine the data for actionable decision inputs.

There are many challenges to creating algorithms for programmable networks. The algorithms include the coordination of service provisioning across network layers, for instance between the IP/MPLS layers and the OTN/optical layers or between IP layer flows and the TDM of a fixed access domain, to ensure optimum resource utilization. This may entail grooming lower rate services in and out of higher speed facilities multiple times in transit. It is well known that solving this kind of problem, even offline, for multiple demands in a provider-scale network is computationally complex. However, to be successful for network control, the targeted algorithms need to be sufficiently responsive and agile to maintain synchronization with the network state dynamics while providing a net benefit to network resource and service management.

Thus, what will be needed are fast, scalable, possibly cross-layer-aware combinatorial algorithms and processes that provide sufficiently high-quality solutions to NP-hard (non-deterministic polynomial-time hard) problems that are time bound. Further, the overall algorithmic approach needs to be suitable for an online, on-demand network-programming paradigm and maximize network value to its operator. In other words, fast algorithms need to be able to generate good solutions whose quality depends on the amount of time available to

compute them. One way to realize this is to have algorithm design and development guided by the following principles.

Maximize utilization of installed network capacity
The network should run as “hot” as the network control will allow. Installed network capacities are fungible resources. They should be adapted as needed to accommodate present and anticipated network services and events, while allowing sufficient headroom for the unexpected. Expected services can be designed and provisioned under a given failure scenario, and installed protection capacity can be used for lower priority, pre-emptible or reconfigurable traffic. This requires development of agile algorithms for near-real-time resource-usage management and frequently retuning service provisioning to maintain service reliability and efficient network utilization.

Analytics-driven capacity utilization via service anticipation and fault prediction
Data analytics are used to buy time for optimization and possibly other complex NetUNIX processes by anticipating event occurrences and initiating or preparing for such processes as early as possible. This would entail leveraging near-real-time streaming analytics of traffic/usage data or performance data to inform event prediction. To extract the maximum information from the available data, analytics should leverage all methodologies such as cognitive-based techniques including, but not limited to, machine learning.

Speed versus quality optimization solution tradeoff
Network optimization algorithms should be designed to run under time constraints, trading off optimization solution quality for speed. To offset the loss of design-quality optimizations, frequent retuning of service provisioning is performed on the network.

Creating algorithmic solutions according to these principles is a key ingredient to enabling an online, on-demand programmable networking paradigm. Figure 8 illustrates an integrated view of the resource management system interworking with the intent framework, the policy engine, the SBI, as well as data analytics both external and internal to the NRM component.

Network drivers (SBI)
The computational output of NetUNIX consists of data objects that are abstractions of network resources. For connectivity service requests from NetUNIX users or apps, the stage of mapping services to the network is achieved by converting those data object outputs into actionable configuration directives for the network elements and then executing them, as shown in figure 9.

The abstractions must first be mapped to actual network resources, managed by the resource manager, and then a schedule of (re)configuration tasks, which are managed by the intent manager, must be generated to efficiently sequence the provisioning operations and to avoid conflicts or wasteful configuration changes. Once all configuration operations are scheduled, they must be converted into network-element-specific instructions understandable by the affected elements, which are managed by the flow manager and device drivers and then executed via southbound protocols such as OpenFlow, Representational State Transfer (REST), or open source Remote Procedure Call (gRPC). In addition to installing actionable configuration directives into network elements, network drivers are responsible for automatically populating network topology information (e.g., device, port, link and host) into NetUNIX and continuously collecting flow statistics and port statistics through communication with network elements, including OpenFlow devices and non-OpenFlow devices.

Resilience of the network OS
In carrier-grade legacy systems, such as telephone switches, the switch hardware is more reliable than the control software because the hardware includes self-checking circuitry and hardware-level

FIGURE 8. In Stage 3, the resource allocation plan is verified as feasible and then handed over to the Southbound Interface (SBI) where it is
converted into device-understandable instructions and then provisioned into the network. Also shown is a more detailed unified view of the
interworking of the NBI, IF, NRM, SBI as well as the Policy Engine and various analytics supporting overall NetOS operation.
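
The speed versus quality principle discussed in the text can be illustrated with an "anytime" routine: it produces a usable answer immediately and keeps improving it until a time budget expires. The sketch below is a deliberately small example under stated assumptions (packing demand sizes onto a single link by randomized restarts); `anytime_pack` and its parameters are hypothetical and are not taken from NetUNIX.

```python
import random
import time


def anytime_pack(sizes, capacity, budget_s, seed=0):
    """Choose demands to load onto one link, improving until time runs out.

    Returns (set of chosen indices, total load). A larger budget_s can only
    keep or improve the result, never worsen it.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_s

    def first_fit(order):
        chosen, load = set(), 0.0
        for i in order:
            if load + sizes[i] <= capacity:
                chosen.add(i)
                load += sizes[i]
        return chosen, load

    # Fast initial answer: first-fit in arrival order.
    order = list(range(len(sizes)))
    best, best_load = first_fit(order)

    # Improvement phase: randomized restarts, keeping the best plan seen.
    while time.monotonic() < deadline and best_load < capacity:
        rng.shuffle(order)
        chosen, load = first_fit(order)
        if load > best_load:
            best, best_load = chosen, load
    return best, best_load
```

With a zero budget the caller gets the first-fit answer straight away; with a few milliseconds more, the same call may return a fuller link. Frequent retuning, as described in the text, would amount to calling such a routine again as conditions change.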

FIGURE 9. A prototypical southbound interface (SBI) converting data object outputs into actionable configuration directives for the network
elements.
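
The conversion described in the caption above can be sketched as a small driver registry: an abstract per-hop allocation is turned into device-specific directives according to the protocol each element speaks. The directive dictionaries below are illustrative placeholders, not real OpenFlow messages or a real REST schema, and `compile_plan`, `DRIVERS` and the downstream-first install order are assumptions made for this example.

```python
def openflow_directive(device, in_port, out_port, demand_id):
    """Placeholder for a flow rule; not an actual OpenFlow message encoding."""
    return {
        "device": device,
        "protocol": "openflow",
        "match": {"in_port": in_port},
        "action": {"output": out_port},
        "cookie": demand_id,
    }


def rest_directive(device, in_port, out_port, demand_id):
    """Placeholder for a REST call; the path and body are hypothetical."""
    return {
        "device": device,
        "protocol": "rest",
        "method": "PUT",
        "path": f"/cross-connects/{demand_id}",
        "body": {"from": in_port, "to": out_port},
    }


DRIVERS = {"openflow": openflow_directive, "rest": rest_directive}


def compile_plan(demand_id, hops, inventory):
    """Translate abstract hops into an ordered list of device directives.

    hops:      ordered (device, in_port, out_port) tuples from ingress to egress
    inventory: {device: protocol name} as learned from topology discovery
    """
    directives = []
    for device, in_port, out_port in hops:
        protocol = inventory[device]
        if protocol not in DRIVERS:
            raise ValueError(f"no driver registered for protocol {protocol!r}")
        directives.append(DRIVERS[protocol](device, in_port, out_port, demand_id))
    # Scheduling assumption: configure the egress side first so that traffic
    # is never forwarded toward a hop that has not been set up yet.
    return list(reversed(directives))
```

Keeping the protocol-specific formatting behind a registry is one way to let OpenFlow and non-OpenFlow elements share the same abstract plan.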

fault masking and recovery capabilities. Only a few errors are handled by the software stack, which in turn is typically equipped with simple recovery mechanisms that include test-retry techniques or warm/cold replica activation.

As in legacy systems, the dependability of the network OS is also determined by the effectiveness of its resiliency mechanisms. However, in contrast to legacy systems, the distributed nature of the network OS core requires more advanced resiliency mechanisms that are able to guarantee consistency across replicas while keeping low failure detection and recovery times. In addition, due to the different configurations (e.g., multi-vendor) and different requirements of delivered services, resiliency mechanisms have to be flexible and continuously adapt to different working conditions.

As described in the architecture section, the resiliency of the network OS is achieved by means of the Resiliency Framework that includes the following components: error detection manager, high availability manager, failover manager and continuous testing framework, as shown in figure 10. These components work in a pipeline as follows: the error detection manager deploys service-level and system-level error detection on-demand, e.g., whenever a new service is provisioned, customized to the specific SLA; the high-availability manager computes specific KPIs to detect system-level dependability threats and trigger recovery actions; the failover manager, triggered by the high-availability manager, selects the best system- or service-level recovery action to perform recovery without impacting the provisioned services; and the continuous testing manager continuously injects defects (e.g., failures) at the system and service level to identify potential resiliency bottlenecks and solutions in a controlled fashion.

The overall goal of the Resiliency Framework is to detect controller errors with low latency and to fail over to an active standby component without impact on the provisioned services and without losing state information. It ensures resiliency against failure of control functionality with respect to dependability metrics such as availability, reliability and resilience.
• The availability metric is the probability that a system is operating as expected at any given time. This metric represents the percentage of time (e.g., 99.99 percent or 4 “nines”) in which services are provisioned, as specified in the SLA with the customer, for example with specified latency and capacity.
• The reliability metric is the probability of continuous successful service delivery for a given time t. This metric represents the probability that a service can be provisioned without interruption for t. This metric captures the probability that a service completes successfully with required termination, including release of used resources. This is related to the inverse metric of the dropped-call rate per million (DCR) in legacy systems, i.e., the fraction of telephone calls that, due to technical reasons, were terminated before the service ended.
• The resilience metric is the probability of continuous successful service delivery in spite of failures that occurred in the system. This metric depends on the specific delivered service and captures the capability of the system to recover from a given number of consecutive failures without impact on the delivered service.

The resiliency framework computes the above metrics with data provided by the data-driven architecture of the network OS, which logs events at fine granularity and performs analytics to detect anomalous, service-impacting conditions. The level of delivered service can be monitored and compared against SLAs to track compliance through the data provided by the data-driven architecture and the metrics computed by the resiliency framework. SLAs and service requirements are specified with intents that are rich in reliability and availability attributes, which allow for translating SLAs to a set of corresponding low-level rules that are enforceable

FIGURE 10. Continuous testing framework consisting of an error detection manager, high availability manager and failover manager.
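
As a worked example of the availability metric and its "nines", the sketch below computes measured availability from observed uptime and downtime and the downtime that a target permits. The function names are hypothetical, and treating availability as the simple ratio of uptime to total observed time is an assumption of this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability(uptime, downtime):
    """Fraction of observed time in which the service operated as expected."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("observation window must be positive")
    return uptime / total


def allowed_downtime_minutes(target, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a probability between 0 and 1")
    return (1.0 - target) * period_minutes


def meets_sla(uptime, downtime, target):
    """True when the measured availability reaches the SLA target."""
    return availability(uptime, downtime) >= target
```

For a 99.99 percent target, `allowed_downtime_minutes(0.9999)` comes to roughly 52.6 minutes per year, which gives a sense of how little recovery time a four-nines SLA leaves for the failover path.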

in various elements of the OS and the network. Modeling enables service quality and performance under failure conditions to be evaluated and dynamically validated by a continuous testing framework that leverages context-aware fault injections.

The resiliency of the system is continuously measured by analyzing the response of the system to unpredicted events. The continuous testing framework simplifies creation of failure (failure injection) within the network OS ecosystem and provides a greater degree of precision to identify resiliency bottlenecks and proper recovery actions for the created failure. The main objective of the continuous testing framework is to reduce the offline failure testing time of the network OS (services and applications) by doing it online, i.e., during production hours. This is done by injecting realistic failure conditions in a controlled fashion to establish confidence that the system degrades gracefully with tolerable (i.e., within the SLA parameters) or no impact on the provisioned services. The workflow of the continuous testing framework is shown in figure 10. The fault-injection compiler selects failures according to the operational context and history. It then customizes the failure injection and activation policies. The system watcher component performs continuous monitoring of the failure, and evaluates the impact of the injected failure with respect to detection, recovery and downtime (see figure 11).

Data from different network OS components are collected after failure injection and are delivered to the data-driven architecture that stores them in the form of failure fingerprints, i.e., a set of features extracted from the data associated with the injected failure. The resiliency framework then uses the failure fingerprints to continuously train error detectors specific to the injected failure in order to increase the failure coverage of the resiliency framework.

Based on the assessment of the controlled failure injection, the appropriate failure recovery procedures are implemented to address any impending failure scenarios. The types of failure injections supported on the continuous testing framework are shown in figure 11.

FIGURE 11. Types of failure injections supported by the continuous testing framework.
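
The evaluation of an injected failure with respect to detection, recovery and downtime can be sketched as follows. The record layout, the `fingerprint` helper and the chosen features are hypothetical; the article does not list which features make up a failure fingerprint.

```python
from dataclasses import dataclass


@dataclass
class InjectionRecord:
    """Hypothetical timestamps (in seconds) logged around one injected failure."""
    failure_type: str
    injected_at: float
    detected_at: float
    recovered_at: float
    service_down_s: float  # time during which provisioned services were impacted


def fingerprint(record: InjectionRecord, max_downtime_s: float) -> dict:
    """Extract simple features from one controlled failure injection."""
    detection = record.detected_at - record.injected_at
    recovery = record.recovered_at - record.detected_at
    if detection < 0 or recovery < 0:
        raise ValueError("timestamps must be ordered: injected, detected, recovered")
    return {
        "failure_type": record.failure_type,
        "detection_latency_s": detection,
        "recovery_time_s": recovery,
        "downtime_s": record.service_down_s,
        # Graceful degradation here means the impact stays within the SLA budget.
        "within_sla": record.service_down_s <= max_downtime_s,
    }
```

A collection of such dictionaries, one per injection, is the kind of labeled data on which failure-specific error detectors could be trained.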

Security of the network OS
Security of the network OS is essential as programmable networks can be dynamically reconfigured through software. Security issues can arise in the control, application and data planes and in the interactions between these planes. The network OS should address complementary aspects such as:
• Access control through rules governing the degree of dynamic network programmability given to network applications
• Identification of security vulnerabilities in all the planes through analytics
• Infrastructure that enables real-time mitigation of these identified vulnerabilities through dynamic network reconfiguration
• Analytics-enhanced, automated verification to identify security issues early in the software lifecycle.

As network OS software enables real-time network programmability, the power of this programmability also intensifies the potential impacts that software faults can have on the security, reliability and performance of programmable networks. We have demonstrated that subtle faults in network applications can have disastrous impacts on programmable networks and that faulty behaviors can be detected and identified during testing and deployment through machine learning [19].

Given the significant impact on programmable network behavior, it is imperative to have a high level of confidence that network OS controller software and applications do not contain subtle software faults. At the same time, it is not possible to exhaustively analyze network OS software. Given the high degree of multi-threading and distribution, the number of potential states and execution paths is astronomically high. For the same reasons, subtle software bugs are also unlikely to be discovered during system testing. We have created an approach and framework for combining automated code verification with machine-learning-based analytics to detect and identify SDN software faults that can compromise the network. Our approach is based on probabilistic sampling of timed execution of SDN software combined with anomaly detection. Identified anomalous execution paths can then be replayed to detect potential software faults. Our high-level approach is depicted in figure 12.

Multi-domain federation
As described in the introduction to the network OS, we envision end-to-end (E2E) communication service delivery. Typically, the E2E service is deployed across multiple domains including DC, access and metro/core networks. A comprehensive and abstracted view of reachability, network topology, security policy and computing resources must be provided for the service deployment across multiple domains.

There is, to the best of our knowledge, no study on the multi-domain aspect of the network OS. ONOS, as a network OS, provides a clustering of multiple controller instances in a single domain with their own

FIGURE 12. Detecting reliability, performance and security issues by learning on timed-execution paths.

distributed state-sharing tools. The CORD model can provide service chaining in a single data-center [20], together with ONOS. However, these approaches do not address the E2E multi-domain requirement described above. Rather, they focus on independent scenarios, such as CORD for residential, enterprise and mobile.

Motivated by the need for multi-domain control, we developed a federation framework that can provide E2E service such as service chaining, load balancing or replication across multiple domains. As shown in figure 13, our framework is implemented as an E2E network hypervisor over the top of multiple network operating systems and is a key application of the network OS. It enables interaction between multiple domain controllers, including multiple DC controllers, metro network controllers and core network controllers, if necessary, thus exposing a comprehensive network view to other applications such as a service-chaining orchestrator.

Use cases
The following section illustrates the use of the multi-layer, multi-domain network OS for potential carrier application scenarios. We explain how the described concepts are applicable to metro-data center interconnects and to the convergence of wireless and wireline access networks.

Metro-distributed data center
The expected move towards edge-cloud services will result in increased metro-network traffic and a need for dynamic multi-layer control. This is reflected in a study that

FIGURE 13. Multi-domain federation enabled by Network OS.

has predicted 560 percent growth in metro traffic, which includes forecasted data center (user-to-DC and DC-interconnect) traffic increase of more than 440 percent [21].

Flexible and reconfigurable DC optical interconnection
Flexible physical-layer optical transport technologies have emerged to keep pace with the growing metro-network bandwidth needs. Flexible-rate transponders have been introduced that support 400G using 64-quadrature amplitude modulation (64-QAM) as well as 100G with quadrature phase-shift keying (QPSK). The 64-QAM modulation has shorter reach than the 100G QPSK format, which is a natural consequence of optical transmission approaching the Shannon limit on single-mode fiber. However, the shorter reach and high spectral efficiency of the higher-order modulation formats are well suited to the shorter metropolitan network reaches, creating a further incentive towards edge-cloud DCs. In addition, flexible and transparent optical networking equipment, such as reconfigurable optical add-drop multiplexers (ROADM), have been deployed in metro networks to allow for dynamic wavelength routing between DC sites.

Until now, the flexible capabilities of physical-layer hardware, such as adjustable transponder bit-rate, ROADM wavelength routing and transponder wavelength tuning, have been largely under-utilized. Optical networks are typically operated in a “set and forget” manner where the flexibility is useful for network planning and deployment, but the connections remain static thereafter. The network OS has the potential to improve network operation efficiency by allowing dynamic reconfiguration of the IP/optical network to adapt to bandwidth demands. Furthermore, network OS applications can optimize the transponder operating parameters such as the bit rate, modulation format, or even the policy parameters for probabilistic constellation shaping.

Intelligent dynamic traffic engineering
Our network OS also enables new, low-cost approaches to achieve high network resiliency using multi-path transport [22] [23]. It extends traffic engineering and routing to accommodate inter-DC traffic as well as intra-DC traffic. Key components are dynamic and elastic wavelength management and federation among multiple domains. Presently, the utilization of inter-DC links is known to be on average less than 20 percent [24]. Here the link utilization is measured from the DC operator’s perspective, which does not account for possible additional over-provisioning such as 1+1

FIGURE 14. Next-generation fronthaul traffic management with flow-delay prioritization.
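
One possible way to realize the flow-delay prioritization named in the caption above is an earliest-deadline-first queue, in which the packet with the least remaining delay budget is served first. This is a sketch under that assumption; the article does not state which scheduling discipline is used, and `DeadlineQueue` and its units (microseconds) are hypothetical.

```python
import heapq
import itertools


class DeadlineQueue:
    """Serve packets in order of absolute deadline (arrival time + delay budget)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal deadlines

    def push(self, packet_id, arrival_us, budget_us):
        deadline = arrival_us + budget_us
        heapq.heappush(self._heap, (deadline, next(self._seq), packet_id))

    def pop(self, now_us):
        """Return (packet_id, slack_us); negative slack means the deadline passed."""
        deadline, _, packet_id = heapq.heappop(self._heap)
        return packet_id, deadline - now_us

    def __len__(self):
        return len(self._heap)
```

A packet that arrives later but carries a tighter budget overtakes earlier arrivals, which matches the idea of giving priority to packets with little time left.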

wavelength protection by the network operator. With traffic engineering and routing powered by a network OS, link utilization can be significantly improved compared to existing DC operator solutions, such as B4 [25] and SWAN [26], which do not interact with the wide-area network.

Using SDN-enabled multi-path transport, data between the DC locations can be evenly distributed across the paths using the topology information supplied by the network OS combined with traffic engineering at the edge switches. The method gives high utilization of available network resources with built-in resiliency for low-cost DC interconnection. Competing methods for distributing flows across paths, such as equal-cost multipath (ECMP) routing, require a statistically large number of flows to achieve equal traffic distribution, which is not always the case for DC interconnection. Most alternatives, including ECMP, operate statically and do not adjust to changes in the network, such as flow arrival variability, outages or impairments, with resulting inefficiencies (e.g., bandwidth over-provisioning or protection wavelengths). By contrast, this dynamic traffic distribution is based on real-time monitoring of traffic flow statistics and network status, allowing it to adjust to the network state and preserve high availability without the need for backup wavelengths.

5G next-generation fronthaul
The evolution of access networks points to a converged transport infrastructure with the fiber access network as the cornerstone. Operators are seeking to simplify the networks, for instance by reducing the overhead associated with operating multiple access networks for different services (residential, enterprise or mobile transport) and in terms of connectivity (e.g., improving utilization of networks that have been over-provisioned for peak capacity), while still retaining some flexibility to satisfy latency requirements from different architectural choices.

In line with these ambitions, the industry is working on several network architectures in the medium term, such as consolidating fixed access end offices to reduce real estate and operational costs, and distributing the architecture to minimize the access-specific portion of the network. The key element driving this evolution is the need to support new types of services derived from the advent of 5G. It is commonly understood that to implement 5G features, such as with an extended set of collaborative multi-point (COMP) schemes, architectural changes will be introduced in the form of RAN centralization. This will not only enable maximizing spectral efficiency (capacity) and network resource utilization, but will also enable operators to profit from the advantages

brought by the cloud paradigm such as processing pooling gains and cost savings. But these savings will come at a cost: massively increasing demand for transport capacity accompanied by an equally challenging decrease in required latencies.

Delay-aware, next-generation fronthaul traffic management
A centralized RAN configuration splits the cellular base-station functionality, placing each portion in different locations (cell site, central office). The distribution of functions can take several flavors as introduced in [27] [28]. Yaun et al. argue that segmenting the signal processing within the physical layer delivers all the benefits of key 5G features, while considerably relaxing the bandwidth requirements (compared with existing fronthaul (FH) approaches such as CPRI) to the levels of commercial Ethernet links [29]. The so-called next-generation FH interface, otherwise known as mid-haul (MH), also has the characteristic of varying with traffic load and, thus, allows exploiting multiplexing gains at the transport level.

To this end, multi-wavelength TDM-PON (e.g., NG-PON2) is well suited for aggregating traffic of different services and to building converged access within a converged infrastructure. The advantages of leveraging a mature technology, such as PON, for MH transport are manifold: inbuilt key functions, such as QoS, synchronization and security; and alleviation of operations and management challenges such as scalability, upgrades and utilization.

Injecting a new service will, however, call for improvements in the traffic-handling capabilities. A low-latency protocol operating over the physical layer must be ensured, as demonstrated by Anthapadmanabhan et al

Whereas the current approach to QoS is to provision each fixed access network element for extended periods, a dynamic and automatic traffic manager will (semi-) persistently configure the QoS function following traffic variations that occur in short time scales. Such functionality can be enabled by a control entity, which has visibility on the E2E service performance and allocation of timing/bandwidth constraints on the intermediate heterogeneous network domains, as illustrated on the left-hand side of figure 14. Besides maximizing network utilization, this operational mode permits prioritization of MH packets with little time left, given the stringent delay budget (on the order of hundreds of microseconds [32]), as illustrated on the right-hand side of figure 14.

Summary
A network OS with multi-domain, multi-layer capabilities is an essential part of the future network fabric. It will simplify and optimize network performance while managing faults, dynamic demand and ongoing system evolution. We have discussed the requirements and described the implementation of NetUNIX, an example of a network OS that can control heterogeneous networks at large scale and with high reliability. We demonstrated how NetUNIX can be applied in just two of several use cases (data center interconnection and 5G fronthaul) to span different network technology domains, such as one might find in a typical telecom carrier network. NetUNIX can provide true dynamic network reconfiguration and enhanced service agility, both urgent requirements being placed on operators by the ever-growing embrace of cloud-based services, which are prerequisite for coming 5G services, e.g., network slicing.2
2. Network slicing stands for a concept in which the
[30]. In addition, the traffic manager, which physical network infrastructure is logically partitioned to
occupies a central role in providing QoS at serve multiple, possibly disparate, sets of business
requirements so service-centric definition is an end-to-
the aggregation level, becomes one of the end virtual private service as a fundamental enabler for
elements where SDN can be applied in the new value generation, offering a platform for agile net-
work service creation in response to specific require-
access domain [31]. ments of vertical segments, e.g., enterprise, industry.
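As an illustration of the delay-aware prioritization described in the fronthaul traffic management use case, the following minimal sketch orders queued packets by the time remaining in their delay budget, so that MH packets with little time left are served first. It is not the NetUNIX traffic manager: the class names, the 250-microsecond MH budget (chosen only to match "hundreds of microseconds"), the 5000-microsecond broadband budget, and the policy of discarding packets that are already past their deadline are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Packet:
    # Packets compare by deadline first, then by arrival sequence (tie-breaker).
    deadline_us: float                 # absolute delivery deadline, in microseconds
    seq: int                           # arrival order, used only to break ties
    flow: str = field(compare=False)   # hypothetical service label, not compared


class DeadlineQueue:
    """Serve the packet with the least time left (earliest deadline first)."""

    def __init__(self) -> None:
        self._heap: List[Packet] = []
        self._seq = 0

    def enqueue(self, flow: str, arrival_us: float, budget_us: float) -> None:
        # Deadline = arrival time + per-service delay budget (assumed values).
        heapq.heappush(self._heap, Packet(arrival_us + budget_us, self._seq, flow))
        self._seq += 1

    def dequeue(self, now_us: float) -> Optional[Packet]:
        # Assumption: a packet already past its deadline is discarded, not sent.
        while self._heap:
            pkt = heapq.heappop(self._heap)
            if pkt.deadline_us >= now_us:
                return pkt
        return None


# Usage: a broadband packet arrives first, but the two mid-haul packets have
# far less time left and are therefore served ahead of it.
queue = DeadlineQueue()
queue.enqueue("broadband", arrival_us=0.0, budget_us=5000.0)
queue.enqueue("midhaul", arrival_us=10.0, budget_us=250.0)
queue.enqueue("midhaul", arrival_us=20.0, budget_us=250.0)
served = [queue.dequeue(now_us=30.0).flow for _ in range(3)]
print(served)
```

A binary heap keeps both enqueue and dequeue logarithmic in the queue length. A real traffic manager would additionally have to respect the upstream bandwidth grants of the PON, which this sketch ignores.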

References
[1] M. Weldon (ed.), The Future X Network: A Bell Labs Perspective, Boca Raton, FL, USA: CRC Press, 2015.
[2] AT&T, “Domain 2.0 Vision White Paper,” AT&T, 13 Nov. 2013. Available: https://www.att.com/Common/about_us/pdf/AT&T Domain 2.0 Vision White Paper.pdf
[3] R. Ramjee, F. Ansari, M. Havemann, T.V. Lakshman, T. Nandagopal, K. Sabnani and T. Woo, “Separating Control Software from Routers,” Int. Conf. on Communication System Software and Middleware, Delhi, 2006.
[4] D. Kreutz, F. M. V. Ramos, P. Esteves Verissimo, C. Esteve Rothenberg, S. Azodolmolky and S. Uhlig, “Software-Defined Networking: A Comprehensive Survey,” Proc. of the IEEE, vol. 103, no. 1, pp. 14–76, 2015.
[5] OpenDaylight. Available: https://www.opendaylight.org/
[6] Open Network Operating System. Available: http://onosproject.org/
[7] M. Bjorklund (ed.), YANG - A Data Modeling Language for the Network Configuration Protocol (NETCONF), IETF Proposed Standard, RFC 6020, Oct. 2010.
[8] Transaction Language 1. Available: https://en.wikipedia.org/wiki/Transaction_Language_1
[9] C. Qiaogang, E. Segev, E. Varma, G. Zhang, H. Ding, I. Busi, J. He, K. Sethuraman, L. Ong, N. Davis, R. Vilalta, S. Bellotti and V. López, Functional Requirements for Transport API, Open Networking Foundation Standard, TR-527, Jun. 2016.
[10] The P4 Language Consortium. Available: http://p4.org/
[11] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese and D. Walker, “P4: Programming Protocol-Independent Packet Processors,” ACM SIGCOMM Computer Communication Review (CCR), vol. 44, no. 3, Jul. 2014.
[12] CORD: Reinventing Central Offices for Efficiency & Agility. Available: http://opencord.org/
[13] D. Suh, S. Jang, S. Han, S. Pack, M.-S. Kim, T. Kim and C.-G. Lim, “Toward Highly Available and Scalable Software-Defined Networks for Service Providers,” IEEE Communications Magazine, vol. 55, no. 4, pp. 100–107, Apr. 2017.
[14] J. Schönwälder, “Network Configuration Management with NETCONF and YANG,” presented at 84th IETF Meeting, Vancouver, 2012.
[15] S. Fortune, “Equivalence and generalization in a layered network model,” Journal of Computer and System Sciences, vol. 81, no. 8, pp. 1698–1714, 2015.
[16] J. E. Simsarian, N. Choi, Y.-J. Kim, S. Fortune and M. Thottan, “NetGraph Data Model Applied to Multilayer Carrier Networks,” OFC 2016, Anaheim, CA, USA, 20–24 Mar. 2016.
[17] K.-K. Yap, T.-Y. Huang, B. Dodson, M. S. Lam and N. McKeown, “Towards software-friendly networks,” Proceedings of the First ACM Asia-Pacific Workshop on Systems, ser. APSys ’10, New York, NY, USA, pp. 49–54, 2010.
[18] A. D. Ferguson, A. Guha, C. Liang, R. Fonseca and S. Krishnamurthi, “Participatory networking: an API for application control of SDNs,” Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, ser. SIGCOMM ’13, New York, NY, USA, pp. 327–338, 2013.
[19] L. Jagadeesan and V. Mendiratta, “Programming the Network: Application Software Faults in Software-Defined Networks,” Proceedings of the IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pp. 125–131, 2016.
[20] “Central Office Re-architected as a Datacenter (CORD),” AT&T and Open Networking Lab White Paper, Jun. 2015. Available: https://wiki.onosproject.org/download/attachments/3444016/Technical Whitepaper-CORD.pdf?version=1&modificationDate=1434143584069&api=v2
[21] “Metro Network Traffic Growth: An Architecture Impact Study,” Bell Labs Strategic White Paper, Murray Hill, NJ, USA, Dec. 2013. Available: http://www.tmcnet.com/tmc/whitepapers/documents/whitepapers/2013/9378-bell-labs-metro-network-traffic-growth-an-architecture.pdf
[22] Y.-J. Kim, J. Simsarian and M. Thottan, “Cross-Layer Orchestration for Elastic and Resilient Packet Service in a Reconfigurable Optical Transport Network,” Proc. OFC 2015, paper Tu2B.2, 2015.
[23] Y.-J. Kim, J. Simsarian and M. Thottan, “Software-Defined Traffic Load Balancing for Cost-Effective Data Center Interconnection Service,” Proc. IEEE Inter. Sym. on Integrated Network Management, May 2017.
[24] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu and J. Zolla, “Experience with a globally-deployed software-defined WAN,” ACM SIGCOMM Computer Communication Review, vol. 43, no. 4, pp. 3–14, Aug. 2013.
[25] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart and A. Vahdat, “B4: Experience with a globally-deployed software defined WAN,” Proc. ACM SIGCOMM 2013 Conference on SIGCOMM, Aug. 2013.
[26] C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri and R. Wattenhofer, “Achieving high utilization with software-driven WAN,” Proc. ACM SIGCOMM 2013 Conference on SIGCOMM, Aug. 2013.
[27] Next Generation RAN Architecture. Available: http://www.xran.org/
[28] R. Knopp, N. Nikaein, C. Bonnet, F. Kaltenberger, A. Ksentini and R. Gupta, “Prototyping of Next-Generation Fronthaul Interfaces (NGFI) Using Openairinterface,” OpenAirInterface, 2018. Available: http://www.openairinterface.org/?page_id=1695
[29] Y. Yuan, I. Chih-lin, J. Huang, S. Ma, C. Cui and R. Duan, “Rethink fronthaul for soft RAN,” IEEE Communications Magazine, vol. 53, no. 9, pp. 82–88, Sep. 2015.
[30] N. Prasanth Anthapadmanabhan, A. Walid and T. Pfeiffer, “Mobile fronthaul over latency-optimized time division multiplexed passive optical networks,” IEEE Inter. Conf. on Communication Workshop (ICCW), 2015.
[31] B. Kozicki, N. Olaziregi, K. Oberle, R. B. Sharpe and M. Clougherty, “Software-defined networks and network functions virtualization in wireline access networks,” GLOBECOM Workshops (GC Wkshps), vol. 2014, pp. 595–600, 8–12 Dec. 2014.
[32] ITU-T Recommendation G.989.1, “40-Gigabit-capable passive optical networks (NG-PON2): General requirements, Amendment 1,” Aug. 2015.
