PAOLUCCI et al.: NETWORK TELEMETRY STREAMING SERVICES IN SDN-BASED DISAGGREGATED OPTICAL NETWORKS

Abstract—Accurate real-time availability of transmission parameters at the network controller has the potential to significantly improve the efficiency of control and management operations, particularly in the case of soft failures where detection and localization procedures are typically affected by the long monitoring time intervals usually adopted in current elastic optical networks (EONs). In this paper, we report on the design, implementation, and experimental demonstration of a telemetry service exploiting the gRPC protocol to enable on-demand streaming of real-time monitoring parameters, dynamically retrieved from a configurable set of network devices. The telemetry service is efficiently introduced for software-defined networking in EONs, also accounting for disaggregated network elements through standard YANG-defined network models. The implemented telemetry service is experimentally validated for partially and fully disaggregated architectures over networking scenarios where soft failures not addressable through traditional monitoring solutions are successfully detected and localized. Moreover, detailed performance evaluation is conducted varying the number of subscribed network elements and the type of communication (e.g., compressed vs. uncompressed, bundled vs. unbundled), showing remarkable scalability performance particularly in the case of compressed and bundled telemetry configurations.

Index Terms—Disaggregated networks, elastic optical networks, gRPC, monitoring, soft failure, telemetry, white box, YANG.

Manuscript received November 9, 2017; revised January 3, 2018; accepted January 16, 2018. Date of publication January 18, 2018; date of current version June 15, 2018. This paper is an extended version of the work presented in [1]. This work has been performed in the framework of the H2020-ICT-2017 project METRO-HAUL (Grant 671636), which is funded by the European Commission. (Corresponding author: Francesco Paolucci.)
F. Paolucci, A. Sgambelluri, and P. Castoldi are with the Scuola Superiore Sant’Anna, Pisa 56124, Italy (e-mail: fr.paolucci@santannapisa.it; a.sgambelluri@santannapisa.it; castoldi@santannapisa.it).
F. Cugini is with the National Inter-University Consortium for Telecommunications, Pisa 56124, Italy (e-mail: filippo.cugini@cnit.it).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JLT.2018.2795345
0733-8724 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Universiteit Antwerpen. Downloaded on July 30,2022 at 13:50:35 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Elastic Optical Networks (EONs) have attracted significant research interest in recent years and represent a mature technology to be deployed in core/transport networks [2]. The maturity of EON solutions is driving their adoption even in next generation metro networks, where relaxed optical reach constraints, along with advanced transmission adaptation capabilities (e.g., more spectrally-efficient modulation formats), are expected to successfully address the requirements for future bandwidth-hungry dynamic metro/5G services.

In both core and metro EONs, accurate and real-time knowledge of physical layer parameters and transmission impairments has the potential to significantly improve the efficiency of control and management operations [3]. Relevant benefits are expected in failure detection, localization, and recovery mechanisms, particularly in the case of soft failures, where only minor signal degradations are experienced (due to ageing, poor power supply quality, weather events, or faulty connectors) without causing relevant disruptions [4]. Detecting and localizing soft failures is extremely important since they may reduce system margin, introduce sporadic post-FEC errors, or reveal in advance possible major malfunctioning of network devices.

Effective soft failure detection is also extremely important in the context of the recently proposed disaggregated networks, where network elements such as transponders, reconfigurable optical add drop multiplexers (ROADMs) and optical amplifiers are provided by different vendors (i.e., white box) and coexist under the same Software Defined Networking (SDN) control within a unique transparent optical domain [5]. That is, besides being fast and effective, EON monitoring solutions also have to provide standard and vendor-neutral functionalities, enabling on-demand detailed analysis of the transmission performance of the deployed network elements and activated communications.

However, currently deployed monitoring devices and transponders typically provide statistics every 15 minutes. This interval was historically selected to enable basic monitoring without overwhelming the management plane and the network controller with an excessive amount of data. The network protocols deployed for carrying monitoring statistics, such as SNMP, were also designed and implemented to support this pace and amount of data. However, such basic monitoring and protocols do not provide accurate and real-time knowledge of data plane information and are not really suitable for accurate and fast soft failure detection/localization. Recently, the availability of higher bandwidth in the control/management plane, combined with the introduction of big data analytics, machine learning and improved processing capabilities at the network controllers, is overcoming the aforementioned limitations, opening the way for new and effective monitoring and management solutions relying on the autonomic network concept [6] and on proactive, automatic event/forecast driven SDN control optimization and reconfiguration employing predictive models [7], [8].

In this paper, we propose, implement and experimentally demonstrate a telemetry service, providing on-demand, streaming-oriented, parallel, real-time monitoring statistics on a specifically selected set of network devices, particularly suitable
for soft failure event detection and localization. The telemetry service is proposed as an additional control/management feature in the SDN control plane of EONs employing the NETCONF protocol for device configuration and notification.

First, two use cases are reported and discussed in order to motivate the adoption of a telemetry service. Then, the telemetry service is conceived in the framework of disaggregated optical networks, in particular the open-source YANG model enhancing the OpenROADM-based multi source agreement for multi-vendor interoperability [9]. A streaming-oriented telemetry service based on the gRPC protocol is proposed, providing efficient fine-grained monitoring in the time domain. Two implemented telemetry services (partially and fully disaggregated) are then applied to a networking scenario where a soft failure affects the transmission performance of optical signals traversing multiple optical devices. In such a scenario, traditional 15 minute-based monitoring is not able to detect and localize the failure event. Instead, the demonstration shows that the proposed telemetry solution makes it possible to identify the occurrence of a soft failure and to successfully localize the source of the issue. Finally, an extensive scalability evaluation is provided to motivate the adoption of the gRPC protocol for the monitoring of large networks.

This paper extends the work in [1] by providing the following additional contributions:
• extends the telemetry service architectural proposal to disaggregated networks.
• proposes and discusses YANG and gRPC models for telemetry.
• presents extended results including an additional use case, the application to disaggregated networks and scalability evaluations in terms of network load and controller processing.

II. RELATED WORK

A number of studies have addressed the issue of optical performance monitoring, mainly focusing on hard failure detection, continuous monitoring of end-to-end lightpath QoT, and techniques to retrieve and derive statistics of physical parameters of optical signals that are not directly available from online monitors in currently deployed EONs [3].

In the context of the Architecture on Demand programmable multi-layer optical network, a network monitoring system framework based on SDN network analytics for network replanning and reoptimization was proposed [10]. The system introduced a monitoring hub handler collecting optical performance monitoring data at different layers (e.g., Optical Signal to Noise Ratio (OSNR) in the optical-switched layer, packet loss rate in the packet-switched layer) and triggering SDN reconfigurations upon changes in the network. The concept of a living network with disaggregated components, with continuous monitoring, estimation of the QoT of active connections, and provisioning computation and assignment (i.e., route, spectrum, code, modulation format) of new connections is based on such monitoring results [11].

Telemetry is gaining attention beyond EONs as well. A dedicated low-rate data-plane telemetry channel has been recently proposed and evaluated in the context of Passive Optical Networks (PON) to monitor the connections between an intelligent remote splitter and attached Optical Network Units (ONUs) [12].

In very recent years, the improved availability of cloud-based storage and processing capabilities has raised interest in massive analytics of monitoring data to perform continuous planning, advanced lifecycle models and proactive network control based on forecasts [13].

Focusing on the SDN control and management plane, the work in [14] proposes an architecture supporting telemetry for YANG-defined packet network parameters (e.g., one-way delay) in the framework of the Abstraction and Control of Traffic Engineered Networks (ACTN). A related work focuses on YANG models for monitoring Traffic Engineering (TE) tunnels, providing end-to-end monitored data referred to a tunnel [15]. However, the work targets only end-to-end parameters and IP-layer variables.

Finally, the Distributed Network Analytics (DNA) model proposes the activation of YANG-driven massive analytics directly at equipment devices supporting processing and storage capabilities (e.g., fog nodes) and reporting functions [16].

However, none of the above studies has designed, implemented and experimentally demonstrated a telemetry service in the context of partially or fully disaggregated EONs, also highlighting practical use cases for effective soft failure detection and localization.

III. SOFT FAILURE USE CASES MOTIVATING TELEMETRY SERVICE

In order to motivate the need for telemetry services in EONs, two use cases are hereafter presented, analyzed and discussed. Both use cases focus on soft failures affecting, as an example, optical amplifiers. In the first use case, direct telemetry of monitoring parameters is configured and activated on all optical devices. In the second use case, the amplifier is a black box that does not allow direct communication through telemetry for streaming of parameter information, and correlation is exploited, taking advantage of telemetry information retrieved from other optical devices.

A. EON With EDFA Monitoring

A portion of an EON, shown in Fig. 1, is considered. The network includes a 400 km, 5-span link N3–N4. A lightpath LA generated at N2 and terminated at N4 is a 100G PM-QPSK signal provided by a commercial system. Under steady-state normal conditions, the lightpath has OSNR statistics in the range between 22.0 and 22.3 dB, retrieved by the coherent receiver. As in a typical commercial system [17], the 100G transponder provides such statistics every 15 min (along with additional transmission parameters such as pre-FEC BER, received power, etc.) through the SNMP or NETCONF protocols to the central network controller. A minor failure then occurs on an intermediate device, such as the optical amplifier D5. In particular, the amplifier starts introducing limited (e.g., 1–2 dB) and sporadic fluctuations on its launch power. Such fluctuations do not significantly impact the received power of LA since it is recovered by the subsequent EDFA D6
3144 JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 36, NO. 15, AUGUST 1, 2018
Fig. 1. Soft failure use case with available line amplifiers monitoring.
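To make the limitation of 15-minute statistics in this use case concrete, the following minimal Python sketch compares one averaged value over a 900 s window with per-second streamed samples. It is an illustration only, not the implementation used in the experiments: the 1 s sampling period, the 1.5 dB dip depth, the dip instants, the ±0.15 dB noise band (chosen to match the 22.0–22.3 dB steady-state range quoted above), the 1.0 dB threshold, and the choice to apply the dips directly to the monitored value are all assumptions.

```python
import random
import statistics

BASELINE_DB = 22.15   # assumed centre of the 22.0-22.3 dB steady-state range
NOISE_DB = 0.15       # assumed half-width of normal fluctuations
DIP_DB = 1.5          # assumed soft-failure dip (source quotes 1-2 dB)
DIP_TIMES = (200, 201, 530, 531, 532)  # assumed sporadic dip instants (s)


def simulate_samples(duration_s=900, seed=0):
    """Return one hypothetical monitored sample per second over the window."""
    rng = random.Random(seed)
    samples = []
    for t in range(duration_s):
        value = BASELINE_DB + rng.uniform(-NOISE_DB, NOISE_DB)
        if t in DIP_TIMES:
            value -= DIP_DB  # sporadic degradation caused by the soft failure
        samples.append(value)
    return samples


def detect_dips(samples, threshold_db=1.0):
    """Return the sample indices whose drop below baseline exceeds the threshold."""
    return [t for t, v in enumerate(samples) if BASELINE_DB - v > threshold_db]


samples = simulate_samples()
# What a 15-minute statistic would report: a single averaged value.
window_mean = statistics.fmean(samples)
# What a streaming consumer can see: every individual excursion.
dips = detect_dips(samples)

print(f"15-min mean: {window_mean:.3f} dB")
print(f"streamed samples flagged at t = {dips}")
```

With these assumed numbers, five 1.5 dB dips shift the 900-sample mean by less than 0.01 dB, so the averaged report stays inside the normal range while the per-sample check flags every affected instant.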
Fig. 7. Captures of telemetry messages in the BB and FD configurations collected at the SDN network controller.
Fig. 8. Telemetry monitors of devices A1-A5 (P_x(t)), NE4 A6 Pre power (P_6(t)) and NE5 A7 card power (P_7(t)) and OSNR (OSNR_7(t)) available at SDN network controller.

val. Results show that the average traffic rate even for a large number of monitored agents (up to 100 devices) is always below 200 kb/s, while the maximum rate is always 1 Mb/s. Increased streaming burstiness is experienced employing uncompressed gRPC, whereas using compressed gRPC the max rate is always below 400 kb/s. CPU utilization follows a quasi-linear behavior, always below 5% of overall utilization, with no noticeable differences between the compressed and the uncompressed option, thus confirming the lightweight gRPC burden. Moreover, the compressed/uncompressed option does not substantially affect the average throughput since in this case a single parameter is monitored per subscription.
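The observation that compression brings little benefit when each message carries a single parameter can be illustrated with a small Python sketch. It uses zlib from the standard library as a stand-in for the message compression applied by gRPC, and a hypothetical text encoding (`paramNN=value;`) in place of the actual telemetry message format, so the byte counts are indicative only and are not the measurements reported in this paper.

```python
import zlib


def encode(samples):
    """Encode (index, value) pairs with a hypothetical 'paramNN=value;' layout."""
    return "".join(f"param{i:02d}={v:.2f};" for i, v in samples).encode("ascii")


# One sampled value per message, as in an independent single-parameter subscription.
single_raw = encode([(0, 22.10)])
single_zip = zlib.compress(single_raw)

# Twenty sampled values carried in one payload (assumed values around 22 dB).
bundle_raw = encode([(i, 22.0 + 0.01 * i) for i in range(20)])
bundle_zip = zlib.compress(bundle_raw)

# Twenty independent messages, each compressed on its own.
singles_zip_total = sum(
    len(zlib.compress(encode([(i, 22.0 + 0.01 * i)]))) for i in range(20)
)

print(f"single payload : {len(single_raw)} B raw, {len(single_zip)} B compressed")
print(f"20-value payload: {len(bundle_raw)} B raw, {len(bundle_zip)} B compressed")
print(f"20 separate compressed payloads: {singles_zip_total} B in total")
```

For a payload this small the compressor's fixed framing outweighs any saving, whereas a payload carrying many similarly structured samples gives it repeated patterns to exploit.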
Fig. 9. Telemetry scalability results: network and processor load at the controller as a function of the size of monitored agents (no bundling).

In order to evaluate the impact of the compressed option, the telemetry average throughput has been measured for a number N = 20 of independent subscriptions, each targeting a single parameter (single20), and compared with the equivalent bundle subscription, i.e., a unique subscription targeting a bundle of N = 20 parameters (bundle20) of the same device. Results depicted in Fig. 10 show that single20 compression does not achieve a significant gain (i.e., around 5% rate reduction), confirming the results of Fig. 9, while bundle20 compression achieves more than 50% throughput reduction, since the gRPC compression operates on the bundle of sampled values (i.e., in the payload
of the gRPC message) and not on the message header. Indeed, results highlight the benefits of the bundling option itself. This means that telemetry of multiple parameters originated by a single device (available for both BB and FD configurations) is extremely scalable, providing around 75% throughput reduction with respect to independent subscriptions.

Such considerations lead to the conclusion that the gRPC protocol is extremely effective in providing a high number of parallel telemetry streams without affecting control plane load and controller processing resources in both FD and BB architectures. Moreover, gRPC achieves a more than 4 times traffic load reduction with respect to NETCONF.

VII. CONCLUSIONS

A telemetry service exploiting the gRPC protocol was designed, implemented and experimentally demonstrated to efficiently enable on-demand streaming of real-time monitoring parameters, dynamically retrieved from a configurable set of network devices. The telemetry service, relying on standard YANG-defined network models, is suitable for effective application in EONs also encompassing disaggregated network elements.

The implemented telemetry service was experimentally validated over two networking scenarios where soft failures affecting optical amplifiers are successfully detected and localized by retrieving and correlating fast OSNR and optical power variations not detectable through traditional 15-minute monitoring. The telemetry service was shown in the context of both fully and partially disaggregated scenarios, also providing scalability performance evaluations. In particular, scalability was assessed by varying the number of monitored devices in the network and measuring the network and processing load at the network controller, with and without data compression. Results showed that the traffic rate for even 100 subscribed optical devices is very low (average below 200 kb/s and max rate below 400 kb/s for compressed data). CPU utilization follows a quasi-linear behavior, generally providing extremely low utilization (i.e., below 5%). Efficient gRPC encoding achieved a more than 4 times telemetry traffic reduction with respect to XML-based NETCONF notifications.

Finally, results showed the benefits of the bundling option, enabling around 75% throughput reduction compared to independent uncompressed subscriptions, leading to the conclusion that such bundling-based telemetry successfully supports a high number of parallel telemetry streams without affecting control plane and processing load at the SDN Controller.

ACKNOWLEDGMENT

Special thanks to Matteo Dallaglio who supported the authors in the design and setup of NETCONF agents.

REFERENCES

[1] F. Paolucci, A. Sgambelluri, M. Dallaglio, F. Cugini, and P. Castoldi, “Demonstration of gRPC telemetry for soft failure detection in elastic optical networks,” in Proc. 2017 Eur. Conf. Opt. Commun., Sep. 2017, pp. 1–3.
[2] M. Jinno, “Elastic optical networking: Roles and benefits in beyond 100-Gb/s era,” J. Lightw. Technol., vol. 35, no. 5, pp. 1116–1124, Mar. 2017.
[3] Z. Dong, F. N. Khan, Q. Sui, K. Zhong, C. Lu, and A. P. T. Lau, “Optical performance monitoring: A review of current and future technologies,” J. Lightw. Technol., vol. 34, no. 2, pp. 525–543, Jan. 2016.
[4] N. Sambo, F. Cugini, A. Sgambelluri, and P. Castoldi, “Monitoring plane architecture and OAM handler,” J. Lightw. Technol., vol. 34, no. 8, pp. 1939–1945, Apr. 2016.
[5] P. Layec, A. Dupas, D. Verchere, K. Sparks, and S. Bigo, “Will metro networks be the playground for (true) elastic optical networks?” J. Lightw. Technol., vol. 35, no. 6, pp. 1260–1266, Mar. 2017.
[6] L. Velasco et al., “An architecture to support autonomic slice networking,” J. Lightw. Technol., pp. 1–7, to be published.
[7] Y. Li et al., “tsdx: Enabling impairment-aware all-optical inter-domain exchange,” J. Lightw. Technol., pp. 1–13, to be published.
[8] F. Morales et al., “Experimental assessment of a flow controller for dynamic metro-core predictive traffic models estimation,” in Proc. 2017 Eur. Conf. Opt. Commun., Sep. 2017, pp. 1–3.
[9] “Open ROADM MSA device white paper v1.0,” Oct. 2017. [Online]. Available: www.openroadm.org
[10] S. Yan, A. Aguado, Y. Ou, R. Wang, R. Nejabati, and D. Simeonidou, “Multilayer network analytics with SDN-based monitoring framework,” IEEE/OSA J. Opt. Commun. Netw., vol. 9, no. 2, pp. A271–A279, Feb. 2017.
[11] S. Oda et al., “A learning living network with open ROADMs,” J. Lightw. Technol., vol. 35, no. 8, pp. 1350–1356, Apr. 2017.
[12] P. Iannone et al., “An 8 × 10-Gb/s 42-km high-split TWDM PON featuring distributed Raman amplification and a remotely powered intelligent splitter,” J. Lightw. Technol., vol. 35, no. 7, pp. 1328–1332, Apr. 2017.
[13] L. Velasco, A. P. Vela, F. Morales, and M. Ruiz, “Designing, operating, and reoptimizing elastic optical networks,” J. Lightw. Technol., vol. 35, no. 3, pp. 513–526, Feb. 2017.
[14] Y. Lee, R. Vilalta, R. Casellas, R. Martinez, and R. Munoz, “Scalable telemetry and network autonomics in ACTN SDN controller hierarchy,” in Proc. 2017 19th Int. Conf. Transparent Opt. Netw., Jul. 2017, pp. 1–4.
[15] Y. Lee, D. Dhody, R. Vilalta, D. King, and D. Ceccarelli, “YANG models for ACTN TE performance monitoring telemetry and network autonomics,” IETF, TEAS Working Group, Mar. 2017.
[16] M. Chandramouli and A. Clemm, “Model-driven analytics in SDN networks,” in Proc. 2017 IFIP/IEEE Symp. Integr. Netw. Serv. Manag., May 2017, pp. 668–673.
[17] M. Filer et al., “Elastic optical networking in the Microsoft cloud [invited],” IEEE/OSA J. Opt. Commun. Netw., vol. 8, no. 7, pp. A45–A54, Jul. 2016.
[18] K. Christodoulopoulos, N. Sambo, and E. Varvarigos, “Exploiting network kriging for fault localization,” in Proc. 2016 Opt. Fiber Commun. Conf. Exhib., Mar. 2016, pp. 1–3.
[19] M. Dallaglio, N. Sambo, F. Cugini, and P. Castoldi, “Control and management of transponders with NETCONF and YANG,” IEEE/OSA J. Opt. Commun. Netw., vol. 9, no. 3, pp. B43–B52, Mar. 2017.
[20] “gRPC: A high performance, open-source universal RPC framework.” [Online]. Available: https://grpc.io/
[21] F. Paolucci, F. Cugini, G. Cecchetti, and P. Castoldi, “Open network database for application-based control in multilayer networks,” J. Lightw. Technol., vol. 35, no. 9, pp. 1469–1476, May 2017.
[22] J. Thrane, J. Wass, M. Piels, J. C. M. Diniz, R. Jones, and D. Zibar, “Machine learning techniques for optical performance monitoring from directly detected PDM-QAM signals,” J. Lightw. Technol., vol. 35, no. 4, pp. 868–875, Feb. 2017.
[23] A. P. Vela, M. Ruiz, F. Cugini, and L. Velasco, “Combining a machine learning and optimization for early pre-FEC BER degradation to meet committed QOS,” in Proc. 2017 19th Int. Conf. Transparent Opt. Netw., Jul. 2017, pp. 1–4.
[24] A. Giorgetti, F. Paolucci, F. Cugini, and P. Castoldi, “Dynamic restoration with GMPLS and SDN control plane in elastic optical networks [invited],” IEEE/OSA J. Opt. Commun. Netw., vol. 7, no. 2, pp. A174–A182, Feb. 2015.
[25] F. Cugini et al., “Active stateful PCE with hitless LDPC code adaptation [invited],” IEEE/OSA J. Opt. Commun. Netw., vol. 7, no. 2, pp. A268–A276, Feb. 2015.
[26] “Confd.” [Online]. Available: developer.cisco.com/site/confD/

Authors’ biographies not available at the time of publication.