
Distributed Controller Design with Active–Active

Method on Software Define Network

Abstract – Multiple controllers in a software-defined network architecture are currently deployed mostly as a distributed controller architecture of the active-backup type. In this architecture, one controller is used as the main controller and the other controller is used as a backup. However, this architecture still has shortcomings, such as the limited load capacity of the controller. If the controller is overloaded, the performance of the network decreases, so the architecture cannot manage larger networks. This greatly affects the reliability and scalability of the network, so a solution is needed to overcome this problem. This study proposes a distributed controller method (active-active). The distributed controller (active-active) architecture allows all active controllers to work together to manage a network simultaneously. Based on a performance comparison test between a single controller and a distributed controller (active-active), the distributed controller with the active-active method achieved 7.9% higher throughput when background traffic was used. In CPU usage testing, the active-active distributed controller showed a CPU usage percentage 15 points lower than the single controller. In the failover time test, the average time was 9.4 seconds. Based on the test results, it can be concluded that the distributed controller (active-active) affected the reliability and scalability of the network.

Key words – software-defined network, reliability, scalability, controller, failover

I. INTRODUCTION
Internet network technology develops very rapidly from year to year. Various network technology concepts have been created, one of which is SDN. Software-defined network (SDN) is a network architecture that separates the control plane (controller) from the data plane, and in which the network can be programmed [1]. One of the supporting components of SDN is the controller. The controller is the main component of the SDN and is directly responsible for controlling the flow of data from each device. An SDN network can use multiple controllers. With multiple controllers, there are several possible architectures, including physically centralized, physically distributed, flat, and hierarchical architectures [2].
Multiple controllers in a distributed controller (active-backup) architecture have the disadvantage of a limited load capacity. If the controller is overloaded, the performance of the network decreases [3], so the architecture cannot manage larger networks. This greatly affects the reliability and scalability of the network. In this study, the researchers proposed a distributed controller (active-active) method with a flat architecture. In this architecture, all controllers are active and work together to manage a network simultaneously. If one controller fails, its switches are migrated to another controller. The controller used in this study was the POX controller. The flat architecture was chosen because it can expand the capabilities of the control plane. In addition, controllers in a flat architecture communicate with each other more frequently than in a hierarchical architecture in order to guarantee network consistency [2].
In this experiment, Mininet was used as the emulator. Mininet can build networks by creating virtual hosts, switches, and links that run on Linux operating systems [4]. The analysis in this study covered throughput, failover time, and CPU usage. Carrying out these tests helps to verify the quality of the network service.

II. RELATED WORK
Several earlier papers have discussed distributed controllers. Paper [5] discussed distributed multi-domain SDN controllers. It proposed DISCO, which allows a controller to communicate with other controllers to provide end-to-end services. Paper [6] discussed an SDN controller robustness and distribution framework. It compared several controllers to find the best-performing controller with the distributed controller method (active-backup). In the simulation experiments conducted, the ONOS controller gave the best performance in terms of failover time. Paper [7] discussed distributed controller clustering in software-defined networks. It compared performance with and without a controller cluster. In the simulation experiments conducted, the clustered controllers performed better in terms of throughput and packet loss. Paper [8] gave an overview of SDN architectures with multiple controllers. It explained several types of architectural design that use multiple controllers, with their various functions and respective objectives. The information obtained from those papers was used as the basis for designing this study.

A. Software-Defined Network
Software-defined network (SDN) is a network architecture that separates the control plane (controller) from the data plane. SDN creates a network architecture that is dynamic, easy to set up, and adaptable. This architecture decouples the control and forwarding functions, so that network control becomes directly programmable and the underlying infrastructure can be abstracted for network applications and services [1].

B. Multiple Controllers
Multiple controllers are a set of controllers that work together to achieve a required level of performance on an SDN network. Multiple-controller architectures differ in their aspects and characteristics, with different capabilities and settings. They are divided into two kinds: physically centralized and physically distributed. The physically centralized architecture is the same as the general SDN architecture, which uses only a single controller. In the physically distributed architecture, several controllers are used in the network and can work together. The physically distributed architecture has two types: logically centralized and logically distributed. In a logically centralized design, each controller has the same responsibility and role, and the controllers work together to manage the network. In a logically distributed design, there are several controllers with different domains, and each controller is responsible only for its own domain. In both logically centralized and logically distributed designs, either a flat or a hierarchical architecture model can be applied [8].

C. OpenFlow Protocol
OpenFlow is an open standard that allows researchers to study protocols in the networks they use. OpenFlow is an important component in SDN. Among the many protocols that can be used, OpenFlow is the most widely implemented protocol in SDN network architecture [17]. OpenFlow provides direct access to and control over the forwarding plane of network devices such as switches and routers, both physical and virtual [18]. OpenFlow enables communication between the controller and the forwarding plane and can adapt to the network [18].

D. Reliability
Network reliability is the ability of the network to provide services smoothly and without interruption [9]. A network architecture with distributed controllers can increase network reliability [10]. In [7], if one of the controllers fails, another controller continues to push flows to the switch (switch migration) to ensure network reliability.

E. Scalability
Network scalability is the ability of a system to handle the load as demand for network resources grows [9]. In general, the scalability problem is that of controlling large-scale network traffic. A controller can become a bottleneck if the incoming flow requests exceed the capacity that the controller can manage [11]. In [12], the NOX controller can only manage 30k requests/sec, a capacity that is sufficient only for small to medium scale networks. If it is used for campus networks or data centers, the controller becomes a bottleneck.

F. Failover Mechanism
A failover mechanism is a network technique that provides two or more connection paths, so that when one of the paths fails, the connection keeps running by switching to another path. The failover mechanism can be designed so that it acts as soon as possible after a disturbance occurs [6].

Figure 1 Failover Flowchart

Figure 1 is a flowchart of the failover mechanism in this system. If a controller fails, another controller detects the failure and performs failover by migrating the switches to the active controller.

III. GENERAL DESCRIPTION OF THE SYSTEM
The system design in this study was divided into topology design and software design. The topology design was done first to build the network infrastructure to be used. Next, the software design was done to support the implementation process. In this study, the topology formed was a distributed controller (active-active) in an SDN architecture.

Figure 2 Distributed Controller (active-active) Topology

Based on Figure 2, two VMs (virtual machines) were created, namely VM1 for controller 1 and VM2 for controller 2. Both controllers are active and communicate with each other to manage the network simultaneously. The communication used
between controllers 1 and 2 is RabbitMQ. There are four switches in this topology. Switches 1 and 2 are controlled by controller 1 but remain connected to controller 2. Likewise, switches 3 and 4 are controlled by controller 2 but remain connected to controller 1. If one controller fails, RabbitMQ detects the problem, reports it, and then issues a switch migration command to the other controller. In this way, network reliability is maintained. The software used in this study included the Linux operating system Ubuntu 18.04.1 LTS, Mininet running on that OS, and Python, which was used for system configuration.

IV. TESTING AND ANALYSIS
The previous chapter explained the related theory and gave a general description of the system. This chapter describes the tests performed and the results obtained from them. The tests carried out in this research were a reliability test and a scalability test, each comparing the single controller with the distributed controller. The parameters used in the reliability test were throughput and failover time, while the parameter used in the scalability test was CPU usage. Carrying out these tests helps to verify the quality of the network service.

A. Reliability Test
a) Failover Mechanism Test
This failover test was done by turning off controller 2 directly (killing its process). With controller 2 off, RabbitMQ on controller 1 detected that controller 2 was down, and controller 1 then commanded a switch migration for the switches connected to controller 2. Hence, the switches that were initially connected to controller 2 moved to controller 1. The following result was obtained from a ping from h1 to h8 during the failover process.

Figure 3. Ping h1 to h8 during the failover process

Based on Figure 3, after icmp_seq = 19 the switch could not connect to controller 2, so the ping was delayed. During this time, the switch tried to connect to controller 1. After some time, the ping continued with icmp_seq = 20 and time = 48.1 ms. RabbitMQ needed time to take over the switches from controller 2 to controller 1, counted from detecting that controller 2 was down until the switch migration command on controller 1 was completed. Figure 4 shows that controller 2 was down and that the migrated switches connected successfully to controller 1.

Figure 4. Information about Controller 2 that has been down and the switch migration

The time needed for the entire POX failover process can be determined from the time the switches needed to reconnect to a controller. This can be seen in the Open vSwitch log, found in the file /var/log/openvswitch/ovs-vswitchd.log. Table 1 shows the log results of the test on Open vSwitch.

Table 1 The Results of the Open vSwitch Log
Log Openvswitch                                                                   Time
|00329|rconn|info|switch3<->tcp:192.168.56.102:6634: connection closed by peer   2018-23-11 07:03:10
|00330|rconn|info|switch4<->tcp:192.168.56.102:6634: connection closed by peer   2018-23-11 07:03:11
|00352|rconn|info|switch3<->tcp:192.168.56.101:6633: connected                   2018-23-11 07:03:21
|00357|rconn|info|switch4<->tcp:192.168.56.101:6633: connected                   2018-23-11 07:03:21

Based on the log in Table 1, in the first test the switches took 10 seconds to reconnect with the POX controller. In this study, 5 tests were carried out, and the following results were obtained.

Table 2 Overall Results of Failover Time Tests
Test      Time (second)
1         10
2         10
3         7
4         11
5         9
Average   9.4

The results of the 5 tests showed only small time differences. The average time needed for the failover process was 9.4 seconds. The test results showed a fairly short time for the switch migration carried out from controller 2 to controller 1. The failover system works well and in accordance with its design concept: if one controller is down, the network keeps operating because another controller takes over.

b) Test of Performance Comparison of Distributed Controller with Single Controller
The purpose of this test was to determine the effect of the number of controllers on the performance of the SDN network, and to compare network performance using a single controller with that using a distributed controller. The parameter used in this test was throughput. The topology used in testing the single controller was the same as the topology in Figure 2; the only difference was the
use of one controller. In this test, the two SDN network topologies were tested under several scenarios to determine the throughput of each type of architecture. The first scenario measured throughput without background traffic, while the second scenario used background traffic. The tests without background traffic were run 10 times using iperf. To generate background traffic, different numbers of iperf streams were used. The more iperf streams used, the more simultaneous users the test represents. For 10-30% background traffic, only 1 iperf stream was used, while for 40-90% the number of iperf streams ranged from 2 to 7. The following are the results for the two scenarios.

Figure 5 Throughput without background traffic

Based on the tests carried out 10 times on each controller architecture, with a duration of 10 s per test, the throughput of the single controller ranged from 18.9 to 22.9 Gbits/s with an average of 20.93 Gbits/s, while the throughput of the distributed controller ranged from 17.2 to 21.7 Gbits/s with an average of 19.07 Gbits/s. Based on these results, the throughput of the distributed controller (active-active) was 8.9% lower than that of the single controller.

Figure 6 Throughput with Background Traffic

The throughput test with background traffic gave the opposite result: the distributed controller had a higher throughput than the single controller. Figure 6 shows that throughput was highest at 10% background traffic, at 23.5 and 24.7 Gbits/s, and lowest at 90% background traffic, at 1.54 and 1.79 Gbits/s. This is because the background traffic was generated by varying the number of iperf streams: the more iperf streams are used, the lower the measured throughput.

B. Scalability Test
The purpose of this test was to determine how the number of controllers affects CPU utilization on the SDN network, and to compare a single controller with a distributed controller (active-active). The parameter used in this test was CPU usage. Table 3 shows the number of nodes used in the test.

Table 3 The Number of Nodes Used in the Test
Number of switch   4    8    16   32
Number of host     8    16   32   64

The CPU usage test was done by reading the percentage shown in the system monitor when all switches were connected to the controller. The following is the result of the test.

Figure 7 CPU usage

The CPU usage results in Figure 7 show that the distributed controller had a lower CPU usage percentage than the single controller. The highest percentage reached by the distributed controller was 83%, while the highest percentage reached by the single controller was 98%. The peak CPU usage of the distributed controller (active-active) was therefore 15 percentage points lower than that of the single controller. The more nodes used, the greater the CPU usage percentage. Based on this test, the distributed controller (active-active) affected the network scalability.

V. CONCLUSIONS AND FURTHER RESEARCH
Based on the reliability and scalability tests comparing the performance of a single controller and an active-active distributed controller, the distributed controller with the active-active method achieved higher throughput than the single controller when background traffic was used. In the failover mechanism test, carried out five times, the average failover time was 9.4 seconds, which is a short time for the failover process. In the CPU usage test, the distributed controller (active-active) showed a lower CPU usage percentage than the single controller. Based on all the test results, it can be concluded that the distributed controller (active-active) method affected the reliability and scalability of the network. A suggestion for further research on Distributed Controller architecture design for network reliability is to add a failback mechanism after a failover process between controllers.
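
The headline figures quoted in this section can be re-derived from the measurements reported in Section IV. The short Python sketch below is an illustrative check only and is not part of the original experiment: the variable and function names are assumptions introduced here, and the input values are copied from Table 2, from the average throughputs reported for the test without background traffic, and from the highest CPU usage percentages reported for Figure 7.

```python
from statistics import mean

# Failover times in seconds for the five runs listed in Table 2.
failover_times_s = [10, 10, 7, 11, 9]

# Average throughput without background traffic (Gbits/s), as reported in the text.
single_avg_gbps = 20.93
distributed_avg_gbps = 19.07

# Highest CPU usage (%) reported for each architecture.
single_cpu_peak = 98
distributed_cpu_peak = 83


def relative_drop_percent(baseline: float, value: float) -> float:
    """Return how far `value` lies below `baseline`, as a percentage of `baseline`."""
    return (baseline - value) / baseline * 100.0


avg_failover_s = mean(failover_times_s)
throughput_drop = relative_drop_percent(single_avg_gbps, distributed_avg_gbps)
cpu_gap_points = single_cpu_peak - distributed_cpu_peak

print(f"average failover time: {avg_failover_s:.1f} s")
print(f"throughput drop without background traffic: {throughput_drop:.1f} %")
print(f"peak CPU usage gap: {cpu_gap_points} percentage points")
```

Under these inputs the sketch gives the 9.4 s average failover time, the 8.9% throughput difference without background traffic, and the 15-point gap in peak CPU usage. The 7.9% figure for the background-traffic scenario is not recomputed, because the per-level throughput values are not tabulated in the text.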

REFERENCES
[1] Open Networking Foundation, “Software-Defined Networking
(SDN) Definition - Open Networking Foundation,” 2018.
[Online]. Available: https://www.opennetworking.org/sdn-
definition/. [Accessed: 17-Sep-2018].
[2] T. Hu, Z. Guo, P. Yi, T. Baker, and J. Lan, “Multi-controller
Based Software-Defined Networking: A Survey,” IEEE
Access, vol. 6, no. c, pp. 15980–15996, 2018.
[3] M. Nkosi, A. Lysko, L. Ravhuanzwo, T. Nandeni, and A.
Engelberencht, “Classification of SDN distributed controller
approaches: A brief overview,” Proc. - 2016 3rd Int. Conf.
Adv. Comput. Commun. Eng. ICACCE 2016, pp. 342–344,
2017.
[4] Mininet Team, “Mininet Overview - Mininet,” 2018. [Online].
Available: http://mininet.org/overview/. [Accessed: 25-Sep-
2018].
[5] K. Phemius, M. Bouet, and J. Leguay, “DISCO: Distributed
multi-domain SDN controllers,” IEEE/IFIP NOMS 2014 -
IEEE/IFIP Netw. Oper. Manag. Symp. Manag. a Softw. Defin.
World, 2014.
[6] F. Fatturrahman, “SDN Controller Robustness and Distribution
Framework,” 2017.
[7] A. Abdelaziz et al., “Distributed controller clustering in
software defined networks,” PLoS One, vol. 12, no. 4, pp. 1–
19, 2017.
[8] O. Blial, M. Ben Mamoun, and R. Benaini, “An Overview on
SDN Architectures with Multiple Controllers,” J. Comput.
Networks Commun., vol. 2016, 2016.
[9] X. Guan, B. Y. Choi, and S. Song, “Reliability and scalability
issues in software defined network frameworks,” Proc. - 2013
2nd GENI Res. Educ. Exp. Work. GREE 2013, pp. 102–103,
2013.
[10] “Ten Things to Look for in an SDN Controller.”
[11] A. Voellmy, “Scalable Software Defined Network
Controllers,” pp. 289–290, 2012.
[12] A. Tavakoli, M. Casado, T. Koponen, and S. Shenker,
“Applying NOX to the Datacenter,” HotNets, 2009.