
Construction of Provider-Independent Overlay Networks with High Resilience

Xian Zhang
Queen Mary University of London, Mile End Road, London, United Kingdom

Chris Phillips
Queen Mary University of London, Mile End Road, London, United Kingdom

Abstract—It is difficult to change the Internet infrastructure in support of new services because of its distributed and autonomous features. In this context, overlay networks provide a promising means of supplementing the existing “best effort” delivery mechanism. This paper discusses one of the key issues: topology construction in the context of provider-independent overlay networks. A heuristic method is proposed to build a physical-aware overlay topology that provides reasonable resilience while incurring little overhead. Simulation results show that this scheme performs the best among the three comparable methods.

Keywords- Network Provider Independence; Overlay Network; Resilience

I. INTRODUCTION

Overlay networks have attracted much attention in the context of Internet evolution [1]. Overlay networks are usually mapped on top of a physical network (e.g. the multi-AS Internet or a single Internet Service Provider (ISP) network). The nodes in an overlay network connect with each other through virtual links, which are in turn composed of possibly multi-hop physical paths. The health of these virtual links is usually monitored by periodically sending probes, so the monitoring overhead grows in proportion to the number of virtual links deployed (i.e. the average connectivity of overlay nodes). Previous work has shown that the design of the overlay network topology has a direct impact on overlay network performance [4, 11, 12]. Therefore, it is critical to construct an efficient overlay topology (i.e. the connection relationship between overlay nodes and the placement of overlay nodes) whilst taking into account the characteristics of the physical network it is deployed upon [13].

Most of the existing work on overlay topology construction focuses on network-provider-dependent overlays [5, 12, 13], in which no constraint is placed on the location of overlay nodes. Generally, overlay nodes hosted on routers have greater utility, providing alternative paths with better performance (such as lower latency) for customers, as shown in [5, 14]. However, this requires information sharing among different ISPs, which administer their networks independently with their own strategies and policies. Therefore, it is difficult, although not impossible, to host overlay nodes across a multi-Autonomous System (AS) infrastructure at present.

In this paper, our aim is to construct an efficient overlay topology that provides better resilience under different failure models in the context of network-provider-independent overlay networks such as ROMCA (Resilient Overlay for Mission-Critical Applications) [15]. Simulation results show that the proposed method performs the best and achieves comparatively good performance under various failure models with limited overhead. Although the proposed algorithm is verified on AS-level physical networks, the definition and verification can easily be extended to router-level networks.

This paper is organized as follows: Section II describes related work; Section III presents the proposed algorithm in detail; Section IV provides the simulation results and analysis; and Section V presents conclusions and indicates areas for future consideration.

II. RELATED WORK

Currently the Internet only provides “best effort” packet transport. Furthermore, the Border Gateway Protocol (BGP), used for routing across ASes, is characterized by reconvergence times of several minutes or longer [7]. Overlay networks, for example RON [3], have been proposed to provide better performance by actively monitoring the network among a group of participating nodes. The nodes in RON form a full-mesh topology and use active probing to monitor the health of the Internet paths included in the overlay. In contrast to the long convergence time of BGP, RON achieves recovery times on the order of tens of seconds in test-bed experiments [3]. However, because every pair of nodes is monitored, its overhead grows as O(n²), where n is the number of overlay nodes. In [8], the authors discuss the relationship between the effectiveness of the overlay and the overhead it consumes, concluding that Quality of Service (QoS) performance comparable to that of full connectivity can be achieved with a lower overlay node degree. Overlay topology construction has been researched from various aspects. For example, in [4, 13], different types of overlay topologies are analyzed for given overlay node locations. Other work discusses where to place the overlay nodes [5, 12]. For instance, [12] considers how many ISPs, and how many routers inside one ISP network, are enough for overlay routing: the correlations between direct and indirect paths for source and destination pairs using different available nodes are calculated, and the intermediate nodes are then categorized into different performance clusters for use in overlay construction. Both of these works show that the diversity of virtual links is indispensable for providing better performance in overlays. However, both are provider-dependent overlay architectures.
Moreover, customers only use the overlay services when the direct Internet path fails, which means they must be able to diagnose the health of the original path in a timely manner.
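To make the O(n²) growth concrete, the short Python sketch below counts the virtual links (each of which must be probed) in a full mesh versus an overlay in which every node has the same overlay node degree. It assumes one probed path per unordered node pair and ignores probe frequency and packet size; the function names are hypothetical.

```python
def full_mesh_links(n: int) -> int:
    """Virtual links (hence probed paths) in a full mesh of n overlay nodes, as in RON."""
    return n * (n - 1) // 2


def regular_links(n: int, ond: int) -> int:
    """Virtual links in an overlay where every node has overlay node degree OND."""
    if ond >= n or (n * ond) % 2:
        raise ValueError("an OND-regular overlay needs OND < n and n * OND even")
    return n * ond // 2


if __name__ == "__main__":
    # Quadratic growth for the full mesh versus linear growth for a fixed degree.
    for n in (10, 30, 50):
        print(f"n={n:2d}: full mesh {full_mesh_links(n):4d} links, "
              f"OND=5 {regular_links(n, 5):3d} links")
```

With the default simulation setting used later in Section IV (30 overlay nodes, OND = 5), this gives 435 links for the full mesh versus 75 for the degree-bounded overlay.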

QoSMap [6] showed that high overlay resilience can be provided in a provider-independent overlay. It maps an overlay topology with specific QoS requirements onto a physical network topology by sequentially selecting the PlanetLab nodes that can provide the best QoS performance; in other words, it strategically chooses the physical network to provide a diversified set of paths for the overlay construction. However, the possibility of overlap between virtual links, which may result in concurrent failures, still needs to be addressed, especially when the overlay nodes only reside in areas with low connectivity (e.g. stub areas). In [9, 11], two availability models are proposed to define an overlay that can still be fully connected in case of no more than three physical node failures. The authors assume that all the physical nodes are overlay candidates, and physical nodes with a degree lower than three are not considered. Moreover, they focus on proving the NP-completeness of the two models [11] and on how to construct an overlay with high availability in a scalable, distributed way [9]. According to their definition, overlay nodes can route through all the possible physical paths between a pair of overlay nodes.

In this paper, however, we assume that the paths between overlay nodes are determined by the physical layer. This is based on the observation that nodes in stub areas generally have no control over which intermediate nodes (i.e. physical routers or ASes) they employ for routing, unless they use other overlay nodes in the overlay layer. In addition, our overlay node candidates are constrained to ASes with low connectivity, as would be expected of smaller tier-3 stub networks. Most existing work on overlays focuses on improving QoS performance using a single intermediate node for detouring, and there is no work addressing the resilience of network-provider-independent overlay networks under different failure models. Our work finds a near-optimal overlay topology using heuristic methods and verifies it under different failure models.

III. OVERLAY TOPOLOGY CONSTRUCTION

A. Definition

• Node Degree (ND): Two types of node degree are defined in this paper. Physical ND (PND) is the number of neighbours an AS has, whilst Overlay ND (OND) is the number of other overlay nodes (ONs) with which an ON has virtual links.
• $l_i$: A virtual link between two overlay nodes, which is generally composed of a physical path. This path is denoted $P(l_i)$ and is an ordered list of physical ASes.
• $LO(l_i, l_j)$: The number of AS hops two virtual links share in common, namely $LO(l_i, l_j) = |P(l_i) \cap P(l_j)|$.
• $O(l_i, l_j)$: The overlap between two virtual links:

$$O(l_i, l_j) = LO(l_i, l_j) \cdot L_i \cdot L_j \qquad (1)$$

where $L_i$ ($L_j$) is the number of AS hops in $l_i$ ($l_j$). The length terms are included because, usually, the longer a virtual link is, the higher the probability that it overlaps with other virtual links.
• Resilience ($\rho$): If there exists a route between overlay nodes $ON_i$ and $ON_j$, then $C(ON_i, ON_j) = 1$; otherwise it equals 0. The resilience of the overlay is then defined as:

$$\rho = \frac{\sum_{i \in V_o} \sum_{j \in V_o,\, j \neq i} C_{afterfailure}(ON_i, ON_j)}{\sum_{i \in V_o} \sum_{j \in V_o,\, j \neq i} C_{nofailure}(ON_i, ON_j)} \qquad (2)$$

B. Problem Description

The main issue in overlay construction is that physical failures may result in concurrent overlay link failures. The objective of the overlay topology design is therefore to construct an overlay topology with maximally AS-disjoint virtual links, namely, the virtual links chosen should overlap the least in the physical layer. At the same time, each virtual link must be probed to monitor its status, so the number of virtual links per node is bounded to reduce the overhead.

The overlay topology construction problem can be stated as follows. Given a physical network topology $G_p(V_p, E_p)$ and the overlay topology requirements:
(1) the provider independence requirement: only physical ASes with lower connectivity can be selected to host overlay nodes, i.e. $ND_{physical}(ON) \leq PND_{max}$;
(2) the overlay node degree constraint $ND_{overlay}(ON) \leq OND_{max}$;
the objective is to find an overlay topology $G_o(V_o, E_o)$ that has maximum resilience under various physical failure scenarios.

C. Proposed Algorithm

The problem stated in the above sub-section can be formulated as finding a $G_o(V_o, E_o)$ that minimizes:

$$\sum_{l_i \in E_o} \sum_{l_j \in E_o,\, j \neq i} O(l_i, l_j) \qquad (3)$$

$$\text{s.t.} \quad OND(V_o) \leq OND_{max} \qquad (4)$$

This is a variant of the BiQuadratic Assignment Problem (BiQAP), which is a generalization of the NP-hard Quadratic Assignment Problem (QAP) [17]. It can be solved in two steps: (1) find a topology in which all the nodes have the same node degree (i.e. a regular graph), and then (2) map it onto the selected physical node set $\{V_p'\}$ so as to obtain an overlay topology with minimum overlap as defined in (3). The second step can be rewritten as:

$$\min \sum_{\substack{i,j,k,l \in V_o \\ (i,j) \neq (k,l)}} \sum_{m,p,s,t \in V_p'} b(i,j)\, b(k,l)\, O\big(P(m,p), P(s,t)\big)\, x_{im}\, x_{jp}\, x_{ks}\, x_{lt} \qquad (5)$$

$$\text{s.t.} \quad \sum_{i \in V_o} x_{ij} = 1 \;\; \forall j \in V_p', \qquad \sum_{j \in V_p'} x_{ij} = 1 \;\; \forall i \in V_o, \qquad x_{ij} \in \{0, 1\} \qquad (6)$$

where $b(i, j)$ equals 1 if there is a connection between $ON_i$ and $ON_j$, otherwise 0; $x_{ij}$ equals 1 if $ON_i$ is mapped onto physical node $j$, otherwise 0; and $P(m, p)$ is the physical path between ASes $m$ and $p$. As there is no known way to find the optimal solution of this problem in polynomial time in a large-scale network, simulated annealing is chosen to find a near-optimal solution efficiently.
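The two-step heuristic described above (a regular overlay, then an overlap-minimizing mapping found by simulated annealing) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Assumptions not taken from the paper: the AS graph is an adjacency dict of integer AS numbers; least-cost routing is approximated by unweighted BFS with sorted tie-breaking, and one path is reused for both directions (symmetric routes); $L_i$ is taken as the hop count `len(path) - 1` and $LO$ as the literal set intersection of the two AS lists, so links meeting at a shared ON host also count as overlapping; the step-1 regular graph is a circulant graph; and the annealing move (swap the hosts of two ONs), cooling schedule and parameters are arbitrary. All function names are hypothetical.

```python
import math
import random
from collections import deque
from itertools import combinations


def bfs_path(adj, src, dst):
    """One least-hop AS path src -> dst (assumed stand-in for least-cost routing)."""
    parent, queue = {src: None}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):  # sorted: deterministic tie-breaking
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        raise ValueError(f"AS {dst} is unreachable from AS {src}")
    path = [dst]
    while path[-1] != src:
        path.append(parent[path[-1]])
    return path[::-1]


def overlap(p, q):
    """Eq. (1): O = LO * L_i * L_j, LO = |set(P_i) & set(P_j)|, L = len(path) - 1."""
    lo = len(set(p) & set(q))
    return lo * (len(p) - 1) * (len(q) - 1)


def total_overlap(paths):
    """Objective (3), summed over unordered link pairs (half the ordered sum)."""
    return sum(overlap(p, q) for p, q in combinations(paths, 2))


def circulant_regular(n, k):
    """Step 1: a k-regular overlay on ONs 0..n-1 (circulant construction)."""
    if not 0 < k < n or (n * k) % 2:
        raise ValueError("need 0 < k < n and n * k even")
    edges = set()
    for i in range(n):
        for off in range(1, k // 2 + 1):
            edges.add(tuple(sorted((i, (i + off) % n))))
        if k % 2:  # odd degree (n is even here): add the "diameter" chord
            edges.add(tuple(sorted((i, (i + n // 2) % n))))
    return sorted(edges)


def anneal_mapping(adj, overlay_edges, hosts, iters=2000, t0=50.0, alpha=0.995, seed=0):
    """Step 2: bijection ON i -> AS host[i] over V_p' = hosts, minimising overlap."""
    rng = random.Random(seed)
    cache = {}

    def path(a, b):
        key = (min(a, b), max(a, b))  # symmetric routes: one path per AS pair
        if key not in cache:
            cache[key] = bfs_path(adj, *key)
        return cache[key]

    def cost(host):
        return total_overlap([path(host[u], host[v]) for u, v in overlay_edges])

    host = list(hosts)
    rng.shuffle(host)
    cur = cost(host)
    best, best_cost, t = host[:], cur, t0
    for _ in range(iters):
        i, j = rng.sample(range(len(host)), 2)
        host[i], host[j] = host[j], host[i]  # neighbour state: swap two ONs' hosts
        new = cost(host)
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best_cost:
                best, best_cost = host[:], cur
        else:
            host[i], host[j] = host[j], host[i]  # rejected: undo the swap
        t = max(t * alpha, 1e-12)  # geometric cooling, floored to avoid division by 0
    return best, best_cost
```

Because every state is a permutation of the given host set, the assignment constraints (6) hold by construction; evaluating (3) over unordered pairs halves the objective without changing the minimizer. Each cost evaluation is quadratic in the number of overlay links, so an incremental cost update would be advisable for larger overlays.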

IV. SIMULATION RESULTS

A. Simulation Scenarios

In order to evaluate the performance of the proposed scheme (denoted M3), three other methods are also included: (1) M1: Full Mesh, which provides the upper bound of the resilience metric; (2) M2: random mapping of a regular graph; and (3) M4: the random method of selecting overlay virtual links proposed in [8]. The average node degree of M4 is kept the same as that of the proposed method for a fair comparison.

A variety of physical networks are chosen, including (1) the CN05 AS-level network obtained using traceroute [5]; (2) 200, 500 and 2000-node scale-free topologies generated using Pajek [16]; (3) 200 and 500-node random topologies generated using Pajek (average PND equal to 4); and (4) 200 and 400-node grid topologies. Here, the nodes in the physical topology represent ASes. Except for the grid topologies, where $PND_{max}$ equals 3, all the physical topologies use the constraint $PND_{max} = 1$. Unless stated otherwise, the overlay node number, overlay node degree and failure model are 30, 5 and multiple random failures, respectively. The results are averaged over 300 trials.

B. Assumptions

• The virtual link between adjacent overlay nodes follows a series of physical hops determined by a least-cost routing algorithm. Only symmetric routes are considered here for simplicity.
• The physical topology is assumed to be known to the overlay.
• All overlay nodes are assumed to be able to detect performance degradation and/or failures in the physical network. They obtain up-to-date routing information from link-state routing that operates across the overlay at the time of making overlay routing decisions.

C. Failure Models

Only AS failures in the physical network are considered within this paper. A “failure” does not necessarily mean a physical breakdown; it can also represent performance degradation (e.g. in delay or loss rate) below an acceptable threshold, as perceived by overlay probing and monitoring. Three failure models are employed in the simulations: the Random Single Failure model, the Random Multiple Failure model similar to that of [10], and the Accumulative Focused Failure model similar to the large-scale failure scenarios in [2] (the geographical information is not considered here). The first two are self-explanatory. With the third approach, the failure starts from a single AS and propagates to its neighbours: all the neighbours of the previous failure set are viewed as failed in turn.

For the resilience evaluation of the overlay network, failures of ASes that do not convey overlay virtual links are not considered, as they have no impact on the performance of the ON. Therefore, in the simulations, the failures are randomly selected from the subset of ASes that ON virtual links traverse, and we introduce the term “ON-Supporting AS Failures” to represent the number of failed ASes that are covered by the ON. Because AS failures outside this subset are excluded, the reported failure figures are typically much higher than would be witnessed if the whole Internet were considered. However, for the third failure model, the radiating effect includes both covered and uncovered physical nodes, so the first failure is chosen from the whole set.

D. Results and Analysis

The proposed algorithm is verified considering different topologies, overlay node degrees and failure models. Due to space limitations, only some of the results are shown here; nevertheless, all the results are consistent.

1) Physical Topology Impact

As shown in Fig. 1, the performance differences among the four topology construction methods (M1-M4) follow similar trends irrespective of the size and type of the underlying physical topology. As depicted in the figure, the full mesh (M1) gives the highest resilience, so the higher the overlay node degree is, the more resilient the overlay topology is against multiple random “network failures”. Furthermore, for a given maximum overlay node degree constraint, M3 performs the best among the three comparable methods. However, the resilience difference is much larger in the grid topology (as high as 30%) than in the other two topologies (at most 10%), because there is less physical degree variation in the grid topology. Careful design of the overlay topology is therefore needed: it can improve the overlay network availability even when the number of “network failures” is small, and the improvement can reach 10% and 30% in scale-free and grid topologies, respectively.

Figure 1. Physical Topologies Impact (resilience ρ versus ON-Supporting AS Failures (%) for M1-M4; panels: 500-Node Scale Free Topology, 500-Node Random Topology, 400-Node (20*20) Grid Topology).
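The resilience metric ρ of Eq. (2) and the three failure models of Section IV can be sketched in Python as follows, again as an illustration under stated assumptions rather than the authors' simulator. Assumed here (not in the paper): overlay nodes are numbered 0..n-1 and each virtual link is a key (u, v) in a dict mapping to its fixed AS path; a virtual link counts as down when any AS on that path has failed, with no physical rerouting modelled; and failures are given as a count k rather than the percentage of ON-supporting ASes plotted in the figures. All function names are hypothetical.

```python
import random
from collections import deque


def connected_pairs(n, links):
    """Ordered ON pairs (i, j), i != j, joined by a route over the given overlay links."""
    nbrs = {i: [] for i in range(n)}
    for u, v in links:
        nbrs[u].append(v)
        nbrs[v].append(u)
    seen, pairs = set(), 0
    for s in range(n):
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:  # BFS over one connected component of the overlay
            for v in nbrs[queue.popleft()]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        pairs += len(comp) * (len(comp) - 1)
    return pairs


def resilience(n, link_paths, failed):
    """Eq. (2): connected ON pairs after failure / connected ON pairs with no failure.

    Assumption: a virtual link is down if any AS on its fixed path has failed, so a
    failed host AS (an endpoint of all its links' paths) isolates its ON.
    """
    before = connected_pairs(n, link_paths)
    after = connected_pairs(n, [l for l, p in link_paths.items() if failed.isdisjoint(p)])
    return after / before


def random_failures(link_paths, k, rng):
    """Random single (k = 1) or multiple failures, drawn only from ON-supporting ASes."""
    covered = sorted({a for p in link_paths.values() for a in p})
    return set(rng.sample(covered, k))


def focused_failures(adj, radius, rng, start=None):
    """Accumulative focused failure: the start AS plus every AS within `radius` hops."""
    if start is None:
        start = rng.choice(sorted(adj))  # first failure drawn from the whole AS set
    dist, queue = {start: 0}, deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def mean_resilience(n, link_paths, k, trials=300, seed=0):
    """Average rho over random-multiple-failure trials (300 trials, as in Section IV)."""
    rng = random.Random(seed)
    return sum(resilience(n, link_paths, random_failures(link_paths, k, rng))
               for _ in range(trials)) / trials
```

For example, with three ONs hosted on ASes 0, 2 and 4 of a five-AS line and a failure of AS 3, only the virtual link between the first two ONs survives, giving ρ = 2/6 ≈ 0.33.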

2) Overlay Degree and Node Number Impact

The impact of the overlay node degree (OND ranging from 4 to 10) and the overlay node number (ranging from 10 to 50) on overlay resilience is verified in scale-free topologies. Due to space limitations, only some of the simulation results are presented, in Figs. 2 and 3 (500-node scale-free network). The following conclusions can be drawn: (1) as the overlay node degree increases, the resilience of the three methods improves and approaches that of the full mesh; (2) M3 performs the best, but the performance difference diminishes as the overlay node degree increases; and (3) M3 performs the best irrespective of the number of overlay nodes.

Figure 2. Overlay Node Degree Impact (ρ versus ON-Supporting AS Failures (%) for M1-M4; panels: Overlay Degree = 4, Overlay Degree = 8).

Figure 3. Overlay Node Number Impact (ρ versus ON-Supporting AS Failures (%) for M1-M4; panels: Overlay Node Number = 20, Overlay Node Number = 50).

3) Failure Model Impact

In Fig. 4, the x-axis represents the Failure Radius of the accumulative-focused failure model: 0 represents the failure of a single AS, and x (1-5) means that all the nodes up to x AS hops away from this AS are deemed to have malfunctioned too. As illustrated in the figure, all four schemes perform similarly under this failure model, with no significant difference, except in the grid network, where M3 performs closest to the full mesh method and the gap between M3 and M4 can reach 8%. This is probably because there is less overlap between the overlay nodes in the grid network than in the other types of physical networks.

Figure 4. Accumulative-Focused Failures Impact (ρ versus Failure Radius (AS Hops) for M1-M4; panels: 500-Node Scale Free Topology, 500-Node Random Topology, 400-node (20*20) Grid Topology).

V. CONCLUSION

An efficient physical-aware topology construction algorithm is proposed and verified under various network and failure models. The results show that, in scale-free networks, the proposed method achieves better resilience than random construction methods, by as much as 10%, whilst maintaining a relatively low overhead by constraining the overlay connectivity. In networks where the physical node degree is similar, such as grid-like topologies, the performance benefit is even greater. Estimating the virtual link overlap by exploiting physical topology inference techniques is now under investigation.

REFERENCES

[1] N. Chowdhury and R. Boutaba, “A Survey of Network Virtualization”, Tech. Rep., School of Computer Science, University of Waterloo, 2008.
[2] B. Bassiri and S. S. Heydari, “Network survivability in large-scale regional failure scenarios”, C3S2E 2009, Montreal, Canada, 2009.
[3] D. G. Andersen, H. Balakrishnan, M. F. Kaashoek and R. Morris, “Resilient Overlay Networks”, In Symposium on Operating Systems Principles, 2001.
[4] Z. Li and P. Mohapatra, “The impact of topology on overlay routing service”, INFOCOM 2004, pp. 408-418, March 2004.
[5] “Improving Chinese Internet’s Resilience through Degree Rank Based Overlay Relays Placement”, ChinaCom 2009, Xi’an, China, August 26-28, 2009.
[6] J. Shamsi and M. Brockmeyer, “QoSMap: Achieving Quality and Resilience through Overlay Construction”, Proceedings of the 2009 Fourth International Conference on Internet and Web Applications and Services, 2009.
[7] A. Sahoo, K. Kant and P. Mohapatra, “BGP convergence delay after multiple simultaneous router failures: characterization and solutions”, Computer Communications, pp. 1207-1218, 2009.
[8] J. Han, D. Watson and F. Jahanian, “Enhancing end-to-end availability and performance via topology-aware overlay networks”, Computer Networks, Vol. 52, Issue 16, pp. 3029-3046, 2008.
[9] S. D. Madhu Kumar and U. Bellur, “A distributed algorithm for underlay aware and available overlay formation in Event Broker Networks for publish/subscribe systems”, DEPSA ’07, Toronto, Canada, June 2007.
[10] J. P. Rohrer, A. Jabbar and J. P. G. Sterbenz, “Path diversification: a multipath resilience mechanism”, DRCN 2009, Washington DC, Oct. 2009.
[11] S. D. Madhu Kumar and U. Bellur, “Availability Models for Underlay Aware Overlay Networks”, DEBS ’08, Italy, July 2008.
[12] S. Rewaskar and J. Kaur, “Testing the Scalability of Overlay Routing Infrastructures”, PAM 2004, pp. 33-42, 2004.
[13] Z. Li and P. Mohapatra, “On Investigating Overlay Service Topologies”, Computer Networks, Vol. 51, pp. 54-68, 2007.
[14] “A Measurement Study on the Benefits of Open Routers for Overlay Routing”.
[15] X. Zhang and C. Phillips, “Network Operator Independent Resilient Overlay for Mission Critical Applications (ROMCA)”.
[16] Pajek, available online: http://vlado.fmf.uni-lj.
[17] E. Çela, “The Quadratic Assignment Problem: Theory and Algorithms”, Kluwer Academic Publishers, 1998.