An Optimal Median Calculation Algorithm for Estimating Internet Link Delays From Active Measurements.

Dima Feldman and Yuval Shavitt
Tel-Aviv University, Ramat Aviv 69978, Israel

Abstract. Delay estimation in the Internet can improve the performance of many applications, e.g., web browsing, peer-to-peer applications, and distributed games. For this purpose it was suggested to build an Internet distance service that can efficiently supply applications with delay information based on an Internet delay map. This can be achieved by deploying a large-scale measurement infrastructure, such as the DIMES project, where internal delay information is extracted from end-to-end measurements. We suggest estimating internal Internet link delays by the median of the differences between the measured delays to the link's two end points. We present a very efficient algorithm for this calculation, which works in linear time and constant space; prove its correctness; and compare its performance to the much slower intuitive algorithms on real Internet data.

Key words: link delay, median, median estimation

1 Introduction

As the Internet has evolved rapidly over the last decade, so has the interest in measuring and studying its structure. The delay between end nodes of the network is an important parameter that influences the experience of applications such as web browsing and online games. Knowledge of the delays between network nodes is also likely to improve the network utilization of peer-to-peer and content distribution applications. Several works [1–5] suggested using end-to-end delays between a small number of nodes to estimate the distance from a host to other hosts. The DIMES project [6, 7] produces a vast amount of measurement results from thousands of locations worldwide. DIMES software-based agents that reside in volunteer host machines use both traceroute and ping measurements to discover the Internet connectivity and internal delays. The results of these measurements are stored in a MySQL [8] database in raw format. All the works that suggested using Internet delay estimation [1–5] assume that obtaining link delays from measurements is a trivial task. We claim that this task is not trivial, especially if we want to estimate the delay on an internal Internet link for which we do not have a direct end-to-end measurement. Thus,
we suggest using the median of the differences between the traceroute measurements to the link end-nodes as a good estimate of the delay on a link. But more important, and this is the major contribution of this paper, if one aims at building an Internet-wide delay infrastructure, the calculation of millions of link delays from many measurements needs to be efficient, especially when done in parallel. We have developed a robust and effective technique for link delay estimation using a fast and memory-efficient algorithm for median estimation. We tested our algorithm on one US-wide provider, and show that the results agree with direct, inefficient delay calculations and with geographic distances.

2 Link delay estimation and Median calculation

Link delay is composed of queueing delay, transmission delay, and propagation delay. The transmission delay is negligible for today's high speed links, the queueing delay is dynamic in many cases, and thus propagation delay is the stable and important measure for many applications. The propagation delay is linearly related to the geographic line length¹ and is roughly equal to 1 ms RTT between points at 100 km distance. In measurements of RTT delays (e.g., using traceroute), there is additional noise due to delays caused by the CPU load on the responding nodes. When measuring the propagation delay, we can treat the queueing delay and the delay due to CPU load as additive positive noise [9].

Fig. 1. Link Delay Measurement

Currently, the DIMES project has over 11,000 agents spread in 91 countries, on all continents except Antarctica [7]. Agents continuously download script files that can be tailored by their location. The script includes a list of commands for the agent (such as ping and traceroute) and target IP addresses. In this work we use DIMES results to estimate link delays in the core of the network. The use of traceroute gives us the RTT delay from an agent to each one of the hosts on the route to some designated address.

¹ Geographic line length should be distinguished from the air distance since fibers usually follow railways and highways.

It should be noted that we have no control over the route to the destination; therefore, by tracerouting to a host on the other side of the globe, the agent measures delays to the intermediate hosts along a route selected by both the intra- and inter-AS routing algorithms. This route may not be unique, for example due to load balancing, or may change during the measurement.

Our goal in this paper, as shown in figure 1, is to estimate the delay between each pair of nodes (routers) that appear consecutively in any traceroute measurement. Notice that we ignore the measurement source.

The straightforward method for link delay estimation between two nodes S and D is the difference of the minimum RTTs as measured by agents. For agent j,

$\mathrm{delay}_j(S, D) = \min(RTT(D^j)) - \min(RTT(S^j))$.

To combine the results of multiple agents, we chose a weighted median algorithm, which gives a larger weight to the agent that made more measurements of a specific link². This method, which we term the Min Min algorithm, has two major disadvantages: non-reversibility of the data and a high storage requirement³. In a dynamic network, a topology change in layer 2 may cause a layer 3 link delay to change. Since $a = \min(a, b)$ is a non-increasing function, it might never reflect the change. The other problem is the storage requirement for the calculation. In the two-day period of DIMES measurements we examined, each network edge was measured by 10 agents on average, which forces us to keep on average 10 records for each edge.

The alternative method we propose is a statistical analysis of the $RTT(D_i) - RTT(S_i)$ values, where i is a measurement number. The classical approach is averaging:

$\mathrm{delay}(D, S) = \frac{1}{N} \sum_{i=1}^{N} \left( RTT(D_i) - RTT(S_i) \right)$.

However, for this to give a good estimate the data should have Gaussian noise and no outliers.

Fig. 2. Histogram of $RTT(D_i) - RTT(S_i)$ of the 216.140.17.53 → 216.140.17.110 link. The mean of the distribution is 24.5ms, while the median is 31ms.

² For values and weights vectors $\bar{v}$ and $\bar{w}$, the weighted median is defined as the median of the values of $\bar{v}$ when $v_i$ is repeated $w_i$ times.
³ When working with large amounts of data, a large storage requirement also translates to non-negligible computation/access time.
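The estimators discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(agent_id, rtt_s, rtt_d)` sample layout, the function names, and the sample values are hypothetical and are not taken from the DIMES database schema. The sketch keeps one record per agent (two running minima and a count), which mirrors the per-agent storage cost of Min Min.

```python
from statistics import mean, median


def weighted_median(values, weights):
    """Median of the multiset in which values[k] is repeated weights[k] times."""
    expanded = []
    for value, weight in zip(values, weights):
        expanded.extend([value] * weight)
    return median(expanded)


def min_min(samples):
    """Min Min estimate for one link S -> D.

    samples: iterable of (agent_id, rtt_s, rtt_d) tuples (hypothetical layout).
    """
    state = {}  # agent -> [min RTT to S, min RTT to D, number of measurements]
    for agent, rtt_s, rtt_d in samples:
        if agent not in state:
            state[agent] = [rtt_s, rtt_d, 1]
        else:
            record = state[agent]
            record[0] = min(record[0], rtt_s)
            record[1] = min(record[1], rtt_d)
            record[2] += 1
    values = [record[1] - record[0] for record in state.values()]
    weights = [record[2] for record in state.values()]
    return weighted_median(values, weights)


def mean_delay(samples):
    """Classical averaging of RTT(D_i) - RTT(S_i)."""
    return mean(rtt_d - rtt_s for _, rtt_s, rtt_d in samples)


def median_delay(samples):
    """Exact median of RTT(D_i) - RTT(S_i); needs all samples in memory."""
    return median(rtt_d - rtt_s for _, rtt_s, rtt_d in samples)


# Made-up measurements: agent "a" has one outlier (rtt_d = 40).
samples = [("a", 10, 15), ("a", 12, 14), ("a", 11, 40), ("b", 20, 30)]
print(min_min(samples), mean_delay(samples), median_delay(samples))
```

In this toy input the single outlier pulls the mean well above the median, which is the effect the averaging approach suffers from when the noise is not Gaussian.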

It is widely agreed that for many practical measurements, especially with ‘long tail’ distributions, the median gives a better estimate of the measured value than a linear operator such as the mean [10]. Median filtering is also widely used in image processing to remove impulse noise [11]. To depict the advantage of using the median in calculations of delay measurements, observe Figure 2, which shows a histogram of the delay difference for a typical link. The histogram shows the typical concentration of the samples in the vicinity of the true value and the existence of many outliers. In this case, like in many others, $\mathrm{median}(RTT(D_i) - RTT(S_i))$ gives a reliable approximation of the link delay.

However, exact median calculations require sorting the data and then selecting the middle value. The time complexity of such a calculation is $O(N \log N)$ and the storage complexity is $O(N)$. This makes the use of direct median calculation impractical for large amounts of data, such as in the DIMES case, especially when simultaneous calculations are performed.

Many median estimation methods are mentioned by Battiato et al. [12] and the references therein; some of them have a running time of $O(N)$, but $O(N)$ storage is also required. Rousseeuw et al. [10] propose a practical algorithm for median estimation using a construction termed the remedian, with calculation complexity of $O(N \log N)$ and storage complexity of $O(\log N)$, but it requires a predetermined number of samples. Many communication and signal processing algorithms utilize ‘windowing’ algorithms to reduce the required memory and to give a higher priority to the latest samples (and therefore are more sensitive to the noise in the last samples) [13]. For link delay estimation there is no reason to give the last samples more importance than the first samples (we refer to dynamic macroscopic changes later in the paper). Thus, the ‘windowing’ approach is inappropriate.

We suggest here our Fast Algorithm for Median Estimation (FAME), which decreases the required storage for each link to only two double precision variables, while giving a good median estimate for i.i.d. samples. Our algorithm is linear in the number of samples with a small constant. Similar but simpler algorithms were suggested in [14, 15]; however, they are highly vulnerable to noise in the last samples of the data and have a significantly larger convergence time.

3 The Fast Algorithm for Median Estimation

FAME uses two variables: step, the step size, and the median estimator, M. For every new data sample, d, M is increased if d is larger than M and decreased if smaller, namely $M = M + step \cdot \mathrm{sign}(d - M)$. If the data sample d is close to M, the step is halved: if $d \in (M - step, M + step)$ then $step = step/2$.

While the presentation in this paper concentrates on pseudo code which can be implemented with a high level language such as C or Java, FAME can be easily and efficiently implemented in hardware or with database query languages. The hardware implementation is important for calculating the median for measurement values that arrive on the fly, e.g., in a programmable NIC.
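The update rule above is short enough to sketch directly in Python. The sketch assumes the initialization given in Algorithm 1 (M starts at the first sample and the step at max(|first sample / 2|, b)) and, as in Algorithm 1, tests the halving condition after M has been moved. The function name `fame` and the parameter `min_step` (standing in for b) are illustrative choices, not names from the paper.

```python
def fame(samples, min_step=1.0):
    """Streaming median estimate using two state variables (M and step).

    samples: iterable of numeric samples; min_step plays the role of b,
    the minimal initial step (an assumed default, not a value from the paper).
    Returns the pair (M, step) after the last sample.
    """
    iterator = iter(samples)
    first = next(iterator)
    m = float(first)
    step = max(abs(first / 2), min_step)
    for d in iterator:
        # Move the estimate one step toward the new sample.
        if m > d:
            m -= step
        elif m < d:
            m += step
        # Halve the step when the sample falls within one step of M.
        if abs(d - m) < step:
            step /= 2
    return m, step


estimate, final_step = fame([10, 12, 11, 13, 12, 12, 11, 12])
print(estimate, final_step)
```

Only `m` and `step` survive between samples, so the per-link state is two floating-point numbers regardless of how many measurements arrive; the returned step can be read as a rough indicator of how far the estimator has converged.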

The MySQL implementation is the one we used with DIMES and is also efficient due to the use of the "insert ... on duplicate ..." statement, which enables writing to the database only when a new estimate is needed.

Algorithm 1 : Fast Algorithm for Median Estimation
1: Initialization:
2:   M = data(1)
3:   step = max(|data(1)/2|, b) // b is a minimal initial step
4: For each new item i:
5:   if M > data(i) then
6:     M = M − step
7:   else if M < data(i) then
8:     M = M + step
9:   end if
10:  if |data(i) − M| < step then
11:    step = step/2
12:  end if

The proposed median estimation algorithm has two important features. First, the convergence time depends on the quality of the data. The larger the number of outliers and the larger the variance of the data, the longer it will take to reduce the step variable to a small value. Second, the step size as a function of the data size gives a qualifier for the estimation accuracy, and the accuracy increases with the amount of data.

There are two modifications that can be applied to FAME:

- In order to eliminate overshooting in median prediction when the number of samples is small, we apply the following change. In case a new sample is in the range M ± step, M = M ± step (lines 6 and 8) can be replaced with M = data(i). We term this variation "FAME NO OS".
- To allow the algorithm to follow changes in the link statistics over time, it is possible to multiply the step by 1 + ε for some small ε every step (for a hardware implementation one can alternatively use a larger ε every few steps). This will give the algorithm a windowing flavor.

3.1 Proof sketch

In this section we show that the algorithm converges to the median. The proof assumes that all samples are i.i.d.

Lemma 1. Let C be a one-dimensional Markov chain with transition probabilities $P_r(i) = 1 - P(x < X(i))$ and $P_l(i) = P(x < X(i))$, as shown in figure 3. Then the mode of the steady state distribution of the Markov chain, $\pi(n)$, will be one of the two nodes adjacent to the median of the distribution $P(x < X)$, i.e., if $X(i) \le \mathrm{median}(x) \le X(i+1)$ then $i \le \mathrm{argmax}(\pi(n)) \le i + 1$.
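Lemma 1 can be checked numerically on a small example. The sketch below assumes an exponential sample distribution with median 0.5 and an evenly spaced grid of 20 states; both are illustrative choices. It obtains the steady state from the balance relation between neighbouring states, π(i)·Pr(i) = π(i+1)·Pl(i+1), which holds for any chain that only moves between adjacent states.

```python
import math

MEDIAN = 0.5
LAM = math.log(2) / MEDIAN  # rate of an exponential distribution with median 0.5


def cdf(x):
    """P(sample < x) for the assumed exponential distribution."""
    return 1.0 - math.exp(-LAM * x)


def steady_state(states, cdf):
    """Steady state of the 1-D chain with Pr(i) = 1 - cdf(X(i)), Pl(i) = cdf(X(i)).

    Built from pi(i) * Pr(i) = pi(i+1) * Pl(i+1) and then normalised.
    """
    weights = [1.0]
    for i in range(len(states) - 1):
        p_right = 1.0 - cdf(states[i])
        p_left_next = cdf(states[i + 1])
        weights.append(weights[-1] * p_right / p_left_next)
    total = sum(weights)
    return [w / total for w in weights]


# Grid 0.05, 0.15, ..., 1.95; the median lies between states 4 and 5.
states = [0.05 + 0.1 * k for k in range(20)]
pi = steady_state(states, cdf)
mode = max(range(len(pi)), key=pi.__getitem__)
below = max(i for i, x in enumerate(states) if x <= MEDIAN)
print(below, mode)
```

The index of the most probable state should be `below` or `below + 1`, i.e. one of the two states that bracket the median, which is what the lemma states.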

Fig. 3. 1-D Markov chain. $P(x)$ is a probability density function of x, $P_r(i) = 1 - P(x < X(i))$ is the transition probability from state i to state i + 1, and $P_l(i) = P(x < X(i))$ is the transition probability from state i to state i − 1.

Proof. Let m be the median of a distribution function $P(x \le X)$, i.e., $P(m \le X) = \frac{1}{2}$. According to the Markov chain definition, $\forall X(i) > m$, $P_r(i) \le P_l(i+1)$, but in steady state $\pi(i) \cdot P_r(i) = \pi(i+1) \cdot P_l(i+1)$. Therefore the steady state distributions ratio is $\frac{\pi(i+1)}{\pi(i)} = \frac{P_r(i)}{P_l(i+1)} \le 1$. In the same manner, $\forall X(i) < m : \frac{\pi(i+1)}{\pi(i)} \ge 1$. Hence $i \le \mathrm{argmax}(\pi(n)) \le i + 1$ when $X(i) \le m \le X(i+1)$.

Lemma 2. For a Markov chain C, we define the refined Markov chain C′. C′ inherits all the states of C and, in addition, has exactly one state between every pair of inherited states of C. The transition probabilities are calculated using the same formula as in C. Let X(m) and X(n), m < n, be the two states with the highest steady state probability in a Markov chain C, and let C′ be a refined Markov chain of C. Then $\forall i \le m$, $CDF'(i') < CDF(i)$ when $X'(i') = X(i)$. Namely, the error probability in the refined Markov chain C′ is smaller than the one in C.

The proof is omitted due to space limitation. The phenomenon of a more concentrated probability density function with an increasing number of nodes in the Markov chain is demonstrated in figure 4.

Fig. 4. Steady State CDF of different length Markov chains, for the Exponential distribution with $\lambda = \ln(2)/0.5 \approx 1.38629$ (median = 0.5, mean = $\lambda^{-1} \approx 0.7213$).
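The concentration effect described by Lemma 2 can be illustrated with a short Python sketch. It assumes the exponential distribution of Fig. 4 (median 0.5), a coarse chain with states 0.2, 0.4, ..., 2.0, and a window of ±0.27 around the median for measuring concentration; the grid, the window, and all function names are illustrative assumptions rather than the setup used to produce the figure.

```python
import math

MEDIAN = 0.5
LAM = math.log(2) / MEDIAN  # exponential distribution with median 0.5


def cdf(x):
    return 1.0 - math.exp(-LAM * x)


def steady_state(states, cdf):
    """Steady state from pi(i) * Pr(i) = pi(i+1) * Pl(i+1), normalised to sum to 1."""
    weights = [1.0]
    for i in range(len(states) - 1):
        weights.append(weights[-1] * (1.0 - cdf(states[i])) / cdf(states[i + 1]))
    total = sum(weights)
    return [w / total for w in weights]


def refine(states):
    """Keep every state and add exactly one state between each adjacent pair."""
    refined = []
    for a, b in zip(states, states[1:]):
        refined.extend([a, (a + b) / 2])
    refined.append(states[-1])
    return refined


def mass_near(states, pi, center, radius):
    """Steady-state probability of the states within `radius` of `center`."""
    return sum(p for x, p in zip(states, pi) if abs(x - center) <= radius)


coarse = [0.2 * k for k in range(1, 11)]
chains = [coarse, refine(coarse), refine(refine(coarse))]
masses = [mass_near(c, steady_state(c, cdf), MEDIAN, 0.27) for c in chains]
print([len(c) for c in chains], masses)
```

Each refinement should place a larger share of the steady-state probability close to the median, which corresponds to FAME becoming more accurate as its step shrinks.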

As can be seen in Figure 5, FAME can be treated as a multi-level 1-D Markov chain, where each level is a median estimator and the accuracy increases as the algorithm receives more data samples. As a result, FAME convergence speed depends on the ‘quality’ of the data. The smaller the variance, the faster the step is reduced and the faster the algorithm descends the Markov chain hierarchy.

Fig. 5. FAME State Machine. Solid lines are transitions inside a single dimension of the chain; dashed lines are transitions between chains with different steps.

4 Tests on Internet data

We have applied the algorithms, FAME, FAME NO OS, Min Min, and the median on $RTT(D_i) - RTT(S_i)$, on the raw data collected by DIMES during a period of 2 days in July 2006. We have used the Min Min algorithm as a yardstick for the other methods. Figure 6 shows that for links with more than 10 measurement results, more than 95% of the results differ by less than 2ms from the Min Min results. Comparing all the results, we can see that FAME NO OS, in some cases, is more accurate than FAME, and for 4ms accuracy it is even better than the direct median calculation. Overall, both FAME and FAME NO OS are shown to calculate the median with extremely good quality.

Fig. 6. CDF of estimation error

It should be noted that the runtime of our algorithm over 32,000,000 measurements of 677,000 edges was about 1 hour, while Min Min ran for about 24 hours on the same system (MySQL 5 database, 2x 2.66GHz Intel Xeon CPU with 1GB RAM) and took 10 times more storage for the same measurements.

Next, we performed a sanity check for the delay estimation we obtained. We selected Broadwing, an ISP with a nation-wide presence in the US, which publishes its network structure [16]. Out of all internal measurements of the Broadwing network, we selected the 178 IP edges for which we were able to perform a reverse DNS resolution for both end points. Although there is inconsistency in DNS resolution and geographic location [17], we used manual examination and the DIMES router aliasing to reduce geographic mapping errors⁴.

We drew the measured links on the geographic map of the continental US to show the connectivity of the network at the IP level. In figure 8 the line width represents the number of measurements we obtained for each link: the more data we have, the wider the line. In most cases there is more than one IP level link between a pair of cities, therefore only the most measured link is observable.

Then we compared the geographic distances (air-line) to the calculated link delays. Figure 7(left) shows that for most links with positive link delay⁵, the estimated link delay is highly correlated with the geographic distance. The histogram in figure 7(right) demonstrates that the majority of the results are above the theoretical limit, and there are many results in close proximity to the theoretical limit. From the layer 2 map of Broadwing [16] it is clear that in many cases the physical route is longer than the air-line distance and even than the driving distance. Thus, we conclude that our delay estimation is reasonable, though a more exhaustive study is needed.

⁴ Various GEOIP services were not useful in obtaining geographic locations of internal nodes of this network. In this and other networks large portions of the IP address space were mapped to the network HQ.
⁵ There are links that have negative delay results; most of them are small and represent short distance links with different router response times. Some links exhibit consistent large negative delays, which can be explained, for example, by a different return path. Obviously there is no true link with negative delay, and the estimated delay does not represent the distance. For these links, traceroute measurements are not effective for delay estimation. With DIMES' growing ability to measure from different vantage points, this effect might be mitigated.

5 Conclusion

We have shown that FAME is efficient and accurate in calculating the median of delay measurements. We also showed that the delay measurements seem to represent well the propagation delay on the links.

Fig. 7. Left: Estimated RTT delay vs. geographic link length. The lower diagonal line is a theoretical limit assuming maximum 100Km distance for 1ms of RTT; the upper line is twice the theoretical limit. Right: A histogram of a link delay/distance ratio. 1 stands for the theoretical limit, 2 means the link delay is twice the theoretical limit, etc.

Fig. 8. A map of Broadwing ‘named’ routers connections.
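The delay/distance ratio plotted in Fig. 7 (right) follows directly from the rule of roughly 1 ms of RTT per 100 km. A minimal sketch, in which the function names and the example link (31 ms over 1550 km) are made up for illustration:

```python
KM_PER_MS_RTT = 100.0  # theoretical limit used in the paper: 100 km per 1 ms of RTT


def theoretical_min_rtt_ms(distance_km):
    """Smallest RTT that propagation alone allows over the given distance."""
    return distance_km / KM_PER_MS_RTT


def delay_distance_ratio(delay_ms, distance_km):
    """1.0 means the link sits on the theoretical limit, 2.0 means twice the limit."""
    return delay_ms / theoretical_min_rtt_ms(distance_km)


# Hypothetical link: estimated delay 31 ms, air-line distance 1550 km.
print(delay_distance_ratio(31.0, 1550.0))
```

A ratio below 1 would indicate either a geographic mapping error or a delay estimate that does not reflect the link, such as the negative-delay cases noted in the text.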

References

1. Francis, P., Jamin, S., Jin, C., Jin, Y., Raz, D., Shavitt, Y., Zhang, L.: IDMaps: A global internet host distance estimation service. IEEE/ACM Transactions on Networking 9(5) (2001) 525–540
2. Shavitt, Y., Tankel, T.: Big-Bang simulation for embedding network distances in Euclidean space. IEEE/ACM Transactions on Networking 12(6) (2004) 993–1006
3. Shavitt, Y., Tankel, T.: On the curvature of the internet and its usage for overlay construction and distance estimation. In: Infocom, Hong Kong (2004)
4. Dabek, F., Cox, R., Kaashoek, F., Morris, R.: Vivaldi: a decentralized network coordinate system. In: SIGCOMM. (2004) 15–26
5. Lim, H., Hou, J.C., Choi, C.H.: Constructing internet coordinate system based on delay measurement. IEEE/ACM Transactions on Networking 13(3) (2005) 513–525
6. Shavitt, Y., Shir, E.: DIMES: Let the internet measure itself. In: ACM SIGCOMM Computer Communication Review. Volume 35. (2005) 71–74
7. : (Dimes) http://www.netdimes.org/.
8. : (MySQL) http://www.mysql.com/.
9. Shavitt, Y., Sun, X., Wool, A., Yener, B.: Computing the unmeasured: An algebraic approach to internet mapping. IEEE Journal on Selected Areas in Communications 22(1) (2004) 67–78
10. Rousseeuw, P.J., Bassett, G.W., Jr.: The remedian: A robust averaging method for large data sets. Journal of the American Statistical Association 85(409) (1990) 97–104
11. Tukey, J.: Nonlinear (nonsuperposable) methods for smoothing data. In: Congr. Rec. EASCON 74. (1974) 673–681
12. Battiato, S., Cantone, D., Catalano, D., Cincotti, G., Hofri, M.: An efficient algorithm for the approximate median selection problem. In: CIAC. (2000) 226–238
13. Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over sliding windows. SIAM J. Comput. 31(6) (2002) 1794–1813
14. Manzanera, A., Richefeu, J.: A robust and computationally efficient motion detection algorithm based on sigma-delta background estimation. In: ICVGIP. (2004) 46–51
15. McFarlane, N.J.B., Schofield, C.P.: Segmentation and tracking of piglets in images. Machine Vision and Applications 8(3) (1995) 187–193
16. : (Broadwing network map) http://www.broadwing.com/about-b4.html.
17. Zhang, M., Ruan, Y., Pai, V., Rexford, J.: How DNS misnaming distorts internet topology mapping. In: Proceedings of the 2006 Usenix Annual Technical Conference, Boston, MA, USA (2006)
