Welcome to Scribd. Sign in or start your free trial to enjoy unlimited e-books, audiobooks & documents.Find out more
Download
Standard view
Full view
of .
Look up keyword
Like this
1Activity
0 of .
Results for:
No results containing your search query
P. 1
NWU-EECS-09-22

NWU-EECS-09-22

Ratings:
(0)
|Views: 19|Likes:
Electrical Engineering and Computer Science Department Technical Report NWU-EECS-09-22 December 18, 2009 Pitfalls for Testbed Evaluations of Internet Systems, by David Choffnes and Fabian Bustamante
Electrical Engineering and Computer Science Department Technical Report NWU-EECS-09-22 December 18, 2009 Pitfalls for Testbed Evaluations of Internet Systems, by David Choffnes and Fabian Bustamante

More info:

Categories:Types, Research
Published by: eecs.northwestern.edu on Jun 02, 2010
Copyright:Attribution Non-commercial

Availability:

Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less

06/02/2010

pdf

text

original

 

!!∀#∃!%&!!
#∀!∀∋(∃!
)!
 !     
  
∀!#∃%∃&∋   (       
∗&+∀!,

 
Pitfalls for Testbed Evaluations of Internet Systems
David Choffnes and Fabi´an E. Bustamante
Department of Electrical Engineering & Computer Science, Northwestern University
{
drchoffnes,fabianb
}
@eecs.northwestern.edu
ABSTRACT
The design, implementation and evaluation of large-scalesystems relies on global views on network topologies andperformance. In this article, we identify several issueswith extending results from limited platforms to
Internet wide
perspectives. Specifically, we try to quantify thelevel of inaccuracy and incompleteness of testbed resultswhen applied to the context of a large-scale peer-to-peer(P2P) system. Based on our results, we emphasize theimportance of measurements in the appropriate environmentwhen evaluating Internet-scale systems.
1. INTRODUCTION
Global views of network topologies and performanceare essential for informing protocol design and buildingsystems and services at Internet-scale. A number of researchefforts [9, 11] have focused on acquiring such views fromresearch platforms such as PlanetLab [13] and using publictopology data such as RouteViews [15] in an attempt tobuild a comprehensive network atlas. Such informationhas been successfully used to advise a variety of importantapplications addressing IP reachability, prefix hijacking androuting anomalies.Recent studies, however, suggest that such testbed resultsfor distributed systems do not always extend to the targeteddeployment. For example, Ledlie et al [7] and Agarwal etal. [1] show that network positioning systems perform muchworse “in the wild” than in PlanetLab deployments.In this article, we identify several limitations with thismodel taking a look at the accuracy and completeness of testbed results when applied to the context of a large-scalepeer-to-peer (P2P) system. To inform this study, we usea unique collection of traces gathered from a deploymentcontaining hundreds of thousands of users located at thenetwork edge.We focus our analysis on the following three issues thataffect the validity of any study extending testbed results toan Internet scale. First, we find that large and significantportions of the Internet topology used by P2P systems areinvisible to research testbeds, limiting the effectiveness of testbed-based inference of Internet paths and relationshipsbetween autonomous systems (ASes). Next, we show thatinferred properties of these topologies (latencies and band-width capacities) are inaccurate. Finally, we discuss howthese issues prevent accurate evaluations of performance fordistributed systems running on these topologies.Based on our results, we emphasize the importance of representative network views when evaluating Internet-scalesystems. There is a number of approaches to achieve thisgoal, including the use of application-level and network-level traces from the edge of the network (e.g., via Ark 
1
and Ono datasets). We also encourage the design anddeployment of additional edge-based monitoring services,either built into existing distributed services or providedindependently with the proper incentives for Internet-scaleadoption.
2. EDGE SYSTEM TRACES
The Internet is growing in ways that make increasinglydifficult to attain a global view of the network. Largeswathes of the network cannot be probed directly from ourresearch testbeds and a number of valuable measurementtechniques have side effects that render them impractical.For our study, we address this issue using network- andapplication-level traces from tens of thousands of BitTorrentusers worldwide running the Ono plugin [3]. In particular,our software passively records the data transferred over eachhost’s connection and performs active ping and traceroutemeasurements to a subset of these connections.Our installed user base covers 204 countries, 53,000routable prefixes and more than 7,000 ASes. While theamount of data collected per unit time varies according tothe online user population, each day we record between2.5 and 3.5 million traceroutes, tens of millions of latencymeasurements and more than 100 million per-connectiontransfer-rate samples.The design, implementation and maintenance of the asso-ciated data-collection service is the subject of an ongoingsystems project. As part of this work, we are makingthis dataset available to researchers through our EdgeScopeproject.
2
In the following sections we use this dataset to exploreseveral key pitfalls of testbed-based evaluations for Internetscale systems. We begin by exploring the completeness of the view from such testbeds.
3. GENERALIZING NETWORK VIEWS
A number of studies explicitly or implicitly rely on net-work topologies for estimating Internet resiliency, inferring
1
http://www.caida.org/projects/ark/ 
2
http://www.aqualab.cs.northwestern.edu/projects/EdgeScope.html
1
 
Internet paths and estimating cross-network traffic costs, toname a few. While several research efforts have successfullyextracted detailed topologies from public vantage points, itis well known that these are incomplete Internet views.To explore a lower bound for missing topology informa-tion, we used AS-level paths and AS relationships inferredfrom traceroute data gathered from hundreds of thousands of users located at the edge of the network [2]. As Fig. 1 shows,the number of vantage points available from edge systemsfar outnumbers those from public views, particularly forlower tiers of the Internet hierarchy where most of the ASesreside.Our dataset includes probes from nearly 1 million sourceIP addresses to more than 84 million unique destination IPaddresses, all of which represent active users of the BitTor-rent P2P system. By comparison, the BitProbes study [6]used a few hundred sources from the PlanetLab testbed tomeasure P2P hosts comprising 500,000 destination IPs.
Missing links.
Besides providing a large number of vantage points, our dataset also discovers links invisibleto the public view. In total, our approach identified 20%additional links missing from the public view, the vastmajority of which were located below Tier-1 ASes in theInternet hierarchy. Not surprisingly, the number of missinglinks discovered increases with the tier number location forthese links. Thus, when evaluating the interaction betweennetwork topologies and Internet systems at the edge (oftenlocated in lower Internet tiers), testbed-based topologies areless likely to include many relevant links.
123450500100015002000Network tier of vantage point
   T  o   t  a   l   #  o   f  v  a  n   t  a  g  e  p  o   i  n   t  s
 
P2P traceroutePublic BGP
Figure 1: Number of vantage point ASes corresponding toinferred network tiers.
In addition to the locations of links in the Internet hierar-chy, it is useful to understand what kinds of AS relationshipsare included in these missing links. Table 1 categorizes linksinto Tier-1, customer-provider or peering links and showsthe missing links as a fraction of existing links in the publicview, for each category. Note that there is a large numberof additional peering links (44%) and, more surprisingly, asignificant fraction of new customer-provider links (12%).
Tier-1 Customer-Provider Peering
3.14% 12.86% 40.99%
Table 1: Percent of links missing from public views, but foundfrom edge systems, for major categories of AS relationships.
00.10.20.30.40.50.60.70.80.910 0.2 0.4 0.6 0.8 1
   C   D   F   [   X  <  v   ]
Missing volumeAll - UpAll - DownBGP - UpBGP - Down
Figure 2: CDF of the portion of each host’s traffic volume thatcould not be mapped to a path based on both public views andtraceroutes between a subset of P2P users.
From these results it is clear that our community may needto revisit analysis that is based on public-view topologies(e.g., when looking at traffic cost or Internet resilience).
Impact of missing paths.
To better understand howthis missing information affects studies of Internet-scalesystems, we investigate the impact of missing links usingthreeweeksofconnection datafromP2Pusers. Inparticular,we would like to know how much of these users’ trafficvolumes can be mapped to AS-level paths – an essential stepfor evaluating P2P traffic costs and locality.We begin by determining the volume of data transferredover each connection for each host, then we map eachconnection to a source/destination AS pair using the TeamCymru service [18]. We use the set of paths from publicviews and P2P traceroutes [2] and, finally, for each host wedetermine the portion of its traffic volume that could not bemapped to
any
AS path in our dataset.Figure 2 uses a cumulative distribution function (CDF)to plot these unmapped traffic volumes using only BGPdata (labeled
BGP
) and the entire dataset (labeled
All
).The figure shows that when using
All
path information, wecannot locate complete path information for 84% of hosts;fortunately, themedianportionoftrafficforwhichwecannotlocate an AS path is only 6.7%. Of the hosts in our dataset16% use connections for which we have path information foronly half of their traffic volumes and 3% use connections forwhich we have no path information at all. When using onlyBGP information the median volume of unaccounted trafficis nearly 90%.One implication of this result is that any Internet widestudy from a testbed environment cannot accurately charac-terize path properties for traffic volumes from the majorityof P2P users. Even though the additional links discovered2

You're Reading a Free Preview

Download
scribd
/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->