ABSTRACT

This paper provides a comprehensive picture of IP-layer anycast adoption in the current Internet. We carry out multiple IPv4 anycast censuses, relying on latency measurements from PlanetLab. Next, we leverage our novel technique for anycast detection, enumeration, and geolocation [17] to quantify anycast adoption in the Internet. Our technique is scalable and, unlike previous efforts that are bound to exploiting DNS, is protocol-agnostic. Our results show that major Internet companies (including tier-1 ISPs, over-the-top operators, Cloud providers and equipment vendors) use anycast: we find that a broad range of TCP services are offered over anycast, the most popular of which include HTTP and HTTPS, offered by anycast CDNs that serve websites from the top-100k Alexa list. Additionally, we complement our characterization of IPv4 anycast with a description of the challenges we faced to collect and analyze large-scale delay measurements, and of the lessons learned.

CCS Concepts

•Networks → Network measurement; Network structure; •General and reference → Measurement;

Keywords

Anycast; Census; Network monitoring; Network measurement; IPv4; BGP

1. INTRODUCTION

Modern content-delivery networks (CDNs) employ L7 anycast, exploiting DNS and HTTP redirection techniques to direct traffic from a client to any server in a group of geographically dispersed but otherwise equivalent servers. Such redirection techniques perform load balancing among nearby replicas and map users to the closest replica, reducing user-perceived latency.

Network-level (IP) anycast [4] is another instantiation of the same principle, where a set of replicas spread across a number of locations around the world share a standard unicast IP address. BGP policies route packets sent to this address to the nearest replica according to BGP metrics, notably (though not only) the number of autonomous system (AS) hops.

L7 anycast and IP anycast are complementary. On the one hand, L7 anycast allows for very dense server deployments with customized user-server mapping algorithms and complex operations to shuffle content among servers. Although this allows fine-grained control of the server selection, it also increases the management complexity [36]. On the other hand, IP anycast offers loose control over user-server mapping, which limits the deployment density but considerably simplifies management by delegating replica selection to IP routing.

In recent years, the scientific community has made significant contributions to understand L7 anycast, e.g., to uncover deployments and geolocate points of presence (PoPs) with active measurements [15, 45–47], and to characterize the performance of L7 anycast via passive measurements [5, 12]. Yet, with few exceptions [45], as these application-level deployments are diverse, and as PoPs are pervasive, such efforts generally focus on a single application/player such as Akamai [46], YouTube [5, 47], Amazon [12] and Google [15]. A recent trend in the area of Internet infrastructure mapping is to exploit the edns-client-subnet DNS extension (ECS) [20] to uncover the geographical footprints of major CDNs [15, 45]. Still, given the wide design space and flexibility in L7 anycast implementations, it is hard to generalize

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CoNEXT '15, December 01-04, 2015, Heidelberg, Germany
© 2015 ACM. ISBN 978-1-4503-3412-9/15/12 . . . $15.00
DOI: http://dx.doi.org/10.1145/2716281.2836101

[Figure 1: Workflow at a glance: an IPv4 hitlist (with blacklist and greylist filtering) is probed from PlanetLab (ICMP, TCP, UDP RTT latency); the analysis [16] performs detection, enumeration, characterization and geolocation, validated against ground truth.]
as is the case for the green discs in Fig. 3(b). While the observation of an inconsistency among any pair leads to anycast detection, by leveraging multiple observations it is possible to further enumerate such replicas. Enumeration is described in step (c): to provide a conservative estimation of the minimum number of anycast replicas, we solve a Maximum Independent Set (MIS) problem. MIS outputs a set of non-overlapping disks, each of which contains a different replica of the same target: while the MIS problem is NP-hard, we solve it using a 5-approximation algorithm that greedily operates on disks of increasing radius as in Fig. 3(c), and that in practice yields results very close to the optimum provided by a prohibitively more costly brute-force solution [17]. Geolocation happens in step (d): in the smallest disk, we geolocate the replica at city-level granularity with a maximum likelihood estimator biased toward city population; actually, we find that the city population alone has sufficient discriminative power (about 75% accuracy [18]), so that our geolocation criterion boils down to picking the largest city in that disk. Finally, (e) we coalesce the disk to the classified city, which reduces disk overlap and allows iteration of the algorithm until convergence, thus increasing the recall (i.e., the number of replicas discovered) along each iteration.

[Figure 4: Anycast census at a glance: typical census magnitude]
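The greedy enumeration step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the conversion from RTT to a maximum great-circle radius is assumed to happen upstream, and all names are ours.

```python
# Sketch of the greedy disk-based MIS used for replica enumeration:
# scan latency disks by increasing radius and keep each disk that is
# disjoint from every disk kept so far. Names are illustrative.
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points, in kilometers."""
    p1, p2 = radians(lat1), radians(lat2)
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

@dataclass
class Disk:
    vp: str           # vantage point that measured the RTT
    lat: float        # vantage point latitude
    lon: float        # vantage point longitude
    radius_km: float  # max target distance compatible with the RTT

def greedy_mis(disks):
    """Greedy 5-approximation of the Maximum Independent Set:
    process disks by increasing radius, keeping each disk that is
    disjoint from all previously kept ones. Each kept disk must
    contain a distinct replica, so len(result) lower-bounds the
    number of replicas."""
    kept = []
    for disk in sorted(disks, key=lambda d: d.radius_km):
        if all(great_circle_km(disk.lat, disk.lon, k.lat, k.lon)
               > disk.radius_km + k.radius_km for k in kept):
            kept.append(disk)
    return kept
```

Sorting by radius first is what makes the greedy scan a constant-factor approximation: a small disk can exclude at most a bounded number of disjoint larger disks.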
The analysis technique [17] is of course not an original contribution of this work, whose main aims are instead to scale up its application to an Internet-wide census on the one hand (Sec. 3), and to analyse and publish the gathered dataset on the other hand (Sec. 4).

Characterization and Validation. In addition to anycast detection, the previous steps allow us to geolocate the replicas behind each anycast IP/24. As outlined in Fig. 1, we validate the output of the geolocation step whenever a ground truth is available (as in Sec. 3.4 for CDNs such as CloudFlare and EdgeCast, complementary to the validation limited to DNS in [17]).

Finally, we provide a fine-grained characterization of IP/24 anycast services. While our detection methodology is service-agnostic, we use nmap [38] on a list of anycast IPs obtained from the census, to reveal open ports and the software they run. Given that an exhaustive portscan (i.e., of the 2^16 TCP and UDP port spaces, over all replicas of all anycast deployments) still incurs a prohibitive measurement cost, we restrict the measurements to TCP services (i.e., the most unexpected ones) and interesting deployments (i.e., deployments with large geographical footprints). We discuss anycast services in Sec. 4.3, finding over 10,000 open ports that map to about 500 well-known services, and fingerprinting some 30 software applications.

Scale, Completeness, and Accuracy. Although we described the anycast geolocation technique in [17], we had to overcome several challenges to run it at Internet scale and within a short timespan, for which we went through multiple re-engineering phases. We believe that a number of lessons learned (e.g., the counter-intuitive need to slow down the sending rate to complete a census) are worth sharing, and discuss them in Sec. 3.

Notice that a large number of vantage points is required to provide an accurate picture of anycast deployment, especially in terms of the number of replicas discovered around the world. Related work that focuses on O(1) targets (i.e., DNS root servers) indeed runs measurement campaigns involving from O(10^4) [10] to O(10^5) [25] vantage points to achieve ≈90% recall [25]. In our case, given the sheer size O(10^7) of our target set, we trade off completeness for scale, and possibly under-estimate the number of IP-anycast replicas, as we use a mere O(10^2) vantage points.
Still, our results provide a broad, conservative, yet accurate picture of Internet anycast usage: for targets for which we have the ground truth, our city-level geolocation is accurate in about 75% of the cases, with a median error of 350 km otherwise (Sec. 3.4).

Typical census. In this work, we perform four IPv4 censuses and analyse the results obtained from their combination. For each of the O(10^2) VPs, the magnitude of a typical census is illustrated in Fig. 4: starting from a hitlist of O(10^7) targets, less than half send a reply (Sec. 3.1). ICMP replies include O(10^5) errors, some of which relate to administratively prohibited communication: senders of these ICMP error messages are added to a greylist, to avoid probing them again in future censuses (Sec. 3.3). Finally, running the anycast geolocation technique over the O(10^6) targets that generate valid ICMP echo reply messages, we discover roughly O(10^3) IP/24 anycast deployments, corresponding to approximately 0.1‰ of the whole IPv4 address space – the proverbial needle in the IPv4 haystack.

2.2 State of the art

Anycast vs Unicast. In standard IP unicast censuses, the set of targets can be split among VPs for scalability reasons. In contrast, in the anycast case, all targets should be probed by all VPs to provide an accurate map of geographical footprints. Given that the number of active VPs in PL is around 300, and that only one IP/32 target for each IP/24 subnet needs to be probed in a given anycast census, it follows that the raw amount of probe traffic is only slightly larger than that of a unicast census.

In the unicast case, the relative location of VPs with respect to targets is irrelevant. Therefore, census strategies cover extremes such as a single centralized high-end server capable of O(10^6) probes per second on a well-provisioned 1 Gbps Ethernet connection, as in Zmap [22], or a highly decentralized system that exploits O(10^5) low-resource gateways, as in the illegal Carna Botnet census [1]. Our system design sits halfway between these two extremes: in particular, it uses an efficient application-level multi-threaded scanner capable of O(10^4) probes/sec, distributed over O(10^2) PlanetLab nodes.

It is also worth pointing out that an independent analysis [33] of the Carna Botnet dataset found multiple campaigns, covering a cumulative number of probes exceeding a full IPv4 census, over a duration of 8 months. Additionally, [33] suggests that, due to an overlap in the target set, not all hosts were probed, neither during the two fast scan campaigns identified, nor during the whole measurement period. As such, our measurement campaign, comprising multiple anycast censuses each lasting only a few hours, constitutes an achievement per se.

Geolocation and infrastructure mapping. Unicast geolocation is a well-investigated research topic. Numerous techniques based on latency measurements [23, 24, 28] and databases [41, 44] have been proposed. Yet, database techniques are not only unreliable for unicast [41], but also for anycast, since they advertise a single geolocation per IP. Similarly, latency-based techniques [23, 24, 28] use triangulation, and geolocate unicast addresses at the intersection of multiple latency measurements from geographically dispersed vantage points. However, this assumption no longer necessarily holds for anycast, as depicted in Fig. 3.

L7-anycast infrastructure mapping studies [15, 45] leverage ECS requests to geolocate servers: (millions of) requests are sent with different client IPs from one VP to unveil (thousands of) unicast IP addresses corresponding to PoPs of major OTTs. However, while ECS support is becoming widespread to enhance the user online experience, it is not yet pervasive. Finally, the technique fails with alternative L7-anycast designs relying on HTTP redirection.

To the best of our knowledge, no IP-anycast geolocation technique exists other than our own: as such, no other study, apart from this work, deals with anycast infrastructure mapping. With respect to ECS-based techniques for L7 anycast, our technique allows reducing the cardinality of the problem without sacrificing geolocation accuracy (a qualitative comparison is provided in [17]). Additionally, since BGP provides a unified redirection technique, IP anycast offers an unprecedented opportunity to broadly assess all deployments at once.

Anycast discovery and characterization. Prior work investigates different aspects related to anycast, with a focus on discovery and enumeration [25, 35] or on characterization [9–11, 19, 32, 34, 43], but, to the best of our knowledge, not on anycast censuses. Closest to ours [17] is the work of [25, 35]. Specifically, [17] is a service-agnostic technique for detection, enumeration and geolocation. Similarly to [17], speed-of-light violations are used in [35] (however limited to detection, and not capable of enumeration/geolocation). Conversely, [25] exploits DNS specificities (i.e., CHAOS requests) to enumerate DNS replicas (but unlike [17] is neither capable of geolocation, nor applicable beyond DNS).

Other studies assess the performance of current IP anycast deployments, with a focus on metrics such as proximity [9,10,19,34,43], affinity [9–11,13,34,43], availability [10,32,43], and load-balancing [10,11]. Yet, these studies focus on DNS, which is just a piece of the current anycast puzzle.

Finally, some works study IP-anycast CDNs, such as [16, 27]. However, these focused studies add yet other useful pieces to the puzzle, which remained so far incomplete, lacking the broad perspective given by an Internet-wide coverage over all prefixes and services.
3. SYSTEM DESIGN

Anycast detection relies on measuring round-trip delays between a set of vantage points and a target IP address to uncover geo-inconsistencies. Running an Internet census thus requires measurements towards millions of destinations, ideally in a short timeframe: we now describe and justify the system design choices that allow us to perform multiple censuses, which we analyze later in Sec. 4. Items discussed in this section concern the selection of targets (Sec. 3.1), the measurement platform (Sec. 3.2) and software (Sec. 3.3), as well as the network protocol used (Sec. 3.4). Finally, we report considerations about the scalability of our workflow (Sec. 3.5).

[Figure 5: Microsoft deployment as seen from PlanetLab (21 replicas) vs RIPE (54 replicas). Notice that PlanetLab results (white markers) are a subset of RIPE (white and black markers).]
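The geo-inconsistency test that underpins detection can be sketched as follows. This is illustrative only: the propagation bound of 100 km per RTT millisecond (roughly 2/3 of the speed of light in fiber) and all names are our assumptions, not the authors' code.

```python
# Sketch of latency-based anycast detection: a target is flagged as
# anycast as soon as some pair of vantage points is geo-inconsistent,
# i.e., the VPs are farther apart than the sum of the distances their
# RTTs allow, so no single server location can explain both samples
# (a speed-of-light violation).
from itertools import combinations
from math import radians, sin, cos, asin, sqrt

C_KM_PER_RTT_MS = 100.0  # ~200 km/ms one-way in fiber -> 100 km per RTT ms

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points, in kilometers."""
    p1, p2 = radians(lat1), radians(lat2)
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def is_anycast(measurements):
    """measurements: iterable of (lat, lon, rtt_ms), one per VP."""
    for (la1, lo1, r1), (la2, lo2, r2) in combinations(measurements, 2):
        if great_circle_km(la1, lo1, la2, lo2) > C_KM_PER_RTT_MS * (r1 + r2):
            return True
    return False
```

For instance, a target answering both a Paris VP and a New York VP (about 5,800 km apart) in 1 ms each cannot be a single unicast host, while 50 ms RTTs from both are perfectly consistent with one location.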
3.1 Census targets

Census granularity. Unlike multicast, anycast addresses need no reservation in the IP space: as any IP address can be a candidate, this makes deployment easy, but the detection of anycast addresses hard. Luckily, to avoid a significant increase in the size of routing tables, BGP standard practice [4] is to ignore or block prefixes more specific than /24. Thus, /24 is the minimum granularity for anycasted services, which is a good granularity for our census. We validate this assumption with (spot) verifications of all IP addresses in some IP/24s (belonging to EdgeCast), confirming any IP in the /24 to be equivalent for anycast detection purposes. Additionally, a /24 granularity implies that announced BGP prefixes shorter than /24 are tested multiple times, once per each /24 they contain: the mapping between /24s and announced prefixes is still possible a posteriori, as we do in this work. This choice is reinforced by [35], which found 88% of announced prefixes to be /24, stated that "anycast prefixes are dominated by /24", and suggested that larger prefixes may be anycast only in part due to BGP prefix aggregation. We therefore fix the census granularity to a single target IP per /24.

Target liveness. As previously argued, any alive IP belonging to a /24 is equivalent in telling whether the whole /24 is anycast (or unicast). To identify a responsive IP address in every /24 prefix, we rely on the hitlist periodically published by [31]. The hitlist consists of generally one representative IP address for O(10^7) prefixes, along with a score indicative of the host liveliness, computed over several measurement campaigns. When no alive IP has been observed in a /24, the hitlist contains an arbitrary address from that /24 (score ≤ −2). After covering the full hitlist with the first census, we confirm these hosts not being reachable and remove them to reduce the target size to 6.6 · 10^6 per VP.

Coverage. Given our census aim, we verify how well this hitlist covers all routed /24 prefixes. We therefore obtain from CAIDA a dump of routing tables originating from both RIPE RIS and RouteViews collectors. To compare the hitlist vs the advertised prefixes, we split the latter into /24s, obtaining 10,616,435 /24 prefixes, of which 10,615,563 have a representative in the hitlist (over 99.99% coverage). We additionally cross-check our observed target responsiveness with the expected recall: specifically, recent ICMP scans [48] observe 4.9 · 10^6 used /24 subnets, and our campaigns similarly capture 4.4 · 10^6 responsive subnets (90% coverage with respect to [48]).

3.2 Measurement dataset vs platform

Dataset. One option to avoid running a large-scale measurement campaign is to exploit readily available datasets from public measurement infrastructures – yet we could not find any fitting our purpose. For instance, despite probing all /24s every 2-3 days, Archipelago [6] clusters its vantage points into three independent groups, each using random IPs selected in each /24 prefix: it follows that at most 3 monitors target each /24, with generally different IP addresses, and a hit rate of about 6%. Given the low hit rate and low parallelism, such a dataset is not appropriate for our purpose, as it would not lead to a complete census, nor to an accurate geolocation footprint even in case of hits.

Platforms. There are a number of measurement platforms available in the community, each with its own advantages and limitations. Except for illustration purposes, in this paper we relied on PlanetLab (PL). While RIPE Atlas (RIPE for short) is more interesting for geographical diversity due to its scale, it has limited control on the rate and type (cf. Sec. 3.4) of measurements, as well as on their instantiation for such a large-scale campaign (i.e., upload of the hitlist, probing budget). Additionally, the larger number of vantage points

[Figure 6: legend residue – OpenDNS, Edgecast, Cloudflare, Microsoft; L3, L4, L7.]
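The /24 expansion and hitlist-coverage cross-check described in Sec. 3.1 can be sketched as follows. This is a minimal illustration with documentation prefixes, not the actual CAIDA/RIPE RIS pipeline.

```python
# Sketch of the coverage cross-check: split announced BGP prefixes
# into /24 subnets and count how many have a representative in the
# hitlist. The sample data in the usage below are documentation
# ranges, not the paper's dataset.
import ipaddress

def slash24s(prefixes):
    """Expand announced prefixes into the set of /24 subnets they cover."""
    out = set()
    for p in prefixes:
        net = ipaddress.ip_network(p)
        if net.prefixlen >= 24:
            # more-specific prefixes collapse onto their covering /24
            out.add(ipaddress.ip_network((net.network_address, 24),
                                         strict=False))
        else:
            out.update(net.subnets(new_prefix=24))
    return out

def coverage(announced_prefixes, hitlist_ips):
    """Fraction of announced /24s having a hitlist representative."""
    targets = slash24s(announced_prefixes)
    covered = {ipaddress.ip_network((ipaddress.ip_address(ip), 24),
                                    strict=False)
               for ip in hitlist_ips}
    return len(targets & covered) / len(targets)
```

For example, `coverage(["192.0.2.0/24", "198.51.100.0/23"], ["192.0.2.5", "198.51.100.9"])` reports two of the three announced /24s as covered.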
To do so, we build a ground truth (GT) for CloudFlare and EdgeCast by performing HTTP measurements with curl from PL: note that HTTP measurements are not available from RIPE, highlighting once more the complementarity of these platforms.

By inspection of the HTTP headers, we find that CloudFlare (EdgeCast) encodes the geolocation of the replica in the custom CF-RAY: (standard Server:) header field. Notice that the measured GT constitutes the upper bound of what can possibly be achieved from PL measurements, while the publicly available information (PAI) displayed on the CloudFlare and EdgeCast websites contains a super-set of locations with respect to those measured from PL. We contrast the true positive rate (TPR) of our census classification vs the HTTP GT in Fig. 7: in 77% of the IP/24s for CloudFlare (65% for EdgeCast) there is agreement at city level, with a median error of 434 km (287 km for EdgeCast) in the (relatively few) misclassification cases. As expected, the low number of PL nodes possibly limits the portion of discoverable replicas (GT/PAI is fairly high for CloudFlare, but fairly low for EdgeCast), making our footprint estimates conservative and confirming the interest in alternative platforms such as RIPE.

Consistency. Additionally, in the case of OpenDNS, we verify consistency across the multiple RTT latency measurement techniques used earlier in Fig. 6. In this case we rely on public information that maps 24 locations [39]. For all protocols, applying [17] on the dataset yields between 15 and 17 instances. Notice that all cities returned by the analysis are correct except Philadelphia (while the server is located in Ashburn, at 260 km or 2.6 ms worth of propagation delay away): this misclassification is due to the bias enforced in [17] toward city population (Philadelphia is 33 times more populated than Ashburn), but, as observed in [15], this is not problematic, as the "physical" Ashburn location is actually serving the "logical" Philadelphia population.

[Figure 8: CDF of per-vantage point completion time, over all censuses.]

3.5 Scalability

Probing rate. When designing census experiments, we take care to avoid obvious pitfalls. For instance, while we target a single host per /24, we nevertheless perform measurements from all PL nodes. It follows that each node must desynchronize to avoid hitting ICMP rate limiting (or raising alerts) at the destination. We do so by a randomized permutation of the targets, achieved via a Linear Feedback Shift Register (LFSR) in Galois configuration. Still, while the LFSR solves rate limiting at the target, it does not solve problems at the source (or in the network): indeed, while requests are well spread, replies do aggregate close to the VP, which receives an aggregate rate equal to the probing rate of Fastping (in excess of 10,000 hosts per second). In our preliminary (and incomplete) censuses, we noted heterogeneous (and possibly very high) drop rates for some VPs (likely tied to rate limiting spatially close to the VP). Given that the networks where PL machines are hosted are independently administered, we opted for a simple solution and slowed down Fastping by one order of magnitude (1), which we verified empirically not to trigger the above problems. Consequently, probing 6.6 · 10^6 targets at 10^3 targets per second takes less than two hours: as shown in Fig. 8, about 40% of PL nodes complete within this timeframe, and 95% in under 5 hours (longer durations are likely due to load on the PL host).

(1) While it is possible to more finely tune the probing rate per VP, coverage may benefit from samples coming from the slowest VP, especially if it resides in a geographical area which is not well covered by PL.

Output size and analysis duration. A second scalability issue concerns the output format. We initially overlooked this issue and logged, in textual format, a wealth of information amounting to 270 MB per node and 80 GB overall per census (cf. Tab. 1). We therefore opted for a radical reduction of the output size, dumping a stripped-down binary format containing a timestamp, delay and ICMP flag (encoding greylist return codes 9, 10, or 13 as a negative sign) for a total of about 20 MB per node and 6 GB overall per census.

A third challenge lies in the analysis of the data. For a single target, the running time of [17] is O(10^-1) sec, which compares very favorably to the O(10^3) sec of the brute-force optimal solution: at the same time, processing a census would still take days (we indeed have
stopped processing the complete Census-0 after 3 days of CPU time, where the textual format additionally led to slow processing due to disk fragmentation). Moreover, due to the LFSR, the order of the target IPs in all files is not the same, meaning that an on-the-fly sorting of about 300 lists (one per VP) containing millions of targets is needed. We therefore optimized our implementation, which currently runs in under three hours, i.e., about the same timescale as the census duration, so that in principle we could perform a continuous analysis. While this is not interesting for the anycast characterization use-case, it may become relevant for other applications of this technique (e.g., the BGP hijacking inference mentioned in Sec. 5).

Figure 10 (table): Anycast censuses results, at a glance.

              IP/24   ASes   Cities   CC   Replicas
All           1,696    346       77   38     13,802
≥ 5 Replicas    897    100       71   36     11,598
∩ CAIDA-100      19      8       30   18        138
∩ Alexa-100k    242     15       45   29      4,038
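The Galois-LFSR target permutation of Sec. 3.5 can be sketched as follows. An 8-bit register is used here for brevity with a known maximal-length tap mask; the actual register must be wide enough to index ~10^7 targets, and generated indices beyond the target count are simply skipped. All names are illustrative.

```python
# Sketch of a Galois LFSR used to enumerate target indices in a
# pseudorandom order without materializing a permutation: a
# maximal-length n-bit LFSR visits every nonzero state exactly once.
# The tap mask 0xB8 corresponds to x^8 + x^6 + x^5 + x^4 + 1, a
# classic maximal-length polynomial for 8 bits.
def lfsr_order(taps=0xB8, nbits=8, seed=1):
    """Yield every value in [1, 2^nbits - 1] exactly once."""
    state = seed
    for _ in range((1 << nbits) - 1):
        yield state
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= taps
```

Because the next index is derived from the current one with a shift and a conditional XOR, each VP can walk its own permutation of the hitlist with O(1) state, spreading consecutive probes across unrelated /24s.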
4. ANYCAST /0 CENSUSES

This section presents the results of the first Internet-wide anycast study. We start with aggregated statistics (Sec. 4.1) and then incrementally refine the picture by providing a bird's-eye view of the most interesting deployments (Sec. 4.2), over which we perform an additional portscan campaign to reveal their running services (Sec. 4.3).

4.1 At a glance

Details about our censuses are reported in Fig. 10. Overall, 1,696 IP/24s belonging to 346 ASes appear to have more than one anycast replica, while we were able to find 897 IP/24s belonging to 100 ASes having at least 5 replicas with our technique. The plot also shows a geographical density map of anycast replicas: results of our censuses are available for browsing at [21], offering per-deployment (as in Fig. 5) or aggregated (as in Fig. 10) visualizations. Notice that the results reported in this paper correspond to censuses performed during March 2015: with later censuses, we observed small but interesting changes in the anycast landscape. While we plan to run a continuous service, please be advised that (at the time of writing) results at [21] refer to the censuses described in this paper.

Several remarks from Fig. 10 are in order. First, notice that our results are conservative, since (i) in regions with a low presence of PlanetLab VPs, we may miss some anycast replicas, e.g., when the BGP prefix is only locally advertised; and (ii) the analysis technique provides a lower bound on the number of replicas, since overlapping disks may correspond to different anycast replicas but will not be considered in the solution of the MIS problem (recall Fig. 3). Second, we investigate the CAIDA AS rank list, to cross-check how many ASes using IP anycast figure in the top-100: the results tabulated in Fig. 10 show that 19 IP/24s of 8 ASes that play a central role in the Internet belong to the list. Similarly, we investigate the Alexa rank, to cross-check how many webpages in the top-100k rank are hosted (2) by ASes using IP anycast: here again, we find 242 IP/24s of 15 ASes that are among the major players of the Web.

(2) For the sake of simplicity, we resolve the domain name of the frontpage found in Alexa to an IP, and disregard content that is referenced in the frontpage.

4.2 Top-100 Anycast ASes

While the amount of anycast IP/24s may seem deceiving at first, in reason of its exiguous footprint, it is nevertheless very rich – revealing silver needles in the haystack. From the very coarse cross-check of the CAIDA and Alexa ranks, we already expect that anycast usage is not restricted to DNS, but rather covers important ISPs and OTTs. Fig. 9 presents a bird's-eye view of anycast adoption, depicting several pieces of information for the 100 ASes for which we detected at least 5 replicas, identified by their WHOIS name reported on the x-axis (capped to 12 characters). Geographical and IP/24 footprints are reported at the bottom: ASes are arranged left to right in decreasing number of replicas (bottom bar-plot, with standard deviation across IP/24s belonging to the same AS), additionally reporting the number of anycast IP/24s for that AS (middle bar-plot). Service footprint is correlated to the open TCP ports in the AS (middle scatter-plot). Next, the relative importance of the AS in the Internet and for the Web is expressed in terms of the CAIDA and Alexa ranks respectively (top scatter-plots). Finally, a label reported on the top x-axis categorizes the main activity of the AS from a business perspective (the category is informal and, in case of ASes with multiple services, only the most prominent is selected).

Big fishes. Major players of the Internet ecosystem are easy to spot in Fig. 9. The list includes not only tier-1 and other ISPs (such as AT&T Services, Tinet,
[Figure 9: Bird's-eye view of anycast adoption for the top-100 anycast ASes; panels report #Replicas, #IPs/24, Open ports (53, 80, 443, 8080), Alexa rank and CAIDA rank. ASes, ranked by number of replicas and labeled with their business category, include: 1 CLOUDFLARENET (CDN), 2 ISC-AS,US (DNS), 3 HURRICANE,US (ISP), 4 CDNETWORKSUS (CDN), 5 FACEBOOK,US (Social Network), 6 COMMUNITYDNS (DNS), 7 XGTLD,US (DNS), 8 L-ROOT,US (DNS), 9 MICROSOFT,US (Cloud), 10 I-ROOT,SE (DNS), 11 VERISIGN-INC (DNS), 12 LLNW,US (CDN), 13 ARYAKA-ARIN (Cloud), 14 APPLE-ENGINE (CDN), 15 CEDEXIS,US (Security), 16 HIGHWINDS3,U (CDN), 17 NETNOD-IX,SE (DNS), 18 OPENDNS,US (Security/DNS), 19 WOODYNET-1,U (DNS), 20 LGTLD,US (DNS), 21 LIECHTENSTEI (unknown), 22 FASTLY,US (CDN), 23 CACHENETWORK (CDN), 24 INSTART,US (CDN), 25 DNSCAST-AS,U (DNS), 26 GOOGLE,US (Cloud/DNS), 27 EDGECAST-IR (CDN), 28 UMDNET,US (unknown), 29 DYNDNS,US (DNS), 30 NSONE,US (DNS), 31 EASYLINK4,US (Cloud messaging), 32 YAHOO-AN2,US (Web Portal), 33 ULTRADNS,US (DNS), 34 OVH,FR (Cloud), ..., 100 ZVONKOVA-AS (unknown).]
Considering CDNs, which are, after DNS, the most popular anycast service according to Fig. 11, we observe that 8 CDNs serve Alexa-100k websites: this set includes CloudFlare, EdgeCast, and Fastly with 188, 10, and 5 websites respectively (in addition, Highwinds, CacheNetworks, Instart, Incapsula, and BitGravity host one popular site each). In addition, 11 of the websites listed by Alexa are hosted by Google anycast IPs. Finally, 10 websites are hosted on IPs that belong to Prolexic (now part of Akamai), which operates a DDoS mitigation service that receives the traffic on behalf of its client networks, redirecting only legitimate traffic to them.
4.3 Anycast Services

[Figure 14: Overall nmap portscan statistics and Top-10 open TCP ports (per AS: 53, 80, 443, 179, 22, 8080, 8083, 3306, 1935, 5252; per /24: 80, 443, 8080, 53, 2052, 2053, 2082, 2083, 8443, 2087).]

Portscan campaign. Given the historical use of anycast for DNS services, we believe it important to provide an up-to-date longitudinal view of the services offered via IP anycast, especially focusing on TCP. We provide a summary of the nmap probing in the top of Fig. 14. We test all anycast /24s of the top-100 ASes: picking a single representative IP per /24, we scan, at low rates, all 2^16 TCP ports. Our results are conservative in that different IPs in the same /24 may have different open ports (which happens, e.g., for CloudFlare and EdgeCast), and in that an under-estimation of the number of open TCP ports can also be the result of probe filtering by firewalls and routers along the path to the targets. Out of the 897 IP/24s of the top-100 ASes, we find that 816, belonging to 81 ASes, have at least one open TCP port. The total number of distinct open TCP ports across them is 10,485, providing 449 well-known services (i.e., as indicated by TCP port classification), 170 of which over SSL. Additionally, nmap fingerprinting discovers 30 different software implementations running on the anycast replicas, which we also detail next.

Class imbalance. Given the heterogeneity of the IP/24 footprint, we argue it is necessary to consider only per-AS statistics to avoid presenting results that are biased due to class imbalance. We illustrate the problem by depicting in Fig. 14 the frequency count of the top-10 open TCP ports by number of ASes (top) and IP/24s (bottom). Notice that only ports 80, 443 and 53 appear to be common to both top and bottom plots: in particular, all ports in the hatched area are due to the large predominance of IP/24s owned by the CloudFlare AS, which also affects the order of common ports in the top-10. We thus focus on per-AS statistics in the following.

Stateful services. Fig. 15 presents the complementary CDF of the number of open TCP ports per AS. We make the following observations: (i) roughly half of the ASes have at least one open TCP port, (ii) about 10% of the ASes have at least 5 open TCP ports and (iii) the largest service footprints are those of Incapsula and especially OVH, with 313 and 10,148 open ports respectively. In the latter case, while we did not investigate thoroughly, we suspect the large number of ports is due to the fact that OVH, the largest hosting service in Europe and the 3rd in the world, is significantly popular in the BitTorrent seedbox ecosystem [42]. Predominant services (beyond DNS) include the fairly popular HTTP and HTTPS, used by over 20% of the ASes. Even excluding the OVH case, the list of interesting services is large. In terms of business diversity, 22 ASes have at least 4 different TCP ports open: 8 CDNs, 4 DNS, 4 ISPs including a tier-1 ISP (Tinet SpA) and Google with 9 open TCP ports. Finally, interesting (though unpopular) services worth listing include multimedia services (RTMP, Simplify Media, MythTV) and gaming (Minecraft).

Software diversity. Fig. 16 lists the 30 different software implementations, which we group into three main categories: Web, Mail, and DNS. Interestingly, the list includes open-source software such as popular web and DNS daemons (e.g., nginx, ISC BIND) and proprietary software (e.g., ECAcc/ECS/ECD, which are web servers developed by EdgeCast). Starting with DNS software, notice that for 44 of the ASes using port 53 (out of 67), nmap could not identify the software version running on the remote server. Unsurprisingly, we find that ISC BIND is by far the most widely adopted software to handle DNS requests over anycast. Yet, we also detect the use by 3 ASes (Apple, K-ROOT, L-ROOT) of the NLnet Labs NSD implementation
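For illustration, the elementary building block of such a TCP portscan, which nmap of course implements far more efficiently, can be sketched as follows (all names are ours, not the authors' pipeline):

```python
# Minimal TCP connect() probe: a port is reported open if a full
# three-way handshake succeeds. This is the naive counterpart of the
# nmap campaign summarized above, shown for illustration only.
import socket

def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

def scan(host, ports):
    """Return the subset of `ports` that accept a TCP connection."""
    return [p for p in ports if tcp_port_open(host, p)]
```

Note that such connect() probes are exactly the measurements that path firewalls can silently filter, which is why the open-port counts reported above are conservative.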
[Figure 15: Complementary CDF of the number of open TCP ports per AS, annotated with EDGECAST,US (5), GOOGLE,US (9), CLOUDFLARENET,US (20), INCAPSULA,US (330), OVH,FR (10,148).]

[Figure 16: AS frequency of the fingerprinted software, grouped into DNS (ISC BIND, NLnet Labs NSD, Microsoft DNS, OpenDNS), Web (nginx, lighttpd, Apache httpd, ECD, Microsoft IIS, Varnish, Apache Tomcat, bitasicv2, CFS 0213, cloudflare-nginx, cPanel httpd, thttpd, ECAcc/ECS, Google httpd, instart/160), Mail (Gmail imapd, Gmail pop3d, Google gsmtp), and Other (OpenSSH, MySQL, sslstrip, Microsoft RPC, Microsoft HTTP, Microsoft SQL).]