You are on page 1of 15

A comparison of web privacy protection techniques

Johan Mazel Richard Garnier Kensuke Fukuda


National Institute of Informatics/JFLI ENSIMAG National Institute of Informatics/Sokendai
johan.mazel@ssi.gouv.fr garnier.richard.s@gmail.com kensuke@nii.ac.jp

Abstract—Tracking is pervasive on the web. Third party users regarding their interests. To this end, advertisers leverage
trackers acquire user data through information leak from context (e.g. visited website) or previous browsing interests.
websites, and user browsing history using cookies and device Advertisers use techniques such as cookies to identify users
fingerprinting. In response, several privacy protection techniques
across websites and build their browsing history. Other tech-
arXiv:1712.06850v1 [cs.CR] 19 Dec 2017

(e.g. the Ghostery browser extension) have been developed. To the


best of our knowledge, our work is the first study that proposes niques have also been developed to allow advertising actors
a reliable methodology for privacy protection comparison, and to communicate with each other (such as cookie syncing [2]),
extensively compares a wide set of privacy protection techniques. or circumvent cookie removal by respawning cookies using
Our contributions are the following. First, we propose a robust diverse types of data storage inside the browser (e.g. using
methodology to compare privacy protection techniques when
crawling many websites, and quantify measurement error. To this Flash [3]). Browser fingerprinting [4] allows a tracking entity
end, we reuse the privacy footprint [1] and apply the Kolmogorov- to follow a user across websites without any in-browser data
Smirnov test on browsing metrics. This test is likewise applied storage. In response to these techniques, several counter-
to HTML-based metrics to assess webpage quality degradation. measures were designed. We can here quote the Do Not Track
To complement HTML-based metrics, we also design a manual HTTP header [5] by which a user can ask not to be tracked.
analysis. Second, we study the overlap of blocking resources
between most popular browser extensions, and compare the Browsers can also block some or all cookies. Finally, many
performances using the proposed methodology. We show that browser extensions hinder third party tracking by preventing
protection techniques have vastly different performances, and cookie creation and/or blocking requests to tracking services.
that the best of them exhibit a wide overlap. Next, we analyze End-users thus have many techniques available but have
the impact of privacy protection techniques on webpage quality. trouble picking one that offers good protection. Similarly, pri-
We show that automated HTML-based analysis sometimes fails to
expose quality reduction perceived by users. Finally, we provide vacy researchers often want to assess protection efficiency but
a set of usage recommendations for end-users and research rec- do not want to test all available tools. Our goal is to compare
ommendations for the scientific community. Ghostery and uBlock existing privacy protection techniques and provide efficiency-
Origin provide the best trade-off between protection and webpage based recommendations. Some work previously tried to com-
quality. Ghostery however requires a configuration step which is pare privacy protection techniques [6], [7], [8], [9], [10],
difficult for users. The RequestPolicy Continued and NoScript
extensions exhibit the best performances but reduce webpage [11], [12], [13]. Krishnamurthy et al. [6] provide the first
quality. Ghostery and uBlock Origin use manually built blocking comparison of privacy protection methods but do not evaluate
lists which are cumbersome to maintain. Research efforts should most of the browser extensions available today because they
focus on improving existing approaches that do not rely on appeared after the publication of the paper. Balebako et al. [8]
blocking lists (such as Privacy badger or MyTrackingChoices), focus on a very specific use case: behavioral advertising.
and automatically building reliable blocking lists.
Mayer et al. [7], Hill [9], [10] and Traverso et al. [13]
evaluate 4, 4, 4 and 7 browser extensions on a limited set of
I. I NTRODUCTION
websites (respectively Alexa Top 500, 45, 84 and 100 italian
The huge growth of the Internet comes along with an ever- URLs). Wills et al. [11] crawl a thousand of websites but
increasing advertising market. Internet users access content use only six browser extensions. Merzdovnik et al. [12] use
provided for free by publishers. Consequently, publishers five browser extensions. Furthermore, these studies do not
monetize their audience through advertisement. Companies use the same metrics, it is thus difficult to compare their
thus buy online exposure to promote their products. In order results. Our extensive coverage improves the state of the art
to maximize advertisement efficiency, advertisers tailor ads to regarding measurement methodology and error, and evaluated
techniques.
Our contributions are the following. First, we propose a
robust methodology to compare privacy protection techniques
against third party tracking when crawling many websites. To
this end, we reuse the privacy footprint [1] and apply the
Kolmogorov-Smirnov test on browsing metrics. This test is
likewise applied to HTML-based metrics to assess webpage
quality degradation. We also design a manual analysis to
complement HTML-based metrics. Second, we analyze the
Name License User number (K) Version The next subsections provide a breakdown of browser-related
AdBlock Plus [16] GPL v3 13,917 2.7 techniques that we compare in this work. Unless specified
Blocking lists

uBlock Origin [17] GPL v3 3,759 1.6 otherwise, these extensions are open source.
Ghostery [18] Proprietary 966 5.4.10
Disconnect [19] GPL v3 204 3.15.3 1) Extensions: We classify extensions regarding used third
NoTrace [20] - 0.8 2.4 party tracking impediment methods: blocking lists, heuristics,
DoNotTrackMe/Blur [21] Prop. 86 6.0.2091 indiscriminate blocking, or other.
BeefTaco [22] Apa. 2.0 10 1.3.7.1
The following browser extensions use blocking lists made
Ind. Heur.

Privacy Badger [23] GPL v3 205 1.0.6 of regex-based rules on domain names. These lists are usually
MyTrackingChoices [22] - 0.1 1.0
community maintained. Ghostery [18] is a proprietary, privacy-
NoScript [24] GPL v2 1,765 2.9
RPC [25] GPL v3 6 1.0
focused extension that uses a specific tracker blocking list.
It has recently been bought by the privacy-focused browser
HTTPSEverywhere [26] GPL v2 353 5.2.5
Other

Decentraleyes [27] MPL 2.0 69 1.2.2 Cliqz [30]. Ghostery requires a configuration step to select
WebOfTrust [28] GPL v3 315 - categories of trackers to block. uBlock Origin [17] is a general
purpose blocker. It can also understand the syntax used by
TABLE I the famous ad-blocker AdBlock Plus. uBlock Origin includes
P RIVACY PROTECTION TECHNIQUE CHARACTERISTICS . H EUR .: by default: EasyList, EasyPrivacy, Peter Lowe’s Adservers,
HEURISTICS ; I ND .: I NDISCRIMINATE ; P OPULARITY (K): POPULARITY IN
THOUSANDS .
Malware domains and some specific lists. Disconnect [19] is
another blocker. Blur [21] (also known as Do Not Track Me)
is a proprietary extension owned by Abine. It blocks trackers,
protects mail addresses and passwords. NoTrace [20] uses a
blocked resource overlap between privacy protection tech- wide range of techniques in order to enhance privacy on the
niques, and compare their performances. Third, we assess the Internet. NoTrace needs to be configured after installation.
impact of privacy protection techniques on website quality. Fi- One can also block tracking by setting opt-out cookies that
nally, we provide recommendations for end-users and scientific block domains from setting standard cookies. Several entities
community. [31], [32] provide opt-cookie lists. BeefTaco [33] creates
opt-out cookies in the browser. Another privacy protection
II. BACKGROUND
approach uses ad-blockers to hinder tracking in the same way
This section presents an overview of third party web track- they block advertisements. AdBlock Plus [16] is the most
ing and existing privacy protection techniques. We refer the popular ad blocker. Using specific lists, it can block trackers,
reader to [14], [15] for a more complete description of these social widgets, or malwares. Since 2011, AdBlock Plus is
aspects. commercially exploited by Eyeo which monetizes domain
whitelisting [34]. In addition to the ad-focused EasyList [35]
A. Third party web tracking that is used by default in AdBlock Plus, there is a privacy-
Websites massively rely on advertisement to monetize user focused list called EasyPrivacy [35].
visit. Advertisers purchase ads for their products directly from Unlike previous extensions relying on blocking list, some
publishers or ad exchanges. When users access websites, use heuristics to block trackers. Privacy Badger [23] is devel-
they communicate with all these actors. From the users’ oped by the Electronic Frontier Foundation and its behavior
viewpoint, these entities belong to two categories: first and is further described in § V-B. By default, Privacy Badger
third parties. First parties are the entities that users intend uses the Do Not Track HTTP header [5] and strips the
to reach, here, publishers. Third parties can be advertisers, referrer field in HTTP requests. MyTrackingChoices [22] is
ad exchanges or others actors that provide services to first an advertisement friendly privacy protection extension. It uses
parties (such as web analytics). Using techniques such as an hybrid approach which leverages both heuristics similar to
cookies, local data storage or fingerprinting, third parties can Privacy Badger and blocking list for bootstrapping. Users can
identify users across websites. Combining HTTP referrer field allow tracking for some website categories.
and user identification, they can reconstruct users’ browsing Some extensions indiscriminately block resources NoScript
history. Using cookie syncing, trackers can also exchange user [24] disables JavaScript. As a side effect, this also disables
identification and thus improve data collection. some tracking. RequestPolicy Continued [25] blocks all third
party requests.
B. Privacy protection techniques Finally, some extensions use other mechanisms to pro-
Several techniques have been designed to protect users tect users’ privacy. HTTPSEverywhere [26] tries to replace
from third party tracking. Network-based techniques use DNS HTTP connections with HTTPS ones if HTTPS is supported.
filtering or proxy. They however exhibit several shortcomings: WebOfTrust (WOT) [28] provides website rating regarding
proxies cannot analyze encrypted traffic while DNS filter- trustworthiness and child safety. It was temporarly removed
ing only blocks entire domains [12]. User agent spoofing from extensions stores when it was revealed that WOT broke
browser extensions may also improve privacy by hindering privacy rules of browser developers [36]. Decentraleyes uses
fingerprinting, but exhibit poor performance in practice [29]. local files to emulate resources, e.g. freely available JavaScript

2
libraries, hosted on centralized entities. This blocks tracking lists (see Table I). Some proposals automatically build tracking
from these entities. Table I provides popularity (measured as blocking lists using specific keys in URL that correspond to
the number of users for Firefox), source code license and user identifying data [52], machine learning on DOM structure
version of the extensions used in our work. [68], Javascript [53], [55] or user browsing behavior [69]. The
2) Browsers: Some browsers also have built-in privacy same machine-learning-based approach was also applied to ad-
protection features. DoNotTrack [5] is a field in the HTTP blocking list building using network traffic features [51].
header that asks the destination not to track the sender. It was
proposed in 2009 and is currently under standardization in C. Privacy protection techniques comparison
W3C. This feature is turned off by default in Firefox. The While some extensions are used in almost all studies (e.g
last proposed amendments to the European Union’s ePrivacy AdBlock Plus [16] and Ghostery [18]), many of them are
Regulation hints at an increase interest of policy makers seldom employed. Similarly, some works compare between
towards DoNotTrack [37]. Firefox [38] can block cookies: four and seven extensions [7], [8], [9], [10], [11], [12], [13]
either all or only third parties’ ones or only third party cookies while another focuses on two [6]. Table II describes how
from previously visited domains. A tracking protection [39] existing web privacy-related work used or analyzed existing
has also been added in Firefox version 42. Brave [40] and privacy protection techniques.
Cliqz [41] are browsers that emphasize their privacy protection Features used to compare privacy protection techniques are
features. very diverse and thus complicate result comparison across
work. Some works compare privacy protection techniques in
III. R ELATED WORK terms of HTTP request number [7], [9], [10], [12], cookie
A. Tracking on the Internet number [7], [8], [9], [10], domain number [9], [11], [13],
The seminal contribution in the field of web privacy was by private information diffusion [6], and occurrence of behav-
Krishnamurthy et al. They first studied the diffusion of privacy ioral advertising [8]. Some counter-measure proposals are
information [1], then, analyzed the impact of counter-measures also comparing their approaches to others regarding tracker
on this diffusion and webpage quality [6], and observed the blocking [49], [50], [39] or impact on browser performance
increase of user-data aggregation by a small number of entities [22]. Table III lists metrics and features leveraged by existing
[58]. Several works then tried to analyze this phenomenon web privacy-related studies.
with more details by classifying tracking [59] or analyzing Our work is close to [6], [7], [8], [9], [10], [11], [12], [13]
geographical variations of tracking [43], [44]. but improves the state of the art along four axes: protection
While early studies focused on cookie-based tracking, two techniques, target websites, metrics, and reliability. First,
main new tracking techniques trends emerged: resilient in- we compare more protection techniques (15 and additional
browser data storage [3] and browser fingerprinting [4]. Local combination of blocking lists), than Krishnamurthy et al. [6]
data storage allows one to store data in multiple locations (2), Mayer et al. [7] (4), Balebako et al. [8] (4), Hill [9],
(for example, Flash Local Storage Object, HTML5 or ETags [10] (4), Wills et al. [11] (6), Merzdovnik et al. [12] (5)
among other means) on a device to bypass standard cookie and Traverso et al. [13] (7). It is difficult to compare the
removal. This technique thus provides resilient local identifier extensions covered by our work with Krishnamurthy et al. [6]
storage. By using browser fingerprint, a tracker can completely since the ecosystem was much simpler at the time (AdBlock
avoid using and storing an identifier inside the browser and Plus and NoScript were the only extensions available). We
instead recognize a browser, or the device it is installed did not use some extensions that were addressed in previous
on, using its characteristics such as fonts [60], battery [61], studies because they are either, not supported anymore (e.g.
canvas [62] or hardware level features [63]. Several studies TACO which was owned by Abine [8], [56]), or not available
then tried to detect fingerprinting [64], or both local storage for Firefox (AdBlock [46], [22], [11], Superblock Adblocker
and fingerprinting [2], [48]. Furthermore, Lerner et al. [65] [22], Adremover [22] and Adblock Pro [22]). Furthermore,
show that local storage fingerprinting increased between 1996 AdBlock (resp Firefox Tracking Protection [39]) uses the same
and 2016. Another privacy threat is the practice of sharing blocking list as AdBlock Plus (resp. Disconnect). By analyzing
user identification across tracking entities. This is called ID- AdBlock Plus and Disconnect, we thus provide performance
sharing, cookie-syncing or cookie-matching. While the phe- bounds on these two techniques. Achara et al. [22] and Wills
nomenon has been documented for some time [66], recent et al. [11] also evaluated the performance of an ad-blocking
work analyzed its prevalence in the wild and its impact on extension that we do not address in this study: AdGuard
user browsing history sharing among trackers [2], [67], [48]. Adblocker. Second, we us more websites than most existing
Finally, the feasibility of user surveillance through bulk work. We use the Alexa Top 1000 while Mayer et al. [7],
passive network traffic monitoring-based cookie observation Hill [9], [10] and Traverso et al. [13] use the Alexa Top 500,
has also been analyzed [47]. 45, 84 and 100 URLs, and 100 italian URLs. Balebako et al.
discussed five browsing scenarios that use a small number of
B. Automated blocking list building websites to assess the occurrence of behavioral advertising,
Many privacy protection tools have been proposed (see we thus cannot compare the websites we use. We crawled a
§ II-B). Most of these tools use manually maintained blocking number of websites similar to Wills et al. [11] but analyzed

3
Type Authors & references Extensions Browsers

Opt-out cookies [31], [32], [33]


AdBlock Plus EL [16], [35]

AdBlock Plus EP [16], [35]

DoNotTrackMe/Blur [21]
MyTrackingChoices [22]

HTTPSEverywhere [26]
Privacy Badger [23]

Decentraleyes [27]
uBlockOrigin [17]

WebOfTrust [28]
Disconnect [19]
Ghostery [18]

NoScript [24]
AdBlock [42]

NoTrace [20]

Firefox [39]
RPC [25]

DNT [5]

Cliqz
Krishnamurthy et al [1]
Castelluccia et al [43]
Web Fruchter et al [44]
privacy Carrascosa et al [45] +
analysis Metwalley et al [46] · · · ·
Englehardt et al [47] + + + +
Englehardt et al [48] + + + +
Malandrino et al [49] + + + + +
Privacy Malandrino et al [50] + + + + +
protection Kontaxis et al [39] + +
Achara et al [22] + + + + +
Gugelmann et al [51] +? +?
Tracker Metwalley et al [52]
detection Wu et al [53] ?
Yu et al [54] + + +
Ikram et al [55] + + + + +
User Leon et al [56] + + + +
ana. Malandrino et al [57] +
Krishnamurthy et al [6] + +
Mayer et al [7] + + + +
Comp. Balebako et al [8] + + + +
Hill [9] + + + +
Hill [10] + + + +
Wills et al [11] + + + + + +
Merzdovnik et al [12] + + + + +
Traverso et al [13] + + + + + + +
Our work - ∼ + + + + + + + + + + + + + + + + ∼
TABLE II
E XTENSIONS USED IN PREVIOUS PUBLICATIONS (· FOR USAGE IN THE WILD , FOR GROUND - TRUTH , FOR GROUND - TRUTH AND USAGE , + FOR
COMPARISON , ⊕ FOR COMPARISON AND GROUND - TRUTH , ? FOR ML BOOTSTRAPPING , ∼ MEANS THAT WE DID NOT EVALUATE THESE TECHNIQUES BUT
THAT THEIR RESULTS CAN BE DERIVED FROM OUR WORK : A D B LOCK USES THE SAME DEFAULT BLOCKING LIST AS A D B LOCK P LUS AND F IREFOX
T RACKING P ROTECTION USES THE SAME BLOCKING LIST AS D ISCONNECT ).

many more protection techniques. We use a smaller number of crawls for each URL but does not provide any justification
crawled websites than Merzdovnik et al. [12] (who uses Alexa for this number. To the best our knowledge, we here propose
Top 100000) because some of our metrics (such as number of the first measurement error analysis for privacy protection
cookies) require a stateful crawl and thus forbid parallelism. technique comparison (see § V-A).
Third, we use more metrics (see Table III) than any existing
work. We are thus able to provide a better description of IV. M ETHODOLOGY
techniques’ performance. We did not compute the host number This section presents data collection and used metrics for
metric used by Hill [9], [10] because it is very similar to the privacy protection and webpage quality.
number of domains. This metric actually reflects the internal
A. Data collection
network architecture of trackers, which is not relevant for
privacy protection comparison. Finally, none of these work We use OpenWPM [48], an open-source framework written
performs several measurements on each website to remove in Python that relies on Selenium for browser automation.
measurement error, except Mayer et al. [7] who perform three OpenWPM supports Firefox and provides some extensions.
With minor modifications, we add extensions presented in

4
Type Authors & references Privacy protection Webpage quality Browser

Use-case specific metric


Browser memory usage
# third party requests
# third party cookies

# first party requests


# 3rd party domains

Browser CPU usage


Privacy footprint

Manual analysis
Image data size
Script data size
TP/FP/FN/TN

Loading time
# domains

# requests
# cookies

# image
# script
# hosts

Data
Krishnamurthy et al. [1]
Web Fruchter et al. [44]
privacy Carrascosa et al. [45]
analysis Englehardt et al. [47]
Englehardt et al. [48]
Malandrino et al. [49]
Privacy Malandrino et al. [50]
protection Kontaxis et al. [39]
Achara et al. [22]
Tracker Yu et al. [54]
detection Ikram et al. [55]
Krishnamurthy et al. [6]
Mayer et al. [7]
Comparison Balebako et al. [8]
Hill [9]
Hill [10]
Wills et al. [11]
Merzdovnik et al. [12]
Traverso et al. [13]
Our work -
TABLE III
M ETRICS USED FOR PRIVACY PROTECTION TECHNIQUE COMPARISON ( FOR METRIC USED AND FOR METRIC USED WITH BREAKDOWN ).

Table I and § II-B. This study has been conducted with Firefox A protection technique is effective if the number of first
45. party requests is not impacted, and the number of third party
We crawl the highest-ranked websites by Alexa [70]. Unless requests decreases.
stated otherwise, we perform measurements in August 2017, The third metric is the number of third party domains
using IP addresses located in Japan. Typically, a crawl on accessed during a crawl. This is complementary to the number
the Alexa Top 1000 (§ VI-A) takes about three days with of third party requests. It provides a metric on the number of
commodity hardware. blocked entities. There may be a few domains that generate
B. Privacy protection a lot of requests, or a many domains that produce a few
requests. Efficient protection techniques reduce the number of
In this section, we present our robust methodology to third party domains.
compare privacy protection techniques across several websites
This is however not sufficient since third party requests
and using several metrics. We also present privacy footprint
are not always used to track users, they can also provide
[1].
content to users (e.g. media resources) or contact non-tracking
1) Browsing metrics: As explained in § II-A, privacy leak-
third parties (e.g. website analytics). The fourth metric is the
age occurs through communications with trackers, and local
number of the profile cookies obtained from a stateful crawl.
data storage (e.g. cookie) allows trackers to easily identify
users across website. We consider five simple browsing-related Protection techniques with good performances diminish the
metrics that reflect the ability of protection techniques to number of cookies.
hinder these two phenomena. We first focus on HTTP requests. The last metric is the total amount of data transferred.
We separately count the requests that are made to the accessed It is especially important in low-bandwidth situations for
domain (the number of first party requests), and the requests example on mobile. This metric is directly related to tracking
that are made to other domains (the number of third party and advertisement. Advertisement often generates considerable
requests). To this end, we use the second level domain name traffic during browsing. However, as we will see in § V-A, this
obtained from the Public Suffix List [71]. These metrics give a metric shows high variability when crawling the same website.
rough estimation on the performance of a particular extension. We thus did not use this metric in our analysis.

5
1.0 the same hosting service or Content Delivery Network (CDN)
are grouped together by ADNS because they share authoritative
0.8 DNS servers. They thus develop a new method called root.
With root, if a third party is located in a hosting service
0.6 or a CDN, the second-level domain-name of the third party
ECDF

domain is the node identifier. Otherwise, as with ADNS, the


second-level domain name of the authoritative DNS server of
0.4 the considered third party is the node.
We use three metrics that are built on the graph. (1) the
0.2 bare number of third parties reveals tracking breadth. (2) the mean
uBlock Origin number of third parties per first party corresponds to tracking
0.0 10 101
intensity. (3) the number of first parties associated with the top
1 100 102 103 10 third parties estimates how much tracking is concentrated
mean # 3rd party requests on the most prominent third parties.
Fig. 1. Example of ECDF of the mean number of third party requests for C. Webpage quality
all websites in the Alexa Top 1000 with two configurations: default Firefox
(bare) and Firefox with uBlock Origin. The arrow represents the KS statistic. Privacy protection techniques prevent privacy informa-
tion leakage by hindering of communications with trackers
and blocking cookie creation, among other techniques. This
2) Kolmogorov-Smirnov test-based browsing metrics com- may however have negative side-effects on webpage quality.
parison: While the metrics introduced in § IV-B1 provide Blocked third party domains may actually host images or
a good estimation of the privacy protection for a particu- JavaScript that are needed to render the webpage correctly. We
lar website, summarizing this aspect for a set of websites first analyze the impact of privacy protection on browsing data
is non-trivial. Figure 1 presents two empirical cumulative in an automated fashion. We then perform a manual analysis to
distribution functions (ECDFs) built on the mean number assess privacy protection repercussion on rendered webpage.
of third party requests sent, for each website of the Alexa 1) HMTL-based metrics: Our goal is to detect layout dif-
Top 1000 world. We use the mean of each metrics for ten ferences or missing elements that may reduce webpage quality
crawls to reduce measurement error (see § V-A). We here both in terms of rendering quality and functionality. We crawl
use two Firefox configurations: one without any extension HTML data using OpenWPM. We devise several metrics that
(bare) and one with uBlock Origin. To determine whether analyze the browsed page. The first metric is the HTML page
one Firefox configuration exhibits the same performance as size. We then derive two metrics from the crawled HTML data
another, we use the Kolmogorov-Smirnov (or KS) test. It is a itself: number of images and scripts. The last two metrics are
non-parametric test that does not make any assumption on the the total size of images and scripts. For all metrics, a reduction
underlying distributions which fit our context. This test relies is associated with a webpage quality decrease. We here reuse
on the KS-statistic which is the maximum difference between the metric comparison method presented in § IV-B2.
two cumulative distributions. The KS statistic is represented 2) Manual analysis: By analyzing HTML, we are able to
by the arrow on Figure 1. The null hypothesis is that both determine how many elements and how much data is missing
ECDFs belong to the same distribution. In our case, this when privacy protection is used. This, however, does not
means that both configurations have the same performance. assess how well the webpage is rendered inside the browser.
We use a significance level α of 0.05. If the p-value of the We thus perform a manual analysis to determine webpage
KS-test is smaller than α, we consider the two ECDFs as quality obtained with privacy protection techniques. Webpage
distinct. In other words, the two considered configurations screenshots are captured using OpenWPM. First, we want
have performances that are statistically different. to determine whether privacy protection techniques impact
3) Privacy footprint: In this work, we also use the privacy webpage layout. Our first question thus is: “Please rate layout
footprint proposed by Krishnamurthy et al. [1]. The privacy similarity between Bare (left) and Extension (right).”. Good
footprint represents interactions between first parties and third layout similarity, however, does not guarantee lack of missing
parties (i.e. potential trackers) in a graph. For each third party elements. The next step is to compare webpage elements when
accessed by the user when visiting a first party, an edge is privacy protection is used and not used. We want to avoid
added between the nodes representing the considered first and comparing elements of the same webpages that may change
third parties. Privacy footprint thus captures both the potential for different users or crawling time (e.g. news items). We thus
information leaking from first parties, and the aggregating ask user to focus on webpage elements that are part of the
behavior of third parties. Krishnamurthy et al. [1] used the user interface. Our second question thus is: “Please rate the
second-level domain name of the authoritative DNS server of proportion of elements (image, text frame, widgets, etc.) of the
third parties to group domains in the same entity on the same user interface (e.g.: login button, tabs, etc, but not news items
node in the graph. This method is called ADNS. Krishnamurthy or pictures) in Bare (left) that are also in Extension (right)”.
et al. [58] then noticed that unrelated third parties located on For both questions, users give a rating between 0 and 10, 0

6
80 1.0
# 1st party requests
70 # 3rd party requests
Relative standard error of the mean (%)

# 3rd party domains 0.8


60 # bytes received
50 0.6

ECDF
40

30 0.4

20 # 1st party requests


0.2
# 3rd party requests
10
# 3rd party domains
0 0.0
10-4 10-3 10-2 10-1 100 101
5 10 15 20 25 30 35 40
Number of measurements Relative standard error of the mean (%)
Fig. 2. Relative standard error of the mean of browsing metrics in function Fig. 3. ECDF of the relative standard error of the mean of browsing metrics
of the number of measurements when accessing www.yahoo.jp. Solid lines on the Alexa Top 1000 websites crawled ten times. Vertical dashed lines
represent the median, dashed lines correspond to 5 and 95-centiles. represent the relative standard error for www.yahoo.jp.

being the worst and 10 the best. Our manual analysis was the Alexa Top 1000 crawled 10 times. Most websites have
performed by six users. a relative standard error smaller than that of www.yahoo.jp
V. M EASUREMENT PARAMETERS (here in dashed lines). Overall, 99% of websites have a relative
standard error smaller than 1% for all observed features.
We address two measurement parameters: the impact of
crawling parameters on measurement error, and the training B. Privacy Badger training
of Privacy Badger.
Privacy Badger employs heuristics to determine if a domain
A. Impact of crawling parameters on measurement error is performing tracking (see § II-B1). Privacy Badger measures
During our preliminary experiment, we notice that measure- the number of times that a domain reads a cookie as a
ment results are not stable. Querying twice a website usually third party. When this number is greater than three, the
yields two different metric values. To determine the appropri- considered domain is blocked. On top of this, cookies are also
ate number of measurements that generates a small error, we whitelisted using heuristics. This behavior causes a freshly
conducted a detailed study on the website showing the high- installed Privacy Badger not to block any domain. As the
est variability in our preliminary experiment: www.yahoo.jp. user browses websites, more and more domains are blocked.
These measurements were performed is November 2016. Vary- To fairly evaluate Privacy Badger, we analyze the impact of
ing the number of measurements, we computed the relative Privacy Badger training. In other words, we intend to quantify
standard error of the mean of four browsing metrics. how many websites need to be accessed to ensure that Privacy
Figure 2 shows that the relative standard error decreases Badger is completely trained.
when the number of measurements increases. Performing ten If we were to determine the number of websites regular
crawls reduces the relative standard error on the number of first users need to browse to train Privacy Badger, we should use
party requests, third party requests, and third party domains to a browsing model such as AOL search logs [73] as Roesner
less than 5%. The number of bytes received, however, still has et al. [59] do. Our use-case here is however to ensure that
a large error, we thus discard this metric. One possible reasons Privacy Badger is trained for a specific set of websites. We
for this measurement error is the webpage advertisement thus measure Privacy Badger’s performance on the Alexa Top
churn. Guha et al. [72] analyze ad churn during page reloads. 100 websites after training. These crawls were performed is
They show that the number of unique new ads increases November 2016. This training consists of several repeated
quickly during the first ten reloads but only linearly thereafter. crawls targeting the same set of websites and performed
This further justifies our choice to use ten measurements. An without reinitializing the user profile, i.e. Privacy Badger
existing comparison of privacy protection techniques [7] uses continues its training. We use zero to five successive training
three crawls for each website and extension. They however crawls and then perform a single measurement crawl.
do not motivate this choice nor provided measurement error The results of this experiment are shown in Figure 4. The
analysis. dashed line corresponds to a reference: crawling results of
Figure 3 is the ECDF of the relative standard error of the a bare browser. KS test groups the six measurements with
mean of the number of first and third party requests, and Privacy Badger together, but separates them from the bare
of the number of third party domains for all websites in browser. Without training, Privacy Badger is already able to

7
# 3rd party req. # 1st party req.
3400
(EasyList, EasyPrivacy, Peter Lowe’s Ad Server list and Mal-
3200 ware domains). One configuration uses NoTrace with the high
3000
2800 preset (the default configuration has no effect on our metrics).
2600
The last 14 configurations are extensions with their default
8000 setting. We do not use the Firefox Tracking Protection [39]
7500
7000
6500 because OpenWPM does not support it yet. However, this
6000
5500 technique uses the Disconnect blocking list. This means that
5000
by analyzing Disconnect, we provide a performance bound on
# 3rd party dom.

340 Firefox Tracking Protection’s performances. The results are


320
300
280 shown in Figure 5. The number on top of technique name is
260
240
220 the KS-based rank (see § IV-B2).
All techniques have a limited impact on first party requests.
1000
950 The most aggressive technique is NoScript [24]. This is due
# cookies

900
850
800 to the fact that Javascript is pervasive on Internet. Other
750
700
650 techniques exhibit a limited impact on th enumber of blocked
0 1 2 3 4 5
# of training crawls first party requests.
Fig. 4. Performances of Privacy Badger depending on the number of crawl Except for the number of cookies, the two most effective
performed for its training. The horizontal dashed line represents a default
Firefox as reference. extensions are RequestPolicy Continued and NoScript which
reduce the number of third party HTTP requests by 96% and
87%. However, as shown in § VI-B and other studies [6], [50],
provide partial protection. After a single training pass, the they strongly impact webpage quality.
number of third party requests is reduced by 10.3%. The Among other techniques, Ghostery [18] and uBlock Origin
results however do not improve when the number of training [17] provide the best performances. They reduce the number
crawl increases. We also ran a single training crawl on the of third party HTTP requests by 58% and 55%. Ghostery
Alexa Top 500 websites and observed similar performance as however needs to be configured which is difficult for users
with a single training crawl. The training is thus complete [56]. AdBlock Plus [16] uBO-like has similar performances
with less than one crawl (here one hundred websites). My- with uBlock Origin and Ghostery. Our results show that
TrackingChoices uses the same principle but without Privacy users can manually setup AdBlock to obtain good results,
Badger’s cookie data-based white-listing heuristics. One can unfortunately this is difficult [56]. Furthermore, AdBlock Plus
thus expect MyTrackingChoices to train faster. Moreover, used with EasyList is much more CPU and memory-hungry
MyTrackingChoices is provided with a small bootstrapping than uBlock Origin [22].
tracker list which should further reduce training time. For the Another group of techniques provides average perfor-
remainder of this paper, we train Privacy Badger using a single mances: Disconnect (and Firefox Tracking Protection), Ad-
crawl on the considered Alexa Top websites. Out of the four Block Plus (with EasyList and EasyList + EasyPrivacy),
existing work that uses Privacy Badger [55], [9], [12], [13], Privacy Badger, Blur, BeefTaco and NoTrace. Their results
only two actually peform some training [9], [12]. We here are mostly consistent across third party requests and domains,
provide the first estimation of how much website needs to be and cookies. Privacy Badger [23], Blur [21], Disconnect [19]
visited to complete Privacy Badger’s training. and AdBlock Plus with the EasyList block a relatively large
number of third party HTTP requests (between 33% and 48%),
VI. R ESULTS but a fairly small number of third party domains (between 8%
and 33%). We hypothesize that these techniques block the
A. Privacy protection
main trackers but fail to impact the tracking long-tail [48].
We compare protection techniques, first in terms of perfor- MyTrackingChoices performances are worse than untrained
mance and then, regarding their blocking overlap. Privacy Badger. We hypothesize that this is due to a GUI
1) Browsing metrics: We perform a general comparison bug that forbids users from customizing website categories
of privacy protection techniques. We crawled the Alexa Top where blocking occurs. MyTrackingChoices thus always block
1000 World using 21 different browser configurations. One tracking on 13 websites categories out of 32. BeefTaco has a
configuration is the default setup of the Firefox browser (Bare). noticeable impact on cookies but no impact on other metrics.
One configuration is Firefox setup with the DoNotTrack This is consistent with previously observed tracking long tail
HTTP header. Two configurations use Firefox’s ability to [48]. This phenomenom makes the opt-out cookie maintenance
block cookies: third party ones or third party cookies except very difficult while the protection that they offer is limited
from previously visited websites. We do not use complete to cookie blocking. NoTrace [20] provides average protection
cookie blocking because we hypothesize that users want to despite being setup at its highest level. NoTrace increases
keep using cookie for some domains. Three configurations users’ online privacy awareness [57], but unfortunately, users
set up AdBlock Plus with various lists: EasyList, EasyList cannot reliably setup the extensions [56] which is required.
and EasyPrivacy and a filter set similar to uBlock Origin Remaining techniques provide poor protection. Decen-

8
35000
30000 80000

Nb requests 3rd party


Nb requests 1st party

25000 60000
20000
15000 40000
10000
20000
5000
0 0
d) - 7

3PC E 7

Blur - 5

WOT - 11
PB - 7
FF e - 7
FF no DNT - 7
Dece PC - 7
Be ntr. - 7

WOT - 7
-7
HTTP MTC - 6
Disco Eve. - 6
NoTr ct - 5

ABP 4
Ghos EL - 4
ABP tery - 4
uBloc EL+EP - 3
ABP ukOrigin - 3
RPC - 3
ript - 2
1

Dece - 10
FF Batr. - 9
FF e - 9
FF no DNT - 9
9

MTC 8
HTTP PC EV - 9

-8
BeefT . - 9

PB (t PB - 7
A )-6
DiscoBP EL - 6
t-5
ABP E Blur - 5
uBloc L+EP - 4
ABP ukOrigin - 3
Ghos like - 3
NoSc ry - 3
RPC - 2
1
FF no efTaco -

ace -

e-

FF no 3PC -

aco -

ript -
V

S Eve

d
nnec
r

Bo-lik

r
ace
nne

te
raine

FF Ba

raine
n
3

NoSc

Bo-
NoTr
S

3
PB (t

(a) # first party requests (b) # third party requests

50000
2000
40000
1500
Nb domain

Nb cookie
30000
1000
20000
500
10000
0 0
MTC 14

FF no no 3PC - 1
Dece OT - 15
FF Ba r. - 14
FF e - 14
HTTP DNT - 14

- 13
BeefT ve. - 14

NoTr PB - 12

10
PB (t C EV - 10
Disco ined) - 9
Blur - 8
ABP 7
Ghos EL - 6
ABP E tery - 5
uBloc L+EP - 4
ABP u Origin - 3
NoSc ike - 3
RPC - 2
1
FF ace - 1

MTC 13
FF Ba T - 15
De re - 14
HTTP centr. - 14

- 12

A - 10
FF DNe. - 14
BeefT T - 14

NoTr PB - 11
PB (t BP EL - 9
3PC E 8
V-7
Disco Blur - 6
FF no nect - 5
ABP 3PC - 4
uBloc EL+EP - 3
Ghos igin - 2
ABP u tery - 2
NoSc ike - 2
RPC - 1
1
t-

ript -
aco -

FF no rained) -

ript -
nnec

aco -
r

ace
nt

Bo-l

WO
W

Bo-l
SE

S Ev

kOr
ra

n
3P

(c) # third party domains (d) # cookies


Fig. 5. Performance comparison of protection techniques regarding the mean # first party request, mean # third party requests, mean # third party domains
and # cookies. Each metric value corresponds to the total obtained by crawling the Alexa Top 1000 websites. The mean and standard deviation are computed
on ten crawls. The number on top of technique name is the KS-based rank. Bare: Firefox alone, MTC: MyTrackingChoices, PB: Privacy Badger, FF no 3PC:
Firefox no 3rd party cookie, FF no 3PC EV: Firefox no 3rd party cookie blocking except from previously visited domains, ABP: AdBlockPlus, EL: EasyList,
EP: EasyPrivacy, RPC: RequestPolicy Continued . AdBlock Plus uBo-like uses all uBlock Origin’s lists except uBlock Origin’s specific ones; it thus loads
EasyList, EasyPrivacy, Peter Lowe’s Ad Server list and Malware domains.

traleyes [27] has almost no effect on the results. We hypoth- cookies also slightly reduces the number of third party re-
esize that blocking library loading requests only marginally quests. As hypothesized in [48], this may be due to impeded
impacts HTTP traffic. The DoNotTrack HTTP header [5] also cookie-syncing interactions. However, contrary to Englehardt
has barely no effect. If trackers complied with DoNotTrack, et al. [48], third party cookie blocking here has poor perfor-
the number of cookies would significantly drop. We thus mances. This may be due to the fact that, unlike [48], we
conclude that DoNotTrack is not respected by most trackers. perform a stateful crawl. In a previous study [47] that uses
WebOfTrust and HTTPSEverywhere have no effect. stateful crawl and OpenWPM, third party cookie blocking also
We note that WebOfTrust and NoTrace both yield signifi- exhibits poor performances. The metric used in this study is
cantly more third party requests than Firefox without any ex- however extremely specific, it thus is very difficult to compare
tensions. WebOfTrust also contacts more third party domains our results with theirs. Furthermore, some third parties from
and increases the number of cookies. WebOfTrust annotate the Alexa Top 1000 are first party at some point (e.g. Twitter,
hyperlinks in webpages with trust rating. We hypothesize that Facebook). Their cookies are thus created as first parties.
this rating is obtained from WebOfTrust servers and thus 2) Privacy footprint: We then compared extensions using
impact our metrics. We do not have any explanation regarding the privacy footprint (see § IV-B3 and [1]). Figure 6 presents
NoTrace’s behavior. these results. They are similar to browsing metrics’ Figure 5.
When we block third party cookies in Firefox, there is a Six extensions exhibit much better results than the rest:
reduction in the number of third party domains. Blocking RequestPolicy Continued, NoScript, Ghostery, uBlock Origin

9
# 3rd party Mean # 3rd party by 1st party only blocked by Ei : |Bi − {Bk ∀k 6= i}|.
# 1st party link. to top 10 3rd parties As an example, we see AdBlock Plus on the second line
1000 of Fig. 7a. Square surfaces on this line represent the number
of third party requests blocked by AdBlock Plus: 1356. The
800 first square on the considered line represents the intersection
with Privacy Badger which is encoded by the surface of the
600
inner purple square, here 678. The green surface outside the
400 purple square describes the resources blocked by AdBlock
Plus but not by Privacy Badger. The next square on this line is
200 located on the diagonal, and represents the resources blocked
by AdBlock Plus only. Remaining squares on this line depicts
0 intersections with Disconnect, Ghostery and uBlock Origin in
WebOfTrust

NoTrace
MyTracking
FF Bare
BeefTaco

FF no 3rd
Everywhere

cookie EV
FF DNT
Privacy
Badger
ABP EL

Disconnect
Blur

Ghostery
NoScript
Decentraleyes

ABP EL+EP
ABP uBo-like
uBlockOrigin

Request Policy
Privacy Badger
no 3rd
FFcookie

Choices
the same fashion as the intersection with Privacy badger is

Continued
HTTPS

(trained)

represented (see above).


The overlap of the third party requests (Fig. 7a) shows that
there is a big overlap across all extensions. All third party re-
Fig. 6. Privacy footprint [1], [58] for the Alexa Top 1000 world. The mean quests blocked by Privacy Badger are also blocked by uBlock
and standard deviation are computed on the metric value of the ten crawl. EV: Origin. Similarly, Ghostery almost completely covers Privacy
Firefox no 3rd party cookie blocking except from previously visited domains, Badger. Despite using two different blocking techniques, lists
ABP: AdBlockPlus, EL: EasyList, EP: EasyPrivacy.
and heuristics, these extensions exhibit a significant overlap.
This proves that the Privacy Badger’s heuristics are reliable.
Results on the third party domains on Fig. 7b are similar
and AdBlock Plus with uBlock Origin’s list. For RequestPolicy to Figure 7a. The major difference is that Privacy Badger’s
Continued, the mean number of third parties per first party is impact is smaller. This is consistent with the heuristics be-
very small while the total number of third parties is much havior (see § V-B) that only block third party domains that
higher. This means that some third parties are detected but are seen across several websites. These heuristics thus have
that they are present on a limited set of first parties. NoScript trouble blocking less prominent trackers [48].
results are here close to Ghostery’s. The hierarchy between Both figures illustrate that Ghostery and uBlock Origin
Ghostery, uBlock Origin and the customized AdBlock Plus is block many specific resources that other extensions do not
here very clear and follows this order. We hypothesized that block. We speculate that this is due the fact that their blocking
the uniqueness of Ghostery and uBlock Origin blocking lists list settings are unique. This may explain why they exhibit the
(see § VI-A3) is the main factor of their superior performances. best performance (see § VI-A1).
3) Overlap: Next, we measure the overlap between the Some resources are only blocked by AdBlock Plus. It
five extensions among the most popular ones regarding the however uses EasyList which is also employed by uBlock
number of blocked third party requests and blocked third Origin. The ten crawls made with AdBlock Plus may have
party domains using a matrix view. We here data crawled on reached some third parties that were not contacted during
the Alexa Top 100 websites in November 2016. We do not the ten measurements made with uBlock Origin. We thus
analyze NoScript [24] (resp. RequestPolicy Continued [25]) hypothesize that this observation is another side-effect of the
here because it indiscriminately blocks Javascript (resp. third ad churn observed by Guha et al. [72].
party requests). Privacy Badger has been trained on the Alexa Summary: The most popular extensions show a wide over-
Top 100 websites and each extension is used with its default lap. Ghostery and uBlock Origin block specific resources that
blocking list. AdBlock Plus uses the EasyList. uBlock Origin are not affected by other extensions. In terms of overall privacy
loads EasyList, EasyPrivacy, Peter Lowe’s Ad Server list, protection, RequestPolicyContinued and NoScript show the
Malware domains, and some uBlock specific lists. Ghostery best performances. Ghostery and uBlock Origin protect users
is set up with all its filter categories, while Disconnect is slightly less. Remaining techniques provide average to low
employed with its own filters. protection. The DoNotTrack HTTP header provides almost no
Let us consider two extensions. Let Ei be the extension on protection.
the row i, and Ej the extension on the column j. The resources
blocked by Ei (resp. Ej ) are noted Bi (resp. Bj ). On the left B. Webpage quality
(resp. top), the green (resp. purple) square on row i (resp. We finally analyze how privacy protection techniques im-
column j) represents the total number of resource blocked by pact website quality. We use an HTML-based automated
Ei (resp. Ej ). On each line i of the matrix, the surface of the approach and perform a manual analysis.
square represents the total number of resources blocked by Ei : 1) HTML metrics: We crawl the Alexa Top 100 world on
|Bi |. The inner purple square on row i and column j represents the 21 different browser configurations of § VI-A1. We use
the intersection between Ei and Ej and its surface is |Bi ∩Bj |. the metrics presented in § IV-C1 and the ranking method
The diagonal orange square surface displays resources that are exposed in § IV-B2. Figure 8 presents the results. We discard

10
(a) # third party requests (b) # third party domains
Fig. 7. Overlap of third party requests and domains blocked by five extensions (PB: Privacy Badger, Dis: Disconnect, ADB: AdBlock Plus, Gho: Ghostery,
uBO: uBlock Origin). Square surfaces represent the metric value for the extension on the row.

the HTML size metric because there is no significant change We use the questions about layout similarity and missing
across protection techniques. NoScript has a noticeable impact elements presented in § IV-C2. Figure 9 exposes our results.
on all metrics. This is especially visible for the script number, The blue (resp. red) boxplot corresponds to the first (resp.
and image and script size. RequestPolicy Continued’s impact is second) question. Overall, Privacy Badger, Blur, Disconnect,
smaller than NoScript’s except for image size. We hypothesize uBlockOrigin and Ghostery have a very small impact on ren-
that RequestPolicy Continued blocks third party image hosting dered webpages. On the other hand, RequestPolicy Continued
services, and thus impacts many websites. NoScript greatly and NoScript significantly impact on pages both in terms of
reduces the total size of scripts which is consistent with its layout and missing elements. This is consistent with § VI-B1
goal: blocking JavaScript on webpages. Other privacy protec- and previous results [6], [50]. The manual analysis uncovers
tion techniques appear to have a limited impact of website RequestPolicy Continued’s strong impact on webpage quality
quality. Compared to other techniques, Ghostery exhibits a which was not exposed by HTML-based metrics.
smaller impact on images and a higher reduction of the number Summary: Both studies show that RequestPolicyContinued
of script and their total size. This is consistent with its tracking and NoScript reduce webpage quality. Other extensions do not
protection goal. This extension actually does not intend to have such impact.
block media such as advertisements.
2) Manual analysis: The previous subsubsection analyzes C. Summary
webpages quality in terms of HTML and associated data. This Figure 10 summarizes our findings. Each metric-based
automated approach does not necessarily reflect the quality synthetic index is build as the mean of previously used
of the rendered webpage. We perform a manual analysis to metrics (see § IV-B1 § IV-C1) normalized using the metric
cope with this limitation. We gather screenshots of the Alexa value obtained with Firefox used alone. Manual analysis-
Top 50 without pages from the same entity but with distinct based index is the mean of the mean rating of both ques-
localization (e.g. Google and Amazon). These screenshots tions. Figure 10a shows that our manual analysis captures
were gathered in November 2016. We use the seven techniques the negative effects of RequestPolicy Continued that are not
that provided the best privacy protection (see § VI-A1): Privacy visible with HTML-based metrics. Figures 10b and 10c
Badger trained, Blur, Disconnect, uBlock Origin, Ghostery, demonstrate RequestPolicy Continued and NoScript show the
NoScript and RequestPolicy Continued. For each webpage and best protection performances but impact browsing experience.
protection technique, we display a picture that contains two Ghostery and uBlock Origin protect users slightly less but have
screenshots side-to-side to the user: the left one captured with a smaller impact on webpage quality. Remaining techniques
Firefox alone, the other with Firefox used with an extension. provide average protection.

11
1e7
5000 7
6

Image data size (byte)


4000
5
# image

3000 4
2000 3
2
1000
1
0 0

RPC - 2
ABP E ined) - 2

ABP E aco - 4

NoSc ce - 3
-2
ABP u PB - 2
e-2
3PC E 2
V-2
FF no MTC - 2
BeefT PC - 2
Disco aco - 2
t-2
uBloc DNT - 2
FF Ba in - 2
B 2
Dece lur - 2
HTTP ntr. - 2
Ghos ve. - 2
WOT 2
NoTr - 2

1
1

PB (t PB - 4
FF no ed) - 4
Dece C - 4
Disco ntr. - 4
M 4
BeefT TC - 4

-4
ABP E r - 4
L-4
FF no Bo-like - 4
FF DN - 4
Ghos T - 4
FF Ba y - 4
re - 4
HTTP WOT - 4
uBloc S Eve. - 4
NoTr in - 4

2
1
ace -
FF no ABP EL -

re -

tery -

ript -

t-

ript -
RPC -
L+EP

L+EP

V
nnec

nnec

Blu

ter
Bo-lik

3PC E
3P
kOrig

kOrig
a
3

SE

rain
NoSc
F
ra

ABP u
PB (t

(a) # images (b) Image size

1e7
3.0
2500
2.5

Script data size (byte)


2000
2.0
# script

1500
1.5
1000 1.0
500 0.5
0 0.0
uBloc ace - 3

HTTP Trace - 4
-4
4
PB (t PC EV - 3
FF no d) - 3
Dece PC - 3
ABP Er. - 3
L-3
Disco PB - 3
FF Ba ct - 3
Bee re - 3
HTTP fTaco - 3

-3
MTC 3
ABP u F DNT - 3
ABP Eo-like - 3
NoTr EP - 3

RPC - 2
te 2
NoSc ry - 2
1

FF no S Eve. - 3
FF no EV - 3
BeefT PC - 3
ABP E - 3
raine L - 3
Dece d) - 3
FF Ba r. - 3
FF DNe - 3
Blur - 3
3
WOT 3
-3
Disco MTC - 2
uBloc nnect - 2
ABP E rigin - 2
ABP u L+EP - 2
Ghos ike - 2
R 2
NoSc PC - 2
1
FF no Blur -

.-

in -

ript -

T-

PB -

tery -

ript -
aco
WOT

S Eve

r
nt

nt
kOrig
nne
raine

L+
3

Bo-l
3PC
Ghos

kO
B

No
3

PB (t

(c) # scripts (d) Script size


Fig. 8. Comparison of protection techniques’ website quality regarding: number of image and scripts and size of images and scripts. We crawled the Alexa
Top 100 websites. The mean and standard deviation are computed on ten crawls. The number on top of technique name is the KS-based rank. Bare: Firefox
alone, FF: Firefox feature, MTC: MyTrackingChoices, PB (trained): Privacy Badger (trained), FF no 3rd coo. EV: Firefox no 3rd party cookie blocking except
from previously visited domains, ABP: AdBlockPlus, EL: EasyList, EP: EasyPrivacy, RPC: Request Policy Continued. AdBlock Plus uBo-like uses all uBlock
Origin’s lists except uBlock Origin’s specific ones; it thus loads EasyList, EasyPrivacy, Peter Lowe’s Ad Server list and Malware domains.

VII. D ISCUSSIONS that users are more preoccupied by blocking ads than tracking.
Malloy et al. [75] show that ad-blockers usage vary between
A. Recommendations 18% and 37% in analyzed countries (US, UK, Germany,
Our analysis shows that RequestPolicy Continued and No- France, and Canada). Metwalley et al. show [46] that 10 to
script are the most effective privacy protection techniques. 18% of users had installed AdBlock Plus while less than 3.5%
Similarly to previous work [6], [50], we confirm that they (resp. 2%) were using DoNotTrackMe/Blur (resp. Ghostery).
affect webpage quality. Users that want to avoid website Similarly, Pujol et al. [74] speculate that out of the 19.7%
breakage can instead use uBlock Origin or Ghostery. The latter of users that install AdBlock Plus, less than 15% of them
however needs to be configured which is difficult for users setup the EasyPrivacy list. We show that the default setup
[56]. We note that previous work [12], [13] also recommended of AdBlock Plus has average performances but that it can
uBlock Origin and Ghostery. Their results were however noisy be configured to achieve good results. This task is however
(percentiles are overlapping in [13] and some techniques difficult for users [56]. We thus advise ad-blocking users that
actually observed more fingerprintting than the default browser want to improve their privacy to directly change to more
in [12]) due to the use of single crawl to each website. Our efficient extensions (see above).
robust measurement methodology does not suffer from the Extensions that do not rely on manually maintained filters,
same weakness, and show that uBlock Origin or Ghostery Privacy Badger and MyTrackingChoices, exhibit average per-
exhibit statistically better performance than other techniques. formances. It however is not clear how their performances
Previous network traffic monitoring studies [46], [74] show may be improved. Analyzing extensions’ behavior against

12
Layout similarity rating tory) are less valued by advertisers. A user that blocks all third
Proportion of element missing with extension parties cookies thus reduces his customer value for advertisers,
and, consequently, publisher’s revenue. This however does not
Good

10
completely block tracking and thus, advertisement transactions
8 (e.g. bidding in RTB). Indiscriminate third party blocking
techniques (such as RequestPolicy Continued [25]) however
6 have a much bigger impact on advertising. They for example
Score

block interactions between ad exchange and bidders in the


4 case of RTB. All privacy protection techniques presented in
this work may thus have very different impact on advertising.
2
We intend to address this aspect in future work.
0 Ghostery [18] and MyTrackingChoices [22] can block track-
Bad

ing for specific website categories, and thus, allow monetiza-


er

ct

in

ry

ed licy
rip
Blu

rig
ne

te
ed dg

inu o
Sc
os
on

kO
ain a

nt st P
tion for others. This is obviously less aggressive than most
No
Tr cy B

Gh
sc

loc

Coque
Di
iva

approaches addressed in this work. Achara et al. [22] however


uB

Re
Pr

report that 30% of users block the tracking for all categories.
Fig. 9. Webpage quality manual analysis. Users are presented two screenshots These users thus have no reason to use these two techniques
obtained with Firefox alone and with a privacy protection technique. The first
question (blue) asks user to rate layout similarity. The second question (red) instead of other ones.
corresponds to the proportion of elements observed with Firefox alone also
present when a privacy protection technique is used. Screenshots are captured C. Future work
on the Alexa Top 50. Boxplots extend to the upper and lower quartile and
contain a line that represents the median. Dashed lines show min/max range. As shown in § III, existing work uses a vast array of metrics.
When not visible, median is located on top of the upper quartile. Some of them are tailored to very specific use cases (such
as behavioral advertising [8], [45] and cookie-based mass-
surveillance [47]). Similarly, data-related metrics present an
existing blocking lists would help to improve their results. obvious interest in a mobile context. We intend to add new
Blocking list-based protection techniques are efficient but their metrics to this work. We also want to analyze the impact of
scalability is limited due to human intervention. Current efforts all these protection techniques on specific attacks (browser
[68], [52], [53], [51], [55] to automatically build blocking list fingerprinting [4], [48], cookie syncing [67], [2], [76], [48]).
are thus extremely relevant. Analyzing extensions inside other browsers, such as
Ghostery and uBlock Origin block a significant amount of Chrome, Internet Explorer or Opera, would provide a wider
resources that are not blocked by other protection techniques. picture of privacy protection techniques in the wild. Sele-
Both should thus be considered as relevant when assessing nium, the browser automation framework used by OpenWPM,
users’ ability to protect themselves. supports these browsers. OpenWPM, however, currently only
supports Firefox. Furthermore, browsers include more and
B. Limitations more privacy protection (e.g. Firefox [39], Brave [40], Cliqz
In this work, we estimate the impact of privacy protection [54]). Cliqz and Brave are not supported by Selenium, which
techniques on webpage quality in terms of missing elements make both browsers impossible to use with OpenWPM (see
and data (using HTML-based metrics § IV-C1 and alteration to § IV-A). We however intend to add them later on.
the webpage layout and elements (using a user manual analysis While some works use domain name-based heuristics [77],
§ IV-C2). We however do not address the functionality loss. [67] to identify third parties, others use blocking lists-based
We intend to explore this aspect in future work along two extensions as ground-truth (see Table II). When the latter
axes. We plan to explore each website beyond its homepage method is used, tracking’s long tail [48] may cause many
to improve website coverage. false negatives. Beyond our analysis in § VI-A, we intend
Several work [59], [68], [51], [52], [53], [55] automatically to investigate blocking lists more thoroughly.
build blocking lists. We did not evaluate them because pro-
duced lists are not publicly available. VIII. C ONCLUSIONS
Evaluated approaches aim at blocking tracking. One side ef- This work extensively compares existing privacy protection
fect is advertisement blocking. These approaches thus threaten techniques against third party tracking and provides four
the current economic model of many publishers on the Inter- contributions. First, we propose a robust methodology to
net. Techniques examined in this work however do not have the compare privacy protection techniques when crawling many
same impact on third party tracking. For example, Ghostery websites, and quantify measurement error. We use the privacy
and MyTrackingChoices can block tracking for specific web- footprint [1] and apply the Kolmogorov-Smirnov (KS) test on
site categories. Similarly, Firefox’s third party cookie blocking browsing metrics to evaluate privacy protection. This test is
only partially impacts advertisement techniques (such as Real- likewise applied to HTML-based metrics to assess the impact
Time Bidding, RTB) that rely on user interests. Olejnik et al. on webpage quality. We also design a manual analysis to
[67] show that new users (i.e. users without any browsing his- complement HTML-based metrics. Our second contribution is

13
1.05 Blur Disconnect

High
High
Disconnect
High

9.5 uBlockOrigin
9.5 Privacy Badger
Ghostery 1.00 uBlockOrigin (trained)
Disconnect Ghostery
uBlockOrigin 0.95 Privacy Badger Blur
Manual analysis quality index

Blur 9.0

Manual analysis quality index


9.0 Ghostery (trained)

HTML quality index


0.90
8.5 Privacy Badger 0.85 8.5
(trained)
0.80 Request Policy
8.0 Continued 8.0
0.75
Request Policy Request Policy
Continued 0.70 Continued
7.5 NoScript NoScript 7.5 NoScript

Low
Low

Low
0.65
0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 1.05 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.2 0.3 0.4 0.5 0.6 0.7 0.8
Low HTML quality index High High Privacy protection index Low High Privacy protection index Low
(a) HTML-based vs manual analysis quality (b) Privacy protection vs HTML-based quality (c) Privacy protection vs manual analysis
Fig. 10. Scatter plots of synthetic index for privacy protection, HTML-based quality and manual analysis-based quality.

a privacy protection techniques comparison in terms of overlap [9] R. Hill, “Comparative benchmarks against widely used blockers:
and performances. We show that the most popular privacy Top 15 most popular news websites,” 2014, accessed: 2017-07-30.
[Online]. Available: https://github.com/gorhill/httpswitchboard/
protection techniques exhibit a common blocking baseline. We wiki/Comparative-benchmarks-against-widely-used-blockers:
also highlight the fact that some extensions, namely Ghostery -Top-15-Most-Popular-News-Websites
and uBlock Origin, have a very specific behavior and are the [10] ——, “ublock and others: Blocking ads, track-
ers, malwares,” 2015, accessed: 2017-07-30. [Online].
only ones to block some domains. Our third contribution is Available: https://github.com/gorhill/uBlock/wiki/uBlock-and-others%
a comparison of the impact of privacy protection techniques 3A-Blocking-ads%2C-trackers%2C-malwares
on webpage quality. Our fourth contribution is a set of usage [11] C. E. Wills and D. C. Uzunoglu, “What ad blockers are (and are not)
doing,” in HotWeb 2016, pp. 72–77.
recommendations for end-users, and research recommendation [12] G. Merzdovnik, M. Huber, D. Buhov, N. Nikiforakis, S. Neuner,
for the scientific community. Ghostery and uBlock Origin M. Schmiedecker, and E. Weippl, “Block me if you can: A large-scale
provide the best trade-off between protection and webpage study of tracker-blocking tools,” in EuroSP 2017.
[13] S. Traverso, M. Trevisan, L. Giannantoni, M. Mellia, and H. Metwalley,
quality. Ghostery however requires a configuration step which “Benchmark and comparison of tracker-blockers:should you trust them?”
is difficult for users [56]. The RequestPolicy Continued and in TMA 2017.
NoScript extensions exhibit the best performances but reduce [14] T. Bujlow, V. Carela-Español, J. Solé-Pareta, and P. Barlet-Ros, “A
survey on web tracking: Mechanisms, implications, and defenses,”
webpage quality. Ghostery and uBlock Origin use manually Proceedings of the IEEE, vol. PP, no. 99, pp. 1–35, 2017.
built blocking lists which are cumbersome to maintain. Re- [15] J. Estrada-Jiménez, J. Parra-Arnau, A. Rodrı́guez-Hoyos, and J. Forné,
search efforts should focus on improving existing approaches “Online advertising: Analysis of privacy threats and protection ap-
proaches,” Computer Communications, 2016.
that do not rely on blocking lists (such as Privacy Badger or [16] “AdBlock Plus,” accessed: 2017-07-30. [Online]. Available: https:
MyTrackingChoices) and automating blocking list building. //adblockplus.org/
[17] “uBlock Origin,” accessed: 2017-07-30. [Online]. Available: https:
//github.com/gorhill/uBlock
R EFERENCES [18] “Ghostery,” accessed: 2017-07-30. [Online]. Available: https://www.
ghostery.com/
[19] “Disconnect,” accessed: 2017-07-30. [Online]. Available: https://
[1] B. Krishnamurthy and C. E. Wills, “Generating a privacy footprint on
disconnect.me/
the internet,” in IMC 2006, pp. 65–70.
[20] “NoTrace,” accessed: 2017-07-30. [Online]. Available: http://www.
[2] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, and isislab.it/projects/NoTrace/
C. Diaz, “The web never forgets: Persistent tracking mechanisms in [21] “Blur,” accessed: 2017-07-30. [Online]. Available: https://dnt.abine.com
the wild,” in CCS 2014, pp. 674–689. [22] J. P. Achara, J. Parra-Arnau, and C. Castelluccia, “Mytrackingchoices:
[3] A. Soltani, S. Canty, Q. Mayo, L. Thomas, and C. J. Hoofnagle, “Flash Pacifying the ad-block war by enforcing user privacy preferences,” in
cookies and privacy,” in SSRN, 2009, pp. 158–163. WEIS 2016.
[4] P. Eckersley, “How unique is your web browser?” in PETS 2010, pp. [23] “Privacy Badger,” accessed: 2017-07-30. [Online]. Available: https:
1–18. //www.eff.org/fr/privacybadger
[5] “DoNotTrack,” accessed: 2017-07-30. [Online]. Available: https: [24] “NoScript,” accessed: 2017-07-30. [Online]. Available: https://noscript.
//tools.ietf.org/html/draft-mayer-do-not-track-00 net/
[6] B. Krishnamurthy, D. Malandrino, and C. E. Wills, “Measuring privacy [25] “RequestPolicyContinued,” accessed: 2017-07-30. [Online]. Available:
loss and the impact of privacy protection in web browsing,” in SOUPS https://requestpolicycontinued.github.io/
2007, pp. 52–63. [26] “HTTPS Everywhere,” accessed: 2017-07-30. [Online]. Available:
[7] J. R. Mayer and J. C. Mitchell, “Third-party web tracking: Policy and https://www.eff.org/fr/https-everywhere
technology,” in SP 2012, pp. 413–427. [27] “Decentraleyes,” accessed: 2017-07-30. [Online]. Available: https:
[8] R. Balebako, P. Leon, R. Shay, B. Ur, Y. Wang, and L. Cranor, //decentraleyes.org
“Measuring the effectiveness of privacy tools for limiting behavioral [28] “WebOfTrust,” accessed: 2017-07-30. [Online]. Available: https:
advertising,” in W2SP 2012. //www.mywot.com/

14
[29] N. Nikiforakis, A. Kapravelos, W. Joosen, C. Kruegel, F. Piessens, and [58] B. Krishnamurthy and C. Wills, “Privacy diffusion on the web: a
G. Vigna, “Cookieless monster: Exploring the ecosystem of web-based longitudinal perspective,” in WWW 2009, pp. 541–550.
device fingerprinting,” in SP 2013, pp. 541–555. [59] F. Roesner, T. Kohno, and D. Wetherall, “Detecting and defending
[30] J. Vincent, “Ghostery has been bought by the developer against third-party tracking on the web,” in NSDI 2012, pp. 12–26.
of a privacy-focused browser,” 2 2017, accessed: 2017-07- [60] D. Fifield and S. Egelman, “Fingerprinting web users through font
30. [Online]. Available: http://www.theverge.com/2017/2/15/14622484/ metrics,” in FC 2015, pp. 107–124.
ghostery-ad-tracking-plug-in-cliqz [61] Ł. Olejnik, G. Acar, C. Castelluccia, and C. Diaz, “The leaking battery,”
[31] “Network advertising industry,” accessed: 2017-07-30. [Online]. in DPM 2015, pp. 254–263.
Available: http://www.networkadvertising.org/choices/ [62] K. Mowery and H. Shacham, “Pixel perfect: Fingerprinting canvas in
[32] “Digital advertising alliance,” accessed: 2017-07-30. [Online]. Available: html5,” in W2SP 2012.
http://www.aboutads.info/choices/ [63] Y. Cao, S. Li, and E. Wijmans, “(Cross-)Browser Fingerprinting via OS
[33] “Beeftaco,” accessed: 2017-07-30. [Online]. Available: https://jmhobbs. and Hardware Level Features,” in NDSS 2017.
github.io/beef-taco/ [64] G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gürses, F. Piessens, and
[34] “Allowing acceptable ads in adblock plus - agreements,” B. Preneel, “Fpdetective: dusting the web for fingerprinters,” in CCS
accessed: 2017-07-30. [Online]. Available: https://adblockplus.org/ 2013, pp. 1129–1140.
acceptable-ads-agreements [65] A. Lerner, A. K. Simpson, T. Kohno, and F. Roesner, “Internet jones and
[35] “EasyList/EasyPrivacy,” accessed: 2017-07-30. [Online]. Available: the raiders of the lost trackers: An archaeological study of web tracking
https://easylist.to/ from 1996 to 2016,” in Usenix Security 2016, pp. 997–1013.
[36] V. S. Eckert, J. Klofta, and J. L. Strozyk, ““Web of Trust” späht [66] A. Ghosh and A. Roth, “Selling privacy at auction,” in EC 2011, pp.
nutzer aus,” 11 2016, accessed: 2017-07-30. [Online]. Available: 199–208.
https://www.tagesschau.de/inland/tracker-online-103.html [67] L. Olejnik, T. Minh-Dung, and C. Castelluccia, “Selling off privacy at
[37] L. Olejnik, “Proposed amendments to eprivacy reg- auction,” in NDSS 2014, pp. 104–114.
ulation are great,” 6 2017, accessed: 2017- [68] J. Bau, J. Mayer, H. Paskov, and J. C. Mitchell, “A promising direction
07-30. [Online]. Available: https://blog.lukaszolejnik.com/ for web tracking countermeasures,” in W2SP 2013.
proposed-amendments-to-eprivacy-regulation-are-great [69] V. Kalavri, J. Blackburn, M. Varvello, and K. Papagiannaki, “Like a
[38] “Firefox,” accessed: 2017-07-30. [Online]. Available: https://www. pack of wolves: Community structure of web trackers,” pp. 42–54.
mozilla.org/en-US/firefox [70] “Alexa top 500 sites on the web,” accessed: 2017-07-30. [Online].
Available: http://www.alexa.com
[39] G. Kontaxis and M. Chew, “Tracking protection in firefox for privacy
[71] “Public suffix list,” accessed: 2017-07-30. [Online]. Available: https:
and performance,” in W2SP 2015.
//publicsuffix.org/
[40] “Brave,” accessed: 2017-07-30. [Online]. Available: https://brave.com/
[72] S. Guha, B. Cheng, and P. Francis, “Challenges in measuring online
[41] “Cliqz,” accessed: 2017-07-30. [Online]. Available: https://cliqz.com/en advertising systems,” in IMC 2010, pp. 81–87.
[42] “AdBlock,” accessed: 2017-07-30. [Online]. Available: https: [73] G. Pass, A. Chowdhury, and C. Torgeson, “A picture of search.”
//getadblock.com/ InfoScale, vol. 152, p. 1, 2006.
[43] C. Castelluccia, S. Grumbach, and L. Olejnik, “Data harvesting 2.0: [74] E. Pujol, O. Hohlfeld, and A. Feldmann, “Annoyed users: Ads and ad-
from the visible to the invisible web,” in WEIS 2013. block usage in the wild,” in IMC 2016, pp. 93–106.
[44] N. Fruchter, H. Miao, S. Stevenson, and R. Balebako, “Variations in [75] M. Malloy, M. McNamara, A. Cahn, and P. Barford, “Ad blockers:
tracking in relation to geographic location,” in W2SP 2015. Global prevalence and impact,” in IMC 2016, pp. 119–125.
[45] J. M. Carrascosa, J. Mikians, R. Cuevas, V. Erramilli, and N. Laoutaris, [76] M. Falahrastegar, H. Haddadi, S. Uhlig, and R. Mortier, “Tracking
“I always feel like somebody’s watching me. measuring online be- personal identifiers across the web,” in PAM 2016, pp. 30–41.
havioural advertising,” in CoNext 2015. [77] ——, “The rise of panopticons: Examining region-specific third-party
[46] H. Metwalley, S. Traverso, M. Mellia, S. Miskovic, and M. Baldi, “The web tracking.” in TMA 2014, pp. 104–114.
online tracking horde: a view from passive measurements,” in TMA
2015, pp. 111–125.
[47] S. Englehardt, D. Reisman, C. Eubank, P. Zimmerman, J. Mayer,
A. Narayanan, and E. W. Felten, “Cookies that give you away: The
surveillance implications of web tracking,” in WWW 2015, pp. 289–299.
[48] S. Englehardt and A. Narayanan, “Online tracking: A 1-million-site
measurement and analysis,” in CCS 2016.
[49] D. Malandrino, A. Petta, V. Scarano, L. Serra, R. Spinelli, and B. Krish-
namurthy, “Privacy awareness about information leakage: Who knows
what about me?” in WPES 2013, pp. 279–284.
[50] D. Malandrino and V. Scarano, “Privacy leakage on the web: Diffusion
and countermeasures,” Computer Networks, vol. 57, no. 14, pp. 2833–
2855, 2013.
[51] D. Gugelmann, M. Happe, B. Ager, and V. Lenders, “An automated
approach for complementing ad blockers blacklists,” in PETS 2015, pp.
282–298.
[52] H. Metwalley, S. Traverso, and M. Mellia, “Unsupervised detection of
web trackers,” in GLOBECOM 2015, pp. 1–6.
[53] Q. Wu, Q. Liu, Y. Zhang, and G. Wen, “Trackerdetector: A system
to detect third-party trackers through machine learning,” Computer
Networks, vol. 91, pp. 164–173, 2015.
[54] Z. Yu, S. Macbeth, K. Modi, and J. M. Pujol, “Tracking the trackers,”
in WWW 2016, pp. 121–132.
[55] M. Ikram, H. J. Asghar, M. A. Kaafar, A. Mahanti, and B. Krishna-
murthy, “Towards seamless tracking-free web: Improved detection of
trackers via one-class learning,” ser. PETS 2017.
[56] P. Leon, B. Ur, R. Shay, Y. Wang, R. Balebako, and L. Cranor, “Why
johnny can’t opt out: A usability evaluation of tools to limit online
behavioral advertising,” in CHI 2012, pp. 589–598.
[57] D. Malandrino, V. Scarano, and R. Spinelli, “How increased awareness
can impact attitudes and behaviors toward online privacy protection,” in
SocialCom 2013, pp. 57–62.

15

You might also like