
UNIVERSITY OF GENEVA DIGITAL LAW SUMMER SCHOOL 2020

THE ANONYMIZATION OF DATA:


A “FALSE” SAFE HAVEN?

Paper presented by
Héléna Vial

Course directors
Professor Jacques de Werra and Dr. Yanniv Benhamou

July 2020
1. INTRODUCTION

2. WHAT IS PERSONAL DATA?

3. HOW PERSONAL DATA IS USED AND WHY WE MUST PROTECT IT

4. DATA ANONYMIZATION AS A SOLUTION

4.1 Definition of data anonymization
4.2 Anonymization techniques

5. "THE BROKEN PROMISE" OF TRULY ANONYMIZED DATA

5.1 Successfully anonymized data and its legal framework
5.2 Risks of re-identification

6. CONCLUSION

7. ABBREVIATIONS

8. VOCABULARY

9. BIBLIOGRAPHY

10. INTERNET SITES

11. APPENDIX

1. Introduction
For a few decades now, the Internet has been developing at a tremendous pace and has radically
changed how we communicate and how information is exchanged. On the one hand, we willingly
give our personal information in exchange for access to internet platforms, our GPS location to
count how many steps we take in a day, or even our DNA to learn where we come from. On
the other hand, we desperately want to protect our personal data.

This era of big data promises opportunities to extract hidden value from raw datasets through
novel reuse. In a world where everything is ubiquitously connected, registered and used,
privacy and the protection of our personal data have therefore become one of the major issues
of our time.

Data anonymization implies altering the personal data being processed in such a way that the
individual can no longer be identified. This solution is not new: it was already debated in 1850
at the US Federal Bureau of Statistics (the Census Bureau), and various techniques, such as
stripping names from the personal data, were already in use. Gradually, this anonymization of
data came to be performed by computers in an automated manner.

The trade-off between data utility and data protection is central: degrading the data means more
privacy but less usability for data scientists. This has led many scholars and statisticians to
conclude that data anonymization is idealistic. A further challenge lies in the fact that many
scientific papers have shown that data which has been anonymized can be de-anonymized, and
the individuals concerned thereby exposed.

Conscious of the difficulties that underlie the process of data anonymization, many legal
scholars and scientists have tried to address this issue, and new anonymization methods are
regularly published.

The aim of this paper is to identify the legal treatment of data anonymization in European law
and American law, as well as the issues arising from those legal solutions. The General Data
Protection Regulation (Regulation (EU) 2016/679) (GDPR) and the California Consumer Privacy
Act (CCPA) will be the main legal instruments analysed.

On one hand, the GDPR, which went into effect on the 25th of May 2018, is one of the most
comprehensive data protection laws in the world and serves as a model for the data protection
laws of other countries, such as India and Brazil.

On the other hand, the CCPA, which took effect on the 1st of January 2020, is considered the
most significant milestone in the development of privacy law in the USA, which lacks a
comprehensive federal privacy law.

We will begin with a short reminder about what personal data is under both jurisdictions, how
it is used and why we should protect it. Then, we will analyse how data anonymization appears
to be the solution and cover its definition under both jurisdictions. Finally, we will focus on the
inherent risk of re-identification and how to handle it.

This paper will not focus on the technical aspects of data anonymization nor its historical
background.

A short vocabulary concerning the technical terms of anonymization techniques and risks
related to this process is provided at the end of this paper (cf. infra Chapter 8). These terms will
be marked with an asterisk.

2. What is personal data?

From the European perspective, personal data is defined in Art. 4 (1) of the General Data
Protection Regulation of 2016 as "any information relating to an identified or identifiable
natural person"1.

Personal data can include a name, a number or other identifiers such as an IP address or even a
cookie identifier. As soon as the identification of an individual is made possible from the
information that is being processed, we are most likely dealing with personal data2.

From the American perspective, the term "personal information" is more commonly used than
personal data. A myriad of data protection statutes, at both federal and state level, address
this issue depending on the sector and the purpose of the protection (Children's Online Privacy
Protection Act of 1998; Health Insurance Portability and Accountability Act of 1996; etc.).
However, unlike in European law, there is no harmonized definition; different Acts protect
different types of data3.

For example, under the California Consumer Privacy Act of 2018 (CCPA), personal
information is defined as “information that identifies, relates to, describes, is reasonably
capable of being associated with, or could reasonably be linked, directly or indirectly, with a
particular consumer or household”.

From this definition, we understand how broad the meaning of personal information is, and that
what falls under this category might not at first sight look like personal information. For
example, under the CCPA, using data that is not personal data to draw inferences for creating
profiles on consumers, such as monitoring user behaviour on a domain (scroll speed, clicks, etc.),
can be considered personal information4.

1 Art. 4 (1) GDPR.
2 https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/key-definitions/what-is-personal-data/ (2).
3 https://iclg.com/practice-areas/data-protection-laws-and-regulations/usa; https://www.mondaq.com/unitedstates/privacy-protection/803300/what-is-personal-information-in-legal-terms-it-depends?signup=true (3).

Unlike the GDPR, the CCPA does not consider publicly available information (information made
available by a government entity) to be "personal information"5.

Both definitions are broad and appear to be context-dependent. Identifiability, assessed with
regard to all the available elements, is therefore a key component of the concept of personal data.

3. How personal data is used and why we must protect it

Every day, our devices, service providers and retailers track every move we make through our
purchases, browsers and apps, our cloud storage, basically everything that is connected (data
collection). This data is then taken up by data brokers (data aggregation). The brokers'
targeting varies depending on their aim: companies such as Quantium or Acxiom focus on
targeted marketing and advertising, whereas Experian or Equifax concentrate on credit reporting
and risk assessment. A solid profile is finally built from every source where information can be
gathered, including public records6.

The billion-dollar Adtech industry that collects personal and behavioural data has created a
vast data-collection empire that can measure every aspect of our online and offline lives. The
profiles based on extracted patterns can easily produce discriminatory results and can breach
our right to privacy, which is why our personal information must be protected.

Moreover, our data is reused for purposes beyond those that justified its original collection,
in violation of purpose limitation7, a key principle under the GDPR governing the processing
of personal data8.

4 https://www.cookiebot.com/en/ccpa-personal-information-ccpa-compliance-with-cookiebot/ (4).
5 Data Guidance, Comparing privacy laws: GDPR v. CCPA, p. 15.
6 https://www.visualcapitalist.com/personal-data-ecosystem/ (5).
7 Art. 5 (1) (b) GDPR.
8 SOPHIE STALLA-BOURDILLON; ALISON KNIGHT, Anonymous Data v. Personal Data - A False Debate: An EU Perspective on Anonymization, Pseudonymization and Personal Data, Wisconsin International Law Journal, 34 (2016), p. 284.
Figure 1: Example of a profile made from available data of an individual9.

4. Data anonymization as a solution

4.1 Definition of data anonymization

Any company that works with a database will need long-term storage, or there will come a time
when it discloses its data to a third party. The challenge lies in not compromising the privacy
of the data subjects, especially when the data comes from sensitive sources such as health or
financial records. Data anonymization emerged as a solution to this problem. Through the
process of anonymization, data can be processed further without harming the data subject's
privacy, and such processing is still considered compatible with the original processing.

9 https://www.visualcapitalist.com/personal-data-ecosystem/ (5).

Anonymization is addressed in Recital 26 of the Data Protection Directive (95/46/EC), whose
substance Recital 26 of the GDPR carries forward:
“Whereas the principles of protection must apply to any information concerning an identified
or identifiable person; whereas, to determine whether a person is identifiable, account should
be taken of all the means likely reasonably to be used either by the controller or by any other
person to identify the said person; whereas the principles of protection shall not apply to data
rendered anonymous in such a way that the data subject is no longer identifiable; whereas
codes of conduct within the meaning of Article 27 may be a useful instrument for providing
guidance as to the ways in which data may be rendered anonymous and retained in a form in
which identification of the data subject is no longer possible;”.

The focus is on the fact that the data subject is no longer identifiable using "all the
means likely reasonably to be used" by either the controller or a third party. With this "test",
the GDPR adopts a risk-based approach to anonymization.

As mentioned above, the processing must be irreversible. Neither the Directive nor the GDPR
specifies how the de-identification process is or should be performed10.

It is important to remember that data that has been successfully anonymised does not fall under
the scope of the GDPR.

The term data anonymization does not appear in the CCPA. Within Section 1798.140, two
definitions correspond to what European law refers to as data anonymization:

(a) "Aggregate consumer information" means information that relates to a group or
category of consumers, from which individual consumer identities have been
removed, that is not linked or reasonably linkable to any consumer or household,
including via a device. "Aggregate consumer information" does not mean one or
more individual consumer records that have been deidentified.

(h) "Deidentified" means information that cannot reasonably identify, relate to,
describe, be capable of being associated with, or be linked, directly or indirectly, to a
particular consumer, provided that a business that uses deidentified information:

a. Has implemented technical safeguards that prohibit reidentification of the
consumer to whom the information may pertain.
b. Has implemented business processes that specifically prohibit reidentification
of the information.
c. Has implemented business processes to prevent inadvertent release of
deidentified information.
d. Makes no attempt to reidentify the information.

10 Opinion 05/2014 on Anonymization techniques, Article 29 Data Protection Working Party, 2014, chapter 2, page 5.

Although the CCPA uses a different term than the GDPR, both aim at a similar result:
information that has been irreversibly de-identified from its subject.

Similarly to the GDPR, de-identified information under the CCPA is not considered "personal
information" and therefore does not benefit from its protection.

Section 1798.140 (o) 3: “Personal information” does not include consumer information
that is de-identified or aggregate consumer information.

Section 1798.145 (a) 5: The obligations imposed on businesses by this title shall not
restrict a business’s ability to: Collect, use, retain, sell, or disclose consumer
information that is de-identified or in the aggregate consumer information.

A common pitfall is assimilating pseudonymized* data to anonymized data. Pseudonymization
replaces personal identifiers with non-identifying references. Pseudonymized data still allows
a data subject to be singled out and linked across different data sets, which can result in
identifiability. Pseudonymous data becomes anonymous only when the separately kept identifying
information (decryption key, mapping table, etc.) is destroyed.
This explains why pseudonymized data, unlike successfully anonymized data, remains within
the scope of the GDPR and the CCPA11. It is merely a useful security measure.
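To make the distinction concrete, here is a minimal sketch of keyed pseudonymization (the key, field names and values are invented for the example): the direct identifier is replaced by a stable reference, yet the pseudonym still lets the subject be singled out and linked across datasets, and whoever holds the separately kept key can replay the mapping. Only destroying that key (and any mapping table) moves the data towards anonymity.

```python
import hmac
import hashlib

# Hypothetical secret; this is the "separately kept identifying information":
# as long as it exists, the mapping can be replayed, so the data is merely
# pseudonymous and remains within the scope of the GDPR and the CCPA.
SECRET_KEY = b"keep-me-separate-and-secure"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable, non-identifying reference."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "Alice Martin", "zip": "1205", "diagnosis": "asthma"}
pseudonymized = {**record, "name": pseudonymize(record["name"])}
# The stable pseudonym still singles the subject out and links her records
# across any dataset pseudonymized with the same key.
print(pseudonymized)
```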

4.2 Anonymization techniques

Anonymization techniques will not be thoroughly explained in this paper, but for the sake of
basic comprehension we briefly outline them.

11 GDPR, 2016, art. 4 (5); CCPA, 2018, Section 1798.140 (r).
Broadly speaking, there are two different approaches to anonymization: randomization and
generalization (both are illustrated in the short sketch following the list below).

1. Randomization groups together various techniques (noise addition, permutation, differential
privacy) that alter the veracity of the data in order to remove the strong link between
the information and the individual. This process does not reduce the singularity of each
data subject's set of values but can be effective against inference* risks.

2. Generalization consists of generalizing the attributes of data subjects by modifying their
respective scale or order of magnitude. This technique prevents individuals from being
singled out* but does not ensure effective anonymization in all cases, especially against
linkability* and inference12.
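As a rough illustration of the two approaches (records, field names and noise range are invented for the example), the sketch below perturbs a quasi-identifier (randomization by noise addition) and coarsens scales (generalization):

```python
import random

# Toy records; age and zip code act as quasi-identifiers.
records = [
    {"age": 34, "zip": "1205"},
    {"age": 35, "zip": "1206"},
    {"age": 61, "zip": "1207"},
]

def add_noise(record: dict) -> dict:
    """Randomization by noise addition: perturb the value to weaken the link
    to the individual; each record keeps a distinct (singular) set of values."""
    return {**record, "age": record["age"] + random.randint(-3, 3)}

def generalize(record: dict) -> dict:
    """Generalization: coarsen the scale (10-year age bands, truncated zip)
    so that several individuals share the same attribute values."""
    band = (record["age"] // 10) * 10
    return {"age": f"{band}-{band + 9}", "zip": record["zip"][:2] + "**"}

print([add_noise(r) for r in records])   # distinct, perturbed values
print([generalize(r) for r in records])  # e.g. {'age': '30-39', 'zip': '12**'}
```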

Figure 2: Example of generalization13.

12 Opinion 05/2014 on Anonymization techniques, Article 29 Data Protection Working Party, 2014, chapter 3, pages 11-19.
13 CLAUDE CASTELLUCCIA, Big Data Anonymisation Techniques, INRIA PRIVATICS, 2016.
5. "The broken promise" of truly anonymized data

5.1 Successfully anonymized data and its legal framework

Firstly, to handle anonymized data, we must assume that the underlying personal data was
collected and processed in compliance with the applicable legislation. For example, under the
GDPR, the data's provenance must respect the principles of lawfulness, fairness and transparency,
purpose limitation, data minimization, accuracy, etc.14

Under the CCPA, unlike the GDPR, prior consent from the consumer is not required before
collecting and processing data, except for minors. Consent is only required when the business
enters a scheme that offers financial incentives based on the personal information provided15.
The CCPA does not contain a list of "positive" legal grounds required for collecting, selling
or disclosing personal information. However, consumers may ask businesses not to sell their
personal data16.

Secondly, the process of anonymization, meaning the processing of personal data to achieve
irreversible de-identification, is an instance of "further processing". It can be considered
compatible with the original purpose of the processing, but only on the condition that the
resulting information is genuinely anonymized17.

Recital 26 of the GDPR specifies that objective factors should be considered to ascertain
whether means are reasonably likely to be used to re-identify a person, such as "the costs of and
the amount of time required for identification", taking into account the technology available
at the time of the processing and subsequent technological developments.

On the 10th of April 2014, the Article 29 Working Party issued Opinion 05/2014 on data
anonymization techniques18. The Opinion analysed the effectiveness and limits of existing
anonymization techniques and provided recommendations to minimize the risks of
re-identification of individuals19.

14 GDPR, 2016, art. 5-11; CCPA, 2018, Section 1798.140 (r).
15 CCPA, 2018, Section 1798.120 (d) and Section 1798.125 (b) (3).
16 Data Guidance, Comparing privacy laws: GDPR v. CCPA, p. 23.
17 Opinion 05/2014 on Anonymization techniques, Article 29 Data Protection Working Party, 2014, chapter 2.2.1, page 7.
18 Article 29 Data Protection Working Party, Opinion 05/2014 on Anonymisation Techniques, 10 April 2014: this Working Party was set up under Article 29 of Directive 95/46/EC and is an independent European advisory body on data protection and privacy.

In its Opinion, the Working Party clarified when an anonymization process is sufficiently
robust, i.e. when identification has become "reasonably" impossible. The robustness of each
technique is assessed against three criteria (a rough computational reading of the first
criterion is sketched after the list):

1. Is it still possible to single out an individual?
2. Is it still possible to link records relating to an individual?
3. Can information be inferred concerning an individual?
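As one possible, simplified reading of the singling-out criterion (the records, field names and threshold are invented for the example), the check below approximates it through k-anonymity: a record whose combination of quasi-identifiers is shared by fewer than k records in the dataset can still be isolated.

```python
from collections import Counter

def singling_out(rows: list, quasi_ids: list, k: int = 2) -> list:
    """Return the records whose quasi-identifier combination occurs fewer
    than k times, i.e. the records that can still be singled out."""
    keys = [tuple(row[q] for q in quasi_ids) for row in rows]
    counts = Counter(keys)
    return [row for row, key in zip(rows, keys) if counts[key] < k]

rows = [
    {"age": "30-39", "zip": "12**"},
    {"age": "30-39", "zip": "12**"},
    {"age": "60-69", "zip": "12**"},  # unique combination -> can be singled out
]
print(singling_out(rows, ["age", "zip"]))
```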

With the strengths and weaknesses of each technique outlined, the data controller can design an
adequate anonymization process suited to the specific context and circumstances of the case.
The Opinion's "Technical Annex" also analyses which technique is most appropriate, and with
what impact, depending on the situation20.

The CCPA's definitions of "de-identified" and "aggregate consumer information" are also not
easy to meet. As defined in Section 1798.140 (h), not only must the information not be
reasonably linkable, directly or indirectly, to a consumer, but the business must also have
implemented technical safeguards and business processes that prohibit re-identification, and it
must make no attempt to re-identify the information.

These requirements seem particularly difficult to comply with. The standard of
"reasonableness" is not even defined by the law; there is no metric for deciding how difficult
re-identification of the data must be.

Given all these conditions and criteria, under both the GDPR and the CCPA, once data is
considered properly anonymized it falls outside the scope of application of both texts and thus
no longer benefits from their safeguards.

19 KHALED EL EMAM; CECILIA ALVAREZ, A critical appraisal of the Article 29 Working Party Opinion 05/2014 on data anonymization techniques, International Data Privacy Law, 2015, Vol. 5, No. 1, p. 73.
20 Opinion 05/2014 on Anonymization techniques, Article 29 Data Protection Working Party, 2014, chapter 2.2, pages 6-9.
Understanding the concept and meaning of anonymization is crucial, since it demarcates the
scope of data protection laws. But a static approach is not desirable: there is no definitive and
permanent contour of anonymized data. One must reject the assumption that, once data is
anonymized, the data controller can forget about it and the recipients of the dataset are free
from any obligation on the theory that anonymized information will always lie outside the scope
of data protection laws21.

Firstly, it would be a mistake to consider that anonymization deprives individuals of any
protection. From the perspective of Mr. Capt, an attorney at law in Switzerland specializing in
cyber law, "other legal instruments also apply. One example is official secrecy, which applies
even in the absence of personal data. Data/information can therefore be protected even without
personal data. Another example is Art. 261 of the Swiss Criminal Code (violation of
manufacturing or trade secrets)"22.

Secondly, a dynamic approach must be adopted for anonymized data that has become personal
data again. This further processing is subject to data protection law, and uncertainty may
persist concerning its compatibility with the initial processing23.

5.2 Risks of re-identification

“Data can be either useful or perfectly anonymous but never both”24

In theory, data anonymization appears to be a "safe haven", balancing the privacy interests
of individuals against realizing the promise of big data. However, much research suggests that
re-identifying or de-anonymizing individuals from such data is surprisingly easy. Even when
identifiers such as names and Social Security numbers have been removed, an adversary can
use background knowledge and cross-correlation with other databases to re-identify individual
data records25. The Netflix Prize is one of many examples where poorly anonymized data could
be re-identified by linking it with another dataset, in that case the Internet Movie Database
(IMDb)26.

21 SOPHIE STALLA-BOURDILLON; ALISON KNIGHT, Anonymous Data v. Personal Data - A False Debate: An EU Perspective on Anonymization, Pseudonymization and Personal Data, Wisconsin International Law Journal, 34 (2016), p. 284.
22 Interview with Me Capt, 23rd of July 2020, in appendix.
23 SOPHIE STALLA-BOURDILLON; ALISON KNIGHT, Anonymous Data v. Personal Data - A False Debate: An EU Perspective on Anonymization, Pseudonymization and Personal Data, Wisconsin International Law Journal, 34 (2016), p. 319.
24 PAUL OHM, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, 57 UCLA Law Review 1701 (2010), page 1704.
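As a toy illustration of such cross-correlation (all names and values are invented), the sketch below re-identifies records in a "de-identified" table by joining it with a public auxiliary table on shared quasi-identifiers:

```python
# A "de-identified" table and a public register sharing quasi-identifiers.
deidentified = [
    {"zip": "1205", "birth_year": 1986, "sex": "F", "diagnosis": "asthma"},
    {"zip": "1207", "birth_year": 1959, "sex": "M", "diagnosis": "diabetes"},
]
public_register = [
    {"name": "Alice Martin", "zip": "1205", "birth_year": 1986, "sex": "F"},
    {"name": "Bruno Keller", "zip": "1207", "birth_year": 1959, "sex": "M"},
]

QUASI_IDS = ("zip", "birth_year", "sex")
# Index the auxiliary data by its quasi-identifier combination.
index = {tuple(p[q] for q in QUASI_IDS): p["name"] for p in public_register}

for record in deidentified:
    name = index.get(tuple(record[q] for q in QUASI_IDS))
    if name:  # a unique match re-attaches the identity to the sensitive record
        print(f"{name} -> {record['diagnosis']}")
```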

Even if the company conducting the data analysis does not intend to re-identify the data, the
analysis may yield results that can be correlated with a specific individual. Big data not only
facilitates but also encourages such re-identification: given the mass of available data (which
can be cross-correlated) and modern computing power, it is simple, efficient and increasingly
inexpensive27.

In 2009, Paul Ohm wrote what is probably the most influential and provocative legal piece on
anonymization, entitled "Broken Promises of Privacy: Responding to the Surprising Failure of
Anonymization". He insists that de-identification is a failure and should be abandoned as a
regulatory objective, and argues that the distinction between personal data and anonymized
data should be abolished28.

Ohm’s excessive and radical approach must be rejected. According to Mr. Capt, “Anonymous
data can be very useful. One example is data used for statistical purposes. Paul Ohm's sentence
is obviously intended to provoke and make people think. That said, it is clear that in our world
of machine learning and big data, black gold is drawn from the flourishing derrick of personal
data and not from anonymous data”29.

Ohm's proposal to abandon the dichotomy between personal data and anonymized data is
evidently compatible with neither the GDPR nor the CCPA. This delineation is important in
order to relieve certain companies of the burdens arising from data processing obligations.
Finally, regarding this issue, Mr. Capt states that "Re-identification is the Achilles heel of data
protection. There is no way to be 100% sure. Once re-identified, the data becomes personal
again, with all that this entails. True anonymisation (as opposed to pseudonymisation) should
in principle guarantee that there is no possibility of re-identification"30.

25 ARVIND NARAYANAN; VITALY SHMATIKOV, Robust De-anonymization of Large Sparse Datasets, (2008) Proceedings - IEEE Symposium on Security and Privacy, art. no. 4531148, pp. 111-125.
26 https://www.digitorc.com/re-identification-of-anonymised-data-sets/ (6); ARVIND NARAYANAN; VITALY SHMATIKOV, How To Break Anonymity of the Netflix Prize Dataset, 2006.
27 PHILIPPE MEIER, Le défi de Big Data dans les relations privées, p. 54-56 - Big Data und Datenschutz.
28 PAUL OHM, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, 57 UCLA Law Review 1701 (2010), pages 1744-1745.
29 Interview with Me Capt, 23rd of July 2020, in appendix.

The legal question is how to deal with these risks of re-identification, since technology cannot
yet exclude possible de-anonymization in every case. According to the UKAN Decision-Making
Framework31, anonymization is not about removing all risk of re-identification but about
managing that risk, by taking precautions and carefully analysing the data environment. UKAN
divides this work into three parts:

1. The data context audit;
2. Risk control and analysis;
3. Impact management32.

According to the UKAN, a holistic approach based on the data environment is to be favoured.
This model is in line with the dynamic approach of the GDPR.

Therefore, if we accept that zero risk does not exist, a comprehensive and ongoing assessment of
data environments should still allow the implementation of robust anonymization practices.

6. Conclusion
To conclude, we have seen how personal data and anonymized data are defined from a European
and a Californian perspective. The criterion of possible identifiability of the data subject is
crucial, since it determines whether data protection law applies. In addition, the GDPR and the
CCPA rely on a risk-based approach both for the very definition of anonymized or de-identified
data and for the choice of anonymization technique in each case.

30 Interview with Me Capt, 23rd of July 2020, in appendix.
31 The UK Anonymisation Network (UKAN) was set up in 2012 as a means of establishing best practice in anonymization in the UK. It offers practical advice and information to anyone who handles personal data and needs to share it. UKAN was funded for its first two years by the Information Commissioner's Office (ICO). It is co-ordinated by a consortium of four organisations: the University of Manchester, the University of Southampton, the Open Data Institute (ODI) and the Office for National Statistics (ONS).
32 MARK ELLIOT; ELAINE MACKEY; KIERON O'HARA; CAROLINE TUDOR, The Anonymisation Decision-Making Framework, 2016 edition, UKAN, chapter 1.2.
However, this paper has shown that the process of anonymization is not immutable:
re-identification of once-anonymized data remains a permanent risk, which must be managed
after the anonymization technique has been applied. The mistake is to focus on a fixed
end-state of the data. "Release-and-forget" anonymization is advocated neither by the Working
Party, nor by UKAN, nor by most legal scholars today. Companies should therefore comply with
data protection rules not only when re-identification is intentional, but also when it results
from the subsequent aggregation of new identifying data. Hence, a dynamic approach to
anonymized data is warranted.

Properly understanding and describing the data environment for each processing activity is of
the utmost importance. But who will undertake this continuous verification? Are the EU data
protection authorities enough to ensure that data remains properly anonymized on an ongoing
basis? And in California, does the Attorney General investigate CCPA breaches vigorously
enough? The effectiveness of supervisory authorities is rather questionable.

On the other hand, it also seems impossible to stop the big data machine and take a leap
backwards. This is the very essence of the privacy paradox: companies want more customer data;
customers say they dislike this, yet they freely provide personal data. As Mr. Capt put it,
"current and future devices are by definition hungry for personal data. And this often merely
responds to users' more or less conscious demands for personalization and customization. It is
a circle (vicious or virtuous, as the case may be)"33.

All things considered, I cannot bring myself to say that this battle is vain: as seen above,
robust anonymization is possible, provided one keeps in mind that zero risk of re-identification
does not exist. However, I share the view of Mr. Capt, who would rather lead a fierce fight
against "the real perils that await us", namely facial recognition and biometrics34. Action can
and must still be taken.

33 Interview with Me Capt, 23rd of July 2020, in appendix.
34 Interview with Me Capt, 23rd of July 2020, in appendix.
7. ABBREVIATIONS
- Adtech: Advertising Technology
- Art.: article
- CCPA: California Consumer Privacy Act
- GDPR: General Data Protection Regulation
- UKAN: UK Anonymisation Network

8. VOCABULARY
- Inference: the possibility to deduce, with significant probability, the value of an
attribute from the values of a set of other attributes35.
- Linkability: the ability to link at least two records concerning the same data subject
or a group of data subjects (in the same database or in different databases)36.
- Pseudonymised data: pseudonymisation consists of replacing one attribute
(typically a unique attribute) in a record by another. The natural person is therefore
still likely to be identified indirectly; accordingly, pseudonymisation used alone
will not result in an anonymous dataset. It is nevertheless discussed here because
of the many misconceptions and mistakes surrounding its use37.
- Singling out: the possibility to isolate some or all records which identify an
individual in the dataset38.

35 Opinion 05/2014 on Anonymization techniques, Article 29 Data Protection Working Party, 2014, chapter 3, page 12.
36 Opinion 05/2014 on Anonymization techniques, Article 29 Data Protection Working Party, 2014, chapter 3, page 11.
37 Opinion 05/2014 on Anonymization techniques, Article 29 Data Protection Working Party, 2014, chapter 4, page 20.
38 Opinion 05/2014 on Anonymization techniques, Article 29 Data Protection Working Party, 2014, chapter 3, page 11.

9. BIBLIOGRAPHY

- ARTICLE 29 DATA PROTECTION WORKING PARTY, Opinion 05/2014 on Anonymization techniques, 2014, 0829/24/EN WP216;
- ARVIND NARAYANAN; VITALY SHMATIKOV, Robust De-anonymization of Large Sparse Datasets, (2008) Proceedings - IEEE Symposium on Security and Privacy, art. no. 4531148;

- ARVIND NARAYANAN; VITALY SHMATIKOV, How To Break Anonymity of the Netflix Prize Dataset, 2006, Cornell University;
- ASTRID EPINEY; DANIELA NÜESCH, Big Data und Datenschutzrecht, Université de Fribourg, Schulthess 2016;
- KHALED EL EMAM; CECILIA ALVAREZ, A critical appraisal of the Article 29 Working Party Opinion 05/2014 on data anonymization techniques, International Data Privacy Law, 2015, Vol. 5, No. 1;
- MARK ELLIOT; ELAINE MACKEY; KIERON O'HARA; CAROLINE TUDOR, The Anonymisation Decision-Making Framework, 2016 edition, UKAN;
- PAUL OHM, Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization, 57 UCLA Law Review 1701 (2010);
- SOPHIE STALLA-BOURDILLON; ALISON KNIGHT, Anonymous Data v. Personal Data - A False Debate: An EU Perspective on Anonymization, Pseudonymization and Personal Data, Wisconsin International Law Journal, 34 (2016).

10. INTERNET SITES

1. Cover image: https://protonmail.com/blog/truth-about-anonymized-data/; https://gdpr-info.eu/issues/personal-data/ consulted on the 5th of July 2020;
2. https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/key-definitions/what-is-personal-data/ consulted on the 5th of July 2020;
3. https://iclg.com/practice-areas/data-protection-laws-and-regulations/usa consulted on the 5th of July 2020; https://www.mondaq.com/unitedstates/privacy-protection/803300/what-is-personal-information-in-legal-terms-it-depends?signup=true consulted on the 5th of July 2020;
4. https://www.cookiebot.com/en/ccpa-personal-information-ccpa-compliance-with-cookiebot/ consulted on the 5th of July 2020;
5. https://www.visualcapitalist.com/personal-data-ecosystem/ consulted on the 7th of July 2020;
6. https://www.digitorc.com/re-identification-of-anonymised-data-sets/ consulted on the 24th of July 2020.

11. APPENDIX

Questions asked to Me Capt on July 23, 2020 in Geneva:

1. Do you share Paul Ohm's opinion that data can be either useful or anonymous, but not both?

Not entirely, since anonymous data can be of very great use. One example is data used
for statistical purposes. Paul Ohm's sentence is obviously intended to provoke and to
make people think. That said, it is clear that in our world of machine learning and big
data, black gold is drawn from the flourishing derrick of personal data and not from
that of anonymous data.

2. One might think that the definition of personal data is crucial, since it delimits the
scope of application of data protection laws. In the end, this is not the case if one
follows the so-called dynamic approach of the GDPR. What do you think?

It is crucial, since if data is not "personal", that is, if it does not relate to an
identified or identifiable person, the GDPR does not apply. That said, the definition
is not immutable.

3. When data is properly anonymized (to the extent reasonably possible), it no longer
falls within the scope of application of the GDPR or the CCPA. Does it therefore escape
all protection? (What about the EU E-directive? What about the USA?)

By its nature, it then escapes data protection regulations. But other legal instruments
also apply. One example is official secrecy, which applies even in the absence of
personal data. Data/information can therefore be protected even in the absence of
personal data. Another example is Art. 261 of the Swiss Criminal Code (violation of
manufacturing or trade secrets).

4. How can one ensure that anonymized data does not fall back into re-identification?

Re-identification is the Achilles heel of data protection. There is no way to be 100%
sure of it. Once re-identified, the data becomes personal again, with all that this
implies. True anonymization (as opposed to pseudonymization) should in principle
guarantee that re-identification is not possible.

5. Finally, would it not be preferable to stop making this dichotomous distinction between
personal data and anonymized data and to place everything under the aegis of data
protection laws?

That seems to me entirely impracticable. I am more in favour of waging a fierce fight
against the real perils that await us (facial recognition, biometrics, etc.).

6. Do you believe in data anonymization as a means of protecting privacy? (Big data keeps
growing richer, and machines are increasingly powerful, while the law struggles to keep
pace with all these technological advances.)

Not really... since current and future devices are by definition hungry for personal
data. And this often merely responds to users' more or less conscious demands for
personalization and customization. It is a circle (vicious or virtuous, as the case
may be).

7. What do you think of Covid-19 contact-tracing applications? (Use of Bluetooth; data
stored in a decentralized manner; content delivery network at Amazon for the SwissCovid
application.)

I think the debate is rather pointless. Even the worst-case scenario does not come close
to the risks we run every day on social networks. Above all, it reawakens dormant
anxieties about the prying State and, in particular, the secret files affair.
