Artificial intelligence and privacy
Contents
ABOUT THIS REPORT ................................................................................................................................................. 4
Legal sources and use of terminology ............................................................................................................... 4
About this report

Most applications of artificial intelligence (AI) require huge volumes of data in order to learn and make intelligent decisions. Artificial intelligence is high on the agenda in most sectors owing to its potential for radically improved services, commercial breakthroughs and financial gains. In the future we will face a range of legal and ethical dilemmas in the search for a balance between considerable social advances in the name of AI and fundamental privacy rights. This report aims to describe, and help us understand, how our privacy is affected by the development and application of artificial intelligence.

The Norwegian Data Protection Authority (DPA) believes it is imperative that we further our knowledge about the privacy implications of artificial intelligence and discuss them, not only in order to safeguard the right to privacy of the individual, but also to meet the requirements of society at large.

If people cannot trust that information about them is being handled properly, it may limit their willingness to share information – for example with their doctor, or on social media. If we find ourselves in a situation in which sections of the population refuse to share information because they feel that their personal integrity is being violated, we will be faced with major challenges to our freedom of speech and to people's trust in the authorities. A refusal to share personal information will also represent a considerable challenge with regard to the commercial use of such data in sectors such as the media, retail trade and financial services.

This report elaborates on the legal opinions and the technologies described in the 2014 report «Big Data – data protection principles under pressure».1 In this report we provide greater technical detail in describing artificial intelligence (AI), while also taking a closer look at four relevant AI challenges associated with the data protection principles embodied in the GDPR:

- Fairness and discrimination
- Purpose limitation
- Data minimisation
- Transparency and the right to information

The above list is not exhaustive, but represents a selection of the data protection concerns that, in our opinion, are of most relevance for the use of AI today. In addition, the report considers the role of the DPA as the supervisory body for AI applications. Finally, we provide a number of examples of methods and tools, and recommendations for safeguarding privacy in the development and use of AI.

The target group for this report consists of people who work with, or who for other reasons are interested in, artificial intelligence. We hope that engineers, social scientists, lawyers and other specialists will find this report useful.

Producing this report has been a learning process for the staff of the Norwegian DPA, and we learned a lot from the experiences and appraisals of artificial intelligence and data protection shared by the stakeholders we were in touch with during the process. We are most grateful to Inmeta, Privacy International, the Financial Supervisory Authority of Norway, Google, Sintef, the Norwegian University of Science and Technology (NTNU), Big Insight at the University of Oslo and the Norwegian Computing Center, Sparebank 1 Stavanger, the Information Commissioner's Office in the UK, the Office of the Privacy Commissioner in Canada, the Office of the Auditor General of Norway, and the Centre for Artificial Intelligence Research at the University of Agder.

Legal sources and use of terminology

In this report we use artificial intelligence as a collective term describing its various aspects, including machine learning and deep learning.

The basis for this report is the EU's General Data Protection Regulation (GDPR). This Regulation will be enshrined in Norwegian law in the form of a Personal Data Act, which will come into force on 25 May 2018.2 We have also drawn upon the Recitals of the Regulation in interpreting the contents of the articles. The recitals are not legally binding, but they explain the content of the articles.

Furthermore, we have also cited the statements made by the Article 29 Working Party and the guidelines it has set for individual automated decisions and profiling.3 The Article 29 Working Party is the European Commission's most senior advisory body on data protection and information security matters.
1 https://www.datatilsynet.no/om-personvern/rapporter-og-utredninger/temarapporter/big-data/
3 http://ec.europa.eu/newsroom/just/item-detail.cfm?item_id=50083
Artificial intelligence (AI) is the concept used to describe computer systems that are able to learn from their own experiences and solve complex problems in different situations – abilities we previously thought were unique to mankind. And it is data, in many cases personal data, that fuels these systems, enabling them to learn and become intelligent.

The development of AI has made some major advances in recent years and its potential appears promising: a better and more efficient public sector, new methods of climate and environmental protection, a safer society, and perhaps even a cure for cancer.

We are, in other words, embarking on a venture that will without doubt have a considerable impact on society. Accordingly, it is important for us to engage in discussion now. What sort of regulatory framework do we need in order to grasp the opportunities offered by AI in an assured and just manner? For we cannot escape the fact that the use of AI raises a number of concerns with respect to ethics, security, legal responsibility, etc. This report is devoted to one such concern: the use of personal data in AI and the issue of privacy.

From winter to spring – why now?

The concept of AI was known as far back as the 1950s as a technology in which people had high hopes of success. The initial progress made was, however, followed by many decades that are often called the AI winter, because the early expectations were not met. In recent years, though, we have witnessed the coming of spring.

Today we see AI used to solve specific tasks such as image and speech recognition. This is often called specialised AI. General AI refers to systems that are as versatile as humans when it comes to learning and problem solving. But it will probably be several decades before this is achieved.

… processing power and access to cheaper and greater storage capacity. Big Data often refers to vast volumes of data, extracted from multiple sources, often in real time.4 These enormous data streams can be utilised for the benefit of society by means of analysis and the finding of patterns and connections.

This is where AI can make a difference. While traditional analytical methods need to be programmed to find connections and links, AI learns from all the data it sees. Computer systems can therefore respond continuously to new data and adjust their analyses without human intervention. Thus, AI helps to remove the technical barriers that traditional methods run into when analysing Big Data.

Greater demand for data, more stringent regulations

The new data protection regulations that enter into force in May 2018 will strengthen our privacy rights, while intensifying the requirements made of those processing such data. Organisations will bear more responsibility for processing personal data in accordance with the regulation, and transparency requirements will be more stringent.

At the same time as the requirements are being intensified, demand for data is growing. AI-based systems can become intelligent only if they have enough relevant data to learn from.

An intelligent chatbot (a computer program that people can interact with by means of ordinary speech, or through written input) analyses all the information it is fed – a combination of questions posed by customers and responses communicated by customer service. From its analysis the chatbot can "understand" what a customer is asking about and is therefore able to give a meaningful answer. The greater the volume of information the chatbot can base its analysis on, the better and more precise the reply it gives will be.
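The retrieval idea behind such a chatbot can be sketched in a few lines: match an incoming question against previously recorded customer-service exchanges and reuse the best answer. This is a toy illustration only; the Q&A log and the word-overlap scoring are invented for the example:

```python
# Toy sketch of a retrieval chatbot: answer by finding the recorded customer
# question with the largest word overlap and reusing its stored answer.
# The Q&A log and the scoring rule are invented for illustration.

qa_log = {
    "how do i reset my password": "Use the 'forgot password' link on the login page.",
    "what are your opening hours": "We are open 09:00-16:00 on weekdays.",
    "how do i close my account": "Send a written request via the contact form.",
}

def reply(question):
    words = set(question.lower().replace("?", " ").split())
    # The more question/answer pairs the log contains, the better the match.
    best = max(qa_log, key=lambda q: len(words & set(q.split())))
    return qa_log[best]

print(reply("When are you open?"))  # reuses the opening-hours answer
```

A real chatbot learns a statistical model from the exchanges instead of keeping them verbatim, but the dependence on data volume is the same: more recorded exchanges give better matches.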
4 https://ico.org.uk/for-organisations/guide-to-data-protection/big-data/
There are two main aspects of artificial intelligence that are of particular relevance for privacy. The first is that the software itself can make decisions, and the second is that the system develops by learning from experience.

In order for a computer system to learn, it needs experience, and it obtains this experience from the information that we feed into it. This input may be in several different formats. If a system is sought that will only perform image recognition and analysis, the experiential data input will naturally enough consist of images. For other tasks the input data will consist of text, speech or numbers. Some systems utilise personal data, while other systems use data that cannot be linked to individuals.

Machine learning

In order to understand why AI needs huge volumes of data, it is necessary to understand how the system learns.

Developing AI requires the input of experiential data. Machine learning generally proceeds in this way (illustrated by Figure 1, from left to right):

1. Learning starts with selected information containing patterns or similarities.
2. By using machine learning, the patterns found in the information are identified.
3. A model is generated that can recognise the patterns that emerge when fresh data is processed by the model.

Model is an umbrella term for the final outcome of the learning. There are many different types of models, and it is these that are used in commercial applications – such as predicting the type of streamed TV series a consumer prefers. What these models have in common is that they contain the essence of the training data. As the data that the model will process in the future will seldom be completely identical with the training data, a generalisation is required. Certain data that deviates from the main bulk of the training data will therefore usually be removed from the model.

This is how the model works (illustrated by Figure 1, from top to bottom):

1. The model receives data similar to that used for learning.
2. The model decides which pattern the new data most resembles.
3. The model produces an estimated result.
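The two sequences of steps above can be sketched in code. This is an illustrative sketch only, using a simple nearest-centroid rule that is not taken from the report; all names and data are our own:

```python
# Sketch: "learning" finds the average pattern (centroid) for each label,
# and the resulting model classifies fresh data by the nearest centroid.
# Illustrative only; the data and function names are invented.

def learn(examples):
    """Learning steps 1-3: from labelled points, build a model (centroids)."""
    sums = {}
    for features, label in examples:
        total, count = sums.setdefault(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    # The model keeps only the generalised patterns, not the raw training data.
    return {label: [t / count for t in total]
            for label, (total, count) in sums.items()}

def predict(model, features):
    """Usage steps 1-3: decide which pattern the new data most resembles."""
    def distance(label):
        return sum((f - c) ** 2 for f, c in zip(features, model[label]))
    return min(model, key=distance)

training = [([1.0, 1.1], "cat"), ([0.9, 1.0], "cat"),
            ([5.0, 5.2], "dog"), ([5.1, 4.9], "dog")]
model = learn(training)
print(predict(model, [1.2, 0.8]))  # a fresh point close to the "cat" pattern
```

Note that the model here contains only the averaged patterns, which mirrors the point above that training data itself is generalised away.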
There are several forms of learning that can be used, depending on whether the information has been labelled or not. Labelled data is tagged data: if the data consists of images, the labels or tags may for example be gender, ethnicity, dog or cat.

Below we have listed the main forms of learning, and we describe how the data is used in each of them.

Supervised learning

Supervised learning involves the use of labelled data, by means of which the supervision is performed. The basic dataset is split in two, usually an 80/20 split, with 80 per cent of the data used to train the model. The remaining 20 per cent is used to verify how precisely the model processes unknown data. It is no good if the model performs accurately using the training data but inaccurately using new and unknown data. If the model is too well adjusted to the training data, which we call overfitting, it will not produce satisfactory results using new data. Therefore, the model requires a certain degree of generalisation.

Training data may for example consist of images labelled with information about the contents of each image. Supervised learning may be compared to teaching a child. For example, we point to a number of objects to the child and give them names. If we show a number of cats to a child, the child will gradually learn to recognise other cats than those originally shown. In similar fashion, a machine learning model will develop the same ability to recognise objects based on labelled images.

If one is working with a dataset and wishes to separate men and women, one can use different features that are of relevance. The features used will depend on the basic data available. For example, women live longer than men on average, so life duration is of relevance when differentiating between genders. This feature will, however, prove to be somewhat narrow in most cases, and is mentioned here only as an example. If one's data consists of images, then hair length, or the use of make-up or jewellery, may be relevant features. The example below illustrates how two different features are used in learning.

Learning takes place as follows (illustrated by Figure 2, from left to right):

1. A set of labelled data is used.
2. Depending on the data type, and what is considered relevant, the features (circles and triangles) to be used for learning are selected. The data is labelled to denote the right answer.
3. A model is built that, based on the same features, will produce a label.
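The 80/20 split and the check against unknown data described above can be sketched as follows. This is an illustrative sketch only; the synthetic dataset, the single "hair length" feature and the midpoint-threshold model are all invented for the example:

```python
import random

# Sketch: split a labelled dataset 80/20, train on the first part,
# and verify precision on the held-out 20 per cent of "unknown" data.
# The dataset and the threshold model are invented for illustration.

random.seed(0)
# Each example: (feature value, label) - say, hair length in cm and a gender label.
data = ([(random.gauss(15, 4), "woman") for _ in range(100)] +
        [(random.gauss(5, 4), "man") for _ in range(100)])
random.shuffle(data)

split = int(len(data) * 0.8)
train, test = data[:split], data[split:]   # 80 per cent / 20 per cent

# "Training": pick the threshold halfway between the two class averages.
def class_average(label):
    values = [x for x, y in train if y == label]
    return sum(values) / len(values)

threshold = (class_average("woman") + class_average("man")) / 2
predict = lambda x: "woman" if x > threshold else "man"

# Verification: how precisely does the model process the unknown 20 per cent?
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"accuracy on held-out data: {accuracy:.2f}")
```

Comparing the accuracy on the training data with the accuracy on the held-out data is exactly the overfitting check described above: a large gap between the two means the model has memorised rather than generalised.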
We will often also know which features of labelled data are most decisive for correct categorisation or for producing the right result. It is important to have persons with sound knowledge of the field in question in order to identify the most relevant features. The correct selection of relevant features may be of much more importance than the amount of data, an issue we will be addressing later. One advantage of labelled data is that it enables an easy check of the model's precision.

When we use the model, the following takes place (illustrated by Figure 2, from top to bottom):

Unsupervised learning

… would be for this data, to the greatest extent possible, to be sorted into two groups – one consisting of images of dogs, and the other of cat images.

Learning proceeds as follows (illustrated by Figure 3, from left to right):

1. A dataset is used in which there must be a certain number of similarities, or patterns, if it is to be meaningful.
2. The patterns are revealed.
3. A model is built that can recognise and differentiate patterns.
Reinforcement learning
This form of learning is based on trial and error, as well as on optimisation, as the model learns which actions are targeted towards the goal. This means that less data, or no data at all, is needed for the system to learn.

AlphaGo Zero

Earlier we mentioned AlphaGo as an example of machine learning. AlphaGo was first trained using a data set consisting of 30 000 games of Go. In order to further improve AlphaGo's ability to play Go, it was programmed to play against itself. Its experiential basis could be enlarged considerably through trial and error, without it needing to obtain data from more games. It also gave AlphaGo the opportunity of discovering moves and strategies not in the original training set.

The latest version – AlphaGo Zero – was devised in order to start playing without using training data. It was programmed only with the rules of Go, and was not fed any information about previously played games. It then learned by playing against itself. After 40 days it was able to beat the previous AlphaGo version 100-0. It is also interesting to note that the Zero version of AlphaGo requires much less computing power to achieve these results.

(Source: https://deepmind.com/blog/alphago-zero-learning-scratch/)

Decision trees represent one exception to this, as they contain a varying degree of the model's data basis. The limits here depend on whether the tree is "pruned" after learning, or a level limitation is set for learning. One or the other will normally be chosen, as the model should generalise and not overfit. In a deep-learning model, the basic data will be represented as numerical values in the neural network. It should, therefore, not be possible to retrieve any personal data used to train the model. We shall take a closer look at these models a little later, in the section entitled The Black Box.

Model use – static and dynamic (offline/online)

A model may be used in two ways. The first is to use a static, or offline, model that will not change through use. A static model will always, as the name suggests, operate in the same way and produce the same results throughout its entire lifecycle. All new model training will take place in a test environment, and all changes require that the model be replaced by a new version. This means that full control is maintained of the model in use.

The other possibility is provided by a dynamic, or online, model. The model is used in a similar fashion to the static model. The difference, however, is that the dynamic model is able to avail itself of input data in order to improve and adjust to changes. This may, for example, be necessary in connection with the monitoring of credit card transactions in order to reveal fraud. The transactions may change according to the user's life situation, or in relation to his job, by for example taking place in completely new locations. These new usage patterns could well be labelled suspicious by a static model and potentially result in a blocked credit card. A model can therefore become less accurate over time if it is not continuously updated.
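The difference between static and dynamic use can be sketched with a toy fraud check: both models flag transactions far from the user's usual location, but only the dynamic model updates its notion of "usual" as new data arrives. This is an invented illustration; the class names, the one-dimensional "location" and the running-average update rule are all our own:

```python
# Sketch: a static (offline) model keeps the pattern it was trained with,
# while a dynamic (online) model updates the pattern with every input.
# Toy example invented for illustration; not from the report.

class StaticModel:
    def __init__(self, usual_location):
        self.usual = usual_location            # fixed after training

    def suspicious(self, location, limit=100.0):
        return abs(location - self.usual) > limit

class DynamicModel(StaticModel):
    def suspicious(self, location, limit=100.0):
        flagged = super().suspicious(location, limit)
        # Online learning: move "usual" towards each new transaction.
        self.usual += 0.5 * (location - self.usual)
        return flagged

static = StaticModel(0.0)
dynamic = DynamicModel(0.0)

# The user moves to a new city: transactions now cluster around 500.
for loc in [500.0, 505.0, 510.0, 500.0, 505.0]:
    print("static:", static.suspicious(loc), " dynamic:", dynamic.suspicious(loc))
# The static model keeps flagging the user's new normal; the dynamic one adapts.
```

The flip side shown here is exactly the control problem the text raises: the dynamic model's behaviour after a few transactions is no longer the behaviour that was tested before deployment.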
Like a spreadsheet, a dataset for machine learning may consist of rows and columns. If one has person-related data, the columns may well denote a person's age, gender, address, marital status, height, weight, nationality, etc. The rows will represent individual persons. Consideration must be given to the quantity of personal information needed in order to train the desired models, as well as its relevance to the chosen purpose.

In selecting relevant features, there will often be a need for persons who are expert in the relevant fields. It is not always the case that the basic data tells the whole story.

Good selection is important, otherwise one risks ending up with too many features, or what specialists call "the curse of dimensionality". Put simply, this means that an excessive number of features will result in matches being lost amongst all the non-matching data. This will mean that enormous volumes of data will be needed by way of compensation.

It is worth mentioning here that deep learning is somewhat of an exception in this respect. The selection and adjustment of features are not as important as they are in other learning methods, because feature selection is in effect conducted via the value weights in the neural network. The disadvantage of not making selections is that one needs a vastly greater volume of training data.

Feature engineering

An important factor in achieving good results is how the dataset is presented. Relevant correlations may be concealed if the data is not used properly. In many instances there is a great deal more to be gained by smart data use than by increasing the amount of data.

Dates are one example. Let us consider the date 1.10.2017, which tells us it is the first day in the month and the tenth month of the year. It might well be that the information would be more useful if we could
convert it to show what day of the week it is: a Sunday in this case.

In Norway, where there are four quite distinct seasons, we might consider grouping the months in order to represent the data in a better way. Month 10 could then be represented as autumn. Autumn itself could be represented by the numerical value 3, while spring would be 1, summer 2, and winter 4. In this way we can derive more features from a data item, or reduce the number of different values. If data is extracted from multiple sources, steps should be taken to ensure that they are in the same format. In US data, for example, the month will be denoted by the 1 and the day by the 10 in the date format 1.10.2017.

The normalisation of features may also be necessary in order to ensure that certain features do not create an imbalance in the training data, or that a few extreme values do not adversely affect the rest. Put simply, we …

The Black Box

One concern in relation to machine learning is that one does not always know how the result is produced. Which features, or which combinations of features, are the most important? A model will often produce a result without any explanation. The question then arises as to whether it is possible to study the model, and thus find out how it arrived at that specific result.

As mentioned above, specialists at the Norwegian Tax Administration have built a predictive model that helps them select the tax returns to be scrutinised more closely. They state the following: "When we build a model in this way, we don't necessarily know what it is that gives a tax payer a high ranking for error risk. The ranking is the result of complex data aggregation in the model."
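Returning briefly to feature engineering: the date example above can be sketched in a few lines. The season codes follow the text (spring 1, summer 2, autumn 3, winter 4), while the function name and the output format are our own:

```python
from datetime import date

# Sketch of the feature engineering described above: turn the raw date
# 1.10.2017 into a weekday name and a season code (spring=1, summer=2,
# autumn=3, winter=4, as in the text). Names are our own.

SEASONS = {12: 4, 1: 4, 2: 4,    # winter
            3: 1, 4: 1, 5: 1,    # spring
            6: 2, 7: 2, 8: 2,    # summer
            9: 3, 10: 3, 11: 3}  # autumn

def engineer(day, month, year):
    d = date(year, month, day)
    return {"weekday": d.strftime("%A"), "season": SEASONS[month]}

print(engineer(1, 10, 2017))  # {'weekday': 'Sunday', 'season': 3}
```

Two derived features replace one raw value: the weekday adds information the raw date concealed, and the season reduces twelve month values to four.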
Neural networks

Neural networks are used in a methodology that is largely inspired by our understanding of the way the human brain functions. These networks are built from what is basically a very simple component (a perceptron), but very many of these components can be used to create large and complex networks.

A perceptron, illustrated below, has a variable number of inputs and one output.

Each «leg» of the perceptron has a weight value. This value determines how great the influence of the input feature on the final result will be. These values are adjusted when the network is trained to give the desired results. This is often carried out by working backwards in the network to adjust the values of the relevant perceptrons so that the final result is right.

If there is more than one hidden layer, then this is considered to be deep learning. In the above figure we have a single neural network in which all the input data moves from left to right and emerges as a result. There are several variants of these neural networks. Some form loops and also send the data from right to left within the network before the final result is produced.

One of the challenges here is that the input data is viewed in isolation. In many situations we work with information that has a context. For example, some words carry different meanings depending on their context. This context does not need to be formed by the same sentence. This is part of the reason why some neural networks have a form of short-term memory. This allows them to produce different outputs based on the data that was processed previously, which of course makes it more difficult to determine how a result was derived. This also means that it can be very difficult to merely examine the algorithms to find out how they work and what decisions they reach.
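A perceptron as described, a weighted sum of inputs passed through a simple activation, can be sketched as follows. The weights, bias and step activation here are hand-picked for illustration; in a real network, training would adjust them:

```python
# Sketch of a perceptron: each input "leg" has a weight that determines
# its influence on the output; training would adjust these weights.
# The weights and threshold below are hand-picked for illustration.

def perceptron(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if weighted_sum > 0 else 0   # simple step activation

# A hand-weighted perceptron behaving like a logical AND of two inputs:
weights, bias = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], weights, bias))
```

Stacking many such units into layers, and feeding the outputs of one layer as inputs to the next, gives the networks described above; with more than one hidden layer, this is deep learning.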
In addition, the data controller is responsible for, and shall be able to prove, compliance with the principles (the accountability principle).

In the following we review the most important data protection challenges associated with developing and using artificial intelligence. We look at these challenges in the light of the data protection principles that are most relevant to artificial intelligence – namely the principles of fairness, purpose limitation, data minimisation and transparency.

Algorithmic bias meets the fairness principle

It is easy to think that artificial intelligence will be able to perform more objective analyses and therefore reach better decisions than human beings. After all, artificial intelligence will not be affected by low blood sugar, by having a bad day, or by the desire to help a friend.

And yet algorithms and models are no more objective than the people who devise and build them, and the personal data that is used for training. The model's result may be incorrect or discriminatory if the training data renders a biased picture of reality, or if it has no relevance to the area in question. Such use of personal data would be in contravention of the fairness principle.

This principle requires all processing of personal information to be conducted with respect for the data subject's interests, and that the data be used in accordance with what he or she might reasonably expect. The principle also requires the data controller to implement measures to prevent the arbitrary discriminatory treatment of individual persons. The Regulation's recitals describe the use of suitable mathematical or statistical procedures as possible measures here.

This would not, however, be sufficient of itself to ensure compliance with the principle. The model must also be trained using relevant and correct data, and it must learn which data to emphasise. The model must not emphasise information relating to racial or ethnic origin, political opinion, religion or belief, trade union membership, genetic status, health status or sexual orientation if this would lead to arbitrary discriminatory treatment.

If it is suspected, or claimed, that the use of a model will entail unfair or discriminatory results, the Data Protection Authority can investigate whether the principle of fairness has been safeguarded in the processing of personal data. These investigations may include a review of the documentation underpinning the selection of data, an examination of how the algorithm was developed, and whether it was properly tested before it came into use.

Example

A claim of AI-based discrimination was levelled against a US system for setting bail conditions and sentencing. The system is used to predict the risk of a convicted person committing a new crime.

ProPublica studied the decisions reached by the system and concluded that it discriminated against black defendants. The number of blacks erroneously flagged as being high re-offending risks was twice as high as the number of whites so classified.

The company that developed the software disagreed with ProPublica's conclusion, but it was unwilling to allow the criteria and calculations used in developing the algorithm to be examined. It is therefore impossible for the convicted person, or the general public, to obtain clear information as to why and how such decisions are reached.

(Source: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)

Artificial intelligence meets the principle of purpose limitation

Many of the models developed using artificial intelligence will be used in connection with good causes, such as cancer diagnosis. Are we permitted to use personal data unrestrictedly as long as it is for a good cause?

The purpose limitation principle means that the reason for processing personal data must be clearly established and indicated when the data is collected. This is essential if the data subject is to exercise control over the use of his/her information. The purpose of the …
This list is not exhaustive, and all issues that are relevant in the individual case must be included in the appraisal.

The purpose limitation principle is highly important in ensuring that the data subject exercises control over his or her own personal information. There are, however, exceptions to the principle. Further processing of data is, for example, considered to be compatible with the original purpose if it takes place in connection with scientific or historical research, or for statistical and archival purposes in the public interest. This begs the question as to what constitutes scientific research, and to what extent the development and application of artificial intelligence is scientific research.

More and more university and hospital research environments are working on developing tools that use artificial intelligence. Examples include models that identify the risk of tax or social benefit fraud, or image recognition software that diagnoses cancer in tumours. But how do we really define scientific research?

The General Data Protection Regulation does not define what constitutes scientific research. A general understanding of the concept, however, is that it must relate to efforts aimed at discovering new knowledge or know-how.10 The GDPR's recitals (Recital 159) state that scientific research should be interpreted broadly and include technological development and demonstration, basic research, as well as applied and privately financed research. These elements would indicate that – in some cases – the development of artificial intelligence may be considered to constitute scientific research.

Applying artificial intelligence to assess a person's creditworthiness cannot, however, be said to be aimed at gaining new knowledge. In this case, the use of artificial intelligence cannot be defined as scientific research. But is it always possible to differentiate between the development and the application of AI?
When the completed model is static (offline), not possible to predict what the algorithm will learn. The
development and use can be clearly differentiated. A purpose may also be changed as the machine learns and
model developed using training data is tested on similar develops. This challenges the data minimisation
data before it is used. Once the model is put into use, the principle as it is difficult to define which data is
training data is removed from the algorithm and the necessary.
model will only process the personal data to which it is
applied, such as information about loan applicants. However, data minimisation is more than a principle
Because the algorithm is static, it will not learn anything limiting the amount of detail included in training or in
further from the personal data it is currently processing. Consequently, it will not develop its intelligence further once it has been put into use.

Other models develop and improve continuously as they are fed more personal data. These include models that provide decision support for doctors. The model learns something new about every patient it receives data about, or every scientific article it reads. This new knowledge can then be used on the next patient.

When a model develops on a continuous basis, it is difficult to differentiate between development and use, and hence to say where research stops and usage begins. It is therefore difficult to reach a conclusion regarding the extent to which the development and use of these models constitute scientific research. The limits on what constitutes scientific research will need to be reviewed once the new data protection regulations come into force.

We emphasise that the use of personal data for scientific research is governed by specific rules in the GDPR (Article 89). Use in such instances must be subject to appropriate safeguards to secure the data subject’s rights and freedoms. The safeguards must ensure that technical and organisational measures are in place to protect the data minimisation principle in particular.

Artificial intelligence meets data minimisation

It often takes huge amounts of personal data to develop artificial intelligence.

On the other hand, the principle of data minimisation requires that the data used shall be adequate, relevant and limited to what is necessary for achieving the purpose for which the data is processed. This means that a controller cannot use more personal data than is necessary, and that the information selected must be relevant to the purpose. This requirement applies to both the development and the use of a model. The principle also stipulates proportionality, which restricts the extent of the intervention in a data subject’s privacy that the use of personal data can involve. This may be achieved by making it difficult to identify the individuals contained in the basic data. The degree of identification is restricted by both the amount and the nature of the information used, as some details reveal more about a person than others. The use of pseudonymisation or encryption techniques protects the data subject’s identity and helps limit the extent of the intervention.

This principle also forces developers to thoroughly examine the intended area of application of the model, to facilitate selection of the relevant data necessary for the purpose. Furthermore, the developer must consider how to achieve the objective in a way that is least invasive for the data subjects. The assessments performed need to be documented, so that they can be presented to the Data Protection Authority in the event of an inspection, or in connection with a preliminary discussion.

Data protection impact assessment

Before personal information is processed, the impacts on data protection must be assessed if it is likely that the processing will represent a risk to the rights and freedoms of natural persons. This is particularly the case when using new technology, and consideration must be given to the nature of the processing, its scope and purpose, and the context in which it is performed.

If the risk is high, and the data controller cannot limit it, he or she is duty bound to initiate preliminary discussions with the Data Protection Authority.
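One way to realise the pseudonymisation mentioned above is to replace direct identifiers with keyed hashes before the data is used for model development. The following is a minimal Python sketch; the field names, key and record are invented for illustration, and the choice of HMAC-SHA256 is an assumption rather than a prescription from this report:

```python
import hashlib
import hmac

# Illustrative only: in practice the key must be generated securely and
# stored separately from the pseudonymised dataset (e.g. in a key vault).
SECRET_KEY = b"replace-with-a-securely-stored-key"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier always yields the same pseudonym, so records can
    still be linked during model development, while the identity cannot
    be recovered without access to the key.
    """
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# A hypothetical patient record: the direct identifier is replaced,
# while the attributes the model actually needs are kept.
record = {"national_id": "01017012345", "diagnosis": "J45"}
record["national_id"] = pseudonymise(record["national_id"])
```

Because the mapping is deterministic, the pseudonyms remain linkable across records, which is exactly why pseudonymised data still counts as personal data under the GDPR: whoever holds the key can re-identify the data subjects.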
Although it is difficult to establish in advance the exact information that will be necessary and relevant to the development of an algorithm – and this may change during the project – it is essential that the data minimisation principle is adhered to by means of continuous assessment of the actual requirements. This not only protects the rights of the data subjects, but also minimises the risk of irrelevant information leading the algorithm to find correlations that are coincidental rather than significant, and to which no weight should be attached.

The pressure to use personal data is intensifying as AI-based analyses are employed to promote increased efficiency and better services. The Data Protection Authority believes that the principle of data minimisation should play a major role in the development of artificial intelligence, so that the rights of data subjects are protected and general confidence in the models is retained.

It is also challenging that information about the model may reveal commercial secrets and intellectual property rights, which, according to the GDPR’s preface (Recital 63), the right of access must avoid. Consideration of others’ rights, such as the commercial secrets of an organisation, may nevertheless not be used to deny a data subject access to all data relating to her. The answer is to find a pragmatic solution. In most cases, furnishing the data subject with the information she needs to protect her interests, without at the same time disclosing trade secrets, will not be problematical.

Although AI is complex, and difficult to understand and explain, the principle of transparent processing of personal data applies with full force in the development and use of artificial intelligence.

Below we will discuss the duty to inform and the rights of data subjects.

General information

When personal data is collected, the data controller must always provide some general information, such as

Essentially, automated individual decisions are not permitted. Exceptions apply, however, if the automated decision is a necessary condition for entering into a
contract, is permitted by law, or is based on the explicit consent of the data subject. The regulation does not define what constitutes explicit consent as opposed to ordinary consent, but the phrase indicates that an express gesture by the data subject is required.

In order to meet the requirements of the Regulation, the decision must be based solely on automated processing, and it must produce a legal effect, or similarly significantly affect a person.

That an automated decision must be based solely on automated processing means that there cannot be any form of human intervention in the decision-making process. “Human intervention” means that a natural person must have undertaken an independent assessment of the underlying personal data, and be authorised to re-examine the recommendations the model has produced. The rules governing automated decision-making cannot be circumvented by fabricating human intervention.

What is meant by legal effect is not defined in the preface. It would be natural to understand this phrase as meaning that the automated decision must impact the data subject’s rights or duties, such as legal rights, or the rights set out in a contract. See the examples listed in the fact box.

Nor is the alternative, that the automated decision similarly significantly affects a person, defined more closely. We assume that the decision must have the potential to affect the circumstances, behaviour or choices of the person who is subject to the automated decision. Yet it is difficult to state precisely where the line should be drawn, as there are considerable subjective elements in such an appraisal.

When automated decisions are applied, measures must be implemented to protect the data subject’s rights, freedoms and rightful interests. The data subject must be able to demand that a human being takes the final decision, and she must have the right of appeal.

Special categories of personal data

Automated decisions that involve special categories of personal data (sensitive personal data) are permitted only if the data subject has consented, or if they are legally warranted.

It is important to be aware that the alignment of different types of personal data can reveal sensitive information about individuals. Operating with this data will involve the processing of special categories of personal data.

For example, one study combined “likes” on Facebook with information from a simple survey and predicted male users’ sexual orientation with an accuracy of 88 per cent. Moreover, they predicted ethnicity with 95 per cent accuracy, and whether a user was Christian or Muslim with 82 per cent accuracy.11 A study of this nature is subject to the same legal obligations pursuant to the GDPR as if sensitive personal data had been processed from the outset.

An insurance company that, for example, sets its motor insurance premiums on the basis of policy holders’ driving patterns should inform its customers of the possible impacts of this, and that careless driving can lead to higher premiums.

The data subject must receive the information described here before automated processing commences. It will enable the data subject to lodge a complaint against processing, or to consent to it.

11 Michael Kosinski, David Stillwell and Thore Graepel, «Private traits and attributes are predictable from digital records of human behaviour», Proceedings of the National Academy of Sciences of the United States of America: http://www.pnas.org/content/110/15/5802.full.pdf
12 See for example Andrew Burt, «Is there a right to explanation for machine learning in the GDPR?»: https://iapp.org/news/a/is-there-a-right-to-explanation-for-machine-learning-in-the-gdpr/; cf. Sandra Wachter, Brent Mittelstadt and Luciano Floridi, «Why a right to explanation of automated decision-making does not exist in the General Data Protection Regulation», International Data Privacy Law, forthcoming: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2903469
change in order for a different decision to be reached.13 The data subject must be informed as to how he can oppose the decision, either by appealing or by requesting human intervention.

Does one have a right to an explanation when a human being makes a decision based on the model’s recommendation?

In that case the decision is not solely automated, and the GDPR does not grant the right to be given an explanation of the decision.

Even though there is no right to an explanation when a decision is not automated, the transparency principle requires that the data controller should give an explanation similar to those given for automated decisions.

13 See for example Sandra Wachter, Brent Mittelstadt and Chris Russell, «Counterfactual explanations without opening the black box: automated decisions and the GDPR».
14 See the guidelines on the responsibilities of enterprises under the GDPR on the Norwegian DPA’s web site (in Norwegian): https://www.datatilsynet.no/regelverk-og-skjema/veiledere/virksomhetens-ansvar-etter-nytt-regelverk
Solutions and recommendations

A data protection principle which underpins all AI development and applications is accountability. This principle is central to the GDPR and places greater responsibility on the data controller for ensuring that all processing is conducted in compliance with the rules. Data processors are also bound by the accountability principle.

In this chapter we shall present examples of tools and solutions that can help the data controller to comply with the rules. But first, we will discuss two of the requirements in the GDPR that are especially important in connection with the development and application of AI: data protection impact assessment (DPIA) and privacy by design. Following that, we look at tools and methods that can help protect privacy in systems that use AI. Finally, we shall propose some recommendations for developers, system suppliers, organisations buying and using AI, end users and the authorities.

Assess the data protection impact – and build privacy into your system!

The new data protection regulations will enhance the rights of individuals. At the same time, the duties of organisations will be tightened up. Two new requirements that are especially relevant for organisations using AI are privacy by design and DPIA.

Privacy by design

The data controller shall build privacy protection into the systems and ensure that data protection is safeguarded in the system’s standard settings. These requirements are described in Article 25 of the GDPR and apply when developing software, ordering new systems, solutions and services, as well as when developing these further.

The rules require that data protection is given due consideration in all stages of system development, in routines and in daily use. Standard settings shall be as protective of privacy as possible, and data protection features shall be embedded at the design stage.15 The principle of data minimisation is expressly mentioned in the provision relating to privacy by design.

Data Protection Impact Assessment

Anyone processing personal data has a duty to assess the risks involved. If an enterprise believes that a planned process is likely to pose a high risk to natural persons’ rights and freedoms, it has a duty to conduct a data protection impact assessment (DPIA). This is described in Article 35 of the GDPR.

When a risk is assessed, consideration shall be given to the nature, scope, context and purpose of the process. The use of new technology must also be taken into account. Moreover, there is a requirement to assess the impact on personal privacy by systematically and extensively considering all personal details in cases where this data is used in automated decision making, or when special categories of personal data (sensitive personal data) are used on a large scale. The systematic and large-scale monitoring of public areas also requires documentation showing that a DPIA has been conducted.

The impact assessment should include the following as a minimum:

- a systematic description of the process, its purpose and which justified interest it protects
- an assessment of whether the process is necessary and proportional, given its purpose
- an assessment of the risk that processing involves for people’s rights, including the right to privacy
- the measures selected for managing the risks identified

The DPA shall be involved in preliminary discussions should an impact analysis reveal that the planned process may represent a high risk for data subjects, and that the risk cannot be reduced by the data controller (GDPR Article 36).

Tools and methods for good data protection in AI

Artificial intelligence is a rapidly developing technology. The same applies to the tools and methods that can help meet the data protection challenges posed by the use of AI. We have collected a number of examples to illustrate some of the available options. These methods have not been evaluated in practice, but are assessed according to their potential. This means that they may be technically unsuitable today, but the concepts are exciting, and they have the potential for further research and future use.

We have placed the methods in three categories:

- Methods for reducing the need for training data.
- Methods that uphold data protection without reducing the basic dataset.
- Methods designed to avoid the black box issue.

1. Methods for reducing the need for training data

One of the challenges we have pointed out in this report is that there is often a need for huge amounts of data during machine learning. However, by selecting the right features, and adjusting them appropriately, the data requirement can be reduced. Here is a selection of methods that can help achieve this:

Generative Adversarial Networks16
Generative Adversarial Networks (GAN) are used for generating synthetic data. As of today, GAN has mainly been used for the generation of images, but it also has the potential to become a method for generating huge volumes of high-quality, synthetic training data in other areas. This would satisfy the need for both labelled data and large volumes of data, without the need to utilise great amounts of data containing real personal information.

Federated learning17
This is a form of distributed learning. Federated learning works by downloading the latest version of a centralized model to a client unit, for example a mobile phone. The model is then improved locally on the client, on the basis of local data. The changes to the model are then sent back to the server, where they are consolidated with the change information from models on other clients. An average of the changed information is then used to improve the centralized model. The new, improved centralized model may now be downloaded by all the clients. This provides an opportunity to improve an existing model on the basis of a large number of users, without having to share the users’ data.

Matrix capsules18
Matrix capsules are a new variant of neural networks that require less data for learning than what is currently the norm for deep learning, which is a clear advantage.

2. Methods that protect privacy without reducing the data basis

The optimal solution would be if one could use as much data as one wished for machine learning, without compromising privacy. The field of cryptology offers some promising possibilities in this area:

Differential privacy19
Let us, for example, start with a database that contains natural persons and features related to these persons. When information is retrieved from the database, the response will contain deliberately generated “noise”, enabling information to be retrieved about persons in the database, but not precise details about specific individuals. A database must not be able to give a markedly different result to a query depending on whether an individual person is included in the database or not. The overriding trends or characteristics of the dataset will not change.

Homomorphic encryption
This is an encryption method that enables the processing of data whilst it is still encrypted. This means that confidentiality can be maintained without limiting the usage possibilities of the dataset. At present, homomorphic encryption has limitations, which mean that systems employing it will operate at a much lower rate of efficiency. The technology is promising, however.

16 https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
17 https://research.googleblog.com/2017/04/federated-learning-collaborative.html
18 https://openreview.net/pdf?id=HJWLfGWRb
19 https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf, https://arxiv.org/abs/1412.7584
Microsoft, for example, has published a white paper on a system that uses homomorphic encryption in connection with image recognition.20 Active efforts are also underway to standardise homomorphic encryption solutions.21

Transfer learning22
It is not always necessary to develop models from scratch. Another possibility is to utilise existing models that solve similar tasks. By basing processing on these existing models, it will often be possible to achieve the same result with less data and in a shorter time. There are libraries containing pre-trained models that can be used.

RAIRD
Statistics Norway (SSB) and the Norwegian Centre for Research Data (NSD) have developed a system called RAIRD23 that permits research to be carried out on data without having direct access to the complete dataset.

In short, this system works by means of an interface that allows the researchers to access only the metadata of the underlying dataset. The dataset may, for example, be a cancer diagnosis register containing fields for age, gender, and date and place of birth. The researcher can then submit queries based on the metadata and obtain a report containing aggregated data only.

This solution has been designed to prevent the retrieval of data relating to very small groups and individual persons. This type of system can therefore be used when data for machine learning is needed. Instead of receiving a report as an end result, one could obtain a model from the system.

3. Methods designed to avoid the black box issue

were, in order to quality assure and improve their products.

Explainable AI (XAI)24
XAI is the idea that all automated decisions made should be explicable. With people involved in a process, it is often desirable that an explanation is given for the outcome. There are some interesting possibilities in two areas. There will also be a need to be able to control systems that do not have this embedded. It will probably also be attractive for developers employing transfer learning.

There is also a project underway in this field, being run by the Defense Advanced Research Projects Agency (DARPA), where the objective is to gain more knowledge about providing understandable explanations for automated decisions. DARPA has sponsored Oregon State University, awarding USD 6.5 million over four years for research into this topic. The goal is to create AI that can explain its decisions in a way that is understandable and promotes confidence in using the system. There are in any case good grounds for believing that this research will drive the field forward.

LIME25
LIME is an approach to XAI. It is a model-agnostic solution that produces explanations ordinary people can understand. In the case of image recognition, for example, it will be able to show which parts of the picture are relevant for what it thinks the image is. This makes it easy for anyone to comprehend the basis for a decision.

20 https://www.microsoft.com/en-us/research/publication/cryptonets-applying-neural-networks-to-encrypted-data-with-high-throughput-and-accuracy/
21 http://homomorphicencryption.org/
22 http://www.cs.utexas.edu/~ml/publications/area/125/transfer_learning
23 http://raird.no/
24 https://www.darpa.mil/program/explainable-artificial-intelligence
25 https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime
Recommendations for privacy friendly development and use of AI

In the following we propose a number of recommendations for protecting personal data when developing and using AI.

Recommendations for developers of AI

These recommendations are meant for actors pursuing AI research and development. They will in the main be research milieus in universities and large commercial organisations. These constitute an important target group because they are developing the basic technology that is the basis for further application of AI.

- Conduct research into how intelligent systems can be made more privacy friendly, such as how AI systems can be designed in order to make it easy for users to comply with the regulations. Research can, for example, be carried out on solutions that use less training data, on anonymisation techniques, and on solutions that explain how systems process data and how they reach their conclusions. Other interesting research areas include how to conduct system audits to ensure the system isn’t biased, especially audits by third parties.
- Adopt a multidisciplinary approach. AI is more than just technology. It is important to put together multidisciplinary teams that can consider the consequences for society of the systems developed. Research can also throw light on how the use of AI can be of considerable value to society, as well as on the problematical areas.

Recommendations for system suppliers

These recommendations are meant for organisations that use basic technologies developed by others – organisations that use AI in their own projects or in solutions supplied to others. These can be data controllers or merely a supplier of a service or product. Our recommendations are also relevant for research milieus utilising technologies developed by others.

- Get familiar with the GDPR – the duties you have, and the rights and duties of the users of the system.
- Select models that meet the privacy needs of the buyer. For example, not all types of model can explain how they reached a specific result.
- Limit the amount of personal data in the training data to what is relevant and necessary for the purpose.
- Ensure, and document, that the system you are developing meets the requirements for privacy by design.
- Document how the data protection requirements are met. Documentation is one of the requirements of the regulations, and will be requested by customers or users.
- Assist customers by showing how different systems protect personal data, for example by helping to fulfil the duty to provide information, and by showing the customer how to test or audit the system to ensure compliance with the regulations and internal requirements.

Recommendations for organisations purchasing and using AI-based systems

These recommendations are aimed at organisations purchasing and using IT solutions based on AI technologies. These could be both commercial and public organisations.

- Carry out a risk assessment and, if required, a DPIA before you purchase a system, before you start using it, and while it is in use.
- Demand that the system you order satisfies the requirements for privacy by design.
- Conduct regular tests of the system to ensure that it complies with the regulatory requirements, for example for avoiding latent discriminatory treatment.
- Ensure that the system protects the rights of your users, for example the right to demand limited processing.
- Ensure that you have good systems for protecting the rights of data subjects, such as the rights to information, access and deletion. If consent is the legal basis of processing, the system must also include functionality enabling consent to be given, and to be withdrawn.
- Consider establishing industry norms, ethical guidelines or a data protection panel consisting of external experts in the fields of technology, society and data protection. These can provide advice on the legal, ethical, social and technological challenges – and opportunities – linked to the use of AI.
Recommendations for end users

These recommendations are aimed at end users. An end user is the data subject using a service or whose personal details are processed by using AI.

- Right to information. You are entitled to comprehensible and readily available information about the processing of your personal data. This right applies both when organisations retrieve information directly from you, and when it is retrieved from other sources. You shall know what the information is being used for (the purpose) and the legal basis on which the organisation is processing the information.
- Consent. In many situations the controller must obtain your consent before processing can begin. The data controller is responsible for documenting that proper consent has been given, which means that you have given a voluntary, specific, informed and unambiguous declaration that you approve of your personal data being processed. You also have the right to withdraw any consent you have given previously.
- Right of access to information. You have the right to contact organisations and ask whether they are processing details about you and, if so, what has been registered. As a rule, you are entitled to a copy of the details registered. There are, however, some exceptions to the right of access to information, for example within the judicial sector.
- Right to rectify and delete information. You are entitled to ask for incorrect or unnecessary details about you to be rectified or deleted.
- Right to object to the processing of details about you. You may have the right to protest against the processing of details concerning yourself. If you protest against direct marketing, it must be stopped without your needing to provide further grounds. In other situations, you may have to justify your right to object by explaining the circumstances affecting your situation. The organisation must then cease processing, unless they can prove they have compelling and justifiable grounds for processing the data, and that these grounds weigh more heavily than your interests, rights and freedoms.
- Right to demand limited processing. If you are of the opinion that some details are incorrect, or are being processed unlawfully, or you have exercised your right to protest against processing, the organisation may be compelled to stop the data being used, but continue to store them until the disagreement has been settled.
- Data portability. If, whether contractually or having given your consent, you have had personal data about yourself processed, you can ask for these details to be delivered to you by the organisation in a structured, generally applicable, and machine-readable format.

Recommendations for authorities

These recommendations are for legislators and political decision makers as they set the terms and conditions for the development and use of AI.

- Ensure that the public sector sets a good example in using AI. This requires acute awareness of the ethical and privacy consequences of the systems they use, as well as expertise as buyers to make sure that the systems purchased have privacy by design and meet the legislative requirements.
- Allocate funds to research that ensures the technology processes personal data in compliance with the regulations. Protecting personal data is not just a legal requirement, but can also be a competitive advantage for Norwegian industry.
- Ensure that the enforcement authorities possess the relevant expertise, and arrange for experience and knowledge sharing across sectoral boundaries.
- Ensure that the law keeps pace with technological developments. This applies to all legislation that has relevance for the use of personal data.
The Norwegian Data Protection
Authority
Visiting address:
Tollbugata 3, 0152 Oslo, Norway
Postal address:
P.O. Box 8177 Dep.,
0034 Oslo, Norway
postkasse@datatilsynet.no
Telephone: +47 22 39 69 00
datatilsynet.no
personvernbloggen.no
twitter.com/datatilsynet