
Schwerpunkt

Profiling: From Data to Knowledge


The challenges of a crucial technology
Mireille Hildebrandt

Profiling is not about data but about knowledge. It provides a crucial technology in a society that is flooded with noise and information. Profiling is another term for sophisticated pattern recognition, and the enabling technology for Ambient Intelligence. It confronts us with a new type of inductive knowledge, inferred by means of automated algorithms. To the extent that decisions that impact our lives are based on such knowledge, we need to develop the means to make this knowledge accessible for individual citizens and provide them with the legal and technological tools to anticipate and contest such knowledge or challenge its application.

Dr Mireille Hildebrandt is Senior Researcher at LSTS, Vrije Universiteit Brussel, teaching at the Faculty of Law, Erasmus University Rotterdam. Her focus is on issues of identity in constitutional democracy. E-Mail: hildebrandt@frg.eur.nl

Introduction: Saved by profiling technologies?

Profiling seems to be the only viable technology that can save us from the two problems of the Information Society: (1) the overload of information and (2) the blurring of the borders between noise, information and knowledge. More data does not necessarily mean more knowledge or information, and the increasing use of computer technologies may flood us with data out of which no single human mind can filter what is relevant. We may end up lost instead of knowledgeable. In the most optimistic scenario, profiling promises a dynamic, contextual selection of relevant information and the inference of pertinent knowledge. One could say that a profile is knowledge to the extent that it turns data into information, allowing one to discriminate between relevant and irrelevant data (in a specific context). However, this type of knowledge does not deliver metaphysical truths, proof of causality or conclusive reasoning. Instead it builds on mathematical correlations between aggregated machine-readable data, correlations that are indicative of expected future behaviour. As with all overly optimistic faith in technological progress, reality is sobering, but profiling nevertheless does make a difference. Sobering, because profiling is not just a technology (combining hardware with software) but also a practice, in need of professional knowledge to decide which correlations are spurious or otherwise irrelevant. Machines are helping us to sort out noise from information and to infer knowledge from data, but in the end we write the software and check whether it does the job. However, profiling does make a difference, because the type of knowledge produced by profiling practices is of a different nature compared to traditional scientific knowledge, which starts out with hypotheses to be tested in search of causes (empiricism) or reasons (rationalism). Here we have a pragmatic type of knowledge that predicts future behaviour on statistical grounds, often building on what seems trivial data.

The results are manifold. Genetic profiling can deliver effective correlations between individual genes and diseases, without having much of a clue as to the underlying causal chain. Profiling of keystroke behaviour can serve to identify a person as the same person whenever she goes online, thus allowing web profiling even without any other identifier. The possibility to identify a person in many different ways and for many different purposes leads to the classification of profiling as one of three basic types of identity management [HaMe06]. Integrating the data of a variety of databases enables the construction of refined group profiles that should adequately represent categories of people, providing a detailed picture of e.g. their probable earning capacity (credit scoring) and proneness to risk (e.g. health, criminal recidivism, victimisation). Hereunder we will explain how FIDIS understands profiling and indicate some of the challenges it evokes, especially as it is the enabling technology for Ambient Intelligence (AmI) and The Internet of Things. In the concluding remarks we will argue for a well-balanced tool kit of technological and legal instruments to protect some of the basic tenets of constitutional democracy, as these may face serious provocation in an information society that renders its citizens virtually transparent.

1 The process of profiling

FIDIS is interested in the process of automated profiling, which involves [HiBa05]:
- recording of data (taking note of them in a computable manner);
- storing data (in a way that renders them accessible, aggregated in a certain way);
- tracking data (recording and storing over a period of time, linking data to the same data subject);

548 DuD • Datenschutz und Datensicherheit 30 (2006) 9


- identifying patterns in the data (by running algorithms through the database);
- monitoring data (checking whether new data fit the pattern or produce outliers).

The process is often described in terms of knowledge discovery in databases (KDD), involving the collection and storage of data, data mining, interpretation and decision making [Cu04]. For the data mining process a set of non-proprietary guidelines is freely available, partly funded by the European Commission and developed in conjunction with practitioners and vendors, called the Cross-Industry Standard Process for Data Mining (CRISP-DM, see www.crisp-dm.org). This model emphasises the feedback between the different stages of data mining, consisting of business understanding, data understanding, data preparation, modelling, evaluation and deployment.

What is important to keep in mind is that automated profiling depends on adequate recording and storage of digitalised data. Between the events, transactions or movements and their storage a translation takes place that transforms a fluid moment into machine-readable data. The data are recorded as a type of brute facts, decontextualised and – as far as the machine is concerned – without meaning. Most data may be trivial in themselves, acquiring new meaning after having been connected with other trivial data and used for decision making. The crucial instances in the process of data mining are the emergence of correlations and their interpretation. Profiling is basically a matter of pattern recognition: the knowledge inferred from the data consists of association, classification or clustering. For instance, the process of KDD may produce patterns that correlate a certain gait with specific learning disabilities. However, the interpretation – the meaning of the pattern that is found – depends on practical wisdom or professional knowledge, putting the newly found correlations in a specific professional context. In this case experts specialised in learning disabilities would be involved to assess the relevance of the correlations. If the automated profile is considered relevant knowledge by the experts, it also defines certain data as relevant, thus transforming them into information. After all, data can be noise or information depending on such knowledge. Applications of profiling technologies can be found in marketing, criminal investigation, and the detection of fraud or money laundering. However, in the context of autonomic profiling, involving real-time adaptation of an environment to a user's inferred preferences, the interpretation may be done by machines. In a reiterating process of checking for outliers while applying generated profiles, a machine-learning process may evolve, even if the software will be checked and adjusted by means of human intervention. Autonomic computing will become especially important if the vision of Ambient Intelligence becomes a reality, to which we will devote some special attention below.

2 Some pertinent distinctions

When speaking of profiling one may refer to a host of different phenomena, which can lead to a Babylonian confusion. For this reason FIDIS discriminates between group profiles and personalised profiles, and between the construction of profiles and the application of profiles. In both cases the distinction is analytically salient, while in practice the phenomena intermingle.

2.1 Group profiles and personalised profiles

A group profile identifies and represents a group (community or category), of which it describes a set of attributes. The group can consist of people that think of themselves as a community, like a class of students, adherents of a specific religion or members of an association. The group can also consist of a category of people that have no connections amongst them, other than the fact that profiling has established them as a category. For instance, data mining may produce a correlation between left-handed people and a certain disease or a certain propensity towards artistic endeavour. This correlation is probabilistic and does not depend on the fact that left-handed people form any sort of community. The fact that one can be identified as a member of this category does not necessarily mean that one shares the attributes of this group. This will be discussed hereunder in reference to non-distributive group profiles.

A personalised profile identifies and represents a person, of whom it describes a set of attributes. The profile can be based entirely on the recorded data of one individual person, for instance his keystroke behaviour or a combination of different types of correlated data like keystroke behaviour and surfing habits. Because the profile is directed to one individual person, of whom it may disclose intimate knowledge, a personalised profile seems to have a direct impact on privacy. However, this depends on how we understand privacy, since it may be the case that the profile identifies a person over a period of time as the same person, disclosing his surfing habits, without having access to his name or other personal data. At the same time we should take note that group profiles can be highly specific in a particular context, providing a very rich profile that comes very close to a personalised profile. This indicates the blurring of the border between the two types of profiles. Furthermore, personalised profiles can be aggregated to produce a group profile.

2.2 Distributive and non-distributive group profiles

Group profiling identifies and represents a group, that is, a community or a category. In the case of a distributive group profile, the attributes of the group are also the attributes of all the members of the group [Ve99]: for instance, the attribute of 'not being married' holds for the group of bachelors, but also for any member of that group. However, in the case of a non-distributive group profile, matters are complicated. Imagine that a person is included in the group of people with blue eyes and red hair, and imagine that a group profile is constructed for this category that indicates an 88 % probability of a specific type of skin disease. This does not mean that this particular person has an 88 % chance of having this disease, because this may depend on other factors (like age, sunlight, eating habits, use of skin lotions). It does mean, however, that belonging to the category allows what Schauer [Sch03] calls a non-universal generalisation.

In real life, most generalisations are non-universal, meaning that we learn to cope with the abundance of detail by imposing some order in the form of generalisations or categories that provide adequate standards to assess a new situation. These generalisations are seldom universal; they are shorthand for more complex standards that incorporate the fact that not all members of a category share the same features. If I say that people who smoke will end up with lung cancer, I will probably be aware of the fact that this may be the case for a number of people but not for all. This 'goes without saying'.
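The non-distributive skin-disease example can be made concrete with a small numerical sketch. The subgroup sizes and rates below are hypothetical, invented purely for illustration: they show how a category-wide rate of 88 % can coexist with quite different risks for subgroups distinguished by a confounding factor such as age or sunlight exposure.

```python
# Hypothetical subgroups within the category 'blue eyes and red hair'.
# Each entry: (number of members, disease rate within that subgroup).
subgroups = [
    (700, 0.95),   # e.g. older members with high sunlight exposure
    (300, 0.717),  # e.g. younger members with low sunlight exposure
]

members = sum(n for n, _ in subgroups)
cases = sum(n * rate for n, rate in subgroups)
group_rate = cases / members  # the rate the group profile reports

print(f"group profile: {group_rate:.0%} probability of the skin disease")
for n, rate in subgroups:
    # neither subgroup actually carries the aggregate 88 % risk
    print(f"  subgroup of {n} members: {rate:.1%}")
```

The profile truthfully attaches 88 % to the category as a whole, yet neither subgroup carries that risk, so the figure cannot simply be transferred to an individual member: this is the non-universal generalisation at work.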


Group profiling seems to produce such generalisations on the basis of sophisticated algorithms, based on mathematical inferences instead of intuitive rules of thumb. This, of course, means that all the dangers of non-universal generalisations apply equally to non-distributive profiles: one cannot presume for any member of a category that the group profile applies without access to additional information. As soon as decisions – with, for instance, legal consequences or other serious impact – are taken on the basis of such a wrongful presumption, we find ourselves in the realm of illegitimate discrimination.

2.3 Construction and application of profiles

When discussing profiling we should distinguish between the construction of a profile, by means of data mining techniques, and the application of a profile, for instance to inform the decision which category of people should be offered (or refused) a specific service. As indicated above, the processes of construction and application are intermingled. Applying a profile may consist of checking outliers that suggest unusual – or undesirable – behaviour. This checking of outliers is at the same time a test for the profile, allowing adjustment to curb the number of outliers or to change some of the parameters that have turned out to be spurious.

3 The Internet of Things and Autonomic Profiling

Profiling is the enabling technology for Ambient Intelligence [ISTAG01], [SchHi05] and what has been called The Internet of Things [ITU05]. Without adequate profiling we will not be able to handle the innumerable data recorded when the real world goes online; we will miss the means to make sense of the data, mistaking noise for information or information for knowledge. In fact, we would be engulfed by noise, lacking the tools to discriminate between what is relevant at which moment in which context.

Imagine all things to be RFID-tagged and part of an RFID-system that allows reading and online storage of their status, location and other data, while at the same time all spaces are provided with sensor devices and CCTV-cameras that detect movement, temperature and other data. When this vision materialises we will find ourselves in the 'everyware' of a networked environment that seamlessly integrates real-time monitoring with real-time proactive adjustments of the environment. Ambient Intelligence implies that the environment is able to anticipate a user's wishes, even before he becomes aware of them. This is expected to move well beyond anticipating how you like your coffee or room temperature, as it may cater to your specific health needs, travel plans or your preferred professional infrastructure. To allow real-time adjustment of the environment we need autonomic profiling. Autonomic profiling goes one step further than automated profiling. The term is derived from what Paul Horn, IBM's senior vice-president, has named autonomic computing [KeCh03]. This is a type of computing that not only performs algorithmic functions on incoming data, but also takes a number of decisions that amount to a kind of self-management. Horn compares autonomic computing to the autonomic function of our nervous system, claiming it should provide for a continuously readjusted environment without disturbing us with complex decision-making processes. Just like your nervous system does not ask for your consent to adjust your body temperature or heart rate, autonomic computing should unobtrusively work out the right fit with your surroundings. Autonomic profiling thus implies that adaptive environments function smoothly without too much intervention of the end-user, meaning that machines take all the necessary decisions, based on their profiling activities [Hi07]. This is meant to unburden the human person, but it may obviously also disempower citizens regarding the choices that are made for them [BoCo04].

4 Beyond privacy and security?

The discourse on the dangers or threats posed by further development of the information society seems locked in a debate about the balance between privacy and security. Citizens are persuaded that in times of international terrorism and transnational organised crime they are better off trading a bit of their privacy for security, and it seems that, apart from privacy advocates, not many citizens have sleepless nights over this trade-off. The exchange of cross-disciplinary perspectives within the FIDIS research community has led to broader perspectives on these issues, with special regard to the implications of profiling for democracy and the rule of law [HiGu05].

First of all, the debate seems entirely focused on data, while many of these are trivial and most of the time they are assessed by machines rather than by humans. Apart from abuse, there seems little reason to fear the collection and storage of these data. However, in the case of profiling we are not dealing with data but with inferred knowledge. For two reasons this is more worrying: (1) non-distributive group profiles are based on probabilities, which means that the group profile does not automatically apply to each member of the group; (2) profiles may reveal sophisticated knowledge about a person that is more intimate than sensitive personal data. Solove [So04] warns that we may develop a general fear that anything we do will be recorded and can be used against us at any point in time, on the basis of knowledge produced by indifferent anonymous machines. He suggests that the metaphor of Big Brother does not cover the distributed spying generated by a host of private and public organisations. Instead he refers us to the metaphor of Kafka's The Trial, because it saliently articulates the vagueness of the accusations and the indifference of the prosecuting bureaucracy.

Second, as a consequence of the focus on data instead of knowledge, the debate seems to be directed to anonymisation, or the use of pseudonyms, in order to protect personal data. However, citizens may rather need protection against the application of profiles, or at least access to such profiles and transparency concerning their use. Data Protection is a tool of transparency that aims to guarantee access to the processing of personal data, but precisely when personal data are anonymised, data protection legislation is no longer applicable. This means that citizens have no legal right even to access the knowledge that is inferred from these anonymised data and may be used in ways that impact their lives. Once a profile is linked to an identifiable person – for instance in the case of credit scoring – it may turn into personal data, thus reviving the applicability of data protection legislation.


This protection, however, comes after the fact [SchHi07], not providing access to the dynamic group profiles available to the service provider, who may even protect the profiles by means of intellectual property rights. Art. 15 of the data protection directive (D 95/46 EC) does provide some protection in the case of a 'decision which produces legal effects concerning him or significantly affects him and which is based solely on automated processing of data intended to evaluate certain personal aspects relating to him, such as his performance at work, creditworthiness, reliability, conduct'. As Bygrave [By01] explains in his analysis of art. 15, the article provides data subjects with a right not to be subjected to such decisions, but we may expect that if the right is not exercised, these types of decisions will continue to proliferate. Also in this case, one does not have access to the dynamic group profiles that may or may not be applied. This produces an asymmetry of knowledge between profiler and profiled subject.

Third, in an AmI environment people may be identified on the basis of behavioural biometric profiling, which renders identification in the sense of art. 2 (a) of D 95/46 EC unnecessary, again ruling out the applicability of data protection legislation as it stands now.

Fourth, in the context of ICT privacy is often reduced to control over the disclosure of personal data, while in fact privacy is a good or value that concerns both more and less than the exchange of data. Privacy concerns the capacity to continuously reconstruct one's identity and to control the borders between self and others [Ag01]. For this reason it is pertinent to distinguish between data protection and privacy, because the one is a tool that aims for transparency, while the latter refers to the opacity of the personal sphere that should enable the positive and negative freedom of individual citizens, empowering them to partake in private and public life without undue interference. Reducing privacy to control over personal data mistakes data protection, which actually aims for a free flow of information, for the protection of essential rights and liberties, which may be at stake at the moment of application of group profiles rather than at the moment of data collection.

Fifth, the prevalent focus on privacy and security issues seems to distract attention from the far-reaching consequences of advanced profiling technologies for equality, fairness and due process in the wider societal context. Profiling shifts the balance of power between those that can afford profiling (mostly large organisations) and those that are being profiled (mostly individual citizens), because the profilers have a certain type of knowledge to which those profiled have no effective access. This particular lack of transparency is not only a matter of the non-applicability of data protection regimes in the case of anonymised data. At this moment we also lack the technological tools to anticipate the type of profiles that may be constructed and applied to us.

Sixth, the value of privacy is often understood as if privacy were a private good, one that can be traded at will against other private goods, or even disowned in order to protect public goods like intra- or international security. Without denying that privacy is also a private good, we should not forget the value of privacy as a public good that is preconditional for a viable constitutional democracy. Like other public goods, such as security, equality, fairness and due process, privacy needs protection beyond the arbitrary decisions of individual citizens. Profiling may not impact our sense of privacy, or even our expectation of privacy, because we are not aware of it. But it may still invade our privacy to a much greater extent than unauthorised use of personal data. The threat of autonomic profiling is the unobtrusive, ubiquitous disclosure of patterns that define our most intimate habits, beyond our awareness. It may provide us with a golden cage, in so far as AmI caters to our inferred wishes. But, as Sunstein claims, even if we would prefer this in our capacity of private citizens, it will undermine our capacities to function as public citizens. This is the case because we will lack the confrontation with what the machines expect us to dislike, thus reducing our confrontation with 'unplanned, unexpected encounters [that] are central to democracy itself' [Su01:8].

5 Ambient Law: Integration of Data Protection, TETs and PETs

Anonymisation may protect against abuse of personal data, but it will not protect against the application of group profiles inferred from anonymised data, or against the application of a group profile to a person that is identified only by means of behavioural biometric profiling. This is not to claim that the application of group profiles is a bad thing in itself [Kr86], or to claim that anonymisation or the use of pseudonyms makes no sense. PETs can provide much-needed means for identity management, as discussed in [BaMeHa05]. The point is that we need to find ways to render the processing of data transparent, after the data have been anonymised and before they are applied. Citizens must be able to anticipate the profiles that may be applied to them and be given the legal and technological tools to contest the validity and relevance of the profile in their particular case.

For this reason FIDIS aims to develop a cross-disciplinary perspective between computer scientists, technologists and lawyers to prepare a technological infrastructure that would:
- integrate the mandatory aspects of Data Protection legislation, and
- facilitate machine-to-machine communication between citizens' personal digital assistants and networked environments, allowing adequate anticipation of autonomic profiling.

In this case the focus of Privacy Enhancing Technologies (PETs) would be on what has been called the 'principle of minimum asymmetry' [Ji02], combining the data minimisation principle, which restricts the flow of information from data subjects to data processors, with a maximisation of the feedback from data processors: not just to find out what happened to your personal data, but first of all to find out which profiles may be inferred that will impact you as a member of a certain category. To allow such 'counter-profiling' we need to develop Transparency Enhancing Technologies (TETs).

Ambient Law would imply that the use of PETs (and TETs) is not left to individual preference but is part and parcel of a legal-technological framework that is preconditional for the exercise of individual preference. It involves clear thinking about the normative impacts of technological artifacts and technological infrastructure, and demands a political choice about the kind of information society we want to inhabit.

Summary

Data Protection is focused on data. It takes a proactive perspective by demanding that data are collected in a restricted manner, pointedly expressed in the data minimisation principle. Profiling is not about data but about knowledge. However, it feeds on data, and in the context of an Internet of Things or an Ambient Intelligent environment it demands as many data as possible. Even though the protection of personal data can limit profiling by limiting the input of data, anonymisation will not limit but rather facilitate large-scale group profiling. The protection needed at this point is not just protection of our own data but protection of our capacity to anticipate which group profiles may affect our personal lives. For this we need to create a legal-technological infrastructure that provides us with the legal-technological means to minimise the leaking of data, to anticipate which profiles may affect us, to contest the inherent knowledge claims they entail and to challenge their application if necessary.

Literature

Ag01 Agre, P. E. 'Introduction', in: Technology and Privacy: The New Landscape. P. E. Agre and M. Rotenberg (Eds.), Cambridge, Massachusetts, MIT Press 2001.
BaMeHa05 Bauer, M., Meints, M., Hansen, M. (Eds.), FIDIS Deliverable D3.1 – Structured Overview on Prototypes and Concepts of Identity Management Systems, Frankfurt a.M. 2005. Download: http://www.fidis.net/486.0.html
BoCo04 Bohn, J., V. Coroama, et al., Social, Economic, and Ethical Implications of Ambient Intelligence and Ubiquitous Computing, Institute for Pervasive Computing, ETH Zurich, Zurich 2004. Download: www.vs.inf.ethz.ch/publ/papers/socialambient.pdf
By01 Bygrave, L. 'Minding the Machine. Art. 15 of the EC Data Protection Directive and automated profiling.' Computer Law & Security Report, 17: pp. 17-24, 2001.
Cu04 Custers, B. The Power of Knowledge. Ethical, Legal, and Technological Aspects of Data Mining and Group Profiling in Epidemiology. Wolf Legal Publishers, Nijmegen 2004.
HaMe06 Hansen, M., Meints, M., 'Digitale Identitäten – Überblick und aktuelle Trends', in this issue.
Hi07 Hildebrandt, M. 'Defining Profiling: A New Type of Knowledge', in: Profiling the European Citizen. A Cross-disciplinary Perspective. M. Hildebrandt and S. Gutwirth (Eds.), Springer 2007.
HiBa05 Hildebrandt, M., Backhouse, J. FIDIS Deliverable D7.2 – Descriptive analysis and inventory of profiling practices. Brussels 2005. Download via: www.fidis.net
HiGu05 Hildebrandt, M. and S. Gutwirth (Eds.), FIDIS Deliverable D7.4 – Implications of profiling practices on democracy and rule of law. Brussels 2005. Download via: http://www.fidis.net
ISTAG01 ISTAG, Scenarios for Ambient Intelligence in 2010, Information Society Technology Advisory Group 2001. Download: http://www.cordis.lu/ist/istag-reports.htm
ITU05 International Telecommunications Union (ITU), The Internet of Things. Geneva 2005.
Ji02 Jiang, X. Safeguard Privacy in Ubiquitous Computing with Decentralized Information Spaces: Bridging the Technical and the Social. Privacy Workshop, September 29, 2002, University of California, Berkeley. Berkeley 2002. Download: http://guir.berkeley.edu/pubs/ubicomp2002/privacyworkshop/papers/jiang-privacyworkshop.pdf
KeCh03 Kephart, J. O. and D. M. Chess, 'The Vision of Autonomic Computing.' Computer, January 2003.
Kr86 Kranzberg, M., 'Technology and History: "Kranzberg's Laws".' Technology and Culture 27: pp. 544-560, 1986.
Sch03 Schauer, F. Profiles, Probabilities and Stereotypes. Cambridge, Massachusetts and London, England, Belknap Press of Harvard University Press 2003.
SchHi05 Schreurs, W., M. Hildebrandt, et al. (Eds.), FIDIS Deliverable D7.3 – Report on Actual and Possible Profiling Techniques in the Field of Ambient Intelligence, p. 68, Brussels 2005.
SchHi07 Schreurs, W., M. Hildebrandt, et al. 'Cogitas ergo sum. The role of data protection law and non-discrimination law in group profiling in the private sector', in: Profiling the European Citizen. A Cross-disciplinary Perspective, M. Hildebrandt and S. Gutwirth (Eds.), Springer 2007.
So04 Solove, D. J., The Digital Person. Technology and Privacy in the Information Age. New York, New York University Press 2004.
Su01 Sunstein, C., Republic.com. Princeton and Oxford, Princeton University Press 2001.
Ve99 Vedder, A. 'KDD: The challenge to individualism.' Ethics and Information Technology 1, pp. 275-281, 1999.
