Profiling: From Data To Knowledge
• identifying patterns in the data (by running algorithms through the database)
• monitoring data (checking whether new data fit the pattern or produce outliers)

The process is often described in terms of knowledge discovery in databases (KDD), involving the collection and storage of data, data mining, interpretation and decision making [Cu04]. For the data mining process a set of non-proprietary guidelines is freely available, partly funded by the European Commission and developed in conjunction with practitioners and vendors: the Cross-Industry Standard Process for Data Mining (CRISP-DM).¹ This model emphasises the feedback between the different stages of data mining, consisting of business understanding, data understanding, data preparation, modelling, evaluation and deployment. What is important to keep in mind is that automated profiling depends on adequate recording and storage of digitalised data. Between the events, transactions or movements and their storage a translation takes place that transforms a fluid moment into machine-readable data. The data are recorded as a type of brute facts, decontextualised and – as far as the machine is concerned – without meaning. Most data may be trivial in themselves, acquiring new meaning only after having been connected with other trivial data and used for decision making.
The crucial instances in the process of data mining are the emergence of correlations and their interpretation. Profiling is basically a matter of pattern recognition: the knowledge inferred from the data consists of association, classification or clustering. For instance, the process of KDD may produce patterns that correlate a certain gait with specific learning disabilities. However, the interpretation – the meaning of the pattern that is found – depends on practical wisdom or professional knowledge, which places the newly found correlations in a specific professional context. In this case experts specialised in learning disabilities would be involved to assess the relevance of the correlations. If the automated profile is considered relevant knowledge by the experts, it also defines certain data as relevant, thus transforming them into information. After all, data can be noise or information depending on such knowledge.
Applications of profiling technologies can be found in marketing, criminal investigation, and the detection of fraud or money laundering. However, in the context of autonomic profiling, which involves real-time adaptation of an environment to a user's inferred preferences, the interpretation may be done by machines. In a reiterating process of checking for outliers while applying generated profiles, a machine-learning process may evolve, even if the software is checked and adjusted by means of human intervention. Autonomic computing will become especially important if the vision of Ambient Intelligence becomes a reality, to which we devote some special attention below.

¹ See www.crisp-dm.org

2 Some pertinent distinctions

When speaking of profiling one may refer to a host of different phenomena, which can lead to a Babylonian confusion. For this reason FIDIS discriminates between group profiles and personalised profiles, and between the construction of profiles and the application of profiles. In both cases the distinction is analytically salient, while in practice the phenomena intermingle.

2.1 Group profiles and personalised profiles

A group profile identifies and represents a group (a community or a category), of which it describes a set of attributes. The group can consist of people who think of themselves as a community, like a class of students, adherents of a specific religion or members of an association. The group can also consist of a category of people who have no connections amongst them other than the fact that profiling has established them as a category. For instance, data mining may produce a correlation between left-handed people and a certain disease, or a certain propensity towards artistic endeavour. This correlation is probabilistic and does not depend on left-handed people forming any sort of community. The fact that one can be identified as a member of this category does not necessarily mean that one shares the attributes of this group. This will be discussed below in reference to non-distributive group profiles.
A personalised profile identifies and represents a person, of whom it describes a set of attributes. The profile can be based entirely on the recorded data of one individual person, for instance his keystroke behaviour, or on a combination of different types of correlated data, like keystroke behaviour and surfing habits. Because the profile is directed at one individual person, of whom it may disclose intimate knowledge, a personalised profile seems to have a direct impact on privacy. However, this depends on how we understand privacy, since it may be the case that the profile identifies a person over a period of time as the same person, disclosing his surfing habits, without having access to his name or other personal data. At the same time we should take note that group profiles can be highly specific in a particular context, providing a very rich profile that comes very close to a personalised profile. This indicates the blurring of the border between the two types of profiles. Furthermore, personalised profiles can be aggregated to produce a group profile.

2.2 Distributive and non-distributive group profiles

Group profiling identifies and represents a group, that is, a community or a category. In the case of a distributive group profile, the attributes of the group are also the attributes of all the members of the group [Ve99]. For instance, the attribute of 'not being married' holds for the group of bachelors, but also for any member of that group. In the case of a non-distributive group profile, however, matters are more complicated. Imagine that a person is included in the group of people with blue eyes and red hair, and that a group profile constructed for this category indicates an 88% probability of a specific type of skin disease. This does not mean that this particular person has an 88% chance of having the disease, because that may depend on other factors (like age, sunlight, eating habits or the use of skin lotions). It does mean, however, that belonging to the category allows what Schauer [Sch03] calls a non-universal generalisation.
In real life, most generalisations are non-universal, meaning that we learn to cope with the abundance of detail by imposing some order in the form of generalisations or categories that provide adequate standards to assess a new situation. These generalisations are seldom universal; they are shorthand for more complex standards that incorporate the fact that not all members of a category share the same features. If I say that people who smoke will end up with lung cancer, I will probably be aware that this may be the case for a number of people but not for all. This 'goes without saying'.
Group profiling seems to produce such generalisations on the basis of sophisticated algorithms, based on mathematical inferences instead of intuitive rules of thumb. This, of course, means that all the dangers of non-universal generalisations apply equally to non-distributive profiles: one cannot presume for any member of a category that the group profile applies without access to additional information. As soon as decisions – with, for instance, legal consequences or other serious impact – are taken on the basis of such a wrongful presumption, we find ourselves in the realm of illegitimate discrimination.

2.3 Construction and application of profiles

When discussing profiling we should distinguish between the construction of a profile, by means of data mining techniques, and the application of a profile, for instance to inform the decision which category of people should be offered (or refused) a specific service. As indicated above, the processes of construction and application are intermingled. Applying a profile may consist of checking outliers that suggest unusual – or undesirable – behaviour. This checking of outliers is at the same time a test of the profile, allowing adjustment to curb the number of outliers or to change some of the parameters that have turned out to be spurious.

3 The Internet of Things and Autonomic Profiling

Profiling is the enabling technology for Ambient Intelligence [ISTAG01], [SchHi05] and what has been called the Internet of Things [ITU05]. Without adequate profiling we will not be able to handle the innumerable data recorded when the real world goes online; we will miss the means to make sense of the data, mistaking noise for information or information for knowledge. In fact, we would be engulfed by noise, lacking the tools to discriminate between what is relevant at which moment in which context.
Imagine all things to be RFID-tagged and part of an RFID system that allows reading and online storing of their status, location and other data, while at the same time all spaces are provided with sensor devices and CCTV cameras that detect movement, temperature and other data. When this vision materialises we will find ourselves in the 'everyware' of a networked environment that seamlessly integrates real-time monitoring with real-time proactive adjustments of the environment. Ambient Intelligence implies that the environment is able to anticipate a user's wishes, even before he becomes aware of them. This is expected to move well beyond anticipating how you like your coffee or room temperature, as it may cater to your specific health needs, travel plans or preferred professional infrastructure. To allow real-time adjustment of the environment we need autonomic profiling, which goes one step further than automated profiling. The term is derived from what Paul Horn, IBM's senior vice-president, has named autonomic computing [KeCh03]: a type of computing that not only performs algorithmic functions on incoming data, but also takes a number of decisions that amount to a kind of self-management. Horn compares autonomic computing to the autonomic function of our nervous system, claiming it should provide for a continuously readjusted environment without disturbing us with complex decision-making processes. Just as your nervous system does not ask for your consent to adjust your body temperature or heart rate, autonomic computing should unobtrusively work out the right fit with your surroundings. Autonomic profiling thus implies that adaptive environments function smoothly without too much intervention by the end-user, meaning that machines take all the necessary decisions based on their profiling activities [Hi07]. This is meant to unburden the human person, but it may obviously also disempower citizens regarding the choices that are made for them [BoCo04].

4 Beyond privacy and security?

The discourse on the dangers or threats faced by the further development of the information society seems locked in a debate about the balance between privacy and security. Citizens are persuaded that in times of international terrorism and transnational organised crime they are better off trading a bit of their privacy for security, and it seems that, apart from privacy advocates, not many citizens have sleepless nights over this trade-off. The exchange of cross-disciplinary perspectives within the FIDIS research community has led to broader perspectives on these issues, with special regard to the implications of profiling for democracy and the rule of law [HiGu05].
First of all, the debate seems entirely focused on data, while many of these data are trivial and most of the time they are assessed by machines rather than by humans. Apart from abuse, there seems little reason to fear the collection and storage of these data. However, in the case of profiling we are not dealing with data but with inferred knowledge. This is more worrying for two reasons: (1) non-distributive group profiles are based on probabilities, which means that the group profile does not automatically apply to each member of the group; (2) profiles may reveal sophisticated knowledge about a person that is more intimate than sensitive personal data. Solove [So04] warns that we may develop a general fear that anything we do will be recorded and can be used against us at any point in time, on the basis of knowledge produced by indifferent, anonymous machines. He suggests that the metaphor of Big Brother does not cover the distributed spying generated by a host of private and public organisations. Instead he refers us to the metaphor of Kafka's The Trial, because it saliently articulates the vagueness of the accusations and the indifference of the prosecuting bureaucracy.
Second, as a consequence of the focus on data instead of knowledge, the debate seems to be directed to anonymisation, or the use of pseudonyms, in order to protect personal data. However, citizens may rather need protection against the application of profiles, or at least access to such profiles and transparency concerning their use. Data protection is a tool of transparency that aims to guarantee access to the processing of personal data, but precisely when personal data are anonymised, data protection legislation is no longer applicable. This means that citizens have no legal right even to access the knowledge that is inferred from these anonymised data and may be used in ways that impact their lives. Once a profile is linked to an identifiable person – for instance in the case of credit scoring – it may turn into personal data, thus reviving the applicability of data protection legislation. This protection, however, comes after the fact.
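The first worry named above – that a non-distributive group profile is a probability over a category and does not automatically apply to each member – can be made concrete with a small numerical sketch. The records and the 'skin lotion' factor below are hypothetical, invented purely for illustration; they are not taken from this article:

```python
# Hypothetical records for one profiled category (e.g. 'blue eyes and red hair').
# Each tuple: (uses_skin_lotion, has_skin_disease) -- invented for illustration.
records = [
    (False, True), (False, True), (False, True), (False, True),
    (False, True), (False, True), (False, True), (False, True),  # 8 cases
    (True, False), (True, False),                                # 2 lotion users
]

def disease_rate(rows):
    """Fraction of rows whose disease flag is set."""
    return sum(1 for _, disease in rows if disease) / len(rows)

# The non-distributive group profile: a probability over the whole category.
group_rate = disease_rate(records)

# Conditioning on an additional factor changes the estimate for an individual.
lotion_rate = disease_rate([r for r in records if r[0]])

print(f"group profile: {group_rate:.0%} of the category has the disease")
print(f"estimate for a lotion user in the category: {lotion_rate:.0%}")
```

The group-level figure (here 80%) is a property of the category, not of any member: for the two lotion users the conditional estimate drops to 0%. This is exactly why Schauer's non-universal generalisations must not be applied mechanically to individuals without access to additional information.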
5 Ambient Law: Integration of Data Protection, TETs and PETs

Anonymisation may protect against abuse of personal data, but it will not protect against the application of group profiles inferred from anonymised data, or against the application of a group profile to a person who is identified only by means of behavioural biometric profiling. This is not to claim that the application of group profiles is a bad thing in itself [Kr86], or that anonymisation or the use of pseudonyms makes no sense. PETs can provide much needed means for identity management, as discussed in [BaMeHa05]. The point is that we need to find ways to render the processing of data transparent, after the data have been anonymised and before they are applied. Citizens must be able to anticipate the profiles that may be applied to them and be given the legal and technological tools to contest the validity and relevance of the profile in their particular case.
For this reason FIDIS aims to develop a cross-disciplinary perspective between computer scientists, technologists and lawyers to prepare a technological infrastructure that would:
• integrate the mandatory aspects of Data Protection legislation, and
• facilitate machine-to-machine communication between citizens' personal digital assistants and networked environments, allowing adequate anticipation of autonomic profiling.
In this case the focus of Privacy Enhancing Technologies (PETs) would be on what has been called the 'principle of minimum asymmetry' [Ji02], combining the data minimisation principle, which restricts the flow of information from data subjects to data processors, with a maximisation of the feedback from data processors: not just to find out what happened to your personal data, but first of all to find out which profiles may be inferred that will impact you as a member of a certain category. To allow such 'counter-profiling' we need to develop Transparency Enhancing Technologies (TETs).
Ambient Law would imply that the use of PETs (and TETs) is not left to individual preference but is part and parcel of a legal-technological framework that is preconditional for the exercise of individual preference. It involves clear thinking about the normative impacts of technological artifacts and technological infrastructure, and demands political choice about the kind of information society we want to inhabit.

Summary

Data Protection is focused on data. It takes a proactive perspective by demanding that data are collected in a restricted manner, pointedly expressed in the data minimisation principle. Profiling is not about data but about knowledge. However, it feeds on data, and in the context of an Internet of Things or an Ambient Intelligent environment it demands as many data as possible. Even though the protection of personal data can limit profiling by limiting the input of data, anonymisation will not limit but rather facilitate large-scale group profiling. The protection needed at this point is not just of our own data but of our capacity to anticipate which group profiles may affect our personal lives. For this we need to create a legal-technological infrastructure that provides us with the legal-technological means to minimise the leaking of data, to anticipate which profiles may affect us, to contest the inherent knowledge claims they entail and to challenge their application if necessary.

Literature

Ag01 Agre, P. E., 'Introduction', in: P. E. Agre and M. Rotenberg (Eds.), Technology and Privacy: The New Landscape. Cambridge, Massachusetts, MIT Press, 2001.
BaMeHa05 Bauer, M., Meints, M., Hansen, M. (Eds.), FIDIS Deliverable D3.1 – Structured Overview on Prototypes and Concepts of Identity Management Systems. Frankfurt a.M. 2005. Download: http://www.fidis.net/486.0.html
BoCo04 Bohn, J., Coroama, V., et al., Social, Economic, and Ethical Implications of Ambient Intelligence and Ubiquitous Computing. Institute for Pervasive Computing, ETH Zurich, Zurich, 2004. Download: www.vs.inf.ethz.ch/publ/papers/socialambient.pdf
By01 Bygrave, L., 'Minding the Machine. Art. 15 of the EC Data Protection Directive and automated profiling.' Computer Law & Security Report, 17: pp. 17-24, 2001.
Cu04 Custers, B., The Power of Knowledge. Ethical, Legal, and Technological Aspects of Data Mining and Group Profiling in Epidemiology. Wolf Legal Publishers, Nijmegen 2004.
HaMe06 Hansen, M., Meints, M., 'Digitale Identitäten – Überblick und aktuelle Trends' [Digital identities – overview and current trends], in this issue.
Hi07 Hildebrandt, M., 'Defining Profiling: A New Type of Knowledge', in: M. Hildebrandt and S. Gutwirth (Eds.), Profiling the European Citizen. A Cross-disciplinary Perspective. Springer 2007.
HiBa05 Hildebrandt, M., Backhouse, J., FIDIS Deliverable D7.2 – Descriptive analysis and inventory of profiling practices. Brussels 2005. Download via: www.fidis.net
HiGu05 Hildebrandt, M., Gutwirth, S. (Eds.), FIDIS Deliverable D7.4 – Implications of profiling practices on democracy and rule of law. Brussels 2005. Download via: http://www.fidis.net
ISTAG01 ISTAG, Scenarios for Ambient Intelligence in 2010. Information Society Technology Advisory Group 2001. Download: http://www.cordis.lu/ist/istag-reports.htm
ITU05 International Telecommunications Union (ITU), The Internet of Things. Geneva 2005.
Ji02 Jiang, X., Safeguard Privacy in Ubiquitous Computing with Decentralized Information Spaces: Bridging the Technical and the Social. Privacy Workshop, September 29, 2002, University of California, Berkeley. Berkeley 2002. Download: http://guir.berkeley.edu/pubs/ubicomp2002/privacyworkshop/papers/jiang-privacyworkshop.pdf
KeCh03 Kephart, J. O., Chess, D. M., 'The Vision of Autonomic Computing.' Computer, January 2003.
Kr86 Kranzberg, M., 'Technology and History: "Kranzberg's Laws".' Technology and Culture 27: pp. 544-560, 1986.
Sch03 Schauer, F., Profiles, Probabilities and Stereotypes. Cambridge, Massachusetts / London, Belknap Press of Harvard University Press 2003.
SchHi05 Schreurs, W., Hildebrandt, M., et al. (Eds.), FIDIS Deliverable D7.3 – Report on Actual and Possible Profiling Techniques in the Field of Ambient Intelligence. p. 68, Brussels 2005.
SchHi07 Schreurs, W., Hildebrandt, M., et al., 'Cogitas ergo sum. The role of data protection law and non-discrimination law in group profiling in the private sector', in: M. Hildebrandt and S. Gutwirth (Eds.), Profiling the European Citizen. A Cross-disciplinary Perspective. Springer 2007.
So04 Solove, D. J., The Digital Person. Technology and Privacy in the Information Age. New York, New York University Press 2004.
Su01 Sunstein, C., Republic.com. Princeton and Oxford, Princeton University Press 2001.
Ve99 Vedder, A., 'KDD: The challenge to individualism.' Ethics and Information Technology 1, pp. 275-281, 1999.