Profiling (information science)

In information science, profiling refers to the process of construction and application of user profiles
generated by computerized data analysis.

This is the use of algorithms or other mathematical techniques that allow the discovery of patterns or
correlations in large quantities of data, aggregated in databases. When these patterns or correlations are used
to identify or represent people, they can be called profiles. Apart from a discussion of profiling technologies
or population profiling, the notion of profiling in this sense is not just about the construction of profiles, but
also concerns the application of group profiles to individuals, e.g., in the cases of credit scoring, price
discrimination, or the identification of security risks (Hildebrandt & Gutwirth 2008) (Elmer 2004).

Profiling is being used in fraud prevention, ambient intelligence, and consumer analytics. Statistical
methods of profiling include Knowledge Discovery in Databases (KDD).

The profiling process


The technical process of profiling can be separated into several steps:

Preliminary grounding: The profiling process starts with a specification of the applicable
problem domain and the identification of the goals of analysis.
Data collection: The target dataset or database for analysis is formed by selecting the
relevant data in the light of existing domain knowledge and data understanding.
Data preparation: The data are preprocessed for removing noise and reducing complexity by
eliminating attributes.
Data mining: The data are analysed with the algorithm or heuristics developed to suit the
data, model and goals.
Interpretation: The mined patterns are evaluated on their relevance and validity by
specialists and/or professionals in the application domain (e.g. excluding spurious
correlations).
Application: The constructed profiles are applied, e.g. to categories of persons, to test and
fine-tune the algorithms.
Institutional decision: The institution decides what actions or policies to apply to groups or
individuals whose data match a relevant profile.

Data collection, preparation and mining all belong to the phase in which the profile is under construction.
However, profiling also refers to the application of profiles, meaning the usage of profiles for the
identification or categorization of groups or individual persons. As can be seen in step six (application), the
process is circular. There is a feedback loop between the construction and the application of profiles. The
interpretation of profiles can lead to the iterative – possibly real-time – fine-tuning of specific previous steps
in the profiling process. The application of profiles to people whose data were not used to construct the
profile is based on data matching, which provides new data that allows for further adjustments. The process
of profiling is both dynamic and adaptive. A good illustration of the dynamic and adaptive nature of
profiling is the Cross-Industry Standard Process for Data Mining (CRISP-DM).
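
As a rough illustration of these steps, the following Python sketch runs a toy version of the pipeline with scikit-learn; the dataset, attributes, cluster count and thresholds are invented for the example and do not come from any cited source:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Data collection: a synthetic target dataset (age, income, transactions/month).
data = rng.normal(loc=[40, 50_000, 12], scale=[12, 15_000, 5], size=(500, 3))

# Data preparation: remove noise (clip implausible values) and normalise.
data = np.clip(data, [18, 0, 0], None)
scaler = StandardScaler().fit(data)
prepared = scaler.transform(data)

# Data mining: detect candidate group profiles (clusters) in the prepared data.
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(prepared)

# Interpretation: a domain specialist would inspect the cluster centres here
# and discard spurious groupings before the profiles are used.
print(model.cluster_centers_)

# Application: match a new person's data against the mined profiles; the
# outcome feeds back into the fine-tuning of earlier steps.
new_person = scaler.transform([[35, 42_000, 20]])
print(model.predict(new_person))
```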

Types of profiling practices


In order to clarify the nature of profiling technologies, some crucial distinctions have to be made between
different types of profiling practices, apart from the distinction between the construction and the application
of profiles. The main distinctions are those between bottom-up and top-down profiling (or supervised and
unsupervised learning), and between individual and group profiles.

Supervised and unsupervised learning

Profiles can be classified according to the way they have been generated (Fayyad, Piatetsky-Shapiro &
Smyth 1996) (Zarsky 2002). On the one hand, profiles can be generated by testing a hypothesized
correlation. This is called top-down profiling or supervised learning. This is similar to the methodology of
traditional scientific research in that it starts with a hypothesis and consists of testing its validity. The result
of this type of profiling is the verification or refutation of the hypothesis. One could also speak of deductive
profiling. On the other hand, profiles can be generated by exploring a database, using the data mining
process to detect patterns in the data that were not previously hypothesized. In a way, this is a matter
of generating hypotheses: finding correlations one did not expect or even think of. Once the patterns have
been mined, they will enter the loop – described above – and will be tested with the use of new data. This
is called unsupervised learning.

Two things are important with regard to this distinction. First, unsupervised learning algorithms seem to
allow the construction of a new type of knowledge, not based on hypotheses developed by a researcher and
not based on causal or motivational relations, but exclusively on stochastic correlations. Second,
unsupervised learning algorithms thus seem to allow for an inductive type of knowledge construction that
does not require theoretical justification or causal explanation (Custers 2004).
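
The distinction can be made concrete in code. In the hedged sketch below (synthetic data, invented outcome variable), the supervised model tests a hypothesised correlation against a known outcome, while the unsupervised model searches the same data for unanticipated groupings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))                                     # attribute data
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)   # known outcome

# Top-down / supervised: test a hypothesised correlation between the
# attributes and a known outcome; the score verifies or refutes it.
clf = LogisticRegression().fit(X, y)
print("hypothesis test accuracy:", clf.score(X, y))

# Bottom-up / unsupervised: explore the same data for patterns that were
# not hypothesised in advance (here, candidate clusters).
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)
print("discovered group sizes:", np.bincount(labels))
```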

Some authors claim that if the application of profiles based on computerized stochastic pattern recognition
'works', i.e. allows for reliable predictions of future behaviours, the theoretical or causal explanation of
these patterns does not matter anymore (Anderson 2008). However, the idea that 'blind' algorithms provide
reliable information does not imply that the information is neutral. In the process of collecting and
aggregating data into a database (the first three steps of the process of profile construction), translations are
made from real-life events to machine-readable data. These data are then prepared and cleansed to allow for
initial computability. Potential bias will have to be located at these points, as well as in the choice of
algorithms that are developed. It is not possible to mine a database for all possible linear and non-linear
correlations, meaning that the mathematical techniques developed to search for patterns determine which
patterns can be found. In the case of machine profiling, potential bias is not informed by
common sense prejudice or what psychologists call stereotyping, but by the computer techniques employed
in the initial steps of the process. These techniques are mostly invisible to those to whom profiles are
applied (because their data match the relevant group profiles).

Individual and group profiles

Profiles can also be classified according to the kind of subject they refer to. This subject can either be an
individual or a group of people. When a profile is constructed with the data of a single person, this is called
individual profiling (Jaquet-Chiffelle 2008). This kind of profiling is used to discover the particular
characteristics of a certain individual, to enable unique identification or the provision of personalized
services. However, personalized servicing is most often also based on group profiling, which allows
categorisation of a person as a certain type of person, based on the fact that her profile matches a
profile that has been constructed on the basis of massive amounts of data about massive numbers of other
people. A group profile can refer to the result of data mining in data sets that refer to an existing community
that considers itself as such, like a religious group, a tennis club, a university, a political party etc. In that
case it can describe previously unknown patterns of behaviour or other characteristics of such a group
(community). A group profile can also refer to a category of people that do not form a community, but are
found to share previously unknown patterns of behaviour or other characteristics (Custers 2004). In that
case the group profile describes specific behaviours or other characteristics of a category of people, like for
instance women with blue eyes and red hair, or adults with relatively short arms and legs. These categories
may be found to correlate with health risks, earning capacity, mortality rates, credit risks, etc.

If an individual profile is applied to the individual that it was mined from, then that is direct individual
profiling. If a group profile is applied to an individual whose data match the profile, then that is indirect
individual profiling, because the profile was generated using data of other people. Similarly, if a group
profile is applied to the group that it was mined from, then that is direct group profiling (Jaquet-Chiffelle
2008). However, insofar as the application of a group profile to a group implies the application of the
group profile to individual members of the group, it makes sense to speak of indirect group profiling,
especially if the group profile is non-distributive.
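
A toy example may clarify indirect individual profiling. In the sketch below, the group profile, its attributes and its thresholds are all hypothetical; the point is only that the profile is applied, via data matching, to a person whose data were not used to construct it:

```python
# A hypothetical group profile; attributes and thresholds are invented.
group_profile = {"min_age": 30, "max_age": 45, "postcode": "1012"}

def matches(person: dict, profile: dict) -> bool:
    """Data matching: does this person's data fit the group profile?"""
    return (profile["min_age"] <= person["age"] <= profile["max_age"]
            and person["postcode"] == profile["postcode"])

# This person's data were never used to construct the profile, yet the
# profile (and whatever risk category it carries) is applied to them:
# indirect individual profiling.
person = {"age": 37, "postcode": "1012"}
print(matches(person, group_profile))  # True
```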

Distributive and non-distributive profiling

Group profiles can also be divided in terms of their distributive character (Vedder 1999). A group profile is
distributive when its properties apply equally to all the members of its group: all bachelors are unmarried, or
all persons with a specific gene have an 80% chance of contracting a specific disease. A profile is non-distributive
when it does not necessarily apply to all the members of the group: the group of persons with a
specific postal code has an average earning capacity of XX, or the category of persons with blue eyes has
an average chance of 37% of contracting a specific disease. Note that in this case an individual's chance of
having a particular earning capacity or of contracting the specific disease will depend on other factors, e.g.
sex, age, background of parents, previous health, and education. It should be obvious that, apart from tautological
profiles like that of bachelors, most group profiles generated by means of computer techniques are non-
distributive. This has far-reaching implications for the accuracy of indirect individual profiling based on
data matching with non-distributive group profiles. Quite apart from the fact that the application of accurate
profiles may be unfair or cause undue stigmatisation, most group profiles will not be accurate.
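
The difference can be shown with a minimal example. In the sketch below (invented incomes), the group's average income is a valid group profile, yet it is non-distributive: the test confirms that it holds for no individual member:

```python
# Invented example group; incomes are arbitrary illustrative values.
group = [
    {"name": "a", "income": 30_000},
    {"name": "b", "income": 55_000},
    {"name": "c", "income": 95_000},
]

average_income = sum(p["income"] for p in group) / len(group)   # 60000.0

# A distributive profile would hold for every member; here the average
# describes the group while no single member actually earns it.
distributive = all(p["income"] == average_income for p in group)
print(average_income, distributive)   # 60000.0 False
```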

Applications

In the financial sector, institutions use profiling technologies for fraud prevention and credit scoring. Banks
want to minimize the risks in giving credit to their customers. On the basis of extensive group profiling,
customers are assigned a certain scoring value that indicates their creditworthiness. Financial institutions
like banks and insurance companies also use group profiling to detect fraud or money-laundering.
Databases with transactions are searched with algorithms to find behaviors that deviate from the standard,
indicating potentially suspicious transactions.[1]
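
A minimal sketch of such deviation-based screening is given below; it uses a simple z-score rule with an invented threshold, whereas real systems rely on far more sophisticated models:

```python
import numpy as np

rng = np.random.default_rng(2)
amounts = rng.normal(loc=80.0, scale=20.0, size=1000)  # a customer's usual transactions
amounts[::250] = 5_000.0                               # inject a few atypical amounts

# Flag transactions that deviate strongly from the customer's standard pattern.
mean, std = amounts.mean(), amounts.std()
suspicious = np.abs(amounts - mean) / std > 3.0
print("flagged for manual review:", int(suspicious.sum()))
```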

In the context of employment, profiles can be of use for tracking employees by monitoring their online
behavior, for the detection of fraud by them, and for the deployment of human resources by pooling and
ranking their skills. (Leopold & Meints 2008)[2]

Profiling can also be used to support people at work, and also for learning, by informing the design of
adaptive hypermedia systems that personalize the interaction. For instance, this can be useful for supporting
the management of attention (Nabeth 2008).

In forensic science, the possibility exists of linking different databases of cases and suspects and mining
these for common patterns. This could be used for solving existing cases or for the purpose of establishing
risk profiles of potential suspects (Geradts & Sommer 2008) (Harcourt 2006).

Consumer profiling

Consumer profiling is a form of customer analytics, where customer data is used to make decisions on
product promotion, the pricing of products, as well as personalized advertising.[3] When the aim is to find
the most profitable customer segment, consumer analytics draws on demographic data, data on consumer
behavior, data on the products purchased, payment method, and surveys to establish consumer profiles. To
establish predictive models on the basis of existing databases, the Knowledge Discovery in Databases
(KDD) statistical method is used. KDD groups similar customer data to predict future consumer behavior.
Other methods of predicting consumer behaviour are correlation and pattern recognition. Consumer profiles
describe customers based on a set of attributes[4] and typically consumers are grouped according to income,
living standard, age and location. Consumer profiles may also include behavioural attributes that assess a
customer's motivation in the buyer decision process. Well known examples of consumer profiles are
Experian's Mosaic geodemographic classification of households, CACI's Acorn, and Acxiom's
Personicx.[5]
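
As a hedged illustration of this kind of segmentation, the sketch below groups synthetic customers by income and age and predicts a new customer's behaviour from the average past behaviour of the segment they match; all attributes and values are invented:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
income = rng.normal(45_000, 12_000, size=400)
age = rng.normal(40, 12, size=400)
spend = 0.02 * income + (age < 30) * 500 + rng.normal(0, 50, size=400)  # past behaviour

# Group similar customers (here on income and age) into segments.
X = np.column_stack([income, age])
scaler = StandardScaler().fit(X)
segments = KMeans(n_clusters=3, n_init=10, random_state=3).fit(scaler.transform(X))

# Predicted behaviour for a segment = the segment's average observed spend.
segment_spend = {s: spend[segments.labels_ == s].mean() for s in set(segments.labels_)}

# A new customer inherits the prediction of the segment they match.
new_customer = scaler.transform([[60_000, 27]])
print(segment_spend[int(segments.predict(new_customer)[0])])
```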

Ambient intelligence

In a built environment with ambient intelligence everyday objects have built-in sensors and embedded
systems that allow objects to recognise and respond to the presence and needs of individuals. Ambient
intelligence relies on automated profiling and human–computer interaction designs.[6] Sensors monitor an
individual's actions and behaviours, thereby generating, collecting, analysing, processing and storing
personal data. Early examples of consumer electronics with ambient intelligence include mobile apps,
augmented reality and location-based services.[7]
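
The sketch below is a deliberately toy illustration of this pattern: sensor events are collected into a personal profile, which then drives an automated response; the event names and the response rule are invented:

```python
from collections import Counter

def respond(profile: Counter) -> str:
    # Automated response rule based on the accumulated profile (invented).
    return "dim lights" if profile["enters_room_evening"] >= 3 else "no action"

# Sensors generate a stream of events; each observation is collected and
# stored as personal data in the individual's profile.
profile = Counter()
for event in ["enters_room_evening"] * 3 + ["leaves_room"]:
    profile[event] += 1

print(respond(profile))   # "dim lights"
```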

Risks and issues


Profiling technologies have raised a host of ethical, legal and other issues including privacy, equality, due
process, security and liability. Numerous authors have warned against the affordances of a new
technological infrastructure that could emerge on the basis of semi-autonomic profiling technologies (Lessig
2006) (Solove 2004) (Schwartz 2000).

Privacy is one of the principal issues raised. Profiling technologies make possible a far-reaching monitoring
of an individual's behaviour and preferences. Profiles may reveal personal or private information about
individuals that they might not even be aware of themselves (Hildebrandt & Gutwirth 2008).

Profiling technologies are by their very nature discriminatory tools. They allow unparalleled kinds of social
sorting and segmentation which could have unfair effects. The people that are profiled may have to pay
higher prices,[8] they could miss out on important offers or opportunities, and they may run increased risks
because catering to their needs is less profitable (Lyon 2003). In most cases they will not be aware of this,
since profiling practices are mostly invisible and the profiles themselves are often protected by intellectual
property or trade secrets. This poses a threat to the equality and solidarity of citizens. On a larger scale, it
might cause the segmentation of society.[9]

One of the problems underlying potential violations of privacy and non-discrimination is that the process of
profiling is more often than not invisible for those that are being profiled. This creates difficulties in that it
becomes hard, if not impossible, to contest the application of a particular group profile. This disturbs
principles of due process: if a person has no access to information on the basis of which they are withheld
benefits or attributed certain risks, they cannot contest the way they are being treated (Steinbock 2005).

Profiles can be used against people when they end up in the hands of people who are not entitled to access
or use the information. An important issue related to these breaches of security is identity theft.

When the application of profiles causes harm, it has to be determined who can be held liable. Is the
software programmer, the profiling service provider, or the profiled user to be held accountable? This issue
of liability is especially complex in cases where the application of profiles and the decisions based on them
have themselves become automated, as in autonomic computing or ambient intelligence, where decisions
are made automatically on the basis of profiling.

See also
Automated decision-making
Behavioral targeting
Data mining
Demographic profiling
Digital identity
Digital traces
Forensic profiling
Identification (information)
Identity
Labelling
Privacy
Profiling
Offender profiling
Social profiling
Stereotype
User modeling
User profile

References
Anderson, Chris (2008). "The End of Theory: The Data Deluge Makes the Scientific Method
Obsolete" (https://www.wired.com/science/discoveries/magazine/16-07/pb_theory). Wired
Magazine. 16 (7).
Custers, B.H.M. (2004). "The Power of Knowledge". Tilburg: Wolf Legal Publishers.
Elmer, G. (2004). "Profiling Machines. Mapping the Personal Information Economy". MIT
Press.
Fayyad, U.M.; Piatetsky-Shapiro, G.; Smyth, P. (1996). "From Data Mining to Knowledge
Discovery in Databases" (https://web.archive.org/web/20091216122634/http://www.daedalu
s.es/fileadmin/daedalus/doc/MineriaDeDatos/fayyad96.pdf) (PDF). AI Magazine. 17 (3): 37–
54. Archived from the original (http://www.daedalus.es/fileadmin/daedalus/doc/MineriaDeDat
os/fayyad96.pdf) (PDF) on 2009-12-16. Retrieved 2009-02-09.
Geradts, Zeno; Sommer, Peter (2008). "D6.7c: Forensic Profiling" (http://www.fidis.net/filead
min/fidis/deliverables/fidis-wp6-del6.7c.Forensic_Profiling.pdf) (PDF). FIDIS Deliverables. 6
(7c).
Harcourt, B. E. (2006). "Against Prediction. Profiling, Policing, and Punishing in an Actuarial
Age". The University of Chicago Press, Chicago and London.
Hildebrandt, Mireille; Gutwirth, Serge (2008). Profiling the European Citizen. Cross
Disciplinary Perspectives. Springer, Dordrecht. doi:10.1007/978-1-4020-6914-7 (https://doi.o
rg/10.1007%2F978-1-4020-6914-7). ISBN 978-1-4020-6913-0.
Jaquet-Chiffelle, David-Olivier (2008). "Reply: Direct and Indirect Profiling in the Light of
Virtual Persons. To: Defining Profiling: A New Type of Knowledge?". In Hildebrandt, Mireille;
Gutwirth, Serge (eds.). Profiling the European Citizen. Springer Netherlands. pp. 17–45.
doi:10.1007/978-1-4020-6914-7_2 (https://doi.org/10.1007%2F978-1-4020-6914-7_2).
Kephart, J. O.; Chess, D. M. (2003). "The Vision of Autonomic Computing" (http://archive.wiki
wix.com/cache/20140810081722/http://agl.cs.unm.edu/~forrest/classes/readings/ieee-auton
omic.pdf) (PDF). Computer. 36 (1): 96–104. CiteSeerX 10.1.1.70.613 (https://citesee
rx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.613). doi:10.1109/MC.2003.1160055 (https://
doi.org/10.1109%2FMC.2003.1160055). Archived from the original (http://agl.cs.unm.edu/~fo
rrest/classes/readings/ieee-autonomic.pdf) (PDF) on 2014-08-10.
Leopold, N.; Meints, M. (2008). "Profiling in Employment Situations (Fraud)". In Hildebrandt,
Mireille; Gutwirth, Serge (eds.). Profiling the European Citizen. Springer Netherlands.
pp. 217–237. doi:10.1007/978-1-4020-6914-7_12 (https://doi.org/10.1007%2F978-1-4020-6
914-7_12). ISBN 978-1-4020-6913-0.
Lessig, L. (2006). "Code 2.0". Basic Books, New York.
Lyon, D. (2003). "Surveillance as Social Sorting: Privacy, Risk, and Digital Discrimination".
Routledge.
Nabeth, Thierry (2008). "User Profiling for Attention Support for School and Work". In
Hildebrandt, Mireille; Gutwirth, Serge (eds.). Profiling the European Citizen. Springer
Netherlands. pp. 185–200. doi:10.1007/978-1-4020-6914-7_10 (https://doi.org/10.1007%2F
978-1-4020-6914-7_10). ISBN 978-1-4020-6913-0.
Schwartz, P. (2000). "Beyond Lessig's Code for the Internet Privacy: Cyberspace Filters,
Privacy-Control and Fair Information Practices". Wisconsin Law Review. 743: 743–788.
Solove, D.J. (2004). The Digital Person. Technology and Privacy in the Information Age.
New York, New York University Press.
Steinbock, D. (2005). "Data Matching, Data Mining, and Due Process". Georgia Law
Review. 40 (1): 1–84.
Vedder, A. (1999). "KDD: The Challenge to Individualism". Ethics and Information
Technology. 1 (4): 275–281. doi:10.1023/A:1010016102284 (https://doi.org/10.1023%2FA%
3A1010016102284). S2CID 10377988 (https://api.semanticscholar.org/CorpusID:1037798
8).
Weiser, M. (1991). "The Computer for the Twenty-First Century". Scientific American. 265 (3):
94–104. doi:10.1038/scientificamerican0991-94 (https://doi.org/10.1038%2Fscientificameric
an0991-94).
Zarsky, T. (2002). ""Mine Your Own Business!": Making the Case for the Implications of the
Data Mining of Personal Information in the Forum of Public Opinion". Yale Journal of Law
and Technology. 5 (4): 17–47.

Notes and other references

1. Canhoto, A.I. (2007). "Profiling behaviour: the social construction of categories in the
detection of financial crime, dissertation at London School of Economics" (http://www.lse.ac.
uk/collections/informationSystems/pdf/theses/canhoto.pdf) (PDF). lse.ac.uk.
2. Electronic Privacy Information Center. "EPIC - Workplace Privacy" (https://epic.org/privacy/w
orkplace/). epic.org.
3. Reyes, Matthew (2020). Consumer Behavior and Marketing. IntechOpen. p. 10.
ISBN 9781789238556.
4. Reyes, Matthew (2020). Consumer Behavior and Marketing. IntechOpen. p. 11.
ISBN 9781789238556.
5. Reyes, Matthew (2020). Consumer Behavior and Marketing. IntechOpen. p. 12.
ISBN 9781789238556.
6. De Hert, Paul; Leenes, Ronald; Gutwirth, Serge; Poullet, Yves (2011). Computers, Privacy
and Data Protection: an Element of Choice. Springer Netherlands. p. 80.
ISBN 9789400706415.
7. De Hert, Paul; Leenes, Ronald; Gutwirth, Serge; Poullet, Yves (2011). Computers, Privacy
and Data Protection: an Element of Choice. Springer Netherlands. p. 80.
ISBN 9789400706415.
8. Odlyzko, A. (2003). "Privacy, economics, and price discrimination on the Internet" (http://www.dtc.umn.edu/~odlyzko/doc/privacy.economics.pdf) (PDF). ICEC2003: Fifth International
Conference on Electronic Commerce, N. Sadeh, ed., ACM, pp. 355–366.
9. Gandy, O. (2002). "Data Mining and Surveillance in the post 9/11 environment, Presentation
at IAMCR, Barcelona" (http://www.asc.upenn.edu/usr/ogandy/IAMCRdatamining.pdf) (PDF).
asc.upenn.edu.
