
Customer profiling in e-commerce: Methodological aspects and challenges
Received: 3rd December, 2001

Klaus-Peter Wiedmann
Professor Dr Wiedmann holds the Chair of General Management and Marketing II at the University of Hannover, Germany.

Holger Buxel
was a research associate for the Department of Marketing II at the University of Hannover, Germany from 1999 to 2001. In
2001, he became a consultant for Droege & Comp. AG, Dusseldorf, Germany.

Gianfranco Walsh*
is Senior Lecturer at the University of Hannover’s Department of Marketing II, Germany. He also consults on a variety of
marketing challenges, especially relating to empirical marketing research and cross-cultural consumer behaviour.

Abstract

Online profiling is the collection of information about Internet surfing behaviour across many different websites for the purpose of formulating a profile of users’ habits and interests. While marketers have long profiled large consumer groups on the basis of demographics, online profiling allows companies to collect information about individuals across a wide range of traits. A company’s ability to build and strengthen long-term customer relationships via individualised e-commerce offers will depend on its ability to use customer data to plan, develop and control interactions with its customers. Despite the importance of customer profiles in e-commerce, the methods for producing them have rarely been investigated in marketing research. This paper aims to close this gap by providing an overview of key areas and issues of customer profiling in e-commerce. More recently, this controversial practice has seized the public’s attention because of ethical and privacy concerns.

Dr Gianfranco Walsh*
Department of Marketing II,
University of Hannover,
Koenigsworther Platz 1,
30167 Hannover, Germany.
Tel: +49 511 762 4540
Fax: +49 511 762 3142
e-mail: walsh@m2.uni-hannover.de

INTRODUCTION

Since the early 1990s, and especially in connection with the one-to-one marketing concept, there has been increasing discussion of how better to manage relationships between companies and individual customers. Long-term, revenue-maximising relationships with attractive customers are especially relevant as the focus of marketing interest shifts from companies’ share of market to their share of customer. So-called ‘one-to-one’ concepts focus on learning from relationships with customers. Over time, companies can obtain comprehensive pictures of customers that can be used to cater for customer needs. As companies are eager to increase the duration of a customer relationship, actions to increase the completeness and sharpness of such pictures, and to apply such pictures to individualised transactions, automatically lead to the development of customised offers that have a positive effect on customer loyalty.

Journal of Database Marketing Vol. 9, 2, 170–184 © Henry Stewart Publications 1350-2328 (2002)

In e-commerce, the quality of one-to-one marketing grows with further developments in information technology, which help companies to improve the quality of their customer data and to build lasting relationships with their customers. The Internet allows marketers to obtain detailed customer-related information and enables sellers to individualise their marketing instruments on a large scale and in real time. This process is referred to as personalisation, which involves offering individualised products. The ability to do so depends very much on whether the company has data with which it can selectively plan, shape and monitor its individual interactions with its customers. To make use of this possibility, a company needs to collect, compile and continually update extensive information about its (potential) customers and their needs and preferences.

Customers leave a wide range of tracks as they move through the Internet. Such tracks, if systematically recorded and collected, can be used to formulate a ‘customer profile’, a representation of users’ habits and interests that can support one-to-one marketing concepts in e-commerce and can serve a broad range of other purposes. Companies have been showing growing interest in creating customer profiles in e-commerce. It is estimated that more than 100 million Internet users have been profiled by DoubleClick (a large network advertiser) alone. As consumers and other interest groups have become increasingly aware that companies may be watching and recording their online behaviour, however, they have begun criticising profiling activities on ethical, moral and privacy-related grounds.

This paper makes a threefold contribution. First, it describes the basic methods used to create customer profiles. Secondly, it discusses central problems in using profiling techniques in e-commerce from the perspective of affected and interested groups. Thirdly, management implications are offered and discussed.

METHODS OF CUSTOMER PROFILING

When users move through the Internet, their every step along the virtual pathways can be followed, recorded in detail and combined with timing data. Recording what information and services a user seeks or uses, the order in which they are used and the amount of time spent reviewing information presents no particular technical problems. When such data are combined with information such as the user’s identity and demographic profile, a comprehensive picture of the user can result, a picture that can closely approximate the fanciful image of the ‘glass customer’. The possibilities for creating and using detailed profiles are enhanced chiefly by the following factors:

— from a technical standpoint, the Internet makes it easy to determine which users visit which sites, and for how long. The main reason for this is that a great deal of relevant information about consumer behaviour is generated and recorded automatically as a result of Internet data exchange between providers and consumers
— the Internet’s importance as a space for information exchange and for transactions, a space covering more and more areas of life (professions, hobbies, recreation, etc.), has been growing rapidly. Depending on how intensively consumers use the Internet in various areas of their lives, such data can provide a relatively
comprehensive picture of consumer behaviour.
— because the data are generated automatically, data collection and processing can be largely automatic as well. As a result, consumer profiles can be produced at virtually no extra cost and effort.1 This is accomplished with the help of suitable profiling tools that automatically collect and analyse data and make them available for marketing purposes. By conducting profiling and updating profiles continuously, companies can obtain profiles of very high quality in a very short time. Since the data are already in digital form, they can be used directly and quickly in companies’ analytical processes, which makes it possible to shape interaction individually and responsively, as a function of observed behaviours.

The process of using customer profiles begins with the production of a suitable database from which conclusions about customers’ needs and interests can be drawn. In e-commerce, such data can be obtained by various procedures that differ in the types of data they can generate, in how they function and in the privacy-related problems linked with their use.

Much of the customer-related data available in e-commerce are in a form that is not conducive to efficient analysis and interpretation. This is due partly to technical factors and partly to factors relating directly to the data-collection procedures themselves. In addition, various customer behaviour factors can often render collected data unsuitable for the direct production of customer profiles. As a rule, therefore, collected data have to be processed before they can be used.

Once they have been suitably processed, customer data have to be transferred to a data warehouse, which supports profiling. In the data warehouse, data are stored in keeping with the relevant profiling objectives and modified over time, since detailed customer profiles normally become available only through long-term observation of customer behaviour.

Data collection procedures

Internet-based data collection can be reactive or non-reactive. The term reactive implies that a customer is aware that his/her behaviour is being recorded and, possibly, used, which may cause the customer to react. Gathering consumer-behaviour data on the Internet via non-reactive processes is thus ‘quasi-biotic’: it can proceed relatively free of disruptive factors, whereas a customer exposed to reactive collection procedures will be aware that his/her behaviour (ie data about him/her) is being recorded.

Non-reactive data collection procedures are especially suitable for collecting data about observable ‘surf’ behaviour on the Internet, while reactive data collection methods are best used in situations in which sociodemographic and psychographic characteristics need to be recorded, as well as characteristics that cannot be derived from observing customers’ usage behaviour.

Figure 1 provides an overview of the most important data collection procedures used in customer profiling.

Non-reactive data collection

Non-reactive procedures focus on recording data resulting from customers’ usage behaviour on websites. Common procedures and technologies include collection via log files, via Common Gateway Interface-based (CGI-based) files and via applications designed especially for Internet data collection.


Figure 1: The most important data collection procedures used in customer profiling (panels: Log files; CGI-based files; Special applications; Form spaces; Selection menus)

Data collection via log files

Data transport in the World Wide Web is based on the client-server principle. Software programs that access services are referred to as ‘clients’, while programs that provide services are referred to as ‘servers’. Log files automatically record information about exchanges of data between servers and clients and use status codes to describe specific types of website access. Since they record every requested instance of data transmission and reception, they are relatively comprehensive data sources that can provide valuable information for profiling. Log files come in various formats, although the formats differ only insignificantly in terms of the information they include. Currently, the extended common log format (ECLF) is the most common.

The basic fields of an ECLF file (see Table 1) provide the following information:

— IP address: the Internet address of the computer from which the user retrieves data from the server
— User ID: data that identify a user when website access is subject to user authentication
— Time: the time at which the client retrieves data from the server
— Request: what data were retrieved from the server
— Status: whether the data exchange between client and server was successful
— Bytes: the number of bytes sent in response to the request
— Referrer: the previous URL (Uniform Resource Locator)
— Agent: a text string that can contain information about the client’s operating system and browser software

In profiling, such data can be used to reach conclusions about Internet usage behaviour and, thus, about characteristics of individual customers or customer groups (Table 2).



Table 1: Example of an ECLF log file

Columns: IP address, User ID, Time, Request (method/URL/protocol), Status, Size, Referrer, Agent

123.456.78.8 - - [09/May/2001:03:04:41 -0500] "Get Buxel.html HTTP/1.0" 200 3290 - Mozilla/3.04 (Win95,I)
123.456.78.8 - - [09/May/2001:03:04:51 -0500] "Get Wiedmann.html HTTP/1.0" 200 5450 Buxel.html Mozilla/3.04 (Win95,I)
123.456.78.8 - - [09/May/2001:03:05:32 -0500] "POST /cgi-bin/p1 HTTP/1.0" 200 5096 Wiedmann.html Mozilla/3.04 (Win95,I)
123.456.78.8 - - [09/May/2001:03:05:41 -0500] "Get Buxel.html HTTP/1.0" 200 3290 - Mozilla (IE4.2,WinNT)
123.456.78.8 - - [09/May/2001:03:05:59 -0500] "Get Wiedmann.html HTTP/1.0" 200 5450 Buxel.html Mozilla (IE4.2,WinNT)
123.456.78.8 - - [09/May/2001:03:06:30 -0500] "Get Frenzel.html HTTP/1.0" 200 1000 Wiedmann.html Mozilla (IE4.2,WinNT)
123.456.78.8 - - [09/May/2001:03:07:11 -0500] "Get Buckler.html HTTP/1.0" 200 2020 F.html Mozilla/3.04 (Win95,I)
123.456.78.8 - - [09/May/2001:03:07:45 -0500] "Get Halstrup.html HTTP/1.0" 200 3030 Frenzel.html Mozilla (IE4.2,WinNT)
123.456.78.8 - - [09/May/2001:03:12:23 -0500] "Get Meissner.html HTTP/1.0" 200 4040 Wiedmann.html Mozilla/3.04 (Win95,I)
123.456.78.2 - - [09/May/2001:05:05:11 -0500] "Get Buxel.html HTTP/1.0" 200 3290 - Mozilla/3.04 (Win95,I)
123.456.78.3 - - [09/May/2001:05:06:03 -0500] "Get Walsh.html HTTP/1.0" 200 4040 Buxel.html Mozilla/3.04 (Win95,I)
123.456.78.5 - - [09/May/2001:05:06:05 -0500] "Get robots.txt" 200 1020 - Mozilla/3.04 (Win95,I)
233.999.79.4 - - [09/May/2001:05:06:07 -0500] "Get Buxel.html HTTP/1.0" 200 3290 - Ultraseek
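To make the field list concrete, the following Python sketch splits a log line of the kind shown in Table 1 into named fields. It is illustrative only: the regular expression is an assumption fitted to the layout of Table 1 (two ‘-’ placeholders before the timestamp, unquoted referrer and agent), not a complete implementation of any log-format standard, and the names `parse_eclf_line`, `is_customer_request` and `ROBOT_AGENTS` are hypothetical. The sketch also applies a crude first cleaning rule suggested by Table 1 itself, which contains a request for robots.txt and a request from the Ultraseek search robot, neither of which reflects customer behaviour.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Layout assumed from Table 1: ip, two placeholder fields, [time], "request",
# status, size, referrer, agent. Referrer and agent are unquoted there, so the
# agent is taken to be everything after the referrer.
ECLF_LINE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'(?P<referrer>\S+) (?P<agent>.*)'
)

# Hypothetical robot list for illustration; a real site would maintain its own.
ROBOT_AGENTS = ("Ultraseek",)


@dataclass
class LogRecord:
    ip: str
    user: Optional[str]
    time: datetime
    method: str
    url: str
    status: int
    size: int
    referrer: Optional[str]
    agent: str


def _field(value: str) -> Optional[str]:
    """In log files, '-' marks an empty field."""
    return None if value == "-" else value


def parse_eclf_line(line: str) -> Optional[LogRecord]:
    """Split one log line into named fields; return None if it does not fit."""
    m = ECLF_LINE.match(line.strip())
    if m is None:
        return None
    parts = m.group("request").split()
    return LogRecord(
        ip=m.group("ip"),
        user=_field(m.group("user")),
        time=datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        method=parts[0].upper() if parts else "",  # normalise "Get" to "GET"
        url=parts[1] if len(parts) > 1 else "",
        status=int(m.group("status")),
        size=0 if m.group("size") == "-" else int(m.group("size")),
        referrer=_field(m.group("referrer")),
        agent=m.group("agent"),
    )


def is_customer_request(rec: LogRecord) -> bool:
    """Crude cleaning rule: drop robots.txt fetches and known robot agents."""
    if rec.url.endswith("robots.txt"):
        return False
    return not any(robot in rec.agent for robot in ROBOT_AGENTS)


if __name__ == "__main__":
    sample = ('123.456.78.8 - - [09/May/2001:03:04:41 -0500] '
              '"Get Buxel.html HTTP/1.0" 200 3290 - Mozilla/3.04 (Win95,I)')
    record = parse_eclf_line(sample)
    print(record)
    print(record is not None and is_customer_request(record))
```

A production system would need a richer robot list and would also separate embedded graphics from HTML pages; the rule above only illustrates where such filtering fits.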


Table 2: Information contained in server logs

Information | Explanation
User identification | Analysis of IP addresses and user IDs can provide information about who the users of a particular service are.
Interests | URLs provide information about users’ interest in particular services. Analysis of request sequences can also show important relationships between individual services.
Usage periods | Analysis of the time fields can reveal the times of day when users access services and the amount of time they spend with services when they access them. The duration of users’ visits to websites can serve as an indicator of the interest being shown in the relevant offers. In addition, analysis of the time fields can show how access to the relevant website is distributed over time (when and how often).
Related interests | The referrer shows which page the user visited before accessing the relevant site. Analysis of such data can support conclusions about significant relationships to other websites and about the interests of the relevant consumers.
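The ‘usage periods’ row of Table 2 presupposes that individual log entries have already been grouped into visits. The Python sketch below shows one common way of doing this, assumed here rather than taken from the text: a new visit starts whenever a client has been inactive for longer than a fixed timeout. The 30-minute value, the use of IP address plus agent string as the client key, and the names `sessionise` and `Visit` are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"     # e.g. 09/May/2001:03:04:41 -0500

Client = Tuple[str, str]  # (IP address, agent string): an assumed client key


@dataclass
class Visit:
    client: Client
    times: List[datetime] = field(default_factory=list)
    urls: List[str] = field(default_factory=list)

    @property
    def duration(self) -> timedelta:
        # First to last request; time spent on the last page is not observable.
        return self.times[-1] - self.times[0]


def sessionise(requests: List[Tuple[str, str, str, str]]) -> List[Visit]:
    """Group (ip, agent, timestamp, url) tuples into visits per client."""
    by_client: Dict[Client, List[Tuple[datetime, str]]] = defaultdict(list)
    for ip, agent, stamp, url in requests:
        by_client[(ip, agent)].append((datetime.strptime(stamp, TIME_FORMAT), url))

    visits: List[Visit] = []
    for client, hits in by_client.items():
        hits.sort()  # chronological order for this client
        current = None
        for when, url in hits:
            # Start a new visit on the first hit or after a long pause.
            if current is None or when - current.times[-1] > SESSION_TIMEOUT:
                current = Visit(client)
                visits.append(current)
            current.times.append(when)
            current.urls.append(url)
    return visits
```

Keying clients by IP address and agent string is a simplification: users behind a shared proxy are merged into one client, and a single user whose IP address changes is split into several.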

Data collection via CGI-based files

Instead of retrieving static HTML documents, a client can cause the web server to run a program known as a CGI script. The Common Gateway Interface is a protocol that can be used to communicate between web forms and a program. Like log files, CGI-based files keep records of events that are caused by user actions. CGI-based files can provide information about users’ access patterns, consisting of the services that users request and the time users spend with each service, and hence supplement log files.

Data collection via special applications

Other procedures can also be used for non-reactive data collection, such as cookies (small data files that a website stores in the user’s browser and that providers can use for data collection), special browsers with source code modified for enhanced data collection, packet-sniffing technologies and web bugs (invisible images that produce usage information).

Reactive data collection

Reactive data collection is concerned with consumer characteristics that cannot be revealed with the behaviour-tracking technologies outlined above. While almost any sort of data might be available to reactive data collection, the following are of particular interest with regard to the creation of customer profiles.

Identification data include such data as user name, address, telephone and fax number, e-mail address and, possibly, the URL of the user’s own homepage. With such data, a provider is in a position to send the user specially tailored offers or even to provide special support for real-life (non-virtual) offers.

Descriptive data can be used to determine customers’ basic preferences for goods and services and, thus, to generate relevant, applicable conclusions about customers (Table 3). Apart from data that link user behaviour data with specific persons, there are many types of (user-)descriptive data that are very difficult to obtain through non-reactive data collection. Relevant data of particular interest in connection with profiling include information about the customer’s current purchasing behaviour and purchasing history, as well as sociographic and psychographic data. Information about purchasing behaviour, for example, indicates which products a customer prefers, at which prices, and with which methods of payment. Typical sociographic data are, for instance, date of birth, sex, education, occupation and income. Psychographic data include information about customers’ interests, hobbies, recreational activities, opinions and lifestyles.

Communications data provide such information as when, and in what form, a customer contacts a provider. They can thus be used to generate company-related data. Communications data can be collected reactively whenever a customer enters directly into dialogue with a provider, ie not when a customer merely surfs through a website. Relevant data in this context include information about the subject of communications (product, negotiation, order, etc.) or the occasion for communications (eg reaction to certain matters, spontaneous query, issue of an order).

At a conceptual level, the means by which identification data, descriptive data and communications data can be obtained in e-commerce can be broken down into closed and open form spaces and selection menus (see Table 3).

Table 3: Collection characteristics of reactive procedures

Procedure | Defining characteristic
Closed form spaces | Space for manual entries; no latitude at the topic level
Open form spaces | Space for manual entries; latitude at the topic level
Selection menus | Presentation of defined, fixed answer categories

Data processing as a preliminary phase in profile formation

After the data are collected, they must be processed to permit the creation of profiles. With reactively collected data, the primary data processing problems have to do with useful conversion of data from open form spaces, filtering out non-relevant data and wrong entries, and treating cases in which data are missing.

With non-reactively obtained data, processing challenges normally arise from the way data are exchanged between client and server. The reason for this is that the means by which non-reactively collected data are recorded and represented require the data to undergo a number of transformations before they can be used in profiling. Data processing then becomes a complex process that influences the formation, quality and, ultimately, usefulness of the resulting customer profiles.

Non-reactively obtained data consist of records of the exchange (requesting and sending) of page files between clients and servers. A user’s actions on a website (eg mouse clicks) normally lead to a change in the web content displayed and thus to the generation of a new ‘picture’ in the user’s browser window. In the ECLF (and in data collected with other procedures), such actions appear as records of all page files exchanged between client and server in processing the client’s request. In practice, a user’s single click on a website link generates a new browser image consisting of several new graphic elements and HTML files, each of which leaves a separate entry in the log.

The provider’s problem in interpreting such records is that their data do not readily support conclusions regarding usage behaviour, especially in the light of specific questions of interest to the provider. In an analysis of usage behaviour, interest must focus not on the various individual transmitted files but on aggregates, since a user normally does not request ‘x’ page files with texts and
‘z’ graphics for their browser; instead the user requests a website with specific information. The key data for interpreting online behaviour include users, server sessions, page views and episodes:

— a user is a person who uses a browser to retrieve page files from one or more servers
— a page view (also referred to as a page impression) is the set of page files needed to generate a web browser display at a given time. A page view consists of all those elements that are presented to the user in their browser window in response to a request via a mouse click
— over longer periods of time, it becomes very likely that some users will visit a given website more than once. For this reason, all data requests that are recorded in a log and that can be assigned to a specific user have to be divided into individual sessions if conclusions are to be drawn about how the user’s behaviour is distributed over time. A ‘server session’ (also referred to as a visit) refers to a process of continuous usage of a website by a client, expressed in terms of related page views. Usage processes thus include every instance of technically successful and external access, by an Internet browser, to the website
— in addition to page views, ‘episodes’ are relevant. Episodes are semantically meaningful sub-areas of server sessions. Examples of episodes include website pages devoted to particular categories of goods, eg sporting goods or food, as well as pages that present company-oriented data.

Processing of non-reactively obtained data can be described as a four-step process (see Figure 2).2 The first step consists of filtering out elements that are irrelevant to the identification of usage behaviour and to later analysis (‘data cleaning’). In a second step, the data records must be assigned to individual clients or users and then divided into server sessions. In a third step, the identified server sessions are subjected to further analysis, leading to the identification of individual page views. Finally, individual episodes are identified. Processing of the information that describes user behaviour must be supported by processing of the structural and content data that describe the company’s website.

Figure 2: Processing of non-reactively obtained data (inputs: usage data; data from the website)

Storage in a data warehouse

Processing of non-reactively collected data produces information about a user’s ‘hits’ on the company’s website (and possibly about hits on other websites) and the amount of time spent with each item (good/service), which can be used to create usage profiles. These data, in turn, can be of two basic types: personal (ie data that can be correlated with individual, identified customers) and non-personal (data that describe the usage behaviour of anonymous customers). Processing of reactively collected data yields information about user characteristics that cannot be directly obtained from usage behaviour. Five different types of marketing-relevant customer profiles can be produced by combining reactively and non-reactively collected data (Table 4).

Once data, whether reactively or non-reactively obtained, have been processed for the creation of customer profiles, they are stored in a suitable structure by one of two basic procedures: a) establishment of a standalone database system that comprises only data from Internet-based transactions; or b) integration of the data within a comprehensive data warehouse environment that is linked with other department-specific databases, via an integrated approach aimed at creating a general management support system.

A standalone database system can be useful when information relating to customer profiles does not have to be combined, for further purposes, with data from other areas of the company. For example, this could be the case if non-reactively collected Type 4 profile data are to be used solely in personalisation tools. The primary advantage of a standalone database system is that it has very few IT interfaces to other areas. The resulting simplicity can reduce the system’s installation and management costs.

Data available to the company from different areas of the company can have much greater strategic usefulness if combined within a decision-oriented data model that makes it possible to observe, analyse and optimise the company’s market-oriented activities across all areas.

Apart from the general problems of setting up data warehouse concepts, the major challenge in integrating profile data from Internet-based activities involves representing masses of non-reactively collected information in a suitable form. Log files of very popular websites can grow by several hundred megabytes daily.3 Complete, detailed representations of all relevant non-reactively obtained usage data in a data warehouse can reach orders of magnitude that considerably constrain storage and/or subsequent analysis. In particular, the extremely calculation-intensive analyses used in ‘data mining’ can create problems in terms of computer capacity and calculation time (it can take too long to produce a result). Thus, it may be
useful to pare data before they are transferred to a data warehouse or used for analysis. For instance, it can be useful to combine data into aggregate packages. This approach can make the information contained in all relevant data available for profiling and yet largely eliminate the need for fine-grained storage of all non-reactively obtained data.

Table 4: Typology of customer profiles

Non-reactively collected data | Reactively collected data: relevant | Reactively collected data: not relevant
Personal | Type 1 | Type 4
Non-personal | Type 2 | Type 5
Not relevant | Type 3 | (none)

As described above, appropriate combination of page views into episodes makes it possible for the page views a customer requests to be automatically classified and assigned to thematic areas. If episodes are formed in such a way that they can be used to generate relevant conclusions about customer characteristics (eg interest shown in certain products or topics), then non-reactively obtained data can be represented in a data warehouse in the form of weightings. In a data warehouse, a set of non-reactively obtained usage data for a single customer can be represented as a vector, with each of the vector’s dimensions corresponding to an area of interest (eg an episode assigned to a certain product group). In evaluating non-reactively obtained data, a weight $w_{ij}$ can then be defined for customer $i$ and dimension $j$ that describes the strength of the interest shown in the episode in question (Table 5).

Table 5: Presentation of profile data in a data warehouse, using a weighting system
(ID, Name, ...: reactively obtained data; Product group A to Episode n: non-reactively obtained data)

ID | Name | ... | Product group A | Product group B | ... | Episode n
1 | Aaker | ... | $w_{1A}$ | $w_{1B}$ | ... | $w_{1n}$
2 | Abend | ... | $w_{2A}$ | $w_{2B}$ | ... | $w_{2n}$
... | ... | ... | ... | ... | $w_{ij}$ | ...
m | Zacher | ... | $w_{mA}$ | $w_{mB}$ | ... | $w_{mn}$

The weight $w_{ij}$, corresponding to customer $i$ and episode $j$, can then be evaluated using information about all relevant page views for a given episode, as a function of three central figures:4

— the view time $d_{ij}$ that a customer spends on page views for a relevant episode
— the relative frequency $h_{ij}$ with which the page views for a given episode are retrieved
— the chronological distribution or proximity of accesses (hits). While the relative frequency $h_{ij}$ serves as an indicator of the general interest shown in a given product, the importance of an area of interest can vary over time. Managing a product on the basis of relative frequencies can produce poor results if hits on the
most frequently visited episode lie some time in the past. For example, a customer who requests current data about a new topic might continue to receive information or offers relating to the episode the customer visited most frequently, even if that episode no longer reflects the customer’s current interests. For this reason, recent accesses (hits) have to be given greater weight in the evaluation of $w_{ij}$ than accesses that lie further in the past. Consequently, a factor $v_{ij}$, representing the chronological distribution of accesses, has to be included.

Logically enough, compressing non-reactively obtained data into weights destroys some data, which could pose a problem in some applications. In general, it is possible to combine non-reactively obtained data with reactively obtained data in a weight.

Data analysis and use

Data stored in a data warehouse environment can be processed and used in many ways. In the simplest case, rule-based procedures can be applied to submit offers to customers with certain observed, fitting characteristics (rule-based matching). For example, banners advertising private health insurance might be directed only at customers above a certain minimum age (over 18), with a certain minimum income (over $4,000/month) and with certain fitting hobbies (knitting rather than motorcycling).

PROBLEM AREAS IN USING CUSTOMER-PROFILING TECHNIQUES IN E-COMMERCE

In practice, and apart from any methodological complexities, the extent to which the above-described data collection procedures and, ultimately, the profiles derived through them can be used depends on a number of factors. Both data collection for the creation of customer profiles and the use of such profiles in e-commerce are sensitive matters that an increasing number of customers and institutions view sceptically. Concerns about the development and use of customer profiling in electronic commerce tend to relate to the following areas:

— both customers and the public can consider data collected for customer profiling purposes to be private (or at least confidential) information. In this view, collection of such data is considered inherently unethical or immoral and is seen as an ‘invasion of privacy’
— even when the data themselves are not seen as private information, consumers and the public may still have reservations about the methods used to collect the data. In such cases, criticism focuses on the inherent methods and concepts of non-reactive data collection. Privacy issues seem to be of particular interest because privacy is a fundamental human right, also in the information environment. It is a basis of human freedom (of action) and human dignity. With regard to human freedom, it is argued that, in many areas of life, a person’s anonymity is a basis for the creation and protection of personal freedom and personal development. Ultimately, every human being has spheres in his/her private life that contain actions, statements and thoughts about which no other (or only a few) people should have any knowledge. Most people
would not want their immediate, relevant environments (eg transaction partners, family, workplace) to know about their ‘dark areas’ (eg certain behaviours and preferences); armed with such knowledge, others would intervene in their personal freedom, for instance, by exerting social and moral pressure or by criticising (often unjustifiably) their individual behaviour.
— even when data collection and the relevant customer profiling methods are seen as acceptable, consumers may still be concerned about the many ways in which the data are used. Advertisers usually use customer profiles to predict customers’ preferences and future needs, thereby placing themselves in a position to influence existing needs and purchases efficiently. While criticism of the manipulative nature of advertising is not new, it has taken on a new dimension as the possibilities for approaching individual customers via the Internet have developed. More than ever before, advertisers are able to reinforce customers’ wants and to spark more or less ‘uncontrolled’ purchases: purchases that customers might not make under other circumstances (or might make only at different times and under different terms).
— privacy activists often stress that many Internet users are not aware that they leave tracks that can be used to create profiles. Others argue that customer profiling information, in principle, belongs to the affected customers and thus cannot become the property of other parties without the customers’ consent.5 All non-reactive data collection procedures provide behaviour observation instruments with which website operators can collect data from customers, and use such data in customer profiling, without customers’ awareness. Another related complaint is that many non-reactive collection procedures, such as ‘web bugs’ or ‘packet-sniffing’ technologies, can theoretically be used to obtain and record data about consumers without their consent. Interestingly, almost 60 per cent of all Internet users do not know how cookies can be rejected.6
— while profiling techniques make it possible to enhance marketing efficiency through precise selection of customers for marketing approaches, they also give rise to concern about the dangers of discrimination and of selective advertising and transaction design.7 Profiles, which are used to assess customers and their future behaviour, can be used by companies to avoid certain unattractive types of customers, to block such customers’ access to information about products and services8 or to offer them poorer purchasing terms.9 For example, in online tests conducted in 2000, Amazon set DVD prices in a customer-specific, selective way as a function of the user’s browser type, service provider and frequency of previous visits to the Amazon website. As a result, at a given time the price of a standardised DVD such as the film ‘Planet of the Apes’ differed by $10 or more between customers.10

The criticism aimed at customer profiling methods and at the use of non-reactive data collection procedures does not apply in equal measure to all data collection procedures. Table 6 provides an overview of features of non-reactive data-collection procedures in the light of criticism targeting the various methods.

Table 6: Traits of non-reactive data collection methods

Procedure           Secrecy   Involuntary participation   Lack of protection
Log files           Yes       Yes                         Yes
CGI-based files     Yes       Yes                         Yes
Cookies             Yes*      No*                         No*
Software agents     Yes*      No*                         No*
Packet sniffers     Yes       Yes                         Yes
Web bugs            Yes*      Yes                         Yes
Modified browsers   No*       No                          No

*Depends on the user’s know-how
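As a reading aid, Table 6 can be encoded as a small lookup structure. The Python sketch below is a hypothetical illustration, not part of the original study: it stores each procedure’s three traits as Yes, No or an asterisked value (depends on the user’s know-how), and lists which procedures attract all three criticisms unconditionally. The names `Trait`, `TABLE_6`, `criticised_unconditionally` and `depends_on_know_how` are assumptions introduced for this example.

```python
from enum import Enum


class Trait(Enum):
    """Possible cell values in Table 6 (hypothetical encoding)."""
    YES = "Yes"
    NO = "No"
    YES_DEPENDS = "Yes*"  # asterisk: depends on the user's know-how
    NO_DEPENDS = "No*"    # asterisk: depends on the user's know-how


# Hypothetical encoding of Table 6. Each procedure maps to a tuple of
# (secrecy, involuntary participation, lack of protection).
TABLE_6 = {
    "Log files": (Trait.YES, Trait.YES, Trait.YES),
    "CGI-based files": (Trait.YES, Trait.YES, Trait.YES),
    "Cookies": (Trait.YES_DEPENDS, Trait.NO_DEPENDS, Trait.NO_DEPENDS),
    "Software agents": (Trait.YES_DEPENDS, Trait.NO_DEPENDS, Trait.NO_DEPENDS),
    "Packet sniffers": (Trait.YES, Trait.YES, Trait.YES),
    "Web bugs": (Trait.YES_DEPENDS, Trait.YES, Trait.YES),
    "Modified browsers": (Trait.NO_DEPENDS, Trait.NO, Trait.NO),
}

# Cell values whose assessment depends on the user's know-how.
DEPENDS = {Trait.YES_DEPENDS, Trait.NO_DEPENDS}


def criticised_unconditionally(table):
    """Procedures with an unqualified 'Yes' on all three traits."""
    return [name for name, traits in table.items()
            if all(t is Trait.YES for t in traits)]


def depends_on_know_how(table):
    """Procedures with at least one asterisked cell."""
    return [name for name, traits in table.items()
            if any(t in DEPENDS for t in traits)]


if __name__ == "__main__":
    print(criticised_unconditionally(TABLE_6))
    # prints ['Log files', 'CGI-based files', 'Packet sniffers']
    print(depends_on_know_how(TABLE_6))
    # prints ['Cookies', 'Software agents', 'Web bugs', 'Modified browsers']
```

Keeping the asterisked cells as distinct values, rather than collapsing them to Yes or No, preserves the table’s point that for cookies, software agents, web bugs and modified browsers the assessment depends on the user’s know-how.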

Development of consumer empowerment tools
One way of responding to customers’ concerns, as well as to criticism of profiling, is to promote the development and dissemination of privacy-enhancing technologies11 and procedures that enable customers to protect themselves as necessary against customer profiling and even to help shape the profiling process. Logically enough, this approach responds to the criticism that customers lack protection and participate involuntarily, and it positively affects consumers’ perceptions of the general risks resulting from customer profiling. Such tools relate to:

— controlling the exchange of data (primarily reactively collected data) (control-enhancing tools, CET)
— making collected data (primarily non-reactively collected data) unusable (protection tools, PT).

Incentives
Yet another way of responding to customers’ concerns is to offer monetary (eg cash payments, discounts for online purchases) and/or non-monetary incentives (eg personalisation of web areas, provision of exclusive information) that enhance customers’ willingness to participate voluntarily. Incentives can have a positive effect on acceptance of customer profiling efforts, can encourage customers to overlook losses of privacy and risks of manipulation, discrimination, etc., and thus can turn customer profiling into a ‘win–win’ relationship from which providers and customers alike can profit. While in theory all providers can use incentives in e-commerce, in practice such incentives will be a viable option only for companies whose core business is the management of customer profiles. At websites with relatively little user traffic, incentives will be unlikely to entice customers to participate in complicated log-in procedures. Furthermore, incentives create management overhead for providers, overhead that may not be recouped through the collection and use of profile data. For ‘normal’ e-commerce providers, therefore, incentives present efficiency problems, and this is likely to limit their use.

Involvement of certification agencies
Rating services can also help build trust relationships between market players and participants. Via such services, companies can have their compliance with privacy policies certified by external organisations, so that such compliance can be used for marketing communications purposes. An especially important role in this context could be played by state-organised certification via data protection audits, such as those now being discussed in connection with a
draft programme standard for amendment of the German Data Protection Act (Bundesdatenschutzgesetz), as well as by ‘seal programs’ carried out by private-sector certification organisations. Prominent representatives of such private organisations include TRUSTe (www.truste.com) and WebTrust (www.cpawebtrust.com).

Development of industry codes of conduct
The trust of customers and the public in companies’ privacy policies is strongly affected by the behaviour of ‘other’ providers. For this reason, companies must seek to influence the behaviour of other market participants, with the aim of reducing the risk that these participants will act unethically. One way of influencing competitors to act responsibly is to apply moral suasion by establishing suitable, binding industry codes of conduct for the market.

IMPLICATIONS AND CONCLUSION
The main purpose of this paper is to shed some light on the emerging area of online marketing research and customer profiling. It provides an overview of online customer profiling methods and discusses caveats of such methods. This study has implications for international marketers targeting international Internet users.

The Internet provides companies with new opportunities for exploring customers’ needs and characteristics in order to obtain a basis for selective planning, design and control of all interactions with customers. Such a basis can support efficient interaction with the market as well as effective stimulation and satisfaction of customers’ needs. As to customer profiling methods, however, no ‘best’ approach to generating and using profiles exists.

Problems in using customer profiling can arise primarily through ethical and moral concerns and through criticism of non-reactive data collection procedures. Such problems can limit acceptance of profiling in e-commerce and can burden companies’ relationships with their customers and with the public. Consequently, companies need to develop and implement profiling methods that, while producing useful, detailed profiles, also respond to customers’ needs and concerns in connection with data collection and use. A range of different measures is available for enhancing affected groups’ acceptance of profiling methods, measures that differ in terms of aims, function and relevant application problems. Regardless of whether they use such measures, companies should remember their social responsibility and carefully consider the general and specific ethical and moral boundaries for their own actions.

References
1 Peters, T. A. (1999) ‘Computerized monitoring and online privacy’, McFarland, Jefferson.
2 Buxel, H. (2001) ‘Customer profiling im electronic commerce: Methodische Grundlagen, Anwendungsprobleme und Managementimplikationen’ [Customer profiling in electronic commerce: Methodological foundations, application problems and management implications], Shaker Verlag, Aachen.
3 Zaiane, O. R., Xin, M. and Han, J. (1999) ‘Discovering web access patterns and trends by applying OLAP and data mining technology on web logs’, online: www.cs.sfu.ca.com (last revision: 23rd March, 2000).
4 Buxel op. cit.
5 Clarke, R. (1999) ‘Internet privacy concerns confirm the case for intervention’, online: www.anu.edu.au/people/Roger.Clarke/DV/ACM99.html (last revision: 24th July, 2000); also in Communications of the ACM, Vol. 42, No. 2, pp. 60–67.
6 Personalization Consortium (2000) ‘Personalization & Privacy Survey’, online: www.personalization.org/survey.pdf (last revision: 7th April, 2000).
7 Stepanek, M. (2000) ‘Weblining: Companies are using your personal data to limit your choices — and force you to pay more for products’, online: www.businessweek.com/2000/00_14/b3675027.htm (last revision: 4th January, 2001).
8 Novek, E., Sinha, N. and Gandy, O. (1990) ‘The value of your name’, Media, Culture & Society, Vol. 12, pp. 525–543.
9 Weichert, T. (2000) ‘Zur Ökonomisierung des Rechts auf informationelle Selbstbestimmung’ [On the economisation of the right to informational self-determination], in Bäumler, H. (ed.) ‘E-Privacy: Datenschutz im Internet’ [E-Privacy: Data protection on the Internet], Vieweg, Braunschweig, pp. 158–184.
10 Rosencrance, L. (2000) ‘Amazon charging different prices on some DVDs’, online: www.computerworld.com/cwi/story/frame/0,1213,NAV47_STO49569,00.html (last revision: 1st October, 2000).
11 Wang, H., Lee, M. K. O. and Wang, C. (1998) ‘Consumer privacy concerns about Internet marketing’, Communications of the ACM, Vol. 41, No. 3, pp. 63–70.