Klaus-Peter Wiedmann
Professor Dr Wiedmann holds the Chair of General Management and Marketing II at the University of Hannover, Germany.
Holger Buxel
was a research associate for the Department of Marketing II at the University of Hannover, Germany from 1999 to 2001. In
2001, he became a consultant for Droege & Comp. AG, Dusseldorf, Germany.
Gianfranco Walsh*
is Senior Lecturer at the University of Hannover’s Department of Marketing II, Germany. He also consults on a variety of
marketing challenges, especially relating to empirical marketing research and cross-cultural consumer behaviour.
Journal of Database Marketing Vol. 9, 2, 170–184 © Henry Stewart Publications 1350-2328 (2002)
Customer profiling in e-commerce: Methodological aspects and challenges
Figure 1: The most important data collection procedures used in customer profiling
123.456.78.8 - - [09/May/2001:03:04:41 -0500] "GET Buxel.html HTTP/1.0" 200 3290 - Mozilla/3.04 (Win95,I)
123.456.78.8 - - [09/May/2001:03:04:51 -0500] "GET Wiedmann.html HTTP/1.0" 200 5450 Buxel.html Mozilla/3.04 (Win95,I)
123.456.78.8 - - [09/May/2001:03:05:32 -0500] "POST /cgi-bin/p1 HTTP/1.0" 200 5096 Wiedmann.html Mozilla/3.04 (Win95,I)
123.456.78.8 - - [09/May/2001:03:05:41 -0500] "GET Buxel.html HTTP/1.0" 200 3290 - Mozilla (IE4.2,WinNT)
123.456.78.8 - - [09/May/2001:03:05:59 -0500] "GET Wiedmann.html HTTP/1.0" 200 5450 Buxel.html Mozilla (IE4.2,WinNT)
123.456.78.8 - - [09/May/2001:03:06:30 -0500] "GET Frenzel.html HTTP/1.0" 200 1000 Wiedmann.html Mozilla (IE4.2,WinNT)
123.456.78.8 - - [09/May/2001:03:07:11 -0500] "GET Buckler.html HTTP/1.0" 200 2020 F.html Mozilla/3.04 (Win95,I)
123.456.78.8 - - [09/May/2001:03:07:45 -0500] "GET Halstrup.html HTTP/1.0" 200 3030 Frenzel.html Mozilla (IE4.2,WinNT)
123.456.78.8 - - [09/May/2001:03:12:23 -0500] "GET Meissner.html HTTP/1.0" 200 4040 Wiedmann.html Mozilla/3.04 (Win95,I)
123.456.78.2 - - [09/May/2001:05:05:11 -0500] "GET Buxel.html HTTP/1.0" 200 3290 - Mozilla/3.04 (Win95,I)
123.456.78.3 - - [09/May/2001:05:06:03 -0500] "GET Walsh.html HTTP/1.0" 200 4040 Buxel.html Mozilla/3.04 (Win95,I)
123.456.78.5 - - [09/May/2001:05:06:05 -0500] "GET robots.txt" 200 1020 - Mozilla/3.04 (Win95,I)
233.999.79.4 - - [09/May/2001:05:06:07 -0500] "GET Buxel.html HTTP/1.0" 200 3290 - Ultraseek
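Each entry above starts with the fields of the Common Log Format (host, identity, user, timestamp, request line, status code, bytes transferred), followed by the referrer and the user agent. The minimal Python sketch below splits one such line into named fields. It assumes the exact layout of the example (referrer and user agent unquoted, "-" for an empty field); many servers quote these two fields, in which case the pattern would need adjusting. The names `LOG_PATTERN` and `parse_line` are illustrative, not part of any particular log-analysis tool.

```python
import re
from datetime import datetime

# Assumed layout, as in the example log:
# host ident authuser [timestamp] "request" status bytes referrer user-agent
# The referrer and user agent are unquoted here; "-" marks an empty field.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) (?P<referrer>\S+) (?P<agent>.*)$'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The request line is "METHOD target [PROTOCOL]"; the protocol may be missing,
    # as in the robots.txt request of the example.
    parts = fields["request"].split()
    fields["method"] = parts[0] if parts else None
    fields["target"] = parts[1] if len(parts) > 1 else None
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = None if fields["size"] == "-" else int(fields["size"])
    fields["referrer"] = None if fields["referrer"] == "-" else fields["referrer"]
    return fields


entry = parse_line('123.456.78.8 - - [09/May/2001:03:04:51 -0500] '
                   '"GET Wiedmann.html HTTP/1.0" 200 5450 Buxel.html Mozilla/3.04 (Win95,I)')
print(entry["target"], entry["referrer"], entry["agent"])
# prints: Wiedmann.html Buxel.html Mozilla/3.04 (Win95,I)
```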
Information | Explanation
User identification | Analysis of IP addresses and user IDs can provide information about who the users of a particular service are.
Interests | URLs provide information about users’ interest in particular services. Analysis of request sequences can also reveal important relationships between individual services.
Usage periods | Analysis of timestamps can reveal the times of day when users access services and how long they spend with a service once they access it. The duration of users’ visits to a website can serve as an indicator of the interest being shown in the relevant offers. Timestamp analysis can also show how access to the website is distributed over time (when and how often).
Related interests | The referrer identifies the page the user visited immediately before accessing the relevant site. Analysis of such data can support conclusions about significant relationships to other websites and about the interests of the consumers concerned.
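The four kinds of information in the table can be derived from parsed log entries. The Python sketch below works on five entries from the example log, already reduced to (host, user agent, time, page, referrer) tuples. Treating each (IP address, user agent) pair as one visitor is a common heuristic assumed here, not a method stated in the text: it separates the two browsers that share 123.456.78.8 in the example, but it is not reliable identification. View time is approximated as the gap until the same visitor's next request, so the last page of each visit has no measurable view time. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

# (host, user agent, time, page, referrer) for five requests from the example log.
ENTRIES = [
    ("123.456.78.8", "Mozilla/3.04 (Win95,I)", "03:04:41", "Buxel.html", None),
    ("123.456.78.8", "Mozilla/3.04 (Win95,I)", "03:04:51", "Wiedmann.html", "Buxel.html"),
    ("123.456.78.8", "Mozilla (IE4.2,WinNT)", "03:05:41", "Buxel.html", None),
    ("123.456.78.8", "Mozilla (IE4.2,WinNT)", "03:05:59", "Wiedmann.html", "Buxel.html"),
    ("123.456.78.8", "Mozilla (IE4.2,WinNT)", "03:06:30", "Frenzel.html", "Wiedmann.html"),
]


def group_by_visitor(entries):
    """User identification: treat each (IP address, user agent) pair as one visitor."""
    visitors = defaultdict(list)
    for host, agent, clock, page, referrer in entries:
        when = datetime.strptime(clock, "%H:%M:%S")
        visitors[(host, agent)].append((when, page, referrer))
    for requests in visitors.values():
        requests.sort(key=lambda request: request[0])  # chronological order
    return dict(visitors)


def page_counts(requests):
    """Interests: how often each URL was requested."""
    return Counter(page for _, page, _ in requests)


def view_times(requests):
    """Usage periods: seconds until the next request (unknown for the last page)."""
    return [(page, (following[0] - when).total_seconds())
            for (when, page, _), following in zip(requests, requests[1:])]


def referrer_pairs(requests):
    """Related interests: (referring page, requested page) transitions."""
    return [(referrer, page) for _, page, referrer in requests if referrer]


visitors = group_by_visitor(ENTRIES)
ie = visitors[("123.456.78.8", "Mozilla (IE4.2,WinNT)")]
print(len(visitors))        # 2: two browsers behind one IP address
print(view_times(ie))       # [('Buxel.html', 18.0), ('Wiedmann.html', 31.0)]
print(referrer_pairs(ie))   # [('Buxel.html', 'Wiedmann.html'), ('Wiedmann.html', 'Frenzel.html')]
```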
Data collection via CGI-based files
Instead of retrieving static HTML documents, a client can request that the web server run a program known as a CGI script. CGI (Common Gateway Interface) is a protocol that can be used to communicate between web forms and a program. Like log files, CGI-based files keep records of events caused by user actions. They can provide information about users’ access patterns, that is, the services users request and the time they spend with each service, and hence supplement log files.
Data collection via special applications
Other procedures can also be used for non-reactive data collection, such as cookies, software agents that providers can use for data collection, special browsers whose source code has been modified for enhanced data collection, packet-sniffing technologies and web bugs (invisible images that produce usage information).
Reactive data collection
Reactive data collection is concerned with consumer characteristics that cannot be revealed with the behaviour tracking technologies outlined above. While almost any sort of data might be available to reactive data collection, the following are of particular interest with regard to the creation of customer profiles.
Identification data include such data as the user’s name, address, telephone and fax number, e-mail address and, possibly, the URL of the user’s own homepage. With such data, a provider is in a position to send the user specially tailored offers or even to provide special support for real-life (non-virtual) offers.
Descriptive data can be used to determine customers’ basic preferences for goods and services and, thus, to draw relevant, applicable conclusions about customers (Table 3). Apart from data that link user behaviour data with specific persons, there are many types of (user-)descriptive data that are very difficult to obtain through non-reactive data collection. Relevant data of particular interest in connection with profiling include information about the customer’s current purchasing behaviour and purchasing history, as well as sociographic and psychographic data. Information about purchasing behaviour, for example, indicates which products a customer prefers, at which prices and with which methods of payment. Typical sociographic data are, for instance, date
Closed form spaces | Space for manual entries; no latitude on the topic level
Open form spaces | Space for manual entries; latitude on the topic level
Selection menus | Presentation of defined, fixed answer categories
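A CGI script receives submitted form data from the web server: for a GET request in the `QUERY_STRING` environment variable, and for a POST request on standard input, with its length given in `CONTENT_LENGTH`. The Python sketch below, a minimal illustration rather than a production handler, decodes such a submission with `urllib.parse.parse_qs` and checks each field according to the three form element types in the table above. The field names (`age`, `comments`, `payment`), the menu options and the validation rules are all hypothetical.

```python
import os
import sys
from urllib.parse import parse_qs

# Hypothetical form: field name -> (element type from the table, validation rule)
FORM_FIELDS = {
    "age": ("closed", str.isdigit),        # manual entry on a fixed topic
    "comments": ("open", None),            # manual entry, topic left to the user
    "payment": ("menu", {"invoice", "credit card", "direct debit"}),  # fixed categories
}


def read_form_data(environ, stdin):
    """Return the url-encoded form data handed to a CGI script."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length)
    return environ.get("QUERY_STRING", "")


def validate(raw):
    """Decode a submission and check each field according to its element type."""
    values = {name: entries[0] for name, entries in parse_qs(raw).items()}
    record, errors = {}, []
    for name, (kind, rule) in FORM_FIELDS.items():
        value = values.get(name, "").strip()
        if kind == "closed" and value and not rule(value):
            errors.append(f"{name}: unexpected entry {value!r}")
        elif kind == "menu" and value and value not in rule:
            errors.append(f"{name}: {value!r} is not an offered option")
        else:
            record[name] = value  # open fields are accepted as typed
    return record, errors


if __name__ == "__main__":
    record, errors = validate(read_form_data(os.environ, sys.stdin))
    # A CGI script answers with header lines, a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    print("\n".join(errors) if errors else "Thank you.")
```

The validated `record` is the kind of reactively collected data that could then be stored alongside the log-based usage data for the same customer.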
‘z’ graphics for their browser; instead the user requests a website with specific information. The key concepts for interpreting online behaviour are users, server sessions, page views and episodes:
- a user is a person who uses a browser to retrieve page files from one or more servers
- a page view (also referred to as a page impression) is the set of page files needed to generate a web browser display at a given time. A page view consists of all the elements presented to the user in their browser window in response to a single request, such as a mouse click
- over longer periods of time, it becomes very likely that some users will visit a given website more than once. For this reason, all requests that are recorded in a log and can be assigned to a specific user have to be divided into individual sessions if conclusions are to be drawn about how the user’s behaviour is distributed over time. A ‘server session’ (also referred to as a visit) is a period of continuous use of a website by a client, expressed in terms of related page views. Usage processes thus include every technically successful, external access to the website by an Internet browser
- in addition to page views, ‘episodes’ are relevant. Episodes are semantically meaningful sub-areas of server sessions. Examples of episodes include website pages devoted to particular categories of goods, eg sporting goods or food, as well as pages that present company-oriented data.
Processing of non-reactively obtained data can be described as a four-step process (see Figure 2).2 The first step consists of filtering out elements that are irrelevant to the identification of usage behaviour and to later analysis (‘data cleaning’). In a second step, the data records must be assigned to individual clients or users and then divided into server sessions. In a third step, the identified server sessions are analysed further to identify individual page views. Finally, individual episodes are identified. Processing of the information that describes user behaviour must be supported by processing of the structural and content data that describe the company’s website.
Storage in a data warehouse
Processing of non-reactively collected data produces information about a user’s ‘hits’ on the company’s website (and possibly also about hits on other websites) and the amount of time spent with each item (good or service), which can be used to create usage profiles. These data, in turn, can be of two basic types: personal (ie data that can be correlated with individual, identified customers) and non-personal (data that describe the usage behaviour of anonymous customers). Processing of reactively collected data yields information about user characteristics that cannot be obtained directly from usage behaviour. Five different types of marketing-relevant customer profiles can be produced by combining reactively and non-reactively collected data (Table 4).
Once data, whether obtained reactively or non-reactively, have been processed for the creation of customer profiles, they are stored in a suitable structure by one of two basic procedures: a) establishment of a standalone database system that comprises only data from Internet-based transactions; or b) integration of the data
[Figure: Usage data; Reactively obtained data; Non-reactively obtained data]
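The four processing steps for non-reactively obtained data (cleaning, assignment to users and server sessions, page views, episodes) can be sketched in Python as follows. Every specific choice here is an assumption made for illustration: requests are (visitor, seconds, URL) tuples; cleaning only drops robots.txt requests; a new server session starts after 30 minutes of inactivity (a widely used heuristic, not a value from the text); a page view is an HTML page plus the image files requested right after it; and episodes come from a hypothetical page-to-category mapping.

```python
from itertools import groupby

SESSION_TIMEOUT = 30 * 60             # assumed inactivity limit in seconds
EMBEDDED = (".gif", ".jpg", ".png")   # page files shown as part of the preceding page
EPISODES = {                          # hypothetical page-to-episode mapping
    "tennis.html": "sporting goods",
    "running.html": "sporting goods",
    "bakery.html": "food",
    "about.html": "company information",
}


def clean(requests):
    """Step 1 (data cleaning): drop requests irrelevant to usage analysis."""
    return [r for r in requests if not r[2].endswith("robots.txt")]


def sessions(requests):
    """Step 2: assign requests to visitors, then split each visitor's requests
    into server sessions wherever the pause exceeds SESSION_TIMEOUT."""
    result = []
    ordered = sorted(requests, key=lambda r: (r[0], r[1]))
    for visitor, group in groupby(ordered, key=lambda r: r[0]):
        current, last_time = [], None
        for _, time, url in group:
            if last_time is not None and time - last_time > SESSION_TIMEOUT:
                result.append((visitor, current))
                current = []
            current.append((time, url))
            last_time = time
        result.append((visitor, current))
    return result


def page_views(session):
    """Step 3: a page view is an HTML page plus the embedded files requested after it."""
    views = []
    for _, url in session:
        if url.endswith(EMBEDDED) and views:
            views[-1][1].append(url)
        else:
            views.append((url, [url]))
    return views


def episodes(views):
    """Step 4: label each page view with its episode (None if the page is unmapped)."""
    return [(page, EPISODES.get(page)) for page, _ in views]


# (visitor, seconds since start, URL): hypothetical parsed log records
log = [
    ("v1", 0, "tennis.html"), ("v1", 2, "racket.gif"), ("v1", 40, "running.html"),
    ("v1", 5000, "bakery.html"), ("v2", 10, "robots.txt"), ("v2", 12, "about.html"),
]
for visitor, session in sessions(clean(log)):
    print(visitor, episodes(page_views(session)))
# v1 [('tennis.html', 'sporting goods'), ('running.html', 'sporting goods')]
# v1 [('bakery.html', 'food')]
# v2 [('about.html', 'company information')]
```

Visitor v1 is split into two server sessions because of the long pause before the bakery page, and the image request is folded into the tennis page view rather than counted as a page view of its own.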
useful to pare data before they are transferred to a data warehouse or used for analysis. For instance, it can be useful to combine data into aggregate packages. This approach can make the information in all relevant data available for profiling and yet largely eliminate the need for fine-grained storage of all non-reactively obtained data.
As described above, appropriately combining page views into episodes makes it possible for the page views a customer requests to be automatically classified and assigned to thematic areas. If episodes are formed so that they support relevant conclusions about customer characteristics (eg interest shown in certain products or topics), then non-reactively obtained data can be represented in a data warehouse in the form of weightings. In a data warehouse, the non-reactively obtained usage data for a single customer can be represented as a vector, with each of the vector’s dimensions corresponding to an area of interest (eg an episode assigned to a certain product group). In the evaluation of non-reactively obtained data, a weight $w_{ij}$ can then be defined for customer $i$ and dimension $j$ that describes the strength of the interest shown in the episode in question (Table 5).
The weight $w_{ij}$ for customer $i$ and episode $j$ can then be evaluated using information about all relevant page views for a given episode, as a function of three key figures:4
- the view time $d_{ij}$ that customer $i$ spends on the page views of episode $j$
- the relative frequency $h_{ij}$ with which the page views of episode $j$ are retrieved
- the chronological distribution or proximity of accesses (hits). While the relative frequency $h_{ij}$ serves as an indicator of the general interest shown in a given product, the importance of an area of interest can vary over time. Managing a product on the basis of relative frequencies can produce poor results if hits on the
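The text names the three figures that feed into the weight $w_{ij}$ but does not specify here how they are combined, so the Python sketch below uses an assumed combination: customer $i$'s share of view time on episode $j$ (from $d_{ij}$), the relative frequency $h_{ij}$, and a recency score in which each hit decays exponentially with its age (half-life assumed to be 30 days), averaged with equal, hypothetical coefficients. The result is one weight per episode, that is, the per-customer vector described for the data warehouse.

```python
from collections import defaultdict

HALF_LIFE_DAYS = 30.0               # assumed decay of a hit's influence with age
COEFFS = (1 / 3, 1 / 3, 1 / 3)      # hypothetical weights for time, frequency, recency


def interest_vector(page_views, episodes):
    """Return {episode j: w_ij} for one customer i.

    page_views: (episode, view_seconds, age_days) for each of the customer's page views
    episodes:   the fixed list of warehouse dimensions j
    """
    view_time = defaultdict(float)   # d_ij: total view time per episode
    hits = defaultdict(int)          # page-view counts, basis of h_ij
    recency = defaultdict(float)     # chronological proximity: recent hits count more
    for episode, seconds, age_days in page_views:
        view_time[episode] += seconds
        hits[episode] += 1
        recency[episode] += 0.5 ** (age_days / HALF_LIFE_DAYS)
    total_time = sum(view_time.values()) or 1.0
    total_hits = sum(hits.values()) or 1
    total_recency = sum(recency.values()) or 1.0
    a, b, c = COEFFS
    return {
        j: a * view_time[j] / total_time     # share of view time
        + b * hits[j] / total_hits           # relative frequency h_ij
        + c * recency[j] / total_recency     # share of recency-weighted hits
        for j in episodes
    }


views = [
    ("sporting goods", 120, 1), ("sporting goods", 60, 1),
    ("food", 30, 60), ("food", 90, 60),
]
weights = interest_vector(views, ["sporting goods", "food", "company information"])
print({j: round(w, 3) for j, w in weights.items()})
# {'sporting goods': 0.632, 'food': 0.368, 'company information': 0.0}
```

Both episodes receive the same number of hits, so relative frequency alone cannot separate them; the view-time and recency terms give the more recently and more intensively visited episode the higher weight, which reflects the concern in the text about relying on relative frequencies alone.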
would not want their immediate, relevant environments (eg transaction partners, family, workplace) to know about their ‘dark areas’ (eg certain behaviours and preferences); armed with such knowledge, others would intervene in their personal freedom, for instance, by exerting social and moral pressure or by criticising (often unjustifiably) their individual behaviour.
- even when data collection and the relevant customer profiling methods are seen as acceptable, consumers may still be concerned about the many ways in which the data are used. Advertisers usually use customer profiles to predict customers’ preferences and future needs, thereby placing themselves in a position to influence existing needs and purchases efficiently. While criticism of the manipulative nature of advertising is not new, it has taken on a new dimension as the possibilities for approaching individual customers via the Internet have developed. More than ever before, advertisers are able to reinforce customers’ wants and to spark more or less ‘uncontrolled’ purchases, ie purchases that customers might not make under other circumstances (or might make only at different times and under different terms)
- privacy activists often stress that many Internet users are not aware that they leave tracks that can be used to create profiles. Others argue that customer profiling information, in principle, belongs to the affected customers and thus cannot become the property of other parties without the customers’ consent.5 All non-reactive data collection procedures provide behaviour observation instruments with which website operators can collect data from customers, and use such data in customer profiling, without customers’ awareness. A related complaint is that many non-reactive collection procedures, such as ‘web bugs’ or ‘packet-sniffing’ technologies, can theoretically be used to obtain and record data about consumers without the consumers’ consent. Interestingly, almost 60 per cent of all Internet users do not know how cookies can be rejected.6
- while profiling techniques make it possible to enhance marketing efficiency through precise selection of customers for marketing approaches, they also give rise to concern about the dangers of discrimination and of selective advertising and transaction design.7 Profiles, which are used to assess customers and their future behaviour, can be used by companies to avoid certain unattractive types of customers, to block such customers’ access to information about products and services8 or to offer them poorer purchasing terms.9 For example, in online tests conducted in 2000, Amazon set DVD prices in a customer-specific, selective way as a function of the user’s browser type, service provider and frequency of previous visits to the Amazon website. As a result, at a given time the price of a standardised DVD such as the film ‘Planet of the Apes’ differed by $10 or more between customers.10
The criticism aimed at customer profiling methods and at the use of non-reactive data collection procedures does not apply in equal measure to all data collection procedures. Table 6 provides an overview of the features of non-reactive data collection procedures in light of the criticism targeting the various methods.
8 Novek, E., Sinha, N. and Gandy, O. (1990) ‘The value of your name’, Media, Culture & Society, Vol. 12, pp. 525–543.
9 Weichert, T. (2000) ‘Zur Ökonomisierung des Rechts auf informationelle Selbstbestimmung’ [On the economisation of the right to informational self-determination], in Bäumler, H. (ed.) ‘E-Privacy: Datenschutz im Internet’, Vieweg, Braunschweig, pp. 158–184.
10 Rosencrance, L. (2000) ‘Amazon charging different prices on some DVDs’, online: www.computerworld.com/cwi/story/frame/0,1213,NAV47_STO49569,00.html (last revision 1st October, 2000).
11 Wang, H., Lee, M. K. O. and Wang, C. (1998) ‘Consumer privacy concerns about Internet marketing’, Communications of the ACM, Vol. 41, No. 3, pp. 63–70.