
The Marketer's Dilemma:

Focusing on a Target or a Demographic?

The Utility of Data-integration Techniques

MIKE HESS
Nielsen

Data-integration techniques can be useful tools as marketers continue to improve overall efficiency and return on investment. This is true because of the value of the techniques themselves and also because the current advertising market, based on demographic buying, has major opportunities for arbitrage in the range of 10 percent to 25 percent (where in that range depends on the nature of the vertical). The current study reviews different methods of data integration in pursuing such negotiations.

INTRODUCTION
Advertisers, agencies, and content providers all are looking for improvement in the placement of advertisements in content. If an advertiser can reach more of its customers and potential customers by spending less money, or an agency can help an advertiser to do the same, this yields a positive effect on the advertiser's bottom line. Conversely, a content supplier can enhance its value if it can demonstrate that its content is attractive to particular types of people (e.g., those disposed to a particular brand or category, or even a particular psychographic target).
In this quest for improved advertising efficiency and return on investment (ROI), a number of different methods have evolved. Most marketers and their agencies use targeting rather than mass-marketing strategies (Sharp, 2010). Beyond this, many agencies have their own "secret-sauce" formulas whereby they adjust the value of an advertising buy as a function of how much "engagement" can be attributed to that vehicle, whether it be a specific television program or a magazine title. A more recent in-market approach, exemplified by TRA (Harvey, 2012) and Nielsen Catalina Services, has also shown that buying can be improved through the identification of programs that have more brand and category heavy users.
The authors' own work since 2007 with data-integration techniques has shown that fused data sets also can improve targeting efficiency by a range from about 10 percent to 25 percent depending on the category vertical. A number of firms employ data fusion and integration techniques on the provider side (e.g., Nielsen, Telmar, Kantar, and Simmons) and the agency business (Hess and Fadeyeva, 2008).
In this study, the authors share some of the definitions and empirical generalizations that have accumulated in the past five years of working with these techniques.
The practical application of data integration already has begun to appear in the marketplace. A large snack-manufacturing company presented some of its findings at a recent Advertising Research Foundation (ARF) conference (Lion, 2009); a global software supplier took the stage at a Consumer-360 event (Nielsen C-360, 2011); and a media-planning and buying agency has indicated that it is using its custom fusion data set to verify and fine-tune commitments made in the 2012 Upfront and in all of its competitive pitches for new business (personal communication to M. Hess, 2012).
In the next section, the various data-integration techniques are defined, and some of the advantages and disadvantages of each are discussed.

TYPES OF DATA INTEGRATION
There are three broad types of data integration used in media and consumer research for advertising planning.

DOI: 10.2501/JAR-53-2-231-236 June 2013 JOURNAL OF ADVERTISING RESEARCH 231


EMPIRICAL GENERALIZATION
Analysis with integrated data sets and the national people meter panel has shown us that if an advertising buy is made based on a marketing target and the programs that its members view, rather than against a demographic target, there is empirically a range of between 10 percent and 25 percent improvement in the efficiency of that buy. This marketing target can be based either on consumption pattern segmentation (e.g., heavy/light category users) or on psychographic/lifestyle segmentation (e.g., prudent savers versus financial risk takers).

Directly Matched Data
Data sets are matched using a common key (e.g., name and address, or cookies). Very often, this requires the use of personally identifiable information, and appropriate privacy measures must be in place. Some of the key technical aspects that must be evaluated are completeness and accuracy of matching.
For marketing purposes, databases that are integrated via direct-matching of address are often referred to as single-source data, but there is a distinction between true single-source and this form of integrated data as the completeness and accuracy of the match are usually not perfect. However, it can be considered to be the next best thing to single source assuming the datasets being integrated are of good quality and relevance.
An example of this sort of database is the Nielsen Catalina Services integration of Catalina frequent shopper data with television data obtained from Nielsen National People Meter data and Return Path Set Top Box data.

Unit-Level (e.g., respondent-level) Ascription
In many cases, direct matching of data is unfeasible, perhaps because of privacy concerns or because the intersection between the data sets is minimal (this is usually the case with samples, where population sampling fractions are very small); assuming no exclusion criteria for research eligibility, the chance of a respondent being in two samples with sampling fractions of 1/10,000 is 1 in 100 million.
In these cases, statistical ascription techniques can be used to impute data. For example, product-purchase data can be ascribed onto the members of a research panel that measures television audiences, using common variables on the television panel and a product-purchase database to guide the ascription. This enables viewing habits of product users to be estimated.
Data fusion is one example of a unit-level ascription technique that is increasingly being used to create integrated databases. (The topic is discussed in more detail later in this article.)
Some of the advantages of this approach:

• There is no additional burden on the respondent. Because the ascription is statistical, it can be applied to anonymized data. Additional data are obtained without affecting existing response rates or worsening respondent fatigue.

• There are no privacy concerns. Along with the previous point, it makes this a particularly valuable approach to adding additional data fields to media currency measurements, which typically have tight constraints on respondent access and measurement specifications.

• As the ascription is applied at the unit/respondent level, the database created delivers complete analytic flexibility. A particularly relevant and valuable consequence of this for media databases is that advertising reach and frequency analyses can be created.

• The cost of ascription is low in comparison to the cost of additional primary research.

Caveats associated with this approach:

• Ascription techniques contain the possibility of model bias. This needs to be carefully assessed. Model validation is essential.

• In the majority of cases, ascription models have aggregate- rather than respondent-level validity. For example, a model that overlays brand purchasing onto a television measurement panel may not be able to predict the actual brand purchases of an individual household on the panel, but it will be able to reliably predict the viewing of brand purchasers as a group. This means that the approach is relevant to advertising planning but less applicable to test-control ROI analyses where direct assessment of purchase versus exposure is required.

Aggregate-Level Integration
Aggregate-level integration uses segmentation to group and then link types of respondent on data sets. The segmentation typically uses combinations of demographics and geography, though any information common to the data sets can be employed.
An example of a commonly used segmentation is Prizm, which segments the population into 60 geo-demographic groups. An assessment of viewing habits of brand users can be obtained by identifying Prizm codes strongly associated with particular brands (using a consumer panel) and looking at viewing traits associated with these groups (using a television panel with Prizm classification). Alternatively, purchase propensity scores across all segments can be calculated on the consumer panels and used as media weights on television audiences.
Advantages of this approach:

• Segmentations can cover a wide scope: linking data sets through geo-demographic segmentation, for example, allows consumer and media research databases to be connected and subsequently linked with geographical data such as retail areas.

• Understanding a brand through the lens of a suitably constructed segmentation delivers insights beyond basic purchase facts, perhaps guiding advertising creativity as well as media touch-points.

Limitations of this approach:

• Segmentations, by nature, assume homogeneity within segments, and this delivers less precision and less sensitivity than other approaches.

• Because the integration of data sources is not unit/respondent level, there are restrictions on analysis: in particular, campaign reach and frequency.

The Pros and Cons of Each Approach
Direct match, unit-level ascription, and aggregate-level ascription can be considered as a tool for users of research, to be used in the appropriate way (See Table 1). For example, respondent-level ascription of brand user attributes on a television panel may be used to plan advertising for a specific brand target; a direct-match database may then be used to estimate advertising effectiveness of the campaign; product distribution tactics may be informed by the use of geo-demographic segmentation.

TABLE 1
Overview of Integration Approaches

                    Direct Match                Unit-Level Ascription         Aggregate Level
                    (e.g., Address Matching)    (e.g., Data Fusion)           (e.g., Segment Matching)

Applications        Advertising ROI
                    Media Reach and Frequency   Media Reach and Frequency
                    Media Planning              Media Planning                Media Planning
                    Ad Sales                    Ad Sales                      Ad Sales
                                                                              Relating media and sales
                                                                              activity to geographical
                                                                              locations (e.g., stores,
                                                                              catchment areas)

Accuracy/precision  High - near single source   Dependent on model: can be    Dependent on segmentation
                                                near single source            but typically lower than
                                                                              unit-level ascription

Caveats             Privacy                     Aggregate-level validity:     Aggregate-level validity:
                                                not suited to direct ROI      not suited to direct ROI
                                                estimation                    estimation
                    Completeness and            Model bias                    Reach and frequency not
                    accuracy of matching                                      available
                                                                              Assumption of homogeneity
                                                                              within segments

DATA FUSION
The term data fusion is used to describe many different data-integration methods. The most common definition, and the one we shall use in this study, is as follows: "Data fusion is a respondent-level integration of two or more survey databases to create a simulated single source data set."
Essentially two surveys (or panels) are merged at the respondent level to create a single database (e.g., the U.S. Nielsen television/Internet Data Fusion overlays data from the Nielsen Online Audience Measurement Panel onto the National People Meter television Audience Measurement Panel, creating a database of respondents with television viewing measures and online usage measures).

[Figure: The Data Fusion Process (TV/Internet Fusion). A TV panel (common characteristics, TV viewing) and an Internet panel (common characteristics, online use) are combined by data fusion (matching via common characteristics) into integrated data (common characteristics, TV viewing and online use).]
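The process in the figure can be illustrated with a minimal Python sketch. It is not the Nielsen algorithm: the panels, the characteristics, the `Respondent` and `fuse` names, and the matching rule (a simple count of mismatched common characteristics) are all hypothetical and chosen only to show the shape of a respondent-level fusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Respondent:
    """One panel member: common characteristics plus panel-specific measures."""
    rid: str
    common: dict    # characteristics measured on both panels
    measures: dict  # behaviour measured on this panel only


def mismatches(a: dict, b: dict) -> int:
    """Count the common characteristics on which two respondents differ."""
    return sum(1 for key in a if a[key] != b.get(key))


def fuse(recipients, donors):
    """Attach to each recipient the measures of its closest donor."""
    fused = []
    for r in recipients:
        # min() keeps the first donor found when several are equally close.
        best = min(donors, key=lambda d: mismatches(r.common, d.common))
        fused.append({"rid": r.rid, **r.common, **r.measures,
                      **best.measures, "donor": best.rid})
    return fused


# Toy data, invented for illustration only.
tv_panel = [
    Respondent("tv1", {"sex": "M", "age": "18-24", "region": "NE"}, {"tv_hours": 12}),
    Respondent("tv2", {"sex": "F", "age": "35-54", "region": "SW"}, {"tv_hours": 30}),
]
online_panel = [
    Respondent("web1", {"sex": "F", "age": "35-54", "region": "SW"}, {"online_hours": 9}),
    Respondent("web2", {"sex": "M", "age": "18-24", "region": "NE"}, {"online_hours": 25}),
    Respondent("web3", {"sex": "M", "age": "55+", "region": "NE"}, {"online_hours": 4}),
]

fused = fuse(tv_panel, online_panel)
for row in fused:
    print(row)
```

Each fused row carries the common characteristics, the television measure of the recipient, and the online measure of the matched donor, which is the "simulated single source" record described in the definition above. A production fusion would use a richer distance measure and would control how often each donor is used.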



Linking Variables
The creation of this single database matches respondents on common variables to link the data sets. Common variables (also known as "linking variables" or "fusion hooks") typically are demographic, geographic, and media-related. For example, men aged 18 to 24 years, in full-time employment within a certain geographical region who have a particular defined set of media habits (defined across the two panels), may be matched across the two databases.
The importance of linking variables in the data fusion cannot be overstressed. In the case of media-based data fusion, Nielsen data fusions adhere to the generally accepted idea that linking variables must encompass more than standard demographic measures to ensure reliability of results.
The importance of employing measures directly related to the phenomena being fused (in this case, television viewing) was emphasized by Suzanne Rassler (2002) in Statistical Matching:

Within media and consuming data the typical demographic and socioeconomic variables will surely not completely explain media exposure and consuming behavior. Variables already concerning media exposure and consuming behavior have to be asked as well. Thus, the common variables also have to contain variables concerning television and consuming behaviors....

Linking variables are the key to the statistical validity of the fusion, which operates on the assumption of conditional independence; in the case of the television/Internet fusion, this would mean that variations in the way that television viewing and online use interact are random within each group of respondents defined by the interlaced common variables.
Where this condition does not hold, model regression to mean occurs, and there will be some bias in the fused results. This bias can be estimated using fold-over tests or comparison to single-source data (if available) and is an important part of assessing a data fusion's validity and utility.
In addition, a smart fusion practitioner also will test the congruence of the linking variables across the two databases, checking that the two sample structures are matched well enough to enable the fusion to work well and assessing the closeness of matching of the two samples post fusion.

Matching the Samples
In practice, it is rarely possible to find a match for every respondent across every characteristic in the linking variable set. In the absence of a perfect match, the objective, therefore, becomes finding the best match. And although fusion algorithms vary, this requirement typically is achieved using statistical distance measurements (including assessment of the relative importance of the linking variables in predicting behavior) and identifying the respondents with the smallest distance.
At the same time, checks should occur in the fusion algorithm to ensure that the fusion uses all the respondents in both samples as equitably as possible. In some cases, the two samples to be fused may have very different sample sizes, and consideration needs to be given to how to best use the samples: whether all respondents will contribute to the fused database or just the closest matches to create a database with a respondent base equal in size to the smaller of the two samples. This decision often is driven by logistical factors such as the analysis system capabilities rather than being a purely statistical consideration.

Validation
Data fusion has been used in media research for planning purposes for more than 20 years, and a body of knowledge has been built up over that time. Valuable guidance as to the validity levels that may hold given various data-integration approaches also can be found in industry guidelines developed by the Advertising Research Foundation (2003).
Validation studies have demonstrated that data fusion provides valid results with acceptably low levels of model bias assuming the following hold:

• The samples are well defined and structurally similar;
• there is a sufficient set of relevant linking variables; and
• the fusion matches the samples closely across the linking variables.

The authors of the current article believe that it is important to validate every data fusion across these three criteria and to create formal fold-over validation tests and/or single-source comparisons where possible. In addition, offering methodological transparency and welcoming external validation of data fusion processes have contributed to greater acceptance of data fusion by the industry. As such, the method is viewed by many as a useful tool in the researchers' tool box.
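The matching and fold-over ideas described in this section can be sketched in a few lines of Python. This is an illustrative assumption, not the authors' procedure: the importance weights, the `max_uses` cap on donor usage, the toy single-source panel, and all function names are hypothetical. The sketch splits one panel in two, fuses online use from one half onto the other by weighted distance, and compares the fused estimate for heavy television viewers with the value actually observed.

```python
from collections import Counter

# Hypothetical importance weights for the linking variables.
WEIGHTS = {"age_band": 1.0, "region": 0.5, "heavy_tv": 2.0}


def distance(a, b, weights=WEIGHTS):
    """Weighted count of linking variables on which two respondents differ."""
    return sum(w for var, w in weights.items() if a[var] != b[var])


def fuse(recipients, donors, target, max_uses=2):
    """Give each recipient the target value of its nearest available donor.

    max_uses caps how often one donor may be used, a crude way to spread
    donor usage across the sample. The recipient's own target value is
    never read here. Assumes len(donors) * max_uses >= len(recipients).
    """
    uses = Counter()
    fused = []
    for r in recipients:
        available = [d for d in donors if uses[d["id"]] < max_uses]
        best = min(available, key=lambda d: distance(r, d))
        uses[best["id"]] += 1
        fused.append({**r, target + "_fused": best[target]})
    return fused, uses


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Toy single-source panel: both behaviours are known for every respondent.
panel = [
    {"id": 1, "age_band": "18-34", "region": "N", "heavy_tv": 0, "online_hours": 20},
    {"id": 2, "age_band": "18-34", "region": "N", "heavy_tv": 0, "online_hours": 22},
    {"id": 3, "age_band": "18-34", "region": "S", "heavy_tv": 1, "online_hours": 12},
    {"id": 4, "age_band": "18-34", "region": "S", "heavy_tv": 1, "online_hours": 14},
    {"id": 5, "age_band": "55+", "region": "N", "heavy_tv": 1, "online_hours": 4},
    {"id": 6, "age_band": "55+", "region": "N", "heavy_tv": 1, "online_hours": 6},
    {"id": 7, "age_band": "55+", "region": "S", "heavy_tv": 0, "online_hours": 9},
    {"id": 8, "age_band": "55+", "region": "S", "heavy_tv": 0, "online_hours": 11},
]

# Fold-over: odd ids act as recipients, even ids as donors. The recipients'
# real online_hours serve only as the benchmark for the fused values.
recipients = [p for p in panel if p["id"] % 2 == 1]
donors = [p for p in panel if p["id"] % 2 == 0]
fused, uses = fuse(recipients, donors, "online_hours")

heavy = [f for f in fused if f["heavy_tv"] == 1]
actual = mean(f["online_hours"] for f in heavy)
estimated = mean(f["online_hours_fused"] for f in heavy)
bias = estimated - actual
print(f"actual={actual:.1f} fused={estimated:.1f} bias={bias:+.1f}")
```

On this toy panel the fused estimate for heavy viewers is 10.0 online hours against an observed 8.0, so the fold-over test reports a bias of +2.0 hours at the group level. The comparison is deliberately made on a group average, in line with the aggregate-level validity discussed earlier in the article.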



ANALYSIS OF LEARNINGS AND EMPIRICAL GENERALIZATIONS
Although the authors have been working in this space since 2007, it is not easy to obtain specific learning from every data integration due to the proprietary nature of the service. The generalizations below are offered in the spirit of industry advancement while, at the same time, protective of the proprietary aspects of the outcomes.
Analysis with integrated data sets and the national people meter panel has shown us that if an advertising buy is made based on a marketing target and the programs that its members view, rather than on a demographic target, there is empirically a range of 10 percent to 25 percent improvement in the efficiency of that buy.
This marketing target can be based either on consumption pattern segmentation (e.g., heavy/light category users) or on psychographic/lifestyle segmentation (e.g., prudent savers versus financial risk takers). An increase in efficiency is explained as follows:

A campaign planned to deliver X demographic GRPs will deliver Y brand target GRPs. An alternate plan can be developed that delivers X demographic GRPs and Z brand target GRPs where Z > X. Equivalently an alternate plan can be developed to deliver X2 demographic GRPs and Y brand target GRPs where X2 < X (Collins and Doe, 2011).

The general patterns observed are

• technology companies are closer to the high end of the 10-percent to 25-percent range of improvement;
• services, such as financial, are in the middle; and
• Consumer Packaged Goods (CPG) are at the lower end.

The authors attribute this outcome to the fact that demographic buying is itself more aligned with CPG items that have broader penetration, whereas the technology side is less aligned. Larger improvements can, therefore, come from this area.

Expectations
The only empirical exceptions occur when the demographics and marketing target indexes for two programs happen to overlap, or at least not differ significantly. These occasional exceptions, however, are offset by the findings that come from a list of demographically similar programs. In fact, one almost always can find a subset that will have higher category consumption or penetration of a key psychographic target segment. This 10-percent to 25-percent range, in turn, translates into a form of media arbitrage because sellers do not take into account the amount of the category consumption/segment penetration when they price their program Cost per Thousand (CPMs) based on demographics. As noted earlier, established CPG categories tend to fall in the lower part of this range whereas newer spaces such as software and technology lie in the higher end.
Brands in all the categories we have examined to date have fallen into that range, signaling that there is virtually always an efficiency to be gained by being able to direct media toward the marketing target from an initial condition of having begun as a demographic target. Importantly, that marketing target can be based either on psychographic/lifestyle attributes or on brand/category consumption. These targets are sourced directly from the fused databases. Although it is true that if the target is very large (such as all American television viewers), no efficiencies will be gained; the majority of the targets worked with represent less than 20 percent of the viewing population. At that level of targeting, the 10-percent to 25-percent range of improvement holds. As noted previously, the brand target need not be either demographic or purchase based: it could be based on a psychographic segmentation or a set of attitudes. The implication is that planning on a standard demographic target (e.g., women ages 25 to 54 years) is less efficient than planning on a more precisely defined target.

STRATEGIC IMPLICATIONS
Using more precise brand targets than traditional demographics creates opportunities for both buyers and sellers and improves overall media efficiency by delivering less waste: better advertising placement leads to more advertisements being seen by the right people at the right time and less irrelevant advertisements being served up to bemused consumers.
Improving the media environment in this way is clearly good for everyone. Whether the use of brand targets will become an explicit component of an advertising buy or will remain hidden in the planning and negotiation process is unclear. At present, the latter is the case in television, in part because the executional tools for buying are constrained to demographics. Online advertising-serving models, however, are capable of defining more precise targets through cookie-based ascription models.
This empirical generalization also suggests a strategy: to take advantage of the available demographic-versus-marketing target arbitrage, it is important to have the right data that link the consumption segment, or psychographic segment, to program viewing.
These data sets can be based on single-source, direct-matched, or fused data. In each case, the television currency measurement (e.g., the National People Meter service for national television advertising in the United States) is used as the basis for the program-viewing behavior.
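The arbitrage argument can be made concrete with a small calculation. The Python sketch below is hypothetical throughout: the programs, ratings, and spot counts are invented, and the function names are illustrative. It assumes that GRPs are the sum of rating times number of spots, and that the comparison intended in the X, Y, Z passage quoted earlier is between brand-target GRPs (Z against Y) at equal demographic GRPs.

```python
def grps(plan, rating_key):
    """Gross rating points: rating per spot times number of spots, summed."""
    return sum(line[rating_key] * line["spots"] for line in plan)


def target_per_demo(plan):
    """Brand-target GRPs delivered per demographic GRP bought."""
    return grps(plan, "target_rating") / grps(plan, "demo_rating")


def efficiency_gain_pct(base_plan, alt_plan):
    """Percent improvement of alt_plan over base_plan in target delivery."""
    return 100.0 * (target_per_demo(alt_plan) / target_per_demo(base_plan) - 1.0)


# Invented ratings. Programs B and C look identical on the demographic
# but differ in how well they reach the brand target.
demo_plan = [
    {"program": "A", "demo_rating": 2.0, "target_rating": 2.0, "spots": 40},
    {"program": "B", "demo_rating": 2.0, "target_rating": 1.5, "spots": 60},
]
target_plan = [
    {"program": "A", "demo_rating": 2.0, "target_rating": 2.0, "spots": 40},
    {"program": "C", "demo_rating": 2.0, "target_rating": 2.0, "spots": 60},
]

X = grps(demo_plan, "demo_rating")      # demographic GRPs of the original plan
Y = grps(demo_plan, "target_rating")    # brand-target GRPs it delivers
Z = grps(target_plan, "target_rating")  # brand-target GRPs of the alternate plan
gain = efficiency_gain_pct(demo_plan, target_plan)

# Demographic GRPs needed to deliver Y target GRPs at the alternate plan's
# rate, assuming the alternate plan scales proportionally.
X2 = X * Y / Z

print(f"X={X:.0f} Y={Y:.0f} Z={Z:.0f} gain={gain:.1f}% X2={X2:.0f}")
```

With these invented numbers both plans buy 200 demographic GRPs, but swapping program B for program C raises brand-target delivery from 170 to 200 GRPs, a gain of about 17.6 percent. Read the other way, about 170 demographic GRPs of the alternate plan would deliver the original 170 brand-target GRPs.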



Getting these efficiencies in the television buy also is important for cross-platform campaigns. If the reach, for example, against the marketing target is already enhanced via this approach as part of the television buy, the Key Performance Indicator (KPI) of the cross-platform might be based more on frequency and recency than on an effort to attain additional unduplicated reach.

In sum, the authors believe that data-integration techniques are acting as the latest wave of services that are bringing greater overall efficiency and, in turn, ROI to the industry. They follow in the footsteps of predictive new product models in the 1970s and 1980s, and marketing-mix modeling in the 1990s and 2000s.

MIKE HESS is evp in Nielsen's Media Analytics group. He also serves as the Nielsen spokesperson for Social Television and is currently directing a comprehensive analysis of the relationship between social buzz and television ratings. Before joining Nielsen in 2011, Hess was research director for the media agencies of Carat and OMD. Hess's publications include an American Association of Advertising Agencies-sponsored monograph on "Short and Long Term Effects of Advertising and Promotion" (2002), and a review of quantitative methods in advertising research for the Fiftieth Anniversary issue of the Journal of Advertising Research (2011). He currently acts as project co-lead for the quantification of brand equity for the MASB and this year became a trustee of the Marketing Sciences Institute.

PETE DOE is svp/data integration at Nielsen. In that role, he has global responsibility for Nielsen's data-fusion methodologies and is involved with such data-integration methods as STB modeled ratings and online hybrid audiences. Prior to moving to the United States in 2003, Doe was a board director at RSMB television research in the United Kingdom, where he worked on the BARB television audience measurement currency and numerous data-fusion projects.

REFERENCES

ADVERTISING RESEARCH FOUNDATION. ARF Guidelines for Data Integration. Advertising Research Foundation, 2003.

COLLINS, J., and P. DOE. Making Best Use of Brand Target Audiences. Print and Digital Research Forum. San Francisco, CA, 2011.

HARVEY, B., panelist at the Wharton Empirical Generalizations Conference-II, Philadelphia, May 31, 2012.

HESS, M., and I. FADEYEVA. ARF Forum on Data Fusion and Integration. New York: Advertising Research Foundation, 2008.

LION, S. Marketing Laws in Action. AM 4.0. New York, NY: Advertising Research Foundation, 2009.

NIELSEN ANNUAL CUSTOMER C-360 CONFERENCE. Orlando, June 2011.

RASSLER, S. Statistical Matching: A Frequentist Theory, Practical Applications, and Alternative Bayesian Approaches. New York: Springer-Verlag, 2002.

SHARP, B. How Brands Grow. Australia and New Zealand: Oxford University Press, 2010.


Copyright of Journal of Advertising Research is the property of Warc LTD and its content
may not be copied or emailed to multiple sites or posted to a listserv without the copyright
holder's express written permission. However, users may print, download, or email articles for
individual use.