
4-1

A Comparison of Primary & Secondary Data


Table 4.1
                     Primary Data               Secondary Data
Collection purpose   For the problem at hand    For other problems
Collection process   Very involved              Rapid & easy
Collection cost      High                       Relatively low
Collection time      Long                       Short
4-2
Uses of Secondary Data
Identify the problem
Better define the problem
Develop an approach to the problem
Formulate an appropriate research design (for example, by identifying the key
variables)
Answer certain research questions and test some hypotheses
Interpret primary data more insightfully
4-3
A Classification of Secondary Data
Fig. 4.1
Secondary Data
Internal
External
Ready to Use
Requires Further Processing
Published Materials
Computerized Databases
Syndicated Services
4-4
A Classification of Published Secondary Sources
Fig. 4.2 Published Secondary Data
General Business Sources
Government Sources
Guides
Directories
Indexes
Statistical Data
Census Data
Other Government Publications
4-5
A Classification of Computerized Databases
Fig. 4.3 Computerized Databases
Online
Internet
Off-Line
Bibliographic Databases
Numeric Databases
Full-Text Databases
Directory Databases
SpecialPurpose Databases
4-6
Syndicated Services: Consumers
Fig. 4.4 cont. Households / Consumers
Panels: Purchase; Media
Surveys: Psychographic & Lifestyles; Advertising Evaluation; General
Electronic scanner services: Volume Tracking Data; Scanner Diary Panels;
Scanner Diary Panels with Cable TV
4-7
Syndicated Services: Institutions
Fig. 4.4 cont. Institutions
Retailers
Wholesalers
Industrial firms
Audits
Direct Inquiries
Clipping Services
Corporate Reports
4-8
A Classification of Marketing Research Data
Fig. 5.1
Marketing Research Data
Secondary Data
Primary Data
Qualitative Data
Quantitative Data
Descriptive: Survey Data; Observational and Other Data
Causal: Experimental Data
4-9
Qualitative vs. Quantitative Research
Table 5.1
Objective
  Qualitative Research: To gain a qualitative understanding of the underlying
  reasons and motivations
  Quantitative Research: To quantify the data and generalize the results from
  the sample to the population of interest
Sample
  Qualitative Research: Small number of nonrepresentative cases
  Quantitative Research: Large number of representative cases
Data Collection
  Qualitative Research: Unstructured
  Quantitative Research: Structured
Data Analysis
  Qualitative Research: Non-statistical
  Quantitative Research: Statistical
Outcome
  Qualitative Research: Develop an initial understanding
  Quantitative Research: Recommend a final course of action
4-10
A Classification of Qualitative Research Procedures
Fig. 5.2 Qualitative Research Procedures
Direct (Non disguised)
Indirect (Disguised) Projective Techniques
Focus Groups
Depth Interviews
Association Techniques
Completion Techniques
Construction Techniques
Expressive Techniques
4-11
Definition of Projective Techniques
An unstructured, indirect form of questioning that encourages respondents to
project their underlying motivations, beliefs, attitudes or feelings regarding
the issues of concern. In projective techniques, respondents are asked to
interpret the behavior of others. In interpreting the behavior of others,
respondents indirectly project their own motivations, beliefs, attitudes, or
feelings into the situation.
4-12
Word Association
In word association, respondents are presented with a list of words, one at a
time, and asked to respond to each with the first word that comes to mind. The
words of interest, called test words, are interspersed throughout the list,
which also contains some neutral, or filler, words to disguise the purpose of
the study. Responses are analyzed by calculating:
(1) the frequency with which any word is given as a response;
(2) the amount of time that elapses before a response is given; and
(3) the number of respondents who do not respond at all to a test word within
a reasonable period of time.
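The three measures listed above can be sketched in a few lines of Python. This is only an illustration: the response data, the `summarize` helper, and the use of `None` to mark a non-response are assumptions made for the example, not part of the original slides.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses to one test word: (response word, latency in seconds).
# None marks a respondent who gave no answer within the allowed time.
responses = [
    ("clean", 1.2),
    ("fresh", 2.0),
    ("clean", 0.9),
    (None, 5.0),
    ("soap", 1.5),
]

def summarize(responses):
    """Return (response frequencies, mean latency of answers, non-response count)."""
    answered = [(word, t) for word, t in responses if word is not None]
    frequencies = Counter(word for word, _ in answered)          # measure (1)
    mean_latency = mean(t for _, t in answered) if answered else None  # measure (2)
    non_responses = len(responses) - len(answered)               # measure (3)
    return frequencies, mean_latency, non_responses

frequencies, mean_latency, non_responses = summarize(responses)
print(frequencies.most_common(), mean_latency, non_responses)
```

In practice the same summary would be computed separately for each test word, leaving the filler words out of the analysis.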
4-13
Completion Techniques
In sentence completion, respondents are given incomplete sentences and asked to
complete them. Generally, they are asked to use the first word or phrase that
comes to mind.
A person who shops at Sears is ______________________
A person who receives a gift certificate good for Saks Fifth Avenue would be
______________________
J. C. Penney is most liked by ______________________
When I think of shopping in a department store, I ______________________
A variation of sentence completion is paragraph completion, in which the
respondent completes a paragraph beginning with the stimulus phrase.
4-14
Completion Techniques
In story completion, respondents are given part of a story, enough to direct
attention to a particular topic but not to hint at the ending. They are
required to give the conclusion in their own words.
4-15
Construction Techniques
With a picture response, the respondents are asked to describe a series of
pictures of ordinary as well as unusual events. The respondent's interpretation
of the pictures gives indications of that individual's personality.
In cartoon tests, cartoon characters are shown in a specific situation related
to the problem. The respondents are asked to indicate what one cartoon
character might say in response to the comments of another character. Cartoon
tests are simpler to administer and analyze than picture response techniques.
4-16
A Cartoon Test
Figure 5.4
Sears
Let’s see if we can pick up some house wares at Sears
4-17
Expressive Techniques
In expressive techniques, respondents are presented with a verbal or visual
situation and asked to relate the feelings and attitudes of other people to the
situation.
Role playing: Respondents are asked to play the role or assume the behavior of
someone else.
Third-person technique: The respondent is presented with a verbal or visual
situation and is asked to relate the beliefs and attitudes of a third person
rather than directly expressing personal beliefs and attitudes. This third
person may be a friend, neighbor, colleague, or a “typical” person.
4-18
Advantages of Projective Techniques
They may elicit responses that subjects would be unwilling or unable to give if
they knew the purpose of the study.
Helpful when the issues to be addressed are personal, sensitive, or subject to
strong social norms.
Helpful when underlying motivations, beliefs, and attitudes are operating at a
subconscious level.
4-19
A Classification of Survey Methods
Fig. 6.1 Survey Methods
Telephone
Personal
Mail
Electronic
In-Home
Mall Intercept
Computer-Assisted Personal Interviewing
Mail Interview
E-mail
Internet
Traditional Telephone
Computer-Assisted Telephone Interviewing
Mail Panel
Observation Methods
4-20
Structured versus Unstructured Observation
For structured observation, the researcher specifies in detail what is to be
observed and how the measurements are to be recorded, e.g., an auditor
performing inventory analysis in a store.
In unstructured observation, the observer monitors all aspects of the
phenomenon that seem relevant to the problem at hand, e.g., observing children
playing with new toys.
Observation Methods
4-21
Disguised versus Undisguised Observation
In disguised observation, the respondents are unaware that they are being
observed. Disguise may be accomplished by using one-way mirrors, hidden
cameras, or inconspicuous mechanical devices. Observers may be disguised as
shoppers or sales clerks.
In undisguised observation, the respondents are aware that they are under
observation.
Observation Methods
4-22
Natural versus Contrived Observation
Natural observation involves observing behavior as it takes place in the
environment. For example, one could observe the behavior of respondents eating
fast food in Burger King.
In contrived observation, respondents' behavior is observed in an artificial
environment, such as a test kitchen.
4-23
A Classification of Observation Methods
Fig. 6.3
Classifying Observation Methods
Observation Methods
Personal Observation
Mechanical Observation
Audit
Content Analysis
Trace Analysis
4-24
Concept of Causality
A statement such as "X causes Y" has different meanings to an ordinary person
and to a scientist.
____________________________________________________
Ordinary Meaning: X is the only cause of Y.
Scientific Meaning: X is only one of a number of possible causes of Y.
____________________________________________________
Ordinary Meaning: X must always lead to Y (X is a deterministic cause of Y).
Scientific Meaning: The occurrence of X makes the occurrence of Y more probable
(X is a probabilistic cause of Y).
____________________________________________________
Ordinary Meaning: It is possible to prove that X is a cause of Y.
Scientific Meaning: We can never prove that X is a cause of Y. At best, we can
infer that X is a cause of Y.
4-25
Definitions and Concepts
Independent variables are variables or alternatives that are manipulated and
whose effects are measured and compared, e.g., price levels.
Test units are individuals, organizations, or other entities whose response to
the independent variables or treatments is being examined, e.g., consumers or
stores.
Dependent variables are the variables which measure the effect of the
independent variables on the test units, e.g., sales, profits, and market
shares.
Extraneous variables are all variables other than the independent variables
that affect the response of the test units, e.g., store size, store location,
and competitive effort.
4-26
Experimental Design
An experimental design is a set of procedures specifying:
the test units and how these units are to be divided into homogeneous
subsamples,
what independent variables or treatments are to be manipulated,
what dependent variables are to be measured, and
how the extraneous variables are to be controlled.
4-27
Validity in Experimentation
Internal validity refers to whether the manipulation of the independent
variables or treatments actually caused the observed effects on the dependent
variables. Control of extraneous variables is a necessary condition for
establishing internal validity.
External validity refers to whether the cause-and-effect relationships found in
the experiment can be generalized. To what populations, settings, times,
independent variables and dependent variables can the results be projected?
4-28
Controlling Extraneous Variables
Randomization refers to the random assignment of test units to experimental
groups by using random numbers. Treatment conditions are also randomly assigned
to experimental groups.
Matching involves comparing test units on a set of key background variables
before assigning them to the treatment conditions.
Statistical control involves measuring the extraneous variables and adjusting
for their effects through statistical analysis.
Design control involves the use of experiments designed to control specific
extraneous variables.
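Randomization, the first of the four controls above, can be sketched as follows. The store names, the price treatments, and the `randomize` helper are hypothetical and serve only to illustrate random assignment of test units and of treatment conditions.

```python
import random

def randomize(test_units, treatments, seed=None):
    """Randomly split test units into groups, then randomly assign a treatment to each group."""
    rng = random.Random(seed)
    units = list(test_units)
    rng.shuffle(units)                       # random order of test units
    k = len(treatments)
    groups = [units[i::k] for i in range(k)]  # deal the shuffled units into k groups
    order = list(treatments)
    rng.shuffle(order)                       # treatments are also randomly assigned
    return dict(zip(order, groups))

# Hypothetical example: 12 stores assigned to three price levels.
stores = [f"store_{i}" for i in range(1, 13)]
assignment = randomize(stores, ["low price", "medium price", "high price"], seed=42)
for treatment, group in assignment.items():
    print(treatment, group)
```

Fixing `seed` makes the assignment reproducible for auditing; leaving it as `None` gives a fresh random assignment on each run.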
4-29
A Classification of Experimental Designs
Figure 7.1 Experimental Designs
Pre-experimental: One-Shot Case Study; One Group Pretest-Posttest; Static Group
True Experimental: Pretest-Posttest Control Group; Posttest-Only Control Group;
Solomon Four-Group
Quasi Experimental: Time Series; Multiple Time Series
Statistical: Randomized Blocks; Latin Square; Factorial Design
4-30
Factorial Design
A factorial design is used to measure the effects of two or more independent
variables at various levels. A factorial design may also be conceptualized as a
table. In a two-factor design, each level of one variable represents a row and
each level of another variable represents a column.
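The table view of a two-factor design can be illustrated with a short sketch. The two factors and their levels (price and advertising) are assumptions chosen for the example; the slides do not name specific factors.

```python
from itertools import product

# Hypothetical factors: each level of price is a row, each level of advertising a column.
price_levels = ["low", "medium", "high"]
ad_levels = ["no advertising", "advertising"]

# Every row/column combination is one treatment cell of the factorial design.
cells = list(product(price_levels, ad_levels))

for price in price_levels:
    row = [f"({price}, {ad})" for ad in ad_levels]
    print(" | ".join(row))

print(len(cells), "treatment cells")
```

A 3 x 2 design therefore has six cells, and each test unit is assigned to exactly one of them.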
4-31
Selecting a Test-Marketing Strategy
[Flow chart. Inputs: New Product Development, Research on Existing Products,
Research on other Elements. Stages: Simulated Test Marketing, Controlled Test
Marketing, Standard Test Marketing, National Introduction. Arrows between
stages are labeled "Very +ve" and "Other Factors"; "-ve" outcomes lead to Stop
and Reevaluate. Influencing factors: Competition, Socio-Cultural Environment,
Need for Secrecy, Overall Marketing Strategy.]
4-32
Criteria for the Selection of Test Markets
Test Markets should have the following qualities:
1) Be large enough to produce meaningful projections. They should contain at
least 2% of the potential actual population.
2) Be representative demographically.
3) Be representative with respect to product consumption behavior.
4) Be representative with respect to media usage.
5) Be representative with respect to competition.
6) Be relatively isolated in terms of media and physical distribution.
7) Have normal historical development in the product class.
8) Have marketing research and auditing services available.
9) Not be over-tested.
4-33
Measurement and Scaling
Measurement means assigning numbers or other symbols to characteristics of
objects according to certain prespecified rules.
There must be a one-to-one correspondence between the numbers and the
characteristics being measured.
The rules for assigning numbers should be standardized and applied uniformly.
Rules must not change over objects or time.
4-34
Measurement and Scaling
Scaling involves creating a continuum upon which measured objects are located.
Consider an attitude scale from 1 to 100. Each respondent is assigned a number
from 1 to 100, with 1 = Extremely Unfavorable, and 100 = Extremely Favorable.
Measurement is the actual assignment of a number from 1 to 100 to each
respondent. Scaling is the process of placing the respondents on a continuum
with respect to their attitude toward department stores.
4-35
Primary Scales of Measurement
Figure 8.1
Scale      Illustration                            Runners
Nominal    Numbers Assigned to Runners             7             8              3
Ordinal    Rank Order of Winners                   Third place   Second place   First place
Interval   Performance Rating on a 0 to 10 Scale   8.2           9.1            9.6
Ratio      Time to Finish, in Seconds              15.2          14.1           13.4
4-36
A Classification of Scaling Techniques
Figure 8.2 Scaling Techniques
Comparative Scales
Noncomparative Scales
Paired Comparison
Rank Order
Constant Sum
Q-Sort and Other Procedures
Continuous Rating Scales
Itemized Rating Scales
Likert
Semantic Differential
Stapel
4-37
A Comparison of Scaling Techniques
Comparative scales involve the direct comparison of stimulus objects.
Comparative scale data must be interpreted in relative terms and have only
ordinal or rank order properties.
In noncomparative scales, each object is scaled independently of the others in
the stimulus set. The resulting data are generally assumed to be interval or
ratio scaled.
Preference for Toothpaste Brands Using Rank Order Scaling
Figure 8.4 cont.
4-38
Form
Brand              Rank Order
1. Crest           _________
2. Colgate         _________
3. Aim             _________
4. Gleem           _________
5. Macleans        _________
6. Ultra Brite     _________
7. Close Up        _________
8. Pepsodent       _________
9. Plus White      _________
10. Stripe         _________
Importance of Bathing Soap Attributes Using a Constant Sum Scale
Figure 8.5 cont.
4-39
Form
Average Responses of Three Segments
Attribute            Segment I   Segment II   Segment III
1. Mildness              8           2            4
2. Lather                2           4           17
3. Shrinkage             3           9            7
4. Price                53          17            9
5. Fragrance             9           0           19
6. Packaging             7           5            9
7. Moisturizing          5           3           20
8. Cleaning Power       13          60           15
Sum                    100         100          100
4-40
Noncomparative Scaling Techniques
Respondents evaluate only one object at a time, and for this reason
noncomparative scales are often referred to as monadic scales.
Noncomparative techniques consist of continuous and itemized rating scales.
4-41
Likert Scale
The Likert scale requires the respondents to indicate a degree of agreement or
disagreement with each of a series of statements about the stimulus objects.

                                            Strongly             Neither agree            Strongly
                                            disagree   Disagree  nor disagree    Agree    agree
1. Sears sells high quality merchandise.       1          2X          3            4         5
2. Sears has poor in-store service.            1          2X          3            4         5
3. I like to shop at Sears.                    1          2           3X           4         5

The analysis can be conducted on an item-by-item basis (profile analysis), or a
total (summated) score can be calculated. When arriving at a total score, the
categories assigned to the negative statements by the respondents should be
scored by reversing the scale.
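The summated score with reverse scoring can be sketched as below, using the three responses marked with an X on the slide (2, 2, and 3). The `summated_score` helper and the item keys are hypothetical names introduced for the illustration; treating item 2 as the negative statement follows from its wording.

```python
def summated_score(responses, negative_items, scale_min=1, scale_max=5):
    """Sum Likert ratings, reversing the scale for negatively worded statements."""
    total = 0
    for item, rating in responses.items():
        if item in negative_items:
            # On a 1-5 scale, reversing maps 1->5, 2->4, 3->3, 4->2, 5->1.
            rating = scale_max + scale_min - rating
        total += rating
    return total

# Responses marked on the slide: items 1 and 2 rated 2, item 3 rated 3.
responses = {"high_quality": 2, "poor_service": 2, "like_to_shop": 3}
total = summated_score(responses, negative_items={"poor_service"})
print(total)  # 2 + (6 - 2) + 3 = 9
```

Without the reversal, disagreement with a negative statement would wrongly lower the total, even though it expresses a favorable attitude.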
4-42
Semantic Differential Scale
The semantic differential is a seven-point rating scale with end points
associated with bipolar labels that have semantic meaning.
SEARS IS:
Powerful   --:--:--:--:-X-:--:--: Weak
Unreliable --:--:--:--:--:-X-:--: Reliable
Modern     --:--:--:--:--:--:-X-: Old-fashioned
The negative adjective or phrase sometimes appears at the left side of the
scale and sometimes at the right. This controls the tendency of some
respondents, particularly those with very positive or very negative attitudes,
to mark the right- or left-hand sides without reading the labels.
Individual items on a semantic differential scale may be scored on either a -3
to +3 or a 1 to 7 scale.
A Semantic Differential Scale for Measuring Self-Concepts, Person Concepts, and
Product Concepts
1) Rugged         :---:---:---:---:---:---:---: Delicate
2) Excitable      :---:---:---:---:---:---:---: Calm
3) Uncomfortable  :---:---:---:---:---:---:---: Comfortable
4) Dominating     :---:---:---:---:---:---:---: Submissive
5) Thrifty        :---:---:---:---:---:---:---: Indulgent
6) Pleasant       :---:---:---:---:---:---:---: Unpleasant
7) Contemporary   :---:---:---:---:---:---:---: Obsolete
8) Organized      :---:---:---:---:---:---:---: Unorganized
9) Rational       :---:---:---:---:---:---:---: Emotional
10) Youthful      :---:---:---:---:---:---:---: Mature
11) Formal        :---:---:---:---:---:---:---: Informal
12) Orthodox      :---:---:---:---:---:---:---: Liberal
13) Complex       :---:---:---:---:---:---:---: Simple
14) Colorless     :---:---:---:---:---:---:---: Colorful
15) Modest        :---:---:---:---:---:---:---: Vain
4-43
4-44
Stapel Scale
The Stapel scale is a unipolar rating scale with ten categories numbered from
-5 to +5, without a neutral point (zero). This scale is usually presented
vertically.
SEARS
     +5               +5
     +4               +4
     +3               +3
     +2               +2X
     +1               +1
HIGH QUALITY     POOR SERVICE
     -1               -1
     -2               -2
     -3               -3
     -4X              -4
     -5               -5
The data obtained by using a Stapel scale can be analyzed in the same way as
semantic differential data.
4-45
Some Unique Rating Scale Configurations
Figure 9.3
Thermometer Scale
Instructions: Please indicate how much you like McDonald’s hamburgers by
coloring in the thermometer. Start at the bottom and color up to the
temperature level that best indicates how strong your preference is.
Form:
Like very much      100
                     75
                     50
                     25
Dislike very much     0
Smiling Face Scale
Instructions: Please point to the face that shows how much you like the Barbie
Doll. If you do not like the Barbie Doll at all, you would point to Face 1. If
you liked it very much, you would point to Face 5.
Form:
1   2   3   4   5
4-46
Validity
Construct validity addresses the question of what construct or characteristic
the scale is, in fact, measuring. Construct validity includes convergent,
discriminant, and nomological validity.
Convergent validity is the extent to which the scale correlates positively with
other measures of the same construct.
Discriminant validity is the extent to which a measure does not correlate with
other constructs from which it is supposed to differ.
Nomological validity is the extent to which the scale correlates in
theoretically predicted ways with measures of different but related constructs.
4-47
Questionnaire Definition
A questionnaire is a formalized set of questions for obtaining information from
respondents.
4-48
Questionnaire Design Process
Fig. 10.1
Specify the Information Needed
Specify the Type of Interviewing Method
Determine the Content of Individual Questions
Design the Question to Overcome the Respondent’s Inability and Unwillingness to
Answer
Decide the Question Structure
Determine the Question Wording
Arrange the Questions in Proper Order
Identify the Form and Layout
Reproduce the Questionnaire
Eliminate Bugs by Pre-testing
Choosing Question Structure
Unstructured Questions
4-49
Unstructured questions are open-ended questions that respondents answer in
their own words.
Do you intend to buy a new car within the next six months?
__________________________________
Choosing Question Structure
Structured Questions
4-50
Structured questions specify the set of response alternatives and the response
format. A structured question may be multiple-choice, dichotomous, or a scale.
Choosing Question Structure
Multiple-Choice Questions
4-51
In multiple-choice questions, the researcher provides a choice of answers and
respondents are asked to select one or more of the alternatives given.
Do you intend to buy a new car within the next six months?
____ Definitely will not buy
____ Probably will not buy
____ Undecided
____ Probably will buy
____ Definitely will buy
____ Other (please specify)
Choosing Question Structure
Dichotomous Questions
4-52
A dichotomous question has only two response alternatives: yes or no, agree or
disagree, and so on. Often, the two alternatives of interest are supplemented
by a neutral alternative, such as “no opinion,” “don't know,” “both,” or “none.”
Do you intend to buy a new car within the next six months?
_____ Yes
_____ No
_____ Don't know
Choosing Question Wording
Use Ordinary Words
4-53
“Do you think the distribution of soft drinks is adequate?” (Incorrect)
“Do you think soft drinks are readily available when you want to buy them?”
(Correct)
Choosing Question Wording
Use Unambiguous Words
In a typical month, how often do you shop in department stores?
_____ Never
_____ Occasionally
_____ Sometimes
_____ Often
_____ Regularly
(Incorrect)
In a typical month, how often do you shop in department stores?
_____ Less than once
_____ 1 or 2 times
_____ 3 or 4 times
_____ More than 4 times
(Correct)
4-54
4-55
Flow Chart for Questionnaire Design
Fig. 10.2
Introduction
Ownership of Store, Bank, and Other Charge Cards
Purchased Products in a Specific Department Store during the Last Two Months
(Yes / No)
Ever Purchased in a Department Store? (Yes / No)
How was Payment made? (Credit / Cash / Other)
Credit: Store Charge Card / Bank Charge Card / Other Charge Card
Intentions to Use Store, Bank, and other Charge Cards
4-56
Pretesting
Pretesting refers to the testing of the questionnaire on a small sample of
respondents to identify and eliminate potential problems.
A questionnaire should not be used in the field survey without adequate
pretesting.
All aspects of the questionnaire should be tested, including question content,
wording, sequence, form and layout, question difficulty, and instructions.
The respondents for the pretest and for the actual survey should be drawn from
the same population.
Pretests are best done by personal interviews, even if the actual survey is to
be conducted by mail, telephone, or electronic means, because interviewers can
observe respondents' reactions and attitudes.
4-57
Observational Forms
Department Store Project
Who: Purchasers, browsers, males, females, parents with children, or children
alone.
What: Products/brands considered, products/brands purchased, size, price of
package inspected, or influence of children or other family members.
When: Day, hour, date of observation.
Where: Inside the store, checkout counter, or type of department within the
store.
Why: Influence of price, brand name, package size, promotion, or family members
on the purchase.
Way: Personal observer disguised as sales clerk, undisguised personal observer,
hidden camera, or obtrusive mechanical device.
4-58
Questionnaire Design Checklist
Table 10.1
Step 1. Specify The Information Needed
Step 2. Type of Interviewing Method
Step 3. Individual Question Content
Step 4. Overcome Inability and Unwillingness to Answer
Step 5. Choose Question Structure
Step 6. Choose Question Wording
Step 7. Determine the Order of Questions
Step 8. Form and Layout
Step 9. Reproduce the Questionnaire
Step 10. Pretest
4-59
Sample vs. Census
Table 11.1
                                      Conditions Favoring the Use of
Type of Study                         Sample          Census
1. Budget                             Small           Large
2. Time available                     Short           Long
3. Population size                    Large           Small
4. Variance in the characteristic     Small           Large
5. Cost of sampling errors            Low             High
6. Cost of nonsampling errors         High            Low
7. Nature of measurement              Destructive     Nondestructive
8. Attention to individual cases      Yes             No
4-60
The Sampling Design Process
Fig. 11.1
Define the Population
Determine the Sampling Frame
Select Sampling Technique(s)
Determine the Sample Size
Execute the Sampling Process
4-61
Define the Target Population
The target population is the collection of elements or objects that possess the
information sought by the researcher and about which inferences are to be made.
The target population should be defined in terms of elements, sampling units,
extent, and time.
An element is the object about which or from which the information is desired,
e.g., the respondent.
A sampling unit is an element, or a unit containing the element, that is
available for selection at some stage of the sampling process.
Extent refers to the geographical boundaries.
Time is the time period under consideration.
Sample Sizes Used in Marketing Research Studies
Table 11.2
Type of Study                                            Minimum Size   Typical Range
Problem identification research (e.g. market potential)  500            1,000-2,500
Problem-solving research (e.g. pricing)                  200            300-500
Product tests                                            200            300-500
Test marketing studies                                   200            300-500
TV, radio, or print advertising
(per commercial or ad tested)                            150            200-300
Test-market audits                                       10 stores      10-20 stores
Focus groups                                             2 groups       4-12 groups
4-62
4-63
Classification of Sampling Techniques
Fig. 11.2
Sampling Techniques
Nonprobability Sampling Techniques
Convenience Sampling
Judgmental Sampling
Quota Sampling
Snowball Sampling
Probability Sampling Techniques
Simple Random Sampling
Systematic Sampling
Stratified Sampling
Cluster Sampling
Other Sampling Techniques
4-64
Data Preparation Process
Fig. 14.1
Prepare Preliminary Plan of Data Analysis
Check Questionnaire
Edit
Code
Transcribe
Clean Data
Statistically Adjust the Data
Select Data Analysis Strategy
4-65
Selecting a Data Analysis Strategy
Fig. 14.5
Earlier Steps (1, 2, & 3) of the Marketing Research Process
Known Characteristics of the Data
Properties of Statistical Techniques
Background and Philosophy of the Researcher
Data Analysis Strategy
4-66
A Classification of Univariate Techniques
Fig. 14.6 Univariate Techniques
Metric Data
  One Sample: t test; Z test
  Two or More Samples
    Independent: Two-Group t test; Z test; One-Way ANOVA
    Related: Paired t test
Non-numeric Data
  One Sample: Frequency; Chi-Square; K-S; Runs; Binomial
  Two or More Samples
    Independent: Chi-Square; Mann-Whitney; Median; K-S; K-W ANOVA
    Related: Sign; Wilcoxon; McNemar; Chi-Square
4-67
A Classification of Multivariate Techniques
Fig. 14.7
Multivariate Techniques
Dependence Technique
  One Dependent Variable: Cross-Tabulation; Analysis of Variance and
  Covariance; Multiple Regression; Conjoint Analysis
  More Than One Dependent Variable: Multivariate Analysis of Variance and
  Covariance; Canonical Correlation; Multiple Discriminant Analysis
Interdependence Technique
  Variable Interdependence: Factor Analysis
  Interobject Similarity: Cluster Analysis; Multidimensional Scaling
4-68
Frequency Distribution
In a frequency distribution, one variable is considered at a time.
A frequency distribution for a variable produces a table of frequency counts,
percentages, and cumulative percentages for all the values associated with that
variable.
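A frequency table of this kind can be produced with a few lines of Python. The ratings below are made-up data and `frequency_distribution` is a hypothetical helper name used only for this sketch.

```python
from collections import Counter

def frequency_distribution(values):
    """Return rows of (value, count, percentage, cumulative percentage)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percentage = 100.0 * counts[value] / n
        cumulative += percentage
        rows.append((value, counts[value], percentage, cumulative))
    return rows

# Hypothetical ratings on a 1-5 scale from ten respondents.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
table = frequency_distribution(ratings)
for value, count, percentage, cumulative in table:
    print(f"{value}  {count:3d}  {percentage:6.1f}%  {cumulative:6.1f}%")
```

The cumulative percentage in the last row should always reach 100%, which is a quick check that no value was dropped.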
Statistics Associated with Frequency Distribution
Measures of Location
4-69
The mean, or average value, is the most commonly used measure of central
tendency. The mean, $\bar{X}$, is given by

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

where
$X_i$ = observed values of the variable $X$
$n$ = number of observations (sample size)
The mode is the value that occurs most frequently. It represents the highest
peak of the distribution. The mode is a good measure of location when the
variable is inherently categorical or has otherwise been grouped into
categories.

Statistics Associated with Frequency Distribution
Measures of Location
4-70
The median of a sample is the middle value when the data are arranged in
ascending or descending order. If the number of data points is even, the median
is usually estimated as the midpoint between the two middle values, by adding
the two middle values and dividing their sum by 2. The median is the 50th
percentile.
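The three measures of location (mean, mode, median) can be checked with Python's standard `statistics` module. The sample values are invented for the illustration; the even sample size was chosen so that the median is the midpoint of the two middle values.

```python
import statistics

# Hypothetical sample of eight observations.
sample = [2, 3, 3, 4, 5, 6, 7, 3]

mean_value = statistics.mean(sample)      # sum of values divided by n: 33 / 8
mode_value = statistics.mode(sample)      # most frequent value
median_value = statistics.median(sample)  # sorted: 2 3 3 3 4 5 6 7 -> (3 + 4) / 2

print(mean_value, mode_value, median_value)
```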

Statistics Associated with Frequency Distribution
Measures of Variability
4-71
The range measures the spread of the data. It is simply the difference between
the largest and smallest values in the sample.

$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$

The interquartile range is the difference between the 75th and 25th percentile.
For a set of data points arranged in order of magnitude, the pth percentile is
the value that has p% of the data points below it and (100 - p)% above it.

Statistics Associated with Frequency Distribution
Measures of Variability
4-72
The variance is the mean squared deviation from the mean. The variance can
never be negative.
The standard deviation is the square root of the variance.

$s_x = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

The coefficient of variation is the ratio of the standard deviation to the mean
expressed as a percentage, and is a unitless measure of relative variability.

$CV = s_x / \bar{X}$
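The measures of variability can be computed as in the sketch below. The data are hypothetical; `statistics.variance` and `statistics.stdev` use the n - 1 denominator, matching the formula for the standard deviation given above.

```python
import statistics

# Hypothetical sample; its mean is exactly 5.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(sample) - min(sample)   # largest minus smallest value
variance = statistics.variance(sample)   # sum of squared deviations / (n - 1) = 32 / 7
std_dev = statistics.stdev(sample)       # square root of the variance
cv_percent = 100.0 * std_dev / statistics.mean(sample)  # CV expressed as a percentage

print(data_range, round(variance, 3), round(std_dev, 3), round(cv_percent, 1))
```

Because the coefficient of variation divides by the mean, it is only meaningful for ratio-scaled variables with a nonzero mean.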

Statistics Associated with Frequency Distribution
Measures of Shape
Skewness is the tendency of the deviations from the mean to be larger in one
direction than in the other. It can be thought of as the tendency for one tail
of the distribution to be heavier than the other.
Kurtosis is a measure of the relative peakedness or flatness of the curve
defined by the frequency distribution. The kurtosis of a normal distribution is
zero. If the kurtosis is positive, then the distribution is more peaked than a
normal distribution. A negative value means that the distribution is flatter
than a normal distribution.
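The slides give no formulas for these shape measures, so the sketch below assumes the common moment-based definitions: skewness as the third central moment divided by the 1.5 power of the second, and kurtosis as the fourth central moment divided by the square of the second, minus 3 (so that a normal distribution scores zero, as stated above). The data and the `shape_measures` helper are illustrative.

```python
def shape_measures(values):
    """Return (skewness, excess kurtosis) from central moments (assumed definitions)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]        # deviations balance out: skewness is 0
right_skewed = [1, 1, 1, 2, 10]    # one long right tail: skewness is positive

sym_skew, sym_kurt = shape_measures(symmetric)
right_skew, _ = shape_measures(right_skewed)
print(sym_skew, sym_kurt, right_skew)
```

Statistical packages often apply small-sample corrections to these estimators, so their output may differ slightly from this sketch.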
4-73
4-74

Skewness of a Distribution
Figure 15.2
(a) Symmetric Distribution: Mean, Median, Mode
(b) Skewed Distribution: Mean, Median, Mode
4-75

Steps Involved in Hypothesis Testing
Fig. 15.3
Formulate H0 and H1
Select Appropriate Test
Choose Level of Significance, α
Collect Data and Calculate Test Statistic
Either: Determine Probability Associated with Test Statistic, then Compare with
Level of Significance, α
Or: Determine Critical Value of Test Statistic TSCR, then Determine if TSCR
falls into (Non) Rejection Region
Reject or Do not Reject H0
Draw Marketing Research Conclusion
4-76
A Broad Classification of Hypothesis Tests
Figure 15.6 Hypothesis Tests
Tests of Association
Tests of Differences
Distributions
Means
Proportions
Median / Rankings
4-77
Cross-Tabulation
While a frequency distribution describes one variable at a time, a
cross-tabulation describes two or more variables simultaneously.
Cross-tabulation results in tables that reflect the joint distribution of two
or more variables with a limited number of categories or distinct values, e.g.,
Table 15.3.
4-78
Gender and Internet Usage
Table 15.3
                        Gender
Internet Usage     Male    Female    Row Total
Light (1)            5       10         15
Heavy (2)           10        5         15
Column Total        15       15         30
4-79
Internet Usage by Gender
Table 15.4
                        Gender
Internet Usage     Male     Female
Light              33.3%    66.7%
Heavy              66.7%    33.3%
Column total       100%     100%
4-80
Gender by Internet Usage
Table 15.5
                        Gender
Internet Usage     Male     Female    Total
Light              33.3%    66.7%     100.0%
Heavy              66.7%    33.3%     100.0%
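The percentages in Tables 15.4 and 15.5 follow directly from the counts in Table 15.3. A minimal sketch, with hypothetical helper names, computes both the column-based and the row-based percentages:

```python
# Counts from Table 15.3: rows are Internet usage, columns are gender.
counts = {
    "Light": {"Male": 5, "Female": 10},
    "Heavy": {"Male": 10, "Female": 5},
}

def column_percentages(counts):
    """Percentages computed within each column (as in Table 15.4)."""
    col_totals = {}
    for row in counts.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {r: {c: 100.0 * n / col_totals[c] for c, n in row.items()}
            for r, row in counts.items()}

def row_percentages(counts):
    """Percentages computed within each row (as in Table 15.5)."""
    return {r: {c: 100.0 * n / sum(row.values()) for c, n in row.items()}
            for r, row in counts.items()}

by_gender = column_percentages(counts)
by_usage = row_percentages(counts)
print(round(by_gender["Light"]["Male"], 1), round(by_usage["Light"]["Female"], 1))
```

Which direction to use depends on which variable is treated as independent: percentages are usually computed within the categories of the independent variable.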
Introduction of a Third Variable in Cross-Tabulation
Fig. 15.7
4-81
Original Two Variables
Some Association between the Two Variables -> Introduce a Third Variable ->
  Refined Association between the Two Variables
  No Association between the Two Variables
  No Change in the Initial Pattern
No Association between the Two Variables -> Introduce a Third Variable ->
  Some Association between the Two Variables
4-82
Purchase of Fashion Clothing by Marital Status
Table 15.6
                                  Current Marital Status
Purchase of Fashion Clothing      Married     Unmarried
High                              31%         52%
Low                               69%         48%
Column                            100%        100%
Number of respondents             700         300
4-83
Purchase of Fashion Clothing by Marital Status
Table 15.7
                                  Sex: Male                Sex: Female
Purchase of Fashion Clothing      Married   Not Married    Married   Not Married
High                              35%       40%            25%       60%
Low                               65%       60%            75%       40%
Column totals                     100%      100%           100%      100%
Number of cases                   400       120            300       180
Eating Frequently in Fast-Food Restaurants by Family Size
Table 15.12
4-84
                                          Family Size
Eat Frequently in Fast-Food Restaurants   Small     Large
Yes                                       65%       65%
No                                        35%       35%
Column totals                             100%      100%
Number of cases                           500       500
Eating Frequently in Fast-Food Restaurants by Family Size & Income
Table 15.13
4-85
                                          Income: Low         Income: High
                                          Family size         Family size
Eat Frequently in Fast-Food Restaurants   Small    Large      Small    Large
Yes                                       65%      65%        65%      65%
No                                        35%      35%        35%      35%
Column totals                             100%     100%       100%     100%
Number of respondents                     250      250        250      250
4-86
Chi-square Distribution
Figure 15.8
[Figure: χ² distribution marking the "Do Not Reject H0" region, the "Reject H0" region, and the critical value.]
Statistics Associated with Cross-Tabulation
Chi-Square
4-87
The chi-square statistic (χ²) is used to test the statistical significance of the observed association in a cross-tabulation. The expected frequency for each cell can be calculated by using a simple formula:
$$f_e = \frac{n_r n_c}{n}$$
where
$n_r$ = total number in the row
$n_c$ = total number in the column
$n$ = total sample size
Statistics Associated with Cross-Tabulation
Chi-Square
4-88
For the data in Table 15.3, the expected frequencies for the cells, going from left to right and from top to bottom, are:
$$\frac{15 \times 15}{30} = 7.50$$
$$\frac{15 \times 15}{30} = 7.50$$
$$\frac{15 \times 15}{30} = 7.50$$
$$\frac{15 \times 15}{30} = 7.50$$
Then the value of χ² is calculated as follows:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
Statistics Associated with Cross-Tabulation
Chi-Square
For the data in Table 15.3, the value of χ² is calculated as:
$$\chi^2 = \frac{(5 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(5 - 7.5)^2}{7.5} = 0.833 + 0.833 + 0.833 + 0.833 = 3.333$$
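The calculation above can be checked with a few lines of code. This is a minimal sketch, assuming the observed counts are stored as a list of rows; the function name `chi_square` and the variable `observed` are hypothetical.

```python
# Observed counts from Table 15.3.
# Rows: Light, Heavy usage; columns: Male, Female.
observed = [[5, 10], [10, 5]]


def chi_square(table):
    """Chi-square statistic: sum over all cells of (f_o - f_e)^2 / f_e."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            # Expected frequency: row total times column total over sample size.
            f_e = row_totals[i] * col_totals[j] / n
            statistic += (f_o - f_e) ** 2 / f_e
    return statistic


chi2 = chi_square(observed)
print(round(chi2, 3))
```

Every expected frequency here is 15 × 15 / 30 = 7.5, so each cell contributes 6.25 / 7.5 ≈ 0.833 to the total.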
4-89
Statistics Associated with Cross-Tabulation
Lambda Coefficient
4-90
Asymmetric lambda measures the percentage improvement in predicting the value of the dependent variable, given the value of the independent variable. Lambda also varies between 0 and 1. A value of 0 means no improvement in prediction. A value of 1 indicates that the prediction can be made without error. This happens when each independent variable category is associated with a single category of the dependent variable.
Asymmetric lambda is computed for each of the variables (treating it as the dependent variable). A symmetric lambda is also computed, which is a kind of average of the two asymmetric values. The symmetric lambda does not make an assumption about which variable is dependent. It measures the overall improvement when prediction is done in both directions.
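The slides do not give a formula for lambda. The sketch below assumes the usual proportional-reduction-in-error definition: the number of prediction errors made when always guessing the modal category of the dependent variable, compared with the errors made when guessing the modal category within each category of the independent variable. The function name `asymmetric_lambda` is hypothetical, and the function assumes the dependent variable has more than one observed category.

```python
def asymmetric_lambda(table):
    """Lambda for predicting the row variable from the column variable.

    table[i][j] is the count for row category i and column category j.
    """
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)
    # Errors when always predicting the overall modal row category.
    errors_without = n - max(row_totals)
    # Errors when predicting the modal row category within each column.
    errors_with = sum(sum(col) - max(col) for col in zip(*table))
    return (errors_without - errors_with) / errors_without


# Table 15.3: rows are Light/Heavy usage, columns are Male/Female.
lambda_usage = asymmetric_lambda([[5, 10], [10, 5]])

# A table in which each column maps to a single row category.
lambda_perfect = asymmetric_lambda([[15, 0], [0, 15]])

print(round(lambda_usage, 3), lambda_perfect)
```

For Table 15.3, knowing gender reduces the prediction errors from 15 to 10, an improvement of one third.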
A Classification of Hypothesis Testing Procedures for Examining Differences
Fig. 15.9 Hypothesis Tests
4-91
Parametric Tests (Metric Tests)
  One Sample
    * t test
    * Z test
  Two or More Samples
    Independent Samples
      * Two-Group t test
      * Z test
    Paired Samples
      * Paired t test
Non-parametric Tests (Nonmetric Tests)
  One Sample
    * Chi-Square
    * K-S
    * Runs
    * Binomial
  Two or More Samples
    Independent Samples
      * Chi-Square
      * Mann-Whitney
      * Median
      * K-S
    Paired Samples
      * Sign
      * Wilcoxon
      * McNemar
      * Chi-Square
4-92
Non-Parametric Tests
Nonparametric tests are used when the independent variables are nonmetric. Like parametric tests, nonparametric tests are available for testing variables from one sample, two independent samples, or two related samples.
Non-Parametric Tests
One Sample
Sometimes the researcher wants to test whether the observations for a particular variable could reasonably have come from a particular distribution, such as the normal, uniform, or Poisson distribution.
4-93
The Kolmogorov-Smirnov (K-S) one-sample test is one such goodness-of-fit test. The K-S compares the cumulative distribution function for a variable with a specified distribution. $A_i$ denotes the cumulative relative frequency for each category of the theoretical (assumed) distribution, and $O_i$ the comparable value of the sample frequency. The K-S test is based on the maximum value of the absolute difference between $A_i$ and $O_i$. The test statistic is
$$K = \max |A_i - O_i|$$
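The K statistic can be sketched for categorical data as follows. The function name `ks_statistic` and the example counts are hypothetical and only illustrate the arithmetic; the sketch does not look up a critical value or significance level.

```python
def ks_statistic(observed_counts, expected_proportions):
    """Return K = max |A_i - O_i| over the ordered categories.

    A_i is the cumulative proportion under the assumed distribution,
    O_i the cumulative relative frequency observed in the sample.
    """
    n = sum(observed_counts)
    k = 0.0
    cumulative_observed = 0.0
    cumulative_expected = 0.0
    for count, proportion in zip(observed_counts, expected_proportions):
        cumulative_observed += count / n
        cumulative_expected += proportion
        k = max(k, abs(cumulative_expected - cumulative_observed))
    return k


# Hypothetical data: 100 observations in four ordered categories,
# compared with a uniform distribution over those categories.
k_value = ks_statistic([10, 20, 30, 40], [0.25, 0.25, 0.25, 0.25])
print(round(k_value, 3))
```

In this example the cumulative sample frequencies are 0.10, 0.30, 0.60, 1.00 against 0.25, 0.50, 0.75, 1.00 under the uniform assumption, so the largest gap is 0.20.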
Non-Parametric Tests
One Sample
4-94
The chi-square test can also be performed on a single variable from one sample. In this context, the chi-square serves as a goodness-of-fit test.
The runs test is a test of randomness for dichotomous variables. This test is conducted by determining whether the order or sequence in which observations are obtained is random.
The binomial test is also a goodness-of-fit test for dichotomous variables. It tests the goodness of fit of the observed number of observations in each category to the number expected under a specified binomial distribution.
Non-Parametric Tests
4-95
Two Independent Samples
When the difference in the location of two populations is to be compared based on observations from two independent samples, and the variable is measured on an ordinal scale, the Mann-Whitney U test can be used.
In the Mann-Whitney U test, the two samples are combined and the cases are ranked in order of increasing size. The test statistic, U, is computed as the number of times a score from sample or group 1 precedes a score from group 2. If the samples are from the same population, the distribution of scores from the two groups in the rank list should be random. An extreme value of U would indicate a nonrandom pattern, pointing to the inequality of the two groups.
For samples of less than 30, the exact significance level for U is computed. For larger samples, U is transformed into a normally distributed z statistic. This z can be corrected for ties within ranks.
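The counting definition of U can be sketched directly. The function name `mann_whitney_u` and the scores are hypothetical; counting a tie as half a precedence is a common convention that is assumed here and is not stated in the slides. The sketch computes U only, not its significance level.

```python
def mann_whitney_u(group1, group2):
    """Count how often a score from group 1 precedes a score from group 2."""
    u = 0.0
    for x in group1:
        for y in group2:
            if x < y:
                u += 1.0   # the group 1 score comes first in the ranking
            elif x == y:
                u += 0.5   # assumed convention: a tie counts as one half
    return u


# Hypothetical ordinal scores.
u_mixed = mann_whitney_u([1, 4, 5], [2, 3, 6])    # interleaved groups
u_extreme = mann_whitney_u([1, 2, 3], [4, 5, 6])  # fully separated groups
print(u_mixed, u_extreme)
```

With three cases per group, U ranges from 0 to 9. The fully separated groups reach the extreme value, which is the nonrandom pattern the text describes.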
4-96
SPSS Windows
The main program in SPSS is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics.
If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used.
The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.
4-97
SPSS Windows
To select these procedures click:
Analyze>Descriptive Statistics>Frequencies
Analyze>Descriptive Statistics>Descriptives
Analyze>Descriptive Statistics>Explore
The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed. To select these procedures click:
Analyze>Descriptive Statistics>Crosstabs
4-98
SPSS Windows
The major program for conducting parametric tests in SPSS is COMPARE MEANS. This program can be used to conduct t tests on one sample or independent or paired samples. To select these procedures using SPSS for Windows click:
Analyze>Compare Means>Means …
Analyze>Compare Means>One-Sample T Test …
Analyze>Compare Means>Independent-Samples T Test …
Analyze>Compare Means>Paired-Samples T Test …
4-99
SPSS Windows
The nonparametric tests discussed in this chapter can be conducted using NONPARAMETRIC TESTS. To select these procedures using SPSS for Windows click:
Analyze>Nonparametric Tests>Chi-Square …
Analyze>Nonparametric Tests>Binomial …
Analyze>Nonparametric Tests>Runs …
Analyze>Nonparametric Tests>1-Sample K-S …
Analyze>Nonparametric Tests>2 Independent Samples …
Analyze>Nonparametric Tests>2 Related Samples …
4-100
Product Moment Correlation
The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y. It is an index used to determine whether a linear or straight-line relationship exists between X and Y. As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.
4-101
Product Moment Correlation
r varies between -1.0 and +1.0.
The correlation coefficient between two variables will be the same regardless of their underlying units of measurement.
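Both properties can be illustrated with a short sketch. The slides do not show the computing formula, so the standard definition (covariation of X and Y divided by the square root of the product of their sums of squares) is assumed; the function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: X measured in inches, Y on a rating scale.
x_inches = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_inches = pearson_r(x_inches, y)

# Re-express X in centimetres; r should not change.
x_cm = [value * 2.54 for value in x_inches]
r_cm = pearson_r(x_cm, y)

print(round(r_inches, 4), round(r_cm, 4))
```

Rescaling X multiplies the numerator and the denominator by the same factor, which is why the coefficient is unaffected by the units.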
Statistics Associated with Bivariate Regression Analysis
Regression coefficient. The estimated parameter b is usually referred to as the nonstandardized regression coefficient.
Scattergram. A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations.
4-102
Standard error of estimate. This statistic, SEE, is the standard deviation of the actual Y values from the predicted Y values.
Standard error. The standard deviation of b, SEb, is called the standard error.
Statistics Associated with Bivariate Regression Analysis
Standardized regression coefficient. Also termed the beta coefficient or beta weight, this is the slope obtained by the regression of Y on X when the data are standardized.
Sum of squared errors. The distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, which is a measure of total error, $\sum e_j^2$.
t statistic. A t statistic with n - 2 degrees of freedom can be used to test the null hypothesis that no linear relationship exists between X and Y, or H0: β1 = 0, where
$$t = \frac{b}{SE_b}$$
4-103
Conducting Bivariate Regression Analysis
Plot the Scatter Diagram 
A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations. The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure. In fitting the line, the least-squares procedure minimizes the sum of squared errors, $\sum e_j^2$.
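A minimal sketch of the least-squares fit follows. The closed-form estimates for the slope and intercept are the standard ones and are assumed here, since the slides do not print them; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(x, y):
    """Fit Y = a + bX by least squares; return (a, b, sum of squared errors)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope (nonstandardized regression coefficient)
    a = mean_y - b * mean_x    # intercept
    # Sum of squared vertical distances from the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse


# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b, sse = fit_line(x, y)
print(round(a, 3), round(b, 3), round(sse, 3))
```

Any other line through these points, for example one with a slightly different slope, gives a larger sum of squared errors than the fitted one.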
4-104
4-105
Conducting Bivariate Regression Analysis
Fig. 17.2
Plot the Scatter Diagram
Formulate the General Model
Estimate the Parameters
Estimate Standardized Regression Coefficients
Test for Significance
Determine the Strength and Significance of Association
Check Prediction Accuracy
Examine the Residuals
Cross-Validate the Model
4-106
Multiple Regression
The general form of the multiple regression model is as follows:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + e$$
which is estimated by the following equation:
$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$$
As before, the coefficient a represents the intercept, but the b's are now the partial regression coefficients.
4-107
Multicollinearity
Multicollinearity arises when intercorrelations among the predictors are very high. Multicollinearity can result in several problems, including:
The partial regression coefficients may not be estimated precisely. The standard errors are likely to be high.
The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
Predictor variables may be incorrectly included or removed in stepwise regression.
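One simple way to look for the very high intercorrelations described above is to compute the correlation between every pair of predictors and flag the large ones. This is only a sketch of a screening step, not a method prescribed in the slides: the 0.9 cutoff, the function names, and the data are all hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(predictors, threshold=0.9):
    """Return the predictor name pairs whose |r| meets the assumed threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(predictors.items(), 2):
        if abs(pearson_r(a, b)) >= threshold:
            flagged.append((name_a, name_b))
    return flagged


# Hypothetical predictors: x2 is almost exactly twice x1.
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8],
    "x3": [5, 3, 4, 1, 2],
}

flagged = highly_correlated_pairs(predictors)
print(flagged)
```

Pairwise correlations can miss multicollinearity that involves three or more predictors jointly, so a clean result from this check is not conclusive.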
4-108
SPSS Windows
The CORRELATE program computes Pearson product moment correlations and partial correlations with significance levels. Univariate statistics, covariance, and cross-product deviations may also be requested. Significance levels are included in the output. To select these procedures using SPSS for Windows click:
Analyze>Correlate>Bivariate …
Analyze>Correlate>Partial …
Scatterplots can be obtained by clicking:
Graphs>Scatter …>Simple>Define
REGRESSION calculates bivariate and multiple regression equations, associated statistics, and plots. It allows for an easy examination of residuals. This procedure can be run by clicking:
Analyze>Regression>Linear …

Similarities and Differences between ANOVA, Regression, and Discriminant Analysis
Table 18.1
                                       ANOVA          REGRESSION    DISCRIMINANT ANALYSIS
Similarities
Number of dependent variables          One            One           One
Number of independent variables        Multiple       Multiple      Multiple
Differences
Nature of the dependent variables      Metric         Metric        Categorical
Nature of the independent variables    Categorical    Metric        Metric
4-109
4-110
Discriminant Analysis
Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature. The objectives of discriminant analysis are as follows:
Development of discriminant functions, or linear combinations of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups).
Examination of whether significant differences exist among the groups, in terms of the predictor variables.
Determination of which predictor variables contribute to most of the intergroup differences.
Classification of cases to one of the groups based on the values of the predictor variables.
Evaluation of the accuracy of classification.
4-111
Statistics Associated with Discriminant Analysis
Canonical correlation. Canonical correlation measures the extent of association between the discriminant scores and the groups. It is a measure of association between the single discriminant function and the set of dummy variables that define the group membership.
Centroid. The centroid is the mean of the discriminant scores for a particular group. There are as many centroids as there are groups, as there is one for each group. The means for a group on all the functions are the group centroids.
Classification matrix. Sometimes also called confusion or prediction matrix, the classification matrix contains the number of correctly classified and misclassified cases.
4-112
Statistics Associated with Discriminant Analysis
Discriminant function coefficients. The discriminant function coefficients (unstandardized) are the multipliers of variables, when the variables are in the original units of measurement.
Discriminant scores. The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant scores.
Eigenvalue. For each discriminant function, the Eigenvalue is the ratio of between-group to within-group sums of squares. Large Eigenvalues imply superior functions.
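The definition of a discriminant score, and of a group centroid as the mean score within a group, can be sketched as follows. The constant, the coefficients, the cases, and all names are hypothetical values chosen for illustration; they are not estimates from any data in the slides.

```python
def discriminant_score(constant, coefficients, values):
    """Constant plus the sum of (unstandardized coefficient x variable value)."""
    return constant + sum(c * v for c, v in zip(coefficients, values))


def centroid(scores):
    """Group centroid: the mean discriminant score of the group's cases."""
    return sum(scores) / len(scores)


# Hypothetical function with two predictors.
constant = -10.0
coefficients = [0.25, 0.5]

# Hypothetical cases, grouped by known membership.
groups = {
    "A": [[20, 8], [28, 10]],
    "B": [[40, 12], [44, 14]],
}

scores = {
    name: [discriminant_score(constant, coefficients, case) for case in cases]
    for name, cases in groups.items()
}
centroids = {name: centroid(values) for name, values in scores.items()}
print(scores, centroids)
```

Well-separated centroids, as in this made-up example, are what a function that discriminates well between the groups would produce.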
4-113
Conducting Discriminant Analysis
Fig. 18.1 
Formulate the Problem
Estimate the Discriminant Function Coefficients
Determine the Significance of the Discriminant Function
Interpret the Results
Assess Validity of Discriminant Analysis
4-114
SPSS Windows 
The DISCRIMINANT program performs both two-group and multiple discriminant analysis. To select this procedure using SPSS for Windows click:
Analyze>Classify>Discriminant …
4-115
Factor Analysis
Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization. Factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables. Factor analysis is used in the following circumstances:
To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.
4-116
Factor Analysis Model
It is possible to select weights or factor score coefficients so that the first factor explains the largest portion of the total variance. Then a second set of weights can be selected, so that the second factor accounts for most of the residual variance, subject to being uncorrelated with the first factor. This same principle could be applied to selecting additional weights for the additional factors.
4-117
Conducting Factor Analysis
Fig 19.1
Problem formulation
Construction of the Correlation Matrix
Method of Factor Analysis
Determination of Number of Factors
Rotation of Factors
Interpretation of Factors
Calculation of Factor Scores
Selection of Surrogate Variables
Determination of Model Fit
Conducting Factor Analysis
4-118

Determine the Number of Factors
A Priori Determination. Sometimes, because of prior knowledge, the researcher knows how many factors to expect and thus can specify the number of factors to be extracted beforehand.
Determination Based on Eigenvalues. In this approach, only factors with Eigenvalues greater than 1.0 are retained. An Eigenvalue represents the amount of variance associated with the factor. Hence, only factors with a variance greater than 1.0 are included. Factors with variance less than 1.0 are no better than a single variable, since, due to standardization, each variable has a variance of 1.0. If the number of variables is less than 20, this approach will result in a conservative number of factors.
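The Eigenvalue rule reduces to a simple filter. In the sketch below the Eigenvalues are hypothetical, chosen so that they sum to the number of standardized variables (six), and the function name `retain_factors` is an assumption for illustration.

```python
def retain_factors(eigenvalues, cutoff=1.0):
    """Keep only the factors whose Eigenvalue exceeds the cutoff."""
    return [value for value in eigenvalues if value > cutoff]


# Hypothetical Eigenvalues for six standardized variables.
# Each variable contributes a variance of 1.0, so the total is 6.0.
eigenvalues = [2.73, 2.22, 0.44, 0.34, 0.18, 0.09]

retained = retain_factors(eigenvalues)

# Share of total variance accounted for by the retained factors.
share_explained = sum(retained) / len(eigenvalues)

print(len(retained), round(share_explained, 3))
```

Here two factors pass the cutoff, and together they account for 4.95 of the 6.0 units of total variance.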
4-119
SPSS Windows
To select this procedure using SPSS for Windows click:
Analyze>Data Reduction>Factor …
4-120
Cluster Analysis 
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. Cluster analysis is also called classification analysis, or numerical taxonomy.
Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership for each object or case included, to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.
4-121
An Ideal Clustering Situation
Fig. 20.1
[Figure: scatter plot of objects; axes labeled Variable 1 and Variable 2.]
4-122
Conducting Cluster Analysis
Fig. 20.3
Formulate the Problem
Select a Distance Measure
Select a Clustering Procedure
Decide on the Number of Clusters
Interpret and Profile Clusters
Assess the Validity of Clustering
4-123
A Classification of Clustering Procedures
Fig. 20.4
Clustering Procedures
  Hierarchical
    Agglomerative
      Linkage Methods
        Single
        Complete
        Average
      Variance Methods
        Ward’s Method
      Centroid Methods
    Divisive
  Nonhierarchical
    Sequential Threshold
    Parallel Threshold
    Optimizing Partitioning
Conducting Cluster Analysis
4-124
Select a Clustering Procedure – Hierarchical
Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive.
Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are members of a single cluster.
Divisive clustering starts with all the objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster.
Agglomerative methods are commonly used in marketing research. They consist of linkage methods, error sums of squares or variance methods, and centroid methods.
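The agglomerative process can be sketched on one-dimensional data. This is an illustrative toy, not the routine used by any statistical package: it assumes the nearest-neighbor (minimum) distance as the merge rule, breaks ties by taking the first pair found, and uses hypothetical names and values.

```python
def agglomerate(values):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns a list of (merged cluster, merge distance) in merge order.
    """
    clusters = [[v] for v in values]   # start: each object in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumed rule: distance between the two closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        merges.append((merged, d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical one-dimensional scores for five objects.
merges = agglomerate([1, 2, 6, 7, 15])
for cluster, distance in merges:
    print(cluster, distance)
```

Reading the merge list from first to last traces the tree-like structure from five single-object clusters up to one cluster containing every object. A divisive method would traverse the same kind of tree in the opposite direction.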
Conducting Cluster Analysis
4-125
Select a Clustering Procedure – Linkage Method
The single linkage method is based on minimum distance, or the nearest neighbor rule. At every stage, the distance between two clusters is the distance between their two closest points (see Figure 20.5).
The complete linkage method is similar to single linkage, except that it is based on the maximum distance or the furthest neighbor approach. In complete linkage, the distance between two clusters is calculated as the distance between their two furthest points.
The average linkage method works similarly. However, in this method, the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of the pair is from each of the clusters (Figure 20.5).
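The three linkage definitions differ only in how the between-cluster pair distances are summarized. A minimal sketch follows, assuming Euclidean distance between points; the function names and the coordinates are hypothetical.

```python
import math
from itertools import product


def pair_distances(cluster_a, cluster_b):
    """Euclidean distances for all pairs with one member from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    """Distance between the two closest points (nearest neighbor)."""
    return min(pair_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the two furthest points (furthest neighbor)."""
    return max(pair_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    """Average distance over all between-cluster pairs."""
    distances = pair_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters of two points each.
cluster_1 = [(0, 0), (1, 0)]
cluster_2 = [(4, 0), (6, 0)]

print(single_linkage(cluster_1, cluster_2),
      complete_linkage(cluster_1, cluster_2),
      average_linkage(cluster_1, cluster_2))
```

The four pair distances here are 4, 6, 3, and 5, so the single, complete, and average linkage distances are 3, 6, and 4.5. The average always lies between the other two.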
4-126
Linkage Methods of Clustering
Fig. 20.5
Single Linkage: Minimum Distance (Cluster 1, Cluster 2)
Complete Linkage: Maximum Distance (Cluster 1, Cluster 2)
Average Linkage: Average Distance (Cluster 1, Cluster 2)
4-127
Other Agglomerative Clustering Methods
Fig. 20.6
Ward’s Procedure
Centroid Method
4-128
SPSS Windows
To select these procedures using SPSS for Windows click:
Analyze>Classify>Hierarchical Cluster …
Analyze>Classify>K-Means Cluster …
