Copyright:
Attribution Non-Commercial (BY-NC)
Table 4.1

                      Primary Data               Secondary Data
Collection purpose    For the problem at hand    For other problems
Collection process    Very involved              Rapid and easy
Collection cost       High                       Relatively low
Collection time       Long                       Short

Uses of Secondary Data

* Identify the problem
* Better define the problem
* Develop an approach to the problem
* Formulate an appropriate research design (for example, by identifying the key variables)
* Answer certain research questions and test some hypotheses
* Interpret primary data more insightfully

A Classification of Secondary Data (Fig. 4.1)

Secondary Data
  * Internal
      * Ready to Use
      * Requires Further Processing
  * External
      * Published Materials
      * Computerized Databases
      * Syndicated Services

A Classification of Published Secondary Sources (Fig. 4.2)

Published Secondary Data
  * General Business Sources
      * Guides
      * Directories
      * Indexes
      * Statistical Data
  * Government Sources
      * Census Data
      * Other Government Publications

A Classification of Computerized Databases (Fig. 4.3)

Computerized Databases
  * Online, Internet, Off-Line
      * Bibliographic Databases
      * Numeric Databases
      * Full-Text Databases
      * Directory Databases
      * Special-Purpose Databases

Syndicated Services: Consumers (Fig. 4.4 cont.)

Households / Consumers
  * Panels
      * Purchase
      * Media
  * Surveys
      * Psychographic & Lifestyles
      * Advertising Evaluation
      * General
  * Electronic scanner services
      * Volume Tracking Data
      * Scanner Diary Panels
      * Scanner Diary Panels with Cable TV

Syndicated Services: Institutions (Fig. 4.4 cont.)

Institutions
  * Retailers
      * Audits
  * Wholesalers
      * Audits
  * Industrial firms
      * Direct Inquiries
      * Clipping Services
      * Corporate Reports

A Classification of Marketing Research Data (Fig. 5.1)

Marketing Research Data
  * Secondary Data
  * Primary Data
      * Qualitative Data
      * Quantitative Data
          * Descriptive
              * Survey Data
              * Observational and Other Data
          * Causal
              * Experimental Data

Qualitative vs. Quantitative Research (Table 5.1)

Objective
  Qualitative research:   To gain a qualitative understanding of the underlying reasons and motivations
  Quantitative research:  To quantify the data and generalize the results from the sample to the population of interest
Sample
  Qualitative research:   Small number of nonrepresentative cases
  Quantitative research:  Large number of representative cases
Data Collection
  Qualitative research:   Unstructured
  Quantitative research:  Structured
Data Analysis
  Qualitative research:   Non-statistical
  Quantitative research:  Statistical
Outcome
  Qualitative research:   Develop an initial understanding
  Quantitative research:  Recommend a final course of action

A Classification of Qualitative Research Procedures (Fig. 5.2)

Qualitative Research Procedures
  * Direct (Nondisguised)
      * Focus Groups
      * Depth Interviews
  * Indirect (Disguised)
      * Projective Techniques
          * Association Techniques
          * Completion Techniques
          * Construction Techniques
          * Expressive Techniques

Definition of Projective Techniques

An unstructured, indirect form of questioning that encourages respondents to project their underlying motivations, beliefs, attitudes, or feelings regarding the issues of concern. In projective techniques, respondents are asked to interpret the behavior of others. In interpreting the behavior of others, respondents indirectly project their own motivations, beliefs, attitudes, or feelings into the situation.

Word Association

In word association, respondents are presented with a list of words, one at a time, and asked to respond to each with the first word that comes to mind. The words of interest, called test words, are interspersed throughout the list, which also contains some neutral, or filler, words to disguise the purpose of the study. Responses are analyzed by calculating:

1) the frequency with which any word is given as a response;
2) the amount of time that elapses before a response is given; and
3) the number of respondents who do not respond at all to a test word within a reasonable period of time.

Completion Techniques

In sentence completion, respondents are given incomplete sentences and asked to complete them.
Generally, they are asked to use the first word or phrase that comes to mind.

  A person who shops at Sears is ______________________
  A person who receives a gift certificate good for Saks Fifth Avenue would be ______________________
  J. C. Penney is most liked by ______________________
  When I think of shopping in a department store, I ______________________

A variation of sentence completion is paragraph completion, in which the respondent completes a paragraph beginning with the stimulus phrase.

Completion Techniques

In story completion, respondents are given part of a story, enough to direct attention to a particular topic but not to hint at the ending. They are required to give the conclusion in their own words.

Construction Techniques

With a picture response, the respondents are asked to describe a series of pictures of ordinary as well as unusual events. The respondent's interpretation of the pictures gives indications of that individual's personality.

In cartoon tests, cartoon characters are shown in a specific situation related to the problem. The respondents are asked to indicate what one cartoon character might say in response to the comments of another character. Cartoon tests are simpler to administer and analyze than picture response techniques.

A Cartoon Test (Figure 5.4)

[Figure: cartoon set at Sears. One character says: "Let's see if we can pick up some housewares at Sears." The other character's reply is left for the respondent to supply.]

Expressive Techniques

In expressive techniques, respondents are presented with a verbal or visual situation and asked to relate the feelings and attitudes of other people to the situation.

* Role playing: Respondents are asked to play the role or assume the behavior of someone else.
* Third-person technique: The respondent is presented with a verbal or visual situation and is asked to relate the beliefs and attitudes of a third person rather than directly expressing personal beliefs and attitudes. This third person may be a friend, neighbor, colleague, or a "typical" person.
Advantages of Projective Techniques

* They may elicit responses that subjects would be unwilling or unable to give if they knew the purpose of the study.
* Helpful when the issues to be addressed are personal, sensitive, or subject to strong social norms.
* Helpful when underlying motivations, beliefs, and attitudes are operating at a subconscious level.

A Classification of Survey Methods (Fig. 6.1)

Survey Methods
  * Telephone
      * Traditional Telephone
      * Computer-Assisted Telephone Interviewing
  * Personal
      * In-Home
      * Mall Intercept
      * Computer-Assisted Personal Interviewing
  * Mail
      * Mail Interview
      * Mail Panel
  * Electronic
      * E-mail
      * Internet

Observation Methods: Structured versus Unstructured Observation

For structured observation, the researcher specifies in detail what is to be observed and how the measurements are to be recorded, e.g., an auditor performing inventory analysis in a store.

In unstructured observation, the observer monitors all aspects of the phenomenon that seem relevant to the problem at hand, e.g., observing children playing with new toys.

Observation Methods: Disguised versus Undisguised Observation

In disguised observation, the respondents are unaware that they are being observed. Disguise may be accomplished by using one-way mirrors, hidden cameras, or inconspicuous mechanical devices. Observers may be disguised as shoppers or sales clerks.

In undisguised observation, the respondents are aware that they are under observation.

Observation Methods: Natural versus Contrived Observation

Natural observation involves observing behavior as it takes place in the environment. For example, one could observe the behavior of respondents eating fast food in Burger King.

In contrived observation, respondents' behavior is observed in an artificial environment, such as a test kitchen.

A Classification of Observation Methods (Fig. 6.3)

Observation Methods
  * Personal Observation
  * Mechanical Observation
  * Audit
  * Content Analysis
  * Trace Analysis

Concept of Causality

A statement such as "X causes Y" will have the following meaning to an ordinary person and to a scientist.

Ordinary Meaning
  * X is the only cause of Y.
  * X must always lead to Y (X is a deterministic cause of Y).
  * It is possible to prove that X is a cause of Y.

Scientific Meaning
  * X is only one of a number of possible causes of Y.
  * The occurrence of X makes the occurrence of Y more probable (X is a probabilistic cause of Y).
  * We can never prove that X is a cause of Y. At best, we can infer that X is a cause of Y.

Definitions and Concepts

* Independent variables are variables or alternatives that are manipulated and whose effects are measured and compared, e.g., price levels.
* Test units are individuals, organizations, or other entities whose response to the independent variables or treatments is being examined, e.g., consumers or stores.
* Dependent variables are the variables which measure the effect of the independent variables on the test units, e.g., sales, profits, and market shares.
* Extraneous variables are all variables other than the independent variables that affect the response of the test units, e.g., store size, store location, and competitive effort.

Experimental Design

An experimental design is a set of procedures specifying:

* the test units and how these units are to be divided into homogeneous subsamples,
* what independent variables or treatments are to be manipulated,
* what dependent variables are to be measured, and
* how the extraneous variables are to be controlled.

Validity in Experimentation

Internal validity refers to whether the manipulation of the independent variables or treatments actually caused the observed effects on the dependent variables.
Control of extraneous variables is a necessary condition for establishing internal validity.

External validity refers to whether the cause-and-effect relationships found in the experiment can be generalized. To what populations, settings, times, independent variables, and dependent variables can the results be projected?

Controlling Extraneous Variables

* Randomization refers to the random assignment of test units to experimental groups by using random numbers. Treatment conditions are also randomly assigned to experimental groups.
* Matching involves comparing test units on a set of key background variables before assigning them to the treatment conditions.
* Statistical control involves measuring the extraneous variables and adjusting for their effects through statistical analysis.
* Design control involves the use of experiments designed to control specific extraneous variables.

A Classification of Experimental Designs (Figure 7.1)

Experimental Designs
  * Pre-experimental
      * One-Shot Case Study
      * One-Group Pretest-Posttest
      * Static Group
  * True Experimental
      * Pretest-Posttest Control Group
      * Posttest-Only Control Group
      * Solomon Four-Group
  * Quasi-Experimental
      * Time Series
      * Multiple Time Series
  * Statistical
      * Randomized Blocks
      * Latin Square
      * Factorial Design

Factorial Design

A factorial design is used to measure the effects of two or more independent variables at various levels. A factorial design may also be conceptualized as a table. In a two-factor design, each level of one variable represents a row and each level of another variable represents a column.
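
The random assignment described under Controlling Extraneous Variables, and the table view of a two-factor factorial design, can be sketched in a few lines. This is only an illustration, not a procedure taken from the slides: the store names, the price and display levels, the helper name `randomize`, and the fixed seed are all hypothetical.

```python
import random
from itertools import product


def randomize(test_units, treatments, seed=0):
    """Randomly assign test units to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = list(test_units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    # Deal the shuffled units out to the treatments in turn.
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical test units and factor levels.
stores = [f"store_{i}" for i in range(1, 13)]
price_levels = ["low", "medium", "high"]      # rows of the factorial table
display_levels = ["end-aisle", "shelf"]       # columns of the factorial table

# Each (row, column) cell of the 3 x 2 table is one treatment condition.
cells = list(product(price_levels, display_levels))
assignment = randomize(stores, cells)

for cell, units in assignment.items():
    print(cell, units)
```

With 12 stores and 6 cells, each treatment condition receives two stores; which stores land in which cell depends only on the shuffle.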
Selecting a Test-Marketing Strategy

[Figure: flowchart for selecting a test-marketing strategy. Labels recovered from the figure: Competition; Socio-Cultural Environment; Need for Secrecy; Overall Marketing Strategy; New Product Development; Research on Existing Products; Research on Other Elements; Simulated Test Marketing; Controlled Test Marketing; Standard Test Marketing; National Introduction; Stop and Reevaluate. Branch labels: "Very +ve", "Other Factors", "-ve".]

Criteria for the Selection of Test Markets

Test markets should have the following qualities:

1) Be large enough to produce meaningful projections. They should contain at least 2% of the potential actual population.
2) Be representative demographically.
3) Be representative with respect to product consumption behavior.
4) Be representative with respect to media usage.
5) Be representative with respect to competition.
6) Be relatively isolated in terms of media and physical distribution.
7) Have normal historical development in the product class.
8) Have marketing research and auditing services available.
9) Not be over-tested.

Measurement and Scaling

Measurement means assigning numbers or other symbols to characteristics of objects according to certain prespecified rules.

* There is a one-to-one correspondence between the numbers and the characteristics being measured.
* The rules for assigning numbers should be standardized and applied uniformly.
* Rules must not change over objects or time.

Measurement and Scaling

Scaling involves creating a continuum upon which measured objects are located.

Consider an attitude scale from 1 to 100. Each respondent is assigned a number from 1 to 100, with 1 = Extremely Unfavorable and 100 = Extremely Favorable. Measurement is the actual assignment of a number from 1 to 100 to each respondent. Scaling is the process of placing the respondents on a continuum with respect to their attitude toward department stores.
Primary Scales of Measurement (Figure 8.1)

Scale      Example                                    Runner 7      Runner 8       Runner 3
Nominal    Numbers assigned to runners                7             8              3
Ordinal    Rank order of winners                      Third place   Second place   First place
Interval   Performance rating on a 0 to 10 scale      8.2           9.1            9.6
Ratio      Time to finish, in seconds                 15.2          14.1           13.4

A Classification of Scaling Techniques (Figure 8.2)

Scaling Techniques
  * Comparative Scales
      * Paired Comparison
      * Rank Order
      * Constant Sum
      * Q-Sort and Other Procedures
  * Noncomparative Scales
      * Continuous Rating Scales
      * Itemized Rating Scales
          * Likert
          * Semantic Differential
          * Stapel

A Comparison of Scaling Techniques

Comparative scales involve the direct comparison of stimulus objects. Comparative scale data must be interpreted in relative terms and have only ordinal or rank order properties.

In noncomparative scales, each object is scaled independently of the others in the stimulus set. The resulting data are generally assumed to be interval or ratio scaled.

Preference for Toothpaste Brands Using Rank Order Scaling (Figure 8.4 cont.)

Form

     Brand            Rank Order
 1.  Crest            _________
 2.  Colgate          _________
 3.  Aim              _________
 4.  Gleem            _________
 5.  Macleans         _________
 6.  Ultra Brite      _________
 7.  Close Up         _________
 8.  Pepsodent        _________
 9.  Plus White       _________
10.  Stripe           _________

Importance of Bathing Soap Attributes Using a Constant Sum Scale (Figure 8.5 cont.)

Form: Average Responses of Three Segments

    Attribute          Segment I   Segment II   Segment III
1.  Mildness                8           2             4
2.  Lather                  2           4            17
3.  Shrinkage               3           9             7
4.  Price                  53          17             9
5.  Fragrance               9           0            19
6.  Packaging               7           5             9
7.  Moisturizing            5           3            20
8.  Cleaning Power         13          60            15
    Sum                   100         100           100

Noncomparative Scaling Techniques

Respondents evaluate only one object at a time, and for this reason noncomparative scales are often referred to as monadic scales. Noncomparative techniques consist of continuous and itemized rating scales.
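
As a small check on the constant sum data in Figure 8.5, the sketch below verifies that each segment's points add up to 100 and reports the attribute each segment weights most heavily. The numbers are copied from the figure; the helper names `is_valid_constant_sum` and `most_important` are made up for this illustration.

```python
attributes = [
    "Mildness", "Lather", "Shrinkage", "Price",
    "Fragrance", "Packaging", "Moisturizing", "Cleaning Power",
]

# Average points allocated by each segment (from Figure 8.5).
segments = {
    "I":   [8, 2, 3, 53, 9, 7, 5, 13],
    "II":  [2, 4, 9, 17, 0, 5, 3, 60],
    "III": [4, 17, 7, 9, 19, 9, 20, 15],
}


def is_valid_constant_sum(points, total=100):
    """A constant sum response is usable only if the points add up to the total."""
    return sum(points) == total


def most_important(points):
    """Return the attribute that received the most points."""
    return attributes[points.index(max(points))]


for name, points in segments.items():
    print(name, is_valid_constant_sum(points), most_important(points))
```

Segment I puts most weight on Price, Segment II on Cleaning Power, and Segment III on Moisturizing.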
Likert Scale

The Likert scale requires the respondents to indicate a degree of agreement or disagreement with each of a series of statements about the stimulus objects.

                                              Strongly              Neither agree              Strongly
                                              disagree   Disagree   nor disagree     Agree     agree
1. Sears sells high quality merchandise.         1          2X           3             4          5
2. Sears has poor in-store service.              1          2X           3             4          5
3. I like to shop at Sears.                      1          2            3X            4          5

The analysis can be conducted on an item-by-item basis (profile analysis), or a total (summated) score can be calculated. When arriving at a total score, the categories assigned to the negative statements by the respondents should be scored by reversing the scale.

Semantic Differential Scale

The semantic differential is a seven-point rating scale with end points associated with bipolar labels that have semantic meaning.

SEARS IS:
Powerful     --:--:--:--:-X-:--:--:  Weak
Unreliable   --:--:--:--:--:-X-:--:  Reliable
Modern       --:--:--:--:--:--:-X-:  Old-fashioned

The negative adjective or phrase sometimes appears at the left side of the scale and sometimes at the right. This controls the tendency of some respondents, particularly those with very positive or very negative attitudes, to mark the right- or left-hand sides without reading the labels. Individual items on a semantic differential scale may be scored on either a -3 to +3 or a 1 to 7 scale.
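
The reverse-scoring rule for a summated Likert score can be made concrete with the three Sears statements above. This is a minimal sketch: treating statement 2 ("poor in-store service") as the only negative statement is an assumption made here, and the function name `summated_score` is hypothetical.

```python
def summated_score(responses, negative_items, scale_min=1, scale_max=5):
    """Total Likert score, reversing the scale for negatively worded items."""
    total = 0
    for item, score in responses.items():
        if item in negative_items:
            # On a 1-5 scale this maps 1->5, 2->4, 3->3, 4->2, 5->1.
            score = scale_min + scale_max - score
        total += score
    return total


# Responses marked with an X in the form above.
responses = {1: 2, 2: 2, 3: 3}
negative_items = {2}  # assumption: only statement 2 is negatively worded

print(summated_score(responses, negative_items))
```

Here the respondent's 2 on the negative statement is rescored as 4, so the total is 2 + 4 + 3 = 9.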
A Semantic Differential Scale for Measuring Self-Concepts, Person Concepts, and Product Concepts

 1) Rugged          :---:---:---:---:---:---:---:  Delicate
 2) Excitable       :---:---:---:---:---:---:---:  Calm
 3) Uncomfortable   :---:---:---:---:---:---:---:  Comfortable
 4) Dominating      :---:---:---:---:---:---:---:  Submissive
 5) Thrifty         :---:---:---:---:---:---:---:  Indulgent
 6) Pleasant        :---:---:---:---:---:---:---:  Unpleasant
 7) Contemporary    :---:---:---:---:---:---:---:  Obsolete
 8) Organized       :---:---:---:---:---:---:---:  Unorganized
 9) Rational        :---:---:---:---:---:---:---:  Emotional
10) Youthful        :---:---:---:---:---:---:---:  Mature
11) Formal          :---:---:---:---:---:---:---:  Informal
12) Orthodox        :---:---:---:---:---:---:---:  Liberal
13) Complex         :---:---:---:---:---:---:---:  Simple
14) Colorless       :---:---:---:---:---:---:---:  Colorful
15) Modest          :---:---:---:---:---:---:---:  Vain

Stapel Scale

The Stapel scale is a unipolar rating scale with ten categories numbered from -5 to +5, without a neutral point (zero). This scale is usually presented vertically.

              SEARS

     +5                  +5
     +4                  +4
     +3                  +3
     +2                  +2X
     +1                  +1
HIGH QUALITY        POOR SERVICE
     -1                  -1
     -2                  -2
     -3                  -3
     -4X                 -4
     -5                  -5

The data obtained by using a Stapel scale can be analyzed in the same way as semantic differential data.

Some Unique Rating Scale Configurations (Figure 9.3)

Thermometer Scale
Instructions: Please indicate how much you like McDonald's hamburgers by coloring in the thermometer. Start at the bottom and color up to the temperature level that best indicates how strong your preference is.
Form:
  Like very much      100
                       75
                       50
                       25
  Dislike very much     0

Smiling Face Scale
Instructions: Please point to the face that shows how much you like the Barbie Doll. If you do not like the Barbie Doll at all, you would point to Face 1. If you liked it very much, you would point to Face 5.
Form:
  [Five faces, numbered 1 2 3 4 5]

Validity

Construct validity addresses the question of what construct or characteristic the scale is, in fact, measuring.
Construct validity includes convergent, discriminant, and nomological validity.

* Convergent validity is the extent to which the scale correlates positively with other measures of the same construct.
* Discriminant validity is the extent to which a measure does not correlate with other constructs from which it is supposed to differ.
* Nomological validity is the extent to which the scale correlates in theoretically predicted ways with measures of different but related constructs.

Questionnaire Definition

A questionnaire is a formalized set of questions for obtaining information from respondents.

Questionnaire Design Process (Fig. 10.1)

 1) Specify the Information Needed
 2) Specify the Type of Interviewing Method
 3) Determine the Content of Individual Questions
 4) Design the Question to Overcome the Respondent's Inability and Unwillingness to Answer
 5) Decide the Question Structure
 6) Determine the Question Wording
 7) Arrange the Questions in Proper Order
 8) Identify the Form and Layout
 9) Reproduce the Questionnaire
10) Eliminate Bugs by Pretesting

Choosing Question Structure: Unstructured Questions

Unstructured questions are open-ended questions that respondents answer in their own words.

  Do you intend to buy a new car within the next six months?
  __________________________________

Choosing Question Structure: Structured Questions

Structured questions specify the set of response alternatives and the response format. A structured question may be multiple-choice, dichotomous, or a scale.

Choosing Question Structure: Multiple-Choice Questions

In multiple-choice questions, the researcher provides a choice of answers and respondents are asked to select one or more of the alternatives given.

  Do you intend to buy a new car within the next six months?
  ____ Definitely will not buy
  ____ Probably will not buy
  ____ Undecided
  ____ Probably will buy
  ____ Definitely will buy
  ____ Other (please specify)

Choosing Question Structure: Dichotomous Questions

A dichotomous question has only two response alternatives: yes or no, agree or disagree, and so on. Often, the two alternatives of interest are supplemented by a neutral alternative, such as "no opinion," "don't know," "both," or "none."

  Do you intend to buy a new car within the next six months?
  _____ Yes
  _____ No
  _____ Don't know

Choosing Question Wording: Use Ordinary Words

  "Do you think the distribution of soft drinks is adequate?" (Incorrect)
  "Do you think soft drinks are readily available when you want to buy them?" (Correct)

Choosing Question Wording: Use Unambiguous Words

  In a typical month, how often do you shop in department stores?
  _____ Never
  _____ Occasionally
  _____ Sometimes
  _____ Often
  _____ Regularly
  (Incorrect)

  In a typical month, how often do you shop in department stores?
  _____ Less than once
  _____ 1 or 2 times
  _____ 3 or 4 times
  _____ More than 4 times
  (Correct)

Flow Chart for Questionnaire Design (Fig. 10.2)

[Figure: flowchart. Boxes recovered from the figure, in order: Introduction; Ownership of Store, Bank, and Other Charge Cards; Purchased Products in a Specific Department Store during the Last Two Months (Yes / No); Ever Purchased in a Department Store? (Yes / No); How Was Payment Made? (Credit / Cash / Other); Store Charge Card, Bank Charge Card, Other Charge Card; Intentions to Use Store, Bank, and Other Charge Cards.]

Pretesting

Pretesting refers to the testing of the questionnaire on a small sample of respondents to identify and eliminate potential problems.

* A questionnaire should not be used in the field survey without adequate pretesting.
* All aspects of the questionnaire should be tested, including question content, wording, sequence, form and layout, question difficulty, and instructions.
* The respondents for the pretest and for the actual survey should be drawn from the same population.
Pretests are best done by personal interviews, even if the actual survey is to be conducted by mail, telephone, or electronic means, because interviewers can observe respondents' reactions and attitudes.

Observational Forms: Department Store Project

* Who: Purchasers, browsers, males, females, parents with children, or children alone.
* What: Products/brands considered, products/brands purchased, size, price of package inspected, or influence of children or other family members.
* When: Day, hour, date of observation.
* Where: Inside the store, checkout counter, or type of department within the store.
* Why: Influence of price, brand name, package size, promotion, or family members on the purchase.
* Way: Personal observer disguised as sales clerk, undisguised personal observer, hidden camera, or obtrusive mechanical device.

Questionnaire Design Checklist (Table 10.1)

Step 1.  Specify the Information Needed
Step 2.  Type of Interviewing Method
Step 3.  Individual Question Content
Step 4.  Overcome Inability and Unwillingness to Answer
Step 5.  Choose Question Structure
Step 6.  Choose Question Wording
Step 7.  Determine the Order of Questions
Step 8.  Form and Layout
Step 9.  Reproduce the Questionnaire
Step 10. Pretest

Sample vs. Census (Table 11.1)

                                        Conditions Favoring the Use of
                                        Sample          Census
1. Budget                               Small           Large
2. Time available                       Short           Long
3. Population size                      Large           Small
4. Variance in the characteristic       Small           Large
5. Cost of sampling errors              Low             High
6. Cost of nonsampling errors           High            Low
7. Nature of measurement                Destructive     Nondestructive
8. Attention to individual cases        Yes             No

The Sampling Design Process (Fig. 11.1)

1) Define the Population
2) Determine the Sampling Frame
3) Select Sampling Technique(s)
4) Determine the Sample Size
5) Execute the Sampling Process

Define the Target Population

The target population is the collection of elements or objects that possess the information sought by the researcher and about which inferences are to be made.
The target population should be defined in terms of elements, sampling units, extent, and time.

* An element is the object about which or from which the information is desired, e.g., the respondent.
* A sampling unit is an element, or a unit containing the element, that is available for selection at some stage of the sampling process.
* Extent refers to the geographical boundaries.
* Time is the time period under consideration.

Sample Sizes Used in Marketing Research Studies (Table 11.2)

Type of Study                                             Minimum Size   Typical Range
Problem identification research (e.g., market potential)  500            1,000-2,500
Problem-solving research (e.g., pricing)                   200            300-500
Product tests                                              200            300-500
Test marketing studies                                     200            300-500
TV, radio, or print advertising
  (per commercial or ad tested)                            150            200-300
Test-market audits                                         10 stores      10-20 stores
Focus groups                                               2 groups       4-12 groups

Classification of Sampling Techniques (Fig. 11.2)

Sampling Techniques
  * Nonprobability Sampling Techniques
      * Convenience Sampling
      * Judgmental Sampling
      * Quota Sampling
      * Snowball Sampling
  * Probability Sampling Techniques
      * Simple Random Sampling
      * Systematic Sampling
      * Stratified Sampling
      * Cluster Sampling
      * Other Sampling Techniques

Data Preparation Process (Fig. 14.1)

1) Prepare Preliminary Plan of Data Analysis
2) Check Questionnaire
3) Edit
4) Code
5) Transcribe
6) Clean Data
7) Statistically Adjust the Data
8) Select Data Analysis Strategy

Selecting a Data Analysis Strategy (Fig. 14.5)

The data analysis strategy is shaped by:

* Earlier Steps (1, 2, & 3) of the Marketing Research Process
* Known Characteristics of the Data
* Properties of Statistical Techniques
* Background and Philosophy of the Researcher

A Classification of Univariate Techniques (Fig. 14.6)

Univariate Techniques
  * Metric Data
      * One Sample
          * t test
          * Z test
      * Two or More Samples
          * Independent
              * Two-Group t test
              * Z test
              * One-Way ANOVA
          * Related
              * Paired t test
  * Non-numeric Data
      * One Sample
          * Frequency
          * Chi-Square
          * K-S
          * Runs
          * Binomial
      * Two or More Samples
          * Independent
              * Chi-Square
              * Mann-Whitney
              * Median
              * K-S
              * K-W ANOVA
          * Related
              * Sign
              * Wilcoxon
              * McNemar
              * Chi-Square

A Classification of Multivariate Techniques (Fig. 14.7)

Multivariate Techniques
  * Dependence Technique
      * One Dependent Variable
          * Cross-Tabulation
          * Analysis of Variance and Covariance
          * Multiple Regression
          * Conjoint Analysis
      * More Than One Dependent Variable
          * Multivariate Analysis of Variance and Covariance
          * Canonical Correlation
          * Multiple Discriminant Analysis
  * Interdependence Technique
      * Variable Interdependence
          * Factor Analysis
      * Interobject Similarity
          * Cluster Analysis
          * Multidimensional Scaling

Frequency Distribution

In a frequency distribution, one variable is considered at a time. A frequency distribution for a variable produces a table of frequency counts, percentages, and cumulative percentages for all the values associated with that variable.

Statistics Associated with Frequency Distribution: Measures of Location

The mean, or average value, is the most commonly used measure of central tendency. The mean, $\bar{X}$, is given by

$$ \bar{X} = \sum_{i=1}^{n} X_i / n $$

where
  $X_i$ = observed values of the variable X
  $n$ = number of observations (sample size)

The mode is the value that occurs most frequently. It represents the highest peak of the distribution. The mode is a good measure of location when the variable is inherently categorical or has otherwise been grouped into categories.
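
A frequency distribution, together with the mean and mode defined above, can be sketched with the Python standard library. The `ratings` list is invented for illustration, and `frequency_table` is a hypothetical helper, not something from the slides.

```python
from collections import Counter
from statistics import mean, mode


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):
        percent = 100 * counts[value] / n
        cumulative += percent
        rows.append((value, counts[value], percent, cumulative))
    return rows


ratings = [2, 3, 3, 4, 5, 3, 6, 4]  # hypothetical responses, n = 8

for row in frequency_table(ratings):
    print(row)

x_bar = mean(ratings)        # sum of the X_i divided by n
most_common = mode(ratings)  # value with the highest frequency
print(x_bar, most_common)
```

For these eight values the sum is 30, so the mean is 3.75, and the mode is 3 because it occurs three times.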
Statistics Associated with Frequency Distribution: Measures of Location

The median of a sample is the middle value when the data are arranged in ascending or descending order. If the number of data points is even, the median is usually estimated as the midpoint between the two middle values, by adding the two middle values and dividing their sum by 2. The median is the 50th percentile.
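
The median rule above, including the even-count case, translates directly into code. The function below is a plain restatement of that rule; the sample lists are made up.

```python
def median(values):
    """Middle value of the sorted data; midpoint of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # odd count: the middle value
print(median([3, 1, 4, 2]))  # even count: midpoint of 2 and 3
```

The standard library's `statistics.median` follows the same convention and can be used instead.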
Statistics Associated with Frequency Distribution: Measures of Variability

The range measures the spread of the data. It is simply the difference between the largest and smallest values in the sample.

$$ \text{Range} = X_{\text{largest}} - X_{\text{smallest}} $$

The interquartile range is the difference between the 75th and 25th percentile. For a set of data points arranged in order of magnitude, the pth percentile is the value that has p% of the data points below it and (100 - p)% above it.
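
The range and interquartile range can be computed as follows. One caveat: software packages use several slightly different conventions for locating percentiles in a small sample, so the choice of `method="inclusive"` in `statistics.quantiles` is an assumption of this sketch, and other conventions may give slightly different quartiles. The data are hypothetical.

```python
from statistics import quantiles


def value_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)


def interquartile_range(values):
    """Difference between the 75th and 25th percentiles (inclusive method)."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical sample

print(value_range(data))
print(interquartile_range(data))
```

For this sample the range is 9 - 1 = 8, and with the inclusive method the quartiles fall on 3 and 7, giving an interquartile range of 4.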
Statistics Associated with Frequency Distribution: Measures of Variability

The variance is the mean squared deviation from the mean. The variance can never be negative. The standard deviation is the square root of the variance.

$$ s_x = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} $$

The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage, and is a unitless measure of relative variability.

$$ CV = s_x / \bar{X} $$
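
A short sketch of these variability measures, using the n - 1 divisor that appears in the standard deviation formula. The data values are hypothetical, and multiplying the coefficient of variation by 100 reflects the "expressed as a percentage" wording.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def coefficient_of_variation(values):
    """Standard deviation relative to the mean, as a percentage."""
    x_bar = sum(values) / len(values)
    return 100 * sample_std(values) / x_bar


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5

print(sample_variance(data))
print(sample_std(data))
print(coefficient_of_variation(data))
```

Here the squared deviations sum to 32, so the sample variance is 32 / 7. The functions agree with `statistics.variance` and `statistics.stdev`, which use the same divisor.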
Statistics Associated with Frequency Distribution: Measures of Shape

Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other. It can be thought of as the tendency for one tail of the distribution to be heavier than the other.

Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution. The kurtosis of a normal distribution is zero. If the kurtosis is positive, then the distribution is more peaked than a normal distribution. A negative value means that the distribution is flatter than a normal distribution.
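
The slides give no formulas for skewness or kurtosis, so the sketch below uses one common choice, the moment-based coefficients, as an assumption. Kurtosis is reported as excess kurtosis (the fourth standardized moment minus 3), which matches the convention above that a normal distribution has kurtosis zero. Statistical packages often apply small-sample corrections, so their output may differ slightly. The data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: the average of (x - mean) ** k."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** k for x in values) / n


def skewness(values):
    """Third standardized moment; 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

The symmetric sample has zero skewness and negative excess kurtosis (flatter than normal), while the sample with one large value has positive skewness, reflecting its heavier right tail.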
Skewness of a Distribution (Figure 15.2)

[Figure: two frequency curves, (a) Symmetric Distribution and (b) Skewed Distribution, each marking the positions of the mean, median, and mode.]
Steps Involved in Hypothesis Testing (Fig. 15.3)

1) Formulate H0 and H1
2) Select Appropriate Test
3) Choose Level of Significance, α
4) Collect Data and Calculate Test Statistic
5) Either:
     * Determine Probability Associated with Test Statistic, then Compare with Level of Significance, α
   or:
     * Determine Critical Value of Test Statistic TSCR, then Determine if TSCR falls into (Non) Rejection Region
6) Reject or Do Not Reject H0
7) Draw Marketing Research Conclusion

A Broad Classification of Hypothesis Tests (Figure 15.6)

Hypothesis Tests
  * Tests of Association
  * Tests of Differences
      * Distributions
      * Means
      * Proportions
      * Median / Rankings

Cross-Tabulation

While a frequency distribution describes one variable at a time, cross-tabulation describes two or more variables simultaneously. Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values, e.g., Table 15.3.

Gender and Internet Usage (Table 15.3)

                       Gender
Internet Usage     Male    Female    Row Total
Light (1)            5       10         15
Heavy (2)           10        5         15
Column Total        15       15

Internet Usage by Gender (Table 15.4)

                       Gender
Internet Usage     Male      Female
Light              33.3%     66.7%
Heavy              66.7%     33.3%
Column total       100%      100%

Gender by Internet Usage (Table 15.5)

               Internet Usage
Gender      Light      Heavy
Male        33.3%      66.7%
Female      66.7%      33.3%
Total       100.0%     100.0%

Introduction of a Third Variable in Cross-Tabulation (Fig. 15.7)

Original Two Variables
  * Some Association between the Two Variables
      * Introduce Third Variable
          * Refined Association between the Two Variables
          * No Association between the Two Variables
          * No Change in the Initial Pattern
  * No Association between the Two Variables
      * Introduce Third Variable
          * Some Association between the Two Variables

Purchase of Fashion Clothing by Marital Status (Table 15.6)

                                  Current Marital Status
Purchase of Fashion Clothing      Married     Unmarried
High                              31%         52%
Low                               69%         48%
Column                            100%        100%
Number of respondents             700         300

Purchase of Fashion Clothing by Marital Status (Table 15.7)

                                              Sex
                                  Male                      Female
Purchase of Fashion Clothing      Married   Not Married     Married   Not Married
High                              35%       40%             25%       60%
Low                               65%       60%             75%       40%
Column totals                     100%      100%            100%      100%
Number of cases                   400       120             300       180

Eating Frequently in Fast-Food Restaurants by Family Size (Table 15.12)

                                            Family Size
Eat Frequently in Fast-Food Restaurants     Small     Large
Yes                                         65%       65%
No                                          35%       35%
Column totals                               100%      100%
Number of cases                             500       500

Eating Frequently in Fast-Food Restaurants by Family Size & Income (Table 15.13)

                                                        Income
                                            Low                  High
                                            Family size          Family size
Eat Frequently in Fast-Food Restaurants     Small     Large      Small     Large
Yes                                         65%       65%        65%       65%
No                                          35%       35%        35%       35%
Column totals                               100%      100%       100%      100%
Number of respondents                       250       250        250       250

Chi-square Distribution (Figure 15.8)

[Figure: chi-square distribution curve. A critical value on the χ² axis separates the "Do Not Reject H0" region from the "Reject H0" region.]

Statistics Associated with Cross-Tabulation: Chi-Square

The chi-square statistic (χ²) is used to test the statistical significance of the observed association in a cross-tabulation.
The expected frequency for each cell can be calculated by using a simple formula:

$f_e = \frac{n_r n_c}{n}$

where
$n_r$ = total number in the row
$n_c$ = total number in the column
$n$ = total sample size

Statistics Associated with Cross-Tabulation: Chi-Square
4-88
For the data in Table 15.3, the expected frequencies for the cells, going from left to right and from top to bottom, are:

$\frac{15 \times 15}{30} = 7.50 \qquad \frac{15 \times 15}{30} = 7.50 \qquad \frac{15 \times 15}{30} = 7.50 \qquad \frac{15 \times 15}{30} = 7.50$

Then the value of χ² is calculated as follows:

$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$
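The expected-frequency and chi-square formulas above can be checked numerically. The following is a minimal sketch in plain Python applied to the cell counts of Table 15.3; the function name `chi_square` is illustrative and is not part of SPSS or of the original slides.

```python
def chi_square(observed):
    """Return (expected frequencies, chi-square) for a table of counts.

    `observed` is a list of rows, each row a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # f_e = (row total * column total) / total sample size
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # chi-square = sum over all cells of (f_o - f_e)^2 / f_e
    chi2 = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    return expected, chi2


# Cell counts from Table 15.3 (rows: light, heavy usage; columns: male, female).
table_15_3 = [[5, 10], [10, 5]]
expected, chi2 = chi_square(table_15_3)
print(expected)        # every cell is 7.5
print(round(chi2, 3))  # 3.333
```

Whether this value is significant still depends on comparing it with the critical value of the chi-square distribution for the appropriate degrees of freedom, which this sketch does not do.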
Statistics Associated with Cross-Tabulation: Chi-Square
For the data in Table 15.3, the value of χ² is calculated as:

$\chi^2 = \frac{(5 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(10 - 7.5)^2}{7.5} + \frac{(5 - 7.5)^2}{7.5} = 0.833 + 0.833 + 0.833 + 0.833 = 3.333$
4-89

Statistics Associated with Cross-Tabulation: Lambda Coefficient
4-90
Asymmetric lambda measures the percentage improvement in predicting the value of the dependent variable, given the value of the independent variable. Lambda varies between 0 and 1. A value of 0 means no improvement in prediction. A value of 1 indicates that the prediction can be made without error. This happens when each independent variable category is associated with a single category of the dependent variable. Asymmetric lambda is computed for each of the variables (treating it as the dependent variable). A symmetric lambda is also computed, which is a kind of average of the two asymmetric values. The symmetric lambda does not make an assumption about which variable is dependent. It measures the overall improvement when prediction is done in both directions.

A Classification of Hypothesis Testing Procedures for Examining Differences
Fig. 15.9
4-91
Hypothesis Tests
  Parametric Tests (Metric Tests)
    One Sample
      * t test
      * Z test
    Two or More Samples
      Independent Samples
        * Two-Group t test
        * Z test
      Paired Samples
        * Paired t test
  Non-parametric Tests (Nonmetric Tests)
    One Sample
      * Chi-Square
      * K-S
      * Runs
      * Binomial
    Two or More Samples
      Independent Samples
        * Chi-Square
        * Mann-Whitney
        * Median
        * K-S
      Paired Samples
        * Sign
        * Wilcoxon
        * McNemar
        * Chi-Square
4-92

Non-Parametric Tests
Nonparametric tests are used when the independent variables are nonmetric. Like parametric tests, nonparametric tests are available for testing variables from one sample, two independent samples, or two related samples.
Non-Parametric Tests: One Sample
Sometimes the researcher wants to test whether the observations for a particular variable could reasonably have come from a particular distribution, such as the normal, uniform, or Poisson distribution.
4-93
The Kolmogorov-Smirnov (K-S) one-sample test is one such goodness-of-fit test. The K-S test compares the cumulative distribution function for a variable with a specified distribution. $A_i$ denotes the cumulative relative frequency for each category of the theoretical (assumed) distribution, and $O_i$ the comparable value of the sample frequency. The K-S test is based on the maximum value of the absolute difference between $A_i$ and $O_i$. The test statistic is

$K = \max |A_i - O_i|$

Non-Parametric Tests: One Sample
4-94
The chi-square test can also be performed on a single variable from one sample. In this context, the chi-square serves as a goodness-of-fit test.
The runs test is a test of randomness for dichotomous variables. This test is conducted by determining whether the order or sequence in which observations are obtained is random.
The binomial test is also a goodness-of-fit test for dichotomous variables. It tests the goodness of fit of the observed number of observations in each category to the number expected under a specified binomial distribution.

Non-Parametric Tests: Two Independent Samples
4-95
When the difference in the location of two populations is to be compared based on observations from two independent samples, and the variable is measured on an ordinal scale, the Mann-Whitney U test can be used. In the Mann-Whitney U test, the two samples are combined and the cases are ranked in order of increasing size. The test statistic, U, is computed as the number of times a score from sample or group 1 precedes a score from group 2. If the samples are from the same population, the distribution of scores from the two groups in the rank list should be random.
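The counting rule for U described above can be sketched directly. This is a minimal illustration in plain Python, not the SPSS procedure; the function name `mann_whitney_u` is hypothetical, and counting a tie as half a precedence is an assumption that the slides do not spell out.

```python
def mann_whitney_u(group1, group2):
    """Count how often a score from group 1 precedes a score from group 2.

    In a list ranked by increasing size, a group-1 score x precedes a
    group-2 score y when x < y. Assumption: a tie counts as 0.5.
    """
    u = 0.0
    for x in group1:
        for y in group2:
            if x < y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Every group-1 score precedes every group-2 score: U takes its maximum, n1 * n2.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))   # 9.0
# The scores interleave, so U lies between 0 and n1 * n2.
print(mann_whitney_u([1, 4], [2, 3]))         # 2.0
```

Counting in the other direction gives the complementary value, so the two counts always add up to the product of the two sample sizes.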
An extreme value of U would indicate a nonrandom pattern, pointing to the inequality of the two groups. For samples of fewer than 30, the exact significance level for U is computed. For larger samples, U is transformed into a normally distributed z statistic. This z can be corrected for ties within ranks.
4-96

SPSS Windows
The main program in SPSS is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics. If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used. The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.
4-97

SPSS Windows
To select these procedures click:
Analyze>Descriptive Statistics>Frequencies
Analyze>Descriptive Statistics>Descriptives
Analyze>Descriptive Statistics>Explore
The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed. To select these procedures click:
Analyze>Descriptive Statistics>Crosstabs
4-98

SPSS Windows
The major program for conducting parametric tests in SPSS is COMPARE MEANS. This program can be used to conduct t tests on one sample or independent or paired samples. To select these procedures using SPSS for Windows click:
Analyze>Compare Means>Means …
Analyze>Compare Means>One-Sample T Test …
Analyze>Compare Means>Independent-Samples T Test …
Analyze>Compare Means>Paired-Samples T Test …
4-99

SPSS Windows
The nonparametric tests discussed in this chapter can be conducted using NONPARAMETRIC TESTS.
To select these procedures using SPSS for Windows click:
Analyze>Nonparametric Tests>Chi-Square …
Analyze>Nonparametric Tests>Binomial …
Analyze>Nonparametric Tests>Runs …
Analyze>Nonparametric Tests>1-Sample K-S …
Analyze>Nonparametric Tests>2 Independent Samples …
Analyze>Nonparametric Tests>2 Related Samples …
4-100

Product Moment Correlation
The product moment correlation, r, summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y. It is an index used to determine whether a linear or straight-line relationship exists between X and Y. As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient. It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.
4-101

Product Moment Correlation
r varies between -1.0 and +1.0. The correlation coefficient between two variables will be the same regardless of their underlying units of measurement.

Statistics Associated with Bivariate Regression Analysis
Regression coefficient. The estimated parameter b is usually referred to as the nonstandardized regression coefficient.
Scattergram. A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations.
4-102
Standard error of estimate. This statistic, SEE, is the standard deviation of the actual Y values from the predicted Y values.
Standard error. The standard deviation of b, $SE_b$, is called the standard error.

Statistics Associated with Bivariate Regression Analysis
Standardized regression coefficient. Also termed the beta coefficient or beta weight, this is the slope obtained by the regression of Y on X when the data are standardized.
Sum of squared errors. The distances of all the points from the regression line are squared and added together to arrive at the sum of squared errors, which is a measure of total error, $\sum e_j^2$.
t statistic. A t statistic with n - 2 degrees of freedom can be used to test the null hypothesis that no linear relationship exists between X and Y, or $H_0: \beta_1 = 0$, where $t = \frac{b}{SE_b}$.
4-103

Conducting Bivariate Regression Analysis: Plot the Scatter Diagram
A scatter diagram, or scattergram, is a plot of the values of two variables for all the cases or observations. The most commonly used technique for fitting a straight line to a scattergram is the least-squares procedure. In fitting the line, the least-squares procedure minimizes the sum of squared errors, $\sum e_j^2$.
4-104
4-105

Conducting Bivariate Regression Analysis
Fig. 17.2
Plot the Scatter Diagram
Formulate the General Model
Estimate the Parameters
Estimate Standardized Regression Coefficients
Test for Significance
Determine the Strength and Significance of Association
Check Prediction Accuracy
Examine the Residuals
Cross-Validate the Model
4-106

Multiple Regression
The general form of the multiple regression model is as follows:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + e$

which is estimated by the following equation:
$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$

As before, the coefficient a represents the intercept, but the b's are now the partial regression coefficients.
4-107

Multicollinearity
Multicollinearity arises when intercorrelations among the predictors are very high. Multicollinearity can result in several problems, including:
The partial regression coefficients may not be estimated precisely. The standard errors are likely to be high.
The magnitudes as well as the signs of the partial regression coefficients may change from sample to sample.
It becomes difficult to assess the relative importance of the independent variables in explaining the variation in the dependent variable.
Predictor variables may be incorrectly included or removed in stepwise regression.
4-108

SPSS Windows
The CORRELATE program computes Pearson product moment correlations and partial correlations with significance levels. Univariate statistics, covariance, and cross-product deviations may also be requested. Significance levels are included in the output. To select these procedures using SPSS for Windows click:
Analyze>Correlate>Bivariate …
Analyze>Correlate>Partial …
Scatterplots can be obtained by clicking:
Graphs>Scatter …>Simple>Define
REGRESSION calculates bivariate and multiple regression equations, associated statistics, and plots. It allows for an easy examination of residuals. This procedure can be run by clicking:
Analyze>Regression>Linear …
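The correlation and bivariate regression statistics defined in this part of the deck can be computed by hand for a small data set. The sketch below is plain Python, independent of SPSS. It assumes the usual textbook formulas, which the slides name but do not write out: SEE as the square root of the sum of squared errors divided by n - 2, and the standard error of b as SEE divided by the square root of the sum of squared deviations of X. The function name `bivariate_regression` and the data are illustrative.

```python
import math


def bivariate_regression(x, y):
    """Least-squares fit of Y on X with the associated statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    r = sxy / math.sqrt(sxx * syy)   # product moment correlation
    b = sxy / sxx                    # nonstandardized regression coefficient
    a = mean_y - b * mean_x          # intercept
    # Sum of squared errors around the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    see = math.sqrt(sse / (n - 2))   # standard error of estimate (assumed formula)
    se_b = see / math.sqrt(sxx)      # standard error of b (assumed formula)
    t = b / se_b                     # t statistic with n - 2 degrees of freedom
    return {"r": r, "a": a, "b": b, "sse": sse, "see": see, "se_b": se_b, "t": t}


# Illustrative data only; the fit must not be perfect, or SE_b would be zero.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = bivariate_regression(x, y)
print(fit)

# r does not depend on the units of measurement: rescaling X leaves it unchanged.
rescaled = bivariate_regression([xi * 100 for xi in x], y)
print(abs(rescaled["r"] - fit["r"]) < 1e-12)
```

Rescaling X changes b and its standard error by the same factor, so the t statistic is unaffected as well.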
Similarities and Differences between ANOVA, Regression, and Discriminant Analysis
Table 18.1
4-109
                                         ANOVA          REGRESSION     DISCRIMINANT ANALYSIS
Similarities
  Number of dependent variables          One            One            One
  Number of independent variables        Multiple       Multiple       Multiple
Differences
  Nature of the dependent variables      Metric         Metric         Categorical
  Nature of the independent variables    Categorical    Metric         Metric
4-110

Discriminant Analysis
Discriminant analysis is a technique for analyzing data when the criterion or dependent variable is categorical and the predictor or independent variables are interval in nature. The objectives of discriminant analysis are as follows:
Development of discriminant functions, or linear combinations of the predictor or independent variables, which will best discriminate between the categories of the criterion or dependent variable (groups).
Examination of whether significant differences exist among the groups, in terms of the predictor variables.
Determination of which predictor variables contribute to most of the intergroup differences.
Classification of cases to one of the groups based on the values of the predictor variables.
Evaluation of the accuracy of classification.
4-111

Statistics Associated with Discriminant Analysis
Canonical correlation. Canonical correlation measures the extent of association between the discriminant scores and the groups. It is a measure of association between the single discriminant function and the set of dummy variables that define the group membership.
Centroid. The centroid is the mean values for the discriminant scores for a particular group. There are as many centroids as there are groups, as there is one for each group. The means for a group on all the functions are the group centroids.
Classification matrix. Sometimes also called confusion or prediction matrix, the classification matrix contains the number of correctly classified and misclassified cases.
4-112

Statistics Associated with Discriminant Analysis
Discriminant function coefficients. The discriminant function coefficients (unstandardized) are the multipliers of variables, when the variables are in the original units of measurement.
Discriminant scores. The unstandardized coefficients are multiplied by the values of the variables. These products are summed and added to the constant term to obtain the discriminant scores.
Eigenvalue. For each discriminant function, the Eigenvalue is the ratio of between-group to within-group sums of squares. Large Eigenvalues imply superior functions.
4-113

Conducting Discriminant Analysis
Fig. 18.1
Formulate the Problem
Estimate the Discriminant Function Coefficients
Determine the Significance of the Discriminant Function
Interpret the Results
Assess Validity of Discriminant Analysis
4-114

SPSS Windows
The DISCRIMINANT program performs both two-group and multiple discriminant analysis. To select this procedure using SPSS for Windows click:
Analyze>Classify>Discriminant …
4-115

Factor Analysis
Factor analysis is a general name denoting a class of procedures primarily used for data reduction and summarization. Factor analysis is an interdependence technique in that an entire set of interdependent relationships is examined without making the distinction between dependent and independent variables. Factor analysis is used in the following circumstances:
To identify underlying dimensions, or factors, that explain the correlations among a set of variables.
To identify a new, smaller set of uncorrelated variables to replace the original set of correlated variables in subsequent multivariate analysis (regression or discriminant analysis).
To identify a smaller set of salient variables from a larger set for use in subsequent multivariate analysis.
4-116

Factor Analysis Model
It is possible to select weights or factor score coefficients so that the first factor explains the largest portion of the total variance.
Then a second set of weights can be selected, so that the second factor accounts for most of the residual variance, subject to being uncorrelated with the first factor. This same principle could be applied to selecting additional weights for the additional factors.
4-117

Conducting Factor Analysis
Fig 19.1
Problem Formulation
Construction of the Correlation Matrix
Method of Factor Analysis
Determination of Number of Factors
Rotation of Factors
Interpretation of Factors
Calculation of Factor Scores
Determination of Model Fit
Selection of Surrogate Variables

Conducting Factor Analysis
4-118
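The weight-selection principle of the factor analysis model (a first factor that explains the largest share of the variance, then a second one that is uncorrelated with the first) can be illustrated in a principal-components style. The sketch below is an assumption-laden illustration, not the SPSS FACTOR procedure: it uses plain-Python power iteration with deflation on a made-up 3 x 3 correlation matrix, and the names `power_iteration`, `deflate`, and `corr` are hypothetical.

```python
import math


def power_iteration(matrix, start, iterations=500):
    """Return (eigenvalue, unit eigenvector) for the largest eigenvalue
    of a symmetric matrix, found by repeated multiplication."""
    v = start[:]
    for _ in range(iterations):
        w = [sum(m * c for m, c in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    mv = [sum(m * c for m, c in zip(row, v)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
    return eigenvalue, v


def deflate(matrix, eigenvalue, vector):
    """Remove the variance already explained by one factor."""
    n = len(matrix)
    return [[matrix[i][j] - eigenvalue * vector[i] * vector[j]
             for j in range(n)] for i in range(n)]


# Made-up correlation matrix: variables 1 and 2 are strongly correlated.
corr = [[1.0, 0.8, 0.1],
        [0.8, 1.0, 0.1],
        [0.1, 0.1, 1.0]]
start = [1.0, 2.0, 3.0]  # arbitrary starting weights

lam1, w1 = power_iteration(corr, start)
lam2, w2 = power_iteration(deflate(corr, lam1, w1), start)

# Share of the total variance (3 standardized variables) explained by each factor.
print(lam1 / 3, lam2 / 3)
# The two weight vectors are orthogonal, so the component scores are uncorrelated.
print(sum(a * b for a, b in zip(w1, w2)))
```

The variance explained by each successive set of weights can only decrease, which is why the first factor always carries the largest share.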
Determine the Number of Factors
A Priori Determination. Sometimes, because of prior knowledge, the researcher knows how many factors to expect and thus can specify the number of factors to be extracted beforehand.
Determination Based on Eigenvalues. In this approach, only factors with Eigenvalues greater than 1.0 are retained. An Eigenvalue represents the amount of variance associated with the factor. Hence, only factors with a variance greater than 1.0 are included. Factors with variance less than 1.0 are no better than a single variable, since, due to standardization, each variable has a variance of 1.0. If the number of variables is less than 20, this approach will result in a conservative number of factors.
4-119

SPSS Windows
To select this procedure using SPSS for Windows click:
Analyze>Data Reduction>Factor …
4-120

Cluster Analysis
Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups called clusters. Objects in each cluster tend to be similar to each other and dissimilar to objects in the other clusters. Cluster analysis is also called classification analysis, or numerical taxonomy.
Both cluster analysis and discriminant analysis are concerned with classification. However, discriminant analysis requires prior knowledge of the cluster or group membership for each object or case included, to develop the classification rule. In contrast, in cluster analysis there is no a priori information about the group or cluster membership for any of the objects. Groups or clusters are suggested by the data, not defined a priori.
4-121

An Ideal Clustering Situation
Fig. 20.1
[Figure: scatter plot of objects, with axes Variable 1 and Variable 2.]
4-122

Conducting Cluster Analysis
Fig. 20.3
Formulate the Problem
Select a Distance Measure
Select a Clustering Procedure
Decide on the Number of Clusters
Interpret and Profile Clusters
Assess the Validity of Clustering
4-123

A Classification of Clustering Procedures
Fig. 20.4
Clustering Procedures
  Hierarchical
    Agglomerative
      Linkage Methods
        Single
        Complete
        Average
      Variance Methods
        Ward’s Method
      Centroid Methods
    Divisive
  Nonhierarchical
    Sequential Threshold
    Parallel Threshold
    Optimizing Partitioning

Conducting Cluster Analysis
4-124
Select a Clustering Procedure: Hierarchical
Hierarchical clustering is characterized by the development of a hierarchy or tree-like structure. Hierarchical methods can be agglomerative or divisive.
Agglomerative clustering starts with each object in a separate cluster. Clusters are formed by grouping objects into bigger and bigger clusters. This process is continued until all objects are members of a single cluster.
Divisive clustering starts with all the objects grouped in a single cluster. Clusters are divided or split until each object is in a separate cluster.
Agglomerative methods are commonly used in marketing research. They consist of linkage methods, error sums of squares or variance methods, and centroid methods.

Conducting Cluster Analysis
4-125
Select a Clustering Procedure: Linkage Method
The single linkage method is based on minimum distance, or the nearest neighbor rule. At every stage, the distance between two clusters is the distance between their two closest points (see Figure 20.5).
The complete linkage method is similar to single linkage, except that it is based on the maximum distance or the furthest neighbor approach. In complete linkage, the distance between two clusters is calculated as the distance between their two furthest points.
The average linkage method works similarly.
However, in this method, the distance between two clusters is defined as the average of the distances between all pairs of objects, where one member of the pair is from each of the clusters (Figure 20.5).
4-126

Linkage Methods of Clustering
Fig. 20.5
[Figure: three panels, each showing Cluster 1 and Cluster 2. Single linkage: minimum distance. Complete linkage: maximum distance. Average linkage: average distance.]
4-127

Other Agglomerative Clustering Methods
Fig. 20.6
[Figure: two panels. Ward’s procedure and centroid method.]
4-128

SPSS Windows
To select these procedures using SPSS for Windows click:
Analyze>Classify>Hierarchical Cluster …
Analyze>Classify>K-Means Cluster …
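The three linkage rules described above differ only in how the pairwise distances between two clusters are summarized. The following is a minimal sketch in plain Python, assuming Euclidean distance (the slides leave the distance measure as a separate choice); the function names and the two small clusters are illustrative and do not reproduce the SPSS procedures.

```python
import math
from itertools import product


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pairwise_distances(cluster_a, cluster_b):
    """Distances for all pairs with one member taken from each cluster."""
    return [euclidean(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    # Nearest neighbor rule: distance between the two closest points.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    # Furthest neighbor rule: distance between the two furthest points.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_linkage(cluster_a, cluster_b):
    # Average of the distances between all cross-cluster pairs.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


cluster_1 = [(0, 0), (0, 1)]
cluster_2 = [(3, 0), (4, 0)]
print(single_linkage(cluster_1, cluster_2))    # 3.0
print(complete_linkage(cluster_1, cluster_2))
print(average_linkage(cluster_1, cluster_2))
```

For any pair of clusters the average linkage distance lies between the single and complete linkage distances, which is one reason it is often seen as a compromise between the two.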