
Differences

By Hair, J.F., Bush, R.P., Ortinau, D.J.

Edited by Paul Ducham

SAMPLE GROUPS

To split the sample into two groups so you can compare them, you can use the options under the

Data pull-down menu. For example, to compare the customers of the Santa Fe Grill and the

customers of Jose's Southwestern Café, the click-through sequence is: DATA → SPLIT FILE.

Click on Compare Groups. Now highlight the fourth screening question (Favorite Mexican

restaurant, x_s4) and move it into the Groups Based on: window, and then click OK. Your

results will now be computed for each restaurant separately. The same procedure can be used

with any variable. To do so, you insert the variable of choice into the Groups Based on:

window, and then click OK. A word of caution, however, is that until you remove this instruction

all data analysis will be based on separate groups as defined by the Groups Based on: window.
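The Compare Groups logic can be sketched in a few lines of Python. This is a minimal illustration, not SPSS output; the record layout and the field names (x_s4, satisfaction) are hypothetical stand-ins for the Santa Fe Grill database:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey records; x_s4 codes the favorite restaurant
# (1 = Santa Fe Grill, 0 = Jose's Southwestern Café).
records = [
    {"x_s4": 1, "satisfaction": 5},
    {"x_s4": 1, "satisfaction": 6},
    {"x_s4": 0, "satisfaction": 4},
    {"x_s4": 0, "satisfaction": 7},
]

# Split the sample by the screening variable, then analyze each group separately.
groups = defaultdict(list)
for record in records:
    groups[record["x_s4"]].append(record["satisfaction"])

for code, values in sorted(groups.items()):
    print(code, mean(values))
```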

SMALLER SUBSET

Sometimes you may wish to select a smaller subset of your total sample to analyze. This can be

done using the Select Cases option under the Data pull-down menu. For example, to select

customers from only the Santa Fe Grill, the click-through sequence is DATA → SELECT

CASES → IF CONDITION IS SATISFIED → IF. Next, highlight x_s4 - Favorite Mexican

restaurant and move it into the window; click the = sign and then 1. This instructs the SPSS

software to select only questionnaires coded 1 in the x_s4 column (the fourth screening question

on the survey), which is the Santa Fe Grill. If you wanted to analyze only the Jose's

Southwestern Café respondents, then you would do the same except after the = sign, put a 0.
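The Select Cases filter amounts to keeping only the rows that satisfy the condition. A sketch in the same hypothetical record layout:

```python
# Hypothetical records in the same layout as the split-file example;
# x_s4 == 1 keeps only questionnaires coded for the Santa Fe Grill.
records = [
    {"x_s4": 1, "satisfaction": 5},
    {"x_s4": 0, "satisfaction": 4},
    {"x_s4": 1, "satisfaction": 6},
]

santa_fe = [record for record in records if record["x_s4"] == 1]
print(len(santa_fe))  # only the rows coded 1 remain
```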

MEAN

The mean is the average value within the distribution and is the most commonly used measure of

central tendency. The mean tells us, for example, the average number of cups of coffee the

typical student may drink during finals to stay awake. The mean can be calculated when the data

scale is either interval or ratio. Generally, the data will show some degree of central tendency,

with most of the responses distributed close to the mean.

The mean is a very robust measure of central tendency. It is fairly insensitive to data values being

added or deleted. The mean can be subject to distortion, however, if extreme values are included

in the distribution. For example, suppose you ask four students how many cups of coffee they


drink in a single day. Respondent answers are as follows: Respondent A = 1 cup; Respondent B =

10 cups; Respondent C = 5 cups; and Respondent D = 6 cups. Let's also assume that we know

that respondents A and B are males and respondents C and D are females and we want to

compare consumption of coffee between males and females. Looking at the males first

(Respondents A and B), we calculate the mean number of cups to be 5.5 ((1 + 10)/2 = 5.5).

Similarly, looking at the females next (Respondents C and D), we calculate the mean number of

cups to be 5.5 ((5 + 6)/2 = 5.5). If we look only at the mean number of cups of coffee

consumed by males and females, we would conclude there are no differences in the two groups.

If we consider the underlying distribution, however, we must conclude there are some

differences and the mean in fact distorts our understanding of coffee consumption patterns

among males and females.
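The distortion is easy to verify with Python's statistics module, using the four responses above:

```python
from statistics import mean

males = [1, 10]    # Respondents A and B (cups of coffee)
females = [5, 6]   # Respondents C and D

# The two group means are identical...
print(mean(males), mean(females))    # 5.5 5.5

# ...but the spreads are very different, which the mean alone hides.
print(max(males) - min(males), max(females) - min(females))    # 9 1
```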

MODE

The mode is the value that appears in the distribution most often. For example, the average

number of cups of coffee students drink per day during finals may be 5 (the mean), while the

number of cups of coffee that most students drink is only 3 (the mode). The mode is the value

that represents the highest peak in the distribution's graph. The mode is especially useful as a

measure for data that have been somehow grouped into categories. The mode of the data

distribution in Exhibit 15.2 is "Occasionally" because when you look in the Frequency column you

will see the largest number of responses is 111 for the "Occasionally" label, which has a value of

3.
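A quick sketch with Python's statistics module, using hypothetical coded responses in which 3 ("Occasionally") appears most often:

```python
from statistics import mode

# Hypothetical coded responses on the 5-point frequency scale;
# 3 ("Occasionally") occurs most often, so it is the mode.
responses = [3, 2, 3, 4, 3, 5, 1, 3, 2]
print(mode(responses))    # 3
```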


MEDIAN

The median is the middle value of the distribution when the distribution is ordered in either an

ascending or a descending sequence. For example, if you interviewed a sample of students to

determine their coffee-drinking patterns during finals, you might find that the median number of

cups of coffee consumed is 4. The number of cups of coffee consumed above and below this

number would be the same (the median number is the exact middle of the distribution). If the

number of data observations is even, the median is generally considered to be the average of the

two middle values. If there are an odd number of observations, the median is the middle value.


The median is especially useful as a measure of central tendency for ordinal data and for data

that is skewed to either the right or left. For example, income data is skewed to the right because

there is no upper limit on income.
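Both rules, the middle value for an odd count and the average of the two middle values for an even count, are easy to confirm with Python's statistics module:

```python
from statistics import median

# Odd number of observations: the middle value.
print(median([2, 3, 4, 6, 9]))    # 4

# Even number of observations: the average of the two middle values.
print(median([2, 3, 5, 9]))       # 4.0
```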

Each measure of central tendency describes a distribution in its own manner, and each measure

has its own strengths and weaknesses. For nominal data, the mode is the best measure. For

ordinal data, the median is generally best. For interval or ratio data, the mean is generally used. If

there are extreme values within the interval or ratio data, however, the mean can be distorted. In

those cases, the median and the mode should be considered. SPSS and other statistical software

packages are designed to perform such types of analysis.

MEASURES OF CENTRAL TENDENCY

The Santa Fe Grill database can be used with the SPSS software to calculate measures of central

tendency. The SPSS click-through sequence is ANALYZE → DESCRIPTIVE STATISTICS →

FREQUENCIES. Let's use X25 - Frequency of Eating as a variable to examine. Click on X25 to

highlight it and then on the arrow box to move it into the Variables window for your analysis. Next

open the Statistics box and click on Mean, Median, and Mode, and then Continue and OK.

Recall that if you want to create charts, open the Charts box. Your choices are Bar, Pie, and

Histograms. For the Format box we will use the defaults, so click on OK to execute the program.

The dialog boxes for this sequence are shown in Exhibit 15.1.

Let's look at the output for the measures of central tendency shown in Exhibit 15.2. In the

Statistics table we see the mean is 3.24, the median is 3.00, and the mode is 3. Recall that this

variable is measured on a 5-point scale, with lower numbers indicating lower frequency of

patronage and larger numbers indicating higher frequency. The three measures of central

tendency can all be different within the same distribution, as described above in the coffee-drinking example. But it also is possible that all three measures can be the same. In our example

here the median and the mode are the same, but the mean is different.

RANGE

The range defines the spread of the data. It is the distance between the smallest and largest values

of the variable. Another way to think about it is that the range identifies the endpoints of the

distribution of values. For variable X25 - Frequency of Eating, the range is the difference

between the response category 5 (largest value) and response category 1 (smallest value); that is,

the range is 4. In this example, since we defined a narrow range of response categories in our

survey, the range doesn't tell us much. However, many questions have a much wider range. For

example, if we asked how often in a month respondents rent DVDs, or how much they would

pay to buy a DVD player that also records songs, the range would be quite informative. In this

case, the respondents, not the researchers, would be defining the range by their answers. For this

reason, the range is more often used to describe the variability of open-ended questions such as

our DVD example. For variable X25 - Frequency of Eating, the range is calculated as the

distance between the largest and smallest values in the set of responses and equals 4 (5 - 1 = 4).


STANDARD DEVIATION

The estimated standard deviation describes the average distance of the distribution values from

the mean. The difference between a particular response and the distribution mean is called a

deviation. Since the mean of a distribution is a measure of central tendency, there should be

about as many values above the mean as there are below it (particularly if the distribution is

symmetrical). Consequently, if we subtracted each value in a distribution from the mean and

added them up, the result would be close to zero (the positive and negative results would cancel

each other out).

The solution to this difficulty is to square the individual deviations before we add them up

(squaring a negative number produces a positive result). To calculate the estimated standard

deviation, we use the formula below.
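The formula itself does not survive in this copy. In standard notation, the estimated standard deviation the text describes (sum of squared deviations, divided by n - 1, then the square root) is:

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```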

Once the sum of the squared deviations is determined, it is divided by the number of respondents

minus 1. The number 1 is subtracted from the number of respondents to help produce an

unbiased estimate of the standard deviation. The result of dividing the sum of the squared

deviations is the average squared deviation. To convert the result to the same units of measure as

the mean, we take the square root of the answer. This produces the estimated standard deviation

of the distribution. Sometimes the average squared deviation is also used as a measure of

dispersion for a distribution. The average squared deviation, called the variance, is used in a

number of statistical processes.

Since the estimated standard deviation is the square root of the average squared deviations, it

represents the average distance of the values in a distribution from the mean. If the estimated

standard deviation is large, the responses in a distribution of numbers do not fall very close to the

mean of the distribution. If the estimated standard deviation is small, you know that the

distribution values are close to the mean.

Another way to think about the estimated standard deviation is that its size tells you something

about the level of agreement among the respondents when they answered a particular question.

For example, in the Santa Fe Grill database, respondents were asked to rate the restaurant on the

friendliness and knowledge of its employees (X12 and X19). We will use the SPSS program later

to examine the standard deviations for these questions.

Together with the measures of central tendency, these descriptive statistics can reveal a lot about

the distribution of a set of numbers representing the answers to an item on a questionnaire. Often,

however, marketing researchers are interested in more detailed questions that involve more than

one variable at a time.

MEASURES OF DISPERSION

The Santa Fe Grill database can be used with the SPSS software to calculate measures of

dispersion, just as we did with the measures of central tendency. Note that to calculate the

measures of dispersion we will be using the database with a sample size of 405 so we have

eliminated all respondents with missing data. The SPSS click-through sequence is ANALYZE →

DESCRIPTIVE STATISTICS → FREQUENCIES. Let's use X22 - Satisfaction as a variable

to examine. Click on X22 to highlight it and then on the arrow box to move X22 to the Variables

box. Next open the Statistics box, go to the Dispersion box in the lower-left-hand corner, and

click on Standard deviation, Variance, Range, Minimum and Maximum, and then Continue. If

you would like to create charts, then open the Charts box. Your choices are Bar, Pie, and

Histograms. For the Format box we will use the defaults, so click on OK to execute the program.

Let's look at the output for the measures of dispersion shown in Exhibit 15.3 for variable X22.

First, the highest response on the 7-point scale is 7 (maximum) and the lowest response is 3

(minimum). The range is 4 (7 - 3 = 4), the standard deviation is 1.118, and the variance is 1.251.

A standard deviation of 1.118 on a 7-point scale tells us the responses are dispersed fairly widely

around the mean of 3.24.
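These dispersion measures can be verified with Python's statistics module, which also uses the n - 1 divisor; the eight responses here are hypothetical:

```python
from statistics import stdev, variance

# Hypothetical 7-point satisfaction responses.
x = [3, 4, 5, 3, 6, 7, 4, 5]

data_range = max(x) - min(x)    # range: largest minus smallest value
s = stdev(x)                    # sample standard deviation (n - 1 divisor)
v = variance(x)                 # sample variance, the square of s
print(data_range, round(s, 3), round(v, 3))
```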

The purpose of inferential statistics is to make a determination about a population on the basis of

a sample from that population. A sample is a subset of the population. For example, if we wanted

to determine the average number of cups of coffee consumed per day by students during finals at

your university, we would not interview all the students. This would be costly, take a long time,

and might be impossible since we may not be able to find them all or some would decline to

participate. Instead, if there are 16,000 students at your university, we may decide that a sample

of 200 females and 200 males is sufficiently large to provide accurate information about the

coffee-drinking habits of all 16,000 students.

You may recall that sample statistics are measures obtained directly from the sample or

calculated from the data in the sample. A population parameter is a variable or some sort of

measured characteristic of the entire population. Sample statistics are useful in making


inferences regarding the population's parameters. Generally, the actual population parameters are

unknown since the cost to perform a true census of almost any population is prohibitive.

A frequency distribution displaying the data obtained from the sample is commonly used to

summarize the results of the data collection process. When a frequency distribution displays a

variable in terms of percentages, then this distribution is representing proportions within a

population. For example, a frequency distribution showing that 40 percent of the people

patronize Burger King indicates the percentage of the population that meets the criterion (eating

at Burger King). The proportion may be expressed as a percentage, a decimal value, or a fraction.

UNIVARIATE STATISTICAL TESTS

Marketing researchers often form hypotheses regarding population characteristics based on

sample data. The process typically begins by calculating frequency distributions and averages,

and then moves on to actually test the hypotheses. When the hypothesis testing involves

examining one variable at a time, it is referred to as a univariate statistical test. When the

hypothesis testing involves two variables it is called a bivariate statistical test. We first discuss

univariate statistical tests.

Suppose the owners of the Santa Fe Grill believe customers think their menu prices are very

reasonable. Respondents have answered this question using a 7-point scale where 1 = Strongly

Disagree and 7 = Strongly Agree. The scale is assumed to be an interval scale, and previous

research using this measure has shown the responses to be approximately normally distributed.

A couple of tasks must be completed before answering the question posed above. First, the

hypotheses to be compared (the null and alternative hypotheses) have to be developed. Then the

level of significance for rejecting the null hypothesis and accepting the alternative hypothesis

must be selected. At that point, the researcher can conduct the statistical test and determine the

answer to the research question.

In this example, the owners think the customers consider the prices of food at the Santa Fe Grill

to be very reasonable. The question is measured using a 7-point scale with 7 = Strongly Agree.

The marketing research consultant has indicated that expecting a 7 on a 7-point scale is

unreasonable. Therefore, the owners have defined reasonable prices by saying that perceptions

of the prices at Santa Fe Grill will not be significantly different from 6 = Very Favorable. The

null hypothesis is that the mean of X16 - Reasonable Prices will not be significantly

different from 6. Recall that the null hypothesis asserts the status quo: any difference from what

is thought to be true is due to random sampling. The alternative hypothesis is: the mean

response to X16 - Reasonable Prices will not be 6; there is in fact a true difference between

the sample mean and the mean we think it is (6).

Assume also the owners want to be 95 percent certain the mean is not different from 6.

Therefore, the significance level will be set at .05. Using this significance level means that if the

survey of Santa Fe Grill customers is conducted many times, the probability of incorrectly

rejecting the null hypothesis when it is true would happen less than 5 times out of 100 (.05).

HYPOTHESIS TEST

Using the SPSS software, you can test the responses in the Santa Fe Grill database to find the

answer to the research question posed above. Before running this test, however, you must split

the sample into two groups: the customers of the Santa Fe Grill and the customers of Jose's

Southwestern Café. Recall that to do this, the click-through sequence is DATA → SPLIT FILE.

Click on Compare Groups. Now highlight the fourth screening question (Favorite Mexican

Restaurant, x_s4) and move it into the Groups Based on: window, and then click OK. Your

results will now be computed for each restaurant separately.

To complete this test, the click-through sequence is ANALYZE → COMPARE MEANS →

ONE-SAMPLE T-TEST. When you get to the dialog box, click on X16 - Reasonable Prices to

highlight it. Then click on the arrow to move X16 into the Test Variables box. In the box labeled

Test Value, enter the number 6. This is the number you want to compare the respondents'

answers against, because your null hypothesis is that the mean of X16 will not be significantly

different from 6. Click on the Options box and enter 95 in the confidence interval box. This is the

same as setting the significance level at .05. Then, click on the Continue button and OK to

execute the program.
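The arithmetic behind the one-sample t-test is compact enough to sketch directly. The eight responses below are hypothetical, not the Santa Fe Grill data:

```python
import math
from statistics import mean, stdev

# Hypothetical 7-point responses to the Reasonable Prices question.
x = [4, 5, 6, 3, 5, 4, 6, 5]
test_value = 6    # the null-hypothesis mean

n = len(x)
se = stdev(x) / math.sqrt(n)       # standard error of the mean
t = (mean(x) - test_value) / se    # one-sample t statistic
print(round(t, 3))
```

A negative t simply means the sample mean falls below the test value, as it does for the Santa Fe Grill responses.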

The SPSS output is shown in Exhibit 15.4. The top table is labeled One-Sample Statistics and

shows the mean, standard deviation, and standard error for X16 - Reasonable Prices for the two

restaurants (mean of 4.47 for Santa Fe Grill and standard deviation of 1.384). The One-Sample

Test table below shows the results of the t-test for the null hypothesis that the average response to

X16 is not significantly different from 6 (Test Value = 6). The t-test statistic is 25.613, and the

significance level is .000. This means that the null hypothesis can be rejected and the alternative

hypothesis accepted with a high level of confidence from a statistical perspective.

From a practical standpoint, in terms of the Santa Fe Grill, the results of the univariate

hypothesis test indicate respondents perceived that menu prices were significantly below a 6

(defined as very reasonable by the owners). The mean of 4.47 is substantially below 6 (7 =

Strongly Agree that prices are reasonable). Thus, the Santa Fe Grill owners can conclude that their

prices are not perceived very favorably. Indeed, there is a lot of room to improve between the

mean of 4.47 on the 7-point scale and the highest value of 7. This is definitely an area that needs

to be examined. Of course, compared to Jose's restaurant the Santa Fe Grill is perceived slightly

more favorably.


In many instances marketing researchers test hypotheses that compare the characteristics of two

groups or two variables. For example, the marketing researcher may be interested in determining

whether there is a difference between older and younger new car purchasers in terms of the

importance of a 6-disk DVD player. In situations where more than one group is involved,

bivariate tests are needed. In the following section we first explain the concept of cross-tabulation, which examines two variables. We then describe three bivariate hypothesis tests: Chi-square, which is used with nominal data; and the t-test (to compare two means) and analysis of

variance (compares three or more means), both of which are used with either interval or ratio

data.

Cross-Tabulation


We introduced one-way frequency tables to report the findings for a single variable. The next

logical step in data analysis is to perform cross-tabulation using two variables. Cross-tabulation

is useful for examining relationships and reporting the findings for two variables. The purpose of

cross-tabulation is to determine if differences exist between subgroups of the total sample. In

fact, cross-tabulation is the primary form of data analysis in some marketing research projects.

To use cross-tabulation you must understand how to develop a cross-tabulation table as well as

how to interpret the outcome.

Note that to simplify this example we will run this crosstab only for customers of the Santa Fe

Grill. To select just customers from the Santa Fe Grill, the click-through sequence is DATA →

SELECT CASES → IF CONDITION IS SATISFIED → IF. Highlight x_s4 - Favorite Mexican

restaurant and move it into the window. Then click the = sign and next the 1. This instructs the

SPSS software to select only questionnaires coded 1 in the x_s4 column, which is the Santa Fe

Grill. If you wanted to analyze only the Jose's Southwestern Café respondents, then do the same

except after the = sign put a 0.

To run the crosstab using SPSS, the click-through sequence is ANALYZE → DESCRIPTIVE

STATISTICS → CROSSTABS. This will get you the set of dialog boxes shown in Exhibit 15.5.

Insert X31 in the Rows window and X32 in the Columns window. Now click on the Cells box

and check the Row box under Percentages, and then the Expected box under Counts. Then click

Continue and OK to get the results.

Exhibit 15.6 shows the cross-tabulation between X31 - Ad Recall and X32 - Gender for the

Santa Fe Grill customers (N = 253). The cross-tabulation shows frequencies and percentages,

with percentages shown only for rows. One way to interpret this table, for example, would be to

look at the Observed Count versus the Expected Count. As you can see, the numbers are not very

different. Thus, our preliminary interpretation suggests that males and females do not differ in

their recall of Santa Fe Grill ads.
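Building a cross-tabulation with row percentages can be sketched with a Counter; the (ad recall, gender) pairs here are invented for illustration:

```python
from collections import Counter

# Hypothetical (ad recall, gender) pairs, one per respondent.
pairs = [
    ("Yes", "Male"), ("Yes", "Female"), ("No", "Male"),
    ("No", "Female"), ("Yes", "Male"), ("No", "Male"),
]
counts = Counter(pairs)

rows = sorted({row for row, _ in pairs})
cols = sorted({col for _, col in pairs})
for row in rows:
    total = sum(counts[(row, col)] for col in cols)
    pcts = {col: 100 * counts[(row, col)] / total for col in cols}
    print(row, pcts)    # row percentages sum to 100 within each row
```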

In constructing a cross-tabulation table, the researcher selects the variables to use when

examining relationships. Selection of variables should be based on the objectives of the research

project. Demographic variables typically are the starting point in developing cross-tabulations.

These variables usually are the columns of the cross-tabulation table, and the rows are variables

like purchase intention, usage, or other categorical response questions. Cross-tabulation tables

show percentage calculations based on column or row totals. Thus, the researcher can make

comparisons of behaviors and intentions for different categories of predictor variables such as

income, sex, and marital status.

As a preliminary technique, cross-tabulation provides the market researcher with a powerful tool

to summarize survey data. It is easy to understand and interpret, and can provide a description of

both total and subgroup data. Yet the simplicity of this technique can create problems. Analysis

can result in an endless variety of cross-tabulation tables. In developing these tables, the analyst

must always keep in mind both the project objectives and specific research questions the study is

designed to answer.

Chi-Square Analysis


Marketing researchers often analyze survey data by means of one-way frequency counts and

cross-tabulations. One purpose of cross-tabulations is to study relationships among variables.

The research question is "Do the numbers of responses that fall into different categories differ

from what is expected?" The null hypothesis is always that the two variables are not related.

Thus, the null hypothesis in the previous example would be that the number of men and women

customers who recall Santa Fe Grill ads is the same. The alternative hypothesis is that the two

variables are related, or that men and women differ in their recall of Santa Fe Grill ads. This

question and similar ones can be answered using Chi-square analysis. Below are some other

examples of research questions that could be examined using Chi-square statistical tests:

- Does frequency of eating out (infrequent, moderately frequent, and very frequent) differ between males and females?

- Do part-time and full-time workers differ in terms of how often they are absent from work (seldom, occasionally, frequently)?

- Do college students and high school students differ in their preference for Coke versus Pepsi?

Chi-square (χ²) analysis enables researchers to test for statistical significance between the

frequency distributions of two (or more) nominally scaled variables in a cross-tabulation table to

determine if there is any association. Categorical data from questions about gender, education, or

other nominal variables can be examined with this statistic. Chi-square analysis compares the

observed frequencies (counts) of the responses with the expected frequencies. The Chi-square

statistic tests whether or not the observed data are distributed the way we expect them to be,

given the assumption that the variables are not related. The expected cell count is a theoretical

value, while the observed cell count is the actual cell count based on your study. For example, if

we observe that women recall ads more than men do, we would compare the observed value with

the frequency we would expect to find if there is no difference between women's and men's ad

recall. Thus, the chi-square statistic helps to answer questions about nominally scaled data that

cannot be analyzed with other types of statistical analysis, such as ANOVA or t-tests.

Calculating the χ² Value

To help you to better understand the Chi-square statistic, we will show you how to calculate it.

The formula is shown below:
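The formula does not survive in this copy. The Pearson Chi-square statistic the text goes on to describe, where O_i is the observed and E_i the expected frequency for cell i, is:

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```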


As the equation above indicates, the expected frequency is subtracted from the observed frequency

and then squared to eliminate any negative values before the results are used in further

calculations. After squaring, the resulting value is divided by the expected frequency to take into

consideration cell size differences. Then these calculations, which are performed for each

cell of the table, are summed over all cells to arrive at the Chi-square value. The Chi-square

value tells you how far the observed frequencies are from the expected frequencies.

Conceptually, the larger the Chi-square is, the more likely it is that the two variables are related.

This is because Chi-square is larger whenever the number actually observed in a cell is much

different than what we expected to find, given the assumption that the two variables are not

related. The computed Chi-square statistic is compared to a table of Chi-square values to

determine if the differences are statistically significant. If the calculated Chi-square is larger than

the Chi-square reported in standard statistical tables, then the two variables are related for a

given level of significance, typically .05.

Some marketing researchers call Chi-square a "goodness of fit" test. That is, the test evaluates

how closely the actual frequencies fit the expected frequencies. When the differences between

observed and expected frequencies are large, you have a poor fit and you reject your null

hypothesis. When the differences are small, you have a good fit.

One word of caution is necessary, however, in using Chi-square. The Chi-square results will be

distorted if more than 20 percent of the cells have an expected count of less than 5, or if any cell

has an expected count of less than 1. In such cases, you should not use this test. SPSS will tell

you if these conditions have been violated. One solution to small counts in individual cells is to

collapse them into fewer cells to get larger counts.

SPSS Application: Chi-Square

Based on their conversations with customers, the owners of the Santa Fe Grill believe that female

customers are coming to the restaurant from farther away than are male customers. The

Chi-square statistic can be used to determine if this is true. The null hypothesis is that there is no difference in

distance driven (X30) between male and female customers of the Santa Fe Grill.

To conduct this analysis we examine only the responses for the Santa Fe Grill (N = 253). The

SPSS click-through sequence is ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS.

Click on X30 - Distance Driven for the Row variable and on X32 - Gender for the Column

variable. Click on the Statistics button and the Chi-square box, and then Continue. Next click on

the Cells button and on Expected frequencies (Observed frequencies is usually already checked).

Then click Continue and OK to execute the program.

The SPSS results are shown in Exhibit 15.7. The top table shows the actual number of responses

(count) for males and females for each of the categories of X30 - Distance Driven: less than 1

mile, 1-5 miles, and more than 5 miles. For example, 74 males drove a distance of less than 1

mile while 12 females drove this same distance. The expected frequencies (count) are also

shown in this table, right below the actual count.

The expected count is calculated on the basis of the proportion of the sample represented by a

particular group. For example, the total sample of Santa Fe Grill customers is 253, and 176 are

males and 77 are females. This means 69.6 percent of the sample is male and 30.4 percent is

female. When we look in the Total column for the distance driven category labeled Less than 1

mile we see that there are a total of 86 male and female respondents. To calculate the expected

frequencies, you multiply the proportion a particular group represents times the total number in

that distance category. For example, with males you calculate 69.6 percent of 86 and the expected frequency

is 59.8. Similarly, females are 30.4 percent of the sample, so the expected number of females is 26.2

(.304 × 86). The other expected frequencies are calculated in the same way.

Look again at the observed frequencies and note that a higher count than expected of female

customers of Santa Fe Grill drive more than 5 miles. That is, we would expect only 27.7 women

to drive to the Santa Fe Grill from more than 5 miles, but actually 34 women drove from this far

away. Similarly, there are fewer male customers than expected who drive from more than five

miles away (expected = 63.3 and actual only 57). This pattern is similar for the distance of 1-5

miles. That is, a higher proportion of females are driving from this distance than would be

expected.

Information in the Chi-Square Test table shows the results for this test. The Pearson Chi-Square

value is 16.945 and it is significant at the .000 level. Since this level of significance is much less

than our standard criterion of .05, we can reject the null hypothesis of no difference in distance

driven with a high degree of confidence. The interpretation of this finding suggests that female

customers are indeed driving from farther away than expected to get to the Santa Fe Grill. At the

same time, the males are driving shorter distances than expected to get to the Santa Fe Grill.
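The expected counts and the Chi-square value reported above can be reproduced in a few lines of Python. The less-than-1-mile and more-than-5-miles cells come from the text; the 1-5-miles cells are inferred from the row and column totals (176 males, 77 females, N = 253):

```python
# Observed counts for X30 (distance driven) by gender, Santa Fe Grill only.
observed = [
    [74, 12],  # less than 1 mile (from the text)
    [45, 31],  # 1-5 miles (inferred from the totals)
    [57, 34],  # more than 5 miles (from the text)
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(row[j] for row in observed) for j in range(2)]
n = sum(row_totals)    # 253 customers: 176 males, 77 females

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n    # e.g. 59.8 for males, <1 mile
        chi_square += (obs - expected) ** 2 / expected

print(round(chi_square, 3))
```

The result agrees with the Pearson Chi-square of 16.945 reported in Exhibit 15.7.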


In addition to examining frequencies, marketing researchers often want to compare the means of

two groups. There are two possible situations when means are compared. The first is when the

means are from independent samples, and the second is when the samples are related. An

example of an independent sample comparison would be the results of interviews with male and

female coffee drinkers. The researcher may want to compare the average number of cups of

coffee consumed per day by male students with the average number of cups of coffee consumed

by female students. An example of the second situation, related samples, is when the researcher

compares the average number of cups of coffee consumed per day by male students with the

average number of soft drinks consumed per day by the same sample of male students.

In a related sample situation, the marketing researcher must take special care in analyzing the

information. Although the questions are independent, the respondents are the same. This is called


a paired sample. When testing for differences in related samples the researcher must use what is

called a paired samples t-test. The formula to compute the t-value for paired samples is not
presented here; students are referred to more advanced texts for the actual calculation of the
t-value for related samples. The SPSS package contains options for both the related-samples and
the independent-samples situations.

T-TEST TO COMPARE TWO MEANS

Just as with the univariate t-test, the bivariate t-test requires interval or ratio data. Also, the t-test

is especially useful when the sample size is small (n < 30) and when the population standard

deviation is unknown. Unlike the univariate test, however, we assume that the samples are drawn

from populations with normal distributions and that the variances of the populations are equal.

Essentially, the t-test for differences between group means can be conceptualized as the

difference between the means divided by the variability of the means. The t-value is a ratio of the

difference between the two sample means and the standard error. The t-test provides a

mathematical way of determining if the difference between the two sample means occurred by

chance. The formula for calculating the t-value is:

t = (x̄1 − x̄2) / s(x̄1 − x̄2)

where x̄1 and x̄2 are the two group means and s(x̄1 − x̄2) is the standard error of the difference
between the two means.

To illustrate the use of a t-test for the difference between two group means, let's turn to the Santa

Fe Grill database. The Santa Fe Grill owners want to find out if there are differences in the level

of satisfaction between male and female customers. To do that we can use the SPSS Compare

Means program.

The SPSS click-through sequence is Analyze → Compare Means → Independent-Samples t-Test.
When you get to this dialog box, move variable X22–Satisfaction into the Test Variables
box and variable X32–Gender into the Grouping Variable box. For variable X32 you must
define the range in the Define Groups box. Enter a 0 for Group 1 and a 1 for Group 2 (males
were coded 0 in the database and females were coded 1) and then click Continue. For the

Options we will use the defaults, so just click OK to execute the program.

Results are shown in Exhibit 15.8. The top table shows the Group Statistics. Note that 176 male

customers and 77 female customers were interviewed. The mean satisfaction level for males was

a bit higher at 4.70, compared with 4.18 for the female customers. Also, the standard deviation

for females was smaller (.823) than for the males (1.034).


To find out if the two means are significantly different, we look at the information in the

Independent Samples Test table. The statistical significance of the difference in two means is

calculated differently if the variances of the two means are equal versus unequal. In the column

labeled Sig. (2-tailed) you will note that the two means are significantly different (.000),

whether we assume equal or unequal variances. Thus, there is no support for the null hypothesis

that the two means are equal, and we conclude that male customers are significantly more

satisfied than female customers. There is other information in this table, but we do not need to

concern ourselves with it at this time.
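Because the Group Statistics table reports the group sizes, means, and standard deviations, the same test can be reproduced outside SPSS directly from those summary statistics. The sketch below uses scipy; the "equal variances assumed" row of the SPSS output corresponds to equal_var=True.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the Group Statistics table in the text:
# 176 males (mean 4.70, s.d. 1.034), 77 females (mean 4.18, s.d. .823).
t_stat, p_value = ttest_ind_from_stats(
    mean1=4.70, std1=1.034, nobs1=176,
    mean2=4.18, std2=0.823, nobs2=77,
    equal_var=True)  # "equal variances assumed" row in SPSS

# From these summary statistics, t is about 3.90 and p well below .05,
# so the null hypothesis of equal means is rejected.
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")
```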

Sometimes marketing researchers want to test for differences in two means for variables in the

same sample. For example, the owners of the Santa Fe Grill noticed that the taste of their food

was rated 4.78 while the food temperature was rated only 4.38. Since the two food variables are

obviously related, they want to know if the ratings for taste really are significantly higher (more

favorable) than for temperature. To examine this, we use the paired samples test for the

difference in two means. This test examines whether two means from two different questions


using the same scaling and answered by the same respondents are significantly different. The

null hypothesis is that the mean ratings for the two food variables (X18 and X20) are equal. Note

that in this example we are looking only at the responses of the Santa Fe Grill customers.

To test this hypothesis we use the SPSS paired-samples t-test. The click-through sequence is
Analyze → Compare Means → Paired-Samples t-Test. When you get to this dialog box,
highlight both X18–Food Taste and X20–Food Temperature and then click on the arrow

button to move them into the Paired Variables box. For the Options we will use the defaults, so

just click OK to execute the program.

Results are shown in Exhibit 15.9. The top table shows the Paired Samples Statistics. The mean

for food taste is 4.78 and for food temperature is 4.38. The t-value for this comparison is 8.421

(see Paired Samples Test table) and it is significant at the .000 level. Thus we can reject the null

hypothesis that the two means are equal and conclude that Santa Fe Grill customers definitely

have more favorable perceptions of food taste than food temperature.
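A paired-samples test can be sketched the same way with scipy's ttest_rel. The data below are simulated (only the two means follow the text), so the t-value will not match the 8.421 in Exhibit 15.9; the point is that the paired test operates on within-respondent differences rather than on two independent columns:

```python
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(42)
n = 253  # Santa Fe Grill customers in the example

# Simulated paired ratings: taste around 4.78, temperature about
# 0.40 points lower for the same respondent (invented data; only
# the means follow the text).
taste = rng.normal(4.78, 0.90, n)
temperature = taste - rng.normal(0.40, 0.75, n)

# The paired t-test uses the within-respondent differences.
t_stat, p_value = ttest_rel(taste, temperature)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```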

ANALYSIS OF VARIANCE (ANOVA)

Analysis of variance (ANOVA) is used to determine the statistical difference between three or

more means. For example, if a sample finds that the average number of cups of coffee consumed

per day by freshmen during finals is 3.7, while the average number of cups of coffee consumed

per day by seniors and graduate students is 4.3 cups and 5.1 cups, respectively, are these

observed differences statistically significant? The ability to make such comparisons can be quite

useful for the marketing researcher.

The technique is really quite straightforward. In this section we describe a one-way ANOVA.

The term one-way is used since there is only one independent variable. ANOVA can be used in

cases where multiple independent variables are considered, which enables the analyst to estimate


both the individual and joint effects of the several independent variables on the dependent

variable.

An example of an ANOVA problem may be to compare light, medium, and heavy drinkers of

Starbucks coffee on their attitude toward a particular Starbucks advertising campaign. In this

instance there is one independent variable, consumption of Starbucks coffee, but it is divided into

three different levels. Our earlier t-statistics won't work here since we have more than two

groups to compare.

ANOVA requires that the dependent variable, in this case the attitude toward the Starbucks

advertising campaign, be metric. That is, the dependent variable must be either interval or ratio

scaled. A second data requirement is that the independent variable, in this case the coffee

consumption variable, be categorical.

The null hypothesis for ANOVA always states that there is no difference between the dependent
variable groups: in this situation, the ad campaign attitudes of the groups of Starbucks coffee
drinkers. In specific terminology, the null hypothesis would be:

H0: μ1 = μ2 = μ3

ANOVA examines the variance within a set of data. Recall from the earlier discussion of

measures of dispersion that the variance of a variable is equal to the average squared deviation

from the mean of the variable. The logic of ANOVA is that if the variance between the groups is

compared to the variance within the groups, we can make a logical determination as to whether

the group means (attitudes toward the advertising campaign) are significantly different.

Determining Statistical Significance in ANOVA

In ANOVA, the F-test is used to statistically evaluate the differences between the group means.

For example, suppose the heavy users of Starbucks coffee rate the advertising campaign 4.4 on a

five-point scale, with 5 = Very favorable. The medium users of Starbucks coffee rate the

campaign 3.9, and the light users of Starbucks coffee rate the campaign 2.5. The F-test in

ANOVA tells us if these observed differences are meaningful.

The total variance in a set of responses to a question is made up of between-group and within-group variance. The between-group variance measures how much the sample means of the

groups differ from one another. In contrast, the within-group variance measures how much the

observations within each group differ from one another. The F-distribution is the ratio of these

two components of total variance and can be calculated as follows:

F-ratio = Variance between groups/Variance within groups

The larger the difference in the variance between groups, the larger the F-ratio. Since the total

variance in a data set is divisible into between and within components, if there is more variance

explained or accounted for by considering differences between groups than there is within

groups, then the independent variable probably has a significant impact on the dependent


variable. Larger F-ratios imply significant differences between the groups. The larger the F-ratio,

the more likely it is that the null hypothesis will be rejected.
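The F-ratio logic can be made concrete with a small worked example. The three hypothetical groups below are invented so that their means roughly match the 2.5, 3.9, and 4.4 ratings in the text; the manual between/within calculation and scipy's f_oneway produce the same F:

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical 5-point ratings of the ad campaign by three Starbucks
# usage groups (invented data; only the approximate means follow the text).
light  = np.array([2, 3, 2, 3, 2, 3, 3, 2])    # mean 2.5
medium = np.array([4, 4, 3, 4, 4, 5, 4, 3])    # mean 3.875
heavy  = np.array([5, 4, 5, 4, 5, 4, 4, 4])    # mean 4.375

groups = [light, medium, heavy]
grand_mean = np.concatenate(groups).mean()

# Between-group variance: how far the group means sit from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (len(groups) - 1)                          # df = k - 1

# Within-group variance: how far observations sit from their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ms_within = ss_within / (sum(len(g) for g in groups) - len(groups))  # df = N - k

f_ratio = ms_between / ms_within
f_stat, p_value = f_oneway(light, medium, heavy)

print(round(f_ratio, 2), round(f_stat, 2))  # the two F values agree
print(p_value < 0.05)                       # reject the null hypothesis
```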

ANOVA, however, is able to tell the researcher only that statistical differences exist between at

least one pair of the group means. The technique cannot identify which pairs of means are

significantly different from each other. In our example of Starbucks coffee drinkers' attitudes

toward the advertising campaign, we could conclude that differences in attitudes toward the

advertising campaign exist among light, medium, and heavy coffee drinkers, but we would not

be able to determine if the differences are between light and medium, or between light and

heavy, or between medium and heavy, and so on. We would be able to say only that there are

significant differences somewhere among the groups. Thus, the marketing researcher still must

determine where the mean differences lie. Follow-up post-hoc tests have been designed for just

that purpose.

There are several follow-up tests available in statistical software packages such as SPSS and

SAS. All of these methods involve multiple comparisons, or simultaneous assessment of

confidence interval estimates of differences between the means. All means are compared two at a

time. The differences between the techniques lie in their ability to control the error rate. We shall

briefly describe the Scheffé procedure, although a complete discussion of these techniques is
well beyond the scope of this book. Relative to the other follow-up tests mentioned, however, the
Scheffé procedure is a more conservative method of detecting significant differences between
group means.

The Scheffé follow-up test establishes simultaneous confidence intervals, which hold the entire
experiment's error rate to a specified level. The test exposes the differences between all pairs of
means to a high and low confidence interval range. If the difference between a pair of means
falls outside the range of the confidence interval, then we reject the null hypothesis and conclude
that the pair of means falling outside the range is statistically different. The Scheffé test might
show that one, two, or all three pairs of means in our Starbucks example are different. The
Scheffé test is equivalent to simultaneous two-tailed hypothesis tests, and the technique holds the
specified analysis significance level. Because the technique holds the experimental error rate to α,
the confidence intervals tend to be wider than in the other methods, but the researcher has more
assurance that true mean differences exist. Recall that the Scheffé test is very conservative, so
you may wish to look at one of the other tests available in your statistical software.
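A Scheffé-style comparison can be sketched directly from its definition: a pair of means is declared different when the squared difference exceeds (k − 1) × F_crit × MSW × (1/n_i + 1/n_j), where MSW is the within-group mean square and F_crit is the critical F value at the chosen α. The data below are invented 5-point ratings for light, medium, and heavy users (means of about 2.5, 3.9, and 4.4):

```python
import numpy as np
from scipy.stats import f

# Hypothetical ratings by Starbucks usage group (invented data).
groups = {"light":  np.array([2, 3, 2, 3, 2, 3, 3, 2]),
          "medium": np.array([4, 4, 3, 4, 4, 5, 4, 3]),
          "heavy":  np.array([5, 4, 5, 4, 5, 4, 4, 4])}

k = len(groups)
n_total = sum(len(g) for g in groups.values())

# Within-group mean square (MSW) and the critical F at alpha = .05.
msw = sum(((g - g.mean()) ** 2).sum() for g in groups.values()) / (n_total - k)
f_crit = f.ppf(0.95, k - 1, n_total - k)

names = list(groups)
results = {}
for i in range(k):
    for j in range(i + 1, k):
        a, b = groups[names[i]], groups[names[j]]
        diff2 = (a.mean() - b.mean()) ** 2
        # Scheffé bound for this pair of means.
        bound = (k - 1) * f_crit * msw * (1 / len(a) + 1 / len(b))
        results[(names[i], names[j])] = diff2 > bound
        print(names[i], "vs", names[j], "significant:", diff2 > bound)
```

With these invented data the light-medium and light-heavy pairs are flagged as different, while medium-heavy is not, which is exactly the kind of pair-by-pair verdict the overall F-test cannot give.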

SPSS Application: ANOVA

To help you understand how ANOVA is used to answer research questions, we refer to the Santa

Fe Grill database to answer a typical question. The Santa Fe Grill owners want to know how

their restaurant compares to their major competitor, Jose's Southwestern Café. They are

particularly interested in comparing satisfaction and related variables as well as gender. The

purpose of the ANOVA analysis is to see if the differences that do exist are statistically

significant. To examine the differences, an F-ratio is used. The larger the F-ratio, the more

difference there is among the means of the various groups with respect to their likelihood of

recommending the restaurant. Note that this application of ANOVA examines only two groups:


the two restaurant competitors, Santa Fe Grill and Jose's Southwestern Café. But ANOVA can be

used to examine three, four, or more groups, to identify statistical differences if they exist.

SPSS can conduct the statistical analysis to test the null hypothesis. To compare male and female

customers from the two restaurants, we first must split the sample into two groups: the male and

female customers. To do this, the click-through sequence is DATA → SPLIT FILE → Click on
Compare Groups. Now highlight the variable X32–Gender and move it into the Groups

Based on: window, and then click OK. Your results will now be computed for male and female

customers separately.

Next we want to test whether the two restaurants are viewed differently on selected variables.

The click-through sequence is ANALYZE → COMPARE MEANS → ONE-WAY ANOVA.

Select X22–Satisfaction, X23–Likely to Return, and X24–Likely to Recommend by
highlighting them and moving them to the Dependent List window. Next, highlight x_s4–Favorite
Mexican restaurant and move it to the Factor window. This tells the SPSS software to
statistically test the differences in the responses on the three variables selected. Next click on the
Options box, then on Descriptive (to get group means), and then click Continue. Now click OK to run

the test.

The results for the ANOVA are shown in Exhibit 15.10. The two restaurants differ significantly

on four of the six variables compared (see Sig. column). In the top of the table are the

comparisons for males, and they differ on only one variable: X24–Likely to Recommend. That is,

the mean perceptions of males between the two restaurants do not differ significantly on

satisfaction or likelihood of returning. But the male customers of the Santa Fe Grill are more

likely to recommend (X24) the restaurant (mean = 3.78) than are the male customers of Jose's
Southwestern Café (mean = 3.23).

The female customers feel very differently about the two restaurants, and indeed there are

significant differences on all three variables (.000). The female customers are more favorable

about Jose's Southwestern Café, particularly in terms of satisfaction and likelihood of returning.
The females are likely to recommend Jose's restaurant (mean = 5.23 on a 7-point scale), but not

likely to recommend the Santa Fe Grill (mean = 3.22).
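The same comparison can be sketched outside SPSS with scipy's f_oneway. The group means below follow the female ratings reported for X24 (3.22 for the Santa Fe Grill, 5.23 for Jose's), but the sample sizes and individual responses are invented, so the exact F-value is illustrative only:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)

# Simulated X24 (likely to recommend, 7-point scale) for female
# customers of each restaurant; only the group means follow the text.
santa_fe = rng.normal(3.22, 1.0, 120)
joses = rng.normal(5.23, 1.0, 120)

# One independent variable (restaurant) with two levels; with only
# two groups, one-way ANOVA and the independent t-test agree (F = t^2).
f_stat, p_value = f_oneway(santa_fe, joses)
print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
```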

n-Way ANOVA

Discussion of ANOVA to this point has been devoted to one-way ANOVA in which there is only

one independent variable. In the examples, the usage category (consumption of Starbucks coffee)

or the restaurant competitors (Santa Fe Grill and Jose's Southwestern Café) was the single

independent variable. It is not at all uncommon, however, for the researcher to be interested in

several independent variables simultaneously. In such cases an n-way ANOVA would be used.

Often researchers are interested in the region of the country where a product is sold as well as

consumption patterns. Using multiple independent factors creates the possibility of an interaction

effect. That is, the multiple independent factors can act together to affect group means. For

example, heavy consumers of Starbucks coffee in the Northeast may have different attitudes


about advertising campaigns than heavy consumers of Starbucks coffee in the West, and there

may be still further differences between the various coffee-consumption-level groups.

Another situation that may require n-way ANOVA is the use of experimental designs, where the

researcher uses different levels of a stimulus (for example, different prices or ads) and then

measures responses to those stimuli. For example, a marketer may be interested in finding out

whether consumers prefer a humorous ad to a serious one and whether that preference varies

across gender. Each type of ad could be shown to different groups of customers (both male and

female). Then, questions about their preferences for the ad and the product it advertises could be

asked. The primary difference between the groups would be the difference in ad execution

(humorous or nonhumorous) and customer gender. An n-way ANOVA could be used to find out

whether the ad execution differences helped cause differences in ad and product preferences, as

well as what effects might be attributable to customer gender.

From a conceptual standpoint, n-way ANOVA is similar to one-way ANOVA, but the

mathematics is more complex. However, statistical packages such as SPSS will conveniently

allow the marketing researcher to perform n-way ANOVA.

PERCEPTUAL MAPPING

While our fast-food example illustrates how perceptual mapping grouped pairs of restaurants

together based on perceived ratings, perceptual mapping has many other important applications

in marketing research. Other applications include:

- New-product development. Perceptual mapping can reveal gaps in the map that represent
opportunities for new products and thereby help to position new products.
- Image measurement. Perceptual mapping can be used to identify the image of the
company and help to position one company relative to the competition.
- Advertising. Perceptual mapping can be used to assess how well advertising positions the
brand.
- Distribution. Perceptual mapping can be used to assess similarities of brands and channel
outlets.

