
Conduct Inferential Analysis

Descriptive statistics help you analyze descriptive questions. However, when you compare groups or relate two or more variables, inferential analysis comes into play. The
basic idea is to look at scores from a sample and use the results to draw inferences or make predictions
about the population. Often you cannot study the entire population because of size and cost, so we
instead examine a sample that has been carefully chosen from the population. When you study this
sample and obtain scores, several approaches exist for determining if the sample scores you receive are
a good estimate of the population scores (see Vogt, 2005). Ask yourself:

1. Is the sample score (e.g., the mean difference between two groups) probably a wrong estimate of the population mean? The
procedure you use to examine this question is hypothesis testing. Hypothesis testing is a procedure for
making decisions about results by comparing an observed value of a sample with a population value to
determine if no difference or relationship exists between the values. This is the traditional way to test
whether the sample mean is a good estimate of the population mean. It provides a yes–no answer:
Either the sample mean is a good estimate or it is not. Because we can never absolutely prove that the
sample is a good estimate, we try to establish whether it is a wrong estimate.

2. How confident are you that your sample score is right? This is the confidence interval approach. A confidence interval or
interval estimate is the range of upper and lower statistical values that is consistent with observed data
and is likely to contain the actual population mean. In this approach, you determine an interval or range
in which your population score would likely fall. In this sense, confidence intervals give us more flexibility than the yes–no options of hypothesis testing.
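The interval-estimate idea can be sketched in a few lines of Python. This is a minimal illustration with invented scores, and it uses a normal (z) approximation rather than the t distribution a statistics package would use for a small sample:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(scores, confidence=0.95):
    """Return a (lower, upper) interval estimate for the population mean.
    Uses a normal (z) approximation, a simplification for illustration."""
    n = len(scores)
    m = mean(scores)                      # sample mean (point estimate)
    se = stdev(scores) / sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    return (m - z * se, m + z * se)

# Hypothetical depression scores for a sample of 10 students
scores = [65, 72, 78, 70, 69, 74, 80, 66, 71, 75]
low, high = confidence_interval(scores)
```

Rather than the yes–no answer of hypothesis testing, the result is a range of plausible values for the population mean; a wider interval signals less certainty about the point estimate.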

3. Does the sample score, or the difference between two groups, make practical sense? This is the effect size
approach. Effect size is a means for identifying the practical strength of the conclusions about group
differences or about the relationship among variables in a quantitative study. Effect sizes tell us how
different the sample values are and allow us to make a judgment as to whether this is significant based
on our knowledge of measures, the participants, and the data collection effort. The reason we have
more than one approach is that recently some researchers have felt that a yes–no hypothesis testing
answer to our quantitative questions and hypotheses leads to misinterpretations and errors (Finch,
Cumming, & Thomason, 2001). Confidence intervals and effect sizes provide a more practical reading of
results. In reporting research today, it is useful to report all three estimates of your population:
hypothesis testing, the confidence interval, and the effect size (Wilkinson & Task Force on Statistical Inference, 1999).

Hypothesis Testing

There are five steps in hypothesis testing: (a) identify a null and alternative hypothesis; (b) set the level of significance, or alpha level; (c) collect data; (d) compute the sample statistic; and (e) make a decision about rejecting or failing to reject the null hypothesis.

1.
Identify your null and alternative hypothesis. The null hypothesis is a prediction about the population
and is typically stated using the language of “no difference” (or “no relationship” or “no association”).
The alternative hypothesis, however, indicates a difference (or relationship or association), and the
direction of this difference may be positive or negative (alternative directional hypotheses) or either
positive or negative (alternative nondirectional hypotheses). Returning to the data for high school
students in Table 6.2, you may state a null and alternative hypothesis as follows:

Null Hypothesis: There is no difference between smokers and nonsmokers on depression scores.

Alternative Hypothesis (nondirectional): There is a difference between smokers and nonsmokers on depression scores.

Alternative Hypothesis (directional): Smokers are more depressed than nonsmokers.

2. Set the level of significance, or alpha level, for rejecting the null hypothesis. If we were to collect a large number of
sample means, and if the null hypothesis is true (“no difference”), the theoretical distribution would
approximate a normal or bell-shaped curve, as illustrated in Figure 6.4. In this figure, we see a normal
curve illustrating the distribution of sample means of all possible outcomes if the null hypothesis is true.
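The idea behind this distribution can be made concrete with a short simulation (a hypothetical sketch; the population values are invented): if the null hypothesis is true, both groups are drawn from the same population, and the differences between sample means pile up in a bell-shaped curve centered on zero, with only a small fraction falling in the extreme tails.

```python
import random
from statistics import mean

random.seed(42)  # reproducible illustration

# Under the null hypothesis, smokers and nonsmokers come from the SAME
# population of depression scores (hypothetical: mean 70, sd 10).
def mean_difference_under_null(n_per_group=25):
    group1 = [random.gauss(70, 10) for _ in range(n_per_group)]
    group2 = [random.gauss(70, 10) for _ in range(n_per_group)]
    return mean(group1) - mean(group2)

diffs = [mean_difference_under_null() for _ in range(10_000)]

# Most differences fall near the center (zero); only a small fraction
# lands in the extreme tails of the bell-shaped curve.
extreme = sum(1 for d in diffs if abs(d) > 5.54)  # about 1.96 standard errors
proportion_extreme = extreme / len(diffs)
```

With these invented population values, roughly 5 percent of the simulated differences land in the two shaded tails, which is exactly the region a .05 significance level marks off.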
We would expect most of our sample means to fall in the center of the curve if the hypothesis is true,
but a small number would fall at the extremes. In other words, we would expect to find that, for any sample of smokers and nonsmokers, their depression scores are similar, but in a small percentage of cases, you might actually find them to be different. As you can see, there are shaded areas at each end of this curve. We would expect an extremely low probability that a score would fall within these areas. A standard is needed to mark these low-probability areas precisely on the curve; this is called setting a significance level. A significance level (or alpha level) is a probability level that reflects the maximum risk you are willing to take that any observed difference is due to chance.
It is typically set at .01 (1 out of 100 times the sample score will be due to chance) or .05 (5 out of 100
times it will be due to chance). This means that 1 out of 100 times (or 5 out of 100 times) an extremely
low probability value will actually be observed if the null hypothesis is true. In some situations, it is
important to set the alpha level even stricter than .01 or .05. Assume that a researcher is testing the
effects of a drug that has severe side effects. The alpha level might be set at a stricter level for rejection,
say .001, if the drug might have damaging side effects for cancer patients rather than a less conservative
level of .05 if the drug would have less damaging side effects for individuals with acne. The area on the
normal curve for low probability values if the null hypothesis is true is called the critical region. If sample
data (i.e., the difference between smokers and nonsmokers on depression) falls into the critical region,
the null hypothesis is rejected. This means that instead of “there is no difference” as stated in the null
hypothesis, we find the alternative to probably be true: "there is a difference." Also notice in Figure 6.4 that this critical region marked by a significance level occurs at both ends of the normal curve. When
the critical region for rejection of the null hypothesis is divided into two areas at the tails of the sampling
distribution, we have a two-tailed test of significance (Vogt, 2005). However, if we place the region at
only one end for rejection of the null hypothesis, we have a one-tailed test of significance. You use one-
tailed tests when previous research indicates a probable direction (e.g., a directional, alternative
hypothesis). In contrast, a two-tailed test of significance is more conservative, or demanding, because the area of rejection at either end of the curve is less than that of a one-tailed test. We say that a one-tailed test has more power, which means that we are more likely to reject the null hypothesis.

3. Collect data. You collect data by administering an instrument or recording behaviors on a check sheet for participants. Then, as discussed earlier in this chapter, you code the data and input it into a computer file for analysis.

4. Compute the sample statistic. Next, using a computer program, you compute a statistic
or p value and determine if it falls inside or outside of the critical region. A p value is the probability ( p)
that a result could have been produced by chance if the null hypothesis were true. After calculating the
p value, we compare it with a value in a table located in the back of major statistics books (e.g.,
Gravetter & Wallnau, 2007) related to the statistical test, finding the value given our significance level (e.g., .01), whether our test is one-tailed or two-tailed, and the degrees of freedom for our statistical test
(or examine the printout for this value). Degrees of freedom (df) used in a statistical test is usually one
less than the number of scores. For a sample of scores, df = n–1. The degrees of freedom establish the
number of scores in a sample that are independent and free to vary because the sample mean places a
restriction on sample variability. In a sample of scores, when the value of the mean is known, all scores
but one can vary (i.e., be independent of each other and have any values), because one score is
restricted by the sample mean (Gravetter & Wallnau, 2007). The difficult part is determining what
statistical test to use. Table 6.5 presents many of the common statistical tests used in educational
research. Also, consult Appendix C for common statistical tests, their definitions, and examples. Seven questions need to be answered to arrive at the appropriate statistical test (also see Rudestam & Newton, 1992, for similar criteria):

• Do you plan to compare groups or relate variables in your hypotheses or research questions?

• How many independent variables do you have in one research question or hypothesis?

• How many dependent variables do you have in one research question or hypothesis? Typically, researchers use only one dependent variable, or if multiple dependent variables are of interest, each variable is analyzed one by one.

• Will you be statistically controlling for covariates in your analysis of the research question or hypothesis?

• How will your independent variable(s) be measured? The possible measurement scales are categorical (nominal and ordinal) and continuous (interval/ratio).

• How will your dependent variable(s) be measured? As with your independent variables, identify whether the dependent variables are categorical or continuous.

• Are the scores on your variables normally distributed; that is, could you assume a normal curve if the scores were plotted on a graph? Certain statistics have been designed to work best with normally distributed data and others with nonnormally distributed data. (See Appendix D for additional information about nonnormal distributions.)

Given these seven questions, what statistical test would
you use to study these null hypotheses? “There is no difference between smokers and nonsmokers on
depression scores." "There is no difference between smokers and nonsmokers in their peer group affiliation." For the first hypothesis, you would select a t test, and for the second, the chi-square statistic.
Can you identify the decisions that went into selecting both tests based on the seven criteria?

5. Make a decision about rejecting or failing to reject the null hypothesis. Let's assume that you have now computed the statistical test for the two hypotheses using the data reported earlier in Table 6.2. Assume
computed the statistical test for the two hypotheses using the data reported earlier in Table 6.2. Assume
that you have used SPSS Version 14.0 and have printouts as shown in Table 6.6. In Table 6.6, you
compare smokers and nonsmokers in terms of their scores on depression. The statistical test you
computed was a t-test analysis and it indicated that the 26 nonsmokers have a mean of 69.77 on the
depression scale, whereas the 24 smokers have a mean of 79.79, a difference of 10.02 points between
the two groups. The two-tailed significance test indicates t = -7.49 with 48 degrees of freedom, resulting in a two-tailed p value of .00 (p = .00). This p value is statistically significant because it is less than alpha = .05. If the p value is less than alpha, you reject the null hypothesis; if it is greater than alpha, you fail to reject the null hypothesis. Our overall conclusion, then, is that there is a difference between nonsmokers and smokers in their depression scores.
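The full decision step, together with the effect size recommended earlier, can be sketched in Python. The scores below are invented (they are not the Table 6.2 data), and the p value uses a normal approximation because the Python standard library has no t distribution:

```python
from math import sqrt
from statistics import NormalDist, mean

def pooled_t_test(group1, group2):
    """Independent-samples t test with a pooled variance.
    The p value uses a normal approximation (the standard library
    has no t distribution), adequate only as a rough sketch."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    t = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    p = 2 * (1 - NormalDist().cdf(abs(t)))   # two-tailed, approximate
    d = (m1 - m2) / sqrt(pooled_var)         # Cohen's d (effect size)
    return t, df, p, d

# Hypothetical depression scores (NOT the Table 6.2 data)
nonsmokers = [66, 70, 72, 68, 74]
smokers = [78, 80, 76, 82, 79]
t, df, p, d = pooled_t_test(nonsmokers, smokers)

alpha = 0.05
reject_null = p < alpha  # step 5: reject if p falls below alpha
```

Because p falls below alpha, the null hypothesis is rejected; the effect size then indicates how large the group difference is in standard-deviation units, giving the practical reading that the hypothesis test alone does not.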
