
UNIT 4:

Parametric Tests

Introduction:
In hypothesis testing, it is important to select the statistical tool that will most reliably detect a significant difference or relationship between parameters or values. To help you better understand the inferential statistical tools used in hypothesis testing, this module offers brief discussions that orient you to the nature and assumptions of each statistical tool, and then lets you translate them into practice through some exercises.

OBJECTIVES:

After going through the module, it is expected that you will be able to do the following with at least an 80% proficiency level:

1. Discuss the assumptions to be satisfied when selecting a specific parametric test;
2. Decide on the appropriate parametric test for a given set of problems; and
3. Apply statistical procedures to a given set of problems.

PARAMETRIC TESTS

When do we use parametric tests?


We use parametric tests when: (1) the distribution of the data is normal; and (2) the data to be analyzed are measured at the interval or ratio level.

Why do we use parametric tests?


We use parametric tests because they are more powerful than non-parametric tests.

How do we use parametric tests?


(1) Determine whether the data are normally distributed;
(2) Determine whether the data are expressed at the interval or ratio level; and
(3) If both conditions are satisfied, use a parametric test.

Three things should be considered when selecting the appropriate parametric test:

(1) The research question;


(2) The design of the study; and
(3) The levels of measurement.

Types of research question that should be carefully considered:


(1) A relational question seeks information about the relationship among variables; in this situation, the investigator is interested in determining whether there is an association.
(2) A causal question seeks information about the effect of an intervention on an outcome; in this situation, the investigator is interested in determining whether there is a difference.

NOW, LET US LEARN MORE ABOUT TYPES OF PARAMETRIC TESTS


APPROPRIATE FOR CAUSAL QUESTIONS.

INDEPENDENT T-TEST FOR TWO SAMPLES

The independent t-test is an inferential statistical test that determines whether


there is a statistically significant difference between the means in two unrelated
groups. Unrelated groups, also called unpaired groups or independent groups, are
groups in which the cases (e.g., participants) in each group are different.
The null hypothesis for the independent t-test is that the population means of the two unrelated groups are equal: H0: μ1 = μ2.
The alternative hypothesis is that the population means are not equal: HA: μ1 ≠ μ2.

To do this, we need to set a significance level (also called alpha) that allows us to decide whether to reject the null hypothesis. Most commonly, this value is set at 0.05.

In order to run an independent t-test, the following assumptions should be


met:
(1) Samples are randomly selected;
(2) Data are expressed at the interval or ratio level;
(3) Data are normally distributed; and
(4) The two groups have equal variances. (Quick Excel checks for assumptions (3) and (4) are sketched below.)
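Excel has no built-in formal normality test, but as a rough screen (this is not part of the module's required steps, and the ranges A2:A7 and B2:B6 below are only illustrative placeholders for the two groups' data) you can inspect skewness, kurtosis, and the sample variances with ordinary worksheet functions:

=SKEW(A2:A7) returns the sample skewness of group 1 (values near 0 suggest a roughly symmetric distribution)
=KURT(A2:A7) returns the excess kurtosis of group 1 (values near 0 suggest a roughly normal shape)
=VAR.S(A2:A7) and =VAR.S(B2:B6) return the sample variances of the two groups for comparison

If the two variances look very different, the unequal-variances form of the t-test used in the example below is the safer choice.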

Having learned the basic assumptions for selecting the t-test, let us explore further by applying these principles, this time with the aid of Microsoft Excel. If you have a laptop or any other device with this program, turn it on, and let us better understand how the t-test is run.

The following example teaches you how to perform a t-test in Excel. But we first need to install the Data Analysis ToolPak in Excel before we can successfully run the test.
How to Install the Data Analysis ToolPak in Excel?

The Data Analysis ToolPak must be installed on your copy of Excel to perform t-tests. To determine whether you have this ToolPak installed, click Data in Excel's menu across the top and look for Data Analysis in the Analyze section. If you don't see Data Analysis, you need to install it. Don't worry. It's free!
To install Excel's Analysis ToolPak, click the File tab on the top-left and then click Options on the bottom-left. Then, click Add-Ins. On the Manage drop-down list, choose Excel Add-ins, and click Go. On the popup that appears, check Analysis ToolPak and click OK.

After you enable it, click Data Analysis in the Data menu to display the analyses you can perform. Among other options, the popup presents three types of t-test, which we'll cover next.

Were you successful in installing the Data Analysis ToolPak? If not, go back to the previous page, read the instructions carefully, and repeat the procedure.

Let us continue:

Below you can find the study hours of 6 female students and 5 male students enrolled in an Environmental Science class. Our null and alternative hypotheses will be expressed this way:
H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
To perform a t-Test, execute the following steps:
1. First, perform an F-Test to determine whether the variances of the two populations are equal. If they are not, choose t-Test: Two-Sample Assuming Unequal Variances.
2. On the Data tab, in the Analysis group, click Data Analysis.

3. Select t-Test: Two-Sample Assuming Unequal Variances and click OK.

4. Click in the Variable 1 Range box and select the range A2:A7.
5. Click in the Variable 2 Range box and select the range B2:B6.
6. Click in the Hypothesized Mean Difference box and type 0 (H0: μ1 - μ2 = 0).
7. Click in the Output Range box and select cell E1.
8. Click OK.
After performing the test, Microsoft Excel will give you this result:

Conclusion: We do a two-tailed test (inequality). If t Stat > t Critical two-tail, we reject the null hypothesis. This is not the case here, since 1.473 < 2.365. Therefore, we fail to reject the null hypothesis. The observed difference between the sample means (33 - 24.8) is not convincing enough to say that the average number of study hours of female and male students differs significantly.
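As a quick cross-check of the ToolPak output, Excel's built-in worksheet functions return the corresponding p-values directly. Assuming, as in the steps above, that the female students' hours are in A2:A7 and the male students' hours are in B2:B6:

=F.TEST(A2:A7, B2:B6) returns the two-tailed p-value for the F-test of equal variances
=T.TEST(A2:A7, B2:B6, 2, 3) returns the two-tailed p-value for the two-sample t-test assuming unequal variances

In T.TEST, the third argument is the number of tails (2 for a two-tailed test) and the fourth is the test type (1 = paired, 2 = two-sample equal variance, 3 = two-sample unequal variance). These functions return only the p-value, not the full table of means, degrees of freedom, and critical values that the ToolPak produces.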

DEPENDENT T-TEST FOR PAIRED SAMPLES

What does this test do?


The dependent t-test (also called the paired t-test or paired-samples t-test)
compares the means of two related groups to determine whether there is a
statistically significant difference between these means.

What variables do you need for a dependent t-test?


You need one dependent variable that is measured on an interval or ratio
scale. You also need one categorical variable that has only two related groups.

What is meant by "related groups"?


This indicates that the same participants are tested more than once. Thus, in
the dependent t-test, "related groups" indicates that the same participants are
present in both groups. It is possible to have the same participants in each group because each participant has been measured on two occasions on the same dependent variable.

Does the dependent t-test test for "changes" or "differences" between related
groups?
The dependent t-test can be used to test either a "change" or a "difference" in
means between two related groups, but not both at the same time. Whether you are
measuring a "change" or "difference" between the means of the two related groups
depends on your study design.
What are the assumptions of the dependent t-test?
(1) Samples are randomly selected;
(2) Data are expressed at the interval or ratio level;
(3) Data are normally distributed (specifically, the differences between the paired scores); and
(4) Data have equal variances.

What hypothesis is being tested?


The dependent t-test is testing the null hypothesis that there are no
differences between the means of the two related groups. We can express this as
follows:
H0: µ1 = µ2
HA: µ1 ≠ µ2

What is the advantage of a dependent t-test over an independent t-test?


The major advantage of choosing a repeated-measures design (and therefore running a dependent t-test) is that you eliminate the individual differences that occur between participants – the fact that no two people are the same – and this increases the power of the test. What this means is that you are more likely to detect a (statistically significant) difference, if one exists, using the dependent t-test than using the independent t-test.

How do I report the result of a dependent t-test?

You need to report the test as follows: t(df) = t-value, p = p-value, where df = N – 1 and N = the number of participants.

NOW THAT WE ALREADY KNOW THE PRINCIPLES BEHIND SELECTING THE DEPENDENT T-TEST, LET US EXPLORE FURTHER BY UNDERSTANDING HOW THE TEST IS RUN THROUGH MICROSOFT EXCEL.
How to do t-Tests in Excel?
For this example, imagine that we have a training program in Disaster Preparedness, and we need to determine whether the mean pretest score and the mean posttest score differ significantly.
To perform a paired t-test in Excel, arrange your data into two columns so that each row represents one person or item, as shown below. Note that the analysis does not use the subject's ID number.

1. In Excel, click Data Analysis on the Data tab.


2. From the Data Analysis popup, choose t-Test: Paired Two Sample for Means.
3. Under Input, select the ranges for both Variable 1 and Variable 2.
4. In Hypothesized Mean Difference, you’ll typically enter zero. This value is the
null hypothesis value, which represents no effect. In this case, a mean
difference of zero represents no difference between the two methods, which is
no effect.
5. Check the Labels checkbox if you have meaningful variable labels in row 1. This option helps make the output easier to interpret. Ensure that you include the label row in step #3.
6. Excel uses a default Alpha value of 0.05, which is usually a good value. Alpha is the significance level. Change this value only when you have a specific reason for doing so.
7. Click OK.
For the example data, your popup should look like the image below:
The output indicates that the mean for the Pretest is 97.06223 and for the Posttest it is 107.8346. If the p-value is less than your significance level, the difference between the means is statistically significant. Again, Excel provides p-values for both one-tailed and two-tailed t-tests, and we'll stick with the two-tailed result. For our results, we'll use P(T<=t) two-tail, which is the p-value for the two-tailed form of the t-test. Because our p-value (0.002221) is less than the standard significance level of 0.05, we can reject the null hypothesis. Our sample data support the hypothesis that the population means are different. Specifically, the Posttest mean is greater than the Pretest mean.
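As with the independent test, you can cross-check the ToolPak result with Excel's T.TEST worksheet function. Assuming, purely for illustration, that the Pretest scores are in B2:B16 and the Posttest scores are in C2:C16 (adjust the ranges to match your own sheet):

=T.TEST(B2:B16, C2:C16, 2, 1)

The fourth argument, 1, selects the paired form of the test, and the third argument, 2, requests the two-tailed p-value; the result should match the P(T<=t) two-tail value in the ToolPak output.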
MOVING FURTHER, LET US FIND OUT HOW TO TREAT CAUSAL
QUESTIONS CONSIDERING THREE OR MORE GROUPS OF SAMPLES.
Unlike the previous discussions, in which we learned to determine the difference between two groups, whether paired or unpaired, here we will learn to determine mean differences among three or more groups.

THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)


One-way ANOVA is a hypothesis test that allows you to compare three or
more group means.

How to run One-Way ANOVA in Excel?


One-way Analysis of Variance (ANOVA) assumes that you have one categorical factor as the independent variable and one continuous dependent variable. The values of the categorical factor divide the continuous data into groups. The test determines whether the mean differences between these groups are statistically significant. For example, if fertilizer type is your categorical variable, you can determine whether the differences between the plant growth means for at least three fertilizers are statistically significant.
To perform one-way ANOVA in Excel, choose the option shown below.
The standard hypotheses for one-way ANOVA are the following:
(1) Null: All group means are equal.
(2) Alternative: Not all group means are equal.

If the p-value is less than your significance level (usually 0.05), reject the null
hypothesis. Your sample data support the hypothesis that the mean of at least one
population is different from the other population means.

Step-by-Step Instructions for Running a One Factor ANOVA in Excel

Let’s conduct a one-way ANOVA!


Our example scenario is that we are comparing the strength of raw material from four suppliers. Supplier is our categorical independent variable (factor), while strength is the continuous dependent variable. We draw a random sample of 10 units of material from each supplier and measure the strength of all units. Now, we want to determine whether the mean strengths of the material from the four suppliers are different.
To perform a one-way ANOVA in Excel, arrange your data in columns, as shown below. For our example, each column represents raw material from one supplier.
In Excel, do the following steps:
1. Click Data Analysis on the Data tab.
2. From the Data Analysis popup, choose Anova: Single Factor.
3. Under Input, select the ranges for all columns of data.
4. In Grouped By, choose Columns.
5. Check the Labels checkbox if you have meaningful variable labels in row 1. This option helps make the output easier to interpret. Ensure that you include the label row in step #3.
6. Excel uses a default Alpha value of 0.05, which is usually a good value. Alpha is the significance level. Change this value only when you have a specific reason for doing so.
7. Click OK.
Here’s how the popup should look:

Interpreting the One-Way ANOVA Results

The Summary table indicates that the mean strengths range from a low of 8.837952 for supplier 4 to a high of 11.20252 for supplier 1. Our sample means are different. However, we need to determine whether our data support the notion that the population means are not equal. The differences we see in our samples might be the result of random sampling error.
In the ANOVA table, the p-value is 0.031054. Because this value is less than our significance level of 0.05, we reject the null hypothesis. Our sample data provide strong enough evidence to conclude that not all four population means are equal.
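If you want to see where the ToolPak's F statistic and p-value come from, the same quantities can be reproduced with ordinary worksheet formulas. The cell references below are illustrative only: they assume the four suppliers' strength values sit in A2:A11, B2:B11, C2:C11, and D2:D11 (k = 4 groups, N = 40 observations) and that the intermediate results are stored in F2:F5.

In F2: =DEVSQ(A2:A11)+DEVSQ(B2:B11)+DEVSQ(C2:C11)+DEVSQ(D2:D11) (the within-group sum of squares, SSW)
In F3: =DEVSQ(A2:D11)-F2 (the between-group sum of squares, SSB, i.e., the total sum of squares minus SSW)
In F4: =(F3/3)/(F2/36) (the F statistic, MSB/MSW, with df1 = k - 1 = 3 and df2 = N - k = 36)
In F5: =F.DIST.RT(F4, 3, 36) (the right-tailed p-value)

The values in F4 and F5 should match the F and p-value reported in the ToolPak's ANOVA table.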

PEARSON PRODUCT-MOMENT CORRELATION

What does this test do?

The Pearson product-moment correlation coefficient (or Pearson correlation


coefficient, for short) is a measure of the strength of a linear association between two
variables and is denoted by r. Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are from this line of best fit.

What values can the Pearson correlation coefficient take?

The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there is no association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one variable increases, so does the value of the other variable. A value less than 0 indicates a negative association; that is, as the value of one variable increases, the value of the other variable decreases.
How can we determine the strength of association based on the Pearson correlation
coefficient?

The stronger the association of the two variables, the closer the Pearson
correlation coefficient, r, will be to either +1 or -1 depending on whether the
relationship is positive or negative, respectively. Achieving a value of +1 or -1 means
that all your data points are included on the line of best fit – there are no data points
that show any variation away from this line. Values for r between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there is variation around the line of best fit. The closer the value of r is to 0, the greater the variation around the line of best fit.

Are there guidelines to interpreting Pearson's correlation coefficient?

Yes, the following guidelines have been proposed:


Strength of Association    Coefficient, r (Positive)    Coefficient, r (Negative)
Small                      .1 to .3                     -0.1 to -0.3
Medium                     .3 to .5                     -0.3 to -0.5
Large                      .5 to 1.0                    -0.5 to -1.0

Remember that these values are guidelines and whether an association is


strong or not will also depend on what you are measuring.
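If you want Excel to apply these guidelines for you, a small nested IF formula can label a coefficient. The cell reference E2 below is just a placeholder for wherever your r value sits, and the cut-offs simply mirror the table above:

=IF(ABS(E2)>=0.5,"Large",IF(ABS(E2)>=0.3,"Medium",IF(ABS(E2)>=0.1,"Small","Negligible")))

Because the table's bands share their boundary values, the formula assigns each boundary (e.g., 0.3) to the higher band; treat the labels as rough descriptions rather than firm rules.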

Can you use any type of variable for Pearson's correlation coefficient?

No, the two variables have to be measured on either an interval or ratio scale. However, the two variables do not need to be measured on the same scale (e.g., one variable can be ratio and the other can be interval).

What about dependent and independent variables?

The Pearson product-moment correlation does not take into consideration


whether a variable has been classified as a dependent or independent variable. It
treats all variables equally. For example, you might want to find out whether
basketball performance is correlated to a person's height. You might, therefore, plot a
graph of performance against height and calculate the Pearson correlation
coefficient. Let’s say, for example, that r = .67. That is, as height increases so does
basketball performance. This makes sense. However, if we plotted the variables the
other way around and wanted to determine whether a person's height was
determined by their basketball performance (which makes no sense), we would still
get r = .67. This is because the Pearson correlation coefficient takes no account of any theory behind why you chose the two variables to compare.

What assumptions does Pearson's correlation make?

1. Your two variables should be measured on a continuous scale (i.e., they are
measured at the interval or ratio level). Examples of continuous variables include
revision time (measured in hours), intelligence (measured using IQ score), exam
performance (measured from 0 to 100), weight (measured in kg), driving speed
(measured in km/h) and so forth.

2. Your two continuous variables should be paired, which means that each case
(e.g., each participant) has two values: one for each variable. These "values" are
also referred to as "data points".

For example, imagine that you had collected the revision times (measured in
hours) and exam results (measured from 0 to 100) from 100 randomly sampled
students at a university (i.e., you have two continuous variables: "revision time" and
"exam performance"). Each of the 100 students would have a value for revision time
(e.g., "student #1" studied for "23 hours") and an exam result (e.g., "student #1"
scored "81 out of 100"). Therefore, you would have 100 paired values.

3. There should be independence of cases, which means that the two observations
for one case (e.g., the scores for revision time and exam performance for "student
#1") should be independent of the two observations for any other case (e.g., the
scores for revision time and exam performance for "student #2", or "student #3", or
"student #50", for example).

4. There should be a linear relationship between your two continuous variables. To test whether your two variables form a linear relationship, you simply need to plot them on a graph (a scatter plot, for example) and visually inspect the graph's shape. In the diagram below, you will find a few different examples of linear relationships and some non-linear relationships. It is not appropriate to analyze a non-linear relationship using a Pearson product-moment correlation.

5. Theoretically, both continuous variables should follow a bivariate normal


distribution, although in practice it is frequently accepted that simply having univariate
normality in both variables is sufficient (i.e., each variable is normally distributed).

6. There should be homoscedasticity, which means that the variances along the line
of best fit remain similar as you move along the line. If the variances are not similar,
there is heteroscedasticity.

7. There should be no univariate or multivariate outliers. An outlier is an observation


within your sample that does not follow a similar pattern to the rest of your data.
Remember that in a Pearson’s correlation, each case (e.g., each participant) will
have two values/observations (e.g., a value for revision time and an exam score).
You need to consider outliers that are unusual only on one variable, known as
"univariate outliers", as well as those that are an unusual "combination" of both
variables, known as "multivariate outliers".

How to use the PEARSON Function in Excel?

In Excel, there is a function available to calculate the Pearson correlation coefficient. However, there is no simple means of calculating a p-value for it. A way around this is to first calculate a t-statistic, which will then be used to determine the p-value.

1. In Excel, click on an empty cell where you want the correlation coefficient to be
entered. Then enter the following formula:
=PEARSON(array1, array2)
Simply replace ‘array1‘ with the range of cells containing the first variable and
replace ‘array2‘ with the range of cells containing the second variable.

For the example above, the Pearson correlation coefficient (r) is ‘0.76‘.

2. Calculate the t-statistic from the coefficient value


The next step is to convert the Pearson correlation coefficient value to a t-statistic. To do this, two components are required: r and the number of pairs in the test (n).
To determine the number of pairs, simply count them manually or use the count function (=COUNT). Each observation must form a complete pair, so remove any entries that are missing a value for either variable.
The equation used to convert r to the t-statistic is t = (r × √(n − 2)) / √(1 − r²).

The formula to do this in Excel can be found below.

=(r*SQRT(n-2))/(SQRT(1-r^2))

Simply replace the ‘r‘ with the correlation coefficient value and replace the ‘n‘
with the number of observations in the analysis.

For the example in this guide, the formula used in Excel can be seen below.

Note, if your coefficient value is negative, then use the following formula:

=(ABS(r)*SQRT(n-2))/(SQRT(1-ABS(r)^2))

The addition of the ABS function converts the coefficient value to an absolute
(positive) number. Otherwise, a negative coefficient value will bring up an error.

3. Calculate the p-value from the t statistic

The final step in the process of calculating the p-value for a Pearson
correlation test in Excel is to convert the t-statistic to a p-value.
Before this can be done, we just need to calculate a final piece of information:
the number of degrees of freedom (DF). The DF can be found by subtracting 2 from
n (n – 2).
Now we are ready to calculate the p-value. To do this, simply use the =TDIST function in Excel:
=TDIST(x, deg_freedom, tails)
Replace the 'x' with the t-statistic created previously and replace the 'deg_freedom' with the DF. Finally, for the tails, enter the number '1' for a one-tailed analysis or '2' for a two-tailed analysis. If you are unsure about which to use, use a two-tailed analysis ('2').
The figure at the right shows how this looks in Excel using the example.
In the example, the p-value is '0.006'. Therefore, there is a significant positive correlation (r = 0.76) between participant ages and their BMI.
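If you prefer to do the whole calculation in a single cell, the steps above can be chained together. The ranges below are placeholders only; they assume the first variable (e.g., age) is in A2:A21 and the second (e.g., BMI) is in B2:B21, so adjust them to your own data:

=TDIST((ABS(PEARSON(A2:A21,B2:B21))*SQRT(COUNT(A2:A21)-2))/SQRT(1-PEARSON(A2:A21,B2:B21)^2), COUNT(A2:A21)-2, 2)

This computes r with PEARSON, converts it to a t-statistic using the equation above (with ABS so that a negative r does not produce an error), and passes it to TDIST with df = n − 2 and 2 tails to return the two-tailed p-value.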

EXERCISES

1. T-TEST FOR TWO SAMPLE MEANS (INDEPENDENT)

Below is a link to an Excel spreadsheet containing data on the sizes of oak


seedlings of two possible ecotypes (populations adapted to the local ecological
conditions where they evolved). One population (Local) originally came from a site
near the Twin Cities, the other (Southern) from a site in south-central Wisconsin. In
2012, when they were one year old and 20-30 cm tall, samples of both were planted
in the same habitat restoration site just to the east of the Twin Cities. The scientists studying the seedlings wanted to know whether the two ecotypes were growing at different rates, so they measured them again in 2015, after three years of growth. These data can be analyzed with a two-sample t-test because they are normally distributed and are not paired (i.e., there is no sensible way of pairing any individual Local seedling with a particular Southern seedling).

LINK:https://www.stthomas.edu/media/collegeofartsandsciences/biology/statspages/
FishCreekseedlings2015.xlsx

2. T-TEST FOR TWO MEANS (DEPENDENT)

Below is an Excel spreadsheet containing data on the root biomass in


samples of soil collected at two different depths (0-7 cm and 7-14 cm) from 38
different locations in an old pasture field. The ecologists who collected these data
were interested in comparing the amount of root material at different depths, and
since the data are clearly paired (the 0-7 cm depth sample from each location can be
paired with the 7-14 cm sample from the same location), a paired t-test is appropriate
for this analysis.
Transect    Distance along transect    Root Biomass 0-7 cm    Root Biomass 7-14 cm
T1 L2 6.224 2.175
T1 L3 8.93 1.863
T1 L5 6.201 1.533
T1 L6 5.064 2.021
T1 L7 3.617 2.084
T2 L3 3.979 1.58
T2 L5 5.467 1.35
T2 L6 4.233 1.031
T3 L1 4.905 1.34
T3 L2 5.143 2.104
T3 L3 3.126 1.886
T3 L4 5.739 1.604
T3 L5 6.667 1.341
T4 L1 4.088 1.611
T4 L2 5.373 3.017
T4 L3 4.326 1.257
T4 L4 4.235 0.634
T4 L5 5.142 2.61
T4 L6 3.6 0.719
T4 L7 5.327 1.067
T5 L1 4.024 1.972
T5 L2 4.48 1.603
T5 L3 4.87 1.654
T5 L4 7.415 2.947
T5 L6 6.283 1.414
T6 L1 5.12 1.189
T6 L2 3.162 0.688
T6 L3 5.273 2.877
T6 L4 5.516 1.743
T6 L5 5.436 2.055
T6 L6 4.932 1.919
T6 L7 6.711 1.337
T7 L1 3.611 2.027
T7 L2 5.835 1.488
T7 L3 3.254 1.027
T7 L4 3.579 1.992
T7 L6 6.522 2.642
T7 L7 4.731 1.318
Note that the data are arranged differently from the two sample (unpaired) t-
test described above. There, the data for the two samples of the dependent variable
(seedling height) were all in one column and the two categories of the independent
variable (Local vs. Southern) were in a second column. Here the data for the two
samples (0-7 cm and 7-14 cm depths) are in different columns.
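If you paste the table above into Excel with its header in row 1 (so that Transect is in column A, distance along transect in column B, and the two biomass columns in C and D, with the 38 observations in rows 2 through 39), one quick check of the result you get from t-Test: Paired Two Sample for Means is the worksheet formula:

=T.TEST(C2:C39, D2:D39, 2, 1)

which returns the two-tailed p-value for the paired test (test type 1). Adjust the ranges if your layout differs.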

3. ANOVA

Neuroscience researchers examined the impact of environment on rat development. Rats were randomly assigned to be raised in one of the four following test conditions: impoverished (wire mesh cage, housed alone), standard (cage with other rats), enriched (cage with other rats and toys), or super enriched (cage with other rats and toys changed on a periodic basis). After two months, the rats were tested on a variety of learning measures (including the number of trials to learn a maze to a three-perfect-trial criterion) and several neurological measures (overall cortical weight, degree of dendritic branching, etc.). The data for the maze task are below. Compute the appropriate test for the data provided below.

Impoverished Standard Enriched Super Enriched


22 17 12 8
19 21 14 7
15 15 11 10
24 12 9 9
18 19 15 12
