
Lecture Notes 8 – Testing of Hypothesis

Engr. Caesar Pobre Llapitan

Unit I
HYPOTHESIS TESTING

I. INTRODUCTION

We actually use two statistical hypotheses, the null hypothesis and the alternative hypothesis. In
other words, given a research question our next task is to state this question in the form of a null
hypothesis and an alternative hypothesis.

The null hypothesis states that a population parameter is equal to some specific value. The symbol for
the null hypothesis is H sub zero, H0. Null stands for zero, hence the symbol. The null hypothesis is
also thought of as the hypothesis of no difference: for example, the hypothesis of no difference
between the experimental group and the control group in an experiment.

The alternative hypothesis states that a population parameter is equal to some value other than that
stated by the null hypothesis. The alternative hypothesis is in the direction we would wish our
experiment to turn out and thus is really a statement of the research question in the form of a
statistical hypothesis. The symbol for the alternative hypothesis is H sub 1, H1, or H sub A, HA. In
these lessons we will use the H1 format although our text uses the HA format.

To summarize then, given a research problem, if we wish to test the significance of our results, we
must state our research question as a pair of statistical hypotheses. The null hypothesis, H0, states that
a population parameter (usually the mean) is equal to some specific value. The alternative hypothesis,
H1, states that the population parameter is equal to some value other than that stated by the null
hypothesis.

Generally, the alternative hypothesis has one of three forms.


1. The selected parameter is greater than that specified by the null hypothesis.
2. The selected parameter is less than that specified by the null hypothesis.
3. The selected parameter is not equal to that specified by the null hypothesis.

This does seem like a rather backward process, stating our result as no result (the null hypothesis)
and then attempting to reject this hypothesis so that we can accept the alternative hypothesis. In 1935,
Sir Ronald Fisher (quoted in Couch, 1987) stated it as follows.

"In relation to any experiment we may speak of this hypothesis as the null hypothesis and it should be
noted that the null hypothesis is never proved or established, but is possibly disproved, in the course
of experimentation. Every experiment may be said to exist only in order to give the facts a chance of
disproving the null hypothesis."

Statistical Hypotheses
Be sure to read through the definitions for this section before trying to make sense out of the
following.

The first thing to do when given a claim is to write the claim mathematically (if possible), and decide
whether the given claim is the null or alternative hypothesis. If the given claim contains equality, or a
statement of no change from the given or accepted condition, then it is the null hypothesis,
otherwise, if it represents change, it is the alternative hypothesis.

The following example is not a mathematical example, but may help introduce the concept.

Example
"He's dead, Jim," said Dr. McCoy to Captain Kirk.

Mr. Spock, as the science officer, is put in charge of statistically determining the correctness of Bones'
statement and deciding the fate of the crew member (to vaporize or try to revive). His first step is to
arrive at the hypothesis to be tested.

Does the statement represent a change in previous condition?


- Yes, there is change; thus it is the alternative hypothesis, H1.
- No, there is no change; therefore it is the null hypothesis, H0.

The correct answer is that there is change. Dead represents a change from the accepted state of alive.
The null hypothesis always represents no change. Therefore, the hypotheses are:
- H0: Patient is alive.
- H1: Patient is not alive (dead).

States of nature are something that you, as a statistician, have no control over. Either it is, or it isn't.
This represents the true nature of things.

Possible states of nature (Based on H0)


- Patient is alive (H0 true, H1 false)
- Patient is dead (H0 false, H1 true)

Decisions are something that you have control over. You may make a correct decision or an incorrect
decision. It depends on the state of nature as to whether your decision is correct or in error.

Possible decisions (Based on H0) / Conclusions (Based on claim)


- Reject H0 / "Sufficient evidence to say patient is dead"
- Fail to Reject H0 / "Insufficient evidence to say patient is dead"

There are four possibilities that can occur based on the two possible states of nature and the two
decisions which we can make.
Statisticians will never accept the null hypothesis; we will only fail to reject it. In other words, we'll say that it
isn't, or that we don't have enough evidence to say that it isn't, but we'll never say that it is, because
someone else might come along with another sample which shows that it isn't, and we don't want to
be wrong.

Statistically (double) speaking:

                                     State of Nature
    Decision             H0 True                           H0 False
    Reject H0            Patient is alive,                 Patient is dead,
                         sufficient evidence of death      sufficient evidence of death
    Fail to reject H0    Patient is alive,                 Patient is dead,
                         insufficient evidence of death    insufficient evidence of death

In English:

                                     State of Nature
    Decision             H0 True                           H0 False
    Reject H0            Vaporize a live person            Vaporize a dead person
    Fail to reject H0    Try to revive a live person       Try to revive a dead person

Were you right?

                                     State of Nature
    Decision             H0 True                           H0 False
    Reject H0            Type I Error (alpha)              Correct Assessment
    Fail to reject H0    Correct Assessment                Type II Error (beta)

Which of the two errors is more serious? Type I or Type II?

Since Type I is the more serious error (usually), that is the one we concentrate on. We usually pick
alpha to be very small (0.05, 0.01). Note: alpha is not a Type I error. Alpha is the probability of
committing a Type I error. Likewise, beta is the probability of committing a Type II error.

Conclusions
Conclusions are sentence answers which include whether there is enough evidence or not (based on
the decision), the level of significance, and whether the original claim is supported or rejected.
Conclusions are based on the original claim, which may be the null or alternative hypothesis. The
decisions are always based on the null hypothesis.

The original claim may be H0 (conclusion wording: "REJECT") or H1 (conclusion wording: "SUPPORT"):

    Decision: Reject H0 ("SUFFICIENT")
    - If the original claim is H0: There is sufficient evidence at the alpha level of
      significance to reject the claim that (insert original claim here).
    - If the original claim is H1: There is sufficient evidence at the alpha level of
      significance to support the claim that (insert original claim here).

    Decision: Fail to reject H0 ("INSUFFICIENT")
    - If the original claim is H0: There is insufficient evidence at the alpha level of
      significance to reject the claim that (insert original claim here).
    - If the original claim is H1: There is insufficient evidence at the alpha level of
      significance to support the claim that (insert original claim here).

Activity in Stating Statistical Hypotheses


For each of the following research problems, state the null hypothesis and the alternative hypothesis.
1. Sample one bottle to see if the beer machine is putting too much beer in the bottle. μ = 16.02
ounces.
In this problem we are taking a sample of one observation and seeing how it compares with
the population value. We are only interested in the case where the machine is putting too
much beer into the bottle, so our alternative hypothesis will only involve the greater-than case.
Our two hypotheses for this problem are:
H0: μ = 16.02
H1: μ > 16.02

2. Is the mean time for running the mile for a group of 20 joggers undergoing a fitness program
significantly less than the population value of 10 minutes (μ = 10)?
In this research problem we are comparing the mean time for a group of joggers with a known
population mean. So our null hypothesis will be
H0: μ = 10

There are three possibilities for the alternative hypothesis.


1. The joggers' mean is less than the population mean.
2. The joggers' mean is greater than the population mean.
3. The joggers' mean is not equal to the population mean; that is to say, it could
either be less than or greater than the population mean.

In this particular case we are concerned with the joggers' time being less than the population
value, so our alternative hypothesis would be:
H1: μ < 10

3. Is there a significant difference in tested IQ between a group of students who have taken a
special program aimed at changing cognitive functioning as compared with a matched control
group who did not participate in the special program?
In this problem, a very common one in research, we are comparing the means of two groups,
an experimental group and a control group. In this case the null hypothesis would be stated as
H0: μ1 = μ2

Again, there are three possibilities for the alternative hypothesis.


1. The mean of the first group is higher than that of the second group.
2. The mean of the first group is lower than that of the second group.
3. The mean of the first group is different than the mean of the second group.

Since we are interested in the case where there is a significant difference between the two
groups, we would select the third option and the alternative hypothesis would be
H1: 1  2
4. Is there a significant correlation between reading and spelling for fifth grade pupils?
In this research problem we are dealing with the significance of a correlation coefficient. We
have a single group of students measured on two variables (reading and spelling). The null
hypothesis, the hypothesis of no difference, would be that the correlation is zero. The Greek
letter rho is used to represent the population correlation, so the null hypothesis would be
H0: ρ = 0

In this problem we are asking if there is a significant correlation between reading and spelling.
We did not specify whether we wanted a significant positive correlation or a significant
negative correlation. Thus, our alternative hypothesis would look for a correlation that was
significantly different than zero in either direction or significantly not equal to zero. The
alternative hypothesis for this problem would be
H1: ρ ≠ 0

II. TYPES OF TESTS

The Decision-Making Process


The process of statistical decision making for research involves setting up a null hypothesis and then
either rejecting or failing to reject the null hypothesis. If we fail to reject the null hypothesis, that is
the end of the process. When we fail to reject the null hypothesis, we can say that the results of our
experiment are not significant. We could also say that our results are inconclusive. However, if we
reject the null hypothesis, we can then accept the alternative hypothesis, and indicate that our results
were significant. There are generally three ways we can talk about the significance of the results based
on how the alternative hypothesis was stated.
1. The experimental group is significantly higher than the control group or the correlation is a
significant positive correlation.

2. The experimental group is significantly lower than the control group or the correlation is a
significant negative correlation.
3. The experimental group is significantly different than the control group or the correlation is
significant without regard to direction.

The last form of the alternative hypothesis is referred to as a two-tailed test; that is, it can be
significant in either of two directions. The other two options are referred to as one-tailed tests. We
only look for significance in a single direction.

The type of test is determined by the Alternative Hypothesis (H1)

Left Tailed Test


H1: parameter < value
Notice the inequality points to the left
Decision Rule: Reject H0 if t.s. < c.v.

Right Tailed Test


H1: parameter > value
Notice the inequality points to the right
Decision Rule: Reject H0 if t.s. > c.v.

Two Tailed Test


H1: parameter not equal value
Another way to write not equal is < or >
Notice the inequality points to both sides
Decision Rule: Reject H0 if t.s. < c.v. (left) or
t.s. > c.v. (right)

The decision rule can be summarized as follows:


Reject H0 if the test statistic falls in the critical region
Reject H0 if the test statistic is more extreme than the critical value

Types of Error
Let's take the case in which the null hypothesis, H0, is true (there truly is no difference between the
two groups). In this case if we reject H0 we are making an error. This type of error (rejecting H0 when
we shouldn't have) is referred to as a type I error. It is also referred to as the alpha level or
significance level of the experiment. This type of error can be controlled by the experimenter as he or
she sets the significance level of the experiment. A common level for alpha is .05 or the 5% level.
Another way of thinking of the alpha level is that it is the probability of making a type I error. So, if we
set α = 0.05 we are saying that we are willing to make a type I error 5% of the time.

On the other hand, if we fail to reject H0 when it is in fact true, we are making the correct decision.
With alpha at .05 we would expect to do so 95% of the time.

Now let's take the case where the true status of the null hypothesis is false. In that case we should
reject it. To reject a false H0 is the correct decision. On the other hand, if H0 is false and we fail to
reject it then we are making an error. This type of error (failing to reject H0 when we should have
rejected it) is referred to as a type II error. Beta is used as the symbol for the probability of making a
type II error. Beta, the probability of making a type II error, cannot be set by the experimenter as
can the alpha level, but beta is related to alpha. The higher the alpha level is set (here we mean a less

probable setting, .01 is higher than .05, and .001 is higher than .01) the more likely it is that we will
make a type II error (or the higher the beta level is). The lower the alpha level (.05 rather than .01) the
less likely we are to make a type II error. We are in somewhat of a dilemma here. If we set alpha high
then we are less likely to make a type I error but are more likely to make a type II error. On the other
hand, if we set the alpha level low, we are more likely to make a type I error but less likely to make a
type II error.

This is confusing to the potential researcher, but one way of getting around it is just to set your alpha
level at .05 (and not at .01 or .001). In this way you are balancing the relationship between type I and
type II errors in your decision-making process.

The information we have discussed is summarized in the following table.

Null Hypothesis Decision Table

                                 True Status of Null Hypothesis
    Decision                 H0 is True                H0 is False
    Reject H0                Type I Error (α level)    Correct Decision
    Fail to Reject H0        Correct Decision          Type II Error (β level)

As a final thought we might also add that although we cannot control type II error (the beta level) directly
except by lowering the alpha level, different statistics at the same alpha level are more or less resistant to
type II error. This characteristic of a statistic is called the power of a statistic. A more
powerful statistic is less likely to yield a type II error. The power of a statistic is one minus beta; it is
the tendency of a statistic not to make a type II error.

    Power = 1 − β
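
To make the idea of power concrete, here is a minimal Python sketch (the scipy library is assumed to be available, and the numbers are hypothetical, not from a problem in these notes) that computes beta and power for a one-tailed Z-test:

    from math import sqrt
    from scipy.stats import norm

    # Hypothetical one-tailed test: H0: mu = 100 vs H1: mu > 100,
    # with sigma = 15, n = 9, alpha = 0.05, and a true mean of 110.
    mu0, mu_true, sigma, n, alpha = 100, 110, 15, 9, 0.05

    se = sigma / sqrt(n)                             # standard error of the mean
    x_crit = mu0 + norm.ppf(1 - alpha) * se          # reject H0 if x-bar exceeds this
    beta = norm.cdf(x_crit, loc=mu_true, scale=se)   # P(fail to reject | H0 false)
    power = 1 - beta
    print(beta, power)                               # beta ~ 0.36, power ~ 0.64

Note that beta, and therefore power, depends on the true value of the parameter, which is one reason the experimenter cannot set beta directly.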

III. CONFIDENCE INTERVALS AS TESTS

Using the confidence interval to perform a hypothesis test only works with a two-tailed test.
- If the hypothesized value of the parameter lies within the confidence interval with a 1-alpha
level of confidence, then the decision at an alpha level of significance is to fail to reject the null
hypothesis.
- If the hypothesized value of the parameter lies outside the confidence interval with a 1-alpha
level of confidence, then the decision at an alpha level of significance is to reject the null
hypothesis.

However, it has a couple of problems.


- It only works with two-tailed hypothesis tests.
- It requires that you compute the confidence interval first. This involves taking a z-score or t-
score and converting it into an x-score, which is more difficult than standardizing an x-score
(a sketch follows below).
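
As a rough illustration, here is a minimal Python sketch (scipy assumed available; the numbers are hypothetical) of using a two-tailed confidence interval as a test:

    from math import sqrt
    from scipy.stats import norm

    # Hypothetical two-tailed test of H0: mu = 100 at alpha = 0.05, sigma known
    x_bar, mu0, sigma, n, alpha = 113, 100, 15, 9, 0.05

    z_crit = norm.ppf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    se = sigma / sqrt(n)                # standard error of the mean
    ci = (x_bar - z_crit * se, x_bar + z_crit * se)

    # Fail to reject H0 only if the hypothesized mean lies inside the interval
    decision = "fail to reject H0" if ci[0] <= mu0 <= ci[1] else "reject H0"
    print(ci, decision)                 # (103.2, 122.8) -> reject H0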

IV. HYPOTHESIS TESTING STEPS

Here are the steps to performing hypothesis testing


1. Write the original claim and identify whether it is the null hypothesis or the alternative
hypothesis.
2. Write the null and alternative hypothesis. Use the alternative hypothesis to identify the type
of test.
3. Write down all information from the problem.
4. Find the critical value using the tables

5. Compute the test statistic


6. Make a decision to reject or fail to reject the null hypothesis. A picture showing the critical
value and test statistic may be useful.
7. Write the conclusion.

V. TESTING A SINGLE MEAN

You are testing mu, you are not testing x bar. If you knew the value of mu, then there would be
nothing to test.

All hypothesis testing is done under the assumption the null hypothesis is true!

The value for all population parameters in the test statistics come from the null hypothesis. This is
true not only for means, but all of the testing we're going to be doing.

Population Standard Deviation Known


    z = (x̄ − μ) / (σ / √n)

If the population standard deviation, sigma, is known, then the sample mean has a normal
distribution, and you will be using the z-score formula for sample means. The test statistic is the
standard formula you've seen before.

The critical value is obtained from the normal table, or the bottom line of the t-table.

Population Standard Deviation Unknown


    t = (x̄ − μ) / (s / √n)

If the population standard deviation, sigma, is unknown, then the standardized sample mean has a Student's t
distribution, and you will be using the t-score formula for sample means. The test statistic is very
similar to that for the z-score, except that sigma has been replaced by s and z has been replaced by t.

The critical value is obtained from the t-table. The degrees of freedom for this test are n - 1.

General Pattern
Notice the general pattern of these test statistics is (observed - expected) / standard deviation.

    Test Statistic = (Observed − Expected) / Standard Deviation
Hypothesis Test: Pi = 3.2?
In 1897, legislation was introduced in Indiana which would make 3.2 the official value of pi for the
state. Now, that sounds ridiculous, but is it really?

Claim: Pi is 3.2.
To test the claim, we're going to generate a whole bunch of values for pi, and then test to see if the
mean is 3.2.
H0: μ = 3.2 (original claim)
H1: μ ≠ 3.2 (two-tailed test)

Procedure:
The area of the unit circle is pi. The area of the unit circle in the first quadrant is pi/4. The calculator
generates random numbers between 0 and 1. What we're going to do is generate two random numbers
which will simulate a randomly selected point in a unit square in the first quadrant. If the point is
within the circle, then the distance from (0, 0) will be less than or equal to 1, if the point is outside the
circle, the distance will be greater than 1.

Have the calculator generate a squared distance from zero (the square of the distance illustrates the
same properties as far as being less than 1 or greater than 1). Do this 25 times. Each time, record
whether the point is inside the circle (<1) or outside the circle (>1).

RAND^2 + RAND^2

Pi/4 is approximately equal to the ratio of the points inside the circle to the total number of points.
Therefore, pi will be 4 times the ratio of the points inside the circle to the total number of points.

This whole process is repeated several times, and the mean and standard deviation are recorded.

The hypothesis test is then conducted using the t-test to see if the true mean is 3.2 (based on the
sample mean).

Example:
20 values for pi were generated by generating 25 pairs of random numbers and checking to see if they
were inside or outside the circle as illustrated above.

3.68 3.20 3.04 2.56 3.36

3.36 3.36 3.52 3.04 3.20

3.52 3.36 3.04 2.72 3.36

3.52 2.88 2.88 3.68 2.60

The mean of the sample is 3.194, the standard deviation is 0.3384857923.

The test statistic t = (3.194 - 3.2) / (0.3384857923/sqrt(20)) = -0.0792730931

The critical value, with a 0.05 level of significance since none was stated, for a two-tailed test with 19
degrees of freedom is t = ±2.093.

Since the test statistic is not in the critical region, the decision is to fail to reject the null hypothesis.

There is insufficient evidence at the 0.05 level of significance to reject the claim that pi is 3.2.

Note the double speak, but it serves to illustrate the point. We would not dare to claim that pi was 3.2,
even though this sample seems to illustrate this. The sample doesn't provide enough evidence to show
it's not 3.2, but there may be another sample somewhere which does provide enough evidence (let's
hope so). So, we won't say it is 3.2, just that we don't have enough evidence to prove it isn't 3.2.
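
For readers who want to reproduce the experiment, here is a minimal Python sketch of the whole procedure (scipy assumed available; being random, the results will differ from run to run and from the sample shown above):

    import random
    from statistics import mean, stdev
    from scipy import stats

    def estimate_pi(pairs=25):
        # One pi estimate: 4 times the fraction of random points in the
        # unit square whose squared distance from (0, 0) is at most 1
        inside = sum(random.random()**2 + random.random()**2 <= 1
                     for _ in range(pairs))
        return 4 * inside / pairs

    estimates = [estimate_pi() for _ in range(20)]
    print(mean(estimates), stdev(estimates))

    # Two-tailed one-sample t-test of H0: mu = 3.2
    t_stat, p_value = stats.ttest_1samp(estimates, 3.2)
    print(t_stat, p_value)   # reject H0 only if p_value < 0.05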

VI. TESTING A SINGLE PROPORTION



You are testing p, you are not testing p hat. If you knew the value of p, then there would be nothing to
test.

All hypothesis testing is done under the assumption the null hypothesis is true!

The value for all population parameters in the test statistics come from the null hypothesis. This is
true not only for proportions, but all of the testing we're going to be doing.

    z = (p̂ − p) / √(pq / n)

The sample proportion has an approximately normal distribution if np and nq are both at least 5.
Remember that we are approximating the binomial using the normal, and that the p we're talking
about is the probability of success on a single trial. The test statistic is shown above.

The critical value is found from the normal table, or from the bottom row of the t-table.

The steps involved in the hypothesis testing remain the same. The only thing that changes is the
formula for calculating the test statistic and perhaps the distribution which is used.
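
As a minimal sketch (the counts are hypothetical and scipy is assumed available), a right-tailed test of a single proportion might look like this in Python:

    from math import sqrt
    from scipy.stats import norm

    # Hypothetical: H0: p = 0.5 vs H1: p > 0.5, with 58 successes in 100 trials
    p0, x, n = 0.5, 58, 100
    p_hat = x / n

    # Check the normal approximation: np and nq should both be at least 5
    assert n * p0 >= 5 and n * (1 - p0) >= 5

    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # p and q come from H0
    p_value = 1 - norm.cdf(z)                    # right-tailed area
    print(z, p_value)   # z = 1.6, p ~ 0.055 -> fail to reject at alpha = 0.05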

General Pattern
Notice the general pattern of these test statistics is (observed - expected) / standard deviation.

    Test Statistic = (Observed − Expected) / Standard Deviation
VII. PROBABILITY VALUES

Classical Approach
The Classical Approach to hypothesis testing is to compare a test statistic and a critical value. It is best
used for distributions which give areas and require you to look up the critical value (like the Student's
t distribution) rather than distributions which have you look up a test statistic to find an area (like the
normal distribution).

The Classical Approach also has three different decision rules, depending on whether it is a left-tailed,
right-tailed, or two-tailed test.

One problem with the Classical Approach is that if a different level of significance is desired, a
different critical value must be read from the table.

p-Value Approach
The P-Value Approach, short for Probability Value, approaches hypothesis testing in a different
manner. Instead of comparing z-scores or t-scores as in the classical approach, you're comparing
probabilities, or areas.
The level of significance (alpha) is the area in the critical region. That is, the area in the tails to the
right or left of the critical values.

The p-value is the area to the right or left of the test statistic. If it is a two-tail test, then look up the
probability in one tail and double it.

If the test statistic is in the critical region, then the p-value will be less than the level of significance. It
does not matter whether it is a left-tailed, right-tailed, or two-tailed test. This rule always holds.

Reject the null hypothesis if the p-value is less than the level of significance.

You will fail to reject the null hypothesis if the p-value is greater than or equal to the level of
significance.

The p-value approach is best suited for the normal distribution when doing calculations by hand.
However, many statistical packages will give the p-value but not the critical value. This is because it is
easier for a computer or calculator to find the probability than it is to find the critical value.

Another benefit of the p-value is that the statistician immediately knows at what level the testing
becomes significant. That is, a p-value of 0.06 would be rejected at a 0.10 level of significance, but it
would fail to reject at a 0.05 level of significance. Warning: Do not decide on the level of significance
after calculating the test statistic and finding the p-value.

Here is a proportion to help you keep the order straight. Any proportion equivalent to the following
statement is correct.

The test statistic is to the p-value as the critical value is to the level of significance.
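
The two approaches can also be compared directly in code. Here is a minimal Python sketch (scipy assumed available) showing that, for a right-tailed z-test, both approaches yield the same decision:

    from scipy.stats import norm

    z = 2.6          # test statistic from a right-tailed z-test
    alpha = 0.05

    # Classical approach: compare the test statistic to the critical value
    z_crit = norm.ppf(1 - alpha)        # about 1.645
    print(z > z_crit)                   # True -> reject H0

    # p-value approach: compare the tail area to alpha
    p_value = 1 - norm.cdf(z)           # area to the right of z
    print(p_value, p_value < alpha)     # about 0.0047, True -> same decision

    # For a two-tailed test, double the one-tail area before comparing to alpha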

Definitions of Key Terms


Null Hypothesis (H0)
- Statement of zero or no change. If the original claim includes equality (<=, =, or >=), it is the
null hypothesis.
- If the original claim does not include equality (<, not equal, >) then the null hypothesis is the
complement of the original claim.
- The null hypothesis always includes the equal sign. The decision is based on the null
hypothesis.

Alternative Hypothesis (H1 or Ha)


- Statement which is true if the null hypothesis is false. The type of test (left, right, or two-tailed)
is based on the alternative hypothesis.

Type I error
- Rejecting the null hypothesis when it is true (saying false when true). Usually the more serious
error.

Type II error
- Failing to reject the null hypothesis when it is false (saying true when false).

alpha
- Probability of committing a Type I error.

beta
- Probability of committing a Type II error.

Test statistic
- Sample statistic used to decide whether to reject or fail to reject the null hypothesis.

Critical region
- Set of all values which would cause us to reject H0.

Critical value(s)
- The value(s) which separate the critical region from the non-critical region.
- The critical values are determined independently of the sample statistics.

Significance level (alpha)
- The probability of rejecting the null hypothesis when it is true. alpha = 0.05 and alpha = 0.01
are common.
- If no level of significance is given, use alpha = 0.05.
- The level of significance is the complement of the level of confidence in estimation.

Decision
- A statement based upon the null hypothesis. It is either "reject the null hypothesis" or "fail to
reject the null hypothesis". We will never accept the null hypothesis.

Conclusion
- A statement which indicates the level of evidence (sufficient or insufficient), at what level of
significance, and whether the original claim is rejected (null) or supported (alternative).

Unit II
ONE SAMPLE STATISTICAL TESTS

I. INTRODUCTION TO SINGLE SAMPLE TESTS

In our last lesson we looked at the process for making inferences about research. In this context we
looked at the significance of a single score. We wanted to see if a score differed significantly from a
population value. To test statistical hypotheses involving a single score we calculated the score's Z-
score. We referred to this as the Z-score test. As a reminder, the formula for the Z-score (or the Z-score
test) was

    Z = (X − μ) / σ

In this lesson we are going to move on and look at inferential statistics to test hypotheses concerned
with comparing a single sample (instead of a single score) with some population parameter. We will
discuss two statistics to use with single samples.
1. We may wish to compare a sample mean with the population mean when the population
standard deviation is known. In that case we will use the Z-test. Do not confuse the Z-test
(used for a single sample) with the Z-score test (used with a single score).
2. If we wish to compare a sample mean with the population mean when the population
standard deviation is not known, we use the one-sample t-test.

II. THE Z-TEST

Research Problem: We randomly select a group of 9 subjects from a population with a mean IQ of 100
and standard deviation of 15 (μ = 100, σ = 15).

We give the subjects intensive "Get Smart" training and then administer an IQ test. The sample mean
IQ is 113 and the sample standard deviation is 10. Did the training result in a significant increase in IQ
score?

In this problem we see that we have a single sample and we wish to compare the sample mean with a
population mean. We know what the population standard deviation is. From what we have said we
can see that the inferential statistic we need to use here is the Z-test.
The formula for the Z-test is

    Z = (X̄ − μX̄) / σX̄

where X̄ is the sample mean (113 in our problem), and

μX̄ is the mean of the sampling distribution of the mean.

The sampling distribution of the mean is the distribution of the means of many samples taken from a
population. Its mean, μX̄, is the mean of all the means. In practice the mean of the sampling distribution is the
same as the population mean, so we can use μ instead of μX̄. In our problem the population
mean is 100.

σX̄ is the standard error of the mean. It is the standard deviation of many sample means.
Unfortunately for us the standard error of the mean does not equal the population standard deviation
but instead is equal to the population standard deviation (sigma) divided by the square root of the
sample size (n). So for our problem

    σX̄ = σ/√n = 15/√9 = 5
We are now ready to calculate the value of Z for our problem. We have all the information we need:
1. The sample mean = 113
2. The sampling distribution of the mean, which equals the population mean = 100
3. The standard error of the mean, which is the population standard deviation divided by the
square root of the sample size = 5

So the value of Z for our problem is

    Z = (X̄ − μ)/σX̄ = (113 − 100)/5 = 2.6

We can now go ahead and complete the six-step process for testing statistical hypotheses for our
research problem.

Example: Z-test in the process to test statistical hypotheses for a research problem

Research Problem: We randomly select a group of 9 subjects from a population with a mean IQ of 100
and standard deviation of 15 (μ = 100, σ = 15).

We give the subjects intensive "Get Smart" training and then administer an IQ test. The sample mean
IQ is 113 and the sample standard deviation is 10. Did the training result in a significant increase in IQ
score?

The research question for this experiment is - Does training subjects with the Get Smart training
program increase their IQ significantly over the average IQ for the general population? We will use
the six-step process to test statistical hypotheses for this research problem.
1. State null hypothesis and alternative hypothesis:
H0: μ = 100
H1: μ > 100

2. Set the alpha level: α = 0.05

3. Calculate the value of the proper statistic:


Since this problem involves comparing a single group's mean with the population mean and
the standard deviation for the population is known, the proper statistical test to use is the Z-
test.
Z = 2.6

4. State the rule for rejecting the null hypothesis:

We need to find the value of Z that will only be exceeded 5% of the time since we have set our
alpha level at .05. Since the Z score is normally distributed (or has the Z distribution), we can
find this 5% level by looking at the table in Appendix A in the textbook. We look for .45 in

column 2 (area from the mean to Z) since that point would have 5% of the scores at or higher
than it. The associated Z-score would be 1.64 (or 1.65).

Our rejection rule then would be: Reject H0 if Z ≥ 1.64.

5. Decision: Reject H0, p < .05, one-tailed.


Our decision rule said reject H0 if the Z value is equal to or greater than 1.64. Our Z value was
2.6 and 2.6 is greater than 1.64 so we reject H0. We also add to the decision the alpha level (p <
.05) and the tailedness of the test (one-tailed).

6. Statement of results: The average IQ of the group taking the Get Smart training program is
significantly higher than that of the general population.

If we reject the null hypothesis, we accept the alternative hypothesis. The statement of results
then states the alternative hypothesis which is the research question stated in the affirmative
manner.
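
The arithmetic of this example can be checked with a short Python sketch (the scipy library is assumed available for the table look-ups):

    from math import sqrt
    from scipy.stats import norm

    # Numbers from the "Get Smart" problem above
    x_bar, mu, sigma, n, alpha = 113, 100, 15, 9, 0.05

    se = sigma / sqrt(n)            # standard error = 15 / 3 = 5
    z = (x_bar - mu) / se           # (113 - 100) / 5 = 2.6
    z_crit = norm.ppf(1 - alpha)    # about 1.645 for a one-tailed test
    p_value = 1 - norm.cdf(z)

    print(z, z_crit, p_value)       # 2.6 > 1.645, p ~ 0.005 -> reject H0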

We mentioned that we use the Z-test to compare the mean of a sample with the population mean
when the population standard deviation is known. We will now turn to the statistic to use when the
standard deviation of the population is not known, the one-sample t-test.

III. THE ONE-SAMPLE T-TEST

Consider the following research problem: We have a random sample of 25 fifth grade pupils who can
do 15 pushups on the average, with a standard deviation of 9, after completing a special physical
education program. Does this value of 15 differ significantly from the population value of 12?
In this problem we are comparing a sample mean with a population mean but we do not know the
population standard deviation. We can't use the Z-test in this case but we can use the one-sample t-
test. The one sample t-test does not require the population standard deviation. The formula for the
one-sample t-test is

    t = (X̄ − μ) / SX̄

where X̄ is the sample mean, μ is the population mean, and SX̄ is the sample estimate of the
standard error of the mean.

In the problem we are considering, we do not know the population standard deviation (or the
standard error of the mean) so we estimate it from the sample data. The sample estimate of the
standard error of the mean is based on S (the sample standard deviation) and the square root of n (the
sample size).

    SX̄ = S/√n
If you look back at the research problem you will see that we have all the data we need to calculate
the value of t.
The sample mean, X̄ is 15.

The population mean, μ is 12.

The sample standard deviation, S is 9.



The sample size, n is 25.

We can thus calculate the value of t as follows:

    SX̄ = S/√n = 9/√25 = 1.8

    t = (X̄ − μ)/SX̄ = (15 − 12)/1.8 = 1.667

The t statistic is not distributed normally like the z statistic is, but is distributed as (guess what) the t-
distribution, also referred to as Student's distribution. We will use this distribution when we do the
six-step process for testing statistical hypotheses. To use the table for the t-distribution we need to
know one other piece of information, and that is the degrees of freedom for the one-sample t-test.

Degrees of freedom is a mathematical concept that involves the amount of freedom you have to
substitute various values in an equation. For example, say we have three numbers that add up to 44.
For the first two numbers we are free to use any numbers we wish but when we get to the third
number, we do not have any freedom of choice if the sum is to be 44. Therefore, we say that with the
three numbers we have two degrees of freedom.

For the one-sample t statistic the degrees of freedom (df) are equal to the sample size minus 1, or for
our research problem:

df  n  1  25  1  24

To put all this information together go ahead and look at the example problem using the one-sample
t-test.

Example: One-sample t-test in the process to test statistical hypotheses for a research problem
Research Problem: We have a random sample of 25 fifth grade pupils who can do 15 pushups on the
average, with a standard deviation of 9, after completing a special physical education program. Does
this value of 15 differ significantly from the population value of 12?

The research question for this experiment is - Does having students complete a special physical
education program result in a significantly different number of pushups they can do, as compared
with the population average? We will use the six-step process to test statistical hypotheses for this
research problem.
1. State null hypothesis and alternative hypothesis:
H0: μ = 12
H1: μ ≠ 12

In this problem the research question does not indicate the desired direction of the result, so
the alternative hypothesis will use the not equal choice. This means that a significant result
could either be significantly greater than the population value or significantly less than the
population value.

2. Set the alpha level: α = 0.05

3. Calculate the value of the proper statistic:


Since this problem involves comparing a single group's mean with the population mean and
the standard deviation for the population is not known, the proper statistical test to use is the
one-sample t-test.
t = 1.667

df = 24

4. State the rule for rejecting the null hypothesis:


We need to find the value of t that will only be exceeded 5% of the time in either direction if
we set the alpha level at .05. So that means we will be looking for the upper 2.5% of the
distribution and the lower 2.5% of the distribution (2.5 + 2.5 = 5%). In other words, we will be
using a two-tailed test. To find the significant values of t we use the table in Appendix C of the
textbook (page 318 - Distribution of t). If we enter the table under the heading "Level of
significance for two-tailed test" and .05 we read down the column until we come to the row
that has 24 degrees of freedom (See df column to the left of the table). We see that the table
value of t is 2.064, so this is the value we will use for our rejection rule.

Our rejection rule then would be: Reject H0 if t or if t  -2.064

Look at this rejection rule carefully as this is the general way to state the rejection rule for a
two-tailed test.

5. Decision: Fail to reject H0


Since the calculated value of t (1.667) is not greater than 2.064 nor less than -2.064, we cannot
reject the null hypothesis.

Since our decision was to fail to reject H0, we do not have to add the alpha level or the
tailedness of the test as we did when we rejected H0.

6. Statement of results: The number of pushups done by a group of fifth grade pupils who have
participated in a special physical education program does not differ significantly from the
population average.
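
The same computation in Python, working from the summary statistics (scipy assumed available):

    from math import sqrt
    from scipy.stats import t as t_dist

    # Numbers from the pushup problem above
    x_bar, mu, s, n, alpha = 15, 12, 9, 25, 0.05

    se = s / sqrt(n)                        # 9 / 5 = 1.8
    t_stat = (x_bar - mu) / se              # (15 - 12) / 1.8 = 1.667
    df = n - 1                              # 24
    t_crit = t_dist.ppf(1 - alpha / 2, df)  # about 2.064, two-tailed

    print(t_stat, t_crit)                   # |1.667| < 2.064 -> fail to reject H0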

Unit III
TWO PARAMETER TESTING

I. INTRODUCTION TO TWO SAMPLE TESTS

The two-sample t-tests we will consider are


1. The independent t-test, which is used to compare two sample means when the two samples
are independent of one another.
2. The non-independent or dependent t-test which is used for matched samples (where the two
samples are not independent of one another as they are matched) and for pre-test/post-test
comparisons where the pre-test and post-test are taken on the same group of subjects.

In this lesson we will consider the independent t-test, and in the next lesson we will consider the
dependent t-test.

II. THE INDEPENDENT T-TEST

The independent t-test, as we have already mentioned, is used when we wish to compare the statistical
significance of a possible difference between the means of two groups on some variable of interest
when the two groups are independent of one another.

The formula for the independent t-test is

    t = (X̄1 − X̄2) / √[ ((SS1 + SS2)/(n1 + n2 − 2)) · (1/n1 + 1/n2) ]
where
X̄1 is the mean for group 1,
X̄2 is the mean for group 2,
SS1 is the sum of squares for group 1,
SS2 is the sum of squares for group 2,
n1 is the number of subjects in group 1, and
n2 is the number of subjects in group 2.

The sum of squares is a new way of looking at variance. It gives us an indication of how spread out the
scores in a sample are. The t-value we are finding is the difference between the two means divided by
a standard error built from their sums of squares, taking the degrees of freedom into consideration.
    SS1 = ΣX1² − (ΣX1)²/n1

and

    SS2 = ΣX2² − (ΣX2)²/n2

We can see that each sum of squares is the sum of the squared scores in the sample minus the sum of
the scores quantity squared divided by the size of the sample (n).

So, to calculate the independent-t value we need to know:


1. The mean for sample or group 1
2. The mean for sample or group 2
3. The summation X and summation X squared for group 1
4. The summation X and summation X squared for group 2
5. The sample size for group 1 (n1)
6. The sample size for group 2 (n2)

We also need to know the degrees of freedom for the independent t-test which is:
df = n1 + n2 – 2

Let's do a sample problem using the independent t-test.

Example: Using the independent t-test


Research Problem: Job satisfaction as a function of work schedule was investigated in two different
factories. In the first factory the employees are on a fixed shift system while in the second factory the
workers have a rotating shift system. Under the fixed shift system, a worker always works the same
shift, while under the rotating shift system, a worker rotates through the three shifts. Using the scores
below determine if there is a significant difference in job satisfaction between the two groups of
workers.
Work Satisfaction Scores for Two Groups of Workers
Fixed Shift Rotating Shift
79 63
83 71
68 46
59 57
81 53
76 46
80 57
74 76
58 52
49 68
68 73

In this problem we see that we have two samples and the samples are independent of one another.
We can see that the inferential statistic we need to use here is the independent t-test.

We can calculate the quantities we need to solve this problem as follows:

Worksheet to calculate independent t-test value.


X1      X1²      X2      X2²
79 6241 63 3969
83 6889 71 5041
68 4624 46 2116
59 3481 57 3249
81 6561 53 2809
76 5776 46 2116
80 6400 57 3249
74 5476 76 5776

58 3364 52 2704
49 2401 68 4624
68 4624 73 5329
------ ------ ------ ------
775 55837 662 40982

We can use the totals from this worksheet and the number of subjects in each group to calculate the
sum of squares for group 1, the sum of squares for group 2, the mean for group 1, the mean for group 2,
and the value for the independent t.

    SS1 = ΣX1² − (ΣX1)²/n1 = 55837 − (775)²/11 = 1234.73

    SS2 = ΣX2² − (ΣX2)²/n2 = 40982 − (662)²/11 = 1141.64

    X̄1 = 775/11 = 70.45        X̄2 = 662/11 = 60.18

    t = (X̄1 − X̄2) / √[ ((SS1 + SS2)/(n1 + n2 − 2)) · (1/n1 + 1/n2) ]
      = (70.45 − 60.18) / √[ ((1234.73 + 1141.64)/(11 + 11 − 2)) · (1/11 + 1/11) ]
      = 2.209

We now have the information we need to complete the six step statistical inference process for our
research problem.
1. State the null hypothesis and the alternative hypothesis based on your research
question.
H0: μ1 = μ2
H1: μ1 ≠ μ2

Note: Our problem did not state which direction of significance we will be looking for;
therefore, we will be looking for a significant difference between the two means in either
direction.

2. Set the alpha level.


 = 0.5

Note: As usual we will set our alpha level at .05; we have 5 chances in 100 of making a type I
error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
t = 2.209
df = n1 + n2 - 2 = 11 + 11 - 2 = 20

Note: We have calculated the t-value and will also need to know the degrees of freedom when
we go to look up the critical values of t.

4. Write the decision rule for rejecting the null hypothesis.


Reject H0 if t ≥ 2.086 or if t ≤ -2.086

Note: To write the decision rule we need to know the critical value for t, with an alpha level
of .05 and a two-tailed test. We can do this by looking at Appendix C (Distribution of t) on
page 318 of the textbook. Look for the column of the table under .05 for Level of significance
for two-tailed tests, read down the column until you are level with 20 in the df column, and
you will find the critical value of t which is 2.086. That means our result is significant if the
calculated t value is less than or equal to -2.086 or is greater than or equal to 2.086.

5. Write a summary statement based on the decision.


Reject H0, p < .05, two-tailed

Note: Since our calculated value of t (2.209) is greater than or equal to 2.086, we reject the null
hypothesis and accept the alternative hypothesis.

6. Write a statement of results in standard English.


There is a significant difference in job satisfaction between the two groups of workers.
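
The worksheet arithmetic above can be verified in Python; the sketch below (scipy assumed available) computes the sums of squares by hand and then checks the result against scipy's pooled-variance t-test:

    from math import sqrt
    from scipy import stats

    fixed    = [79, 83, 68, 59, 81, 76, 80, 74, 58, 49, 68]
    rotating = [63, 71, 46, 57, 53, 46, 57, 76, 52, 68, 73]

    def sum_of_squares(xs):
        # SS = sum of x^2 minus (sum of x)^2 / n, as defined above
        return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)

    n1, n2 = len(fixed), len(rotating)
    ss1, ss2 = sum_of_squares(fixed), sum_of_squares(rotating)
    mean1, mean2 = sum(fixed) / n1, sum(rotating) / n2

    t_manual = (mean1 - mean2) / sqrt(
        (ss1 + ss2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    print(ss1, ss2, t_manual)      # 1234.73, 1141.64, t ~ 2.209

    # The same test via scipy's pooled-variance two-sample t-test
    t_stat, p_value = stats.ttest_ind(fixed, rotating, equal_var=True)
    print(t_stat, p_value)         # p < 0.05 -> reject H0 (two-tailed)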

Additional problem using the independent t-test


Research Problem: A new test preparation company, called Bright Future (BF), wants to convince high
school students studying for the American College Testing (ACT) assessment test that enrolling in
their test preparation course would significantly improve the students' ACT scores. BF selects 10
students at random and assigns five to the experimental group and five to the control group. The
experimental group students participate in the test preparation course conducted by BF. At the
conclusion of the course, both groups of students take the ACT test form which was given to high
school students the previous year. BF conducts a t-test for independent samples to compare the
scores of Group 1 (Experimental, E) to those of Group 2 (Control, C).

ACT Scores of Experimental and Control Groups


Experimental Group Control Group
23 17
18 19
26 21
32 14
21 19

1. State the null hypothesis and the alternative hypothesis based on your research
question.
H0: 1 = 2
H1: 1 > 2

2. Set the alpha level. α = 0.05

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
t = 2.252
df = n1 + n2 - 2 = 5 + 5 - 2 = 8

4. Write the decision rule for rejecting the null hypothesis.


Reject H0 if t ≥ 1.860

5. Write a summary statement based on the decision.


Reject H0, p < .05, one-tailed

6. Write a statement of results in Standard English.


Students who participated in the test-taking course scored significantly higher on the practice
form of the ACT than did the control group of students.

III. THE DEPENDENT T-TEST

The Mean of the Difference: D


The idea with the dependent case is to create a new variable, D, which is the difference between the
paired values. You will then be testing the mean of this new variable.

Here are some steps to help you accomplish the hypothesis testing
1. Write down the original claim in simple terms. For example: After > Before.
2. Move everything to one side: After - Before > 0.
3. Call the difference you have on the left side D: D = After - Before > 0.
4. Convert to proper notation: μD > 0
5. Compute the new variable D and be sure to follow the order you have defined in step 3. Do not
simply take the smaller away from the larger. From this point, you can think of having a new
set of values. Technically, they are called D, but you can think of them as x. The original values
from the two samples can be discarded.
6. Find the mean and standard deviation of the variable D. Use these as the values in the t-test
from chapter 9.

When subjects are connected to one another by either of these methods the variance is constrained,
so when we use a statistical test to measure the significance of a difference between the means we
must use a test that takes into consideration these constrained variances. This is what the dependent
t-test does.

The formula for the dependent t is:

    t = ΣD / √[ (nΣD² − (ΣD)²) / (n − 1) ]
where D is the difference between pairs of scores,

D = X2 – X1

Notice that we subtract the score for the first X from the paired second X. This is probably so that
when we are finding the difference between the pre-test and post-test, we subtract the pre-test
(X1) from the post-test (X2).

And the degrees of freedom for the dependent-t test is


df = n – 1

and
n is the number of pairs of subjects in the study.

Example: Using the dependent t-test


Research Problem: The Beck Depression Scale (pre-test) was administered to ten adolescents
undergoing anger management therapy. After four weeks of therapy the same scale was administered

again (post-test). Does the anger management therapy significantly reduce the scores on the
depression scale?

In this problem we are comparing pre-test and post-test scores for a group of subjects. This would be
an appropriate situation for the dependent t-test.

The pre-test and post-test scores, as well as D and D², are shown in the following table.

Pre and Post-Test Scores for 10 Adolescents on the Beck Depression Scale
Pre-Test (X1)    Post-Test (X2)    D (X2-X1)    D²
14 0 -14 196
6 0 -6 36
4 3 -1 1
15 20 5 25
3 0 -3 9
3 0 -3 9
6 1 -5 25
5 1 -4 16
6 1 -5 25
3 0 -3 9
----- ----- ----- -----
ΣD = -39    ΣD² = 351

For our problem:

    t = ΣD / √[ (nΣD² − (ΣD)²) / (n − 1) ]
      = −39 / √[ (10(351) − (−39)²) / (10 − 1) ]
      = −2.623

and the degrees of freedom for this problem is:

df = n – 1 = 10-1 = 9

We now have the information we need to complete the six step process for testing statistical
hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your research
question.
H0: μ1 = μ2
H1: μ1 > μ2

Note: Our problem stated that the therapy would decrease the depression score. Therefore our
alternative hypothesis states that mu1 (the pre-test score) will be greater than mu2 (the post-
test score).

2. Set the alpha level.


 = 0.5

Note: As usual we will set our alpha level at .05; we have 5 chances in 100 of making a type I
error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
t = -2.623
df = n - 1 = 10 - 1 = 9

Note: We have calculated the t-value and will also need to know the degrees of freedom when
we go to look up the critical value of t.

4. Write the decision rule for rejecting the null hypothesis.


Reject H0 if t ≤ -1.833

Note: To write the decision rule we need to know the critical value for t, with an alpha level
of .05 and a one-tailed test. Now the dependent t is calculated by subtracting the pretest from
the posttest so if the posttest is actually less than the pretest, posttest minus pretest will be a
negative quantity.

5. Write a summary statement based on the decision.


Reject H0, p < .05, one-tailed

Note: Since our calculated value of t (-2.623) is less than or equal to -1.833, we reject the null
hypothesis and accept the alternative hypothesis.

6. Write a statement of results in standard English.


The anger management therapy did significantly reduce the depression scores for the adolescents.
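
The same analysis in Python using scipy's paired t-test (assumed available):

    from scipy import stats

    pre  = [14, 6, 4, 15, 3, 3, 6, 5, 6, 3]
    post = [0, 0, 3, 20, 0, 0, 1, 1, 1, 0]

    # Paired t-test on D = post - pre, matching the table above
    t_stat, p_value = stats.ttest_rel(post, pre)
    print(t_stat, p_value)    # t ~ -2.623 with df = 9 (p_value is two-tailed)

    # One-tailed p-value for H1: mean of D < 0; halving the two-tailed
    # p-value is valid here because t falls in the hypothesized direction
    print(p_value / 2)        # < 0.05 -> reject H0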

Additional problem using the dependent t-test


Research Problem: A special program, developed to enhance children's self-concept, is implemented
with four- and five-year-old children in Sunny Days Preschool. The intervention lasts six weeks and
involves various activities in the class and at home. All the children in the program are pre-tested and
post-tested using the Purdue Self-Concept Scale for Preschool Children (PSCS). The test is comprised
of a series of 40 pairs of pictures, and scores can range from 0 to 40. The dependent t test will be used
to test the hypothesis that students would improve significantly on the posttest, compared with their
pretest scores.
Pretest and Posttest Scores of Six Students
Pretest Posttest
31 34
30 31
33 33
35 40
32 36
34 39

1. State the null hypothesis and the alternative hypothesis based on your research
question.
H0: μ1 = μ2
H1: μ1 < μ2

2. Set the alpha level. α = 0.05

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
t = 3.503
df = n - 1 = 6 - 1 = 5

4. Write the decision rule for rejecting the null hypothesis.


Reject H0 if t ≥ 2.015

5. Write a summary statement based on the decision.


Reject H0, p < .05, one-tailed

6. Write a statement of results in standard English.


Students who participated in the self-concept program scored significantly higher on the
posttest form of the PSCS than they did on the pretest.

IV. INDEPENDENT MEANS

Sums and Differences of Independent Variables


Independent variables can be combined to form new variables. The mean and variance of the
combination can be found from the means and the variances of the original variables.

Combination of Variables, In English (Melodic Mathematics)

    μ(x+y) = μx + μy        The mean of a sum is the sum of the means.
    μ(x−y) = μx − μy        The mean of a difference is the difference of the means.
    σ²(x+y) = σ²x + σ²y     The variance of a sum is the sum of the variances.
    σ²(x−y) = σ²x + σ²y     The variance of a difference is the sum of the variances.

The Difference of the Means: μ1 − μ2


Since we are combining two variables by subtraction, the important rules from the table above are
that the mean of the difference is the difference of the means and the variance of the difference is the
sum of the variances.

It is important to note that the variance of the difference is the sum of the variances, not the standard
deviation of the difference is the sum of the standard deviations. When we go to find the standard
error, we must combine variances to do so. Also, you're probably wondering why the variance of the
difference is the sum of the variances instead of the difference of the variances. Since the values are
squared, the negative associated with the second variable becomes positive, and it becomes the sum
of the variances. Also, variances can't be negative, and if you took the difference of the variances, it
could be negative.

Population Variances Known



    z = [ (x̄1 − x̄2) − (μ1 − μ2) ] / √( σ1²/n1 + σ2²/n2 )

When the population variances are known, the difference of the means has a normal distribution. The
variance of the difference is the sum of the variances, each divided by its sample size. This makes sense,
hopefully, because according to the central limit theorem, the variance of the sampling distribution of
the sample mean is the variance divided by the sample size, so what we are doing is adding the variance
of each mean together. The test statistic is shown above.

Population Variances Unknown, but both sample sizes large

    z = [ (x̄1 − x̄2) − (μ1 − μ2) ] / √( s1²/n1 + s2²/n2 )

When the population variances aren't known, the difference of the means has a Student's t
distribution. However, if both sample sizes are large enough, then you will be using the normal row
from the t-table, so your book lumps this under the normal distribution, rather than the t-
distribution. This gives us the chance to work the problem without knowing if the population
variances are equal or not. The test statistic is shown, and is identical to above, except the sample
variances are used instead of the population variances.

Population Variances Unknown, unequal with small sample sizes


How do you know if the variances are equal or not if you don't know what they are? Some books teach
the F-test to test the equality of two variances, and if your book does that, then you should use the F-
test to see. Other books (statisticians) argue that if you do the F-test first to see if the variances are
equal, and then use the same level of significance to perform the t-test to test the difference of the
means, then the overall level of significance isn't the same.

    t = [ (x̄1 − x̄2) − (μ1 − μ2) ] / √( s1²/n1 + s2²/n2 )

    df = min(df1, df2)

Since you don't know the population variances, you're going to be using a Student's t distribution.
Since the variances are unequal, there is no attempt made to average them together as we will in the
next situation. The degrees of freedom are the smaller of the two degrees of freedom (n-1 for each). The
"min" function means take the minimum or smaller of the two values. Otherwise, the formula is the
same as we used with large sample sizes.

Population Variances Unknown but equal with small sample sizes

    sp² = (df1·s1² + df2·s2²) / (df1 + df2)

If the variances are equal, then an effort is made to average them together. Now, equal does not mean
identical. It is possible for two variances to be statistically equal but be numerically different. We will
find a pooled estimate of the variance which is simply the weighted mean of the variance. The
weighting factors are the degrees of freedom.

t = ( (x̄1 − x̄2) − (μ1 − μ2) ) / √( sp²/n1 + sp²/n2 )

df = df1 + df2

Once the pooled estimate of the variance is computed, this mean (average) variance is used in the
place of the individual sample variances. Otherwise, the formula is the same as before. The degrees of
freedom are the sum of the individual degrees of freedom.
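Here is a minimal Python sketch of the pooled t-test; the sample summaries are hypothetical. Note how the pooled variance is just the df-weighted mean of the two sample variances.

    # Minimal sketch: pooled t-test (variances unknown but assumed equal).
    # Sample summaries are hypothetical.
    from math import sqrt
    from scipy.stats import t

    x1_bar, s1_sq, n1 = 23.1, 4.8, 12
    x2_bar, s2_sq, n2 = 21.4, 5.3, 10
    df1, df2 = n1 - 1, n2 - 1

    # Pooled variance: weighted mean of the sample variances (weights = df)
    sp_sq = (df1 * s1_sq + df2 * s2_sq) / (df1 + df2)

    t_stat = ((x1_bar - x2_bar) - 0) / sqrt(sp_sq / n1 + sp_sq / n2)
    df = df1 + df2
    p_value = 2 * t.sf(abs(t_stat), df)   # two-tailed
    print(t_stat, df, p_value)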

Two Proportions
Remember that the normal distribution can be used to approximate the binomial distribution in
certain cases. Specifically, the approximation was considered good when np and nq were both at least
5. Well, now, we're talking about two proportions, so np and nq must be at least 5 for both samples.

p̂1 = x1 / n1
p̂2 = x2 / n2
p̄ = (x1 + x2) / (n1 + n2)

We don't have a way to specifically test two proportions for values, what we have is the ability to test
the difference between the proportions. So, much like the test for two means from independent
populations, we will be looking at the difference of the proportions.

We will also be computing an average proportion and calling it p-bar. It is the total number of successes divided by the total number of trials. The necessary definitions are shown above.

The test statistic has the same general pattern as before (observed minus expected divided by
standard error). The test statistic used here is similar to that for a single population proportion,
except the difference of proportions are used instead of a single proportion, and the value of p-bar is
used instead of p in the standard error portion.

z = ( (p̂1 − p̂2) − (p1 − p2) ) / √( p̄q̄/n1 + p̄q̄/n2 )

Since we're using the normal approximation to the binomial, the difference of proportions has a
normal distribution. The test statistic is given.

Some people will be tempted to try to simplify the denominator of this test statistic incorrectly. It can
be simplified, but the correct simplification is not to simply place the product of p-bar and q-bar over
the sum of the n's. Remember that to add fractions, you must have a common denominator, that is
why this simplification is incorrect.

z = ( (p̂1 − p̂2) − (p1 − p2) ) / √( p̄q̄ (1/n1 + 1/n2) )
The correct simplification would be to factor a p-bar and q-bar out of the two expressions. This is
usually the formula given, because it is easier to calculate, but I wanted to give it the other way first so
you could compare it to the other formulas and see how similar they all are.
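A minimal Python sketch of the two-proportion z-test, using the factored form of the standard error; the counts below are hypothetical.

    # Minimal sketch: two-proportion z-test; the counts are hypothetical.
    from math import sqrt
    from scipy.stats import norm

    x1, n1 = 42, 100    # successes and trials, sample 1
    x2, n2 = 30, 90     # successes and trials, sample 2

    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)    # pooled proportion
    q_bar = 1 - p_bar

    z = ((p1_hat - p2_hat) - 0) / sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    p_value = 2 * norm.sf(abs(z))    # two-tailed
    print(z, p_value)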

Definitions of Key Terms


Dependent Samples
 Samples in which the subjects are paired or matched in some way.
 Dependent samples must have the same sample size, but it is possible to have the same
sample size without being dependent.

Independent Samples
 Samples that are not related to each other. Independent samples may or may not have the same sample size.

Pooled Estimate of the Variance


 A weighted average of the two sample variances when the variances are equal. The variances
are "close enough" to be considered equal, but not exactly the same, so this pooled estimate
brings the two together to find the average variance.

Unit VI
CORRELATION AND REGRESSION

I. CORRELATION

Sum of Squares
We introduced a notation earlier in the course called the sum of squares. This notation was the SS notation, and will make these formulas much easier to work with.

SS(x) = Σx² − (Σx)²/n

SS(y) = Σy² − (Σy)²/n

SS(xy) = Σxy − (Σx)(Σy)/n

Notice these are all the same pattern; SS(x) could be written as

SS(x) = SS(xx) = Σxx − (Σx)(Σx)/n

Also note that

s²x = ( Σx² − (Σx)²/n ) / (n − 1) = SS(x) / (n − 1)
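To see the SS notation in action, here is a minimal Python sketch on a small made-up data set.

    # Minimal sketch: the SS(x), SS(y), SS(xy) computations on made-up data.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    n = len(x)

    ss_x = sum(xi**2 for xi in x) - sum(x)**2 / n
    ss_y = sum(yi**2 for yi in y) - sum(y)**2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

    # Sanity check: SS(x)/(n - 1) equals the sample variance of x
    s2_x = ss_x / (n - 1)
    print(ss_x, ss_y, ss_xy, s2_x)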

Pearson's Correlation Coefficient


The correlation coefficient is a measure of linear correlation. The population parameter is denoted by the Greek letter rho (ρ) and the sample statistic is denoted by the Roman letter r.
Here are some properties of r
 r only measures the strength of a linear relationship. There are other kinds of relationships
besides linear.
 r is always between -1 and 1 inclusive. -1 means perfect negative linear correlation and +1
means perfect positive linear correlation
 r has the same sign as the slope of the regression (best fit) line
 r does not change if the independent (x) and dependent (y) variables are interchanged
 r does not change if the scale on either variable is changed. You may multiply, divide, add, or subtract a value to/from all the x-values or y-values without changing the value of r (although multiplying or dividing by a negative number does reverse the sign of r).
 r has a Student's t distribution

r = ( nΣxy − (Σx)(Σy) ) / √( (nΣx² − (Σx)²)(nΣy² − (Σy)²) )
Here is the formula for r. This formula can be simplified through some simple algebra and then some
substitutions using the SS notation discussed earlier.

r = ( Σxy − (Σx)(Σy)/n ) / √( (Σx² − (Σx)²/n)(Σy² − (Σy)²/n) )

If you divide the numerator and denominator by n, you get something which should start to look familiar. Each of these values has been seen before in the Sum of Squares notation section. So, the linear correlation coefficient can be written in terms of sums of squares.

r = SS(xy) / √( SS(x) SS(y) )
This is the formula that we would be using for calculating the linear correlation coefficient if we were
doing it by hand.
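Here is a minimal Python sketch of that hand computation, checked against numpy's built-in correlation; the data are made up.

    # Minimal sketch: r from the sum-of-squares form, checked against numpy.
    from math import sqrt
    import numpy as np

    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    n = len(x)

    ss_x = sum(xi**2 for xi in x) - sum(x)**2 / n
    ss_y = sum(yi**2 for yi in y) - sum(y)**2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

    r = ss_xy / sqrt(ss_x * ss_y)
    print(r, np.corrcoef(x, y)[0, 1])   # the two values should agree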

Hypothesis Testing
The claim we will be testing is "There is significant linear correlation"

The Greek letter corresponding to r is rho (ρ), so the parameter used for linear correlation is rho.
 H0: ρ = 0
 H1: ρ ≠ 0
r has a t distribution with n – 2 degrees of freedom, and the test statistic is given by:

t = r √( (n − 2) / (1 − r²) )

Now, there are n – 2 degrees of freedom this time. This is a difference from before. As an over-
simplification, you subtract one degree of freedom for each variable, and since there are 2 variables,
the degrees of freedom are n-2.

This doesn't look like our standard pattern of

test statistic = (observed − expected) / (standard error)

If you consider the standard error for r is


standard error = √( (1 − r²) / (n − 2) )
The formula for the test statistic is
t = (r − ρ) / √( (1 − r²) / (n − 2) )

which does look like the pattern we're looking for. (Under the null hypothesis, ρ = 0, and the formula reduces to the test statistic given above.)

Remember that
Hypothesis testing is always done under the assumption that the null hypothesis is true.
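A minimal Python sketch of this t-test for significant linear correlation; the values of r and n are hypothetical.

    # Minimal sketch: t-test of H0: rho = 0; r and n are hypothetical.
    from math import sqrt
    from scipy.stats import t

    r, n = 0.87, 10
    df = n - 2

    t_stat = r * sqrt(df / (1 - r**2))
    p_value = 2 * t.sf(abs(t_stat), df)   # two-tailed test
    print(t_stat, p_value)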

Additional Note: 1 − r² is later identified as the coefficient of non-determination.

Hypothesis Testing Revisited


If you are testing to see if there is significant linear correlation (a two tailed test), then there is
another way to perform the hypothesis testing. There is a table of critical values for the Pearson's
Product Moment Coefficient (PPMC). The degrees of freedom are n-2.

The test statistic in this case is simply the value of r. You compare the absolute value of r (don't worry
if it's negative or positive) to the critical value in the table. If the test statistic is greater than the
critical value, then there is significant linear correlation. Furthermore, you are able to say there is
significant positive linear correlation if the original value of r is positive, and significant negative
linear correlation if the original value of r was negative.

This is the most common technique used. However, the first technique, with the t-value must be used
if it is not a two-tail test, or if a different level of significance (other than 0.01 or 0.05) is desired.

Causation
If there is a significant linear correlation between two variables, then one of five situations can be
true.
 There is a direct cause and effect relationship
 There is a reverse cause and effect relationship
 The relationship may be caused by a third variable
 The relationship may be caused by complex interactions of several variables
 The relationship may be coincidental

II. REGRESSION

The idea behind regression is that when there is significant linear correlation, you can use a line to
estimate the value of the dependent variable for certain values of the independent variable.
The regression equation should only be used:
 When there is significant linear correlation. That is, when you reject the null hypothesis that
rho=0 in a correlation hypothesis test.
 The value of the independent variable being used in the estimation is close to the original
values. That is, you should not use a regression equation obtained using x's between 10 and 20
to estimate y when x is 200.
 The regression equation should not be used with different populations. That is, if x is the
height of a male, and y is the weight of a male, then you shouldn't use the regression equation
to estimate the weight of a female.
 The regression equation shouldn't be used to forecast values not from that time frame. If data
is from the 1960's, it probably isn't valid in the 1990's.

Assuming that you've decided that you can have a regression equation because there is significant
linear correlation between the two variables, the equation becomes: y' = ax + b or y' = a + bx (some
books use y-hat instead of y-prime). The Bluman text uses the second formula, however, more people
are familiar with the notion of y = mx + b, so I will use the first.

a is the slope of the regression line:

a = ( n(Σxy) − (Σx)(Σy) ) / ( n(Σx²) − (Σx)² )

b is the y-intercept of the regression line:

b = ( (Σy)(Σx²) − (Σx)(Σxy) ) / ( n(Σx²) − (Σx)² )

The regression line is sometimes called the "line of best fit" or the "best fit line".

Since it "best fits" the data, it makes sense that the line passes through the means.

The regression equation is the line with slope a passing through the point ( x̄, ȳ )
Another way to write the equation would be

y' − ȳ = a(x − x̄)
y' = ȳ + ax − ax̄
y' = ax + (ȳ − ax̄)

a = SS(xy) / SS(x)
b = ȳ − ax̄

Apply just a little algebra, and we have the formulas for a and b that we would use.
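A minimal Python sketch of the slope and intercept computed from the SS formulas, on the same made-up data used earlier:

    # Minimal sketch: regression slope and intercept from the SS formulas.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    n = len(x)

    ss_x = sum(xi**2 for xi in x) - sum(x)**2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

    a = ss_xy / ss_x                    # slope
    b = sum(y) / n - a * sum(x) / n     # intercept: y-bar minus a times x-bar
    print(f"y' = {a:.3f}x + {b:.3f}")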
It also turns out that the slope of the regression line can be written as a = r·(sy / sx). Since the standard deviations can't be negative, the sign of the slope is determined by the sign of the correlation coefficient. This agrees with the statement made earlier that the slope of the regression line has the same sign as the correlation coefficient.

III. COEFFICIENT OF DETERMINATION

Coefficient of Determination
The coefficient of determination is
 the percent of the variation that can be explained by the regression equation
 the explained variation divided by the total variation
 the square of r

What's all this variation stuff?

Every sample has some variation in it (unless all the values are identical, and that's unlikely to
happen). The total variation is made up of two parts, the part that can be explained by the regression
equation and the part that can't be explained by the regression equation.

Σ(y − ȳ)² = Σ(y' − ȳ)² + Σ(y − y')²
total = explained + unexplained
Well, the ratio of the explained variation to the total variation is a measure of how good the
regression line is. If the regression line passed through every point on the scatter plot exactly, it would
be able to explain all of the variation. The further the line is from the points, the less it is able to
explain.
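Here is a minimal Python sketch of the decomposition, fitting the line from the earlier made-up data and verifying that total = explained + unexplained.

    # Minimal sketch: total = explained + unexplained variation.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    n = len(x)
    y_bar = sum(y) / n

    # slope and intercept from the SS formulas
    ss_x = sum(xi**2 for xi in x) - sum(x)**2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    a = ss_xy / ss_x
    b = y_bar - a * sum(x) / n
    y_prime = [a * xi + b for xi in x]   # predicted values

    total = sum((yi - y_bar)**2 for yi in y)
    explained = sum((yp - y_bar)**2 for yp in y_prime)
    unexplained = sum((yi - yp)**2 for yi, yp in zip(y, y_prime))

    print(total, explained + unexplained)   # these two should match
    print(explained / total)                # coefficient of determination, r^2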

Coefficient of Non-Determination
The coefficient of non-determination is ...
 The percent of variation which is unexplained by the regression equation
 The unexplained variation divided by the total variation
 1 − r²

Standard Error of the Estimate


The coefficient of non-determination was used in the t-test to see if there was significant linear correlation. It was in the numerator of the standard error formula.

se = √( (1 − r²) / (n − 2) )
The standard error of the estimate is the square root of the coefficient of non-determination divided by its degrees of freedom.

Confidence Interval for y'

E = zα/2 · √( (1 − r²) / (n − 2) )

y' − E < y < y' + E
This only works when the sample size is large, which in this instance is usually taken to mean more than 100. We're not going to cover this in class, but it is provided here for your information. The maximum error of the estimate E is given above, and it is subtracted from and added to the estimated value of y' to form the interval.

Definitions of Key Terms


Coefficient of Determination
 The percent of the variation that can be explained by the regression equation

Correlation
 A method used to determine if a relationship between variables exists

Correlation Coefficient
 A statistic or parameter which measures the strength and direction of a relationship between
two variables

Dependent Variable
 A variable in correlation or regression that can not be controlled, that is, it depends on the
independent variable.

Independent Variable
 A variable in correlation or regression which can be controlled, that is, it is independent of the
other variable.

Pearson Product Moment Correlation Coefficient


 A measure of the strength and direction of the linear relationship between two variables
Regression

 A method used to describe the relationship between two variables.

Regression Line
 The best fit line.

Scatter Plot
 A plot of the data values on a coordinate system. The independent variable is graphed along
the x-axis and the dependent variable along the y-axis

Standard Error of the Estimate


 The standard deviation of the observed values about the predicted values

Unit VII
CHI-SQUARE

I. CHI-SQUARE DISTRIBUTION

χ² = (n − 1)s² / σ² = df·s² / σ²

The chi-square (χ²) distribution is obtained from the values of the ratio of the sample variance to the population variance multiplied by the degrees of freedom. This occurs when the population is normally distributed with population variance σ².

Properties of the Chi-Square


 Chi-square is non-negative. It is the ratio of two non-negative values, and therefore must be non-negative itself.
 Chi-square is non-symmetric.
 There are many different chi-square distributions, one for each degree of freedom.
 The degrees of freedom when working with a single population variance is n-1.

Chi-Square Probabilities
Since the chi-square distribution isn't symmetric, the method for looking up left-tail values is
different from the method for looking up right tail values.
 Area to the right - just use the area given.
 Area to the left - the table requires the area to the right, so subtract the given area from one
and look this area up in the table.
 Area in both tails - divide the area by two. Look up this area for the right critical value and one
minus this area for the left critical value.

DF which aren't in the table


When the degrees of freedom aren't listed in the table, there are a couple of choices that you have.
 You can interpolate. This is probably the more accurate way. Interpolation involves estimating
the critical value by figuring how far the given degrees of freedom are between the two df in
the table and going that far between the critical values in the table.
 You can go with the critical value which is less likely to cause you to reject in error (type I
error). For a right tail test, this is the critical value further to the right (larger). For a left tail
test, it is the value further to the left (smaller). For a two-tail test, it's the value further to the
left and the value further to the right. Note, it is not the column with the degrees of freedom
further to the right; it's the critical value which is further to the right.

II. SINGLE POPULATION VARIANCE

χ² = df·s² / σ²

The variable χ² has a chi-square distribution if the population is normally distributed. The degrees of freedom are n − 1. We can use this to test the population variance under certain conditions.

Conditions for testing


 The population has a normal distribution
 The data is from a random sample
 The observations must be independent of each other
 The test statistic has a chi-square distribution with n-1 degrees of freedom and is given by:

χ² = df·s² / σ²

Testing is done in the same manner as before. Remember, all hypothesis testing is done under the
assumption the null hypothesis is true.

Confidence Intervals
If you solve the test statistic formula for the population variance, you get: σ² = df·s² / χ²
1. Find the two critical values (alpha/2 and 1-alpha/2)
2. Compute the value for the population variance given above.
3. Place the population variance between the two values calculated in step 2 (put the smaller one
first).

Note, the left-hand endpoint of the confidence interval comes when the right critical value is used
and the right-hand endpoint of the confidence interval comes when the left critical value is used. This
is because the critical values are in the denominator and so dividing by the larger critical value (right
tail) gives the smaller endpoint.
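A minimal Python sketch of both the test statistic and the confidence interval for a single population variance; the numbers used are hypothetical.

    # Minimal sketch: chi-square test and CI for one population variance.
    from scipy.stats import chi2

    n, s_sq = 20, 36.0       # sample size and sample variance (hypothetical)
    sigma0_sq = 25.0         # hypothesized population variance
    df = n - 1
    alpha = 0.05

    chi_sq = df * s_sq / sigma0_sq               # test statistic

    # 95% CI: the right critical value gives the left endpoint, and vice versa
    right = chi2.ppf(1 - alpha / 2, df)
    left = chi2.ppf(alpha / 2, df)
    ci = (df * s_sq / right, df * s_sq / left)
    print(chi_sq, ci)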

III. GOODNESS-OF-FIT TEST

The idea behind the chi-square goodness-of-fit test is to see if the sample comes from the population
with the claimed distribution. Another way of looking at that is to ask if the frequency distribution fits
a specific pattern.

Two values are involved, an observed value, which is the frequency of a category from a sample, and
the expected frequency, which is calculated based upon the claimed distribution. The derivation of
the formula is very similar to that of the variance which was done earlier (chapter 2 or 3).

The idea is that if the observed frequency is really close to the claimed (expected) frequency, then the
square of the deviations will be small. The square of the deviation is divided by the expected
frequency to weight frequencies. A difference of 10 may be very significant if 12 was the expected
frequency, but a difference of 10 isn't very significant at all if the expected frequency was 1200.

If the sum of these weighted squared deviations is small, the observed frequencies are close to the
expected frequencies and there would be no reason to reject the claim that it came from that
distribution. Only when the sum is large is the reason to question the distribution. Therefore, the chi-
square goodness-of-fit test is always a right tail test.

χ² = Σ (Observed − Expected)² / Expected

The test statistic has a chi-square distribution when the following assumptions are met
 The data are obtained from a random sample
 The expected frequency of each category must be at least 5. This goes back to the requirement that the data be normally distributed: you're simulating a multinomial experiment (a discrete distribution) with the goodness-of-fit test (and a continuous distribution), and if each expected frequency is at least five then you can use the normal distribution to approximate (much like the binomial). If the expected frequency of a category is less than 5, it is usually combined with an adjacent category.

The following are properties of the goodness-of-fit test



 The data are the observed frequencies. This means that there is only one data value for each
category.
 The degree of freedom is one less than the number of categories, not one less than the sample
size.
 It is always a right tail test.
 It has a chi-square distribution.
 The value of the test statistic doesn't change if the order of the categories is switched.
 The test statistic is χ² = Σ(O − E)² / E, given above.

Interpreting the Claim


There are four ways you might be given a claim.
1. The values occur with equal frequency. Other words for this are "uniform", "no preference", or
"no difference". To find the expected frequencies, total the observed frequencies and divide by
the number of categories. This quotient is the expected frequency for each category.
2. Specific proportions or probabilities are given. To find the expected frequencies, multiply the
total of the observed frequencies by the probability for each category.
3. The expected frequencies are given to you. In this case, you don't have to do anything.
4. A specific distribution is claimed. For example, "The data is normally distributed". To work a
problem like this, you need to group the data and find the frequency for each class. Then, find
the probability of being within that class by converting the scores to z-scores and looking up
the probabilities. Finally, multiply the probabilities by the total observed frequency. (It's not
really as bad as it sounds).

One-Variable Chi-Square (goodness-of-fit test) with equal expected frequencies


We can use the chi-square statistic to test the distribution of measures over levels of a variable to
indicate if the distribution of measures is the same for all levels. This is the first use of the one-
variable chi-square test. This test is also referred to as the goodness-of-fit test.

Consider the example we already mentioned: the frequency with which entering freshmen, when required to purchase a computer for college use, select Macintosh computers, IBM computers, or some other brand of computer. We want to know if there is a significant difference among the frequencies with which these three brands of computers are selected, or if the students select equally among the three brands.

The data for 100 students is recorded in the table below (the observed frequencies). We have also
indicated the expected frequency for each category. Since there are 100 measures or observations and
there are three categories (Macintosh, IBM, and Other) we would indicate the expected frequency for
each category to be 100/3 or 33.333. In the third column of the table we have calculated the square of
the observed frequency minus the expected frequency divided by the expected frequency. The sum of
the third column would be the value of the chi-square statistic.

Frequency with which students select computer brand


Computer             Observed Frequency   Expected Frequency   (O − E)²/E
IBM                          47                 33.333            5.604
Macintosh                    36                 33.333            0.213
Other                        17                 33.333            8.003
Total (chi-square)                                               13.820

From the table we can see that:


χ² = Σ (O − E)²/E = 5.604 + 0.213 + 8.003 = 13.820

The df = C - 1 = 3 - 1 = 2

We can compare the obtained value of chi-square with the critical value for the .05 level and with
degrees of freedom of 2 obtained from Distribution of Chi Square table. Looking under the column for
.05 and the row for df = 2 we see that the critical value for chi-square is 5.991.

We now have the information we need to complete the six step process for testing statistical
hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research
question.
H0: O = E
H1: O ≠ E

Note: Our null hypothesis, for the chi-square test, states that there are no differences between
the observed and the expected frequencies. The alternate hypothesis states that there are
significant differences between the observed and expected frequencies.

2. Set the alpha level.


α = 0.05

Note: As usual we will set our alpha level at .05; we have 5 chances in 100 of making a type I error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
χ² = 13.820
df = C - 1 = 2

4. Write the decision rule for rejecting the null hypothesis.


Reject H0 if χ² ≥ 5.991.

Note: To write the decision rule we had to know the critical value for chi-square, with an alpha
level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and
noting the tabled value for the column for the .05 level and the row for 2 df.

5. Write a summary statement based on the decision.


Reject H0, p < .05

Note: Since our calculated value of χ² (13.820) is greater than 5.991, we reject the null hypothesis and accept the alternative hypothesis.

6. Write a statement of results in standard English.


There is a significant difference among the frequencies with which students purchased three
different brands of computers.
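For reference, this whole computation can be reproduced with scipy's goodness-of-fit function, which defaults to equal expected frequencies; a minimal sketch:

    # Minimal sketch: the computer-brand example with scipy.
    from scipy.stats import chisquare

    observed = [47, 36, 17]                  # IBM, Macintosh, Other
    result = chisquare(observed)             # expected defaults to equal (100/3)
    print(result.statistic, result.pvalue)   # about 13.82, p < .05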

One-Variable Chi-Square (goodness-of-fit test) with predetermined expected frequencies


Let's look at the problem we just solved in a way that illustrates the other use of the one-variable chi-square, that is, with predetermined expected frequencies rather than equal frequencies. We could formulate our revised problem as follows:

In a national study, students required to buy computers for college use bought IBM computers 50% of
the time, Macintosh computers 25% of the time, and other computers 25% of the time. Of 100
entering freshmen we surveyed, 36 bought Macintosh computers, 47 bought IBM computers, and 17 bought some other brand of computer. We want to know if this frequency of computer-buying behavior is similar to or different from the national study data.

The data for 100 students is recorded in the table below (the observed frequencies). In this case the
expected frequencies are those from the national study. To get the expected frequency we take the
percentages from the national study times the total number of subjects in the current study.
 Expected frequency for IBM = 100 X 50% = 50
 Expected frequency for Macintosh = 100 X 25% = 25
 Expected frequency for Other = 100 X 25% = 25

The expected frequencies are recorded in the second column of the table. As before we have
calculated the square of the observed frequency minus the expected frequency divided by the
expected frequency and recorded this result in the third column of the table. The sum of the third
column would be the value of the chi-square statistic.

Frequency with which students select computer brand


Computer             Observed Frequency   Expected Frequency   (O − E)²/E
IBM                          47                   50              0.18
Macintosh                    36                   25              4.84
Other                        17                   25              2.56
Total (chi-square)                                                7.58

From the table we can see that:


χ² = 0.18 + 4.84 + 2.56 = 7.58
The df = C - 1 = 3 - 1 = 2

We can compare the obtained value of chi-square with the critical value for the .05 level and with
degrees of freedom of 2. We see that the critical value for chi-square is 5.991.

We now have the information we need to complete the six-step process for testing statistical
hypotheses for our research problem.
1. State the null hypothesis and the alternative hypothesis based on your research
question.
H0: O = E
H1: O ≠ E

Note: Our null hypothesis, for the chi-square test, states that there are no differences between
the observed and the expected frequencies. The alternate hypothesis states that there are
significant differences between the observed and expected frequencies.

2. Set the alpha level.


α = 0.05

Note: As usual we will set our alpha level at .05; we have 5 chances in 100 of making a type I error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.

χ² = 7.58
df = C - 1 = 2

4. Write the decision rule for rejecting the null hypothesis.


Reject H0 if χ² ≥ 5.991.

5. Write a summary statement based on the decision.


Reject H0, p < .05

Note: Since our calculated value of χ² (7.58) is greater than 5.991, we reject the null
hypothesis and accept the alternative hypothesis.

6. Write a statement of results in Standard English.


There is a significant difference between the frequencies with which students purchased three different brands of computers and the proportions suggested by the national study.

IV. TEST FOR INDEPENDENCE

In the test for independence, the claim is that the row and column variables are independent of each
other. This is the null hypothesis.

The multiplication rule said that if two events were independent, then the probability of both
occurring was the product of the probabilities of each occurring. This is key to working the test for
independence. If you end up rejecting the null hypothesis, then the assumption must have been
wrong and the row and column variable are dependent. Remember, all hypothesis testing is done
under the assumption the null hypothesis is true.

The test statistic used is the same as the chi-square goodness-of-fit test. The principle behind the test
for independence is the same as the principle behind the goodness-of-fit test. The test for
independence is always a right tail test.

In fact, you can think of the test for independence as a goodness-of-fit test where the data is arranged
into table form. This table is called a contingency table.

χ² = Σ (Observed − Expected)² / Expected

The test statistic has a chi-square distribution when the following assumptions are met
 The data are obtained from a random sample
 The expected frequency of each category must be at least 5.

The following are properties of the test for independence


 The data are the observed frequencies.
 The data is arranged into a contingency table.
 The degrees of freedom are the degrees of freedom for the row variable times the degrees of
freedom for the column variable. It is not one less than the sample size, it is the product of the
two degrees of freedom.
 It is always a right tail test.
 It has a chi-square distribution.
 The expected value is computed by taking the row total times the column total and dividing
by the grand total
 The value of the test statistic doesn't change if the order of the rows or columns are switched.
 The value of the test statistic doesn't change if the rows and columns are interchanged
(transpose of the matrix)
 The test statistic is χ² = Σ(O − E)² / E, given above.

Two-Variable Chi-Square (test of independence)


Now let us consider the case of the two-variable chi-square test, also known as the test of
independence.
For example, we may wish to know if there is a significant difference in the frequencies with which
males come from small, medium, or large cities as contrasted with females. The two variables we are
considering here are hometown size (small, medium, or large) and sex (male or female). Another way
of putting our research question is: Is gender independent of size of hometown?

The data for 30 females and 6 males is in the following table.

Frequency with which males and females come from small, medium, and large cities

         Small   Medium   Large   Totals
Female      10       14       6       30
Male         4        1       1        6
Totals      14       15       7       36

The formula for chi-square is the same as before:


χ² = Σ (O − E)² / E
where
O is the observed frequency, and
E is the expected frequency.

The degrees of freedom for the two-dimensional chi-square statistic is:


df = (C - 1)(R - 1)

where C is the number of columns or levels of the first variable and R is the number of rows or levels
of the second variable.

In the table above, we have the observed frequencies (six of them). Now we must calculate the
expected frequency for each of the six cells. For two-variable chi-square we find the expected
frequencies with the formula:

Expected Frequency for a Cell = (Column Total X Row Total)/Grand Total


In the table above, we can see that the Column Totals are 14 (small), 15 (medium), and 7 (large), while
the Row Totals are 30 (female) and 6 (male). The grand total is 36.

Using the formula, we can thus find the expected frequency for each cell.
1. The expected frequency for the small female cell is 14×30/36 = 11.667
2. The expected frequency for the medium female cell is 15×30/36 = 12.500
3. The expected frequency for the large female cell is 7×30/36 = 5.833
4. The expected frequency for the small male cell is 14×6/36 = 2.333
5. The expected frequency for the medium male cell is 15×6/36 = 2.500
6. The expected frequency for the large male cell is 7×6/36 = 1.167

We can put these expected frequencies in our table and also include the values for (O - E)2/E. The
sum of all these will of course be the value of chi-square.

Observed frequencies, expected frequencies, and (O - E)2/E for males and females from small,
medium, and large cities

             Small                     Medium                    Large               Totals
         Obs    Exp   (O−E)²/E     Obs    Exp   (O−E)²/E     Obs    Exp   (O−E)²/E
Female    10  11.667    0.238       14  12.500    0.180        6   5.833    0.005       30
Male       4   2.333    1.191        1   2.500    0.900        1   1.167    0.024        6
Totals    14                        15                         7                        36

From the table we can see that:


χ² = Σ (O − E)²/E = 0.238 + 0.180 + 0.005 + 1.191 + 0.900 + 0.024 = 2.538

and df = (C - 1)(R - 1) = (3 - 1)(2 - 1) = (2)(1) = 2

We now have the information we need to complete the six step process for testing statistical
hypotheses for our research problem.

1. State the null hypothesis and the alternative hypothesis based on your research
question.
H0: O = E
H1: O ≠ E

2. Set the alpha level.


α = 0.05
3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom
for the statistical test if necessary.
χ² = 2.538
df = (C - 1)(R - 1) = (2)(1) = 2

4. Write the decision rule for rejecting the null hypothesis.

Reject H0 if χ² ≥ 5.991.

Note: To write the decision rule we had to know the critical value for chi-square, with an alpha
level of .05, and 2 degrees of freedom. We can do this by looking at Appendix Table F and
noting the tabled value for the column for the .05 level and the row for 2 df.

5. Write a summary statement based on the decision.


Fail to reject H0
Note: Since our calculated value of χ² (2.538) is not greater than 5.991, we fail to reject the
null hypothesis and are unable to accept the alternative hypothesis.

6. Write a statement of results in standard English.


There is not a significant difference in the frequencies with which males come from small, medium, or large towns as compared with females. We cannot reject the claim that hometown size is independent of gender.

Chi-square is a useful non-parametric statistic for evaluating statistical hypotheses involving the frequencies with which observations fall in various categories (nominal data).
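For reference, the hometown-size example can be reproduced with scipy's contingency-table routine; a minimal sketch:

    # Minimal sketch: the hometown-size example as a contingency-table test.
    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[10, 14, 6],    # female: small, medium, large
                      [4, 1, 1]])     # male:   small, medium, large

    chi_sq, p_value, df, expected = chi2_contingency(table, correction=False)
    print(chi_sq, df, p_value)        # about 2.538 with df = 2, p > .05
    print(expected)                   # matches the hand-computed expected values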

Definitions of Key Terms


Chi-square distribution
 A distribution obtained by multiplying the ratio of the sample variance to the population variance by the degrees of freedom, when random samples are selected from a normally distributed population

Contingency Table
 Data arranged in table form for the chi-square independence test

Expected Frequency
 The frequencies obtained by calculation.

Goodness-of-fit Test
 A test to see if a sample comes from a population with the given distribution.

Independence Test
 A test to see if the row and column variables are independent.

Observed Frequency
 The frequencies obtained by observation. These are the sample frequencies.

Unit VIII
F-TEST

I. F-TEST

The F-distribution is formed by the ratio of two independent chi-square variables divided by their
respective degrees of freedom.

Since F is formed from chi-square variables, many of the chi-square properties carry over to the F distribution.

F = ( (df1·s1²/σ1²) / df1 ) / ( (df2·s2²/σ2²) / df2 ) = (s1²/σ1²) / (s2²/σ2²)


 The F-values are all non-negative
 The distribution is non-symmetric
 The mean is approximately 1
 There are two independent degrees of freedom, one for the numerator, and one for the
denominator.
 There are many different F distributions, one for each pair of degrees of freedom.

F-Test
The F-test is designed to test if two population variances are equal. It does this by comparing the ratio
of two variances. So, if the variances are equal, the ratio of the variances will be 1.

F = s1² / s2²

All hypothesis testing is done under the assumption the null hypothesis is true

If the null hypothesis is true, then the F test statistic given above can be simplified dramatically: the population variances cancel, leaving the ratio of the sample variances as the test statistic. If we end up rejecting the null hypothesis, then we reject the claim that the ratio was equal to 1, along with our assumption that the variances were equal.

There are several different F-tables. Each one has a different level of significance. So, find the correct
level of significance first, and then look up the numerator degrees of freedom and the denominator
degrees of freedom to find the critical value.

You will notice that all of the tables only give level of significance for right tail tests. Because the F
distribution is not symmetric, and there are no negative values, you may not simply take the opposite
of the right critical value to find the left critical value. The way to find a left critical value is to reverse
the degrees of freedom, look up the right critical value, and then take the reciprocal of this value. For
example, the critical value with 0.05 on the left with 12 numerator and 15 denominator degrees of freedom is found by taking the reciprocal of the critical value with 0.05 on the right with 15 numerator and 12 denominator degrees of freedom.

Avoiding Left Critical Values


Since the left critical values are a pain to calculate, they are often avoided altogether. This is the
procedure followed in the textbook. You can force the F test into a right tail test by placing the sample
with the large variance in the numerator and the smaller variance in the denominator. It does not
matter which sample has the larger sample size, only which sample has the larger variance.

The numerator degrees of freedom will be the degrees of freedom for whichever sample has the larger
variance (since it is in the numerator) and the denominator degrees of freedom will be the degrees of
freedom for whichever sample has the smaller variance (since it is in the denominator).

If a two-tail test is being conducted, you still have to divide alpha by 2, but you only look up and
compare the right critical value.
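A minimal Python sketch of this forced right-tail procedure; the sample variances and sizes are hypothetical.

    # Minimal sketch: right-tailed F-test for equal variances.
    from scipy.stats import f

    s1_sq, n1 = 12.4, 16     # the larger sample variance goes in the numerator
    s2_sq, n2 = 5.8, 11
    alpha = 0.05

    F = s1_sq / s2_sq
    df_num, df_den = n1 - 1, n2 - 1

    # two-tail test: compare against the right critical value at alpha/2
    crit = f.ppf(1 - alpha / 2, df_num, df_den)
    print(F, crit, F > crit)   # reject H0 (equal variances) if F > crit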

Assumptions / Notes
 The larger variance should always be placed in the numerator
 The test statistic is F = s1^2 / s2^2 where s1^2 > s2^2
 Divide alpha by 2 for a two-tail test and then find the right critical value
 If standard deviations are given instead of variances, they must be squared
 When the degrees of freedom aren't given in the table, go with the value with the larger
critical value (this happens to be the smaller degrees of freedom). This is so that you are less
likely to reject in error (type I error)
 The populations from which the samples were obtained must be normal.
 The samples must be independent

II. ONE-WAY ANOVA

A One-Way Analysis of Variance is a way to test the equality of three or more means at one time by
using variances.

Assumptions
 The populations from which the samples were obtained must be normally or approximately
normally distributed.
 The samples must be independent.
 The variances of the populations must be equal.

Hypotheses
The null hypothesis will be that all population means are equal, the alternative hypothesis is that at
least one mean is different.

In the following, lower case letters apply to the individual samples and capital letters apply to the
entire set collectively. That is, n is one of many sample sizes, but N is the total sample size.

Grand Mean

X̄GM = Σx / N
The grand mean of a set of samples is the total of all the data values divided by the total sample size. This requires that you have all of the sample data available to you, which is usually the case, but not always. It turns out that all that is necessary to perform a one-way analysis of variance are the number of samples, the sample means, the sample variances, and the sample sizes.

X̄GM = Σ(n·x̄) / Σn
Another way to find the grand mean is to find the weighted average of the sample means. The weight
applied is the sample size.

Total Variation

SS(T) = Σ (x − X̄GM)²

The total variation (not variance) is the sum of the squares of the differences of each data value from the grand mean.

There is the between group variation and the within group variation. The whole idea behind the
analysis of variance is to compare the ratio of between group variance to within group variance. If the
variance caused by the interaction between the samples is much larger when compared to the
variance that appears within each group, then it is because the means aren't the same.

Between Group Variation


SS(B) = Σ n(x̄ − X̄GM)²

The variation due to the interaction between the samples is denoted SS(B) for Sum of Squares
Between groups. If the sample means are close to each other (and therefore the Grand Mean) this will
be small. There are k samples involved with one data value for each sample (the sample mean), so
there are k – 1 degrees of freedom.

The variance due to the interaction between the samples is denoted MS(B) for Mean Square Between
groups. This is the between group variation divided by its degrees of freedom. It is also denoted by
s²b.
Within Group Variation
SS(W) = Σ df·s²
The variation due to differences within individual samples, denoted SS(W) for Sum of Squares Within
groups. Each sample is considered independently, no interaction between samples is involved. The
degree of freedom is equal to the sum of the individual degrees of freedom for each sample. Since
each sample has degrees of freedom equal to one less than their sample sizes, and there are k samples,
the total degrees of freedom is k less than the total sample size:

df = N – k

The variance due to the differences within individual samples is denoted MS(W) for Mean Square
Within groups. This is the within group variation divided by its degrees of freedom. It is also denoted
by s²w. It is the weighted average of the variances (weighted with the degrees of freedom).

F test statistic
F = s²b / s²w

Recall that an F variable is the ratio of two independent chi-square variables divided by their respective degrees of freedom. Also recall that the F test statistic is the ratio of two sample variances; well, it turns out that's exactly what we have here. The F test statistic is found by dividing the between group
variance by the within group variance. The degrees of freedom for the numerator are the degrees of
freedom for the between group (k-1) and the degrees of freedom for the denominator are the degrees
of freedom for the within group (N-k).

Summary Table
All of this sounds like a lot to remember, and it is. However, there is a table which makes things really
nice.

Source     SS               df      MS                       F
Between    SS(B)            k − 1   MS(B) = SS(B) / (k−1)    F = MS(B) / MS(W)
Within     SS(W)            N − k   MS(W) = SS(W) / (N−k)
Total      SS(W) + SS(B)    N − 1

Notice that each Mean Square is just the Sum of Squares divided by its degrees of freedom, and the F
value is the ratio of the mean squares. Do not put the largest variance in the numerator, always divide
the between variance by the within variance. If the between variance is smaller than the within
variance, then the means are really close to each other and you will fail to reject the claim that they
are all equal. The degrees of freedom of the F-test are in the same order they appear in the table
(nifty, eh?).

Decision Rule
The decision will be to reject the null hypothesis if the test statistic from the table is greater than the
F critical value with k-1 numerator and N – k denominator degrees of freedom.

If the decision is to reject the null, then at least one of the means is different. However, the ANOVA
does not tell you where the difference lies. For this, you need another test, either the Scheffe' or Tukey
test.
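A minimal Python sketch of a one-way ANOVA on three small made-up samples:

    # Minimal sketch: one-way ANOVA on three made-up samples.
    from scipy.stats import f_oneway

    group1 = [5, 7, 6, 8, 7]
    group2 = [9, 8, 10, 9, 11]
    group3 = [4, 5, 6, 5, 4]

    F, p_value = f_oneway(group1, group2, group3)
    print(F, p_value)   # reject H0 (all means equal) when p < alpha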

III. SCHEFFE' AND TUKEY TESTS

When the decision from the One-Way Analysis of Variance is to reject the null hypothesis, it means
that at least one of the means isn't the same as the other means. What we need is a way to figure out
where the differences lie, not just that there is a difference.

This is where the Scheffe' and Tukey tests come into play. They will help us analyze pairs of means to
see if there is a difference -- much like the difference of two means covered earlier.

Hypotheses
H0: μi = μj
H1: μi ≠ μj

Both tests are set up to test if pairs of means are different. The formulas refer to mean i and mean j.
The values of i and j vary, and the total number of tests will be equal to a combination of k objects, 2
at a time C(k, 2), where k is the number of samples.

Scheffé Test
The Scheffe' test is customarily used with unequal sample sizes, although it could be used with equal
sample sizes.

The critical value for the Scheffe' test is the degrees of freedom for the between variance times the
critical value for the one-way ANOVA. This simplifies to be:

CV = (k – 1) F(k – 1, N – k, alpha)

The test statistic is a little bit harder to compute.

TS: Fs = (x̄i − x̄j)² / ( s²w (1/ni + 1/nj) )

Pure mathematicians will argue that this shouldn't be called F because it doesn't have an F distribution (it's the degrees of freedom times an F), but we'll live with it.

Reject H0 if the test statistic is greater than the critical value. Note, this is a right tail test. If there is no
difference between the means, the numerator will be close to zero, and so performing a left tail test
wouldn't show anything.
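A minimal Python sketch of the Scheffe' test for one pair of means, using hypothetical summaries from an ANOVA with k = 3 samples:

    # Minimal sketch: Scheffe' test for one pair of means (values hypothetical).
    from scipy.stats import f

    k, N = 3, 15                 # number of samples, total sample size
    xbar_i, n_i = 9.4, 5
    xbar_j, n_j = 4.8, 5
    ms_within = 1.3              # within-group variance from the ANOVA table
    alpha = 0.05

    Fs = (xbar_i - xbar_j)**2 / (ms_within * (1 / n_i + 1 / n_j))
    cv = (k - 1) * f.ppf(1 - alpha, k - 1, N - k)   # (k-1) times the ANOVA CV
    print(Fs, cv, Fs > cv)       # reject H0: mu_i = mu_j when Fs > cv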

Tukey Test
The Tukey test is only usable when the sample sizes are the same.

The Critical Value is looked up in a table. There are actually several different tables, one for each level of significance. The number of samples, k, is used as an index along the top, and the degrees of freedom for the within group variance, v = N – k, are used as an index along the left side.

TS: q = (x̄i − x̄j) / √( s²w / n )

The test statistic is found by dividing the difference between the means by the square root of the ratio
of the within group variation and the sample size.

Reject the null hypothesis if the absolute value of the test statistic is greater than the critical value
(just like the linear correlation coefficient critical values).
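A matching Python sketch for the Tukey test statistic; the critical value uses scipy.stats.studentized_range (available in newer SciPy versions), and the summaries are again hypothetical.

    # Minimal sketch: Tukey test for one pair of means, equal sample sizes.
    from math import sqrt
    from scipy.stats import studentized_range

    k, n, N = 3, 5, 15           # number of samples, per-sample size, total
    xbar_i, xbar_j = 9.4, 4.8
    ms_within = 1.3              # within-group variance from the ANOVA table
    alpha = 0.05

    q = (xbar_i - xbar_j) / sqrt(ms_within / n)
    cv = studentized_range.ppf(1 - alpha, k, N - k)   # indexed by k and v = N - k
    print(abs(q), cv, abs(q) > cv)   # reject when |q| exceeds the critical value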

IV. TWO-WAY ANOVA

The two-way analysis of variance is an extension to the one-way analysis of variance. There are two
independent variables (hence the name two-way).

Assumptions
 The populations from which the samples were obtained must be normally or approximately
normally distributed.
 The samples must be independent.


 The variances of the populations must be equal.
 The groups must have the same sample size.

Hypotheses
There are three sets of hypotheses with the two-way ANOVA.

The null hypotheses for each of the sets are given below.
1. The population means of the first factor are equal. This is like the one-way ANOVA for the
row factor.
2. The population means of the second factor are equal. This is like the one-way ANOVA for the
column factor.
3. There is no interaction between the two factors. This is similar to performing a test for
independence with contingency tables.

Factors
The two independent variables in a two-way ANOVA are called factors. The idea is that there are two
variables, factors, which affect the dependent variable. Each factor will have two or more levels within
it, and the degrees of freedom for each factor is one less than the number of levels.

Treatment Groups
Treatment Groups are formed by making all possible combinations of the two factors. For example, if the first factor has 3 levels and the second factor has 2 levels, then there will be 3 × 2 = 6 different treatment groups.

As an example, let's assume we're planting corn. The type of seed and type of fertilizer are the two
factors we're considering in this example. This example has 15 treatment groups. There are 3 – 1 = 2
degrees of freedom for the type of seed, and 5 – 1 = 4 degrees of freedom for the type of fertilizer.
There are 2 × 4 = 8 degrees of freedom for the interaction between the type of seed and type of fertilizer.

The data that actually appears in the table are samples. In this case, 2 samples from each treatment
group were taken.

             Fert I     Fert II    Fert III   Fert IV    Fert V
Seed A-402   106, 110   95, 100    94, 107    103, 104   100, 102
Seed B-894   110, 112   98, 99     100, 101   108, 112   105, 107
Seed C-952   94, 97     86, 87     98, 99     99, 101    94, 98

Main Effect
The main effect involves the independent variables one at a time. The interaction is ignored for this
part. Just the rows or just the columns are used, not mixed. This is the part which is similar to the
one-way analysis of variance. Each of the variances calculated to analyze the main effects is like a between variance.

Interaction Effect
The interaction effect is the effect that one factor has on the other factor. The degrees of freedom here
are the product of the two degrees of freedom for each factor.

Within Variation
The Within variation is the sum of squares within each treatment group. You have one less than the
sample size (remember all treatment groups must have the same sample size for a two-way ANOVA)
for each treatment group. The total number of treatment groups is the product of the number of
levels for each factor. The within variance is the within variation divided by its degrees of freedom.

The within group is also called the error.

F-Tests
There is an F-test for each of the hypotheses, and the F-test is the mean square for each main effect
and the interaction effect divided by the within variance. The numerator degrees of freedom come
from each effect, and the denominator degrees of freedom is the degrees of freedom for the within
variance in each case.

Two-Way ANOVA Table


It is assumed that main effect A has a levels (and A = a – 1 df), main effect B has b levels (and B = b – 1
df), n is the sample size of each treatment, and N = abn is the total sample size. Notice the overall
degrees of freedom is once again one less than the total sample size.

Source               SS              df                     MS        F
Main Effect A        given           A = a − 1              SS / df   MS(A) / MS(W)
Main Effect B        given           B = b − 1              SS / df   MS(B) / MS(W)
Interaction Effect   given           AB = (a − 1)(b − 1)    SS / df   MS(AB) / MS(W)
Within               given           N − ab = ab(n − 1)     SS / df
Total                sum of others   N − 1 = abn − 1

Summary
The following results were calculated using a spreadsheet. The p-values are provided, and the critical values are for alpha = 0.05.

Source of Variation   SS          df   MS         F        P-value    F-crit
Seed                   512.8667    2   256.4333   28.283   0.000008   3.682
Fertilizer             449.4667    4   112.3667   12.393   0.000119   3.056
Interaction            143.1333    8    17.8917    1.973   0.122090   2.641
Within                 136.0000   15     9.0667
Total                 1241.4667   29

From the above results, we can see that the main effects are both significant, but the interaction
between them isn't. That is, the types of seed aren't all equal, and the types of fertilizer aren't all
equal, but the type of seed doesn't interact with the type of fertilizer.
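The spreadsheet table above can be reproduced in Python with statsmodels (assuming pandas and statsmodels are available); here is a minimal sketch using the 30 yields from the seed-by-fertilizer table.

    # Minimal sketch: the seed-by-fertilizer two-way ANOVA via statsmodels.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    data = {
        ("A", "I"): [106, 110], ("A", "II"): [95, 100], ("A", "III"): [94, 107],
        ("A", "IV"): [103, 104], ("A", "V"): [100, 102],
        ("B", "I"): [110, 112], ("B", "II"): [98, 99], ("B", "III"): [100, 101],
        ("B", "IV"): [108, 112], ("B", "V"): [105, 107],
        ("C", "I"): [94, 97], ("C", "II"): [86, 87], ("C", "III"): [98, 99],
        ("C", "IV"): [99, 101], ("C", "V"): [94, 98],
    }
    rows = [(seed, fert, y) for (seed, fert), ys in data.items() for y in ys]
    df = pd.DataFrame(rows, columns=["seed", "fert", "yield_"])

    model = ols("yield_ ~ C(seed) * C(fert)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))   # SS, df, F, p for each effect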

Definitions of Key Terms


F-distribution
 The ratio of two independent chi-square variables divided by their respective degrees of
freedom. If the population variances are equal, this simplifies to be the ratio of the sample
variances.

Analysis of Variance (ANOVA)


 A technique used to test a hypothesis concerning the means of three or more populations.

One-Way Analysis of Variance


 Analysis of Variance when there is only one independent variable. The null hypothesis will be
that all population means are equal, the alternative hypothesis is that at least one mean is
different.

Between Group Variation


 The variation due to the interaction between the samples, denoted SS(B) for Sum of Squares
Between groups. If the sample means are close to each other (and therefore the Grand Mean)
this will be small. There are k samples involved with one data value for each sample (the
sample mean), so there are k-1 degrees of freedom.

Between Group Variance


 The variance due to the interaction between the samples, denoted MS(B) for Mean Square
Between groups. This is the between group variation divided by its degrees of freedom.

Within Group Variation


 The variation due to differences within individual samples, denoted SS(W) for Sum of Squares
Within groups. Each sample is considered independently, no interaction between samples is
involved. The degrees of freedom is equal to the sum of the individual degrees of freedom for
each sample. Since each sample has degrees of freedom equal to one less than their sample
sizes, and there are k samples, the total degrees of freedom is k less than the total sample size:
df = N - k.

Within Group Variance


 The variance due to the differences within individual samples, denoted MS(W) for Mean
Square Within groups. This is the within group variation divided by its degrees of freedom.

Scheffe' Test
 A test used to find where the differences between means lie when the Analysis of Variance
indicates the means are not all equal. The Scheffe' test is generally used when the sample sizes
are different.

Tukey Test
 A test used to find where the differences between the means lie when the Analysis of Variance
indicates the means are not all equal. The Tukey test is generally used when the sample sizes
are all the same.

Two-Way Analysis of Variance


 An extension to the one-way analysis of variance. There are two independent variables. There
are three sets of hypotheses with the two-way ANOVA. The first null hypothesis is that there is
no interaction between the two factors. The second null hypothesis is that the population
means of the first factor are equal. The third null hypothesis is that the population means of
the second factor are equal.

Factors
 The two independent variables in a two-way ANOVA.

Treatment Groups
 Groups formed by making all possible combinations of the two factors. For example, if the first
factor has 3 levels and the second factor has 2 levels, then there will be 3x2=6 different
treatment groups.

Interaction Effect
 The effect one factor has on the other factor

Main Effect
 The effects of the independent variables.
