Eda Hypothesis Testing For Two Sample

ENGINEERING DATA ANALYSIS


HYPOTHESIS TESTING FOR TWO SAMPLES

In the previous chapter, tests were described for a parameter of a single population. In this chapter, some of these tests will
be developed, and new ones introduced, to test for the equality of a parameter in two populations.

Test on Mean Difference when Population Variance is Known

Investigating claims of a population mean difference generally requires a comparison of two sample means. The variance of
a sample mean depends upon the sample size and the variance of the population from which the sample is selected.
Consequently the sizes of the two samples and the variances of the two populations will influence the comparison of sample
means.

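The standardised normal statistic that may be used to test the equality of two normal population means $\mu_1$ and $\mu_2$, based
upon independent random samples, is given by

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $\sigma_1^2$ and $\sigma_2^2$ are the known population variances, and $n_1$ and $n_2$ are the sample sizes.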

Example: The alkalinity, in milligrams per liter, of water in the upper reaches of rivers in a particular region is known to be
normally distributed with a standard deviation of 10 mg/L. Alkalinity readings in the lower reaches of rivers in the same
region are also known to be normally distributed, but with a standard deviation of 25 mg/L.
Investigate, at the 1% level of significance, the claim that the true mean alkalinity of water in the lower reaches of rivers in
this region is greater than that in the upper reaches.
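
The example states the two population standard deviations, but the sample readings are not reproduced here, so the following is only a minimal sketch of how the z statistic could be computed for a right-tailed test. The sample means and sample sizes are hypothetical placeholders, and the function name `two_sample_z` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Return z = (mean1 - mean2) / sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Known population standard deviations from the example (mg/L).
sigma_lower, sigma_upper = 25.0, 10.0

# HYPOTHETICAL sample summaries, used only to make the sketch runnable.
mean_lower, n_lower = 100.0, 25
mean_upper, n_upper = 90.0, 16

# H0: mu_lower = mu_upper   versus   Ha: mu_lower > mu_upper
z = two_sample_z(mean_lower, mean_upper, sigma_lower, sigma_upper, n_lower, n_upper)

# Right-tailed critical value at the 1% level of significance.
z_critical = NormalDist().inv_cdf(1 - 0.01)
reject_h0 = z > z_critical

print(f"z = {z:.4f}, critical value = {z_critical:.4f}, reject H0: {reject_h0}")
```

With these placeholder values the statistic falls below the critical value, so the null hypothesis would not be rejected; real sample data may of course lead to a different decision.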

Test on Mean Difference when Population Variance is Unknown

In most practical situations the population variances are unknown and the sample sizes are less than 30. If it may be
assumed that the population variances are equal but unknown, then a test is available for all sample sizes, provided that the two
populations are normal.
The test statistic is the t-statistic with degrees of freedom $df = n_1 + n_2 - 2$,

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\hat{\sigma}_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where $\hat{\sigma}_p$ is derived from the pooled estimate of variance,

$$\hat{\sigma}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

with $s_1^2$ and $s_2^2$ the two sample variances.

Example: Mr Brown is the owner of a small bakery in a large town. He believes that the smell of fresh baking will encourage
customers to purchase goods from his bakery. To investigate this belief, he records daily sales for 10 days when all the
bakery's windows are open, and the daily sales for another 10 days when all the windows are closed. Assuming these
data to be random samples from normal populations, investigate the baker's belief.
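
The daily sales figures for this example are not reproduced here, so the following is a minimal sketch of the pooled t computation rather than a worked solution. It assumes the usual pooled variance estimate, the sum of squared deviations of both samples divided by n1 + n2 − 2. The sales values and the function name `pooled_t` are hypothetical.

```python
from math import sqrt


def pooled_t(sample1, sample2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2

    # Sums of squared deviations about each sample mean.
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)

    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df

    t = (mean1 - mean2) / (sqrt(pooled_variance) * sqrt(1 / n1 + 1 / n2))
    return t, df


# HYPOTHETICAL daily sales, three days per condition, for illustration only.
windows_open = [52, 48, 56]
windows_closed = [45, 47, 43]

t, df = pooled_t(windows_open, windows_closed)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

The computed t would then be compared with the tabulated critical value for the reported degrees of freedom at the chosen level of significance.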


Assignment: Answer the following problems

1. A program organizer wishes to determine which of two popular local bands, A and B, is more favoured by
teenagers in Pangasinan. Two random samples of 250 and 300 teenagers were gathered in two concert areas in
Pangasinan. Band A performed for the 1st group and band B performed for the 2nd group. After their performances,
the audience rated the two bands from 1 to 10, with 10 as the highest. Band A received a mean score of 8.73 with a standard
deviation of 1.15 and band B received a mean score of 8.85 with a standard deviation of 0.80. Does this mean that band B was more
favoured? Use a 0.05 level of significance.

2. A microbiologist wishes to determine whether there is any difference in the time it takes to make yoghurt from two
different starters: Lactobacillus acidophilus (A) and Lactobacillus bulgaricus (B). Seven batches of yoghurt were made with each of
the starters. Assuming that both sets of times may be considered to be random samples from normal populations with the
same variance, test the hypothesis that the mean time taken to make yoghurt is the same for both starters.

Test on Difference between Two Proportions

This test deals with procedures for drawing inferences about the difference between two populations whose data are nominal.
When the data are nominal, the only meaningful computation is to count the occurrences of each
type of outcome and compute the proportions. Thus, the parameter to be tested and estimated is the difference between
two proportions. The formula for the z test for comparing two proportions is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}; \quad \hat{p}_1 = \frac{x_1}{n_1}; \quad \hat{p}_2 = \frac{x_2}{n_2}$$

$$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

$$\bar{q} = 1 - \bar{p}$$

Assumptions:

1. Subjects are randomly selected and independently assigned to two groups.
2. Data are at the nominal level.

Hypotheses:

𝐻𝑜 : 𝑝1 = 𝑝2

𝐻𝑎 : 𝑝1 ≠ 𝑝2

𝐻𝑎 : 𝑝1 > 𝑝2

𝐻𝑎 : 𝑝1 < 𝑝2

Example: In a sample of 240 store customers, 72 used a Visa card. In another sample of 190, 76 used a Mastercard. At the 10%
level of significance, is there a difference in the proportion of people who use each type of credit card?
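
As an illustration, the two-proportion z statistic for this example can be sketched as follows. The sketch assumes a two-tailed test with a level of significance of 0.10 (reading the stated 10% as the significance level); the function name `two_proportion_z` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return the z statistic for H0: p1 = p2 using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion
    q_bar = 1 - p_bar
    return (p1_hat - p2_hat) / sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))


# Counts from the example: 72 of 240 used Visa, 76 of 190 used Mastercard.
z = two_proportion_z(72, 240, 76, 190)

# Two-tailed test, ASSUMING alpha = 0.10.
alpha = 0.10
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > z_critical

print(f"z = {z:.4f}, critical value = ±{z_critical:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumptions the statistic lies beyond the critical value, which suggests that the two proportions differ.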


One way Analysis of Variance

We previously learned how to compare two population means using either the pooled two-sample t-test or Welch's t-test.
What happens if we want to compare more than two means? In this lesson, we'll learn how to do just that. More specifically,
we'll learn how to use the analysis of variance method to compare the equality of the (unknown) means $\mu_1, \mu_2,
\ldots, \mu_m$ of $m$ normal distributions with an unknown but common variance $\sigma^2$. Take specific note of that last part: "an
unknown but common variance $\sigma^2$." That is, the analysis of variance method assumes that the population variances are
equal. In that regard, the analysis of variance method can be thought of as an extension of the pooled two-sample t-test.

Before we can use this test, we should first consider the assumptions to be met:

1. For each group (sample), the population from which it was drawn is normally distributed, or at least the sampling
distribution is normally distributed.
2. The populations for all groups have the same variance.
3. Each of the samples is an independent random sample.

The Basic Idea Behind Analysis of Variance

Analysis of variance involves dividing the overall variability in observed data values so that we can draw conclusions about
the equality, or lack thereof, of the means of the populations from where the data came. The overall (or "total") variability is
divided into two components:
• the variability "between" groups
• the variability "within" groups

We summarize the division of the variability in an "analysis of variance table", which is often shortened to
"ANOVA table." The details of the computations of the analysis of variance can be summarized in a table as follows:

Samples Having the Same Sizes

If the random samples are all of the same size, say 𝑛, and come from 𝑘 different populations, we usually want to
decide between the following hypotheses:

𝐻0 : 𝜇1 = 𝜇2 = ⋯ = 𝜇𝑘
𝐻1 : At least two of the means are unequal.

Source of Variation        Sum of Squares   df          Mean Square               Computed f
Between groups (column)    SSB              k − 1       s₁² = SSB / (k − 1)       f = s₁² / s₂²
Within groups (error)      SSW              k(n − 1)    s₂² = SSW / [k(n − 1)]
Total                      TSS              nk − 1

$$SSB = \frac{\sum_{i=1}^{k} T_i^2}{n} - \frac{T^2}{nk}$$


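$$TSS = \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij}^2 - \frac{T^2}{nk}$$

$$SSW = TSS - SSB$$

where $n$ is the number of data in each group and $k$ is the number of groups. Here $T_i$ is the total of the observations in group $i$ and $T$ is the grand total of all $nk$ observations.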

Example:
Students were given different drug treatments before revising for their exams. Some were given a memory drug, some a
placebo drug and some no treatment. The exam scores (%) are shown below for the three different groups:

Memory Drug   Placebo   No Treatment
70            37        3
77            43        10
83            50        17
90            57        23
97            63        30
Carry out a one way ANOVA by hand to test the hypothesis that the treatments will have different effects.
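
The hand computation for this example can be checked with a short script. This is a minimal sketch that follows the equal-sample-size formulas for SSB, TSS and SSW given earlier; the function name `anova_equal_sizes` and the dictionary keys are assumptions made for illustration.

```python
def anova_equal_sizes(groups):
    """One-way ANOVA table entries for k groups that all have n observations."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("all groups must have the same size")

    group_totals = [sum(g) for g in groups]      # T_i
    grand_total = sum(group_totals)              # T
    correction = grand_total ** 2 / (n * k)      # T^2 / (nk)

    ssb = sum(t ** 2 for t in group_totals) / n - correction
    tss = sum(x ** 2 for g in groups for x in g) - correction
    ssw = tss - ssb

    ms_between = ssb / (k - 1)                   # s1^2
    ms_within = ssw / (k * (n - 1))              # s2^2
    return {
        "SSB": ssb, "SSW": ssw, "TSS": tss,
        "df_between": k - 1, "df_within": k * (n - 1),
        "f": ms_between / ms_within,
    }


# Exam scores (%) from the example.
memory_drug = [70, 77, 83, 90, 97]
placebo = [37, 43, 50, 57, 63]
no_treatment = [3, 10, 17, 23, 30]

table = anova_equal_sizes([memory_drug, placebo, no_treatment])
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The computed f is then compared with the tabulated critical value for (2, 12) degrees of freedom at the chosen level of significance, which the example does not specify.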

Samples of Unequal Sizes.

When the samples do not have equal sizes, we may still follow the same method as for
samples of equal size, provided that we slightly modify the formula for SSB to

$$SSB = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{T^2}{N}$$

where $n_1, n_2, \ldots, n_k$ are the sample sizes and $N = \sum_{i=1}^{k} n_i$. Replacing $nk$ in TSS by $N$, we have

$$TSS = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - \frac{T^2}{N}$$

$$SSW = TSS - SSB$$


Source of Variation        Sum of Squares   df        Mean Square              Computed f
Between groups (column)    SSB              k − 1     s₁² = SSB / (k − 1)      f = s₁² / s₂²
Within groups (error)      SSW              N − k     s₂² = SSW / (N − k)
Total                      TSS              N − 1
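
The unequal-size computation can be sketched in the same way. The following is a minimal illustration of the modified SSB and TSS formulas; the three small groups are hypothetical data chosen so that the arithmetic is easy to verify by hand, and the function name `anova_unequal_sizes` is an assumption.

```python
def anova_unequal_sizes(groups):
    """One-way ANOVA table entries for k groups of possibly different sizes."""
    k = len(groups)
    total_n = sum(len(g) for g in groups)        # N
    group_totals = [sum(g) for g in groups]      # T_i
    grand_total = sum(group_totals)              # T
    correction = grand_total ** 2 / total_n      # T^2 / N

    # Each squared group total is divided by its own sample size n_i.
    ssb = sum(t ** 2 / len(g) for t, g in zip(group_totals, groups)) - correction
    tss = sum(x ** 2 for g in groups for x in g) - correction
    ssw = tss - ssb

    ms_between = ssb / (k - 1)                   # s1^2
    ms_within = ssw / (total_n - k)              # s2^2
    return {
        "SSB": ssb, "SSW": ssw, "TSS": tss,
        "df_between": k - 1, "df_within": total_n - k,
        "f": ms_between / ms_within,
    }


# HYPOTHETICAL groups of sizes 3, 2 and 4, for illustration only.
groups = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

table = anova_unequal_sizes(groups)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

When every group has the same size n, N equals nk and these expressions reduce to the equal-size formulas, so the same function could be used in both cases.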

Example: A psychologist was interested in whether different TV shows lead to a more positive outlook on life. People were
split into 4 groups and then taken to a room to view a program. The four groups saw: Eat Bulaga, Ang Probinsyano, TV
Patrol, and Unang Hirit. After the program, a blood sample was taken and serotonin levels were measured.


Eat Bulaga Ang Probinsyano TV Patrol Unang Hirit


11 4 4 7
7 8 3 7
8 6 2 5
14 11 2 4
11 9 3 3
10 8 6 4
5 4
4
Carry out a one-way ANOVA by hand to test the hypothesis that some TV shows make people happier than others.
