
Data Analysis

Two Sample t-test


Equal & Unequal Variance
Using R

Sahil & Anshika


Checking the variance of the data set #1
By default, the R t.test() function assumes that the variances of the two groups of samples being
compared are different (var.equal = FALSE), so the Welch t-test is performed by default. The Welch t-test is an
adaptation of the Student t-test for two samples that may have unequal variances.
To decide which version applies, we first use an F-test to check whether the two variances differ.

The following R code can be used: var.test(x, y)
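
To illustrate the F-test step, here is a minimal sketch. The vectors x and y hold made-up weights (they are not our data set), and the 0.05 cut-off is the same alpha used for the t-test later.

```r
# Hypothetical weights in kg; these values are invented for illustration only
x <- c(52.1, 48.7, 55.3, 50.9, 47.6, 53.8, 49.4, 51.2, 54.0, 50.5)
y <- c(68.3, 74.9, 61.2, 80.4, 66.7, 72.1, 59.8, 77.5, 70.0, 64.6)

# F-test of H0: the two population variances are equal (ratio = 1)
ftest <- var.test(x, y)
print(ftest)

# The F statistic is simply the ratio of the two sample variances
f_ratio <- var(x) / var(y)

# A small p-value suggests unequal variances (keep the Welch default);
# otherwise the pooled test with var.equal = TRUE is a reasonable choice
equal_var <- ftest$p.value > 0.05
```

If equal_var is TRUE, t.test(x, y, var.equal = TRUE) can be used; otherwise the default t.test(x, y) is the safer option.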


t-test for Hypothesis #1
Aim:
We have a cohort of 20 individuals (10 women and 10 men). The question is whether
women's average weight is significantly different from men's average weight. The number of
individuals considered here is obviously low; this is just to illustrate the usage of the two-sample
t-test.

Data are saved in two different numeric vectors (x and y). To perform the t-test
we had to adapt the commands in R, because > t.test(data2, data3) is not applicable
in this case: both data2 and data3 are in the same Excel file.
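
One possible way to handle two groups stored in the same sheet is sketched below. This is an assumption about the workflow, not the exact commands we ran: the data frame weights stands in for the imported sheet, and its column names and values are hypothetical.

```r
# Stand-in for a sheet already imported into R (for example with read.csv);
# column names and values are hypothetical
weights <- data.frame(
  women = c(52.1, 48.7, 55.3, 50.9, 47.6, 53.8, 49.4, 51.2, 54.0, 50.5),
  men   = c(68.3, 74.9, 61.2, 80.4, 66.7, 72.1, 59.8, 77.5, 70.0, 64.6)
)

# Option 1: pull each column out as its own numeric vector
x <- weights$women
y <- weights$men
res_vectors <- t.test(x, y)

# Option 2: reshape to long format (columns "values" and "ind")
# and use the formula interface of t.test()
long <- stack(weights)
res_formula <- t.test(values ~ ind, data = long)

print(res_vectors)
```

Both options run the same Welch test, so they give the same t value and p-value.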
Hypothesis
The null hypothesis (the two group means are equal) is rejected if and only if
p-value < alpha = 0.05
Rejection Region
A critical region, also known as the rejection region, is the set of values of the test statistic for which the null hypothesis
is rejected, i.e. if the observed test statistic falls in the critical region, we reject the null hypothesis in favor of the
alternative hypothesis.
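
As a rough sketch, the two-sided rejection region can be computed with qt(). The degrees of freedom below assume a pooled (equal variance) test with 10 + 10 observations; t_obs is the t value reported in our Result.

```r
alpha <- 0.05

# Assumption: pooled two-sample test, so degrees of freedom = n1 + n2 - 2
dof <- 10 + 10 - 2

# Two-sided critical value: reject H0 when |t| exceeds it
t_crit <- qt(1 - alpha / 2, df = dof)

# t value reported in the Result for Hypothesis #1
t_obs <- -3.1704

# TRUE means the observed statistic lies in the rejection region
in_rejection_region <- abs(t_obs) > t_crit

cat("critical value:", round(t_crit, 4), "\n")
cat("reject H0:", in_rejection_region, "\n")
```

Comparing |t| with the critical value leads to the same decision as comparing the p-value with alpha.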
How we proceeded

Our data set was fairly different from the case described in the book, so
we decided to adapt our commands.
We used t.test(x, y) to stay within the scope of our textbook.
However, we used a different approach to load our data into R.
Result

The t value we obtained was

t = -3.1704

(the sign of t depends only on the order in which the two groups are passed to t.test()).

With most software we would have to look up the corresponding p-value separately. R, however,
reports it directly: p-value = 0.005319

The 95% confidence interval (conf.int) of the difference in means is also shown
(conf.int = [7.4, 36.52]); and finally, we
have the means of the two groups of samples (average weight
of women = 73.21, average weight of men = 51.25).
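
Each of these numbers can also be read directly from the object that t.test() returns. The sketch below uses made-up vectors x and y (not our data), so its output will not match the values above.

```r
# Hypothetical weights in kg; invented for illustration only
x <- c(52.1, 48.7, 55.3, 50.9, 47.6, 53.8, 49.4, 51.2, 54.0, 50.5)
y <- c(68.3, 74.9, 61.2, 80.4, 66.7, 72.1, 59.8, 77.5, 70.0, 64.6)

res <- t.test(x, y)

res$statistic   # t value
res$parameter   # degrees of freedom
res$p.value     # p-value
res$conf.int    # confidence interval of the difference in means (x minus y)
res$estimate    # "mean of x" and "mean of y"

# Confidence level attached to the interval (0.95 by default)
conf_level <- attr(res$conf.int, "conf.level")
```

Working with these components avoids copying numbers by hand from the printed output.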
Conclusion for Equal Variance
The p-value of the test is 0.0053, which is less than the significance level alpha =
0.05. We can therefore reject the null hypothesis and conclude that women's
average weight is significantly different from men's average weight
(p-value = 0.0053).
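
For reference, a minimal sketch of the equal-variance (pooled) test, checked against the textbook formula. The vectors are hypothetical, so the numbers differ from our result.

```r
# Hypothetical weights in kg; invented for illustration only
x <- c(52.1, 48.7, 55.3, 50.9, 47.6, 53.8, 49.4, 51.2, 54.0, 50.5)
y <- c(68.3, 74.9, 61.2, 80.4, 66.7, 72.1, 59.8, 77.5, 70.0, 64.6)

# Student t-test with pooled variance (overrides the Welch default)
res <- t.test(x, y, var.equal = TRUE)

# Same statistic by hand: pooled variance, then t and the two-sided p-value
n1 <- length(x)
n2 <- length(y)
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
t_manual <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual <- 2 * pt(-abs(t_manual), df = n1 + n2 - 2)

# Decision rule: reject H0 when the p-value is below alpha
alpha <- 0.05
reject_h0 <- res$p.value < alpha
```

The manual values t_manual and p_manual should agree with res$statistic and res$p.value.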
Checking the variance of the data set #2
t-test for Hypothesis #2
Aim: We have two locations that serve fast food through a drive-up window. Management would like to study the time
it takes to fill an order at each location. Is there evidence of a difference in the mean waiting time between the two
branches?

The time it takes to fill an order is defined as the number of
minutes from when a customer gives the order to when
it is delivered to that customer at the drive-up window.
Data are collected from a random sample of 20 cars at each
location; let's call them Location A and Location B.
Hypothesis
The null hypothesis (the two location means are equal) is rejected if and only if
p-value < alpha = 0.05
Rejection Region
How we proceeded
Result

The t value we obtained was

t = -0.91009

With most software we would have to look up the corresponding p-value separately.
R, however, reports it directly: p-value = 0.3717

The 95% confidence interval (conf.int) of the difference in means is also shown (conf.int =
[-1.2898638, 0.4998638]); and finally, we have the means of the two groups of samples (mean of
x = 3.315, mean of y = 3.710).
Conclusion for Unequal Variance
The p-value of the test is 0.3717, which is greater than the significance level alpha = 0.05.
We therefore cannot reject the null hypothesis, and we conclude that there is no
evidence of a difference between the two location means.
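
A minimal sketch of the unequal-variance (Welch) test for two locations follows. The waiting times are hypothetical values in minutes, not our sample, so the output differs from the result above.

```r
# Hypothetical waiting times in minutes, 20 cars per location (invented values)
loc_a <- c(2.9, 3.4, 3.1, 3.6, 3.3, 2.8, 3.5, 3.2, 3.0, 3.7,
           3.4, 3.1, 3.3, 3.6, 2.9, 3.2, 3.5, 3.0, 3.4, 3.3)
loc_b <- c(1.8, 5.2, 3.9, 2.4, 6.1, 3.0, 4.7, 2.2, 5.5, 3.6,
           1.9, 4.4, 2.7, 5.9, 3.3, 4.1, 2.5, 5.0, 3.8, 2.0)

# Default t.test() is the Welch test (var.equal = FALSE)
res <- t.test(loc_a, loc_b)
print(res)

# Decision at alpha = 0.05
alpha <- 0.05
reject_h0 <- res$p.value < alpha

# Equivalent check: H0 is retained exactly when the 95% interval contains 0
ci <- res$conf.int
ci_contains_zero <- ci[1] < 0 && ci[2] > 0
```

A confidence interval that contains 0, as in our result [-1.2898638, 0.4998638], goes together with a p-value above 0.05.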
