
Topic: Foundations of Statistics and Probability for Data Science

Sub-Topic: Statistical tests: t-test, Chi-square test

LEARNING OBJECTIVES :

1. Be able to explain statistical tests and why they are needed

2. Be able to explain t-test

2.1. Be able to explain t-test introduction and basic t-test calculation

2.2. Be able to explain small sample vs large sample

3. Be able to explain Chi-square test

3.1. Be able to explain Chi-square distribution and uses

3.2. Be able to explain Chi-square test for goodness of fit with an example

3.3. Be able to explain Chi-square test for independence of two variables with an example

Name of Presenter: Pavan Kumar S


Date of Presentation: 17/08/2021

QUESTION 1 :
An outbreak of Salmonella-related illness was attributed to ice cream produced at a
certain factory. Scientists measured the level of Salmonella in 9 randomly sampled
batches of ice cream.

The levels (in MPN/g) were: 0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392,
0.418

Is there evidence that the mean level of Salmonella in the ice cream is greater than
0.3 MPN/g? (Any value greater than 0.3 MPN/g is considered dangerous)

The t-distribution is a continuous probability distribution derived from the normal distribution. It is used when the sample is small and the population standard deviation has to be estimated from the sample.

The t-test is used in place of the z-test when:

❏ Population variance is not known
❏ Sample size is small (< 30)


i. Null and alternative hypothesis

H0 : μ ≤ 0.3

H1 : μ > 0.3

This is a right-tailed test.

Sample size (n) = 9

Sample mean (x̄) = 0.4564444

Sample SD (s) = 0.2128439

Degrees of freedom (df) = n - 1 = 8

We will use the t-test, since n (sample size) < 30 and population standard deviation
is not known.
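
The summary statistics above can be reproduced with a short sketch. This is a minimal Python illustration using only the standard library; the variable names are illustrative assumptions and not part of the original handout.

```python
import statistics

# Salmonella levels (MPN/g) from the nine sampled batches
levels = [0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418]

n = len(levels)                    # sample size
x_bar = statistics.mean(levels)    # sample mean
s = statistics.stdev(levels)       # sample SD (divides by n - 1)
df = n - 1                         # degrees of freedom

print(n, round(x_bar, 7), round(s, 7), df)
```

Note that `statistics.stdev` uses the n − 1 divisor, which is the sample standard deviation the t-test requires.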

ii. Compute t values

T_critical = qt(0.05, df, lower.tail = FALSE) = 1.859548

t score = (x̄ − μ0) / (s / √n) = (0.4564444 − 0.3) / (0.2128439 / √9) = (0.1564444 × 3) / 0.2128439 = 0.4693332 / 0.2128439 = 2.2051

Since the t score exceeds the critical value (2.2051 > 1.859548), it lies in the critical
region, so we reject the null hypothesis.
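
The t score calculation and the comparison with the critical value can be sketched as follows. This is a minimal Python illustration; the critical value is copied from the handout rather than recomputed, and the variable names are assumptions.

```python
import math

x_bar = 0.4564444       # sample mean from the handout
mu_0 = 0.3              # hypothesised mean (danger threshold, MPN/g)
s = 0.2128439           # sample standard deviation
n = 9                   # sample size
t_critical = 1.859548   # right-tail critical value, alpha = 0.05, df = 8 (from the handout)

# One-sample t statistic: distance of the sample mean from mu_0 in standard errors
t_score = (x_bar - mu_0) / (s / math.sqrt(n))
reject_h0 = t_score > t_critical

print(round(t_score, 4), reject_h0)
```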


Equivalently, since the p-value is smaller than the significance level α (0.02926785 < 0.05),
we reject the null hypothesis: there is evidence that the mean Salmonella level exceeds 0.3 MPN/g.
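
The right-tail p-value of 0.02926785 can be checked numerically. The sketch below is an assumption-laden illustration, not the handout's method: it integrates the t-distribution density with Simpson's rule using only the Python standard library, and the helper names `t_pdf` and `t_right_tail` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via composite Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the probability lies above 0; subtract the area between 0 and t
    return 0.5 - total * h / 3

alpha = 0.05
p_value = t_right_tail(2.2051, 8)

print(round(p_value, 5), p_value < alpha)
```

In practice a statistics library would supply this tail probability directly; the integration is shown only to make the definition of the p-value concrete.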

QUESTION 2 :

A national survey agency conducts a nationwide survey on consumer satisfaction
and finds the response distribution as follows:

Response     Frequency
Excellent    8%
Good         47%
Fair         34%
Poor         11%

A store manager wants to find out whether these national results apply to the
customers of supermarkets in her city. She interviewed 207 randomly selected
customers and asked them for their ratings. The results of this local survey
are given below.

Response     Frequency
Excellent    21
Good         109
Fair         62
Poor         15

Determine whether the local responses are consistent with the expected
frequencies from the national survey, at the 95% confidence level.

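The chi-square distribution is a continuous probability distribution. Chi-square tests use it to check goodness of fit (as in this question) or the independence of two categorical variables.

χ² = ∑ (Observed − Expected)² / Expected

H0 : Observed = Expected

Ha : Observed ≠ Expected

The chi-square test is always a one-tailed (right-tailed) test.

Degrees of freedom, df = n − 1 = 3, where n = 4 is the number of response categories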

Significance level = 0.05

Observed = 21, 109, 62, 15

Expected = (8 × 207)/100, (47 × 207)/100, (34 × 207)/100, (11 × 207)/100 = 16.56, 97.29, 70.38, 22.77

χ² = 6.249084
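
The expected counts and the test statistic above can be reproduced with a few lines. This is a minimal Python sketch; the variable names are illustrative assumptions.

```python
observed = [21, 109, 62, 15]      # local survey counts: Excellent, Good, Fair, Poor
national_pct = [8, 47, 34, 11]    # national distribution, in percent
n = sum(observed)                 # total respondents

# Expected count per category if local customers followed the national distribution
expected = [pct * n / 100 for pct in national_pct]

# Chi-square statistic: sum of (Observed - Expected)^2 / Expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 2) for e in expected], round(chi_sq, 6))
```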


P(χ² > 6.249084) = 0.100101

The p-value is larger than the significance level (0.100101 > 0.05). Hence, we fail to reject the null
hypothesis.

χ² critical = 7.814728

The test statistic falls outside the critical region (6.249084 < 7.814728). Hence, we fail to reject the null
hypothesis. There is no significant evidence that the local responses differ from the expected
frequencies of the national survey.
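
Both decision rules for this question can be checked with the standard library alone. The sketch below relies on a closed-form tail probability that holds only for exactly 3 degrees of freedom; the helper name `chi2_sf_df3` is hypothetical, and the critical value is copied from the handout.

```python
import math

def chi2_sf_df3(x):
    """P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

chi_sq = 6.249084         # test statistic from the handout
chi_critical = 7.814728   # critical value, alpha = 0.05, df = 3 (from the handout)
alpha = 0.05

p_value = chi2_sf_df3(chi_sq)
reject_by_p = p_value < alpha               # p-value rule
reject_by_critical = chi_sq > chi_critical  # critical-value rule

print(round(p_value, 6), reject_by_p, reject_by_critical)
```

The two rules always agree, because the critical value is by definition the point whose right-tail probability equals α.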

QUESTION 3 :

A survey is conducted by a gaming company that makes three video games. It wants
to know if the preference of game depends on the gender of the player. Total
number of participants is 1000. Here is the survey result.

Gender    Game A    Game B    Game C    Total
Male      200       150       50        400
Female    250       300       50        600
Total     450       450       100       1000


Is men's preference different from women's preference?

Check with 0.05 level of significance.

H0 : Preference of game is independent of gender

Ha : Preference of game is dependent on gender

The chi-square test is always a one-tailed (right-tailed) test.

Degrees of freedom, df = (m − 1)(n − 1) = 2 × 1 = 2, where m = 3 is the number of games and n = 2 is the number of genders

Significance level, α = 0.05

Observed = 200, 250, 150, 300, 50, 50

Expected = (400 × 450)/1000, (600 × 450)/1000, (400 × 450)/1000, (600 × 450)/1000, (400 × 100)/1000, (600 × 100)/1000

Expected = 180, 270, 180, 270, 40, 60

χ² = 16.2037
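
Each expected count is (row total × column total) / grand total. A minimal Python sketch of that calculation follows; the variable names are illustrative assumptions, and the expected counts come out row by row (Male, then Female) rather than in the column-by-column order listed above.

```python
# Rows: Male, Female; columns: Game A, Game B, Game C
observed = [
    [200, 150, 50],
    [250, 300, 50],
]

row_totals = [sum(row) for row in observed]         # totals per gender
col_totals = [sum(col) for col in zip(*observed)]   # totals per game
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

print(expected, round(chi_sq, 4), df)
```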

P(χ² > 16.2037) = 0.0003029775

The p-value is smaller than the significance level (0.0003029775 < 0.05). Hence, we reject the null
hypothesis: the preference of game depends on gender.
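
With exactly 2 degrees of freedom the right-tail probability has a simple closed form, P(X > x) = exp(−x / 2), so this p-value can be verified directly. The Python sketch below uses that special case only; it does not apply to other degrees of freedom.

```python
import math

chi_sq = 16.2037   # test statistic from the handout
alpha = 0.05       # significance level

# Chi-square with 2 degrees of freedom: P(X > x) = exp(-x / 2)
p_value = math.exp(-chi_sq / 2)
reject_h0 = p_value < alpha

print(p_value, reject_h0)
```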
