
Final Exam of Business Statistics I at ADA University

Instructions:
1) You have to work alone. You are welcome to use lecture notes and any other supplementary
materials;
2) We will have the oral part of the exam where you have to explain and defend your answers;
3) For the exam, you have to find your dataset from the exam_data.xlsx file. We will refer to it as “your
dataset”. This data is different for everybody. The student who uses another person’s dataset will receive
an automatic zero grade.
4) Please submit your final answer in pdf format. I prefer typing. If you use handwriting make sure that it
is legible. Otherwise, your final mark will be affected.
5) I encourage you to draw graphs while solving questions.
31/07/2020
Hypothesis Testing Regarding 1 Population [60 percent,
each question carries 4 percent weight]
Test of Hypothesis about population mean when standard deviation is unknown
1) Assume that your dataset is a sample from the population, and each number refers to a daily
expenditure of a given student. Now, I am interested in the population average i.e. mean
expenditure of ADA students. However, the problem is I do not have the full population data.
What is your estimator and estimate of the population mean?

Daily expenditure of given student


7
8
7
5
4
2
5
5
5
5
5
5
6
6
5
5
4
8
4
4
3
8
3
4
3
4
5
7
4
8
Average 5.133333333
Standard Deviation 1.634400308

The estimator of the population mean is the sample mean x-bar. The estimate from this sample is x-bar = 5.1333. A 95% interval around it is calculated as follows:

sigma               1.633
confidence level    0.95
alpha               0.05
n                   30
x-bar               5.133333
z_alpha/2           1.959964
E                   0.584351
Upper limit         5.717684
Lower limit         4.548983
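The point estimate and the 95% interval can be reproduced with a short script. This is a minimal sketch, assuming the 30 values listed above are the whole sample and using the sample standard deviation in place of sigma, so the margin differs slightly from the table, which uses 1.633. The variable names are illustrative only.

```python
import math
import statistics

# Daily expenditures from the dataset above (30 values).
data = [7, 8, 7, 5, 4, 2, 5, 5, 5, 5, 5, 5, 6, 6, 5,
        5, 4, 8, 4, 4, 3, 8, 3, 4, 3, 4, 5, 7, 4, 8]

n = len(data)
x_bar = statistics.mean(data)   # point estimate of the population mean
s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)

# 95% critical value of the standard normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * s / math.sqrt(n)

print(f"n = {n}, mean = {x_bar:.4f}, s = {s:.4f}")
print(f"95% interval: ({x_bar - margin:.4f}, {x_bar + margin:.4f})")
```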

2) Build 90%, 95% and 99% confidence intervals for the population mean. Note that we do not
know the standard deviation. Remember the “Chinese product” joke.

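Because the population standard deviation is unknown, the interval is built from the sample standard deviation s = 1.6344:

x-bar ± t_alpha/2,n-1 * s / sqrt(n)

With n = 30 the t critical values (df = 29) are slightly larger than the corresponding z scores.

Desired Confidence Level    Z Score    t (df = 29)    Interval
90%                         1.645      1.699          (4.626, 5.640)
95%                         1.96       2.045          (4.523, 5.744)
99%                         2.576      2.756          (4.311, 5.956)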
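When the standard deviation is estimated from the sample, the intervals are usually built with t critical values rather than z scores. The sketch below assumes df = 29 and hardcodes the critical values 1.699, 2.045 and 2.756 as read from a standard t table, because the Python standard library has no t distribution. These constants are an assumption of this example, not part of the exam data.

```python
import math
import statistics

data = [7, 8, 7, 5, 4, 2, 5, 5, 5, 5, 5, 5, 6, 6, 5,
        5, 4, 8, 4, 4, 3, 8, 3, 4, 3, 4, 5, 7, 4, 8]

n = len(data)
x_bar = statistics.mean(data)
std_err = statistics.stdev(data) / math.sqrt(n)  # estimated standard error

# Two-sided t critical values for df = 29, copied from a t table (assumed).
t_crit = {0.90: 1.699, 0.95: 2.045, 0.99: 2.756}

intervals = {}
for level, t in t_crit.items():
    margin = t * std_err
    intervals[level] = (x_bar - margin, x_bar + margin)
    low, high = intervals[level]
    print(f"{level:.0%}: ({low:.3f}, {high:.3f})")
```

A higher confidence level needs a larger critical value, so the interval widens from 90% to 99%.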

3) How many observations do I need to have the margin of error equal to 2 at 95% confidence
level?

The margin of error E is the half-width of the confidence interval. At 95% confidence, the sample mean is expected to lie within E of the population mean. Here E is measured in manats, not in percentage points, and s = 1.6344 is used in place of the unknown sigma.

E = z_alpha/2 * s / sqrt(n) <= 2

1.96 * 1.6344 / sqrt(n) <= 2

sqrt(n) >= 1.6017

n >= 2.57, so n = 3
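For a mean, the required sample size follows from solving z * sigma / sqrt(n) <= E for n. The helper below is a hypothetical illustration that plugs in the sample standard deviation 1.6344 as a stand-in for the unknown sigma. The function name and its defaults are assumptions of this sketch.

```python
import math


def required_sample_size(sigma, margin, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) is at most `margin`."""
    return math.ceil((z * sigma / margin) ** 2)


# Margin of error of 2 manats at the 95% level.
n_needed = required_sample_size(sigma=1.6344, margin=2)
print(n_needed)

# A tighter margin needs many more observations.
n_tight = required_sample_size(sigma=1.6344, margin=0.5)
print(n_tight)
```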

4) Take the average of the first three numbers in your dataset, let’s call it ‘c’. Test the hypothesis
that the true mean of expenditures is equal to ‘c’. Choose 5% level of significance and perform
the test using critical value approach. Moreover, you have to perform the same test in left-
tailed, right-tailed and two-tailed version but keep 𝛼=0.05 in all cases.

Considering the first three values of the dataset:
  7
  8
  7
Average c = 7.333333333

H0: true mean of expenditure = 7.33

H1: true mean of expenditure not equal to 7.33 (two-tailed), less than 7.33 (left-tailed) or greater than 7.33 (right-tailed)

𝛼 = 0.05
t = (x-bar - µ) / (s / sqrt(n))

t = (5.13 - 7.33) / (1.634 / sqrt(30)) = -7.37

For df = n - 1 = 29 and 𝛼 = 0.05, the t table gives critical values of ±2.045 (two-tailed) and 1.699 (one-tailed).

Two-tailed: |t| = 7.37 > 2.045, so we reject H0.
Left-tailed: t = -7.37 < -1.699, so we reject H0.
Right-tailed: t = -7.37 < 1.699, so we fail to reject H0.
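The critical value approach for a one-sample test on the full sample of 30 values can be sketched as follows. The critical values 2.045 and 1.699 are assumed to come from a t table with df = 29 at the 5% level, and the dictionary of decisions is only an illustrative way to show the three versions of the test side by side.

```python
import math
import statistics

data = [7, 8, 7, 5, 4, 2, 5, 5, 5, 5, 5, 5, 6, 6, 5,
        5, 4, 8, 4, 4, 3, 8, 3, 4, 3, 4, 5, 7, 4, 8]

c = statistics.mean(data[:3])   # hypothesized mean: average of first three values
n = len(data)
std_err = statistics.stdev(data) / math.sqrt(n)
t_stat = (statistics.mean(data) - c) / std_err

# Critical values for df = 29 and alpha = 0.05, copied from a t table (assumed).
T_TWO_TAILED = 2.045
T_ONE_TAILED = 1.699

reject = {
    "two-tailed": abs(t_stat) > T_TWO_TAILED,   # H1: mu != c
    "left-tailed": t_stat < -T_ONE_TAILED,      # H1: mu < c
    "right-tailed": t_stat > T_ONE_TAILED,      # H1: mu > c
}

print(f"c = {c:.4f}, t = {t_stat:.3f}")
for name, decision in reject.items():
    print(name, "reject H0" if decision else "fail to reject H0")
```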

5) Perform the same hypothesis testing (right,left and two-tailed version) using p-value approach.
Define p-value and find an approximate p value using the appropriate table. Again keep 𝛼=
0.05.

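A p-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually obtained. We reject the null hypothesis when the p-value is smaller than 𝛼.

H0: true mean of expenditure = 7.33

H1: true mean of expenditure not equal to 7.33 (two-tailed), less than 7.33 (left-tailed) or greater than 7.33 (right-tailed)

t = (x-bar - µ) / (s / sqrt(n))

t = (5.13 - 7.33) / (1.634 / sqrt(30))

t = -7.37

𝛼 = 1 - 0.95 = 0.05

For df = 29 the t table gives 2.756 for a one-tail area of 0.005, and |t| = 7.37 lies far beyond it.

Two-tailed: p-value < 0.01 < 0.05, so we reject H0.
Left-tailed: p-value < 0.005 < 0.05, so we reject H0.
Right-tailed: p-value > 0.995 > 0.05, so we fail to reject H0.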
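The three p-values can be approximated numerically. The sketch below swaps in the standard normal distribution for the t distribution with 29 degrees of freedom, which is an approximation chosen only because the Python standard library has no t distribution. A t table gives slightly larger tail areas, but with a statistic this far from zero the decisions are the same.

```python
import math
import statistics

data = [7, 8, 7, 5, 4, 2, 5, 5, 5, 5, 5, 5, 6, 6, 5,
        5, 4, 8, 4, 4, 3, 8, 3, 4, 3, 4, 5, 7, 4, 8]

c = statistics.mean(data[:3])
std_err = statistics.stdev(data) / math.sqrt(len(data))
t_stat = (statistics.mean(data) - c) / std_err

# Normal approximation to the t distribution (assumption of this sketch).
std_normal = statistics.NormalDist()
p_left = std_normal.cdf(t_stat)      # H1: mu < c
p_right = 1 - p_left                 # H1: mu > c
p_two = 2 * min(p_left, p_right)     # H1: mu != c

alpha = 0.05
for name, p in [("left", p_left), ("right", p_right), ("two", p_two)]:
    print(f"{name}-tailed p = {p:.3g}:",
          "reject H0" if p < alpha else "fail to reject H0")
```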


Test of Hypothesis about population proportion
1) Assume that your dataset is a sample from the population, and each number refers to a daily
expenditure of a given student. Now, I am interested in proportion of ‘poor students’. So I
define ‘poor students’ to be those whose daily expenditures are less than 5 manats per day.
However, the problem is I do not have the full population data. What is your estimator and
estimate of the population proportion?

Poor students
4
2
4
4
4
3
3
4
3
4
4

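The estimator of the population proportion is the sample proportion p-hat = x / n, where x is the number of poor students in the sample. Here 11 of the 30 students spend less than 5 manats per day, so the estimate is p-hat = 11 / 30 = 0.3667.

x                   11
n                   30
p-hat               0.3667
confidence level    0.95
alpha               0.05
z_alpha/2           1.959963985
E                   0.1724
Upper limit         0.5391
Lower limit         0.1942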
2) Build 90%, 95% and 99% confidence intervals for the population proportion.

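Desired Confidence Level    Z Score
90%                         1.645
95%                         1.96
99%                         2.576

The interval is p-hat ± z * sqrt(p-hat (1 - p-hat) / n), with p-hat = 11/30 = 0.3667 and n = 30, so sqrt(p-hat (1 - p-hat) / n) = 0.08798.

90%

E = 1.645 * 0.08798 = 0.1447

Interval: (0.2219, 0.5114)

95%

E = 1.96 * 0.08798 = 0.1724

Interval: (0.1942, 0.5391)

99%

E = 2.576 * 0.08798 = 0.2266

Interval: (0.1400, 0.5933)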
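The proportion intervals can be checked with a few lines of code. This is a sketch under the definition given in the question, where a student is "poor" if the daily expenditure is below 5 manats. It uses the normal approximation for a sample proportion, and the variable names are illustrative.

```python
import math
import statistics

data = [7, 8, 7, 5, 4, 2, 5, 5, 5, 5, 5, 5, 6, 6, 5,
        5, 4, 8, 4, 4, 3, 8, 3, 4, 3, 4, 5, 7, 4, 8]

n = len(data)
poor = sum(1 for x in data if x < 5)   # students spending less than 5 manats
p_hat = poor / n                        # sample proportion
std_err = math.sqrt(p_hat * (1 - p_hat) / n)

intervals = {}
for level in (0.90, 0.95, 0.99):
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * std_err
    intervals[level] = (p_hat - margin, p_hat + margin)
    print(f"{level:.0%}: ({intervals[level][0]:.4f}, {intervals[level][1]:.4f})")
```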
3) How many observations do I need to have the margin of error equal to 2 at 95% confidence
level?

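With the margin of error taken as 2 percentage points (0.02) and the conservative value p = 0.5:

z * sqrt(p (1 - p) / n) <= 0.02

1.96 * 0.5 / sqrt(n) <= 0.02

0.98 / sqrt(n) <= 2/100

sqrt(n) >= 49

n >= 2401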
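For a proportion, the sample size comes from solving z * sqrt(p (1 - p) / n) <= E for n. The helper below is a hypothetical sketch that reads the margin of error as 2 percentage points (0.02). It compares the conservative choice p = 0.5 with the sample estimate 11/30. The rounding step is only there to absorb floating-point noise before taking the ceiling.

```python
import math


def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest n for which z * sqrt(p * (1 - p) / n) is at most `margin`."""
    raw = (z / margin) ** 2 * p * (1 - p)
    # Round first so that values like 2400.9999999999995 do not shift the ceiling.
    return math.ceil(round(raw, 6))


n_conservative = required_sample_size(0.02)              # worst case, p = 0.5
n_with_estimate = required_sample_size(0.02, p=11 / 30)  # plug in the sample proportion

print(n_conservative, n_with_estimate)
```

The conservative choice never gives a smaller n than any other value of p, because p (1 - p) is largest at p = 0.5.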

4) Take the average of the first three numbers in your dataset and divide that by 10, and let’s call it
‘c’. Test the hypothesis that the true population proportion is equal to ‘c’. Choose 5% level of
significance and perform the test using critical value approach. Moreover, you have to perform
the same test in left-tailed, right-tailed and two-tailed version but keep 𝛼=0.05 in all cases.

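  4
  2
  4
Average 3.333333333
c 0.333333333

H0: true population proportion = 0.3333

H1: true population proportion not equal to 0.3333 (two-tailed), less than 0.3333 (left-tailed) or greater than 0.3333 (right-tailed)

z = (p-hat - c) / sqrt(c (1 - c) / n)

= (0.3667 - 0.3333) / sqrt(0.3333 * 0.6667 / 30)

= 0.39

Critical values at 𝛼 = 0.05: ±1.96 (two-tailed) and 1.645 (one-tailed)

Two-tailed: |z| = 0.39 < 1.96, so we fail to reject H0.
Left-tailed: z = 0.39 > -1.645, so we fail to reject H0.
Right-tailed: z = 0.39 < 1.645, so we fail to reject H0.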

5) Perform the same hypothesis testing (right,left and two-tailed version) using p-value approach. Define
p-value and find an approximate p value using the appropriate table. Again keep 𝛼= 0.05.

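A p-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually obtained. We reject the null hypothesis when the p-value is smaller than 𝛼.

H0: true population proportion = 0.3333

H1: true population proportion not equal to 0.3333 (two-tailed), less than 0.3333 (left-tailed) or greater than 0.3333 (right-tailed)

z = (p-hat - c) / sqrt(c (1 - c) / n)

z = (0.3667 - 0.3333) / sqrt(0.3333 * 0.6667 / 30)

z = 0.39

𝛼 = 1 - 0.95 = 0.05

From the standard normal table, P(Z <= 0.39) = 0.6517.

Two-tailed: p-value = 2 * (1 - 0.6517) = 0.6966 > 0.05, so we fail to reject H0.
Left-tailed: p-value = 0.6517 > 0.05, so we fail to reject H0.
Right-tailed: p-value = 1 - 0.6517 = 0.3483 > 0.05, so we fail to reject H0.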
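A one-sample z test for a proportion can be sketched as below. It assumes the sample proportion of poor students is 11/30 and takes c from the three values 4, 2, 4 listed above, divided by 10. Both choices are assumptions about how the exam instructions are read, and the names are illustrative.

```python
import math
import statistics

n = 30
p_hat = 11 / 30                        # sample proportion of poor students
c = statistics.mean([4, 2, 4]) / 10    # hypothesized proportion (assumed reading)

# Under H0 the standard error uses the hypothesized proportion c.
z_stat = (p_hat - c) / math.sqrt(c * (1 - c) / n)

std_normal = statistics.NormalDist()
p_left = std_normal.cdf(z_stat)        # H1: p < c
p_right = 1 - p_left                   # H1: p > c
p_two = 2 * min(p_left, p_right)       # H1: p != c

alpha = 0.05
decisions = {
    "left-tailed": p_left < alpha,
    "right-tailed": p_right < alpha,
    "two-tailed": p_two < alpha,
}
print(f"z = {z_stat:.4f}, p-values: {p_left:.4f}, {p_right:.4f}, {p_two:.4f}")
```

A True entry in `decisions` means H0 is rejected in that version of the test.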


Hypothesis testing regarding the equality of 2 population variances:
1) Divide your dataset in two equal parts with 15 data values in each part. Assume the first part is a
random sample from the daily expenditures of ADA students (Population 1) while the second part is a
random sample from the daily expenditures of BSU students (Population 2). Test the hypothesis about
equality of the 2 population variances.

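                      Population 1    Population 2
                      7               5
                      8               4
                      7               8
                      5               4
                      4               4
                      2               3
                      5               8
                      5               3
                      5               4
                      5               3
                      5               4
                      5               5
                      6               7
                      6               4
                      5               8
Average               5.33            4.93
Standard deviation    1.40            1.87
Variance              1.95            3.50

The standard deviations and variances are the sample versions (n - 1 divisor).

F = s1^2 / s2^2 = 1.9524 / 3.4952 = 0.5586

Under the null hypothesis of equal variances this ratio should be close to 1. With 14 and 14 degrees of freedom the lower one-tail critical value is 0.4026 and the one-tail p-value is 0.1439, so F = 0.5586 does not fall in the rejection region.

Hence we fail to reject the null hypothesis that the variances of the two populations are equal.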
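The variance ratio can be recomputed directly from the two halves of the dataset. This sketch uses sample variances (n - 1 divisor) and takes the lower one-tail critical value 0.402620943 from the F-test output in this document rather than computing it, since the standard library has no F distribution.

```python
import statistics

data = [7, 8, 7, 5, 4, 2, 5, 5, 5, 5, 5, 5, 6, 6, 5,
        5, 4, 8, 4, 4, 3, 8, 3, 4, 3, 4, 5, 7, 4, 8]

pop1, pop2 = data[:15], data[15:]      # first and second half of the sample

var1 = statistics.variance(pop1)       # sample variance of Population 1
var2 = statistics.variance(pop2)       # sample variance of Population 2
f_stat = var1 / var2

# Lower one-tail critical value for df = (14, 14), alpha = 0.05 (from the F-test output).
F_CRIT_LOWER = 0.402620943
reject = f_stat < F_CRIT_LOWER

print(f"var1 = {var1:.4f}, var2 = {var2:.4f}, F = {f_stat:.4f}")
print("reject H0" if reject else "fail to reject H0")
```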

2) What is the test statistic for testing the hypothesis about the equality of two population
variances? What is the distribution of the test statistic? Try to draw it and show some
characteristics of the distribution.

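The test statistic is F = s1^2 / s2^2, the ratio of the two sample variances. Under the null hypothesis it follows an F distribution with (n1 - 1, n2 - 1) = (14, 14) degrees of freedom. The F distribution takes only non-negative values and is right-skewed.

We use the two-tailed version of the test, against the alternative that the variances are not equal.

So the F test is used here to find whether two independent estimates of variance can be assumed to be estimates of the same variance.

Null hypothesis - two normal populations have the same variance.

Alternative Hypothesis - two normal populations do not have the same variance.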

F-Test Two-Sample for Variances

                       Population 1    Population 2
Mean                   5.333333333     4.933333333
Variance               1.952380952     3.495238095
Observations           15              15
df                     14              14
F                      0.558583106
P(F<=f) one-tail       0.143906706
F Critical one-tail    0.402620943

Hypothesis Testing Regarding Several Population
Proportions [8 percent, each question carries 4 percent
weight]
1) Take the first 6 values of your dataset. For example, you can check the dataset in front of my
name in exam_data.xlsx file. Those values for my case are as follows (it will be different for each of
you):

Assume these are the results of a die rolling experiment such that the value 1 appeared 6 times, the
value of 2 appeared 5 times and so on. Let me show it in a table format:

The test statistic is compared with the critical χ2 value for α = 0.05:

χ2 = Σ (O − E)^2 / E

The first 6 values in my dataset are:

7
8
7
5
4
2
In the critical value approach, the chosen level of significance, α = 0.05, determines the critical value.

H0 : The die is fair


H1: The die is not fair

Values on a Die         1          2          3          4          5          6          Sum
Observed Frequency      6          5          8          3          6          4          32
Expected Frequency      5.33       5.33       5.33       5.33       5.33       5.33
Assumed distribution    0.166667   0.166667   0.166667   0.166667   0.166667   0.166667   1
(O-E)^2                 0.44       0.11       7.11       5.44       0.44       1.78
(O-E)^2/E               0.08       0.02       1.33       1.02       0.08       0.33       2.88 (chi-square value)

Critical chi-square for alpha = 0.05 and df = 6 - 1 = 5:

Chi square is 11.07

The obtained chi-square, 2.88, is less than the table value of 11.07, so we fail to reject the null hypothesis
that the die is fair.
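The goodness-of-fit statistic can be recomputed from the observed frequencies in the table. The sketch assumes a fair die, so every face has the same expected frequency, and it takes the critical value 11.07 (alpha = 0.05, df = 5) from the chi-square table as quoted in the text.

```python
# Observed frequencies for faces 1..6, as listed in the table.
observed = [6, 5, 8, 3, 6, 4]

total = sum(observed)
expected = total / len(observed)   # fair die: equal expected count per face

# Chi-square statistic: sum of (O - E)^2 / E over the six faces.
chi_sq = sum((o - expected) ** 2 / expected for o in observed)

CRITICAL = 11.07                   # chi-square table, alpha = 0.05, df = 5
reject = chi_sq > CRITICAL

print(f"expected = {expected:.2f}, chi-square = {chi_sq:.3f}")
print("reject H0" if reject else "fail to reject H0")
```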
3) Define what is a p-value. Use p-value approach to test the same hypothesis. You are allowed to
use approximations based on the appropriate tables.

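The p-value is the probability, if H0 is true, of obtaining a chi-square statistic at least as large as the one observed. The value 11.07 is the critical value, not the p-value. For χ2 = 2.88 with df = 5, the chi-square table places 2.88 between 1.610 (upper-tail area 0.90) and 9.236 (upper-tail area 0.10), so 0.10 < p-value < 0.90. Since the p-value is greater than α = 0.05, we fail to reject H0 that the die is fair.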

Theoretical question[4 percent]


1) Define Central Limit Theorem and Law of Large Numbers in your own words and referring to the
simulation we did in the R program.

The Central Limit Theorem states that if we have a population with mean μ and
standard deviation σ and take sufficiently large random samples from the population with
replacement, then the distribution of the sample means will be approximately normal,
with mean μ and standard deviation σ/sqrt(n).

The Law of Large Numbers states that the sample mean of independent and
identically distributed observations converges to the population mean μ as the sample size grows.
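Both statements can be illustrated with a small simulation. The sketch below is written in Python and is not the R simulation from the course. It rolls a fair die, whose population mean is 3.5 and standard deviation is sqrt(35/12), and the seed, sample sizes and function name are arbitrary choices for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def roll_mean(n):
    """Mean of n independent rolls of a fair six-sided die."""
    return statistics.mean(random.randint(1, 6) for _ in range(n))


# Law of Large Numbers: the sample mean settles near 3.5 as n grows.
lln = {n: roll_mean(n) for n in (10, 100, 10_000, 100_000)}
for n, m in lln.items():
    print(f"n = {n:>6}: sample mean = {m:.4f}")

# Central Limit Theorem: means of many samples of size 30 cluster around 3.5
# with spread close to sigma / sqrt(30), and their histogram looks bell-shaped.
sample_means = [roll_mean(30) for _ in range(2000)]
clt_mean = statistics.mean(sample_means)
clt_sd = statistics.stdev(sample_means)
print(f"mean of sample means = {clt_mean:.4f}, sd = {clt_sd:.4f}")
```

Plotting a histogram of `sample_means` would show the approximately normal shape even though a single die roll is uniformly distributed.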
