
Solutions Final Exam Quantitative Data Analysis 1 2023

The points awarded ‘per step’ within each subquestion are indicated below; (small) adjustments to
this allocation may be made during the correction process.

Exercise 1 (10 pts = 5 + 2 + 3)


a. $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{100}(15 \cdot 25 + 25 \cdot 35 + 30 \cdot 45 + 20 \cdot 55 + 10 \cdot 65)$ (1 point)
$= 43.5$
So the mean income is 43,500.00 Euro (1 point)

Location of the median: $L_{50} = (100 + 1) \cdot \frac{50}{100} = 50.5$ (1 point)

The 50th and 51st observations after ordering are both in the class 40 – 50, so M = 45.
The median income is 45,000 Euro. (1 point)

The mode of the income data is the income value that appears most frequently. In this case,
the 40-50 income group is the class with the highest frequency, so the mode is 45,000 Euro.
(1 point)
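The three location measures in part a can be checked with a short Python sketch. The class midpoints and frequencies are the ones used in the computation above; treating the class midpoint as the reported median and mode follows the solution's own convention.

```python
# Class midpoints (in 1,000 Euro) and frequencies, as used in the solution.
midpoints = [25, 35, 45, 55, 65]
freqs = [15, 25, 30, 20, 10]

n = sum(freqs)  # 100 employees

# Mean of grouped data: frequency-weighted average of the class midpoints.
mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Location of the median in the ordered data: (n + 1) * 50/100.
median_location = (n + 1) * 50 / 100

# Walk through the cumulative frequencies until the median location is covered.
cumulative = 0
median_midpoint = None
for m, f in zip(midpoints, freqs):
    cumulative += f
    if cumulative >= median_location:
        median_midpoint = m
        break

# Modal class: the class with the highest frequency.
mode_midpoint = midpoints[freqs.index(max(freqs))]

print(mean, median_location, median_midpoint, mode_midpoint)
```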
b. The distribution is skewed to the left (1 point),
since the mean is smaller than the median (43.5 < 45) (1 point)

c. Y = number of employees with an income above 40,000 in the drawn sample


$P(Y = 2) = \frac{60}{100} \cdot \frac{59}{99}$ (2 points)
1 point for use of the multiplication rule and 1 point for correct probabilities
$\approx 0.3576$ (1 point)
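As a check on the multiplication rule, the product can be evaluated exactly. This sketch assumes, as the factors 60/100 and 59/99 suggest, that two employees are drawn without replacement and that 60 of the 100 employees earn more than 40,000 Euro.

```python
from fractions import Fraction

n_total = 100
n_above = 60  # employees with an income above 40,000 Euro (assumed from 60/100)

# Multiplication rule for two draws without replacement:
# P(first above) * P(second above | first above).
p_both = Fraction(n_above, n_total) * Fraction(n_above - 1, n_total - 1)

print(p_both, float(p_both))
```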

Exercise 2 (18 pts = 4 + 3 + 3 + 8)

a. $BOX \sim N(\mu_{BOX} = 50 \cdot 17,\ \sigma^2_{BOX} = 50 \cdot 0.5^2)$ (1 point)

$P(BOX > 860) = P\left(Z > \frac{860 - 50 \cdot 17}{\sqrt{50} \cdot 0.5}\right)$ (1 point)

$\approx P(Z > 2.83) = 1 - P(Z \le 2.83)$ (1 point)

$\approx 1 - 0.9977 = 0.0023$ (1 point)
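b. $P(BOX < h) = P\left(Z < \frac{h - 50 \cdot 17}{\sqrt{50} \cdot 0.5}\right) = 0.95 \implies P(BOX \ge h) = 0.05$ (1 point)
$z_{5\%} = 1.645$ (1 point)
$\implies h = \mu_{BOX} + z_{5\%} \cdot \sigma_{BOX} = 50 \cdot 17 + 1.645 \cdot \sqrt{50} \cdot 0.5$ (1 point)
$\approx 855.82$ (1 point)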
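Parts a and b can be reproduced without a z-table using `statistics.NormalDist` from the Python standard library. This is only a sketch under the solution's own model (a box total that is normal with mean 50 · 17 and standard deviation √50 · 0.5); small differences from the table-based answers come from rounding z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

n_items, mu_item, sigma_item = 50, 17, 0.5

# Total weight of a box: sum of 50 independent item weights.
box = NormalDist(mu=n_items * mu_item, sigma=sqrt(n_items) * sigma_item)

# Part a: upper-tail probability P(BOX > 860).
p_over_860 = 1 - box.cdf(860)

# Part b: the weight h with P(BOX < h) = 0.95.
h = box.inv_cdf(0.95)

print(round(p_over_860, 4), round(h, 2))
```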
c. $z = \frac{18.5 - 17.2}{0.4} = 3.25$ (1 point)
The observed weight lies 3.25 standard deviations above the mean and is a potential
outlier. (1 point)
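d. $\bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$ (1 point)
$1 - \alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow t_{\alpha/2} = t_{0.025} \approx 2.009$ (use df = 50 in table) (1 point)
95% CI for mean weight: $17.2 \pm 2.009 \cdot \frac{0.4}{\sqrt{50}}$ (1 point)

$\approx (17.09, 17.31)$ (2 points)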
1 point per boundary
The whole interval lies completely above 17 (1 point)
So, there is sufficient statistical evidence that the mean weight differs from 17 grams.
(1 point)
The significance level of the corresponding test is 5%, since the test is two-sided (1 point)
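The interval in part d can be verified numerically. Because the standard library has no t-distribution, the sketch below simply takes the critical value 2.009 from the table, exactly as the solution does.

```python
from math import sqrt

x_bar, s, n = 17.2, 0.4, 50
t_crit = 2.009  # table value t_{0.025} with df = 50, as used in the solution

# Margin of error and interval boundaries.
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# The hypothesized mean of 17 is rejected if it falls outside the interval.
contains_17 = lower <= 17 <= upper

print(round(lower, 2), round(upper, 2), contains_17)
```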

Exercise 3 (11 pts = 3 + 4 + 4)

a. 𝑃(𝐴 = 8 𝑜𝑟 𝐵 = 2) = 𝑃(𝐴 = 8) + 𝑃(𝐵 = 2) − 𝑃(𝐴 = 8 𝑎𝑛𝑑 𝐵 = 2) (1 point)


=0.05+0.05+0.10+0.30+0.10+0.05−0.05 (1 point)
=0.60 (1 point)
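The addition rule in part a can be spelled out term by term. Grouping the first three probabilities as P(A = 8) and the next three as P(B = 2) is an assumption about how the sum above was built; the original joint table is not reproduced in this solution.

```python
# Assumed grouping of the terms in the solution's sum.
p_a8 = 0.05 + 0.05 + 0.10   # P(A = 8), summed over the values of B
p_b2 = 0.30 + 0.10 + 0.05   # P(B = 2), summed over the values of A
p_a8_and_b2 = 0.05          # the cell counted twice

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_union = p_a8 + p_b2 - p_a8_and_b2

print(round(p_union, 2))
```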
b. $E[A \mid B = 5] = \sum_x P(A = x \mid B = 5) \cdot x$ (1 point)
$= \frac{0.10}{0.10 + 0.15 + 0.05} \cdot 2 + \frac{0.15}{0.10 + 0.15 + 0.05} \cdot 5 + \frac{0.05}{0.10 + 0.15 + 0.05} \cdot 8$ (2 points)
1 point for correct P(B = 5) and 1 point for correct joint probabilities
$= 4.5$ (1 point)
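The conditional expectation in part b amounts to renormalizing the B = 5 row of the joint distribution. A minimal sketch, using only the three joint probabilities that appear in the computation above:

```python
values_a = [2, 5, 8]
joint_b5 = [0.10, 0.15, 0.05]  # P(A = x and B = 5) for x = 2, 5, 8

# Marginal probability P(B = 5).
p_b5 = sum(joint_b5)

# Conditional distribution P(A = x | B = 5).
conditional = [p / p_b5 for p in joint_b5]

# Conditional expectation E[A | B = 5].
e_a_given_b5 = sum(x * p for x, p in zip(values_a, conditional))

print(round(e_a_given_b5, 4))
```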
c. Let: profit = daily profit in Euro
The table with profits:

B\A     2     5     8
2      26    41    56
5      50    65    80
8      74    89   104

Computation/use of correct profits (2 points)
-1 point for each incorrect profit

So, 𝑃(𝑝𝑟𝑜𝑓𝑖𝑡 ≥ 50) = 1 − 𝑃(𝑝𝑟𝑜𝑓𝑖𝑡 < 50) = 1 − (0.30 + 0.10) (1 point)


= 0.60 (1 point)
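The profit table and the complement rule can be cross-checked in a few lines. The formula profit = 5 · A + 8 · B is an assumption: it is not stated in this solution but reproduces every entry of the table. Which of the two probabilities 0.30 and 0.10 belongs to which cell is also assumed; the result does not depend on that assignment.

```python
values = [2, 5, 8]

def profit(a, b):
    # Hypothetical profit formula inferred from the table entries.
    return 5 * a + 8 * b

# Profit for every (A, B) combination.
table = {(a, b): profit(a, b) for b in values for a in values}

# Cells where the daily profit stays below 50 Euro.
below_50 = sorted(cell for cell, value in table.items() if value < 50)

# Joint probabilities of those cells, taken from the solution (assignment assumed).
joint = {(2, 2): 0.30, (5, 2): 0.10}

# Complement rule: P(profit >= 50) = 1 - P(profit < 50).
p_at_least_50 = 1 - sum(joint[cell] for cell in below_50)

print(below_50, round(p_at_least_50, 2))
```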

Exercise 4 (21 pts = 3 + 4 + 14)

a. Let:
X=Tenure
Y=Delegation Score
$r_{XY} = \frac{s_{xy}}{s_x s_y} = \frac{157.024}{\sqrt{310.000} \cdot \sqrt{79.820}}$ (2 points)
1 point for correct covariance and 1 point for correct standard deviations

≈ 0.9982 (1 point)

b. $\hat{y} = b_0 + b_1 x$ with:
$b_1 = \frac{s_{xy}}{s_x^2} = \frac{157.024}{310.000} \approx 0.5065$ (1 point)
$b_0 = \bar{y} - b_1 \bar{x} \approx 16.4233 - 0.5065\ldots \cdot 31.00 = 0.7218$ (1 point)

Interpretation of the slope: if tenure increases by 1, the delegation score goes up by 0.5065 on
average. (2 points)
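The correlation coefficient and the regression coefficients of parts a and b follow directly from the summary statistics. In the sketch below, note that the intercept 0.7218 is obtained with the slope rounded to 0.5065; with the unrounded slope it comes out at roughly 0.7209.

```python
from math import sqrt

s_xy = 157.024               # sample covariance of tenure and delegation score
var_x, var_y = 310.000, 79.820
x_bar, y_bar = 31.00, 16.4233

# Part a: correlation coefficient r = s_xy / (s_x * s_y).
r = s_xy / (sqrt(var_x) * sqrt(var_y))

# Part b: least-squares slope and intercept.
b1 = s_xy / var_x
b0 = y_bar - b1 * x_bar                    # unrounded slope
b0_rounded = y_bar - round(b1, 4) * x_bar  # slope rounded to 0.5065

print(round(r, 4), round(b1, 4), round(b0, 4), round(b0_rounded, 4))
```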

c. 1. Hypotheses:
𝐻0 : 𝜌 = 0 (1 point)
𝑣. 𝑠.
𝐻1 : 𝜌 > 0 (1 point)
𝛼 = 1%

1 point per hypothesis; if 𝛼 is not mentioned, do not subtract points. If sample statistics are used in
the hypotheses, zero points for this step.

2. Test statistic and distribution:


$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t[df = n - 2]$ (2 points)

-1 point if distribution not mentioned.

3. Conditions:
(I) Random sample (1 point)
(II) Large n or bivariate normal distribution in the population (1 point)

4. Rejection region:
Reject $H_0 \iff t \ge t_{\alpha} = t_{1\%} \approx 2.467$ (use df = 28) (2 points)
If $t_{1\%}$ is mentioned but the value is wrong, 1 point can still be earned. If the sign points to the
wrong side, or a two-sided area is used, zero points.

5. Outcome and decision:


$t = \frac{0.450\sqrt{28}}{\sqrt{1 - 0.450^2}} \approx 2.666$ (2 points)

2.666 ≥ 2.467 ⇒ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 (𝑠𝑢𝑝𝑝𝑜𝑟𝑡 𝑓𝑜𝑟 𝐻1 ) (1 point)


Reasoning has to be mentioned to earn the point

6. Conclusion:
Given the sample and a significance level of 1%, there is sufficient evidence to infer
that there is positive correlation between delegation score and age. (3 points)
1 point for given sample and sign. level, 1 point for sufficient evidence and 1 point for
describing H1 in words (answer research question)

Remark: If a completely incorrect test statistic is given in step 2, points can only be earned
for step 1.
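Steps 2, 4 and 5 of the correlation test can be sketched as follows. The sample size n = 30 is inferred from df = n − 2 = 28, and the critical value 2.467 is taken from the t-table as in the solution, since the standard library offers no t-distribution.

```python
from math import sqrt

r = 0.450        # sample correlation used in step 5
n = 30           # inferred from df = n - 2 = 28
t_crit = 2.467   # table value t_{1%} with df = 28

# Test statistic for H0: rho = 0.
t_obs = r * sqrt(n - 2) / sqrt(1 - r**2)

# Right-tailed test: reject H0 when t_obs is at least the critical value.
reject_h0 = t_obs >= t_crit

print(round(t_obs, 3), reject_h0)
```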
Exercise 5 (26 pts = 2 + 5 + 15 + 4)

a. It tells us that there is more variability in the sample mean after the implementation of the
training program than before the training program. (2 points)
1 point for mentioning variability and 1 point for mentioning that ‘after is higher’
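b. $t = \frac{\bar{x}_{before} - 42000}{S.E.(\bar{x}_{before})}$ (1 point)
$= \frac{41500 - 42000}{278.54301}$ (1 point)
$\approx -1.795$ (1 point)
Use df = 30 − 1 = 29:
$t_{5\%} \approx 1.699 < 1.795 < t_{2.5\%} \approx 2.045$ (1 point)

$\Rightarrow 2.5\% < P(t > 1.795) < 5\%$

$\Rightarrow 2.5\% < p\text{-value} < 5\%$ (1 point)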
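The bracketing of the p-value in part b can be mirrored in code. The sketch uses table values for df = 29 (1.699 for a 5% tail and 2.045 for a 2.5% tail) rather than computing the p-value exactly, because the standard library has no t-distribution.

```python
x_bar_before = 41500
mu_0 = 42000
se_mean = 278.54301  # standard error of the mean from the SPSS output

# One-sample t statistic.
t_obs = (x_bar_before - mu_0) / se_mean

# Table values for df = 29.
t_05, t_025 = 1.699, 2.045

# If |t| lies between the two critical values, the one-sided p-value
# lies between 2.5% and 5%.
p_between_2_5_and_5 = t_05 < abs(t_obs) < t_025

print(round(t_obs, 3), p_between_2_5_and_5)
```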

c. This is a matched pairs test since the same sales representatives are studied before and after
the implementation of the program. (1 point)
Let 𝐷 = 𝐵𝑒𝑓𝑜𝑟𝑒 − 𝐴𝑓𝑡𝑒𝑟
1. Hypotheses:
𝐻0 : μ𝐷 = 0 (1 point)

𝑣. 𝑠.
𝐻1 : 𝜇𝐷 < 0 (1 point)
𝛼 = 5%
1 point per hypothesis; if 𝛼 is not mentioned, do not subtract points. If sample statistics
are used in the hypotheses, zero points for this step.

2. Test statistic and distribution:


$t = \frac{\bar{x}_D - \mu_0}{s_D/\sqrt{n_D}} = \frac{\bar{x}_D}{s_D/\sqrt{30}} \sim t[df = n_D - 1 = 29]$ (2 points)

-1 point if distribution not mentioned, and also -1 point if $\bar{X}_D$ filled in. If ‘1’

3. Conditions:
(I) Random sample (1 point)
(II) 𝑛𝐷 = 30 ≥ 30, so normality not needed (1 point)

4. Rejection region:
Reject 𝐻0 ⇔ 𝑡 ≤ −𝑡𝛼 = −𝑡5% ≈ −1.699 (use 𝑑𝑓 = 29) (2 points)
If −𝑡5% is mentioned but the value is wrong, 1 point can still be earned. If the sign points to the wrong
side, or a two-sided area is used, zero points.

5. Outcome and decision:


t = −2.290 (SPSS output) (2 points)
Value can also be computed ‘manually’
−2.290 ≤ −1.699 ⇒ 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 (𝑠𝑢𝑝𝑝𝑜𝑟𝑡 𝑓𝑜𝑟 𝐻1 ) (1 point)
Reasoning has to be mentioned to earn the point
6. Conclusion:
Given the sample and a significance level of 5%, there is sufficient evidence to infer that
the mean sales after following the training program are greater than the mean sales
before following the training program. (3 points)
1 point for given sample and sign. level, 1 point for sufficient evidence and 1 point for
describing H1 in words (answer research question)

Remark: If a completely incorrect test statistic is given in step 2, points can only be earned
for step 1.
d. Joe might have made the error of not rejecting the null hypothesis that the mean sales after
the implementation of the training program are equal to 45,000 Euro, whereas in reality the
mean sales after the implementation of the training program are higher than 45,000 Euro. (2 points)

This is a type II error and has probability 𝛽, which depends on 𝛼, 𝑛, 𝜎 and the actual value of
𝜇𝐴𝑓𝑡𝑒𝑟 . (2 points)
1 point per mentioned factor

Exercise 6 (14 pts = 7 + 3 + 4)


a. Hypotheses:
𝐻0 : Job type and education level are independent (1 point)
𝑣. 𝑠.
𝐻1 : Job type and education level are dependent (1 point)
𝛼 = 5%
Observed value test statistic:
$\chi^2 = \sum_{j=1}^{3 \times 2} \frac{(O_j - E_j)^2}{E_j} = \frac{(6 - 7.5)^2}{7.5} + \cdots + \frac{(3 - 5)^2}{5} \approx 2.267$ (3 points)
-1 point per error in computation
Statistical decision:
Since $2.267 < 5.991 = \chi^2_{5\%}$ (df = 2) ⇒ Do not reject 𝐻0 / no support for 𝐻1 (2 points)
1 point for showing the correct decision rule and 1 point for the actual decision

b. Since Job type is a nominal variable, at least one of the variables is nominal (education level
is ordinal) and we need to use Cramér's V. (3 points)
1 point for ‘Job type nominal’, 1 point for at least one variable nominal and 1 point for
Cramér's V

c. Use 𝛼 = 10% (1 point)


𝑝 = 0.138 > 0.10 ⇒ 𝐷𝑜 𝑛𝑜𝑡 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0 (2 points)
1 point for correct p-value and 1 point for correct decision (including rejection)
There is insufficient statistical evidence to conclude that Salary is not normally distributed.
(1 point)
