
Final Exam Quantitative Data Analysis 1 2022-2023

Before you start:


- Write your name under ’Name student’ on the answer sheet (page 1).
- Fill in your student number under ’Student ID’. Mark the corresponding number
for each of the columns.

While taking the exam:


- Write down your complete solutions to the problems in the answer boxes. A final answer
without a solution is worth zero points.
- Each (sub)question will have its own answer box. Make sure to answer the questions
in the correct answer box.
- Do not write or draw outside the answer boxes.
- Do not remove the staple from your exam! All sheets must remain attached.
- If you cannot complete your answer inside the answer box, continue your answer in the
’extra answer space’ box at the end of the exam (question 7). Make sure to refer to the
problem number.
- If you need an answer from a previous question and you did not manage to solve the
previous question, use a reasonable value.
- Use a '.' as the decimal separator. If you need to round, round your final answer to the
number of decimals specified in the question.
- There is a formula sheet and a table sheet available for the exam.
- There are 6 problems (20 subquestions) and you can earn 100 points. Your grade is
the number of points divided by 10.
- The Final exam has a weight of 70% on the final grade for the course Quantitative
Data Analysis 1.
- During the test a simple calculator is allowed; a graphical or financial calculator is not
allowed.
- You are allowed to bring a dictionary to the exam. However, the dictionary must not
contain any notes.
- Scrap paper will not be graded; you must hand it in at the end of the exam.
- Write with a blue or black pen.

Exercise 1 (10 pts = 5 + 2 + 3)

Suppose you have the following data on the annual income (in 1,000 €) of 100 employees of a
company:

Income (in 1,000 €)   Number of employees
20 - 30               15
30 - 40               25
40 - 50               30
50 - 60               20
60 - 70               10

Questions:

a. Find the mean, median, and mode of the income data in Euro, using the class
midpoints. (2 decimals)
b. Based on your results in part a., is the distribution skewed to the left or to the right?
c. Suppose you randomly select two employees out of the 100 without replacement.
What is the probability that both have an income of more than 40,000 Euro?
(4 decimals)
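The midpoint calculations in parts a. and c. can be checked with a short Python sketch. It assumes, as part a. instructs, that every employee's income equals the midpoint of their class, so the median and mode here are midpoint-based rather than interpolated within the median class. For part c. it assumes that all employees in the classes 40 - 50, 50 - 60 and 60 - 70 count as earning more than 40,000 Euro.

```python
from statistics import mean, median, mode

# Class midpoints (in 1,000 Euro) and frequencies from the Exercise 1 table.
midpoints = [25, 35, 45, 55, 65]
counts = [15, 25, 30, 20, 10]

# Assumption: each employee's income is represented by its class midpoint (in Euro).
incomes = [m * 1000 for m, c in zip(midpoints, counts) for _ in range(c)]

print(f"mean   = {mean(incomes):.2f}")    # mean   = 43500.00
print(f"median = {median(incomes):.2f}")  # median = 45000.00
print(f"mode   = {mode(incomes):.2f}")    # mode   = 45000.00

# Part c: two draws without replacement from the 100 employees.
# Assumption: the classes with midpoint above 40 (40-50, 50-60, 60-70) count as "more than 40,000".
above = sum(c for m, c in zip(midpoints, counts) if m > 40)
n = sum(counts)
p_both = (above / n) * ((above - 1) / (n - 1))
print(f"P(both > 40,000) = {p_both:.4f}")  # P(both > 40,000) = 0.3576
```

With these midpoint-based values the mean lies below both the median and the mode, which is the pattern usually associated with a left-skewed distribution.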

Exercise 2 (18 pts = 4 + 3 + 3 + 8)

A company that produces electronic components for cars wants to ensure that the weight of the
boxes with components they send to the car manufacturers is within a certain range. A box contains
50 components; this can be seen as a random sample of size 50 drawn from the population. Let 𝐵𝑂𝑋
be the weight of a randomly selected box containing 50 components.
From the past it is known that the weight of one component is normally distributed with a
(population) mean of 17 grams and a (population) standard deviation of 0.5 grams.
Questions:

a. Determine the probability that the total weight of the sample will be higher than 860
grams (𝑃(𝐵𝑂𝑋 > 860)). (4 decimals)
b. Determine the 95th percentile of the weight of the boxes (that is find ℎ in
𝑃(𝐵𝑂𝑋 < ℎ) = 0.95). (2 decimals)
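A sketch of parts a. and b. using the standard-library `statistics.NormalDist` class. It assumes the 50 component weights in a box are independent, so that BOX, their sum, is normally distributed with mean 50 × 17 = 850 grams and standard deviation 0.5 × √50 ≈ 3.5355 grams. Exact normal probabilities and quantiles are used instead of a z-table, so intermediate values may differ slightly from a table-based calculation.

```python
from math import sqrt
from statistics import NormalDist

n, mu, sigma = 50, 17.0, 0.5  # components per box, mean and SD of one component (grams)

# Assumption: independent weights, so the box total is N(n * mu, sigma * sqrt(n)).
box = NormalDist(mu=n * mu, sigma=sigma * sqrt(n))

p_above_860 = 1 - box.cdf(860)  # part a: P(BOX > 860)
h = box.inv_cdf(0.95)           # part b: h such that P(BOX < h) = 0.95

print(f"P(BOX > 860) = {p_above_860:.4f}")  # P(BOX > 860) = 0.0023
print(f"h = {h:.2f}")                       # h = 855.82
```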

A random box has been selected and the weight of each of its 50 components has been measured; again,
this can be seen as a random sample of 50 components. The mean weight of the components in the
box turns out to be 17.2 grams and the standard deviation of the weights is 0.4 grams.

Questions:

c. One component in the selected box weighs 18.5 grams. Determine the z-score of this
observed weight within the sample and interpret this z-score. (2 decimals)
d. Construct a 95% confidence interval for the mean weight of a component, based on
the weights of the components in the box/sample, and carefully draw a conclusion
about the conjecture that the population mean differs from 17 grams. Explicitly write
down the interval estimator (i.e. the formula), the resulting interval, the conclusion
with explanation, and the significance level of the test.
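The z-score in part c. and the interval in part d. can be sketched as follows. For part d. this sketch assumes the population standard deviation is treated as unknown and estimated by the sample standard deviation s = 0.4, which gives a t-interval with n - 1 = 49 degrees of freedom. The critical value t ≈ 2.010 is a t-table value hard-coded as an assumption, not computed by the code.

```python
from math import sqrt

n, xbar, s = 50, 17.2, 0.4  # sample size, sample mean and sample SD (grams)

# Part c: z-score of one observation relative to the sample mean and sample SD.
z = (18.5 - xbar) / s
print(f"z = {z:.2f}")  # z = 3.25

# Part d: t-interval xbar +/- t_{0.025, n-1} * s / sqrt(n).
t_crit = 2.010  # assumption: t_{0.025, 49} read from a t-table
margin = t_crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% CI: ({lower:.4f}, {upper:.4f})")  # 95% CI: (17.0863, 17.3137)

# Two-sided test of H0: mu = 17 at alpha = 5%: reject if 17 lies outside the interval.
reject_h0 = not (lower <= 17 <= upper)
print(f"Reject H0 (mu = 17): {reject_h0}")  # Reject H0 (mu = 17): True
```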

Exercise 3 (11 pts = 3 + 4 + 4)

Suppose a company sells two types of products, A and B, and the daily demand for these
products follows the bivariate distribution shown in the table below:

B \ A   2      5      8
2       0.30   0.10   0.05
5       0.10   0.15   0.05
8       0.05   0.10   0.10

Questions:

a. Determine the probability that the demand for product A is 8 or the demand for
product B is 2. (2 decimals)
b. Determine the expected value of the demand for product A given that the demand
for product B is 5. (2 decimals)
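A sketch of parts a. and b. that stores the joint distribution as a Python dictionary keyed by (a, b) = (demand for A, demand for B), reading columns of the table as A and rows as B. Summing each qualifying cell exactly once in part a. is equivalent to the addition rule P(A = 8) + P(B = 2) - P(A = 8, B = 2).

```python
# Joint pmf P(A = a, B = b) from the Exercise 3 table, keyed by (a, b).
p = {
    (2, 2): 0.30, (5, 2): 0.10, (8, 2): 0.05,
    (2, 5): 0.10, (5, 5): 0.15, (8, 5): 0.05,
    (2, 8): 0.05, (5, 8): 0.10, (8, 8): 0.10,
}

# Part a: P(A = 8 or B = 2), counting every qualifying cell once.
p_a8_or_b2 = sum(prob for (a, b), prob in p.items() if a == 8 or b == 2)

# Part b: E[A | B = 5] = sum over a of a * P(A = a, B = 5) / P(B = 5).
p_b5 = sum(prob for (a, b), prob in p.items() if b == 5)
e_a_given_b5 = sum(a * prob for (a, b), prob in p.items() if b == 5) / p_b5

print(f"P(A = 8 or B = 2) = {p_a8_or_b2:.2f}")  # P(A = 8 or B = 2) = 0.60
print(f"E[A | B = 5] = {e_a_given_b5:.2f}")    # E[A | B = 5] = 4.50
```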

The company makes a profit of 5 Euro for each unit of product A sold and a profit of 8 Euro for each
unit of product B sold.

Question:

c. Determine the probability that the daily profit from selling these products is at least
50 Euro. (4 decimals)
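Part c. can be checked by enumerating every cell of the table and keeping those where the daily profit 5A + 8B is at least 50 Euro. The sketch assumes that the number of units sold equals the daily demand; the helper name `daily_profit` is hypothetical.

```python
# Joint pmf P(A = a, B = b) from the Exercise 3 table, keyed by (a, b).
p = {
    (2, 2): 0.30, (5, 2): 0.10, (8, 2): 0.05,
    (2, 5): 0.10, (5, 5): 0.15, (8, 5): 0.05,
    (2, 8): 0.05, (5, 8): 0.10, (8, 8): 0.10,
}


def daily_profit(a, b):
    """Profit in Euro, assuming every unit demanded is sold (5 Euro per A, 8 Euro per B)."""
    return 5 * a + 8 * b


p_at_least_50 = sum(prob for (a, b), prob in p.items() if daily_profit(a, b) >= 50)
print(f"P(profit >= 50) = {p_at_least_50:.4f}")  # P(profit >= 50) = 0.6000
```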

Exercise 4 (21 pts = 3 + 4 + 14)

A company is interested in studying the delegation habits of its employees. The company believes
that employee delegation can be influenced by employee tenure and age. To explore these
relationships, the company has collected data on employee delegation behaviors.

They gathered information on 30 employees, including their tenure (in years), age (in years) and
delegation score (measured on a scale of 1-10, with higher scores indicating more delegation). First,
they want to determine whether there is a significant linear relationship between employee tenure
and delegation score. Below you can find (partial) SPSS output that can be used to answer this
question.

Correlations
                                                     Tenure     Delegation score
Tenure            Pearson Correlation                1          ***
                  Sig. (2-tailed)                    ***        ***
                  Sum of Squares and Cross-products  8990.000   4553.700
                  Covariance                         ***        157.024
                  N                                  30         30
Delegation score  Pearson Correlation                ***        ***
                  Sig. (2-tailed)                    ***
                  Sum of Squares and Cross-products  4553.700   2314.774
                  Covariance                         ***        ***
                  N                                  30         30

Questions:
a. Compute and interpret the correlation coefficient between employee tenure and delegation
score. (3 decimals)
b. Determine the sample linear regression equation for the relationship between employee
tenure and delegation score. Also interpret the slope of the regression equation. (4
decimals)
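The sums of squares and cross-products in the SPSS output are enough for the correlation and the slope, as in this sketch with x = tenure and y = delegation score. The intercept b0 = ȳ - b1·x̄ also needs the sample means of tenure and delegation score, which are not shown in this partial output, so only the slope is computed. As a consistency check, SS_xy / (n - 1) reproduces the covariance of 157.024 that SPSS reports.

```python
from math import sqrt

n = 30
ss_xx = 8990.000   # sum of squares of tenure (x)
ss_yy = 2314.774   # sum of squares of delegation score (y)
ss_xy = 4553.700   # sum of cross-products of tenure and delegation score

r = ss_xy / sqrt(ss_xx * ss_yy)   # Pearson correlation coefficient
b1 = ss_xy / ss_xx                # least-squares slope of y on x
cov = ss_xy / (n - 1)             # should match the SPSS covariance, 157.024

print(f"r = {r:.3f}")      # r = 0.998
print(f"b1 = {b1:.4f}")    # b1 = 0.5065
print(f"cov = {cov:.3f}")  # cov = 157.024
```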

Now they want to study the relationship between Age and Delegation. In the (partial) SPSS output
below, you can find information that can be used to study this relationship:

Correlations
                                       Age        Delegation score
Age               Pearson Correlation  1          .450**
                  Sig. (1-tailed)                 ***
                  N                    30         30
Delegation score  Pearson Correlation  .450**     1
                  Sig. (1-tailed)      ***
                  N                    30         30

Question:
c. Test whether there is a positive correlation between delegation score and age using a
hypothesis test with 𝛼 = 1%. Explicitly write down: the hypotheses, the test statistic and its
distribution, the conditions, the rejection region, the sample outcome and decision, and the
conclusion.
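The usual test statistic for H0: ρ = 0 against H1: ρ > 0 is t = r√(n - 2) / √(1 - r²), which follows a t-distribution with n - 2 = 28 degrees of freedom under H0 when the conditions of this test (such as bivariate normality) hold. In this sketch the one-sided critical value t(0.01, 28) ≈ 2.467 is taken from a t-table and hard-coded as an assumption.

```python
from math import sqrt

r, n = 0.450, 30  # sample correlation (Age, Delegation score) and sample size

# t statistic for H0: rho = 0 vs H1: rho > 0, t-distributed with n - 2 df under H0.
t_obs = r * sqrt(n - 2) / sqrt(1 - r ** 2)

t_crit = 2.467  # assumption: right-tail t_{0.01, 28} from a t-table

print(f"t = {t_obs:.3f}, critical value = {t_crit}")  # t = 2.666, critical value = 2.467
print(f"Reject H0: {t_obs > t_crit}")                 # Reject H0: True
```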

Exercise 5 (26 pts = 2 + 5 + 15 + 4)
A company has implemented a new training program for its sales representatives and wants to
know whether the program has led to an improvement in sales performance. To test this, the company
has collected data from a random sample of sales representatives at two points in
time: once before and once after the training program was implemented.
The company wants to know whether there is a statistically significant difference in the average sales
performance of the sales representatives before and after the training program. Below you can find
SPSS output that can be used for this analysis. Sales performance is measured in Euro of sales
per month per sales representative.

Questions:
a. Explain what the difference between the two standard errors of the mean for ‘Before’ and
‘After’ tells us about the sample means of ‘Before’ and ‘After’.
b. Approximate the p-value (give a lower and an upper bound) for the hypothesis test to draw
a conclusion about the conjecture that the mean sales before the implementation of the
training program (𝜇𝐵𝑒𝑓𝑜𝑟𝑒 ) is below 42,000 Euro per month. Also draw a conclusion about
this conjecture when 𝛼 = 1%.
c. Perform a hypothesis test with 𝛼 = 2.5%, to draw a conclusion about the conjecture that
the mean sales after the training program (𝜇𝐴𝑓𝑡𝑒𝑟 ) are greater than the mean sales before
the training program (𝜇𝐵𝑒𝑓𝑜𝑟𝑒 ). Explicitly write down: the hypotheses, the test statistic and
its distribution, the conditions, the rejection region, the sample outcome and decision and
the conclusion.
d. Joe tested the hypothesis that the mean sales after the implementation of the training
program (𝜇𝐴𝑓𝑡𝑒𝑟 ) are higher than 45,000 Euro per month. He concluded that there is not
sufficient evidence to infer that the monthly sales are above 45,000 Euro after the
implementation of the training program. Describe the type of error he might have made and
name two factors that the probability of making this error depends on.
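The SPSS output for this exercise is not reproduced in this text, so this sketch only shows the t statistic behind parts b. and c. For part b. it would be applied to the ‘Before’ sample with μ0 = 42,000; for part c. it is applied to the paired differences d = After - Before with μ0 = 0, on the assumption that the same representatives were measured twice, as the exercise describes. The function names `t_statistic` and `paired_t` are hypothetical helpers, not SPSS procedures.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample_mean, sample_sd, n, mu0=0.0):
    """One-sample t statistic (x-bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


def paired_t(before, after):
    """Paired t statistic for H0: mu_d = 0, where d = after - before per representative."""
    diffs = [a - b for b, a in zip(before, after)]
    t = t_statistic(mean(diffs), stdev(diffs), len(diffs))
    return t, len(diffs) - 1  # test statistic and degrees of freedom
```

With summary statistics from SPSS, part b. would correspond to a call such as `t_statistic(mean_before, sd_before, n, mu0=42000)` (hypothetical variable names), and the resulting t is compared with a t-distribution with n - 1 degrees of freedom.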

Exercise 6 (14 pts = 7 + 3 + 4)
You have been hired as a data analyst by a company that wants to explore the relationships
between job type, education level, and salary. The company has provided you with a dataset of
40 observations on these three variables. You entered the observed values into SPSS and produced
the following output:

Tests of Normality
          Kolmogorov-Smirnov(a)       Shapiro-Wilk
          Statistic   df   Sig.       Statistic   df   Sig.
Salary    .149        40   .200*      .927        40   .138
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Questions:

a. Perform a categorical association test between job type and education level, using
𝛼 = 5%. State the null and alternative hypotheses, compute the observed value of the test
statistic, and write down your statistical decision.
b. Give the measure that is best suited to describe the categorical association between
job type and education level. Motivate your answer.
c. Can we, using the appropriate statistical test, conclude that the variable salary is
normally distributed? Explicitly write down the significance level you choose and use
the p-value to draw a conclusion.
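The crosstabulation of job type by education level is not reproduced in this text, so this sketch takes a table of observed counts as input. It assumes the categorical association test in part a. is Pearson's chi-square test of independence, with expected counts E_ij = (row total × column total) / n and df = (r - 1)(c - 1), and it reports Cramér's V as one commonly used association measure for part b. The function name `chi_square_independence` is a hypothetical helper.

```python
from math import sqrt


def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom and Cramer's V for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E_ij under independence
            chi2 += (observed - expected) ** 2 / expected

    r, c = len(table), len(table[0])
    df = (r - 1) * (c - 1)
    cramers_v = sqrt(chi2 / (n * (min(r, c) - 1)))
    return chi2, df, cramers_v
```

The statistic is then compared with the critical value from a chi-square table with the returned degrees of freedom at 𝛼 = 5%; the chi-square approximation is usually considered reliable when the expected counts are at least 5.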
