a. What are the two broad branches of statistics? Clearly explain the purpose of each branch.
Descriptive statistics convert data into information using numerical and graphical methods.
Inferential statistics use sample data to test claims made about the population
parameters (which include μ, σ², P, and ρ).
Answer: (μ, σ², σ, P, σxy, and ρ) are population parameters, which are unknown. Each
population parameter takes only one (true) value.
Answer: (x̄, S², S, p̂, Sxy, and rxy) are sample statistics. Sample statistics are the point
estimates of the population parameters.
e. What is the reason that a sample statistic is subject to an error? Explain clearly.
Problem 2 (Unit 1). Consider the GDP (gross domestic product) of GCC countries in 2022:
a. Construct the frequency distribution table for these data. Make sure that the table includes the
frequency, the relative frequency, and the slice of the pie. Show your work.
______________________________________________________________________________
Country    Frequency    Relative frequency (%)    Slice of the pie (°)
d. What information is conveyed by each chart? Clearly explain.
The bar chart (on the left) provides information about the GDP of each country, measured
in billions of $. With $1,733 billion, Saudi Arabia had the highest GDP in 2022. With $75
billion, Bahrain had the smallest GDP.
The pie chart (on the right) provides information about each country's share of the total. Again, with 58.9%
(or with a 212.04° slice of the pie), Saudi Arabia had the highest GDP in 2022. With 2.6%
(or with a 9.36° slice of the pie), Bahrain had the smallest GDP in 2022.
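The slice of the pie is the relative frequency applied to the 360° of a full circle. A minimal Python sketch of that conversion, using only the two percentages quoted above (the function name is illustrative, not part of the original solution):

```python
def pie_slice_degrees(relative_frequency_percent):
    """Convert a relative frequency (in %) to a pie-chart angle in degrees."""
    return relative_frequency_percent / 100 * 360

# Percentages quoted in the answer above
saudi_slice = pie_slice_degrees(58.9)    # about 212.04 degrees
bahrain_slice = pie_slice_degrees(2.6)   # about 9.36 degrees

print(round(saudi_slice, 2), round(bahrain_slice, 2))
```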
Problem 3 (Unit 1). Supporters claim that new windmills can generate an average of 800 kilowatts
of power per day. Based on a random sample of 100 windmills, the average power per day is found
to be 776 kilowatts. Now, answer the following questions.
a. What kind of descriptive measure are we dealing with here (mean, variance, or proportion)?
Answer: Mean
Answer: “800 kilowatts” represents the claimed value of the population mean (μ)
Answer: “x̄ = 776 kilowatts” represents the point estimate of the population mean (μ)
i. Test the validity of the claim (made by the supporters), assuming that the standard error of the
sample mean is 10 kilowatts. Do you reject (or fail to reject) the claim? Show your calculations.
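Answer: Note that x̄ = 776 is the point estimate of µ, and Se(x̄) = 10 is the standard error of x̄. Here,
we need to find the 95% confidence interval estimate of µ as follows:
Point estimate (x̄) - 2 Se(x̄) < µ < point estimate (x̄) + 2 Se(x̄)
776 - 2 × 10 < µ < 776 + 2 × 10 or 756 < µ < 796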
Interpretation: We are 95% confident that this interval estimate contains the population
mean (µ). However, there is a 5% chance that this conclusion is incorrect.
Conclusion: Since the claimed value of 800 kilowatts is not in this interval estimate, we
reject the claim (based on the sample information).
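The test in this answer can be sketched in Python, assuming the approximate 95% rule of "point estimate plus or minus 2 standard errors" that the problem set also uses in Problem 6. The function and variable names are illustrative only:

```python
def two_se_interval(point_estimate, standard_error):
    """Approximate 95% interval: point estimate +/- 2 standard errors."""
    margin = 2 * standard_error
    return point_estimate - margin, point_estimate + margin

# Values from Problem 3: sample mean 776, standard error 10, claimed mean 800
lower, upper = two_se_interval(776, 10)
claimed_mean = 800

# Reject the claim when the claimed value falls outside the interval
reject_claim = not (lower < claimed_mean < upper)

print(lower, upper, reject_claim)
```

With these inputs the interval is 756 to 796, which excludes 800, so the claim is rejected.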
Problem 4 (Unit 2). Let the variable X be the number of hours that AUS students study per weekend.
For a random sample of eight students, X = {26, 14, 6, 22, 10, 8, 20, 18}.
Answer: A random sampling procedure gives every student an equal chance of being selected. This
makes a random sample representative of the population.
i        xi
1        26
2        14
3         6
4        22
5        10
6         8
7        20
n = 8    18
∑xi = 124
The mean value is x̄ = ∑xi / n = 124 / 8 = 15.5 hours.
Since n = 8 is even, the median is the average of the two middle data points of the sorted
data (14 and 18). This means that the median is 16 hours.
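As a quick check of the mean and median, here is a short Python sketch using the standard-library `statistics` module on the eight sample values from the problem:

```python
import statistics

# Hours studied per weekend for the eight sampled students (Problem 4)
hours = [26, 14, 6, 22, 10, 8, 20, 18]

mean_hours = statistics.mean(hours)      # sum of values divided by n
median_hours = statistics.median(hours)  # average of the two middle sorted values

print(sum(hours), mean_hours, median_hours)
```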
c. Is the distribution of the data skewed or approximately symmetric in this example? Explain.
Answer: The median (16 hours) is similar to the mean value (15.5 hours). Accordingly, we
conclude that the distribution is approximately symmetric.
d. Is the mean or median an appropriate measure of central tendency in this example? Explain.
Answer: Since the distribution is approximately symmetric (not skewed), the mean is a better
measure of central tendency in this example. Note that in calculating the mean value, we make
use of every value that X takes.
e. Find and interpret the 25th percentile (Q1), 50th percentile (Q2), and 75th percentile (Q3). Show
your work.
For the 25th percentile (Q1), p = 25, and L25 = (n + 1)(p/100) = (8 + 1)(25/100) = 2.25th position.
Because L25 is not an integer, the 25th percentile is Q1 = 8.5 [= 8 + .25 (10 – 8)].
Thus, the 25th percentile is Q1 = 8.5 hours. This means that 25% of the X-values are below 8.5 hours.
For the 50th percentile (Q2), p = 50, and L50 = (8 + 1)(50/100) = 4.5th position.
Because L50 is not an integer, the 50th percentile is Q2 = 16 [= 14 + .50 (18 – 14)].
Thus, the 50th percentile (median) is Q2 = 16 hours. This means that 50% of the X-values are below 16 hours.
For the 75th percentile (Q3), p = 75, and L75 = (8 + 1)(75/100) = 6.75th position.
Because L75 is not an integer, the 75th percentile is Q3 = 21.5 [= 20 + .75 (22 – 20)].
Thus, the 75th percentile is Q3 = 21.5 hours. This means that 75% of the X-values are below 21.5 hours.
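The percentile calculations can be sketched in Python. This assumes the location rule Lp = (n + 1)(p/100) with linear interpolation between neighbouring sorted values, which matches the arithmetic shown in the brackets above. The helper name is illustrative:

```python
def percentile(values, p):
    """p-th percentile using the position (n + 1) * p / 100 on sorted data."""
    xs = sorted(values)
    n = len(xs)
    position = (n + 1) * p / 100      # 1-based location, may be fractional
    k = int(position)                 # whole part of the location
    fraction = position - k           # fractional part of the location
    if k < 1:                         # location falls before the first value
        return xs[0]
    if k >= n:                        # location falls at or after the last value
        return xs[-1]
    # Interpolate between the k-th and (k + 1)-th sorted values
    return xs[k - 1] + fraction * (xs[k] - xs[k - 1])

hours = [26, 14, 6, 22, 10, 8, 20, 18]
q1, q2, q3 = (percentile(hours, p) for p in (25, 50, 75))

print(q1, q2, q3)
```

Other software may use a different interpolation convention by default, so quartiles from other tools can differ slightly from these values.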
Answer: Since the distribution is approximately symmetric (not skewed), the range is a better
measure of variation in this example. The range gives a complete picture of variation for the
entire data.
X = 6, 8, 10, 14, 18, 20, 22, 26
Problem 5 (Unit 2). The average rental payment (denoted by X) for a typical one-bedroom
apartment in a large city is estimated to be $700 (that is, x̄ = $700) with a standard deviation S =
$100. In answering the following questions, use Chebyshev’s theorem and the empirical rule
(where appropriate).
a. What fraction of rental payments is between $500 and $900? Note that no information is
given about the shape of the distribution. Show your work.
Answer: Note that the z-value for 500 is (500 - 700)/100 = -2, and the z-value for 900 is
(900 - 700)/100 = 2.
With no information about the shape of the distribution, we use Chebyshev’s theorem.
According to Chebyshev’s theorem, at least 1 - 1/k² = 1 - 1/2² = 75% of rental payments must lie within
k = 2 standard deviations of the mean. In other words, at least 75% of rental payments are
between $500 and $900.
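Chebyshev's bound, 1 - 1/k² for k > 1, can be evaluated directly. A minimal Python sketch with the values from this problem (the function name is illustrative):

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k**2

mean_rent, sd_rent = 700, 100
k = (900 - mean_rent) / sd_rent          # number of standard deviations: 2.0
min_fraction = chebyshev_lower_bound(k)  # at least this fraction lies in [500, 900]

print(k, min_fraction)
```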
b. How does your answer to part (a) change if the distribution is bell-shaped? Show your work
by drawing a bell-shaped distribution that includes your answer.
Answer: Again, note that the z-value for 500 is (500 - 700)/100 = -2, and the z-value for 900
is (900 - 700)/100 = 2.
Since the distribution is bell-shaped, then according to the Empirical Rule, approximately
95% of rental payments will lie within 2 standard deviations of the mean. In other words,
approximately 95% of rental payments are between $500 and $900.
c. What fraction of rental payments is between $600 and $800 if the distribution is bell-shaped?
Show your work by drawing a bell-shaped distribution that includes your answer.
Answer: Note that the z-value for 600 is (600 - 700)/100 = -1, and the z-value for 800 is
(800 - 700)/100 = 1.
Since the distribution is bell-shaped, then according to the Empirical Rule, approximately
68% of rental payments will lie within 1 standard deviation of the mean. In other words,
approximately 68% of rental payments are between $600 and $800.
d. What fraction of rental payments is above $600 if the distribution is bell-shaped? Show your
work by drawing a bell-shaped distribution that includes your answer.
Answer: Approximately 68% of rental payments are between $600 and $800, so the remaining
32% is split equally between the two tails. This means that approximately 16% of rental
payments are below $600, and approximately 84% (= 68% + 16%) of rental payments are above $600.
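The Empirical Rule figures in parts (b) to (d) are rounded values for a bell-shaped distribution. As a rough cross-check, the sketch below assumes the rents are exactly normal with mean 700 and standard deviation 100 (an assumption made here for illustration; the problem only says "bell-shaped") and uses `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

# Assumption for illustration: rents follow an exact normal distribution
rent = NormalDist(mu=700, sigma=100)

within_2_sd = rent.cdf(900) - rent.cdf(500)   # part (b): between $500 and $900
within_1_sd = rent.cdf(800) - rent.cdf(600)   # part (c): between $600 and $800
above_600 = 1 - rent.cdf(600)                 # part (d): above $600

print(round(within_2_sd, 4), round(within_1_sd, 4), round(above_600, 4))
```

The results are close to the 95%, 68%, and 84% given by the Empirical Rule.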
Problem 6 (Unit 3). Studies indicate that there is a strong association between a country’s economic
growth rate (X) and unemployment rate (Y). For a random sample of 50 countries in 2022, rxy = -.85
with a standard error of .04. A critic claims that the true correlation coefficient between Y and X is
-.90. Now answer the following questions.
a. What kind of descriptive measure are we dealing with here (mean, variance, proportion, or
correlation coefficient)?
Answer: “-.90” is the claimed value of the population correlation coefficient (ρ)
Answer: “-.85” is the point estimate of the population correlation coefficient (ρ)
f. What does rxy = -.85 reveal about the relationship between X and Y? Clearly explain.
Answer: “-.85” is negative and close to -1, indicating that there is a strong negative linear
association between a country’s economic growth rate and unemployment rate.
g. Use the sample estimate to test the validity of the claim. Do you reject (or fail to reject) the
claim? Show your calculations.
Answer: Note that rxy = -.85 is the point estimate of ρ, and Se(rxy) = .04 is the standard error of rxy.
Point estimate (rxy) - 2 Se(rxy) < ρ < point estimate (rxy) + 2 Se(rxy)
-.85 - 2 × .04 < ρ < -.85 + 2 × .04 or -.93 < ρ < -.77
Interpretation: We are 95% confident that this interval estimate contains the population
correlation coefficient (ρ). However, there is a 5% chance that this conclusion is incorrect.
Conclusion: Since the claimed value of -.90 is in this interval estimate, we fail to reject the
claim (based on the sample information).
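The same "point estimate plus or minus 2 standard errors" check can be sketched in Python for the correlation coefficient. The variable names are illustrative, and results are rounded for printing because decimal values such as .04 are not stored exactly in floating point:

```python
# Values from Problem 6
r_xy = -0.85          # sample correlation coefficient (point estimate of rho)
se_r = 0.04           # standard error of r_xy
claimed_rho = -0.90   # value claimed by the critic

# Approximate 95% interval: point estimate +/- 2 standard errors
rho_lower = r_xy - 2 * se_r
rho_upper = r_xy + 2 * se_r

# Fail to reject the claim when the claimed value lies inside the interval
decision = "fail to reject" if rho_lower < claimed_rho < rho_upper else "reject"

print(round(rho_lower, 2), round(rho_upper, 2), decision)
```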
Problem 7 (Units 1, 2 & 3). Ali produces and sells laptops. The following table includes Ali’s
advertising expenditure (X, in hundreds of $) and Ali’s sales of laptops (Y, in thousands of $) for
2016-2022.
b. Using time plots, display the behavior of Ali’s advertising expenditure and sales over the
2016-2022 period. Comment on your results.
Answer: As shown below, both sales (Y) and advertising expenditure (X) decline over
the period 2016-2018 and then increase from 2019 to 2022. Both Y and X reach their highest
values in 2022, while the largest year-over-year increase in both occurs in 2020.
c. Which variable is the dependent variable, and which is the independent variable? Explain.
Answer: Sales (Y) depends on advertising (X). Therefore, Y is the dependent variable, and X
is the independent variable. That is, X changes first, and, then, Y responds by increasing.
d. Using the scatter diagram, investigate the relationship between the two variables. Is the
relationship perfect or imperfect? Is the relationship positive or negative? Explain clearly.
Answer: As shown below, the relationship is imperfect, since the scatter points do not all
fall on one straight line. The straight line that passes through the scatter points slopes
upward, meaning that the relationship between Y and X is positive.
e. Find the sample mean value of advertising expenditure in addition to the standard deviation.
Show your work.
Answer:
The sample mean is x̄ = ∑xi / n = 1,029 / 7 = 147.
With ∑(xi - x̄)² = 14,966 and (n – 1) = 6, the sample variance sx² = 2,494.33 and the sample
standard deviation is sx = 49.94.
f. Find the sample mean value of sales in addition to the standard deviation. Show your work.
Answer:
With ∑(yi - ȳ)² = 342 and (n – 1) = 6, the sample variance sy² = 57 and the sample standard
deviation is sy = 7.55.
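The means, sums of squared deviations, variances, and standard deviations for parts (e) and (f) can be reproduced with a short Python sketch. The X and Y values are taken from the table in part (g); the helper name is illustrative:

```python
import math

def summarize(values):
    """Return mean, sum of squared deviations, sample variance, and sample sd."""
    n = len(values)
    mean = sum(values) / n
    sum_sq_dev = sum((v - mean) ** 2 for v in values)
    variance = sum_sq_dev / (n - 1)          # sample variance divides by n - 1
    return mean, sum_sq_dev, variance, math.sqrt(variance)

advertising = [137, 107, 90, 101, 193, 195, 206]   # X, in hundreds of $
sales = [32, 29, 24, 25, 37, 41, 43]               # Y, in thousands of $

mean_x, ss_x, var_x, sd_x = summarize(advertising)
mean_y, ss_y, var_y, sd_y = summarize(sales)

print(mean_x, ss_x, round(var_x, 2), round(sd_x, 2))
print(mean_y, ss_y, round(var_y, 2), round(sd_y, 2))
```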
g. Find both the sample covariance and correlation coefficient. Show your work.
Answer:
i       xi    yi    xi - x̄    yi - ȳ    (xi - x̄)(yi - ȳ)
2016    137   32    -10       -1         10
2017    107   29    -40       -4        160
2018     90   24    -57       -9        513
2019    101   25    -46       -8        368
2020    193   37     46        4        184
2021    195   41     48        8        384
2022    206   43     59       10        590
∑(xi - x̄)(yi - ȳ) = 2209
The sample covariance is: sxy = ∑(xi - x̄)(yi - ȳ) / (n – 1) = 2209 / 6 = 368.17
The sample correlation coefficient is: rxy = sxy / (sx × sy) = 368.17 / (49.94 × 7.55) = .976
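The covariance and correlation in part (g) can be checked with a brief Python sketch built from the same data. It computes the cross-products of deviations explicitly, mirroring the table, and the variable names are illustrative:

```python
import math

advertising = [137, 107, 90, 101, 193, 195, 206]   # X, in hundreds of $
sales = [32, 29, 24, 25, 37, 41, 43]               # Y, in thousands of $

n = len(advertising)
x_bar = sum(advertising) / n
y_bar = sum(sales) / n

# Sum of cross-products of deviations (last column of the table)
sum_cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))

# Sample covariance and sample standard deviations (all divide by n - 1)
cov_xy = sum_cross / (n - 1)
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in advertising) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in sales) / (n - 1))

# Sample correlation coefficient
r_xy = cov_xy / (s_x * s_y)

print(sum_cross, round(cov_xy, 2), round(r_xy, 3))
```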
h. Using your results in part (g), what can you say about the direction and strength of the
Answer: The correlation coefficient (rxy = .976) is positive and close to one, meaning that
there is a strong positive linear association between advertising expenditure and sales.