
Answers to Homework Assignment 1 for Units 1, 2, and 3 (30 points)

Problem 1 (Units 1 & 3). Answer the following questions:

a. What are the two broad branches of statistics? Clearly explain the purpose of each branch.

Descriptive statistics: we convert data into information using numerical and graphical methods.

Inferential statistics: we use sample data to estimate population parameters (such as μ, σ², P, and ρ) and to test claims made about them.

b. What is the difference between a population and a sample? Explain clearly.

 A population is the set of all elements under study.

 A sample is an observed subset of elements from the population.

c. Briefly explain what each of the following 6 notations represents.

Answer: (μ, σ², σ, P, σxy, and ρ) are population parameters, which are unknown. Each population parameter takes only one (true) value.

μ represents the population mean,
σ² represents the population variance,
σ represents the population standard deviation,
P represents the population proportion,
σxy represents the population covariance, and
ρ represents the population correlation coefficient.

d. Briefly explain what each of the following 6 notations represents.

Answer: (x̄, S², S, P̂, Sxy, and rxy) are sample statistics. Sample statistics are the point estimates of the population parameters.

x̄ represents the sample mean,
S² represents the sample variance,
S represents the sample standard deviation,
P̂ represents the sample proportion,
Sxy represents the sample covariance, and
rxy represents the sample correlation coefficient.

e. What is the reason that a sample statistic is subject to an error? Explain clearly.

Answer: A sample statistic (such as x̄) provides an estimate of a population parameter (such as μ). A sample statistic is subject to (sampling) error because the sample contains only a subset of the data from the population, so its value changes from one sample to another.

Problem 2 (Unit 1). Consider the GDP (gross domestic product) of GCC countries in 2022:

Bahrain ($75 billion), Kuwait ($210 billion), Oman ($162 billion),
Qatar ($260 billion), Saudi Arabia ($1,733 billion), UAE ($503 billion)

a. Construct the frequency distribution table for these data. Make sure that the table includes the
frequency, the relative frequency, and the slice of the pie. Show your work.
Country        Frequency ($ billion)   Relative frequency (%)              Slice of the pie (°)
Bahrain                 $75             2.5   (= [75 ÷ 2,943] × 100)          9.17   (= [75 ÷ 2,943] × 360)
Kuwait                 $210             7.1   (= [210 ÷ 2,943] × 100)        25.69   (= [210 ÷ 2,943] × 360)
Oman                   $162             5.5   (= [162 ÷ 2,943] × 100)        19.82   (= [162 ÷ 2,943] × 360)
Qatar                  $260             8.8   (= [260 ÷ 2,943] × 100)        31.80   (= [260 ÷ 2,943] × 360)
Saudi Arabia         $1,733            58.9   (= [1,733 ÷ 2,943] × 100)     211.99   (= [1,733 ÷ 2,943] × 360)
UAE                    $503            17.1   (= [503 ÷ 2,943] × 100)        61.53   (= [503 ÷ 2,943] × 360)
Total                $2,943           100.0*                                360.00

* The rounded relative frequencies sum to 99.9%; the exact shares sum to 100%.

b. Draw a bar chart. Show your work.

c. Draw a pie chart. Show your work.

d. What information is conveyed by each chart? Clearly explain.

The bar chart (on the left) shows the GDP of each country, measured in billions of $. With $1,733 billion, Saudi Arabia had the highest GDP in 2022. With $75 billion, Bahrain had the smallest GDP.

The pie chart (on the right) shows each country's share of the total GCC GDP. Again, with 58.9% (a 211.99° slice of the pie), Saudi Arabia had the highest GDP in 2022. With 2.5% (a 9.17° slice of the pie), Bahrain had the smallest GDP in 2022.
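The frequency table in part (a) can be reproduced with a short Python sketch. The GDP values come from the problem statement; the variable names are illustrative. Both columns are computed from the exact share GDP ÷ total and rounded only for display.

```python
# GDP of GCC countries in 2022, in billions of US$ (from the problem statement)
gdp = {
    "Bahrain": 75,
    "Kuwait": 210,
    "Oman": 162,
    "Qatar": 260,
    "Saudi Arabia": 1733,
    "UAE": 503,
}

total = sum(gdp.values())  # 2,943

# Relative frequency (%) and pie-slice angle (degrees) per country,
# both based on the exact share value / total
table = {
    country: (value / total * 100, value / total * 360)
    for country, value in gdp.items()
}

for country, (pct, deg) in table.items():
    print(f"{country:<13} {gdp[country]:>6,} {pct:6.1f}% {deg:8.2f} deg")
print(f"{'Total':<13} {total:>6,} {100:6.1f}% {360:8.2f} deg")
```

Rounding only at print time is what keeps the slice angles summing to exactly 360°.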

Problem 3 (Unit 1). Supporters claim that new windmills can generate an average of 800 kilowatts
of power per day. Based on a random sample of 100 windmills, the average power per day is found
to be 776 kilowatts. Now, answer the following questions.

a. What kind of descriptive measure are we dealing with here (mean, variance, or proportion)?

Answer: Mean

b. What is the population in this example?

Answer: All new windmills

c. What is the sample in this example?

Answer: 100 randomly selected windmills

d. What is the notation for the population parameter in this example?

Answer: μ (the population mean)

e. What is the notation for the sample statistic in this example?

Answer: x̄ (the sample mean)

f. What does “800 kilowatts” represent in this example?

Answer: “800 kilowatts” represents the claimed value of the population mean (μ).

g. What does “776 kilowatts” represent in this example?

Answer: “x̄ = 776 kilowatts” represents the point estimate of the population mean (μ).

h. Test the validity of the claim (made by the supporters), assuming that the standard error of the
sample mean is 10 kilowatts. Do you reject (or fail to reject) the claim? Show your calculations.

Answer: Note that x̄ = 776 is the point estimate of μ, and Se(x̄) = 10 is the standard error of x̄. Here, we find the (approximate) 95% confidence interval estimate of μ, using 2 as a rounded value of the 1.96 critical value:

$$\bar{x} - 2\,\mathrm{Se}(\bar{x}) < \mu < \bar{x} + 2\,\mathrm{Se}(\bar{x})$$

$$776 - 2 \times 10 < \mu < 776 + 2 \times 10 \quad\text{or}\quad 756 < \mu < 796$$

Interpretation: We are 95% confident that this interval estimate contains the population mean (μ). That is, about 95% of intervals constructed this way from repeated random samples would contain μ, and about 5% would not.

Conclusion: Since the claimed value of 800 kilowatts is not in this interval estimate, we reject the claim (based on the sample information).
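The interval test above can be sketched in a few lines of Python. The helper name `approx_95_ci` is hypothetical; it simply encodes the "estimate ± 2 standard errors" rule used in this answer.

```python
def approx_95_ci(estimate, std_error):
    """Rough 95% interval: estimate +/- 2 standard errors (2 approximates 1.96)."""
    return estimate - 2 * std_error, estimate + 2 * std_error

x_bar, se = 776, 10    # sample mean and its standard error (kilowatts)
claimed_mu = 800       # supporters' claimed population mean

low, high = approx_95_ci(x_bar, se)
reject = not (low < claimed_mu < high)   # reject if the claim lies outside the interval
print(f"{low} < mu < {high}; reject claim: {reject}")
```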

Problem 4 (Unit 2). Let the variable X be the number of hours that AUS students study per weekend.
For a random sample of eight students, X = {26, 14, 6, 22, 10, 8, 20, 18}.

a. Clearly explain why the sample is taken randomly.

Answer: A random sampling procedure gives every student an equal chance of being selected. This makes a random sample likely to be representative of the population.

b. Find the mean and median. Show your work.

Answer: We can present the data in a table as follows, where i is the student number and xi is the number of hours studied per weekend:

i        xi
1        26
2        14
3         6
4        22
5        10
6         8
7        20
n = 8    18
         ∑xi = 124


The mean value is

$$\bar{x} = \frac{\sum x_i}{n} = \frac{124}{8} = 15.5 \text{ hours}$$

To find the median, we arrange the data in ascending order as follows:

6, 8, 10, 14, 18, 20, 22, 26

Since n = 8 is even, the median is the average of the two middle data points (the 4th and 5th values): (14 + 18) / 2 = 16 hours.
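A minimal Python sketch of part (b), computing the mean and median directly and cross-checking them with the standard-library `statistics` module:

```python
import statistics

# Hours studied per weekend by the eight sampled students (from the problem)
hours = [26, 14, 6, 22, 10, 8, 20, 18]

n = len(hours)
mean = sum(hours) / n          # 124 / 8
ordered = sorted(hours)
# n is even, so the median averages the two middle values (positions n/2 and n/2 + 1)
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Cross-check against the standard library
assert mean == statistics.mean(hours)
assert median == statistics.median(hours)
print(mean, median)
```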

c. Is the distribution of the data skewed or approximately symmetric in this example? Explain.

Answer: The median (16 hours) is similar to the mean value (15.5 hours). Accordingly, we
conclude that the distribution is approximately symmetric.

d. Is the mean or median an appropriate measure of central tendency in this example? Explain.

Answer: Since the distribution is approximately symmetric (not skewed), the mean is a better
measure of central tendency in this example. Note that in calculating the mean value, we make
use of every value that X takes.

e. Find and interpret the 25th percentile (Q1), 50th percentile (Q2), and 75th percentile (Q3). Show
your work.

Answer: First, we arrange the data in ascending order,

6, 8, 10, 14, 18, 20, 22, 26

and then use the following formula to locate the pth percentile:

$$L_p = (n + 1)\,\frac{p}{100}$$

For the 25th percentile (Q1), p = 25, and $L_{25} = (8 + 1)\frac{25}{100} = 2.25$, i.e., the 2.25th position.

Because L25 is not an integer, we interpolate between the 2nd and 3rd values: Q1 = 8 + .25 (10 – 8) = 8.5.

Thus, the 25th percentile is Q1 = 8.5 hours. This means that 25% of the X-values are below 8.5 hours and 75% are above 8.5 hours.

For the 50th percentile (Q2), p = 50, and $L_{50} = (8 + 1)\frac{50}{100} = 4.5$, i.e., the 4.5th position.

Because L50 is not an integer, we interpolate between the 4th and 5th values: Q2 = 14 + .50 (18 – 14) = 16.

Thus, the 50th percentile (median) is Q2 = 16 hours. This means that 50% of the X-values are below 16 hours and 50% are above 16 hours.

For the 75th percentile (Q3), p = 75, and $L_{75} = (8 + 1)\frac{75}{100} = 6.75$, i.e., the 6.75th position.

Because L75 is not an integer, we interpolate between the 6th and 7th values: Q3 = 20 + .75 (22 – 20) = 21.5.

Thus, the 75th percentile is Q3 = 21.5 hours. This means that 75% of the X-values are below 21.5 hours and 25% are above 21.5 hours.
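The location-and-interpolate procedure above can be sketched as a small Python function. The name `percentile` is hypothetical; the check at the end relies on `statistics.quantiles(..., method="exclusive")` (Python 3.8+), which is based on the same n + 1 positioning.

```python
import statistics

hours = [26, 14, 6, 22, 10, 8, 20, 18]

def percentile(data, p):
    """p-th percentile via the location L_p = (n + 1) * p / 100 with linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    loc = (n + 1) * p / 100      # 1-based position L_p
    k = int(loc)                 # whole part of the position
    frac = loc - k               # fractional part used for interpolation
    if k < 1:                    # position before the first value
        return xs[0]
    if k >= n:                   # position at or beyond the last value
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

q1, q2, q3 = (percentile(hours, p) for p in (25, 50, 75))
print(q1, q2, q3)

# Cross-check with the standard library's (n + 1)-based quartiles
assert [q1, q2, q3] == statistics.quantiles(hours, n=4, method="exclusive")
```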

f. Find the range and interquartile range. Show your work.

Answer: Range = highest value – lowest value = 26 – 6 = 20 hours

Interquartile range (IQR) = Q3 – Q1 = 21.5 – 8.5 = 13 hours
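A brief Python check of part (f), taking the quartiles from `statistics.quantiles` with the `"exclusive"` method (which reproduces Q1 = 8.5 and Q3 = 21.5 for these data):

```python
import statistics

hours = [26, 14, 6, 22, 10, 8, 20, 18]

data_range = max(hours) - min(hours)                         # highest minus lowest value
q1, _, q3 = statistics.quantiles(hours, n=4, method="exclusive")
iqr = q3 - q1                                                # spread of the middle 50%
print(data_range, iqr)
```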

g. Is the range or interquartile range an appropriate measure of variation in this example? Explain.

Answer: Since the distribution is approximately symmetric (not skewed), the range is a better
measure of variation in this example. The range gives a complete picture of variation for the
entire data.

Problem 5 (Unit 2). The average rental payment (denoted by X) for a typical one-bedroom apartment in a large city is estimated to be $700 (that is, x̄ = $700) with a standard deviation S = $100. In answering the following questions, use Chebyshev’s theorem and the empirical rule (where appropriate).

a. What fraction of rental payments is between $500 and $900? Note that no information is
given about the shape of the distribution. Show your work.

Answer: Note that the z-value for $500 is z = (500 – 700)/100 = -2, and the z-value for $900 is z = (900 – 700)/100 = 2.

With no information about the shape of the distribution, we use Chebyshev’s theorem.

According to Chebyshev’s theorem, at least 1 – 1/k² = 1 – 1/2² = .75 (75%) of rental payments must lie within k = 2 standard deviations of the mean. In other words, at least 75% of rental payments are between $500 and $900.
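The Chebyshev calculation can be sketched in Python. The function name `chebyshev_lower_bound` is hypothetical; it encodes the bound 1 – 1/k² for k > 1.

```python
def chebyshev_lower_bound(k):
    """Smallest possible fraction of data within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2

mean, sd = 700, 100            # x-bar and S from the problem, in dollars
z_500 = (500 - mean) / sd      # z-value for $500
z_900 = (900 - mean) / sd      # z-value for $900

# The interval is symmetric about the mean, so k = |z|
k = abs(z_900)
fraction = chebyshev_lower_bound(k)
print(f"z-values: {z_500}, {z_900}; at least {fraction:.0%} lie between $500 and $900")
```

Because Chebyshev's theorem holds for any distribution, the result is only a lower bound; parts (b) to (d) tighten it once the bell shape is known.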

b. How does your answer to part (a) change if the distribution is bell-shaped? Show your work
by drawing a bell-shaped distribution that includes your answer.

Answer: Again, note that the z-value for $500 is -2, and the z-value for $900 is 2.

Since the distribution is bell-shaped, then according to the Empirical Rule, approximately
95% of rental payments will lie within 2 standard deviations of the mean. In other words,
approximately 95% of rental payments are between $500 and $900.

c. What fraction of rental payments is between $600 and $800 if the distribution is bell-shaped?
Show your work by drawing a bell-shaped distribution that includes your answer.

Answer: Note that the z-value for $600 is z = (600 – 700)/100 = -1, and the z-value for $800 is z = (800 – 700)/100 = 1.

Since the distribution is bell-shaped, then according to the Empirical Rule, approximately
68% of rental payments will lie within 1 standard deviation of the mean. In other words,
approximately 68% of rental payments are between $600 and $800.

d. What fraction of rental payments is above $600 if the distribution is bell-shaped? Show your
work by drawing a bell-shaped distribution that includes your answer.

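Answer: As shown below, approximately 68% of rental payments are between $600 and $800. Because a bell-shaped distribution is symmetric, the remaining 32% is split equally between the two tails: approximately 16% of rental payments are below $600 and 16% are above $800. This means that approximately 68% + 16% = 84% of rental payments are above $600.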
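As a cross-check on parts (b) to (d), the sketch below assumes the bell-shaped distribution is exactly normal with mean $700 and standard deviation $100 (an assumption; the problem only says "bell-shaped") and uses `statistics.NormalDist` (Python 3.8+). The exact normal probabilities are close to the empirical-rule values of 95%, 68%, and 84%.

```python
from statistics import NormalDist

# Assumption: model the bell-shaped rent distribution as normal(700, 100)
rent = NormalDist(mu=700, sigma=100)

p_500_900 = rent.cdf(900) - rent.cdf(500)   # empirical rule: about 95%
p_600_800 = rent.cdf(800) - rent.cdf(600)   # empirical rule: about 68%
p_above_600 = 1 - rent.cdf(600)             # empirical rule: about 84%
print(f"{p_500_900:.4f} {p_600_800:.4f} {p_above_600:.4f}")
```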

Problem 6 (Unit 3). Studies indicate that there is a strong association between a country’s economic growth rate (X) and unemployment rate (Y). For a random sample of 50 countries in 2022, rxy = -.85 with a standard error of .04. A critic claims that the true correlation coefficient between Y and X is -.90. Now answer the following questions.

a. What kind of descriptive measure are we dealing with here (mean, variance, proportion, or
correlation coefficient)?

Answer: correlation coefficient

b. What is the notation for the population parameter in this example?

Answer: ρ (the population correlation coefficient)

c. What is the notation for the sample statistic in this example?

Answer: rxy (the sample correlation coefficient)

d. What does -.90 represent in this example? Clearly explain.

Answer: “-.90” is the claimed value of the population correlation coefficient (ρ)

e. What does -.85 represent in this example? Clearly explain.

Answer: “-.85” is the point estimate of the population correlation coefficient (ρ)

f. What does rxy = -.85 reveal about the relationship between X and Y? Clearly explain.

Answer: “-.85” is negative and close to -1, indicating that there is a strong negative linear association between a country’s economic growth rate and unemployment rate.

g. Use the sample estimate to test the validity of the claim. Do you reject (or fail to reject) the
claim? Show your calculations.

Answer: Note that rxy = -.85 is the point estimate of ρ, and Se(rxy) = .04 is the standard error of rxy.

Here, we find the (approximate) 95% confidence interval estimate of ρ as follows:

$$r_{xy} - 2\,\mathrm{Se}(r_{xy}) < \rho < r_{xy} + 2\,\mathrm{Se}(r_{xy})$$

$$-.85 - 2 \times .04 < \rho < -.85 + 2 \times .04 \quad\text{or}\quad -.93 < \rho < -.77$$

Interpretation: We are 95% confident that this interval estimate contains the population correlation coefficient (ρ). That is, about 95% of intervals constructed this way from repeated random samples would contain ρ, and about 5% would not.

 Conclusion: Since the claimed value of -.90 is in this interval estimate, we fail to reject the
claim (based on the sample information).
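The same "estimate ± 2 standard errors" rule applied to the correlation coefficient, as a short Python sketch (the helper name `approx_95_ci` is hypothetical):

```python
def approx_95_ci(estimate, std_error):
    """Rough 95% interval: estimate +/- 2 standard errors."""
    return estimate - 2 * std_error, estimate + 2 * std_error

r_xy, se_r = -0.85, 0.04   # sample correlation and its standard error
claimed_rho = -0.90        # critic's claimed population correlation

low, high = approx_95_ci(r_xy, se_r)
reject = not (low < claimed_rho < high)   # fail to reject when the claim is inside
print(f"{low:.2f} < rho < {high:.2f}; reject claim: {reject}")
```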

Problem 7 (Units 1, 2 & 3). Ali produces and sells laptops. The following table includes Ali’s advertising expenditure (X, in hundreds of $) and Ali’s sales of laptops (Y, in thousands of $) for 2016-2022.

Year    Advertising expenditure (X)    Sales (Y)
2016    137                            32
2017    107                            29
2018     90                            24
2019    101                            25
2020    193                            37
2021    195                            41
2022    206                            43

a. Are these cross-sectional or time-series data? Explain clearly.

Answer: Time-series data, because Y and X are observed for the same business (Ali’s) over successive years (2016-2022).

b. Using time plots, display the behavior of Ali’s advertising expenditure and sales over the
2016-2022 period. Comment on your results.

Answer: As shown below, both sales (Y) and advertising expenditure (X) decline over the period 2016-2018 and then increase from 2019 to 2022. Both series reach their highest values in 2022, while the largest year-over-year increase in both Y and X occurs in 2020.
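The year-over-year pattern described in part (b) can be checked numerically. In this sketch, `x` holds the 137, 107, ... column and `y` the 32, 29, ... column, matching the xi and yi columns used in parts (e) to (g); the helper `yearly_changes` is hypothetical.

```python
years = [2016, 2017, 2018, 2019, 2020, 2021, 2022]
x = [137, 107, 90, 101, 193, 195, 206]   # xi column used in parts (e)-(g)
y = [32, 29, 24, 25, 37, 41, 43]         # yi column used in parts (e)-(g)

def yearly_changes(series):
    """Change from each year to the next, keyed by the later year."""
    return {years[i]: series[i] - series[i - 1] for i in range(1, len(series))}

dx, dy = yearly_changes(x), yearly_changes(y)
print(dx)
print(dy)
print("largest increase in X:", max(dx, key=dx.get))
print("largest increase in Y:", max(dy, key=dy.get))
```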

c. Which variable is the dependent variable, and which is the independent variable? Explain.

Answer: Sales (Y) depends on advertising (X). Therefore, Y is the dependent variable, and X is the independent variable. That is, X changes first, and then Y responds.

d. Using the scatter diagram, investigate the relationship between the two variables. Is the
relationship perfect or imperfect? Is the relationship positive or negative? Explain clearly.

Answer: As shown below, the relationship is imperfect, since the scatter points do not all fall on one straight line. The straight line that passes through the scatter points slopes upward, meaning that the relationship between Y and X is positive.

e. Find the sample mean value of advertising expenditure in addition to the standard deviation.
Show your work.

Answer:

Year (i)   xi      (xi - x̄)   (xi - x̄)²   yi    (yi - ȳ)   (yi - ȳ)²
2016       137     -10          100        32    -1            1
2017       107     -40        1,600        29    -4           16
2018        90     -57        3,249        24    -9           81
2019       101     -46        2,116        25    -8           64
2020       193      46        2,116        37     4           16
2021       195      48        2,304        41     8           64
2022       206      59        3,481        43    10          100
Sum      1,029       0       14,966       231     0          342

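With ∑xi = 1,029 and n = 7, the sample mean is

$$\bar{x} = \frac{\sum x_i}{n} = \frac{1029}{7} = 147$$

With ∑(xi - x̄)² = 14,966 and (n – 1) = 6, the sample variance and the sample standard deviation are

$$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{14966}{6} = 2494.33, \qquad s_x = \sqrt{2494.33} = 49.94$$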
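The deviation-table method for part (e) translates directly into Python; this sketch mirrors the columns of the table above (sum of squared deviations divided by n – 1).

```python
import math

x = [137, 107, 90, 101, 193, 195, 206]   # xi column from the table above

n = len(x)
x_bar = sum(x) / n                          # 1,029 / 7
ss_x = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations
s2_x = ss_x / (n - 1)                       # sample variance (n - 1 denominator)
s_x = math.sqrt(s2_x)                       # sample standard deviation
print(x_bar, ss_x, round(s2_x, 2), round(s_x, 2))
```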

f. Find the sample mean value of sales in addition to the standard deviation. Show your work.

Answer:

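With ∑yi = 231 and n = 7, the sample mean is

$$\bar{y} = \frac{\sum y_i}{n} = \frac{231}{7} = 33$$

With ∑(yi - ȳ)² = 342 and (n – 1) = 6, the sample variance and the sample standard deviation are

$$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{342}{6} = 57, \qquad s_y = \sqrt{57} = 7.55$$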
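For part (f), Python's standard-library `statistics` module gives the same results; `statistics.variance` and `statistics.stdev` both use the n – 1 (sample) denominator.

```python
import statistics

y = [32, 29, 24, 25, 37, 41, 43]   # yi column from the table in part (e)

y_bar = statistics.mean(y)          # 231 / 7
s2_y = statistics.variance(y)       # sample variance, n - 1 denominator
s_y = statistics.stdev(y)           # sample standard deviation
print(y_bar, s2_y, round(s_y, 2))
```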

g. Find both the sample covariance and correlation coefficient. Show your work.

Answer:

Year (i)   xi     yi    xi - x̄   yi - ȳ   (xi - x̄)(yi - ȳ)
2016       137    32    -10       -1        10
2017       107    29    -40       -4       160
2018        90    24    -57       -9       513
2019       101    25    -46       -8       368
2020       193    37     46        4       184
2021       195    41     48        8       384
2022       206    43     59       10       590
Sum                                        2,209

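With ∑(xi - x̄)(yi - ȳ) = 2,209 and (n – 1) = 6, the sample covariance is

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{2209}{6} = 368.17$$

The sample covariance (368.17) is positive, indicating that there is a positive linear association between the two variables.

The sample correlation coefficient is

$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{368.17}{49.94 \times 7.55} = .976$$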
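A minimal Python sketch of part (g): the sum of cross-products gives the sample covariance, which is then scaled by the two sample standard deviations to get the correlation coefficient.

```python
import math

x = [137, 107, 90, 101, 193, 195, 206]   # xi: advertising column used in parts (e)-(g)
y = [32, 29, 24, 25, 37, 41, 43]         # yi: sales column used in parts (e)-(g)

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # sum of cross-products
s_xy = sp_xy / (n - 1)                                             # sample covariance
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
r_xy = s_xy / (s_x * s_y)                                          # sample correlation
print(sp_xy, round(s_xy, 2), round(r_xy, 3))
```

Unlike the covariance, r_xy is unit-free, which is why it can be read directly as a measure of strength.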

h. Using your results in part (g), what can you say about the direction and strength of the association between X and Y? Explain clearly.

Answer: The correlation coefficient (rxy = .976) is positive and close to one, meaning that there is a strong positive linear association between advertising expenditure and sales.
