I3-TD1
Descriptive Statistics
Exercise 1. For statements (a)-(h), state whether descriptive or inferential statistics has been
used.
(a) By 2040 at least 3.5 billion people will run short of water (World Future Society).
(b) In a sample of 100 on-the-job fatalities, 90% of the victims were men.
(c) In a survey of 1000 adults, 34% said that they posted notes on social media websites (Source:
AARP Survey).
(d) In a poll of 3036 adults, 32% said that they got a flu shot at a retail clinic (Source: Harris
Interactive Poll).
(e) Allergy therapy makes bees go away (Source: Prevention).
(f) Drinking decaffeinated coffee can raise cholesterol levels by 7% (Source: American Heart Asso-
ciation).
(g) The average stay in a hospital for 2000 patients who had circulatory system problems was 4.7
days.
(h) Experts say that mortgage rates may soon hit bottom (Source: USA TODAY).
Answer :
a Inferential
b Descriptive
c Descriptive
d Descriptive
e Inferential
f Inferential
g Descriptive
h Inferential
Answer :
a Ratio-level
b Ordinal-level
c Interval-level
d Ratio-level
e Ratio-level
f Ratio-level
g Ordinal-level
h Ratio-level
i Ratio-level
a Qualitative
b Quantitative
c Quantitative
d Qualitative
e Quantitative
f Quantitative
g Quantitative
h Quantitative
Answer :
a Discrete
b Continuous
c Discrete
d Continuous
e Continuous
f Discrete
g Continuous
h Continuous
Exercise 5. How People Get Their News The Brunswick Research Organization surveyed 50
randomly selected individuals and asked them the primary way they received the daily news. Their
choices were via newspaper (N), television (T), radio (R), or Internet (I). Construct a categorical
frequency distribution for the data and interpret the results.
N N T T T I R R I T
I N R R I N N I T N
I R T T T T N R R I
R R I N T R T I I T
T I N T T I R N R T
Solution :
There are four primary ways of receiving the daily news: N, T, R, and I. These categories are
used as the classes for the distribution.
The procedure for constructing a frequency distribution for categorical data is given below.
• Make a table as shown.
• Tally the data and place the results in column B.
• Count the tallies and place the results in column C.
• Find the percentage of values for each class in column D.
The categorical frequency distribution is obtained as given below:
From the frequency distribution, 32% of the respondents (16 of 50) got their daily news via
television, which is the highest percentage among the four primary ways of receiving the
daily news.
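The tallying procedure above can be sketched in a few lines of Python. This is only an illustration: the variable names (`responses`, `table`) are chosen here and are not part of the exercise.

```python
from collections import Counter

# Survey responses from Exercise 5, transcribed row by row
# (N = newspaper, T = television, R = radio, I = Internet).
responses = (
    "N N T T T I R R I T "
    "I N R R I N N I T N "
    "I R T T T T N R R I "
    "R R I N T R T I I T "
    "T I N T T I R N R T"
).split()

counts = Counter(responses)   # tally of each class
n = len(responses)            # total number of respondents

# One entry per class: (frequency, percentage of all responses).
table = {label: (counts[label], 100 * counts[label] / n) for label in "NTRI"}

for label, (freq, pct) in table.items():
    print(f"{label}  {freq:2d}  {pct:.0f}%")
```

The percentages are simply frequency divided by the total, times 100, which is the content of column D in the procedure.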
Exercise 6. College Completions The percentage (rounded to the nearest whole percent) of per-
sons from each state completing 4 years or more of college is listed below. Percentage of persons
completing 4 years of college
23 25 24 34 22 24 27 37 33 24
26 23 38 24 24 17 28 23 30 25
30 22 33 24 28 36 24 19 25 31
34 31 27 24 29 28 21 25 26 15
26 22 27 21 25 28 24 21 25 26
(a) Organize the data into a grouped frequency distribution with 5 classes.
(b) Find the relative frequency.
(c) Construct a histogram, frequency polygon, and ogive.
Solution : We have MIN = 15 and MAX = 38. With K = 6 classes, the class width is I = (38 − 15)/6 ≈ 4.
(a) Organize the data into a grouped frequency distribution. Six classes are used here rather than the five requested.
Lower limit   Upper limit   Lower boundary   Upper boundary   Midpoint   Frequency
15            18            14.5             18.5             16.5        2
19            22            18.5             22.5             20.5        7
23            26            22.5             26.5             24.5       22
27            30            26.5             30.5             28.5       10
31            34            30.5             34.5             32.5        6
35            38            34.5             38.5             36.5        3
Total                                                                    50
(b) Find the relative frequency.
Class boundaries   Cumulative frequency      Boundary   Cumulative frequency below boundary
14.5 − 18.5         2                        14.5        0
18.5 − 22.5         9                        18.5        2
22.5 − 26.5        31                        22.5        9
26.5 − 30.5        41                        26.5       31
30.5 − 34.5        47                        30.5       41
34.5 − 38.5        50                        34.5       47
                                             38.5       50
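The class frequencies, relative frequencies, and cumulative frequencies can be checked with a short script. This is a sketch that assumes the six classes of width 4 starting at 15 used in this solution; the names `rows` and `width` are illustrative.

```python
# College-completion percentages from Exercise 6, transcribed row by row.
data = [
    23, 25, 24, 34, 22, 24, 27, 37, 33, 24,
    26, 23, 38, 24, 24, 17, 28, 23, 30, 25,
    30, 22, 33, 24, 28, 36, 24, 19, 25, 31,
    34, 31, 27, 24, 29, 28, 21, 25, 26, 15,
    26, 22, 27, 21, 25, 28, 24, 21, 25, 26,
]

width = 4          # class width used in the solution
rows = []          # (lower limit, upper limit, frequency, relative freq., cumulative freq.)
cumulative = 0
for low in range(15, 39, width):          # lower limits 15, 19, ..., 35
    high = low + width - 1
    freq = sum(low <= x <= high for x in data)
    cumulative += freq
    rows.append((low, high, freq, freq / len(data), cumulative))

for low, high, freq, rel, cum in rows:
    print(f"{low}-{high}  f={freq:2d}  rel={rel:.2f}  cum={cum}")
```

The relative frequency of a class is its frequency divided by n = 50, so the relative frequencies sum to 1.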
(c) Construct a histogram, frequency polygon, and ogive.
Exercise 7. Ages of the Vice Presidents at the Time of Their Death The ages of the Vice Presidents
of the United States at the time of their death are listed below.
90 83 80 73 70 51 68 79 70 71
72 74 67 54 81 66 62 63 68 57
66 96 78 55 60 66 57 71 60 85
76 98 77 88 78 81 64 66 77 93 70
lower limit upper limit lower boundary upper boundary Midpoint frequency
51 58 50.5 58.5 54.5 5
59 66 58.5 66.5 62.5 9
67 74 66.5 74.5 70.5 11
75 82 74.5 82.5 78.5 9
83 90 82.5 90.5 86.5 4
91 98 90.5 98.5 94.5 3
41
Exercise 8. Activities While Driving A survey of 1200 drivers showed the percentage of respondents
who did the following while driving. Construct a vertical bar graph and a horizontal bar graph for
the data.
Solution :
Construct a vertical bar graph and a horizontal bar graph for the data.
Exercise 9. Calories of Nuts The data show the number of calories per ounce in selected types of
nuts. Construct vertical and horizontal bar graphs for the data.
Types Calories
Peanuts 160
Almonds 170
Macadamia 200
Pecans 190
Cashews 160
Solution :
Construct vertical and horizontal bar graphs for the data.
Exercise 10. Space Launches The data show the number of U.S. space launches for the 10-year
periods from 1960 to 2009. Construct a time series graph for the data and analyze the graph.
Solution :
We have,
Years         Launches
1960 − 1969   614
1970 − 1979   247
1980 − 1989   199
1990 − 1999   300
2000 − 2009   206
Construct a time series graph for the data and analyze the graph.
The number of U.S. space launches decreased from the 1960s through the 1980s, increased in
the 1990s, and then decreased again in the 2000s.
Exercise 11. High School Dropout Rate The data show the high school dropout rate for students
for the years 2003 to 2009 . Construct a time series graph and analyze the graph.
Solution :
We have,
Year Percent
2003 9.9
2004 10.3
2005 9.4
2006 9.3
2007 8.7
2008 8
2009 8.1
The high school dropout rate increased from 2003 to 2004, decreased from 2004 to 2008, and
increased slightly from 2008 to 2009.
Exercise 12. Spending of College Freshmen The average amounts spent by college freshmen for
school items are shown. Construct a pie graph for the data.
Electronics/computers $728
Dorm items $344
Clothing $141
Shoes $72
Solution :
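The total amount is $1285. To construct the pie graph, find the percentage of the total for each
category.
• Note: for electronics/computers the percentage is (728 × 100)/1285 ≈ 56.7%; the other categories are computed the same way.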
Exercise 13. Career Changes A survey asked if people would like to spend the rest of their careers
with their present employers. The results are shown. Construct a pie graph for the data and
analyze the results.
Answer Number of people
Yes 660
No 260
Undecided 80
Solution :
We have
Answer Number of people
Yes 660
No 260
Undecided 80
Construct a pie graph for the data and analyze the results.
The pie graph shows that 66% of the respondents answered yes, 26% answered no, and 8% were
undecided.
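The percentages and the corresponding pie-slice angles follow directly from the counts. A minimal sketch, with illustrative names `counts`, `percentages`, and `angles`:

```python
# Survey counts from Exercise 13.
counts = {"Yes": 660, "No": 260, "Undecided": 80}
total = sum(counts.values())

# Percentage of the whole, and central angle in degrees, for each slice.
percentages = {answer: 100 * c / total for answer, c in counts.items()}
angles = {answer: 360 * c / total for answer, c in counts.items()}

for answer in counts:
    print(f"{answer:10s} {percentages[answer]:5.1f}%  {angles[answer]:6.1f} degrees")
```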
Exercise 14. Peyton Manning’s Colts Career Peyton Manning played for the Indianapolis Colts
for 14 years. (He did not play in 2011.) The data show the number of touch-downs he scored for
the years 1998-2010. Construct a dotplot for the data and comment on the graph.
26 33 27 49 31 27 33
26 26 29 28 31 33
Solution :
Construct a dotplot for the data and comment on the graph.
The dotplot shows that the most frequent values are 26 and 33 touchdowns, each of which occurs
3 times. The values 28, 29, and 49 each occur only once, and 49 lies well above the rest of the data.
Exercise 15. Songs on CDs The data show the number of songs on each of 40CDs from the
author’s collection. Construct a dotplot for the data and comment on the graph.
10 14 18 11
11 15 16 10
10 17 10 15
22 9 14 12
18 12 12 15
21 22 20 15
10 19 20 21
17 9 13 15
11 12 12 9
14 20 12 10
Solution :
Construct a dotplot for the data and comment on the graph.
The dotplot shows that the most common numbers of songs are 10 and 12 (6 CDs each). The least
common are 13, 16, and 19 songs (1 CD each).
Exercise 16. The traffic situation in X-City is getting worse, and it is high time a solution was
offered. The company hired to work on the project took a survey of the estimated amount of vehicles
that move on the road daily and for various intervals. The result of this survey is illustrated in the
table below.
Time Cars Buses Bikes
1 − 2pm 37 45 42
2 − 3pm 44 34 26
3 − 4pm 23 39 27
4 − 5pm 29 41 48
Construct a multiple line graph to visualize the data. Hence, determine the vehicle with the highest
frequency and that with the lowest frequency.
Solution :
Determine the vehicle with the highest frequency and that with the lowest frequency.
The vehicle with the highest frequency is bikes (48 between 4 and 5 pm) and the vehicle with the lowest frequency is cars (23 between 3 and 4 pm).
Exercise 17. Draw a multiple bar graph for the following data, which represent agricultural
production for the period from 2010 to 2013.
Solution :
Draw a multiple bar graph for the following data, which represent agricultural production for the
period from 2010 to 2013.
Exercise 18. The heights (in cm ) of a sample of the students in a class are shown:
50 52 70 72 65 52 60
75 51 64 65 55 67 70
Find the mean, mode, median, inter quartile range, midrange, variance, and standard deviation for
the data.
Solution :
Find the mean, mode, median, inter quartile range, midrange, variance, and standard deviation for
the data.
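• Mean: $\bar{x} = \frac{\sum x}{n}$, with $n = 14$ and $\sum x = 868$ (sum of all data), so $\bar{x} = \frac{868}{14} = 62$
• Mode = 52, 65, 70 (the values that appear most frequently)
• Median: $MD = \frac{X_7 + X_8}{2} = \frac{64 + 65}{2} = 64.5$
• Interquartile range:
• for $p = 25$: $C = \frac{n \times p}{100} = \frac{14 \times 25}{100} = 3.5 \Rightarrow Q_1 = \frac{X_4 + X_5}{2} = \frac{52 + 55}{2} = 53.5$
• for $p = 75$: $C = \frac{n \times p}{100} = \frac{14 \times 75}{100} = 10.5 \Rightarrow Q_3 = \frac{X_{11} + X_{12}}{2} = \frac{70 + 70}{2} = 70$
$\Rightarrow IQR = Q_3 - Q_1 = 70 - 53.5 = 16.5$
• Midrange: $MR = \frac{\text{lowest value} + \text{highest value}}{2} = \frac{50 + 75}{2} = 62.5$
• Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{962}{13} = 74$ (Check Excel)
• Standard deviation: $s = \sqrt{s^2} = \sqrt{74} = 8.60233$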
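The mean, median, mode, midrange, variance, and standard deviation above can be cross-checked with Python's standard `statistics` module. This is only a sketch; quartiles are left out because software packages use several different quartile conventions, which need not match the position rule used in this solution.

```python
import statistics

# Heights (cm) from Exercise 18.
heights = [50, 52, 70, 72, 65, 52, 60, 75, 51, 64, 65, 55, 67, 70]

mean = statistics.mean(heights)
median = statistics.median(heights)
modes = sorted(statistics.multimode(heights))    # all values tied for most frequent
midrange = (min(heights) + max(heights)) / 2
variance = statistics.variance(heights)          # sample variance, divides by n - 1
std_dev = statistics.stdev(heights)              # square root of the sample variance

print(mean, median, modes, midrange, variance, round(std_dev, 5))
```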
Exercise 19. Households of Four Television Networks A survey showed the number of viewers and
number of households of four television networks. Find the average number of viewers, using the
weighted mean.
Households 1.4 0.8 0.3 1.6
Viewers (in millions) 1.6 0.8 0.4 1.8
Solution :
Find the average number of viewers, using the weighted mean.
Average number of viewers:
$$\bar{X} = \frac{\sum wx}{\sum w} = \frac{(1.4)(1.6) + (0.8)(0.8) + (0.3)(0.4) + (1.6)(1.8)}{1.4 + 0.8 + 0.3 + 1.6} = \frac{5.88}{4.1} \approx 1.43$$
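The weighted mean can be verified numerically. In this sketch the households are taken as the weights and the viewers as the values, as in the solution; the list names are illustrative.

```python
# Exercise 19: households act as weights w, viewers (in millions) as values x.
households = [1.4, 0.8, 0.3, 1.6]
viewers = [1.6, 0.8, 0.4, 1.8]

# Weighted mean = sum(w * x) / sum(w).
weighted_sum = sum(w * x for w, x in zip(households, viewers))
weighted_mean = weighted_sum / sum(households)

print(round(weighted_mean, 2))
```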
Exercise 20. Magazines in Bookstores A survey of bookstores showed that the average number
of magazines carried is 56 , with a standard deviation of 12 . The same survey showed that the
average length of time each store had been in business was 6 years, with a standard deviation of
2.5 years. Which is more variable, the number of magazines or the number of years?
Solution :
For the magazines, the mean is x̄ = 56 and the standard deviation is s = 12. For the time in
business, the mean is x̄ = 6 and the standard deviation is s = 2.5. The coefficient of variation is CVar = (s / x̄) × 100%.
            x̄     s      CVar
Magazines   56    12     21.43%
Time        6     2.5    41.67%
Therefore, the coefficient of variation of the time in business (41.67%) is higher than that of
the number of magazines (21.43%), so the time in business is more variable.
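A small helper makes the comparison explicit. The function name `coefficient_of_variation` is an illustrative choice, not something defined in the exercise.

```python
def coefficient_of_variation(mean, std_dev):
    """Return the coefficient of variation as a percentage: 100 * s / mean."""
    return 100 * std_dev / mean

# Values given in Exercise 20.
cv_magazines = coefficient_of_variation(56, 12)
cv_years = coefficient_of_variation(6, 2.5)

print(f"magazines: {cv_magazines:.2f}%   years: {cv_years:.2f}%")
more_variable = "years" if cv_years > cv_magazines else "magazines"
```

Because the two variables have different units, comparing the standard deviations directly (12 versus 2.5) would be misleading; the coefficient of variation removes the units.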
Exercise 21. Average Earnings of Workers The average earnings of year-round full-time workers
25−34 years old with a bachelor's degree or higher were $58,500 in 2003. If the standard deviation
is $11,200, what can you say about the percentage of these workers who earn
(a) Between $47,300 and $69,700?
(b) More than $80,900?
(c) How likely is it that someone earns more than $100,000?
Solution :
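(a) Between $47,300 and $69,700?
We have mean 58500 and standard deviation s = 11200 (in dollars). The interval limits satisfy
$$\mu - ks = 47300 \quad (1)$$
$$\mu + ks = 69700 \quad (2)$$
From (1): $k = \frac{\mu - 47300}{s} = \frac{58500 - 47300}{11200} = 1$
Thus, Chebyshev's theorem is not applicable, because it requires $k > 1$.
(b) More than $80,900?
• Step 1 (z-score, which assumes the earnings are approximately normally distributed)
We need to find $P(X > 80900)$.
By the z-score, $z_0 = \frac{x_0 - \mu}{\sigma} = \frac{80900 - 58500}{11200} = 2$
$$P(Z > 2) = 1 - P(Z < 2) = 1 - 0.9772$$
$$\Rightarrow P(X > 80900) = 0.0228$$
• Step 2 (Chebyshev's theorem)
$80900 = 58500 + 11200k \Rightarrow k = 2$
$$P(|X - \mu| < 2s) \ge 1 - \frac{1}{2^2} = 0.75$$
$$\Rightarrow P(X < \mu - 2s) + P(X > 80900) \le 1 - 0.75$$
$$\Rightarrow P(X > 80900) \le 25\%$$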
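The two steps of part (b) can be compared numerically. The sketch below uses `statistics.NormalDist` from the Python standard library for the normal tail; the normality of earnings is an assumption of Step 1, not something stated in the exercise, and `chebyshev_lower_bound` is a helper name introduced here.

```python
from statistics import NormalDist

mean, sd = 58_500, 11_200

def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (needs k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k**2

# Number of standard deviations between the mean and $80,900.
k = (80_900 - mean) / sd

# Chebyshev: at most this proportion lies outside mean +/- k*sd (both tails together).
outside_bound = 1 - chebyshev_lower_bound(k)

# Normal model (assumption): upper-tail probability beyond $80,900.
normal_tail = 1 - NormalDist(mean, sd).cdf(80_900)

print(k, outside_bound, round(normal_tail, 4))
```

Chebyshev's bound holds for any distribution and is therefore much looser than the value obtained under the normal model.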
(c) How likely is it that someone earns more than $100, 000 ?
Exercise 22. Costs to Train Employees For a certain type of job, it costs a company an average of
$231 to train an employee to perform the task. The standard deviation is $5. Find the minimum
percentage of data values that will fall in the range of $219 to $243. Use Chebyshev’s theorem.
Solution :
• Step 1
$$\begin{cases} \mu - k\sigma = 219 \\ \mu + k\sigma = 243 \end{cases} \iff \begin{cases} 231 - 5k = 219 \\ 231 + 5k = 243 \end{cases}$$
Subtracting the first equation from the second gives $10k = 24$, so $k = 2.4$.
By Chebyshev's theorem: $P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} = 1 - \frac{1}{(2.4)^2} \approx 0.83 = 83\%$
• Step 2
We have mean = $231
Standard deviation = $5
Formula: $1 - \frac{1}{k^2}$
First, we find how many standard deviations $219 and $243 are from the mean of $231. Subtracting
shows that both bounds are $12 from the mean, so neither is closer, and $k = 12/5 = 2.4$.
Therefore, approximately 95% of the students complete the exam in 26 to 62 minutes.
Exercise 24. Exam Grades Which of these exam grades has a better relative position?
(a) A grade of 82 on a test with x̄ = 85 and s = 6.
(b) A grade of 56 on a test with x̄ = 60 and s = 5.
Solution :
(a) A grade of 82 on a test with $\bar{x} = 85$ and $s = 6$.
⇒ the z-score for the grade of 82 is $z_1 = \frac{82 - 85}{6} = -0.5$
(b) A grade of 56 on a test with $\bar{x} = 60$ and $s = 5$.
⇒ the z-score for the grade of 56 is $z_2 = \frac{56 - 60}{5} = -0.8$
In conclusion, $z_1 > z_2$.
Hence, a grade of 82 has a better relative position than a grade of 56.
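The comparison of relative positions reduces to comparing two z-scores. A minimal sketch, where `z_score` is a helper name introduced for illustration:

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

z1 = z_score(82, 85, 6)   # grade of 82 on the first test
z2 = z_score(56, 60, 5)   # grade of 56 on the second test

# The larger z-score corresponds to the better relative position.
better = "82" if z1 > z2 else "56"
print(z1, z2, better)
```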
$$Q_2 = \frac{X_3 + X_4}{2} = \frac{511 + 514}{2} = 512.5$$
Therefore, 400 is an outlier.
Check the data set for any data value that is smaller than $Q_1 - 1.5(IQR)$ or larger than $Q_3 + 1.5(IQR)$.
Exercise 27. The following sample data are the midterm examination test scores for 30 students:
55 60 91 85 60 70 89 99 59 67
72 82 60 68 57 74 64 70 68 91
89 90 83 40 79 85 71 80 76 81
a. Find the mean, mode, median, variance, standard deviation, Q1 , and Q3 of the data.
b. Construct a frequency table with 5 classes.
c. Using the grouped data formula, find the mean, mode, median, variance, standard deviation,
Q1 , and Q3 for the table in part (b) and compare it to the results in part (a).
d. Construct a histogram and comment on the shape of the distribution.
e. Find the percentile values of 55,60 , and 74 .
Solution :
a. Find the mean, mode, median, variance, standard deviation, Q1 , and Q3 of the data.
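We have: MAX = 99, MIN = 40, I = 12
• Mean: $\bar{X} = \frac{\sum x}{n}$, with $\sum x = 2215$ and $n = 30$ $\Rightarrow \bar{X} = \frac{2215}{30} = 73.83$
• Mode = 60 (it appears 3 times, more often than any other data value)
• Median: $MD = \frac{X_{15} + X_{16}}{2} = \frac{72 + 74}{2} = 73$
• $Q_1$: position $C = \frac{n \times p}{100} = \frac{30 \times 25}{100} = 7.5 \approx 8$
The position for $Q_1$ is 8, so we have: $Q_1 = \frac{X_8 + X_9}{2} = \frac{64 + 67}{2} = 65.5$
• $Q_3$: position $C = \frac{n \times p}{100} = \frac{30 \times 75}{100} = 22.5 \approx 23$
The position for $Q_3$ is 23, so we have: $Q_3 = \frac{X_{23} + X_{24}}{2} = \frac{85 + 85}{2} = 85$
• Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{5294.17}{30 - 1} = 182.56$
• Standard deviation: $s = \sqrt{s^2} = \sqrt{182.56} = 13.51$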
Column1
Mean 73.83
Standard Error 2.46
Median 73
Mode 60
Standard Deviation 13.51
Variance 182.55
b. Construct a frequency table with 5 classes.
Recall: We have MAX = 99, MIN = 40, I = 12, K = 5
lower limit upper limit lower boundary upper boundary Midpoint frequency
40 51 39.5 51.5 45.5 1
52 63 51.5 63.5 57.5 6
64 75 63.5 75.5 69.5 9
76 87 75.5 87.5 81.5 8
88 99 87.5 99.5 93.5 6
30
Class limits   Frequency (f)   Midpoint (Xm)   f·Xm     f·Xm²
40 − 51        1               45.5            45.5     2070.25
52 − 63        6               57.5            345      19837.5
64 − 75        9               69.5            625.5    43472.25
76 − 87        8               81.5            652      53138
88 − 99        6               93.5            561      52453.5
Total                                          2229     170971.5
• Mean: $\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{2229}{30} = 74.3$
• Mode = modal class = the class with the largest frequency.
• The modal class is 64 − 75 (class limits) or 63.5 − 75.5 (class boundaries).
• Median: $MD = L_m + \frac{w}{f}(0.5n - cf)$
• $L_m = 63.5$, $w = 12$, $f = 9$, $n = 30$, $cf = 7$
• $MD = 63.5 + \frac{12}{9}(0.5 \times 30 - 7) = 74.16$
• Variance: $s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)} = \frac{30(170971.5) - (2229)^2}{30(30 - 1)} = \frac{160704}{870} = 184.71$
• Standard deviation: $s = \sqrt{s^2} = \sqrt{184.71} = 13.59$
• For $p = 25$: $C = \frac{n \times p}{100} = \frac{30 \times 25}{100} = 7.5 \approx 8$
This gives $Q_1 = 65.5$ and $Q_3 = 85$, so the grouped data and the raw data of part (a) give similar
values of $Q_1$ and $Q_3$.
Therefore, from the grouped data we get: Mean $\bar{X} = 74.3$, Mode = 63.5, $MD = 74.16$, $s^2 = 184.71$, $s = 13.59$,
$Q_1 = 65.5$, $Q_3 = 85$
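The grouped-data formulas for the mean, variance, and median can be checked against the frequency table of part (b). This sketch assumes integer class limits, so each class boundary lies 0.5 below the lower limit; the variable names are illustrative.

```python
# Frequency table from part (b): (lower limit, upper limit, frequency).
classes = [(40, 51, 1), (52, 63, 6), (64, 75, 9), (76, 87, 8), (88, 99, 6)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]

sum_fx = sum(f * m for (_, _, f), m in zip(classes, midpoints))        # sum of f * Xm
sum_fx2 = sum(f * m * m for (_, _, f), m in zip(classes, midpoints))   # sum of f * Xm^2

mean = sum_fx / n
variance = (n * sum_fx2 - sum_fx**2) / (n * (n - 1))
std_dev = variance ** 0.5

# Median: locate the class containing the (n/2)-th value, then interpolate.
half, cumulative = n / 2, 0
for lo, hi, f in classes:
    if cumulative + f >= half:
        lower_boundary = lo - 0.5          # boundary of the median class
        width = hi - lo + 1                # class width
        median = lower_boundary + (half - cumulative) / f * width
        break
    cumulative += f

print(mean, round(variance, 2), round(std_dev, 2), round(median, 2))
```

Small differences in the last digit relative to the hand computation come from rounding versus truncation.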
Comment: The shape of the distribution can be described as bimodal. There are 2 classes
that occur with the same frequency.
e. Find the percentile values of 55,60 , and 74 .
6.3 2.9 4.5 1.1 1.8 4.0 1.2 3.1 2.0 4.0
7.0 2.8 4.3 5.3 2.9 8.3 4.4 2.8 3.1 5.6
4.5 4.5 5.7 0.5 6.2 3.7 0.9 2.4 3.0 3.5
(a) Find the mean, mode, median, variance, standard deviation, Q1 , Q3 , and 90 th percentile.
(b) Construct a frequency table with 5 classes.
(c) Using the grouped data formula, find the mean, mode, median, variance, standard deviation,
Q1 , Q3 and 90 th percentile for the frequency table constructed in part (b) and compare it to the
results in part (a).
(d) Construct a histogram, and comment on the shape of the data.
Solution :
(a) Find the mean, mode, median, variance, standard deviation, Q1 , Q3 , and 90 th percentile.
• Mean: $\bar{X} = \frac{\sum x}{n} = 3.74$
• Mode = 4.5 (it appears 3 times, more often than any other data value)
• Median $= \frac{X_{15} + X_{16}}{2} = \frac{3.5 + 3.7}{2} = 3.6$
• Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
With sample mean $\bar{X} = 3.74$ and $n = 30$,
we get $s^2 = \frac{101.55}{29} = 3.50$
• Standard deviation: $s = \sqrt{s^2} = 1.87$
• Find $Q_1$ and $Q_3$
• For $p = 25$: $C = \frac{n \times p}{100} = \frac{30 \times 25}{100} = 7.5 \Rightarrow Q_1 = \frac{X_8 + X_9}{2} = \frac{2.8 + 2.8}{2} = 2.8$
• For $p = 75$: $C = \frac{n \times p}{100} = \frac{30 \times 75}{100} = 22.5 \Rightarrow Q_3 = \frac{X_{23} + X_{24}}{2} = \frac{4.5 + 5.3}{2} = 4.9$
Therefore, $Q_1 = 2.8$, $Q_3 = 4.9$
• Find the value corresponding to the 90th percentile.
• For $p = 90$: $C = \frac{n \times p}{100} = \frac{30 \times 90}{100} = 27$
Hence, the data value corresponding to the 90th percentile is $\frac{X_{27} + X_{28}}{2} = \frac{6.2 + 6.3}{2} = 6.25$
(b) Construct a frequency table with 5 classes.
lower limit upper limit lower boundary upper boundary Midpoint frequency
0.5 2.05 0 2.55 1.275 6
2.06 3.61 1.56 4.11 2.835 9
3.62 5.17 3.12 5.67 4.395 8
5.18 6.73 4.68 7.23 5.955 5
6.74 8.29 6.24 8.79 7.515 1
29
(c) Using the grouped data formula, find the mean, mode, median, variance, standard deviation,
Q1 , Q3 and 90th percentile for the frequency table constructed in part (b) and compare it to the
Exercise 29. In recent years, due to low interest rates, many homeowners refinanced their home
mortgages. Linda Lahey is a mortgage officer at Down River Federal Savings and Loan. Below is
the amount refinanced for 20 loans she processed last week. The data are reported in thousands of
dollars and arranged from smallest to largest.
59.2 59.5 61.6 65.5 66.6 72.9 74.8 77.3 79.2 83.7
85.6 85.8 86.6 87.0 87.1 90.2 93.3 98.6 100.2 100.7
$$C = \frac{n \times p}{100} = \frac{20 \times 26}{100} = 5.2$$
$$C = \frac{n \times p}{100} = \frac{20 \times 83}{100} = 16.6$$
Thus, the value corresponding to the 83rd percentile is $L_{83} = x_{17} = 93.3$
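The position rule used here can be written as a small function. This is a sketch under one common textbook convention, assumed for illustration: compute C = n·p/100; if C is not a whole number, round it up and take that ordered value; if C is whole, average the C-th and (C+1)-th values. Statistical software often uses other interpolation rules and may return slightly different values. The name `percentile_value` is hypothetical.

```python
import math

# Amounts refinanced (thousands of dollars) from Exercise 29, already sorted.
loans = [59.2, 59.5, 61.6, 65.5, 66.6, 72.9, 74.8, 77.3, 79.2, 83.7,
         85.6, 85.8, 86.6, 87.0, 87.1, 90.2, 93.3, 98.6, 100.2, 100.7]

def percentile_value(sorted_data, p):
    """Value at the p-th percentile (integer p, 0 < p < 100) by the position rule."""
    n = len(sorted_data)
    numerator = n * p                      # C = numerator / 100
    if numerator % 100 == 0:               # C is a whole number: average two neighbours
        c = numerator // 100
        return (sorted_data[c - 1] + sorted_data[c]) / 2
    c = math.ceil(numerator / 100)         # otherwise round C up
    return sorted_data[c - 1]              # positions are 1-based

print(percentile_value(loans, 83))   # C = 16.6 -> 17th value
print(percentile_value(loans, 26))   # C = 5.2  -> 6th value
```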
c. Draw a box plot of the data and comment on the shape of the distribution.
Exercise 30. Hours Worked The data shown here represent the number of hours that 12 part-
time employees at a toy store worked during the weeks before and after Christmas. Construct two
boxplots and compare the distributions.
Before 38 16 18 24 12 30 35 32 31 30 24 35
After 26 15 12 18 24 32 14 18 16 18 22 12
Solution :
Construct two boxplots and compare the distributions.
Before 38 16 18 24 12 30 35 32 31 30 24 35
After 26 15 12 18 24 32 14 18 16 18 22 12
Exercise 31. Many times in statistics it is necessary to see if a set of data values is approximately
normally distributed. There are special techniques that can be used. One technique is to draw a
histogram for the data and see if it is approximately bell-shaped. (Note: It does not have to be
exactly symmetric to be bell-shaped.) The numbers of branches of the 50 top libraries are shown.
67 84 80 77 97 59 62 37 33 42
36 54 18 12 19 33 49 24 25 22
24 29 9 21 21 24 31 17 15 21
13 19 19 22 22 30 41 22 18 20
26 33 14 14 16 22 26 10 16 24
$2^k \ge n = 50$
$2^6 = 64 \ge 50 \Rightarrow k = 6$
We have MAX = 97, MIN = 9, K = 6, and class width $i = \frac{\max - \min}{k} = \frac{97 - 9}{6} = 14.67 \approx 15$
lower limit upper limit lower boundary upper boundary Midpoint frequency
9 23 8.5 23.5 16 24
24 38 23.5 38.5 31 15
39 53 38.5 53.5 46 3
54 68 53.5 68.5 61 4
69 83 68.5 83.5 76 2
84 98 83.5 98.5 91 2
50
X̄ = 31.5
• Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = 424.36$
• Standard deviation: $s = \sqrt{s^2} = 20.6$
6. What percent of the data values fall within 1 standard deviation of the mean?
Of the 50 data values, 46 lie between −10.02 and 72.62. Therefore, the
percentage of data values that fall within 2 standard deviations of the mean is
$$P(-10.02 < X < 72.62) = \frac{\#\,\text{values in the interval}}{\#\,\text{all values}} \times 100 = \frac{46}{50} \times 100 = 92\%$$
8. What percent of the data values fall within 3 standard deviations of the mean?
Of the 50 data values, 49 lie between −30.68 and 93.28. Therefore, the
percentage of data values that fall within 3 standard deviations of the mean is
$$P(-30.68 < X < 93.28) = \frac{\#\,\text{values in the interval}}{\#\,\text{all values}} \times 100 = \frac{49}{50} \times 100 = 98\%$$
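The counts can be reproduced directly from the raw data. This sketch recomputes the mean and sample standard deviation with the `statistics` module instead of using the rounded values above, so the interval endpoints may differ slightly in the last digits while the counts stay the same; `percent_within` is a helper name chosen here.

```python
import statistics

# Numbers of branches of the 50 libraries from Exercise 31.
branches = [
    67, 84, 80, 77, 97, 59, 62, 37, 33, 42,
    36, 54, 18, 12, 19, 33, 49, 24, 25, 22,
    24, 29,  9, 21, 21, 24, 31, 17, 15, 21,
    13, 19, 19, 22, 22, 30, 41, 22, 18, 20,
    26, 33, 14, 14, 16, 22, 26, 10, 16, 24,
]

mean = statistics.mean(branches)
sd = statistics.stdev(branches)      # sample standard deviation (n - 1)

def percent_within(k):
    """Percentage of values strictly inside mean +/- k standard deviations."""
    low, high = mean - k * sd, mean + k * sd
    count = sum(low < x < high for x in branches)
    return 100 * count / len(branches)

for k in (1, 2, 3):
    print(k, percent_within(k))
```

For a normal distribution these percentages would be close to 68%, 95%, and 99.7%, which gives a reference point for judging the shape.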
9. Does your answer help support the conclusion you reached in question 4? Explain.
The answers from 6, 7, 8 does not support the conclusion from number distribution. That mean,
this distribution is approximately normal.
I3-TD2
Point Estimation
Exercise 1. Data on pull-off force (pounds) for connectors used in an automobile engine application
are as follows
79.3 75.1 78.2 74.1 73.9 75.0 77.6 77.3 73.8 74.6 75.5 74.0 74.7
75.9 72.9 73.8 74.2 78.1 75.4 76.3 75.3 76.2 74.9 78.0 75.1 76.8
(a) Calculate a point estimate of the mean pull-off force of all connectors in the population. State
which estimator you used and why.
(b) Calculate a point estimate of the pull-off force value that separates the weakest 50% of the
connectors in the population from the strongest 50%.
(c) Calculate point estimates of the population variance and the population standard deviation.
(d) Calculate the standard error of the point estimate found in part (a). Interpret the standard
error.
(e) Calculate a point estimate of the proportion of all connectors in the population whose pull-off
force is less than 73 pounds
Solution :
$$\hat{\theta} = \bar{X} = \frac{1}{26}\sum_{i=1}^{26} x_i = 75.61538$$
than 73pounds
Then, $\hat{p} = \frac{1}{26} = 0.03846$
Therefore, $\hat{p} = 0.03846$ ■
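The point estimates asked for in parts (a) to (e) can all be computed from the sample with the standard library. This is a sketch using the usual estimators (sample mean, sample median, sample variance with n − 1, and sample proportion); the variable names are illustrative.

```python
import statistics

# Pull-off force (pounds) for the 26 connectors in Exercise 1.
force = [79.3, 75.1, 78.2, 74.1, 73.9, 75.0, 77.6, 77.3, 73.8, 74.6, 75.5, 74.0, 74.7,
         75.9, 72.9, 73.8, 74.2, 78.1, 75.4, 76.3, 75.3, 76.2, 74.9, 78.0, 75.1, 76.8]
n = len(force)

mean_hat = statistics.fmean(force)             # (a) estimate of the population mean
median_hat = statistics.median(force)          # (b) value separating weakest/strongest 50%
var_hat = statistics.variance(force)           # (c) sample variance (divides by n - 1)
sd_hat = statistics.stdev(force)               # (c) sample standard deviation
std_error = sd_hat / n ** 0.5                  # (d) estimated standard error of the mean
p_hat = sum(x < 73 for x in force) / n         # (e) proportion below 73 pounds

print(round(mean_hat, 5), median_hat, round(var_hat, 3),
      round(sd_hat, 3), round(std_error, 3), round(p_hat, 5))
```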
Exercise 2. (a) A random sample of 10 houses in a particular area, each of which is heated
with natural gas, is selected and the amount of gas (therms) used during the month of January is
determined for each house. The resulting observations are
Let µ denote the average gas usage during January by all houses in this area. Compute a point
estimate of µ
(b) Suppose there are 10,000 houses in this area that use natural gas for heating. Let t denote the
total amount of gas used by all of these houses during January. Estimate t using the data of
part (a). What estimator did you use in computing your estimate?
(c) Use the data in part (a) to estimate p, the proportion of all houses that used at least 100
therms.
(d) Give a point estimate of the population median usage (the middle value in the population of
all houses) based on the sample of part (a). What estimator did you use?
Solution :
$\hat{t} = 10000 \times \hat{\mu} = 120700$ therms. ■
The estimator used is 10000 times the point estimate of the average gas usage during January,
computed from the 10 sampled houses in the area.
(c) Use the data in part (a) to estimate p.
$$\hat{p} = \frac{x}{n} = \frac{8}{10} = 0.8$$
Therefore, $\hat{p} = 0.8$ ■
(d) Give a point estimate of the population median usage.
$$\hat{\tilde{\mu}} = MD = \frac{x_5 + x_6}{2} = 120$$
Therefore, $\hat{\tilde{\mu}} = 120$ ■
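$$S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
and compute $E(S^2)$
Solution :
Show that $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ is an unbiased estimator of $\sigma^2$.
We have
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
$$E(S^2) = \frac{1}{n-1}E\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
$$E(S^2) = \frac{1}{n-1}\sum_{i=1}^{n} E\left(X_i^2\right) - \frac{1}{n(n-1)}E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right]$$
We have $E(X_i^2) = \sigma^2 + \mu^2$ and $E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right] = n\sigma^2 + E^2\left(\sum_{i=1}^{n} X_i\right)$,
so we obtain $E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right] = n\sigma^2 + (n\mu)^2$
Thus, $E(S^2) = \frac{1}{n-1}\left(n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right) = \sigma^2$
Therefore, $S^2$ is an unbiased estimator of $\sigma^2$ ■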
Solution :
(a) Show that $\hat{p} = \frac{X}{n}$ is an unbiased estimator of $p$
Since $X \sim \text{Bin}(n, p)$,
then $E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{np}{n} = p$
Therefore, $\hat{p}$ is an unbiased estimator of $p$ ■
(b) Show that the standard error of $\hat{p}$ is $\sqrt{p(1-p)/n}$
Let $\sigma_{\hat{p}}$ be the standard error.
So, $\sigma_{\hat{p}} = \sqrt{V(\hat{p})} = \sqrt{\frac{1}{n^2} \times np(1-p)} = \sqrt{p(1-p)/n}$
Therefore, the standard error of $\hat{p}$ is $\sqrt{p(1-p)/n}$ ■
Since the population standard deviation is rarely known, the standard error of a proportion is usually
estimated as the sample standard deviation divided by the square root of the sample size, that is, $\sqrt{\hat{p}(1-\hat{p})/n}$.
Exercise 5. Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a distribution with mean $\mu$ and
variance $\sigma^2$ and let $a_1, \ldots, a_n$ be real numbers such that $\sum_{i=1}^{n} a_i = 1$.
Define $\hat{X} = \sum_{i=1}^{n} a_i X_i$
Solution :
(a) Show that $\hat{X}$ is an unbiased estimator of $\mu$.
We have $\hat{X} = \sum_{i=1}^{n} a_i X_i$
Then, $E(\hat{X}) = E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i)$
Since $E(X_i) = \mu$ for every $i$, $E(\hat{X}) = \mu \sum_{i=1}^{n} a_i = \mu$
Therefore, $\hat{X}$ is an unbiased estimator of $\mu$ ■
(b) Show that $V(\bar{X}) \le V(\hat{X})$
We have $V(\hat{X}) = \sigma^2 \sum_{i=1}^{n} a_i^2$ and $V(\bar{X}) = \frac{1}{n^2} V\left(\sum_{i=1}^{n} X_i\right) = \sigma^2/n$
By the Cauchy-Schwarz inequality:
$(a_1 b_1 + \ldots + a_n b_n)^2 \le (a_1^2 + \ldots + a_n^2)(b_1^2 + \ldots + b_n^2)$ for all real $a_i$, $b_i$
Take $b_1 = \ldots = b_n = 1$; since $\sum_{i=1}^{n} a_i = 1$, then $a_1^2 + \ldots + a_n^2 \ge \frac{1}{n}$
Therefore, $V(\bar{X}) \le V(\hat{X})$ ■
Exercise 6. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with unknown mean
$-\infty < \mu < +\infty$, and unknown variance $\sigma^2 > 0$. Show that the statistics $\bar{X}$ and
$$Y = \frac{X_1 + 2X_2 + \ldots + nX_n}{\frac{n(n+1)}{2}}$$
are both unbiased estimators of $\mu$. Further, show that $V(\bar{X}) < V(Y)$
Exercise 7. Using a long rod that has length $\mu$, you are going to lay out a square plot in which the
length of each side is $\mu$. Thus the area of the plot will be $\mu^2$. However, you do not know the value
of $\mu$, so you decide to make $n$ independent measurements $X_1, X_2, \ldots, X_n$ of the length. Assume
that each $X_i$ has mean $\mu$ (unbiased measurements) and variance $\sigma^2$.
(a) Show that $\bar{X}^2$ is not an unbiased estimator for $\mu^2$. [Hint: For any rv $Y$, $E(Y^2) = V(Y) + [E(Y)]^2$.
Apply this with $Y = \bar{X}$.]
(b) For what value of $k$ is the estimator $\bar{X}^2 - kS^2$ unbiased for $\mu^2$? [Hint: Compute $E(\bar{X}^2 - kS^2)$.]
Solution :
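Thus,
$$E\left(\bar{X}^2\right) = V(\bar{X}) + E^2\left(\bar{X}\right)$$
Then,
$$E\left(\bar{X}^2\right) = \frac{\sigma^2}{n} + \mu^2 \ne \mu^2$$
Therefore, $\bar{X}^2$ is not an unbiased estimator for $\mu^2$ ■
(b) For what value of $k$ is the estimator $\bar{X}^2 - kS^2$ unbiased for $\mu^2$?
We have $E\left(\bar{X}^2 - kS^2\right) = E\left(\bar{X}^2\right) - kE\left(S^2\right) = \frac{\sigma^2}{n} + \mu^2 - k\sigma^2 = \mu^2$ (the condition for unbiasedness)
Then, we obtain $k = \frac{1}{n}$
Therefore, $k = \frac{1}{n}$ ■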
Exercise 8. Let $X_1, X_2, \ldots, X_n$ be uniformly distributed on the interval $[0, \theta]$. Recall that the
maximum likelihood estimator of $\theta$ is $\hat{\theta} = \max(X_i)$.
Use this result to show that the maximum likelihood estimator for $\theta$ is biased.
(e) We have two unbiased estimators for $\theta$: the moment estimator $\hat{\theta}_1 = 2\bar{X}$ and
$\hat{\theta}_2 = [(n+1)/n]\max(X_i)$, where $\max(X_i)$ is the largest observation in a random sample of size $n$.
It can be shown that $V(\hat{\theta}_1) = \theta^2/(3n)$ and that $V(\hat{\theta}_2) = \theta^2/[n(n+2)]$. Show that if $n > 1$, $\hat{\theta}_2$ is a better
estimator than $\hat{\theta}_1$. In what sense is it a better estimator of $\theta$?
Solution :
$$B(\hat{\theta}) = E(\hat{\theta}) - \theta = \frac{n}{n+1}\theta - \theta = -\frac{\theta}{n+1}$$
so its pdf is given by: $f_Y(y) = \frac{n y^{n-1}}{\theta^n}$ for $0 \le y \le \theta$ and zero elsewhere.
Use this result to show that the maximum likelihood estimator for $\theta$ is biased.
We have $E(\hat{\theta}) = \int_0^{\theta} \frac{n y^n}{\theta^n}\,dy = \frac{n}{n+1}\theta$
Therefore, it is a biased estimator ■
(e) Given two unbiased estimators for $\theta$: $\hat{\theta}_1 = 2\bar{X}$ and $\hat{\theta}_2 = \frac{n+1}{n}\max(X_i)$
And we have $V(\hat{\theta}_1) = \frac{\theta^2}{3n}$ and $V(\hat{\theta}_2) = \frac{\theta^2}{n(n+2)}$
For $n \ge 1$ we have $V(\hat{\theta}_1) \ge V(\hat{\theta}_2)$, with strict inequality when $n > 1$, so $\hat{\theta}_2$ has the smaller variance.
Solution :
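(a) Show that the maximum likelihood estimator for $\lambda$ is $\hat{\lambda} = \bar{X}$
We have $X_1, \ldots, X_n \sim \text{Poi}(\lambda)$
Then $p(x_i) = \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}$ for $x_i \ge 0$
The likelihood function is $L(x; \lambda) = \frac{e^{-n\lambda}\lambda^{\sum_{i=1}^{n} x_i}}{x_1!\,x_2! \ldots x_n!}$
So, $\ln L(x; \lambda) = -n\lambda + \ln\left(\frac{\lambda^{\sum_{i=1}^{n} x_i}}{x_1!\,x_2! \ldots x_n!}\right)$
Then, $\frac{\partial}{\partial\lambda}\ln L(x; \lambda) = -n + \left(\sum_{i=1}^{n} x_i\right)\lambda^{\sum_{i=1}^{n} x_i - 1} \times \frac{1}{\lambda^{\sum_{i=1}^{n} x_i}}$
• If $\frac{\partial}{\partial\lambda}\ln L(x; \lambda) = 0 \iff -n + \frac{\sum_{i=1}^{n} x_i}{\lambda} = 0$
Therefore, $\hat{\lambda}_{MLE} = \bar{X}$ ■
(b) Find the maximum likelihood estimate of $\lambda$.
We have $\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{40} x_i$
Then, $\hat{\lambda} = 2.075$ flaws/feet
Therefore, the point estimate of $\lambda$ is 2.075 flaws/feet ■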
Solution :
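(a) Show that the maximum likelihood estimator of $\theta$ is $\hat{\theta} = -\frac{1}{n}\sum_{i=1}^{n}\ln(X_i)$
We have $f(x_i; \theta) = \frac{1}{\theta}x_i^{\frac{1-\theta}{\theta}}$, $0 < x < 1$ and $0 < \theta < \infty$
The likelihood function is $L(x; \theta) = \prod_{i=1}^{n} f(x_i; \theta) = \frac{1}{\theta^n}\prod_{i=1}^{n} x_i^{\frac{1-\theta}{\theta}}$
Then,
$$\ln L(x; \theta) = -n\ln(\theta) + \frac{1-\theta}{\theta}\sum_{i=1}^{n}\ln(x_i)$$
$$\frac{\partial}{\partial\theta}\ln L(x; \theta) = -\frac{n}{\theta} - \frac{1}{\theta^2}\sum_{i=1}^{n}\ln(x_i)$$
$$\frac{\partial}{\partial\theta}\ln L(x; \theta) = 0 \iff -\frac{n}{\theta} - \frac{1}{\theta^2}\sum_{i=1}^{n}\ln(x_i) = 0$$
Then, $\theta = -\frac{1}{n}\sum_{i=1}^{n}\ln(x_i)$
Therefore, $\hat{\theta}_{MLE} = -\frac{1}{n}\sum_{i=1}^{n}\ln(x_i)$ ■
(b) Show that $E(\hat{\theta}) = \theta$
$$E\left(-\frac{1}{n}\sum_{i=1}^{n}\ln(X_i)\right) = -\frac{1}{n}\sum_{i=1}^{n}E(\ln X_i)$$
We have $E(\ln X) = \int_0^1 \ln x\, f(x; \theta)\,dx = \frac{1}{\theta}\int_0^1 x^{\frac{1-\theta}{\theta}}\ln x\,dx$
By the change of variable $u = \ln x$, so that $du = \frac{1}{x}dx$,
$E(\ln X) = \frac{1}{\theta}\int_{-\infty}^{0} u\,e^{u/\theta}\,du = -\theta$ (integrating by parts)
Therefore, $E\left(-\frac{1}{n}\sum_{i=1}^{n}\ln(X_i)\right) = -\frac{1}{n}\sum_{i=1}^{n}E(\ln X_i) = \theta$
Therefore, $E(\hat{\theta}) = \theta$ ■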
Exercise 11. Let X1 , X2 , . . . , Xn be a random sample of size n from the exponential distribution
whose pdf is $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$, $0 < \theta < \infty$.
(a) Show that X̄ is an unbiased estimator of θ.
(b) Show that the variance of X̄ is θ2 /n.
(c) What is a good estimate of θ if a random sample of size 5 yielded the sample values 3.5, 8.1, 0.9, 4.4,
and 0.5 ?
Solution :
Exercise 12. A diagnostic test for a certain disease is applied to n individuals known to not have
the disease. Let X = the number among the n test results that are positive (indicating presence
of the disease, so X is the number of false positives) and p = the probability that a disease-free
individual’s test result is positive (i.e., p is the true proportion of test results from disease-free
individuals that are positive). Assume that only X is available rather than the actual sequence of
test results.
(a) Derive the maximum likelihood estimator of p. If n = 20 and x = 3, what is the estimate?
(b) Is the estimator of part (a) unbiased?
(c) If $n = 20$ and $x = 3$, what is the MLE of the probability $(1 - p)^5$ that none of the next five
tests done on disease-free individuals are positive?
Solution :
Let X = the number among the n test results that are positive.
p = the probability that a disease-free individual’s test result is positive.
(a) Derive the maximum likelihood estimator of $p$. If $n = 20$ and $x = 3$, what is the estimate?
We have $X \sim \text{Bin}(n, p)$
So, $P(X = x) = C(n, x)\,p^x (1-p)^{n-x}$
The likelihood function is $L(x; p) = C(n, x)\,p^x (1-p)^{n-x}$
Then,
$$\ln L(x; p) = \ln\left(C(n, x)\,p^x (1-p)^{n-x}\right) = \ln C(n, x) + \ln\left(p^x\right) + \ln(1-p)^{n-x}$$
So, $\frac{\partial}{\partial p}\ln L(x; p) = \frac{x}{p} - \frac{n-x}{1-p}$
• If $\frac{\partial}{\partial p}\ln L(x; p) = 0 \iff \frac{x}{p} - \frac{n-x}{1-p} = 0$, then $\hat{p} = \frac{x}{n}$
Therefore, $\hat{p} = \frac{x}{n}$ ■
Using the given data we get $\hat{p} = \frac{3}{20} = 0.15$
(b) Is the estimator of part (a) unbiased?
We have $\hat{p} = \frac{X}{n}$; then $E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = p$
Therefore, it is unbiased ■
(c) What is the MLE of the probability $(1 - p)^5$?
For $n = 20$ and $x = 3$, $\hat{p} = 0.15$. Thus, $(1 - \hat{p})^5 = (1 - 0.15)^5 = 0.85^5 = 0.443$
Therefore, $(1 - \hat{p})^5 = 0.443$ ■
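Part (c) relies on the invariance principle: the MLE of a function of p is that function evaluated at the MLE of p. A minimal numerical sketch with illustrative variable names:

```python
# Exercise 12: n disease-free individuals tested, x false positives observed.
n, x = 20, 3

# MLE of p for a binomial observation.
p_hat = x / n

# Invariance principle: the MLE of g(p) = (1 - p)**5 is g(p_hat).
prob_no_positives_hat = (1 - p_hat) ** 5

print(p_hat, round(prob_no_positives_hat, 4))
```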
Exercise 13. The shear strength of each of ten test spot welds is determined, yielding the following
data (psi):
392 376 401 367 389 362 409 415 358 375
(a) Assuming that shear strength is normally distributed, estimate the true average shear strength
and standard deviation of shear strength using the method of maximum likelihood.
(b) Again assuming a normal distribution, estimate the strength value below which 95% of all
welds will have their strengths. [Hint: What is the 95 th percentile in terms of µ and σ ? Now
use the invariance principle.]
Solution :
(a) Assuming that shear strength is normally distributed, estimate the true average shear strength
and standard deviation of shear strength using the method of maximum likelihood.
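Since $X \sim N(\mu; \sigma^2)$,
then, by the previous exercise, we get $\hat{\mu} = \bar{X} = \frac{\sum_{i=1}^{10} x_i}{10} = 384.4$
and for $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ we have
$$L(x; \sigma) = \left(2\pi\sigma^2\right)^{-\frac{n}{2}} \times e^{-\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}}$$
$$\ln L(x; \sigma) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$
$$\frac{\partial}{\partial\sigma}\ln L(x; \sigma) = -\frac{n}{2} \times \frac{2}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2$$
• If $\frac{\partial}{\partial\sigma}\ln L(x; \sigma) = 0 \iff -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$
Thus, $\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2}$
Here $\sum_{i=1}^{10}(x_i - 384.4)^2 = 3556.4$.
Therefore, we have $\hat{\mu} = \bar{X} = 384.4$ and $\hat{\sigma} = \sqrt{\frac{3556.4}{10}} = \sqrt{355.64} \approx 18.86$ ■
(b) Again assuming a normal distribution, estimate the strength value below which 95% of all
welds will have their strengths. [Hint: What is the 95th percentile in terms of $\mu$ and $\sigma$? Now
use the invariance principle.]
We have $P(X \le c) = 0.95$
Since $P\left(\frac{X - \mu}{\sigma} \le \frac{c - \mu}{\sigma}\right) = 0.95$,
$\Phi\left(\frac{c - \mu}{\sigma}\right) = 0.95$
Then, $\hat{c} = 1.65\hat{\sigma} + \hat{\mu}$ (by the invariance principle)
Therefore, the estimate of the strength is $\hat{c} = 1.65(18.86) + 384.4 \approx 415.52$ ■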
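The maximum likelihood estimates and the estimated 95th percentile can be checked numerically. This sketch uses `statistics.pstdev`, which divides by n and therefore matches the MLE of σ (unlike `stdev`, which divides by n − 1), and `NormalDist().inv_cdf(0.95)` for the standard normal quantile (about 1.645 rather than the rounded table value 1.65).

```python
from statistics import NormalDist, fmean, pstdev

# Shear strengths (psi) of the ten spot welds in Exercise 13.
strengths = [392, 376, 401, 367, 389, 362, 409, 415, 358, 375]

mu_hat = fmean(strengths)         # MLE of the mean
sigma_hat = pstdev(strengths)     # MLE of sigma: sqrt of (1/n) * sum of squared deviations

# 95th percentile of N(mu, sigma^2) is mu + z_0.95 * sigma; by the invariance
# principle its MLE is obtained by plugging in mu_hat and sigma_hat.
z95 = NormalDist().inv_cdf(0.95)
c_hat = mu_hat + z95 * sigma_hat

print(round(mu_hat, 1), round(sigma_hat, 2), round(c_hat, 2))
```

Using the unrounded quantile gives a value a little below the one obtained with 1.65.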
Exercise 14. At time t = 0, 20 identical components are tested. The lifetime distribution of each is
exponential with parameter λ. The experimenter then leaves the test facility unmonitored. On his
return 24 hours later, the experimenter immediately terminates the test after noticing that y = 15
of the 20 components are still in operation (so 5 have failed). Derive the MLE of λ. [Hint: Let
Y = the number that survive 24 hours. Then Y ∼ Bin(n, p). What is the mle of p ? Now notice
that p = P (Xi ≥ 24), where Xi is exponentially distributed. This relates λ to p, so the former can
be estimated once the latter has been.]
Solution :
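Derive the MLE of $\lambda$.
Let $Y$ = the number that survive 24 hours. $Y \sim \text{Bin}(n; p)$
and we have $p = P(T_i \ge 24) = e^{-24/\lambda}$, where $\lambda$ is taken as the mean lifetime. Since $Y \sim \text{Bin}(n; p)$,
then,
$$p(y) = C(n, y)\,p^y (1-p)^{n-y}$$
where
$$\ln L(y, p) = \ln\left(C(n, y)\,p^y (1-p)^{n-y}\right) = \ln C(n, y) + y\ln p + (n - y)\ln(1 - p)$$
• If
$$\frac{\partial}{\partial p}\ln L(y, p) = 0$$
then $\hat{p} = \frac{y}{n} = \frac{15}{20} = 0.75$, and by the invariance principle $\hat{\lambda} = -\frac{24}{\ln 0.75}$
Therefore, $\hat{\lambda} = -\frac{24}{\ln 0.75} \approx 83.43$ hours ■
If $\lambda$ is instead the rate parameter, then $p = e^{-24\lambda}$ and $\hat{\lambda} = -\frac{\ln 0.75}{24} \approx 0.0120$ per hour.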
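Exercise 15. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\text{Bin}(1, p)$ (i.e., $n$ Bernoulli trials).
Thus,
$$Y = \sum_{i=1}^{n} X_i \sim \text{Bin}(n, p)$$
Solution :
$$E(\bar{X}) = E\left(\frac{Y}{n}\right) = \frac{1}{n}\sum_{i=1}^{n}E(X_i) = np \times \frac{1}{n} = p \quad ■$$
$$V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}V(X_i) = \frac{1}{n^2} \times np(1-p) = \frac{p(1-p)}{n}$$
Therefore, $\text{Var}(\bar{X}) = \frac{p(1-p)}{n}$ ■
(c) Show that $E[\bar{X}(1 - \bar{X})/n] = (n - 1)\left[p(1-p)/n^2\right]$
We have $E[\bar{X}(1 - \bar{X})/n] = \frac{1}{n}E(\bar{X}) - \frac{1}{n}E\left(\bar{X}^2\right)$
By the previous questions, $E(\bar{X}) = p$ and $E\left(\bar{X}^2\right) = \text{Var}(\bar{X}) + E^2(\bar{X}) = \frac{p(1-p) + np^2}{n}$
Then, $E[\bar{X}(1 - \bar{X})/n] = \frac{p}{n} - \frac{p^2}{n} - \frac{p(1-p)}{n^2}$
$$E[\bar{X}(1 - \bar{X})/n] = (n - 1)p(1-p)/n^2$$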
Solution :
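(a) Find the Fisher information in a single observation using two methods.
• First method.
$$I(\lambda) = -E\left[\frac{\partial^2}{\partial\lambda^2}\ln f(X; \lambda)\right]$$
$$f(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}$$
Then $\ln f(x; \lambda) = -\lambda + x\ln\lambda - \ln x! \implies \frac{\partial^2}{\partial\lambda^2}\ln f(x; \lambda) = -\frac{x}{\lambda^2}$
So, $I(\lambda) = \frac{1}{\lambda^2}E(X) = \frac{1}{\lambda}$
Therefore, $I(\lambda) = \frac{1}{\lambda}$ ■
• Second method
$$I(\lambda) = V\left[\frac{\partial}{\partial\lambda}\ln f(X; \lambda)\right] = V\left(-1 + \frac{X}{\lambda}\right) = \frac{1}{\lambda}$$
Therefore, $I(\lambda) = \frac{1}{\lambda}$ ■
(b) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of $\lambda$.
The lower bound is $\frac{1}{nI(\lambda)} = \frac{\lambda}{n}$ ■
(c) Find the mle of $\lambda$ and show that the mle is an efficient estimator.
By using the method of moments,
the 1st sample moment is $\bar{X}$ (1)
and the 1st population moment is $E(X) = \lambda$ (2)
By (1) and (2), we get $\hat{\lambda} = \bar{X}$
and we have $V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}V(X_i) = \frac{\lambda}{n}$, which is equal to the lower bound of the Cramer-Rao inequality.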
Solution :
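$$V(\tilde{\theta}) = V\left(\frac{n+1}{n}\hat{\theta}\right) = \frac{(n+1)^2}{n^2}V(\hat{\theta})$$
$$V(\hat{\theta}) = E\left(\hat{\theta}^2\right) - E^2(\hat{\theta})$$
and $E\left(\hat{\theta}^2\right) = \int_0^{\theta}\frac{n y^{n+1}}{\theta^n}\,dy = \frac{n}{n+2}\theta^2$
Thus, $V(\hat{\theta}) = \left[\frac{n}{n+2} - \frac{n^2}{(n+1)^2}\right]\theta^2$
so, $V(\tilde{\theta}) = \frac{(n+1)^2}{n^2}\left[\frac{n}{n+2} - \frac{n^2}{(n+1)^2}\right]\theta^2 = \frac{1}{n(n+2)}\theta^2$ ■
(c) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of $\theta$.
We have $I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\ln f(x; \theta)\right]$
Then, $f(x; \theta) = \frac{1}{\theta} \implies \ln f(x; \theta) = -\ln\theta$
and $\frac{\partial^2}{\partial\theta^2}\ln f(x; \theta) = \frac{1}{\theta^2}$, so $I(\theta) = -\frac{1}{\theta^2}$
Therefore, the lower bound is $-\frac{\theta^2}{n}$ ■
A negative bound is not meaningful: the support $[0, \theta]$ depends on $\theta$, so the regularity conditions of the Cramer-Rao inequality do not hold for this distribution.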
Exercise 18. An estimator $\hat{\theta}$ is said to be consistent if for any $\epsilon > 0$, $P(|\hat{\theta} - \theta| \ge \epsilon) \to 0$ as $n \to \infty$.
That is, $\hat{\theta}$ is consistent if, as the sample size gets larger, it is less and less likely that $\hat{\theta}$ will be
further than $\epsilon$ from the true value of $\theta$. Show that $\bar{X}$ is a consistent estimator of $\mu$ when $\sigma^2 < \infty$
by using Chebyshev's inequality.
Hint: (Chebyshev's inequality) Let $X$ be a random variable with finite expected value $\mu$ and finite
non-zero variance $\sigma^2$. Then for any real number $k > 0$,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$
Solution :