
Institute of Technology of Cambodia Statistics ( 2022-2023 )

I3-TD1
Descriptive Statistics

Exercise 1. For statements (a)-(h), state whether descriptive or inferential statistics has been
used.
(a) By 2040 at least 3.5 billion people will run short of water (World Future Society).
(b) In a sample of 100 on-the-job fatalities, 90% of the victims were men.
(c) In a survey of 1000 adults, 34% said that they posted notes on social media websites (Source:
AARP Survey).
(d) In a poll of 3036 adults, 32% said that they got a flu shot at a retail clinic (Source: Harris
Interactive Poll).
(e) Allergy therapy makes bees go away (Source: Prevention).
(f) Drinking decaffeinated coffee can raise cholesterol levels by 7% (Source: American Heart Asso-
ciation).
(g) The average stay in a hospital for 2000 patients who had circulatory system problems was 4.7
days.
(h) Experts say that mortgage rates may soon hit bottom (Source: USA TODAY).
Answer :

a Inferential
b Descriptive
c Descriptive
d Descriptive
e Inferential
f Inferential
g Descriptive
h Inferential

Exercise 2. For statements (a)-(j), classify each as nominal-level, ordinal-level, interval-level, or
ratio-level measurement.
(a) Pages in the 25 best-selling mystery novels.
(b) Rankings of golfers in a tournament.
(c) Temperatures inside 10 pizza ovens.
(d) Weights of selected cell phones.
(e) Salaries of the coaches in the NFL.
(f) Times required to complete a chess game.
(g) Ratings of textbooks (poor, fair, good, excellent).
(h) Number of amps delivered by battery chargers.
(i) Ages of children in a day care center.
(j) Categories of magazines in a physician’s office (sports, women’s, health, men’s, news).

By : Sun Bunra 1 / 43 ID e20201001



Answer :

a Ratio-level
b Ordinal-level
c Interval-level
d Ratio-level
e Ratio-level
f Ratio-level
g Ordinal-level
h Ratio-level
i Ratio-level

Exercise 3. For statements (a)-(h), classify each variable as qualitative or quantitative.


(a) Marital status of nurses in a hospital.
(b) Time it takes to run a marathon.
(c) Weights of lobsters in a tank in a restaurant.
(d) Colors of automobiles in a shopping center parking lot.
(e) Ounces of ice cream in a large milkshake.
(f) Capacity of the NFL football stadiums.
(g) Ages of people living in a personal care home.
(h) Different vitamins taken.
Answer :

a Qualitative
b Quantitative
c Quantitative
d Qualitative
e Quantitative
f Quantitative
g Quantitative
h Qualitative

Exercise 4. For statements (a)-(h), classify each variable as discrete or continuous.


(a) Number of pizzas sold by Pizza Express each day.
(b) Relative humidity levels in operating rooms at local hospitals.
(c) Number of bananas in a bunch at several local super-markets.
(d) Lifetimes (in hours) of 15 iPod batteries.
(e) Weights of the backpacks of first-graders on a school bus.
(f) Number of students each day who make appointments with a math tutor at a local college.
(g) Blood pressures of runners in a marathon.
(h) Ages of children in a preschool.


Answer :
a Discrete
b Continuous
c Discrete
d Continuous
e Continuous
f Discrete
g Continuous
h Continuous

Exercise 5. How People Get Their News The Brunswick Research Organization surveyed 50
randomly selected individuals and asked them the primary way they received the daily news. Their
choices were via newspaper (N), television (T), radio (R), or Internet (I). Construct a categorical
frequency distribution for the data and interpret the results.

N N T T T I R R I T
I N R R I N N I T N
I R T T T T N R R I
R R I N T R T I I T
T I N T T I R N R T

Solution :
There are four primary ways to receive the daily news: N, T, R, and I. These are used as the
classes of the distribution.
The procedure for constructing a frequency distribution for categorical data is given below.
• Make a table with one row per class and columns for the tally, the frequency, and the percentage.
• Tally the data for each class.
• Count the tallies and record the frequency of each class.
• Find the percentage of values in each class: percentage = (frequency / n) × 100, with n = 50.
The categorical frequency distribution obtained is given below:

Classes Frequency Percentage


N 10 20%
T 16 32%
R 12 24%
I 12 24%
Total 50 100%

From the frequency distribution, 32% of the people surveyed got their daily news via television,
the highest percentage of the four sources. Radio and the Internet follow with 24% each, and
newspapers are last with 20%.
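
The tally above can be checked with a short script. This is only a sketch: it uses Python's standard `collections.Counter`, and the variable names (`rows`, `responses`, `table`) are illustrative, not part of the exercise.

```python
from collections import Counter

# Survey responses from Exercise 5, one string per row of the data table.
rows = [
    "N N T T T I R R I T",
    "I N R R I N N I T N",
    "I R T T T T N R R I",
    "R R I N T R T I I T",
    "T I N T T I R N R T",
]
responses = " ".join(rows).split()

counts = Counter(responses)   # frequency of each class
n = len(responses)            # total number of responses

# Map each class to (frequency, percentage).
table = {c: (counts[c], 100 * counts[c] / n) for c in "NTRI"}
for c, (f, pct) in table.items():
    print(f"{c}  {f:2d}  {pct:.0f}%")
```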


Exercise 6. College Completions The percentage (rounded to the nearest whole percent) of persons
from each state completing 4 years or more of college is listed below.
23 25 24 34 22 24 27 37 33 24
26 23 38 24 24 17 28 23 30 25
30 22 33 24 28 36 24 19 25 31
34 31 27 24 29 28 21 25 26 15
26 22 27 21 25 28 24 21 25 26
(a) Organize the data into a grouped frequency distribution with 5 classes.
(b) Find the relative frequency.
(c) Construct a histogram, frequency polygon, and ogive.
Solution : We have MIN = 15, MAX = 38. The table below uses K = 6 classes of width I = 4.
(a) Organize the data into a grouped frequency distribution.

Class limits   Class boundaries   Midpoint   Frequency
15 − 18        14.5 − 18.5        16.5       2
19 − 22        18.5 − 22.5        20.5       7
23 − 26        22.5 − 26.5        24.5       22
27 − 30        26.5 − 30.5        28.5       10
31 − 34        30.5 − 34.5        32.5       6
35 − 38        34.5 − 38.5        36.5       3
Total                                        50

(b) Find the relative frequency (frequency divided by n = 50).

Class boundaries   Frequency   Relative frequency   Cumulative frequency
14.5 − 18.5        2           0.04                 2
18.5 − 22.5        7           0.14                 9
22.5 − 26.5        22          0.44                 31
26.5 − 30.5        10          0.20                 41
30.5 − 34.5        6           0.12                 47
34.5 − 38.5        3           0.06                 50
Total              50          1.00

Points for the ogive (cumulative frequency below each class boundary):

Boundary               14.5   18.5   22.5   26.5   30.5   34.5   38.5
Cumulative frequency   0      2      9      31     41     47     50
(c) Construct a histogram, frequency polygon, and ogive.
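
The grouped table for this exercise can be reproduced programmatically. The sketch below assumes integer class limits starting at 15 with width 4 and six classes, as in the solution; the function name `grouped_distribution` is hypothetical.

```python
# Percentages from Exercise 6 (50 states).
data = [
    23, 25, 24, 34, 22, 24, 27, 37, 33, 24,
    26, 23, 38, 24, 24, 17, 28, 23, 30, 25,
    30, 22, 33, 24, 28, 36, 24, 19, 25, 31,
    34, 31, 27, 24, 29, 28, 21, 25, 26, 15,
    26, 22, 27, 21, 25, 28, 24, 21, 25, 26,
]

def grouped_distribution(values, start, width, k):
    """Return (lower, upper, frequency, relative, cumulative) for k integer classes."""
    n = len(values)
    rows, cumulative = [], 0
    for i in range(k):
        lower = start + i * width
        upper = lower + width - 1            # inclusive integer class limit
        f = sum(lower <= v <= upper for v in values)
        cumulative += f
        rows.append((lower, upper, f, f / n, cumulative))
    return rows

rows = grouped_distribution(data, start=15, width=4, k=6)
for lower, upper, f, rel, cum in rows:
    print(f"{lower}-{upper}  f={f:2d}  rel={rel:.2f}  cum={cum}")
```

The frequencies feed the histogram and frequency polygon, and the cumulative column gives the ogive.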


Exercise 7. Ages of the Vice Presidents at the Time of Their Death The ages of the Vice Presidents
of the United States at the time of their death are listed below.

90 83 80 73 70 51 68 79 70 71
72 74 67 54 81 66 62 63 68 57
66 96 78 55 60 66 57 71 60 85
76 98 77 88 78 81 64 66 77 93 70

(a) Use the data to construct a frequency distribution with 6 classes.


(b) Find the relative frequency.
(c) Construct a histogram, frequency polygon, and ogive.
Solution : We have MIN = 51, MAX = 98, K = 6, I = 8
(a) Use the data to construct a frequency distribution with 6 classes.

Class limits   Class boundaries   Midpoint   Frequency
51 − 58        50.5 − 58.5        54.5       5
59 − 66        58.5 − 66.5        62.5       9
67 − 74        66.5 − 74.5        70.5       11
75 − 82        74.5 − 82.5        78.5       9
83 − 90        82.5 − 90.5        86.5       4
91 − 98        90.5 − 98.5        94.5       3
Total                                        41

(b) Find the relative frequency (frequency divided by n = 41).

Class boundaries   Frequency   Relative frequency   Cumulative frequency
50.5 − 58.5        5           0.122                5
58.5 − 66.5        9           0.220                14
66.5 − 74.5        11          0.268                25
74.5 − 82.5        9           0.220                34
82.5 − 90.5        4           0.098                38
90.5 − 98.5        3           0.073                41
Total              41

Points for the ogive (cumulative frequency below each class boundary):

Boundary               50.5   58.5   66.5   74.5   82.5   90.5   98.5
Cumulative frequency   0      5      14     25     34     38     41

(c) Construct a histogram, frequency polygon, and ogive.


Exercise 8. Activities While Driving A survey of 1200 drivers showed the percentage of respondents
who did the following while driving. Construct a vertical bar graph and a horizontal bar graph for
the data.

Drink beverage 80%


Talk on cell phone 73%
Eat a meal 41%
Experience road rage 23%
Smoke 21%

Solution :
[Figure: vertical bar graph and horizontal bar graph of the five percentages listed above.]


Exercise 9. Calories of Nuts The data show the number of calories per ounce in selected types of
nuts. Construct vertical and horizontal bar graphs for the data.

Types Calories
Peanuts 160
Almonds 170
Macadamia 200
Pecans 190
Cashews 160

Solution :
[Figure: vertical and horizontal bar graphs of calories per ounce for the five types of nuts.]


Exercise 10. Space Launches The data show the number of U.S. space launches for the 10-year
periods from 1960 to 2009. Construct a time series graph for the data and analyze the graph.

Year       1960 − 1969   1970 − 1979   1980 − 1989   1990 − 1999   2000 − 2009
Launches   614           247           199           300           206

Solution :
We have,
Year          Launches
1960 − 1969   614
1970 − 1979   247
1980 − 1989   199
1990 − 1999   300
2000 − 2009   206

[Figure: time series graph of the number of launches per decade.]

The number of U.S. space launches fell sharply from 614 in 1960 − 1969 to 199 in 1980 − 1989,
rose to 300 in 1990 − 1999, and then fell again to 206 in 2000 − 2009.


Exercise 11. High School Dropout Rate The data show the high school dropout rate for students
for the years 2003 to 2009 . Construct a time series graph and analyze the graph.

Year 2003 2004 2005 2006 2007 2008 2009


Percent 9.9 10.3 9.4 9.3 8.7 8.0 8.1

Solution :
We have,
Year   Percent
2003   9.9
2004   10.3
2005   9.4
2006   9.3
2007   8.7
2008   8.0
2009   8.1

[Figure: time series graph of the high school dropout rate, 2003 to 2009.]

The high school dropout rate increased from 9.9% in 2003 to 10.3% in 2004, decreased every year
from 2004 to 2008 (down to 8.0%), and then increased slightly to 8.1% in 2009.

Exercise 12. Spending of College Freshmen The average amounts spent by college freshmen for
school items are shown. Construct a pie graph for the data.

Electronics/computers $728
Dorm items $344
Clothing $141
Shoes $72

Solution :
The total amount is $1285. To construct the pie graph, find the percentage of the total for each
item:
\[ \% = \dfrac{f}{n} \times 100 \]
• For example, for electronics/computers: \( \dfrac{728 \times 100}{1285} \approx 56.65\% \)
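
The remaining slices follow the same arithmetic. The sketch below computes, for each item, its percentage of the total and the corresponding central angle (share of 360 degrees); the dictionary name `spending` and the result name `slices` are illustrative.

```python
# Average amounts spent by college freshmen (Exercise 12), in dollars.
spending = {
    "Electronics/computers": 728,
    "Dorm items": 344,
    "Clothing": 141,
    "Shoes": 72,
}

total = sum(spending.values())

# For each item: (percentage of the total, central angle in degrees).
slices = {
    item: (100 * amount / total, 360 * amount / total)
    for item, amount in spending.items()
}

for item, (pct, angle) in slices.items():
    print(f"{item:22s} {pct:6.2f}%  {angle:6.1f} deg")
```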


Exercise 13. Career Changes A survey asked if people would like to spend the rest of their careers
with their present employers. The results are shown. Construct a pie graph for the data and
analyze the results.
Answer Number of people
Yes 660
No 260
Undecided 80
Solution :
We have
Answer      Number of people   Percentage
Yes         660                66%
No          260                26%
Undecided   80                 8%
Total       1000               100%

[Figure: pie graph of the three answers.]

Of the 1000 people surveyed, 66% would like to spend the rest of their careers with their present
employers, 26% would not, and 8% are undecided.


Exercise 14. Peyton Manning’s Colts Career Peyton Manning played for the Indianapolis Colts
for 14 years. (He did not play in 2011.) The data show the number of touchdowns he scored for
the years 1998-2010. Construct a dotplot for the data and comment on the graph.
26 33 27 49 31 27 33
26 26 29 28 31 33
Solution :
[Figure: dotplot of touchdowns per season.]

The dotplot shows that the most frequent values are 26 and 33 touchdowns, each occurring in 3
seasons. The values 28, 29, and 49 each occur once, and 49 lies far above the rest of the data.
Exercise 15. Songs on CDs The data show the number of songs on each of 40 CDs from the
author’s collection. Construct a dotplot for the data and comment on the graph.
10 14 18 11
11 15 16 10
10 17 10 15
22 9 14 12
18 12 12 15
21 22 20 15
10 19 20 21
17 9 13 15
11 12 12 9
14 20 12 10


Solution :
[Figure: dotplot of the number of songs per CD.]

The dotplot shows that the most common numbers of songs are 10 and 12 (6 CDs each), followed by
15 (5 CDs). The values 13, 16, and 19 each occur on only one CD.
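
A dotplot is a frequency count drawn as stacked dots, so the comment can be checked by counting. The sketch below prints a rough text dotplot (one `*` per CD); it is an illustration using `collections.Counter`, not the plotting method used for the original figure.

```python
from collections import Counter

# Number of songs on each of the 40 CDs (Exercise 15), read row by row.
songs = [
    10, 14, 18, 11, 11, 15, 16, 10, 10, 17,
    10, 15, 22, 9, 14, 12, 18, 12, 12, 15,
    21, 22, 20, 15, 10, 19, 20, 21, 17, 9,
    13, 15, 11, 12, 12, 9, 14, 20, 12, 10,
]

counts = Counter(songs)

# One line per possible value, one star per CD with that many songs.
for value in range(min(songs), max(songs) + 1):
    print(f"{value:2d} | " + "*" * counts[value])
```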

Exercise 16. The traffic situation in X-City is getting worse, and it is high time a solution was
offered. The company hired to work on the project surveyed the estimated number of vehicles
that move on the road daily during various time intervals. The result of this survey is shown in the
table below.
Time Cars Buses Bikes
1 − 2pm 37 45 42
2 − 3pm 44 34 26
3 − 4pm 23 39 27
4 − 5pm 29 41 48
Construct a multiple line graph to visualize the data. Hence, determine the vehicle with the highest
frequency and that with the lowest frequency.
Solution :
[Figure: multiple line graph of cars, buses, and bikes by time interval.]


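The vehicle with the highest single frequency is bikes (48, at 4 − 5 pm) and the one with the lowest
is cars (23, at 3 − 4 pm). Summed over the four hours, buses have the largest total (159) and cars
the smallest (133).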

Exercise 17. Draw a multiple bar graph for the following data, which represent agricultural
production for the period 2010-2013.

Year Food grains (tonnes) Vegetables (tonnes) Others (tonnes)


2010 100 30 10
2011 120 40 15
2012 130 45 25
2013 150 50 25

Solution :
[Figure: multiple bar graph of agricultural production by year, 2010-2013.]

Exercise 18. The heights (in cm ) of a sample of the students in a class are shown:

50 52 70 72 65 52 60
75 51 64 65 55 67 70

Find the mean, mode, median, inter quartile range, midrange, variance, and standard deviation for
the data.
Solution :


Find the mean, mode, median, inter quartile range, midrange, variance, and standard deviation for
the data.
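The data in increasing order are: 50, 51, 52, 52, 55, 60, 64, 65, 65, 67, 70, 70, 72, 75.
• Mean: \( \bar{x} = \dfrac{\sum x}{n} \), with \( n = 14 \) and \( \sum x = 868 \) (sum of all data), so \( \bar{x} = \dfrac{868}{14} = 62 \)
• Mode = 52, 65, 70 (each appears twice, more often than any other value)
• Median: \( MD = \dfrac{X_7 + X_8}{2} = \dfrac{64 + 65}{2} = 64.5 \)
• Interquartile range:
  • for \( p = 25 \): \( C = \dfrac{n \times p}{100} = \dfrac{14 \times 25}{100} = 3.5 \Rightarrow Q_1 = \dfrac{X_4 + X_5}{2} = \dfrac{52 + 55}{2} = 53.5 \)
  • for \( p = 75 \): \( C = \dfrac{n \times p}{100} = \dfrac{14 \times 75}{100} = 10.5 \Rightarrow Q_3 = \dfrac{X_{11} + X_{12}}{2} = \dfrac{70 + 70}{2} = 70 \)
  \( \Rightarrow IQR = Q_3 - Q_1 = 70 - 53.5 = 16.5 \)
• Midrange: \( MR = \dfrac{\text{lowest value} + \text{highest value}}{2} = \dfrac{50 + 75}{2} = 62.5 \)
• Variance: \( s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1} = \dfrac{962}{13} = 74 \) (checked in Excel)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{74} = 8.60233 \)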

Spreadsheet check (Excel, Column1):

Mean                  62
Standard Error        2.299
Median                64.5
Mode                  52
Standard Deviation    8.60
Sample Variance       74
Interquartile Range   16.5   (IQR = Q3 − Q1, with Q3 = 69.25 and Q1 = 52.75)
Midrange              62.5
Range                 25
Minimum               50
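
The hand results for Exercise 18 can also be checked with Python's standard `statistics` module. This is a minimal sketch; quartiles are left out because their value depends on the convention used (the hand method and the spreadsheet already differ above).

```python
import statistics as st

# Heights in cm (Exercise 18).
heights = [50, 52, 70, 72, 65, 52, 60, 75, 51, 64, 65, 55, 67, 70]

mean = st.mean(heights)
modes = sorted(st.multimode(heights))      # all values tied for most frequent
median = st.median(heights)
variance = st.variance(heights)            # sample variance, divisor n - 1
stdev = st.stdev(heights)                  # sample standard deviation
midrange = (min(heights) + max(heights)) / 2

print(mean, modes, median, variance, round(stdev, 5), midrange)
```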

Exercise 19. Households of Four Television Networks A survey showed the number of viewers and
number of households of four television networks. Find the average number of viewers, using the
weighted mean.
Households 1.4 0.8 0.3 1.6
Viewers (in millions) 1.6 0.8 0.4 1.8
Solution :
Find the average number of viewers, using the weighted mean.
Average number of viewers (weighted mean, with the households as weights):
\[ \bar{X} = \dfrac{\sum wx}{\sum w} = \dfrac{(1.4)(1.6) + (0.8)(0.8) + (0.3)(0.4) + (1.6)(1.8)}{1.4 + 0.8 + 0.3 + 1.6} = \dfrac{5.88}{4.1} \approx 1.43 \]
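
As a quick check of the weighted mean, the sketch below pairs each weight (households) with its value (viewers, in millions); the list names are illustrative.

```python
# Exercise 19: households are the weights, viewers (in millions) the values.
weights = [1.4, 0.8, 0.3, 1.6]
values = [1.6, 0.8, 0.4, 1.8]

# Weighted mean = sum(w * x) / sum(w)
weighted_mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
print(round(weighted_mean, 2))
```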

Exercise 20. Magazines in Bookstores A survey of bookstores showed that the average number
of magazines carried is 56 , with a standard deviation of 12 . The same survey showed that the
average length of time each store had been in business was 6 years, with a standard deviation of
2.5 years. Which is more variable, the number of magazines or the number of years?


Solution :
The average number of magazines carried is \( \bar{x} = 56 \) with standard deviation \( s = 12 \). The
average length of time in business is \( \bar{x} = 6 \) years with standard deviation \( s = 2.5 \) years.
The coefficient of variation is \( CVar = \dfrac{s}{\bar{x}} \times 100\% \).

            x̄     s     CVar
Magazines   56    12    21.43%
Time        6     2.5   41.67%

Since 41.67% > 21.43%, the length of time in business is more variable than the number of
magazines, because its coefficient of variation is higher.
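
The two coefficients of variation can be verified in a couple of lines. The helper name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage: (s / mean) * 100."""
    return 100 * sd / mean

cv_magazines = coefficient_of_variation(56, 12)
cv_time = coefficient_of_variation(6, 2.5)
print(round(cv_magazines, 2), round(cv_time, 2))
```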

Exercise 21. Average Earnings of Workers The average earnings of year-round full-time workers
25−34 years old with a bachelor’s degree or higher were $58,500 in 2003. If the standard deviation
is $11,200, what can you say about the percentage of these workers who earn
(a) Between $47,300 and $69,700?
(b) More than $80,900?
(c) How likely is it that someone earns more than $100,000?
Solution :
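(a) Between $47,300 and $69,700?
We have \( \bar{X} = 58500 \) and \( s = 11200 \) (in dollars). The limits satisfy
\[ \mu - ks = 47300 \quad (1) \qquad \mu + ks = 69700 \quad (2) \]
From (1): \( k = \dfrac{\mu - 47300}{s} = \dfrac{58500 - 47300}{11200} = 1 \)
Chebyshev’s theorem requires \( k > 1 \); for \( k = 1 \) the bound \( 1 - 1/k^2 \) equals 0, so the theorem gives
no useful information about this interval.
(b) More than $80,900?
• Step 1 (assuming the earnings are approximately normally distributed)
We need to find \( P(x > 80900) \). The z-score is
\[ z_0 = \dfrac{x_0 - \mu}{\sigma} = \dfrac{80900 - 58500}{11200} = 2 \]
\[ \Rightarrow P(z > 2) = 1 - P(z < 2) = 1 - 0.9772 = 0.0228 \Rightarrow P(x > 80900) = 0.0228 \]
• Step 2 (Chebyshev’s theorem, with no assumption about the shape of the distribution)
\[ 80900 = 58500 + 11200k \Rightarrow k = 2 \]
\[ \Rightarrow P(|x - \mu| < 2s) \ge 1 - \dfrac{1}{2^2} = 0.75 \]
\[ \Rightarrow P(x < \mu - 2s) + P(x > 80900) \le 1 - 0.75 \]
\[ \Rightarrow P(x > 80900) \le 25\% \]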


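(c) How likely is it that someone earns more than $100,000?
\[ k = \dfrac{100000 - 58500}{11200} \approx 3.7 \]
By Chebyshev’s theorem, \( P(\mu - ks < x < \mu + ks) \ge 1 - \dfrac{1}{k^2} \), so
\[ P(|x - \mu| < 3.7s) \ge 1 - \dfrac{1}{(3.7)^2} \approx 0.927 \]
\[ \Rightarrow P(x < \mu - 3.7s) + P(x > 100000) \le 1 - 0.927 = 0.073 \]
Hence at most about 7.3% of these workers earn more than $100,000, so such earnings are unlikely.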

Exercise 22. Costs to Train Employees For a certain type of job, it costs a company an average of
$231 to train an employee to perform the task. The standard deviation is $5. Find the minimum
percentage of data values that will fall in the range of $219 to $243. Use Chebyshev’s theorem.
Solution :
• Step 1
\[ \mu - k\sigma = 219 \Rightarrow 231 - 5k = 219, \qquad \mu + k\sigma = 243 \Rightarrow 231 + 5k = 243 \]
Both equations give \( k = 2.4 \). By Chebyshev’s theorem:
\[ P(|x - \mu| < k\sigma) \ge 1 - \dfrac{1}{k^2} = 1 - \dfrac{1}{(2.4)^2} \approx 0.83 = 83\% \]
• Step 2 (the same computation, step by step)
We have mean = $231 and standard deviation = $5. The formula is \( 1 - \dfrac{1}{k^2} \).
First, we find how many standard deviations $219 and $243 are from the mean of $231. We
subtract to find the distance of each bound from the mean:
\[ \$231 - \$219 = \$12, \qquad \$243 - \$231 = \$12 \]
Both limits are $12 from the mean. To express this $12 difference in standard deviations, we
divide by the given standard deviation, $5:
\[ k = \dfrac{12}{5} = 2.4 \Rightarrow 1 - \dfrac{1}{k^2} = 1 - \dfrac{1}{(2.4)^2} = 1 - 0.1736 = 0.8264 = 82.64\% \]
Therefore, at least 82.64% of the data values fall between $219 and $243.
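
The Chebyshev bound used in Exercises 21 and 22 can be wrapped in a small helper. This is a sketch under the assumption that k is taken from the bound nearer to the mean; the function name `chebyshev_min_fraction` is hypothetical.

```python
def chebyshev_min_fraction(mean, sd, low, high):
    """Minimum fraction of values in [low, high] guaranteed by Chebyshev's theorem.

    k is the distance (in standard deviations) from the mean to the nearer bound.
    The theorem only gives information for k > 1, so 0.0 is returned otherwise.
    """
    k = min(mean - low, high - mean) / sd
    if k <= 1:
        return 0.0
    return 1 - 1 / k**2

# Exercise 22: training costs, mean $231, standard deviation $5.
print(round(100 * chebyshev_min_fraction(231, 5, 219, 243), 2))

# Exercise 21(a): k = 1, so the theorem gives no useful bound.
print(chebyshev_min_fraction(58500, 11200, 47300, 69700))
```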
Exercise 23. Exam Completion Time The mean time it takes a group of students to complete a
statistics final exam is 44 minutes, and the standard deviation is 9 minutes. Within what limits
would you expect approximately 95% of the students to complete the exam? Assume the variable
is approximately normally distributed.
Solution :
From the given information:
We have: µ = 44, σ = 9
For a normal distribution, approximately 95% of the values lie within k = 2 standard deviations of
the mean (empirical rule).

µ − kσ = 44 − 2(9) = 26
µ + kσ = 44 + 2(9) = 62

Therefore, approximately 95% of the students complete the exam in between 26 and 62 minutes.


Exercise 24. Exam Grades Which of these exam grades has a better relative position?
(a) A grade of 82 on a test with x̄ = 85 and s = 6.
(b) A grade of 56 on a test with x̄ = 60 and s = 5.
Solution :
(a) A grade of 82 on a test with x̄ = 85 and s = 6.
⇒ the z-score for a grade of 82 is \( z_1 = \dfrac{82 - 85}{6} = -0.5 \)
(b) A grade of 56 on a test with x̄ = 60 and s = 5.
⇒ the z-score for a grade of 56 is \( z_2 = \dfrac{56 - 60}{5} = -0.8 \)
In conclusion, \( z_1 > z_2 \).
Hence, a grade of 82 has a better relative position than a grade of 56.
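
The comparison of relative positions reduces to comparing two z-scores, as in this short sketch (the function name `z_score` is illustrative).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z1 = z_score(82, 85, 6)   # grade of 82
z2 = z_score(56, 60, 5)   # grade of 56

# The larger z-score has the better relative position.
better = 82 if z1 > z2 else 56
print(z1, z2, better)
```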

Exercise 25. Check each data set for outliers.


(a) 506, 511, 517, 514, 400, 521
(b) 3, 7, 9, 6, 8, 10, 14, 16, 20, 12
Solution :
(a) 506, 511, 517, 514, 400, 521
• Arrange the data in order: 400, 506, 511, 514, 517, 521
• \( p = 25 \): \( C = \dfrac{n \times p}{100} = \dfrac{6 \times 25}{100} = 1.5 \Rightarrow Q_1 = \dfrac{X_2 + X_3}{2} = \dfrac{506 + 511}{2} = 508.5 \)
• \( p = 75 \): \( C = \dfrac{n \times p}{100} = \dfrac{6 \times 75}{100} = 4.5 \Rightarrow Q_3 = \dfrac{X_5 + X_6}{2} = \dfrac{517 + 521}{2} = 519 \)
• Find the interquartile range: \( IQR = Q_3 - Q_1 = 519 - 508.5 = 10.5 \)
• Multiply the IQR by 1.5: \( IQR \times 1.5 = 10.5 \times 1.5 = 15.75 \)
• Subtract this value from \( Q_1 \) and add it to \( Q_3 \):
\[ Q_1 - 1.5\,IQR = 508.5 - 15.75 = 492.75, \qquad Q_3 + 1.5\,IQR = 519 + 15.75 = 534.75 \]
• Check the data set for any value smaller than \( Q_1 - 1.5(IQR) \) or larger than \( Q_3 + 1.5(IQR) \).

Data   Outlier   Q1       Q3       IQR   Upper boundary   Lower boundary
400    TRUE      507.25   516.25   9     534.75           492.75
506    FALSE
511    FALSE
514    FALSE
517    FALSE
521    FALSE

The median is \( Q_2 = \dfrac{X_3 + X_4}{2} = \dfrac{511 + 514}{2} = 512.5 \).
Therefore, 400 is an outlier.


(b) 3, 7, 9, 6, 8, 10, 14, 16, 20, 12
• Arrange the data in order and find \( Q_1 \) and \( Q_3 \): 3, 6, 7, 8, 9, 10, 12, 14, 16, 20
• \( p = 25 \): \( C = \dfrac{n \times p}{100} = \dfrac{10 \times 25}{100} = 2.5 \Rightarrow Q_1 = \dfrac{X_3 + X_4}{2} = \dfrac{7 + 8}{2} = 7.5 \)
• \( p = 75 \): \( C = \dfrac{n \times p}{100} = \dfrac{10 \times 75}{100} = 7.5 \Rightarrow Q_3 = \dfrac{X_8 + X_9}{2} = \dfrac{14 + 16}{2} = 15 \)
• Find the interquartile range: \( IQR = Q_3 - Q_1 = 15 - 7.5 = 7.5 \)
• Multiply the IQR by 1.5: \( IQR \times 1.5 = 7.5 \times 1.5 = 11.25 \)
• Subtract this value from \( Q_1 \) and add it to \( Q_3 \):
\[ Q_1 - 1.5\,IQR = 7.5 - 11.25 = -3.75, \qquad Q_3 + 1.5\,IQR = 15 + 11.25 = 26.25 \]
• Check the data set for any value smaller than \( Q_1 - 1.5(IQR) \) or larger than \( Q_3 + 1.5(IQR) \).

Data   Outlier   Q1     Q3     IQR    Upper boundary   Lower boundary
3      FALSE     7.25   13.5   6.25   26.25            −3.75
7      FALSE
9      FALSE
6      FALSE
8      FALSE
10     FALSE
14     FALSE
16     FALSE
20     FALSE
12     FALSE

There are no outliers because no data value lies outside (−3.75, 26.25).
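
The 1.5 × IQR check can be automated. The sketch below uses `statistics.quantiles` with `method="inclusive"`, which is an assumption on our part: it reproduces the spreadsheet quartiles in the tables (for example 507.25 and 516.25), not the hand-computed ones, so the fences differ slightly while the conclusions are the same. The function name `iqr_outliers` is hypothetical.

```python
import statistics as st

def iqr_outliers(values):
    """Return the values lying below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = st.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

print(iqr_outliers([506, 511, 517, 514, 400, 521]))          # data set (a)
print(iqr_outliers([3, 7, 9, 6, 8, 10, 14, 16, 20, 12]))     # data set (b)
```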

Exercise 26. Check each data set for outliers.


(a) 14, 18, 27, 26, 19, 13, 5, 25
(b) 112, 157, 192, 116, 153, 129, 131
Solution :
Check each data set for outliers.
(a) 14, 18, 27, 26, 19, 13, 5, 25
• Arrange the data in order and find \( Q_1 \) and \( Q_3 \): 5, 13, 14, 18, 19, 25, 26, 27
• \( p = 25 \): \( C = \dfrac{n \times p}{100} = \dfrac{8 \times 25}{100} = 2 \Rightarrow Q_1 = \dfrac{X_2 + X_3}{2} = \dfrac{13 + 14}{2} = 13.5 \)
• \( p = 75 \): \( C = \dfrac{n \times p}{100} = \dfrac{8 \times 75}{100} = 6 \Rightarrow Q_3 = \dfrac{X_6 + X_7}{2} = \dfrac{25 + 26}{2} = 25.5 \)
• Find the interquartile range: \( IQR = Q_3 - Q_1 = 25.5 - 13.5 = 12 \)
• Multiply the IQR by 1.5: \( IQR \times 1.5 = 12 \times 1.5 = 18 \)


• Subtract the value obtained in the previous step from \( Q_1 \) and add it to \( Q_3 \):
\[ Q_1 - 1.5\,IQR = 13.5 - 18 = -4.5, \qquad Q_3 + 1.5\,IQR = 25.5 + 18 = 43.5 \]
• Check the data set for any value smaller than \( Q_1 - 1.5(IQR) \) or larger than \( Q_3 + 1.5(IQR) \).

Data   Outlier   Q1      Q3      IQR    Upper boundary   Lower boundary
14     FALSE     13.75   25.25   11.5   43.5             −4.5
18     FALSE
27     FALSE
26     FALSE
19     FALSE
13     FALSE
5      FALSE
25     FALSE

There are no outliers because no data value lies outside (−4.5, 43.5).
(b) 112, 157, 192, 116, 153, 129, 131
• Arrange the data in order and find \( Q_1 \) and \( Q_3 \): 112, 116, 129, 131, 153, 157, 192
• \( p = 25 \): \( C = \dfrac{n \times p}{100} = \dfrac{7 \times 25}{100} = 1.75 \Rightarrow Q_1 = \dfrac{X_2 + X_3}{2} = \dfrac{116 + 129}{2} = 122.5 \)
• \( p = 75 \): \( C = \dfrac{n \times p}{100} = \dfrac{7 \times 75}{100} = 5.25 \Rightarrow Q_3 = \dfrac{X_5 + X_6}{2} = \dfrac{153 + 157}{2} = 155 \)
• Find the interquartile range: \( IQR = Q_3 - Q_1 = 155 - 122.5 = 32.5 \)
• Multiply the IQR by 1.5: \( IQR \times 1.5 = 32.5 \times 1.5 = 48.75 \)
• Subtract this value from \( Q_1 \) and add it to \( Q_3 \):
\[ Q_1 - 1.5\,IQR = 122.5 - 48.75 = 73.75, \qquad Q_3 + 1.5\,IQR = 155 + 48.75 = 203.75 \]
• Check the data set for any value smaller than \( Q_1 - 1.5(IQR) \) or larger than \( Q_3 + 1.5(IQR) \).

Data   Outlier   Q1      Q3    IQR    Upper boundary   Lower boundary
112    FALSE     122.5   155   32.5   203.75           73.75
157    FALSE
192    FALSE
116    FALSE
153    FALSE
129    FALSE
131    FALSE

There are no outliers because no data value lies outside (73.75, 203.75).


Exercise 27. The following sample data are the midterm examination test scores for 30 students:
55 60 91 85 60 70 89 99 59 67
72 82 60 68 57 74 64 70 68 91
89 90 83 40 79 85 71 80 76 81
a. Find the mean, mode, median, variance, standard deviation, Q1 , and Q3 of the data.
b. Construct a frequency table with 5 classes.
c. Using the grouped data formula, find the mean, mode, median, variance, standard deviation,
Q1 , and Q3 for the table in part (b) and compare it to the results in part (a).
d. Construct a histogram and comment on the shape of the distribution.
e. Find the percentile values of 55,60 , and 74 .
Solution :
a. Find the mean, mode, median, variance, standard deviation, Q1 , and Q3 of the data.
We have: MAX = 99, MIN = 40, I = 12
• Mean: \( \bar{X} = \dfrac{\sum x}{n} \), with \( \sum x = 2215 \) and \( n = 30 \Rightarrow \bar{X} = \dfrac{2215}{30} = 73.83 \)
• Mode = 60 (it appears 3 times, more often than any other value)
• Median: \( MD = \dfrac{X_{15} + X_{16}}{2} = \dfrac{72 + 74}{2} = 73 \)
• \( Q_1 \): \( C = \dfrac{n \times p}{100} = \dfrac{30 \times 25}{100} = 7.5 \), rounded up to 8, so \( Q_1 = \dfrac{X_8 + X_9}{2} = \dfrac{64 + 67}{2} = 65.5 \)
• \( Q_3 \): \( C = \dfrac{n \times p}{100} = \dfrac{30 \times 75}{100} = 22.5 \), rounded up to 23, so \( Q_3 = \dfrac{X_{23} + X_{24}}{2} = \dfrac{85 + 85}{2} = 85 \)
• Variance: \( s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{5294.17}{30 - 1} = 182.56 \)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{182.56} = 13.51 \)

Spreadsheet check (Excel, Column1):
Mean                 73.83
Standard Error       2.46
Median               73
Mode                 60
Standard Deviation   13.51
Variance             182.55
b. Construct a frequency table with 5 classes.
Recall: We have MAX = 99, MIN = 40, I = 12, K = 5

Class limits   Class boundaries   Midpoint   Frequency
40 − 51        39.5 − 51.5        45.5       1
52 − 63        51.5 − 63.5        57.5       6
64 − 75        63.5 − 75.5        69.5       9
76 − 87        75.5 − 87.5        81.5       8
88 − 99        87.5 − 99.5        93.5       6
Total                                        30


Class boundaries   Cumulative frequency
39.5 − 51.5        1
51.5 − 63.5        7
63.5 − 75.5        16
75.5 − 87.5        24
87.5 − 99.5        30

Points for the ogive (cumulative frequency below each class boundary):

Boundary               39.5   51.5   63.5   75.5   87.5   99.5
Cumulative frequency   0      1      7      16     24     30
c. Using the grouped data formula, find the mean, mode, median, variance, standard deviation,
Q1 , and Q3 for the table in part (b) and compare it to the results in part (a).

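Class limits   Frequency (f)   Midpoint (Xm)   f·Xm    f·Xm²
40 − 51        1               45.5            45.5    2070.25
52 − 63        6               57.5            345     19837.5
64 − 75        9               69.5            625.5   43472.25
76 − 87        8               81.5            652     53138
88 − 99        6               93.5            561     52453.5
Total          30                              2229    170971.5

• Mean: \( \bar{X} = \dfrac{\sum f \cdot X_m}{n} = \dfrac{2229}{30} = 74.3 \)
• Mode: the modal class is the class with the largest frequency, 64 − 75 (class limits) or
63.5 − 75.5 (class boundaries); its midpoint is 69.5.
• Median: \( MD = L_m + \dfrac{w}{f}(0.5n - cf) \), with \( L_m = 63.5,\ w = 12,\ f = 9,\ n = 30,\ cf = 7 \):
\[ MD = 63.5 + \dfrac{12}{9}(0.5 \times 30 - 7) = 74.17 \]
• Variance: \( s^2 = \dfrac{n \left( \sum f \cdot X_m^2 \right) - \left( \sum f \cdot X_m \right)^2}{n(n - 1)} = \dfrac{30(170971.5) - (2229)^2}{30(30 - 1)} = \dfrac{160704}{870} = 184.72 \)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{184.72} = 13.59 \)
• Quartiles, using the same interpolation as for the median:
  • For \( p = 25 \): \( C = \dfrac{30 \times 25}{100} = 7.5 \), which falls in the class 63.5 − 75.5 (\( cf = 7,\ f = 9 \)): \( Q_1 = 63.5 + \dfrac{12}{9}(7.5 - 7) = 64.17 \)
  • For \( p = 75 \): \( C = \dfrac{30 \times 75}{100} = 22.5 \), which falls in the class 75.5 − 87.5 (\( cf = 16,\ f = 8 \)): \( Q_3 = 75.5 + \dfrac{12}{8}(22.5 - 16) = 85.25 \)
Therefore, from the grouped data we get: mean = 74.3, modal class 63.5 − 75.5, MD = 74.17,
s² = 184.72, s = 13.59, Q1 = 64.17, Q3 = 85.25. These are close to the results of part (a)
(mean 73.83, median 73, s² = 182.56, s = 13.51, Q1 = 65.5, Q3 = 85).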
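
The grouped-data formulas can be checked numerically from the class boundaries and frequencies of the table in part (b). This is a sketch: `grouped_quantile` is a hypothetical helper that applies the interpolation formula L + (w/f)(pn − cf) to the class containing position pn.

```python
# (lower boundary, upper boundary, frequency) for the table in part (b).
classes = [
    (39.5, 51.5, 1),
    (51.5, 63.5, 6),
    (63.5, 75.5, 9),
    (75.5, 87.5, 8),
    (87.5, 99.5, 6),
]

n = sum(f for _, _, f in classes)
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)           # sum of f * Xm
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)   # sum of f * Xm^2

mean = sum_fx / n
variance = (n * sum_fx2 - sum_fx**2) / (n * (n - 1))
stdev = variance**0.5

def grouped_quantile(classes, p):
    """Interpolated quantile for a fraction p (0 < p <= 1) of grouped data."""
    total = sum(f for _, _, f in classes)
    target = p * total
    cf = 0                                   # cumulative frequency before the class
    for lo, hi, f in classes:
        if cf + f >= target:
            return lo + (hi - lo) / f * (target - cf)
        cf += f

median = grouped_quantile(classes, 0.5)
print(round(mean, 2), round(variance, 2), round(stdev, 2), round(median, 2))
```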


d. Construct a histogram and comment on the shape of the distribution.

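Comment: The histogram has a single peak in the class 63.5 − 75.5, and the lowest class
(39.5 − 51.5) contains only one score, so the distribution is unimodal with a tail toward the low
scores (slightly skewed to the left). Two classes (51.5 − 63.5 and 87.5 − 99.5) share the same
frequency, 6, but neither of them is a peak.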
e. Find the percentile values of 55,60 , and 74 .

\[ \text{Percentile} = \dfrac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100 \]
• For x = 55: \( \dfrac{1 + 0.5}{30} \times 100 = 5 \), the 5th percentile.
Hence, a student whose score was 55 did better than 5% of the class.
• For x = 60: \( \dfrac{4 + 0.5}{30} \times 100 = 15 \), the 15th percentile.
Hence, a student whose score was 60 did better than 15% of the class.
• For x = 74: \( \dfrac{15 + 0.5}{30} \times 100 \approx 51.67 \), about the 52nd percentile.
Hence, a student whose score was 74 did better than about 52% of the class.
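
The percentile formula used in part (e) translates directly into code. The sketch below applies it to the 30 midterm scores; the function name `percentile_rank` is illustrative.

```python
# Midterm scores from Exercise 27.
scores = [
    55, 60, 91, 85, 60, 70, 89, 99, 59, 67,
    72, 82, 60, 68, 57, 74, 64, 70, 68, 91,
    89, 90, 83, 40, 79, 85, 71, 80, 76, 81,
]

def percentile_rank(values, x):
    """Percentile of x: ((number of values below x) + 0.5) / n * 100."""
    below = sum(v < x for v in values)
    return (below + 0.5) / len(values) * 100

for x in (55, 60, 74):
    print(x, round(percentile_rank(scores, x), 2))
```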

Exercise 28. For the following data:

6.3 2.9 4.5 1.1 1.8 4.0 1.2 3.1 2.0 4.0
7.0 2.8 4.3 5.3 2.9 8.3 4.4 2.8 3.1 5.6
4.5 4.5 5.7 0.5 6.2 3.7 0.9 2.4 3.0 3.5

(a) Find the mean, mode, median, variance, standard deviation, Q1 , Q3 , and 90 th percentile.
(b) Construct a frequency table with 5 classes.
(c) Using the grouped data formula, find the mean, mode, median, variance, standard deviation,
Q1 , Q3 and 90 th percentile for the frequency table constructed in part (b) and compare it to the
results in part (a).
(d) Construct a histogram, and comment on the shape of the data.


Solution :
(a) Find the mean, mode, median, variance, standard deviation, Q1 , Q3 , and 90 th percentile.
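• Mean: \( \bar{X} = \dfrac{\sum x}{n} = 3.74 \)
• Mode = 4.5 (it appears 3 times, more often than any other value)
• Median \( = \dfrac{X_{15} + X_{16}}{2} = \dfrac{3.5 + 3.7}{2} = 3.6 \)
• Variance: \( s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1} \). With \( \bar{X} = 3.74 \) and \( n = 30 \), we get \( s^2 = \dfrac{101.55}{29} = 3.50 \)
• Standard deviation: \( s = \sqrt{s^2} = 1.87 \)
• Find \( Q_1 \) and \( Q_3 \):
  • For \( p = 25 \): \( C = \dfrac{n \times p}{100} = \dfrac{30 \times 25}{100} = 7.5 \Rightarrow Q_1 = \dfrac{X_8 + X_9}{2} = \dfrac{2.8 + 2.8}{2} = 2.8 \)
  • For \( p = 75 \): \( C = \dfrac{n \times p}{100} = \dfrac{30 \times 75}{100} = 22.5 \Rightarrow Q_3 = \dfrac{X_{23} + X_{24}}{2} = \dfrac{4.5 + 5.3}{2} = 4.9 \)
Therefore, \( Q_1 = 2.8 \), \( Q_3 = 4.9 \)
• Find the value corresponding to the 90th percentile:
  • For \( p = 90 \): \( C = \dfrac{n \times p}{100} = \dfrac{30 \times 90}{100} = 27 \)
Hence, the data value corresponding to the 90th percentile is \( \dfrac{X_{27} + X_{28}}{2} = \dfrac{6.2 + 6.3}{2} = 6.25 \)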
(b) Construct a frequency table with 5 classes.
We have MIN = 0.5 and MAX = 8.3, so the class width is (8.3 − 0.5)/5 = 1.56, rounded up to 1.6 so
that all 30 values are covered.

Class limits   Class boundaries   Midpoint   Frequency   Cumulative frequency
0.5 − 2.0      0.45 − 2.05        1.25       6           6
2.1 − 3.6      2.05 − 3.65        2.85       9           15
3.7 − 5.2      3.65 − 5.25        4.45       8           23
5.3 − 6.8      5.25 − 6.85        6.05       5           28
6.9 − 8.4      6.85 − 8.45        7.65       2           30
Total                                        30

(c) Using the grouped data formula, find the mean, mode, median, variance, standard deviation,
Q1 , Q3 and 90th percentile for the frequency table constructed in part (b) and compare it to the
results in part (a).

Class limits   Frequency (f)   Midpoint (Xm)   f·Xm     f·Xm²
0.5 − 2.0      6               1.25            7.50     9.375
2.1 − 3.6      9               2.85            25.65    73.1025
3.7 − 5.2      8               4.45            35.60    158.42
5.3 − 6.8      5               6.05            30.25    183.0125
6.9 − 8.4      2               7.65            15.30    117.045
Total          30                              114.30   540.955

• Mean: \( \bar{X} = \dfrac{\sum f \cdot X_m}{n} = \dfrac{114.3}{30} = 3.81 \)
• Mode: the modal class is the class with the largest frequency, 2.1 − 3.6 (class limits) or
2.05 − 3.65 (class boundaries).
• Median: \( MD = L_m + \dfrac{w}{f}(0.5n - cf) \), with \( L_m = 2.05,\ w = 1.6,\ f = 9,\ n = 30,\ cf = 6 \):
\[ MD = 2.05 + \dfrac{1.6}{9}(0.5 \times 30 - 6) = 3.65 \]
• Variance: \( s^2 = \dfrac{n \left( \sum f \cdot X_m^2 \right) - \left( \sum f \cdot X_m \right)^2}{n(n - 1)} = \dfrac{30(540.955) - (114.3)^2}{30(30 - 1)} = \dfrac{3164.16}{870} = 3.64 \)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{3.64} = 1.91 \)
• For \( p = 25 \): \( C = \dfrac{30 \times 25}{100} = 7.5 \), class 2.05 − 3.65 (\( cf = 6,\ f = 9 \)): \( Q_1 = 2.05 + \dfrac{1.6}{9}(7.5 - 6) = 2.32 \)
• For \( p = 75 \): \( C = \dfrac{30 \times 75}{100} = 22.5 \), class 3.65 − 5.25 (\( cf = 15,\ f = 8 \)): \( Q_3 = 3.65 + \dfrac{1.6}{8}(22.5 - 15) = 5.15 \)
• For \( p = 90 \): \( C = \dfrac{30 \times 90}{100} = 27 \), class 5.25 − 6.85 (\( cf = 23,\ f = 5 \)): \( P_{90} = 5.25 + \dfrac{1.6}{5}(27 - 23) = 6.53 \)
The grouped results (mean 3.81, median 3.65, s = 1.91, Q1 = 2.32, Q3 = 5.15, P90 = 6.53) are
reasonably close to those of part (a) (mean 3.74, median 3.6, s = 1.87, Q1 = 2.8, Q3 = 4.9,
P90 = 6.25). The differences arise because grouping replaces each value by its class midpoint.
(d) Construct a histogram, and comment on the shape of the data.

Exercise 29. In recent years, due to low interest rates, many homeowners refinanced their home
mortgages. Linda Lahey is a mortgage officer at Down River Federal Savings and Loan. Below is
the amount refinanced for 20 loans she processed last week. The data are reported in thousands of
dollars and arranged from smallest to largest.
59.2 59.5 61.6 65.5 66.6 72.9 74.8 77.3 79.2 83.7
85.6 85.8 86.6 87.0 87.1 90.2 93.3 98.6 100.2 100.7


a. Find the median, first quartile, and third quartile.


b. Find the 26 th and 83rd percentiles.
c. Draw a box plot of the data and comment on the shape of the distribution.
Solution :
a. Find the median, first quartile, and third quartile.
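• Median \( = \dfrac{X_{10} + X_{11}}{2} = \dfrac{83.7 + 85.6}{2} = 84.65 \)
• For \( p = 25 \): \( C = \dfrac{n \times p}{100} = \dfrac{20 \times 25}{100} = 5 \Rightarrow Q_1 = \dfrac{X_5 + X_6}{2} = \dfrac{66.6 + 72.9}{2} = 69.75 \)
• For \( p = 75 \): \( C = \dfrac{n \times p}{100} = \dfrac{20 \times 75}{100} = 15 \Rightarrow Q_3 = \dfrac{X_{15} + X_{16}}{2} = \dfrac{87.1 + 90.2}{2} = 88.65 \)
b. Find the 26th and 83rd percentiles.
\[ C = \dfrac{n \times p}{100} = \dfrac{20 \times 26}{100} = 5.2 \]
Since C is not a whole number, it is rounded up to 6, and the value corresponding to the 26th
percentile is \( L_{26} = X_6 = 72.9 \)
\[ C = \dfrac{n \times p}{100} = \dfrac{20 \times 83}{100} = 16.6 \]
Rounding up to 17, the value corresponding to the 83rd percentile is \( L_{83} = X_{17} = 93.3 \)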
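
The position rule used in this exercise (average two neighbours when C = n·p/100 is a whole number, otherwise round C up) can be sketched as follows. The function name `value_at_percentile` is hypothetical, and the sketch assumes an integer p between 1 and 99 and data already sorted in increasing order.

```python
import math

# Amounts refinanced (thousands of dollars), already sorted (Exercise 29).
loans = [
    59.2, 59.5, 61.6, 65.5, 66.6, 72.9, 74.8, 77.3, 79.2, 83.7,
    85.6, 85.8, 86.6, 87.0, 87.1, 90.2, 93.3, 98.6, 100.2, 100.7,
]

def value_at_percentile(sorted_values, p):
    """Value at the p-th percentile using the position C = n * p / 100."""
    n = len(sorted_values)
    if (n * p) % 100 == 0:
        c = n * p // 100
        # C is whole: average the C-th and (C+1)-th values (1-based positions).
        return (sorted_values[c - 1] + sorted_values[c]) / 2
    c = math.ceil(n * p / 100)
    # C is not whole: round up and take the C-th value.
    return sorted_values[c - 1]

for p in (25, 26, 50, 75, 83):
    print(p, value_at_percentile(loans, p))
```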
c. Draw a box plot of the data and comment on the shape of the distribution.

Exercise 30. Hours Worked The data shown here represent the number of hours that 12 part-
time employees at a toy store worked during the weeks before and after Christmas. Construct two
boxplots and compare the distributions.

Before 38 16 18 24 12 30 35 32 31 30 24 35
After 26 15 12 18 24 32 14 18 16 18 22 12

Solution :
[Figure: boxplots of the hours worked before and after Christmas.]
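
Each boxplot is drawn from a five-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes both summaries, assuming quartiles are taken as the medians of the lower and upper halves of the sorted data; that convention and the function name `five_number_summary` are assumptions, not taken from the solution.

```python
import statistics as st

before = [38, 16, 18, 24, 12, 30, 35, 32, 31, 30, 24, 35]
after = [26, 15, 12, 18, 24, 32, 14, 18, 16, 18, 22, 12]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    s = sorted(values)
    half = len(s) // 2                     # for odd n the middle value is excluded
    lower, upper = s[:half], s[-half:]
    return (s[0], st.median(lower), st.median(s), st.median(upper), s[-1])

print("Before:", five_number_summary(before))
print("After: ", five_number_summary(after))
```

Under this convention the median falls from 30 hours before Christmas to 18 hours after, so the "After" box sits noticeably lower on the same scale.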


Exercise 31. Many times in statistics it is necessary to see if a set of data values is approximately
normally distributed. There are special techniques that can be used. One technique is to draw a
histogram for the data and see if it is approximately bell-shaped. (Note: It does not have to be
exactly symmetric to be bell-shaped.) The numbers of branches of the 50 top libraries are shown.

67 84 80 77 97 59 62 37 33 42
36 54 18 12 19 33 49 24 25 22
24 29 9 21 21 24 31 17 15 21
13 19 19 22 22 30 41 22 18 20
26 33 14 14 16 22 26 10 16 24

1. Construct a frequency distribution for the data.
2. Construct a histogram for the data.
3. Describe the shape of the histogram.
4. Based on your answer to question 3, do you feel that the distribution is approximately normal?
5. Find the mean and standard deviation for the data.
6. What percent of the data values fall within 1 standard deviation of the mean?
7. What percent of the data values fall within 2 standard deviations of the mean?
8. What percent of the data values fall within 3 standard deviations of the mean?
9. Does your answer help support the conclusion you reached in question 4? Explain.
Solution :
1. Construct a frequency distribution for the data.
Recall: the $2^k$ rule chooses the smallest number of classes $k$ such that $2^k \ge n$, where:
• $k$ is the number of classes
• $n$ is the number of data points

The width of each class can then be found by using:

$$i \ge \frac{\text{largest value} - \text{smallest value}}{k}$$

There are $n = 50$ data values, so we need

$$2^k \ge 50$$

Since $2^5 = 32 < 50 \le 64 = 2^6$, we take $k = 6$.

We have MAX $= 97$, MIN $= 9$, $k = 6$, so $i = \dfrac{\text{max} - \text{min}}{k} = \dfrac{97 - 9}{6} \approx 14.67$, which we round up to $i = 15$.


lower limit upper limit lower boundary upper boundary Midpoint frequency
9 23 8.5 23.5 16 24
24 38 23.5 38.5 31 15
39 53 38.5 53.5 46 3
54 68 53.5 68.5 61 4
69 83 68.5 83.5 76 2
84 98 83.5 98.5 91 2
50

Cumulative frequency distribution (from the class boundaries and frequencies above):

Less than    Cumulative frequency
8.5          0
23.5         24
38.5         39
53.5         42
68.5         46
83.5         48
98.5         50

Classes    Boundaries     Midpoint Xm   Frequency f   f·Xm    f·Xm²
9 − 23     8.5 − 23.5     16            24            384     6144
24 − 38    23.5 − 38.5    31            15            465     14415
39 − 53    38.5 − 53.5    46            3             138     6348
54 − 68    53.5 − 68.5    61            4             244     14884
69 − 83    68.5 − 83.5    76            2             152     11552
84 − 98    83.5 − 98.5    91            2             182     16562
Sum                                     50            1565    69905
2. Construct a histogram for the data.

3. Describe the shape of the histogram.


According to the histogram, most values are concentrated on the left with a long tail to the right, so the distribution is positively skewed.
4. The distribution does not appear to be normal.

5. Find the mean and standard deviation for the data.


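• Mean: $\bar{x} = \dfrac{1}{n}\sum f \cdot x_m = \dfrac{(24)(16) + (15)(31) + (3)(46) + (4)(61) + (2)(76) + (2)(91)}{50} = \dfrac{1565}{50} = 31.3$

• Variance (grouped data): $s^2 = \dfrac{\sum f \cdot x_m^2 - n\bar{x}^2}{n-1} = \dfrac{69905 - 50(31.3)^2}{49} = 426.95$

• Standard deviation: $s = \sqrt{s^2} \approx 20.66$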

6. What percent of the data values fall within 1 standard deviation of the mean?

$$|X - \bar{X}| < s \Leftrightarrow \bar{X} - s < X < \bar{X} + s \Leftrightarrow 11 < X < 52 \text{ (approximately)}$$

Of the 50 data values, 40 lie between 11 and 52.

Therefore, the percent of data values that fall within 1 standard deviation of the mean is

$$\frac{\#\text{ values in the interval}}{\#\text{ all values}} \times 100 = \frac{40}{50} \times 100 = 80\%$$
7. What percent of the data values fall within 2 standard deviations of the mean?

$$|X - \bar{X}| < 2s \Leftrightarrow \bar{X} - 2s < X < \bar{X} + 2s \Leftrightarrow -10.02 < X < 72.62$$

Of the 50 data values, 46 lie between −10.02 and 72.62. Therefore, the percent of data values that fall within 2 standard deviations of the mean is

$$\frac{\#\text{ values in the interval}}{\#\text{ all values}} \times 100 = \frac{46}{50} \times 100 = 92\%$$
8. What percent of the data values fall within 3 standard deviations of the mean?

$$|X - \bar{X}| < 3s \Leftrightarrow \bar{X} - 3s < X < \bar{X} + 3s \Leftrightarrow -30.68 < X < 93.28$$

Of the 50 data values, 49 lie between −30.68 and 93.28. Therefore, the percent of data values that fall within 3 standard deviations of the mean is

$$\frac{\#\text{ values in the interval}}{\#\text{ all values}} \times 100 = \frac{49}{50} \times 100 = 98\%$$
9. Does your answer help support the conclusion you reached in question 4? Explain.
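Yes. For a normal distribution we would expect about 68%, 95%, and 99.7% of the values to fall within 1, 2, and 3 standard deviations of the mean. The answers to questions 6, 7, and 8 (80%, 92%, and 98%) differ from these percentages, which supports the conclusion in question 4 that the distribution is not approximately normal.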
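
The grouped mean, the grouped standard deviation, and the three percentages can be reproduced with the sketch below. It assumes the grouped-data formulas built from the class midpoints in the table above; the function name `percent_within` is a hypothetical helper.

```python
import math

branches = [67, 84, 80, 77, 97, 59, 62, 37, 33, 42,
            36, 54, 18, 12, 19, 33, 49, 24, 25, 22,
            24, 29, 9, 21, 21, 24, 31, 17, 15, 21,
            13, 19, 19, 22, 22, 30, 41, 22, 18, 20,
            26, 33, 14, 14, 16, 22, 26, 10, 16, 24]

# Class midpoints and frequencies from the frequency distribution.
midpoints = [16, 31, 46, 61, 76, 91]
freqs = [24, 15, 3, 4, 2, 2]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
variance = (sum(f * x * x for f, x in zip(freqs, midpoints)) - n * mean ** 2) / (n - 1)
s = math.sqrt(variance)

def percent_within(k):
    """Percent of raw data values strictly within k standard deviations of the mean."""
    lo, hi = mean - k * s, mean + k * s
    inside = sum(1 for x in branches if lo < x < hi)
    return 100 * inside / len(branches)

percents = [percent_within(k) for k in (1, 2, 3)]
print(mean, round(s, 2), percents)
```

Comparing `percents` with the 68%, 95%, 99.7% benchmark is one quick numerical check of normality; it complements the histogram rather than replacing it.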

I3-TD2
Point Estimation

Exercise 1. Data on pull-off force (pounds) for connectors used in an automobile engine application
are as follows
79.3 75.1 78.2 74.1 73.9 75.0 77.6 77.3 73.8 74.6 75.5 74.0 74.7
75.9 72.9 73.8 74.2 78.1 75.4 76.3 75.3 76.2 74.9 78.0 75.1 76.8

(a) Calculate a point estimate of the mean pull-off force of all connectors in the population. State
which estimator you used and why.
(b) Calculate a point estimate of the pull-off force value that separates the weakest 50% of the
connectors in the population from the strongest 50%.
(c) Calculate point estimates of the population variance and the population standard deviation.
(d) Calculate the standard error of the point estimate found in part (a). Interpret the standard
error.
(e) Calculate a point estimate of the proportion of all connectors in the population whose pull-off
force is less than 73 pounds

Solution :

(a) Calculate a point estimate of the mean pull-off.

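$$\hat{\theta} = \bar{X} = \frac{1}{26}\sum_{i=1}^{26} x_i = \frac{1966.0}{26} = 75.61538$$

We use the sample mean $\bar{X}$ because it is an unbiased estimator of the population mean.
Therefore, the point estimate of the mean pull-off force is 75.61538 ■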


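(b) Calculate a point estimate of the pull-off force value that separates the weakest 50% of the connectors in the population from the strongest 50%.
We use the sample median. With the $n = 26$ observations sorted in increasing order,
$$\hat{\mu} = MD = \frac{x_{(n/2)} + x_{(n/2+1)}}{2} = \frac{x_{(13)} + x_{(14)}}{2} = \frac{75.1 + 75.3}{2} = 75.2$$
Therefore, $\hat{\mu} = 75.2$ ■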
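(c) Calculate point estimates of the population variance and the population standard deviation.
We use the sample variance $s^2$ and the sample standard deviation $s$ as point estimates of the population variance and the population standard deviation, where
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{26}(x_i - \bar{x})^2 = \frac{68.4538}{25} = 2.7382$$
Then, $s = \sqrt{s^2} = \sqrt{2.7382} = 1.6547$
Therefore, $s^2 = 2.7382$ and $s = 1.6547$ ■
(d) Calculate the standard error of the point estimate found in part (a).
$$\hat{\sigma}_{\bar{X}} = \sqrt{\hat{V}(\bar{X})} = \sqrt{\frac{s^2}{n}} = \frac{s}{\sqrt{n}} = \frac{1.6547}{\sqrt{26}} = 0.3245$$
Therefore, $\hat{\sigma}_{\bar{X}} = 0.3245$: the sample mean typically differs from the true mean pull-off force by about 0.32 pounds ■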
(e) Calculate a point estimate of the proportion of all connectors in the population whose pull-off force is less than 73 pounds.
We use $\hat{p} = \dfrac{x}{n}$, where $x$ is the number of connectors in the sample whose force is less than 73 pounds. Only one observation (72.9) qualifies.
Then, $\hat{p} = \dfrac{1}{26} = 0.03846$
Therefore, $\hat{p} = 0.03846$ ■
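
The point estimates in parts (a) to (e) can be recomputed with Python's standard `statistics` module, as in the sketch below. The variable names are illustrative; `statistics.variance` and `statistics.stdev` use the $n - 1$ divisor, matching the sample variance used here.

```python
import math
import statistics

force = [79.3, 75.1, 78.2, 74.1, 73.9, 75.0, 77.6, 77.3, 73.8, 74.6, 75.5, 74.0, 74.7,
         75.9, 72.9, 73.8, 74.2, 78.1, 75.4, 76.3, 75.3, 76.2, 74.9, 78.0, 75.1, 76.8]

n = len(force)
mean = statistics.mean(force)          # (a) sample mean
median = statistics.median(force)      # (b) sample median
variance = statistics.variance(force)  # (c) sample variance, divisor n - 1
sd = statistics.stdev(force)           # (c) sample standard deviation
se = sd / math.sqrt(n)                 # (d) estimated standard error of the mean
p_hat = sum(1 for x in force if x < 73) / n  # (e) sample proportion below 73

print(round(mean, 5), round(median, 1), round(variance, 4),
      round(sd, 4), round(se, 4), round(p_hat, 5))
```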

Exercise 2. (a) A random sample of 10 houses in a particular area, each of which is heated
with natural gas, is selected and the amount of gas (therms) used during the month of January is
determined for each house. The resulting observations are

103 156 118 89 125 147 122 109 138 99

Let µ denote the average gas usage during January by all houses in this area. Compute a point
estimate of µ
(b) Suppose there are 10,000 houses in this area that use natural gas for heating. Let t denote the
total amount of gas used by all of these houses during January. Estimate t using the data of
part (a). What estimator did you use in computing your estimate?
(c) Use the data in part (a) to estimate p, the proportion of all houses that used at least 100
therms.
(d) Give a point estimate of the population median usage (the middle value in the population of
all houses) based on the sample of part (a). What estimator did you use?

Solution :

(a) Compute a point estimate of $\mu$.

$$\hat{\mu} = \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{1206}{10} = 120.6$$
Therefore, $\hat{\mu} = 120.6$ ■
(b) Estimate $t$ using the data of part (a).
Note that $t$ is the total amount of gas used by all 10,000 houses during January, so $t = 10000\mu$.

$$\hat{t} = 10000 \times \hat{\mu} = 1\,206\,000 \text{ therms.}$$ ■

The estimator used is $\hat{t} = 10000\,\bar{X}$: the number of houses in the area times the sample mean usage of the 10 sampled houses.
(c) Use the data in part (a) to estimate p.
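$$\hat{p} = \frac{x}{n} = \frac{8}{10} = 0.8$$
where $x = 8$ is the number of sampled houses that used at least 100 therms.
Therefore, $\hat{p} = 0.8$ ■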
(d) Give a point estimate of the population median usage.
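Using the sample median as the estimator, with the observations sorted in increasing order,
$$\hat{\tilde{\mu}} = MD = \frac{x_{(5)} + x_{(6)}}{2} = \frac{118 + 122}{2} = 120$$
Therefore, $\hat{\tilde{\mu}} = 120$ ■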
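
A short sketch of the four estimates, using only the standard library. The variable names are illustrative, and `N = 10_000` is the number of houses stated in part (b).

```python
import statistics

usage = [103, 156, 118, 89, 125, 147, 122, 109, 138, 99]  # therms, January
N = 10_000  # houses in the area that heat with natural gas

mu_hat = statistics.mean(usage)                        # (a) sample mean
t_hat = N * mu_hat                                     # (b) estimated total usage
p_hat = sum(1 for x in usage if x >= 100) / len(usage) # (c) proportion with >= 100 therms
median_hat = statistics.median(usage)                  # (d) sample median

print(mu_hat, t_hat, p_hat, median_hat)
```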

Exercise 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having finite variance $\sigma^2$. Show that
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
is an unbiased estimator of $\sigma^2$. Hint: Write
$$S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
and compute $E(S^2)$.

Solution :
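Show that $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ is an unbiased estimator of $\sigma^2$.

We have
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
so
$$E(S^2) = \frac{1}{n-1}\,E\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
Since $n\bar{X}^2 = \dfrac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2$,
$$E(S^2) = \frac{1}{n-1}\sum_{i=1}^{n} E\left(X_i^2\right) - \frac{1}{n(n-1)}\,E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right]$$
We have $E(X_i^2) = \sigma^2 + \mu^2$ and, by independence,
$$E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right] = V\left(\sum_{i=1}^{n} X_i\right) + E^2\left(\sum_{i=1}^{n} X_i\right) = n\sigma^2 + (n\mu)^2$$
Thus,
$$E(S^2) = \frac{1}{n-1}\left(n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right) = \sigma^2$$
Therefore, $S^2$ is an unbiased estimator of $\sigma^2$ ■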

Exercise 4. Suppose that $X$ is the number of observed "successes" in a sample of $n$ observations, where $p$ is the probability of success on each observation.
(a) Show that $\hat{p} = \dfrac{X}{n}$ is an unbiased estimator of $p$.
(b) Show that the standard error of $\hat{p}$ is $\sqrt{p(1-p)/n}$. How would you estimate the standard error?

Solution :
(a) Show that $\hat{p} = \dfrac{X}{n}$ is an unbiased estimator of $p$.
Since $X \sim \mathrm{Bin}(n, p)$, we have $E(X) = np$.
Then, $E(\hat{p}) = E\left(\dfrac{X}{n}\right) = \dfrac{np}{n} = p$
Therefore, $\hat{p}$ is an unbiased estimator of $p$ ■
(b) Show that the standard error of $\hat{p}$ is $\sqrt{p(1-p)/n}$.
Let $\sigma_{\hat{p}}$ be the standard error. Since $V(X) = np(1-p)$,
$$\sigma_{\hat{p}} = \sqrt{V(\hat{p})} = \sqrt{\frac{1}{n^2} \times np(1-p)} = \sqrt{p(1-p)/n}$$
Therefore, the standard error of $\hat{p}$ is $\sqrt{p(1-p)/n}$ ■
Since $p$ is unknown, the standard error is estimated by substituting $\hat{p}$ for $p$: $\hat{\sigma}_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$.

Exercise 5. Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a distribution with mean $\mu$ and variance $\sigma^2$, and let $a_1, \ldots, a_n$ be real numbers such that $\sum_{i=1}^{n} a_i = 1$. Define
$$\hat{X} = \sum_{i=1}^{n} a_i X_i$$
(a) Show that $\hat{X}$ is an unbiased estimator of $\mu$.
(b) Show that $V(\bar{X}) \le V(\hat{X})$ (hence among all estimators of $\mu$ of the form $\sum_{i=1}^{n} a_i X_i$, $\bar{X}$ is the MVUE).

Solution :
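(a) Show that $\hat{X}$ is an unbiased estimator of $\mu$.
We have $\hat{X} = \sum_{i=1}^{n} a_i X_i$. Then,
$$E(\hat{X}) = E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i)$$
Since $E(X_i) = \mu$ for every $i$ and $\sum_{i=1}^{n} a_i = 1$,
$$E(\hat{X}) = \mu\sum_{i=1}^{n} a_i = \mu$$
Therefore, $\hat{X}$ is an unbiased estimator of $\mu$ ■
(b) Show that $V(\bar{X}) \le V(\hat{X})$.
Since the $X_i$ are independent with common variance $\sigma^2$,
$$V(\hat{X}) = \sigma^2\sum_{i=1}^{n} a_i^2 \quad\text{and}\quad V(\bar{X}) = \frac{1}{n^2}V\left(\sum_{i=1}^{n} X_i\right) = \frac{\sigma^2}{n}$$
By the Cauchy-Schwarz inequality, for all real numbers $a_i, b_i$:
$$(a_1 b_1 + \ldots + a_n b_n)^2 \le \left(a_1^2 + \ldots + a_n^2\right)\left(b_1^2 + \ldots + b_n^2\right)$$
Take $b_1 = \ldots = b_n = 1$. Since $\sum a_i = 1$, this gives $1 \le n\left(a_1^2 + \ldots + a_n^2\right)$, that is, $a_1^2 + \ldots + a_n^2 \ge \dfrac{1}{n}$.
Multiplying by $\sigma^2$: $V(\hat{X}) = \sigma^2\sum a_i^2 \ge \dfrac{\sigma^2}{n} = V(\bar{X})$.
Therefore, $V(\bar{X}) \le V(\hat{X})$ ■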
Exercise 6. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with unknown mean $-\infty < \mu < +\infty$ and unknown variance $\sigma^2 > 0$. Show that the statistics $\bar{X}$ and
$$Y = \frac{X_1 + 2X_2 + \ldots + nX_n}{\frac{n(n+1)}{2}}$$
are both unbiased estimators of $\mu$. Further, show that $V(\bar{X}) < V(Y)$.

Solution :

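Show that $\bar{X}$ and $Y$ are both unbiased estimators of $\mu$.

We have
$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$$
and, since $1 + 2 + \ldots + n = \dfrac{n(n+1)}{2}$,
$$E(Y) = E\left(\frac{X_1 + 2X_2 + \ldots + nX_n}{\frac{n(n+1)}{2}}\right) = \frac{1}{\frac{n(n+1)}{2}}\sum_{i=1}^{n} i\,E(X_i) = \frac{1 + 2 + \ldots + n}{\frac{n(n+1)}{2}}\,\mu = \mu$$
Therefore, they are both unbiased estimators of $\mu$ ■

Show that $V(\bar{X}) \le V(Y)$.
We have
$$V(\bar{X}) = \frac{\sigma^2}{n} \quad\text{and}\quad V(Y) = \frac{4\sigma^2}{n^2(1+n)^2}\sum_{i=1}^{n} i^2$$
By the Cauchy-Schwarz inequality, for all real numbers $a_i, b_i$:
$$(a_1 b_1 + \ldots + a_n b_n)^2 \le \left(a_1^2 + \ldots + a_n^2\right)\left(b_1^2 + \ldots + b_n^2\right)$$
Take $b_1 = \ldots = b_n = 1$ and $a_i = i$ for $i = 1, \ldots, n$. We obtain
$$\left(\frac{n(1+n)}{2}\right)^2 \le n\left(1^2 + 2^2 + \ldots + n^2\right)$$
Then,
$$\frac{1}{n} \le \frac{4\sum_{i=1}^{n} i^2}{n^2(1+n)^2}$$
Multiplying by $\sigma^2$ we get
$$V(\bar{X}) \le V(Y)$$
Equality in Cauchy-Schwarz requires the $a_i$ to be proportional to the $b_i$, which happens here only when $n = 1$, so the inequality is strict for $n \ge 2$.
Therefore, $V(\bar{X}) \le V(Y)$, with $V(\bar{X}) < V(Y)$ for $n \ge 2$ ■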
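
The variance comparison in Exercises 5 and 6 can be checked exactly with rational arithmetic. In the sketch below the function names are hypothetical; `variance_factor` returns $\sum a_i^2$, so the variance of $\sum a_i X_i$ is that factor times $\sigma^2$ for independent $X_i$.

```python
from fractions import Fraction

def variance_factor(weights):
    """Return sum(a_i^2); V(sum a_i X_i) = factor * sigma^2 for independent X_i."""
    if sum(weights) != 1:
        raise ValueError("weights must sum to 1 for an unbiased estimator")
    return sum(a * a for a in weights)

def mean_weights(n):
    """Weights a_i = 1/n, giving the sample mean."""
    return [Fraction(1, n)] * n

def linear_weights(n):
    """Weights a_i = i / (n(n+1)/2), giving the statistic Y of Exercise 6."""
    total = Fraction(n * (n + 1), 2)
    return [Fraction(i) / total for i in range(1, n + 1)]

for n in (1, 2, 5, 10):
    v_mean = variance_factor(mean_weights(n))
    v_y = variance_factor(linear_weights(n))
    print(n, v_mean, v_y, v_mean <= v_y)
```

For $n = 5$ this prints the factors 1/5 and 11/45, and the two factors coincide only for $n = 1$.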

Exercise 7. Using a long rod that has length $\mu$, you are going to lay out a square plot in which the length of each side is $\mu$. Thus the area of the plot will be $\mu^2$. However, you do not know the value of $\mu$, so you decide to make $n$ independent measurements $X_1, X_2, \ldots, X_n$ of the length. Assume that each $X_i$ has mean $\mu$ (unbiased measurements) and variance $\sigma^2$.
(a) Show that $\bar{X}^2$ is not an unbiased estimator for $\mu^2$. [Hint: For any rv $Y$, $E(Y^2) = V(Y) + [E(Y)]^2$. Apply this with $Y = \bar{X}$.]
(b) For what value of $k$ is the estimator $\bar{X}^2 - kS^2$ unbiased for $\mu^2$? [Hint: Compute $E\left(\bar{X}^2 - kS^2\right)$.]

Solution :

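(a) Show that $\bar{X}^2$ is not an unbiased estimator for $\mu^2$.

We will show that $E\left(\bar{X}^2\right) \ne \mu^2$. We have
$$V(\bar{X}) = E\left(\bar{X}^2\right) - E^2\left(\bar{X}\right)$$
Thus,
$$E\left(\bar{X}^2\right) = V(\bar{X}) + E^2\left(\bar{X}\right)$$
Then,
$$E\left(\bar{X}^2\right) = \frac{\sigma^2}{n} + \mu^2 \ne \mu^2$$
Therefore, $\bar{X}^2$ is not an unbiased estimator for $\mu^2$ ■
(b) For what value of $k$ is the estimator $\bar{X}^2 - kS^2$ unbiased for $\mu^2$?
For the estimator to be unbiased we need
$$E\left(\bar{X}^2 - kS^2\right) = E\left(\bar{X}^2\right) - kE\left(S^2\right) = \frac{\sigma^2}{n} + \mu^2 - k\sigma^2 = \mu^2$$
Then, we obtain $k = \dfrac{1}{n}$.
Therefore, $k = \dfrac{1}{n}$ ■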
Exercise 8. Let $X_1, X_2, \ldots, X_n$ be uniformly distributed on the interval $[0, \theta]$. Recall that the maximum likelihood estimator of $\theta$ is $\hat{\theta} = \max(X_i)$.
(a) Argue intuitively why $\hat{\theta}$ cannot be an unbiased estimator for $\theta$.
(b) Suppose that $E(\hat{\theta}) = n\theta/(n+1)$. Is it reasonable that $\hat{\theta}$ consistently underestimates $\theta$? Show that the bias in the estimator approaches zero as $n$ gets large.
(c) Propose an unbiased estimator for $\theta$.
(d) Let $Y = \max(X_i)$. Use the fact that $Y \le y$ if and only if each $X_i \le y$ to derive the cumulative distribution function of $Y$. Then show that the probability density function of $Y$ is
$$f(y) = \begin{cases} \dfrac{n y^{n-1}}{\theta^n}, & 0 \le y \le \theta \\ 0, & \text{otherwise.} \end{cases}$$
Use this result to show that the maximum likelihood estimator for $\theta$ is biased.
(e) We have two unbiased estimators for $\theta$: the moment estimator $\hat{\theta}_1 = 2\bar{X}$ and $\hat{\theta}_2 = [(n+1)/n]\max(X_i)$, where $\max(X_i)$ is the largest observation in a random sample of size $n$. It can be shown that $V(\hat{\theta}_1) = \theta^2/(3n)$ and that $V(\hat{\theta}_2) = \theta^2/[n(n+2)]$. Show that if $n > 1$, $\hat{\theta}_2$ is a better estimator than $\hat{\theta}_1$. In what sense is it a better estimator of $\theta$?

Solution :

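(a) Argue intuitively why $\hat{\theta}$ cannot be an unbiased estimator for $\theta$.

Every observation lies in $[0, \theta]$, so $\hat{\theta} = \max(X_i)$ can never exceed $\theta$ and is almost always strictly smaller. Its expected value is therefore below $\theta$, which is why it cannot be an unbiased estimator ■
(b) Suppose that $E(\hat{\theta}) = \dfrac{n\theta}{n+1}$.
It is reasonable that $\hat{\theta}$ consistently underestimates $\theta$, because $E(\hat{\theta}) < \theta$, so it is a negatively biased estimator.
Note: a statistic is positively biased if it tends to overestimate the parameter. A statistic is negatively biased if it tends to underestimate the parameter.

$$B(\hat{\theta}) = E(\hat{\theta}) - \theta = \frac{n\theta}{n+1} - \theta = -\frac{\theta}{n+1}$$

Therefore, as $n \to \infty$ we get $B(\hat{\theta}) \to 0$ ■
(c) Propose an unbiased estimator for $\theta$.
We have $E(\hat{\theta}) = \dfrac{n\theta}{n+1}$.
Then, $E\left(\dfrac{n+1}{n}\hat{\theta}\right) = \theta$
In conclusion, we can propose $T_n = \dfrac{n+1}{n}\hat{\theta}$ as an unbiased estimator of $\theta$ ■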
n
(d) Let $Y = \max(X_i)$.
Find the cdf of $Y$.
We have $P(Y < y) = P(\max(X_i) < y)$, and since $X_1, \ldots, X_n$ are independent,
we obtain $P(Y < y) = [P(X < y)]^n$.
Since $X$ is uniformly distributed, $P(X < y) = \dfrac{y}{\theta}$ for $0 \le y \le \theta$.
Therefore, $P(Y < y) = \left(\dfrac{y}{\theta}\right)^n$ ■
Differentiating, its pdf is given by $f_Y(y) = \dfrac{n y^{n-1}}{\theta^n}$ for $0 \le y \le \theta$ and zero elsewhere.
Use this result to show that the maximum likelihood estimator for $\theta$ is biased.
We have
$$E(\hat{\theta}) = \int_0^{\theta} y \cdot \frac{n y^{n-1}}{\theta^n}\,dy = \int_0^{\theta} \frac{n y^{n}}{\theta^n}\,dy = \frac{n}{n+1}\theta \ne \theta$$
Therefore, it is a biased estimator ■
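(e) We are given two unbiased estimators for $\theta$: $\hat{\theta}_1 = 2\bar{X}$ and $\hat{\theta}_2 = \dfrac{n+1}{n}\max(X_i)$,
with $V(\hat{\theta}_1) = \dfrac{\theta^2}{3n}$ and $V(\hat{\theta}_2) = \dfrac{\theta^2}{n(n+2)}$.
Since $n(n+2) \ge 3n$ for $n \ge 1$, with strict inequality when $n > 1$, we have $V(\hat{\theta}_1) > V(\hat{\theta}_2)$ for $n > 1$.

Therefore, $\hat{\theta}_2$ is a better estimator in the sense that both are unbiased but $\hat{\theta}_2$ has the smaller variance ■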
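
A small seeded simulation gives a numerical feel for these results. This is only a sketch: the function name `simulate_estimators`, the choice $\theta = 1$, $n = 5$, the number of repetitions, and the seed are all illustrative assumptions, and simulated values only approximate the exact formulas.

```python
import random

def simulate_estimators(theta, n, reps, seed=0):
    """Simulate max(X_i), 2*mean, and (n+1)/n * max for Uniform[0, theta] samples.

    Returns {name: (simulated mean, simulated variance)}.
    """
    rng = random.Random(seed)
    draws = {"mle": [], "moment": [], "corrected": []}
    for _ in range(reps):
        sample = [rng.uniform(0, theta) for _ in range(n)]
        largest = max(sample)
        draws["mle"].append(largest)                       # theta-hat = max(X_i)
        draws["moment"].append(2 * sum(sample) / n)        # theta-hat_1 = 2 * mean
        draws["corrected"].append((n + 1) / n * largest)   # theta-hat_2
    results = {}
    for name, values in draws.items():
        mean = sum(values) / reps
        var = sum((v - mean) ** 2 for v in values) / (reps - 1)
        results[name] = (mean, var)
    return results

results = simulate_estimators(theta=1.0, n=5, reps=20000)
for name, (mean, var) in results.items():
    print(f"{name:10s} mean={mean:.4f} var={var:.4f}")
```

With these settings the simulated mean of the MLE should be near $n\theta/(n+1) = 5/6$, while the two unbiased estimators should average near 1, with the corrected maximum showing the smaller variance.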

Exercise 9. A random sample $X_1, X_2, \ldots, X_n$ of size $n$ is taken from a Poisson distribution with a mean of $\lambda$, $0 < \lambda < \infty$.
(a) Show that the maximum likelihood estimator for $\lambda$ is $\hat{\lambda} = \bar{X}$.
(b) Let X equal the number of flaws per 100 feet of a used computer tape. Assume that X has a
Poisson distribution with a mean of λ. If 40 observations of X yielded 5 zeros, 7 ones, 12 twos,
9 threes, 5 fours, 1 five, and 1 six, find the maximum likelihood estimate of λ.

Solution :
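(a) Show that the maximum likelihood estimator for $\lambda$ is $\hat{\lambda} = \bar{X}$.
We have $X_1, \ldots, X_n \sim \mathrm{Poi}(\lambda)$.
Then $p(x_i) = \dfrac{e^{-\lambda}\lambda^{x_i}}{x_i!}$ for $x_i = 0, 1, 2, \ldots$
The likelihood function is
$$L(x; \lambda) = \frac{e^{-n\lambda}\lambda^{\sum_{i=1}^{n} x_i}}{x_1!\,x_2! \ldots x_n!}$$
So,
$$\ln L(x; \lambda) = -n\lambda + \left(\sum_{i=1}^{n} x_i\right)\ln\lambda - \ln\left(x_1!\,x_2! \ldots x_n!\right)$$
Then,
$$\frac{\partial}{\partial\lambda}\ln L(x; \lambda) = -n + \frac{\sum_{i=1}^{n} x_i}{\lambda}$$
• $\dfrac{\partial}{\partial\lambda}\ln L(x; \lambda) = 0 \iff -n + \dfrac{\sum_{i=1}^{n} x_i}{\lambda} = 0 \iff \lambda = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Therefore, $\hat{\lambda}_{MLE} = \bar{X}$ ■
(b) Find the maximum likelihood estimate of $\lambda$.
We have
$$\hat{\lambda} = \frac{1}{n}\sum_{i=1}^{40} x_i = \frac{0(5) + 1(7) + 2(12) + 3(9) + 4(5) + 5(1) + 6(1)}{40} = \frac{89}{40}$$
Then, $\hat{\lambda} = 2.225$ flaws per 100 feet.
Therefore, the point estimate of $\lambda$ is 2.225 flaws per 100 feet ■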
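
The estimate in part (b) can be computed directly from the frequency counts, and the log-likelihood can be evaluated to confirm that the sample mean is a maximum. This is a sketch; the names `counts` and `log_likelihood` are illustrative.

```python
import math

# Observed number of flaws -> how many of the 40 observations took that value.
counts = {0: 5, 1: 7, 2: 12, 3: 9, 4: 5, 5: 1, 6: 1}

n = sum(counts.values())
total = sum(x * f for x, f in counts.items())
lam_hat = total / n  # MLE of lambda = sample mean

def log_likelihood(lam):
    """Poisson log-likelihood of the grouped sample at rate lam."""
    return sum(f * (-lam + x * math.log(lam) - math.lgamma(x + 1))
               for x, f in counts.items())

print(n, total, lam_hat)
print(log_likelihood(lam_hat), log_likelihood(lam_hat - 0.1), log_likelihood(lam_hat + 0.1))
```

The log-likelihood at `lam_hat` is larger than at the two neighbouring values, consistent with the derivation in part (a).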

Exercise 10. Let $f(x) = (1/\theta)\,x^{(1-\theta)/\theta}$, $0 < x < 1$, $0 < \theta < \infty$.
(a) Show that the maximum likelihood estimator of $\theta$ is $\hat{\theta} = -(1/n)\sum_{i=1}^{n}\ln X_i$.
(b) Show that $E(\hat{\theta}) = \theta$ and thus that $\hat{\theta}$ is an unbiased estimator of $\theta$.

Solution :
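(a) Show that the maximum likelihood estimator of $\theta$ is $\hat{\theta} = -\dfrac{1}{n}\sum_{i=1}^{n}\ln(X_i)$.
We have $f(x_i; \theta) = \dfrac{1}{\theta}\,x_i^{\frac{1-\theta}{\theta}}$, $0 < x_i < 1$ and $0 < \theta < \infty$.
The likelihood function is
$$L(x; \theta) = \prod_{i=1}^{n} f(x_i; \theta) = \frac{1}{\theta^n}\prod_{i=1}^{n} x_i^{\frac{1-\theta}{\theta}}$$
Then,
$$\ln L(x; \theta) = -n\ln(\theta) + \frac{1-\theta}{\theta}\sum_{i=1}^{n}\ln(x_i)$$
$$\frac{\partial}{\partial\theta}\ln L(x; \theta) = -\frac{n}{\theta} - \frac{1}{\theta^2}\sum_{i=1}^{n}\ln(x_i)$$
$$\frac{\partial}{\partial\theta}\ln L(x; \theta) = 0 \iff -\frac{n}{\theta} - \frac{1}{\theta^2}\sum_{i=1}^{n}\ln(x_i) = 0$$
Then, $\theta = -\dfrac{1}{n}\sum_{i=1}^{n}\ln(x_i)$

Therefore, $\hat{\theta}_{MLE} = -\dfrac{1}{n}\sum_{i=1}^{n}\ln(x_i)$ ■
(b) Show that $E(\hat{\theta}) = \theta$.
$$E\left(-\frac{1}{n}\sum_{i=1}^{n}\ln(X_i)\right) = -\frac{1}{n}\sum_{i=1}^{n}E(\ln X_i)$$
We have
$$E(\ln X) = \int_0^1 \ln x\, f(x; \theta)\,dx = \frac{1}{\theta}\int_0^1 x^{\frac{1-\theta}{\theta}}\ln x\,dx$$
Change variable: let $u = \ln x$, so $du = \dfrac{1}{x}\,dx$, that is, $x = e^u$ and $dx = e^u\,du$.
Then
$$E(\ln X) = \frac{1}{\theta}\int_{-\infty}^{0} u\,e^{u/\theta}\,du = -\theta \quad\text{(integrating by parts)}$$
Therefore,
$$E\left(-\frac{1}{n}\sum_{i=1}^{n}\ln(X_i)\right) = -\frac{1}{n}\sum_{i=1}^{n}E(\ln X_i) = -\frac{1}{n}\cdot n\cdot(-\theta) = \theta$$

Therefore, $E(\hat{\theta}) = \theta$ ■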

Exercise 11. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the exponential distribution whose pdf is $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$, $0 < \theta < \infty$.
(a) Show that $\bar{X}$ is an unbiased estimator of $\theta$.
(b) Show that the variance of $\bar{X}$ is $\theta^2/n$.
(c) What is a good estimate of $\theta$ if a random sample of size 5 yielded the sample values 3.5, 8.1, 0.9, 4.4, and 0.5?

Solution :

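(a) Show that $\bar{X}$ is an unbiased estimator of $\theta$.

We have $E(\bar{X}) = \dfrac{1}{n}\sum_{i=1}^{n}E(X_i)$, and $X_1, \ldots, X_n \sim \mathrm{Exp}(\theta)$.

Then, $E(\bar{X}) = E(X)$ where $E(X) = \theta$.

Therefore, it is an unbiased estimator of $\theta$ ■
(b) Show that the variance of $\bar{X}$ is $\dfrac{\theta^2}{n}$.
By independence, $V(\bar{X}) = \dfrac{1}{n^2}\sum_{i=1}^{n}V(X_i)$.
Since $X_1, \ldots, X_n \sim \mathrm{Exp}(\theta)$, $V(X) = \theta^2$.
Then, $V(\bar{X}) = \dfrac{1}{n}V(X) = \dfrac{\theta^2}{n}$
Therefore, the variance of $\bar{X}$ is $\dfrac{\theta^2}{n}$ ■
(c) What is a good estimate of $\theta$ if a random sample of size 5 yielded the sample values 3.5, 8.1, 0.9, 4.4, and 0.5?
By the Cramer-Rao inequality, an unbiased estimator $\hat{\theta}$ is efficient when its variance attains the lower bound, $V(\hat{\theta}) = \dfrac{1}{nI(\theta)}$, where
$$I(\theta) = -E\left(\frac{\partial^2}{\partial\theta^2}\ln f(X; \theta)\right)$$
We have
$$\ln f(x; \theta) = \ln\left(\frac{1}{\theta}e^{-x/\theta}\right) = -\ln\theta - \frac{x}{\theta}$$
Then
$$\frac{\partial}{\partial\theta}\ln f(x; \theta) = -\frac{1}{\theta} + \frac{x}{\theta^2}$$
Then
$$\frac{\partial^2}{\partial\theta^2}\ln f(x; \theta) = \frac{1}{\theta^2} - \frac{2x}{\theta^3}$$
So,
$$I(\theta) = -E\left(\frac{1}{\theta^2} - \frac{2X}{\theta^3}\right) = -\frac{1}{\theta^2} + \frac{2\theta}{\theta^3} = \frac{1}{\theta^2}$$
Then the lower bound is $\dfrac{1}{nI(\theta)} = \dfrac{\theta^2}{n}$, which equals $V(\bar{X})$ from part (b).
In conclusion, $\bar{X}$ is an unbiased estimator that attains the bound, so it is a good estimator of $\theta$. Using the given data we get $\bar{x} = \dfrac{3.5 + 8.1 + 0.9 + 4.4 + 0.5}{5} = 3.48$
Therefore, a good estimate is 3.48 ■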

Exercise 12. A diagnostic test for a certain disease is applied to n individuals known to not have
the disease. Let X = the number among the n test results that are positive (indicating presence
of the disease, so X is the number of false positives) and p = the probability that a disease-free
individual’s test result is positive (i.e., p is the true proportion of test results from disease-free
individuals that are positive). Assume that only X is available rather than the actual sequence of
test results.
(a) Derive the maximum likelihood estimator of p. If n = 20 and x = 3, what is the estimate?
(b) Is the estimator of part (a) unbiased?
(c) If $n = 20$ and $x = 3$, what is the MLE of the probability $(1-p)^5$ that none of the next five tests done on disease-free individuals are positive?

Solution :

Let X = the number among the n test results that are positive.
p = the probability that a disease-free individual’s test result is positive.
(a) Derive the maximum likelihood estimator of $p$. If $n = 20$ and $x = 3$, what is the estimate?
We have $X \sim \mathrm{Bin}(n, p)$.
So, $P(X = x) = C(n, x)\,p^x(1-p)^{n-x}$
and the likelihood function is $L(x; p) = C(n, x)\,p^x(1-p)^{n-x}$.
Then,
$$\ln L(x; p) = \ln\left(C(n, x)\,p^x(1-p)^{n-x}\right) = \ln C(n, x) + x\ln p + (n-x)\ln(1-p)$$
So, $\dfrac{\partial}{\partial p}\ln L(x; p) = \dfrac{x}{p} - \dfrac{n-x}{1-p}$
• $\dfrac{\partial}{\partial p}\ln L(x; p) = 0 \iff \dfrac{x}{p} - \dfrac{n-x}{1-p} = 0$, then $\hat{p} = \dfrac{x}{n}$
Therefore, $\hat{p} = \dfrac{x}{n}$ ■
Using the given data we get $\hat{p} = \dfrac{3}{20} = 0.15$
(b) Is the estimator of part (a) unbiased?
We have $\hat{p} = \dfrac{X}{n}$. Then, $E(\hat{p}) = E\left(\dfrac{X}{n}\right) = \dfrac{1}{n}E(X) = \dfrac{np}{n} = p$
Therefore, it is unbiased ■
(c) What is the MLE of the probability $(1-p)^5$?
For $n = 20$ and $x = 3$, $\hat{p} = 0.15$. By the invariance principle, the MLE of $(1-p)^5$ is $(1-\hat{p})^5 = (1 - 0.15)^5 = 0.85^5 = 0.4437$
Therefore, $(1-\hat{p})^5 = 0.4437$ ■

Exercise 13. The shear strength of each of ten test spot welds is determined, yielding the following
data (psi):
392 376 401 367 389 362 409 415 358 375
(a) Assuming that shear strength is normally distributed, estimate the true average shear strength
and standard deviation of shear strength using the method of maximum likelihood.
(b) Again assuming a normal distribution, estimate the strength value below which 95% of all welds will have their strengths. [Hint: What is the 95th percentile in terms of µ and σ? Now use the invariance principle.]

Solution :

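(a) Assuming that shear strength is normally distributed, estimate the true average shear strength and standard deviation of shear strength using the method of maximum likelihood.
For $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, the MLE of $\mu$ is the sample mean:
$$\hat{\mu} = \bar{X} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{3844}{10} = 384.4$$
For $\sigma$, we have
$$L(x; \sigma) = \left(2\pi\sigma^2\right)^{-\frac{n}{2}} \times e^{-\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma^2}}$$
$$\ln L(x; \sigma) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$
$$\frac{\partial}{\partial\sigma}\ln L(x; \sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2$$
• $\dfrac{\partial}{\partial\sigma}\ln L(x; \sigma) = 0 \iff -\dfrac{n}{\sigma} + \dfrac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$
Thus, replacing $\mu$ by $\hat{\mu}$,
$$\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2}$$
With $\sum_{i=1}^{10}(x_i - 384.4)^2 = 3556.4$, we get $\hat{\sigma} = \sqrt{\dfrac{3556.4}{10}} = \sqrt{355.64} = 18.86$.
Therefore, we have $\hat{\mu} = \bar{X} = 384.4$ and $\hat{\sigma} = 18.86$ ■

(b) Again assuming a normal distribution, estimate the strength value below which 95% of all welds will have their strengths. [Hint: What is the 95th percentile in terms of µ and σ? Now use the invariance principle.]
We want $c$ such that $P(X \le c) = 0.95$.
Standardizing, $P\left(\dfrac{X - \mu}{\sigma} \le \dfrac{c - \mu}{\sigma}\right) = 0.95$
So, $\Phi\left(\dfrac{c - \mu}{\sigma}\right) = 0.95$, which gives $\dfrac{c - \mu}{\sigma} = 1.645$, that is, $c = \mu + 1.645\sigma$.
Then, $\hat{c} = \hat{\mu} + 1.645\hat{\sigma} = 384.4 + 1.645(18.86)$ (by the invariance principle)
Therefore, the estimate of the strength value is $\hat{c} = 415.42$ psi ■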
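
As a numerical check of both parts, the sketch below computes the maximum likelihood estimates and the estimated 95th percentile. It assumes Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

strength = [392, 376, 401, 367, 389, 362, 409, 415, 358, 375]  # psi

n = len(strength)
mu_hat = sum(strength) / n
# MLE of sigma divides by n (not n - 1).
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in strength) / n)

z95 = NormalDist().inv_cdf(0.95)   # standard normal 95th percentile, about 1.645
c_hat = mu_hat + z95 * sigma_hat   # invariance principle

print(mu_hat, round(sigma_hat, 2), round(c_hat, 2))
```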
Exercise 14. At time t = 0, 20 identical components are tested. The lifetime distribution of each is
exponential with parameter λ. The experimenter then leaves the test facility unmonitored. On his
return 24 hours later, the experimenter immediately terminates the test after noticing that y = 15
of the 20 components are still in operation (so 5 have failed). Derive the MLE of λ. [Hint: Let
Y = the number that survive 24 hours. Then Y ∼ Bin(n, p). What is the mle of p ? Now notice
that p = P (Xi ≥ 24), where Xi is exponentially distributed. This relates λ to p, so the former can
be estimated once the latter has been.]

Solution :

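Let $T_i$ be the lifetime of the $i$-th component, $T_i \sim \mathrm{Exp}(\lambda)$. We assume that $\lambda$ denotes the rate, so the pdf is $\lambda e^{-\lambda t}$ for $t \ge 0$.

Derive the MLE of $\lambda$.
Let $Y$ = the number that survive 24 hours. Then $Y \sim \mathrm{Bin}(n, p)$
with $p = P(T_i \ge 24) = e^{-24\lambda}$.
Then,
$$p(y) = C(n, y)\,p^y(1-p)^{n-y}$$
where
$$\ln L(y, p) = \ln\left(C(n, y)\,p^y(1-p)^{n-y}\right) = \ln C(n, y) + y\ln p + (n-y)\ln(1-p)$$
• Setting $\dfrac{\partial}{\partial p}\ln L(y, p) = \dfrac{y}{p} - \dfrac{n-y}{1-p} = 0$
gives $\hat{p} = \dfrac{y}{n} = \dfrac{15}{20} = 0.75$.
Since $\lambda = -\dfrac{\ln p}{24}$, the invariance principle gives $\hat{\lambda} = -\dfrac{\ln 0.75}{24} \approx 0.0120$ per hour.
Therefore, $\hat{\lambda} = -\dfrac{\ln 0.75}{24} \approx 0.0120$ (equivalently, an estimated mean lifetime of $1/\hat{\lambda} \approx 83.4$ hours) ■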
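
The two-step estimate (first $p$, then $\lambda$ through the invariance principle) is short enough to compute directly. The sketch assumes $\lambda$ is the rate of the exponential distribution, so that $P(T \ge t) = e^{-\lambda t}$; under a mean parametrization the estimate would instead be the reciprocal, printed here as `mean_lifetime_hat`.

```python
import math

n = 20       # components put on test
y = 15       # components still operating after 24 hours
hours = 24

p_hat = y / n                          # MLE of the survival probability
lam_hat = -math.log(p_hat) / hours     # MLE of the rate, by invariance
mean_lifetime_hat = 1 / lam_hat        # corresponding estimate of the mean lifetime

print(p_hat, round(lam_hat, 4), round(mean_lifetime_hat, 1))
```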

Exercise 15. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathrm{Bin}(1, p)$ (i.e., $n$ Bernoulli trials). Thus,
$$Y = \sum_{i=1}^{n} X_i \sim \mathrm{Bin}(n, p)$$
(a) Show that $\bar{X} = Y/n$ is an unbiased estimator of $p$.
(b) Show that $\mathrm{Var}(\bar{X}) = p(1-p)/n$.
(c) Show that $E[\bar{X}(1-\bar{X})/n] = (n-1)\left[p(1-p)/n^2\right]$.
(d) Find the value of $c$ so that $c\bar{X}(1-\bar{X})$ is an unbiased estimator of $\mathrm{Var}(\bar{X}) = p(1-p)/n$.

Solution :

(a) Show that $\bar{X} = Y/n$ is an unbiased estimator of $p$.

$$E(\bar{X}) = E\left(\frac{Y}{n}\right) = \frac{1}{n}\sum_{i=1}^{n}E(X_i) = np \times \frac{1}{n} = p$$ ■

Note that $E(X_i) = p$ because each $X_i$ is Bernoulli distributed.

(b) Show that $\mathrm{Var}(\bar{X}) = \dfrac{p(1-p)}{n}$.

$$V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}V(X_i) = \frac{1}{n^2} \times np(1-p) = \frac{p(1-p)}{n}$$

Therefore, $\mathrm{Var}(\bar{X}) = \dfrac{p(1-p)}{n}$ ■
(c) Show that $E[\bar{X}(1-\bar{X})/n] = (n-1)\left[p(1-p)/n^2\right]$.
We have $E[\bar{X}(1-\bar{X})/n] = \dfrac{1}{n}E(\bar{X}) - \dfrac{1}{n}E\left(\bar{X}^2\right)$.
By the previous questions, $E(\bar{X}) = p$ and $E\left(\bar{X}^2\right) = \mathrm{Var}(\bar{X}) + E^2(\bar{X}) = \dfrac{p(1-p)}{n} + p^2$.
Then,
$$E[\bar{X}(1-\bar{X})/n] = \frac{p}{n} - \frac{p^2}{n} - \frac{p(1-p)}{n^2} = \frac{p(1-p)}{n} - \frac{p(1-p)}{n^2} = \frac{(n-1)\,p(1-p)}{n^2}$$

Therefore, $E[\bar{X}(1-\bar{X})/n] = (n-1)\left[p(1-p)/n^2\right]$ ■

(d) Find the value of $c$.
From (c), $E[\bar{X}(1-\bar{X})] = \dfrac{(n-1)\,p(1-p)}{n}$, so $E[c\bar{X}(1-\bar{X})] = \dfrac{p(1-p)}{n}$ requires $c(n-1) = 1$, and we obtain $c = \dfrac{1}{n-1}$.
Therefore, $c = \dfrac{1}{n-1}$ ■
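
The identities in parts (a), (c), and (d) can be verified exactly for particular values by summing over the binomial distribution with rational arithmetic. This is a sketch: the function name `expected_value` and the choice $n = 7$, $p = 3/10$ are illustrative assumptions, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def expected_value(n, p, g):
    """Exact E[g(Y/n)] for Y ~ Bin(n, p), with p given as a Fraction."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) * g(Fraction(y, n))
               for y in range(n + 1))

n, p = 7, Fraction(3, 10)

mean_xbar = expected_value(n, p, lambda xbar: xbar)
part_c = expected_value(n, p, lambda xbar: xbar * (1 - xbar) / n)
part_d = expected_value(n, p, lambda xbar: xbar * (1 - xbar) / (n - 1))

print(mean_xbar, part_c, part_d, p * (1 - p) / n)
```

Here `part_d` equals $p(1-p)/n$ exactly, which is what unbiasedness of $\bar{X}(1-\bar{X})/(n-1)$ for $\mathrm{Var}(\bar{X})$ means.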
Exercise 16. Assume that the number of defects in a car has a Poisson distribution with parameter
λ. To estimate λ we obtain the random sample X1 , X2 , . . . , Xn .
(a) Find the Fisher information in a single observation using two methods.
(b) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of λ.
(c) Find the MLE of λ and show that the MLE is an efficient estimator.

Solution :

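(a) Find the Fisher information in a single observation using two methods.
• First method.
$$I(\lambda) = -E\left(\frac{\partial^2}{\partial\lambda^2}\ln f(X; \lambda)\right), \qquad f(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}$$
Then $\ln f(x; \lambda) = -\lambda + x\ln\lambda - \ln x! \implies \dfrac{\partial^2}{\partial\lambda^2}\ln f(x; \lambda) = -\dfrac{x}{\lambda^2}$
So, $I(\lambda) = \dfrac{1}{\lambda^2}E(X) = \dfrac{1}{\lambda}$
Therefore, $I(\lambda) = \dfrac{1}{\lambda}$ ■
• Second method.
$$I(\lambda) = V\left(\frac{\partial}{\partial\lambda}\ln f(X; \lambda)\right) = V\left(-1 + \frac{X}{\lambda}\right) = \frac{V(X)}{\lambda^2} = \frac{1}{\lambda}$$
Therefore, $I(\lambda) = \dfrac{1}{\lambda}$ ■
(b) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of $\lambda$.
The lower bound is $\dfrac{1}{nI(\lambda)} = \dfrac{\lambda}{n}$ ■
(c) Find the MLE of $\lambda$ and show that the MLE is an efficient estimator.
As shown in Exercise 9, the MLE of $\lambda$ for a Poisson sample is $\hat{\lambda} = \bar{X}$ (it coincides with the method-of-moments estimator, obtained by equating the first sample moment $\bar{X}$ to the first population moment $E(X) = \lambda$).
It is unbiased, since $E(\bar{X}) = \lambda$, and we have
$$V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}V(X_i) = \frac{\lambda}{n}$$
which is equal to the lower bound of the Cramer-Rao inequality. Thus, it is an efficient estimator.

Therefore, the efficient estimator is $\hat{\lambda} = \bar{X}$ ■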
Exercise 17. Suppose the waiting time for a bus is uniformly distributed on [0, θ] and the results
x1 , . . . , xn of a random sample from this distribution have been observed.
(a) Find the MLE $\hat{\theta}$ of $\theta$.
(b) Letting $\tilde{\theta} = \dfrac{n+1}{n}\hat{\theta}$, show that $\tilde{\theta}$ is unbiased and find its variance.
(c) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of θ.

Solution :

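(a) Find the MLE $\hat{\theta}$ of $\theta$.

We have $X_1, \ldots, X_n \sim U[0, \theta]$ and the likelihood function $L(x; \theta) = \dfrac{1}{\theta^n}$ for $0 < x_i < \theta$, and zero otherwise.
The likelihood decreases in $\theta$, so to maximize it we choose the smallest $\theta$ compatible with the data, $\hat{\theta} = \max(x_i)$.
Therefore, the MLE is $\hat{\theta} = \max(x_i)$ ■
(b) Letting $\tilde{\theta} = \dfrac{n+1}{n}\hat{\theta}$, show that $\tilde{\theta}$ is unbiased and find its variance.
• Find the cdf of $\hat{\theta}$.
We have $P(\hat{\theta} < y) = P(\max(X_i) < y)$, and since $X_1, \ldots, X_n$ are independent,
we obtain $P(\hat{\theta} < y) = [P(X < y)]^n$.
Since $X$ is uniformly distributed, $P(X < y) = \dfrac{y}{\theta}$ for $0 \le y \le \theta$.
Therefore, $P(\hat{\theta} < y) = \left(\dfrac{y}{\theta}\right)^n$ ■
so its pdf is given by
$$f_{\hat{\theta}}(y) = \begin{cases} \dfrac{n y^{n-1}}{\theta^n}, & \text{for } 0 < y < \theta \\ 0, & \text{otherwise.} \end{cases}$$
Since $E(\hat{\theta}) = \displaystyle\int_0^{\theta} \frac{n y^{n}}{\theta^n}\,dy = \frac{n}{n+1}\theta$,
then $E\left(\dfrac{n+1}{n}\hat{\theta}\right) = \theta$.
Therefore, $\tilde{\theta} = \dfrac{n+1}{n}\hat{\theta}$ is an unbiased estimator of $\theta$ ■
• Find its variance.
$$V(\tilde{\theta}) = V\left(\frac{n+1}{n}\hat{\theta}\right) = \frac{(n+1)^2}{n^2}V(\hat{\theta}), \qquad V(\hat{\theta}) = E\left(\hat{\theta}^2\right) - E^2(\hat{\theta})$$
and
$$E\left(\hat{\theta}^2\right) = \int_0^{\theta} \frac{n y^{n+1}}{\theta^n}\,dy = \frac{n}{n+2}\theta^2$$
Thus, $V(\hat{\theta}) = \left(\dfrac{n}{n+2} - \dfrac{n^2}{(n+1)^2}\right)\theta^2$
so,
$$V(\tilde{\theta}) = \frac{(n+1)^2}{n^2}\left(\frac{n}{n+2} - \frac{n^2}{(n+1)^2}\right)\theta^2 = \frac{1}{n(n+2)}\theta^2$$ ■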

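(c) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of $\theta$.
We have $f(x; \theta) = \dfrac{1}{\theta}$ for $0 \le x \le \theta$, so $\ln f(x; \theta) = -\ln\theta$ and $\dfrac{\partial}{\partial\theta}\ln f(x; \theta) = -\dfrac{1}{\theta}$.
Using $I(\theta) = E\left[\left(\dfrac{\partial}{\partial\theta}\ln f(X; \theta)\right)^2\right]$ we get $I(\theta) = \dfrac{1}{\theta^2}$, so the formal lower bound is $\dfrac{1}{nI(\theta)} = \dfrac{\theta^2}{n}$.
Note that the second-derivative form gives $-E\left(\dfrac{\partial^2}{\partial\theta^2}\ln f(X; \theta)\right) = -\dfrac{1}{\theta^2}$, which is negative and disagrees with the first form. This happens because the support $[0, \theta]$ depends on $\theta$, so the regularity conditions behind the Cramer-Rao inequality fail. Indeed, the unbiased estimator of part (b) has variance $\dfrac{\theta^2}{n(n+2)}$, which is smaller than $\dfrac{\theta^2}{n}$ for $n > 1$.
Therefore, the formal lower bound is $\dfrac{\theta^2}{n}$, but the Cramer-Rao inequality does not apply to this distribution ■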
Exercise 18. An estimator θ^ is said to be consistent if for any ϵ > 0, P (|θ^ − θ| ≥ ϵ) → 0 as n → ∞.
That is, θ^ is consistent if, as the sample size gets larger, it is less and less likely that θ^ will be
further than ϵ from the true value of θ. Show that X̄ is a consistent estimator of µ when σ 2 < ∞
by using Chebyshev’s inequality.
Hint: (Chebyshev's inequality) Let $X$ be a random variable with finite expected value $\mu$ and finite non-zero variance $\sigma^2$. Then for any real number $k > 0$,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$
Solution :

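We will show that for any $\epsilon > 0$, $P(|\bar{X} - \mu| \ge \epsilon) \to 0$ as $n \to \infty$.

$\bar{X}$ has mean $\mu$ and variance $\sigma'^2 = V(\bar{X}) = \dfrac{\sigma^2}{n}$, which is finite since $\sigma^2 < \infty$. Applying Chebyshev's inequality to $\bar{X}$ with $k = \epsilon/\sigma'$:
$$P(|\bar{X} - \mu| \ge \epsilon) \le \frac{1}{(\epsilon/\sigma')^2} = \frac{\sigma'^2}{\epsilon^2}$$
and since $\sigma^2 = n\sigma'^2$,
$$P(|\bar{X} - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2}$$
As $n \to \infty$ the right-hand side tends to 0, so $P(|\bar{X} - \mu| \ge \epsilon) \to 0$.
Therefore, $\bar{X}$ is a consistent estimator of $\mu$ ■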
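
To see how quickly the Chebyshev bound shrinks, the sketch below evaluates $\sigma^2/(n\epsilon^2)$ for a few sample sizes. The function name and the values $\sigma = 2$, $\epsilon = 0.5$ are illustrative assumptions; the bound is capped at 1 because a probability cannot exceed 1.

```python
def chebyshev_bound(sigma, n, eps):
    """Upper bound on P(|X-bar - mu| >= eps) from Chebyshev's inequality."""
    return min(1.0, sigma**2 / (n * eps**2))

sigma, eps = 2.0, 0.5
bounds = {n: chebyshev_bound(sigma, n, eps) for n in (10, 100, 1000, 10000)}
for n, bound in bounds.items():
    print(n, bound)
```

The bound is uninformative for small $n$ (it equals 1 at $n = 10$ here) and then decreases in proportion to $1/n$, which is all the consistency argument needs.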
