
CHAPTER 1

Descriptive Statistics

1.1 Introduction
1.2 Basic concepts
1.3 Sampling schemes
1.4 Graphical representation of data
1.5 Numerical description of data
1.6 Computers and statistics
1.7 Chapter summary
1.8 Computer examples
Projects for Chapter 1
Chapter 1

Exercises 1.2

1.2.1
Suggested solutions:
Qualitative data include color, sex, race, and ZIP code. Quantitative data include age,
temperature, time, height, and weight. An example of cross-sectional data is the school
funding for each department in 2000. An example of time series data is the crude oil price
from 1995 to 2008.

1.2.2
Suggested solutions:
For qualitative data, we collect the frequency of each category and compare the categories
with a bar chart or a pie chart.
For quantitative data, we collect numerical measurements and examine their distribution
with a histogram.
For cross-sectional data, we collect data on different units at the same point in time and
compare them.
For time series data, we collect the same type of data at different points in time and look
for a trend or pattern over time.

1.2.3
Suggested questions:
1. What type of data are the amounts?
2. Do the federal agencies receive the same amount of money? If not, why?
3. Which federal agency should receive more money? Why?

Suggested inferences:

1. The federal agencies receive different amounts of money.
2. The differences in funding between the agencies are large.

1.2.4
Suggested questions:
1. Is the amount of money the same each year?
2. Should the proportions allocated to the agencies change?
3. Should the total amount increase?

Suggested inferences:

1. The total amount tends to stay the same.
2. The proportions allocated to the agencies tend to stay the same.
Exercises 1.3

1.3.1
For a stratified sample, suppose we decide to sample 100 college students from a
population of 1000 (that is, 10% of the population). These 1000 students come from three
different majors: Math (200 students), Computer Science (400), and Social Science (400).
We then select 10% of each major, that is, 20 Math, 40 CS, and 40 SS students, by simple
random sampling within each major.
For a cluster sample, suppose we decide to sample college students from a population of
2000. These 2000 students come from 20 different countries. We choose 3 of the 20
countries by random sampling and then collect information on every student from each of
the 3 chosen countries.
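
The stratified scheme described above can be sketched in a few lines of Python. This is only an illustration: the student IDs and the function name `stratified_sample` are hypothetical, and only the stratum sizes (200, 400, 400) and the 10% sampling fraction come from the solution.

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the given fraction from each stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)  # number of units to draw from this stratum
        sample[name] = rng.sample(members, k)  # sampling without replacement
    return sample

# Hypothetical student IDs; only the stratum sizes come from the solution.
strata = {
    "Math": [f"M{i}" for i in range(200)],
    "CS": [f"C{i}" for i in range(400)],
    "SS": [f"S{i}" for i in range(400)],
}
sample = stratified_sample(strata, 0.10)
sizes = {name: len(chosen) for name, chosen in sample.items()}
print(sizes)  # {'Math': 20, 'CS': 40, 'SS': 40}
```

Sampling within each major separately is what guarantees the 20/40/40 split; a single simple random sample of 100 from all 1000 students would match those proportions only on average.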

1.3.2
Answers will vary; try to cover all the possible sources of error studied in this chapter.

Exercises 1.4

1.4.1

Using Minitab:

(a) bar graph


(b) pie chart

1.4.2
(a) Bar graph

(b) Pareto graph


(c) Pie chart

1.4.3
(a) bar graph

(b) Pareto chart


(c) Pie chart

1.4.4
(a) Bar graph

(b) Pareto graph


1.4.5

(a) bar graph

(b) Pie chart

1.4.6

(a) pie chart


(b) bar graph

1.4.7

(a) Pie chart

(b) bar graph


1.4.8

(a) Bar chart

(b) Pareto chart

1.4.9

Bar chart
1.4.10

1.4.11

(a) bar graph

(b) Pareto graph


1.4.12
(a)
Expenditure
Bar graph

Revenues
Bar graph
(b)
Expenditure
Pie chart

Revenues
Pie chart

1.4.13

1.4.14
(a) Stem and leaf
Stem-and-Leaf Display: C1
Stem-and-leaf of C1 N = 40
Leaf Unit = 1.0
2 0 00
12 0 2222223333
13 0 5
20 0 6666677
20 0 888899
14 1 111
11 1 223333
5 1 55
3 1 677

(b) Histogram

(c) Pie chart

1.4.15
(a) Stem and leaf
Stem-and-leaf of C1 N = 20
Leaf Unit = 10

1 4 7
3 4 99
8 5 00011
10 5 22
10 5 4455
6 5 6667
2 5 9
1 6 0

(b) Histogram

(c) Pie chart

1.4.16
(a)
Frequency table

Interval   Frequency   Relative frequency   Percentage
5-9            1            0.04                 4
10-14          3            0.12                12
15-19          5            0.20                20
20-24         10            0.40                40
25-29          5            0.20                20
30-35          1            0.04                 4
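
The relative frequency and percentage columns follow directly from the frequencies. A minimal sketch, using only the intervals and counts listed in the table (the variable names are illustrative):

```python
# Class intervals and frequencies as listed in the table for 1.4.16(a).
intervals = ["5-9", "10-14", "15-19", "20-24", "25-29", "30-35"]
frequencies = [1, 3, 5, 10, 5, 1]

n = sum(frequencies)  # total number of observations
rows = []
for interval, f in zip(intervals, frequencies):
    rel = f / n  # relative frequency
    rows.append((interval, f, round(rel, 2), round(100 * rel)))

for interval, f, rel, pct in rows:
    print(f"{interval:>6} {f:>4} {rel:>6.2f} {pct:>4}")
```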

(b)
Histogram

1.4.17
Exercises 1.5

1.5.1
Mean is 165.6667 and standard deviation is 63.15397

1.5.2

(a) Mean is 7.013889 and standard deviation is 63.15397


(b) The lower quartile is 6.625, the median is 7.375, and the upper quartile is 7.5625. The
interquartile range is 0.9375. The lower limit for outliers is 5.21875 and the upper limit is
8.96875. Therefore, there are no outliers.
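
The outlier limits quoted in this chapter are consistent with the usual 1.5 × IQR rule: lower limit = Q1 − 1.5·IQR, upper limit = Q3 + 1.5·IQR. A small sketch that reproduces the numbers for this exercise (the function name `outlier_fences` is illustrative):

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (IQR, lower limit, upper limit) using the k * IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

# Quartiles taken from the solution to 1.5.2(b).
iqr, lower, upper = outlier_fences(6.625, 7.5625)
print(iqr, lower, upper)  # 0.9375 5.21875 8.96875
```

Any observation below the lower limit or above the upper limit would be flagged as an outlier.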

1.5.3
The required data set is 3, 3, 5, 13, with mean 6, median 4, and mode 3. The standard
deviation is 4.760952.
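
These values can be checked with Python's standard `statistics` module. Note that `statistics.stdev` computes the sample standard deviation (divisor n − 1), which is the version that matches the value above.

```python
import statistics

data = [3, 3, 5, 13]

mean = statistics.mean(data)
median = statistics.median(data)  # average of the two middle values, 3 and 5
mode = statistics.mode(data)
sd = statistics.stdev(data)  # sample standard deviation, divisor n - 1

print(mean, median, mode, round(sd, 6))
```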

1.5.4
(a) The mean is 1243.5, the variance is 792365.8, and the range is 2621.

(b) The lower quartile is 532.5, the median is 1083.5, and the upper quartile is 1814.25. The
interquartile range is 1281.75. The lower limit for outliers is -1390.125 and the upper limit
is 3736.875. Therefore, there are no outliers.

(c) Box plot

1.5.5

(a) The lower quartile is 80, the median is 95, and the upper quartile is 115. The
interquartile range is 35. The lower limit for outliers is 27.5 and the upper limit is 167.5.

(b) Box plot

(c) No observation falls outside the limits 27.5 and 167.5, so there are no outliers.

1.5.6
Mean is 11.8
N=50
Sample Variance is 34.653
Sample Standard Deviation is 5.887

1.5.7
(a)

(b)

$\sum_{i} f_i (m_i - \bar{x}) = 5(2-11.8) + 14(7-11.8) + 15(12-11.8) + 10(17-11.8) + 6(22-11.8) = 0$
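
The identity in (b) can be checked numerically. The sketch below reads the frequencies (5, 14, 15, 10, 6) and class midpoints (2, 7, 12, 17, 22) off the expression above; treating them as the grouped data behind the mean 11.8 is an assumption, although the same values also reproduce the mean, variance, and standard deviation reported in 1.5.6.

```python
import math

# Frequencies and midpoints as they appear in the expression in 1.5.7(b).
freqs = [5, 14, 15, 10, 6]
mids = [2, 7, 12, 17, 22]

n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, mids)) / n

# Frequency-weighted deviations from the mean always sum to zero
# (up to floating-point rounding).
dev_sum = sum(f * (m - mean) for f, m in zip(freqs, mids))

# Grouped-data sample variance with divisor n - 1.
var = sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / (n - 1)
sd = math.sqrt(var)

print(n, mean, round(dev_sum, 10), round(var, 3), round(sd, 3))
```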

1.5.8
1.5.9
(a) The mean is 33.105, the variance is 177.0430, and the range is 48.19.
(b) The lower quartile is 24.9225, the median is 32, and the upper quartile is 42.985. The
interquartile range is 18.0625. The lower limit for outliers is -2.17125 and the upper limit
is 70.07875. Therefore, there are no outliers.

(c) Box plot

(d) Histogram of y

(e) The mean is $\bar{x} = 33.10$ and the standard deviation is $s = 13.3058$.

$\bar{x} \pm s = [19.7942, 46.4058]$ contains 21 of the 32 data points, about 65.6%.
$\bar{x} \pm 2s = [6.4884, 59.7116]$ contains all 32 data points, 100%.
$\bar{x} \pm 3s = [-6.8174, 73.0174]$ contains all 32 data points, 100%.
These differ somewhat from the empirical rule (ER):
within 1 standard deviation: 65.6% observed, 68% under the ER
within 2 standard deviations: 100% observed, 95% under the ER
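
The three intervals are simply the mean plus or minus one, two, and three standard deviations. A short sketch using the rounded mean 33.10 and standard deviation 13.3058 quoted in this solution (the function name is illustrative; the raw data are not reproduced here, so the counts 21 and 32 are taken from the solution rather than recomputed):

```python
def k_sigma_interval(mean, sd, k):
    """Return the interval [mean - k*sd, mean + k*sd], rounded to 4 decimals."""
    return (round(mean - k * sd, 4), round(mean + k * sd, 4))

mean, sd = 33.10, 13.3058
intervals = {k: k_sigma_interval(mean, sd, k) for k in (1, 2, 3)}
for k, (lo, hi) in intervals.items():
    print(k, lo, hi)

# Observed share within one standard deviation: 21 of the 32 points.
share_1sd = 21 / 32
print(round(100 * share_1sd, 1))
```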

1.5.10
(a) The mean is 8.34, the variance is 24.21477, and the range is 16.7.
(b) The lower quartile is 3.65, the median is 8.1, and the upper quartile is 12.55. The
interquartile range is 8.9. The lower limit for outliers is -9.7 and the upper limit is 25.9.
Therefore, there are no outliers.

(c) Box plot

(d) Histogram of y

(e) The mean is $\bar{x} = 8.34$ and the standard deviation is $s = 4.92$.

$\bar{x} \pm s = [3.42, 13.26]$ contains 24 of the 40 data points, 60%.
$\bar{x} \pm 2s = [-1.50, 18.18]$ contains all 40 data points, 100%.
$\bar{x} \pm 3s = [-6.42, 23.10]$ contains all 40 data points, 100%.

These differ somewhat from the empirical rule (ER):

within 1 standard deviation: 60% observed, 68% under the ER
within 2 standard deviations: 100% observed, 95% under the ER

1.5.11

(a) Mean is 110, Standard Deviation is 83.4847.

(b) Assuming a bell-shaped distribution, the empirical rule gives:

- approximately 68% of the data lie in the interval $\bar{x} \pm s = [26.5153, 193.4847]$

- approximately 95% of the data lie in the interval $\bar{x} \pm 2s = [-56.9694, 276.9694]$

- approximately 99.7% of the data lie in the interval $\bar{x} \pm 3s = [-140.4541, 360.4541]$

1.5.12

(a) The lower quartile is 39, the median is 41, and the upper quartile is 46. The
interquartile range is 7. The mean is 41.8 and the standard deviation is 11.30192.

(b)

(c) Box plot

(d)
The lower limit for outliers is 28.5 and the upper limit is 56.5. Therefore, there are two
outliers: 18 and 60.

1.5.13
(a) Mean is 3.7433, Variance is 3.501 and Standard Deviation is 1.871323.
(b) Frequency table

Class   Interval   Frequency (f_i)   Midpoint (m_i)   m_i f_i
1       0-1.6            4               0.8             3.2
2       1.7-3.3         10               2.5            25
3       3.4-5            9               4.2            37.8
4       5.1-6.7          5               5.9            29.5
5       6.8-8.4          2               7.6            15.2

(c)
From the grouped data, the mean is 3.69, the variance is 3.62, and the standard deviation is 1.9.
These results are close to those from the ungrouped data.

1.5.14
(a) Mean is 60.47, Variance is 685.0851 and Standard Deviation is 26.17413.
(b)

SN Class Frequency
1 0-19 1
2 20-39 6
3 40-59 8
4 60-79 5
5 80-99 10

(c)
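Class   Interval    f_i   m_i   m_i f_i   m_i^2 f_i
0-19    [0, 20)       1    10      10        100
20-39   [20, 40)      6    30     180       5400
40-59   [40, 60)      8    50     400      20000
60-79   [60, 80)      5    70     350      24500
80-99   [80, 100)    10    90     900      81000
Total              n = 30        1840     131000

The sample mean is:

$\bar{x} = \frac{\sum f_i m_i}{n} = \frac{1840}{30} = 61.333$

The sample variance is:

$s^2 = \frac{\sum m_i^2 f_i - \left(\sum f_i m_i\right)^2 / n}{n - 1} = \frac{131000 - 1840^2/30}{29} = 625.7471$

The sample standard deviation is:

$s = \sqrt{625.7471} = 25.0149$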
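
The grouped-data results for (c) can be reproduced with the computational (shortcut) form of the variance, which uses only the column totals of the table. A minimal sketch with the frequencies and midpoints from the table; the variable names are illustrative:

```python
import math

# Frequencies and class midpoints from the table in 1.5.14(c).
freqs = [1, 6, 8, 5, 10]
mids = [10, 30, 50, 70, 90]

n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, mids))        # column total of m_i f_i
sum_fm2 = sum(f * m * m for f, m in zip(freqs, mids))   # column total of m_i^2 f_i

mean = sum_fm / n
var = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)  # shortcut formula, divisor n - 1
sd = math.sqrt(var)

print(n, sum_fm, sum_fm2)
print(round(mean, 3), round(var, 4), round(sd, 4))
```

The grouped values differ slightly from the raw-data values in (a) because every observation in a class is replaced by the class midpoint.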

1.5.15
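Class    f_i        Cumulative f_i   Cumulative f_i / n
10-14        895           895            0.0017
15-19     55,373        56,268            0.1093
20-24    122,591       178,859            0.3475
25-29    139,615       318,474            0.6188
30-34    127,502       445,976            0.8665
35-39     68,685       514,661            1.0000

The median class is 25-29, the first class whose cumulative relative frequency reaches 0.5.
With lower class limit $L = 25$, class width $w = 4$, median-class frequency $f_m = 139615$,
cumulative frequency before the median class $C_b = 178859$, and $n = 514661$, the median is:

$M = L + \frac{w}{f_m}\left(0.5n - C_b\right) = 25 + \frac{4}{139615}\left(0.5 \times 514661 - 178859\right) = 27.248$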
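
One way to reproduce the value 27.248 is linear interpolation within the median class, M = L + (w / f_m)(0.5 n − C_b). The sketch below assumes L = 25 and w = 4 for the class 25-29, because those are the values that recover the reported median; other conventions for the class boundary and width give a slightly different answer. The function name is illustrative.

```python
from itertools import accumulate

def grouped_median(lower, width, f_median, cum_before, n):
    """Interpolated median for grouped data: L + (w / f_m) * (0.5 * n - C_b)."""
    return lower + (width / f_median) * (0.5 * n - cum_before)

# Class frequencies from the table (classes 10-14, 15-19, ..., 35-39).
freqs = [895, 55373, 122591, 139615, 127502, 68685]
n = sum(freqs)
cum = list(accumulate(freqs))            # cumulative frequencies
cum_rel = [round(c / n, 4) for c in cum]  # cumulative relative frequencies

# The median class is the first one whose cumulative frequency reaches n / 2.
idx = next(i for i, c in enumerate(cum) if c >= 0.5 * n)
cum_before = cum[idx - 1]

# Assumed lower limit 25 and width 4 for the median class 25-29.
median = grouped_median(25, 4, freqs[idx], cum_before, n)
print(idx, cum_before, round(median, 3))
```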

1.5.16
(a) Mean is 177.5, Variance is 134.694 and Standard Deviation is 11.6058.
(b)

1.5.17
(a) Mean is 44.27, Variance is 536.15 and Standard Deviation is 23.15.
(b)

Exercises 1.8

1.8.1
(a) Histogram of y

(b) Mean is 74.0625, median is 74, variance is 7.223892 and standard deviation is 2.68773.
(c) Box plot

The lower limit for outliers is 66 and the upper limit is 82. Therefore, there are no
outliers.

1.8.2

(a) Histogram of y
(b) Mean is 20.16667, median is 18, variance is 125.7299 and standard deviation is 11.21293.

(c) Box plot

The lower limit for outliers is -16.25 and the upper limit is 57.75. Therefore, there are no
outliers.
