Chapter 2
DESCRIPTIVE STATISTICS
Outline
In a study, only certain characteristics of the objects in a population
are of interest. A characteristic can be
- size of pizza (S, M, L)
- opinion of a product (like very much, like somewhat, neutral,
  dislike somewhat, dislike)
- number of flaws on the surface of each casting
- thickness of each capsule wall
Basic Concepts
Example Revisit
Measures
1. Frequency Distribution
A listing of the distinct values (classes) and their frequencies
(counts).
2. Relative Frequency Distribution
A listing of the distinct values and their relative frequencies
(proportions, often reported as percentages).
Hence
Frequency = $f_i$,
Relative Frequency = $f_i / n$,
where
$f_i$: frequency of the $i$th value
$n$: total number of observations
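The two definitions can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `frequency_table` and the sample responses are hypothetical, invented only to show the computation of $f_i$ and $f_i / n$.

```python
from collections import Counter

def frequency_table(observations):
    """Map each distinct value to (frequency f_i, relative frequency f_i / n)."""
    n = len(observations)            # total number of observations
    counts = Counter(observations)   # f_i for each distinct value
    return {value: (f, f / n) for value, f in counts.items()}

# Hypothetical responses, invented for illustration only
responses = ["D", "R", "R", "O", "D", "R", "O", "R"]
table = frequency_table(responses)

for value, (f, rel) in sorted(table.items()):
    print(f"{value}: f = {f}, f/n = {rel:.3f}")
```

The relative frequencies always sum to 1, which is a quick sanity check on any such table.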
Example
Students in an introductory stats class were asked for their political party
affiliation: Democratic (D), Republican (R), and Other (O)
1. Pie chart
widely used for showing fractions - percent of a whole
[Pie chart: Political Affiliation. Democratic 32.5%, Republican 45.0%, Other 22.5%]
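Since a pie chart shows each class as a fraction of the whole circle, the angle of each slice is the relative frequency times 360 degrees. A small sketch using the percentages from the chart (the helper name `slice_angles` is an assumption made for this illustration):

```python
# Percentages taken from the Political Affiliation pie chart
percentages = {"Democratic": 32.5, "Republican": 45.0, "Other": 22.5}

def slice_angles(percent_by_class):
    """Convert each class percentage into the central angle of its slice (degrees)."""
    return {label: pct * 360 / 100 for label, pct in percent_by_class.items()}

angles = slice_angles(percentages)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

The angles add up to 360 degrees exactly when the percentages add up to 100.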
Graphical Displays
2. Bar chart
Note:
- Bars do not touch each other!
- The vertical axis can also be Frequency.
- The order of classes can be changed.
Measures
- Single-Value Grouping
  suitable for a discrete quantitative variable with few distinct values
- Cutpoint Grouping
  suitable for a continuous quantitative variable
Single-Value Grouping
Each class represents a single possible value
Example
The authors of “Behavioral Aspects of Raccoon Mating System”
monitored raccoons in Texas during the 1990-1992 mating seasons.
29 female raccoons were observed, and the number of male partners
of each was recorded:
Cutpoint Grouping
Example (Suspended solids)
The concentration of suspended solids in river waters is an
important environmental characteristic. The following observations
report concentrations in ppm for 50 different rivers
Some terms
- Class: represents values in an interval
  Ex. 20 - (30): 20 inclusive up to 30 exclusive
- Class width: difference between the cutpoints of a class
  Ex. 30 - 20 = 10
- Class midpoint: average of the two cutpoints of a class
  Ex. (20 + 30)/2 = 25
Group the data using classes of equal width, say 10 (other widths can also be used).
- The minimum value 27.1 must be included in the first class, so we start with the
  class 20 - (30).
  30 is not included in this class!
- The maximum value 94.6 must be included in the last class, so we end with the
  class 90 - (100).
  100 is not included in this class!
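The grouping rule above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function name `cutpoint_classes` is hypothetical, the width is taken to be a whole number, and the data list is invented (only its minimum 27.1 and maximum 94.6 match the slide). Each class contains its lower cutpoint and excludes its upper cutpoint.

```python
import math

def cutpoint_classes(data, width):
    """Group data into classes [lower, upper) of equal integer width.

    Returns a list of (lower cutpoint, upper cutpoint, frequency).
    """
    # First class starts at the multiple of `width` at or below the minimum
    lower = math.floor(min(data) / width) * width
    # Last class ends at the first multiple of `width` strictly above the maximum
    upper = (math.floor(max(data) / width) + 1) * width
    n_classes = (upper - lower) // width

    counts = [0] * n_classes
    for x in data:
        # A value equal to a cutpoint falls in the class that starts there
        counts[int((x - lower) // width)] += 1

    return [(lower + i * width, lower + (i + 1) * width, counts[i])
            for i in range(n_classes)]

# Hypothetical concentrations (ppm), invented for illustration only
data = [27.1, 30.0, 42.5, 47.8, 55.0, 61.3, 69.9, 94.6]
classes = cutpoint_classes(data, 10)

for lo, hi, f in classes:
    print(f"{lo} - ({hi}): {f}")
```

Note that the observation 30.0 is counted in the class 30 - (40), not in 20 - (30), matching the rule that the upper cutpoint is excluded.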
Graphical Displays
1. Histogram
Figure 1: (Frequency) Histogram
Figure 2: Relative Frequency Histogram
Distribution Shapes
Histogram vs. Bar chart
Graphical Displays
2. Ogive Plot (Cumulative Relative Distribution Plot)
- horizontal axis: class endpoints
- vertical axis: relative cumulative frequencies
The 2nd and 5th columns give the cumulative relative frequency distribution of the data
(and hence the points of the ogive plot).
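The points of an ogive can be computed by accumulating the class frequencies and dividing by the total. The sketch below assumes hypothetical class frequencies (invented for illustration; they are not the lecture's suspended solids data), and the name `ogive_points` is likewise an assumption.

```python
from itertools import accumulate

def ogive_points(cutpoints, frequencies):
    """Pair each class endpoint with the relative cumulative frequency below it."""
    n = sum(frequencies)
    # Nothing lies below the first cutpoint, so the curve starts at 0
    cumulative = [0] + list(accumulate(frequencies))
    return [(c, cf / n) for c, cf in zip(cutpoints, cumulative)]

cutpoints = [20, 30, 40, 50, 60, 70, 80, 90, 100]
frequencies = [2, 6, 10, 12, 8, 6, 4, 2]   # hypothetical counts, 50 in total

points = ogive_points(cutpoints, frequencies)
for endpoint, rel_cum in points:
    print(f"{endpoint}: {rel_cum:.2f}")
```

The curve always starts at 0 at the first cutpoint and reaches 1 at the last one.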
Two types of questions (ogive plot)
1. Value (horizontal axis) is given, what’s the percentage below (vertical axis)?
Example (Suspended solids)
What is the percentage of the data with level of suspended solids below 62 ppm?
[Figure: ogive. Horizontal axis: SS (ppm), 20 to 100; vertical axis: relative cumulative frequency, 0 to 1]
Two types of questions (ogive plot)
2. Percentage below (vertical axis) is given, what’s the value (horizontal axis)?
Example (Suspended solids)
What level of suspended solids splits the data so that 90% of the values lie below it
(equivalently, 10% above)?
[Figure: ogive. Horizontal axis: SS (ppm), 20 to 100; vertical axis: relative cumulative frequency, 0 to 1]
Approximately 79 ppm
Percentiles (using Ogives)
Percentile
The pth percentile is a number (on horizontal axis) with p% of
the values below it and (100-p)% above it.
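Reading an ogive amounts to linear interpolation between neighbouring points, in either direction. The sketch below illustrates both types of question under stated assumptions: the ogive points are hypothetical (not the lecture's data, so the answers differ from the values read off the slides), and the function names `fraction_below` and `percentile` are invented for this example.

```python
# Hypothetical ogive points: (class endpoint, relative cumulative frequency)
points = [(20, 0.00), (30, 0.04), (40, 0.16), (50, 0.36), (60, 0.60),
          (70, 0.76), (80, 0.88), (90, 0.96), (100, 1.00)]

def fraction_below(points, value):
    """Type 1: given a value on the horizontal axis, read the fraction below it."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    raise ValueError("value lies outside the range of the ogive")

def percentile(points, p):
    """Type 2: given p (in percent), read the value with p% of the data below it."""
    target = p / 100
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)
    raise ValueError("p must be between 0 and 100")

below_62 = fraction_below(points, 62)   # fraction of data below 62 ppm
p90 = percentile(points, 90)            # 90th percentile
print(f"fraction below 62 ppm: {below_62:.3f}")
print(f"90th percentile: {p90:.1f} ppm")
```

With these made-up points, about 63% of the values lie below 62 ppm and the 90th percentile is about 82.5 ppm. Interpolation assumes the observations are spread evenly within each class, so the results are approximations.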
Example (Suspended solids)
Population and Sample Distributions
- The distribution of a population is called the population
  distribution.
- The distribution of a sample is called the sample distribution.
IMPORTANT!!
Sample distributions vary from sample to sample.
The population distribution is unique.