STA 308: Chapter 2

DESCRIPTIVE STATISTICS

Outline

Variable and Data

Techniques for Qualitative Variables

Techniques for Quantitative Variables

In a study, only certain characteristics of the objects in a population
are of interest. A characteristic can be, for example:
• size of pizza (S, M, L)
• opinion of a product (like very much, somewhat like, neutral,
  somewhat do not like, dislike)
• number of flaws on the surface of each casting
• thickness of each capsule wall

Basic Concepts

• Variable: a characteristic whose value may change from one object
  to another.
    • Quantitative (numerical)
    • Qualitative (categorical)
• Data: information obtained by observing values of a variable.

The key difference between quantitative and qualitative variables:
quantitative data have numerical meaning, so basic arithmetic
operations (+, -, ×, ...) can be meaningfully applied to them!

Q: How about zip code?

Qualitative! It does not have numerical meaning (adding two zip codes is
not meaningful).

Basic Concepts

Quantitative variables can be further classified into two types:
• Discrete: the possible values can be listed (counted), even though
  the list may continue infinitely
• Continuous: the possible values form some interval of numbers

The statistical technique employed depends on the type of data!

Example Revisited

• size of pizza (S, M, L)
  Qualitative
• opinion of a product (like very much, somewhat like, neutral,
  somewhat do not like, dislike)
  Qualitative
• number of flaws on the surface of each casting
  Quantitative, Discrete
• thickness of each capsule wall
  Quantitative, Continuous

Techniques for Qualitative Variables

Measures

1. Frequency distribution:
a listing of the distinct values (classes) and their frequencies
(counts).
2. Relative frequency distribution:
a listing of the distinct values and their relative frequencies
(percentages).

Hence
    Frequency = $f_i$,
    Relative frequency = $\frac{f_i}{n}$,
where
    $f_i$: frequency of the $i$th value
    $n$: total number of observations
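
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name frequency_table and the list of responses are hypothetical and are not the class data from the slides.

```python
from collections import Counter

def frequency_table(observations):
    """Return {value: (frequency, relative frequency)} for a list of observations."""
    n = len(observations)           # n: total number of observations
    counts = Counter(observations)  # f_i for each distinct value
    return {value: (f, f / n) for value, f in counts.items()}

# Hypothetical responses (D, R, O), made up for illustration only.
responses = ["D", "R", "R", "O", "D", "R", "O", "R"]
table = frequency_table(responses)
for value, (f, rel) in sorted(table.items()):
    print(f"{value}: frequency = {f}, relative frequency = {rel:.3f}")
```

The relative frequencies always add up to 1, which is a quick sanity check on any table built this way.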
Example
Students in an introductory stats class were asked for their political party
affiliation: Democratic (D), Republican (R), or Other (O).

• The 1st and 2nd columns provide a frequency distribution of the data.
• The 1st and 3rd columns provide a relative frequency distribution of the data.
Graphical Displays

1. Pie chart
widely used for showing fractions (percent of a whole)

[Figure: pie chart "Political Affiliation": Democratic 32.5%, Republican 45.0%, Other 22.5%]

Graphical Displays
2. Bar chart

Note:
• Bars do not touch each other!
• The vertical axis can also be frequency.
• The order of classes can be changed.

Techniques for Quantitative Variables

Measures

Some measures used for qualitative variables can also be applied to
quantitative variables after additional effort (grouping):

• Single-value grouping:
  suitable for a discrete quantitative variable with few distinct values
• Cutpoint grouping:
  suitable for a continuous quantitative variable

Single-Value Grouping
Each class represents a single possible value
Example
The authors of “Behavioral Aspects of Raccoon Mating System”
monitored raccoons in Texas during the 1990-1992 mating seasons.
29 female raccoons were observed, and the number of male partners
of each was recorded:

Cutpoint Grouping
Example (Suspended solids)
The concentration of suspended solids in river waters is an
important environmental characteristic. The following observations
report concentrations in ppm for 50 different rivers

Some terms
• Class: represents values in an interval
  Ex. 20 - (30): from 20 (inclusive) up to 30 (exclusive)
• Class width: difference between the cutpoints of a class
  Ex. 30 - 20 = 10
• Class midpoint: average of the two cutpoints of a class
  Ex. (20 + 30)/2 = 25

Guidelines for grouping data


1. The number of classes should be small enough to provide an effective
   summary and large enough to display the relevant characteristics
2. Each observation must belong to one and only one class
3. Whenever feasible, all classes should have the same width

Group the data using classes of equal width, say 10 (other widths can also be used):
• The minimum value, 27.1, must be included in the first class, so we start with the
  class 20 - (30).
  30 is not included in this class!
• The maximum value, 94.6, must be included in the last class, so we end with the
  class 90 - (100).
  100 is not included in this class!
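
The grouping procedure above can be sketched in Python as follows. This is a minimal illustration, assuming equal-width classes that include the lower cutpoint and exclude the upper one. The function name cutpoint_classes is hypothetical, and only the values 27.1 and 94.6 come from the slides; the other observations are made up.

```python
import math

def cutpoint_classes(data, width):
    """Count observations in equal-width classes [lower, upper):
    the lower cutpoint is included, the upper cutpoint is excluded."""
    start = math.floor(min(data) / width) * width  # lower cutpoint of the first class
    counts = {}
    lower = start
    while lower <= max(data):                      # last class must contain the maximum
        counts[(lower, lower + width)] = 0
        lower += width
    for x in data:
        k = math.floor((x - start) / width)        # index of the class containing x
        counts[(start + k * width, start + (k + 1) * width)] += 1
    return counts

# 27.1 (minimum) and 94.6 (maximum) are from the slides; the rest is hypothetical.
data = [27.1, 30.0, 42.3, 47.8, 55.0, 61.2, 68.9, 94.6]
counts = cutpoint_classes(data, 10)
for (lower, upper), f in counts.items():
    midpoint = (lower + upper) / 2
    print(f"{lower} - ({upper}): midpoint = {midpoint}, frequency = {f}")
```

Note that the observation 30.0 is counted in the class 30 - (40), not in 20 - (30), so each observation belongs to exactly one class.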

Graphical Displays

1. Histogram

Figure 1: (Frequency) histogram
Figure 2: Relative frequency histogram

• The shapes of the two graphs are the same (frequencies and relative
  frequencies are proportional)
• Inspection of the histogram reveals the general pattern, or
  distribution, of the values

Distribution Shapes

• an important aspect of the distribution of quantitative data
• plays a role in determining the correct method of statistical
  analysis

Figure 3: symmetric
Figure 4: left skewed (left tailed, negatively skewed)
Figure 5: right skewed (right tailed, positively skewed)

Histogram vs. Bar chart

Histogram                                   Bar chart

• For quantitative data.                    • For qualitative data.
• No gaps between bars!                     • Bars do not touch each other!
• The order of classes can NOT be changed.  • The order of classes can be changed.

Graphical Displays

• Cumulative frequency: total number of observations whose values
  are less than a specified class end point
• Cumulative relative frequency
  $= \frac{\text{cumulative frequency}}{\text{total number of observations}}$

Example (Suspended solids)

Cumulative frequency of class 30 - (40) = 1 + 8 = 9.
Cumulative relative frequency of class 30 - (40) = 9/50 = 0.18.
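
The running totals above can be sketched in Python. This is an illustration under stated assumptions: only the first two class frequencies (1 and 8) and the total of 50 observations come from the slides, the remaining frequencies are hypothetical, and the function name cumulative_table is made up.

```python
from itertools import accumulate

def cumulative_table(frequencies):
    """Return (cumulative frequencies, cumulative relative frequencies)."""
    n = sum(frequencies)                         # total number of observations
    cumulative = list(accumulate(frequencies))   # running totals of the class frequencies
    relative = [c / n for c in cumulative]       # divide each running total by n
    return cumulative, relative

# Classes 20-(30), 30-(40), ..., 90-(100).
# Only the first two frequencies and the total n = 50 are from the slides;
# the other frequencies are hypothetical.
frequencies = [1, 8, 5, 8, 14, 10, 2, 2]
cumulative, relative = cumulative_table(frequencies)
for upper, c, r in zip(range(30, 101, 10), cumulative, relative):
    print(f"below {upper}: cumulative frequency = {c}, cumulative relative frequency = {r:.2f}")
```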

Graphical Displays
2. Ogive plot (cumulative relative distribution plot)
• horizontal axis: class end points
• vertical axis: cumulative relative frequencies

Example (Suspended solids)

The 2nd and 5th columns provide the cumulative relative distribution of the data
(and the points of the ogive plot).
[Figure: ogive plot. Horizontal axis: SS (ppm), 20 to 100. Vertical axis: relative cumulative frequency, 0 to 1.]

Use these three checks to verify that your plot is correct:

• The vertical axis always starts from 0! Thus, you need to create an extra first
  row (the 10 - (20) class on the previous slide)
• The vertical axis always ends at 1!
• The ogive plot is always non-decreasing!
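
The three checks can also be written as a small Python helper. This is a sketch only: the function name check_ogive and the ogive values are hypothetical, not the table from the slides.

```python
def check_ogive(end_points, cumulative_relative, tol=1e-9):
    """Return a list of problems; an empty list means all three checks pass."""
    problems = []
    if len(end_points) != len(cumulative_relative):
        problems.append("end points and cumulative values differ in length")
    if abs(cumulative_relative[0]) > tol:
        problems.append("does not start at 0 (add an extra first row)")
    if abs(cumulative_relative[-1] - 1) > tol:
        problems.append("does not end at 1")
    pairs = zip(cumulative_relative, cumulative_relative[1:])
    if any(later < earlier for earlier, later in pairs):
        problems.append("decreases somewhere")
    return problems

# Hypothetical ogive: class end points and cumulative relative frequencies.
end_points = [20, 30, 40, 50, 60, 70, 80, 90, 100]
ogive = [0.0, 0.02, 0.18, 0.28, 0.44, 0.72, 0.92, 0.96, 1.0]
print(check_ogive(end_points, ogive))                # no problems expected
print(check_ogive([20, 30, 40], [0.1, 0.5, 0.4]))    # violates all three checks
```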

Two types of questions (ogive plot)
1. Value (horizontal axis) is given, what’s the percentage below (vertical axis)?
Example (Suspended solids)
What is the percentage of the data with level of suspended solids below 62 ppm?

[Figure: ogive plot. Horizontal axis: SS (ppm), 20 to 100. Vertical axis: relative cumulative frequency, 0 to 1.]

Approximately 50% (0.5)

Two types of questions (ogive plot)
2. Percentage below (vertical axis) is given, what’s the value (horizontal axis)?
Example (Suspended solids)
What level of suspended solids splits the data so that 90% of the values lie below it
(equivalently, 10% above)?

[Figure: ogive plot. Horizontal axis: SS (ppm), 20 to 100. Vertical axis: relative cumulative frequency, 0 to 1.]

Approximately 79 ppm

Percentiles (using Ogives)

Percentile
The pth percentile is a number (on the horizontal axis) with p% of
the values below it and (100-p)% above it.
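
Reading an ogive in either direction amounts to linear interpolation between neighboring points. The Python sketch below illustrates both types of question. The function names and the cumulative counts are hypothetical (only the first two counts and n = 50 follow the slides); the counts were chosen so that the readings land close to the approximate answers quoted in the examples.

```python
def fraction_below(value, end_points, cum_rel):
    """Given a value on the horizontal axis, interpolate the fraction below it."""
    points = list(zip(end_points, cum_rel))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= value <= x1:
            return y0 + (value - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("value lies outside the range of the ogive")

def percentile(p, end_points, cum_rel):
    """Given p (0 to 100), interpolate the value with p% of the data below it."""
    target = p / 100
    points = list(zip(end_points, cum_rel))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= target <= y1:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("p must be between 0 and 100")

# Hypothetical cumulative counts at the end points 20, 30, ..., 100 (n = 50).
end_points = [20, 30, 40, 50, 60, 70, 80, 90, 100]
cum_rel = [c / 50 for c in [0, 1, 9, 14, 22, 36, 46, 48, 50]]

print(f"fraction below 62 ppm: {fraction_below(62, end_points, cum_rel):.3f}")
print(f"90th percentile: {percentile(90, end_points, cum_rel):.1f} ppm")
```

Interpolation assumes the observations are spread evenly within each class, so the results are approximations, just like readings taken from the plot by eye.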

Example (Suspended solids)

Population and Sample Distributions
• The distribution of a population is called the population
  distribution.
• The distribution of a sample is called the sample distribution.

IMPORTANT!!
Sample distributions vary from sample to sample.
The population distribution is unique.
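
A small simulation can make this point concrete. The Python sketch below draws two samples from one fixed population and prints the relative frequency distribution of each. Everything in it is an assumption for illustration: the population of 1000 affiliations, the sample size of 40, and the seed are all hypothetical.

```python
import random
from collections import Counter

def relative_frequencies(values):
    """Relative frequency distribution of a list of categorical values."""
    n = len(values)
    return {value: count / n for value, count in sorted(Counter(values).items())}

# Hypothetical population of 1000 party affiliations (not data from the slides).
population = ["D"] * 325 + ["R"] * 450 + ["O"] * 225

rng = random.Random(308)                 # fixed seed so the run is repeatable
sample_a = rng.sample(population, 40)    # two samples, each drawn without replacement
sample_b = rng.sample(population, 40)

print("population:", relative_frequencies(population))
print("sample A:  ", relative_frequencies(sample_a))
print("sample B:  ", relative_frequencies(sample_b))
```

The population line is the same on every run, whatever the seed, while the two sample lines will generally differ from the population and from each other.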