
PROBABILITY AND STATISTICS

LECTURE 2
DESCRIPTIVE STATISTICS II

Adapted from http://www.prenhall.com/mcclave

OUTLINE

1. Graphical methods for describing 2 variables


2. Numerical methods for describing data:
- Measures of central tendency
- Measures of variation
- Measures of relative standing

DESCRIPTIVE STATISTICS

Descriptive Statistics
- Graphical Methods
- Numerical Methods

Now, we will continue to explore graphical methods


GRAPHICAL METHODS FOR 2
CATEGORICAL VARIABLES

2 categorical variables
- Tabulating Data: Crosstabulation table
- Graphing Data: Clustered bar chart, Stacked bar chart

CROSSTABULATION TABLE

3 x 3 Crosstabulation Table for Investment Choices by Investor (values in $1000’s)

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                        46           55           27     128
Bonds                         32           44           19      95
Cash                          15           20           33      68
Total                         93          119           79     291

Q: Identify the number and type of variables.
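The totals in the crosstabulation table can be checked with a short sketch. The dictionary layout and the variable names are illustrative assumptions; the values are those of the table.

```python
# Crosstabulation of investment category by investor (values in $1000's),
# copied from the table in this lecture. The nested-dict layout is an
# illustrative choice, not part of the original slides.
table = {
    "Stocks": {"Investor A": 46, "Investor B": 55, "Investor C": 27},
    "Bonds":  {"Investor A": 32, "Investor B": 44, "Investor C": 19},
    "Cash":   {"Investor A": 15, "Investor B": 20, "Investor C": 33},
}
investors = ["Investor A", "Investor B", "Investor C"]

# Row totals: one total per investment category.
row_totals = {category: sum(row.values()) for category, row in table.items()}

# Column totals: one total per investor.
column_totals = {
    investor: sum(row[investor] for row in table.values())
    for investor in investors
}

# Grand total: the sum of all cells.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

The row and column totals are the margins of the table; a clustered or stacked bar chart plots the inner cells.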

CLUSTERED HORIZONTAL BAR CHART

Note that in this case, we can alternatively use a vertical chart.

STACKED BAR CHART

What is the variable “Investment category” represented by?

SMALL DISCUSSION: WHAT IS BAD ABOUT THE FOLLOWING GRAPH?

Note that for the graphs presented in lectures, titles and axis labels are sometimes omitted to save space. You should be aware that this is not good practice.

ACTIVITY

Exploring graphs via http://www.r-graph-gallery.com/all-graphs/

GRAPHICAL METHODS FOR 2
NUMERICAL VARIABLES

2 numerical
variables

Scatter plot

SCATTER PLOT

Used for paired observations taken from two numerical variables
2 axes:
- dependent variable
- independent variable

EXAMPLE
Average SAT scores by state: 1998

State         Verbal   Math
Alabama          562    558
Alaska           521    520
Arizona          525    528
Arkansas         568    555
California       497    516
Colorado         537    542
Connecticut      510    509
Delaware         501    493
D.C.             488    476
Florida          500    501
Georgia          486    482
Hawaii           483    513

ANALYZING SCATTER PLOT

Look for:
- Overall pattern: form, direction, strength
- Possible clusters/groups
- Possible outliers

Q: What is an outlier?
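Direction and strength can also be summarized with a number. The sketch below computes the Pearson correlation coefficient for the SAT data in the example; the coefficient is not introduced in this lecture and is used here only as an assumption about how one might quantify a linear pattern. The function name `pearson_r` is illustrative.

```python
from math import sqrt

# Average SAT scores by state (1998), from the example in this lecture.
verbal = [562, 521, 525, 568, 497, 537, 510, 501, 488, 500, 486, 483]
math_scores = [558, 520, 528, 555, 516, 542, 509, 493, 476, 501, 482, 513]

def pearson_r(x, y):
    """Pearson correlation: sign gives direction, magnitude gives strength."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(verbal, math_scores)
print(f"r = {r:.2f}")
```

A value close to +1 suggests a strong positive linear pattern. It says nothing about clusters or outliers, which still have to be looked for in the plot itself.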

DESCRIPTIVE STATISTICS

Descriptive Statistics
- Graphical Methods
- Numerical Methods

We will now explore numerical methods

NUMERICAL METHODS TO DESCRIBE AND SUMMARIZE DATA

Describing Data Numerically

Central Tendency     Variation              Relative Standing
Arithmetic Mean      Range                  Percentile
Median               Interquartile Range
Mode                 Variance
                     Standard Deviation

MEASURES OF CENTRAL
TENDENCY

ARITHMETIC MEAN

MEAN EXAMPLE
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7

How would the formula change if the data represent a population?
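As a minimal sketch, the sample mean is the sum of the observations divided by the sample size n. The variable names are illustrative.

```python
# Raw data from the mean example in this lecture.
data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

n = len(data)                 # sample size
sample_mean = sum(data) / n   # sum of observations divided by n

print(round(sample_mean, 2))
```

For a population the arithmetic is the same; by the usual convention only the notation changes (μ instead of x̄, N instead of n).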
MEDIAN

1. Measure of Central Tendency
2. Middle Value in Ordered Sequence
   - If n is odd, the middle value of the sequence
   - If n is even, the average of the 2 middle values
3. Position of Median (for sample): (n + 1)/2 in the ordered sequence
4. Not Affected by Extreme Values

MEDIAN EXAMPLE
ODD-SIZED SAMPLE
Raw Data: 24.1 22.6 21.5 23.7 22.6
Ordered:  21.5 22.6 22.6 23.7 24.1
Position: 1    2    3    4    5

MEDIAN EXAMPLE
EVEN-SIZED SAMPLE
Raw Data: 10.3 4.9  8.9  11.7 6.3  7.7
Ordered:  4.9  6.3  7.7  8.9  10.3 11.7
Position: 1    2    3    4    5    6
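The two median examples can be reproduced with a small sketch that follows the odd/even rule. The helper name `median` is illustrative; Python's standard library also provides `statistics.median`.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # odd n: single middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n: average of 2 middle values

odd_sample = [24.1, 22.6, 21.5, 23.7, 22.6]
even_sample = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

median_odd = median(odd_sample)    # value at position 3 of 5
median_even = median(even_sample)  # average of positions 3 and 4 of 6

print(median_odd, round(median_even, 2))
```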
MODE

1. Value That Occurs Most Often
2. Not Affected by Extreme Values
3. May Be No Mode or Several Modes
4. May Be Used for Numerical & Categorical Data

MODE EXAMPLE

No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
More Than 1 Mode
Raw Data: 21 28 28 41 43 43

Q: Can you give an example of a mode for categorical data?
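A minimal sketch of the three cases in the mode example. It assumes the convention that a data set in which every value occurs only once has no mode; the helper name `modes` and the categorical list are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all most frequent values, or an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return [value for value, count in counts.items() if count == highest]

no_mode = modes([10.3, 4.9, 8.9, 11.7, 6.3, 7.7])
one_mode = modes([6.3, 4.9, 8.9, 6.3, 4.9, 4.9])
two_modes = modes([21, 28, 28, 41, 43, 43])

# Hypothetical categorical data: the mode is the most frequent category.
categorical_mode = modes(["Stocks", "Bonds", "Stocks", "Cash", "Stocks"])

print(no_mode, one_mode, two_modes, categorical_mode)
```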

DATA TYPE CONSIDERATIONS

The mean, median, and mode are appropriate for which type of data?
MEASURES OF VARIATION
(OR VARIABILITY)

RANGE

1. Difference Between Largest & Smallest Observations
2. Simple to Compute and Interpret
3. Affected by Outliers
4. Ignores How Data Are Distributed

7 8 9 10 7 8 9 10
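The point that the range ignores how the data are distributed can be illustrated with a sketch. Both data sets are hypothetical, chosen only so that they share the same smallest and largest values.

```python
def data_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)

# Hypothetical data sets on the same 7 to 10 scale.
spread_out = [7, 8, 8, 9, 9, 10]     # values spread across the scale
clustered = [7, 10, 10, 10, 10, 10]  # values piled up at one end

range_spread_out = data_range(spread_out)
range_clustered = data_range(clustered)

print(range_spread_out, range_clustered)
```

Both ranges are equal even though the two distributions look very different, which is why the range alone is a limited measure of variation.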

VARIANCE &
STANDARD DEVIATION

1. Most Common Measures
2. Involve All Values in Sample (or Population)
3. Show Variation About Mean (X̄ or μ)
4. Affected by Outliers

X̄ = 8.3

4 6 8 10 12
Variance

STANDARD DEVIATION

EXAMPLE
6 8 10 12 14 9 11 7 13 11
Calculate the sample variance and standard deviation.
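One way to work the exercise, as a sketch: the sample variance is the sum of squared deviations from the mean divided by n - 1, and the sample standard deviation is its square root. The variable names are illustrative; `statistics.variance` and `statistics.stdev` in the standard library use the same n - 1 divisor.

```python
from math import sqrt

# Data from the exercise in this lecture.
data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations about the sample mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

sample_variance = sum_sq_dev / (n - 1)  # divide by n - 1 for a sample
sample_sd = sqrt(sample_variance)       # same units as the data

print(f"mean = {mean:.1f}, s^2 = {sample_variance:.4f}, s = {sample_sd:.4f}")
```

For a population, the divisor would be N rather than n - 1.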
VARIANCE AND SD

CHEBYSHEV’S RULE

EXAMPLE

EXAMPLE
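Chebyshev's rule states that, for any data set, at least 1 - 1/k² of the observations lie within k standard deviations of the mean, for k > 1. The sketch below computes that bound and checks it against the ten values from the variance exercise earlier in the lecture. The function and variable names are illustrative.

```python
from statistics import mean, stdev

def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k**2

bound_2 = chebyshev_lower_bound(2)  # at least 3/4 within 2 standard deviations
bound_3 = chebyshev_lower_bound(3)  # at least 8/9 within 3 standard deviations

# Check the k = 2 bound on the data from the variance exercise.
data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]
x_bar, s = mean(data), stdev(data)
within_2 = [x for x in data if x_bar - 2 * s <= x <= x_bar + 2 * s]
fraction_within_2 = len(within_2) / len(data)

print(bound_2, round(bound_3, 3), fraction_within_2)
```

The bound is a guaranteed minimum, so the observed fraction is often well above it.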

EMPIRICAL RULE

EXAMPLE
Consider a very large number of students taking a college entrance exam such as the SAT.
Suppose that the distribution of SAT scores is bell-shaped and that the mean score on the mathematics section of the SAT is 550 with a standard deviation of 50.
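For a bell-shaped distribution, the empirical rule says that approximately 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean. A minimal sketch applied to the SAT example, with illustrative variable names:

```python
# Values from the SAT example in this lecture.
mean_score = 550
sd = 50

# Approximate proportions under the empirical rule (bell-shaped data only).
empirical_rule = {1: 0.68, 2: 0.95, 3: 0.997}

# Interval of mean +/- k standard deviations for k = 1, 2, 3.
intervals = {k: (mean_score - k * sd, mean_score + k * sd) for k in empirical_rule}

for k, (low, high) in intervals.items():
    print(f"about {empirical_rule[k]:.1%} of scores lie between {low} and {high}")
```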
MEASURES OF RELATIVE STANDING

PERCENTILE

• Indicates position of a value relative to the entire data set (it is a measure of relative standing)
• Generally used for large data sets
• Example: an IQ score at the 90th percentile

Question: An oil company’s sales are in the 80th percentile of all companies in the industry. What does this mean?
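One way to read a percentile is as the percentage of observations that fall below a given value. The sketch below uses hypothetical sales figures for ten companies; the helper name `percent_below` and the strict "below" definition are assumptions, since some texts count values at or below.

```python
def percent_below(values, x):
    """Percentage of observations strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical annual sales for ten companies in an industry.
industry_sales = [12, 15, 18, 22, 25, 31, 38, 44, 57, 90]
company_sales = 57

standing = percent_below(industry_sales, company_sales)
print(standing)
```

In this made-up industry, the company with sales of 57 outsells 80% of the companies, which corresponds to the 80th percentile under this definition.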

QUARTILES

Split Ordered Data into 4 Quarters

  25%  |  25%  |  25%  |  25%
       Q1      Q2      Q3
INTERQUARTILE RANGE
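The interquartile range is the distance between the third and first quartiles, IQR = Q3 - Q1. The sketch below uses `statistics.quantiles` from the Python standard library (Python 3.8 or later) on the ten values from the variance exercise. Quartile conventions differ between textbooks and software, so a hand calculation may give slightly different values.

```python
from statistics import quantiles

# Data from the variance exercise earlier in the lecture.
data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]

# n=4 splits the ordered data into four quarters and returns Q1, Q2, Q3.
# The default method is "exclusive"; other conventions exist.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1  # spread of the middle 50% of the data

print(q1, q2, q3, iqr)
```

Unlike the range, the IQR is not affected by a few extreme values, because it depends only on the middle half of the ordered data.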

BOX PLOT

• How to construct
• How to represent outliers
• Use a boxplot to assess and compare the shape, central tendency, and variability of distributions and to look for potential outliers.
• Sample size: n at least 20
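A common convention, assumed here because the slides do not state the rule, flags values more than 1.5 × IQR below Q1 or above Q3 as potential outliers. The sketch applies it to the variance-exercise data with one hypothetical extreme value (25) appended; quartiles come from `statistics.quantiles` (Python 3.8 or later).

```python
from statistics import quantiles

# Variance-exercise data plus one hypothetical extreme value (25).
data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11, 25]

q1, q2, q3 = quantiles(data, n=4)  # box edges (Q1, Q3) and median line (Q2)
iqr = q3 - q1

# Fences under the assumed 1.5 x IQR convention.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are drawn individually as potential outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q2, q3, lower_fence, upper_fence, outliers)
```

The sample here is smaller than the n ≥ 20 suggested above; it is kept short only to make the arithmetic easy to follow.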

SHAPE & BOX PLOT

Source: https://www.slideshare.net/mido02/chap-3gbu
CONCLUSION

1. Graphical methods for describing 2 variables
2. Numerical methods for describing data:
- Measures of central tendency
- Measures of variation
- Measures of relative standing
