1. Statistics is a
a. Method for organizing and analyzing data
b. Method for describing data
c. Method for making educated guesses based on limited data
d. Method for summarizing data
e. All of the above
4. Using the Nielsen television ratings to estimate the number of television
viewers represents
a. Descriptive statistics
b. Inferential statistics
c. Deductive statistics
d. Inductive statistics
e. A population
True/False (If false, explain why)
1. When we use the Nielsen television ratings to estimate the
number of television viewers, we are using descriptive statistics.
2. The average salary for a graduating senior majoring in finance is a
descriptive statistic.
3. Descriptive statistics can be useful in contract negotiations.
4. Inferential statistics can be used to compare two sets of data.
5. Descriptive statistics is used to make educated guesses about
unknown events.
6. Depending on the method of sampling, a sample can have more
observations than the population.
7. In a population with 10 objects, if the simple random sampling
method is used to sample 3 objects, 1000 possible samples can
be generated.
Frequency Distributions
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution
Histogram
[Figure: histogram of the data above. x-axis: Class Midpoints (5, 15, 25, 35, 45, 55, More); y-axis: Frequency. No gaps between bars, since the data are continuous.]
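The 20 observations listed under Frequency Distributions can be grouped into classes with a few lines of Python. This is a minimal sketch, not the slide's own method: the class width of 10 and the class limits "10 to under 20" through "50 to under 60" are assumptions chosen to match the class midpoints 15 through 55, and the names `class_lower_bound` and `frequency_table` are illustrative.

```python
from collections import Counter

data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

CLASS_WIDTH = 10  # assumed class width (not stated in the slides)


def class_lower_bound(x, width=CLASS_WIDTH):
    """Return the lower limit of the class that contains x."""
    return (x // width) * width


# Count how many observations fall into each class.
counts = Counter(class_lower_bound(x) for x in data)

# One row per class: (lower limit, upper limit, midpoint, frequency).
frequency_table = []
for lower in range(10, 60, CLASS_WIDTH):
    upper = lower + CLASS_WIDTH
    midpoint = (lower + upper) / 2
    frequency_table.append((lower, upper, midpoint, counts.get(lower, 0)))

for lower, upper, midpoint, freq in frequency_table:
    print(f"{lower} to under {upper}  midpoint {midpoint}  frequency {freq}")
```

Under these assumed classes the frequencies sum to 20, the number of observations.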
Questions for Grouping Data
into Classes
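[Figure: histogram with many narrow classes (class limits 4, 8, ..., 60, More); x-axis: Temperature; y-axis: Frequency. Surviving caption fragments: "distribution with gaps from empty classes", "frequency varies across classes".]
[Figure: histogram with few wide classes (class limits 0, 30, 60, More); y-axis: Frequency. Surviving caption fragment: "and yield a blocky distribution".]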
Histograms in Excel
1. Select Tools/Data Analysis
2. Choose Histogram
3. Input data and bin ranges
Stem Leaf
• 613 would become 6 | 1
• 776 would become 7 | 8
• . . .
• 1224 becomes 12 | 2
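The three stem-and-leaf examples (613, 776, 1224) are consistent with rounding each value to the nearest 10 and then using the tens digit as the leaf. The sketch below assumes that rule; the function name `stem_and_leaf` is illustrative and the rule is inferred from the examples rather than stated in the slides.

```python
def stem_and_leaf(value):
    """Split a non-negative integer into (stem, leaf), dropping the units digit.

    Assumed rule: round to the nearest 10 (halves round up); the tens digit
    becomes the leaf and everything to its left becomes the stem.
    """
    rounded_tens = (value + 5) // 10   # e.g. 776 -> 78
    return divmod(rounded_tens, 10)    # e.g. 78 -> (7, 8)


for value in (613, 776, 1224):
    stem, leaf = stem_and_leaf(value)
    print(f"{value} -> {stem} | {leaf}")
```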
Bar and Pie Charts
• Height of bar or size of pie slice shows the frequency or percentage for each category
Pie Chart Example
Current Investment Portfolio
Investment Type   Amount (in thousands $)   Percentage
Stocks            46.5                      42.27
Bonds             32.0                      29.09
CD                15.5                      14.09
Savings           16.0                      14.55
Total             110                       100
(Variables are Qualitative)
[Figure: pie chart with slices Stocks 42%, Bonds 29%, CD 14%, Savings 15%. Percentages are rounded to the nearest percent.]
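The percentage column of the investment portfolio is each amount divided by the total of 110. A minimal sketch of that arithmetic, using only the four amounts from the example (the variable names are illustrative):

```python
# Amounts in thousands of dollars, from the portfolio example.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(portfolio.values())

# Percentage of the total, to two decimals (the table column).
percentages = {name: round(100 * amount / total, 2)
               for name, amount in portfolio.items()}

# Percentage rounded to the nearest percent (the pie-slice labels).
whole_percent = {name: round(100 * amount / total)
                 for name, amount in portfolio.items()}

for name in portfolio:
    print(f"{name:8s} {percentages[name]:6.2f}%  (pie label {whole_percent[name]}%)")
```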
Bar Chart Example
[Figure: horizontal bar chart "Investor's Portfolio"; categories: Stocks, Bonds, CD, Savings; x-axis: Amount in $1000's (0 to 50)]
Bar Chart Example
Number of days read   Frequency
0                     44
1                     24
2                     18
3                     16
4                     20
5                     22
6                     26
7                     30
Total                 200
[Figure: bar chart "Newspaper readership per week"; x-axis: Number of days newspaper is read per week (0 to 7); y-axis: Frequency (0 to 50)]
Line Charts and
Scatter Diagrams
Year   Inflation Rate (%)
1985   3.56
1986   1.86
1987   3.65
1988   4.14
1989   4.82
1990   5.40
1991   4.21
1992   3.01
1993   2.99
1994   2.56
1995   2.83
1996   2.95
1997   2.29
1998   1.56
1999   2.21
2000   3.36
2001   2.85
2002   1.58
[Figure: line chart "U.S. Inflation Rate"; x-axis: Year (1984 to 2002); y-axis: Inflation Rate (%) (0 to 6)]
Scatter Diagram Example
Volume per Day   y-axis value (label not recovered)
29               146
33               160
38               167
42               170
50               188
55               195
60               200
[Figure: scatter diagram; x-axis: Volume per Day (0 to 70)]
Types of Relationships
• Linear Relationships
• Curvilinear Relationships
• No Relationship
[Figures: for each type, two example scatter plots of Y against X]
Describing Data Using
Numerical Measures
Summary Measures
Coefficient of
Variation
Mean (Arithmetic Average)
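• Sample mean (n = Sample Size)
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
• Population mean (N = Population Size)
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$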
Mean (Arithmetic Average)
(continued)
[Figure: two dot plots on a 0 to 10 axis]
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Median
• Not affected by extreme values
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]
• In an ordered array, the median is the "middle" number
• If n or N is odd, the median is the middle number
• If n or N is even, the median is the average of the two middle numbers
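The contrast between the mean and the median can be checked with Python's standard `statistics` module. The sketch below uses the two five-value data sets from the mean example (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10); the variable names are illustrative.

```python
from statistics import mean, median

typical = [1, 2, 3, 4, 5]
with_extreme_value = [1, 2, 3, 4, 10]

# The mean moves when one value becomes extreme; the median does not.
print(mean(typical), median(typical))
print(mean(with_extreme_value), median(with_extreme_value))

# With an even number of values the median is the average of the two middle ones.
even_count = [1, 2, 3, 4]
print(median(even_count))
```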
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
[Figure: two dot plots; left (0 to 14 axis): Mode = 5; right (0 to 6 axis): No Mode]
Measures of Center and Location
Overview
Center and Location
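Sample:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad \bar{x}_W = \frac{\sum w_i x_i}{\sum w_i}$$
Population:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \mu_W = \frac{\sum w_i x_i}{\sum w_i}$$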
Weighted Mean
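A weighted mean divides the sum of the weighted values by the sum of the weights. The sketch below illustrates that formula with hypothetical scores and weights that do not come from the slides; `weighted_mean` is an illustrative helper name.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical data: three scores with weights 2, 3 and 5.
scores = [80, 90, 70]
weights = [2, 3, 5]

print(weighted_mean(scores, weights))        # weighted mean
print(weighted_mean(scores, [1, 1, 1]))      # equal weights give the ordinary mean
```

With equal weights the weighted mean reduces to the ordinary arithmetic mean.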
[Figure: five house prices: $2,000,000; $500,000; $300,000; $100,000; $100,000]
Summary Statistics
House Prices: $2,000,000; 500,000; 300,000; 100,000; 100,000 (Sum 3,000,000)
• Mean: ($3,000,000/5) = $600,000
• Median: middle value of ranked data = $300,000
• Left-skewed: Mean < Median < Mode (longer tail extends to left)
• Symmetric: Mean = Median = Mode
• Right-skewed: Mode < Median < Mean (longer tail extends to right)
Other Location Measures
Percentiles
The pth percentile in a data array:
• p% are less than or equal to this value
• (100 – p)% are greater than or equal to this value
(where 0 ≤ p ≤ 100)
Quartiles
• 1st quartile = 25th percentile
• 2nd quartile = 50th percentile = median
• 3rd quartile = 75th percentile
Percentiles
[Figure: ordered data divided by Q1, Q2, Q3]
• Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 (n = 9)
Q1 = 25th percentile, so find the (25/100)(9 + 1) = 2.5 position,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
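The position rule used in the quartile example, (p/100)(n + 1), can be sketched as a small function that interpolates between neighbouring values when the position is not a whole number. The function name `percentile` and the clamping of positions that fall outside 1..n are assumptions of this sketch, not part of the slides.

```python
def percentile(sorted_data, p):
    """Return the pth percentile using the (p/100)(n + 1) position rule."""
    n = len(sorted_data)
    position = (p / 100) * (n + 1)        # 1-based position in the ordered array
    position = min(max(position, 1), n)   # assumption: clamp to the data range
    lower = int(position)                 # whole part of the position
    fraction = position - lower           # how far to move toward the next value
    if fraction == 0:
        return sorted_data[lower - 1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # already ordered, n = 9

print(percentile(data, 25))  # first quartile
print(percentile(data, 50))  # median
print(percentile(data, 75))  # third quartile
```

Other software may use a different position rule and return slightly different quartiles for the same data.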
Box and Whisker Plot
• A graphical display of data using 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
Example: [Figure: box and whisker plot]
• A Box and Whisker plot can be shown in either vertical or horizontal format
Distribution Shape and
Box and Whisker Plot
[Figure: three box-and-whisker plots, each marked Q1, Q2, Q3]
Box-and-Whisker Plot Example
[Figure: box-and-whisker plot on an axis marked 0, 2, 3, 5, 27]
• This data is very right skewed, as the plot depicts
Measures of Variation
[Diagram: Variation, with branches including Sample Variance and Sample Standard Deviation]
[Figure: two distributions with the same center, different variation]
Range
• Simplest measure of variation
• Difference between the largest and the smallest observations:
$$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$$
Example:
[Figure: two dot plots on a 7 to 12 axis; Range = 12 - 7 = 5 for each]
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
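The effect of a single extreme value on the range can be checked directly. This sketch uses the 25 values from the last range example; `value_range` is an illustrative helper name.

```python
def value_range(values):
    """Return the largest observation minus the smallest."""
    return max(values) - min(values)


data = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2,
        3, 3, 3, 3, 4, 120]

data_range = value_range(data)

# Dropping the single largest value shows how much it drives the range.
range_without_largest = value_range(sorted(data)[:-1])

print(data_range, range_without_largest)
```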
Interquartile Range
Example:
[Figure: box plot marked X minimum, Q1, Median (Q2), Q3, X maximum]
Interquartile range = Q3 – Q1 = 57 – 30 = 27
Variance
• Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
• Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
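The two variance formulas differ only in the divisor: n - 1 for a sample, N for a population. The sketch below spells out both on the small data set 1, 2, 3, 4, 5 (chosen here for illustration) and compares the results with Python's `statistics` module; the helper names are illustrative.

```python
import math
from statistics import pvariance, variance


def sum_of_squared_deviations(values):
    """Return the sum of (x_i - mean)^2 over all values."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


data = [1, 2, 3, 4, 5]
ss = sum_of_squared_deviations(data)

sample_variance = ss / (len(data) - 1)    # divide by n - 1
population_variance = ss / len(data)      # divide by N

# The standard deviation is the square root of the variance.
sample_std_dev = math.sqrt(sample_variance)
population_std_dev = math.sqrt(population_variance)

print(sample_variance, population_variance)
print(sample_std_dev, population_std_dev)
```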
Standard Deviation
$$\sqrt{\frac{126}{7}} = \sqrt{18} = 4.2426$$
Comparing Standard Deviations
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
[Figure: three dot plots on an 11 to 21 axis, one per data set]
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Is used to compare two or more sets of data measured in different units
Population:
$$CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$$
Sample:
$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$
Comparing Coefficient
of Variation
• Stock A:
  • Average price last year = $50
  • Standard deviation = $5
$$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$
• Stock B:
  • Average price last year = $100
  • Standard deviation = $5
$$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
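The stock comparison amounts to dividing each standard deviation by its mean. A minimal sketch using the two stocks' figures (mean prices of $50 and $100, standard deviation of $5 each); the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B

print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
```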
The Empirical Rule
• If the data distribution is bell-shaped, then the interval:
  • μ ± 1σ contains about 68% of the values in the population or the sample
  • μ ± 2σ contains about 95% of the values in the population or the sample
  • μ ± 3σ contains about 99.7% of the values in the population or the sample
[Figures: bell curves centered at μ with the areas within μ ± 1σ (68%), μ ± 2σ (95%), and μ ± 3σ (99.7%) marked]
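For an exactly normal distribution, the share of values within k standard deviations of the mean is erf(k/√2), which is where the 68%, 95% and 99.7% figures come from. The sketch below assumes a normal distribution as the model for "bell-shaped"; the helper name `share_within` is illustrative.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {100 * share_within(k):.2f}%")
```

Real data that are only roughly bell-shaped will match these percentages only approximately.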
Standardized Data Values
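• For a population:
$$z = \frac{x - \mu}{\sigma}$$
where:
• x = original data value
• μ = population mean
• σ = population standard deviation
• z = standard score
• For a sample:
$$z = \frac{x - \bar{x}}{s}$$
where:
• x = original data value
• x̄ = sample mean
• s = sample standard deviation
• z = standard score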
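A standard score expresses how many standard deviations a value lies from the mean. The sketch below applies the formula to hypothetical numbers (a mean of 50 and a standard deviation of 5, with data values 60 and 45) that are chosen for illustration only.

```python
def z_score(x, mean, std_dev):
    """Return the number of standard deviations x lies from the mean."""
    return (x - mean) / std_dev


# Hypothetical values: mean 50, standard deviation 5.
print(z_score(60, mean=50, std_dev=5))   # above the mean
print(z_score(45, mean=50, std_dev=5))   # below the mean
```

The same function serves for a population (μ, σ) or a sample (x̄, s); only the inputs change.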
• Click OK
Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000; 500,000; 300,000; 100,000; 100,000
GROUP EXERCISE
Suppose you are given the following sales of 12
car dealers for last month.
38, 41, 67, 63, 32, 50, 58, 74, 28, 69, 43, 63
1. Compute the mean, median, and mode.
2. Compute the population variance, standard
deviation, and coefficient of variation.
GROUP EXERCISE
3. You are given the following grades from a midterm exam in
English: 95, 34, 83, 92, 94, 88, 99. Find the median and range of
the scores.
4. You are given the following two groups of numbers
1,000 2,000 3,000
and 1,000,000 2,000,000 3,000,000
Is it true that the second group of numbers has a higher
dispersion because it has a higher standard deviation?
GROUP EXERCISE
• When examining the desirability of a business venture, we sometimes use the variance of the profits to measure the risk of the project. Briefly explain why the variance may not be a good measure of business risk.
GROUP EXERCISE
Suppose you are given the following sales of 12 car dealers for last month.
38, 41, 67, 63, 32, 50, 58, 74, 28, 69, 43, 63
1. Compute the mean, median, and mode.
• mean = (38 + 41 + 67 + 63 + 32 + 50 + 58 + 74 + 28 + 69 + 43 + 63) / 12 =
52.167
• The median is the number in the middle. When the number of observations
is an even number, the median is the mean of the middle two numbers. To
find the median, we first rank the data from the smallest to the largest:
28, 32, 38, 41, 43, 50, 58, 63, 63, 67, 69, 74. The middle two numbers in
our example are 50 (6th observation) and 58 (7th observation). Their mean
is (50+58)/2=54.
• The mode is the number that occurs most frequently. Here 63 occurs twice
and every other number occurs only once, so the mode is 63. Note that the
mean (52.167) and the median (54) are close but not equal in this example.
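The three measures for the car-dealer sales can be verified with Python's standard `statistics` module. A minimal sketch using the 12 sales figures from the exercise; `multimode` (Python 3.8+) returns every value tied for most frequent.

```python
from statistics import mean, median, multimode

sales = [38, 41, 67, 63, 32, 50, 58, 74, 28, 69, 43, 63]

sales_mean = mean(sales)       # sum of the values divided by 12
sales_median = median(sales)   # average of the 6th and 7th ranked values
sales_modes = multimode(sales) # list of the most frequent value(s)

print(sorted(sales))
print(round(sales_mean, 3), sales_median, sales_modes)
```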
GROUP EXERCISE
Suppose you are given the following sales of 12 car dealers for last month.
38, 41, 67, 63, 32, 50, 58, 74, 28, 69, 43, 63
2. Compute the population variance, standard deviation, and coefficient of
variation.
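$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{2653.67}{12} = 221.14$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{221.14} = 14.87$$
$$CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\% = \frac{14.87}{52.167} \cdot 100\% = 28.5\%$$
(Dividing by n - 1 = 11 instead of N = 12 gives the sample variance s² = 241.24 and the sample standard deviation s = 15.53.)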
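The population variance, standard deviation, and coefficient of variation for the 12 sales figures can be recomputed from the definitions. The sketch below treats the 12 dealers as the whole population, as the exercise asks, and also shows the sample (n - 1) versions for comparison; the variable names are illustrative.

```python
import math

sales = [38, 41, 67, 63, 32, 50, 58, 74, 28, 69, 43, 63]

N = len(sales)
mu = sum(sales) / N

# Sum of squared deviations from the mean.
ss = sum((x - mu) ** 2 for x in sales)

population_variance = ss / N
population_std_dev = math.sqrt(population_variance)
cv_percent = 100 * population_std_dev / mu

# For comparison: treating the data as a sample divides by n - 1 instead.
sample_variance = ss / (N - 1)
sample_std_dev = math.sqrt(sample_variance)

print(round(population_variance, 2), round(population_std_dev, 2), round(cv_percent, 1))
print(round(sample_variance, 2), round(sample_std_dev, 2))
```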