Session 1 - 3
10/12/2020
Statistics
The term statistics can refer to numerical facts such as averages,
medians, percents, and index numbers that help us understand a
variety of business and economic situations.
Statistics can also refer to the art and science of collecting, analyzing,
presenting, and interpreting data.
Applications in
Business and Economics
Accounting
Public accounting firms use statistical sampling procedures when conducting
audits for their clients.
Economics
Finance
Financial advisors use price-earnings ratios and dividend yields to guide their
investment advice.
Applications in
Business and Economics
Marketing
Production
A variety of statistical quality control charts are used to monitor the output of a
production process.
Information Systems
A variety of statistical information helps administrators assess the performance
of computer networks.
Data and Data Sets
Scales of Measurement: Nominal, Ordinal, Interval, Ratio
Nominal
Example:
Students of a university are classified by the school
in which they are enrolled using a nonnumeric label
such as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the
school variable (e.g. 1 denotes Business, 2 denotes
Humanities, 3 denotes Education, and so on).
Scales of Measurement
Ordinal
Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Scales of Measurement
Interval
Example:
Melissa has an SAT score of 1985, while Kevin
has an SAT score of 1880. Melissa scored 105
points more than Kevin.
Scales of Measurement
Ratio
Data
Qualitative Quantitative
Time Requirement
Searching for information can be time consuming.
Information may no longer be useful by the time it
is available.
Cost of Acquisition
Organizations often charge for information even
when it is not their primary business activity.
Data Errors
Using any data that happen to be available or were
acquired with little care can lead to misleading
information.
Descriptive Statistics
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Tabular Summary:
Frequency and Percent Frequency
Example: Hudson Auto
Parts Cost ($)    Frequency    Percent Frequency
50-59                 2               4
60-69                13              26
70-79                16              32
80-89                 7              14
90-99                 7              14
100-109               5              10
Total                50             100
Percent frequency = (frequency/50)100; for example, (2/50)100 = 4 for the 50-59 class.
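The tabular summary can be reproduced from the 50 raw parts costs. The following is a minimal Python sketch, not the course's own tooling; the helper name `frequency_table` and its parameters (`start`, `width`, `classes`) are assumptions made for illustration.

```python
from collections import Counter

# Parts costs ($) for the 50 Hudson Auto tune-ups listed above.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def frequency_table(values, start=50, width=10, classes=6):
    """Return (class label, frequency, percent frequency) for equal-width classes."""
    # Map each value to the index of the class it falls in, then count.
    counts = Counter((v - start) // width for v in values)
    n = len(values)
    rows = []
    for k in range(classes):
        lower = start + k * width
        label = f"{lower}-{lower + width - 1}"
        freq = counts.get(k, 0)
        rows.append((label, freq, 100 * freq / n))
    return rows


table = frequency_table(costs)
for label, freq, pct in table:
    print(f"{label:>7}  {freq:>2}  {pct:>5.1f}")
```

The class width of 10 and the starting value of 50 simply mirror the classes used in the table.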
Graphical Summary: Histogram
Example: Hudson Auto
[Figure: histogram titled "Tune-up Parts Cost"; horizontal axis: Parts Cost ($) by class; vertical axis: Frequency]
Numerical Descriptive
Statistics
The most common numerical descriptive statistic
is the average (or mean).
The average provides a measure of the central
tendency, or central location, of the data for a variable.
Hudson’s average cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
Statistical Inference
Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
Relative Frequency Distribution
Rating           Relative Frequency    Percent Frequency
Poor                   .10                   10
Below Average          .15                   15
Average                .25                   25
Above Average          .45                   45
Excellent              .05                    5
Total                 1.00                  100
For example, the relative frequency for Excellent is 1/20 = .05, and the percent frequency for Poor is .10(100) = 10.
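As a small sketch of the arithmetic behind the relative and percent frequency columns, the rating frequencies can be divided by their total. The variable names below are illustrative assumptions, not part of the original slides.

```python
# Frequencies from the rating frequency distribution (20 ratings in total).
rating_counts = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(rating_counts.values())

# Relative frequency = frequency / total; percent frequency = relative frequency * 100.
relative = {rating: count / total for rating, count in rating_counts.items()}
percent = {rating: 100 * count / total for rating, count in rating_counts.items()}

for rating in rating_counts:
    print(f"{rating:<14} {relative[rating]:.2f} {percent[rating]:>4.0f}")
```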
Bar Chart
A bar chart is a graphical display for depicting
qualitative data.
On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).
Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
The bars are separated to emphasize the fact that
each class is a separate category.
Bar Chart
[Figure: bar chart; horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency]
Pareto Diagram
[Figure: plot over a Cost ($) axis running from 50 to 110]
Histogram
Another common graphical display of quantitative data
is a histogram.
[Figure: histogram; horizontal axis: Parts Cost ($) by class; vertical axis: Frequency]
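Cumulative Distributions
Cost ($)    Cumulative Frequency    Cumulative Relative Frequency    Cumulative Percent Frequency
≤ 59                 2                        .04                              4
≤ 69                15                        .30                             30
≤ 79                31                        .62                             62
≤ 89                38                        .76                             76
≤ 99                45                        .90                             90
≤ 109               50                       1.00                            100
For example, for the "≤ 69" row: cumulative frequency = 2 + 13 = 15, cumulative relative frequency = 15/50 = .30, and cumulative percent frequency = .30(100) = 30.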
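The cumulative columns are running totals of the class frequencies. A minimal sketch, assuming the Hudson Auto class frequencies (2, 13, 16, 7, 7, 5); the variable names are illustrative:

```python
from itertools import accumulate

# Class upper limits and frequencies from the Hudson Auto frequency distribution.
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]

n = sum(frequencies)

# Running totals give the cumulative frequency for each upper class limit.
cumulative = list(accumulate(frequencies))
cum_relative = [c / n for c in cumulative]
cum_percent = [100 * c / n for c in cumulative]

for limit, c, rel, pct in zip(upper_limits, cumulative, cum_relative, cum_percent):
    print(f"<= {limit:>3}  {c:>2}  {rel:.2f}  {pct:>5.0f}")
```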
Stem-and-Leaf Display
 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9
Each number to the left of the bar is a stem; each digit to the right of the bar is a leaf.
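A stem-and-leaf display like the one above can be built by splitting each value into its leading digit(s) and its last digit. This is a sketch under the assumption of integer data with a leaf unit of 1; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Parts costs ($) for the 50 Hudson Auto tune-ups.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted list of leaves (last digit)."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # leaf unit assumed to be 1
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(costs)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```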
Stretched Stem-and-Leaf
Display
If we believe the original stem-and-leaf display has
condensed the data too much, we can stretch the display
vertically by using two stems for each leading digit(s).
Leaf Units
A single digit is used to define each leaf.
In the preceding example, the leaf unit was 1.
Leaf units may be 100, 10, 1, 0.1, and so on.
Where the leaf unit is not shown, it is assumed to
equal 1.
The leaf unit indicates how to multiply the stem-
and-leaf numbers in order to approximate the
original data.
Example: Leaf Unit = 0.1
If we have data with values such as
8.6 11.7 9.4 9.1 10.2 11.0 8.8
a stem-and-leaf display of these data is
Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7
Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7
The 82 in 1682 is rounded down to 80 and is represented as an 8.
Summarizing Data for Two Variables
Using Tables
Crosstabulation
Example: Finger Lakes Homes
Price            Home Style
Range            Colonial    Log    Split    A-Frame    Total
< $200,000          18         6      19       12         55
> $200,000          12        14      16        3         45
Total               30        20      35       15        100
The Total column (55, 45) is the frequency distribution for the price range variable.
Scatter Diagram
A Negative Relationship
Scatter Diagram
No Apparent Relationship
Scatter Diagram
Example: Panthers Football Team
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.
x = Number of Interceptions    y = Number of Points Scored
            1                              14
            3                              24
            2                              18
            1                              17
            3                              30
Scatter Diagram and Trendline
Example: Panthers Football Team
[Figure: scatter diagram with trendline; horizontal axis x: Number of Interceptions (0 to 4); vertical axis y: Number of Points Scored (0 to 35)]
[Figure: bar chart of frequency by Home Style (Colonial, Log, Split-Level, A-Frame), with separate bars for the < $200,000 and > $200,000 price ranges]
Stacked Bar Chart
[Figure: stacked bar chart of frequency by Home Style (Colonial, Log, Split, A-Frame), stacked by price range (< $200,000 and > $200,000)]
Choosing the Type of Graphical Display
A sample statistic is referred to
as the point estimator of the
corresponding population parameter.
Mean
Perhaps the most important measure of location is the mean.
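Sample Mean
$\bar{x} = \frac{\sum x_i}{n}$
where n is the number of observations in the sample.
Population Mean $\mu$
$\mu = \frac{\sum x_i}{N}$
where N is the number of observations in the population.
Sample Mean
Example: Apartment Rents
$\bar{x} = \frac{\sum x_i}{n} = \frac{34,356}{70} = 490.80$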
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
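The sample mean for the 70 rents listed above can be checked with a few lines of Python. This is only a sketch of the arithmetic (sum of the values divided by the number of observations); the variable names are illustrative.

```python
# Monthly rents ($) for the 70 sampled apartments, in the order listed above.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

n = len(rents)            # number of observations in the sample
total = sum(rents)        # sum of the x_i values
sample_mean = total / n   # x-bar

print(f"n = {n}, sum = {total}, mean = {sample_mean:.2f}")
# prints: n = 70, sum = 34356, mean = 490.80
```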
Weighted Mean
In some instances the mean is computed by giving each
observation a weight that reflects its relative importance.
Weighted Mean
$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$
Numerator: sum of the weighted data values
Denominator: sum of the weights
where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
If data is from a population, $\mu$ replaces $\bar{x}$.
Weighted Mean
Example: Construction Wages
Ron Butler, a home builder, is looking over the expenses he
incurred for a house he just built. For the purpose of pricing
future projects, he would like to know the average wage
($/hour) he paid the workers he employed. Listed below are
the categories of worker he employed, along with their
respective wage and total hours worked.
Worker     Wage ($/hour)    Total Hours
Painter        19.75            270
Plumber        24.16            160
Weighted Mean
$\mu = \frac{\sum w_i x_i}{\sum w_i} = \frac{31873.7}{1590} = 20.0464 \approx \$20.05$
FYI, equally-weighted (simple) mean = $21.21
Median
The median of a data set is the value in the middle when the data
items are arranged in ascending order.
Whenever a data set has extreme values, the median is the
preferred measure of central location.
The median is the measure of location most often reported for
annual income and property value data.
A few extremely large incomes or property values can inflate the
mean.
Median
For an odd number of observations:
26 18 27 12 14 27 19 7 observations
12 14 18 19 26 27 27 in ascending order
Median = 19
Median
For an even number of observations:
26 18 27 12 14 27 30 19 8 observations
12 14 18 19 26 27 27 30 in ascending order
The median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5
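The odd and even cases can be written as one small function. This is a minimal sketch; the function name `median` is chosen here for illustration, and Python's standard `statistics.median` is used only as a cross-check.

```python
import statistics


def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_sample = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations

print(median(odd_sample))    # 19
print(median(even_sample))   # 22.5
```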
Median
Example: Apartment Rents
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
Trimmed Mean
Another measure, sometimes used when extreme values are
present, is the trimmed mean.
It is obtained by deleting a percentage of the smallest and largest
values from a data set and then computing the mean of the
remaining values.
Geometric Mean
The geometric mean is calculated by finding the nth root of the
product of n values.
$\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n}$
Geometric Mean
Example: Rate of Return
Period Return (%) Growth Factor
1 -6.0 0.940
2 -8.0 0.920
3 -4.0 0.960
4 2.0 1.020
5 5.4 1.054
$\bar{x}_g = \sqrt[5]{(.94)(.92)(.96)(1.02)(1.054)} = [.89254]^{1/5} = .97752$
Average growth rate per period
is (.97752 - 1) (100) = -2.248%
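The rate-of-return example can be retraced step by step: convert each percentage return to a growth factor, multiply the factors, and take the nth root. The sketch below uses illustrative variable names and assumes Python 3.8 or later for `math.prod`.

```python
import math

# Period returns (%) from the rate-of-return example.
returns_pct = [-6.0, -8.0, -4.0, 2.0, 5.4]

# Growth factor = 1 + return / 100 (for example, -6.0% -> 0.940).
growth_factors = [1 + r / 100 for r in returns_pct]

product = math.prod(growth_factors)
geometric_mean = product ** (1 / len(growth_factors))   # nth root of the product
avg_growth_pct = (geometric_mean - 1) * 100

print(f"product = {product:.5f}")
print(f"geometric mean = {geometric_mean:.5f}")
print(f"average growth rate per period = {avg_growth_pct:.3f}%")
```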
Mode
The mode of a data set is the value that occurs with greatest
frequency.
If the data have exactly two modes, the data are bimodal.
If the data have more than two modes, the data are multimodal.
Mode
Example: Apartment Rents
450 occurred most frequently (7 times)
Mode = 450
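One way to find the mode, and to tell whether the data are bimodal or multimodal, is to count how often each value occurs. A minimal sketch with illustrative names:

```python
from collections import Counter

# Monthly rents ($) for the 70 sampled apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

counts = Counter(rents)
highest = max(counts.values())

# Every value that reaches the highest frequency is a mode.
modes = [value for value, count in counts.items() if count == highest]

if len(modes) == 1:
    kind = "one mode"
elif len(modes) == 2:
    kind = "bimodal"
else:
    kind = "multimodal"

print(modes, highest, kind)   # [450] 7 one mode
```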
Percentiles
80th Percentile
Example: Apartment Rents
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile
Third Quartile
Example: Apartment Rents
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = 525
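The index rule used in the percentile and quartile examples (compute i = (p/100)n; if i is not an integer, round up; if i is an integer, average the values in positions i and i + 1) can be sketched as follows. The function name `percentile` is hypothetical, and the sketch assumes a whole-number p between 1 and 99 so that integer arithmetic avoids floating-point surprises. Other software may use different percentile conventions and give slightly different answers.

```python
# Monthly rents ($) for the 70 sampled apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """p-th percentile (integer p, 1..99) using the index rule i = (p/100) n."""
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)   # i = whole + remainder/100
    if remainder != 0:
        # i is not an integer: round up and take the value in that position.
        return ordered[whole]               # position whole + 1, zero-based index whole
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


print(percentile(rents, 80))   # 542.0
print(percentile(rents, 75))   # 525
```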
Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Range
The range of a data set is the difference between the largest and
smallest data values.
It is the simplest measure of variability.
Interquartile Range
The interquartile range of a data set is the difference between the
third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.
Interquartile Range
Example: Apartment Rents
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ for a sample
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ for a population
Standard Deviation
$s = \sqrt{s^2}$ for a sample
$\sigma = \sqrt{\sigma^2}$ for a population
Coefficient of Variation
Sample Variance, Standard Deviation,
And Coefficient of Variation
Example: Apartment Rents
Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2,996.16$
Standard Deviation
$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$
Coefficient of Variation
$\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$
The standard deviation is about 11% of the mean.
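These three figures can be checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) formulas. This is a sketch for verification only; the variable names are illustrative.

```python
import statistics

# Monthly rents ($) for the 70 sampled apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)
variance = statistics.variance(rents)   # sample variance: divides by n - 1
std_dev = statistics.stdev(rents)       # square root of the sample variance
cv_percent = std_dev / mean * 100       # coefficient of variation, in percent

print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
print(f"coefficient of variation = {cv_percent:.2f}%")
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead.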
Distribution Shape: Skewness
An important measure of the shape of a distribution is called
skewness.
The formula for the skewness of sample data is
$\text{Skewness} = \frac{n}{(n - 1)(n - 2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$
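The sample skewness formula translates directly into code. The sketch below uses a hypothetical helper name, `sample_skewness`, and three small made-up data sets chosen only to show the sign of the result for symmetric, right-skewed, and left-skewed data.

```python
import statistics


def sample_skewness(values):
    """Skewness = n / ((n - 1)(n - 2)) * sum(((x_i - mean) / s) ** 3)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)   # sample standard deviation
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


# Hypothetical illustration data (not from the slides).
symmetric = [1, 2, 3, 4, 5]
skewed_right = [1, 1, 1, 2, 10]
skewed_left = [1, 9, 10, 10, 10]

print(sample_skewness(symmetric))      # approximately 0
print(sample_skewness(skewed_right))   # positive
print(sample_skewness(skewed_left))    # negative
```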
Distribution Shape: Skewness
Symmetric (not skewed)
Skewness is zero.
Mean and median are equal.
[Figure: relative frequency histogram of a symmetric distribution; Skewness = 0]
Distribution Shape: Skewness
Moderately Skewed Left
Skewness is negative.
Mean will usually be less than the median.
[Figure: relative frequency histogram of a moderately left-skewed distribution; Skewness = -.31]
Distribution Shape: Skewness
Moderately Skewed Right
Skewness is positive.
Mean will usually be more than the median.
[Figure: relative frequency histogram of a moderately right-skewed distribution; Skewness = .31]
Distribution Shape: Skewness
Highly Skewed Right
Skewness is positive (often above 1.0).
Mean will usually be more than the median.
[Figure: relative frequency histogram of a highly right-skewed distribution; Skewness = 1.25]
Distribution Shape: Skewness
Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in
a college town. The monthly rent prices for the
apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Distribution Shape: Skewness
Example: Apartment Rents
[Figure: relative frequency histogram of the apartment rents; vertical axis: Relative Frequency]
Five-Number Summaries
and Box Plots
Five-Number Summary
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value
Five-Number Summary
Example: Apartment Rents
Lowest Value = 425 First Quartile = 445
Median = 475
Third Quartile = 525 Largest Value = 615
Box Plot
Example: Apartment Rents
A box is drawn with its ends located at the first and third quartiles
[Figure: box plot over an axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525 marked]
Box Plot
Limits are located (not drawn) using the interquartile range
(IQR).
Data outside these limits are considered outliers.
The location of each outlier is shown with the symbol *.
Box Plot
Example: Apartment Rents
Whiskers (dashed lines) are drawn from the ends of the box
to the smallest and largest data values inside the limits.
[Figure: box plot over an axis from 400 to 625, with whiskers; smallest value inside limits = 425, largest value inside limits = 615]
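The box plot steps can be sketched in code. The text says the limits are located using the IQR but does not say how far from the box they sit; the sketch below assumes the common convention of 1.5 × IQR below Q1 and above Q3, which is an assumption and not something stated in the slides. The quartiles are taken from the five-number summary.

```python
# Monthly rents ($) for the 70 sampled apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

q1, q3 = 445, 525    # first and third quartiles from the five-number summary
iqr = q3 - q1        # interquartile range

K = 1.5              # ASSUMED multiplier (common convention, not given in the text)
lower_limit = q1 - K * iqr
upper_limit = q3 + K * iqr

# Values outside the limits are outliers; whiskers end at the extreme values inside.
inside = [x for x in rents if lower_limit <= x <= upper_limit]
outliers = [x for x in rents if x < lower_limit or x > upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print(iqr, lower_limit, upper_limit)        # 80 325.0 645.0
print(whisker_low, whisker_high, outliers)  # 425 615 []
```

Under this assumption no rent falls outside the limits, which agrees with the whiskers ending at 425 and 615.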
Measures of Association
Between Two Variables
Thus far we have examined numerical methods used to
summarize the data for one variable at a time.
Covariance
The covariance is computed as follows:
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ for samples
$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$ for populations
Correlation Coefficient
The correlation coefficient is computed as follows:
$r_{xy} = \frac{s_{xy}}{s_x s_y}$ for samples
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ for populations
Covariance and Correlation Coefficient
Example: Golfing Study
A golfer is interested in investigating the relationship,
if any, between driving distance and 18-hole score.
Covariance and Correlation Coefficient
Example: Golfing Study
    x        y     $(x_i - \bar{x})$   $(y_i - \bar{y})$   $(x_i - \bar{x})(y_i - \bar{y})$
  277.6     69         10.65              -1.0                 -10.65
  259.5     71         -7.45               1.0                  -7.45
  269.1     70          2.15               0                     0
  267.0     70          0.05               0                     0
  255.6     71        -11.35               1.0                 -11.35
  272.9     69          5.95              -1.0                  -5.95
Average     267.0     70.0                          Total      -35.40
Std. Dev.   8.2192    .8944
Covariance and Correlation Coefficient
Example: Golfing Study
Sample Covariance
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$
Sample Correlation Coefficient
$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$
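The golfing computation can be retraced in a short script. This is a sketch using the sample formulas (dividing by n - 1); the helper names `sample_covariance` and `sample_std` are illustrative, not from the slides.

```python
import math

# Golfing study data: driving distance (x) and 18-hole score (y).
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]
score = [69, 71, 70, 70, 71, 69]


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(xs):
    """Sample standard deviation: the covariance of a variable with itself, square-rooted."""
    return math.sqrt(sample_covariance(xs, xs))


s_xy = sample_covariance(distance, score)
s_x = sample_std(distance)
s_y = sample_std(score)
r_xy = s_xy / (s_x * s_y)

print(f"s_xy = {s_xy:.2f}, s_x = {s_x:.4f}, s_y = {s_y:.4f}, r_xy = {r_xy:.4f}")
```

The strongly negative correlation indicates that, in this sample, longer driving distances go with lower scores.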
Introduction to Probability
Experiments, Counting Rules, and Assigning Probabilities
Uncertainties
Managers often base their decisions on an analysis
of uncertainties such as the following:
Probability
Probability as a Numerical Measure
of the Likelihood of Occurrence
[Figure: probability scale running from 0 through .5 to 1, with the likelihood of occurrence increasing toward 1]
Statistical Experiments
An Experiment and Its Sample Space
Example: Bradley Investments
Bradley has invested in two stocks, Markley Oil and Collins
Mining. Bradley has determined that the possible outcomes of
these investments three months from now are as follows.
A Counting Rule for
Multiple-Step Experiments
Example: Bradley Investments
Markley Oil: n1 = 4
Collins Mining: n2 = 2
Total Number of
Experimental Outcomes: n1n2 = (4)(2) = 8
Tree Diagram
Example: Bradley Investments
Markley Oil Collins Mining Experimental
(Stage 1) (Stage 2) Outcomes
Gain 8 (10, 8) Gain $18,000
(10, -2) Gain $8,000
Gain 10 Lose 2
Gain 8 (5, 8) Gain $13,000
Counting Rule for Combinations
$C_n^N = \binom{N}{n} = \frac{N!}{n!(N - n)!}$
0! = 1
Counting Rule for Permutations
Number of Permutations of N Objects Taken n at a Time
A third useful counting rule enables us to count
the number of experimental outcomes when n
objects are to be selected from a set of N objects,
where the order of selection is important.
$P_n^N = n!\binom{N}{n} = \frac{N!}{(N - n)!}$
n! = n(n -1)(n -2) . . . (2)(1)
0! = 1
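The counting rules can be tried out with Python's `math` module. The sketch below writes the factorial formulas out explicitly and cross-checks them against `math.comb` and `math.perm` (available from Python 3.8). The values N = 5 and n = 2 are hypothetical, chosen only for illustration.

```python
import math


def combinations(N, n):
    """N! / (n! (N - n)!): selections of n objects from N where order does not matter."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))


def permutations(N, n):
    """N! / (N - n)!: selections of n objects from N where order matters."""
    return math.factorial(N) // math.factorial(N - n)


# Multiple-step experiment (Bradley Investments): n1 outcomes, then n2 outcomes.
n1, n2 = 4, 2
total_outcomes = n1 * n2

print(total_outcomes)        # 8
print(combinations(5, 2))    # 10
print(permutations(5, 2))    # 20
```

Each combination of 2 objects can be ordered in 2! = 2 ways, which is why the permutation count is n! times the combination count.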
Assigning Probabilities
Basic Requirements for Assigning Probabilities
$0 \le P(E_i) \le 1$ for all $i$
where:
$E_i$ is the ith experimental outcome
and $P(E_i)$ is its probability
$P(E_1) + P(E_2) + \cdots + P(E_n) = 1$
where:
n is the number of experimental outcomes
Assigning Probabilities
Classical Method
Assigning probabilities based on the assumption
of equally likely outcomes
Subjective Method
Assigning probabilities based on judgment
Classical Method
Example: Rolling a Die
If an experiment has n possible outcomes, the
classical method would assign a probability of 1/n to
each outcome.
Relative Frequency Method
Example: Lucas Tool Rental
Lucas Tool Rental would like to assign probabilities
to the number of car polishers it rents each day.
Office records show the following frequencies of daily
rentals for the last 40 days.
Number of Polishers Rented    Number of Days
            0                       4
            1                       6
            2                      18
            3                      10
            4                       2
Relative Frequency Method
Example: Lucas Tool Rental
Each probability assignment is given by dividing the
frequency (number of days) by the total frequency (total
number of days).
Number of Polishers Rented    Number of Days    Probability
            0                       4               .10
            1                       6               .15
            2                      18               .45
            3                      10               .25
            4                       2               .05
          Total                    40              1.00
For example, the probability of renting 0 polishers is 4/40 = .10.
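The relative frequency method amounts to dividing each frequency by the total, and the result can then be checked against the two basic requirements for assigning probabilities (each probability between 0 and 1, and all probabilities summing to 1). A minimal sketch with illustrative names:

```python
import math

# Number of days (out of the last 40) on which 0, 1, 2, 3, or 4 polishers were rented.
days_by_rentals = {0: 4, 1: 6, 2: 18, 3: 10, 4: 2}

total_days = sum(days_by_rentals.values())

# Relative frequency method: probability = frequency / total frequency.
probabilities = {k: days / total_days for k, days in days_by_rentals.items()}

# Basic requirements for assigning probabilities.
each_in_range = all(0 <= p <= 1 for p in probabilities.values())
sums_to_one = math.isclose(sum(probabilities.values()), 1.0)

for k, p in probabilities.items():
    print(f"{k} polishers: {p:.2f}")
print(each_in_range, sums_to_one)   # True True
```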
Subjective Method
Example: Bradley Investments
An analyst made the following probability estimates.
(-20, -2) $22,000 Loss .06
Events and Their Probabilities
Example: Bradley Investments
Some Basic Relationships of Probability
There are some basic probability relationships that can be used to
compute the probability of an event without knowledge of all
the sample point probabilities.
Complement of an Event
[Venn diagram: Event A and its complement $A^c$ within Sample Space S]
Union of Two Events
[Venn diagram: Event A and Event B within Sample Space S]
Union of Two Events
Example: Bradley Investments
= .82
Intersection of Two Events
[Venn diagram: Event A and Event B within Sample Space S, with the intersection of A and B marked]
Intersection of Two Events
Example: Bradley Investments
Addition Law
Example: Bradley Investments
[Venn diagram: Event A and Event B within Sample Space S]
Mutually Exclusive Events
There is no need to include “$-P(A \cap B)$”.
Mutual Exclusiveness and Independence