
Probability and Statistics 1

Session 1 - 3

Instructor: Prof. Deepika Jain

E-mail id: deepika.jain@iimrohtak.ac.in

10/12/2020
Statistics
The term statistics can refer to numerical facts such as averages,
medians, percents, and index numbers that help us understand a
variety of business and economic situations.

Statistics can also refer to the art and science of collecting, analyzing,
presenting, and interpreting data.
Applications in
Business and Economics
Accounting
Public accounting firms use statistical sampling procedures when conducting
audits for their clients.

Economics

Economists use statistical information in making forecasts about the future of the economy or some aspect of it.

Finance
Financial advisors use price-earnings ratios and dividend yields to guide their
investment advice.
Applications in
Business and Economics
Marketing

Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.

Production
A variety of statistical quality control charts are used to monitor the output of a
production process.

Information Systems
A variety of statistical information helps administrators assess the performance
of computer networks.
Data and Data Sets

Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.

All the data collected in a particular study are referred to as the data set for the study.
Elements, Variables, and Observations
Elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.

The set of measurements obtained for a particular element is called an observation.
A data set with n elements contains n observations.

The total number of data values in a complete data set is the number of elements multiplied by the number of variables.
Scales of Measurement

Nominal
Ordinal
Interval
Ratio

The scale determines the amount of information contained in the data.

The scale indicates the data summarization and statistical analyses that are most appropriate.
Scales of Measurement
Nominal

Data are labels or names used to identify an attribute of the element.

A nonnumeric label or numeric code may be used.


Scales of Measurement

Nominal

Example:
Students of a university are classified by the school
in which they are enrolled using a nonnumeric label
such as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the
school variable (e.g. 1 denotes Business, 2 denotes
Humanities, 3 denotes Education, and so on).
Scales of Measurement
Ordinal

The data have the properties of nominal data and the order or rank of the data is meaningful.

A nonnumeric label or numeric code may be used.


Scales of Measurement
Ordinal

Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Scales of Measurement
Interval

The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.

Interval data are always numeric.

Example:
Melissa has an SAT score of 1985, while Kevin
has an SAT score of 1880. Melissa scored 105
points more than Kevin.
Scales of Measurement
Ratio

The data have all the properties of interval data and the ratio of two values is meaningful.

Variables such as distance, height, weight, and time use the ratio scale.
This scale must contain a zero value that indicates
that nothing exists for the variable at the zero point.
Example:
Melissa’s college record shows 36 credit hours
earned, while Kevin’s record shows 72 credit
hours earned. Kevin has twice as many credit
hours earned as Melissa.
Qualitative and Quantitative Data

Data can be further classified as being qualitative or quantitative.

The statistical analysis that is appropriate depends on whether the data for the variable are qualitative or quantitative.

In general, there are more alternatives for statistical analysis when the data are quantitative.
Qualitative Data
Labels or names used to identify an attribute of each
element

Use either the nominal or ordinal scale of measurement

Can be either numeric or nonnumeric

Appropriate statistical analyses are rather limited


Quantitative Data

Quantitative data indicate how many or how much:

discrete, if measuring how many

continuous, if measuring how much

Quantitative data are always numeric.

Ordinary arithmetic operations are meaningful for quantitative data.
Scales of Measurement

Data
  Qualitative
    Numeric: nominal or ordinal scale
    Non-numeric: nominal or ordinal scale
  Quantitative
    Numeric: interval or ratio scale


Cross-Sectional Data

Cross-sectional data are collected at the same or approximately the same point in time.

Example: data detailing the number of building permits issued in November 2012 in each of the counties of Ohio
Time Series Data

Time series data are collected over several time periods.

Example: data detailing the number of building permits issued in Lucas County, Ohio in each of the last 36 months

Graphs of time series help analysts
• understand what happened in the past,
• identify any trends over time, and
• project future levels for the time series.
Data Sources
Existing Sources

Internal company records – almost any department
Business database services – Dow Jones & Co.
Government agencies – U.S. Department of Labor
Industry associations – Travel Industry Association of America
Special-interest organizations – Graduate Management Admission Council
Internet – more and more firms
Data Sources

Data Available From Internal Company Records

Record               Some of the Data Available
Employee records     name, address, social security number
Production records   part number, quantity produced, direct labor cost, material cost
Inventory records    part number, quantity in stock, reorder level, economic order quantity
Sales records        product number, sales volume, sales volume by region
Credit records       customer name, credit limit, accounts receivable balance
Customer profile     age, gender, income, household size
Data Sources

Data Available From Selected Government Agencies

Government Agency            Website                   Some of the Data Available
Census Bureau                www.census.gov            Population data, number of households, household income
Federal Reserve Board        www.federalreserve.gov    Data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget     www.whitehouse.gov/omb    Data on revenue, expenditures, debt of federal government
Department of Commerce       www.doc.gov               Data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics   www.bls.gov               Consumer spending, unemployment rate, hourly earnings, safety record
Data Acquisition Considerations

Time Requirement
Searching for information can be time consuming.
Information may no longer be useful by the time it
is available.
Cost of Acquisition
Organizations often charge for information even
when it is not their primary business activity.
Data Errors
Using any data that happen to be available or were
acquired with little care can lead to misleading
information.
Descriptive Statistics

Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.

Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.
Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in her shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Sample of Parts Cost ($) for 50 Tune-ups

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Tabular Summary:
Frequency and Percent Frequency
Example: Hudson Auto
Parts Cost ($)   Frequency   Percent Frequency
50-59                2             4
60-69               13            26
70-79               16            32
80-89                7            14
90-99                7            14
100-109              5            10
Total               50           100

Note: for the 50-59 class, (2/50)(100) = 4.
Graphical Summary: Histogram
Example: Hudson Auto
[Figure: histogram titled "Tune-up Parts Cost"; horizontal axis: Parts Cost ($) in classes 50-59 through 100-109; vertical axis: Frequency, 2 to 18]
Numerical Descriptive
Statistics
The most common numerical descriptive statistic
is the average (or mean).
The average demonstrates a measure of the central
tendency, or central location, of the data for a variable.
Hudson’s average cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
Statistical Inference

Population - the set of all elements of interest in a particular study

Sample - a subset of the population

Statistical inference - the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population

Census - collecting data for the entire population

Sample survey - collecting data for a sample


Descriptive Statistics:
Tabular and Graphical Displays
Summarizing Data for a Qualitative Variable
Summarizing Data for a Quantitative Variable

Qualitative data use labels or names to identify categories of like items.

Quantitative data are numerical values that indicate how much or how many.
Summarizing Qualitative Data
Frequency Distribution
Relative Frequency Distribution
Percent Frequency Distribution
Bar Chart
Pie Chart
Frequency Distribution

A frequency distribution is a tabular summary of data showing the number (frequency) of observations in each of several non-overlapping categories or classes.

The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Frequency Distribution

Example: Marada Inn


Guests staying at Marada Inn were asked to rate the quality
of their accommodations as being excellent, above average,
average, below average, or poor. The ratings provided by a
sample of 20 guests are:

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average
Frequency Distribution

Example: Marada Inn

Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
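The tally above can be reproduced with a short script. This is a minimal sketch in Python (the slides name no software); the variable names `ratings`, `order`, and `frequency` are illustrative choices, and the ratings are typed in row by row from the Marada Inn slide.

```python
from collections import Counter

# The 20 guest ratings, in the order they appear on the Marada Inn slide.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Report the classes in their natural order from worst to best.
order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]
counts = Counter(ratings)
frequency = {rating: counts[rating] for rating in order}

for rating, count in frequency.items():
    print(f"{rating:<14} {count:>2}")
print(f"{'Total':<14} {sum(frequency.values()):>2}")
```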
Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.

A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative
frequency multiplied by 100.

A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Relative Frequency and
Percent Frequency Distributions
Example: Marada Inn

Rating           Relative Frequency   Percent Frequency
Poor                   .10                  10
Below Average          .15                  15
Average                .25                  25
Above Average          .45                  45
Excellent              .05                   5
Total                 1.00                 100

Notes: for Excellent, 1/20 = .05; for Poor, .10(100) = 10.
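As a quick check of the two definitions, the sketch below (Python, an assumed choice of language) converts the Marada Inn frequencies into relative and percent frequencies. The dictionary names are illustrative only.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # total number of data items (20)

# Relative frequency = class frequency / n; percent frequency = relative * 100.
relative = {rating: count / n for rating, count in frequency.items()}
percent = {rating: 100 * rel for rating, rel in relative.items()}

for rating in frequency:
    print(f"{rating:<14} {relative[rating]:.2f} {percent[rating]:>5.0f}")
```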
Bar Chart
A bar chart is a graphical display for depicting
qualitative data.
On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).
Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
The bars are separated to emphasize the fact that
each class is a separate category.
Bar Chart

[Figure: bar chart titled "Marada Inn Quality Ratings"; horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency, 1 to 10]
Pareto Diagram

In quality control, bar charts are used to identify the most important causes of problems.

When the bars are arranged in descending order of height from left to right (with the most frequently occurring cause appearing first) the bar chart is called a Pareto diagram.

This diagram is named for its founder, Vilfredo Pareto, an Italian economist.
Pie Chart
The pie chart is a commonly used graphical display for
presenting relative frequency and percent frequency
distributions for categorical data.
First draw a circle; then use the relative frequencies to
subdivide the circle into sectors that correspond to the
relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative
frequency of .25 would consume .25(360) = 90 degrees of the
circle.
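The sector arithmetic can be applied to every class at once. A minimal Python sketch, using the Marada Inn relative frequencies; the name `degrees` is an illustrative choice.

```python
# Relative frequencies from the Marada Inn example.
relative = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each class takes a share of the 360 degrees equal to its relative frequency.
degrees = {rating: rel * 360 for rating, rel in relative.items()}

for rating, angle in degrees.items():
    print(f"{rating:<14} {angle:6.1f} degrees")
```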
Pie Chart

[Figure: pie chart titled "Marada Inn Quality Ratings"; sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]
Summarizing Quantitative
Data
Frequency Distribution
Relative Frequency and Percent Frequency Distributions
Dot Plot
Histogram
Cumulative Distributions
Stem-and-Leaf Displays
Frequency Distribution
Example: Hudson Auto Repair
The manager of Hudson Auto would like to gain a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Frequency Distribution

The three steps necessary to define the classes for a frequency distribution with quantitative data are:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.
Frequency Distribution
Guidelines for Determining the Number of Classes
Use between 5 and 20 classes.
Data sets with a larger number of elements usually
require a larger number of classes.
Smaller data sets usually require fewer classes.

The goal is to use enough classes to show the variation in the data, but not so many classes that some contain only a few data items.
Frequency Distribution
Guidelines for Determining the Width of Each Class
Use classes of equal width.
$\text{Approximate Class Width} = \dfrac{\text{Largest Data Value} - \text{Smallest Data Value}}{\text{Number of Classes}}$

Making the classes the same width reduces the chance of inappropriate interpretations.
Frequency Distribution

Note on Number of Classes and Class Width

In practice, the number of classes and the appropriate class width are determined by trial and error.

Once a possible number of classes is chosen, the appropriate class width is found.
The process can be repeated for a different number of classes.
Ultimately, the analyst uses judgment to determine the combination of the number of classes and class width that provides the best frequency distribution for summarizing the data.
Frequency Distribution

Guidelines for Determining the Class Limits


Class limits must be chosen so that each data item
belongs to one and only one class.
The lower class limit identifies the smallest possible data
value assigned to the class.
The upper class limit identifies the largest possible data
value assigned to the class.
The appropriate values for the class limits depend on the
level of accuracy of the data.

An open-end class requires only a lower class limit or an upper class limit.
Frequency Distribution

Example: Hudson Auto Repair


If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
Parts Cost ($) Frequency
50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
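The three steps (number of classes, class width, class limits) can be traced in code. This is a sketch in Python under two stated assumptions: the approximate width is rounded up to a whole number, and the first lower class limit is set to 50, as on the slide. The names `first_lower_limit` and `frequency` are illustrative.

```python
import math

# Parts costs ($) for the 50 Hudson Auto tune-ups, as listed in the slides.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
approx_width = (max(costs) - min(costs)) / num_classes  # (109 - 52)/6 = 9.5
width = math.ceil(approx_width)                          # rounded up to 10
first_lower_limit = 50  # assumption: a convenient limit at or below the minimum

frequency = {}
for k in range(num_classes):
    lower = first_lower_limit + k * width
    upper = lower + width - 1  # costs are whole dollars, so limits are integers
    frequency[f"{lower}-{upper}"] = sum(lower <= c <= upper for c in costs)

for label, count in frequency.items():
    print(f"{label:>7}  {count:>2}")
```

Trying a different `num_classes` and rerunning is one way to carry out the trial-and-error process described in the note on number of classes and class width.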
Relative Frequency and
Percent Frequency Distributions
Example: Hudson Auto Repair

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                  .04                   4
60-69                  .26                  26
70-79                  .32                  32
80-89                  .14                  14
90-99                  .14                  14
100-109                .10                  10
Total                 1.00                 100

Notes: for the 50-59 class, 2/50 = .04 and .04(100) = 4. Percent frequency is the relative frequency multiplied by 100.
Dot Plot

One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed
above the axis.
Dot Plot

Example: Hudson Auto Repair

[Figure: dot plot titled "Tune-up Parts Cost"; horizontal axis: Cost ($), 50 to 110]
Histogram
Another common graphical display of quantitative data
is a histogram.

The variable of interest is placed on the horizontal axis.

A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation
between rectangles of adjacent classes.
Histogram
Example: Hudson Auto Repair
[Figure: histogram titled "Tune-up Parts Cost"; horizontal axis: Parts Cost ($) in classes 50-59 through 100-109; vertical axis: Frequency, 2 to 18]
Cumulative Distributions

Cumulative frequency distribution – shows the number of items with values less than or equal to the upper limit of each class.

Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to the upper limit of each class.

Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to the upper limit of each class.
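All three cumulative distributions follow from a running total of the class frequencies. A minimal Python sketch using the Hudson Auto class frequencies; the list names are illustrative.

```python
from itertools import accumulate

# Hudson Auto class frequencies and the upper limit of each class.
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]

cumulative = list(accumulate(frequencies))  # running total of frequencies
n = cumulative[-1]                          # total number of items (50)
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for limit, c, rel, pct in zip(
    upper_limits, cumulative, cumulative_relative, cumulative_percent
):
    print(f"<= {limit:<4} {c:>3} {rel:5.2f} {pct:5.0f}")
```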
Cumulative Distributions
Hudson Auto Repair

            Cumulative   Cumulative Relative   Cumulative Percent
Cost ($)    Frequency    Frequency             Frequency
≤ 59            2             .04                    4
≤ 69           15             .30                   30
≤ 79           31             .62                   62
≤ 89           38             .76                   76
≤ 99           45             .90                   90
≤ 109          50            1.00                  100

Notes: for the ≤ 69 row, 2 + 13 = 15, 15/50 = .30, and .30(100) = 30.
Stem-and-Leaf Display

A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
It is similar to a histogram on its side, but it has the
advantage of showing the actual data values.
The first digits of each data item are arranged to the left
of a vertical line.
To the right of the vertical line we record the last digit
for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
Example: Hudson Auto Repair

The manager of Hudson Auto would like to gain a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Stem-and-Leaf Display

Example: Hudson Auto Repair

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9

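The construction described above (leading digits to the left of the line, last digit to the right, leaves in rank order) can be sketched as follows. This is a Python illustration, not the method of any particular package; the function name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical, and the sketch assumes whole-number data with a whole-number leaf unit.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Group integer data into stems, each with a sorted list of leaf digits."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = v // leaf_unit  # drop the digits below the leaf unit
        stems[scaled // 10].append(scaled % 10)
    return dict(stems)

# Parts costs ($) for the 50 Hudson Auto tune-ups, as listed in the slides.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

display = stem_and_leaf(costs)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the leaves are kept as actual digits, the original values can be read back from the display, which is the advantage over a histogram noted above.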
Stretched Stem-and-Leaf
Display
If we believe the original stem-and-leaf display has
condensed the data too much, we can stretch the display
vertically by using two stems for each leading digit(s).

Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.
Stretched Stem-and-Leaf Display

Example: Hudson Auto Repair


 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9
Stem-and-Leaf Display

Leaf Units
A single digit is used to define each leaf.
In the preceding example, the leaf unit was 1.
Leaf units may be 100, 10, 1, 0.1, and so on.
Where the leaf unit is not shown, it is assumed to
equal 1.
The leaf unit indicates how to multiply the stem-
and-leaf numbers in order to approximate the
original data.
Example: Leaf Unit = 0.1
If we have data with values such as

8.6 11.7 9.4 9.1 10.2 11.0 8.8

a stem-and-leaf display of these data will be

Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7
Example: Leaf Unit = 10

If we have data with values such as

1806 1717 1974 1791 1682 1910 1838

a stem-and-leaf display of these data will be

Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.
Summarizing Data for Two Variables
Using Tables

Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables.

Crosstabulation is a method for summarizing the data for two variables.
Crosstabulation

A crosstabulation is a tabular summary of data for two variables.
Crosstabulation can be used when:
one variable is qualitative and the other is
quantitative,
both variables are qualitative, or
both variables are quantitative.
The left and top margin labels define the classes for the
two variables.
Crosstabulation

Example: Finger Lakes Homes


The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Price range is the quantitative variable; home style is the categorical variable.

Price                       Home Style
Range        Colonial   Log   Split   A-Frame   Total
< $200,000      18        6     19      12        55
> $200,000      12       14     16       3        45
Total           30       20     35      15       100
Crosstabulation
Example: Finger Lakes Homes

Price                       Home Style
Range        Colonial   Log   Split   A-Frame   Total
< $200,000      18        6     19      12        55
> $200,000      12       14     16       3        45
Total           30       20     35      15       100

The Total column (55, 45) is the frequency distribution for the price range variable.
The Total row (30, 20, 35, 15) is the frequency distribution for the home style variable.
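The margins of a crosstabulation are row and column sums of the cell counts. The Python sketch below recomputes them from the Finger Lakes cell counts; the price labels are copied as they appear in the table, and the names `cells`, `row_totals`, and `column_totals` are illustrative.

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Cell counts from the Finger Lakes Homes crosstabulation (rows = price range).
cells = {
    "< $200,000": [18, 6, 19, 12],
    "> $200,000": [12, 14, 16, 3],
}

# Right margin: frequency distribution for the price range variable.
row_totals = {price: sum(counts) for price, counts in cells.items()}

# Bottom margin: frequency distribution for the home style variable.
column_totals = {
    style: sum(counts[j] for counts in cells.values())
    for j, style in enumerate(styles)
}

grand_total = sum(row_totals.values())
print(row_totals, column_totals, grand_total)
```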
Crosstabulation: Simpson’s Paradox

Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.

We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.

In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data. The reversal of conclusions based on aggregate and unaggregated data is called Simpson’s paradox.
Summarizing Data for Two Variables Using
Graphical Displays

In most cases, a graphical display is more useful than a table for recognizing patterns and trends.

Displaying data in creative ways can lead to powerful insights.

Scatter diagrams and trendlines are useful in exploring the relationship between two variables.
Scatter Diagram and Trendline

A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
The general pattern of the plotted points suggests the overall relationship between the variables.

A trendline provides an approximation of the relationship.


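Scatter Diagram
A Positive Relationship
[Figure: scatter diagram; horizontal axis: x]

Scatter Diagram
A Negative Relationship
[Figure: scatter diagram; horizontal axis: x]

Scatter Diagram
No Apparent Relationship
[Figure: scatter diagram; horizontal axis: x]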
Scatter Diagram
Example: Panthers Football Team
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.

x = Number of y = Number of
Interceptions Points Scored
1 14
3 24
2 18
1 17
3 30
Scatter Diagram and Trendline
[Figure: scatter diagram with trendline; horizontal axis: x = Number of Interceptions, 0 to 4; vertical axis: y = Number of Points Scored, 0 to 35]
Example: Panthers Football Team

Insights Gained from the Preceding Scatter Diagram


The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.

Higher points scored are associated with a higher number of interceptions.

The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.
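The slides do not say how the trendline is fitted. One common choice, assumed here, is the least-squares line; the Python sketch below computes its slope and intercept for the five Panthers observations. The variable names are illustrative.

```python
# Panthers data: x = number of interceptions, y = number of points scored.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates (assumed fitting method, not stated in the slides).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"trendline: y = {intercept:.2f} + {slope:.2f} x")
```

A positive slope is consistent with the positive relationship read off the scatter diagram.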
Side-by-Side Bar Chart

A side-by-side bar chart is a graphical display for depicting multiple bar charts on the same display.
Each cluster of bars represents one value of the
first variable.
Each bar within a cluster represents one value of
the second variable.
Side-by-Side Bar Chart
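[Figure: side-by-side bar chart titled "Finger Lakes Homes"; horizontal axis: Home Style (Colonial, Log, Split-Level, A-Frame); vertical axis: Frequency, 2 to 20; one bar per price range (< $200,000, > $200,000) within each cluster]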
Stacked Bar Chart

A stacked bar chart is another way to display and compare two variables on the same display.
It is a bar chart in which each bar is broken into rectangular segments of a different color.

If percentage frequencies are displayed, all bars will be of the same height (or length), extending to the 100% mark.
Stacked Bar Chart
[Figure: stacked bar chart titled "Finger Lakes Homes"; horizontal axis: Home Style (Colonial, Log, Split, A-Frame); vertical axis: Frequency, 4 to 40; each bar split into segments for < $200,000 and > $200,000]
Choosing the Type of Graphical Display

Displays used to show the distribution of data:
Bar Chart, Pie Chart, Dot Plot, Histogram, Stem-and-Leaf Display

Displays used to make comparisons:
Side-by-Side Bar Chart, Stacked Bar Chart

Displays used to show relationships:
Scatter Diagram, Trendline
Tabular and Graphical Displays
Qualitative Data
  Tabular Displays
    • Frequency Distribution
    • Rel. Freq. Dist.
    • Percent Freq. Distribution
    • Crosstabulation
  Graphical Displays
    • Bar Chart
    • Pie Chart
    • Side-by-Side Bar Chart
    • Stacked Bar Chart

Quantitative Data
  Tabular Displays
    • Frequency Distribution
    • Rel. Freq. Dist.
    • % Freq. Dist.
    • Cum. Freq. Dist.
    • Cum. Rel. Freq. Distribution
    • Cum. % Freq. Distribution
    • Crosstabulation
  Graphical Displays
    • Dot Plot
    • Histogram
    • Stem-and-Leaf Display
    • Scatter Diagram
Measures of Location
Mean
Weighted Mean
Median
Geometric Mean
Mode
Percentiles
Quartiles

If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.
Mean
Perhaps the most important measure of location is the mean.

The mean provides a measure of central location.


The mean of a data set is the average of all the data values.

The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.
Sample Mean

$\bar{x} = \dfrac{\sum x_i}{n}$

where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.
Population Mean $\mu$

$\mu = \dfrac{\sum x_i}{N}$

where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.
Sample Mean

Example: Apartment Rents


Seventy efficiency apartments were randomly sampled
in a small college town. The monthly rent prices for
these apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440

Sample Mean
Example: Apartment Rents

$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{34{,}356}{70} = 490.80$
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
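The sample mean of the apartment rents can be recomputed directly from the 70 listed values. A minimal Python sketch; the names `rents` and `sample_mean` are illustrative.

```python
# Monthly rents ($) for the 70 sampled efficiency apartments, as listed.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

n = len(rents)              # number of observations in the sample
total = sum(rents)          # sum of the values of the n observations
sample_mean = total / n

print(f"n = {n}, sum = {total}, mean = {sample_mean:.2f}")
```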

Weighted Mean
In some instances the mean is computed by giving each
observation a weight that reflects its relative importance.

The choice of weights depends on the application.


The weights might be the number of credit hours earned for
each grade, as in GPA.
In other weighted mean computations, quantities such as
pounds, dollars, or volume are frequently used.

Weighted Mean
$\bar{x} = \dfrac{\sum w_i x_i}{\sum w_i}$

where:
$x_i$ = value of observation i
$w_i$ = weight for observation i

Numerator: sum of the weighted data values
Denominator: sum of the weights
If the data are from a population, $\mu$ replaces $\bar{x}$.
Weighted Mean
Example: Construction Wages
Ron Butler, a home builder, is looking over the expenses he
incurred for a house he just built. For the purpose of pricing
future projects, he would like to know the average wage
($/hour) he paid the workers he employed. Listed below are
the categories of worker he employed, along with their
respective wage and total hours worked.

Worker         Wage ($/hr)   Total Hours
Carpenter         21.60          520
Electrician       28.72          230
Laborer           11.80          410
Painter           19.75          270
Plumber           24.16          160
Weighted Mean

Example: Construction Wages


Worker          $x_i$    $w_i$    $w_i x_i$
Carpenter       21.60     520     11232.0
Electrician     28.72     230      6605.6
Laborer         11.80     410      4838.0
Painter         19.75     270      5332.5
Plumber         24.16     160      3865.6
Total                    1590     31873.7

$\mu = \dfrac{\sum w_i x_i}{\sum w_i} = \dfrac{31873.7}{1590} = 20.0464 \approx \$20.05$

FYI, equally-weighted (simple) mean = $21.21
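The same computation, with hours as the weights, can be sketched in Python. The variable names are illustrative; the figures are those in the construction wages table.

```python
# (wage in $/hour, total hours worked) for each category of worker.
workers = {
    "Carpenter": (21.60, 520),
    "Electrician": (28.72, 230),
    "Laborer": (11.80, 410),
    "Painter": (19.75, 270),
    "Plumber": (24.16, 160),
}

weighted_sum = sum(wage * hours for wage, hours in workers.values())
total_weight = sum(hours for _, hours in workers.values())
weighted_mean = weighted_sum / total_weight

# Equally-weighted (simple) mean of the five wage rates, for comparison.
simple_mean = sum(wage for wage, _ in workers.values()) / len(workers)

print(f"weighted mean = ${weighted_mean:.2f}, simple mean = ${simple_mean:.2f}")
```

The weighted mean is lower than the simple mean because the lowest-paid category (laborer) accounts for a large share of the hours.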
Median
The median of a data set is the value in the middle when the data
items are arranged in ascending order.
Whenever a data set has extreme values, the median is the
preferred measure of central location.
The median is the measure of location most often reported for
annual income and property value data.
A few extremely large incomes or property values can inflate the
mean.

Median
For an odd number of observations:

7 observations:       26 18 27 12 14 27 19
in ascending order:   12 14 18 19 26 27 27

the median is the middle value.

Median = 19
Median
For an even number of observations:

8 observations:       26 18 27 12 14 27 30 19
in ascending order:   12 14 18 19 26 27 27 30

the median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
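Both cases can be handled by one small function. This is a Python sketch of the rule stated above (the standard library's `statistics.median` follows the same rule); the function name `median` is an illustrative choice.

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

odd_case = median([26, 18, 27, 12, 14, 27, 19])        # 7 observations
even_case = median([26, 18, 27, 12, 14, 27, 30, 19])   # 8 observations
print(odd_case, even_case)
```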

Median
Example: Apartment Rents
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data are in ascending order.
Trimmed Mean
Another measure, sometimes used when extreme values are
present, is the trimmed mean.
It is obtained by deleting a percentage of the smallest and largest
values from a data set and then computing the mean of the
remaining values.

For example, the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
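A trimmed mean can be sketched as below. Two things here are assumptions rather than content from the slides: the number of values dropped from each tail is rounded down when the percentage does not give a whole number, and the data set is a made-up illustration (the integers 1 to 19 plus one extreme value of 1000). The function name `trimmed_mean` is hypothetical.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the values from each tail."""
    data = sorted(values)
    k = int(len(data) * percent / 100)  # assumption: round down per tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Hypothetical data for illustration only: 1..19 plus one extreme value.
values = list(range(1, 20)) + [1000]

plain_mean = sum(values) / len(values)
trimmed = trimmed_mean(values, 5)  # drops 1 value from each tail of 20
print(plain_mean, trimmed)
```

In this illustration the single extreme value pulls the ordinary mean far above the bulk of the data, while the 5% trimmed mean is unaffected by it.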

Geometric Mean
The geometric mean is calculated by finding the nth root of the
product of n values.

It is often used in analyzing growth rates in financial data (where using the arithmetic mean will provide misleading results).

It should be applied anytime you want to determine the mean rate of change over several successive periods (be it years, quarters, weeks, . . .).

Other common applications include: changes in populations of species, crop yields, pollution levels, and birth and death rates.

Geometric Mean

$\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n}$
Geometric Mean
Example: Rate of Return
Period Return (%) Growth Factor
1 -6.0 0.940
2 -8.0 0.920
3 -4.0 0.960
4 2.0 1.020
5 5.4 1.054

$\bar{x}_g = \sqrt[5]{(.94)(.92)(.96)(1.02)(1.054)} = [.89254]^{1/5} = .97752$

Average growth rate per period is (.97752 - 1)(100) = -2.248%
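The rate-of-return calculation can be checked step by step. A minimal Python sketch (requires Python 3.8 or later for `math.prod`); the variable names are illustrative.

```python
import math

# Percentage returns for the five periods in the example.
returns_pct = [-6.0, -8.0, -4.0, 2.0, 5.4]

# Growth factor for each period: 1 + return/100.
growth_factors = [1 + r / 100 for r in returns_pct]

product = math.prod(growth_factors)
geometric_mean = product ** (1 / len(growth_factors))  # nth root of the product
avg_growth_rate_pct = (geometric_mean - 1) * 100

print(f"product = {product:.5f}")
print(f"geometric mean = {geometric_mean:.5f}")
print(f"average growth rate per period = {avg_growth_rate_pct:.3f}%")
```

For comparison, the arithmetic mean of the five returns is -2.12%, which is not the per-period rate that compounds to the overall growth factor.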
Mode
The mode of a data set is the value that occurs with greatest
frequency.

The greatest frequency can occur at two or more different values.

If the data have exactly two modes, the data are bimodal.

If the data have more than two modes, the data are multimodal.

Mode
Example: Apartment Rents
450 occurred most frequently (7 times)
Mode = 450
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data are in ascending order.
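Because a data set can have more than one mode, it helps to collect every value that attains the greatest frequency. The Python sketch below does this with a frequency count for the apartment rents; the names `counts`, `greatest`, and `modes` are illustrative.

```python
from collections import Counter

# Apartment rents ($) in ascending order, as listed above.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

counts = Counter(rents)
greatest = max(counts.values())
# Every value tied for the greatest frequency is a mode.
modes = sorted(v for v, c in counts.items() if c == greatest)

print(f"mode(s): {modes}, each occurring {greatest} times")
```

One mode in the list means the data are unimodal; two would make them bimodal, and more than two multimodal.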
Percentiles

A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.

Admission test scores for colleges and universities are frequently reported in terms of percentiles.

The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
Percentiles

Arrange the data in ascending order.

Compute index i, the position of the pth percentile:

i = (p/100)n

If i is not an integer, round up. The pth percentile is the value in the ith position.

If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
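The procedure above can be sketched as a small function and applied to the apartment rents sample. This is a Python illustration of this particular rule only (statistical packages often use other interpolation rules and may give slightly different answers). The function name `percentile` is hypothetical, and the sketch assumes p is a whole number strictly between 0 and 100 so that the index can be computed in exact integer arithmetic.

```python
def percentile(values, p):
    """pth percentile (whole-number p, 0 < p < 100) by the rule above."""
    data = sorted(values)                  # arrange in ascending order
    n = len(data)
    whole, remainder = divmod(p * n, 100)  # i = (p/100)n, kept in integers
    if remainder:
        # i is not an integer: round up to position whole + 1,
        # which is index `whole` in 0-based indexing.
        return data[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (data[whole - 1] + data[whole]) / 2

# Apartment rents ($) from the sample of 70 efficiency apartments.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

p80 = percentile(rents, 80)  # i = 56, an integer: average positions 56 and 57
q3 = percentile(rents, 75)   # i = 52.5, not an integer: position 53
q2 = percentile(rents, 50)   # i = 35, an integer: average positions 35 and 36
print(p80, q3, q2)
```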

80th Percentile
Example: Apartment Rents
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.


80th Percentile
Example: Apartment Rents
“At least 80% of the items take on a value of 542 or less.”
56/70 = .8 or 80%
“At least 20% of the items take on a value of 542 or more.”
14/70 = .2 or 20%
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile

Third Quartile
Example: Apartment Rents
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = value in the 53rd position = 525
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.


Measures of Variability
It is often desirable to consider measures of variability (dispersion),
as well as measures of location.
For example, in choosing supplier A or supplier B we might
consider not only the average delivery time for each, but also the
variability in delivery time for each.

Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation

Range
The range of a data set is the difference between the largest and
smallest data values.
It is the simplest measure of variability.

It is very sensitive to the smallest and largest data values.


Range
Example: Apartment Rents
Range = largest value - smallest value
Range = 615 - 425 = 190
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Interquartile Range
The interquartile range of a data set is the difference between the
third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.

Interquartile Range
Example: Apartment Rents
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
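The sketch below computes the range and the interquartile range for the apartment rents, then shows why the IQR is less sensitive to extreme values. The `percentile` helper follows the round-up/average rule from the Percentiles slides. The 2000 rent is a made-up value used only to illustrate the point.

```python
def percentile(sorted_data, p):
    """p-th percentile (integer p, 0 < p < 100) of data in ascending order."""
    whole, remainder = divmod(p * len(sorted_data), 100)
    if remainder != 0:
        return sorted_data[whole]  # round i up; 1-based position whole + 1
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

def data_range(sorted_data):
    return sorted_data[-1] - sorted_data[0]

def iqr(sorted_data):
    return percentile(sorted_data, 75) - percentile(sorted_data, 25)

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(data_range(rents), iqr(rents))

# Hypothetical: replace the largest rent with an extreme value of 2000.
rents_extreme = rents[:-1] + [2000]
print(data_range(rents_extreme), iqr(rents_extreme))
```

With the extreme value in place, the range grows sharply while the IQR is unchanged, because Q1 and Q3 depend only on the middle of the ordered data.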


Variance

The variance is a measure of variability that utilizes
all the data.

It is based on the difference between the value of
each observation ($x_i$) and the mean ($\bar{x}$ for a sample,
$\mu$ for a population).

The variance is useful in comparing the variability
of two or more variables.

Variance

The variance is the average of the squared differences between
each data value and the mean.
The variance is computed as follows:

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ for a sample

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ for a population

Standard Deviation

The standard deviation of a data set is the positive square
root of the variance.

It is measured in the same units as the data, making it more
easily interpreted than the variance.

The standard deviation is computed as follows:

$s = \sqrt{s^2}$ for a sample

$\sigma = \sqrt{\sigma^2}$ for a population

Coefficient of Variation

The coefficient of variation indicates how large the
standard deviation is in relation to the mean.

The coefficient of variation is computed as follows:

$\left( \frac{s}{\bar{x}} \times 100 \right)\%$ for a sample

$\left( \frac{\sigma}{\mu} \times 100 \right)\%$ for a population

Sample Variance, Standard Deviation,
and Coefficient of Variation
Example: Apartment Rents
Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2,996.16$

Standard Deviation
$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$

Coefficient of Variation
$\left( \frac{s}{\bar{x}} \times 100 \right)\% = \left( \frac{54.74}{490.80} \times 100 \right)\% = 11.15\%$
The standard deviation is about 11% of the mean.

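These three sample statistics can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor (`pvariance` and `pstdev` are the population versions). A minimal sketch using the apartment rents:

```python
import statistics

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)
sample_variance = statistics.variance(rents)  # divides by n - 1
sample_std = statistics.stdev(rents)          # square root of the variance
coeff_of_variation = sample_std / mean * 100  # in percent

print(round(mean, 2), round(sample_variance, 2),
      round(sample_std, 2), round(coeff_of_variation, 2))
```

The results should agree with the slide values up to rounding.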
Distribution Shape: Skewness
An important measure of the shape of a distribution is called
skewness.
The formula for the skewness of sample data is

$\text{Skewness} = \frac{n}{(n - 1)(n - 2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$

Distribution Shape: Skewness
Symmetric (not skewed)
Skewness is zero.
Mean and median are equal.
[Figure: relative frequency histogram, Skewness = 0]

Distribution Shape: Skewness
Moderately Skewed Left
Skewness is negative.
Mean will usually be less than the median.
[Figure: relative frequency histogram, Skewness = -.31]

Distribution Shape: Skewness
Moderately Skewed Right
Skewness is positive.
Mean will usually be more than the median.
[Figure: relative frequency histogram, Skewness = .31]

Distribution Shape: Skewness
Highly Skewed Right
Skewness is positive (often above 1.0).
Mean will usually be more than the median.
[Figure: relative frequency histogram, Skewness = 1.25]
Distribution Shape: Skewness
Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in
a college town. The monthly rent prices for the
apartments are listed below in ascending order.

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
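As a sketch, the sample skewness formula given earlier can be applied to these rents directly. The function name `sample_skewness` is illustrative. The formula uses the sample standard deviation s (n - 1 divisor), which is what `statistics.stdev` returns.

```python
import statistics

def sample_skewness(data):
    """n / ((n-1)(n-2)) * sum(((x - mean) / s)^3), with s the sample std. dev."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

skew = sample_skewness(rents)
print(round(skew, 2))
```

A positive result means the distribution is skewed right, which fits the mean (490.80) lying above the median (475).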

Distribution Shape: Skewness
Example: Apartment Rents
[Figure: relative frequency histogram of the apartment rents, Skewness = .92]

Five-Number Summaries
and Box Plots

Summary statistics and easy-to-draw graphs can be used to
quickly summarize large quantities of data.

Two tools that accomplish this are five-number summaries
and box plots.

Five-Number Summary

1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value

Five-Number Summary
Example: Apartment Rents
Smallest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Box Plot

A box plot is a graphical summary of data that is based on a
five-number summary.

A key to the development of a box plot is the computation of
the median and the quartiles Q1 and Q3.

Box plots provide another way to identify outliers.

Box Plot
Example: Apartment Rents

A box is drawn with its ends located at the first and third quartiles.

A vertical line is drawn in the box at the location of the median
(second quartile).

[Figure: box plot on an axis from 400 to 625, with Q1 = 445, Q2 = 475, Q3 = 525]

Box Plot
Limits are located (not drawn) using the interquartile range
(IQR).
Data outside these limits are considered outliers.
The location of each outlier is shown with the symbol *.
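The sketch below computes the five-number summary and the box plot limits for the apartment rents. The slides do not state how far the limits sit from the box; this example assumes the common convention of 1.5 × IQR below Q1 and above Q3, so treat the multiplier as an assumption. The `percentile` helper follows the round-up/average rule from the Percentiles slides.

```python
def percentile(sorted_data, p):
    """p-th percentile (integer p, 0 < p < 100) of data in ascending order."""
    whole, remainder = divmod(p * len(sorted_data), 100)
    if remainder != 0:
        return sorted_data[whole]  # round i up; 1-based position whole + 1
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

q1, median, q3 = (percentile(rents, p) for p in (25, 50, 75))
five_number_summary = (rents[0], q1, median, q3, rents[-1])

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr  # assumed 1.5 x IQR convention
upper_limit = q3 + 1.5 * iqr

outliers = [x for x in rents if x < lower_limit or x > upper_limit]
inside = [x for x in rents if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)  # whisker end points

print(five_number_summary, lower_limit, upper_limit, outliers)
```

Under this assumption no rent falls outside the limits, so the whiskers run to the smallest and largest data values.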

Box Plot
Example: Apartment Rents
Whiskers (dashed lines) are drawn from the ends of the box
to the smallest and largest data values inside the limits.

[Figure: box plot with whiskers on an axis from 400 to 625]
Smallest value inside limits = 425
Largest value inside limits = 615

Measures of Association
Between Two Variables
Thus far we have examined numerical methods used to
summarize the data for one variable at a time.

Often a manager or decision maker is interested in the
relationship between two variables.

Two descriptive measures of the relationship between two
variables are covariance and correlation coefficient.

Covariance

The covariance is a measure of the linear association
between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.

Covariance
The covariance is computed as follows:

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ for samples

$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$ for populations

Correlation Coefficient

Correlation is a measure of linear association and not
necessarily causation.

Just because two variables are highly correlated, it does not
mean that one variable is the cause of the other.

Correlation Coefficient
The correlation coefficient is computed as follows:

$r_{xy} = \frac{s_{xy}}{s_x s_y}$ for samples

$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$ for populations

Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

The closer the correlation is to zero, the weaker the
relationship.

Covariance and Correlation Coefficient
Example: Golfing Study
A golfer is interested in investigating the relationship,
if any, between driving distance and 18-hole score.

Average Driving Distance (yds.)   Average 18-Hole Score
277.6                             69
259.5                             71
269.1                             70
267.0                             70
255.6                             71
272.9                             69

Covariance and Correlation Coefficient
Example: Golfing Study

x           y      $(x_i - \bar{x})$   $(y_i - \bar{y})$   $(x_i - \bar{x})(y_i - \bar{y})$
277.6       69     10.65               -1.0                -10.65
259.5       71     -7.45               1.0                 -7.45
269.1       70     2.15                0                   0
267.0       70     0.05                0                   0
255.6       71     -11.35              1.0                 -11.35
272.9       69     5.95                -1.0                -5.95
Average     267.0  70.0                Total               -35.40
Std. Dev.   8.2192 .8944

Covariance and Correlation Coefficient
Example: Golfing Study
Sample Covariance

$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$

Sample Correlation Coefficient

$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$
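The golfing numbers can be reproduced with a short script. This is a minimal sketch that implements the sample formulas directly. The function names are illustrative, and the standard deviation is computed as the square root of a variable's covariance with itself.

```python
import math

# Golfing study data from the example.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # avg. driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # avg. 18-hole score

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def sample_std(xs):
    """Sample standard deviation: square root of the covariance of xs with itself."""
    return math.sqrt(sample_covariance(xs, xs))

s_xy = sample_covariance(distance, score)
r_xy = s_xy / (sample_std(distance) * sample_std(score))

print(round(s_xy, 2), round(r_xy, 4))
```

The strongly negative correlation says that, in this small sample, longer average drives go with lower (better) scores. It does not by itself show that one causes the other.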

Introduction to Probability
Experiments, Counting Rules, and Assigning Probabilities

Events and Their Probability


Some Basic Relationships of Probability

Uncertainties
Managers often base their decisions on an analysis
of uncertainties such as the following:

What are the chances that sales will decrease
if we increase prices?

What is the likelihood a new assembly method
will increase productivity?

What are the odds that a new investment will
be profitable?

Probability

Probability is a numerical measure of the likelihood
that an event will occur.

Probability values are always assigned on a scale
from 0 to 1.

A probability near zero indicates an event is quite
unlikely to occur.

A probability near one indicates an event is almost
certain to occur.

Probability as a Numerical Measure
of the Likelihood of Occurrence
[Figure: probability scale from 0 to 1, with likelihood of occurrence increasing from left to right]

Probability near 0: The event is very unlikely to occur.
Probability of .5: The occurrence of the event is just as likely as it is unlikely.
Probability near 1: The event is almost certain to occur.

Statistical Experiments

In statistics, the notion of an experiment differs somewhat
from that of an experiment in the physical sciences.

In statistical experiments, probability determines
outcomes.

Even though the experiment is repeated in exactly the same
way, an entirely different outcome may occur.

For this reason, statistical experiments are sometimes called
random experiments.

An Experiment and Its Sample Space

An experiment is any process that generates well-defined
outcomes.

The sample space for an experiment is the set of all
experimental outcomes.

An experimental outcome is also called a sample point.

An Experiment and Its Sample Space

Experiment             Experimental Outcomes
Toss a coin            Head, tail
Inspect a part         Defective, non-defective
Conduct a sales call   Purchase, no purchase
Roll a die             1, 2, 3, 4, 5, 6
Play a football game   Win, lose, tie

An Experiment and Its Sample Space
Example: Bradley Investments
Bradley has invested in two stocks, Markley Oil and Collins
Mining. Bradley has determined that the possible outcomes of
these investments three months from now are as follows.

Investment Gain or Loss in 3 Months (in $000)
Markley Oil   Collins Mining
10            8
5             -2
0
-20

A Counting Rule for
Multiple-Step Experiments

If an experiment consists of a sequence of k steps in which there
are n1 possible results for the first step, n2 possible results for the
second step, and so on, then the total number of experimental
outcomes is given by (n1)(n2) . . . (nk).

A helpful graphical representation of a multiple-step experiment
is a tree diagram.

A Counting Rule for
Multiple-Step Experiments
Example: Bradley Investments

Bradley Investments can be viewed as a two-step
experiment. It involves two stocks, each with a set of
experimental outcomes.

Markley Oil: n1 = 4
Collins Mining: n2 = 2
Total Number of Experimental Outcomes: n1n2 = (4)(2) = 8

Tree Diagram
Example: Bradley Investments
Markley Oil   Collins Mining   Experimental
(Stage 1)     (Stage 2)        Outcomes
Gain 10       Gain 8           (10, 8)     Gain $18,000
Gain 10       Lose 2           (10, -2)    Gain $8,000
Gain 5        Gain 8           (5, 8)      Gain $13,000
Gain 5        Lose 2           (5, -2)     Gain $3,000
Even          Gain 8           (0, 8)      Gain $8,000
Even          Lose 2           (0, -2)     Lose $2,000
Lose 20       Gain 8           (-20, 8)    Lose $12,000
Lose 20       Lose 2           (-20, -2)   Lose $22,000

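The branches of the tree diagram can be enumerated programmatically. In this sketch `itertools.product` plays the role of the tree: it pairs every Stage 1 result with every Stage 2 result, so the number of outcomes equals (n1)(n2). Variable names are illustrative.

```python
from itertools import product

# Possible gains or losses in 3 months, in $000, from the example.
markley_oil = [10, 5, 0, -20]   # Stage 1: n1 = 4 results
collins_mining = [8, -2]        # Stage 2: n2 = 2 results

# Each experimental outcome is a (Markley Oil, Collins Mining) pair.
outcomes = list(product(markley_oil, collins_mining))

# Net gain or loss for each outcome, in $000.
net_result = {outcome: sum(outcome) for outcome in outcomes}

for outcome, net in net_result.items():
    label = "Gain" if net > 0 else "Lose"
    print(outcome, label, f"${abs(net) * 1000:,}")
```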
Counting Rule for Combinations
Number of Combinations of N Objects Taken n at a Time
A second useful counting rule enables us to count the
number of experimental outcomes when n objects are to
be selected from a set of N objects.

$C_n^N = \binom{N}{n} = \frac{N!}{n!(N - n)!}$

where: N! = N(N - 1)(N - 2) . . . (2)(1)
       n! = n(n - 1)(n - 2) . . . (2)(1)
       0! = 1

Counting Rule for Permutations
Number of Permutations of N Objects Taken n at a Time
A third useful counting rule enables us to count
the number of experimental outcomes when n
objects are to be selected from a set of N objects,
where the order of selection is important.

$P_n^N = n! \binom{N}{n} = \frac{N!}{(N - n)!}$

where: N! = N(N - 1)(N - 2) . . . (2)(1)
       n! = n(n - 1)(n - 2) . . . (2)(1)
       0! = 1

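Both counting rules are available in Python's standard library as `math.comb` and `math.perm` (Python 3.8 or later). The sketch below checks them against the factorial formulas. The values N = 5 and n = 2 are hypothetical, chosen only for illustration.

```python
import math

N, n = 5, 2  # hypothetical: select 2 objects from a set of 5

# Combinations: order of selection does not matter.
combinations = math.comb(N, n)
by_formula_c = math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

# Permutations: order of selection matters.
permutations = math.perm(N, n)
by_formula_p = math.factorial(N) // math.factorial(N - n)

print(combinations, permutations)
```

Each combination of n objects can be ordered in n! ways, which is why the permutation count is n! times the combination count.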
Assigning Probabilities
Basic Requirements for Assigning Probabilities

1. The probability assigned to each experimental
outcome must be between 0 and 1, inclusively.

0 ≤ P(Ei) ≤ 1 for all i

where:
Ei is the ith experimental outcome
and P(Ei) is its probability

Assigning Probabilities
Basic Requirements for Assigning Probabilities

2. The sum of the probabilities for all experimental
outcomes must equal 1.

P(E1) + P(E2) + . . . + P(En) = 1

where:
n is the number of experimental outcomes

Assigning Probabilities
Classical Method
Assigning probabilities based on the assumption
of equally likely outcomes

Relative Frequency Method
Assigning probabilities based on experimentation
or historical data

Subjective Method
Assigning probabilities based on judgment

Classical Method
Example: Rolling a Die
If an experiment has n possible outcomes, the
classical method would assign a probability of 1/n to
each outcome.

Experiment: Rolling a die
Sample Space: S = {1, 2, 3, 4, 5, 6}
Probabilities: Each sample point has a
1/6 chance of occurring

Relative Frequency Method
Example: Lucas Tool Rental
Lucas Tool Rental would like to assign probabilities
to the number of car polishers it rents each day.
Office records show the following frequencies of daily
rentals for the last 40 days.

Number of Polishers Rented   Number of Days
0                            4
1                            6
2                            18
3                            10
4                            2

Relative Frequency Method
Example: Lucas Tool Rental
Each probability assignment is given by dividing the
frequency (number of days) by the total frequency (total
number of days). For example, the probability of renting
0 polishers is 4/40 = .10.

Number of Polishers Rented   Number of Days   Probability
0                            4                .10
1                            6                .15
2                            18               .45
3                            10               .25
4                            2                .05
Total                        40               1.00

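The relative frequency calculation in the table amounts to one division per row. A minimal sketch, with illustrative variable names, that also checks the two basic requirements for assigning probabilities:

```python
import math

# Number of days (out of the last 40) on which each rental count occurred.
days_by_rentals = {0: 4, 1: 6, 2: 18, 3: 10, 4: 2}

total_days = sum(days_by_rentals.values())

# Relative frequency method: frequency divided by total frequency.
probabilities = {
    rentals: days / total_days for rentals, days in days_by_rentals.items()
}

# Requirement 1: every probability lies between 0 and 1, inclusive.
each_in_range = all(0 <= p <= 1 for p in probabilities.values())
# Requirement 2: the probabilities sum to 1 (up to floating-point round-off).
sums_to_one = math.isclose(sum(probabilities.values()), 1.0)

for rentals, p in probabilities.items():
    print(rentals, f"{p:.2f}")
```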
Subjective Method

When economic conditions and a company’s circumstances change
rapidly, it might be inappropriate to assign probabilities based
solely on historical data.

We can use any data available as well as our experience and
intuition, but ultimately a probability value should express our
degree of belief that the experimental outcome will occur.

The best probability estimates often are obtained by combining
the estimates from the classical or relative frequency approach
with the subjective estimate.

Subjective Method
Example: Bradley Investments
An analyst made the following probability estimates.

Exper. Outcome   Net Gain or Loss   Probability
(10, 8)          $18,000 Gain       .20
(10, -2)         $8,000 Gain        .08
(5, 8)           $13,000 Gain       .16
(5, -2)          $3,000 Gain        .26
(0, 8)           $8,000 Gain        .10
(0, -2)          $2,000 Loss        .12
(-20, 8)         $12,000 Loss       .02
(-20, -2)        $22,000 Loss       .06

Events and Their Probabilities

An event is a collection of sample points.

The probability of any event is equal to the sum of
the probabilities of the sample points in the event.

If we can identify all the sample points of an
experiment and assign a probability to each, we
can compute the probability of an event.

Events and Their Probabilities
Example: Bradley Investments

Event M = Markley Oil Profitable


M = {(10, 8), (10, -2), (5, 8), (5, -2)}
P(M) = P(10, 8) + P(10, -2) + P(5, 8) + P(5, -2)
= .20 + .08 + .16 + .26
= .70

Events and Their Probabilities
Example: Bradley Investments

Event C = Collins Mining Profitable


C = {(10, 8), (5, 8), (0, 8), (-20, 8)}
P(C) = P(10, 8) + P(5, 8) + P(0, 8) + P(-20, 8)
= .20 + .16 + .10 + .02
= .48
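The two event probabilities above can be computed by summing sample-point probabilities. In this sketch the dictionary `prob` holds the analyst's subjective estimates, and an event is represented as a set of sample points. The names are illustrative.

```python
import math

# Analyst's probability estimates for each (Markley Oil, Collins Mining) outcome.
prob = {
    (10, 8): .20, (10, -2): .08, (5, 8): .16, (5, -2): .26,
    (0, 8): .10, (0, -2): .12, (-20, 8): .02, (-20, -2): .06,
}

# Basic requirements: each probability in [0, 1], and all of them sum to 1.
valid = all(0 <= p <= 1 for p in prob.values()) and math.isclose(
    sum(prob.values()), 1.0
)

def probability(event):
    """Probability of an event = sum of the probabilities of its sample points."""
    return sum(prob[point] for point in event)

# Event M: Markley Oil profitable; event C: Collins Mining profitable.
M = {point for point in prob if point[0] > 0}
C = {point for point in prob if point[1] > 0}

p_m = probability(M)
p_c = probability(C)
print(round(p_m, 2), round(p_c, 2))
```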

Some Basic Relationships of Probability
There are some basic probability relationships that can be used to
compute the probability of an event without knowledge of all
the sample point probabilities.

Complement of an Event

Union of Two Events

Intersection of Two Events

Mutually Exclusive Events

Complement of an Event

The complement of event A is defined to be the event
consisting of all sample points that are not in A.

The complement of A is denoted by $A^c$.

[Venn diagram: Event A and its complement $A^c$ within Sample Space S]

Union of Two Events

The union of events A and B is the event containing
all sample points that are in A or B or both.

The union of events A and B is denoted by A ∪ B.

[Venn diagram: Event A and Event B within Sample Space S]

Union of Two Events
Example: Bradley Investments

Event M = Markley Oil Profitable
Event C = Collins Mining Profitable
M ∪ C = Markley Oil Profitable
        or Collins Mining Profitable (or both)
M ∪ C = {(10, 8), (10, -2), (5, 8), (5, -2), (0, 8), (-20, 8)}
P(M ∪ C) = P(10, 8) + P(10, -2) + P(5, 8) + P(5, -2)
           + P(0, 8) + P(-20, 8)
         = .20 + .08 + .16 + .26 + .10 + .02
         = .82

Intersection of Two Events

The intersection of events A and B is the set of all
sample points that are in both A and B.

The intersection of events A and B is denoted by A ∩ B.

[Venn diagram: Event A and Event B within Sample Space S, showing the intersection of A and B]

Intersection of Two Events
Example: Bradley Investments

Event M = Markley Oil Profitable
Event C = Collins Mining Profitable
M ∩ C = Markley Oil Profitable
        and Collins Mining Profitable
M ∩ C = {(10, 8), (5, 8)}
P(M ∩ C) = P(10, 8) + P(5, 8)
         = .20 + .16
         = .36

Addition Law

The addition law provides a way to compute the probability of
event A, or B, or both A and B occurring.

The law is written as:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Addition Law
Example: Bradley Investments

Event M = Markley Oil Profitable
Event C = Collins Mining Profitable
M ∪ C = Markley Oil Profitable
        or Collins Mining Profitable
We know: P(M) = .70, P(C) = .48, P(M ∩ C) = .36
Thus: P(M ∪ C) = P(M) + P(C) - P(M ∩ C)
               = .70 + .48 - .36
               = .82
(This result is the same as that obtained earlier
using the definition of the probability of an event.)

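Python sets map directly onto these definitions: `|` gives the union and `&` the intersection. The sketch below, with illustrative names, computes P(M ∪ C) both ways and confirms that the addition law matches the direct sum over sample points.

```python
import math

# Analyst's probability estimates for each (Markley Oil, Collins Mining) outcome.
prob = {
    (10, 8): .20, (10, -2): .08, (5, 8): .16, (5, -2): .26,
    (0, 8): .10, (0, -2): .12, (-20, 8): .02, (-20, -2): .06,
}

def probability(event):
    """Sum the probabilities of the sample points in the event."""
    return sum(prob[point] for point in event)

M = {point for point in prob if point[0] > 0}  # Markley Oil profitable
C = {point for point in prob if point[1] > 0}  # Collins Mining profitable

p_union_direct = probability(M | C)    # sum over sample points in M or C
p_intersection = probability(M & C)    # sum over sample points in both

# Addition law: P(M or C) = P(M) + P(C) - P(M and C).
p_union_by_law = probability(M) + probability(C) - p_intersection

print(round(p_union_direct, 2), round(p_intersection, 2), round(p_union_by_law, 2))
```

Subtracting P(M ∩ C) corrects for the sample points (10, 8) and (5, 8), which would otherwise be counted twice.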
Mutually Exclusive Events

Two events are said to be mutually exclusive if the
events have no sample points in common.

Two events are mutually exclusive if, when one event
occurs, the other cannot occur.

[Venn diagram: Event A and Event B within Sample Space S]

Mutually Exclusive Events

If events A and B are mutually exclusive, P(A ∩ B) = 0.

The addition law for mutually exclusive events is:

P(A ∪ B) = P(A) + P(B)

There is no need to include “- P(A ∩ B)”.

Mutual Exclusiveness and Independence

Do not confuse the notion of mutually exclusive
events with that of independent events.

Two events with nonzero probabilities cannot be
both mutually exclusive and independent.

If one mutually exclusive event is known to occur,
the other cannot occur; thus, the probability of the
other event occurring is reduced to zero (and they
are therefore dependent).

Two events that are not mutually exclusive might
or might not be independent.

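The distinction can be checked numerically. The sketch below relies on the standard definition of independence, P(A ∩ B) = P(A)P(B), which these slides do not state, so treat it as an assumption brought in from outside. The die events and helper names are hypothetical. Exact fractions are used so the equality tests are not affected by floating-point round-off.

```python
from fractions import Fraction

def is_mutually_exclusive(p_a_and_b):
    """Mutually exclusive: the events share no sample points, so P(A and B) = 0."""
    return p_a_and_b == 0

def is_independent(p_a, p_b, p_a_and_b):
    """Standard definition (assumed here): P(A and B) = P(A) * P(B)."""
    return p_a_and_b == p_a * p_b

# Bradley Investments: P(M) = .70, P(C) = .48, P(M and C) = .36.
bradley = (Fraction(70, 100), Fraction(48, 100), Fraction(36, 100))

# Hypothetical fair-die events: A = {1, 2}, B = {2, 4, 6}, A and B = {2}.
die = (Fraction(2, 6), Fraction(3, 6), Fraction(1, 6))

# Hypothetical fair-die events with nothing in common: {1, 2, 3} and {4, 5, 6}.
disjoint = (Fraction(3, 6), Fraction(3, 6), Fraction(0))

for name, (p_a, p_b, p_ab) in [("Bradley", bradley), ("die", die),
                               ("disjoint", disjoint)]:
    print(name, is_mutually_exclusive(p_ab), is_independent(p_a, p_b, p_ab))
```

The first two cases are both not mutually exclusive, yet one pair is dependent and the other independent. The third case is mutually exclusive with nonzero probabilities, and so cannot be independent.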
Thank You