Sessions 1, 2, and 3

Probability and Statistics 1
Session 1 - 3
Instructor: Prof. Deepika Jain
E-mail id: deepika.jain@iimrohtak.ac.in
10/12/2020
Statistics
The term statistics can refer to numerical facts such as averages,
medians, percents, and index numbers that help us understand a
variety of business and economic situations.
Statistics can also refer to the art and science of collecting, analyzing,
presenting, and interpreting data.
Applications in
Business and Economics
Accounting
Public accounting firms use statistical sampling procedures when conducting
audits for their clients.
Economics
Economists use statistical information in making forecasts about the future of
the economy or some aspect of it.
Finance
Financial advisors use price-earnings ratios and dividend yields to guide their
investment advice.
Applications in
Business and Economics
Marketing
Electronic point-of-sale scanners at retail checkout counters are used to collect
data for a variety of marketing research applications.
Production
A variety of statistical quality control charts are used to monitor the output of a
production process.
Information Systems
A variety of statistical information helps administrators assess the performance
of computer networks.
Data and Data Sets
Data are the facts and figures collected, analyzed, and
summarized for presentation and interpretation.
All the data collected in a particular study are referred to as the
data set for the study.
Elements, Variables, and Observations
Elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.
The set of measurements obtained for a particular element
is called an observation.
A data set with n elements contains n observations.
The total number of data values in a complete data set is the
number of elements multiplied by the number of variables.
Scales of Measurement
Nominal
Ordinal
Interval
Ratio
The scale determines the amount of information contained in the data.
The scale indicates the data summarization and
statistical analyses that are most appropriate.
Nominal
Data are labels or names used to identify an attribute
of the element.
A nonnumeric label or numeric code may be used.

Nominal
Example:
Students of a university are classified by the school
in which they are enrolled using a nonnumeric label
such as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the
school variable (e.g. 1 denotes Business, 2 denotes
Humanities, 3 denotes Education, and so on).
Ordinal
The data have the properties of nominal data and the
order or rank of the data is meaningful.
A nonnumeric label or numeric code may be used.

Ordinal
Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Interval
The data have the properties of ordinal data, and
the interval between observations is expressed in
terms of a fixed unit of measure.
Interval data are always numeric.
Example:
Melissa has an SAT score of 1985, while Kevin
has an SAT score of 1880. Melissa scored 105
points more than Kevin.
Ratio
The data have all the properties of interval data
and the ratio of two values is meaningful.
Variables such as distance, height, weight, and time
use the ratio scale.
This scale must contain a zero value that indicates
that nothing exists for the variable at the zero point.
Example:
Melissa’s college record shows 36 credit hours
earned, while Kevin’s record shows 72 credit
hours earned. Kevin has twice as many credit
hours earned as Melissa.
Qualitative and Quantitative Data
Data can be further classified as being qualitative
or quantitative.
The statistical analysis that is appropriate depends
on whether the data for the variable are qualitative
or quantitative.
In general, there are more alternatives for statistical
analysis when the data are quantitative.
Qualitative Data
Labels or names used to identify an attribute of each
element
Use either the nominal or ordinal scale of measurement
Can be either numeric or nonnumeric
Appropriate statistical analyses are rather limited

Quantitative Data
Quantitative data indicate how many or how much:
discrete, if measuring how many
continuous, if measuring how much
Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful for
quantitative data.
Data
  Qualitative
    Numeric: Nominal, Ordinal
    Non-numeric: Nominal, Ordinal
  Quantitative
    Numeric: Interval, Ratio

Cross-Sectional Data
Cross-sectional data are collected at the same or
approximately the same point in time.
Example: data detailing the number of building
permits issued in November 2012 in each of the
counties of Ohio
Time Series Data
Time series data are collected over several time
periods.
Example: data detailing the number of building
permits issued in Lucas County, Ohio in each of
the last 36 months
Graphs of time series help analysts
• understand what happened in the past,
• identify any trends over time, and
• project future levels for the time series.
Data Sources
Existing Sources
Internal company records – almost any department
Business database services – Dow Jones & Co.
Government agencies – U.S. Department of Labor
Industry associations – Travel Industry Association of America
Special-interest organizations – Graduate Management Admission Council
Internet – more and more firms
Data Sources
Data Available From Internal Company Records

Record               Some of the Data Available
Employee records     name, address, social security number
Production records   part number, quantity produced, direct labor cost, material cost
Inventory records    part number, quantity in stock, reorder level, economic order quantity
Sales records        product number, sales volume, sales volume by region
Credit records       customer name, credit limit, accounts receivable balance
Customer profile     age, gender, income, household size
Data Sources
Data Available From Selected Government Agencies

Government Agency                                   Some of the Data Available
Census Bureau (www.census.gov)                      Population data, number of households, household income
Federal Reserve Board (www.federalreserve.gov)      Data on money supply, exchange rates, discount rates
Office of Mgmt. & Budget (www.whitehouse.gov/omb)   Data on revenue, expenditures, debt of federal government
Department of Commerce (www.doc.gov)                Data on business activity, value of shipments, profit by industry
Bureau of Labor Statistics (www.bls.gov)            Consumer spending, unemployment rate, hourly earnings, safety record
Data Acquisition Considerations
Time Requirement
Searching for information can be time consuming.
Information may no longer be useful by the time it
is available.
Cost of Acquisition
Organizations often charge for information even
when it is not their primary business activity.
Data Errors
Using any data that happen to be available or were
acquired with little care can lead to misleading
information.
Descriptive Statistics
Most of the statistical information in newspapers, magazines,
company reports, and other publications consists of data that are
summarized and presented in a form that is easy to understand.
Such summaries of data, which may be tabular, graphical,
or numerical, are referred to as descriptive statistics.
Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better
understanding of the cost of parts used in the engine
tune-ups performed in her shop. She examines 50
customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed below.
Sample of Parts Cost ($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Tabular Summary:
Frequency and Percent Frequency
Example: Hudson Auto
Parts Cost ($)   Frequency   Percent Frequency
50-59                 2              4
60-69                13             26
70-79                16             32
80-89                 7             14
90-99                 7             14
100-109               5             10
Total                50            100
Each percent frequency is (frequency/50)100; for example, (2/50)100 = 4 for the 50-59 class.
Graphical Summary: Histogram
Example: Hudson Auto
[Figure: histogram titled "Tune-up Parts Cost"; horizontal axis: Parts Cost ($), classes 50-59 through 100-109; vertical axis: Frequency]
Numerical Descriptive
Statistics
The most common numerical descriptive statistic
is the average (or mean).
The average provides a measure of the central
tendency, or central location, of the data for a variable.
Hudson’s average cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the
50 cost values and then dividing by 50).
Statistical Inference
Population: the set of all elements of interest in a particular study
Sample: a subset of the population
Statistical inference: the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
Census: collecting data for the entire population
Sample survey: collecting data for a sample

Descriptive Statistics:
Tabular and Graphical Displays
Summarizing Data for a Qualitative Variable
Summarizing Data for a Quantitative Variable
Qualitative data use labels or names to identify
categories of like items.
Quantitative data are numerical values that indicate
how much or how many.
Summarizing Qualitative Data
Frequency Distribution
Relative Frequency Distribution
Percent Frequency Distribution
Bar Chart
Pie Chart
A frequency distribution is a tabular summary of data
showing the number (frequency) of observations in
each of several non-overlapping categories or classes.
The objective is to provide insights about the data
that cannot be quickly obtained by looking only at
the original data.
Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality
of their accommodations as being excellent, above average,
average, below average, or poor. The ratings provided by a
sample of 20 guests are:
Below Average, Average, Above Average,
Above Average, Above Average, Above Average,
Above Average, Below Average, Below Average,
Average, Poor, Poor,
Above Average, Excellent, Above Average,
Average, Above Average, Average,
Above Average, Average
Example: Marada Inn
Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
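The tally above can be reproduced with a short script. This is a minimal sketch, assuming the 20 ratings are typed in exactly as listed in the example; the variable names are illustrative only.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn example, in the order listed.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Report the classes in their natural order, from worst to best.
scale = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

frequency = Counter(ratings)  # number of observations in each class

for rating in scale:
    print(f"{rating:<15}{frequency[rating]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Counter only counts; the explicit scale list is what keeps the output in rating order rather than in order of first appearance.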
Relative Frequency Distribution
The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class.
A relative frequency distribution is a tabular
summary of a set of data showing the relative
frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative
frequency multiplied by 100.
A percent frequency distribution is a tabular
summary of a set of data showing the percent
frequency for each class.
Relative Frequency and
Percent Frequency Distributions
Example: Marada Inn
Rating           Relative Frequency   Percent Frequency
Poor                    .10                  10
Below Average           .15                  15
Average                 .25                  25
Above Average           .45                  45
Excellent               .05                   5
Total                  1.00                 100
For example, the relative frequency for Excellent is 1/20 = .05, and the percent frequency for Poor is .10(100) = 10.
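As a rough illustration, the two distributions can be derived from the frequency counts alone. The sketch below assumes the Marada Inn frequencies from the earlier table; the dictionary names are not from the slides.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # total number of observations (20)

# Relative frequency = class frequency / n.
relative = {rating: f / n for rating, f in frequency.items()}

# Percent frequency = relative frequency * 100, computed here as 100 * f / n
# so the result is not affected by rounding in the intermediate fraction.
percent = {rating: 100 * f / n for rating, f in frequency.items()}

for rating in frequency:
    print(f"{rating:<15}{relative[rating]:>6.2f}{percent[rating]:>6.0f}")
```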
Bar Chart
A bar chart is a graphical display for depicting
qualitative data.
On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).
Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
The bars are separated to emphasize the fact that
each class is a separate category.
Bar Chart
[Figure: bar chart titled "Marada Inn Quality Ratings"; horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency]
Pareto Diagram
In quality control, bar charts are used to identify the
most important causes of problems.
When the bars are arranged in descending order of height
from left to right (with the most frequently occurring cause
appearing first) the bar chart is called a Pareto diagram.
This diagram is named for its founder, Vilfredo Pareto,
an Italian economist.
Pie Chart
The pie chart is a commonly used graphical display for
presenting relative frequency and percent frequency
distributions for categorical data.
First draw a circle; then use the relative frequencies to
subdivide the circle into sectors that correspond to the
relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative
frequency of .25 would consume .25(360) = 90 degrees of the
circle.
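The sector arithmetic can be checked for every class at once. This sketch assumes the Marada Inn frequencies used elsewhere in these slides; the names are illustrative.

```python
# Frequencies from the Marada Inn example.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}
n = sum(frequency.values())

# Each class gets (relative frequency) * 360 degrees of the circle.
degrees = {rating: 360 * f / n for rating, f in frequency.items()}

for rating, angle in degrees.items():
    print(f"{rating:<15}{angle:>6.1f} degrees")
```

The Average class, with a relative frequency of .25, comes out at 90 degrees, matching the calculation above.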
Pie Chart
[Figure: pie chart titled "Marada Inn Quality Ratings"; sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]
Summarizing Quantitative
Data
Relative Frequency and Percent Frequency Distributions
Dot Plot
Histogram
Cumulative Distributions
Stem-and-Leaf Displays
The manager of Hudson Auto would like to gain a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
The three steps necessary to define the classes for a
frequency distribution with quantitative data are:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.
Guidelines for Determining the Number of Classes
Use between 5 and 20 classes.
Data sets with a larger number of elements usually
require a larger number of classes.
Smaller data sets usually require fewer classes.
The goal is to use enough classes to show the
variation in the data, but not so many classes that
some contain only a few data items.
Guidelines for Determining the Width of Each Class
Use classes of equal width.
$$\text{Approximate Class Width} = \frac{\text{Largest Data Value} - \text{Smallest Data Value}}{\text{Number of Classes}}$$
Making the classes the same width reduces
the chance of inappropriate interpretations.
Note on Number of Classes and Class Width
In practice, the number of classes and the appropriate
class width are determined by trial and error.
Once a possible number of classes is chosen, the
appropriate class width is found.
The process can be repeated for a different number of
classes.
Ultimately, the analyst uses judgment to determine
the combination of the number of classes and class
width that provides the best frequency distribution
for summarizing the data.
Guidelines for Determining the Class Limits

Class limits must be chosen so that each data item
belongs to one and only one class.
The lower class limit identifies the smallest possible data
value assigned to the class.
The upper class limit identifies the largest possible data
value assigned to the class.
The appropriate values for the class limits depend on the
level of accuracy of the data.
An open-end class requires only a lower class limit
or an upper class limit.

If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
Parts Cost ($) Frequency
50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
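The three steps (number of classes, class width, class limits) can be sketched in code for the Hudson Auto data. This is an illustration under stated assumptions: six classes, the width rounded up to a whole dollar, and a first lower class limit of 50 chosen by judgment, as in the table above.

```python
import math

# Parts cost ($) for the 50 tune-ups in the Hudson Auto example.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6                                    # step 1: chosen by the analyst
approx_width = (max(costs) - min(costs)) / num_classes
width = math.ceil(approx_width)                    # step 2: 9.5 rounded up to 10
first_lower_limit = 50                             # step 3: assumed starting limit

# Count how many values fall in each class [lower, lower + width - 1].
frequency = [0] * num_classes
for cost in costs:
    frequency[(cost - first_lower_limit) // width] += 1

for k, f in enumerate(frequency):
    lower = first_lower_limit + k * width
    print(f"{lower}-{lower + width - 1}: {f}")
```

Changing num_classes and rerunning is one way to carry out the trial-and-error process described earlier.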
Relative Frequency and
Percent Frequency Distributions
Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                  .04                   4
60-69                  .26                  26
70-79                  .32                  32
80-89                  .14                  14
90-99                  .14                  14
100-109                .10                  10
Total                 1.00                 100
For example, the relative frequency for the 50-59 class is 2/50 = .04, and its percent frequency is .04(100) = 4.
Percent frequency is the relative frequency multiplied by 100.
Dot Plot
One of the simplest graphical summaries of data is a dot
plot.
A horizontal axis shows the range of data values.
Then each data value is represented by a dot placed
above the axis.
Dot Plot
[Figure: dot plot titled "Tune-up Parts Cost"; horizontal axis: Cost ($), from 50 to 110]
Histogram
Another common graphical display of quantitative data
is a histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its
height corresponding to the interval’s frequency, relative
frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation
between rectangles of adjacent classes.
Histogram
[Figure: histogram titled "Tune-up Parts Cost"; horizontal axis: Parts Cost ($), classes 50-59 through 100-109; vertical axis: Frequency]
Cumulative frequency distribution – shows the
number of items with values less than or equal to the
upper limit of each class.
Cumulative relative frequency distribution – shows
the proportion of items with values less than or
equal to the upper limit of each class.
Cumulative percent frequency distribution – shows
the percentage of items with values less than or
equal to the upper limit of each class.
Hudson Auto Repair
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59                2                       .04                             4
≤ 69               15                       .30                            30
≤ 79               31                       .62                            62
≤ 89               38                       .76                            76
≤ 99               45                       .90                            90
≤ 109              50                      1.00                           100
For example, for the ≤ 69 row: 2 + 13 = 15, 15/50 = .30, and .30(100) = 30.
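The cumulative columns are running totals of the class frequencies. A minimal sketch, assuming the six Hudson Auto class frequencies from the earlier frequency distribution:

```python
from itertools import accumulate

# Upper class limits and class frequencies from the Hudson Auto example.
upper_limits = [59, 69, 79, 89, 99, 109]
frequency = [2, 13, 16, 7, 7, 5]
n = sum(frequency)

# Running total of the frequencies gives the cumulative frequency.
cumulative_frequency = list(accumulate(frequency))
cumulative_relative = [c / n for c in cumulative_frequency]
cumulative_percent = [100 * c / n for c in cumulative_frequency]

for limit, c, r, p in zip(upper_limits, cumulative_frequency,
                          cumulative_relative, cumulative_percent):
    print(f"<= {limit:<4}{c:>4}{r:>7.2f}{p:>6.0f}")
```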
Stem-and-Leaf Display
A stem-and-leaf display shows both the rank order and
shape of the distribution of the data.
It is similar to a histogram on its side, but it has the
advantage of showing the actual data values.
The first digits of each data item are arranged to the left
of a vertical line.
To the right of the vertical line we record the last digit
for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9
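A stem-and-leaf display with a leaf unit of 1 can be built by splitting each value into its leading digits and its last digit. The helper below is a sketch, not a standard library routine; the function name is hypothetical and it assumes non-negative whole-number data.

```python
from collections import defaultdict

# Parts cost ($) for the 50 tune-ups in the Hudson Auto example.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stem_and_leaf(values):
    """Group values by stem (leading digits) with leaves (last digit) in rank order."""
    stems = defaultdict(list)
    for value in sorted(values):       # sorting first keeps each stem's leaves ordered
        stems[value // 10].append(value % 10)
    return dict(stems)


display = stem_and_leaf(costs)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```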
Stretched Stem-and-Leaf
Display
If we believe the original stem-and-leaf display has
condensed the data too much, we can stretch the display
vertically by using two stems for each leading digit(s).
Whenever a stem value is stated twice, the first value
corresponds to leaf values of 0-4, and the second value
corresponds to leaf values of 5-9.
Stretched Stem-and-Leaf Display

 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9
Leaf Units
A single digit is used to define each leaf.
In the preceding example, the leaf unit was 1.
Leaf units may be 100, 10, 1, 0.1, and so on.
Where the leaf unit is not shown, it is assumed to
equal 1.
The leaf unit indicates how to multiply the stem-and-leaf
numbers in order to approximate the original data.
Example: Leaf Unit = 0.1
If we have data with values such as
8.6 11.7 9.4 9.1 10.2 11.0 8.8
a stem-and-leaf display of these data will be
Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7
Example: Leaf Unit = 10
If we have data with values such as
1806 1717 1974 1791 1682 1910 1838
a stem-and-leaf display of these data will be
Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7
The 82 in 1682 is rounded down to 80 and is represented as an 8.
Summarizing Data for Two Variables
Using Tables
Often a manager is interested in tabular and graphical
methods that will help understand the relationship between
two variables.
Crosstabulation is a method for summarizing the data for
two variables.
Crosstabulation
A crosstabulation is a tabular summary of data for two
variables.
Crosstabulation can be used when:
one variable is qualitative and the other is
quantitative,
both variables are qualitative, or
both variables are quantitative.
The left and top margin labels define the classes for the
two variables.
Crosstabulation
Example: Finger Lakes Homes

The number of Finger Lakes homes sold for each style
and price for the past two years is shown below.
Price range is the quantitative variable; home style is the categorical variable.
                           Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
< $200,000        18       6     19       12        55
> $200,000        12      14     16        3        45
Total             30      20     35       15       100
Crosstabulation
Example: Finger Lakes Homes
                           Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
< $200,000        18       6     19       12        55
> $200,000        12      14     16        3        45
Total             30      20     35       15       100
The Total column (55, 45) is the frequency distribution for the price range variable.
The Total row (30, 20, 35, 15) is the frequency distribution for the home style variable.
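The margins of a crosstabulation are row and column sums of the cell counts. The sketch below assumes the Finger Lakes Homes cell counts from the table; the data layout (a dictionary of rows) is just one convenient choice.

```python
# Cell counts from the Finger Lakes Homes crosstabulation.
home_styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $200,000": [18, 6, 19, 12],
    "> $200,000": [12, 14, 16, 3],
}

# Row totals: frequency distribution for the price range variable.
price_range_totals = {price: sum(row) for price, row in counts.items()}

# Column totals: frequency distribution for the home style variable.
home_style_totals = {
    style: sum(column)
    for style, column in zip(home_styles, zip(*counts.values()))
}

grand_total = sum(price_range_totals.values())

print(price_range_totals)
print(home_style_totals)
print(grand_total)
```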
Crosstabulation: Simpson’s Paradox
Data in two or more crosstabulations are often aggregated
to produce a summary crosstabulation.
We must be careful in drawing conclusions about the
relationship between the two variables in the aggregated
crosstabulation.
In some cases the conclusions based upon an aggregated
crosstabulation can be completely reversed if we look at
the unaggregated data. The reversal of conclusions based
on aggregated and unaggregated data is called Simpson’s
paradox.
Summarizing Data for Two Variables Using
Graphical Displays
In most cases, a graphical display is more useful than a
table for recognizing patterns and trends.
Displaying data in creative ways can lead to powerful
insights.
Scatter diagrams and trendlines are useful in exploring
the relationship between two variables.
Scatter Diagram and Trendline
A scatter diagram is a graphical presentation of the
relationship between two quantitative variables.
One variable is shown on the horizontal axis and the
other variable is shown on the vertical axis.
The general pattern of the plotted points suggests the
overall relationship between the variables.
A trendline provides an approximation of the relationship.
Scatter Diagram
[Figure: scatter diagram showing a positive relationship]
[Figure: scatter diagram showing a negative relationship]
[Figure: scatter diagram showing no apparent relationship]
Scatter Diagram
Example: Panthers Football Team
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.
x = Number of Interceptions   y = Number of Points Scored
            1                            14
            3                            24
            2                            18
            1                            17
            3                            30
Scatter Diagram and Trendline
[Figure: scatter diagram with trendline; horizontal axis x: Number of Interceptions (0 to 4); vertical axis y: Number of Points Scored (0 to 35)]
Example: Panthers Football Team
Insights Gained from the Preceding Scatter Diagram
The scatter diagram indicates a positive relationship
between the number of interceptions and the number of
points scored.
Higher points scored are associated with a higher number
of interceptions.
The relationship is not perfect; not all plotted points in the
scatter diagram lie on a straight line.
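The slides do not say how the trendline is fitted. One common choice is a least-squares line, and the sketch below assumes that method for the five Panthers observations; treat it as an illustration rather than the procedure behind the original figure.

```python
# Panthers data: x = interceptions, y = points scored.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept (an assumed fitting method).
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"trendline: y = {intercept:.2f} + {slope:.2f} x")
```

A positive slope is consistent with the positive relationship seen in the scatter diagram.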
Side-by-Side Bar Chart
A side-by-side bar chart is a graphical display for
depicting multiple bar charts on the same display.
Each cluster of bars represents one value of the
first variable.
Each bar within a cluster represents one value of
the second variable.
Side-by-Side Bar Chart
[Figure: side-by-side bar chart, Finger Lakes Homes; horizontal axis: Home Style (Colonial, Log, Split-Level, A-Frame); vertical axis: Frequency; within each cluster, one bar for < $200,000 and one for > $200,000]
Stacked Bar Chart
A stacked bar chart is another way to display and compare
two variables on the same display.
It is a bar chart in which each bar is broken into rectangular
segments of a different color.
If percentage frequencies are displayed, all bars will be of
the same height (or length), extending to the 100% mark.
Stacked Bar Chart
[Figure: stacked bar chart, Finger Lakes Homes; horizontal axis: Home Style (Colonial, Log, Split, A-Frame); vertical axis: Frequency; each bar is split into a < $200,000 segment and a > $200,000 segment]
Choosing the Type of Graphical Display
Displays used to show the distribution of data:
Bar Chart, Pie Chart, Dot Plot, Histogram, Stem-and-Leaf Display
Displays used to make comparisons:
Side-by-Side Bar Chart, Stacked Bar Chart
Displays used to show relationships:
Scatter Diagram, Trendline
Tabular and Graphical Displays
Qualitative Data
Tabular displays: Frequency Distribution, Relative Frequency Distribution, Percent Frequency Distribution, Crosstabulation
Graphical displays: Bar Chart, Pie Chart, Side-by-Side Bar Chart, Stacked Bar Chart
Quantitative Data
Tabular displays: Frequency Distribution, Relative Frequency Distribution, Percent Frequency Distribution, Cumulative Frequency Distribution, Cumulative Relative Frequency Distribution, Cumulative Percent Frequency Distribution, Crosstabulation
Graphical displays: Dot Plot, Histogram, Stem-and-Leaf Display, Scatter Diagram
Measures of Location
Mean
Weighted Mean
Median
Geometric Mean
Mode
Percentiles
Quartiles
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.
Mean
Perhaps the most important measure of location is the mean.
The mean provides a measure of central location.
The mean of a data set is the average of all the data values.
The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.
Sample Mean
$$\bar{x} = \frac{\sum x_i}{n}$$
where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.
Population Mean $\mu$
$$\mu = \frac{\sum x_i}{N}$$
where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.
Sample Mean
Example: Apartment Rents

Seventy efficiency apartments were randomly sampled
in a small college town. The monthly rent prices for
these apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
Sample Mean
$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
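The sample mean calculation can be checked directly. This sketch assumes the 70 rents are entered exactly as listed; the variable names are illustrative.

```python
# Monthly rents ($) for the 70 sampled efficiency apartments.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

n = len(rents)            # number of observations in the sample
total = sum(rents)        # sum of the values of the n observations
sample_mean = total / n   # x-bar = (sum of x_i) / n

print(f"n = {n}, sum = {total}, sample mean = {sample_mean:.2f}")
```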
Weighted Mean
In some instances the mean is computed by giving each
observation a weight that reflects its relative importance.
The choice of weights depends on the application.

The weights might be the number of credit hours earned for
each grade, as in GPA.
In other weighted mean computations, quantities such as
pounds, dollars, or volume are frequently used.
Weighted Mean
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
The numerator is the sum of the weighted data values; the denominator is the sum of the weights.
If the data are from a population, $\mu$ replaces $\bar{x}$.
Weighted Mean
Example: Construction Wages
Ron Butler, a home builder, is looking over the expenses he
incurred for a house he just built. For the purpose of pricing
future projects, he would like to know the average wage
($/hour) he paid the workers he employed. Listed below are
the categories of worker he employed, along with their
respective wage and total hours worked.
Worker        Wage ($/hr)   Total Hours
Carpenter        21.60          520
Electrician      28.72          230
Laborer          11.80          410
Painter          19.75          270
Plumber          24.16          160
Weighted Mean
Example: Construction Wages

Worker         x_i      w_i     w_i x_i
Carpenter     21.60     520     11232.0
Electrician   28.72     230      6605.6
Laborer       11.80     410      4838.0
Painter       19.75     270      5332.5
Plumber       24.16     160      3865.6
Total                  1590     31873.7
$$\mu = \frac{\sum w_i x_i}{\sum w_i} = \frac{31873.7}{1590} = 20.0464 \approx \$20.05$$
FYI, equally-weighted (simple) mean = $21.21
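Both averages can be reproduced in a few lines. The sketch below assumes the wages and hours from the table, with hours worked serving as the weights.

```python
# Construction wages example: wage ($/hr) is x_i, total hours is the weight w_i.
workers = ["Carpenter", "Electrician", "Laborer", "Painter", "Plumber"]
wages = [21.60, 28.72, 11.80, 19.75, 24.16]
hours = [520, 230, 410, 270, 160]

weighted_sum = sum(w * x for w, x in zip(hours, wages))   # sum of w_i * x_i
total_weight = sum(hours)                                 # sum of w_i
weighted_mean = weighted_sum / total_weight

# Equally weighted (simple) mean, for comparison.
simple_mean = sum(wages) / len(wages)

print(f"weighted mean = ${weighted_mean:.2f}")
print(f"simple mean   = ${simple_mean:.2f}")
```

The weighted mean is lower than the simple mean because the lowest-paid category, Laborer, accounts for a large share of the hours.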
Median
The median of a data set is the value in the middle when the data
items are arranged in ascending order.
Whenever a data set has extreme values, the median is the
preferred measure of central location.
The median is the measure of location most often reported for
annual income and property value data.
A few extremely large incomes or property values can inflate the
mean.
Median
For an odd number of observations, the median is the middle value.
7 observations: 26 18 27 12 14 27 19
In ascending order: 12 14 18 19 26 27 27
Median = 19
Median
For an even number of observations, the median is the average of the middle two values.
8 observations: 26 18 27 12 14 27 30 19
In ascending order: 12 14 18 19 26 27 27 30
Median = (19 + 26)/2 = 22.5
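The odd and even cases can be handled by one small function. This is a sketch of the rule as stated on the slides; Python's standard statistics.median follows the same rule and could be used instead.

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)          # arrange the data in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd number of observations
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2   # even number


odd_example = [26, 18, 27, 12, 14, 27, 19]
even_example = [26, 18, 27, 12, 14, 27, 30, 19]

print(median(odd_example))
print(median(even_example))
```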
Median
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
Trimmed Mean
Another measure, sometimes used when extreme values are
present, is the trimmed mean.
It is obtained by deleting a percentage of the smallest and largest
values from a data set and then computing the mean of the
remaining values.
For example, the 5% trimmed mean is obtained by removing the
smallest 5% and the largest 5% of the data values and then
computing the mean of the remaining values.
Geometric Mean
The geometric mean is calculated by finding the nth root of the
product of n values.
It is often used in analyzing growth rates in financial data (where
using the arithmetic mean will provide misleading results).
It should be applied anytime you want to determine the mean
rate of change over several successive periods (be it years,
quarters, weeks, . . .).
Other common applications include: changes in populations of
species, crop yields, pollution levels, and birth and death rates.
Geometric Mean
$$\bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n}$$
Geometric Mean
Example: Rate of Return
Period   Return (%)   Growth Factor
1           -6.0         0.940
2           -8.0         0.920
3           -4.0         0.960
4            2.0         1.020
5            5.4         1.054
$$\bar{x}_g = \sqrt[5]{(.94)(.92)(.96)(1.02)(1.054)} = [.89254]^{1/5} = .97752$$
Average growth rate per period
is (.97752 - 1)(100) = -2.248%
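The rate-of-return calculation can be verified numerically. This is a minimal sketch assuming the five period returns from the table; math.prod requires Python 3.8 or later.

```python
import math

# Percentage returns for the five periods.
returns_pct = [-6.0, -8.0, -4.0, 2.0, 5.4]

# Growth factor for each period = 1 + return / 100.
growth_factors = [1 + r / 100 for r in returns_pct]

# Geometric mean = nth root of the product of the n growth factors.
product = math.prod(growth_factors)
geometric_mean = product ** (1 / len(growth_factors))

# Average growth rate per period, as a percentage.
average_growth_pct = (geometric_mean - 1) * 100

print(f"product = {product:.5f}")
print(f"geometric mean = {geometric_mean:.5f}")
print(f"average growth rate = {average_growth_pct:.3f}%")
```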
Mode
The mode of a data set is the value that occurs with greatest
frequency.
The greatest frequency can occur at two or more different values.
If the data have exactly two modes, the data are bimodal.
If the data have more than two modes, the data are multimodal.
Mode
450 occurred most frequently (7 times)
Mode = 450
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
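Counting occurrences makes the mode easy to find, including the bimodal and multimodal cases. The sketch below assumes the 70 rents in ascending order as listed above; it returns every value that ties for the greatest frequency.

```python
from collections import Counter

# Monthly rents ($) for the 70 apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

counts = Counter(rents)
greatest_frequency = max(counts.values())

# All values that occur with the greatest frequency (one value here).
modes = sorted(value for value, c in counts.items() if c == greatest_frequency)

print(f"mode(s) = {modes}, occurring {greatest_frequency} times")
```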
Percentiles
A percentile provides information about how the data are spread
over the interval from the smallest value to the largest value.
Admission test scores for colleges and universities are frequently
reported in terms of percentiles.
The pth percentile of a data set is a value such that at least p
percent of the items take on this value or less and at least (100 - p)
percent of the items take on this value or more.
Percentiles
Arrange the data in ascending order.
Compute index i, the position of the pth percentile:
i = (p/100)n
If i is not an integer, round up. The pth percentile
is the value in the ith position.
If i is an integer, the pth percentile is the average
of the values in positions i and i + 1.
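The procedure above can be written as a short function. This is a sketch of the slide's rule only (statistical packages often use other interpolation rules); the function name is hypothetical, and it assumes a whole-number p with 0 < p < 100 so that the index can be computed with integer arithmetic.

```python
def percentile(values, p):
    """pth percentile using the index rule i = (p/100)n from the slides."""
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100 (exclusive)")
    ordered = sorted(values)                   # step 1: ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)      # i = (p/100)n without float error
    if remainder != 0:                         # i is not an integer: round up
        return ordered[whole]                  # position whole + 1 (1-based)
    # i is an integer: average the values in positions i and i + 1 (1-based)
    return (ordered[whole - 1] + ordered[whole]) / 2


# Monthly rents ($) for the 70 apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(percentile(rents, 80))   # i = 56, an integer: average of 56th and 57th values
print(percentile(rents, 75))   # i = 52.5, rounded up to the 53rd value
```

Using divmod on p * n avoids floating-point surprises when deciding whether i is an integer.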
80th Percentile
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

80th Percentile
“At least 80% of the items take on a value of 542 or less.” 56/70 = .8 or 80%
“At least 20% of the items take on a value of 542 or more.” 14/70 = .2 or 20%
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile
Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, rounded up to 53
Third quartile = 525
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

11
Measures of Variability
It is often desirable to consider measures of variability (dispersion),
as well as measures of location.
For example, in choosing supplier A or supplier B we might
consider not only the average delivery time for each, but also the
variability in delivery time for each.
Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Range
The range of a data set is the difference between the largest and
smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data values.

Range
Range = largest value - smallest value
Range = 615 - 425 = 190
Interquartile Range
The interquartile range of a data set is the difference between the
third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.
Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
Variance
The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation (x_i) and the mean (x̄ for a sample, μ for a population).
The variance is useful in comparing the variability of two or more variables.
Variance
The variance is the average of the squared differences between each data value and the mean.
The variance is computed as follows:
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}   for a sample
\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}   for a population
Standard Deviation
The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, making it more easily interpreted than the variance.
The standard deviation is computed as follows:
s = \sqrt{s^2}   for a sample
\sigma = \sqrt{\sigma^2}   for a population
Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean.
The coefficient of variation is computed as follows:
\left( \frac{s}{\bar{x}} \times 100 \right)\%   for a sample
\left( \frac{\sigma}{\mu} \times 100 \right)\%   for a population
Sample Variance, Standard Deviation,
And Coefficient of Variation
Variance
s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2,996.16
Standard Deviation
s = \sqrt{s^2} = \sqrt{2996.16} = 54.74
Coefficient of Variation
\left( \frac{s}{\bar{x}} \times 100 \right)\% = \left( \frac{54.74}{490.80} \times 100 \right)\% = 11.15\%
The standard deviation is about 11% of the mean.
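The three sample statistics above can be reproduced with a short Python sketch. The variable and function names (RENTS, sample_variance) are illustrative assumptions, not part of the slides; the data are the 70 apartment rents used throughout this example.

```python
import math

# Monthly rents for the 70 sampled apartments.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


mean = sum(RENTS) / len(RENTS)
s2 = sample_variance(RENTS)      # sample variance
s = math.sqrt(s2)                # sample standard deviation
cv = s / mean * 100              # coefficient of variation, in percent

print(f"mean = {mean:.2f}, s^2 = {s2:.2f}, s = {s:.2f}, CV = {cv:.2f}%")
```

For a population, the divisor n - 1 would be replaced by N, as in the formulas above.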
Distribution Shape: Skewness
An important measure of the shape of a distribution is called
skewness.
The formula for the skewness of sample data is
\text{Skewness} = \frac{n}{(n - 1)(n - 2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3
Symmetric (not skewed)
Skewness is zero.
Mean and median are equal.
[Figure: relative frequency histogram, Skewness = 0]
Moderately Skewed Left
Skewness is negative.
Mean will usually be less than the median.
[Figure: relative frequency histogram, Skewness = -.31]
Moderately Skewed Right
Skewness is positive.
Mean will usually be more than the median.
[Figure: relative frequency histogram, Skewness = .31]
Highly Skewed Right
Skewness is positive (often above 1.0).
Mean will usually be more than the median.
[Figure: relative frequency histogram, Skewness = 1.25]
Seventy efficiency apartments were randomly sampled in
a college town. The monthly rent prices for the
apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
[Figure: relative frequency histogram of the apartment rents, Skewness = .92]
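As a rough sketch, the sample skewness formula can be coded directly. The function name sample_skewness is an assumption made for illustration. Applied to the rent data it should give a positive value, in line with the right-skewed histogram described above.

```python
import math

# Monthly rents for the 70 sampled apartments.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(data):
    """n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations."""
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divisor n - 1), as defined earlier.
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


print(sample_skewness(RENTS))            # positive: skewed right
print(sample_skewness([1, 2, 3, 4, 5]))  # symmetric data: approximately zero
```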
Five-Number Summaries
and Box Plots
Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.
Two tools that accomplish this are five-number summaries and box plots.
Five-Number Summary
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value
Five-Number Summary
Smallest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
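A five-number summary for the rent data can be assembled from the percentile index rule given earlier. The sketch below is illustrative only; the names RENTS, percentile, and five_number_summary are assumptions, and percentile expects a whole-number p strictly between 0 and 100.

```python
# Monthly rents for the 70 sampled apartments.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(data, p):
    """pth percentile by the index rule i = (p/100)n."""
    values = sorted(data)
    whole, remainder = divmod(p * len(values), 100)
    if remainder:
        return values[whole]                      # i not an integer: round up
    return (values[whole - 1] + values[whole]) / 2  # i an integer: average


def five_number_summary(data):
    """Smallest value, Q1, median, Q3, largest value."""
    return (min(data), percentile(data, 25), percentile(data, 50),
            percentile(data, 75), max(data))


summary = five_number_summary(RENTS)
print(summary)
```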
Box Plot
A box plot is a graphical summary of data that is based on a five-number summary.
A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.
Box plots provide another way to identify outliers.
Box Plot
A box is drawn with its ends located at the first and third quartiles.
A vertical line is drawn in the box at the location of the median (second quartile).
[Figure: box plot on an axis from 400 to 625, with Q1 = 445, Q2 = 475, Q3 = 525]
Box Plot
Limits are located (not drawn) using the interquartile range (IQR).
Data outside these limits are considered outliers.
The location of each outlier is shown with the symbol *.
Box Plot
Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
[Figure: box plot on an axis from 400 to 625; smallest value inside limits = 425, largest value inside limits = 615]
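The slides do not state how far the limits lie from the box. A common convention, assumed here and not confirmed by the source, places them 1.5 IQR below Q1 and 1.5 IQR above Q3. Under that assumption, the limits, outliers, and whisker ends for the rent data can be sketched as:

```python
# Monthly rents for the 70 sampled apartments.
RENTS = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

Q1, Q3 = 445, 525   # quartiles from the five-number summary
K = 1.5             # ASSUMED multiplier; the slides only say "using the IQR"

iqr = Q3 - Q1
lower_limit = Q1 - K * iqr
upper_limit = Q3 + K * iqr

# Values inside the limits determine the whisker ends;
# values outside the limits are flagged as outliers.
inside = [x for x in RENTS if lower_limit <= x <= upper_limit]
outliers = [x for x in RENTS if x < lower_limit or x > upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

print(lower_limit, upper_limit)    # 325.0 645.0
print(whisker_low, whisker_high)   # 425 615
print(outliers)                    # [] (no outliers under this assumption)
```

With these limits every rent falls inside, so the whiskers reach the smallest and largest data values, matching the figure.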
Measures of Association
Between Two Variables
Thus far we have examined numerical methods used to summarize the data for one variable at a time.
Often a manager or decision maker is interested in the relationship between two variables.
Two descriptive measures of the relationship between two variables are covariance and correlation coefficient.
Covariance
The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.
Covariance
The covariance is computed as follows:
s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}   for samples
\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}   for populations
Correlation Coefficient
Correlation is a measure of linear association and not necessarily causation.
Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
The correlation coefficient is computed as follows:
r_{xy} = \frac{s_{xy}}{s_x s_y}   for samples
\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}   for populations
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.
The closer the correlation is to zero, the weaker the relationship.
Covariance and Correlation Coefficient
Example: Golfing Study
A golfer is interested in investigating the relationship,
if any, between driving distance and 18-hole score.
Average Driving     Average
Distance (yds.)     18-Hole Score
277.6               69
259.5               71
269.1               70
267.0               70
255.6               71
272.9               69
            x         y       (x_i - x̄)   (y_i - ȳ)   (x_i - x̄)(y_i - ȳ)
            277.6     69       10.65       -1.0        -10.65
            259.5     71       -7.45        1.0         -7.45
            269.1     70        2.15        0            0
            267.0     70        0.05        0            0
            255.6     71      -11.35        1.0        -11.35
            272.9     69        5.95       -1.0         -5.95
Average     267.0     70.0                  Total      -35.40
Std. Dev.   8.2192    .8944
(The deviations for x use the unrounded mean, x̄ = 266.95, which the table shows rounded to 267.0.)
Sample Covariance
s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08
Sample Correlation Coefficient
r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631
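The golfing calculations can be checked with a few lines of Python. This is a minimal sketch using the six observations from the table; the function names are assumptions chosen for illustration.

```python
import math

distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]  # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score


def sample_covariance(x, y):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(v):
    """Sample standard deviation: square root of the covariance of v with itself."""
    return math.sqrt(sample_covariance(v, v))


def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


s_xy = sample_covariance(distance, score)
r_xy = sample_correlation(distance, score)
print(f"s_xy = {s_xy:.2f}, r_xy = {r_xy:.4f}")  # s_xy = -7.08, r_xy = -0.9631
```

The strongly negative correlation indicates that longer driving distances go with lower (better) scores in this sample; as noted above, that is association, not proof of causation.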
Introduction to Probability
Experiments, Counting Rules, and Assigning Probabilities
Events and Their Probability

Some Basic Relationships of Probability
Uncertainties
Managers often base their decisions on an analysis of uncertainties such as the following:
What are the chances that sales will decrease if we increase prices?
What is the likelihood a new assembly method will increase productivity?
What are the odds that a new investment will be profitable?
Probability
Probability is a numerical measure of the likelihood that an event will occur.
Probability values are always assigned on a scale from 0 to 1.
A probability near zero indicates an event is quite unlikely to occur.
A probability near one indicates an event is almost certain to occur.
Probability as a Numerical Measure
of the Likelihood of Occurrence
[Figure: probability scale from 0 to 1, showing increasing likelihood of occurrence]
Probability near 0: The event is very unlikely to occur.
Probability of .5: The occurrence of the event is just as likely as it is unlikely.
Probability near 1: The event is almost certain to occur.
Statistical Experiments
In statistics, the notion of an experiment differs somewhat from that of an experiment in the physical sciences.
In statistical experiments, probability determines outcomes.
Even though the experiment is repeated in exactly the same way, an entirely different outcome may occur.
For this reason, statistical experiments are sometimes called random experiments.
An Experiment and Its Sample Space
An experiment is any process that generates well-defined outcomes.
The sample space for an experiment is the set of all experimental outcomes.
An experimental outcome is also called a sample point.
Experiment              Experimental Outcomes
Toss a coin             Head, tail
Inspect a part          Defective, non-defective
Conduct a sales call    Purchase, no purchase
Roll a die              1, 2, 3, 4, 5, 6
Play a football game    Win, lose, tie
Example: Bradley Investments
Bradley has invested in two stocks, Markley Oil and Collins
Mining. Bradley has determined that the possible outcomes of
these investments three months from now are as follows.
Investment Gain or Loss in 3 Months (in $000)
Markley Oil     Collins Mining
 10               8
  5              -2
  0
-20
A Counting Rule for
Multiple-Step Experiments
If an experiment consists of a sequence of k steps in which there are n1 possible results for the first step, n2 possible results for the second step, and so on, then the total number of experimental outcomes is given by (n1)(n2) . . . (nk).
A helpful graphical representation of a multiple-step experiment is a tree diagram.
A Counting Rule for
Multiple-Step Experiments
Bradley Investments can be viewed as a two-step experiment. It involves two stocks, each with a set of experimental outcomes.
Markley Oil: n1 = 4
Collins Mining: n2 = 2
Total Number of Experimental Outcomes: n1n2 = (4)(2) = 8
Tree Diagram
Markley Oil    Collins Mining    Experimental
(Stage 1)      (Stage 2)         Outcomes
Gain 10        Gain 8            (10, 8)      Gain $18,000
               Lose 2            (10, -2)     Gain $8,000
Gain 5         Gain 8            (5, 8)       Gain $13,000
               Lose 2            (5, -2)      Gain $3,000
Even           Gain 8            (0, 8)       Gain $8,000
               Lose 2            (0, -2)      Lose $2,000
Lose 20        Gain 8            (-20, 8)     Lose $12,000
               Lose 2            (-20, -2)    Lose $22,000
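The tree diagram's branches can be enumerated programmatically. The sketch below uses Python's itertools.product to list every (Markley Oil, Collins Mining) pair; the variable names are illustrative assumptions.

```python
from itertools import product

# Possible gains or losses in 3 months, in $000.
markley_oil = [10, 5, 0, -20]   # stage 1: n1 = 4 results
collins_mining = [8, -2]        # stage 2: n2 = 2 results

# Every combination of a stage-1 result with a stage-2 result.
outcomes = list(product(markley_oil, collins_mining))

# Net gain or loss for each experimental outcome, in $000.
net = {outcome: sum(outcome) for outcome in outcomes}

print(len(outcomes))  # 8, matching (n1)(n2) = (4)(2)
for outcome, total in net.items():
    print(outcome, total)
```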
Counting Rule for Combinations
Number of Combinations of N Objects Taken n at a Time
A second useful counting rule enables us to count the number of experimental outcomes when n objects are to be selected from a set of N objects.
C_n^N = \binom{N}{n} = \frac{N!}{n!(N - n)!}
where: N! = N(N - 1)(N - 2) . . . (2)(1)
n! = n(n - 1)(n - 2) . . . (2)(1)
0! = 1
Counting Rule for Permutations
Number of Permutations of N Objects Taken n at a Time
A third useful counting rule enables us to count the number of experimental outcomes when n objects are to be selected from a set of N objects, where the order of selection is important.
P_n^N = n! \binom{N}{n} = \frac{N!}{(N - n)!}
where: N! = N(N - 1)(N - 2) . . . (2)(1)
n! = n(n - 1)(n - 2) . . . (2)(1)
0! = 1
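Both counting rules translate directly into factorial arithmetic. The sketch below uses hypothetical values, N = 5 and n = 2, which do not come from the slides, and cross-checks against math.comb and math.perm from the Python standard library (available in Python 3.8 and later).

```python
import math


def combinations_count(N, n):
    """Number of ways to select n objects from N when order does not matter."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))


def permutations_count(N, n):
    """Number of ways to select n objects from N when order matters."""
    return math.factorial(N) // math.factorial(N - n)


N, n = 5, 2  # hypothetical example values
print(combinations_count(N, n))   # 10
print(permutations_count(N, n))   # 20

# The permutation count is n! times the combination count.
assert permutations_count(N, n) == math.factorial(n) * combinations_count(N, n)
```

Each of the 10 combinations can be ordered in 2! = 2 ways, which is why there are 20 permutations.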
Assigning Probabilities
Basic Requirements for Assigning Probabilities
1. The probability assigned to each experimental outcome must be between 0 and 1, inclusively.
0 ≤ P(Ei) ≤ 1 for all i
where:
Ei is the ith experimental outcome
and P(Ei) is its probability
Basic Requirements for Assigning Probabilities
2. The sum of the probabilities for all experimental outcomes must equal 1.
P(E1) + P(E2) + . . . + P(En) = 1
where:
n is the number of experimental outcomes
Classical Method
Assigning probabilities based on the assumption of equally likely outcomes
Relative Frequency Method
Assigning probabilities based on experimentation or historical data
Subjective Method
Assigning probabilities based on judgment
Classical Method
Example: Rolling a Die
If an experiment has n possible outcomes, the classical method would assign a probability of 1/n to each outcome.
Experiment: Rolling a die
Sample Space: S = {1, 2, 3, 4, 5, 6}
Probabilities: Each sample point has a 1/6 chance of occurring
Relative Frequency Method
Example: Lucas Tool Rental
Lucas Tool Rental would like to assign probabilities to the number of car polishers it rents each day. Office records show the following frequencies of daily rentals for the last 40 days.
Number of           Number
Polishers Rented    of Days
0                   4
1                   6
2                   18
3                   10
4                   2
Example: Lucas Tool Rental
Each probability assignment is given by dividing the frequency (number of days) by the total frequency (total number of days). For example, the probability of renting 0 polishers is 4/40 = .10.
Number of           Number
Polishers Rented    of Days    Probability
0                   4          .10
1                   6          .15
2                   18         .45
3                   10         .25
4                   2          .05
Total               40         1.00
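The relative frequency assignment for Lucas Tool Rental can be sketched in a few lines of Python, along with a check of the two basic requirements for assigning probabilities. The variable names are illustrative assumptions.

```python
# Number of days (out of the last 40) on which each number of polishers was rented.
days_by_rentals = {0: 4, 1: 6, 2: 18, 3: 10, 4: 2}

total_days = sum(days_by_rentals.values())

# Relative frequency method: frequency divided by total frequency.
probabilities = {rented: days / total_days
                 for rented, days in days_by_rentals.items()}

for rented, p in probabilities.items():
    print(rented, f"{p:.2f}")

# Requirement 1: each probability lies between 0 and 1, inclusive.
assert all(0 <= p <= 1 for p in probabilities.values())
# Requirement 2: the probabilities sum to 1 (within floating-point tolerance).
assert abs(sum(probabilities.values()) - 1) < 1e-12
```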
Subjective Method
When economic conditions and a company’s circumstances change rapidly it might be inappropriate to assign probabilities based solely on historical data.
We can use any data available as well as our experience and intuition, but ultimately a probability value should express our degree of belief that the experimental outcome will occur.
The best probability estimates often are obtained by combining the estimates from the classical or relative frequency approach with the subjective estimate.
Subjective Method
An analyst made the following probability estimates.
Exper. Outcome    Net Gain or Loss    Probability
(10, 8)           $18,000 Gain        .20
(10, -2)          $8,000 Gain         .08
(5, 8)            $13,000 Gain        .16
(5, -2)           $3,000 Gain         .26
(0, 8)            $8,000 Gain         .10
(0, -2)           $2,000 Loss         .12
(-20, 8)          $12,000 Loss        .02
(-20, -2)         $22,000 Loss        .06
Events and Their Probabilities
An event is a collection of sample points.
The probability of any event is equal to the sum of the probabilities of the sample points in the event.
If we can identify all the sample points of an experiment and assign a probability to each, we can compute the probability of an event.
Event M = Markley Oil Profitable
M = {(10, 8), (10, -2), (5, 8), (5, -2)}
P(M) = P(10, 8) + P(10, -2) + P(5, 8) + P(5, -2)
= .20 + .08 + .16 + .26
= .70
Event C = Collins Mining Profitable
C = {(10, 8), (5, 8), (0, 8), (-20, 8)}
P(C) = P(10, 8) + P(5, 8) + P(0, 8) + P(-20, 8)
= .20 + .16 + .10 + .02
= .48
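Summing sample-point probabilities over an event is easy to express in code. The sketch below stores the analyst's subjective probabilities in a dictionary and represents each event as a set of sample points. The names P, prob, M, and C mirror the slides, but the code itself is only an illustration.

```python
# Subjective probabilities for each (Markley Oil, Collins Mining) outcome.
P = {
    (10, 8): .20, (10, -2): .08, (5, 8): .16, (5, -2): .26,
    (0, 8): .10, (0, -2): .12, (-20, 8): .02, (-20, -2): .06,
}


def prob(event):
    """Probability of an event: the sum of its sample-point probabilities."""
    return sum(P[point] for point in event)


# An event is a collection of sample points.
M = {point for point in P if point[0] > 0}   # Markley Oil profitable
C = {point for point in P if point[1] > 0}   # Collins Mining profitable

print(f"P(M) = {prob(M):.2f}")  # P(M) = 0.70
print(f"P(C) = {prob(C):.2f}")  # P(C) = 0.48
```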
Some Basic Relationships of Probability
There are some basic probability relationships that can be used to
compute the probability of an event without knowledge of all
the sample point probabilities.
Complement of an Event
Union of Two Events
Intersection of Two Events
Mutually Exclusive Events
Complement of an Event
The complement of event A is defined to be the event consisting of all sample points that are not in A.
The complement of A is denoted by A^c.
[Figure: Venn diagram showing event A and its complement A^c within sample space S]
Union of Two Events
The union of events A and B is the event containing all sample points that are in A or B or both.
The union of events A and B is denoted by A ∪ B.
[Figure: Venn diagram showing event A and event B within sample space S]
Union of Two Events
M ∪ C = Markley Oil Profitable or Collins Mining Profitable (or both)
M ∪ C = {(10, 8), (10, -2), (5, 8), (5, -2), (0, 8), (-20, 8)}
P(M ∪ C) = P(10, 8) + P(10, -2) + P(5, 8) + P(5, -2) + P(0, 8) + P(-20, 8)
= .20 + .08 + .16 + .26 + .10 + .02
= .82
Intersection of Two Events
The intersection of events A and B is the set of all sample points that are in both A and B.
The intersection of events A and B is denoted by A ∩ B.
[Figure: Venn diagram showing the intersection of A and B]
M ∩ C = Markley Oil Profitable and Collins Mining Profitable
M ∩ C = {(10, 8), (5, 8)}
P(M ∩ C) = P(10, 8) + P(5, 8)
= .20 + .16
= .36
Addition Law
The addition law provides a way to compute the probability of event A, or B, or both A and B occurring.
The law is written as:
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Addition Law
M ∪ C = Markley Oil Profitable or Collins Mining Profitable
We know: P(M) = .70, P(C) = .48, P(M ∩ C) = .36
Thus: P(M ∪ C) = P(M) + P(C) - P(M ∩ C)
= .70 + .48 - .36
= .82
(This result is the same as that obtained earlier using the definition of the probability of an event.)
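The addition law can be verified numerically for the Bradley Investments example. In this sketch, events are Python sets, so the union and intersection are the built-in set operators | and &. The names are illustrative assumptions.

```python
# Subjective probabilities for each (Markley Oil, Collins Mining) outcome.
P = {
    (10, 8): .20, (10, -2): .08, (5, 8): .16, (5, -2): .26,
    (0, 8): .10, (0, -2): .12, (-20, 8): .02, (-20, -2): .06,
}


def prob(event):
    """Probability of an event: the sum of its sample-point probabilities."""
    return sum(P[point] for point in event)


M = {point for point in P if point[0] > 0}   # Markley Oil profitable
C = {point for point in P if point[1] > 0}   # Collins Mining profitable

union = M | C          # sample points in M or C or both
intersection = M & C   # sample points in both M and C

direct = prob(union)                                   # by definition
by_law = prob(M) + prob(C) - prob(intersection)        # by the addition law

print(f"{direct:.2f} {by_law:.2f}")  # 0.82 0.82
```

Subtracting P(M ∩ C) corrects for the two sample points, (10, 8) and (5, 8), that would otherwise be counted twice.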
Mutually Exclusive Events
Two events are said to be mutually exclusive if the events have no sample points in common.
Two events are mutually exclusive if, when one event occurs, the other cannot occur.
If events A and B are mutually exclusive, P(A ∩ B) = 0.
The addition law for mutually exclusive events is:
P(A ∪ B) = P(A) + P(B)
There is no need to include "- P(A ∩ B)".
Mutual Exclusiveness and Independence
Do not confuse the notion of mutually exclusive events with that of independent events.
Two events with nonzero probabilities cannot be both mutually exclusive and independent.
If one mutually exclusive event is known to occur, the other cannot occur; thus, the probability of the other event occurring is reduced to zero (and they are therefore dependent).
Two events that are not mutually exclusive might or might not be independent.
Thank You