Quantitative Methods - Frequency Distributions

Quantitative Methods -1
Jighyasu Gaur
Statistics
• The term statistics can refer to numerical facts such as averages, medians, percentages, and index numbers that help us understand a variety of business and economic situations.
• Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.
Slide2
• Data
– Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.
• Data Set
– All the data collected in a particular study are referred to as the data set for the study.
Slide3
What is Categorical data?
Slide4
Summarizing Categorical Data
• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Chart
• Pie Chart
Slide5
Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Slide6
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:
Below Average, Average, Above Average,
Above Average, Above Average, Above Average,
Above Average, Below Average, Below Average,
Average, Poor, Poor,
Above Average, Excellent, Above Average,
Average, Above Average, Average,
Above Average, Average
Slide7
Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
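The tally above can be reproduced with a short script. This is a minimal sketch, assuming Python and its standard `collections.Counter`; the variable names (`ratings`, `classes`, `frequency`) are illustrative and not part of the original slides.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn example.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Fixed class order so the table reads from worst to best rating.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how many ratings fall in each class.
counts = Counter(ratings)
frequency = {c: counts[c] for c in classes}

for rating, freq in frequency.items():
    print(f"{rating:<15}{freq:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Listing the classes explicitly keeps the ordinal order of the ratings; sorting the counter's keys alphabetically would scramble it.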
Slide8
Relative Frequency Distribution
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Slide9
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Slide10
Relative Frequency and Percent Frequency Distributions
Rating           Relative Frequency   Percent Frequency
Poor             .10                  10
Below Average    .15                  15
Average          .25                  25
Above Average    .45                  45
Excellent        .05                  5
Total            1.00                 100
For example, the relative frequency for Excellent is 1/20 = .05, and the percent frequency for Poor is .10(100) = 10.
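The two columns follow directly from the frequency table. The sketch below, assuming Python and illustrative variable names, divides each frequency by the total and then multiplies by 100.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # total number of data items (20)

# Relative frequency = class frequency / n
relative = {c: f / n for c, f in frequency.items()}

# Percent frequency = relative frequency * 100
percent = {c: 100 * r for c, r in relative.items()}

for c in frequency:
    print(f"{c:<15}{relative[c]:>6.2f}{percent[c]:>6.0f}")
```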
Slide11
Bar Chart
• A bar chart is a graphical device for depicting qualitative data.
• On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
• A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
• Using a bar of fixed width drawn above each class label, we extend the height appropriately.
• The bars are separated to emphasize the fact that each class is a separate category.
Slide12
Bar Chart
[Figure: bar chart titled "Marada Inn Quality Ratings". Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency, scaled from 1 to 10.]
Slide13
Pareto Diagram
• In quality control, bar charts are used to identify the most important causes of problems.
• When the bars are arranged in descending order of height from left to right (with the most frequently occurring cause appearing first), the bar chart is called a Pareto diagram.
• This diagram is named for its founder, Vilfredo Pareto, an Italian economist.
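The ordering step behind a Pareto diagram is just a sort by frequency. The sketch below assumes Python; the cause names and counts are hypothetical, invented purely for illustration, and do not come from the slides.

```python
# Hypothetical defect counts, invented purely for illustration.
defect_counts = {
    "Scratch": 12,
    "Dent": 30,
    "Misalignment": 7,
    "Discoloration": 18,
}

# Arrange causes in descending order of frequency, as a Pareto diagram does.
pareto_order = sorted(defect_counts.items(), key=lambda item: item[1], reverse=True)

for cause, count in pareto_order:
    print(f"{cause:<15}{count:>4}")
```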
Slide14
Pie Chart
• The pie chart is a commonly used graphical device for presenting relative frequency and percent frequency distributions for categorical data.
• First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
• Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
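The sector angles for the Marada Inn ratings can be worked out the same way. This is a small sketch in Python with illustrative variable names; it only computes the angles and does not draw the chart.

```python
# Relative frequencies from the Marada Inn example.
relative = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each class takes a share of the 360 degrees equal to its relative frequency.
angles = {c: r * 360 for c, r in relative.items()}

for c, a in angles.items():
    print(f"{c:<15}{a:>6.1f} degrees")
```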
Slide15
Pie Chart
[Figure: pie chart titled "Marada Inn Quality Ratings" with sectors Excellent 5%, Poor 10%, Below Average 15%, Above Average 45%, Average 25%.]
Slide16
Example: Marada Inn
Insights Gained from the Preceding Pie Chart
• One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
• For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.
Slide17
Summarizing Quantitative Data
• Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Dot Plot
• Histogram
• Cumulative Distributions
• Ogive
Slide18
Example: Hudson Auto Repair
The manager of Hudson Auto would like to gain a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Slide19

Sample of Parts Cost ($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Slide20
The three steps necessary to define the classes for a frequency distribution with quantitative data are:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.
Slide21
Guidelines for Determining the Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.
The goal is to use enough classes to show the variation in the data, but not so many classes that some contain only a few data items.
Slide22
Guidelines for Determining the Width of Each Class
• Use classes of equal width.
• Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
Making the classes the same width reduces the chance of inappropriate interpretations.
Slide23
Note on Number of Classes and Class Width

• In practice, the number of classes and the
appropriate class width are determined by trial
and error.
• Once a possible number of classes is chosen, the
appropriate class width is found.
• The process can be repeated for a different
number of classes.
• Ultimately, the analyst uses judgment to
determine the combination of the number of
classes and class width that provides the best
frequency distribution for summarizing the data.
Slide24
Guidelines for Determining the Class Limits
• Class limits must be chosen so that each data item belongs to one and only one class.
• The lower class limit identifies the smallest possible data value assigned to the class.
• The upper class limit identifies the largest possible data value assigned to the class.
• The appropriate values for the class limits depend on the level of accuracy of the data.
An open-end class requires only a lower class limit or an upper class limit.
Slide25

If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
Parts Cost ($)   Frequency
50-59            2
60-69            13
70-79            16
80-89            7
90-99            7
100-109          5
Total            50
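The three steps (number of classes, class width, class limits) can be traced in code for the Hudson Auto data. This is a minimal sketch in Python; starting the first class at 50 follows the table above, while the variable names and the choice to round the width up with `math.ceil` are assumptions made for illustration.

```python
import math

# Parts cost ($) for the 50 tune-ups in the Hudson Auto Repair sample.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Step 1: choose the number of classes.
num_classes = 6

# Step 2: approximate class width = (largest - smallest) / number of classes.
approx_width = (max(parts_cost) - min(parts_cost)) / num_classes  # (109 - 52) / 6
class_width = math.ceil(approx_width)  # rounded up to a convenient width

# Step 3: class limits. The first lower limit (50) is the one used in the table.
first_lower = 50

frequency = {}
for k in range(num_classes):
    lower = first_lower + k * class_width
    upper = lower + class_width - 1  # whole-dollar data, so 50-59, 60-69, ...
    frequency[f"{lower}-{upper}"] = sum(lower <= x <= upper for x in parts_cost)

for label, freq in frequency.items():
    print(f"{label:<10}{freq:>3}")
```

Trying a different `num_classes` and rerunning is a quick way to carry out the trial-and-error process described in the note on number of classes and class width.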
Slide26
Parts Cost ($)   Relative Frequency   Percent Frequency
50-59            .04                  4
60-69            .26                  26
70-79            .32                  32
80-89            .14                  14
90-99            .14                  14
100-109          .10                  10
Total            1.00                 100
For the 50-59 class, the relative frequency is 2/50 = .04 and the percent frequency is .04(100) = 4. Percent frequency is the relative frequency multiplied by 100.
Slide27
Insights Gained from the % Frequency Distribution:
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third)
of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.
Slide28
Histogram
• Another common graphical presentation of quantitative data is a histogram.
• The variable of interest is placed on the horizontal axis.
• A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
• Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
Slide29
Histogram
[Figure: histogram titled "Tune-up Parts Cost". Horizontal axis: Parts Cost ($) with classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109. Vertical axis: Frequency, scaled from 2 to 18.]
Slide30
Histograms Showing Skewness
Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people
[Figure: histogram. Vertical axis: Relative Frequency, 0 to .35.]
Slide31
Moderately Skewed Left
• A longer tail to the left
• Example: exam scores
[Figure: histogram. Vertical axis: Relative Frequency, 0 to .35.]
Slide32
Moderately Skewed Right
• A longer tail to the right
• Example: housing values
[Figure: histogram. Vertical axis: Relative Frequency, 0 to .35.]
Slide33
Highly Skewed Right
• A very long tail to the right
• Example: executive salaries
[Figure: histogram. Vertical axis: Relative Frequency, 0 to .35.]
Slide34
Cumulative Distributions
Cumulative frequency distribution – shows the number of items with values less than or equal to the upper limit of each class.
Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to the upper limit of each class.
Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to the upper limit of each class.
Slide35
• The last entry in a cumulative frequency distribution always equals the total number of observations.
• The last entry in a cumulative relative frequency distribution always equals 1.00.
• The last entry in a cumulative percent frequency distribution always equals 100.
Slide36
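Hudson Auto Repair
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59       2                      .04                             4
≤ 69       15                     .30                             30
≤ 79       31                     .62                             62
≤ 89       38                     .76                             76
≤ 99       45                     .90                             90
≤ 109      50                     1.00                            100
For the ≤ 69 row: cumulative frequency = 2 + 13 = 15, cumulative relative frequency = 15/50 = .30, cumulative percent frequency = .30(100) = 30.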
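Each cumulative column is a running total of the class frequencies. The following sketch, assuming Python and the standard `itertools.accumulate`, rebuilds the table from the Hudson Auto frequencies; the variable names are illustrative.

```python
from itertools import accumulate

# Upper class limits and class frequencies from the Hudson Auto Repair table.
upper_limits = [59, 69, 79, 89, 99, 109]
frequency = [2, 13, 16, 7, 7, 5]

n = sum(frequency)  # total number of observations (50)

# Running totals of the class frequencies.
cum_freq = list(accumulate(frequency))

# Divide by n for the cumulative relative frequency, then scale to percent.
cum_rel = [c / n for c in cum_freq]
cum_pct = [100 * c / n for c in cum_freq]

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit:<5}{cf:>4}{cr:>7.2f}{cp:>6.0f}")
```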
Slide37
Descriptive Statistics
Slide38
Descriptive Statistics
Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.
Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.
Slide39
Descriptive Statistics: Numerical Measures
• Measures of Location
• Measures of Variability
Slide40
Measures of Location
• Mean
• Median
• Mode
• Percentiles
• Quartiles
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.
Slide41
Mean
• Perhaps the most important measure of location is the mean.
• The mean provides a measure of central location.
• The mean of a data set is the average of all the data values.
• The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.
Slide42
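Sample Mean
$\bar{x} = \dfrac{\sum x_i}{n}$
where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.
Slide43
Population Mean $\mu$
$\mu = \dfrac{\sum x_i}{N}$
where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.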
Slide44
Sample Mean
Example: Apartment Rents
Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
Slide45
Sample Mean
For the 70 apartment rents listed on the previous slide:
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{34{,}356}{70} = 490.80$
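As a check on the sample mean, the sum and count can be computed directly. This is a minimal sketch, assuming Python and its standard `statistics.mean`; the variable names are illustrative.

```python
from statistics import mean

# Monthly rents for the 70 sampled efficiency apartments.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

n = len(rents)        # number of observations in the sample
total = sum(rents)    # sum of the values of the n observations
sample_mean = mean(rents)

print(f"n = {n}, sum = {total}, sample mean = {sample_mean:.2f}")
```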
Slide46
Median
• The median of a data set is the value in the middle when the data items are arranged in ascending order.
• Whenever a data set has extreme values, the median is the preferred measure of central location.
• The median is the measure of location most often reported for annual income and property value data.
• A few extremely large incomes or property values can inflate the mean.
Slide47
Median
• For an odd number of observations:
26 18 27 12 14 27 19 (7 observations)
12 14 18 19 26 27 27 (in ascending order)
The median is the middle value.
Median = 19
Slide48
Median
• For an even number of observations:
26 18 27 12 14 27 30 19 (8 observations)
12 14 18 19 26 27 27 30 (in ascending order)
The median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5
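Both cases can be checked with a few lines of code. The sketch below assumes Python and its standard `statistics.median`, which sorts the data and applies the same odd/even rule; the list names are illustrative.

```python
from statistics import median

# The two small data sets from the median slides.
odd_sample = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_sample = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations

# Odd n: the single middle value of the sorted data.
median_odd = median(odd_sample)

# Even n: the average of the two middle values of the sorted data.
median_even = median(even_sample)

print(sorted(odd_sample), "->", median_odd)
print(sorted(even_sample), "->", median_even)
```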
Slide49
Median

Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data are in ascending order.
Slide50
Trimmed Mean
• Another measure, sometimes used when extreme values are present, is the trimmed mean.
• It is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
• For example, the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
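A trimmed mean might be computed as in the sketch below, written in Python. The slides do not say how to handle a trim count that is not a whole number, so rounding it down is an assumption here; the function name `trimmed_mean` and the sample data are hypothetical, chosen only to show the effect of one extreme value.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the values from each end.

    Assumption: the number of values dropped from each end is rounded down.
    """
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # values to drop at each end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("percent is too large: no values remain")
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value, for illustration only.
sample = [12, 14, 18, 19, 26, 27, 27, 30, 31, 200]

plain_mean = sum(sample) / len(sample)
trimmed = trimmed_mean(sample, 10)  # drops the smallest and the largest value

print(f"mean = {plain_mean}, 10% trimmed mean = {trimmed}")
```

The extreme value 200 pulls the ordinary mean well above most of the data, while the trimmed mean stays close to the bulk of the values.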
Slide51
Mode
• The mode of a data set is the value that occurs with greatest frequency.
• The greatest frequency can occur at two or more different values.
• If the data have exactly two modes, the data are bimodal.
• If the data have more than two modes, the data are multimodal.
• Caution: If the data are bimodal or multimodal, Excel’s MODE function will incorrectly identify a single mode.
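The same caution applies to any tool that reports only one mode. As a sketch in Python (3.8 or later), the standard `statistics.multimode` returns every value tied for the greatest frequency. The first list reuses the eight observations from the median example; the bimodal list is hypothetical.

```python
from statistics import multimode

# Eight observations from the median example: 27 occurs twice.
single_mode_data = [26, 18, 27, 12, 14, 27, 30, 19]

# Hypothetical bimodal data, for illustration only: 2 and 3 both occur twice.
bimodal_data = [1, 2, 2, 3, 3, 4]

modes_single = multimode(single_mode_data)
modes_bimodal = multimode(bimodal_data)

print(modes_single)   # one mode
print(modes_bimodal)  # two modes, so the data are bimodal
```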
Slide52
Mode

450 occurred most frequently (7 times)
Mode = 450
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Slide53
Descriptive Statistics: Numerical Measures for Grouped Data
• Measures of Location: Mean
Class       0-1   1-2   2-3   3-4   4-5   5-6
Frequency   1     4     8     6     3     1
Slide54
Grouped Data
Here x is the class midpoint and f is the class frequency.
Class   x     f     fx
0-1     0.5   1     0.5
1-2     1.5   4     6
2-3     2.5   8     20
3-4     3.5   6     21
4-5     4.5   3     13.5
5-6     5.5   1     5.5
Total         23    66.5
Mean = 66.5/23 ≈ 2.9

Slide55
Grouped Data
• Find the mean for the following data set.
Class       5-10   10-15   15-20   20-25   25-30
Frequency   10     12      16      14      8
Slide56
Grouped Data
Class   x       f     fx
5-10    7.50    10    75
10-15   12.50   12    150
15-20   17.50   16    280
20-25   22.50   14    315
25-30   27.50   8     220
Total           60    1040
Mean = 1040/60 = 17.33
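The grouped-data calculation multiplies each class midpoint by its frequency and divides the total by the number of observations. A minimal sketch in Python, with illustrative variable names:

```python
# Class limits and frequencies from the second grouped-data example.
classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
freqs = [10, 12, 16, 14, 8]

# x: the midpoint of each class stands in for every value in that class.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# fx: frequency times midpoint, summed over all classes.
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
n = sum(freqs)

grouped_mean = sum_fx / n

print(f"sum of fx = {sum_fx}, n = {n}, mean = {grouped_mean:.2f}")
```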

Slide57
Percentiles
• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
• Admission test scores for colleges and universities are frequently reported in terms of percentiles.
• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
Slide58
Percentiles
Arrange the data in ascending order.
Compute index i, the position of the pth percentile:
i = (p/100)n
If i is not an integer, round up. The pth percentile is the value in the ith position.
If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
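The index rule above can be written as a short function. This is a sketch in Python, assuming p is a whole number between 0 and 100; the function name `percentile` is illustrative, and the rule is the one stated on this slide, which can differ from the interpolation methods used by statistical libraries.

```python
def percentile(values, p):
    """Return the pth percentile using the index rule i = (p/100)n.

    Assumes p is a whole number with 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be between 0 and 100 (exclusive)")
    ordered = sorted(values)
    n = len(ordered)
    # Keep i as a quotient and remainder to avoid floating-point error.
    whole, remainder = divmod(p * n, 100)
    if remainder:
        # i is not an integer: round up to position whole + 1 (index whole).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


# Monthly rents for the 70 sampled efficiency apartments.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

p80 = percentile(rents, 80)  # i = 56, an integer: average positions 56 and 57
p75 = percentile(rents, 75)  # i = 52.5, not an integer: use position 53

print(p80, p75)
```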
Slide59
80th Percentile

i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Slide60
80th Percentile

“At least 80% of the items take on a value of 542 or less.” 56/70 = .8 or 80%
“At least 20% of the items take on a value of 542 or more.” 14/70 = .2 or 20%
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Slide61
Quartiles
• Quartiles are specific percentiles.
• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile
Slide62
Third Quartile

Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = 525 (the value in the 53rd position)
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Slide63
Measures of Variability
• It is often desirable to consider measures of variability (dispersion), as well as measures of location.
• For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.
Slide64
Measures of Variability
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation
Slide65
Range
• The range of a data set is the difference between the largest and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.
Slide66
Range

Range = largest value - smallest value
Range = 615 - 425 = 190
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Slide67
Interquartile Range
• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.
Slide68
Interquartile Range

3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
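The range and the interquartile range for the apartment rents can be reproduced together. This is a sketch in Python; the helper `quartile_position_value` is a hypothetical name, and it applies the same index rule i = (p/100)n used for percentiles in this deck rather than a library's interpolation method.

```python
def quartile_position_value(ordered, p):
    """Value of the pth percentile of already-sorted data, using i = (p/100)n.

    Assumes p is a whole number with 0 < p < 100.
    """
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder:
        return ordered[whole]  # i not an integer: round up to position whole + 1
    return (ordered[whole - 1] + ordered[whole]) / 2  # average positions i, i + 1


# The 70 apartment rents in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# Range = largest value - smallest value.
data_range = max(rents) - min(rents)

# Interquartile range = third quartile - first quartile.
q1 = quartile_position_value(rents, 25)  # i = 17.5, so position 18
q3 = quartile_position_value(rents, 75)  # i = 52.5, so position 53
iqr = q3 - q1

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```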
Slide69
Variance
The variance is a measure of variability that utilizes all the data.
It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).
The variance is useful in comparing the variability of two or more variables.
Slide70
Variance
The variance is the average of the squared differences between each data value and the mean.
The variance is computed as follows:
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ for a sample
$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$ for a population
Slide71
Standard Deviation
The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, making it more easily interpreted than the variance.
Slide72
Standard Deviation
The standard deviation is computed as follows:
$s = \sqrt{s^2}$ for a sample
$\sigma = \sqrt{\sigma^2}$ for a population
Slide73
Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean.
The coefficient of variation is computed as follows:
$\left(\dfrac{s}{\bar{x}} \times 100\right)\%$ for a sample
$\left(\dfrac{\sigma}{\mu} \times 100\right)\%$ for a population
Slide74
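Sample Variance, Standard Deviation, and Coefficient of Variation
For the sample of 70 apartment rents, with sample mean $\bar{x} = 490.80$:
• Variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$
• Standard Deviation: $s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$
• Coefficient of Variation: $\left(\dfrac{s}{\bar{x}} \times 100\right)\% = \left(\dfrac{54.74}{490.80} \times 100\right)\% = 11.15\%$ (the standard deviation is about 11% of the mean)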
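These three measures can be computed for the apartment rents as a check. The sketch below assumes Python and its standard `statistics` module, where `variance` and `stdev` use the sample (n - 1) formulas; `pvariance` and `pstdev` would be the population counterparts. Variable names are illustrative.

```python
from statistics import mean, stdev, variance

# Monthly rents for the 70 sampled efficiency apartments.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

x_bar = mean(rents)      # sample mean
s2 = variance(rents)     # sample variance, divides by n - 1
s = stdev(rents)         # sample standard deviation, square root of s2
cv = s / x_bar * 100     # coefficient of variation, in percent

print(f"mean = {x_bar:.2f}")
print(f"variance = {s2:.2f}")
print(f"standard deviation = {s:.2f}")
print(f"coefficient of variation = {cv:.2f}%")
```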