
Quantitative Methods -1

Jighyasu Gaur
•The term statistics can refer to
numerical facts such as averages,
medians, percent, and index numbers
that help us understand a variety of
business and economic situations.

•Statistics can also refer to the art and
science of collecting, analyzing,
presenting, and interpreting data.

•Data
– Data are the facts and figures collected, analyzed,
and summarized for presentation and
interpretation.

•Data Set
– All the data collected in a particular study are
referred to as the data set for the study.

What is Categorical data?

Summarizing Categorical Data

 Frequency Distribution
 Relative Frequency Distribution
 Percent Frequency Distribution
 Bar Chart
 Pie Chart

Frequency Distribution

A frequency distribution is a tabular summary of
data showing the frequency (or number) of items
in each of several non-overlapping classes.

The objective is to provide insights about the data
that cannot be quickly obtained by looking only at
the original data.

Frequency Distribution

 Example: Marada Inn

Guests staying at Marada Inn were asked to rate the
quality of their accommodations as being excellent,
above average, average, below average, or poor. The
ratings provided by a sample of 20 guests are:
Below Average Average Above Average
Above Average Above Average Above Average
Above Average Below Average Below Average
Average Poor Poor
Above Average Excellent Above Average
Average Above Average Average
Above Average Average

Frequency Distribution

 Example: Marada Inn

Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
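
The tally above can be reproduced with a short script. This is only a sketch in Python (the slides do not prescribe any software), and the names `ratings` and `rating_order` are illustrative choices, not part of the example.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn sample, in the order listed above.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Classes in their natural order, from worst to best.
rating_order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
frequency = {label: counts[label] for label in rating_order}

for label, freq in frequency.items():
    print(f"{label:<15}{freq:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```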

Relative Frequency Distribution

The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class.

A relative frequency distribution is a tabular
summary of a set of data showing the relative
frequency for each class.

Percent Frequency Distribution

The percent frequency of a class is the relative
frequency multiplied by 100.

A percent frequency distribution is a tabular
summary of a set of data showing the percent
frequency for each class.

Relative Frequency and
Percent Frequency Distributions
 Example: Marada Inn

Rating           Relative Frequency   Percent Frequency
Poor                    .10                  10
Below Average           .15                  15
Average                 .25                  25
Above Average           .45                  45
Excellent               .05                   5
Total                  1.00                 100

For example, the relative frequency of Excellent is 1/20 = .05, and the
percent frequency of Poor is .10(100) = 10.

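
The two distributions follow directly from the frequency counts. The sketch below, in Python, assumes the Marada Inn frequencies from the earlier table; the variable names are illustrative.

```python
# Frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # total number of data items (20)

# Relative frequency = class frequency / n.
relative = {label: freq / n for label, freq in frequency.items()}

# Percent frequency = relative frequency multiplied by 100.
percent = {label: 100 * rel for label, rel in relative.items()}

for label in frequency:
    print(f"{label:<15}{relative[label]:>6.2f}{percent[label]:>6.0f}")
```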
Bar Chart

 A bar chart is a graphical device for depicting
qualitative data.
 On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
 A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).
 Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
 The bars are separated to emphasize the fact that each
class is a separate category.

Bar Chart

[Figure: bar chart, "Marada Inn Quality Ratings"; horizontal axis: rating (Poor, Below Average, Average, Above Average, Excellent).]

Pareto Diagram

 In quality control, bar charts are used to identify the
most important causes of problems.
 When the bars are arranged in descending order of
height from left to right (with the most frequently
occurring cause appearing first) the bar chart is
called a Pareto diagram.
 This diagram is named for its founder, Vilfredo
Pareto, an Italian economist.
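
The ordering step of a Pareto diagram is a descending sort by frequency. The slides give no quality-control data, so the sketch below reuses the Marada Inn rating counts purely to illustrate the sort; in practice the categories would be causes of problems.

```python
# Illustration only: Marada Inn rating counts stand in for problem causes.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

# Arrange categories in descending order of frequency (most frequent first).
pareto_order = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

for label, freq in pareto_order:
    print(f"{label:<15}{freq:>3}")
```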

Pie Chart

 The pie chart is a commonly used graphical device
for presenting relative frequency and percent
frequency distributions for categorical data.
 First draw a circle; then use the relative frequencies
to subdivide the circle into sectors that correspond to
the relative frequency for each class.
 Since there are 360 degrees in a circle, a class with a
relative frequency of .25 would consume .25(360) = 90
degrees of the circle.
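
The sector sizes can be computed for every class at once. This Python sketch assumes the Marada Inn relative frequencies; the names are illustrative.

```python
# Relative frequencies from the Marada Inn example.
relative_frequency = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each class receives its share of the 360 degrees in a circle.
degrees = {label: rel * 360 for label, rel in relative_frequency.items()}

for label, angle in degrees.items():
    print(f"{label:<15}{angle:>6.0f} degrees")
```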

Pie Chart

[Figure: pie chart, "Marada Inn Quality Ratings".]

Example: Marada Inn

 Insights Gained from the Preceding Pie Chart

• One-half of the customers surveyed gave Marada
a quality rating of “above average” or “excellent”
(looking at the left side of the pie). This might
please the manager.
• For each customer who gave an “excellent” rating,
there were two customers who gave a “poor”
rating (looking at the top of the pie). This should
displease the manager.

Summarizing Quantitative Data

 Frequency Distribution
 Relative Frequency and
Percent Frequency Distributions
 Dot Plot
 Histogram
 Cumulative Distributions
 Ogive

Frequency Distribution

 Example: Hudson Auto Repair

The manager of Hudson Auto would like to gain a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next slide.

Frequency Distribution

 Example: Hudson Auto Repair

Sample of Parts Cost($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73

Frequency Distribution

The three steps necessary to define the classes for a
frequency distribution with quantitative data are:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.

Frequency Distribution

 Guidelines for Determining the Number of Classes

• Use between 5 and 20 classes.
• Data sets with a larger number of elements
usually require a larger number of classes.
• Smaller data sets usually require fewer classes.

The goal is to use enough classes to show the
variation in the data, but not so many classes
that some contain only a few data items.

Frequency Distribution

 Guidelines for Determining the Width of Each Class

• Use classes of equal width.
• Approximate Class Width = $\dfrac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$

Making the classes the same
width reduces the chance of
inappropriate interpretations.

Frequency Distribution

 Note on Number of Classes and Class Width

• In practice, the number of classes and the
appropriate class width are determined by trial
and error.
• Once a possible number of classes is chosen, the
appropriate class width is found.
• The process can be repeated for a different
number of classes.
• Ultimately, the analyst uses judgment to
determine the combination of the number of
classes and class width that provides the best
frequency distribution for summarizing the data.

Frequency Distribution

 Guidelines for Determining the Class Limits

• Class limits must be chosen so that each data
item belongs to one and only one class.
• The lower class limit identifies the smallest
possible data value assigned to the class.
• The upper class limit identifies the largest
possible data value assigned to the class.
• The appropriate values for the class limits
depend on the level of accuracy of the data.

An open-end class requires only a
lower class limit or an upper class limit.

Frequency Distribution

 Example: Hudson Auto Repair

If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
Parts Cost ($) Frequency
50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
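
The three steps (number of classes, class width, class limits) can be sketched in Python as follows. The first lower limit of 50 is taken from the table above; it is a judgment call in the example, not something the formula produces. Variable names are illustrative.

```python
import math

# Parts cost ($) for the 50 sampled tune-ups.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
# Approximate width = (largest - smallest) / number of classes = (109 - 52) / 6.
approx_width = (max(parts_cost) - min(parts_cost)) / num_classes
class_width = math.ceil(approx_width)  # 9.5 rounded up to 10, as in the example
first_lower_limit = 50                 # chosen by judgment in the example

# Build (lower, upper) limits so every whole-dollar value fits exactly one class.
classes = [
    (first_lower_limit + k * class_width,
     first_lower_limit + (k + 1) * class_width - 1)
    for k in range(num_classes)
]

frequency = {
    f"{lo}-{hi}": sum(lo <= cost <= hi for cost in parts_cost)
    for lo, hi in classes
}

for label, freq in frequency.items():
    print(f"{label:<10}{freq:>3}")
```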

Relative Frequency and
Percent Frequency Distributions
 Example: Hudson Auto Repair

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                  .04                   4
60-69                  .26                  26
70-79                  .32                  32
80-89                  .14                  14
90-99                  .14                  14
100-109                .10                  10
Total                 1.00                 100

For the 50-59 class, the relative frequency is 2/50 = .04 and the percent
frequency is .04(100) = 4. Percent frequency is the relative frequency
multiplied by 100.

Relative Frequency and
Percent Frequency Distributions
 Example: Hudson Auto Repair
Insights Gained from the % Frequency Distribution:
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third)
of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.


Histogram

 Another common graphical presentation of
quantitative data is a histogram.
 The variable of interest is placed on the horizontal
axis.
 A rectangle is drawn above each class interval with
its height corresponding to the interval’s frequency,
relative frequency, or percent frequency.
 Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent classes.


Histogram

 Example: Hudson Auto Repair

[Figure: histogram, "Tune-up Parts Cost"; horizontal axis: parts cost ($) in classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109.]

Histograms Showing Skewness

 Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people

 Moderately Skewed Left
• A longer tail to the left
• Example: exam scores

 Moderately Skewed Right
• A longer tail to the right
• Example: housing values

 Highly Skewed Right
• A very long tail to the right
• Example: executive salaries

[Figures: one relative frequency histogram for each of the four shapes.]


Cumulative Distributions

Cumulative frequency distribution: shows the
number of items with values less than or equal to the
upper limit of each class.

Cumulative relative frequency distribution: shows
the proportion of items with values less than or
equal to the upper limit of each class.

Cumulative percent frequency distribution: shows
the percentage of items with values less than or
equal to the upper limit of each class.

Cumulative Distributions

 The last entry in a cumulative frequency distribution
always equals the total number of observations.
 The last entry in a cumulative relative frequency
distribution always equals 1.00.
 The last entry in a cumulative percent frequency
distribution always equals 100.

Cumulative Distributions

 Hudson Auto Repair

             Cumulative   Cumulative Relative   Cumulative Percent
Cost ($)     Frequency    Frequency             Frequency
≤ 59              2            .04                     4
≤ 69             15            .30                    30
≤ 79             31            .62                    62
≤ 89             38            .76                    76
≤ 99             45            .90                    90
≤ 109            50           1.00                   100

For the ≤ 69 row: 2 + 13 = 15, 15/50 = .30, and .30(100) = 30.
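
Cumulative distributions are running totals of the class frequencies. The Python sketch below assumes the six Hudson Auto Repair class frequencies from the earlier frequency distribution; names are illustrative.

```python
from itertools import accumulate

# Upper class limits and class frequencies for the Hudson Auto Repair data.
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]

n = sum(frequencies)  # 50 observations

# Running total of frequencies: items with values <= each upper limit.
cumulative = list(accumulate(frequencies))
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for limit, c, rel, pct in zip(upper_limits, cumulative,
                              cumulative_relative, cumulative_percent):
    print(f"<= {limit:<5}{c:>4}{rel:>7.2f}{pct:>6.0f}")
```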

Descriptive Statistics

Most of the statistical information in newspapers,
magazines, company reports, and other
publications consists of data that are summarized
and presented in a form that is easy to understand.

Such summaries of data, which may be tabular,
graphical, or numerical, are referred to as descriptive
statistics.

Descriptive Statistics: Numerical Measures

 Measures of Location
 Measures of Variability

Measures of Location

 Mean
 Median
 Mode
 Percentiles
 Quartiles

If the measures are computed for data from a sample,
they are called sample statistics.

If the measures are computed for data from a population,
they are called population parameters.

A sample statistic is referred to
as the point estimator of the
corresponding population parameter.


Mean

 Perhaps the most important measure of location is
the mean.
 The mean provides a measure of central location.
 The mean of a data set is the average of all the data
values.
 The sample mean $\bar{x}$ is the point estimator of the
population mean $\mu$.

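Sample Mean $\bar{x}$

$$\bar{x} = \frac{\sum x_i}{n}$$

where $\sum x_i$ is the sum of the values of the $n$ observations and
$n$ is the number of observations in the sample.

Population Mean $\mu$

$$\mu = \frac{\sum x_i}{N}$$

where $\sum x_i$ is the sum of the values of the $N$ observations and
$N$ is the number of observations in the population.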

Sample Mean

 Example: Apartment Rents

Seventy efficiency apartments were randomly
sampled in a small college town. The monthly rent
prices for these apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440

Sample Mean

 Example: Apartment Rents

$$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$$
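
The sample mean for the apartment rents can be checked with a few lines of Python. This is a sketch; the variable names are illustrative.

```python
# Monthly rents for the 70 sampled efficiency apartments, as listed above.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

n = len(rents)             # number of observations in the sample
total = sum(rents)         # sum of the values of the n observations
sample_mean = total / n    # x-bar = (sum of x_i) / n

print(f"n = {n}, sum = {total}, mean = {sample_mean:.2f}")
```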


Median

 The median of a data set is the value in the middle
when the data items are arranged in ascending order.
 Whenever a data set has extreme values, the median
is the preferred measure of central location.
 The median is the measure of location most often
reported for annual income and property value data.
 A few extremely large incomes or property values
can inflate the mean.


 For an odd number of observations:

7 observations:        26 18 27 12 14 27 19
In ascending order:    12 14 18 19 26 27 27

The median is the middle value.

Median = 19

 For an even number of observations:

8 observations:        26 18 27 12 14 27 30 19
In ascending order:    12 14 18 19 26 27 27 30

The median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
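
Both cases can be checked with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the count is even. The list names below are illustrative.

```python
import statistics

odd_sample = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_sample = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations

# Odd count: the single middle value of the sorted data.
median_odd = statistics.median(odd_sample)

# Even count: the average of the two middle values of the sorted data.
median_even = statistics.median(even_sample)

print(sorted(odd_sample), "->", median_odd)
print(sorted(even_sample), "->", median_even)
```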


 Example: Apartment Rents

Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Trimmed Mean

 Another measure, sometimes used when extreme
values are present, is the trimmed mean.
 It is obtained by deleting a percentage of the
smallest and largest values from a data set and then
computing the mean of the remaining values.
 For example, the 5% trimmed mean is obtained by
removing the smallest 5% and the largest 5% of the
data values and then computing the mean of the
remaining values.
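
A trimmed mean can be sketched in Python as below. Two assumptions are made here that the slides do not specify: the number of values removed from each end is rounded down to a whole number, and the ten-value data set is hypothetical, chosen so that a 10% trim removes exactly one value from each end.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent` % of the values from each end.

    Assumption: the count dropped per end is rounded down to an integer.
    """
    if not 0 <= percent < 50:
        raise ValueError("percent must be in [0, 50)")
    ordered = sorted(data)
    k = len(ordered) * percent // 100        # values to drop from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical data with one extreme value at each end.
values = [3, 12, 14, 18, 19, 26, 27, 27, 30, 95]

ordinary_mean = trimmed_mean(values, 0)      # no trimming
ten_percent_trimmed = trimmed_mean(values, 10)

print(f"mean = {ordinary_mean}, 10% trimmed mean = {ten_percent_trimmed}")
```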


Mode

 The mode of a data set is the value that occurs with
greatest frequency.
 The greatest frequency can occur at two or more
different values.
 If the data have exactly two modes, the data are
bimodal.
 If the data have more than two modes, the data are
multimodal.
 Caution: If the data are bimodal or multimodal,
Excel’s MODE function will incorrectly identify a
single mode.


 Example: Apartment Rents

450 occurred most frequently (7 times)
Mode = 450
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
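
The mode of the apartment rents can be found by counting occurrences. The sketch uses Python's `collections.Counter` and `statistics.multimode` (Python 3.8+); `multimode` returns every value tied for the greatest frequency, so it does not hide a second mode. The small `tied` list is hypothetical.

```python
import statistics
from collections import Counter

# Apartment rents in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

modes = statistics.multimode(rents)      # all values with the greatest frequency
mode_count = Counter(rents)[modes[0]]    # how often the mode occurs

# Hypothetical bimodal data: both modes are reported.
tied = [1, 1, 2, 2, 3]
tied_modes = statistics.multimode(tied)

print(f"modes = {modes}, occurrences = {mode_count}")
print(f"modes of {tied} = {tied_modes}")
```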

Descriptive Statistics: Numerical Measures for
Grouped Data

 Measures of Location : Mean

Class 0-1 1-2 2-3 3-4 4-5 5-6

Frequency 1 4 8 6 3 1

Descriptive Statistics: Numerical Measures for
Grouped Data
 Measures of Location : Mean
Class x f fx
0-1 0.5 1 0.5
1-2 1.5 4 6
2-3 2.5 8 20
3-4 3.5 6 21
4-5 4.5 3 13.5
5-6 5.5 1 5.5
Total 23 66.5

Mean = Σfx / Σf = 66.5/23 ≈ 2.9, where x is the class midpoint and f is the class frequency.

Descriptive Statistics: Numerical Measures for
Grouped Data

 Measures of Location : Mean

 Find mean for the following data set

Class 05-10 10-15 15-20 20-25 25-30

Frequency 10 12 16 14 8

Descriptive Statistics: Numerical Measures for
Grouped Data
 Measures of Location : Mean
Class x f fx
05-10 7.50 10 75
10-15 12.50 12 150
15-20 17.50 16 280
20-25 22.50 14 315
25-30 27.50 8 220
Total 60 1040

Mean = Σfx / Σf = 1040/60 ≈ 17.33
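
Both grouped-data examples follow the same recipe: take each class midpoint x, weight it by the class frequency f, and divide Σfx by Σf. A minimal Python sketch, with an illustrative function name:

```python
def grouped_mean(class_limits, frequencies):
    """Approximate mean of grouped data using class midpoints."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return total_fx / sum(frequencies)


# First example: classes 0-1 through 5-6.
first_mean = grouped_mean(
    [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)],
    [1, 4, 8, 6, 3, 1],
)

# Second example: classes 05-10 through 25-30.
second_mean = grouped_mean(
    [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30)],
    [10, 12, 16, 14, 8],
)

print(f"first example:  {first_mean:.2f}")
print(f"second example: {second_mean:.2f}")
```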


Percentiles

 A percentile provides information about how the
data are spread over the interval from the smallest
value to the largest value.
 Admission test scores for colleges and universities
are frequently reported in terms of percentiles.
 The pth percentile of a data set is a value such that at
least p percent of the items take on this value or less
and at least (100 - p) percent of the items take on this
value or more.


Arrange the data in ascending order.

Compute index i, the position of the pth percentile:

i = (p/100)n

If i is not an integer, round up. The pth percentile
is the value in the ith position.

If i is an integer, the pth percentile is the average
of the values in positions i and i + 1.

80th Percentile

 Example: Apartment Rents

i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.

80th Percentile

 Example: Apartment Rents

“At least 80% of the items take on a value of 542 or less.”
56/70 = .8 or 80%

“At least 20% of the items take on a value of 542 or more.”
14/70 = .2 or 20%

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
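
The index rule for the pth percentile can be sketched in Python as follows. The function name is illustrative, and p is assumed to be a whole number strictly between 0 and 100 so the index can be computed with integer arithmetic. Other software (including Python's own `statistics.quantiles`) may use different interpolation conventions and give slightly different answers.

```python
def percentile(sorted_data, p):
    """pth percentile using the rule i = (p/100)n on data in ascending order."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_data)
    whole, remainder = divmod(p * n, 100)   # i = whole + remainder/100
    if remainder:
        # i is not an integer: round up to position whole + 1 (1-based).
        return sorted_data[whole]
    # i is an integer: average the values in positions i and i + 1 (1-based).
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


# Apartment rents in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

p80 = percentile(rents, 80)   # i = 56, an integer: average positions 56 and 57
print(f"80th percentile = {p80}")
```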


Quartiles

 Quartiles are specific percentiles.
 First Quartile = 25th Percentile
 Second Quartile = 50th Percentile = Median
 Third Quartile = 75th Percentile

Third Quartile

 Example: Apartment Rents

Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, rounded up to 53
Third quartile = 525
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.

Measures of Variability

 It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
 For example, in choosing supplier A or supplier B we
might consider not only the average delivery time for
each, but also the variability in delivery time for each.

Measures of Variability

 Range
 Interquartile Range
 Variance
 Standard Deviation
 Coefficient of Variation


Range

 The range of a data set is the difference between the
largest and smallest data values.
 It is the simplest measure of variability.
 It is very sensitive to the smallest and largest data
values.


 Example: Apartment Rents

Range = largest value - smallest value
Range = 615 - 425 = 190
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.

Interquartile Range

 The interquartile range of a data set is the difference
between the third quartile and the first quartile.
 It is the range for the middle 50% of the data.
 It overcomes the sensitivity to extreme data values.

Interquartile Range

 Example: Apartment Rents

3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.
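
The range and the interquartile range for the apartment rents can be checked with the Python sketch below. It relies on the fact that, for n = 70, the quartile indexes 17.5 and 52.5 are not integers, so each is simply rounded up; a general routine would also need the averaging rule for integer indexes. Names are illustrative.

```python
import math

# Apartment rents in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]
n = len(rents)

# Range = largest value - smallest value.
data_range = rents[-1] - rents[0]

# Quartile positions: i = (p/100)n, rounded up because i is not an integer here.
q1 = rents[math.ceil(0.25 * n) - 1]   # 18th value (1-based)
q3 = rents[math.ceil(0.75 * n) - 1]   # 53rd value (1-based)
iqr = q3 - q1

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```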


Variance

The variance is a measure of variability that utilizes
all the data.

It is based on the difference between the value of
each observation ($x_i$) and the mean ($\bar{x}$ for a sample,
$\mu$ for a population).

The variance is useful in comparing the variability
of two or more variables.

The variance is the average of the squared
differences between each data value and the mean.

The variance is computed as follows:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{for a sample}$$

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{for a population}$$

Standard Deviation

The standard deviation of a data set is the positive
square root of the variance.

It is measured in the same units as the data, making
it more easily interpreted than the variance.

The standard deviation is computed as follows:

$$s = \sqrt{s^2} \quad \text{for a sample}$$

$$\sigma = \sqrt{\sigma^2} \quad \text{for a population}$$


Coefficient of Variation

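The coefficient of variation indicates how large the
standard deviation is in relation to the mean.

The coefficient of variation is computed as follows:

$$\left(\frac{s}{\bar{x}} \times 100\right)\% \quad \text{for a sample}$$

$$\left(\frac{\sigma}{\mu} \times 100\right)\% \quad \text{for a population}$$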

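Sample Variance, Standard Deviation,
and Coefficient of Variation
 Example: Apartment Rents
• Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$$

• Standard Deviation

$$s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74$$

• Coefficient of Variation

$$\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$$

The standard deviation is about 11% of the mean.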
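
The three sample measures for the apartment rents can be computed directly from their definitions. This Python sketch uses the sample formulas (divisor n - 1); variable names are illustrative.

```python
import math

# Apartment rents in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = sum(rents) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in rents) / (n - 1)

# Sample standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

# Coefficient of variation: standard deviation as a percentage of the mean.
coeff_var = std_dev / mean * 100

print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
print(f"coefficient of variation = {coeff_var:.2f}%")
```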

