
Quantitative Methods -1

Jighyasu Gaur
Statistics
•The term statistics can refer to numerical facts such as averages, medians, percentages, and index numbers that help us understand a variety of business and economic situations.

•Statistics can also refer to the art and science of collecting, analyzing, presenting, and interpreting data.

2
•Data
– Data are facts and figures collected, analyzed, and summarized for presentation and interpretation.

•Data Set
– All the data collected in a particular study are
referred to as the data set for the study.

3
What is Categorical data?

Slide4
Summarizing Categorical Data

 Frequency Distribution
 Relative Frequency Distribution
 Percent Frequency Distribution
 Bar Chart
 Pie Chart

Slide5
Frequency Distribution

A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.

The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Slide6
Frequency Distribution

 Example: Marada Inn


Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:

Below Average, Average, Above Average
Above Average, Above Average, Above Average
Above Average, Below Average, Below Average
Average, Poor, Poor
Above Average, Excellent, Above Average
Average, Above Average, Average
Above Average, Average

Slide7
Frequency Distribution

 Example: Marada Inn

Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
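
The tally above can be reproduced with a short script. This is a minimal sketch, not part of the original slides; the variable names are illustrative, and the ratings are typed in the order they appear on the previous slide.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn sample, in slide order.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Fixed class order (worst to best) so the output reads like the slide's table.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Counter counts how many times each rating occurs.
frequency = Counter(ratings)

for rating in classes:
    print(f"{rating:<15}{frequency[rating]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```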

Slide8
Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.

A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Slide9
Percent Frequency Distribution

The percent frequency of a class is the relative frequency multiplied by 100.

A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Slide10
Relative Frequency and
Percent Frequency Distributions
 Example: Marada Inn

Rating           Relative Frequency   Percent Frequency
Poor             .10                  10
Below Average    .15                  15
Average          .25                  25
Above Average    .45                  45
Excellent        .05                  5
Total            1.00                 100

For example, the relative frequency for Excellent is 1/20 = .05, and the percent frequency for Poor is .10(100) = 10.
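
As a rough illustration (not from the original slides), the two distributions can be derived from the frequency counts as follows. The dictionary and variable names are assumptions made for this sketch.

```python
# Frequencies taken from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # total number of data items (20)

# Relative frequency = class frequency / n.
relative = {rating: f / n for rating, f in frequency.items()}
# Percent frequency = relative frequency * 100.
percent = {rating: 100 * r for rating, r in relative.items()}

for rating in frequency:
    print(f"{rating:<15}{relative[rating]:>6.2f}{percent[rating]:>6.0f}")
```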
Slide11
Bar Chart

• A bar chart is a graphical device for depicting qualitative data.
• On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
• A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
• Using a bar of fixed width drawn above each class label, we extend the height appropriately.
• The bars are separated to emphasize the fact that each class is a separate category.

Slide12
Bar Chart

[Figure: bar chart "Marada Inn Quality Ratings". Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency, scaled from 1 to 10.]

Slide13
Pareto Diagram

• In quality control, bar charts are used to identify the most important causes of problems.
• When the bars are arranged in descending order of height from left to right (with the most frequently occurring cause appearing first), the bar chart is called a Pareto diagram.
• This diagram is named for its founder, Vilfredo Pareto, an Italian economist.

Slide14
Pie Chart

• The pie chart is a commonly used graphical device for presenting relative frequency and percent frequency distributions for categorical data.
• First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
• Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
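
The sector arithmetic can be sketched in a few lines. This example is an illustration only; it assumes the Marada Inn relative frequencies from the earlier slides, and the names are hypothetical.

```python
# Relative frequencies from the Marada Inn relative frequency distribution.
relative = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each class takes (relative frequency * 360) degrees of the circle.
degrees = {rating: r * 360 for rating, r in relative.items()}

for rating, angle in degrees.items():
    print(f"{rating:<15}{angle:>6.1f} degrees")
```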

Slide15
Pie Chart

[Figure: pie chart "Marada Inn Quality Ratings". Sectors: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]

Slide16
Example: Marada Inn

 Insights Gained from the Preceding Pie Chart


• One-half of the customers surveyed gave Marada
a quality rating of “above average” or “excellent”
(looking at the left side of the pie). This might
please the manager.
• For each customer who gave an “excellent” rating,
there were two customers who gave a “poor”
rating (looking at the top of the pie). This should
displease the manager.

Slide17
Summarizing Quantitative Data

 Frequency Distribution
 Relative Frequency and
Percent Frequency Distributions
 Dot Plot
 Histogram
 Cumulative Distributions
 Ogive

Slide18
Frequency Distribution

 Example: Hudson Auto Repair


The manager of Hudson Auto would like to gain a
better understanding of the cost of parts used in the
engine tune-ups performed in the shop. She examines
50 customer invoices for tune-ups. The costs of parts,
rounded to the nearest dollar, are listed on the next
slide.

Slide19
Frequency Distribution

 Example: Hudson Auto Repair


Sample of Parts Cost($) for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73

Slide20
Frequency Distribution

The three steps necessary to define the classes for a frequency distribution with quantitative data are:
1. Determine the number of non-overlapping classes.
2. Determine the width of each class.
3. Determine the class limits.

Slide21
Frequency Distribution

 Guidelines for Determining the Number of Classes


• Use between 5 and 20 classes.
• Data sets with a larger number of elements
usually require a larger number of classes.
• Smaller data sets usually require fewer classes.

The goal is to use enough classes to show the variation in the data, but not so many classes that some contain only a few data items.

Slide22
Frequency Distribution

 Guidelines for Determining the Width of Each Class


• Use classes of equal width.
• $\text{Approximate Class Width} = \dfrac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$

Making the classes the same width reduces the chance of inappropriate interpretations.

Slide23
Frequency Distribution

 Note on Number of Classes and Class Width


• In practice, the number of classes and the
appropriate class width are determined by trial
and error.
• Once a possible number of classes is chosen, the
appropriate class width is found.
• The process can be repeated for a different
number of classes.
• Ultimately, the analyst uses judgment to
determine the combination of the number of
classes and class width that provides the best
frequency distribution for summarizing the data.

Slide24
Frequency Distribution

 Guidelines for Determining the Class Limits


• Class limits must be chosen so that each data
item belongs to one and only one class.
• The lower class limit identifies the smallest
possible data value assigned to the class.
• The upper class limit identifies the largest
possible data value assigned to the class.
• The appropriate values for the class limits
depend on the level of accuracy of the data.

An open-end class requires only a lower class limit or an upper class limit.

Slide25
Frequency Distribution

 Example: Hudson Auto Repair


If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
Parts Cost ($) Frequency
50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
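
The three steps (number of classes, class width, class limits) can be sketched for the Hudson data as follows. This is an illustrative sketch rather than the slides' own method: the starting lower limit of 50 is taken from the table above, and rounding the width up with `math.ceil` is an assumption that happens to match 9.5 ≈ 10.

```python
import math

# Parts cost ($) for the 50 tune-ups, as listed on the data slide.
parts_cost = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
# Approximate class width = (largest - smallest) / number of classes.
approx_width = (max(parts_cost) - min(parts_cost)) / num_classes
width = math.ceil(approx_width)  # assumption: round up to a convenient width

# Assumption: the first class starts at 50, as in the slide's table.
first_lower_limit = 50

# Each cost falls in exactly one class: 50-59, 60-69, ..., 100-109.
counts = [0] * num_classes
for cost in parts_cost:
    counts[(cost - first_lower_limit) // width] += 1

for k, count in enumerate(counts):
    lower = first_lower_limit + k * width
    print(f"{lower}-{lower + width - 1}: {count}")
```

Trying a different `num_classes` and rerunning is one way to carry out the trial-and-error process described earlier.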

Slide26
Relative Frequency and
Percent Frequency Distributions
 Example: Hudson Auto Repair

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59            .04                  4
60-69            .26                  26
70-79            .32                  32
80-89            .14                  14
90-99            .14                  14
100-109          .10                  10
Total            1.00                 100

For the 50-59 class, the relative frequency is 2/50 = .04 and the percent frequency is .04(100) = 4. Percent frequency is the relative frequency multiplied by 100.

Slide27
Relative Frequency and
Percent Frequency Distributions
 Example: Hudson Auto Repair
Insights Gained from the % Frequency Distribution:
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third)
of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.

Slide28
Histogram

• Another common graphical presentation of quantitative data is a histogram.
• The variable of interest is placed on the horizontal axis.
• A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
• Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Slide29
Histogram

 Example: Hudson Auto Repair


[Figure: histogram "Tune-up Parts Cost". Horizontal axis: Parts Cost ($), with classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109. Vertical axis: Frequency, scaled from 2 to 18.]

Slide30
Histograms Showing Skewness

 Symmetric
• Left tail is the mirror image of the right tail
• Examples: heights and weights of people
[Figure: symmetric histogram. Vertical axis: Relative Frequency, 0 to .35.]

Slide31
Histograms Showing Skewness

 Moderately Skewed Left


• A longer tail to the left
• Example: exam scores
[Figure: histogram moderately skewed left. Vertical axis: Relative Frequency, 0 to .35.]

Slide32
Histograms Showing Skewness

 Moderately Right Skewed


• A longer tail to the right
• Example: housing values
[Figure: histogram moderately skewed right. Vertical axis: Relative Frequency, 0 to .35.]

Slide33
Histograms Showing Skewness

 Highly Skewed Right


• A very long tail to the right
• Example: executive salaries
[Figure: histogram highly skewed right. Vertical axis: Relative Frequency, 0 to .35.]

Slide34
Cumulative Distributions

Cumulative frequency distribution – shows the number of items with values less than or equal to the upper limit of each class.

Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to the upper limit of each class.

Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to the upper limit of each class.

Slide35
Cumulative Distributions

• The last entry in a cumulative frequency distribution always equals the total number of observations.
• The last entry in a cumulative relative frequency distribution always equals 1.00.
• The last entry in a cumulative percent frequency distribution always equals 100.

Slide36
Cumulative Distributions

 Hudson Auto Repair

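Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59       2                      .04                             4
≤ 69       15                     .30                             30
≤ 79       31                     .62                             62
≤ 89       38                     .76                             76
≤ 99       45                     .90                             90
≤ 109      50                     1.00                            100

For the ≤ 69 row, the cumulative frequency is 2 + 13 = 15, the cumulative relative frequency is 15/50 = .30, and the cumulative percent frequency is .30(100) = 30.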
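
A running total is all that is needed to build these columns. The sketch below is an illustration, not part of the slides; it assumes the Hudson class frequencies from the earlier frequency distribution and uses hypothetical variable names.

```python
from itertools import accumulate

# Upper class limits and class frequencies from the Hudson Auto Repair table.
upper_limits = [59, 69, 79, 89, 99, 109]
frequency = [2, 13, 16, 7, 7, 5]

n = sum(frequency)  # 50 observations

# Running totals give the cumulative frequency for each upper class limit.
cumulative = list(accumulate(frequency))
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * r for r in cumulative_relative]

for limit, c, r, p in zip(upper_limits, cumulative,
                          cumulative_relative, cumulative_percent):
    print(f"<= {limit:<4}{c:>4}{r:>7.2f}{p:>6.0f}")
```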

Slide37
Descriptive Statistics

Slide38
Descriptive Statistics

Most of the statistical information in newspapers, magazines, company reports, and other publications consists of data that are summarized and presented in a form that is easy to understand.

Such summaries of data, which may be tabular, graphical, or numerical, are referred to as descriptive statistics.

Slide39
Descriptive Statistics: Numerical Measures

 Measures of Location
 Measures of Variability

Slide40
Measures of Location

• Mean
• Median
• Mode
• Percentiles
• Quartiles

If the measures are computed for data from a sample, they are called sample statistics.

If the measures are computed for data from a population, they are called population parameters.

A sample statistic is referred to as the point estimator of the corresponding population parameter.

Slide41
Mean

• Perhaps the most important measure of location is the mean.
• The mean provides a measure of central location.
• The mean of a data set is the average of all the data values.
• The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

Slide42
Sample Mean

$\bar{x} = \dfrac{\sum x_i}{n}$

where $\sum x_i$ is the sum of the values of the n observations and n is the number of observations in the sample.

Slide43
Population Mean $\mu$

$\mu = \dfrac{\sum x_i}{N}$

where $\sum x_i$ is the sum of the values of the N observations and N is the number of observations in the population.

Slide44
Sample Mean

 Example: Apartment Rents


Seventy efficiency apartments were randomly
sampled in a small college town. The monthly rent
prices for these apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440

Slide45
Sample Mean

 Example: Apartment Rents

For the 70 rents listed on the previous slide:

$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{34{,}356}{70} = 490.80$
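
The sample mean for this example can be checked with a few lines of code. This is a minimal sketch added for illustration; the list simply retypes the 70 rents in the order shown on the slide.

```python
# Monthly rents ($) for the 70 sampled efficiency apartments.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

n = len(rents)         # number of observations in the sample
total = sum(rents)     # sum of the values of the n observations
sample_mean = total / n

print(n, total, round(sample_mean, 2))
```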

Slide46
Median

• The median of a data set is the value in the middle when the data items are arranged in ascending order.
• Whenever a data set has extreme values, the median is the preferred measure of central location.
• The median is the measure of location most often reported for annual income and property value data.
• A few extremely large incomes or property values can inflate the mean.

Slide47
Median

 For an odd number of observations:

7 observations: 26 18 27 12 14 27 19

In ascending order: 12 14 18 19 26 27 27

The median is the middle value.

Median = 19

Slide48
Median

 For an even number of observations:

8 observations: 26 18 27 12 14 27 30 19

In ascending order: 12 14 18 19 26 27 27 30

The median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
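
Both cases can be checked with Python's standard `statistics` module, which sorts the data and applies the same odd/even rule. This is an illustrative sketch; the variable names are not from the slides.

```python
import statistics

# The 7-observation and 8-observation examples from the Median slides.
odd_sample = [26, 18, 27, 12, 14, 27, 19]
even_sample = [26, 18, 27, 12, 14, 27, 30, 19]

# Odd count: the middle value of the sorted data.
median_odd = statistics.median(odd_sample)
# Even count: the average of the two middle values.
median_even = statistics.median(even_sample)

print(sorted(odd_sample), median_odd)
print(sorted(even_sample), median_even)
```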

Slide49
Median

 Example: Apartment Rents


Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Slide50
Trimmed Mean

• Another measure, sometimes used when extreme values are present, is the trimmed mean.
• It is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
• For example, the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
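
One possible way to compute a trimmed mean is sketched below. The function name, the choice to round a fractional trim count down, and the 20-value data set are all assumptions made for illustration; none of them comes from the slides, and other rounding conventions exist.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` % of the values from each end."""
    ordered = sorted(values)
    # Assumption: a fractional number of values to trim is rounded down.
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data (not from the slides): 20 values with one extreme value, 95.
values = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20,
          20, 21, 22, 22, 23, 24, 25, 26, 27, 95]

ordinary_mean = sum(values) / len(values)
five_percent_trimmed = trimmed_mean(values, 5)  # drops 12 and 95

print(round(ordinary_mean, 2), round(five_percent_trimmed, 2))
```

With 20 values, 5% corresponds to one value at each end, so the extreme value 95 no longer pulls the result upward.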

Slide51
Mode

• The mode of a data set is the value that occurs with greatest frequency.
• The greatest frequency can occur at two or more different values.
• If the data have exactly two modes, the data are bimodal.
• If the data have more than two modes, the data are multimodal.
• Caution: If the data are bimodal or multimodal, Excel’s MODE function will incorrectly identify a single mode.
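
As an aside to the Excel caution, Python's standard library offers `statistics.multimode` (available from Python 3.8), which returns every value tied for the greatest frequency. The data below are hypothetical and chosen only to show a bimodal case.

```python
import statistics

# Hypothetical data (not from the slides): 445 and 450 each occur twice.
bimodal = [440, 445, 445, 450, 450, 460]

# multimode returns all values that share the greatest frequency,
# in the order they first appear in the data.
modes = statistics.multimode(bimodal)

print(modes)
```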

Slide52
Mode

 Example: Apartment Rents


450 occurred most frequently (7 times)
Mode = 450
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Slide53
Descriptive Statistics: Numerical Measures for
Grouped Data

 Measures of Location : Mean

Class 0-1 1-2 2-3 3-4 4-5 5-6


Frequency 1 4 8 6 3 1

Slide54
Descriptive Statistics: Numerical Measures for
Grouped Data
 Measures of Location : Mean
Class   Midpoint x   Frequency f   fx
0-1     0.5          1             0.5
1-2     1.5          4             6
2-3     2.5          8             20
3-4     3.5          6             21
4-5     4.5          3             13.5
5-6     5.5          1             5.5
Total                23            66.5

Mean = $\sum fx / \sum f$ = 66.5/23 ≈ 2.9


Slide55
Descriptive Statistics: Numerical Measures for
Grouped Data

 Measures of Location : Mean

 Find mean for the following data set

Class 05-10 10-15 15-20 20-25 25-30

Frequency 10 12 16 14 8

Slide56
Descriptive Statistics: Numerical Measures for
Grouped Data
 Measures of Location : Mean
Class   Midpoint x   Frequency f   fx
05-10   7.50         10            75
10-15   12.50        12            150
15-20   17.50        16            280
20-25   22.50        14            315
25-30   27.50        8             220
Total                60            1040

Mean = $\sum fx / \sum f$ = 1040/60 ≈ 17.33
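
The grouped-data calculation can be written out as a short sketch. It assumes, as the table does, that each class is represented by its midpoint; the variable names are illustrative.

```python
# Class limits and frequencies from the exercise above.
classes = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30)]
frequency = [10, 12, 16, 14, 8]

# Each class is represented by its midpoint x.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Grouped mean = sum(f * x) / sum(f).
sum_fx = sum(f * x for f, x in zip(frequency, midpoints))
grouped_mean = sum_fx / sum(frequency)

print(midpoints, sum_fx, round(grouped_mean, 2))
```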


Slide57
Percentiles

• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
• Admission test scores for colleges and universities are frequently reported in terms of percentiles.
• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

Slide58
Percentiles

1. Arrange the data in ascending order.
2. Compute index i, the position of the pth percentile: $i = (p/100)n$
3. If i is not an integer, round up. The pth percentile is the value in the ith position.
4. If i is an integer, the pth percentile is the average of the values in positions i and i + 1.
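
The procedure above can be sketched as a small function. This is an illustration under stated assumptions: the function name is hypothetical, p is taken to be a whole number between 1 and 99, and the index is kept in integer arithmetic to avoid floating-point surprises when testing whether i is an integer. Library percentile routines often interpolate instead and may give slightly different answers.

```python
def percentile(data, p):
    """p-th percentile by the round-up / average rule (integer p, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(data)                 # arrange in ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)  # i = (p/100)n, split into parts
    if remainder:
        # i is not an integer: round up to position whole + 1 (0-based index `whole`).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2

# The 70 apartment rents ($) from the Apartment Rents example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(percentile(rents, 80))  # i = 56 is an integer: average of 56th and 57th values
print(percentile(rents, 75))  # i = 52.5 rounds up to the 53rd value
```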

Slide59
80th Percentile

 Example: Apartment Rents


i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.

Slide60
80th Percentile

 Example: Apartment Rents


• “At least 80% of the items take on a value of 542 or less.” 56/70 = .8 or 80%
• “At least 20% of the items take on a value of 542 or more.” 14/70 = .2 or 20%
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Slide61
Quartiles

 Quartiles are specific percentiles.


 First Quartile = 25th Percentile
 Second Quartile = 50th Percentile = Median
 Third Quartile = 75th Percentile

Slide62
Third Quartile

 Example: Apartment Rents


Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = 525
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.

Slide63
Measures of Variability

• It is often desirable to consider measures of variability (dispersion), as well as measures of location.
• For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Slide64
Measures of Variability

 Range
 Interquartile Range
 Variance
 Standard Deviation
 Coefficient of Variation

Slide65
Range

• The range of a data set is the difference between the largest and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.

Slide66
Range

 Example: Apartment Rents


Range = largest value - smallest value
Range = 615 - 425 = 190
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.

Slide67
Interquartile Range

• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

Slide68
Interquartile Range

 Example: Apartment Rents


3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Slide69
Variance

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

The variance is useful in comparing the variability of two or more variables.

Slide70
Variance

The variance is the average of the squared differences between each data value and the mean.

The variance is computed as follows:

For a sample: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$

For a population: $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$

Slide71
Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Slide72
Standard Deviation

The standard deviation is computed as follows:

For a sample: $s = \sqrt{s^2}$

For a population: $\sigma = \sqrt{\sigma^2}$

Example

Slide73
Coefficient of Variation

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

The coefficient of variation is computed as follows:

For a sample: $\left(\dfrac{s}{\bar{x}} \times 100\right)\%$

For a population: $\left(\dfrac{\sigma}{\mu} \times 100\right)\%$

Slide74
Sample Variance, Standard Deviation,
And Coefficient of Variation
 Example: Apartment Rents
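• Variance: $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$

• Standard Deviation: $s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74$

• Coefficient of Variation: $\left(\dfrac{s}{\bar{x}} \times 100\right)\% = \left(\dfrac{54.74}{490.80} \times 100\right)\% = 11.15\%$

The standard deviation is about 11% of the mean.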
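
These three measures can be reproduced with Python's standard `statistics` module. The sketch below is an illustration only; it treats the 70 rents as a sample, so `variance` and `stdev` divide by n - 1 (the population versions would be `pvariance` and `pstdev`).

```python
import statistics

# The 70 apartment rents ($) from the Apartment Rents example, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)
variance = statistics.variance(rents)  # sample variance: divides by n - 1
std_dev = statistics.stdev(rents)      # positive square root of the sample variance
cv = std_dev / mean * 100              # coefficient of variation, in percent

print(round(variance, 2), round(std_dev, 2), round(cv, 2))
```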

Slide75
