
Session 3

Descriptive Analysis: Numerical Measures

Describing Data-II

• Measures of Location
• Measures of Variability

Measures of Location

• Mean
• Weighted Mean
• Median
• Geometric Mean
• Mode
• Percentiles
• Quartiles

If the measures are computed for data from a sample, they are called sample statistics.

If the measures are computed for data from a population, they are called population parameters.

A sample statistic is referred to as the point estimator of the corresponding population parameter.

Mean

• Perhaps the most important measure of location is the mean.
• The mean provides a measure of central location.
• The mean of a data set is the average of all the data values.
• The sample mean \(\bar{x}\) is the point estimator of the population mean \(\mu\).

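Sample Mean \(\bar{x}\)

\[ \bar{x} = \frac{\sum x_i}{n} \]

where \(\sum x_i\) is the sum of the values of the n observations and n is the number of observations in the sample.

Population Mean \(\mu\)

\[ \mu = \frac{\sum x_i}{N} \]

where \(\sum x_i\) is the sum of the values of the N observations and N is the number of observations in the population.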

Sample Mean

• Example: Apartment Rents

Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed below.

445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80 \]
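As a quick check of the sample mean above, here is a minimal Python sketch. The variable names (`rents`, `sample_mean`) are illustrative and not part of the original slides; the data are the 70 rents as listed.

```python
# Monthly rents for the 70 sampled apartments, as listed on the slide.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

n = len(rents)            # number of observations in the sample
total = sum(rents)        # sum of the values of the n observations
sample_mean = total / n   # x-bar = (sum of x_i) / n

print(n, total, round(sample_mean, 2))  # 70 34356 490.8
```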

Weighted Mean

• In some instances the mean is computed by giving each observation a weight that reflects its relative importance.
• The choice of weights depends on the application.
• In other weighted mean computations, quantities such as pounds, dollars, or volume are frequently used as weights.

Weighted Mean

\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]

where:
x_i = value of observation i
w_i = weight for observation i

The numerator is the sum of the weighted data values; the denominator is the sum of the weights. If the data are from a population, \(\mu\) replaces \(\bar{x}\).

Example: Weighted Mean

Course Name                                  Credits  Grade  Point
Organizational Behaviour and Management-I    3.0      A      9
Micro Economics for Managers                 3.0      A+     10
DAM-I                                        3.0      A-     8
Financial Accounting and Reporting           3.0      B+     7
Marketing Management-I                       3.0      B+     7
Workshop on Reading and Thinking Skills      3.0      A      9
Communication Skills-I                       3.0      A      9
Management and Productivity Tools            3.0      C+     4
Comprehensive Viva-Voce                      2.0      B+     7
Qualifying Mathematics                       NC       PASS   -
Total                                        26

GPA = [(9*3)+(10*3)+(8*3)+(7*3)+(7*3)+(9*3)+(9*3)+(4*3)+(7*2)]/26 = 203/26 = 7.81

Weighted Mean

• Example: Construction Wages

Ron Butler, a home builder, is looking over the expenses he incurred for a house he just built. For the purpose of pricing future projects, he would like to know the average wage ($/hour) he paid the workers he employed. Listed below are the categories of worker he employed, along with their respective wage and total hours worked.

Worker        Wage ($/hr)   Total Hours
Carpenter     21.60         520
Electrician   28.72         230
Laborer       11.80         410
Painter       19.75         270
Plumber       24.16         160
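Both weighted-mean examples can be sketched in a few lines of Python. The helper name `weighted_mean` and the variable names are assumptions made for illustration; the numbers are taken from the two tables above.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# GPA example: grade points weighted by course credits.
grade_points = [9, 10, 8, 7, 7, 9, 9, 4, 7]
credit_hours = [3, 3, 3, 3, 3, 3, 3, 3, 2]
gpa = weighted_mean(grade_points, credit_hours)

# Construction wages: hourly wage weighted by total hours worked.
wages = [21.60, 28.72, 11.80, 19.75, 24.16]
hours = [520, 230, 410, 270, 160]
avg_wage = weighted_mean(wages, hours)

# Equally weighted (simple) mean of the five wage rates, for comparison.
simple_mean = sum(wages) / len(wages)

print(round(gpa, 2), round(avg_wage, 2), round(simple_mean, 2))  # 7.81 20.05 21.21
```

The weighted wage is lower than the simple mean because the lowest-paid category (laborer) accounts for a large share of the hours.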

Weighted Mean

• Example: Construction Wages

Worker        x_i      w_i     w_i x_i
Carpenter     21.60    520     11232.0
Electrician   28.72    230      6605.6
Laborer       11.80    410      4838.0
Painter       19.75    270      5332.5
Plumber       24.16    160      3865.6
Total                  1590    31873.7

\[ \mu = \frac{\sum w_i x_i}{\sum w_i} = \frac{31873.7}{1590} = 20.0464 \approx \$20.05 \]

FYI, equally-weighted (simple) mean = $21.21

Median

• The median of a data set is the value in the middle when the data items are arranged in ascending order.
• Whenever a data set has extreme values, the median is the preferred measure of central location.
• The median is the measure of location most often reported for annual income and property value data.
• A few extremely large incomes or property values can inflate the mean.

Median

• For an odd number of observations:

26 18 27 12 14 27 19    (7 observations)
12 14 18 19 26 27 27    (in ascending order)

The median is the middle value.

Median = 19

Median

• For an even number of observations:

26 18 27 12 14 27 30 19    (8 observations)
12 14 18 19 26 27 27 30    (in ascending order)

The median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
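The odd and even cases can be sketched as one small Python function. The function name `median` is illustrative; the two data sets are the ones shown on the slides.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the middle two

odd_data = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_data = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations

print(median(odd_data), median(even_data))  # 19 22.5
```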

Median

• Example: Apartment Rents

Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Note: Data is in ascending order.

Trimmed Mean

• Another measure, sometimes used when extreme values are present, is the trimmed mean.
• It is obtained by deleting a percentage of the smallest and largest values from a data set and then computing the mean of the remaining values.
• For example, the 5% trimmed mean is obtained by removing the smallest 5% and the largest 5% of the data values and then computing the mean of the remaining values.
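A minimal sketch of the trimmed mean in Python follows. The slides do not say what to do when the percentage does not give a whole number of observations (5% of 70 is 3.5), so this hypothetical helper truncates the count toward zero; that rule is an assumption, and other conventions exist.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent`% of the values from each end of the sorted data.

    Assumption: the number dropped per end is truncated to a whole number.
    """
    ordered = sorted(data)
    k = int(len(ordered) * percent / 100)   # observations removed from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Apartment rents from the slides (sorted).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# 10% of 70 is exactly 7 observations per end, so no rounding rule is needed.
print(trimmed_mean(rents, 10))  # 484.125
```

With `percent=5` the same helper drops 3 values from each end under the truncation assumption.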

Geometric Mean

• The geometric mean is calculated by finding the nth root of the product of n values.
• It is often used in analyzing growth rates in financial data (where using the arithmetic mean will provide misleading results).
• It should be applied anytime you want to determine the mean rate of change over several successive periods (be it years, quarters, weeks, . . .).
• Other common applications include changes in populations of species, crop yields, pollution levels, and birth and death rates.

Geometric Mean

\[ \bar{x}_g = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = [(x_1)(x_2)\cdots(x_n)]^{1/n} \]

Geometric Mean

• Example: Avg. Rate of Return

Period   Return (%)   Growth Factor
1        -6.0         0.940
2        -8.0         0.920
3        -4.0         0.960
4         2.0         1.020
5         5.4         1.054

Please note that 1 is already added here: each growth factor equals 1 + (return/100).

\[ \bar{x}_g = \sqrt[5]{(.94)(.92)(.96)(1.02)(1.054)} = [.89254]^{1/5} = .97752 \]

Average growth rate per period in % is (.97752 - 1)(100) = -2.248%

Mode

• The mode of a data set is the value that occurs with greatest frequency.
• The greatest frequency can occur at two or more different values.
• If the data have exactly two modes, the data are bimodal.
• If the data have more than two modes, the data are multimodal.
• Caution: If the data are bimodal or multimodal, Excel’s MODE function will incorrectly identify a single mode.
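The average-rate-of-return example for the geometric mean can be reproduced with a short Python sketch. Variable names are illustrative; `math.prod` requires Python 3.8 or later.

```python
import math

# Percentage returns for the five periods, from the slide.
returns_pct = [-6.0, -8.0, -4.0, 2.0, 5.4]

# Growth factor = 1 + return/100 (the "1 already added" noted on the slide).
growth_factors = [1 + r / 100 for r in returns_pct]

product = math.prod(growth_factors)
geo_mean = product ** (1 / len(growth_factors))   # nth root of the product
avg_rate_pct = (geo_mean - 1) * 100               # mean rate of change per period

print(round(product, 5), round(geo_mean, 5), round(avg_rate_pct, 3))
```

The printed values should agree with the slide: a product of about .89254, a geometric mean of about .97752, and an average rate of about -2.248% per period.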

Mode

• Example: Apartment Rents

In the apartment rent data, 450 occurred most frequently (7 times).

Mode = 450

Percentiles

• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
• Admission test scores for colleges and universities are frequently reported in terms of percentiles.
• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.
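To illustrate the mode, including the bimodal case the slides warn about, here is a small Python sketch that returns every value tied for the greatest frequency. The helper name `modes` and the small bimodal list are hypothetical illustrations, not from the slides.

```python
from collections import Counter

def modes(data):
    """Return (sorted list of all values with the greatest frequency, that frequency)."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

# Apartment rents from the slides (sorted).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

rent_modes, frequency = modes(rents)
print(rent_modes, frequency)       # [450] 7

# Made-up bimodal data: both 1 and 2 occur twice.
print(modes([1, 1, 2, 2, 3])[0])   # [1, 2]
```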

Percentiles

Arrange the data in ascending order.

Compute index i, the position of the pth percentile.

i = (p/100)n

If i is not an integer, round up. The pth percentile is the value in the ith position.

If i is an integer, the pth percentile is the average of the values in positions i and i+1.

80th Percentile

• Example: Apartment Rents

i = (p/100)n = (80/100)70 = 56

Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
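The percentile procedure above can be sketched in Python as follows. The function name `percentile` is illustrative, and the sketch assumes p is a whole number strictly between 0 and 100; integer arithmetic is used so that the "is i an integer?" test is not affected by floating-point rounding. Note that library routines such as those in spreadsheets or NumPy often use different interpolation rules and may give slightly different answers.

```python
def percentile(data, p):
    """pth percentile using the slide's rule (assumes integer p with 0 < p < 100)."""
    ordered = sorted(data)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)   # i = (p/100) * n, kept as exact integers
    if remainder != 0:
        position = whole + 1                # i is not an integer: round up
        return ordered[position - 1]        # value in the ith position (1-based)
    # i is an integer: average the values in positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

# Apartment rents from the slides (sorted).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(percentile(rents, 80))  # 542.0
```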

80th Percentile

• Example: Apartment Rents

“At least 80% of the items take on a value of 542 or less.”
56/70 = .8 or 80%

“At least 20% of the items take on a value of 542 or more.”
14/70 = .2 or 20%

Quartiles

• Quartiles are specific percentiles.
• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile

Third Quartile

• Example: Apartment Rents

Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = value in the 53rd position of the sorted data = 525

Measures of Variability

• It is often desirable to consider measures of variability (dispersion), as well as measures of location.
• For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Measures of Variability

• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation

Range

• The range of a data set is the difference between the largest and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.

Range

• Example: Apartment Rents

Range = largest value - smallest value
Range = 615 - 425 = 190

Interquartile Range

• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

Interquartile Range

• Example: Apartment Rents

3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

Variance

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation (x_i) and the mean (\(\bar{x}\) for a sample, \(\mu\) for a population).

The variance is useful in comparing the variability of two or more variables.
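The range and interquartile range for the apartment rents can be reproduced with the sketch below. The `percentile` helper is a hypothetical implementation of the slides' position rule (round up when i is not an integer, average positions i and i+1 when it is), assuming a whole-number p between 0 and 100.

```python
def percentile(data, p):
    """pth percentile using the slide's rule (assumes integer p with 0 < p < 100)."""
    ordered = sorted(data)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder != 0:
        return ordered[whole]                        # round up: 1-based position whole + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

# Apartment rents from the slides (sorted).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

data_range = max(rents) - min(rents)   # largest value - smallest value
q1 = percentile(rents, 25)             # first quartile
q3 = percentile(rents, 75)             # third quartile
iqr = q3 - q1                          # range of the middle 50% of the data

print(data_range, q1, q3, iqr)  # 190 445 525 80
```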

Variance

The variance is the average of the squared differences between each data value and the mean.

The variance is computed as follows:

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{for a sample} \]

\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{for a population} \]

Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Standard Deviation

The standard deviation is computed as follows:

\[ s = \sqrt{s^2} \quad \text{for a sample} \]

\[ \sigma = \sqrt{\sigma^2} \quad \text{for a population} \]

Example

• Calculate the variance of the following data:
8, 4, 9, 11, 3
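One way to work the exercise above is sketched below in Python. The slide does not say whether the five values are a sample or a population, so both variances are computed; variable names are illustrative.

```python
import math

data = [8, 4, 9, 11, 3]
n = len(data)

mean = sum(data) / n                              # 35 / 5 = 7.0
squared_devs = [(x - mean) ** 2 for x in data]    # [1.0, 9.0, 4.0, 16.0, 16.0]

sample_variance = sum(squared_devs) / (n - 1)     # divide by n - 1 for a sample
population_variance = sum(squared_devs) / n       # divide by N for a population
sample_std_dev = math.sqrt(sample_variance)

print(sample_variance, population_variance, round(sample_std_dev, 4))  # 11.5 9.2 3.3912
```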

Coefficient of Variation
(Measures the relative variation of the data from its mean)

The coefficient of variation indicates how large the standard deviation is in relation to the mean.

The coefficient of variation is computed as follows:

\[ \left( \frac{s}{\bar{x}} \times 100 \right)\% \quad \text{for a sample} \]

\[ \left( \frac{\sigma}{\mu} \times 100 \right)\% \quad \text{for a population} \]

Note: The coefficient of variation is the measure of precision and consistency to use when comparing two different populations. When comparing the variability of two groups from the same population, the standard deviation (or variance) is the appropriate tool.

Sample Variance, Standard Deviation, and Coefficient of Variation

• Example: Apartment Rents

• Variance

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16 \]

• Standard Deviation

\[ s = \sqrt{s^2} = \sqrt{2996.16} = 54.74 \]

• Coefficient of Variation

\[ \left( \frac{s}{\bar{x}} \times 100 \right)\% = \left( \frac{54.74}{490.80} \times 100 \right)\% = 11.15\% \]

The standard deviation is about 11% of the mean.
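The three apartment-rent results above can be checked with the following Python sketch, which applies the sample formulas directly. Variable names are illustrative.

```python
import math

# Apartment rents from the slides (sorted).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = sum(rents) / n
variance = sum((x - mean) ** 2 for x in rents) / (n - 1)   # sample variance s^2
std_dev = math.sqrt(variance)                              # sample standard deviation s
coeff_var = std_dev / mean * 100                           # coefficient of variation, in %

print(round(variance, 2), round(std_dev, 2), round(coeff_var, 2))  # 2996.16 54.74 11.15
```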

Measures of Relative Standing: Box Plots

A box plot graphs five statistical measures simultaneously:
• Minimum observation
• Maximum observation
• 1st Quartile
• 2nd Quartile or Median
• 3rd Quartile
It also shows the whiskers and any outliers.

Five-Number Summaries and Box Plots

Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.

Two tools that accomplish this are five-number summaries and box plots.

Five-Number Summary

1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value

Five-Number Summary

• Example: Apartment Rents

Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

Box Plot

A box plot is a graphical summary of data that is based on a five-number summary.

A key to the development of a box plot is the computation of the median and the quartiles Q1 and Q3.

Box plots provide another way to identify outliers.

Box Plot

• Example: Apartment Rents

• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).

[Figure: box plot on an axis from 400 to 625, with Q1 = 445, Q2 = 475, Q3 = 525]

Box Plot

• Limits are located (not drawn) using the interquartile range (IQR).
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol * .

Box Plot

• Example: Apartment Rents

• The lower limit is located 1.5(IQR) below Q1.

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

• The upper limit is located 1.5(IQR) above Q3.

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.

• Why the 1.5 x IQR rule?
• Ask John Tukey
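The limit calculation can be sketched in Python using the five-number summary values already given in the slides. Because an outlier must lie below the lower limit or above the upper limit, checking the smallest and largest values is enough to tell whether any outliers exist. Variable names are illustrative.

```python
# Values from the five-number summary of the apartment rents.
smallest, q1, q3, largest = 425, 445, 525, 615

iqr = q3 - q1                       # interquartile range
lower_limit = q1 - 1.5 * iqr        # 1.5(IQR) below Q1
upper_limit = q3 + 1.5 * iqr        # 1.5(IQR) above Q3

# If the extremes are inside the limits, every other value is too.
has_outliers = smallest < lower_limit or largest > upper_limit

print(iqr, lower_limit, upper_limit, has_outliers)  # 80 325.0 645.0 False
```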

Box Plot

• Example: Apartment Rents

• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Figure: box plot on an axis from 400 to 625; smallest value inside limits = 425, largest value inside limits = 615]

Example: Box Plot Applications

Indian Railway Catering Service:

A large number of restaurants with drive-through windows are registering on the IRCTC app-based interface to provide train passengers the advantages of quick service. To measure how good the service is, Indian Railways planned a study in which the amount of time taken by a sample of drive-through customers of each of five restaurants was recorded. Compare the five sets of data using a box plot and interpret the results.

Information from Box Plot

SubWay's times appear to be the lowest and most consistent. The service times for Sarvana Bhavan display considerably more variability. The slowest service times are provided by Anand Bhavan. The service times for KFC, SubWay, and Anand Bhavan seem to be symmetric. However, the times for McDonald’s and Sarvana Bhavan are positively skewed.

Measures of Association Between Two Variables

Thus far we have examined numerical methods used to summarize the data for one variable at a time.

Often a manager or decision maker is interested in the relationship between two variables.

Two descriptive measures of the relationship between two variables are the covariance and the correlation coefficient.

Covariance

The covariance is a measure of the linear association between two variables.

Positive values indicate a positive relationship.

Negative values indicate a negative relationship.

Covariance

The covariance is computed as follows:

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples} \]

\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations} \]

Correlation Coefficient

Correlation is a measure of linear association and not necessarily causation.

Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

Correlation Coefficient

The correlation coefficient is computed as follows:

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples} \]

\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations} \]

Correlation Coefficient

The coefficient can take on values between -1 and +1.

Values near -1 indicate a strong negative linear relationship.

Values near +1 indicate a strong positive linear relationship.

The closer the correlation is to zero, the weaker the relationship.

Covariance and Correlation Coefficient

• Example: Golfing Study

A golfer is interested in investigating the relationship, if any, between driving distance and 18-hole score.

Average Driving     Average
Distance (yds.)     18-Hole Score
277.6               69
259.5               71
269.1               70
267.0               70
255.6               71
272.9               69

Covariance and Correlation Coefficient

• Example: Golfing Study

x         y      (x_i - x̄)   (y_i - ȳ)   (x_i - x̄)(y_i - ȳ)
277.6     69      10.65       -1.0        -10.65
259.5     71      -7.45        1.0         -7.45
269.1     70       2.15        0            0
267.0     70       0.05        0            0
255.6     71     -11.35        1.0        -11.35
272.9     69       5.95       -1.0         -5.95
Average   267.0   70.0                    Total -35.40
Std. Dev. 8.2192  .8944

Note: The average driving distance is shown rounded to 267.0; the deviations are computed from the unrounded mean, 266.95.

Covariance and Correlation Coefficient

• Example: Golfing Study

• Sample Covariance

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08 \]

• Sample Correlation Coefficient

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631 \]
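The golfing-study covariance and correlation can be reproduced with the Python sketch below, which applies the sample formulas directly. Variable names are illustrative; the mean driving distance used is the unrounded 266.95.

```python
import math

# Golfing study data from the slide.
distance = [277.6, 259.5, 269.1, 267.0, 255.6, 272.9]   # average driving distance (yds.)
score = [69, 71, 70, 70, 71, 69]                        # average 18-hole score

n = len(distance)
x_bar = sum(distance) / n
y_bar = sum(score) / n

# Sample covariance: sum of cross-products of deviations, divided by n - 1.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(distance, score)) / (n - 1)

# Sample standard deviations of each variable.
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in distance) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in score) / (n - 1))

# Sample correlation coefficient.
r_xy = s_xy / (s_x * s_y)

print(round(s_xy, 2), round(s_x, 4), round(s_y, 4), round(r_xy, 4))
# -7.08 8.2192 0.8944 -0.9631
```

A correlation near -1 indicates a strong negative linear relationship: in this sample, longer average drives go with lower scores.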

Data Dashboards: Adding Numerical Measures to Improve Effectiveness

• Data dashboards are not limited to graphical displays.
• The addition of numerical measures, such as the mean and standard deviation of KPIs, to a data dashboard is often critical.
• Dashboards are often interactive.
• Drilling down refers to functionality in interactive dashboards that allows the user to access information and analyses at an increasingly detailed level.