Describing Data-II
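Sample mean and population mean:

\[ \bar{x} = \frac{\sum x_i}{n} \qquad \mu = \frac{\sum x_i}{N} \]

where n is the number of observations in the sample and N is the number of observations in the population.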
Sample Mean

Seventy efficiency apartments were randomly sampled in a small college town. The monthly rent prices for these apartments are listed below.

445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80 \]
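The sample mean of the rents can be checked with a few lines of Python. This is only a minimal sketch: the variable names (rents, total, sample_mean) are illustrative and not part of the slides.

```python
# Monthly rents for the 70 sampled apartments, in the order listed on the slide.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

n = len(rents)           # number of observations in the sample
total = sum(rents)       # sum of all x_i
sample_mean = total / n  # x-bar = (sum of x_i) / n

print(n, total, round(sample_mean, 2))  # 70 34356 490.8
```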
In other weighted mean computations, quantities such as pounds, dollars, or volume are frequently used as weights.

\[ \text{weighted mean} = \frac{\sum w_i x_i}{\sum w_i} \]

where:
x_i = value of observation i
w_i = weight for observation i

The denominator is the sum of the weights.
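As a rough illustration of the formula, the sketch below applies it to the values and weights from the Construction Wages example in these slides. The dictionary layout and the variable names are assumptions made for this sketch only.

```python
# (x_i, w_i) pairs taken from the Construction Wages example in these slides.
wages = {
    "Carpenter":   (21.60, 520),
    "Electrician": (28.72, 230),
    "Laborer":     (11.80, 410),
    "Painter":     (19.75, 270),
    "Plumber":     (24.16, 160),
}

weighted_sum = sum(x * w for x, w in wages.values())  # numerator: sum of w_i * x_i
total_weight = sum(w for _, w in wages.values())      # denominator: sum of the weights
weighted_mean = weighted_sum / total_weight

print(round(weighted_sum, 1), total_weight, round(weighted_mean, 2))
```

Dividing by the sum of the weights, rather than by the number of rows, is what distinguishes this from an ordinary mean of the five x_i values.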
Weighted Mean

Example: Construction Wages

Worker        xi      wi     wi xi
Carpenter     21.60   520    11232.0
Electrician   28.72   230     6605.6
Laborer       11.80   410     4838.0
Painter       19.75   270     5332.5
Plumber       24.16   160     3865.6
Total                 1590   31873.7

\[ \mu = \frac{\sum w_i x_i}{\sum w_i} = \frac{31873.7}{1590} = 20.0464 \approx \$20.05 \]

Median

The median of a data set is the value in the middle when the data items are arranged in ascending order.

Whenever a data set has extreme values, the median is the preferred measure of central location.

The median is the measure of location most often reported for annual income and property value data.

A few extremely large incomes or property values can inflate the mean.
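For an odd number of observations (7 observations): 26 18 27 12 14 27 19
In ascending order: 12 14 18 19 26 27 27
The median is the middle value, 19.

For an even number of observations (8 observations): 26 18 27 12 14 27 30 19
In ascending order: 12 14 18 19 26 27 27 30
The median is the average of the middle two values, (19 + 26)/2 = 22.5.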
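The two cases can be sketched in Python as follows. The helper name median is chosen here for illustration; the standard library's statistics.median is used only as a cross-check.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the middle pair

odd_data = [26, 18, 27, 12, 14, 27, 19]        # 7 observations from the slide
even_data = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations from the slide

print(median(odd_data))   # 19
print(median(even_data))  # 22.5

# Cross-check against the standard library.
assert median(odd_data) == statistics.median(odd_data)
assert median(even_data) == statistics.median(even_data)
```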
Geometric Mean
Example: Avg. Rate of Return

Period   Return (%)   Growth Factor
1        -6.0         0.940
2        -8.0         0.920
3        -4.0         0.960
4         2.0         1.020
5         5.4         1.054

\[ \bar{x}_g = \sqrt[5]{(0.94)(0.92)(0.96)(1.02)(1.054)} = [0.89254]^{1/5} = 0.97752 \]

Average growth rate per period in % is (0.97752 - 1)(100) = -2.248%

Mode

The mode of a data set is the value that occurs with greatest frequency.

The greatest frequency can occur at two or more different values.

If the data have exactly two modes, the data are bimodal.

If the data have more than two modes, the data are multimodal.

Caution: If the data are bimodal or multimodal, Excel’s MODE function will incorrectly identify a single mode.
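The rate-of-return calculation and the mode definitions can both be sketched with Python's standard library (math.prod and statistics.multimode need Python 3.8 or later). The growth factors come from the slide's table; the bimodal list is a made-up illustration and is not data from the slides.

```python
import math
import statistics

# Geometric mean of the five growth factors from the rate-of-return example.
returns_pct = [-6.0, -8.0, -4.0, 2.0, 5.4]
growth_factors = [1 + r / 100 for r in returns_pct]   # 0.940, 0.920, ...
product = math.prod(growth_factors)
geo_mean = product ** (1 / len(growth_factors))       # n-th root of the product
avg_growth_pct = (geo_mean - 1) * 100

print(round(product, 5), round(geo_mean, 5), round(avg_growth_pct, 3))

# Mode: multimode returns every value that attains the greatest frequency,
# so bimodal or multimodal data are not silently reduced to one value.
seven_obs = [26, 18, 27, 12, 14, 27, 19]   # the 7 observations used for the median
made_up_bimodal = [1, 1, 2, 2, 3]          # hypothetical values, for illustration only

print(statistics.multimode(seven_obs))        # [27]
print(statistics.multimode(made_up_bimodal))  # [1, 2]
```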
Percentiles

80th Percentile
Measures of Variability

Range
Example: Apartment Rents

Range = largest value - smallest value
Range = 615 - 425 = 190

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Interquartile Range

The interquartile range of a data set is the difference between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.
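The range and interquartile range can be sketched in Python as below. The percentile rule itself is not shown in the surviving slide text, so the function assumes a common textbook position rule: compute i = (p/100)n; if i is not an integer, round up and take that position; if i is an integer, average positions i and i + 1. Under that assumption the code reproduces the quartiles quoted elsewhere in these slides (Q1 = 445, Q2 = 475, Q3 = 525). Library defaults such as NumPy's interpolate differently and can give slightly different values.

```python
# Apartment rents in ascending order, as listed on the slide.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(sorted_values, p):
    """p-th percentile (integer p, 0 < p < 100) using the ASSUMED position rule."""
    n = len(sorted_values)
    whole, remainder = divmod(p * n, 100)   # i = p*n/100, kept in integer arithmetic
    if remainder:                           # i is not an integer: round up
        return sorted_values[whole]         # 1-based position whole + 1
    # i is an integer: average the values in positions i and i + 1
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2

data_range = max(rents) - min(rents)
q1 = percentile(rents, 25)
q2 = percentile(rents, 50)
q3 = percentile(rents, 75)
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)   # 190 445 475.0 525 80
print(percentile(rents, 80))         # 542.0 under the assumed rule
```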
Variance

The variance is the average of the squared differences between each data value and the mean.

The variance is computed as follows:

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{for a sample} \qquad \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{for a population} \]

Standard Deviation

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.
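A minimal sketch of both formulas, applied to the 70 apartment rents from these slides. The variable names are illustrative. The population version is shown only to contrast the divisors, since the rents are a sample and not a population.

```python
import math
import statistics

# Apartment rents (sorted) from the slides.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = sum(rents) / n
squared_diffs = sum((x - mean) ** 2 for x in rents)

sample_variance = squared_diffs / (n - 1)   # s^2: divide by n - 1
population_variance = squared_diffs / n     # sigma^2: divide by N (contrast only)
sample_std_dev = math.sqrt(sample_variance) # s: positive square root of s^2

print(round(sample_variance, 2), round(sample_std_dev, 2))  # roughly 2996.16 and 54.74

# Cross-check against the standard library's sample formulas.
assert math.isclose(sample_variance, statistics.variance(rents))
assert math.isclose(sample_std_dev, statistics.stdev(rents))
```

The standard deviation comes out in the same units as the rents, which is why it is easier to interpret than the variance.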
Measures of Relative Standings: Box Plots

A box plot graphs five statistical measures simultaneously:
• Minimum observation
• Maximum observation
• 1st Quartile
• 2nd Quartile or Median
• 3rd Quartile
It also shows the whiskers and any outliers.

Five-Number Summaries and Box Plots

Summary statistics and easy-to-draw graphs can be used to quickly summarize large quantities of data.

Two tools that accomplish this are five-number summaries and box plots.
Five-Number Summary

1 Smallest Value (425)
2 First Quartile (445)
3 Median (475)
4 Third Quartile (525)
5 Largest Value (615)

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
[Figure: box plot of the apartment rents on an axis from 400 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525 marked.]
Box Plot

Limits are located (not drawn) using the interquartile range (IQR).
Data outside these limits are considered outliers.
The location of each outlier is shown with the symbol * .

Example: Apartment Rents

Here IQR = Q3 - Q1 = 525 - 445 = 80.

• The lower limit is located 1.5(IQR) below Q1.

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

• The upper limit is located 1.5(IQR) above Q3.

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645

All of the rents lie between 425 and 615, so there are no outliers in this data set.
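The limit calculation and the outlier check can be sketched as follows. The quartiles are the values stated in the slides; the function name find_outliers is a hypothetical helper introduced only for this example.

```python
# Apartment rents (sorted) and the quartiles stated in the slides.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]
q1, q3 = 445, 525

def find_outliers(values, q1, q3):
    """Return (lower_limit, upper_limit, outliers) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return lower_limit, upper_limit, outliers

lower_limit, upper_limit, outliers = find_outliers(rents, q1, q3)
print(lower_limit, upper_limit, outliers)   # 325.0 645.0 []
```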
Covariance

The covariance is a measure of the linear association between two variables.

Positive values indicate a positive relationship.

The covariance is computed as follows:

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples} \]
Correlation is a measure of linear association and not necessarily causation.

Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.

The correlation coefficient is computed as follows:

\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples} \qquad \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations} \]
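The sample covariance and the sample correlation coefficient can be sketched in plain Python. The slides give no paired data at this point, so the x and y values below are made up purely for illustration, and the function names are assumptions of this sketch.

```python
import math

def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def sample_std_dev(values):
    """s = sqrt(sum((v - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def sample_correlation(x, y):
    """r_xy = s_xy / (s_x * s_y)."""
    return sample_covariance(x, y) / (sample_std_dev(x) * sample_std_dev(y))

# Hypothetical paired observations (NOT from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)
print(s_xy, round(r_xy, 4))   # 1.5 0.7746
```

A positive s_xy here indicates a positive linear relationship. Dividing by s_x and s_y rescales it to the unit-free range from -1 to +1.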
Covariance and Correlation Coefficient
Data Dashboards: Adding Numerical Measures to Improve Effectiveness
Data dashboards are not limited to graphical displays.
The addition of numerical measures, such as the mean
and standard deviation of KPIs, to a data dashboard
is often critical.
Dashboards are often interactive.
Drilling down refers to functionality in interactive
dashboards that allows the user to access information
and analyses at an increasingly detailed level.