Descriptive Statistics
University of Castilla-La Mancha
Department of Mathematics
Institute of Applied Mathematics to Science and Engineering
ETSII
Outline
1. Frequency distribution.
2. Graphics.
3. Numerical measures: Position, centrality, dispersion and shape.
4. Bidimensional distributions: Regression and correlation.
Introduction
Descriptive Statistics.
Probability.
Comparative   Sample                                   Population
Size          n                                        N
Variables     Statistical                              Random
Measures      Statistic                                Parameter
Aspect        Latin letters ($\bar{x}$, $S^2$, ...)    Greek letters ($\mu$, $\sigma^2$, ...)
Graphics      Histogram                                Probability density function (pdf),
                                                       Cumulative distribution function (cdf)
578  642  504  448  526  804  374  751  739  505
718  856  807  571  624  210  267  561  562  703
388  376  719  189  605  421  684  1020 817  809
562  508  464  661  496  435  685  592  690  706
971  529  410  877  296  291  460  814  720  626
698  393  491  563  628  393  570  843  758  631
298  354  557  647  481  605  928  466  731  585
673  725  771  447  224  341  516  498  480  639
Concepts
Experimental unit.
Measurement.
Types of variables:
Qualitative (Categorical).
Quantitative.
[Table: frequency distribution of the digits 0, 1, ..., 9, with a Total row]
Bar chart
Pie chart
Class interval   Class mark   Cumulative frequency   Cumulative percentage
0-10             5            8                      4
10-20            15           18                     9
20-30            25           30                     15
30-40            35           52                     26
40-50            45           84                     42
50-60            55           134                    67
60-70            65           162                    81
70-80            75           180                    90
80-90            85           192                    96
90-100           95           200                    100
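The columns of a grouped frequency table are tied together by simple arithmetic. The sketch below is an illustration rather than part of the original slides. It recovers the absolute frequencies by differencing the cumulative ones, recomputes the cumulative percentages, and approximates the mean using the class marks as representatives of each class.

```python
# Class marks and cumulative frequencies copied from the table above
# (classes 0-10, 10-20, ..., 90-100).
marks = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
cum_freq = [8, 18, 30, 52, 84, 134, 162, 180, 192, 200]

n = cum_freq[-1]  # the last cumulative frequency is the sample size

# Absolute frequency of a class = difference of consecutive cumulative values.
abs_freq = [c - p for c, p in zip(cum_freq, [0] + cum_freq[:-1])]

# Cumulative percentage = 100 * cumulative frequency / n.
cum_pct = [100 * c / n for c in cum_freq]

# Grouped mean: every observation in a class is replaced by the class mark.
grouped_mean = sum(f * m for f, m in zip(abs_freq, marks)) / n

print(abs_freq)
print(cum_pct)
print(grouped_mean)
```

The grouped mean is only an approximation of the mean of the raw data, because the class marks stand in for the unknown individual values.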
Histogram
Frequency Polygon
Rule of thumb:
$n \approx 50 \rightarrow$ 5-8 classes
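As an illustration of how class counts for a histogram can be obtained, the following sketch bins the 80 observations listed earlier in the slides. The class width of 100 and the starting point of 100 are assumptions chosen only for this example and are not taken from the slides.

```python
# The 80 observations from the data slide.
data = [
    578, 642, 504, 448, 526, 804, 374, 751, 739, 505,
    718, 856, 807, 571, 624, 210, 267, 561, 562, 703,
    388, 376, 719, 189, 605, 421, 684, 1020, 817, 809,
    562, 508, 464, 661, 496, 435, 685, 592, 690, 706,
    971, 529, 410, 877, 296, 291, 460, 814, 720, 626,
    698, 393, 491, 563, 628, 393, 570, 843, 758, 631,
    298, 354, 557, 647, 481, 605, 928, 466, 731, 585,
    673, 725, 771, 447, 224, 341, 516, 498, 480, 639,
]

start, width, n_classes = 100, 100, 10  # assumed classes [100,200), ..., [1000,1100)

counts = [0] * n_classes
for x in data:
    counts[(x - start) // width] += 1  # index of the class containing x

for k, c in enumerate(counts):
    lo = start + k * width
    print(f"[{lo}, {lo + width}): {c}")
```

A different width gives a different picture of the same data, which is why the number of classes is usually chosen with a rule of thumb and then adjusted by eye.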
Stem-and-leaf diagram
3, 7, 11, 12, 13, 14, 15, 16, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 21, 21,
21, 22, 22, 23, 23 ...
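A minimal sketch of how a stem-and-leaf diagram can be built, using only the values shown above (the slide truncates the list, so the result covers just this prefix). The helper name `stem_and_leaf` is hypothetical; the tens digit is taken as the stem and the units digit as the leaf.

```python
from collections import defaultdict

# Values shown on the slide; the original list continues after "...".
data = [3, 7, 11, 12, 13, 14, 15, 16, 17, 17, 18, 18, 18, 19, 19, 19,
        20, 20, 21, 21, 21, 22, 22, 23, 23]

def stem_and_leaf(values):
    """Map each stem (tens digit) to the sorted list of its leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

diagram = stem_and_leaf(data)
for stem in sorted(diagram):
    print(stem, "|", " ".join(str(leaf) for leaf in diagram[stem]))
```

Each printed row keeps the individual values recoverable, which is the main advantage of this diagram over a histogram.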
Scatterplot
Numerical measures:
Position, centrality, dispersion, shape
Median
Centrality statistics
Example: 56, 62, 63, 65, 65, 65, 65, 68, 70, 72
Mean: $\bar{x} = \frac{\sum_i x_i}{n} = \frac{56+62+63+65+65+65+65+68+70+72}{10} = 65.1$
With frequencies $f_i$: $\bar{x} = \frac{\sum_i f_i x_i}{n} = \frac{56 + 62 + 63 + 4 \cdot 65 + 68 + 70 + 72}{10} = 65.1$
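The two ways of computing the mean in this example can be checked with a short sketch that uses only the Python standard library. The median and mode are included for comparison; the variable names are illustrative.

```python
from collections import Counter
from statistics import median, mode

data = [56, 62, 63, 65, 65, 65, 65, 68, 70, 72]
n = len(data)

# Mean from the raw observations.
mean_raw = sum(data) / n

# Same mean from the frequency table: each distinct value x_i weighted by f_i.
freq = Counter(data)
mean_freq = sum(f * x for x, f in freq.items()) / n

me = median(data)  # average of the two central values, since n is even
mo = mode(data)    # most frequent value

print(mean_raw, mean_freq, me, mo)
```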
Dispersion
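Sum of deviations from the mean: $\sum_i f_i (x_i - \bar{x}) = 0$, so dispersion is measured with absolute or squared deviations.
Mean absolute deviation about the median: $\frac{\sum_i f_i |x_i - Me|}{n}$
Variance: $S^2 = \frac{\sum_i f_i (x_i - \bar{x})^2}{n-1}$
Coefficient of variation: $CV = \frac{S}{|\bar{x}|}$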
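A small sketch of the dispersion measures, applied to the ten values of the centrality example (56, ..., 72). It assumes the variance uses the $n-1$ divisor, which is what `statistics.variance` computes.

```python
from math import sqrt
from statistics import mean, median, variance

data = [56, 62, 63, 65, 65, 65, 65, 68, 70, 72]
n = len(data)

x_bar = mean(data)
me = median(data)

# Mean absolute deviation about the median.
mad_median = sum(abs(x - me) for x in data) / n

# Sample variance with divisor n - 1, and its square root S.
s2 = variance(data)
s = sqrt(s2)

# Coefficient of variation: dispersion relative to the size of the mean.
cv = s / abs(x_bar)

print(mad_median, s2, s, cv)
```

The coefficient of variation has no units, so it can be used to compare the spread of variables measured on different scales.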
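Chebyshev's Theorem (any data): for any $k > 1$, at least $100(1 - 1/k^2)\%$ of the observations lie within $k$ standard deviations of the mean.
That is, at least 75% of the data lie within 2 standard deviations of the mean, and at least 88.9% within 3.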
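The bound can be compared with what actually happens in a sample. The sketch below uses the ten values of the centrality example; the function names `chebyshev_bound` and `share_within` are hypothetical helpers written for this illustration.

```python
from statistics import mean, stdev

def chebyshev_bound(k):
    """Guaranteed minimum percentage of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)

def share_within(values, k):
    """Observed percentage of values within k sample standard deviations of the mean."""
    x_bar, s = mean(values), stdev(values)
    inside = sum(1 for x in values if abs(x - x_bar) <= k * s)
    return 100 * inside / len(values)

data = [56, 62, 63, 65, 65, 65, 65, 68, 70, 72]

for k in (2, 3):
    print(k, chebyshev_bound(k), share_within(data, k))
```

The observed share is never below the bound, but it is usually well above it, since the theorem makes no assumption about the shape of the data.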
Position statistics
Position of a value $x$ in the sample: $\frac{\text{number of values before } x}{n}$.
Example
Inter-quartile range
$IQR = Q_3 - Q_1$.
$SIQR = \frac{Q_3 - Q_1}{2}$ (Semi-inter-quartile range)
Sample z-score
$z = \frac{x - \bar{x}}{S}$
Not unusual: $-2 \le z \le 2$
Suspect outlier: $-3 \le z < -2$ or $2 < z \le 3$
Extreme outlier: $z < -3$ or $z > 3$
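The three bands can be turned into a small classifier. This sketch applies it to the ten values of the centrality example, with $S$ computed using the $n-1$ divisor; the function name `classify` is illustrative.

```python
from statistics import mean, stdev

def classify(z):
    """Label a z-score using the bands |z| <= 2, 2 < |z| <= 3 and |z| > 3."""
    if abs(z) <= 2:
        return "not unusual"
    if abs(z) <= 3:
        return "suspect outlier"
    return "extreme outlier"

data = [56, 62, 63, 65, 65, 65, 65, 68, 70, 72]
x_bar, s = mean(data), stdev(data)

z_scores = [(x - x_bar) / s for x in data]
labels = [classify(z) for z in z_scores]

for x, z, label in zip(data, z_scores, labels):
    print(x, round(z, 2), label)
```

In this sample only the smallest value, 56, falls slightly beyond two standard deviations from the mean.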
Outlier effect
Outlier
Statistic       Value
Minimum         189
Maximum         1020
Quartiles
  Q1            463.00
  Q2            574.50
  Q3            719.25
IQR             256.25
Lower limit     78.625
Upper limit     1103.625
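The reported limits are consistent with the usual box-plot rule of placing the limits 1.5 IQR beyond the quartiles. The sketch below starts from the quartiles given in the table (it does not recompute them from the data) and reproduces the IQR and both limits.

```python
# Quartiles, minimum and maximum as reported in the summary table.
q1, q3 = 463.00, 719.25
minimum, maximum = 189, 1020

iqr = q3 - q1    # inter-quartile range
siqr = iqr / 2   # semi-inter-quartile range

# Box-plot limits, assuming the common 1.5 * IQR rule.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Observations outside the limits would be drawn as outliers.
has_outliers = minimum < lower_limit or maximum > upper_limit

print(iqr, siqr, lower_limit, upper_limit, has_outliers)
```

Since the minimum and maximum both lie inside the limits, the whiskers of the box plot end at the extreme observations and no point is flagged as an outlier.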
Box-Plot
Symmetric: $\bar{x} = Me = Mo$
Negatively skewed (left tail): $\bar{x} < Me < Mo$
Positively skewed (right tail): $Mo < Me < \bar{x}$
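Skewness coefficients
Pearson 1: $\frac{\bar{x} - Mo}{S}$.
Pearson 2: $\frac{3(\bar{x} - Me)}{S}$.
Fisher: $g_1 = \frac{m_3}{S^3}$, where $m_3 = \frac{\sum_i f_i (x_i - \bar{x})^3}{n}$.
$g_1 < 0$: negatively skewed.
$g_1 \approx 0$: symmetric.
$g_1 > 0$: positively skewed.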
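A sketch of the three skewness coefficients for the ten values of the centrality example. It assumes $S$ is the standard deviation with the $n-1$ divisor and $m_3$ uses the divisor $n$, as in the formulas above.

```python
from statistics import mean, median, mode, stdev

data = [56, 62, 63, 65, 65, 65, 65, 68, 70, 72]
n = len(data)

x_bar, me, mo = mean(data), median(data), mode(data)
s = stdev(data)  # divisor n - 1

# Pearson coefficients compare the mean with the mode and with the median.
pearson1 = (x_bar - mo) / s
pearson2 = 3 * (x_bar - me) / s

# Fisher coefficient: third central moment (divisor n) scaled by S^3.
m3 = sum((x - x_bar) ** 3 for x in data) / n
g1 = m3 / s**3

print(pearson1, pearson2, g1)
```

For this small sample the Pearson coefficients are slightly positive while $g_1$ is negative, which shows that the coefficients need not agree when the data are close to symmetric.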
Leptokurtic
Platykurtic
Kurtosis coefficients
Percentile kurtosis: $\frac{Q_3 - Q_1}{2(P_{90} - P_{10})}$.
Fisher: $g_2 = \frac{m_4}{S^4} - 3$, where $m_4 = \frac{\sum_i f_i (x_i - \bar{x})^4}{n}$.
$g_2 < 0$: platykurtic.
$g_2 \approx 0$: mesokurtic.
$g_2 > 0$: leptokurtic.
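A sketch of the Fisher kurtosis coefficient for the ten values of the centrality example, again assuming $S$ uses the $n-1$ divisor and $m_4$ the divisor $n$. The helper `kurtosis_label` and its tolerance of 0.1 for "approximately zero" are arbitrary choices made for this illustration.

```python
from statistics import mean, stdev

def kurtosis_label(g2, tol=0.1):
    """Name the shape from the sign of g2; tol is an assumed threshold for 'about zero'."""
    if g2 < -tol:
        return "platykurtic"
    if g2 > tol:
        return "leptokurtic"
    return "mesokurtic"

data = [56, 62, 63, 65, 65, 65, 65, 68, 70, 72]
n = len(data)

x_bar = mean(data)
s = stdev(data)  # divisor n - 1

# Fourth central moment (divisor n) scaled by S^4, minus 3.
m4 = sum((x - x_bar) ** 4 for x in data) / n
g2 = m4 / s**4 - 3

print(g2, kurtosis_label(g2))
```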
Covariance
$S_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n}$.
$S_{xy} > 0$: $y$ tends to increase as $x$ increases.
$S_{xy} < 0$: $y$ tends to decrease as $x$ increases.
$S_{xy} \approx 0$: no linear association.
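A minimal sketch of the covariance formula with divisor $n$. The slides give no paired data, so the five $(x, y)$ pairs below are hypothetical values invented for this example.

```python
# Hypothetical paired observations (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Covariance with divisor n: average product of the deviations from the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n

print(s_xy)
```

The result is positive because large values of `x` tend to appear together with large values of `y` in this made-up sample.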
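Covariance matrix
$S = \begin{pmatrix} S_{11} & \cdots & S_{1k} \\ \vdots & \ddots & \vdots \\ S_{k1} & \cdots & S_{kk} \end{pmatrix}$, where $S_{ij}$ is the covariance of variables $i$ and $j$.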
Correlation coefficient
$r_{xy} = \frac{S_{xy}}{S_x S_y} \in [-1, 1]$.
No units.
$r_{xy} > 0$: positively correlated.
$r_{xy} < 0$: negatively correlated.
$r_{xy} \approx 0$: uncorrelated.
$r_{xy} = 1$ or $-1$: perfect correlation.
Normal equations:
$\bar{y} = a + b\bar{x}$
$\overline{xy} = a\bar{x} + b\overline{x^2}$
Estimators:
$b = \frac{S_{xy}}{S_x^2}$.
$a = \bar{y} - b\bar{x}$.
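The step from the normal equations to the estimators can be sketched as follows. The derivation assumes that $S_{xy}$ and $S_x^2$ are computed with divisor $n$, so that $S_{xy} = \overline{xy} - \bar{x}\,\bar{y}$ and $S_x^2 = \overline{x^2} - \bar{x}^2$. Substituting $\bar{y} = a + b\bar{x}$ from the first equation into the second:

```latex
\begin{align*}
\overline{xy} - \bar{x}\,\bar{y}
  &= a\bar{x} + b\,\overline{x^2} - \bar{x}\,(a + b\bar{x}) \\
  &= b\left(\overline{x^2} - \bar{x}^2\right) \\
\Rightarrow\quad S_{xy} &= b\,S_x^2,
  \qquad b = \frac{S_{xy}}{S_x^2},
  \qquad a = \bar{y} - b\,\bar{x}.
\end{align*}
```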
Regression line: $\hat{y} = \bar{y} + \frac{S_{xy}}{S_x^2}(x - \bar{x})$.
Residuals: $e_i = y_i - \hat{y}_i = y_i - (a + bx_i)$.
Residual variance (variance estimator): $S_R^2 = \frac{\sum_i e_i^2}{n-2}$.
Determination coefficient: $R^2$ ($= r^2$ here).
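The whole fitting procedure can be sketched on hypothetical paired data (the five pairs below are invented for this example). The sketch assumes $S_{xy}$ and $S_x^2$ use the divisor $n$, and it computes $R^2$ as one minus the ratio of residual to total sum of squares, which coincides with $r^2$ for a straight-line fit.

```python
# Hypothetical paired observations (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
s_x2 = sum((xi - x_bar) ** 2 for xi in x) / n
s_y2 = sum((yi - y_bar) ** 2 for yi in y) / n

# Estimators of the slope and the intercept.
b = s_xy / s_x2
a = y_bar - b * x_bar

# Residuals e_i = y_i - (a + b x_i) and residual variance with divisor n - 2.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
s_r2 = sum(e**2 for e in residuals) / (n - 2)

# Determination coefficient, and the squared correlation for comparison.
r2 = 1 - sum(e**2 for e in residuals) / (n * s_y2)
r_squared = s_xy**2 / (s_x2 * s_y2)

print(a, b, s_r2, r2, r_squared)
```

The residuals of a least-squares line with intercept always sum to zero, which is a quick sanity check on the fitted coefficients.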
Remarks
Nonlinear correlation