Peter Goos
peter.goos@biw.kuleuven.be
What?
• Summarizing data by means of “summary
statistics”
• Location
o Mean or average
o Median
o Quartiles
o Quantiles or percentiles
• Spread or variation
o Variance
o Standard deviation
What?
• Relationship between variables
o Covariance
o Ordinary correlation (Pearson)
o Rank correlation (Spearman)
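For reference, the usual textbook definitions of the sample covariance and the Pearson correlation are written out below. The symbols (n observations, sample means and sample standard deviations of x and y) are notation assumed here, not taken from the slides.

```latex
s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
r_{xy} = \frac{s_{xy}}{s_x \, s_y},
\qquad -1 \le r_{xy} \le 1
```

The Spearman rank correlation is the same quantity r computed on the ranks of the observations instead of on the observations themselves.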
Summary Statistics
Mean 15.1
Std Dev 1.5238839
Std Err Mean 0.4818944
Upper 95% Mean 16.190121
Lower 95% Mean 14.009879
N 10
Quantiles
100.0% maximum 17
99.5% 17
97.5% 17
90.0% 17
75.0% quartile 16.25
50.0% median 15.5
25.0% quartile 13.75
10.0% 13
2.5% 13
0.5% 13
0.0% minimum 13
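The figures in this output can be reproduced with a short Python sketch. It assumes that the output summarizes the ten observations listed further on as Data set 1 (their mean and standard deviation match), that the quartiles follow the (n + 1)p interpolation rule, which Python calls the "exclusive" method, and that the 95% limits use the tabulated t value for 9 degrees of freedom. It needs Python 3.8 or later.

```python
import math
import statistics

# Assumption: the JMP output summarizes these ten observations (Data set 1)
data = [16, 13, 14, 17, 14, 16, 17, 16, 15, 13]

n = len(data)
mean = statistics.mean(data)
std_dev = statistics.stdev(data)   # sample standard deviation (divides by n - 1)
std_err = std_dev / math.sqrt(n)   # standard error of the mean

# Tabulated 97.5% point of Student's t with n - 1 = 9 degrees of freedom
t_crit = 2.262157
lower = mean - t_crit * std_err
upper = mean + t_crit * std_err

# Quartiles with the (n + 1)p interpolation rule
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

print(f"Mean {mean:.1f}, Std Dev {std_dev:.7f}, Std Err Mean {std_err:.7f}")
print(f"95% interval for the mean: {lower:.6f} to {upper:.6f}")
print(f"Quartiles: {q1}, {median}, {q3}")
```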
Quantiles or percentiles
• The 80th percentile separates the 80% smallest
values and the 20% largest values
Custom Quantiles
Quantiles
Quantile  Estimate  Lower 95%  Upper 95%  Actual Coverage
20%       13.2      13         17         89.26
80%       16.8      13         17         89.26
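The two estimates can be checked with the sketch below. It assumes the same ten observations as Data set 1 and the (n + 1)p interpolation rule (the "exclusive" method in Python's statistics module); other software may use a different rule and give slightly different values. The confidence limits and the coverage are not reproduced here.

```python
import statistics

# Assumption: the custom quantiles were computed on these ten observations
data = [16, 13, 14, 17, 14, 16, 17, 16, 15, 13]

# Cutting the data into 5 equal-probability groups gives the
# 20%, 40%, 60% and 80% quantiles
cut_points = statistics.quantiles(data, n=5, method="exclusive")
p20, p80 = cut_points[0], cut_points[-1]

print(f"20% quantile: {p20:.1f}")
print(f"80% quantile: {p80:.1f}")
```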
Spread or variation
• The following data sets have the same median
o Data set 1: 16, 13, 14, 17, 14, 16, 17, 16, 15, 13
o Data set 2: 19, 10, 11, 20, 11, 19, 20, 19, 12, 10
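A quick way to see the difference is to compute a location measure and a spread measure for both data sets. The sketch below uses only Python's standard library; the variable names are illustrative.

```python
import statistics

data_set_1 = [16, 13, 14, 17, 14, 16, 17, 16, 15, 13]
data_set_2 = [19, 10, 11, 20, 11, 19, 20, 19, 12, 10]

summary = {}
for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    summary[name] = {
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "std_dev": statistics.stdev(data),  # sample standard deviation
    }
    s = summary[name]
    print(f"{name}: median {s['median']}, range {s['range']}, "
          f"std dev {s['std_dev']:.2f}")
```

Both data sets have the same median, but the second one is spread out far more widely around it.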
[Figure: arrival delay time (axis from 0 to 300) by airport: TRN, TLS, THF, SXB, OST, NCL, NAP, MRS, LCY, LBA, HAM, HAJ, GLA, FLR, EDI, DUS, CPH, BUD, BRS, BOD, BHX]
Other measures of spread or variation
• Variance
• Standard deviation
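The sample versions of these two measures are written out below in the usual notation (n observations with sample mean x-bar; the symbols are assumed here). The divisor n - 1 is the one that reproduces the standard deviation of 1.5238839 reported earlier for Data set 1.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s = \sqrt{s^2}

% Data set 1: n = 10, \bar{x} = 15.1, \sum (x_i - \bar{x})^2 = 20.9
s^2 = \frac{20.9}{9} \approx 2.32,
\qquad
s \approx 1.52
```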
Correlation 0.7
• The next few pictures all correspond to a
correlation of 0.7
• Only the first picture shows what we expect: a positive,
but not perfect, relation between two variables
• In all other scenarios, the story is more
complicated
o A zero correlation, apart from one outlying data point
o A nearly perfect correlation, with one outlying
data point
o …
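The effect of a single outlying point is easy to demonstrate. The sketch below uses made-up data (not the data behind the pictures): four points with exactly zero correlation, plus one distant point. The resulting value is not 0.7; the example only illustrates how strongly one observation can drive the correlation.

```python
from math import sqrt

def pearson(x, y):
    """Ordinary (Pearson) correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: the four corners of a square are uncorrelated
x = [1, 1, 2, 2]
y = [1, 2, 1, 2]
r_without = pearson(x, y)

# Add a single outlying point far away from the others
r_with = pearson(x + [10], y + [10])

print(f"without the outlier: r = {r_without:.3f}")
print(f"with the outlier:    r = {r_with:.3f}")
```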
Take-away lesson
• Do not calculate correlations and interpret them
without looking at the data
• The same goes for averages/means, standard
deviations, variances, …
• So, create graphs whenever possible
• Think critically
• Do not study your data too superficially
• In JMP, various types of correlation can be
calculated via the menu “Analyze” and
“Multivariate”
Spanish red wines
Correlations in JMP
Rank correlation (Spearman)
• Measures more general (monotone) positive and negative
relations
• Relations need not be linear
• Calculation
o Data points first have to be ranked according to
the values of the two variables under study
o Next, the (ordinary) correlation has to be
calculated for the ranks
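The two calculation steps can be sketched in Python as follows. The data are hypothetical (a strictly increasing but strongly curved relation), the function names are illustrative, and tied values are given the average of their ranks, which is a common convention assumed here.

```python
from math import sqrt

def pearson(x, y):
    """Ordinary (Pearson) correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Step 1: rank both variables. Step 2: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y always increases with x, but not along a straight line
x = list(range(1, 11))
y = [2 ** v / 10 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.4f}")
print(f"Spearman: {r_spearman:.4f}")
```

Because y increases whenever x increases, the ranks of x and y coincide and the rank correlation equals 1, while the ordinary correlation stays clearly below 1.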
Rank correlation
[Figure: scatter plot of y (axis from 0 to 100) against x (axis from 0 to 12)]
Rank correlation
Multivariate
Correlations
     x       y
x    1.0000  0.7169
y    0.7169  1.0000
Nonparametric: Spearman's ρ
Variable  by Variable  Spearman ρ  Prob>|ρ|
y         x            1.0000      <.0001*
Warning: sample size of 10 is too small, P value suspect.