
Probability and Statistics

for Computer Science

“Correlation is not Causation”


but Correlation is so beautiful!

Credit: wikipedia

Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 8.29.2023


Last time
✺ Mean
✺ Standard deviation
✺ Variance
✺ Standardizing data
[Handwritten review notes: the mean of a small example data set {x_i}; the mean as the value of μ that minimizes Σ (x_i − μ)²; at most N/k² of the N items lie k or more standard deviations from the mean]
Q: Estimate the range of data in
standard coordinates
✺ The interval [-5, 5] covers x% of the data;
choose the closest estimate of x.
A. 80%
B. 99
C. 96
D. 96%
Standard coordinates: $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$
Objectives
✺ Median, Percentile, Mode, IQR
✺ Scatter plots for relationships
✺ Correlation Coefficient
✺ Other visualizations for relationships: heatmap, 3D bar chart, time series plots
Median
✺ We first sort the data set {xi}
✺ Then if the number of items N is odd
median = middle item's value
if the number of items N is even
median = mean of middle 2 items'
values
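The two cases above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `median` and the example lists are chosen here for demonstration only.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)              # first sort the data set
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                        # N odd: the middle item's value
        return ordered[mid]
    # N even: mean of the two middle items' values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([7, 1, 3])           # sorted: 1, 3, 7
even_example = median([4, 1, 3, 2])       # sorted: 1, 2, 3, 4
```

Sorting first is what makes "middle" meaningful; the input order does not matter.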
Properties of Median
✺ Scaling data scales the median

median({k · xi }) = k · median({xi })

✺ Translating data translates the median

median({xi + c}) = median({xi }) + c


Percentile

✺ The kth percentile is the value such that k% of the data items are smaller than or equal to it
✺ Median is roughly the 50th percentile
Interquartile range
✺ iqr = (75th percentile) - (25th percentile)
✺ Scaling data scales the interquartile range
iqr({k · xi }) = |k| · iqr({xi })

✺ Translating data does NOT change the interquartile range
iqr({xi + c}) = iqr({xi })
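The scaling and translation properties of the median and the interquartile range can be checked numerically. The sketch below assumes NumPy; the helper `iqr`, the data values, and the constants `k` and `c` are hypothetical. Note that `np.percentile` interpolates linearly between data items by default, which is one common convention and may differ slightly from the definition used in the lecture.

```python
import numpy as np


def iqr(data):
    """Interquartile range: (75th percentile) - (25th percentile)."""
    q25, q75 = np.percentile(data, [25, 75])
    return q75 - q25


# Hypothetical data set, chosen only for illustration.
x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 12.0, 30.0])
k, c = -3.0, 10.0                     # a negative scale and a shift

base_iqr = iqr(x)
scaled_iqr = iqr(k * x)               # expected: |k| * iqr(x)
shifted_iqr = iqr(x + c)              # expected: iqr(x)

base_median = np.median(x)
scaled_median = np.median(k * x)      # expected: k * median(x)
shifted_median = np.median(x + c)     # expected: median(x) + c
```

A negative `k` is used on purpose: it shows why the interquartile range scales with |k| while the median scales with k itself.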
Box plots
✺ Boxplots
✺ Simpler than a histogram
✺ Good for outliers
✺ Easier to use for comparison
[Figure: box plots of vehicle deaths by region; vertical axis: DEATH]
Data from https://www2.stetson.edu/~jrasp/data.htm
Boxplots details, outliers
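✺ How to define outliers? (the default)
[Figure: annotated box plot. The box spans the interquartile range (iqr) with the median marked inside it; each whisker extends less than 1.5 iqr beyond the box; an outlier lies more than 1.5 iqr beyond the box.]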
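The default outlier rule used by box plots can be sketched as follows. This assumes NumPy and its default linear-interpolation percentiles; the function name `iqr_outliers`, the `whisker` parameter, and the sample values are hypothetical, and plotting libraries may compute the quartiles slightly differently.

```python
import numpy as np


def iqr_outliers(data, whisker=1.5):
    """Return the items lying more than `whisker` * iqr beyond the box."""
    data = np.asarray(data, dtype=float)
    q25, q75 = np.percentile(data, [25, 75])
    spread = q75 - q25                    # the interquartile range
    low = q25 - whisker * spread          # lower fence
    high = q75 + whisker * spread         # upper fence
    return data[(data < low) | (data > high)]


# Hypothetical sample with one suspiciously large value.
sample = [10, 12, 12, 13, 14, 15, 15, 16, 18, 95]
flagged = iqr_outliers(sample)
```

For this sample the quartiles are 12.25 and 15.75, so the fences sit at 7.0 and 21.0 and only the value 95 is flagged.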
Sensitivity of summary statistics to
outliers
✺ mean and standard deviation are
very sensitive to outliers
✺ median and interquartile range are
not sensitive to outliers
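A quick numerical experiment makes the contrast concrete. The sketch below assumes NumPy; the data values and the helper `summary` are hypothetical. It adds a single extreme value to a small data set and compares the four summary statistics before and after.

```python
import numpy as np


def summary(data):
    """Mean, standard deviation, median and interquartile range of data."""
    q25, q75 = np.percentile(data, [25, 75])
    return {
        "mean": float(np.mean(data)),
        "std": float(np.std(data)),
        "median": float(np.median(data)),
        "iqr": float(q75 - q25),
    }


# Hypothetical measurements, then the same data plus one outlier.
clean = np.array([50.0, 52.0, 53.0, 55.0, 56.0, 58.0, 60.0])
with_outlier = np.append(clean, 500.0)

before = summary(clean)
after = summary(with_outlier)
change = {name: abs(after[name] - before[name]) for name in before}
```

With these numbers the mean and the standard deviation move by tens of units or more, while the median and the interquartile range move by roughly one unit.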
Modes
✺ Modes are peaks in a histogram
✺ If there is more than one mode, we
should be curious as to why
Multiple modes
✺ We have seen the “iris” data, which looks to have several peaks
Data: “iris” in R
Example: bimodal distribution
✺ Modes may indicate multiple populations
Data: Erythrocyte cells in healthy humans
Piagnerelli, JCP 2007


Tails and Skews
[Figure: three histograms]
Symmetric histogram: mode, median, mean all on top of one another, with a left tail and a right tail
Left skew: longer left tail; mean, median and mode marked separately
Right skew: longer right tail; mean, median and mode marked separately
Credit: Prof. Forsyth
Relationship between data features
✺ Example: Does the weight of people relate to
their height?

✺ x: HEIGHT, y: WEIGHT
Scatter plot
✺ Body Fat data set
Scatter plot
✺ Scatter plot with density
Scatter plot
✺ Outliers removed & standardized
Correlation seen from scatter plots
[Figure: three scatter plots]
Zero correlation: normalized heart rate vs. normalized body temperature
Positive correlation: weights vs. heights, outliers removed, normalized
Negative correlation: density vs. body fat
What kind of Correlation?
✺ Lines of code in a database and number of bugs
✺ Frequency of hand washing and number of germs on your hands
✺ GPA and hours spent playing video games
✺ Earnings and happiness

Credit: Prof. David Varodayan


Correlation Coefficient
✺ Given a data set {(x_i, y_i)} consisting of
items (x_1, y_1) ... (x_N, y_N),
✺ Standardize the coordinates of each feature:
$$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}, \qquad \hat{y}_i = \frac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$$
✺ Define the correlation coefficient as:
$$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N}\sum_{i=1}^{N} \hat{x}_i \hat{y}_i$$
Correlation Coefficient

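$$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}, \qquad \hat{y}_i = \frac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$$
$$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N}\sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \mathrm{mean}(\{\hat{x}_i \hat{y}_i\})$$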
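The definition translates directly into code. The sketch below assumes NumPy and uses the population standard deviation (`ddof=0`, NumPy's default), which matches the 1/N in the formula; the names `standardize` and `corr` and the height and weight values are hypothetical.

```python
import numpy as np


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()   # std uses ddof=0


def corr(x, y):
    """Correlation coefficient: the mean of the products of the
    standardized coordinates."""
    return float(np.mean(standardize(x) * standardize(y)))


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150.0, 160.0, 165.0, 170.0, 180.0, 185.0]
weights = [52.0, 58.0, 63.0, 61.0, 77.0, 80.0]

r = corr(heights, weights)
r_numpy = float(np.corrcoef(heights, weights)[0, 1])  # library cross-check
```

Because both coordinates are divided by their own standard deviation, the units cancel and the result is unitless.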
Q: Correlation Coefficient
✺ Which of the following describe(s)
correlation coefficient correctly?
A. It’s unitless
B. It’s defined in standard coordinates
C. Both A & B
$$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N}\sum_{i=1}^{N} \hat{x}_i \hat{y}_i$$
A visualization of correlation
coefficient
https://rpsychologist.com/d3/correlation/
In a data set {(xi , yi )} consisting of items
(x1 , y1 ) ... (xN , yN ),

corr({(xi , yi )}) > 0 shows positive correlation

corr({(xi , yi )}) < 0 shows negative correlation


corr({(xi , yi )}) = 0 shows no correlation


The Properties of Correlation
Coefficient
✺ The correlation coefficient is symmetric

corr({(xi , yi )}) = corr({(yi , xi )})

✺ Translating the data does NOT change the correlation coefficient
The Properties of Correlation
Coefficient
✺ Scaling the data may change the sign of
the correlation coefficient
$$\mathrm{corr}(\{(a x_i + b,\; c y_i + d)\}) = \mathrm{sign}(a c)\,\mathrm{corr}(\{(x_i, y_i)\})$$
The Properties of Correlation
Coefficient
✺ The correlation coefficient is bounded
within [-1, 1]
$$\mathrm{corr}(\{(x_i, y_i)\}) = 1 \text{ if and only if } \hat{x}_i = \hat{y}_i$$
$$\mathrm{corr}(\{(x_i, y_i)\}) = -1 \text{ if and only if } \hat{x}_i = -\hat{y}_i$$


Q. Which of the following has correlation
coefficient equal to 1?
[Figure: three scatter plots of Y against X: left, middle, right]
A. Left and right
B. Left
C. Middle
[Handwritten note: finish and review at home. Case y = a x: work out the correlation coefficient by standardizing both coordinates.]
Concept of Correlation Coefficient’s
bound
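✺ The correlation coefficient can be written as
$$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N}\sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \sum_{i=1}^{N} \frac{\hat{x}_i}{\sqrt{N}}\,\frac{\hat{y}_i}{\sqrt{N}}$$
✺ It’s the inner product of two vectors
$$\left[\frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}}\right] \quad \text{and} \quad \left[\frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}}\right]$$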
Inner product
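✺ Inner product’s geometric meaning:
$$\nu_1 \cdot \nu_2 = |\nu_1|\,|\nu_2|\cos(\theta)$$
where θ is the angle between ν1 and ν2
✺ Lengths of both vectors
$$\nu_1 = \left[\frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}}\right], \qquad \nu_2 = \left[\frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}}\right]$$
are 1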
Bound of correlation coefficient

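$$|\mathrm{corr}(\{(x_i, y_i)\})| = |\cos(\theta)| \le 1$$
where θ is the angle between
$$\nu_1 = \left[\frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}}\right] \quad \text{and} \quad \nu_2 = \left[\frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}}\right]$$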
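The inner-product view of the bound can be verified numerically. The sketch below assumes NumPy; the helper `standardize` and the data values are hypothetical. It builds the two vectors, checks that each has length 1, and compares the cosine of the angle between them with the correlation coefficient.

```python
import numpy as np


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


# Hypothetical data, chosen only for illustration.
x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 8.0])
y = np.array([2.0, 1.0, 5.0, 4.0, 9.0, 6.0])
n = len(x)

v1 = standardize(x) / np.sqrt(n)      # vector of x-hat_i / sqrt(N)
v2 = standardize(y) / np.sqrt(n)      # vector of y-hat_i / sqrt(N)

length_v1 = float(np.linalg.norm(v1))
length_v2 = float(np.linalg.norm(v2))
cos_theta = float(v1 @ v2) / (length_v1 * length_v2)

r = float(np.mean(standardize(x) * standardize(y)))
```

Both lengths come out as 1 because the squared length is the mean of the squared standardized values, which is the variance of standardized data.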
The Properties of Correlation
Coefficient
✺ Symmetric
✺ Translation invariant
✺ Scaling may only change the sign
✺ Bounded within [-1, 1]
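The four properties can be checked on simulated data. The sketch below assumes NumPy; the function `corr`, the random seed, and the constants used for shifting and scaling are hypothetical choices made only for this illustration.

```python
import numpy as np


def corr(x, y):
    """Mean of the products of the standardized coordinates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_hat = (x - x.mean()) / x.std()
    y_hat = (y - y.mean()) / y.std()
    return float(np.mean(x_hat * y_hat))


rng = np.random.default_rng(0)                # fixed seed for repeatability
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(size=200)            # positively related, with noise

r = corr(x, y)
symmetric = corr(y, x)                        # swap the two features
translated = corr(x + 5.0, y - 3.0)           # shift both features
flipped = corr(-2.0 * x + 1.0, 4.0 * y + 7.0) # a*c < 0, so the sign flips
perfect = corr(x, 3.0 * x + 2.0)              # y is an increasing line in x
```

Only the sign of the product of the two scale factors matters; their magnitudes and both offsets drop out when the data are standardized.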
Using correlation to predict
✺ Caution! Correlation is NOT Causation

Credit: Tyler Vigen


How do we go about the prediction?
✺ Outliers removed & standardized
Using correlation to predict
✺ Given a correlated data set {(x_i, y_i)},
we can predict a value $y_0^p$ that goes
with a value $x_0$
✺ In standard coordinates {($\hat{x}_i$, $\hat{y}_i$)},
we can predict a value $\hat{y}_0^p$ that goes
with a value $\hat{x}_0$


Q:
✺ Which coordinates will you use for the
predictor using correlation?

A. Standard coordinates
B. Original coordinates
C. Either
Linear predictor and its error
✺ We will assume that our predictor is linear
$$\hat{y}^p = a\hat{x} + b$$
✺ We denote the prediction at each $\hat{x}_i$ in the data
set as $\hat{y}_i^p$
$$\hat{y}_i^p = a\hat{x}_i + b$$
✺ The error in the prediction is denoted $u_i$
$$u_i = \hat{y}_i - \hat{y}_i^p = \hat{y}_i - a\hat{x}_i - b$$
Require the mean of error to be zero
We would try to make the mean of the error equal to
zero so that it is also centered around 0, as
the standardized data are:
$$\mathrm{mean}(\{u_i\}) = \mathrm{mean}(\{\hat{y}_i - a\hat{x}_i - b\}) = \mathrm{mean}(\{\hat{y}_i\}) - a\,\mathrm{mean}(\{\hat{x}_i\}) - b$$
In standard coordinates $\mathrm{mean}(\{\hat{x}_i\}) = \mathrm{mean}(\{\hat{y}_i\}) = 0$, so
$$\mathrm{mean}(\{u_i\}) = -b = 0 \implies b = 0$$
Require the variance of error to be
minimal
✺ Minimize $\mathrm{var}(\{u_i\})$ over a. Since $\mathrm{mean}(\{u_i\}) = 0$ and $b = 0$:
$$\mathrm{var}(\{u_i\}) = \mathrm{mean}(\{(u_i - \mathrm{mean}(\{u_i\}))^2\}) = \mathrm{mean}(\{u_i^2\}) = \mathrm{mean}(\{(\hat{y}_i - a\hat{x}_i)^2\})$$
$$= \mathrm{mean}(\{\hat{y}_i^2\}) - 2a\,\mathrm{mean}(\{\hat{x}_i \hat{y}_i\}) + a^2\,\mathrm{mean}(\{\hat{x}_i^2\})$$
✺ In standard coordinates
$$\mathrm{mean}(\{\hat{y}_i^2\}) = \mathrm{mean}(\{(\hat{y}_i - \mathrm{mean}(\{\hat{y}_i\}))^2\}) = \mathrm{var}(\{\hat{y}_i\}) = 1$$
and likewise $\mathrm{mean}(\{\hat{x}_i^2\}) = 1$. With $r = \mathrm{corr}(\{(x_i, y_i)\}) = \mathrm{mean}(\{\hat{x}_i \hat{y}_i\})$:
$$\mathrm{var}(\{u_i\}) = 1 - 2ar + a^2$$
✺ Set the derivative with respect to a to zero:
$$\frac{d\,\mathrm{var}(\{u_i\})}{da} = -2r + 2a = 0 \implies a = r$$
✺ So in $\hat{y}^p = a\hat{x} + b$ we have $a = r$ and $b = 0$
Here is the linear predictor!
$$\hat{y}^p = r\hat{x}$$
where r is the correlation coefficient
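The claim that the slope a = r minimizes the variance of the error can be checked by brute force. The sketch below assumes NumPy; the simulated data, the seed, and the grid of candidate slopes are hypothetical. It evaluates the error variance for many slopes (with b = 0) and picks the smallest.

```python
import numpy as np

rng = np.random.default_rng(1)                 # fixed seed for repeatability
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)

# Standard coordinates (population standard deviation).
x_hat = (x - x.mean()) / x.std()
y_hat = (y - y.mean()) / y.std()
r = float(np.mean(x_hat * y_hat))

# Candidate slopes from -1 to 1 in steps of 0.001, with b = 0.
slopes = np.linspace(-1.0, 1.0, 2001)
error_variance = np.array([np.var(y_hat - a * x_hat) for a in slopes])

best_slope = float(slopes[np.argmin(error_variance)])
closed_form = 1.0 - 2.0 * slopes * r + slopes ** 2   # 1 - 2ar + a^2
```

The grid search is only a sanity check: the best slope on the grid lands within one grid step of r, and the computed variances match the quadratic 1 − 2ar + a².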
Prediction Formula
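✺ In standard coordinates
$$\hat{y}_0^p = r\hat{x}_0 \quad \text{where } r = \mathrm{corr}(\{(x_i, y_i)\})$$
✺ In original coordinates
$$\frac{y_0^p - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})} = r\,\frac{x_0 - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$$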
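The prediction formula in original coordinates amounts to three steps: standardize x0, multiply by r, and undo the standardization for y. The sketch below assumes NumPy; the function name `predict` and the height and weight values are hypothetical.

```python
import numpy as np


def predict(x0, x, y):
    """Predict the y value that goes with x0, using the correlation of (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_hat = (x - x.mean()) / x.std()
    y_hat = (y - y.mean()) / y.std()
    r = np.mean(x_hat * y_hat)                 # correlation coefficient

    x0_hat = (x0 - x.mean()) / x.std()         # x0 in standard coordinates
    y0_hat = r * x0_hat                        # prediction in standard coordinates
    return float(y0_hat * y.std() + y.mean())  # back to original coordinates


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = np.array([150.0, 160.0, 165.0, 170.0, 180.0, 185.0])
weights = np.array([52.0, 58.0, 63.0, 61.0, 77.0, 80.0])

at_mean = predict(heights.mean(), heights, weights)

# When y is exactly a line in x, r = 1 and the prediction lies on that line.
line_y = 2.0 * heights + 1.0
on_line = predict(172.0, heights, line_y)
```

Two sanity checks follow from the formula: predicting at the mean of x returns the mean of y, and perfectly linear data is predicted exactly.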
Root-mean-square (RMS) prediction
error
✺ Given $\mathrm{var}(\{u_i\}) = 1 - 2ar + a^2$ and $a = r$:
$$\mathrm{var}(\{u_i\}) = 1 - r^2$$
$$\text{RMS error} = \sqrt{\mathrm{mean}(\{u_i^2\})} = \sqrt{\mathrm{var}(\{u_i\})} = \sqrt{1 - r^2}$$
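The RMS error formula can be confirmed on simulated data. The sketch below assumes NumPy; the data, the seed, and the variable names are hypothetical. It computes the prediction errors in standard coordinates directly and compares their root-mean-square with the square root of 1 − r².

```python
import numpy as np

rng = np.random.default_rng(2)                 # fixed seed for repeatability
x = rng.normal(size=1000)
y = -0.7 * x + rng.normal(size=1000)           # negatively related, with noise

# Standard coordinates (population standard deviation).
x_hat = (x - x.mean()) / x.std()
y_hat = (y - y.mean()) / y.std()
r = float(np.mean(x_hat * y_hat))

u = y_hat - r * x_hat                          # errors of the predictor r * x_hat
rms_error = float(np.sqrt(np.mean(u ** 2)))
predicted_rms = float(np.sqrt(1.0 - r ** 2))
```

The error is measured in standard coordinates, so an RMS error near 1 means the predictor does little better than always guessing the mean, and an RMS error near 0 means |r| is close to 1.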
See the error through simulation

https://rpsychologist.com/d3/correlation/
Example: Body Fat data

r = 0.513
Example: remove 2 more outliers

r = 0.556
Scatter plot
✺ Coupled with a heatmap to show a 3rd feature
Time Series Plot: Stock of Amazon
Heatmap
✺ Display a matrix of data via a gradient of color(s)
Summarization of 4 locations’ annual mean temperature by month
3D bar chart
✺ A transparent 3D bar chart is good for a small number of samples across categories
Assignments
✺ Quiz 1 opens at 4:30pm today on PL
✺ Finish reading Chapter 2 of the
textbook
✺ Work on the Week 2 module on Canvas
✺ Next time: Probability, a first look
Additional References
✺ Charles M. Grinstead and J. Laurie Snell,
“Introduction to Probability”
✺ Morris H. DeGroot and Mark J. Schervish,
“Probability and Statistics”
See you next time
