CS361 FA23 Lec3 Post

Probability and Statistics
for Computer Science
“Correlation is not Causation”

but Correlation is so beautiful!
Credit: wikipedia
Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 8.29.2023

Last time
✺ Mean
✺ Standard deviation
✺ Variance
✺ Standardizing data
[Handwritten review notes, partly illegible. They appear to cover: an example data set {x_i}, i in [1, 8], with values 1, 2, 3, 4, 5, 6, 7, 12; the mean as the value u that minimizes the sum of (x_i - u)^2; and the bound that at most N/k^2 items lie k or more standard deviations from the mean, for a positive number k.]
Q: Estimate the range of data in
standard coordinates
✺ The interval [-5, 5] covers x% of the data;
choose the closest estimate of x.
A. 80%
B. 99
C. 96
D. 96%
Standard coordinates: $\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$
Objectives
✺ Median, Percentile, Mode, IQR
✺ Scatter plots for relationships
✺ Correlation Coefficient
✺ Other Visualization for relationships
Heatmap, 3D bar, Time series plots
Median
✺ We first sort the data set {xi}
✺ Then if the number of items N is odd
median = middle item's value
if the number of items N is even
median = mean of middle 2 items'
values
Properties of Median
✺ Scaling data scales the median
median({k · xi }) = k · median({xi })
✺ Translating data translates the median
median({xi + c}) = median({xi }) + c
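The median rule and its two properties can be checked with a few lines of Python. This is a minimal sketch: the data values and the helper name `my_median` are illustrative assumptions, not part of the lecture.

```python
from statistics import median

data = [7, 1, 12, 3, 5, 2, 6, 4]   # made-up values, N = 8 (even)

def my_median(values):
    """Sort, then take the middle item (odd N) or the mean of the middle two (even N)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

m = my_median(data)
scaled = my_median([3 * x for x in data])     # scaling by k = 3
shifted = my_median([x + 10 for x in data])   # translating by c = 10
print(m, scaled, shifted)
print(m == median(data))   # agrees with the standard library
```

The scaled and shifted medians come out as 3 times and 10 more than the original median, as the two properties state.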

Percentile
✺ The kth percentile is the value such that
k% of the data items are less than or
equal to it
✺ Median is roughly the 50th percentile
Interquartile range
✺ iqr = (75th percentile) - (25th percentile)
✺ Scaling data scales the interquartile range
iqr({k · xi }) = |k| · iqr({xi })
✺ Translating data does NOT change the
interquartile range
iqr({xi + c}) = iqr({xi })
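A short sketch of the interquartile range and its scaling and translation properties, using Python's standard `statistics.quantiles`. The data values are made up, and the `"inclusive"` interpolation method is an assumption: libraries differ slightly in how they compute percentiles, so other tools may report slightly different quartiles.

```python
from statistics import quantiles

def iqr(values):
    """75th percentile minus 25th percentile (linear interpolation between sorted items)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 12]          # made-up values
base = iqr(data)
scaled = iqr([-2 * x for x in data])      # scaling by k = -2
shifted = iqr([x + 100 for x in data])    # translating by c = 100
print(base, scaled, shifted)
```

With k = -2 the iqr doubles rather than changing sign, which is why the property uses |k|.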
Box plots
✺ Boxplots
✺ Simpler than histogram
✺ Good for outliers
✺ Easier to use for comparison
[Figure: box plots of vehicle deaths by region; axis label: DEATH]
Data from
https://www2.stetson.edu/~jrasp/data.htm
Boxplots details, outliers
✺ How to define outliers? (the default)
[Figure: annotated box plot. Labels: outlier (> 1.5 iqr beyond the box), whisker (< 1.5 iqr beyond the box), box (interquartile range, iqr), median]
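The default 1.5 iqr rule can be sketched as follows. The function name `boxplot_outliers`, the data, and the choice of quartile method are assumptions for illustration; plotting libraries may compute the quartiles slightly differently.

```python
from statistics import quantiles

def boxplot_outliers(values, whisker=1.5):
    """Return items lying more than `whisker` * iqr below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - whisker * spread, q3 + whisker * spread
    return [v for v in values if v < low or v > high]

data = [1, 2, 3, 4, 5, 6, 7, 12]   # made-up values
print(boxplot_outliers(data))
```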
Sensitivity of summary statistics to
outliers
✺ mean and standard deviation are
very sensitive to outliers
✺ median and interquartile range are
not sensitive to outliers
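A quick way to see this sensitivity is to corrupt one item and recompute all four summaries. The data below are made up, and `pstdev` (dividing by N) is assumed as the standard deviation.

```python
from statistics import mean, median, pstdev, quantiles

def iqr(values):
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8]
dirty = clean[:-1] + [800]          # same data with one extreme outlier

for name, stat in [("mean", mean), ("std", pstdev), ("median", median), ("iqr", iqr)]:
    print(name, stat(clean), stat(dirty))
```

In this example the mean and standard deviation change by orders of magnitude, while the median and iqr do not move at all.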
Modes
✺ Modes are peaks in a histogram
✺ If there is more than one mode, we
should be curious as to why
Multiple modes
✺ We have seen
the “iris” data
which looks to
have several
peaks
Data: “iris” in R
Example: bimodal distribution
✺ Modes may
indicate
multiple
populations
Data: Erythrocyte cells in healthy humans
Piagnerelli, JCP 2007

Tails and Skews
Symmetric Histogram
mode, median, mean all on top of
one another
[Figure: symmetric histogram with left tail and right tail]
Left Skew, Right Skew
[Figure: a left-skewed and a right-skewed histogram, each marked with its mean, median, and mode and with its left and right tails]
Credit: Prof.Forsyth
Relationship between data features
✺ Example: Does the weight of people relate to
their height?
✺ x: HEIGHT, y: WEIGHT
Scatter plot
✺ Body Fat data set
Scatter plot
✺ Scatter plot with density
Scatter plot
✺ Outliers removed & standardized
Correlation seen from scatter plots
[Figure: three scatter plots. Zero correlation (no correlation): normalized heart rate vs. normalized body temperature. Positive correlation: weights vs. heights, outliers removed, normalized. Negative correlation: density vs. body fat.]
What kind of Correlation?
✺ Lines of code in a database and number of bugs
✺ Frequency of hand washing and number of germs
on your hands
✺ GPA and hours spent playing video games
✺ Earnings and happiness
Credit: Prof. David Varodayan

Correlation Coefficient
✺ Given a data set {(xi , yi )} consisting of
items (x1 , y1 ) ... (xN , yN ),
✺ Standardize the coordinates of each feature:
$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}, \quad \hat{y}_i = \frac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$
✺ Define the correlation coefficient as:
$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i$
Correlation Coefficient
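$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}, \quad \hat{y}_i = \frac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$
$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \mathrm{mean}(\{\hat{x}_i \hat{y}_i\})$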
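The definition translates almost line for line into code. In this minimal sketch the height and weight values are made up, and `pstdev` (the standard deviation that divides by N) is assumed so that the result matches the mean-of-products form.

```python
from statistics import mean, pstdev

def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def corr(xs, ys):
    """Mean of the products of the standardized coordinates."""
    xh, yh = standardize(xs), standardize(ys)
    return mean([a * b for a, b in zip(xh, yh)])

heights = [150, 160, 165, 172, 180, 185]   # made-up values
weights = [52, 58, 63, 70, 74, 82]         # made-up values
r = corr(heights, weights)
print(r)
```

Because both features are standardized first, the result carries no units.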
Q: Correlation Coefficient
✺ Which of the following describe(s) the
correlation coefficient correctly?
A. It’s unitless
B. It’s defined in standard coordinates
C. Both A & B
$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i$
A visualization of correlation
coefficient
https://rpsychologist.com/d3/correlation/
In a data set {(xi , yi )} consisting of items
(x1 , y1 ) ... (xN , yN ),
corr({(xi , yi )}) > 0 shows positive correlation
corr({(xi , yi )}) < 0 shows negative correlation

corr({(xi , yi )}) = 0 shows no correlation
Correlation seen from scatter plots
[Figure: three scatter plots. Zero correlation (no correlation): normalized heart rate vs. normalized body temperature. Positive correlation: weights vs. heights, outliers removed, normalized. Negative correlation: density vs. body fat.]

The Properties of Correlation
Coefficient
✺ The correlation coefficient is symmetric
corr({(xi , yi )}) = corr({(yi , xi )})
✺ Translating the data does NOT change the
correlation coefficient
The Properties of Correlation
Coefficient
✺ Scaling the data may change the sign of
the correlation coefficient
corr({(a xi + b, c yi + d)})
= sign(a c)corr({(xi , yi )})
The Properties of Correlation
Coefficient
✺ The correlation coefficient is bounded
within [-1, 1]
$\mathrm{corr}(\{(x_i, y_i)\}) = 1$ if and only if $\hat{x}_i = \hat{y}_i$
$\mathrm{corr}(\{(x_i, y_i)\}) = -1$ if and only if $\hat{x}_i = -\hat{y}_i$
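These properties are easy to confirm numerically. The sketch below uses made-up data and an assumed helper `corr` built on the population standard deviation; it is a check on one example, not a proof.

```python
from statistics import mean, pstdev

def corr(xs, ys):
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

xs = [1.0, 2.0, 4.0, 7.0, 11.0]   # made-up values
ys = [2.0, 1.0, 5.0, 6.0, 12.0]   # made-up values
r = corr(xs, ys)

symmetric = corr(ys, xs)                                            # swap the features
translated = corr([x + 5 for x in xs], [y - 3 for y in ys])         # b = 5, d = -3
flipped = corr([-2 * x + 1 for x in xs], [3 * y + 4 for y in ys])   # sign(a c) = -1
perfect = corr(xs, [3 * x + 4 for x in xs])                         # y is a line in x, slope > 0
print(r, symmetric, translated, flipped, perfect)
```

Here `flipped` equals `-r` because a and c have opposite signs, and `perfect` equals 1 because the standardized coordinates coincide.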

Q. Which of the following has correlation
coefficient equal to 1?
[Figure: three scatter plots (left, middle, right), each with axes X and Y]
A. Left and right
B. Left
C. Middle
[Handwritten note, mostly illegible: review at home and finish. It appears to work through the case y = ax in standard coordinates.]
Concept of Correlation Coefficient’s
bound
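✺ The correlation coefficient can be
written as
$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \sum_{i=1}^{N} \frac{\hat{x}_i}{\sqrt{N}} \frac{\hat{y}_i}{\sqrt{N}}$
✺ It’s the inner product of two vectors
$\left( \frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}} \right)$ and $\left( \frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}} \right)$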
Inner product
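✺ Inner product’s geometric meaning:
$\nu_1 \cdot \nu_2 = |\nu_1| \, |\nu_2| \cos(\theta)$
[Figure: vectors ν1 and ν2 with angle θ between them]
✺ Lengths of both vectors
$\nu_1 = \left( \frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}} \right)$, $\nu_2 = \left( \frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}} \right)$
are 1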
Bound of correlation coefficient
|corr({(xi , yi )})| = |cos(θ)| ≤ 1

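[Figure: vectors ν1 and ν2 with angle θ between them]
$\nu_1 = \left( \frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}} \right)$, $\nu_2 = \left( \frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}} \right)$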
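The inner-product view can be checked on a small example: build the two vectors from standardized data, confirm that each has length 1, and confirm that their inner product is the cosine of the angle between them. The data are made up and `pstdev` (dividing by N) is assumed.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

xs = [1.0, 2.0, 4.0, 7.0, 11.0]   # made-up values
ys = [2.0, 1.0, 5.0, 6.0, 12.0]   # made-up values
n = len(xs)

v1 = [x / math.sqrt(n) for x in standardize(xs)]
v2 = [y / math.sqrt(n) for y in standardize(ys)]

dot = sum(a * b for a, b in zip(v1, v2))    # the correlation coefficient
len1 = math.sqrt(sum(a * a for a in v1))    # sqrt(mean of squared standardized x)
len2 = math.sqrt(sum(b * b for b in v2))
cos_theta = dot / (len1 * len2)
print(dot, len1, len2, cos_theta)
```

Each length is 1 because the squared length is the mean of the squared standardized values, which is the variance of standardized data.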
The Properties of Correlation
Coefficient
✺ Symmetric
✺ Translation invariant
✺ Scaling may only change the sign
✺ Bounded within [-1, 1]
Using correlation to predict
✺ Caution! Correlation is NOT Causation
Credit: Tyler Vigen

How do we go about the prediction?
✺ Outliers removed & standardized
[Figure: standardized scatter plot with a handwritten mark for the predicted value y^p]
Using correlation to predict
✺ Given a correlated data set {(xi , yi )}
we can predict a value $y_0^p$ that goes
with a value $x_0$
✺ In standard coordinates $\{(\hat{x}_i, \hat{y}_i)\}$
we can predict a value $\hat{y}_0^p$ that goes
with a value $\hat{x}_0$

Q:
✺ Which coordinates will you use for the
predictor using correlation?
A. Standard coordinates
B. Original coordinates
C. Either
Linear predictor and its error
✺ We will assume that our predictor is linear
$\hat{y}^p = a \hat{x} + b$
✺ We denote the prediction at each $\hat{x}_i$ in the data
set as $\hat{y}_i^p$
$\hat{y}_i^p = a \hat{x}_i + b$
✺ The error in the prediction is denoted $u_i$
$u_i = \hat{y}_i - \hat{y}_i^p = \hat{y}_i - a \hat{x}_i - b$
Require the mean of error to be zero
We would try to make the mean of error equal to
zero so that it is also centered around 0 as
the standardized data:
$\mathrm{mean}(\{u_i\}) = \mathrm{mean}(\{\hat{y}_i - \hat{y}_i^p\}) = \mathrm{mean}(\{\hat{y}_i - a \hat{x}_i - b\})$
$= \mathrm{mean}(\{\hat{y}_i\}) - a \, \mathrm{mean}(\{\hat{x}_i\}) - b$
$= -b = 0$
Recall: $\mathrm{mean}(\{k x_i + b\}) = k \, \mathrm{mean}(\{x_i\}) + b$, and $\mathrm{mean}(\{\hat{x}_i\}) = \mathrm{mean}(\{\hat{y}_i\}) = 0$ in standard coordinates.
So $b = 0$.
Require the variance of error to be
minimal
Minimize $\mathrm{var}(\{u_i\})$ over $a$:
$\mathrm{var}(\{u_i\}) = \mathrm{mean}(\{(u_i - \mathrm{mean}(\{u_i\}))^2\}) = \mathrm{mean}(\{u_i^2\})$ (since $\mathrm{mean}(\{u_i\}) = 0$)
$= \mathrm{mean}(\{(\hat{y}_i - \hat{y}_i^p)^2\}) = \mathrm{mean}(\{(\hat{y}_i - a \hat{x}_i)^2\})$ (since $b = 0$)
$= \mathrm{mean}(\{\hat{y}_i^2\}) - 2a \, \mathrm{mean}(\{\hat{x}_i \hat{y}_i\}) + a^2 \, \mathrm{mean}(\{\hat{x}_i^2\})$
$= 1 - 2ar + a^2$
where $r = \mathrm{corr}(\{(x_i, y_i)\})$ and $\mathrm{mean}(\{\hat{x}_i^2\}) = \mathrm{var}(\{\hat{x}_i\}) = 1$, $\mathrm{mean}(\{\hat{y}_i^2\}) = \mathrm{var}(\{\hat{y}_i\}) = 1$.
Setting the derivative to zero:
$\frac{d \, \mathrm{var}(\{u_i\})}{d a} = -2r + 2a = 0$
So $a = r$.
Result: in $\hat{y}^p = a \hat{x} + b$, we have $a = r$ and $b = 0$.
Here is the linear predictor!
$\hat{y}^p = r \hat{x}$
where $r$ is the correlation coefficient
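As a numerical sanity check on this result, one can fix b = 0 and scan candidate slopes a for the one that gives the smallest error variance. This is only a sketch under stated assumptions: the data are made up, the grid step of 0.01 is arbitrary, and `error_variance` is a hypothetical helper name.

```python
from statistics import mean, pstdev, pvariance

def standardize(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

heights = [150, 160, 165, 172, 180, 185]   # made-up values
weights = [52, 58, 63, 70, 74, 82]         # made-up values
xh, yh = standardize(heights), standardize(weights)
r = mean([x * y for x, y in zip(xh, yh)])

def error_variance(a):
    """Variance of the errors u_i = y_hat_i - a * x_hat_i (b = 0)."""
    return pvariance([y - a * x for x, y in zip(xh, yh)])

grid = [i / 100 for i in range(-100, 101)]   # candidate slopes in [-1, 1]
best_a = min(grid, key=error_variance)
print(r, best_a)
```

The best slope on the grid lands on the grid point nearest to r, as expected from the quadratic form of the error variance.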
Prediction Formula
✺ In standard coordinates
$\hat{y}_0^p = r \hat{x}_0$ where $r = \mathrm{corr}(\{(x_i, y_i)\})$
✺ In original coordinates
$\frac{y_0^p - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})} = r \, \frac{x_0 - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$
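The prediction formula in original coordinates amounts to three steps: standardize x0, multiply by r, then undo the standardization for y. A minimal sketch, with made-up data and an assumed function name `predict`:

```python
from statistics import mean, pstdev

def corr(xs, ys):
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])

def predict(x0, xs, ys):
    """Predict y for x0 using the correlation-based linear predictor."""
    r = corr(xs, ys)
    x0_hat = (x0 - mean(xs)) / pstdev(xs)      # x0 in standard coordinates
    y0_hat = r * x0_hat                        # prediction in standard coordinates
    return mean(ys) + pstdev(ys) * y0_hat      # back to original coordinates

heights = [150, 160, 165, 172, 180, 185]   # made-up values
weights = [52, 58, 63, 70, 74, 82]         # made-up values
print(predict(175, heights, weights))
```

Note that predicting at the mean of x always returns the mean of y, since the standardized input is then 0.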
Root-mean-square (RMS) prediction
error
✺ Given $\mathrm{var}(\{u_i\}) = 1 - 2ar + a^2$ and $a = r$,
$\mathrm{var}(\{u_i\}) = 1 - r^2$
✺ $\text{RMS error} = \sqrt{\mathrm{mean}(\{u_i^2\})} = \sqrt{\mathrm{var}(\{u_i\})} = \sqrt{1 - r^2}$
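The RMS error formula can be compared against the errors computed directly from data. In this sketch the data are made up and the errors are measured in standard coordinates, as in the derivation.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

heights = [150, 160, 165, 172, 180, 185]   # made-up values
weights = [52, 58, 63, 70, 74, 82]         # made-up values
xh, yh = standardize(heights), standardize(weights)

r = mean([x * y for x, y in zip(xh, yh)])
errors = [y - r * x for x, y in zip(xh, yh)]    # u_i with a = r, b = 0
rms = math.sqrt(mean([u * u for u in errors]))
print(r, rms, math.sqrt(1 - r * r))
```

The closer |r| is to 1, the smaller the RMS error; at r = 0 the predictor does no better than always guessing the mean.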
See the error through simulation
https://rpsychologist.com/d3/correlation/
Example: Body Fat data
r = 0.513
Example: remove 2 more outliers
r = 0.556
Scatter plot
✺ Coupled with
heatmap to
show a 3rd
feature
Time Series Plot: Stock of Amazon
Heatmap
✺ Display matrix of data via gradient of color(s)
Summarization of 4 locations’ annual mean
temperature by month
3D bar chart
✺ Transparent
3D bar chart
is good for
small # of
samples
across
categories
Assignments
✺ Quiz1 open at 4:30pm today on PL
✺ Finish reading Chapter 2 of the
textbook
✺ Work on the Week 2 module on Canvas
✺ Next time: Probability a first look
Additional References
✺ Charles M. Grinstead and J. Laurie Snell
“Introduction to Probability”
✺ Morris H. DeGroot and Mark J. Schervish
“Probability and Statistics”
See you next time