CS361 FA23 Lec3 Post
Credit: wikipedia
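[Handwritten review notes: for a data set $\{x_i\}$ with $i \in [1, N]$, the mean minimizes the sum of squared deviations, $\mathrm{mean}(\{x_i\}) = \arg\min_{\mu} \sum_{i} (x_i - \mu)^2$. For any positive number $k$, at most $N/k^2$ items lie $k$ or more standard deviations away from the mean, that is, outside the range from mean $-\,k\,$std to mean $+\,k\,$std.]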
Q: Estimate the range of data in
standard coordinates
✺ The interval [-5, 5] covers x% data,
choose the closest estimate of x.
A. 80%
B. 99
C. 96
D. 96%
Standard coordinates:
$$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$$
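One way to reason about this estimate is sketched below. It assumes the intended tool is the standard Chebyshev-type bound (at most a fraction 1/k² of the items lie k or more standard deviations from the mean). The helper name `min_fraction_within` is hypothetical and used only for illustration.

```python
def min_fraction_within(k):
    """Lower bound on the fraction of data inside [-k, k] in standard coordinates.

    Assumption: the Chebyshev-type bound, i.e. at most 1/k**2 of the items
    lie k or more standard deviations away from the mean.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # The bound is vacuous (0) for k <= 1.
    return max(0.0, 1.0 - 1.0 / k**2)


print(min_fraction_within(5))  # bound for the interval [-5, 5]
```

The bound is a guarantee for any data set, so real data often concentrates much more tightly than it suggests.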
Objectives
✺ Median, Percentile, Mode, IQR
✺ Scatter plots for relationships
✺ Correlation Coefficient
✺ Other visualization for relationships: heatmap, 3D bar, time series plots
Median
✺ We first sort the data set {xi}
✺ Then if the number of items N is odd
median = middle item's value
if the number of items N is even
median = mean of middle 2 items' values
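The sort-then-pick-the-middle rule above can be sketched in a few lines. This is a minimal illustration, not a replacement for a library routine such as `statistics.median`.

```python
def median(values):
    """Median of a non-empty sequence, following the rule on this slide."""
    ordered = sorted(values)          # step 1: sort the data set
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # N odd: the middle item's value
        return ordered[mid]
    # N even: mean of the two middle items' values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))       # → 2
print(median([4, 1, 3, 2]))    # → 2.5
```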
Properties of Median
✺ Scaling data scales the median
median({k · xi }) = k · median({xi })
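A quick numerical check of the scaling property, using made-up values that are not from the lecture data. The second half illustrates a related fact, that a single extreme value moves the mean far more than the median.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]      # made-up values for illustration
k = 3

# Scaling every item by k scales the median by k.
scaled_median = statistics.median([k * x for x in data])
print(scaled_median == k * statistics.median(data))

# Replace the largest item with an extreme value.
with_outlier = data[:-1] + [700]
print(statistics.median(data), statistics.median(with_outlier))  # median is unchanged
print(statistics.mean(data), statistics.mean(with_outlier))      # mean shifts a lot
```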
[Figure: plot with axis label "DEATH"]
✺ Good for outliers
✺ Easier to use for comparison
Data from
https://www2.stetson.edu/~jrasp/data.htm
Boxplots details, outliers
✺ How to define outliers?
[Figure: boxplot annotated with "Outlier", "Whisker", and "> 1.5 iqr"]
Data: “iris” in R
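The "> 1.5 iqr" rule can be sketched as follows. This assumes the common boxplot convention that an outlier lies more than 1.5 × IQR below the first quartile or above the third quartile. Quartile conventions differ between tools, so the `method="inclusive"` choice and the helper name `iqr_outliers` are assumptions for illustration, and the sample values are made up rather than taken from the iris data.

```python
import statistics


def iqr_outliers(values, whisker=1.5):
    """Return the items lying more than `whisker` * IQR outside the quartiles."""
    # statistics.quantiles sorts the data internally (Python 3.8+).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # made-up data
print(iqr_outliers(sample))                 # → [100]
```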
Example: bimodal distribution
✺ Modes may indicate multiple populations
[Figure: histogram with two modes; labels: "left tail", "right tail", "medium"]
Credit: Prof.Forsyth
Relationship between data features
✺ Example: Does the weight of people relate to their height?
✺ x: HEIGHT, y: WEIGHT
Scatter plot
✺ Body Fat data set
Scatter plot
✺ Scatter plot with density
Scatter plot
✺ Outliers removed & standardized
Correlation seen from scatter plots
[Figure: three scatter plots with density, titled "Zero Correlation", "Positive correlation", and "Negative correlation"; axis labels include "Normalized body temperature", "Heights, outliers removed, normalized", and "Body fat"]
What kind of Correlation?
✺ Lines of code in a database and number of bugs
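$$\hat{x}_i = \frac{x_i - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}, \qquad \hat{y}_i = \frac{y_i - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})}$$
$$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \mathrm{mean}(\{\hat{x}_i \hat{y}_i\})$$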
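The definition above can be sketched directly in code. This is a minimal illustration that assumes `std` divides by N (the population standard deviation), which is what makes the standardized values have variance 1. The function names are chosen here for illustration only.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    # Assumption: population standard deviation (divide by N, not N - 1).
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))


def standardize(values):
    """Convert values to standard coordinates: subtract the mean, divide by std."""
    m, s = mean(values), std(values)
    if s == 0:
        raise ValueError("standard coordinates are undefined when std is 0")
    return [(v - m) / s for v in values]


def corr(xs, ys):
    """Correlation coefficient: mean of the products of standardized values."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    xh, yh = standardize(xs), standardize(ys)
    return mean([a * b for a, b in zip(xh, yh)])


xs = [1, 2, 3, 4]                       # made-up data
print(corr(xs, [2, 4, 6, 8]))           # perfectly increasing relationship
print(corr(xs, [8, 6, 4, 2]))           # perfectly decreasing relationship
```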
Q: Correlation Coefficient
✺ Which of the following describe(s)
correlation coefficient correctly?
A. It’s unitless
B. It’s defined in standard coordinates
C. Both A & B
$$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i$$
A visualization of correlation coefficient
https://rpsychologist.com/d3/correlation/
In a data set {(xi , yi )} consisting of items
(x1 , y1 ) ... (xN , yN ),
[Figure: scatter plots with density; the choices below refer to the left, middle, and right panels]
A. Left and right
B. Left
C. Middle
[Handwritten note: review and finish it at home; the sketch works with y = ax, mean(x), and std(x).]
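Concept of Correlation Coefficient’s bound
✺ The correlation coefficient can be written as
$$\mathrm{corr}(\{(x_i, y_i)\}) = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i \hat{y}_i = \sum_{i=1}^{N} \frac{\hat{x}_i}{\sqrt{N}} \frac{\hat{y}_i}{\sqrt{N}}$$
✺ It’s the inner product of two vectors
$$\left( \frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}} \right) \quad \text{and} \quad \left( \frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}} \right)$$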
Inner product
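✺ Inner product’s geometric meaning:
$$\nu_1 \cdot \nu_2 = |\nu_1| \, |\nu_2| \cos(\theta)$$
where $\theta$ is the angle between $\nu_1$ and $\nu_2$
✺ Lengths of both vectors
$$\nu_1 = \left( \frac{\hat{x}_1}{\sqrt{N}}, \ldots, \frac{\hat{x}_N}{\sqrt{N}} \right), \qquad \nu_2 = \left( \frac{\hat{y}_1}{\sqrt{N}}, \ldots, \frac{\hat{y}_N}{\sqrt{N}} \right)$$
are 1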
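The claim that both lengths are 1 follows in one line, sketched here under the assumption that std is computed with a 1/N factor so that standardized data has mean 0 and variance 1:

```latex
|\nu_1|^2 = \sum_{i=1}^{N} \left( \frac{\hat{x}_i}{\sqrt{N}} \right)^2
          = \frac{1}{N} \sum_{i=1}^{N} \hat{x}_i^2
          = \mathrm{mean}(\{\hat{x}_i^2\})
          = \mathrm{var}(\{\hat{x}_i\})
          = 1
```

The same argument applies to $\nu_2$ with $\hat{y}_i$ in place of $\hat{x}_i$.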
Bound of correlation coefficient
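✺ The correlation coefficient is the inner product of two vectors of length 1, so it equals $\cos(\theta)$ and
$$-1 \le \mathrm{corr}(\{(x_i, y_i)\}) \le 1$$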
Using correlation to predict
✺ Given a correlated data set $\{(x_i, y_i)\}$ we can predict a value $y_0^p$ that goes with a value $x_0$
✺ In standard coordinates $\{(\hat{x}_i, \hat{y}_i)\}$ we can predict a value $\hat{y}_0^p$ that goes with a value $\hat{x}_0$
A. Standard coordinates
B. Original coordinates
C. Either
Linear predictor and its error
✺ We will assume that our predictor is linear
$$\hat{y}^p = a \hat{x} + b$$
✺ We denote the prediction at each $\hat{x}_i$ in the data set as $\hat{y}_i^p$
$$\hat{y}_i^p = a \hat{x}_i + b$$
✺ The error in the prediction is denoted $u_i$
$$u_i = \hat{y}_i - \hat{y}_i^p = \hat{y}_i - a \hat{x}_i - b$$
Require the mean of error to be zero
We would try to make the mean of error equal to
zero so that it is also centered around 0 as
the standardized data:
$$\mathrm{mean}(\{u_i\}) = \mathrm{mean}(\{\hat{y}_i - a \hat{x}_i - b\}) = \mathrm{mean}(\{\hat{y}_i\}) - a \, \mathrm{mean}(\{\hat{x}_i\}) - b = -b = 0$$
because standardized data has mean 0. So $b = 0$.
Require the variance of error to be minimal
With $b = 0$ (so that the mean of the error is zero), $u_i = \hat{y}_i - a \hat{x}_i$ and
$$\mathrm{var}(\{u_i\}) = \mathrm{mean}(\{(u_i - \mathrm{mean}(\{u_i\}))^2\}) = \mathrm{mean}(\{u_i^2\})$$
$$= \mathrm{mean}(\{(\hat{y}_i - a \hat{x}_i)^2\})$$
$$= \mathrm{mean}(\{\hat{y}_i^2\}) - 2a \, \mathrm{mean}(\{\hat{x}_i \hat{y}_i\}) + a^2 \, \mathrm{mean}(\{\hat{x}_i^2\})$$
$$= 1 - 2ar + a^2$$
where $r = \mathrm{corr}(\{(x_i, y_i)\})$, and $\mathrm{mean}(\{\hat{y}_i^2\}) = \mathrm{var}(\{\hat{y}_i\}) = 1$ (likewise for $\hat{x}_i$).
To find $\arg\min_a \mathrm{var}(\{u_i\})$, set the derivative to zero:
$$\frac{d \, \mathrm{var}(\{u_i\})}{da} = -2r + 2a = 0 \quad \Rightarrow \quad a = r$$
So in $\hat{y}^p = a \hat{x} + b$ we have $a = r$ and $b = 0$.
Here is the linear predictor!
$$\hat{y}^p = r \hat{x}$$
where $r$ is the correlation coefficient
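The result a = r can be checked numerically. The sketch below uses made-up data and a coarse grid search over candidate slopes; the helper names are hypothetical, and std is assumed to divide by N.

```python
import math


def mean(values):
    return sum(values) / len(values)


def standardize(values):
    m = mean(values)
    s = math.sqrt(mean([(v - m) ** 2 for v in values]))  # population std (1/N)
    return [(v - m) / s for v in values]


def error_variance(a, xh, yh):
    """Variance of u_i = yh_i - a * xh_i (with b = 0) in standard coordinates."""
    u = [y - a * x for x, y in zip(xh, yh)]
    mu = mean(u)
    return mean([(v - mu) ** 2 for v in u])


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # made-up data
ys = [2.1, 2.9, 4.2, 3.8, 6.1, 5.9]
xh, yh = standardize(xs), standardize(ys)
r = mean([x * y for x, y in zip(xh, yh)])

# Try slopes from -1 to 1 in steps of 0.01 and keep the one with least variance.
candidates = [i / 100 for i in range(-100, 101)]
best_a = min(candidates, key=lambda a: error_variance(a, xh, yh))

print(r, best_a)                               # agree up to the grid resolution
print(error_variance(r, xh, yh), 1 - r**2)     # two expressions for the minimum
```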
Prediction Formula
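✺ In standard coordinates
$$\hat{y}_0^p = r \hat{x}_0 \quad \text{where } r = \mathrm{corr}(\{(x_i, y_i)\})$$
✺ In original coordinates
$$\frac{y_0^p - \mathrm{mean}(\{y_i\})}{\mathrm{std}(\{y_i\})} = r \, \frac{x_0 - \mathrm{mean}(\{x_i\})}{\mathrm{std}(\{x_i\})}$$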
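The prediction in original coordinates can be sketched as three steps: standardize x0, multiply by r, and map back to the units of y. The function names and the data are made up for illustration, and std is assumed to divide by N.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))  # population std (1/N)


def corr(xs, ys):
    mx, my, sx, sy = mean(xs), mean(ys), std(xs), std(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])


def predict(x0, xs, ys):
    """Predict the y value that goes with x0, in original coordinates."""
    r = corr(xs, ys)
    x0_hat = (x0 - mean(xs)) / std(xs)     # x0 in standard coordinates
    y0_hat = r * x0_hat                    # prediction in standard coordinates
    return y0_hat * std(ys) + mean(ys)     # back to original coordinates


xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]                          # exactly y = 2x + 1, so r = 1
print(predict(5, xs, ys))                  # close to 11
```

When the data is only weakly correlated, r is near 0 and every prediction is pulled toward mean({yi}).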
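Root-mean-square (RMS) prediction error
✺ Given $\mathrm{var}(\{u_i\}) = 1 - 2ar + a^2$ and $a = r$
$$\mathrm{var}(\{u_i\}) = 1 - r^2$$
✺ The RMS error is
$$\text{RMS error} = \sqrt{\mathrm{mean}(\{u_i^2\})} = \sqrt{\mathrm{var}(\{u_i\})} = \sqrt{1 - r^2}$$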
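A small numerical check of the RMS formula in standard coordinates, using made-up height-like and weight-like values (not the Body Fat data). The variable names are illustrative, and std is assumed to divide by N.

```python
import math


def mean(values):
    return sum(values) / len(values)


def standardize(values):
    m = mean(values)
    s = math.sqrt(mean([(v - m) ** 2 for v in values]))  # population std (1/N)
    return [(v - m) / s for v in values]


heights = [150.0, 160.0, 165.0, 170.0, 180.0, 185.0]    # made-up data
weights = [52.0, 61.0, 58.0, 72.0, 70.0, 88.0]

xh, yh = standardize(heights), standardize(weights)
r = mean([x * y for x, y in zip(xh, yh)])

# Prediction error of the linear predictor y_hat^p = r * x_hat.
u = [y - r * x for x, y in zip(xh, yh)]
rms = math.sqrt(mean([v ** 2 for v in u]))

print(rms, math.sqrt(1 - r**2))    # the two values should match
```

The closer |r| is to 1, the smaller the RMS error; at r = 0 the error is 1, the same as always predicting the mean.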
See the error through simulation
https://rpsychologist.com/d3/correlation/
Example: Body Fat data
r = 0.513
Example: remove 2 more outliers
r = 0.556
Scatter plot
✺ Coupled with heatmap to show a 3rd feature
Time Series Plot: Stock of Amazon
Heatmap
✺ Display matrix of data via gradient of color(s)
See You!