CS361 FA23 Lec2 Post

Probability and Statistics ì
for Computer Science
“The statement that “The

average US family has 2.6
children” invites mockery” –
Prof. Forsyth reminds us
about critical thinking
Credit: wikipedia
Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 8.24.2023

Last lecture
✺ Welcome/Orientation
✺ Big picture of the contents
✺ Lecture 1 - Data Visualization &
Summary (I)
Warm up question:
✺ What kind of data is a letter grade?
Ordinal.
✺ What do you ask for usually about the
stats of an exam with numerical
scores?
Objectives
✺ Simple visualization of 1-D data
✺ Grasp Summary Statistics
✺ Learn more Data Visualization for
Relationships
Simple Visualization of Data
✺ General principles
✺ Bar chart
✺ Histogram
✺ Conditional histogram
✺ General principles
Must not mislead or distort;
Aesthetically pleasing;
Clear, Attractive, Convincing;
Show message/significance.
✺ Bar chart
A set of bars that
are organized
by categorical
or ordinal feature
Data: “mtcars”
An example of good, ugly, bad, wrong
The difference a b ugly

between good,
5 5
4 4
ugly, bad and
value
3
value
3
2 2
wrong 1
0
1
0
visualization
A B C A B C
c bad d wrong
a – good 4
3
5
4 4
b – ugly 3 3
value
2
2 2
c – bad 1
0
1
0
1
0
d - wrong
A B C A B C
Q: Is this a good bar chart?
A. Yes
I
B. No
How about using a color scale
Visualizing Data with Histogram
✺ Histogram
A set of bars that
are organized
by bins that
contains
numerical data
(discrete or
continuous)
Data: “iris”
Visualizing Data with Histogram (II)
✺ Conditional
histogram
S
Histogram I
/
/
generated by 1
W
I
I
'
I /
/
subsets of the
I /
, i 1
W I
/
...
9
-
data
/
/
11. - I
- /
/ /
...
,
/
"
/
"I -
iii,
/
/ I
/
I I " I
/
/
W
I
/ ! ↳
,
,
/
I I S
Y 1,
I
i
S
! i I
↳.., /
!! -
Data: “iris”
, - I E
/ /
/
Which group has the higher total scores?
I
W
!
7
I I
I I
W
W
I
-
I
I
I
:
I
I
I
I I 2
W ↳
CS361 Spring 2021

Which group has the higher total scores?
i
7
I I
.........
.
Summarizing 1D continuous data
For a data set {x} or annotated as {xi}, we
summarize with:
✺ Location Parameters
Mean Median Mode
✺ Scale parameters
Std variance IQR
Summarizing 1D continuous data
✺ Mean
1 !N
mean({xi }) = xi
N i=1
It’s the centroid of the data geometrically,

by identifying the data set at that point, you find
the center of balance.
32:3 it [1,8]
9x,3 =
1, 2, 3, 4, 5, 6, 7, 12
I
Mean((x:7) 5
=
X
I -
i2;456
-
S
7 12
p
Properties of the mean
Scaling data scales the mean
mean(((xi c3)
+
3:33
=K. means
+
C
mean({k · xi }) = k · mean({xi })
TranslaAng the data translates the mean
mean({xi + c}) = mean({xi }) + c

Less obvious properties of the mean
The signed distances from the mean
sum to 0 !
N
(xi − mean({xi })) = 0
i=1
The mean minimizes the sum of the

squared distance from any real value
!
N
argmin (xi − µ)2 = mean({xi })
µ
i=1
Proof:I(x: - mean (3x;3)) 0 =
i 1
=
LHS xi
= - mean (3x:3)
i I
=
I
i=
- [xi -N:mean (xi)

i
=
1
N
i N- H.
= ↳xi- -
i
N
N N
- -
0
-
I Xi 2 X.
-
-
i .
L L
Proof: Argmin((x: -n() =
e mean(2:3) i1
=
the
Argument es that minimizes
function that follows &
LHS=Argmin't) e
I
N
e)2
f
e (xi
=
-
=
i1
=
Eh(u) gre
N
=
I
=
i 1
=
i 1
=
X
smooth
b
satisfies 0
=
for a
function.
xY x2
*
y
=
argmin(y' =
?
x
dy
-= 2x 0
=
dx
i= 0
e mean(2:3) i1
=
f
S
find
g2
=0 to le
Set
del h =
the chain rule

using
at=
del ch =
=
d2
=
n
g2;g
= -
-
xi - M
Bu -I
i1
=
eg.( 1) -
2(
=
-
2g) 0
=
e mean(2:3)
i1 =
3 2g 0
=
2 z(xi u)
=
0
- -
[(xi )
0
=
-
: N.M
=
-
0
d+(M)
-
2
du
Y
i E
=
Q1:
What is the answer for
mean({mean({xi})}) ?
I
A. mean({xi}) B. unsure C. 0
Standard Deviation (σ)
The standard deviaAon
!
"
"1 $ N
std({xi }) = # (xi − mean({xi }))2
N i=1
!
= mean({(xi − mean({xi }))2 })
How much the
.
data speeds
out ur t mean
!
↑ !
X X
- ·- X X
0 R
Std=
* i1
=
di
Q2. Can a standard deviation of a dataset
be -1?
A. YES
I
B. NO
Properties of the standard deviation
Scaling data scales the standard deviaAon
std({k · xi }) = |k| · std({xi })
↑
TranslaAng the data does NOT change the

standard deviaAon
std({xi + c}) = std({xi })

Standard deviation: Chebyshev’s
inequality (1 look)
st
units of
N
At most items are k standard
k2 X
deviaAons (σ) away from the mean

Rough jusAﬁcaAon: Assume mean =0
N
Hot
N− 2 Hot
0.5N k zewos 0.5N
·K8
0
X
k2 k2
y
−kσ kσ
!
1 N 2 N
std = [(N − )0 + 2 (kσ)2 ] = σ
N k k
Variance 2
(σ )
✺ Variance = (standard deviation)2
1 !N
var({xi }) = (xi − mean({xi }))2
W
N i=1
✺ Scaling and translating similar to standard

deviation var({k · xi }) = k 2 · var({xi })
var({xi + c}) = var({xi })
Q3: Standard deviation
✺ What is the value of
std({mean({xi})}) ?
I
A. 0 B. 1 C. unsure
Standard Coordinates/normalized
data
✺ The mean tells where the data set is and the
standard deviation tells how spread out it is.
If we are interested only in comparing the
shape, we could
define: xi − mean({xi })
x!i =
std({xi })
✺ We say {x!i } is in standard coordinates
Q4: Mean of standard coordinates
✺ mean( {x!i } ) is:
A. 1 B. 0 C. unsure
xi − mean({xi })
x!i =
std({xi })
Q5: Standard deviation (σ) of
standard coordinates
✺ Std( {x!i } ) is:
H
A. 1 B. 0 C. unsure
xi − mean({xi })
x!i =
std({xi })
Q6: Variance of standard
coordinates
✺ Variance of {x!i } is:
I
A. 1 B. 0 C. unsure
xi − mean({xi })
x!i =
std({xi })
Q7: Estimate the range of data in
standard coordinates -
✺ Estimate as close as possible, 90% data

-
⑰
N
-
-
is within:
&
-
A. [-10, 10]
1/
/
--
K/ L
- is 5
I S
k -
1
=
C
B. [-100, 100]
Yk 2-
↓
10 Tr
C. [-1, 1]
xi − mean({xi })
-
x!i =
D. [-4, 4] std({xi })
E. others
Standard Coordinates/normalized data to
μ=0, σ=1, σ2=1
✺ Data in standard coordinates always has
mean = 0; standard deviation =1;
variance = 1.
✺ Such data is unit-less, plots based on this
sometimes are more comparable
✺ We see such normalization very often in
statistics
Median
✺ We first sort the data set {xi}
✺ Then if the number of items N is odd
median = middle item's value
if the number of items N is even
median = mean of middle 2 items'
values
Properties of Median
✺ Scaling data scales the median
median({k · xi }) = k · median({xi })
✺ Translating data translates the median
median({xi + c}) = median({xi }) + c

Percentile
✺ kth percentile is the value relative to

which k% of the data items have smaller
or equal numbers
✺ Median is roughly the 50th percentile
(1, 2, 3, 4. 5
5
/
6,7,12}
75th percentile =?
Interquartile range
✺ iqr = (75th percentile) - (25th percentile)
✺ Scaling data scales the interquartile range
iqr({k · xi }) = |k| · iqr({xi })
✺ Translating data does NOT change the

interquartile range
iqr({xi + c}) = iqr({xi })
Box plots
✺ Boxplots Vehicle death by region
✺ Simpler than
histogram
DEATH
✺ Good for outliers
✺ Easier to use
for comparison
Data from
https://www2.stetson.edu/~jrasp/data.htm
Boxplots details, outliers
✺ How to Outlier
define
> 1.5 iqr
outliers? Whisker
(the default) Box

Interquartile
Median Range (iqr)
< 1.5 iqr
Sensitivity of summary statistics to
outliers
✺ mean and standard deviation are
very sensitive to outliers
✺ median and interquartile range are
not sensitive to outliers
Modes
✺ Modes are peaks in a histogram
✺ If there are more than 1 mode, we
should be curious as to why
Multiple modes
✺ We have seen
the “iris” data
which looks to
have several
peaks
Data: “iris” in R
Example Bi-modes distribution
✺ Modes may
indicate
multiple
populations
Data: Erythrocyte cells in

healthy humans
Piagnerelli, JCP 2007

Tails and Skews
Symmetric Histogram
mode, median, mean all on top of
one another
left right
tail tail
Left Skew Right Skew

mean mean
mode
mode
medium medium
left right left right

tail tail tail tail
Credit: Prof.Forsyth
Looking at relationships in data
✺ Finding relationships between
features in a data set or many data
sets is one of the most important
tasks in data science
Heatmap
✺ Display matrix of data via gradient of color(s)
Summarization of 4 locations’ annual mean

temperature by month
3D bar chart
✺ Transparent
3D bar chart
is good for
small # of
samples
across
categories
Relationship between data feature
and time
✺ Example: How does Amazon’s stock change over
1 years?
take out the pair of
features
x: Day
y: AMZN
Relationship between data features
✺ Example: does the weight of people relate to
their height?
✺ x : HIGHT, y: WEIGHT
The visual way for continuous
features
✺ Time series plot
✺ Scatter plot
Time Series Plot: Stock of Amazon
Scatter plot
✺ A most effective tool for geographic
data and 2D data in general. It should
be your first step with a new 2D
dataset.
Scatter plot
✺ Body Fat data set
Scatter plot
✺ Scatter plot with density
Scatter plot
✺ Removed of outliers & standardized
Scatter plot
✺ Coupled with
heatmap to
show a 3rd
feature
Correlation seen from scatter plots
Zero Positive Negative
Correlation correlation correlation
Weights, outliers removed, normalized

No Correlation Positive Correlation Negative Correlation
Normalized heart rate
Density
Normalized body temperature Heights, outliers removed, normalized Body fat
What kind of Correlation?
✺ line of code in a database and number of bugs
✺ GPA and hours spent playing video games
✺ earnings and happiness
Credit: Prof. David Varodayan

Correlation doesn’t mean causation
✺ Shoe size is correlated to reading skills,
but it doesn’t mean making feet grow
will make one person read faster.
Assignments
✺ HW1 due 8/31 Thurs.
✺ Reading upto Chapter 2.1
✺ Next time: the quantitative part of
correlation coefficient
Additional References
✺ Charles M. Grinstead and J. Laurie Snell
"Introduction to Probability”
✺ Morris H. Degroot and Mark J. Schervish
"Probability and Statistics”
See you next time
See
You!

CS361 FA23 Lec2 Post

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

CS361 FA23 Lec2 Post

Uploaded by

Copyright:

Available Formats

Probability and Statistics ì

for Computer Science

“The statement that “The

Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 8.24.2023

The difference a b ugly

ugly, bad and

CS361 Spring 2021

It’s the centroid of the data geometrically,

TranslaAng the data translates the mean

mean({xi + c}) = mean({xi }) + c

The mean minimizes the sum of the

- [xi -N:mean (xi)

function that follows &

the chain rule

TranslaAng the data does NOT change the

std({xi + c}) = std({xi })

deviaAons (σ) away from the mean

✺ Scaling and translating similar to standard

✺ Estimate as close as possible, 90% data

✺ Translating data translates the median

median({xi + c}) = median({xi }) + c

✺ kth percentile is the value relative to

✺ Translating data does NOT change the

(the default) Box

Data: Erythrocyte cells in

Piagnerelli, JCP 2007

Left Skew Right Skew

left right left right

Summarization of 4 locations’ annual mean

Weights, outliers removed, normalized

✺ GPA and hours spent playing video games

✺ earnings and happiness

Credit: Prof. David Varodayan

You might also like