You are on page 1of 4

Data Science Math Skills

Paul Bendich and Daniel Egger


Duke University

Sigma Notation: Mean and Variance


Video companion

1 Introduction
Important equations for this video:

X = {x1 , ..., xn }
n
1X
µx = xi
n i=1
" n #
1 X
σx 2 = (xi − µx )2
n i=1

The symbol µx is the “mean of x,” and σx 2 is the “variance of x.” The standard deviation
is denoted σx .

2 Mean
Example:

Z = {1, 5, 12}
|Z| = 3
1 + 5 + 12 18
µz = = =6
3 3
The mean µz is also denoted µ(z) or simply µ.

Symbolic example:

Y = {y1 , y2 , y3 , y4 }
1
µy = (y1 + y2 + y3 + y4 )
4 !
4
1 X
= yi
4 i=1

1
Data Science Math Skills
Paul Bendich and Daniel Egger
Duke University

In general, suppose you have a set

X = {x1 , x2 , ..., xn },

then the mean of X is


n
!
1 X
µx = xi .
n i=1

The variable i is a counter. The variable n is a number, which tells you when to stop counting.

3 Mean centering

Z = {1, 5, 12}
µz = 6

0 1 5 6 12 R

Z 0 = {1 − 6, 5 − 6, 12 − 6}
= {−5, −1, 6}
µz 0 = 0

−5 −1 0 6 R

Mean centering data produces a new data set, which has the same relationships, but the
mean is zero.

2
Data Science Math Skills
Paul Bendich and Daniel Egger
Duke University

4 Variance

Z = {1, 5, 12}
µz = 6

W = {5, 6, 7}
µw = 6

5 6 7

0 1 5 6 12 R

Set Z (blue) is more “spread out” than set W (olive).

If X = {x1 , ..., xn }, the variance of X is


" n #
2 1 X 2
σx = (xi − µx ) .
n i=1

The standard deviation is given by


p
σx = σx 2 .

Z and W have the same mean, but Z is more spread out, so σz should be greater than σw .

" 3 #
1 X
σw2 = (wi − µw )2
3 i=1
1
(5 − 6)2 + (6 − 6)2 + (7 − 6)2

=
3
1
(−1)2 + 02 + 12

=
3
2
=
3
r
2
σw =
3

3
Data Science Math Skills
Paul Bendich and Daniel Egger
Duke University

" 3
#
1 X
σz2 = (zi − µz )2
3 i=1
1
(1 − 6)2 + (5 − 6)2 + (12 − 6)2

=
3
1
(−5)2 + (−1)2 + 62

=
3
62
=
r3
62
σz =
3
σz2  σw2 , so Z is much more spread out than W .

You might also like