
Special Chapters on Artificial Intelligence

Lecture 1. Probability and Statistics

Cristian Gatu
Faculty of Computer Science

“Alexandru Ioan Cuza” University of Iaşi, Romania

MCO, MDS, MCL 2019–2020


Content

Random Variables and Probability Distributions


Definitions and axioms of probability
Random variables

Statistical measures
Location measures
Dispersion measures
Definitions and axioms of probability

◮ Experiment – any operation whose outcome is subject to chance. E.g. the toss of a coin or the roll of a die.

◮ Outcome Space – the set of all possible outcomes. E.g. for the coin S = {h, t} and for the die S = {1, 2, 3, 4, 5, 6}.

◮ Event – any subset of the outcome space. E.g. the event of obtaining heads is {h}. The event of obtaining an even number is {2, 4, 6}.

◮ Outcome – one of the things that can happen in an experiment.
Probability
◮ Suppose we throw a die.
– outcome space: S = {1, 2, 3, 4, 5, 6}.
– outcomes: A1 = [1], A2 = [2], . . . , A6 = [6].
– outcomes probability: P(A1 ), P(A2 ), . . . , P(A6 ).

◮ suppose P(A1 ) = P(A2 ) = · · · = P(A6 ) = 1/6.

◮ The event of obtaining an even number: E = {2, 4, 6}.

◮ P(E ) = P(A2 ) + P(A4 ) + P(A6 ) = 3/6 = 1/2.

◮ If each outcome is equally likely, then

$$P(E) = \frac{\text{Number of outcomes in } E}{\text{Number of outcomes in } S} = \frac{|E|}{|S|}.$$
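The counting rule above can be sketched in a few lines of Python. The function name `probability` and the variable names are illustrative only, and the sketch assumes that all outcomes are equally likely.

```python
from fractions import Fraction

def probability(event, outcome_space):
    """P(E) = |E| / |S|; only valid when all outcomes are equally likely."""
    event = set(event) & set(outcome_space)  # ignore anything outside S
    return Fraction(len(event), len(outcome_space))

outcome_space = {1, 2, 3, 4, 5, 6}  # one throw of a die
even = {2, 4, 6}                    # event: an even number is obtained

p_even = probability(even, outcome_space)
print(p_even)  # 1/2
```

Using `Fraction` keeps the result exact, so it can be compared directly with the hand calculation 3/6 = 1/2.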
Probability

◮ The probability of an event A: 0 ≤ P(A) ≤ 1.

◮ The complement of A: Ā = {e | e ∈ S, e ∉ A}.

◮ The probability of Ā: P(Ā) = 1 − P(A).

Examples
Throw a die: S = {1, 2, 3, 4, 5, 6}.
– E = {2, 4, 6}: the die shows an even number.
– Ē = {1, 3, 5}: the die shows an odd number.
P(E) = 3/6 = 0.5 and P(Ē) = 1 − 0.5 = 0.5.
Independent and Dependent events

Two events, A and B, are independent if and only if the event A provides no information about event B and vice versa.

Examples
In a coin tossing experiment the fact that at the first toss we
get heads or tails does not provide any information about the
second toss.
Independent and Dependent events

The events A and B are dependent if knowledge of the occurrence of one provides information about the other.

Examples
Consider the following events in the toss of a die:
A = {Observe an odd number}.
B = {Observe an even number}.
The events A and B are dependent since the occurrence of one excludes the other.
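One common way to make "provides no information" precise, which the slides do not state explicitly, is the product rule P(A ∩ B) = P(A) P(B). The sketch below checks that rule by enumeration for the coin and die examples. The helper names `prob` and `independent` are hypothetical, and equally likely outcomes are assumed.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & space), len(space))

def independent(a, b, space):
    """Product-rule check: P(A and B) == P(A) * P(B)."""
    return prob(a & b, space) == prob(a, space) * prob(b, space)

# Two tosses of a coin: outcomes are pairs such as ('h', 't').
tosses = set(product("ht", repeat=2))
first_heads = {t for t in tosses if t[0] == "h"}
second_heads = {t for t in tosses if t[1] == "h"}
coins_independent = independent(first_heads, second_heads, tosses)

# One throw of a die: odd versus even.
die = {1, 2, 3, 4, 5, 6}
odd, even = {1, 3, 5}, {2, 4, 6}
die_independent = independent(odd, even, die)

print(coins_independent, die_independent)  # True False
```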
Bayes Theorem

◮ Conditional probability:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

◮ Bayes formula:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

– P(B): prior probability
– P(B|A): posterior probability
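As a small numerical check of the two formulas, the sketch below uses one throw of a fair die. The events A (an even number) and B (a number greater than 3) are chosen here purely for illustration and do not come from the slides.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # illustrative event: even number
B = {4, 5, 6}   # illustrative event: number greater than 3

def prob(event):
    """Probability under equally likely outcomes of one die throw."""
    return Fraction(len(event & die), len(die))

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Bayes formula: P(B|A) = P(A|B) P(B) / P(A)
p_b_given_a_bayes = p_a_given_b * prob(B) / prob(A)

# The same posterior computed directly from the definition.
p_b_given_a_direct = prob(A & B) / prob(A)

print(p_a_given_b, p_b_given_a_bayes, p_b_given_a_direct)  # 2/3 2/3 2/3
```

Here the prior P(B) is 1/2, and observing A raises it to the posterior P(B|A) = 2/3.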
Discrete random variables

◮ A random variable is denoted by X and its values by x.

◮ Let X be a discrete variable, that is, P(X = x_i) = p_i where i = 1, 2, ..., n.

◮ X is a discrete random variable if

$$\sum_{i=1}^{n} p_i = 1 \quad\text{and}\quad 0 \le p_i \le 1.$$
Continuous random variables
◮ Random variables that can take on any value in an interval are called continuous.
◮ Let X be a continuous variable such that

$$\begin{aligned}
P(x_1 \le X < x_2) &= p_1 \\
P(x_2 \le X < x_3) &= p_2 \\
&\;\;\vdots \\
P(x_n \le X < x_{n+1}) &= p_n
\end{aligned}$$

◮ X is said to be a continuous random variable iff

$$\sum_{i=1}^{n} p_i = 1 \quad\text{and}\quad 0 \le p_i \le 1.$$
Probability density function
◮ The probability density function (pdf) of a continuous random variable X is a function that allocates probabilities to all of the ranges of values that the random variable can take.
◮ The pdf takes the form of a function of x, say f(x).

◮ Integrating f(x) over a range of values of x gives the probability that the random variable X lies in that particular range.
◮ X is a continuous random variable iff

$$\int_{\text{all } x} f(x)\,dx = 1.$$
Let X have a pdf f(x) valid over the range a to b only. If a ≤ x_1 ≤ x_2 ≤ b, then

$$P(x_1 \le X \le x_2) = \int_{x_1}^{x_2} f(x)\,dx.$$


[Figure: the curve y = f(x) over the range a to b, with x_1 and x_2 marked on the x-axis.]
Expectation

◮ Given a random variable X with pdf

– P(X = x) for X discrete
– f(x) for X continuous

the expectation of X is

$$E(X) = \begin{cases}
\sum\limits_{\text{all } x} x\,P(X = x), & \text{discrete } X, \\[1ex]
\int\limits_{\text{all } x} x\,f(x)\,dx, & \text{continuous } X.
\end{cases}$$

◮ E(X) is just the arithmetic mean of a discrete probability distribution.
Example 1

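x          0    1    2    3    4
P(X = x)   1/4  1/8  1/8  1/4  1/4

$$\begin{aligned}
E(X) &= \sum_{\text{all } x} x\,P(X = x) \\
&= 0 \times \tfrac{1}{4} + 1 \times \tfrac{1}{8} + 2 \times \tfrac{1}{8} + 3 \times \tfrac{1}{4} + 4 \times \tfrac{1}{4} \\
&= 17/8 = 2.125.
\end{aligned}$$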
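The calculation in Example 1 can be reproduced with a minimal Python sketch. The dictionary layout and the function name `expectation` are assumptions made for illustration.

```python
from fractions import Fraction as F

# Distribution from Example 1: value -> probability
dist = {0: F(1, 4), 1: F(1, 8), 2: F(1, 8), 3: F(1, 4), 4: F(1, 4)}

def expectation(dist):
    """E(X) = sum of x * P(X = x) over all x, for a discrete distribution."""
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())

e_x = expectation(dist)
print(e_x, float(e_x))  # 17/8 2.125
```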
Example 2

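$$f(x) = \frac{3}{4}\,x(2 - x) \quad\text{for } 0 \le x \le 2.$$

$$\begin{aligned}
E(X) &= \int_0^2 x\,f(x)\,dx = \frac{3}{4}\int_0^2 (2x^2 - x^3)\,dx \\
&= \frac{3}{4}\left[\frac{2}{3}x^3 - \frac{1}{4}x^4\right]_0^2 = \frac{3}{4}\left(\frac{16}{3} - \frac{16}{4}\right) \\
&= 1.
\end{aligned}$$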
[Figure: plot of f(x) = 3/4 x (2 − x) for 0 ≤ x ≤ 2; x-axis from 0.0 to 2.0, f(x)-axis from 0.0 to 0.6.]
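The integrals for the pdf of Example 2 can also be approximated numerically. The sketch below uses a simple midpoint rule. The helper `integrate`, the number of steps, and the interval [0.5, 1.5] are illustrative choices and not part of the lecture.

```python
def f(x):
    """pdf from Example 2: f(x) = 3/4 * x * (2 - x) on [0, 2]."""
    return 0.75 * x * (2.0 - x)

def integrate(func, lo, hi, steps=10_000):
    """Approximate the integral of func over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))

total_area = integrate(f, 0.0, 2.0)             # should be close to 1
prob_mid = integrate(f, 0.5, 1.5)               # P(0.5 <= X <= 1.5)
mean = integrate(lambda x: x * f(x), 0.0, 2.0)  # E(X), close to 1

print(round(total_area, 6), round(prob_mid, 6), round(mean, 6))
```

The total area being close to 1 confirms that f is a valid pdf, and the approximate mean agrees with the analytic result E(X) = 1.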
Expectation. Properties

◮ Let g(X) be any function of a random variable X having pdf P(X = x) for discrete X and f(x) for continuous X.

◮ The expectation of g(X), written as E(g(X)), is defined as:

$$E(g(X)) = \begin{cases}
\sum\limits_{\text{all } x} g(x)\,P(X = x), & \text{discrete } X, \\[1ex]
\int\limits_{\text{all } x} g(x)\,f(x)\,dx, & \text{continuous } X.
\end{cases}$$
Expectation. Properties

1. E (a) = a.

2. E (aX ) = aE (X ).

3. E (f1 (X ) + f2 (X )) = E (f1 (X )) + E (f2 (X )).
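The three properties can be checked on a concrete discrete distribution. The sketch below reuses the distribution of Example 1. The constant `a` and the functions `f1` and `f2` are arbitrary illustrative choices.

```python
from fractions import Fraction as F

# Distribution from Example 1: value -> probability
dist = {0: F(1, 4), 1: F(1, 8), 2: F(1, 8), 3: F(1, 4), 4: F(1, 4)}

def expect(g, dist):
    """E(g(X)) = sum of g(x) * P(X = x) for a discrete distribution."""
    return sum(g(x) * p for x, p in dist.items())

a = 3                      # arbitrary constant
f1 = lambda x: x ** 2      # arbitrary functions of X
f2 = lambda x: 2 * x + 1

prop1 = expect(lambda x: a, dist) == a                               # E(a) = a
prop2 = expect(lambda x: a * x, dist) == a * expect(lambda x: x, dist)  # E(aX) = aE(X)
prop3 = (expect(lambda x: f1(x) + f2(x), dist)
         == expect(f1, dist) + expect(f2, dist))                     # additivity

print(prop1, prop2, prop3)  # True True True
```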


Variance

◮ The variance of a probability distribution associated with a random variable X is written as Var(X) and is defined by:

$$\mathrm{Var}(X) = E\big[(X - \mu)^2\big]$$

where µ = E(X).

◮ The variance can be computed by:

$$\mathrm{Var}(X) = E(X^2) - E^2(X) \quad\text{where}\quad E^2(X) = \big(E(X)\big)^2.$$
Example
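Given the discrete distribution

x          0    1    2
P(X = x)   1/4  1/2  1/4

find the variance of X.

Solution

$$\mathrm{Var}(X) = E(X^2) - \big(E(X)\big)^2.$$

$$E(X) = \sum x\,P(X = x) = 0 \times \tfrac{1}{4} + 1 \times \tfrac{1}{2} + 2 \times \tfrac{1}{4} = 1.$$

$$E(X^2) = \sum x^2\,P(X = x) = 0^2 \times \tfrac{1}{4} + 1^2 \times \tfrac{1}{2} + 2^2 \times \tfrac{1}{4} = \tfrac{3}{2}.$$

$$\mathrm{Var}(X) = \tfrac{3}{2} - 1^2 = \tfrac{1}{2}.$$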
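A short Python sketch can confirm that the definition and the computational formula give the same variance for this example. The variable names are illustrative.

```python
from fractions import Fraction as F

# Distribution from the example: value -> probability
dist = {0: F(1, 4), 1: F(1, 2), 2: F(1, 4)}

def expect(g, dist):
    """E(g(X)) for a discrete distribution."""
    return sum(g(x) * p for x, p in dist.items())

mu = expect(lambda x: x, dist)                                 # E(X)
var_definition = expect(lambda x: (x - mu) ** 2, dist)         # E[(X - mu)^2]
var_shortcut = expect(lambda x: x ** 2, dist) - mu ** 2        # E(X^2) - E(X)^2

print(mu, var_definition, var_shortcut)  # 1 1/2 1/2
```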
Mean
◮ The arithmetic mean (or just mean) of a set of numbers {x_1, x_2, ..., x_n} is denoted by x̄ and is defined by:

$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

◮ Consider a discrete frequency distribution taking values {x_1, x_2, ..., x_n} with corresponding frequencies {f_1, f_2, ..., f_n}. The mean x̄ is given by:

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}.$$
Examples

◮ Find the mean of the set {−3, −1, 0, 2, 3, 4}.

x̄ = (−3 − 1 + 0 + 2 + 3 + 4)/6 = 0.83.

◮ Find the mean of the following frequency distribution:

x_i       -3   -2   -1    0    1    2    3
f_i        6    5    4    3    2    1    1
f_i x_i  -18  -10   -4    0    2    2    3

Σ f_i = 22, Σ f_i x_i = −25, x̄ = −25/22 = −1.14.
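Both mean examples can be reproduced with a few lines of Python. The function names `mean` and `frequency_mean` are illustrative.

```python
def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

set_mean = mean([-3, -1, 0, 2, 3, 4])

values = [-3, -2, -1, 0, 1, 2, 3]
freqs = [6, 5, 4, 3, 2, 1, 1]
freq_mean = frequency_mean(values, freqs)

print(round(set_mean, 2), round(freq_mean, 2))  # 0.83 -1.14
```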
Median

◮ The median of a set of numbers {x_1, x_2, ..., x_n} is defined as the middle value of the set when arranged in size order.

◮ If the set has an even number of items, then the median is taken as the mean of the two middle values.

Remark
The mean has the disadvantage of taking extreme values into
account, especially for a small set of numbers.
Examples

1. Some wages arranged in size order are:

{28, 29, 32, 35, 36, 38, 41, 103}.

The mean is x̄ = 342/8 = 42.75 and the median x∗ = (35 + 36)/2 = 35.5.

2. Find the median of the set:


{65, 68, 68, 66, 64, 65, 65, 67}.

Arranging the set in order:


{64, 65, 65, 65, 66, 67, 68, 68}.

The median is given by: (65 + 66)/2 = 65.5.
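The two median examples can be checked with Python's standard `statistics` module. The sketch also compares the mean and the median of the wage data, to illustrate the remark about extreme values.

```python
import statistics

wages = [28, 29, 32, 35, 36, 38, 41, 103]
sample = [65, 68, 68, 66, 64, 65, 65, 67]

# statistics.median sorts the data itself and averages the two
# middle values when the number of items is even.
wage_median = statistics.median(wages)
sample_median = statistics.median(sample)

# The single large wage (103) pulls the mean above the median.
wage_mean = statistics.mean(wages)

print(wage_median, sample_median)  # 35.5 65.5
print(wage_mean > wage_median)     # True
```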


Median

◮ Consider the discrete frequency distribution taking the values {x_1, x_2, ..., x_n} with corresponding frequencies {f_1, f_2, ..., f_n}.

◮ The median is given by the

$$\left(\frac{1 + \sum_{i=1}^{n} f_i}{2}\right)^{\text{th}}$$

value when the values are ranked.
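The ranking rule can be sketched as follows, using the frequency table from the mean example. When the position (1 + Σ f_i)/2 is not a whole number, this sketch averages the two neighbouring ranked values. That is an assumption made by analogy with the rule for sets with an even number of items, since the slide does not spell this case out. The function name `frequency_median` is hypothetical.

```python
def frequency_median(values, freqs):
    """Median of a frequency distribution via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))
    total = sum(freqs)

    def kth(k):
        """Return the k-th ranked value (1-based)."""
        cumulative = 0
        for value, freq in pairs:
            cumulative += freq
            if k <= cumulative:
                return value
        raise IndexError("rank out of range")

    # Position (1 + total) / 2; for even totals it falls between two ranks.
    lower = (total + 1) // 2
    upper = (total + 2) // 2
    return (kth(lower) + kth(upper)) / 2

values = [-3, -2, -1, 0, 1, 2, 3]
freqs = [6, 5, 4, 3, 2, 1, 1]

med = frequency_median(values, freqs)
print(med)  # -1.5
```

With 22 observations the position is 11.5, so the result is the mean of the 11th value (−2) and the 12th value (−1).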


Mode

◮ The mode of a set of values is defined as the one which occurs with the greatest frequency.

Examples
1. The mode of the set {2, 3, 3, 1, 3, 2, 4, 5, 8, 3, 2, 4, 4, 3}
is 3.
2. The set {8, 6, 8, 5, 5, 7, 6, 8, 6, 9} has the two modes 6
and 8.

Remark
Note that for a set that has no repeated values the mode does
not exist.
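A minimal sketch of the mode, following the convention in the remark above that a set with no repeated values has no mode. The function name `modes` is illustrative. Note that the standard library's `statistics.multimode` behaves differently in that case, because it returns every value.

```python
from collections import Counter

def modes(values):
    """Return all most frequent values, or [] if no value is repeated."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:          # no repeated values: the mode does not exist
        return []
    return sorted(v for v, c in counts.items() if c == top)

modes_1 = modes([2, 3, 3, 1, 3, 2, 4, 5, 8, 3, 2, 4, 4, 3])
modes_2 = modes([8, 6, 8, 5, 5, 7, 6, 8, 6, 9])
modes_3 = modes([1, 2, 3, 4])

print(modes_1, modes_2, modes_3)  # [3] [6, 8] []
```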
Range

◮ The Range of a set of numbers S = {x_1, x_2, ..., x_n} is given by:

Range = max(S) − min(S).

Remark 1
The range uses only the extreme values!

Remark 2
The Range is the simplest of all measures of dispersion and
can be calculated very quickly and easily.
Range

Examples
1. The set {6, 5, 7, 10, 8, 9} has Range = 10 − 5 = 5.

2. The set {600, 610, 620, 600, 610, 650, 640, 650, 650} has Range = 650 − 600 = 50.

3. The set {600, 610, 620, 200, 610, 1000, 640, 650, 650} has Range = 1000 − 200 = 800.
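The three range examples can be checked directly. The function name `value_range` is illustrative.

```python
def value_range(xs):
    """Range = max(S) - min(S); depends only on the two extreme values."""
    return max(xs) - min(xs)

r1 = value_range([6, 5, 7, 10, 8, 9])
r2 = value_range([600, 610, 620, 600, 610, 650, 640, 650, 650])
r3 = value_range([600, 610, 620, 200, 610, 1000, 640, 650, 650])

print(r1, r2, r3)  # 5 50 800
```

The second and third sets differ in only two values, yet the range jumps from 50 to 800, which illustrates how sensitive the range is to extreme values.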
Standard Deviation

◮ The standard deviation is the measure of dispersion used most widely in statistics. It is based on the arithmetic mean.

◮ The standard deviation of a set of numbers {x_1, x_2, ..., x_n} with mean x̄ is denoted by S and defined as

$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$
Standard Deviation

Examples

The set {3, 4, 6, 2} has x̄ = 15/4 = 3.75, x̄² = 14.063 and Σ x_i² = 65. Thus,

$$S = \left(\frac{65}{4} - (3.75)^2\right)^{1/2} = (16.25 - 14.063)^{1/2} = 1.48$$
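The example can be verified with both forms of the formula. The sketch below also compares them with `statistics.pstdev` from the standard library, which divides by n as the definition here does. The function names are illustrative.

```python
import math
import statistics

def std_definition(xs):
    """S = sqrt( sum((x - mean)^2) / n )."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def std_shortcut(xs):
    """S = sqrt( (1/n) * sum(x^2) - mean^2 )."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum(x * x for x in xs) / n - mean ** 2)

data = [3, 4, 6, 2]
s_def = std_definition(data)
s_short = std_shortcut(data)
s_lib = statistics.pstdev(data)  # population standard deviation (divides by n)

print(round(s_def, 2), round(s_short, 2), round(s_lib, 2))  # 1.48 1.48 1.48
```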
Variance
◮ The Variance of a set (or distribution) of numbers is defined as the square of the standard deviation and is denoted by S².
◮ For a set of numbers:

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2.$$

◮ For a discrete frequency distribution:

$$S^2 = \frac{\sum_i f_i (x_i - \bar{x})^2}{\sum_i f_i} = \frac{\sum_i f_i x_i^2}{\sum_i f_i} - \bar{x}^2.$$
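As an illustration of the frequency-distribution formula, the sketch below computes S² in both forms for the frequency table used earlier in the mean example. Exact fractions are used so that the two forms can be compared without rounding error. The variable names are illustrative.

```python
from fractions import Fraction

values = [-3, -2, -1, 0, 1, 2, 3]
freqs = [6, 5, 4, 3, 2, 1, 1]

total = sum(freqs)                                                  # sum of f_i
mean = Fraction(sum(f * x for x, f in zip(values, freqs)), total)  # x-bar

# Definition: sum f_i (x_i - mean)^2 / sum f_i
var_definition = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / total

# Shortcut: sum f_i x_i^2 / sum f_i - mean^2
var_shortcut = Fraction(sum(f * x * x for x, f in zip(values, freqs)), total) - mean ** 2

print(var_definition, var_shortcut)  # 1421/484 1421/484
print(round(float(var_definition), 3))  # 2.936
```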
