
EE 278 Lecture Notes 1

Probability and Random Variables

• Probability Spaces

• Conditional Probability and Independence

• Random Variables

• Functions of a Random Variable

• Generation of a Random Variable

• Jointly Distributed Random Variables

• Scalar Detection



Probability Theory

• Probability theory provides the mathematical rules for assigning probabilities to


outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage
• Basic elements of probability theory:
◦ Sample space Ω: set of all possible “elementary” or “finest grain” outcomes
of the random experiment
◦ Set of events F : set of (all?) subsets of Ω — an event A ⊂ Ω occurs if the
outcome ω ∈ A
◦ Probability measure P: function over F that assigns probabilities to events
according to the axioms of probability (see below)
• Formally, a probability space is the triple (Ω, F, P)



Axioms of Probability

• A probability measure P satisfies the following axioms:


1. P(A) ≥ 0 for every event A in F
2. P(Ω) = 1
3. If A1, A2, . . . are disjoint events — i.e., Ai ∩ Aj = ∅, for all i ≠ j — then
P(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai)
• Notes:
◦ P is a measure in the same sense as mass, length, area, and volume — all
satisfy axioms 1 and 3
◦ Unlike these other measures, P is bounded by 1 (axiom 2)
◦ This analogy provides some intuition but is not sufficient to fully understand
probability theory — other aspects such as conditioning and independence are
unique to probability



Discrete Probability Spaces

• A sample space Ω is said to be discrete if it is countable


• Examples:
◦ Rolling a die: Ω = {1, 2, 3, 4, 5, 6}
◦ Flipping a coin n times: Ω = {H, T }n , sequences of heads/tails of length n
◦ Flipping a coin until the first heads occurs: Ω = {H, T H, T T H, T T T H, . . .}
• For discrete sample spaces, the set of events F can be taken to be the set of all
subsets of Ω, sometimes called the power set of Ω
• Example: For a single coin flip,
F = {∅, {H}, {T}, Ω}

• F does not have to be the entire power set (more on this later)



• The probability measure P can be defined by assigning probabilities to individual
outcomes — single outcome events {ω} — so that:
P({ω}) ≥ 0 for every ω ∈ Ω
Σ_{ω∈Ω} P({ω}) = 1

• The probability of any other event A is simply
P(A) = Σ_{ω∈A} P({ω})

• Example: For the die rolling experiment, assign
P({i}) = 1/6 for i = 1, 2, . . . , 6
The probability of the event “the outcome is even,” A = {2, 4, 6}, is
P(A) = P({2}) + P({4}) + P({6}) = 3/6 = 1/2
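As a small aside (not part of the notes), here is a minimal Python sketch of this discrete setup: probabilities are assigned to single-outcome events, and the probability of any event is the sum over its outcomes.

```python
# Minimal sketch of a discrete probability space (illustrative, not from the notes)
from fractions import Fraction

# Assign probabilities to the elementary outcomes of a fair die
pmf = {omega: Fraction(1, 6) for omega in range(1, 7)}

def prob(event):
    """P(A) = sum of P({omega}) over omega in A."""
    return sum(pmf[omega] for omega in event)

assert sum(pmf.values()) == 1    # axiom 2
print(prob({2, 4, 6}))           # P("outcome is even") = 1/2
```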



Continuous Probability Spaces

• A continuous sample space Ω has an uncountable number of elements


• Examples:
◦ Random number between 0 and 1: Ω = (0, 1 ]
◦ Point in the unit disk: Ω = {(x, y) : x2 + y 2 ≤ 1}
◦ Arrival times of n packets: Ω = (0, ∞)n
• For continuous Ω, we cannot in general define the probability measure P by first
assigning probabilities to outcomes
• To see why, consider assigning a uniform probability measure over (0, 1 ]
◦ In this case the probability of each single outcome event is zero
◦ How do we find the probability of an event such as A = [0.25, 0.75]?



• Another difference for continuous Ω: we cannot take the set of events F as the
power set of Ω. (To learn why you need to study measure theory, which is
beyond the scope of this course)
• The set of events F cannot be an arbitrary collection of subsets of Ω. It must
make sense, e.g., if A is an event, then its complement Aᶜ must also be an
event, the union of two events must be an event, and so on
• Formally, F must be a sigma algebra (σ-algebra, σ-field), which satisfies the
following axioms:
1. ∅ ∈ F
2. If A ∈ F then Aᶜ ∈ F
3. If A1, A2, . . . ∈ F then ⋃_{i=1}^∞ Ai ∈ F
• Of course, the power set is a sigma algebra. But we can define smaller
σ-algebras. For example, for rolling a die, we could define the set of events as
F = {∅, odd, even, Ω}



• For Ω = R = (−∞, ∞) (or (0, ∞), (0, 1), etc.) F is typically defined as the
family of sets obtained by starting from the intervals and taking countable
unions, intersections, and complements
• The resulting F is called the Borel field
• Note: Amazingly, there are subsets of R that cannot be generated in this way!
(Not ones that you are likely to encounter in your life as an engineer or even as
a mathematician)
• To define a probability measure over a Borel field, we first assign probabilities to
the intervals in a consistent way, i.e., in a way that satisfies the axioms of
probability
For example, to define the uniform probability measure over (0, 1), we first assign
P((a, b)) = b − a to all intervals (a, b) ⊆ (0, 1)
• In EE 278 we do not deal with sigma fields or the Borel field beyond (kind of)
knowing what they are



Useful Probability Laws

• Union of Events Bound:
P(⋃_{i=1}^n Ai) ≤ Σ_{i=1}^n P(Ai)

• Law of Total Probability: Let A1, A2, A3, . . . be events that partition Ω, i.e.,
disjoint (Ai ∩ Aj = ∅ for i ≠ j) and ⋃_i Ai = Ω. Then for any event B
P(B) = Σ_i P(Ai ∩ B)
The Law of Total Probability is very useful for finding probabilities of sets



Conditional Probability

• Let B be an event such that P(B) ≠ 0. The conditional probability of event A
given B is defined to be
P(A | B) = P(A ∩ B)/P(B) = P(A, B)/P(B)

• The function P(· | B) is a probability measure over F , i.e., it satisfies the


axioms of probability
• Chain rule: P(A, B) = P(A)P(B | A) = P(B)P(A | B) (this can be generalized
to n events)
• The probability of event A given B, a nonzero probability event — the
a posteriori probability of A — is related to the unconditional probability of
A — the a priori probability — by
P(A | B) = (P(B | A)/P(B)) P(A)
This follows directly from the definition of conditional probability



Bayes Rule

• Let A1, A2, . . . , An be nonzero probability events that partition Ω, and let B be
a nonzero probability event
• We know P(Ai ) and P(B | Ai), i = 1, 2, . . . , n, and want to find the a posteriori
probabilities P(Aj | B), j = 1, 2, . . . , n
• We know that
P(Aj | B) = (P(B | Aj )/P(B)) P(Aj )
• By the law of total probability
P(B) = Σ_{i=1}^n P(Ai, B) = Σ_{i=1}^n P(Ai ) P(B | Ai )

• Substituting, we obtain Bayes rule
P(Aj | B) = P(B | Aj ) P(Aj ) / Σ_{i=1}^n P(Ai ) P(B | Ai ) ,  j = 1, 2, . . . , n

• Bayes rule also applies to a (countably) infinite number of events
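As a quick numerical illustration (not part of the notes), the following sketch applies Bayes rule to hypothetical priors P(Ai) and likelihoods P(B | Ai); the numbers are made up.

```python
# Bayes rule for a finite partition A1, ..., An (illustrative sketch; numbers are made up)
priors = [0.5, 0.3, 0.2]        # P(A_i), must sum to 1
likelihoods = [0.9, 0.5, 0.1]   # P(B | A_i)

# Law of total probability: P(B) = sum_i P(A_i) P(B | A_i)
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Posteriors: P(A_j | B) = P(B | A_j) P(A_j) / P(B)
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]
print(p_b, posteriors)          # the posteriors sum to 1
```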



Independence

• Two events are said to be statistically independent if


P(A, B) = P(A)P(B)

• When P(B) ≠ 0, this is equivalent to


P(A | B) = P(A)
In other words, knowing whether B occurs does not change the probability of A
• The events A1, A2, . . . , An are said to be independent if for every subset
Ai1 , Ai2 , . . . , Aik of the events,
P(Ai1 , Ai2 , . . . , Aik ) = Π_{j=1}^k P(Aij )
• Note: P(A1, A2, . . . , An) = Π_{j=1}^n P(Aj ) alone is not sufficient for independence



Random Variables

• A random variable (r.v.) is a real-valued function X(ω) over a sample space Ω,


i.e., X : Ω → R

(Figure: an outcome ω ∈ Ω is mapped by X to the real number X(ω) ∈ R)

• Notations:
◦ We use upper case letters for random variables: X, Y, Z, Φ, Θ, . . .
◦ We use lower case letters for values of random variables: X = x means that
random variable X takes on the value x, i.e., X(ω) = x where ω is the
outcome



Specifying a Random Variable

• Specifying a random variable means being able to determine the probability that
X ∈ A for any Borel set A ⊂ R, in particular, for any interval (a, b ]
• To do so, consider the inverse image of A under X , i.e., {ω : X(ω) ∈ A}

(Figure: a set A ⊂ R and its inverse image under X(ω), i.e., {ω : X(ω) ∈ A} ⊆ Ω)

• Since X ∈ A iff ω ∈ {ω : X(ω) ∈ A},


P({X ∈ A}) = P({ω : X(ω) ∈ A}) = P{ω : X(ω) ∈ A}
Shorthand: P({set description}) = P{set description}



Cumulative Distribution Function (CDF)
• We need to be able to determine P{X ∈ A} for any Borel set A ⊂ R, i.e., any
set generated by starting from intervals and taking countable unions,
intersections, and complements
• Hence, it suffices to specify P{X ∈ (a, b ]} for all intervals. The probability of
any other Borel set can be determined by the axioms of probability
• Equivalently, it suffices to specify its cumulative distribution function (cdf):
FX (x) = P{X ≤ x} = P{X ∈ (−∞, x ]} , x ∈ R
• Properties of cdf:
◦ FX (x) ≥ 0
◦ FX (x) is monotonically nondecreasing, i.e., if a > b then FX (a) ≥ FX (b)
(Figure: a typical cdf FX (x), nondecreasing from 0 to 1)



◦ Limits: lim_{x→+∞} FX (x) = 1 and lim_{x→−∞} FX (x) = 0

◦ FX (x) is right continuous, i.e., FX (a+) = lim_{x→a+} FX (x) = FX (a)

◦ P{X = a} = FX (a) − FX (a−), where FX (a−) = lim_{x→a−} FX (x)
◦ For any Borel set A, P{X ∈ A} can be determined from FX (x)
• Notation: X ∼ FX (x) means that X has cdf FX (x)



Probability Mass Function (PMF)

• A random variable is said to be discrete if FX (x) consists only of steps over a


countable set X

• Hence, a discrete random variable can be completely specified by the probability


mass function (pmf)
pX (x) = P{X = x} for every x ∈ X
Clearly pX (x) ≥ 0 and Σ_{x∈X} pX (x) = 1
• Notation: We use X ∼ pX (x) or simply X ∼ p(x) to mean that the discrete
random variable X has pmf pX (x) or p(x)



• Famous discrete random variables:
◦ Bernoulli: X ∼ Bern(p) for 0 ≤ p ≤ 1 has the pmf
pX (1) = p and pX (0) = 1 − p

◦ Geometric: X ∼ Geom(p) for 0 ≤ p ≤ 1 has the pmf
pX (k) = p(1 − p)^(k−1) , k = 1, 2, 3, . . .

◦ Binomial: X ∼ Binom(n, p) for integer n > 0 and 0 ≤ p ≤ 1 has the pmf
pX (k) = (n choose k) p^k (1 − p)^(n−k) , k = 0, 1, . . . , n
◦ Poisson: X ∼ Poisson(λ) for λ > 0 has the pmf
pX (k) = (λ^k / k!) e^(−λ) , k = 0, 1, 2, . . .
◦ Remark: Poisson is the limit of Binomial for np = λ as n → ∞, i.e., for every
k = 0, 1, 2, . . ., the Binom(n, λ/n) pmf
pX (k) → (λ^k / k!) e^(−λ) as n → ∞
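A small numerical sketch (not from the notes) of this limit: the Binom(n, λ/n) pmf approaches the Poisson(λ) pmf as n grows.

```python
# Poisson as a limit of Binomial: compare Binom(n, lam/n) with Poisson(lam) (illustrative sketch)
import math

lam = 2.0

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k / math.factorial(k) * math.exp(-lam)

for n in (10, 100, 1000):
    max_gap = max(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(11))
    print(n, max_gap)   # the gap shrinks as n increases
```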



Probability Density Function (PDF)

• A random variable is said to be continuous if its cdf is a continuous function

• If FX (x) is continuous and differentiable (except possibly over a countable set),


then X can be completely specified by a probability density function (pdf)
fX (x) such that
FX (x) = ∫_{−∞}^{x} fX (u) du

• If FX (x) is differentiable everywhere, then (by definition of derivative)
fX (x) = dFX (x)/dx
       = lim_{∆x→0} [FX (x + ∆x) − FX (x)]/∆x = lim_{∆x→0} P{x < X ≤ x + ∆x}/∆x



• Properties of pdf:
◦ fX (x) ≥ 0
◦ ∫_{−∞}^{∞} fX (x) dx = 1
◦ For any event (Borel set) A ⊂ R,
P{X ∈ A} = ∫_{x∈A} fX (x) dx
In particular,
P{x1 < X ≤ x2} = ∫_{x1}^{x2} fX (x) dx

• Important note: fX (x) should not be interpreted as the probability that X = x.


In fact, fX (x) is not a probability measure since it can be > 1
• Notation: X ∼ fX (x) means that X has pdf fX (x)



• Famous continuous random variables:
◦ Uniform: X ∼ U[a, b ] where a < b has pdf
fX (x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise

◦ Exponential: X ∼ Exp(λ) where λ > 0 has pdf
fX (x) = λ e^(−λx) if x ≥ 0, and 0 otherwise

◦ Laplace: X ∼ Laplace(λ) where λ > 0 has pdf
fX (x) = (λ/2) e^(−λ|x|)

◦ Gaussian: X ∼ N (µ, σ²) with parameters µ (the mean) and σ² (the
variance; σ is the standard deviation) has pdf
fX (x) = (1/√(2πσ²)) e^(−(x−µ)²/(2σ²))



The cdf of the standard normal random variable N (0, 1) is
Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^(−u²/2) du

Define the function Q(x) = 1 − Φ(x) = P{X > x}
(Figure: the N (0, 1) pdf; Q(x) is the area of the tail to the right of x)
The Q(·) function is used to compute P{X > a} for any Gaussian r.v. X :
Given Y ∼ N (µ, σ²), we represent it using the standard X ∼ N (0, 1) as
Y = σX + µ
Then
P{Y > y} = P{X > (y − µ)/σ} = Q((y − µ)/σ)

◦ The complementary error function is erfc(x) = 2Q(√2 x)
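A hedged sketch (not part of the notes) of how Q(·) can be evaluated numerically: inverting the erfc relation above gives Q(x) = erfc(x/√2)/2, and P{Y > y} for a general Gaussian Y then follows from the standardization step.

```python
# Evaluating the Q function via erfc and using it for a general Gaussian (illustrative sketch)
import math

def Q(x):
    # Q(x) = 1 - Phi(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * math.erfc(x / math.sqrt(2))

def gaussian_tail(y, mu, sigma):
    # P{Y > y} for Y ~ N(mu, sigma^2), using P{Y > y} = Q((y - mu)/sigma)
    return Q((y - mu) / sigma)

print(Q(0.0))                        # 0.5
print(gaussian_tail(3.0, 1.0, 2.0))  # P{Y > 3} for Y ~ N(1, 4), i.e., Q(1)
```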



Functions of a Random Variable

• Suppose we are given a r.v. X with known cdf FX (x) and a function y = g(x).
What is the cdf of the random variable Y = g(X)?
• We use
FY (y) = P{Y ≤ y} = P{x : g(x) ≤ y}

(Figure: the mapping y = g(x); the event {Y ≤ y} corresponds to the set {x : g(x) ≤ y} on the x-axis)



• Example: Square law detector. Let X ∼ FX (x) and Y = X 2 . We wish to find
FY (y)
(Figure: the parabola y = x²; the event {X² ≤ y} corresponds to −√y ≤ x ≤ √y)

If y < 0, then clearly FY (y) = 0. Consider y ≥ 0,
FY (y) = P{−√y < X ≤ √y} = FX (√y) − FX (−√y)
If X is continuous with density fX (x), then
fY (y) = (1/(2√y)) [fX (+√y) + fX (−√y)]



• Remark: In general, let X ∼ fX (x) and Y = g(X), where g is differentiable. Then
fY (y) = Σ_{i=1}^{k} fX (xi)/|g′(xi)| ,
where x1, x2, . . . , xk are the solutions of the equation y = g(x) and g′(xi) is the
derivative of g evaluated at xi
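A small sketch (not from the notes) that checks this formula numerically for the square-law example Y = X² with X ∼ N(0, 1): the roots of y = x² are ±√y, |g′(±√y)| = 2√y, and a Monte Carlo estimate of an interval probability should agree with the integral of the resulting density.

```python
# Checking the change-of-variables formula for Y = X^2, X ~ N(0, 1) (illustrative sketch)
import math
import numpy as np

def f_X(x):
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def f_Y(y):
    # roots of y = x^2 are +sqrt(y) and -sqrt(y); |g'(x)| = 2|x| = 2 sqrt(y)
    r = math.sqrt(y)
    return (f_X(r) + f_X(-r)) / (2 * r)

rng = np.random.default_rng(0)
samples = rng.standard_normal(200_000) ** 2

# Monte Carlo estimate of P{0.5 < Y <= 1.5} vs a Riemann-sum approximation of the integral of f_Y
mc = np.mean((samples > 0.5) & (samples <= 1.5))
ys = np.linspace(0.5, 1.5, 2001)
dy = ys[1] - ys[0]
numeric = sum(f_Y(y) for y in ys) * dy
print(mc, numeric)   # the two numbers should be close
```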



• Example: Limiter. Let X ∼ Laplace(1), i.e., fX (x) = (1/2)e−|x| , and let Y be
defined by the function of X shown in the figure. Find the cdf of Y
(Figure: the limiter y = g(x), equal to −a for x ≤ −1, ax for −1 < x < 1, and +a for x ≥ 1)

To find the cdf of Y , we consider the following cases
◦ y < −a: Here clearly FY (y) = 0
◦ y = −a: Here
FY (−a) = FX (−1) = ∫_{−∞}^{−1} (1/2) e^x dx = (1/2) e^(−1)



◦ −a < y < a: Here
FY (y) = P{Y ≤ y} = P{aX ≤ y}
       = P{X ≤ y/a} = FX (y/a)
       = (1/2) e^(−1) + ∫_{−1}^{y/a} (1/2) e^(−|x|) dx

◦ y ≥ a: Here FY (y) = 1
Combining the results, the following is a sketch of the cdf of Y
(Figure: FY (y), with jumps at y = −a and y = a)



Generation of Random Variables

• Generating a r.v. with a prescribed distribution is often needed for performing


simulations involving random phenomena, e.g., noise or random arrivals
• First let X ∼ F (x) where the cdf F (x) is continuous and strictly increasing.
Define Y = F (X), a real-valued random variable that is a function of X
What is the cdf of Y ?
Clearly, FY (y) = 0 for y < 0, and FY (y) = 1 for y > 1
For 0 ≤ y ≤ 1, note that by assumption F has an inverse F⁻¹, so
FY (y) = P{Y ≤ y} = P{F (X) ≤ y} = P{X ≤ F⁻¹(y)} = F (F⁻¹(y)) = y
Thus Y ∼ U [ 0, 1 ], i.e., Y is a uniformly distributed random variable
• Note: F (x) does not need to be invertible. If F (x) = a is constant over some
interval, then the probability that X lies in this interval is zero. Without loss of
generality, we can take F⁻¹(a) to be the leftmost point of the interval
• Conclusion: We can generate a U[ 0, 1 ] r.v. from any continuous r.v.



• Now, let’s consider the opposite scenario where we are given X ∼ U[ 0, 1 ] (a
random number generator) and wish to generate a random variable Y with
prescribed cdf F (y), e.g., Gaussian or exponential
(Figure: the cdf x = F (y), rising from 0 to 1, and its inverse y = F⁻¹(x))

• If F is continuous and strictly increasing, set Y = F⁻¹(X). To show Y ∼ F (y),
FY (y) = P{Y ≤ y}
       = P{F⁻¹(X) ≤ y}
       = P{X ≤ F (y)}
       = F (y) ,
since X ∼ U[ 0, 1 ] and 0 ≤ F (y) ≤ 1



• Example: To generate Y ∼ Exp(λ), set
Y = −(1/λ) ln(1 − X)

• Note: F does not need to be continuous for the above to work. For example, to
generate Y ∼ Bern(p), we set
Y = 0 if X ≤ 1 − p, and Y = 1 otherwise

(Figure: the cdf x = F (y) of Bern(p), with a step of height 1 − p at y = 0 and a step up to 1 at y = 1)

• Conclusion: We can generate a r.v. with any desired distribution from a U[0, 1] r.v.
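A minimal sketch (not from the notes) of this inverse-transform idea, generating Exp(λ) and Bern(p) samples from U[0, 1] and checking their empirical means:

```python
# Inverse-transform generation of random variables from U[0, 1] (illustrative sketch)
import numpy as np

rng = np.random.default_rng(1)
u = rng.random(100_000)          # X ~ U[0, 1]

lam, p = 2.0, 0.3
y_exp = -np.log(1 - u) / lam     # Y = -(1/lam) ln(1 - X) ~ Exp(lam)
y_bern = (u > 1 - p).astype(int) # Y = 0 if X <= 1 - p, else 1, so Y ~ Bern(p)

print(y_exp.mean(), 1 / lam)     # empirical mean vs 1/lam
print(y_bern.mean(), p)          # empirical mean vs p
```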



Jointly Distributed Random Variables

• A pair of random variables X and Y defined over the same probability space are
specified by their joint cdf
FX,Y (x, y) = P{X ≤ x, Y ≤ y} , x, y ∈ R
FX,Y (x, y) is the probability of the shaded region of R²
(Figure: the region {(u, v) : u ≤ x, v ≤ y}, the quadrant to the lower left of the point (x, y))


• Properties of the cdf:
◦ FX,Y (x, y) ≥ 0

◦ If x1 ≤ x2 and y1 ≤ y2 then FX,Y (x1, y1) ≤ FX,Y (x2, y2)

◦ lim_{y→−∞} FX,Y (x, y) = 0 and lim_{x→−∞} FX,Y (x, y) = 0

◦ lim_{y→∞} FX,Y (x, y) = FX (x) and lim_{x→∞} FX,Y (x, y) = FY (y)

FX (x) and FY (y) are the marginal cdfs of X and Y

◦ lim_{x,y→∞} FX,Y (x, y) = 1

• X and Y are independent if for every x and y


FX,Y (x, y) = FX (x)FY (y)



Joint, Marginal, and Conditional PMFs

• Let X and Y be discrete random variables on the same probability space


• They are completely specified by their joint pmf:
pX,Y (x, y) = P{X = x, Y = y} , x ∈ X , y ∈ Y
By the axioms of probability, Σ_{x∈X} Σ_{y∈Y} pX,Y (x, y) = 1

• To find pX (x), the marginal pmf of X , we use the law of total probability
pX (x) = Σ_{y∈Y} pX,Y (x, y) , x ∈ X

• The conditional pmf of X given Y = y is defined as
pX|Y (x|y) = pX,Y (x, y)/pY (y) , for pY (y) ≠ 0, x ∈ X

• Chain rule: pX,Y (x, y) = pX (x)pY |X (y|x) = pY (y)pX|Y (x|y)



• Independence: X and Y are said to be independent if for every (x, y) ∈ X × Y ,
pX,Y (x, y) = pX (x)pY (y) ,
which is equivalent to pX|Y (x|y) = pX (x) for every x ∈ X and y ∈ Y such
that pY (y) ≠ 0
Example (Binary Symmetric Channel): X ∼ Bern(p) and Z ∼ Bern(ε) are
independent, and the channel output is Y = X ⊕ Z
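As an aside (not in the notes), a short sketch computing the joint and marginal pmfs for this binary symmetric channel from the chain rule pX,Y (x, y) = pX (x) pY|X (y|x):

```python
# Joint and marginal pmfs for the binary symmetric channel Y = X xor Z (illustrative sketch)
p, eps = 0.4, 0.1   # X ~ Bern(p), Z ~ Bern(eps), X and Z independent

p_X = {0: 1 - p, 1: p}
# p_{Y|X}(y|x): Y differs from X exactly when Z = 1
p_Y_given_X = lambda y, x: eps if y != x else 1 - eps

p_XY = {(x, y): p_X[x] * p_Y_given_X(y, x) for x in (0, 1) for y in (0, 1)}
p_Y = {y: sum(p_XY[(x, y)] for x in (0, 1)) for y in (0, 1)}

print(p_XY)
print(p_Y)          # p_Y[1] = p(1 - eps) + (1 - p) eps
```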



Joint, Marginal, and Conditional PDF

• X and Y are jointly continuous random variables if their joint cdf is continuous
in both x and y
In this case, we can define their joint pdf, provided that it exists, as the function
fX,Y (x, y) such that
FX,Y (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y (u, v) dv du , x, y ∈ R

• If FX,Y (x, y) is differentiable in x and y, then
fX,Y (x, y) = ∂²FX,Y (x, y)/∂x∂y
            = lim_{∆x,∆y→0} P{x < X ≤ x + ∆x, y < Y ≤ y + ∆y}/(∆x ∆y)
• Properties of fX,Y (x, y):
◦ fX,Y (x, y) ≥ 0
◦ ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y (x, y) dx dy = 1



• The marginal pdf of X can be obtained from the joint pdf via the law of total
probability:
fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy

• X and Y are independent iff fX,Y (x, y) = fX (x)fY (y) for every x, y
• Conditional cdf and pdf: Let X and Y be continuous random variables with
joint pdf fX,Y (x, y). We wish to define FY |X (y | X = x) = P{Y ≤ y | X = x}
We cannot define the above conditional probability as
P{Y ≤ y, X = x} / P{X = x}
because both numerator and denominator are equal to zero. Instead, we define
conditional probability for continuous random variables as a limit
FY |X (y|x) = lim_{∆x→0} P{Y ≤ y | x < X ≤ x + ∆x}
            = lim_{∆x→0} P{Y ≤ y, x < X ≤ x + ∆x} / P{x < X ≤ x + ∆x}
            = lim_{∆x→0} [∫_{−∞}^{y} fX,Y (x, u) du ∆x] / [fX (x) ∆x]
            = ∫_{−∞}^{y} [fX,Y (x, u)/fX (x)] du



• We then define the conditional pdf in the usual way as
fY |X (y|x) = fX,Y (x, y)/fX (x) , if fX (x) ≠ 0
• Thus
FY |X (y|x) = ∫_{−∞}^{y} fY |X (u|x) du ,
which shows that fY |X (y|x) is a pdf for Y given X = x, i.e.,
Y | {X = x} ∼ fY |X (y|x)

• Chain rule: fX,Y (x, y) = fX (x)fY |X (y|x)


• Independence: X and Y are independent if fX,Y (x, y) = fX (x)fY (y) for every
(x, y), or equivalently, fY |X (y|x) = fY (y)
• Example (Additive Gaussian channel): X ∼ N (0, P ) and Z ∼ N (0, N ) are
independent, and the channel output is Y = X + Z



One Discrete and One Continuous Random Variable

• Let Θ be a discrete random variable with pmf pΘ(θ)


• For each Θ = θ with pΘ(θ) ≠ 0, let Y be a continuous random variable, i.e.,
FY |Θ(y|θ) is continuous for all θ. We define fY |Θ(y|θ) in the usual way
• The conditional pmf of Θ given Y = y can be defined as a limit
pΘ|Y (θ | y) = lim_{∆y→0} P{Θ = θ, y < Y ≤ y + ∆y} / P{y < Y ≤ y + ∆y}
            = lim_{∆y→0} pΘ(θ) fY |Θ(y|θ) ∆y / (fY (y) ∆y)
            = fY |Θ(y|θ) pΘ(θ) / fY (y)
This leads to the Bayes rule:
pΘ|Y (θ | y) = fY |Θ(y|θ) pΘ(θ) / Σ_{θ′} pΘ(θ′) fY |Θ(y|θ′)



• Example: Additive Gaussian Noise Channel
Consider the following noisy channel:
(Block diagram: Θ → [+] → Y , with additive noise Z ∼ N (0, N ))

The signal transmitted is a binary random variable Θ:
Θ = +1 with probability p, and Θ = −1 with probability 1 − p

The received signal, also called the observation, is Y = Θ + Z , where Θ and Z


are independent
Given Y = y is received (observed), find pΘ|Y (θ|y), the a posteriori pmf of Θ
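A hedged numerical sketch (not in the notes) of this computation, applying the mixed Bayes rule above with the Gaussian likelihoods fY|Θ(y|±1), i.e., N(±1, N) densities evaluated at y:

```python
# Posterior pmf of Theta in {+1, -1} given Y = y for Y = Theta + Z, Z ~ N(0, N) (illustrative sketch)
import math

def gauss_pdf(y, mean, var):
    return math.exp(-(y - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(y, p=0.5, N=1.0):
    # p_{Theta|Y}(theta|y) = f_{Y|Theta}(y|theta) p_Theta(theta) / f_Y(y)
    num_plus = gauss_pdf(y, +1.0, N) * p
    num_minus = gauss_pdf(y, -1.0, N) * (1 - p)
    f_y = num_plus + num_minus
    return {+1: num_plus / f_y, -1: num_minus / f_y}

print(posterior(0.3))   # the two posteriors sum to 1; larger y favors Theta = +1
```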



• In some cases we are given fY (y) and pΘ|Y (θ|y) for every y
• We can then find fY |Θ(y|θ), the conditional pdf of Y given Θ = θ, using Bayes rule:
fY |Θ(y|θ) = pΘ|Y (θ|y) fY (y) / ∫ fY (y′) pΘ|Y (θ|y′) dy′

• Example: Coin with random bias


Consider a coin with random bias P ∼ fP (p). Flip the coin and let X = 1 if the
outcome is heads and X = 0 if the outcome is tails
Given that X = 1 (i.e., outcome is heads), find fP |X (p|1), the a posteriori pdf
of P
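A short sketch (not in the notes) of this posterior computation on a grid, assuming for illustration a uniform prior fP(p) = 1 on [0, 1]; by the mixed Bayes rule, fP|X(p|1) is proportional to pX|P(1|p) fP(p) = p fP(p).

```python
# Posterior density of the coin bias P given X = 1 (heads), on a grid (illustrative sketch)
import numpy as np

p_grid = np.linspace(0.0, 1.0, 1001)
dp = p_grid[1] - p_grid[0]
prior = np.ones_like(p_grid)            # assumed prior: P ~ U[0, 1] (illustrative choice)

# f_{P|X}(p|1) = p * f_P(p) / integral of p' f_P(p') dp'
unnormalized = p_grid * prior
posterior = unnormalized / (unnormalized.sum() * dp)

print(posterior.sum() * dp)             # integrates to (approximately) 1
print((p_grid * posterior).sum() * dp)  # posterior mean, 2/3 for the uniform prior
```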



Scalar Detection

• Consider the following signal processing problem:
(Block diagram: Θ ∈ {θ0, θ1} → noisy channel fY |Θ(y|θ) → Y → decoder → Θ̂(Y ) ∈ {θ0, θ1})
where the signal sent is
Θ = θ0 with probability p, and Θ = θ1 with probability 1 − p
and the observation (received signal) is
Y | {Θ = θ} ∼ fY |Θ(y | θ) , θ ∈ {θ0, θ1}

• We wish to find the estimate Θ̂(Y ) (i.e., design the decoder) that minimizes the
probability of error:

Pe = P{Θ̂ ≠ Θ} = P{Θ = θ0, Θ̂ = θ1} + P{Θ = θ1, Θ̂ = θ0}
   = P{Θ = θ0}P{Θ̂ = θ1 | Θ = θ0} + P{Θ = θ1}P{Θ̂ = θ0 | Θ = θ1}



• We define the maximum a posteriori probability (MAP) decoder as
Θ̂(y) = θ0 if pΘ|Y (θ0|y) > pΘ|Y (θ1|y), and θ1 otherwise
• The MAP decoding rule minimizes Pe , since
min_{Θ̂} Pe = 1 − max_{Θ̂} P{Θ̂(Y ) = Θ}
           = 1 − max_{Θ̂} ∫_{−∞}^{∞} fY (y) P{Θ = Θ̂(y) | Y = y} dy
           = 1 − ∫_{−∞}^{∞} fY (y) max_{Θ̂(y)} P{Θ = Θ̂(y) | Y = y} dy
and the probability of error is minimized if we pick the largest pΘ|Y (Θ̂(y)|y) for
every y, which is precisely the MAP decoder
• If p = 1/2, i.e., equally likely signals, then using Bayes rule the MAP decoder reduces
to the maximum likelihood (ML) decoder
Θ̂(y) = θ0 if fY |Θ(y|θ0) > fY |Θ(y|θ1), and θ1 otherwise



Additive Gaussian Noise Channel

• Consider the additive Gaussian noise channel with signal
Θ = +√P with probability 1/2, and −√P with probability 1/2,
noise Z ∼ N (0, N ) (Θ and Z are independent), and output Y = Θ + Z
• The MAP decoder is
Θ̂(y) = +√P if P{Θ = +√P | Y = y} / P{Θ = −√P | Y = y} > 1, and −√P otherwise
Since the two signals are equally likely, the MAP decoding rule reduces to the
ML decoding rule
Θ̂(y) = +√P if fY |Θ(y | +√P ) / fY |Θ(y | −√P ) > 1, and −√P otherwise



• Using the Gaussian pdf, the ML decoder reduces to the minimum distance
decoder
Θ̂(y) = +√P if (y − √P )² < (y − (−√P ))², and −√P otherwise
From the figure below, this simplifies to
Θ̂(y) = +√P if y > 0, and −√P if y < 0
Note: The decision when y = 0 is arbitrary

(Figure: the two likelihoods fY |Θ(y | −√P ) and fY |Θ(y | +√P ), Gaussian pdfs centered at −√P and +√P )



• Now to find the minimum probability of error, consider
Pe = P{Θ̂(Y ) ≠ Θ}
   = P{Θ = +√P }P{Θ̂(Y ) = −√P | Θ = +√P } + P{Θ = −√P }P{Θ̂(Y ) = +√P | Θ = −√P }
   = (1/2) P{Y ≤ 0 | Θ = +√P } + (1/2) P{Y > 0 | Θ = −√P }
   = (1/2) P{Z ≤ −√P } + (1/2) P{Z > √P }
   = Q(√(P/N )) = Q(√SNR)

The probability of error is a decreasing function of P/N , the signal-to-noise


ratio (SNR)
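A hedged simulation sketch (not part of the notes) of this detector: generate Θ = ±√P with equal probability, add Gaussian noise, decide by the sign of Y, and compare the empirical error rate with Q(√(P/N)).

```python
# Simulating the +/- sqrt(P) signal over an additive Gaussian noise channel (illustrative sketch)
import math
import numpy as np

P, N, n = 1.0, 0.5, 500_000
rng = np.random.default_rng(2)

theta = math.sqrt(P) * rng.choice([-1.0, 1.0], size=n)   # equally likely signals
y = theta + rng.normal(0.0, math.sqrt(N), size=n)        # Y = Theta + Z, Z ~ N(0, N)
theta_hat = math.sqrt(P) * np.sign(y)                    # minimum distance (sign) decoder

pe_empirical = np.mean(theta_hat != theta)
pe_theory = 0.5 * math.erfc(math.sqrt(P / N) / math.sqrt(2))   # Q(sqrt(P/N))
print(pe_empirical, pe_theory)
```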

