
**Read Chapter 7 of Papoulis** (1 week)

**Sequences (Vector) of Random Variables**

Papoulis Chapter 7

• A vector random variable, X, is a function that assigns a vector of real numbers to each outcome ξ in S. It is the basis for random processes.

• The vector can be used to describe a sequence of random variables.

[Figure: the mapping X(ξ) = x takes each outcome ξ in the sample space S to a point x in Rⁿ]

• Example: Sample a signal X(t) every T sec.: Xk = X(kT). Then X = (X1, X2, …, Xn) is a vector RV (or a random process).

• An event involving an n-dimensional random variable X = (X1, X2, …, Xn) has a corresponding region in n-dimensional real space.

• The expectation of a random vector is the vector of the expectations: if X = (X1, X2, …, Xn), then

E[X] = (E[X1], E[X2], …, E[Xn])    (4-1)

**Conditional Expected Values** (very useful in estimation)

• We will show that

E_X[X | Y] = E_Y[ E_X[X | Y] | Y ]    (4-2a)

• which implies the useful result

E_X[X] = E_Y[ E_X[X | Y] ]    (4-2b)

• The above is known as the Law of Iterated Expectations. It is also known as the Law of Total Expectation, since it implies that

E[X] = Σ_y E[X | Y = y] P(Y = y)          if Y is discrete

     = ∫_y E[X | Y = y] f_Y(y) dy         if Y is continuous    (4-2c)

• If X and Y are independent, then

E_X[X | Y] = E_X[X]    (4-3a)

• If a is a constant, then

E[a | Y] = a    (4-3b)
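To make (4-2b) concrete, here is a minimal simulation sketch (Python/NumPy; the distributions below are an illustrative choice, not part of the original slides): with Y discrete, averaging E[X | Y = y] against the PMF of Y reproduces E[X].

```python
# Hypothetical sanity check of the Law of Iterated Expectations (4-2b):
# E[X] = E_Y[ E_X[X | Y] ], with Y discrete and X | Y = y ~ N(mu_y, 1).
import numpy as np

rng = np.random.default_rng(0)
y_vals = np.array([0, 1, 2])
p_y    = np.array([0.5, 0.3, 0.2])   # PMF of Y
mu_y   = np.array([1.0, 4.0, 9.0])   # E[X | Y = y] for each y

# Exact iterated expectation: sum over y of E[X | Y = y] P(Y = y)
exact = np.sum(mu_y * p_y)

# Monte Carlo estimate of E[X]
n = 200_000
y = rng.choice(y_vals, size=n, p=p_y)
x = rng.normal(loc=mu_y[y], scale=1.0)

print(exact, x.mean())   # both approximately 3.5
```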

**Law of Iterated Expectations**

• Proof:

E[ E[X | Y] ] = ∫_y E[X | Y = y] f_Y(y) dy

              = ∫_y [ ∫_x x f_{X|Y}(x | y) dx ] f_Y(y) dy

              = ∫_x x [ ∫_y f_{X,Y}(x, y) dy ] dx

              = ∫_x x f_X(x) dx

              = E[X]    (4-3c)

**Example: Expectation of the Sum of a Random Number of Random Variables**

If N is a positive integer-valued R.V. and Xi, i = 1, 2, … are identically distributed R.V.s with mean E[X] that are independent of N, then the random sum

Σ_{i=1}^{N} Xi

is a random variable and

E[ Σ_{i=1}^{N} Xi ] = E[N] E[X].    (4-4)

**Expectation of the Sum of a Random Number of Random Variables**

Proof:

E_X[ Σ_{i=1}^{N} Xi | N = n ] = E_X[ Σ_{i=1}^{n} Xi | N = n ]    (4-5a)

                              = E_X[ Σ_{i=1}^{n} Xi ] = Σ_{i=1}^{n} E_X[Xi] = n E_X[X]

(the conditioning drops because the Xi are independent of N), so that

E_X[ Σ_{i=1}^{N} Xi | N ] = N E[X]    (4-5b)

and finally we get the desired result

E[ Σ_{i=1}^{N} Xi ] = E_N[ E_X[ Σ_{i=1}^{N} Xi | N ] ] = E_N[ N E[X] ] = E[N] E[X].    (4-4)
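A quick way to see (4-4) in action is a short simulation; the choice of N ~ Poisson and exponential Xi below is an arbitrary illustration, not part of the lecture.

```python
# Hypothetical check of (4-4), E[sum_{i=1}^N X_i] = E[N] E[X],
# with N ~ Poisson(5) independent of X_i ~ Exponential(mean 2).
import numpy as np

rng = np.random.default_rng(1)
trials = 50_000
n_sum = rng.poisson(lam=5.0, size=trials)          # N for each trial
totals = np.array([rng.exponential(scale=2.0, size=k).sum() for k in n_sum])

print(totals.mean())   # Monte Carlo estimate of E[sum X_i]
print(5.0 * 2.0)       # E[N] * E[X] = 10
```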

**Conditional Variance**

How are var(X), E[X | Y], and var(X | Y) related?

The conditional variance of X given Y is the R.V. given by

var(X | Y) = E[ (X − E[X | Y])² | Y ] = E[X² | Y] − E[X | Y]².    (4-6a)

The Conditional Variance Formula (or Law of Total Variance):

var(X) = E[ var(X | Y) ] + var( E[X | Y] )    (4-6b)

**Conditional Variance**

Proof:

E[ var(X | Y) ] = E[ E[X² | Y] − E[X | Y]² ]

                = E[ E[X² | Y] ] − E[ E[X | Y]² ]

                = E[X²] − E[ E[X | Y]² ]

                = E[X²] − E[X]² − E[ E[X | Y]² ] + E[X]²

                = var(X) − E[ E[X | Y]² ] + E[ E[X | Y] ]²

                = var(X) − var( E[X | Y] ).    (4-7)
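The decomposition (4-6b) can be checked numerically; the sketch below (NumPy, with an arbitrary toy model for X and Y) compares var(X) with E[var(X | Y)] + var(E[X | Y]).

```python
# Hypothetical check of the law of total variance (4-6b):
# var(X) = E[var(X | Y)] + var(E[X | Y]),
# with Y ~ Uniform{0, 1} and X | Y = y ~ N(3y, (1 + y)^2).
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
y = rng.integers(0, 2, size=n)             # Y in {0, 1}, equally likely
x = rng.normal(loc=3.0 * y, scale=1.0 + y)

cond_mean = 3.0 * y                        # E[X | Y] as a RV
cond_var  = (1.0 + y) ** 2                 # var(X | Y) as a RV

lhs = x.var()
rhs = cond_var.mean() + cond_mean.var()
print(lhs, rhs)                            # both approximately 4.75
```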

**Generalization to Many RVs**

• Let X1, X2, …, Xn be n random variables defined on a sample space

• Let the row vector X^T = (X1, X2, …, Xn) be the transpose of a random (column) vector

• Let u = (u1, u2, …, un) be a real vector

• The notation {X ≤ u} denotes the event {X1 ≤ u1, X2 ≤ u2, …, Xn ≤ un}, where, as before, the commas denote intersections, that is,

{X ≤ u} = {X1 ≤ u1} ∩ {X2 ≤ u2} ∩ … ∩ {Xn ≤ un}    (4-8a)

• The joint CDF of X1, X2, …, Xn, or the CDF of the random vector X, is defined as

F_X(u) = P{X ≤ u} = P{X1 ≤ u1, X2 ≤ u2, …, Xn ≤ un}    (4-8b)

  – F_X(u) is a real-valued function of n real variables (or of the vector u) with values between 0 and 1.

• The expectation of a random vector is the vector of the expectations:

E[X^T] = [ E[X1], E[X2], …, E[Xn] ] = µ_X^T    (4-8c)

**The Covariance Matrix — I**

• There are n² pairs of random variables Xi and Xj, giving n² covariances

• n of these are cov(Xi, Xi) = var(Xi)

• The covariance matrix R is a symmetric n×n matrix with i-jth entry r_ij = cov(Xi, Xj)

• The variances of the Xi's appear along the diagonal of the matrix

• Uncorrelated RVs ⇒ R is a diagonal matrix

• The expectation of a matrix with RVs as entries is the matrix of expectations

Note: sometimes we refer to the covariance matrix as R and sometimes as Σ

**The Covariance Matrix — II**

• X = [X1, X2, …, Xn]^T and µ_X are n×1 column vectors

• (X − µ_X)^T is a 1×n row vector

• (X − µ_X)(X − µ_X)^T is an n×n matrix whose i-jth entry is (Xi − µi)(Xj − µj)    (4-9a)

• E[matrix] = matrix of expectations

• R = E[(X − µ_X)(X − µ_X)^T] is an n×n matrix with i-jth entry r_ij = cov(Xi, Xj)

• R is a symmetric positive semi-definite (also known as symmetric nonnegative definite) matrix, since the quadratic form

var(X^T a) = a^T R a ≥ 0    (4-9b)

• R is used to find the variances and covariances of linear combinations of the Xi
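As an illustration of (4-9a)–(4-9b), the sketch below (NumPy; the three toy RVs are hypothetical) forms a sample covariance matrix and confirms that the quadratic form aᵀRa matches the variance of the linear combination aᵀX.

```python
# A minimal sketch (not from the slides) of the covariance matrix R and
# the quadratic form in (4-9b): var(a^T X) = a^T R a >= 0.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
# Three correlated RVs built from a common factor
z = rng.normal(size=(3, n))
x = np.vstack([z[0], 0.8 * z[0] + 0.6 * z[1], z[2]])

R = np.cov(x)                       # sample covariance matrix (3 x 3)
a = np.array([1.0, -2.0, 0.5])

quad = a @ R @ a                    # a^T R a
lin_comb_var = (a @ x).var(ddof=1)  # sample variance of a^T X
print(quad, lin_comb_var)           # agree; both nonnegative
```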

**Multivariate (Sequence) Parameters**

• Let X denote the (column) data vector (x1, x2, …, xn)^T

Mean:  E[X^T] = [µ1, …, µn]    (4-10a)

Covariance:  σ_ij ≡ cov(Xi, Xj)    (4-10b)

Correlation:  corr(Xi, Xj) ≡ ρ_ij = σ_ij / (σi σj)    (4-10c)

• The covariance matrix has elements σ_ij, is denoted by Σ, and is defined below:

Σ ≡ Cov(X) = E[(X − µ)(X − µ)^T]

    | σ1²  σ12  …  σ1d |
  = | σ21  σ2²  …  σ2d |
    |  ⋮    ⋮        ⋮  |
    | σd1  σd2  …  σd² |    (4-10d)

**Multivariate Normal Distribution**

x ~ N_d(µ, Σ)

p(x) = 1 / [ (2π)^{d/2} |Σ|^{1/2} ] · exp[ −(1/2) (x − µ)^T Σ^{−1} (x − µ) ]    (4-11)
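The density (4-11) is straightforward to evaluate directly; the sketch below (assuming NumPy and SciPy are available, with arbitrary illustrative µ and Σ) cross-checks a hand-coded (4-11) against scipy.stats.multivariate_normal.

```python
# A sketch evaluating the multivariate normal density (4-11) directly,
# cross-checked against scipy.stats.multivariate_normal.
import numpy as np
from scipy.stats import multivariate_normal

mu    = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x     = np.array([0.5, 0.0])

d = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff            # (x - mu)^T Sigma^-1 (x - mu)
p = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))

print(p)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # same value
```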

**Parameter Estimation (“Predicting the value of Y”)**

• Suppose Y is a RV with known PMF or PDF

• Problem: predict (or estimate) what value of Y will be observed on the next trial

• Questions:
  • What value should we predict?
  • What is a good prediction?

• We need to specify some criterion that determines what is a good/reasonable estimate.

• Note that for continuous random variables, it doesn’t make sense to predict the exact value of Y, since that occurs with zero probability.

• A common choice is the mean-square estimate.

**The Mean Square Error (MSE) Estimate**

• We will let Ŷ be the mean square estimate of the random variable Y

• Let E[Y] = µ

• The mean-squared error (MSE) is defined as e = E[(Y − Ŷ)²]

• We proceed by “completing the square”:

E[(Y − Ŷ)²] = E[(Y − µ + µ − Ŷ)²]

            = E[(Y − µ)² + 2(Y − µ)(µ − Ŷ) + (µ − Ŷ)²]

            = var(Y) + 2(µ − Ŷ)E[Y − µ] + (µ − Ŷ)²

            = var(Y) + (µ − Ŷ)²  >  var(Y)  if Ŷ ≠ µ    (4-12)

• Clearly, choosing Ŷ = µ minimizes the MSE of the estimate

• Ŷ = µ is called the minimum- (or least-) mean-square error (MMSE or LMSE) estimate

• The minimum mean-square error is var(Y)
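The conclusion of (4-12), that Ŷ = µ is the best constant predictor, can be illustrated numerically; the gamma distribution below is an arbitrary choice for Y.

```python
# Minimal illustration of (4-12): among constant predictors, Yhat = E[Y]
# minimizes the mean-squared error E[(Y - Yhat)^2].
import numpy as np

rng = np.random.default_rng(4)
y = rng.gamma(shape=2.0, scale=1.5, size=100_000)   # E[Y] = 3, any RV will do

candidates = np.linspace(0.0, 6.0, 61)
mse = [np.mean((y - c) ** 2) for c in candidates]

best = candidates[int(np.argmin(mse))]
print(best, y.mean())        # best constant predictor is close to the mean
print(min(mse), y.var())     # minimum MSE is close to var(Y)
```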

**The MSE of a RV Based Upon Observing Another RV**

• Let X and Y denote random variables with known joint distribution

• Suppose that the value of X becomes known to us, but not the value of Y. How can we find the MMSE estimate, Ŷ?

• Can the MMSE estimate, Ŷ, which is a function of X, do better than ignoring X and estimating the value of Y as Ŷ = µY = E[Y]?

• Denoting the MMSE estimate Ŷ by c(X), the MSE is given by

e = E{[Y − c(X)]²} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [y − c(x)]² f_{X,Y}(x, y) dx dy

  = ∫_{−∞}^{∞} f_X(x) ∫_{−∞}^{∞} [y − c(x)]² f_{Y|X}(y | x) dy dx    (4-13)

• Note that the integrands above are nonnegative, so e will be minimized if the inner integral is a minimum for all values of x.

• Note that for a fixed value of x, c(x) is a variable [not a function]

**The MSE of a RV Based Upon Observing Another RV — 2**

• Since for a fixed value of x, c(x) is a variable [not a function], we can minimize the MSE by setting the derivative of the inner integral, with respect to c, to zero:

(d/dc) ∫_{−∞}^{∞} [y − c(x)]² f_{Y|X}(y | x) dy = −2 ∫_{−∞}^{∞} (y − c) f_{Y|X}(y | x) dy = 0    (4-14a)

• Solving for c, after noting that

∫_{−∞}^{∞} c(x) f_{Y|X}(y | x) dy = c(x) ∫_{−∞}^{∞} f_{Y|X}(y | x) dy = c(x), where the integral is one,

gives

Ŷ = c(X) = ∫_{−∞}^{∞} y f_{Y|X}(y | X) dy = E[Y | X]    (4-14b)

• Thus the MMSE estimate, Ŷ, is the conditional mean of Y given X.

• The MMSE estimate is in general nonlinear, and both the estimate and its MSE are RVs that are functions of X.

**MMSE Example**

• Let the random point (X, Y) be uniformly distributed on a semicircle

[Figure: the upper unit half-disk from −1 to 1; at X = α the vertical chord runs from 0 to (1 − α²)^{1/2}]

• The joint PDF has value 2/π on the semicircle

• The conditional PDF of Y given that X = α is a uniform density on [0, (1 − α²)^{1/2}]

• So Ŷ = E[Y | X = α] = (1/2)(1 − α²)^{1/2}, and this estimate achieves the least possible MSE of var(Y | X = α) = (1 − α²)/12

• This is intuitively reasonable since
  • if |α| is nearly 1, the MSE is small (since the range of Y is small)
  • if |α| is nearly 0, the MSE is large (since the range of Y is large)
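A short rejection-sampling sketch (NumPy; α = 0.5 and the band width are illustrative choices) reproduces the conditional mean above: sampling uniformly on the half-disk and conditioning on X near α recovers (1/2)(1 − α²)^{1/2}.

```python
# Hypothetical simulation of the semicircle example: sample (X, Y) uniformly
# on the upper half-disk and compare the empirical E[Y | X near alpha]
# with the MMSE estimate (1/2) * sqrt(1 - alpha^2).
import numpy as np

rng = np.random.default_rng(5)
n = 2_000_000
pts = rng.uniform(-1, 1, size=(n, 2))                 # sample in the square
pts[:, 1] = (pts[:, 1] + 1) / 2                       # y in [0, 1]
keep = pts[:, 0] ** 2 + pts[:, 1] ** 2 <= 1.0         # keep the half-disk
x, y = pts[keep, 0], pts[keep, 1]

alpha = 0.5
band = np.abs(x - alpha) < 0.01                       # X near alpha
print(y[band].mean())                                 # empirical conditional mean
print(0.5 * np.sqrt(1 - alpha ** 2))                  # theory: approximately 0.433
```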

**The Regression Curve of Y on X**

[Figure: the regression curve Ŷ = (1/2)(1 − α²)^{1/2} plotted over −1 ≤ α ≤ 1]

• Ŷ = E[Y | X = α], as a function of α, is a curve called the regression curve of Y on X

• The graph of (1/2)(1 − α²)^{1/2} is a half-ellipse

• Given the value of X, the MMSE estimate of Y can be read off from the regression curve

**Linear MMSE Estimation — I**

• Suppose that we wish to estimate Y as a linear function of the observation X

• The linear MMSE estimate of Y is aX + b, where a and b are chosen to minimize the mean-square error E[(Y − aX − b)²]

• Let Z = Y − aX − b be the error; then

E[(Y − aX − b)²] = E[Z²] = var(Z) + (E[Z])²    (4-15a)
                 = var(Y) + a² var(X) − 2a cov(X, Y) + (E[Z])²

• The above is quadratic in a and b

• By differentiation, it can be shown that the minimum occurs when

a = ρ σY / σX,    b = µY − a µX    (4-15b)

**Linear MMSE Estimation — II**

• As before, let Z = Y − aX − b be the error; then the MSE is

e = E[(Y − aX − b)²] = E[Z²]    (4-16a)

• Setting the derivative of the MSE with respect to a to zero gives

∂e/∂a = E[2Z(−X)] = 0    (4-16b)

• This says that the estimation error, Z, is orthogonal (that is, uncorrelated) to the observed data, X.

• This is referred to as the orthogonality principle of linear estimation: the error is uncorrelated with the observation (data), X, and, intuitively, the estimate has done all it can to extract correlated information from the data.
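The coefficients (4-15b) and the orthogonality principle (4-16b) are easy to verify on synthetic data; the linear model for Y below is an arbitrary illustration, not from the slides.

```python
# A sketch (assuming NumPy) of the linear MMSE coefficients (4-15b) and the
# orthogonality principle (4-16b): the error Z is uncorrelated with X.
import numpy as np

rng = np.random.default_rng(6)
n = 300_000
x = rng.normal(2.0, 1.5, size=n)
y = 0.7 * x + rng.normal(0.0, 0.8, size=n) + 1.0      # some dependent Y

rho = np.corrcoef(x, y)[0, 1]
a = rho * y.std() / x.std()                           # a = rho * sigma_Y / sigma_X
b = y.mean() - a * x.mean()                           # b = mu_Y - a * mu_X

z = y - (a * x + b)                                   # estimation error
print(a, b)                                           # close to 0.7 and 1.0
print(np.mean(z * x))                                 # approximately 0 (orthogonality)
```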

**Gaussian MMSE = Linear MMSE**

• In general, the linear MMSE estimate has a higher MSE than the (usually nonlinear) MMSE estimate E[Y | X]

• If X and Y are jointly Gaussian RVs, it can be shown that the conditional PDF of Y given X = α is a Gaussian PDF with mean

µY + (ρ σY / σX)(α − µX)    (4-17a)

and variance

σY² (1 − ρ²)    (4-17b)

• Hence,

E[Y | X = α] = µY + (ρ σY / σX)(α − µX)    (4-17c)

which is the same as the linear MMSE estimate.

• For jointly Gaussian RVs, MMSE estimate = linear MMSE estimate

**Limit Theorems**

• Limit theorems specify the probabilistic behavior of n random variables as n → ∞

• Possible restrictions on the RVs:
  – independent random variables
  – uncorrelated random variables
  – identical marginal CDFs/PDFs/PMFs
  – identical means and/or variances

**The Average of n RVs**

• Let the n random variables X1, X2, …, Xn have finite expectations µ1, µ2, …, µn

• Let the average be

Z = (X1 + X2 + … + Xn)/n    (4-18a)

• What is E[Z]? Expectation is a linear operator, so

E[Z] = (E[X1] + E[X2] + … + E[Xn])/n    (4-18b)

• The expected value of the average of n RVs is the numerical average of their expectations.

• An important practical case is when the RVs are independent and identically distributed (i.i.d.); the average is then called the sample mean.

**Variance of the Sample Mean**

• Sample mean: Z = (X1 + X2 + … + Xn)/n

• E[Z] = E[X] = µ

• It is easy to show that

var(Z) = var(X1 + X2 + … + Xn)/n²    (4-18c)

       = var(X)/n    (4-18d)

• This holds because the RVs are independent.

• The variance decreases as n increases.
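A quick simulation of (4-18d) follows (NumPy; the Gaussian choice and var(X) = 4 are illustrative, independence is what matters): the empirical variance of the sample mean shrinks as var(X)/n.

```python
# Illustration of (4-18d): the sample mean of n i.i.d. RVs has variance
# var(X)/n (a sketch, not part of the original slides).
import numpy as np

rng = np.random.default_rng(7)
var_x = 4.0
for n in [1, 10, 100]:
    # many independent sample means, each averaging n observations
    z = rng.normal(0.0, np.sqrt(var_x), size=(50_000, n)).mean(axis=1)
    print(n, z.var(), var_x / n)   # empirical vs. theoretical var(Z)
```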

**Weak Law of Large Numbers (WLLN)**

• Weak Law of Large Numbers: if X1, X2, …, Xn, … are i.i.d. RVs with finite mean µ, then for every ε > 0,

P{ |(X1 + X2 + … + Xn)/n − µ| ≥ ε } → 0 as n → ∞    (4-19a)

or, equivalently,

P{ |(X1 + X2 + … + Xn)/n − µ| ≤ ε } → 1 as n → ∞    (4-19b)

• Note that it is not necessary for the RVs to have finite variance, but the proof is easier if the variance is finite

• Note: the WLLN says lim_{n→∞} P{something} = 1

**Strong Law of Large Numbers (SLLN)**

• Strong Law of Large Numbers: if X1, X2, …, Xn, … are i.i.d. RVs with finite mean µ, then

P{ lim_{n→∞} (X1 + X2 + … + Xn)/n = µ } = 1    (4-20a)

• Suppose the experiment is repeated infinitely often and the RV X takes on the values x1, x2, …, xn, … on these trials

• What can be said about (x1 + x2 + … + xn)/n? There are three possibilities:
  • the sequence converges to µ
  • it converges to some other number
  • it does not converge at all

• The Strong Law of Large Numbers says that

P{ (x1 + x2 + … + xn)/n converges to µ } = 1    (4-20b)

• Note: the SLLN says P{ lim_{n→∞} something } = 1

**Strong Law of Large Numbers — II**

• If the Strong Law of Large Numbers holds, then so does the Weak Law

• In fact, both require only that the RVs be i.i.d. with finite mean µ

• But the Weak Law of Large Numbers may be applicable in cases where the Strong Law does not hold

• Example: the Weak Law of Large Numbers still applies if the RVs are uncorrelated but not independent

**Strong Law and Relative Frequencies**

• The Strong Law of Large Numbers justifies the estimation of probabilities in terms of relative frequencies

• If the Xi are i.i.d. Bernoulli RVs with parameter p (and hence finite mean p), then the sample mean Zn converges to p with probability 1 as n → ∞

• That is, the observed relative frequency of an event of probability p converges to p with probability 1.
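A one-path simulation (NumPy; p = 0.3 is an arbitrary choice) shows the relative frequency settling at p, as the SLLN predicts for almost every sample path.

```python
# Sketch of the SLLN for relative frequencies: the running sample mean of
# i.i.d. Bernoulli(p) RVs settles at p (one simulated sample path).
import numpy as np

rng = np.random.default_rng(8)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)                 # Bernoulli trials
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in [100, 10_000, 1_000_000]:
    print(n, running_mean[n - 1])                      # approaches p = 0.3
```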

**The Central Limit Theorem** (previously discussed)

• Let Yn = (X1 + X2 + … + Xn − nµ)/(σ√n), a RV with mean 0 and variance 1.

• The Central Limit Theorem asserts that, for large values of n, the CDF of Yn is well-approximated by the unit Gaussian CDF.

• Formally, the Central Limit Theorem states that the CDF of Yn converges to the unit Gaussian CDF.

• In practical use of the Central Limit Theorem, we hardly ever use the RV Yn = (X1 + X2 + … + Xn − nµ)/(σ√n).

• Instead, X1 + X2 + … + Xn is treated as if its CDF were approximately that of a N(nµ, nσ²) RV.

• Thus, we compute

P{X1 + X2 + … + Xn ≤ u} ≈ Φ((u − nµ)/(σ√n)),

which is effectively the same computation.
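The practical CLT computation above can be checked by simulation; the sketch below (NumPy/SciPy, with i.i.d. Uniform(0, 1) summands as an arbitrary example) compares the empirical probability with Φ((u − nµ)/(σ√n)).

```python
# Sketch of the practical CLT computation: compare the empirical
# P{X1 + ... + Xn <= u} with Phi((u - n*mu)/(sigma*sqrt(n))) for
# i.i.d. Uniform(0, 1) summands (mu = 1/2, sigma^2 = 1/12).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)
n, mu, sigma = 30, 0.5, np.sqrt(1.0 / 12.0)
sums = rng.uniform(0.0, 1.0, size=(200_000, n)).sum(axis=1)

u = 16.0
print(np.mean(sums <= u))                              # empirical probability
print(norm.cdf((u - n * mu) / (sigma * np.sqrt(n))))   # CLT approximation
```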

**Final Remarks on Probability**

We see that the theory of probability is at bottom only common sense reduced to calculation; it makes us appreciate with exactitude what reasonable minds feel by a sort of instinct, often without being able to account for it. … It is remarkable that this science, which originated in the consideration of games of chance, should become the most important object of human knowledge. … The most important questions of life are, for the most part, really only problems of probability.

— Pierre-Simon, Marquis de Laplace, *Analytical Theory of Probability*

**Additional Backup/Reference Slides**
