
1. Random Experiment: An experiment is called a random experiment if its outcome cannot be predicted precisely. One out of a number of possible outcomes results when a random experiment is performed.

2. Sample Space: The sample space S is the collection of all outcomes of a random experiment. The elements of S are called sample points.
• A sample space may be finite, countably infinite or uncountable.
• A finite or countably infinite sample space is called a discrete sample space.
• An uncountable sample space is called a continuous sample space.

Figure: A sample space S containing sample points s1, s2, ...

3. Event: An event A is a subset of the sample space such that a probability can be assigned to it. Thus
• A ⊆ S.
• For a discrete sample space, all subsets are events.
• S is the certain event (sure to occur) and φ is the impossible event.

Consider the following examples.

Example 1 Tossing a fair coin: the possible outcomes are H (head) and T (tail). The associated sample space is S = {H, T}. It is a finite sample space. The events associated with the sample space S are S, {H}, {T} and φ.

Example 2 Throwing a fair die: the six possible outcomes are


'1', '2', '3', '4', '5' and '6'. The associated finite sample space is S = {'1', '2', '3', '4', '5', '6'}. Some events are
A = the event of getting an odd face = {'1', '3', '5'}
B = the event of getting a six = {'6'}
and so on.

Example 3 Tossing a fair coin until a head is obtained: we may have to toss the coin any number of times before a head is obtained. The possible outcomes are H, TH, TTH, TTTH, .... How many outcomes are there? The outcomes are countable but infinite in number. The countably infinite sample space is S = {H, TH, TTH, ...}.

Example 4 Picking a real number at random between −1 and 1.

The associated sample space is S = {s | s ∈ ℝ, −1 ≤ s ≤ 1} = [−1, 1]. Clearly S is a continuous sample space.

The probability of an event A is a number P(A) assigned to the event. Let us see how we can define probability.

1. Classical definition of probability (Laplace, 1812)

Consider a random experiment with a finite number of outcomes N. If all the outcomes of the experiment are equally likely, the probability of an event A is defined by

P(A) = N_A / N

where N_A = number of outcomes favourable to A and N is the total number of outcomes.

Example 5 A fair die is rolled once. What is the probability of getting a '6'?

Here S = {'1', '2', '3', '4', '5', '6'} and A = {'6'}.

∴ N = 6 and N_A = 1, so P(A) = N_A/N = 1/6

Example 6 A fair coin is tossed twice. What is the probability of getting two 'heads'?

Here S = {HH, HT, TH, TT} and A = {HH}. The total number of outcomes is 4 and all four outcomes are equally likely. The only outcome favourable to A is HH.

∴ P(A) = N_A/N = 1/4
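To make the counting definition concrete, here is a minimal Python sketch (illustrative, not part of the original notes) that computes P(A) = N_A/N for the two examples above; the helper name classical_probability is our own.

```python
# A minimal sketch of the classical (counting) definition: P(A) = N_A / N.
# The sample space and event below mirror Examples 5 and 6 in the text.
from fractions import Fraction

def classical_probability(sample_space, event):
    """P(A) = (outcomes favourable to A) / (total outcomes), all equally likely."""
    favourable = [s for s in sample_space if s in event]
    return Fraction(len(favourable), len(sample_space))

die = ['1', '2', '3', '4', '5', '6']
print(classical_probability(die, {'6'}))          # 1/6

two_tosses = ['HH', 'HT', 'TH', 'TT']
print(classical_probability(two_tosses, {'HH'}))  # 1/4
```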

Discussion
• The classical definition is limited to a random experiment which has only a finite number of outcomes. In many experiments, like the one in the above example, the sample space is finite and each outcome may be assumed 'equally likely'. In such cases, the counting method can be used to compute probabilities of events.
• Consider the experiment of tossing a fair coin until a 'head' appears. As we have discussed earlier, there are countably infinitely many outcomes. Can you believe that all these outcomes are equally likely?
• The notion of 'equally likely' is important here. Equally likely means equally probable; thus the definition presupposes that all outcomes occur with equal probability. The definition therefore relies on a concept that is itself yet to be defined, and is circular.

2. Relative-frequency based definition of probability (von Mises, 1919)

If an experiment is repeated n times under similar conditions and the event A occurs n_A times, then the probability of A is defined as

P(A) = lim_{n→∞} n_A / n

This definition is also inadequate from the theoretical point of view: we cannot repeat an experiment an infinite number of times, and how do we ascertain that the above ratio will converge for all possible sequences of outcomes of the experiment?

Example Suppose a die is rolled 500 times. The following table shows the frequency of each face.

Face | Frequency | Relative frequency
1 | 82 | 0.164
2 | 81 | 0.162
3 | 88 | 0.176
4 | 81 | 0.162
5 | 90 | 0.180
6 | 78 | 0.156
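The following short simulation (an illustrative sketch; the seed and sample size are arbitrary choices) reproduces an experiment of this kind and shows the relative frequencies n_A/n hovering near 1/6:

```python
# A small simulation of the relative-frequency definition: roll a fair die
# n times and tabulate n_A / n for each face, as in the 500-roll table above.
import random
from collections import Counter

random.seed(1)          # assumption: any seed works; fixed for repeatability
n = 500
rolls = Counter(random.randint(1, 6) for _ in range(n))

for face in range(1, 7):
    print(face, rolls[face], rolls[face] / n)
# As n grows, each relative frequency approaches 1/6 ≈ 0.1667.
```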

3. Axiomatic definition of probability (Kolmogorov, 1933)

We have earlier defined an event as a subset of the sample space. Does each subset of the sample space form an event? The answer is yes for a finite sample space. However, we may not be able to assign probability meaningfully to all the subsets of a continuous sample space; we have to exclude those subsets. This is where the concept of the sigma algebra becomes relevant.

Definition: Let S be a sample space and F a sigma algebra defined over it. Let P : F → ℝ be a mapping from the sigma algebra F into the real line such that for each A ∈ F there exists a unique P(A) ∈ ℝ. Clearly P is a set function, and it is called a probability if it satisfies the following axioms:

1. P(A) ≥ 0 for all A ∈ F
2. P(S) = 1
3. Countable additivity: If A1, A2, ... are pairwise disjoint events, i.e. Ai ∩ Aj = φ for i ≠ j, then

P(∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai)

Figure: P maps each event A ∈ F to a number in the interval [0, 1] of the real line.

Remark
• The triplet (S, F, P) is called the probability space.
• Any assignment of probability must satisfy the above three axioms.
• If A ∩ B = φ, then P(A ∪ B) = P(A) + P(B). This is a special case of axiom 3, and for a discrete sample space this simpler version may be taken as axiom 3. We shall give a proof of this result below.
• The events A and B are called mutually exclusive if A ∩ B = φ.

Basic results of probability

From the above axioms we can establish the following basic results:

1. P(φ) = 0

This is because S ∪ φ = S with S ∩ φ = φ
⇒ P(S ∪ φ) = P(S) ⇒ P(S) + P(φ) = P(S)
∴ P(φ) = 0

2. P(A^c) = 1 − P(A), where A ∈ F.

We have A ∪ A^c = S and A ∩ A^c = φ
⇒ P(A ∪ A^c) = P(S) ⇒ P(A) + P(A^c) = 1
∴ P(A) = 1 − P(A^c)

3. If A, B ∈ F and A ∩ B = φ, then P(A ∪ B) = P(A) + P(B).

We have A ∪ B = A ∪ B ∪ φ ∪ ... ∪ φ ∪ ...
∴ P(A ∪ B) = P(A) + P(B) + P(φ) + ... + P(φ) + ... (using axiom 3)
∴ P(A ∪ B) = P(A) + P(B)

4. If A, B ∈ F, then P(A ∩ B^c) = P(A) − P(A ∩ B).

Figure: Venn diagram of the events A and B in S, showing the disjoint regions A ∩ B^c and A ∩ B that make up A.

We have A = (A ∩ B^c) ∪ (A ∩ B), where the two events on the right are disjoint.

∴ P[(A ∩ B^c) ∪ (A ∩ B)] = P(A)
⇒ P(A ∩ B^c) + P(A ∩ B) = P(A)
⇒ P(A ∩ B^c) = P(A) − P(A ∩ B)

We can similarly show that P(A^c ∩ B) = P(B) − P(A ∩ B).

5. If A, B ∈ F, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

We have

A ∪ B = (A^c ∩ B) ∪ (A ∩ B) ∪ (A ∩ B^c), a union of disjoint events
∴ P(A ∪ B) = P(A^c ∩ B) + P(A ∩ B) + P(A ∩ B^c)
= P(B) − P(A ∩ B) + P(A ∩ B) + P(A) − P(A ∩ B)
= P(A) + P(B) − P(A ∩ B)

6. We can apply the properties of sets to establish the following result for A, B, C ∈ F:

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) − P(A ∩ C) + P(A ∩ B ∩ C)

The following generalization is known as the principle of inclusion-exclusion.

7. Principle of Inclusion-exclusion

Suppose A1 , A2 ,..., An ∈ F

Then

P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai) − Σ_{i,j | i<j} P(Ai ∩ Aj) + Σ_{i,j,k | i<j<k} P(Ai ∩ Aj ∩ Ak) − ... + (−1)^{n+1} P(∩_{i=1}^{n} Ai)
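As a quick sanity check, the sketch below verifies the three-event form of the formula by brute-force counting on a small equiprobable sample space; the particular events A, B, C are our own illustrative choices:

```python
# A sketch verifying inclusion-exclusion for three events inside a finite,
# equiprobable sample space (two throws of a die; events are illustrative).
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), len(S))

A = {s for s in S if s[0] == 3}        # first throw is 3
B = {s for s in S if s[1] == 3}        # second throw is 3
C = {s for s in S if sum(s) == 6}      # total is 6

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(B & C) - P(A & C)
       + P(A & B & C))
print(lhs, rhs, lhs == rhs)            # identical, as the formula asserts
```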

Discussion: We require some rules to assign probabilities to some basic events in ℑ. For the other events we can compute the probabilities in terms of the probabilities of these basic events.

Probability assignment in a discrete sample space

Consider a finite sample space S = {s1, s2, ..., sN}. Then the sigma algebra ℑ is defined by the power set of S. For any elementary event {si} ∈ ℑ, we can assign a probability P({si}) such that

Σ_{i=1}^{N} P({si}) = 1

For any event A ∈ ℑ, we can define the probability

P(A) = Σ_{si ∈ A} P({si})

In the special case when the outcomes are equiprobable, we can assign equal probability p to each elementary event:

Σ_{i=1}^{n} p = 1 ⇒ p = 1/n

∴ P(A) = P(∪_{si ∈ A} {si}) = n(A) × (1/n) = n(A)/n

where n(A) = number of elementary events in A.

Example Consider the experiment of rolling a fair die considered in Example 2. Suppose Ai, i = 1, ..., 6 represent the elementary events. Thus A1 is the event of getting '1', A2 is the event of getting '2' and so on. Since all six disjoint events are equiprobable and S = A1 ∪ A2 ∪ ... ∪ A6, we get

P(A1) = P(A2) = ... = P(A6) = 1/6

Suppose A is the event of getting an odd face. Then A = A1 ∪ A3 ∪ A5 and

P(A) = P(A1) + P(A3) + P(A5) = 3 × 1/6 = 1/2

Example Consider the experiment of tossing a fair coin until a head is obtained, discussed in Example 3. Here S = {H, TH, TTH, ...}. Let us call s1 = H, s2 = TH, s3 = TTH and so on. If we assign P({sn}) = 1/2^n, then Σ_{sn ∈ S} P({sn}) = Σ_{n=1}^{∞} 1/2^n = 1. Let A = {s1, s2, s3} be the event of obtaining the head before the 4th toss. Then

P(A) = P({s1}) + P({s2}) + P({s3}) = 1/2 + 1/2² + 1/2³ = 7/8

Probability assignment in a continuous space

Suppose the sample space S is continuous and uncountable. Such a sample space arises when the outcomes of an experiment are numbers; for example, it occurs when the experiment consists in measuring the voltage, the current or the resistance. In such a case, the sigma algebra consists of the Borel sets on the real line.

Suppose S = ℝ and f : ℝ → ℝ is a non-negative integrable function such that

∫_{−∞}^{∞} f(x) dx = 1

Then

P(A) = ∫_A f(x) dx

defines the probability on the Borel sigma-algebra B, for any Borel set A.

Example Suppose

f_X(x) = 1/(b − a) for x ∈ [a, b], 0 otherwise

Then for [a1, b1] ⊆ [a, b],

P([a1, b1]) = (b1 − a1)/(b − a)

We can similarly define probability on a continuous two-dimensional space. Example Consider S = ℝ², the two-dimensional Euclidean space. Let S1 ⊆ ℝ² and let |S1| represent the area of S1. If we define

f_X(x) = 1/|S1| for x ∈ S1, 0 otherwise

then

P(A) = |A| / |S1|

This example interprets the geometrical definition of probability.

Probability Using Counting Method

In many applications we have to deal with a finite sample space S whose elementary events, formed by single elements of the set, may be assumed equiprobable. In this case, we can define the probability of an event A according to the classical definition discussed earlier:

P(A) = n_A / n

where n_A = number of elements favourable to A and n is the total number of elements in the sample space S. Thus the calculation of probability involves finding the number of elements in the sample space S and in the event A. Combinatorial rules give us quick algebraic rules to count the elements in S. We briefly outline some of these rules:

(1) Product rule: Suppose we have a set A with m distinct elements and a set B with n distinct elements, and let A × B = {(ai, bj) | ai ∈ A, bj ∈ B}. Then A × B contains mn ordered pairs of elements. In other words, if we can choose element a in m possible ways and element b in n possible ways, then the ordered pair (a, b) can be chosen in mn possible ways.

Figure: Illustration of the product rule for m = 5 and n = 4, showing the grid of ordered pairs (a1, b1), (a1, b2), ..., (a5, b4).

The above result can be generalized as follows: the number of distinct k-tuples in A1 × A2 × ... × Ak = {(a1, a2, ..., ak) | a1 ∈ A1, a2 ∈ A2, ..., ak ∈ Ak} is n1 n2 ... nk, where ni represents the number of distinct elements in Ai.

Example A fair die is thrown twice. What is the probability that a 3 will appear at least once?

Solution: The sample space corresponding to two throws of the die consists of the 36 ordered pairs (1,1), (1,2), ..., (6,6); by the product rule it has 6 × 6 = 36 elements. The event of getting at least one 3 contains the 11 elements (3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (1,3), (2,3), (4,3), (5,3), (6,3). Therefore, the required probability is 11/36.

(2) Sampling with replacement and ordering: Suppose we have to choose k objects from a set of n objects by picking one object after another at random, and after every choice the object is placed back in the set. In this case, by applying the product rule, the number of distinct ordered k-tuples = n × n × ... × n (k times) = n^k.

(3) Sampling without replacement: Suppose we have to choose k objects from a set of n objects by picking one object after another at random, without replacement. The first object can be chosen from n objects, the second object from the remaining n − 1 objects, and so on. Therefore, the number of distinct ordered k-tuples in this case is n × (n − 1) × ... × (n − k + 1) = n!/(n − k)!. The number n!/(n − k)! is called the permutation of n objects taking k at a time and is denoted by nP_k. Thus

nP_k = n!/(n − k)!

Clearly nP_n = n!.

Example: Birthday problem. Given a class of students, what is the probability of two students in the class having the same birthday? Plot this probability vs the number of students and be surprised!

Let k be the number of students in the class. Then the number of possible sequences of birthdays = 365 × 365 × ... (k times) = 365^k. The number of cases with each of the k students having a different birthday is

365P_k = 365 × 364 × ... × (365 − k + 1)

Therefore, the probability of a common birthday = 1 − 365P_k / 365^k.

If the group has more than 365 people, the probability of two people in the group having the same birthday is obviously 1.

Number of persons | Probability
2 | 0.0027
10 | 0.1169
15 | 0.2529
20 | 0.4114
25 | 0.5687
40 | 0.8912
50 | 0.9704
60 | 0.9941
80 | 0.9999
100 | ≈ 1

The plot of probability vs number of students shows a steep rise in the beginning. In fact, this probability for a group of 25 students is greater than 0.5, and from 60 students onward it is close to 1. The probability for 366 or more students is exactly one.

(4) Sampling without replacement and without ordering: Suppose nC_k is the number of ways in which k objects can be chosen out of a set of n objects when the ordering of the k objects is not considered. The k chosen objects can be arranged among themselves in k! ways, so the number of ways in which k objects can be chosen out of n objects with ordering is nC_k × k!. This is the case of sampling without replacement and with ordering. Therefore,

nC_k × k! = nP_k = n!/(n − k)!
∴ nC_k = n!/(k!(n − k)!)

nC_k is also called the binomial coefficient.

Example: An urn contains 6 red balls, 5 green balls and 4 blue balls. 9 balls are picked at random from the urn without replacement. What is the probability that, out of the 9 balls, 4 are red, 3 are green and 2 are blue?

Solution: 9 balls can be picked from a population of 15 balls in 15C9 = 15!/(9!6!) ways. Therefore the required probability is

(6C4 × 5C3 × 4C2) / 15C9

(5) Arranging n objects into k specific groups: Suppose we want to partition a set of n distinct elements into k distinct subsets A1, A2, ..., An of sizes n1, n2, ..., nk respectively, so that n = n1 + n2 + ... + nk. Then the total number of distinct partitions is

n! / (n1! n2! ... nk!)

This can be proved by noting that the resulting number of partitions is

nC_{n1} × (n − n1)C_{n2} × ... × (n − n1 − ... − n_{k−1})C_{nk}
= [n!/(n1!(n − n1)!)] × [(n − n1)!/(n2!(n − n1 − n2)!)] × ... × [(n − n1 − ... − n_{k−1})!/(nk!(n − n1 − ... − nk)!)]
= n!/(n1! n2! ... nk!)

Example: What is the probability that in a throw of 12 dice each face occurs twice?

Solution: The total number of elements in the sample space of the outcomes of a single throw of 12 dice is 6^12. The number of favourable outcomes is the number of ways in which the 12 dice can be arranged in six groups of size 2 each: group 1 consisting of the two dice showing 1, group 2 consisting of the two dice showing 2, and so on. Therefore, the total number of distinct groups is

12! / (2!)^6

Hence the required probability is

12! / ((2!)^6 × 6^12)
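The counting rules above map directly onto math.comb and math.factorial in Python; the following sketch (reusing the data of the urn, dice and birthday examples) reproduces the probabilities just computed:

```python
# A sketch of the counting rules above using math.comb / math.factorial.
from math import comb, factorial

# Urn example: P(4 red, 3 green, 2 blue) when 9 of 15 balls are drawn.
p_urn = comb(6, 4) * comb(5, 3) * comb(4, 2) / comb(15, 9)
print(p_urn)                                   # ≈ 0.1798

# Twelve dice, each face exactly twice: favourable / total.
p_dice = factorial(12) / (factorial(2) ** 6 * 6 ** 12)
print(p_dice)                                  # ≈ 0.0034

# Birthday problem for k students: 1 - 365Pk / 365^k.
def p_common_birthday(k):
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

print(p_common_birthday(25))                   # ≈ 0.57, already above 1/2
```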

Conditional Probability

Consider the probability space (S, F, P). Let A and B be two events in F. We ask the following question: given that A has occurred, what is the probability of B? The answer is the conditional probability of B given A, denoted by P(B/A). We shall develop the concept of conditional probability and explain under what condition this conditional probability is the same as P(B).

Notation: P(B/A) = conditional probability of B given A.

Let us consider the case of equiprobable events discussed earlier. Let N_AB sample points be favourable for the joint event A ∩ B. Then

P(B/A) = (number of outcomes favourable to A and B) / (number of outcomes in A)
= N_AB / N_A = (N_AB / N) / (N_A / N) = P(A ∩ B) / P(A)

This suggests the following definition. The probability of an event B under the condition that another event A has occurred is called the conditional probability of B given A and is defined by

P(B/A) = P(A ∩ B) / P(A),  P(A) ≠ 0

We can similarly define the conditional probability of A given B, denoted by P(A/B).

From the definition of conditional probability, we have the joint probability P(A ∩ B) of two events A and B as follows:

P(A ∩ B) = P(A) P(B/A) = P(B) P(A/B)

Example 1 Consider the example of tossing the fair die. Suppose

A = event of getting an even number = {2, 4, 6}
B = event of getting a number less than 4 = {1, 2, 3}
∴ A ∩ B = {2}
∴ P(B/A) = P(A ∩ B)/P(A) = (1/6)/(3/6) = 1/3

Example 2 A family has two children. It is known that at least one of the children is a girl. What is the probability that both the children are girls?

A = event of at least one girl
B = event of two girls

Clearly S = {gg, gb, bg, bb}, A = {gg, gb, bg} and B = {gg}, so A ∩ B = {gg}

∴ P(B/A) = P(A ∩ B)/P(A) = (1/4)/(3/4) = 1/3

Conditional probability and the axioms of probability

In the following we show that the conditional probability satisfies the axioms of probability. By definition,

P(B/A) = P(A ∩ B)/P(A), P(A) > 0

Axiom 1: P(B/A) = P(A ∩ B)/P(A) ≥ 0, since P(A ∩ B) ≥ 0.

Axiom 2: We have S ∩ A = A, so

P(S/A) = P(S ∩ A)/P(A) = P(A)/P(A) = 1

Axiom 3: Consider a sequence of disjoint events B1, B2, ..., Bn, .... We have

(∪_{i=1}^{∞} Bi) ∩ A = ∪_{i=1}^{∞} (Bi ∩ A)

(See the Venn diagram below for an illustration of the finite version of this result.)

Note that the sequence Bi ∩ A, i = 1, 2, ... is also a sequence of disjoint events.

∴ P(∪_{i=1}^{∞} (Bi ∩ A)) = Σ_{i=1}^{∞} P(Bi ∩ A)

∴ P(∪_{i=1}^{∞} Bi / A) = P((∪_{i=1}^{∞} Bi) ∩ A)/P(A) = Σ_{i=1}^{∞} P(Bi ∩ A)/P(A) = Σ_{i=1}^{∞} P(Bi/A)

Properties of Conditional Probabilities

(1) If A ⊆ B, then P(B/A) = 1 and P(A/B) ≥ P(A).

We have A ∩ B = A, so

P(B/A) = P(A ∩ B)/P(A) = P(A)/P(A) = 1

and, since P(B) ≤ 1,

P(A/B) = P(A ∩ B)/P(B) = P(A)/P(B) ≥ P(A)

(2) Chain rule of probability: We have

P(A ∩ B ∩ C) = P((A ∩ B) ∩ C) = P(A ∩ B) P(C/A ∩ B) = P(A) P(B/A) P(C/A ∩ B)

We can generalize the above to get the chain rule of probability:

P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2/A1) P(A3/A1 ∩ A2) ... P(An/A1 ∩ A2 ∩ ... ∩ A_{n−1})

(3) Theorem of Total Probability: Let A1, A2, ..., An be n events such that S = A1 ∪ A2 ∪ ... ∪ An and Ai ∩ Aj = φ for i ≠ j. Then for any event B,

P(B) = Σ_{i=1}^{n} P(Ai) P(B/Ai)

Proof: We have ∪_{i=1}^{n} (B ∩ Ai) = B and the sequence B ∩ Ai is disjoint.

∴ P(B) = P(∪_{i=1}^{n} (B ∩ Ai)) = Σ_{i=1}^{n} P(B ∩ Ai) = Σ_{i=1}^{n} P(Ai) P(B/Ai)

Figure: A partition of S into A1, A2, A3, with the event B overlapping the three subsets.

Remark
(1) A decomposition of a set S into 2 or more disjoint nonempty subsets is called a partition of S. The subsets A1, A2, ..., An form a partition of S if S = A1 ∪ A2 ∪ ... ∪ An and Ai ∩ Aj = φ for i ≠ j.
(2) The theorem of total probability can be used to determine the probability of a complex event in terms of related simpler events. This result will be used in Bayes' theorem, to be discussed at the end of the lecture.

Example 3 Suppose a box contains 2 white and 3 black balls. Two balls are picked at random without replacement. Let A1 = event that the first ball is white and A1^c = event that the first ball is black. Clearly A1 and A1^c form a partition of the sample space corresponding to picking two balls from the box. Let B = the event that the second ball is white. Then

P(B) = P(A1) P(B/A1) + P(A1^c) P(B/A1^c) = (2/5)(1/4) + (3/5)(2/4) = 2/5

Independent events

Two events are called independent if the probability of occurrence of one event does not affect the probability of occurrence of the other. Thus the events A and B are independent if

P(B/A) = P(B) and P(A/B) = P(A)

where P(A) and P(B) are assumed to be non-zero. Equivalently, if A and B are independent, we have

P(A ∩ B)/P(A) = P(B), or P(A ∩ B) = P(A) P(B)

i.e. the joint probability is the product of the individual probabilities. Two events A and B are called statistically dependent if they are not independent. Similarly, we can define the independence of n events. The events A1, A2, ..., An are called independent if and only if

P(Ai ∩ Aj) = P(Ai) P(Aj)
P(Ai ∩ Aj ∩ Ak) = P(Ai) P(Aj) P(Ak)
...
P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2) ... P(An)

Example 4 Consider the example of tossing a fair coin twice. The resulting sample space is S = {HH, HT, TH, TT} and all the outcomes are equiprobable. Let A = {TH, TT} be the event of getting 'tail' in the first toss and B = {TH, HH} be the event of getting 'head' in the second toss. Then P(A) = 1/2 and P(B) = 1/2. Again, A ∩ B = {TH}, so that

P(A ∩ B) = 1/4 = P(A) P(B)

Hence the events A and B are independent.

Example 5 Consider the experiment of picking two balls at random discussed in Example 3. In this case, P(B) = 2/5 and P(B/A1) = 1/4. Therefore P(B) ≠ P(B/A1), and A1 and B are dependent.

Bayes' Theorem

Suppose A1, A2, ..., An are partitions of S such that S = A1 ∪ A2 ∪ ... ∪ An and Ai ∩ Aj = φ for i ≠ j. Suppose the event B occurs if one of the events A1, A2, ..., An occurs. Thus we have the information of the probabilities P(Ai) and P(B/Ai), i = 1, 2, ..., n. We ask the following question: given that B has occurred, what is the probability that a particular event Ak has occurred? In other words, what is P(Ak/B)?

We have P(B) = Σ_{i=1}^{n} P(Ai) P(B/Ai) (using the theorem of total probability).

∴ P(Ak/B) = P(Ak) P(B/Ak)/P(B) = P(Ak) P(B/Ak) / Σ_{i=1}^{n} P(Ai) P(B/Ai)

This result is known as Bayes' theorem. The probability P(Ak) is called the a priori probability and P(Ak/B) is called the a posteriori probability. Thus Bayes' theorem enables us to determine the a posteriori probability P(Ak/B) from the observation that B has occurred. This result is of practical importance and is the heart of Bayesian classification, Bayesian estimation, etc.

Example 6 In a binary communication system a zero and a one are transmitted with probability 0.6 and 0.4 respectively. Due to error in the communication system a zero becomes a one with a probability 0.1 and a one becomes a zero with a probability 0.08. Determine the probability (i) of receiving a one and (ii) that a one was transmitted when the received message is one.

Let S be the sample space corresponding to binary communication. Suppose T0 is the event of transmitting 0, T1 the event of transmitting 1, and R0 and R1 the corresponding events of receiving 0 and 1 respectively.

Given P(T0) = 0.6, P(T1) = 0.4, P(R1/T0) = 0.1 and P(R0/T1) = 0.08, so that P(R1/T1) = 1 − 0.08 = 0.92.

(i) P(R1) = probability of receiving 'one'
= P(T1) P(R1/T1) + P(T0) P(R1/T0)
= 0.4 × 0.92 + 0.6 × 0.1
= 0.428

(ii) Using Bayes' rule,

P(T1/R1) = P(T1) P(R1/T1)/P(R1)
= P(T1) P(R1/T1) / [P(T1) P(R1/T1) + P(T0) P(R1/T0)]
= 0.368/0.428 ≈ 0.86

Example 7 In an electronics laboratory there are identically looking capacitors of three makes A1, A2 and A3 in the ratio 2:3:4. It is known that 1% of A1, 1.5% of A2 and 2% of A3 are defective. What percentage of capacitors in the laboratory are defective? If a capacitor picked at random is found to be defective, what is the probability that it is of make A3?

Let D be the event that the item is defective. Here we have to find P(D) and P(A3/D).

Here P(A1) = 2/9, P(A2) = 3/9 and P(A3) = 4/9, and the conditional probabilities are P(D/A1) = 0.01, P(D/A2) = 0.015 and P(D/A3) = 0.02.

∴ P(D) = P(A1) P(D/A1) + P(A2) P(D/A2) + P(A3) P(D/A3)
= (2/9) × 0.01 + (3/9) × 0.015 + (4/9) × 0.02
≈ 0.0161

so about 1.61% of the capacitors are defective, and

P(A3/D) = P(A3) P(D/A3)/P(D) = ((4/9) × 0.02)/0.0161 ≈ 0.552
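A small sketch to verify both examples numerically; the helper bayes is our own wrapper around the theorem of total probability and Bayes' rule:

```python
# A sketch of Examples 6 and 7 via total probability and Bayes' rule;
# the priors and likelihoods are the ones given in the text.
def bayes(priors, likelihoods):
    """Return P(B) and the posteriors P(A_k | B)."""
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return total, [p * l / total for p, l in zip(priors, likelihoods)]

# Example 6: binary channel, B = "a 1 is received".
p_r1, post = bayes([0.6, 0.4], [0.1, 0.92])    # [P(T0), P(T1)], [P(R1|Ti)]
print(p_r1, post[1])                           # ≈ 0.428 and P(T1|R1) ≈ 0.86

# Example 7: defective capacitors of makes A1:A2:A3 = 2:3:4.
p_d, post = bayes([2/9, 3/9, 4/9], [0.01, 0.015, 0.02])
print(p_d, post[2])                            # ≈ 0.0161 and P(A3|D) ≈ 0.552
```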

Repeated Trials

In our discussions so far, we considered the probability defined over a sample space corresponding to a single random experiment. Often, we have to consider several random experiments in a sequence. For example, the experiment corresponding to sequential transmission of bits through a communication system may be considered as a sequence of experiments, each representing the transmission of a single bit through the channel.

Suppose two experiments E1 and E2 with the corresponding sample spaces S1 and S2 are performed sequentially. Such a combined experiment is called the product of the two experiments E1 and E2. Clearly, the outcome of the combined experiment consists of the ordered pair (s1, s2), where s1 ∈ S1 and s2 ∈ S2. The sample space corresponding to the combined experiment is given by S = S1 × S2. The events in S consist of all the Cartesian products of the form A1 × A2, where A1 is an event in S1 and A2 is an event in S2. Our aim is to define the probability P(A1 × A2).

We can easily show that

P(S1 × A2) = P2(A2) and P(A1 × S2) = P1(A1)

where Pi is the probability defined on the events of Si, i = 1, 2. This is because the event A1 × S2 in S occurs whenever A1 in S1 occurs, irrespective of the event in S2.

Figure: The sample space S1 × S2, showing the events A1 × S2, S1 × A2 and their intersection A1 × A2.

Also note that

A1 × A2 = (A1 × S2) ∩ (S1 × A2)
∴ P(A1 × A2) = P[(A1 × S2) ∩ (S1 × A2)]

Independent Experiments: In many experiments, the events A1 × S2 and S1 × A2 are independent for every selection of A1 ∈ S1 and A2 ∈ S2. Such experiments are called independent experiments. In this case we can write

P(A1 × A2) = P[(A1 × S2) ∩ (S1 × A2)] = P(A1 × S2) P(S1 × A2) = P1(A1) P2(A2)

Example 1 Suppose S1 is the sample space of the experiment of rolling a six-faced fair die and S2 is the sample space of the experiment of tossing a fair coin. Clearly S1 = {1, 2, 3, 4, 5, 6} and S2 = {H, T}, so

S1 × S2 = {(1, H), (1, T), (2, H), (2, T), (3, H), (3, T), (4, H), (4, T), (5, H), (5, T), (6, H), (6, T)}

Let A1 = {2, 3} and A2 = {H}. Since the experiments are independent,

P(A1 × A2) = P1(A1) P2(A2) = (1/2) × (2/6) = 1/6

Example 2 In a digital communication system transmitting 1 and 0, a 1 is transmitted twice as often as a 0, so P1({1}) = P2({1}) = 2/3. If two bits are transmitted in a sequence through independent uses of the system, what is the probability that both the bits will be 1?

S1 = {0, 1}, A1 = {1} and S2 = {0, 1}, A2 = {1}
∴ S1 × S2 = {(0, 0), (0, 1), (1, 0), (1, 1)} and
P(A1 × A2) = P1(A1) P2(A2) = (2/3) × (2/3) = 4/9

Generalization

We can similarly define the sample space S = S1 × S2 × ... × Sn corresponding to n experiments, and the Cartesian product of events A1 × A2 × ... × An. Any event in S is of the form

A1 × A2 × ... × An = {(s1, s2, ..., sn) | s1 ∈ A1, s2 ∈ A2, ..., sn ∈ An}

If the experiments are independent, we can write

P(A1 × A2 × ... × An) = P1(A1) P2(A2) ... Pn(An)

where Pi is the probability defined on the events of Si.

Bernoulli trial

Suppose in an experiment we are only concerned whether a particular event A has occurred or not. We call the event A the 'success', with probability P(A) = p, and the complementary event A^c the 'failure', with probability P(A^c) = 1 − p. If A occurs in a trial, then we have a 'success', otherwise a 'failure'. Such a random experiment is called a Bernoulli trial.

Binomial Law: We are interested in finding the probability of k 'successes' in n independent Bernoulli trials. This probability pn(k) is given by

pn(k) = nC_k p^k (1 − p)^(n−k)

Consider n independent repetitions of the Bernoulli trial. Let S1 be the sample space associated with each trial, and suppose we are interested in a particular event A ∈ S1 and its complement A^c such that P(A) = p and P(A^c) = 1 − p. The sample space corresponding to the n repeated trials is S = S1 × S2 × ... × Sn. Any event in S corresponding to exactly k successes is of the form A1 × A2 × ... × An, where k of the Ai's are A and the remaining n − k of the Ai's are A^c. Then

P(A1 × A2 × ... × An) = P(A1) P(A2) ... P(An) = p^k (1 − p)^(n−k)

There are nC_k events in S with k occurrences of A and n − k occurrences of A^c. For example, if n = 4 and k = 2, the possible events are

A × A × A^c × A^c
A × A^c × A × A^c
A × A^c × A^c × A
A^c × A × A × A^c
A^c × A × A^c × A
A^c × A^c × A × A

We also note that all these nC_k events are mutually exclusive. Hence the probability of k successes in n independent repetitions of the Bernoulli trial is given by

pn(k) = nC_k p^k (1 − p)^(n−k)

Example 1: A fair die is rolled 6 times. What is the probability that a '4' appears four times?

Solution: Here S1 = {1, 2, 3, 4, 5, 6},

A = {4} with P(A) = p = 1/6
and A^c = {1, 2, 3, 5, 6} with P(A^c) = 1 − p = 5/6

∴ P6(4) = 6C4 × (1/6)^4 × (5/6)^2 = (6 × 5/2) × (1/6)^4 × (5/6)^2 ≈ 0.008

Example 2: A communication source emits binary symbols 1 and 0 with probability 0.6 and 0.4 respectively. What is the probability that there will be five 1s in a message of 20 symbols?

Solution:

Here S1 = {0, 1}, A = {1} and P(A) = p = 0.6.

∴ P20(5) = 20C5 (0.6)^5 (0.4)^15 ≈ 0.0013

Example 3 In a binary communication system, a bit error occurs with a probability of 10^−5. What is the probability of getting at least one error bit in a message of 8 bits?

Here we can consider each bit transmission as a Bernoulli trial, with

p = P(error in transmission of 1 bit) = 10^−5

∴ Probability of no bit error in transmission of 8 bits = P8(0) = (1 − 10^−5)^8 ≈ 0.9999
∴ Probability of at least one bit error in transmission of 8 bits = 1 − P8(0) ≈ 0.0001

Typical plots of the binomial probabilities are shown in the figure.
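The three examples can be checked with a direct implementation of the binomial law (a minimal sketch; binom_pmf is our own helper name):

```python
# A sketch of the binomial law pn(k) = C(n,k) p^k (1-p)^(n-k), reproducing
# the three examples above.
from math import comb

def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(6, 4, 1/6))        # '4' four times in 6 rolls  ≈ 0.008
print(binom_pmf(20, 5, 0.6))       # five 1s in 20 symbols      ≈ 0.0013
print(1 - binom_pmf(8, 0, 1e-5))   # >= 1 bit error in 8 bits   ≈ 0.00008
```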

Approximations of the binomial probabilities

Two approximations of the binomial probabilities are very important.

Case 1 Suppose n is very large and p is very small, with np = λ a constant. Then

pn(k) = nC_k p^k (1 − p)^(n−k)
= nC_k (λ/n)^k (1 − λ/n)^(n−k)
= [n(n−1)...(n−k+1)/k!] (λ^k/n^k) (1 − λ/n)^n / (1 − λ/n)^k
= (λ^k/k!) (1 − 1/n)(1 − 2/n)...(1 − (k−1)/n) (1 − λ/n)^n / (1 − λ/n)^k

Since lim_{n→∞} (1 − λ/n)^n = e^{−λ} while each of the remaining factors tends to 1,

∴ pn(k) → λ^k e^{−λ} / k!

This distribution is known as the Poisson probability and is widely used in engineering and other fields. We shall discuss more about this distribution in a later class.

Case 2 When n is sufficiently large and np(1 − p) >> 1, pn(k) may be approximated as

pn(k) ≈ (1/√(2π np(1 − p))) e^{−(k − np)²/(2np(1 − p))}

The right-hand side is an expression for the normal distribution, to be discussed in a later class.

Example: Consider the problem in Example 2. Here n = 20 and p = 0.6, so np = 12 and np(1 − p) = 4.8, and

p20(5) ≈ (1/√(2π × 4.8)) e^{−(5 − 12)²/(2 × 4.8)} ≈ 0.0011

which is close to the exact value 0.0013.
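The quality of both approximations can be checked numerically; in the sketch below the Case 1 parameters (n = 1000, p = 0.002, λ = 2) are our own illustrative choices, while Case 2 reuses Example 2:

```python
# A sketch comparing the exact binomial with its Poisson and normal
# approximations for the two cases discussed above.
from math import comb, exp, factorial, pi, sqrt

def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

def normal_approx(n, k, p):
    var = n * p * (1 - p)
    return exp(-(k - n * p) ** 2 / (2 * var)) / sqrt(2 * pi * var)

# Case 1: n large, p small, np = λ (here n = 1000, p = 0.002, λ = 2).
print(binom_pmf(1000, 3, 0.002), poisson_pmf(3, 2.0))   # both ≈ 0.180

# Case 2: Example 2 again (n = 20, p = 0.6, k = 5).
print(binom_pmf(20, 5, 0.6), normal_approx(20, 5, 0.6)) # ≈ 0.0013 vs 0.0011
```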


Random Variables

In applications of probability, we are often concerned with numerical values which are random in nature. These random quantities may be considered as real-valued functions on the sample space. Such a real-valued function is called a real random variable and plays an important role in describing random data. We shall introduce the concept of random variables in the following sections.

Mathematical Preliminaries

Real-valued point function on a set

Recall that a real-valued function f : S → ℝ maps each element s ∈ S to a unique element f(s) ∈ ℝ. The set S is called the domain of f and the set R_f = {f(x) | x ∈ S} is called the range of f. Clearly R_f ⊆ ℝ. The range and domain of f are shown in the figure.

Figure: The domain S of f, with points s1, s2, s3, s4 mapped to their images f(s1), f(s2), f(s3) in the range of f.

Image and inverse image

For a point s ∈ S, the functional value f(s) ∈ ℝ is called the image of the point s. If A ⊆ S, then the set of the images of the elements of A is called the image of A and denoted by f(A). Thus

f(A) = {f(s) | s ∈ A}

Clearly f(A) ⊆ R_f.

Suppose B ⊆ ℝ. The set {x | f(x) ∈ B} is called the inverse image of B under f and is denoted by f^{−1}(B).

Example Suppose S = {H, T} and f : S → ℝ is defined by f(H) = 1 and f(T) = −1. Then
• R_f = {1, −1} ⊆ ℝ.

• The image of H is 1 and that of T is −1.
• For a subset of ℝ, say B1 = (−∞, 1.5], f^{−1}(B1) = {s | f(s) ∈ B1} = {H, T}. For another subset B2 = [5, 9], f^{−1}(B2) = φ.

Random variable

A random variable associates the points in the sample space with real numbers. Consider the probability space (S, F, P) and a function X : S → ℝ mapping the sample space S into the real line.

Figure: A random variable X mapping the event A = X^{−1}(B) ⊆ S to the set B ⊆ ℝ.

Let us define the probability of a subset B ⊆ ℝ by

P_X({B}) = P(X^{−1}(B)) = P({s | X(s) ∈ B})

Such a definition will be valid only if X^{−1}(B) is a valid event. If S is a discrete sample space, X^{−1}(B) is always a valid event, but the same may not be true if S is infinite. The concept of sigma algebra is again necessary to overcome this difficulty. We also need the Borel sigma algebra B, the sigma algebra defined on the real line.

The function X : S → ℝ is called a random variable if the inverse image of every Borel set under X is an event. Thus, if X is a random variable, then

X^{−1}(B) = {s | X(s) ∈ B} ∈ F

Notations:
• Random variables are represented by upper-case letters.
• Values of a random variable are denoted by lower-case letters.
• X(s) = x means that x is the value of the random variable X at the sample point s. Usually the argument s is omitted and we simply write X = x.
• The range of X, denoted by R_X, is given by R_X = {X(s) | s ∈ S}. Clearly R_X ⊆ ℝ.

The above definition of the random variable requires that the mapping X is such that X^{−1}(B) is a valid event in S. If S is a discrete sample space, this requirement is

met by any mapping X : S → ℝ. Thus any mapping defined on a discrete sample space is a random variable.

Example 1: Consider the example of tossing a fair coin twice. The sample space is S = {HH, HT, TH, TT} and all four outcomes are equally likely. We can define a random variable X as follows:

Sample point | Value of the random variable X = x
HH | 0
HT | 1
TH | 2
TT | 3

Here R_X = {0, 1, 2, 3}.

Example 2: Consider the sample space associated with the single toss of a fair die, S = {1, 2, 3, 4, 5, 6}. If we define the random variable X that associates a real number equal to the number on the face of the die, then R_X = {1, 2, 3, 4, 5, 6}.

Probability space induced by a random variable

The random variable X induces a probability measure P_X on B defined by

P_X({B}) = P(X^{−1}(B)) = P({s | X(s) ∈ B})

The probability measure P_X satisfies the three axioms of probability:

Axiom 1: P_X(B) = P(X^{−1}(B)) ≤ 1

Axiom 2: P_X(ℝ) = P(X^{−1}(ℝ)) = P(S) = 1

Axiom 3: Suppose B1, B2, ... are disjoint Borel sets. Then X^{−1}(B1), X^{−1}(B2), ... are distinct events in F. Therefore,

P_X(∪_{i=1}^{∞} Bi) = P(∪_{i=1}^{∞} X^{−1}(Bi)) = Σ_{i=1}^{∞} P(X^{−1}(Bi)) = Σ_{i=1}^{∞} P_X(Bi)

Thus the random variable X induces a probability space (ℝ, B, P_X).

Probability Distribution Function

We have seen that the event B and {s | X(s) ∈ B} are equivalent, and P_X({B}) = P({s | X(s) ∈ B}). The underlying sample space is omitted in notation and we simply write {X ∈ B} and P({X ∈ B}) instead of {s | X(s) ∈ B} and P({s | X(s) ∈ B}) respectively.

Consider the Borel set (−∞, x], where x represents any real number. The equivalent event X^{−1}((−∞, x]) = {s | X(s) ≤ x, s ∈ S} is denoted as {X ≤ x}. The event {X ≤ x} can be taken as a representative event in studying the probability description of a random variable X. Any other event can be represented in terms of this event. For example,

{X > x} = {X ≤ x}^c
{x1 < X ≤ x2} = {X ≤ x2} \ {X ≤ x1}
{X = x} = ∩_{n=1}^{∞} ({X ≤ x} \ {X ≤ x − 1/n})

and so on.

The probability P({X ≤ x}) = P({s | X(s) ≤ x, s ∈ S}) is called the probability distribution function (also called the cumulative distribution function, abbreviated as CDF) of X and denoted by F_X(x). Thus

F_X(x) = P({X ≤ x})

Example 3 Consider the random variable X in Example 1. We have

Value of the random variable X = x | P({X = x})
0 | 1/4
1 | 1/4
2 | 1/4
3 | 1/4

For x < 0, F_X(x) = P({X ≤ x}) = 0.

For 0 ≤ x < 1,
F_X(x) = P({X ≤ x}) = P({X = 0}) = 1/4

For 1 ≤ x < 2,
F_X(x) = P({X = 0} ∪ {X = 1}) = P({X = 0}) + P({X = 1}) = 1/4 + 1/4 = 1/2

For 2 ≤ x < 3,
F_X(x) = P({X = 0} ∪ {X = 1} ∪ {X = 2}) = 1/4 + 1/4 + 1/4 = 3/4

For x ≥ 3,
F_X(x) = P({X ≤ x}) = P(S) = 1

Properties of the Distribution Function

• 0 ≤ F_X(x) ≤ 1. This follows from the fact that F_X(x) is a probability and its value should lie between 0 and 1.

• F_X(x) is a non-decreasing function of x. Thus, if x1 < x2, then F_X(x1) ≤ F_X(x2):

x1 < x2 ⇒ {X(s) ≤ x1} ⊆ {X(s) ≤ x2}
⇒ P{X(s) ≤ x1} ≤ P{X(s) ≤ x2}
∴ F_X(x1) ≤ F_X(x2)

• F_X(x) is right-continuous. (Recall that a real function f(x) is continuous at a point a if and only if f(a) is defined and lim_{x→a+} f(x) = lim_{x→a−} f(x) = f(a); it is right-continuous at a if and only if f(a) is defined and lim_{x→a+} f(x) = f(a).) Here

F_X(x+) = lim_{h→0, h>0} F_X(x + h) = lim_{h→0, h>0} P{X(s) ≤ x + h} = P{X(s) ≤ x} = F_X(x)

• F_X(−∞) = 0, because F_X(−∞) = P{s | X(s) ≤ −∞} = P(φ) = 0.

• F_X(∞) = 1, because F_X(∞) = P{s | X(s) ≤ ∞} = P(S) = 1.

• P({x1 < X ≤ x2}) = F_X(x2) − F_X(x1). We have

{X ≤ x2} = {X ≤ x1} ∪ {x1 < X ≤ x2}
∴ P({X ≤ x2}) = P({X ≤ x1}) + P({x1 < X ≤ x2})
⇒ P({x1 < X ≤ x2}) = F_X(x2) − F_X(x1)

• F_X(x−) = F_X(x) − P(X = x), where

F_X(x−) = lim_{h→0, h>0} F_X(x − h) = lim_{h→0, h>0} P{X(s) ≤ x − h} = P{X(s) < x} = F_X(x) − P(X = x)

We can further establish the following results on probabilities of events on the real line:

P({x1 ≤ X ≤ x2}) = F_X(x2) − F_X(x1) + P(X = x1)
P({x1 ≤ X < x2}) = F_X(x2) − F_X(x1) + P(X = x1) − P(X = x2)
P({X > x}) = P({x < X < ∞}) = 1 − F_X(x)

Thus, given F_X(x), we can determine the probability of any event involving values of the random variable X. F_X(x), for all x ∈ ℝ, is therefore a complete description of the random variable X.

Example 4 Consider the random variable X defined by

F_X(x) = 0, −∞ < x < −2
= (1/8)x + (1/4), −2 ≤ x < 0
= 1, x ≥ 0

Find a) P(X = 0), b) P{X ≤ 0}, c) P{X > 2}, d) P{−1 < X ≤ 1}.

Solution:
a) P(X = 0) = F_X(0+) − F_X(0−) = 1 − 1/4 = 3/4
b) P{X ≤ 0} = F_X(0) = 1
c) P{X > 2} = 1 − F_X(2) = 1 − 1 = 0
d) P{−1 < X ≤ 1} = F_X(1) − F_X(−1) = 1 − 1/8 = 7/8

Figure: Plot of F_X(x) for Example 4.
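A short sketch evaluating this piecewise CDF numerically (the function F_X below is our own transcription of Example 4; the jump at 0 is probed with a small offset):

```python
# A sketch of Example 4: evaluate the piecewise CDF and the four requested
# probabilities directly from the properties just listed.
def F_X(x):
    if x < -2:
        return 0.0
    if x < 0:
        return x / 8 + 1 / 4
    return 1.0

p_x_eq_0 = F_X(0) - F_X(-1e-12)   # jump at 0: F(0+) - F(0-) = 3/4
print(p_x_eq_0)                   # ≈ 0.75
print(F_X(0))                     # P(X <= 0)      = 1
print(1 - F_X(2))                 # P(X > 2)       = 0
print(F_X(1) - F_X(-1))           # P(-1 < X <= 1) = 7/8
```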

Conditional Distribution and Density Functions

We discussed conditional probability in an earlier class. For two events A and B with P(B) ≠ 0, the conditional probability P(A/B) was defined as

P(A/B) = P(A ∩ B)/P(B)

Clearly, the conditional probability can be defined on events involving a random variable X. Consider the event {X ≤ x} and any event B involving the random variable X. The conditional distribution function of X given B is defined as

F_X(x/B) = P[{X ≤ x}/B] = P[{X ≤ x} ∩ B]/P(B),  P(B) ≠ 0

We can verify that F_X(x/B) satisfies all the properties of the distribution function. In a similar manner, we can define the conditional density function f_X(x/B) of the random variable X given the event B as

f_X(x/B) = (d/dx) F_X(x/B)

Example 1: Suppose X is a random variable with the distribution function F_X(x). Define B = {X ≤ b}. Then

F_X(x/B) = P[{X ≤ x} ∩ B]/P(B) = P[{X ≤ x} ∩ {X ≤ b}]/P{X ≤ b} = P[{X ≤ x} ∩ {X ≤ b}]/F_X(b)

Case 1: x < b. Then

F_X(x/B) = P[{X ≤ x} ∩ {X ≤ b}]/F_X(b) = P{X ≤ x}/F_X(b) = F_X(x)/F_X(b)

and f_X(x/B) = (d/dx) F_X(x/B) = f_X(x)/F_X(b)

Case 2: x ≥ b. Then

F_X(x/B) = P[{X ≤ x} ∩ {X ≤ b}]/F_X(b) = P{X ≤ b}/F_X(b) = F_X(b)/F_X(b) = 1

and f_X(x/B) = (d/dx) F_X(x/B) = 0

F_X(x/B) and f_X(x/B) are plotted in the following figures.

Figure: Plots of F_X(x/B) and f_X(x/B) for B = {X ≤ b}.

Remark: We can state a Bayes-like rule in a similar manner. Suppose the events B1, B2, ..., Bn partition the sample space. Then

F_X(x) = Σ_{i=1}^{n} P(Bi) F_X(x/Bi)

∴ P(Bj/{X ≤ x}) = P(Bj) F_X(x/Bj) / F_X(x) = P(Bj) F_X(x/Bj) / Σ_{i=1}^{n} P(Bi) F_X(x/Bi)

Mixed-Type Random Variable

A random variable X is said to be of mixed type if its distribution function F_X(x) has jump discontinuities at a countable number of points and increases strictly with respect to x over at least one interval of values of the random variable X. Thus, for a mixed-type random variable X, F_X(x) has discontinuities but is not of the staircase type as in the case of a discrete random variable. A typical plot of the distribution function of a mixed-type random variable is shown in the figure.

Suppose S_D denotes the countable subset of points of S_X on which the random variable X is characterized by the probability mass function p_X(x), x ∈ S_D. Similarly, let S_C be a continuous subset of points of S_X on which the random variable is characterized by the probability density function f_X(x), x ∈ S_C. Clearly the subsets S_D and S_C partition the set S_X. If P(S_D) = p, then P(S_C) = 1 − p.

Thus the probability of the event {X ≤ x} can be expressed as

P{X ≤ x} = P(S_D) P({X ≤ x} | S_D) + P(S_C) P({X ≤ x} | S_C)
= p F_D(x) + (1 − p) F_C(x)

where F_D(x) is the conditional distribution function of X given that X is discrete and F_C(x) is the conditional distribution function given that X is continuous. Clearly F_D(x) is a staircase function and F_C(x) is a continuous function.

Also note that p = P(S_D) = Σ_{x ∈ S_D} p_X(x).

Example: Consider the distribution function of a random variable X given by

F_X(x) = 0, x < 0
= 1/4 + x/16, 0 ≤ x < 4
= 3/4 + (x − 4)/16, 4 ≤ x ≤ 8
= 1, x > 8

Express F_X(x) as the distribution function of a mixed-type random variable.

Solution: The distribution function F_X(x) is as shown in the figure. Clearly F_X(x) has jumps at x = 0 and x = 4, each of size 1/4.

∴ p = p_X(0) + p_X(4) = 1/4 + 1/4 = 1/2

and

F_D(x) = [p_X(0) u(x) + p_X(4) u(x − 4)]/p = (1/2) u(x) + (1/2) u(x − 4)

F_C(x) = 0, −∞ < x < 0
= x/8, 0 ≤ x ≤ 8
= 1, x > 8

Example 2: X is the random variable representing the lifetime of a device, with pdf f_X(x) for x > 0. Define the following random variable:

Y = X if X ≤ a
= a if X > a

Then S_D = {a}, S_C = (0, a) and

p = P{Y ∈ S_D} = P{X > a} = 1 − F_X(a)

∴ F_Y(y) = p F_D(y) + (1 − p) F_C(y)

Discrete, Continuous and Mixed-type Random Variables

• A random variable X is called a discrete random variable if F_X(x) is piecewise constant. Thus F_X(x) is flat except at the points of jump discontinuity. If the sample space S is discrete, the random variable X defined on it is always discrete.

• X is called a continuous random variable if F_X(x) is an absolutely continuous function of x. Thus F_X(x) is continuous everywhere on ℝ and F_X′(x) exists everywhere except at finitely or countably infinitely many points.

• X is called a mixed random variable if F_X(x) has jump discontinuities at a countable number of points and increases continuously over at least one interval of values of x. For such a random variable X,

F_X(x) = p F_Xd(x) + (1 − p) F_Xc(x)

where F_Xd(x) is the distribution function of a discrete random variable and F_Xc(x) is the distribution function of a continuous random variable.

Typical plots of F_X(x) for discrete, continuous and mixed-type random variables are shown in the figures below.

Figure: Plot of F_X(x) vs x for a discrete random variable (staircase).
Figure: Plot of F_X(x) vs x for a continuous random variable.
Figure: Plot of F_X(x) vs x for a mixed-type random variable.

Discrete Random Variables and Probability Mass Functions

A random variable is said to be discrete if the number of elements in the range R_X is finite or countably infinite. Examples 1 and 2 above are discrete random variables. Assume R_X to be countably finite, and let x1, x2, ..., xN be the elements in R_X. Here the mapping X(s) partitions S into N subsets {s | X(s) = xi}, i = 1, 2, ..., N.

The discrete random variable in this case is completely specified by the probability mass function (pmf)

p_X(xi) = P(s | X(s) = xi), i = 1, 2, ..., N

Clearly,
• p_X(xi) ≥ 0 for all xi ∈ R_X, and Σ_{xi ∈ R_X} p_X(xi) = 1
• Suppose D ⊆ R_X. Then

P({x ∈ D}) = Σ_{xi ∈ D} p_X(xi)

Figure: A discrete random variable mapping sample points s1, s2, s3, s4 to the values X(s1), X(s2), X(s3), X(s4).

Example Consider the random variable X with the distribution function

F_X(x) = 0, x < 0
= 1/4, 0 ≤ x < 1
= 1/2, 1 ≤ x < 2
= 1, x ≥ 2

The plot of F_X(x) is shown in the figure.

Continous Random Variables and Probability Density Functions For a continuous random variable X . FX (x) is continuous everywhere.FX ( x) 1 1 2 1 4 0 1 2 x The probability mass function of the random variable is given by Value of the random Variable X = x 0 1 2 p X ( x) 1 4 1 4 1 2 We shall describe about some useful discrete probability mass functions in a later class. This implies that p X ( x) = P({ X = x}) = FX ( x) − FX ( x − ) = 0 . Therefore. FX ( x) = FX ( x − ) ∀x ∈ .

Therefore, the probability mass function of a continuous random variable X is zero for all x; a continuous random variable cannot be characterized by the probability mass function. Instead it has a very important characterization in terms of a function called the probability density function.

If F_X(x) is differentiable, the probability density function (pdf) of X, denoted by f_X(x), is defined as

f_X(x) = (d/dx) F_X(x)

Interpretation of f_X(x):

f_X(x) = (d/dx) F_X(x) = lim_{∆x→0} [F_X(x + ∆x) − F_X(x)]/∆x = lim_{∆x→0} P({x < X ≤ x + ∆x})/∆x

so that P({x < X ≤ x + ∆x}) ≈ f_X(x)∆x. Thus the probability of X lying in some interval (x, x + ∆x] is determined by f_X(x). In that sense, f_X(x) represents the concentration of probability just as density represents the concentration of mass.

Properties of the Probability Density Function

• f_X(x) ≥ 0. This follows from the fact that F_X(x) is a non-decreasing function.

• F_X(x) = ∫_{−∞}^{x} f_X(u) du

• ∫_{−∞}^{∞} f_X(x) dx = 1

The distribution function FX ( x) can be written as FX ( x) = ∑ p X ( xi )u ( x − xi ) i =1 N where u ( x − xi ) shifted unit-step function given by ⎧1 for x ≥ xi u ( x − xi ) = ⎨ ⎩0 otherwise Then the density function f X ( x) can be written in terms of the Dirac delta function as . 2...• P ( x1 < X ≤ x 2 ) = f X ( x) x2 − x1 ∫f X ( x)dx x0 x0 + ∆x0 x Fig. a > 0 x ≥ 0 The pdf of the RV is given by ⎧0 f X ( x) = ⎨ − ax ⎩e . a > 0 x<0 x≥0 Remark: Using the Dirac delta function we can define the density function for a discrete random variables.. i = 1.. N .. P ({x0 < X ≤ x0 + ∆x0 }) f X ( x0 )∆x0 Example Consider the random variable X with the distribution function x<0 ⎧0 FX ( x) = ⎨ − ax ⎩1 − e . Consider the random variable X defined by the probability mass function (pmf) p X ( xi ) = P( s | X ( s ) = xi ).

The distribution function F_X(x) can be written as

F_X(x) = Σ_{i=1}^{N} p_X(xi) u(x − xi)

where u(x − xi) is the shifted unit-step function given by

u(x − xi) = 1 for x ≥ xi, 0 otherwise

Then the density function f_X(x) can be written in terms of the Dirac delta function as

f_X(x) = Σ_{i=1}^{N} p_X(xi) δ(x − xi)

Example Consider the discrete random variable defined in the example above. Its distribution function can be written as

F_X(x) = (1/4) u(x) + (1/4) u(x − 1) + (1/2) u(x − 2)

and

f_X(x) = (1/4) δ(x) + (1/4) δ(x − 1) + (1/2) δ(x − 2)

Probability Density Function of a Mixed-type Random Variable

Suppose X is a mixed-type random variable with F_X(x) having jump discontinuities at X = xi, i = 1, 2, ..., n. As already stated, the CDF of a mixed-type random variable X is given by

F_X(x) = p F_Xd(x) + (1 − p) F_Xc(x)

where F_Xd(x) is the distribution function of a discrete random variable and F_Xc(x) is the distribution function of a continuous random variable. Therefore,

f_X(x) = p f_Xd(x) + (1 − p) f_Xc(x)

where f_Xd(x) = Σ_{i=1}^{n} p_X(xi) δ(x − xi)

Example Consider the random variable X with the distribution function

F_X(x) = 0, x < 0
= 0.1, x = 0
= 0.1 + 0.8x, 0 < x < 1
= 1, x > 1

Figure: Plot of F_X(x), with jumps of 0.1 at x = 0 and x = 1.

F_X(x) can be expressed as

F_X(x) = 0.2 F_Xd(x) + 0.8 F_Xc(x)

where

F_Xd(x) = 0, x < 0
= 0.5, 0 ≤ x < 1
= 1, x ≥ 1

and

F_Xc(x) = 0, x < 0
= x, 0 ≤ x ≤ 1
= 1, x > 1

The pdf is given by

f_X(x) = 0.2 f_Xd(x) + 0.8 f_Xc(x)

where

f_Xd(x) = 0.5 δ(x) + 0.5 δ(x − 1)

and

f_Xc(x) = 1, 0 ≤ x ≤ 1
= 0, elsewhere

Figure: Plot of f_X(x).

Functions of Random Variables

Often we have to consider random variables which are functions of other random variables. For example, suppose X represents the random voltage input to a full-wave rectifier. Then the rectifier output Y is given by Y = |X|.

Let X be a random variable and g(·) a function. Then Y = g(X) is a random variable, and we are interested in finding the pdf of Y, i.e. the probability description of the random variable Y. We consider the following cases:

(a) X is a discrete random variable with probability mass function p_X(x).

The probability mass function of Y is given by

p_Y(y) = P({Y = y}) = P({x | g(x) = y}) = Σ_{x | g(x) = y} P({X = x}) = Σ_{x | g(x) = y} p_X(x)

(b) X is a continuous random variable with probability density function f_X(x), and y = g(x) is one-to-one and monotonically increasing.

The probability distribution function of Y is given by

F_Y(y) = P{Y ≤ y} = P{g(X) ≤ y} = P{X ≤ g^{−1}(y)} = F_X(x)]_{x = g^{−1}(y)}

f_Y(y) = (d/dy) F_Y(y)
= (d/dy) F_X(x)]_{x = g^{−1}(y)}
= [(dF_X(x)/dx)(dx/dy)]_{x = g^{−1}(y)}
= [f_X(x)/g′(x)]_{x = g^{−1}(y)}

∴ f_Y(y) = f_X(x)/(dy/dx)]_{x = g^{−1}(y)}

This is illustrated in the figure.

Figure: A monotonically increasing function y = g(x); the probability mass in (x, x + dx] maps to (y, y + dy].

Example 1: Probability density function of a linear function of a random variable

Suppose Y = aX + b, a > 0. Then x = (y − b)/a and dy/dx = a.

∴ f_Y(y) = f_X(x)/(dy/dx) = f_X((y − b)/a)/a

Example 2: Probability density function of the distribution function of a random variable

Suppose the distribution function F_X(x) of a continuous random variable X is monotonically increasing and one-to-one, and define the random variable Y = F_X(X). Then f_Y(y) = 1, 0 ≤ y ≤ 1.

Here y = F_X(x), so clearly 0 ≤ y ≤ 1, and

dy/dx = (d/dx) F_X(x) = f_X(x)
∴ f_Y(y) = f_X(x)/(dy/dx) = f_X(x)/f_X(x) = 1

∴ f_Y(y) = 1, 0 ≤ y ≤ 1

Remark
(1) The distribution given by f_Y(y) = 1, 0 ≤ y ≤ 1, is called the uniform distribution over the interval [0, 1].
(2) The above result is particularly important in simulating a random variable with a particular distribution function. We assumed F_X(x) to be one-to-one for invertibility, but the result is more general: the random variable defined by the distribution function of any random variable is uniformly distributed over [0, 1]. For example, if X is a discrete random variable,

F_Y(y) = P(Y ≤ y) = P(F_X(X) ≤ y) = P(X ≤ F_X^{−1}(y)) = F_X(F_X^{−1}(y)) = y

(assigning F_X^{−1}(y) to the left-most point of the interval for which F_X(x) = y)

∴ f_Y(y) = (d/dy) F_Y(y) = 1, 0 ≤ y ≤ 1

Figure: The mapping Y = F_X(X), showing y and x = F_X^{−1}(y).

(c) X is a continuous random variable with probability density function f_X(x), and y = g(x) has multiple solutions for x.

Suppose for y ∈ R_Y the equation y = g(x) has solutions xi, i = 1, 2, ..., n. Then

Proof: Consider the plot of y = g(x). Suppose, at a point y, the equation y = g(x) has three distinct roots x1, x2, x3, as shown in the figure. Consider the event {y < Y ≤ y + dy}. This event is equivalent to the union of the events

{x1 < X ≤ x1 + dx1}, {x2 − dx2 < X ≤ x2} and {x3 < X ≤ x3 + dx3}

∴ P{y < Y ≤ y + dy} = P{x1 < X ≤ x1 + dx1} + P{x2 − dx2 < X ≤ x2} + P{x3 < X ≤ x3 + dx3}

∴ f_Y(y) dy = f_X(x1) dx1 + f_X(x2)(−dx2) + f_X(x3) dx3

where the negative sign in −dx2 is used to account for a positive probability (the slope dy/dx is negative there). Dividing by dy and taking the limit, we get

f_Y(y) = f_X(x1)(dx1/dy) + f_X(x2)(−dx2/dy) + f_X(x3)(dx3/dy)
= Σ_{i=1}^{3} f_X(xi)/|dy/dx|_{x = xi}

In the above we assumed y = g(x) to have three roots. In general, if y = g(x) has n roots, then

f_Y(y) = Σ_{i=1}^{n} f_X(xi)/|dy/dx|_{x = xi}

Example 3: Probability density function of a linear function of a random variable

Suppose Y = aX + b, a ≠ 0. Then x = (y − b)/a and dy/dx = a,

∴ f_Y(y) = f_X((y − b)/a)/|a|

Example 4: Probability density function of the output of a full-wave rectifier

Suppose Y = |X|, −a ≤ X ≤ a, a > 0.

C ≥ 0 ∴ y = cx 2 => x = ± y c y≥0 And dy dy = 2cx so that = 2c y / c = 2 cy dx dx ∴ fY ( y ) = fX ( y / c + fX 2 cy ) ( −y /c ) y>0 = 0 otherwise . dx ∴ fY ( y ) = 1 1 = f X ( x) + f X (− x) + Example 5: Probability density function of the output a square-law device Y = CX 2 .Y y −y y X y = x has two solutions x1 = y and x2 = − y and f X ( x) ]x = y f X ( x)]x =− y dy = 1 at each solution point.

Expected Value of a Random Variable

• The expectation operation extracts a few parameters of a random variable and provides a summary description of the random variable in terms of these parameters.
• It is far easier to estimate these parameters from data than to estimate the distribution or density function of the random variable.

Expected value or mean of a random variable

The expected value of a random variable X is defined by

EX = ∫_{−∞}^{∞} x f_X(x) dx

provided ∫_{−∞}^{∞} x f_X(x) dx exists. EX is also called the mean or statistical average of the random variable X and is denoted by µ_X.

Note that, for a discrete random variable X defined by the probability mass function (pmf) p_X(xi), i = 1, 2, ..., N, the pdf f_X(x) is given by

f_X(x) = Σ_{i=1}^{N} p_X(xi) δ(x − xi)

∴ µ_X = EX = ∫_{−∞}^{∞} x Σ_{i=1}^{N} p_X(xi) δ(x − xi) dx
= Σ_{i=1}^{N} p_X(xi) ∫_{−∞}^{∞} x δ(x − xi) dx
= Σ_{i=1}^{N} xi p_X(xi)

Thus, for a discrete random variable X with pmf p_X(xi), i = 1, 2, ..., N,

µ_X = Σ_{i=1}^{N} xi p_X(xi)

Interpretation of the mean

• The mean gives an idea about the average value of the random variable. The values of the random variable are spread about this value.
• Observe that

µ_X = ∫_{−∞}^{∞} x f_X(x) dx = ∫_{−∞}^{∞} x f_X(x) dx / ∫_{−∞}^{∞} f_X(x) dx   (∵ ∫_{−∞}^{∞} f_X(x) dx = 1)

Therefore, the mean can also be interpreted as the centre of gravity of the pdf curve.

. Thus the mean of a RV −∞ ∞ with an even symmetric pdf is 0.Fig. then ∫ xf X ( x)dx = 0. Mean of a random variable Example 1 Suppose X is a random variable defined by the pdf ⎧ 1 ⎪ f X ( x) = ⎨ b − a ⎪ 0 ⎩ Then EX = ∫ xf X ( x)dx −∞ ∞ a≤ x≤b otherwise 1 dx a b−a a+b = 2 =∫x b Example 2 Consider the random variable X with pmf as tabulated below Value of the random variable x p X ( x) 0 1 8 1 1 8 2 1 4 3 1 2 ∴ µ X = ∑ xi p X ( xi ) i =1 N =0 × = 17 8 1 1 1 1 + 1× + 2 × + 3 × 8 8 4 2 Remark If f X ( x) is an even function of x.

Expected value of a function of a random variable

Suppose Y = g(X) is a function of a random variable X as discussed in the last class. Then

EY = Eg(X) = ∫_{−∞}^{∞} g(x) f_X(x) dx

We shall illustrate the theorem in the special case when y = g(x) is a one-to-one and monotonically increasing function of x. In this case,

f_Y(y) = f_X(x)/g′(x)]_{x = g^{−1}(y)}

EY = ∫_{−∞}^{∞} y f_Y(y) dy = ∫_{y1}^{y2} y f_X(g^{−1}(y))/g′(g^{−1}(y)) dy

where y1 = g(−∞) and y2 = g(∞). Substituting x = g^{−1}(y), so that y = g(x) and dy = g′(x) dx, we get

EY = ∫_{−∞}^{∞} g(x) f_X(x) dx

The following important properties of the expectation operation can be immediately derived:

(a) If c is a constant, Ec = c. Clearly

Ec = ∫_{−∞}^{∞} c f_X(x) dx = c ∫_{−∞}^{∞} f_X(x) dx = c

(b) If g1(X) and g2(X) are two functions of the random variable X, and c1 and c2 are constants, then

E[c1 g1(X) + c2 g2(X)] = c1 Eg1(X) + c2 Eg2(X)

since

E[c1 g1(X) + c2 g2(X)] = ∫_{−∞}^{∞} [c1 g1(x) + c2 g2(x)] f_X(x) dx
= c1 ∫_{−∞}^{∞} g1(x) f_X(x) dx + c2 ∫_{−∞}^{∞} g2(x) f_X(x) dx
= c1 Eg1(X) + c2 Eg2(X)

. N .The above property means that E is a linear operator. 2 σ X = E ( X − µ X )2 = ∫ (x − a b a+b 2 1 ) dx 2 b−a 2 1 b 2 a+bb ⎛a+b⎞ b = [ ∫ x dx − 2 × ∫ xdx + ⎜ ⎟ ∫ dx 2 a b−a a ⎝ 2 ⎠ a = (b − a) 2 12 Example 4 Find the variance of the random variable discussed in Example 2.. the variance of X is 2 denoted by σ X and defined as 2 σ X = E ( X − µ X )2 = ∫ ( x − µ X ) 2 f X ( x)dx −∞ ∞ Thus for a discrete random variable X with 2 σ X = ∑ ( xi − µ X ) 2 p X ( xi ) i =1 N p X ( xi )... 2.. Mean-square value EX 2 = ∫ x 2 f X ( x)dx −∞ ∞ Variance For a random variable X with the pdf f X ( x) and men µ X . As already computed 17 µX = 8 2 σ X = E ( X − µ X )2 = (0 − = 117 128 17 2 1 17 1 17 1 17 1 ) × + (1 − ) 2 × + (2 − ) 2 × + (3 − ) 2 × 8 8 8 8 8 4 8 2 . The standard deviation of X is defined as σ X = E ( X − µ X )2 Example 3 Find the variance of the random variable discussed in Example 1. i = 1.

Remark

• Variance is a central moment and a measure of dispersion of the random variable about the mean.
• E(X - \mu_X)^2 is the average of the squared deviation from the mean. It gives information about the deviation of the values of the RV about the mean. A smaller \sigma_X^2 implies that the random values are more clustered about the mean; similarly, a bigger \sigma_X^2 means that the random values are more scattered.

For example, consider two random variables X_1 and X_2 with the pmfs shown below. Each of X_1 and X_2 has zero mean, while \sigma_{X_1}^2 = \frac{1}{2} and \sigma_{X_2}^2 = \frac{5}{3}, implying that X_2 has more spread about the mean.

    x            -1    0    1
    p_{X_1}(x)   1/4  1/2  1/4

    x            -2   -1    0    1    2
    p_{X_2}(x)   1/6  1/6  1/3  1/6  1/6

Fig. shows the pdfs of two continuous random variables with the same mean but different variances.

• We could have used the mean absolute deviation E|X - \mu_X| for the same purpose, but it is more difficult to handle, both analytically and numerically.

Properties of variance

(1) \sigma_X^2 = EX^2 - \mu_X^2.

Indeed,

    \sigma_X^2 = E(X - \mu_X)^2 = E(X^2 - 2\mu_X X + \mu_X^2) = EX^2 - 2\mu_X EX + \mu_X^2 = EX^2 - \mu_X^2.

(2) If Y = cX + b, where c and b are constants, then \sigma_Y^2 = c^2 \sigma_X^2, because

    \sigma_Y^2 = E(cX + b - c\mu_X - b)^2 = E\,c^2 (X - \mu_X)^2 = c^2 \sigma_X^2.

(3) If c is a constant, var(c) = 0.

nth moment of a random variable

We can define the nth moment and the nth central moment of a random variable X by the following relations:

nth-order moment:           EX^n = \int_{-\infty}^{\infty} x^n f_X(x)\,dx,   n = 1, 2, \ldots
nth-order central moment:   E(X - \mu_X)^n = \int_{-\infty}^{\infty} (x - \mu_X)^n f_X(x)\,dx,   n = 1, 2, \ldots

Note that

• The mean \mu_X = EX is the first moment and the mean-square value EX^2 is the second moment.
• The first central moment is 0 and the variance \sigma_X^2 = E(X - \mu_X)^2 is the second central moment.
• The third central moment measures lack of symmetry of the pdf of a random variable; E(X - \mu_X)^3 / \sigma_X^3 is called the coefficient of skewness. If the pdf is symmetric, this coefficient will be zero.
• The fourth central moment measures the flatness or peakedness of the pdf of a random variable; E(X - \mu_X)^4 / \sigma_X^4 is called kurtosis. If the peak of the pdf is sharper, then the random variable has a higher kurtosis.

Inequalities based on expectations

The mean and variance also give some quantitative information about the bounds of RVs. The following inequalities are extremely useful in many practical problems.

Chebyshev Inequality

Suppose X is a parameter of a manufactured item with known mean \mu_X and variance \sigma_X^2. The quality control department rejects the item if the absolute deviation of X from \mu_X is greater than 2\sigma_X. What fraction of the manufactured items does the quality control department reject? Can you roughly guess it?

The standard deviation gives us an intuitive idea of how the random variable is distributed about the mean. This idea is expressed more precisely in the remarkable Chebyshev inequality stated below. For a random variable X with mean \mu_X and variance \sigma_X^2,

    P\{|X - \mu_X| \ge \varepsilon\} \le \frac{\sigma_X^2}{\varepsilon^2}.

Proof:

    \sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx
              \ge \int_{|x - \mu_X| \ge \varepsilon} (x - \mu_X)^2 f_X(x)\,dx
              \ge \int_{|x - \mu_X| \ge \varepsilon} \varepsilon^2 f_X(x)\,dx = \varepsilon^2 P\{|X - \mu_X| \ge \varepsilon\}.

    \therefore P\{|X - \mu_X| \ge \varepsilon\} \le \frac{\sigma_X^2}{\varepsilon^2}.

Markov Inequality

For a random variable X which takes only nonnegative values,

    P\{X \ge a\} \le \frac{E(X)}{a},   where a > 0.

Proof:

    E(X) = \int_0^{\infty} x f_X(x)\,dx \ge \int_a^{\infty} x f_X(x)\,dx \ge \int_a^{\infty} a f_X(x)\,dx = a\,P\{X \ge a\}.

    \therefore P\{X \ge a\} \le \frac{E(X)}{a}.

Remark: Applying the Markov inequality to the nonnegative RV (X - k)^2 gives

    P\{(X - k)^2 \ge a\} \le \frac{E(X - k)^2}{a}.

Example A nonnegative RV X has the mean \mu_X = 1. Find an upper bound on the probability P\{X \ge 3\}. By Markov's inequality,

    P\{X \ge 3\} \le \frac{E(X)}{3} = \frac{1}{3}.

Hence the required upper bound is 1/3.
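The bounds above are easy to test empirically. The sketch below (a minimal illustration; the choice of an exponential RV with mean 1 is an assumption made only for the demo) compares simulated tail probabilities against the Markov and Chebyshev bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)  # nonnegative RV with E[X] = 1

# Markov: P(X >= 3) <= E[X]/3 = 1/3
print(np.mean(x >= 3), "<=", 1 / 3)

# Chebyshev: P(|X - mu| >= 2*sigma) <= 1/4 (here mu = sigma = 1)
print(np.mean(np.abs(x - 1) >= 2), "<=", 0.25)
```

Both empirical probabilities come out near e^{-3} \approx 0.05, well inside the stated bounds, which illustrates that Markov and Chebyshev are loose but universal.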

Characteristic Functions of Random Variables

Just as discrete-time and continuous-time signals have frequency-domain characterisations, the probability mass function and the probability density function can also be characterised in the frequency domain by means of the characteristic function of a random variable. These functions are particularly important in

• calculating the moments of a random variable,
• evaluating the PDF of combinations of multiple RVs.

Characteristic function

Consider a random variable X with probability density function f_X(x). The characteristic function of X, denoted by \phi_X(\omega), is defined as

    \phi_X(\omega) = E\,e^{j\omega X} = \int_{-\infty}^{\infty} e^{j\omega x} f_X(x)\,dx,

where j = \sqrt{-1}. Note the following:

• \phi_X(\omega) is a complex quantity, representing the Fourier transform of f_X(x), with e^{j\omega X} traditionally used instead of e^{-j\omega X}. This implies that the properties of the Fourier transform apply to the characteristic function. The interpretation of \phi_X(\omega) as the expectation of e^{j\omega X} helps in calculating moments with the help of the characteristic function.
• Since f_X(x) is always nonnegative and \int_{-\infty}^{\infty} f_X(x)\,dx = 1, \phi_X(\omega) always exists. [Recall that the Fourier transform of a function f(t) exists if \int_{-\infty}^{\infty} |f(t)|\,dt < \infty, i.e. if f(t) is absolutely integrable.]

We can get f_X(x) from \phi_X(\omega) by the inverse transform

    f_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi_X(\omega)\, e^{-j\omega x}\,d\omega.

Example 1 Consider the random variable X with pdf

    f_X(x) = \frac{1}{b-a},   a \le x \le b,

and f_X(x) = 0 otherwise. The characteristic function is given by

    \phi_X(\omega) = \int_a^b \frac{e^{j\omega x}}{b-a}\,dx = \frac{1}{b-a}\left.\frac{e^{j\omega x}}{j\omega}\right|_a^b = \frac{e^{j\omega b} - e^{j\omega a}}{j\omega(b-a)}.

Example 2 The characteristic function of the random variable X with f_X(x) = \lambda e^{-\lambda x}, \lambda > 0, x > 0, is

    \phi_X(\omega) = \int_0^{\infty} e^{j\omega x}\,\lambda e^{-\lambda x}\,dx = \lambda \int_0^{\infty} e^{-(\lambda - j\omega)x}\,dx = \frac{\lambda}{\lambda - j\omega}.

Characteristic function of a discrete random variable

Suppose X is a random variable taking values from the discrete set R_X = \{x_1, x_2, \ldots\} with probability mass function p_X(x_i) for the value x_i. Then

    \phi_X(\omega) = E\,e^{j\omega X} = \sum_{x_i \in R_X} p_X(x_i)\, e^{j\omega x_i}.

Note that \phi_X(\omega) can be interpreted as the discrete-time Fourier transform, with e^{j\omega x_i} substituting e^{-j\omega x_i} in the usual discrete-time Fourier transform.

The inverse relation is

    p_X(x_i) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \phi_X(\omega)\, e^{-j\omega x_i}\,d\omega.

Example 3 Suppose X is a random variable with the probability mass function

    p_X(k) = {}^{n}C_k\, p^k (1-p)^{n-k},   k = 0, 1, \ldots, n.

Then

    \phi_X(\omega) = \sum_{k=0}^{n} {}^{n}C_k\, p^k (1-p)^{n-k} e^{j\omega k}
                  = \sum_{k=0}^{n} {}^{n}C_k\, (p e^{j\omega})^k (1-p)^{n-k}
                  = \left[p e^{j\omega} + (1-p)\right]^n   (using the binomial theorem).

Example 4 The characteristic function of the discrete random variable X with p_X(k) = p(1-p)^k, k = 0, 1, 2, \ldots, is

    \phi_X(\omega) = \sum_{k=0}^{\infty} e^{j\omega k}\, p(1-p)^k = p \sum_{k=0}^{\infty} \left[(1-p)e^{j\omega}\right]^k = \frac{p}{1 - (1-p)e^{j\omega}}.

Moments and the characteristic function

Given the characteristic function \phi_X(\omega), the kth moment is given by

    EX^k = \frac{1}{j^k}\left.\frac{d^k \phi_X(\omega)}{d\omega^k}\right|_{\omega = 0}.

To prove this, consider the power series expansion of e^{j\omega X}:

    e^{j\omega X} = 1 + j\omega X + \frac{(j\omega)^2 X^2}{2!} + \cdots + \frac{(j\omega)^n X^n}{n!} + \cdots

Taking the expectation of both sides and assuming EX, EX^2, \ldots, EX^n, \ldots to exist, we get

    \phi_X(\omega) = 1 + j\omega\,EX + \frac{(j\omega)^2 EX^2}{2!} + \cdots + \frac{(j\omega)^n EX^n}{n!} + \cdots

Taking the first derivative of \phi_X(\omega) with respect to \omega at \omega = 0, we get

    \left.\frac{d\phi_X(\omega)}{d\omega}\right|_{\omega = 0} = j\,EX.

Similarly, taking the nth derivative of \phi_X(\omega) with respect to \omega at \omega = 0, we get

    \left.\frac{d^n \phi_X(\omega)}{d\omega^n}\right|_{\omega = 0} = j^n\,EX^n.

Thus

    EX = \frac{1}{j}\left.\frac{d\phi_X(\omega)}{d\omega}\right|_{\omega = 0}   and   EX^n = \frac{1}{j^n}\left.\frac{d^n \phi_X(\omega)}{d\omega^n}\right|_{\omega = 0}.

Example 5 First two moments of the random variable in Example 2:

    \phi_X(\omega) = \frac{\lambda}{\lambda - j\omega},
    \frac{d\phi_X(\omega)}{d\omega} = \frac{\lambda j}{(\lambda - j\omega)^2},
    \frac{d^2\phi_X(\omega)}{d\omega^2} = \frac{2\lambda j^2}{(\lambda - j\omega)^3}.

    EX = \frac{1}{j}\left.\frac{\lambda j}{(\lambda - j\omega)^2}\right|_{\omega = 0} = \frac{1}{\lambda},
    EX^2 = \frac{1}{j^2}\left.\frac{2\lambda j^2}{(\lambda - j\omega)^3}\right|_{\omega = 0} = \frac{2}{\lambda^2}.
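The differentiation in Example 5 is easy to reproduce symbolically. The sketch below (assuming sympy is available) recovers the first two moments of the exponential RV from its characteristic function:

```python
import sympy as sp

w = sp.symbols('omega', real=True)
lam = sp.symbols('lambda', positive=True)

phi = lam / (lam - sp.I * w)  # characteristic function of the exponential RV

EX = sp.simplify(sp.diff(phi, w).subs(w, 0) / sp.I)
EX2 = sp.simplify(sp.diff(phi, w, 2).subs(w, 0) / sp.I**2)

print(EX)   # 1/lambda
print(EX2)  # 2/lambda**2
```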

Probability generating function

If the random variable under consideration takes nonnegative integer values only, it is convenient to characterise it in terms of the probability generating function G_X(z), defined by

    G_X(z) = E\,z^X = \sum_{k=0}^{\infty} p_X(k)\, z^k.

Note that G_X(z) is related to the z-transform; in the actual z-transform, z^{-k} is used instead of z^k. The characteristic function of X is obtained as \phi_X(\omega) = G_X(e^{j\omega}), and

    G_X(1) = \sum_{k=0}^{\infty} p_X(k) = 1.

Moments follow from the derivatives of G_X(z):

    G_X'(z) = \sum_{k=0}^{\infty} k\, p_X(k)\, z^{k-1},   so   G_X'(1) = \sum_{k=0}^{\infty} k\, p_X(k) = EX,

    G_X''(z) = \sum_{k=0}^{\infty} k(k-1)\, p_X(k)\, z^{k-2},   so   G_X''(1) = \sum_{k=0}^{\infty} k^2 p_X(k) - \sum_{k=0}^{\infty} k\, p_X(k) = EX^2 - EX.

    \therefore \sigma_X^2 = EX^2 - (EX)^2 = G_X''(1) + G_X'(1) - \left(G_X'(1)\right)^2.

Example (binomial distribution) For p_X(x) = {}^{n}C_x\, p^x (1-p)^{n-x},

    G_X(z) = \sum_x {}^{n}C_x\, p^x (1-p)^{n-x} z^x = \sum_x {}^{n}C_x\, (pz)^x (1-p)^{n-x} = (1 - p + pz)^n.

    G_X'(1) = EX = np,
    G_X''(1) = EX^2 - EX = n(n-1)p^2,
    \therefore EX^2 = G_X''(1) + EX = n(n-1)p^2 + np = n^2 p^2 + npq,   where q = 1 - p.

Example (geometric distribution) For p_X(x) = p(1-p)^x, x = 0, 1, 2, \ldots,

    G_X(z) = \sum_x p(1-p)^x z^x = p \sum_x \left[(1-p)z\right]^x = \frac{p}{1 - (1-p)z},

    G_X'(z) = \frac{p(1-p)}{\left(1 - (1-p)z\right)^2}.

    G_X'(1) = \frac{p(1-p)}{(1 - 1 + p)^2} = \frac{p(1-p)}{p^2} = \frac{q}{p},

    G_X''(z) = \frac{2p(1-p)^2}{\left(1 - (1-p)z\right)^3},   so   G_X''(1) = \frac{2pq^2}{p^3} = 2\left(\frac{q}{p}\right)^2.

    EX^2 = G_X''(1) + G_X'(1) = 2\left(\frac{q}{p}\right)^2 + \frac{q}{p},

    \mathrm{Var}(X) = 2\left(\frac{q}{p}\right)^2 + \frac{q}{p} - \left(\frac{q}{p}\right)^2 = \left(\frac{q}{p}\right)^2 + \frac{q}{p} = \frac{q}{p^2}.
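The PGF route to moments is also mechanical enough to hand to a computer algebra system. A minimal sketch (assuming sympy) for the binomial case above:

```python
import sympy as sp

z, p = sp.symbols('z p', positive=True)
n = sp.symbols('n', positive=True, integer=True)

G = (1 - p + p * z) ** n  # PGF of the binomial distribution

EX = sp.simplify(sp.diff(G, z).subs(z, 1))    # n*p
G2 = sp.simplify(sp.diff(G, z, 2).subs(z, 1)) # n*(n-1)*p**2
var = sp.simplify(G2 + EX - EX ** 2)          # n*p*(1-p), possibly printed in factored form

print(EX, var)
```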

Moment Generating Function

Sometimes it is convenient to work with a function similar to the Laplace transform, known as the moment generating function (MGF). For a random variable X, the moment generating function M_X(s) is defined by

    M_X(s) = E\,e^{sX} = \int_{R_X} f_X(x)\, e^{sx}\,dx,

where R_X is the range of the random variable X. If X is a nonnegative continuous random variable, we can write

    M_X(s) = \int_0^{\infty} f_X(x)\, e^{sx}\,dx.

Note the following:

    M_X'(s) = \int_0^{\infty} x f_X(x)\, e^{sx}\,dx,   so M_X'(0) = EX,

and more generally

    \frac{d^k M_X(s)}{ds^k} = \int_0^{\infty} x^k f_X(x)\, e^{sx}\,dx,   so \left.\frac{d^k M_X(s)}{ds^k}\right|_{s = 0} = EX^k.

Example Let X be a continuous random variable with

    f_X(x) = \frac{\alpha}{\pi (x^2 + \alpha^2)},   -\infty < x < \infty,   \alpha > 0.

Then EX = \int_{-\infty}^{\infty} x f_X(x)\,dx requires

    \frac{\alpha}{\pi} \int_0^{\infty} \frac{2x}{x^2 + \alpha^2}\,dx = \frac{\alpha}{\pi}\left.\ln(x^2 + \alpha^2)\right|_0^{\infty},

which diverges. Hence EX does not exist. This density function is known as the Cauchy density function.

Joint characteristic and moment generating functions

The joint characteristic function of two random variables X and Y is defined by

    \phi_{X,Y}(\omega_1, \omega_2) = E\,e^{j\omega_1 X + j\omega_2 Y} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, e^{j\omega_1 x + j\omega_2 y}\,dy\,dx,

and the joint moment generating function \phi_{X,Y}(s_1, s_2) is defined by

    \phi_{X,Y}(s_1, s_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, e^{x s_1 + y s_2}\,dx\,dy = E\,e^{s_1 X + s_2 Y}.

If Z = aX + bY, then

    \phi_Z(s) = E\,e^{Zs} = E\,e^{(aX + bY)s} = \phi_{X,Y}(as, bs).

Suppose X and Y are independent. Then

    \phi_{X,Y}(s_1, s_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{s_1 x + s_2 y} f_{X,Y}(x, y)\,dy\,dx
                        = \int_{-\infty}^{\infty} e^{s_1 x} f_X(x)\,dx \int_{-\infty}^{\infty} e^{s_2 y} f_Y(y)\,dy
                        = \phi_X(s_1)\,\phi_Y(s_2).

In particular, if Z = X + Y and X and Y are independent, then

    \phi_Z(s) = \phi_X(s)\,\phi_Y(s),

and, using the convolution property of the Laplace transform, we get

    f_Z(z) = f_X(z) * f_Y(z).

Let us recall the MGF of a Gaussian random variable X \sim N(\mu_X, \sigma_X^2):

    \phi_X(s) = E\,e^{Xs} = \frac{1}{\sqrt{2\pi}\,\sigma_X} \int_{-\infty}^{\infty} e^{-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^2} e^{xs}\,dx.

Completing the square in the exponent,

    x^2 - 2(\mu_X + \sigma_X^2 s)x = \left(x - \mu_X - \sigma_X^2 s\right)^2 - \left(\mu_X + \sigma_X^2 s\right)^2,

so that

    \phi_X(s) = e^{\mu_X s + \sigma_X^2 s^2/2} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_X} \int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma_X^2}\left(x - \mu_X - \sigma_X^2 s\right)^2}\,dx = e^{\mu_X s + \sigma_X^2 s^2/2}.

We also have the series expansion

    \phi_{X,Y}(s_1, s_2) = E\,e^{X s_1 + Y s_2} = E\left(1 + X s_1 + Y s_2 + \frac{1}{2}(X s_1 + Y s_2)^2 + \cdots\right)
                        = 1 + s_1 EX + s_2 EY + \frac{s_1^2 EX^2}{2} + \frac{s_2^2 EY^2}{2} + s_1 s_2 EXY + \cdots

Hence

    EX = \left.\frac{\partial}{\partial s_1}\phi_{X,Y}(s_1, s_2)\right|_{s_1 = 0,\, s_2 = 0},   EY = \left.\frac{\partial}{\partial s_2}\phi_{X,Y}(s_1, s_2)\right|_{s_1 = 0,\, s_2 = 0},

    EXY = \left.\frac{\partial^2}{\partial s_1 \partial s_2}\phi_{X,Y}(s_1, s_2)\right|_{s_1 = 0,\, s_2 = 0}.

In the same way we can generate the joint moments of the RVs from the joint moment generating function.

Important Discrete Random Variables

Discrete uniform random variable

A discrete random variable X is said to be a uniform random variable if it assumes each of the values x_1, x_2, \ldots, x_n with equal probability. The probability mass function of the uniform random variable X is given by

    p_X(x_i) = \frac{1}{n},   i = 1, 2, \ldots, n,

and its pdf can be written as

    f_X(x) = \frac{1}{n}\sum_{i=1}^{n} \delta(x - x_i).

(Fig.: pmf of the discrete uniform random variable, with mass 1/n at each of x_1, x_2, \ldots, x_n.)

Mean and variance of the discrete uniform random variable

    \mu_X = EX = \sum_{i=1}^{n} x_i\, p_X(x_i) = \frac{1}{n}\sum_{i=1}^{n} x_i,

    EX^2 = \sum_{i=1}^{n} x_i^2\, p_X(x_i) = \frac{1}{n}\sum_{i=1}^{n} x_i^2,

    \therefore \sigma_X^2 = EX^2 - \mu_X^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2.

Example: Suppose X is the random variable representing the outcome of a single roll of a fair die. Then X can assume any of the 6 values in the set {1, 2, 3, 4, 5, 6} with the probability mass function

    p_X(x) = \frac{1}{6},   x = 1, 2, 3, 4, 5, 6.

Bernoulli random variable

Suppose X is a random variable that takes the two values 0 and 1, with probability mass functions

    p_X(1) = P\{X = 1\} = p   and   p_X(0) = 1 - p,   0 \le p \le 1.

Such a random variable X is called a Bernoulli random variable, because it describes the outcomes of a Bernoulli trial. The typical cdf of the Bernoulli RV X is a staircase function, with a step of height 1 - p at x = 0 and a step of height p at x = 1 (see Fig.). We can define the pdf of X with the help of the delta function:

    f_X(x) = (1-p)\,\delta(x) + p\,\delta(x - 1).

Example 1: Consider the experiment of tossing a biased coin with P\{H\} = p and P\{T\} = 1 - p. If we define the random variable by X(H) = 1 and X(T) = 0, then X is a Bernoulli random variable.

Mean and variance of the Bernoulli random variable

    \mu_X = EX = \sum_{k=0}^{1} k\, p_X(k) = 1 \times p + 0 \times (1-p) = p,

    EX^2 = \sum_{k=0}^{1} k^2\, p_X(k) = 1 \times p + 0 \times (1-p) = p,

    \therefore \sigma_X^2 = EX^2 - \mu_X^2 = p(1-p).

Remark

• The Bernoulli RV is the simplest discrete RV. It can be used as the building block for many discrete RVs.
• For the Bernoulli RV, EX^m = p for m = 1, 2, 3, \ldots Thus all the moments of the Bernoulli RV have the same value p.

Binomial random variable

Suppose X is a discrete random variable taking values from the set \{0, 1, 2, \ldots, n\}. X is called a binomial random variable with parameters n and p, 0 \le p \le 1, if

    p_X(k) = {}^{n}C_k\, p^k (1-p)^{n-k},   where {}^{n}C_k = \frac{n!}{k!(n-k)!}.

As we have seen, the probability of k successes in n independent repetitions of a Bernoulli trial is given by the binomial law. If X is a discrete random variable representing the number of successes in this case, then X is a binomial random variable. For example, the number of heads in n independent tosses of a fair coin is a binomial random variable.

• The notation X \sim B(n, p) is used to represent a binomial RV with parameters n and p.
• The sum of n independent identically distributed Bernoulli random variables is a binomial random variable.
• p_X(k), k = 0, 1, \ldots, n, defines a valid probability mass function, because

    \sum_{k=0}^{n} p_X(k) = \sum_{k=0}^{n} {}^{n}C_k\, p^k (1-p)^{n-k} = [p + (1-p)]^n = 1.

• The binomial distribution is useful when there are two types of objects: good/bad, correct/erroneous, healthy/diseased, etc.

Example: In a binary communication system, the probability of bit error is 0.01. If a block of 8 bits is transmitted, find the probability that

(a) exactly 2 bit errors will occur,
(b) at least 2 bit errors will occur,
(c) all the bits will be erroneous.

Suppose X is the random variable representing the number of bit errors in a block of 8 bits. Then X \sim B(8, 0.01).

(a) P(exactly 2 bit errors) = p_X(2) = {}^{8}C_2 \times 0.01^2 \times 0.99^6.
(b) P(at least 2 bit errors) = 1 - p_X(0) - p_X(1) = 1 - 0.99^8 - {}^{8}C_1 \times 0.01 \times 0.99^7.
(c) P(all 8 bits erroneous) = p_X(8) = 0.01^8 = 10^{-16}.

The probability mass function for a binomial random variable with n = 6 and p = 0.8 is shown in the figure below.

Mean and variance of the binomial random variable

We have

    EX = \sum_{k=0}^{n} k\, p_X(k) = \sum_{k=1}^{n} k\,\frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}
       = \sum_{k=1}^{n} \frac{n!}{(k-1)!(n-k)!} p^k (1-p)^{n-k}
       = np \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-1-(k-1)}
       = np \sum_{k_1=0}^{n-1} \frac{(n-1)!}{k_1!(n-1-k_1)!} p^{k_1} (1-p)^{n-1-k_1}   (substituting k_1 = k - 1)
       = np\,(p + 1 - p)^{n-1} = np.

Similarly,

    EX^2 = \sum_{k=0}^{n} k^2\, p_X(k) = \sum_{k=1}^{n} k\,\frac{n!}{(k-1)!(n-k)!} p^k (1-p)^{n-k}
         = np \sum_{k=1}^{n} (k-1+1)\,\frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-1-(k-1)}
         = np \sum_{k=1}^{n} (k-1)\,\frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-1-(k-1)} + np \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-1-(k-1)}
         = np \times (n-1)p + np   (the first sum is the mean of B(n-1, p), the second sum equals 1)
         = n(n-1)p^2 + np.

    \therefore \sigma_X^2 = EX^2 - (EX)^2 = n(n-1)p^2 + np - n^2 p^2 = np(1-p).
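A short simulation is a convenient sanity check on the mean and variance just derived. The sketch below (the choice n = 8, p = 0.01 matches the bit-error example above) draws binomial samples and compares the empirical moments with np and np(1-p):

```python
import numpy as np

n, p = 8, 0.01
rng = np.random.default_rng(1)
x = rng.binomial(n, p, size=1_000_000)

print(x.mean(), n * p)            # sample mean vs np
print(x.var(), n * p * (1 - p))   # sample variance vs np(1-p)
print(np.mean(x == 2))            # empirical estimate of P(X = 2)
```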
Geometric random variable

Suppose X is a discrete random variable with range R_X = \{1, 2, \ldots\}. X is called a geometric random variable if

    p_X(k) = p(1-p)^{k-1},   k = 1, 2, \ldots,   0 < p \le 1.

• X describes the number of independent Bernoulli trials, each with probability of success p, needed before a 'success' occurs. If the first success occurs at the kth trial, then the outputs of the Bernoulli trials are

    Failure, Failure, ..., Failure (k - 1 times), Success,

  so that p_X(k) = (1-p)^{k-1} p = p(1-p)^{k-1}.
• R_X is countably infinite, because we may have to wait arbitrarily long before the first success occurs.
• The geometric random variable X with parameter p is denoted by X ~ geo(p).
• The CDF of X ~ geo(p) is given by

    F_X(k) = \sum_{i=1}^{k} (1-p)^{i-1} p = 1 - (1-p)^k,

  which gives the probability that the first 'success' will occur before the (k+1)th trial.

The following figure shows the pmf of a random variable X ~ geo(p) for p = 0.25 and p = 0.5 respectively. Observe that the plots have a mode at k = 1.

Example: Suppose X is the random variable representing the number of independent tosses of a coin until a head shows up. Clearly X will be a geometric random variable.

Example: A fair die is rolled repeatedly. What is the probability that a '6' will be shown before the fourth roll?

Suppose X is the random variable representing the number of independent rolls of the die until a '6' shows up. Clearly X will be a geometric random variable with p = 1/6. Then

    P(a '6' is shown before the 4th roll) = P(X = 1 or X = 2 or X = 3)
        = p_X(1) + p_X(2) + p_X(3)
        = p + p(1-p) + p(1-p)^2
        = p(3 - 3p + p^2)
        = \frac{91}{216}.
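The exact fraction 91/216 is easy to confirm with rational arithmetic; a minimal check of the die example:

```python
from fractions import Fraction as F

p = F(1, 6)
prob = p + p * (1 - p) + p * (1 - p) ** 2   # P(X = 1) + P(X = 2) + P(X = 3)
print(prob)                                 # 91/216
print(prob == p * (3 - 3 * p + p ** 2))     # True
```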

Mean and variance of the geometric random variable

    \mu_X = EX = \sum_{k=1}^{\infty} k\, p(1-p)^{k-1}
          = p \sum_{k=1}^{\infty} k(1-p)^{k-1}
          = -p\,\frac{d}{dp} \sum_{k=0}^{\infty} (1-p)^k   (differentiating the geometric series term by term)
          = -p\,\frac{d}{dp}\left(\frac{1}{p}\right)
          = \frac{p}{p^2} = \frac{1}{p}.

    EX^2 = \sum_{k=1}^{\infty} k^2\, p(1-p)^{k-1}
         = p \sum_{k=1}^{\infty} \left[k(k-1) + k\right](1-p)^{k-1}
         = p(1-p) \sum_{k=2}^{\infty} k(k-1)(1-p)^{k-2} + p \sum_{k=1}^{\infty} k(1-p)^{k-1}
         = p(1-p)\,\frac{d^2}{dp^2} \sum_{k=0}^{\infty} (1-p)^k + \frac{1}{p}
         = p(1-p)\cdot\frac{2}{p^3} + \frac{1}{p}
         = \frac{2(1-p)}{p^2} + \frac{1}{p}.

    \therefore \sigma_X^2 = EX^2 - \mu_X^2 = \frac{2(1-p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{1-p}{p^2}.

Summary:   mean \mu_X = \frac{1}{p},   variance \sigma_X^2 = \frac{1-p}{p^2}.

Poisson Random Variable

X is a Poisson random variable with parameter \lambda > 0 if

    p_X(k) = \frac{e^{-\lambda}\lambda^k}{k!},   k = 0, 1, 2, \ldots

The plot of the pmf of the Poisson RV is shown in Fig.

Remark

• p_X(k) = \frac{e^{-\lambda}\lambda^k}{k!} is a valid pmf, because

    \sum_{k=0}^{\infty} p_X(k) = \sum_{k=0}^{\infty} \frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda}\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.

• The distribution is named after the French mathematician S.D. Poisson.

Mean and variance of the Poisson RV

The mean of the Poisson RV X is given by

    \mu_X = \sum_{k=0}^{\infty} k\, p_X(k) = 0 + \sum_{k=1}^{\infty} k\,\frac{e^{-\lambda}\lambda^k}{k!} = \lambda e^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!} = \lambda.

Similarly,

    EX^2 = \sum_{k=1}^{\infty} k^2\,\frac{e^{-\lambda}\lambda^k}{k!} = e^{-\lambda}\sum_{k=1}^{\infty}\frac{(k-1+1)\lambda^k}{(k-1)!}
         = e^{-\lambda}\lambda^2\sum_{k=2}^{\infty}\frac{\lambda^{k-2}}{(k-2)!} + e^{-\lambda}\lambda\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!}
         = e^{-\lambda}\lambda^2 e^{\lambda} + e^{-\lambda}\lambda e^{\lambda} = \lambda^2 + \lambda.

    \therefore \sigma_X^2 = EX^2 - \mu_X^2 = \lambda.

Example: The number of calls received in a telephone exchange follows a Poisson distribution with an average of 10 calls per minute. What is the probability that in a one-minute duration (i) no call is received, (ii) exactly 5 calls are received, (iii) more than 3 calls are received?

Let X be the random variable representing the number of calls received. Given p_X(k) = \frac{e^{-\lambda}\lambda^k}{k!} with \lambda = 10. Therefore:

(i) probability that no call is received = p_X(0) = e^{-10};
(ii) probability that exactly 5 calls are received = p_X(5) = \frac{e^{-10}\times 10^5}{5!};
(iii) probability that more than 3 calls are received = 1 - \sum_{k=0}^{3} p_X(k) = 1 - e^{-10}\left(1 + 10 + \frac{10^2}{2!} + \frac{10^3}{3!}\right).

The Poisson distribution is used to model many practical problems. It is used in many counting applications to count events that take place independently of one another. Thus it is used to model the count, during a particular length of time, of:

• customers arriving at a service station,
• telephone calls coming to a telephone exchange,
• packets arriving at a particular server,
• particles decaying from a radioactive specimen.

Poisson approximation of the binomial random variable

The Poisson distribution is also used to approximate the binomial distribution B(n, p) when n is very large and p is small. Consider a binomial RV X \sim B(n, p) with n \to \infty and p \to 0 such that EX = np = \lambda remains constant. Then

    p_X(k) = {}^{n}C_k\, p^k (1-p)^{n-k}
           = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!}\, p^k (1-p)^{n-k}
           = \frac{\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right)}{k!}\,(np)^k (1-p)^{n-k}
           = \frac{\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right)\lambda^k\left(1 - \frac{\lambda}{n}\right)^n}{k!\left(1 - \frac{\lambda}{n}\right)^k}.

Noting that \lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}, we get

    \lim_{n\to\infty} p_X(k) = \frac{e^{-\lambda}\lambda^k}{k!}.

Thus the Poisson approximation can be used to compute binomial probabilities for large n; it also makes the analysis of such probabilities easier. Typical examples are:

• the number of bit errors in a received binary data file,
• the number of typographical errors in a printed page.

Example Suppose there is an error probability of 0.01 per word in typing. What is the probability that there will be more than one error in a page of 120 words?

Suppose X is the RV representing the number of errors per page of 120 words. Then X \sim B(120, 0.01), so \lambda = 120 \times 0.01 = 1.2. Therefore

    P(\text{more than one error}) = 1 - p_X(0) - p_X(1) \approx 1 - e^{-\lambda} - \lambda e^{-\lambda}.
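How good is the approximation for the typing example? The sketch below computes both pmfs from first principles with the standard library only and compares the exact binomial answer with the Poisson approximation:

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 120, 0.01
lam = n * p  # 1.2

for k in range(4):
    print(k, round(binom_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))

# P(more than one error): exact binomial vs Poisson approximation
exact = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)
approx = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(exact, approx)
```

The two tail probabilities agree to about three decimal places, which is typical for n this large and p this small.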

Uniform Random Variable

A continuous random variable X is called uniformly distributed over the interval [a, b] if its probability density function is given by

    f_X(x) = \frac{1}{b-a} for a \le x \le b, and 0 otherwise.

We use the notation X ~ U(a, b) to denote a random variable X uniformly distributed over [a, b]. Note that

    \int_{-\infty}^{\infty} f_X(x)\,dx = \int_a^b \frac{dx}{b-a} = 1.

Distribution function F_X(x)

    For x < a:           F_X(x) = 0.
    For a \le x \le b:   F_X(x) = \int_{-\infty}^{x} f_X(u)\,du = \int_a^x \frac{du}{b-a} = \frac{x-a}{b-a}.
    For x > b:           F_X(x) = 1.

Mean and variance of a uniform random variable

    \mu_X = EX = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2}.

    EX^2 = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \int_a^b \frac{x^2}{b-a}\,dx = \frac{b^2 + ab + a^2}{3},

    \therefore \sigma_X^2 = EX^2 - \mu_X^2 = \frac{b^2 + ab + a^2}{3} - \frac{(a+b)^2}{4} = \frac{(b-a)^2}{12}.

The characteristic function of the random variable X ~ U(a, b) is given by

    \phi_X(\omega) = E\,e^{j\omega x} = \int_a^b \frac{e^{j\omega x}}{b-a}\,dx = \frac{e^{j\omega b} - e^{j\omega a}}{j\omega(b-a)}.

Example: Suppose a random noise voltage X across an electronic circuit is uniformly distributed between -4 V and 5 V. What is the probability that the noise voltage will lie between 2 V and 4 V? What is the variance of the voltage?

    P(2 < X \le 4) = \int_2^4 \frac{dx}{5-(-4)} = \frac{2}{9},

    \sigma_X^2 = \frac{(5+4)^2}{12} = \frac{27}{4}.

Remark

• The uniform distribution is the simplest continuous distribution.
• It is used, for example, to model quantization errors. If a signal is discretized into steps of \Delta, then the quantization error is uniformly distributed between -\Delta/2 and \Delta/2.
• A random variable of arbitrary distribution can be generated with the help of a routine that generates uniformly distributed random numbers. This follows from the fact that the distribution function of a continuous random variable, evaluated at the random variable itself, is uniformly distributed over [0, 1]: if X is a continuous random variable, then F_X(X) ~ U(0, 1). (See the Example below.)
• The unknown phase of a sinusoid is assumed to be uniformly distributed over [0, 2\pi] in many applications. For example, in studying the noise performance of a communication receiver, the carrier signal is modeled as X(t) = A\cos(\omega_c t + \Phi), where \Phi ~ U(0, 2\pi).
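The noise-voltage example is simple to check by simulation; the sketch below estimates the interval probability and variance from a million uniform draws on [-4, 5]:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-4.0, 5.0, size=1_000_000)   # noise voltage X ~ U(-4, 5)

print(np.mean((x > 2) & (x <= 4)), 2 / 9)    # P(2 < X <= 4)
print(x.var(), 81 / 12)                      # variance vs (b-a)^2/12 = 27/4
```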

Normal or Gaussian Random Variable

The normal distribution is the most important distribution used to model natural and man-made phenomena. In particular, when a random variable is the sum of a large number of random variables, it can be modeled as a normal random variable.

A continuous random variable X is called a normal or Gaussian random variable with parameters \mu_X and \sigma_X^2 if its probability density function is given by

    f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\, e^{-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^2},   -\infty < x < \infty,

where \mu_X and \sigma_X > 0 are real numbers. We write that X is N(\mu_X, \sigma_X^2) distributed.

• f_X(x) is a bell-shaped function, symmetrical about x = \mu_X.
• \sigma_X determines the spread of the random variable X. If \sigma_X^2 is small, X is more concentrated around the mean \mu_X.
• If \mu_X = 0 and \sigma_X^2 = 1, then

    f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2},

  and the random variable X is called the standard normal variable.

The distribution function is

    F_X(x) = P(X \le x) = \frac{1}{\sqrt{2\pi}\,\sigma_X}\int_{-\infty}^{x} e^{-\frac{1}{2}\left(\frac{t - \mu_X}{\sigma_X}\right)^2}\,dt.

Substituting u = \frac{t - \mu_X}{\sigma_X}, we get

    F_X(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{(x - \mu_X)/\sigma_X} e^{-\frac{1}{2}u^2}\,du = \Phi\left(\frac{x - \mu_X}{\sigma_X}\right),

where \Phi(x) is the distribution function of the standard normal variable. Thus F_X(x) can be computed from tabulated values of \Phi(x). The table of \Phi(x) was very useful in the pre-computer days.

In communication engineering, it is customary to work with the Q function, defined by

    Q(x) = 1 - \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-\frac{u^2}{2}}\,du.

Note that Q(0) = \frac{1}{2} and Q(-x) = 1 - Q(x).

If X is N(\mu_X, \sigma_X^2) distributed, then

    EX = \mu_X,   var(X) = \sigma_X^2.

Proof:

    EX = \int_{-\infty}^{\infty} x f_X(x)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma_X}\int_{-\infty}^{\infty} x\, e^{-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^2}\,dx.

Substituting u = \frac{x - \mu_X}{\sigma_X}, so that x = u\sigma_X + \mu_X,

    EX = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} (u\sigma_X + \mu_X)\, e^{-\frac{1}{2}u^2}\,du
       = \frac{\sigma_X}{\sqrt{2\pi}}\int_{-\infty}^{\infty} u\, e^{-\frac{1}{2}u^2}\,du + \frac{\mu_X}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}u^2}\,du
       = 0 + \frac{\mu_X}{\sqrt{2\pi}}\cdot\sqrt{2\pi} = \mu_X,

where the first integral vanishes because its integrand is an odd function.

Evaluation of \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}}\,dx: suppose I = \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}}\,dx. Then

    I^2 = \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}}\,dx \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{x^2 + y^2}{2}}\,dy\,dx.

Substituting x = r\cos\theta and y = r\sin\theta, we get

    I^2 = \int_{-\pi}^{\pi}\int_0^{\infty} e^{-\frac{r^2}{2}}\, r\,dr\,d\theta = 2\pi\int_0^{\infty} e^{-\frac{r^2}{2}}\, r\,dr = 2\pi\int_0^{\infty} e^{-s}\,ds   \left(s = \frac{r^2}{2}\right) = 2\pi.

    \therefore I = \sqrt{2\pi}.

    \mathrm{Var}(X) = E(X - \mu_X)^2 = \frac{1}{\sqrt{2\pi}\,\sigma_X}\int_{-\infty}^{\infty} (x - \mu_X)^2\, e^{-\frac{1}{2}\left(\frac{x - \mu_X}{\sigma_X}\right)^2}\,dx.

Putting u = \frac{x - \mu_X}{\sigma_X}, so that dx = \sigma_X\,du,

    \mathrm{Var}(X) = \frac{\sigma_X^2}{\sqrt{2\pi}}\int_{-\infty}^{\infty} u^2 e^{-\frac{1}{2}u^2}\,du = \frac{2\sigma_X^2}{\sqrt{2\pi}}\int_0^{\infty} u^2 e^{-\frac{1}{2}u^2}\,du.

Putting t = \frac{1}{2}u^2, so that dt = u\,du,

    \mathrm{Var}(X) = \frac{2\sigma_X^2}{\sqrt{2\pi}}\int_0^{\infty} \sqrt{2t}\, e^{-t}\,dt = \frac{2\sigma_X^2}{\sqrt{\pi}}\int_0^{\infty} t^{1/2} e^{-t}\,dt = \frac{2\sigma_X^2}{\sqrt{\pi}}\,\Gamma\!\left(\frac{3}{2}\right) = \frac{2\sigma_X^2}{\sqrt{\pi}}\cdot\frac{\sqrt{\pi}}{2} = \sigma_X^2.

Note the definition and properties of the gamma function:

    \Gamma(n) = \int_0^{\infty} t^{n-1} e^{-t}\,dt,   \Gamma(n) = (n-1)\Gamma(n-1),   \Gamma\!\left(\frac{1}{2}\right) = \sqrt{\pi}.
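Both the Gaussian integral and the moment results admit quick numerical confirmation. The sketch below checks I = \sqrt{2\pi} by a simple Riemann sum and the mean and variance by sampling (the specific values \mu = 2, \sigma = 3 are arbitrary choices for the demo):

```python
import numpy as np

# Riemann-sum check of the Gaussian integral against sqrt(2*pi)
dx = 0.001
x = np.arange(-10, 10, dx)
I = np.sum(np.exp(-x**2 / 2)) * dx
print(I, np.sqrt(2 * np.pi))

# Simulation check of the mean and variance of N(2, 9)
rng = np.random.default_rng(3)
s = rng.normal(loc=2.0, scale=3.0, size=1_000_000)
print(s.mean(), s.var())  # close to 2 and 9
```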

Exponential Random Variable

A continuous random variable X is called exponentially distributed with parameter \lambda > 0 if its probability density function is of the form

    f_X(x) = \lambda e^{-\lambda x} for x \ge 0, and 0 otherwise.

The corresponding probability distribution function is

    F_X(x) = \int_{-\infty}^{x} f_X(u)\,du = 0 for x < 0,   and   F_X(x) = 1 - e^{-\lambda x} for x \ge 0.

We have

    \mu_X = EX = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda},

    \sigma_X^2 = E(X - \mu_X)^2 = \int_0^{\infty}\left(x - \frac{1}{\lambda}\right)^2 \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda^2}.

The following figure shows the pdf of an exponential RV.

• The time between two consecutive occurrences of independent events can be modeled by the exponential RV. For example, the exponential distribution gives the probability density function of the time between two successive counts in a Poisson process.
• In reliability studies, the expected lifetime of a part, the average time between successive failures of a system, etc. are determined using the exponential distribution.
• It is used to model the service time in a queueing system.

Memoryless property of the exponential distribution

For an exponential RV X with parameter \lambda,

    P(X > t + t_0 \mid X > t_0) = P(X > t)   for t > 0, t_0 > 0.

Proof:

    P(X > t + t_0 \mid X > t_0) = \frac{P[(X > t + t_0) \cap (X > t_0)]}{P(X > t_0)} = \frac{P(X > t + t_0)}{P(X > t_0)}
                                = \frac{1 - F_X(t + t_0)}{1 - F_X(t_0)} = \frac{e^{-\lambda(t + t_0)}}{e^{-\lambda t_0}} = e^{-\lambda t} = P(X > t).

Hence, if X represents the life of a component in hours, the probability that the component will last more than t + t_0 hours, given that it has lasted t_0 hours, is the same as the probability that the component will last t hours. The information that the component has already lasted for t_0 hours is not used; thus the life expectancy of a used component is the same as that of a new component. Such a model cannot represent a real-world situation, but it is used for its simplicity.

Laplace Distribution

A continuous random variable X is called Laplace distributed with parameter \lambda > 0 if its probability density function is of the form

    f_X(x) = \frac{\lambda}{2}\, e^{-\lambda |x|},   -\infty < x < \infty.

We have

    \mu_X = EX = \int_{-\infty}^{\infty} x\,\frac{\lambda}{2} e^{-\lambda |x|}\,dx = 0,

    \sigma_X^2 = E(X - \mu_X)^2 = \int_{-\infty}^{\infty} x^2\,\frac{\lambda}{2} e^{-\lambda |x|}\,dx = \frac{2}{\lambda^2}.

Chi-square random variable

A random variable is called a chi-square random variable with n degrees of freedom if its PDF is given by

    f_X(x) = \frac{x^{n/2 - 1}\, e^{-x/2\sigma^2}}{2^{n/2}\,\sigma^n\,\Gamma(n/2)} for x > 0, and 0 for x < 0,

with the parameter \sigma > 0 and \Gamma(\cdot) denoting the gamma function. A chi-square random variable with n degrees of freedom is denoted by \chi_n^2. Note that a \chi_2^2 RV is an exponential RV. The pdfs of \chi_n^2 RVs with different degrees of freedom are shown in Fig. below.

Mean and variance of the chi-square random variable

    \mu_X = \int_0^{\infty} x\,\frac{x^{n/2 - 1} e^{-x/2\sigma^2}}{2^{n/2}\sigma^n\Gamma(n/2)}\,dx
          = \int_0^{\infty} \frac{(2\sigma^2)^{n/2} u^{n/2} e^{-u}}{2^{n/2}\sigma^n\Gamma(n/2)}\,(2\sigma^2)\,du   (substituting u = x/2\sigma^2)
          = \frac{2\sigma^2\,\Gamma[(n+2)/2]}{\Gamma(n/2)} = \frac{2\sigma^2\,(n/2)\,\Gamma(n/2)}{\Gamma(n/2)} = n\sigma^2.

Similarly,

    EX^2 = \int_0^{\infty} x^2 f_X(x)\,dx = \frac{4\sigma^4\,\Gamma[(n+4)/2]}{\Gamma(n/2)} = \frac{4\sigma^4\,[(n+2)/2](n/2)\,\Gamma(n/2)}{\Gamma(n/2)} = n(n+2)\sigma^4,

    \therefore \sigma_X^2 = EX^2 - \mu_X^2 = n(n+2)\sigma^4 - n^2\sigma^4 = 2n\sigma^4.

Relation between the chi-square distribution and the Gaussian distribution

Let X_1, X_2, \ldots, X_n be independent zero-mean Gaussian random variables, each with variance \sigma^2. Then Y = X_1^2 + X_2^2 + \cdots + X_n^2 has a \chi_n^2 distribution with mean n\sigma^2 and variance 2n\sigma^4. This result gives an application of the chi-square random variable.

Rayleigh Random Variable

A Rayleigh random variable is characterized by the PDF

    f_X(x) = \frac{x}{\sigma^2}\, e^{-x^2/2\sigma^2} for x > 0, and 0 for x < 0,

where \sigma is the parameter of the random variable. The probability density functions for Rayleigh RVs are illustrated in Fig.

Mean and variance of the Rayleigh distribution

    EX = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_0^{\infty} \frac{x^2}{\sigma^2}\, e^{-x^2/2\sigma^2}\,dx
       = \frac{\sqrt{2\pi}\,\sigma}{2} = \sqrt{\frac{\pi}{2}}\,\sigma,

using the second moment of the zero-mean Gaussian density. Similarly,

    EX^2 = \int_0^{\infty} \frac{x^3}{\sigma^2}\, e^{-x^2/2\sigma^2}\,dx = 2\sigma^2 \int_0^{\infty} u\, e^{-u}\,du   \left(\text{substituting } u = \frac{x^2}{2\sigma^2}\right) = 2\sigma^2,

noting that \int_0^{\infty} u e^{-u}\,du is the mean of the exponential RV with \lambda = 1. Hence

    \sigma_X^2 = 2\sigma^2 - \left(\sqrt{\frac{\pi}{2}}\,\sigma\right)^2 = \left(2 - \frac{\pi}{2}\right)\sigma^2.

Application of the Rayleigh RV

• Modelling the root-mean-square error.
• Modelling the envelope of a signal with two orthogonal components, as in the case of a signal of the form s(t) = X_1\cos\omega t + X_2\sin\omega t.

Relation between the Rayleigh distribution and the Gaussian distribution

A Rayleigh RV is related to Gaussian RVs as follows: let X_1 and X_2 be two independent zero-mean Gaussian RVs with a common variance \sigma^2. Then X = \sqrt{X_1^2 + X_2^2} has the Rayleigh pdf with parameter \sigma. We shall prove this result in a later class. This important result also suggests the cases where the Rayleigh RV can be used: if X_1 \sim N(0, \sigma^2) and X_2 \sim N(0, \sigma^2) are independent, then the envelope X = \sqrt{X_1^2 + X_2^2} has the Rayleigh distribution.
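The Gaussian-to-Rayleigh relation just stated is easy to see numerically: the sketch below builds the envelope from two independent zero-mean Gaussians and compares its sample mean and variance with \sigma\sqrt{\pi/2} and (2 - \pi/2)\sigma^2 (the value \sigma = 2 is an arbitrary choice for the demo):

```python
import numpy as np

sigma = 2.0
rng = np.random.default_rng(4)
x1 = rng.normal(0.0, sigma, size=1_000_000)
x2 = rng.normal(0.0, sigma, size=1_000_000)
r = np.sqrt(x1**2 + x2**2)  # should be Rayleigh with parameter sigma

print(r.mean(), sigma * np.sqrt(np.pi / 2))   # ~2.5066
print(r.var(), (2 - np.pi / 2) * sigma**2)    # ~1.7168
```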

Simulation of Random Variables

• In many fields of science and engineering, computer simulation is used to study random phenomena in nature and the performance of an engineering system in a noisy environment. For example, we may study through computer simulation the performance of a communication receiver. Sometimes a probability model may not be analytically tractable, and computer simulation is used to calculate probabilities.
• The heart of all these applications is that it is possible to simulate a random variable with an empirical CDF or pdf that fits well with the theoretical CDF or pdf.

Generation of random numbers

Generation of random numbers means producing a sequence of independent random numbers with a specified CDF or pdf. All random number generators rely on a routine to generate random numbers with the uniform pdf. Such a routine is of vital importance, because the quality of the generated random numbers with any other distribution depends on it; by the quality of the generated random numbers we mean how closely the empirical CDF or pdf fits the true one.

There are several algorithms to generate U[0, 1] random numbers; we will not discuss these algorithms here. Note that these algorithms generate random numbers by a reproducible deterministic method. These numbers are pseudo-random numbers, because they are reproducible and the same sequence of numbers repeats after some period of count specific to the generating algorithm. This period is very high, and a finite sample of data within the period appears to be uniformly distributed. Software packages provide routines to generate such numbers.

Method of inverse transform

Suppose we want to generate a random variable X with a prescribed distribution function F_X(x). We have observed that the random variable Y defined by Y = F_X(X) is U[0, 1] distributed. Thus, given a U[0, 1] random number Y, the inverse transform X = F_X^{-1}(Y) will have the CDF F_X(x).

The algorithmic steps for the inverse transform method are as follows:

1. Generate a random number from Y ~ U[0, 1]. Call it y.
2. Compute the value x such that F_X(x) = y.
3. Take x to be the random number generated.

Example Suppose we want to generate a random variable with the pdf given by

    f_X(x) = \frac{2}{9}x for 0 \le x \le 3, and 0 otherwise.

The CDF of X is given by

    F_X(x) = 0 for x < 0;   F_X(x) = \frac{x^2}{9} for 0 \le x \le 3;   F_X(x) = 1 for x > 3.

(Fig.: F_X(x) rises as x^2/9 from 0 to 1 over [0, 3].)

Therefore, we generate a random number y from the U[0, 1] distribution and set F_X(x) = y. We have

    \frac{x^2}{9} = y \;\Rightarrow\; x = 3\sqrt{y}.

Example Suppose we want to generate a random variable with the exponential distribution

    f_X(x) = \lambda e^{-\lambda x},   \lambda > 0,\; x \ge 0.

Then F_X(x) = 1 - e^{-\lambda x}. Therefore, given y, we can get x by the mapping

    1 - e^{-\lambda x} = y \;\Rightarrow\; x = -\frac{\log_e(1 - y)}{\lambda}.

Since 1 - y is also uniformly distributed over [0, 1], the above expression can be simplified to

    x = -\frac{\log_e y}{\lambda}.
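Both inverse-transform examples fit in a few lines of code; the sketch below draws uniform numbers once and maps them through the two inverse CDFs derived above:

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.uniform(0.0, 1.0, size=1_000_000)

# pdf f(x) = (2/9)x on [0, 3]: F(x) = x^2/9, so x = 3*sqrt(u)
x1 = 3.0 * np.sqrt(u)
print(x1.mean())  # theoretical mean: integral of x*(2x/9) over [0, 3] = 2

# exponential with rate lam: x = -ln(1 - u)/lam
lam = 0.5
x2 = -np.log(1.0 - u) / lam
print(x2.mean())  # theoretical mean: 1/lam = 2
```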

Generation of Gaussian random numbers

Generation of discrete random variables

We observed that the CDF of a discrete random variable is also U[0, 1] distributed. Suppose X is a discrete random variable with the probability mass function p_X(x_i), i = 1, 2, \ldots, n. Given y = F_X(x), the inverse mapping is defined as shown in the figure below: if F_X(x_{k-1}) \le y < F_X(x_k), then F_X^{-1}(y) = x_k.

The algorithmic steps for the inverse transform method for simulating discrete random variables are as follows:

1. Generate a random number from Y ~ U[0, 1]. Call it y.
2. Compute the value x_k such that F_X(x_{k-1}) \le y < F_X(x_k).
3. Take x_k to be the random number generated.

Example (generation of Bernoulli random numbers) Suppose we want to generate a random number from X ~ Br(p).

1. Generate y from the U[0, 1] distribution.
2. Set x = 0 for y \le 1 - p, and x = 1 otherwise.

(Fig.: the interval [0, 1-p] of y maps to x = 0 and (1-p, 1] maps to x = 1.)

Jointly distributed random variables

We may define two or more random variables on the same sample space. Let X and Y be two real random variables defined on the same probability space (S, F, P). The mapping S \to \mathbb{R}^2 such that, for s \in S, (X(s), Y(s)) \in \mathbb{R}^2, is called a joint random variable.

(Figure: Joint random variable, mapping each sample point s \in S to the point (X(s), Y(s)) in the plane.)

Remark

• The above figure illustrates the mapping corresponding to a joint random variable. The joint random variable in the above case is denoted by (X, Y).
• We may represent a joint random variable as a two-dimensional vector X = [X\ Y]'.
• We can extend the above definition to define joint random variables of any dimension. The mapping S \to \mathbb{R}^n such that, for s \in S, (X_1(s), X_2(s), \ldots, X_n(s)) \in \mathbb{R}^n, is called an n-dimensional random variable and is denoted by the vector X = [X_1\ X_2 \ldots X_n]'.

Example 1: Suppose we are interested in studying the height and weight of the students in a class. We can define the joint RV (X, Y), where X represents height and Y represents weight.

Example 2: Suppose in a communication system X is the transmitted signal and Y is the corresponding noisy received signal. Then (X, Y) is a joint random variable.

Joint Probability Distribution Function

Recall the definition of the distribution of a single random variable. The event \{X \le x\} was used to define the probability distribution function F_X(x); given F_X(x), we can find the probability of any event involving the random variable. Similarly, for two random variables X and Y, the event \{X \le x, Y \le y\} = \{X \le x\} \cap \{Y \le y\} is considered as the representative event.

The probability P\{X \le x, Y \le y\}, (x, y) \in \mathbb{R}^2, is called the joint distribution function of the random variables X and Y and is denoted by F_{X,Y}(x, y).

F_{X,Y}(x, y) satisfies the following properties:

• F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2) if x_1 \le x_2 and y_1 \le y_2, because

    \{X \le x_1, Y \le y_1\} \subseteq \{X \le x_2, Y \le y_2\}
    \therefore P\{X \le x_1, Y \le y_1\} \le P\{X \le x_2, Y \le y_2\}
    \therefore F_{X,Y}(x_1, y_1) \le F_{X,Y}(x_2, y_2).

• F_{X,Y}(-\infty, y) = F_{X,Y}(x, -\infty) = 0. (Note that \{X \le -\infty, Y \le y\} \subseteq \{X \le -\infty\}.)
• F_{X,Y}(\infty, \infty) = 1.
• F_{X,Y}(x, y) is right continuous in both variables.
• If x_1 < x_2 and y_1 < y_2,

    P\{x_1 < X \le x_2,\; y_1 < Y \le y_2\} = F_{X,Y}(x_2, y_2) - F_{X,Y}(x_1, y_2) - F_{X,Y}(x_2, y_1) + F_{X,Y}(x_1, y_1) \ge 0.

Given F_{X,Y}(x, y), -\infty < x < \infty, -\infty < y < \infty, we have a complete description of the random variables X and Y.

• F_X(x) = F_{X,Y}(x, +\infty). To prove this:

    (X \le x) = (X \le x) \cap (Y \le +\infty)
    \therefore F_X(x) = P(X \le x) = P(X \le x, Y \le \infty) = F_{X,Y}(x, +\infty).

  Similarly, F_Y(y) = F_{X,Y}(\infty, y).
• Each of F_X(x) and F_Y(y), obtained in this way from F_{X,Y}(x, y), is called a marginal distribution function.

Example Consider two jointly distributed random variables X and Y with the joint CDF

    F_{X,Y}(x, y) = (1 - e^{-2x})(1 - e^{-y}) for x \ge 0, y \ge 0, and 0 otherwise.

(a) Find the marginal CDFs.
(b) Find the probability P\{1 < X \le 2,\; 1 < Y \le 2\}.

(a)

    F_X(x) = \lim_{y\to\infty} F_{X,Y}(x, y) = 1 - e^{-2x} for x \ge 0, and 0 elsewhere.
    F_Y(y) = \lim_{x\to\infty} F_{X,Y}(x, y) = 1 - e^{-y} for y \ge 0, and 0 elsewhere.

(b)

    P\{1 < X \le 2,\; 1 < Y \le 2\} = F_{X,Y}(2, 2) + F_{X,Y}(1, 1) - F_{X,Y}(1, 2) - F_{X,Y}(2, 1)
        = (1 - e^{-4})(1 - e^{-2}) + (1 - e^{-2})(1 - e^{-1}) - (1 - e^{-2})(1 - e^{-2}) - (1 - e^{-4})(1 - e^{-1})
        = 0.0272.

Jointly distributed discrete random variables

If X and Y are two discrete random variables defined on the same probability space (S, F, P), such that X takes values from the countable subset R_X and Y takes values from the countable subset R_Y, then the joint random variable (X, Y) can take values from the countable subset R_X \times R_Y. The joint random variable (X, Y) is completely specified by the joint probability mass function

    p_{X,Y}(x, y) = P\{s \mid X(s) = x,\; Y(s) = y\},   (x, y) \in R_X \times R_Y.

Given p_{X,Y}(x, y), we can determine other probabilities involving the random variables X and Y.

Remark

• p_{X,Y}(x, y) = 0 for (x, y) \notin R_X \times R_Y.
• \sum_{(x,y) \in R_X \times R_Y} p_{X,Y}(x, y) = 1, because the union of the events \{x, y\} over R_X \times R_Y is the whole sample space S, and P(S) = 1.

Marginal probability mass functions: the probability mass functions p_X(x) and p_Y(y) are obtained from the joint probability mass function as

    p_X(x) = P\{X = x\} = \sum_{y \in R_Y} p_{X,Y}(x, y),

and similarly

    p_Y(y) = \sum_{x \in R_X} p_{X,Y}(x, y).

These probability mass functions p_X(x) and p_Y(y), obtained from the joint probability mass function, are called marginal probability mass functions.

Example Consider the random variables X and Y with the joint probability mass function as tabulated below. The marginal probabilities are shown in the last column and the last row.

               X = 0    X = 1    X = 2    p_Y(y)
    Y = 0       0.25     0.10     0.15     0.5
    Y = 1       0.14     0.35     0.01     0.5
    p_X(x)      0.39     0.45     0.16     1

Joint Probability Density Function

If X and Y are two continuous random variables and their joint distribution function is continuous in both x and y, then we can define the joint probability density function f_{X,Y}(x, y) by

    f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y),   provided it exists.

Clearly

    F_{X,Y}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{X,Y}(u, v)\,dv\,du.

Properties of the joint probability density function

• f_{X,Y}(x, y) is always a nonnegative quantity; that is, f_{X,Y}(x, y) \ge 0 for all (x, y) \in \mathbb{R}^2.
• \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1.
• The probability of any Borel set B can be obtained by

    P(B) = \iint_{(x,y) \in B} f_{X,Y}(x, y)\,dx\,dy.

Marginal density functions

The marginal density functions f_X(x) and f_Y(y) of two joint RVs X and Y are given by the derivatives of the corresponding marginal distribution functions. Thus

    f_X(x) = \frac{d}{dx} F_X(x) = \frac{d}{dx} F_{X,Y}(x, \infty)
           = \frac{d}{dx}\int_{-\infty}^{x}\left(\int_{-\infty}^{\infty} f_{X,Y}(u, y)\,dy\right)du
           = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy,

and similarly

    f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx.

Remark

• The marginal CDF and pdf are the same as the CDF and pdf of the concerned single random variable. The term 'marginal' simply refers to the fact that it is derived from the corresponding joint distribution or density function of two or more jointly distributed random variables.

• With the help of the two-dimensional Dirac delta function, we can define the joint pdf of two discrete jointly distributed random variables. Thus for discrete jointly distributed random variables X and Y,

    f_{X,Y}(x, y) = \sum_{(x_i, y_j) \in R_X \times R_Y} p_{X,Y}(x_i, y_j)\,\delta(x - x_i,\, y - y_j).

Example The joint density function of X and Y in the previous example is

    f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y) = \frac{\partial^2}{\partial x\,\partial y}\left[(1 - e^{-2x})(1 - e^{-y})\right] = 2e^{-2x}e^{-y},   x \ge 0,\; y \ge 0.

Example The joint pdf of two random variables X and Y is given by

    f_{X,Y}(x, y) = cxy for 0 \le x \le 2, 0 \le y \le 2, and 0 otherwise.

(i) Find c.
(ii) Find F_{X,Y}(x, y).
(iii) Find f_X(x) and f_Y(y).
(iv) What is the probability P(0 < X \le 1,\; 0 < Y \le 1)?

(i) Since \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy\,dx = c\int_0^2\int_0^2 xy\,dy\,dx = 4c = 1,

    \therefore c = \frac{1}{4}.

(ii) F_{X,Y}(x, y) = \frac{1}{4}\int_0^y\int_0^x uv\,du\,dv = \frac{x^2 y^2}{16},   0 \le x \le 2,\; 0 \le y \le 2.

(iii) f_X(x) = \int_0^2 \frac{xy}{4}\,dy = \frac{x}{2},   0 \le x \le 2;   similarly f_Y(y) = \frac{y}{2},   0 \le y \le 2.

(iv)

    P(0 < X \le 1,\; 0 < Y \le 1) = F_{X,Y}(1, 1) + F_{X,Y}(0, 0) - F_{X,Y}(0, 1) - F_{X,Y}(1, 0)
                                  = \frac{1}{16} + 0 - 0 - 0 = \frac{1}{16}.
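The normalisation constant and the probability in this example are simple double integrals, so they can be verified symbolically; a minimal sketch assuming sympy:

```python
import sympy as sp

x, y = sp.symbols('x y', nonnegative=True)
f = sp.Rational(1, 4) * x * y   # joint pdf on [0, 2] x [0, 2]

total = sp.integrate(f, (x, 0, 2), (y, 0, 2))
prob = sp.integrate(f, (x, 0, 1), (y, 0, 1))
fX = sp.integrate(f, (y, 0, 2))

print(total)  # 1
print(prob)   # 1/16
print(fX)     # x/2
```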


Conditional probability mass functions

The conditional probability mass function of Y given X = x is defined as

    p_{Y/X}(y/x) = P(\{Y = y\} \mid \{X = x\}) = \frac{p_{X,Y}(x, y)}{p_X(x)},   provided p_X(x) \ne 0.

Similarly we can define the conditional probability mass function

    p_{X/Y}(x/y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}.

• From the definition of conditional probability mass functions, we can define two independent random variables: two discrete random variables X and Y are said to be independent if and only if p_{Y/X}(y/x) = p_Y(y), so that

    p_{X,Y}(x, y) = p_X(x)\, p_Y(y).

• Bayes' rule:

    p_{X/Y}(x/y) = \frac{p_{X,Y}(x, y)}{p_Y(y)} = \frac{p_{X,Y}(x, y)}{\sum_{x \in R_X} p_{X,Y}(x, y)}.

Example Consider the random variables X and Y with the joint probability mass function as tabulated below; the marginal probabilities are shown in the last column and the last row.

               X = 0    X = 1    X = 2    p_Y(y)
    Y = 0       0.25     0.10     0.15     0.5
    Y = 1       0.14     0.35     0.01     0.5
    p_X(x)      0.39     0.45     0.16     1

For instance,

    p_{Y/X}(1/0) = \frac{p_{X,Y}(0, 1)}{p_X(0)} = \frac{0.14}{0.39}.

Conditional Probability Density Function

f_{Y/X}(y/X = x), written f_{Y/X}(y/x), is called the conditional density of Y given X. We cannot define the conditional distribution function for continuous random variables X and Y by the relation

    F_{Y/X}(y/x) = P(Y \le y \mid X = x) = \frac{P(Y \le y,\; X = x)}{P(X = x)},

as both the numerator and the denominator are zero in the above expression. The conditional distribution function is instead defined in the limiting sense as follows:

    F_{Y/X}(y/x) = \lim_{\Delta x \to 0} P(Y \le y \mid x < X \le x + \Delta x)
                 = \lim_{\Delta x \to 0} \frac{P(Y \le y,\; x < X \le x + \Delta x)}{P(x < X \le x + \Delta x)}
                 = \lim_{\Delta x \to 0} \frac{\int_{-\infty}^{y} f_{X,Y}(x, u)\,\Delta x\,du}{f_X(x)\,\Delta x}
                 = \frac{\int_{-\infty}^{y} f_{X,Y}(x, u)\,du}{f_X(x)}.

The conditional density is similarly defined in the limiting sense:

    f_{Y/X}(y/x) = \lim_{\Delta y \to 0} \frac{F_{Y/X}(y + \Delta y \mid X = x) - F_{Y/X}(y \mid X = x)}{\Delta y}
                 = \lim_{\Delta y \to 0,\,\Delta x \to 0} \frac{F_{Y/X}(y + \Delta y \mid x < X \le x + \Delta x) - F_{Y/X}(y \mid x < X \le x + \Delta x)}{\Delta y},     (1)

because (X = x) = \lim_{\Delta x \to 0} (x < X \le x + \Delta x).

The right-hand side of equation (1) is

    \lim_{\Delta y \to 0,\,\Delta x \to 0} \frac{P(y < Y \le y + \Delta y,\; x < X \le x + \Delta x)}{P(x < X \le x + \Delta x)\,\Delta y}
        = \lim_{\Delta y \to 0,\,\Delta x \to 0} \frac{f_{X,Y}(x, y)\,\Delta x\,\Delta y}{f_X(x)\,\Delta x\,\Delta y}
        = \frac{f_{X,Y}(x, y)}{f_X(x)}.

    \therefore f_{Y/X}(y/x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.     (2)

Similarly,

    f_{X/Y}(x/y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.     (3)

• Two random variables are statistically independent if for all (x, y) \in \mathbb{R}^2

    f_{Y/X}(y/x) = f_Y(y),   or equivalently   f_{X,Y}(x, y) = f_X(x)\, f_Y(y).     (4)

• Bayes' rule for continuous random variables: from (2) and (3) we get Bayes' rule:

  f_X/Y(x/y) = f_X,Y(x, y)/f_Y(y)
             = f_X,Y(x, y) / ∫_−∞^∞ f_X,Y(x, y) dx
             = f_X(x) f_Y/X(y/x) / ∫_−∞^∞ f_X(u) f_Y/X(y/u) du.   (5)

Given the joint density function, we can thus find the conditional density function. In a practical problem we may have to estimate X from an observed value of Y; the above relation is the basis of such estimation.

Example: For random variables X and Y, the joint probability density function is given by
  f_X,Y(x, y) = (1 + xy)/4,  |x| ≤ 1, |y| ≤ 1,
             = 0 otherwise.
Find the marginal densities f_X(x), f_Y(y) and the conditional density f_Y/X(y/x). Are X and Y independent?

  f_X(x) = ∫_−1^1 (1 + xy)/4 dy = 1/2, −1 ≤ x ≤ 1.
Similarly, f_Y(y) = 1/2, −1 ≤ y ≤ 1, and
  f_Y/X(y/x) = f_X,Y(x, y)/f_X(x) = (1 + xy)/2, −1 ≤ y ≤ 1.
Since f_Y/X(y/x) ≠ f_Y(y), the random variables X and Y are not independent.

Bayes rule for mixed random variables

Let X be a discrete random variable with probability mass function p_X(x) and Y be a continuous random variable with the conditional probability density function f_Y/X(y/x). Then
  p_X/Y(x/y) = lim_{Δy→0} P(X = x / y < Y ≤ y + Δy)
             = lim_{Δy→0} P(X = x, y < Y ≤ y + Δy)/P(y < Y ≤ y + Δy)
             = lim_{Δy→0} p_X(x) f_Y/X(y/x) Δy / (f_Y(y) Δy)
             = p_X(x) f_Y/X(y/x)/f_Y(y)
             = p_X(x) f_Y/X(y/x) / Σ_x p_X(x) f_Y/X(y/x).

Example: Y = X + V, where X is a binary random variable with
  X = 1 with probability p,
    = −1 with probability 1 − p,
and V is Gaussian noise with mean 0 and variance σ². Then
  p_X/Y(x = 1/y) = p e^(−(y−1)²/2σ²) / (p e^(−(y−1)²/2σ²) + (1 − p) e^(−(y+1)²/2σ²)).
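A small numerical sketch of this posterior (my own illustration; the function name and chosen parameter values are assumptions):

import numpy as np

def posterior_x1(y, p=0.5, sigma=1.0):
    # Likelihoods f_{Y/X}(y/±1), each up to the same Gaussian constant,
    # which cancels in the ratio.
    lik_p1 = np.exp(-(y - 1.0) ** 2 / (2 * sigma ** 2))
    lik_m1 = np.exp(-(y + 1.0) ** 2 / (2 * sigma ** 2))
    return p * lik_p1 / (p * lik_p1 + (1 - p) * lik_m1)

print(posterior_x1(0.0))   # 0.5: y = 0 carries no information when p = 0.5
print(posterior_x1(2.0))   # close to 1: strongly suggests X = +1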

Independent random variables

Let X and Y be two random variables characterised by the joint distribution function
  F_X,Y(x, y) = P{X ≤ x, Y ≤ y}
and the corresponding joint density function f_X,Y(x, y) = ∂²F_X,Y(x, y)/∂x∂y.

X and Y are independent if the events {X ≤ x} and {Y ≤ y} are independent for all (x, y) ∈ ℝ². Thus
  F_X,Y(x, y) = P{X ≤ x, Y ≤ y} = P{X ≤ x} P{Y ≤ y} = F_X(x) F_Y(y),
  ∴ f_X,Y(x, y) = ∂²F_X,Y(x, y)/∂x∂y = f_X(x) f_Y(y),
and equivalently f_Y/X(y/x) = f_Y(y).

Remark: Suppose X and Y are two discrete random variables with joint probability mass function p_X,Y(x, y). Then X and Y are independent if p_X,Y(x, y) = p_X(x) p_Y(y) for all (x, y) ∈ R_X × R_Y.

Transformation of two random variables

We are often interested in finding the probability density function of a function of two or more random variables. Following are a few examples.
• The signal received by a communication receiver is given by Z = X + Y, where Z is the received signal, the superposition of the message signal X and the noise Y.
• Frequently applied operations on communication signals, like modulation, demodulation and correlation, involve the multiplication of two signals in the form Z = XY.
We have to know the probability distribution of Z in any analysis of Z. More formally, given two random variables X and Y with joint probability density function f_X,Y(x, y) and a function Z = g(X, Y), we have to find f_Z(z). In this lecture, we address this problem.

Probability density of a function of two random variables: We consider the transformation g: ℝ² → ℝ. For each z, the event {Z ≤ z} corresponds to a region D_z ⊆ ℝ² such that
  D_z = {(x, y) | g(x, y) ≤ z}.

∴ F_Z(z) = P({Z ≤ z}) = P{(x, y) ∈ D_z} = ∬_{D_z} f_X,Y(x, y) dy dx,
and
  f_Z(z) = dF_Z(z)/dz.

Example: Suppose Z = X + Y. Find the pdf f_Z(z).
Here Z ≤ z ⇒ X + Y ≤ z, so D_z is the half-plane below the line x + y = z (the shaded region in the figure).
  F_Z(z) = ∫_−∞^∞ [∫_−∞^{z−x} f_X,Y(x, y) dy] dx.
Substituting y = u − x and interchanging the order of integration,
  F_Z(z) = ∫_−∞^z [∫_−∞^∞ f_X,Y(x, u − x) dx] du,
  ∴ f_Z(z) = d/dz ∫_−∞^z [∫_−∞^∞ f_X,Y(x, u − x) dx] du = ∫_−∞^∞ f_X,Y(x, z − x) dx.
If X and Y are independent, f_X,Y(x, z − x) = f_X(x) f_Y(z − x), so that
  f_Z(z) = ∫_−∞^∞ f_X(x) f_Y(z − x) dx = f_X(z) * f_Y(z),
where * is the convolution operation.
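A quick Monte Carlo check of this convolution result (my own sketch, anticipating the uniform-sum example that follows): for two independent Uniform(0,1) variables, the convolution of the two rectangular pdfs is the triangular pdf on (0, 2).

import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(0, 1, 200_000) + rng.uniform(0, 1, 200_000)

hist, edges = np.histogram(z, bins=40, range=(0, 2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
triangle = np.where(centers < 1, centers, 2 - centers)   # f_Z(z) by convolution
print(np.max(np.abs(hist - triangle)))                   # small (~1e-2) deviation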

Y ( x. x ⎟ dudx x −∞ ⎝ ⎠ ∞ Substituting u = xy du = xdy . y ) dydx = −∞ −∞ ∫ (∫ z x f X . y )∈Dz ∞ ∫∫ f X .Y ⎜ x. Probability density function of Z = XY FZ ( z ) = ( x . b). f X ( x ) and fY ( y ) are as shown in the figure below. The PDF of z is a triangular probability density function as shown in the figure. y ) dy )dx ∞ = −∞ ∫ 1 ⎛ u⎞ ∫ f X .Example: Suppose X and Y are independent random variables and each uniformly distributed over (a.Y ( x.

  ∴ f_Z(z) = dF_Z(z)/dz = ∫_−∞^∞ (1/|x|) f_X,Y(x, z/x) dx.

Probability density function of Z = Y/X
Here Z ≤ z ⇒ Y/X ≤ z, which means Y ≤ xz for x > 0 and Y ≥ xz for x < 0. Thus
  F_Z(z) = ∫_0^∞ ∫_−∞^{xz} f_X,Y(x, y) dy dx + ∫_−∞^0 ∫_{xz}^∞ f_X,Y(x, y) dy dx,
  ∴ f_Z(z) = dF_Z(z)/dz = ∫_−∞^∞ |x| f_X,Y(x, xz) dx.
If X and Y are independent random variables, f_X,Y(x, xz) = f_X(x) f_Y(xz), so that
  f_Z(z) = ∫_−∞^∞ |x| f_X(x) f_Y(xz) dx.

Example: Suppose X and Y are independent zero-mean Gaussian random variables with unity standard deviation, and Z = Y/X. Then
  f_Z(z) = ∫_−∞^∞ |x| (1/√(2π)) e^(−x²/2) (1/√(2π)) e^(−x²z²/2) dx
         = (1/2π) ∫_−∞^∞ |x| e^(−x²(1+z²)/2) dx
         = (1/π) ∫_0^∞ x e^(−x²(1+z²)/2) dx
         = 1/(π(1 + z²)),
which is the Cauchy probability density function.

Probability density function of Z = √(X² + Y²)
  D_z = {(x, y) | √(x² + y²) ≤ z} = {(r, θ) | 0 ≤ r ≤ z, 0 ≤ θ ≤ 2π}.
  ∴ F_Z(z) = ∬_{D_z} f_X,Y(x, y) dy dx = ∫_0^{2π} ∫_0^z f_X,Y(r cos θ, r sin θ) r dr dθ,
  ∴ f_Z(z) = dF_Z(z)/dz = ∫_0^{2π} f_X,Y(z cos θ, z sin θ) z dθ.
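The ratio-of-Gaussians result above is easy to verify by simulation (my own sketch): the empirical CDF of Y/X should match the Cauchy CDF F(z) = 1/2 + arctan(z)/π.

import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_normal(500_000) / rng.standard_normal(500_000)

for t in (-2.0, 0.0, 1.0):
    emp = np.mean(z <= t)                 # empirical CDF at t
    exact = 0.5 + np.arctan(t) / np.pi    # Cauchy CDF at t
    print(t, emp, exact)                  # the two columns agree to ~1e-3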

Example: Suppose X and Y are two independent Gaussian random variables, each with mean 0 and variance σ², and Z = √(X² + Y²). Then
  f_Z(z) = ∫_0^{2π} f_X,Y(z cos θ, z sin θ) z dθ
         = z ∫_0^{2π} f_X(z cos θ) f_Y(z sin θ) dθ
         = z ∫_0^{2π} (1/2πσ²) e^(−z²cos²θ/2σ²) e^(−z²sin²θ/2σ²) dθ
         = (z/σ²) e^(−z²/2σ²), z ≥ 0,
which is the Rayleigh density function we discussed earlier.

Rician distribution: Suppose X and Y are independent Gaussian random variables with nonzero means μ_X and μ_Y respectively and constant variance σ². This situation arises, for example, with
• the envelope of a sinusoid plus narrowband Gaussian noise, and
• the received signal in a multipath situation.
We have to find the density function of the random variable Z = √(X² + Y²). Here
  f_X,Y(x, y) = (1/2πσ²) e^(−[(x−μ_X)² + (y−μ_Y)²]/2σ²).
We have shown that f_Z(z) = ∫_0^{2π} f_X,Y(z cos θ, z sin θ) z dθ. Suppose μ_X = μ cos φ and μ_Y = μ sin φ. Then

  f_X,Y(z cos θ, z sin θ) = (1/2πσ²) e^(−[(z cos θ − μ cos φ)² + (z sin θ − μ sin φ)²]/2σ²)
                          = (1/2πσ²) e^(−(z² + μ²)/2σ²) e^(zμ cos(θ−φ)/σ²),
  ∴ f_Z(z) = ∫_0^{2π} (1/2πσ²) e^(−(z² + μ²)/2σ²) e^(zμ cos(θ−φ)/σ²) z dθ
           = (z/2πσ²) e^(−(z² + μ²)/2σ²) ∫_0^{2π} e^(zμ cos(θ−φ)/σ²) dθ
           = (z/σ²) e^(−(z² + μ²)/2σ²) I₀(zμ/σ²), z ≥ 0,
where I₀(·) is the zeroth-order modified Bessel function of the first kind, using (1/2π) ∫_0^{2π} e^(a cos(θ−φ)) dθ = I₀(a). This is the Rician density function.
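A sketch checking the Rician density against simulation, assuming the I₀ closure above (parameter values are my own choices):

import numpy as np
from scipy.special import i0   # modified Bessel function of order zero

def rician_pdf(z, mu, sigma):
    return (z / sigma**2) * np.exp(-(z**2 + mu**2) / (2 * sigma**2)) \
           * i0(z * mu / sigma**2)

rng = np.random.default_rng(2)
mu_x, mu_y, sigma = 1.0, 1.0, 0.7
z = np.hypot(mu_x + sigma * rng.standard_normal(300_000),
             mu_y + sigma * rng.standard_normal(300_000))

hist, edges = np.histogram(z, bins=50, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - rician_pdf(centers, np.hypot(mu_x, mu_y), sigma))))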


Joint probability density function of two functions of two random variables

We consider the transformation (g₁, g₂): ℝ² → ℝ², with Z₁ = g₁(X, Y) and Z₂ = g₂(X, Y). We have to find the joint probability density function f_Z₁,Z₂(z₁, z₂). Suppose the inverse mapping relation is
  x = h₁(z₁, z₂), y = h₂(z₁, z₂).
Consider a differential region of area dz₁dz₂ at the point (z₁, z₂) in the Z₁−Z₂ plane, and let us see how the corners of this region are mapped to the X−Y plane. Observe that
  h₁(z₁ + dz₁, z₂) = h₁(z₁, z₂) + (∂h₁/∂z₁) dz₁ = x + (∂h₁/∂z₁) dz₁,
  h₂(z₁ + dz₁, z₂) = h₂(z₁, z₂) + (∂h₂/∂z₁) dz₁ = y + (∂h₂/∂z₁) dz₁.
Therefore, the point (z₁ + dz₁, z₂) is mapped to (x + (∂h₁/∂z₁) dz₁, y + (∂h₂/∂z₁) dz₁) in the X−Y plane. We can similarly find the points in the X−Y plane corresponding to (z₁, z₂ + dz₂) and (z₁ + dz₁, z₂ + dz₂); the mapping is shown in the figure. We notice that each differential region in the X−Y plane is a parallelogram. It can be shown that the differential parallelogram at (x, y) has area |J(z₁, z₂)| dz₁dz₂, where J(z₁, z₂) is the Jacobian of the transformation, defined as the determinant

. Z2 ( z1 . y ). z2 ) = 1 δ h2 δ z1 δ h1 δ z2 δ h2 δ z2 Further. the differential parallelogram in Fig. has an area of Suppose the transformation z1 = g1 ( x. y ) does not have a root in ( x.n be the roots.Y ( x. y ) has n roots and let ( xi . y ) and z2 = g 2 ( x. Z2 ( z1 . 2. The inverse mapping of the differential region in the X − Y plane will be n differential regions corresponding to n roots. y ) J ( xi . i = 1.. z2 ) = 0. J ( x. yi ) ∴ f Z1 . y ) = δ g2 δx δ g1 δy δ g2 δy dz1 dz2 . y ) and z2 = g 2 ( x. yi ) Remark • If z1 = g1 ( x. The inverse mapping is illustrated in the following figure for n = 4. it can be shown that the absolute values of the Jacobians of the forward and the inverse transform are inverse of each other so that 1 J ( z1 . then f Z1 . z2 ) = ∑ i =1 n f X .δ h1 δz J ( z1 . z2 ) dz1 dz2 = ∑ f X .Y ( x. z2 ) = J ( x. . y ) where δ g1 δx J ( x. y ) Therefore. y ) i =1 n dz1 dz2 J ( xi . yi ). Z 2 ( z1 . As these parallelograms are nonoverlapping. f Z1 .

[Figure: the differential rectangle with corners (z₁, z₂), (z₁ + dz₁, z₂), (z₁, z₂ + dz₂), (z₁ + dz₁, z₂ + dz₂) in the Z₁−Z₂ plane, and its image parallelogram at (x, y) in the X−Y plane.]

Example: pdf of a linear transformation
  Z₁ = aX + bY, Z₂ = cX + dY.
Then

  x = (d z₁ − b z₂)/(ad − bc), y = (a z₂ − c z₁)/(ad − bc),
  J(x, y) = det | a  b |
                | c  d | = ad − bc,
  ∴ f_Z₁,Z₂(z₁, z₂) = f_X,Y((d z₁ − b z₂)/(ad − bc), (a z₂ − c z₁)/(ad − bc)) / |ad − bc|.

Example: Suppose X and Y are two independent Gaussian random variables, each with mean 0 and variance σ². Given R = √(X² + Y²) and Θ = tan⁻¹(Y/X), find f_R(r) and f_Θ(θ).
Solution: We have x = r cos θ and y = r sin θ, so that
  r² = x² + y²  ......... (1)  and  tan θ = y/x  ......... (2)
From (1), ∂r/∂x = x/r = cos θ and ∂r/∂y = y/r = sin θ.
From (2), ∂θ/∂x = −y/(x² + y²) = −sin θ/r and ∂θ/∂y = x/(x² + y²) = cos θ/r.
  ∴ J = det |  cos θ     sin θ   |
            | −sin θ/r   cos θ/r | = 1/r,
  ∴ f_R,Θ(r, θ) = f_X,Y(x, y)/|J| at x = r cos θ, y = r sin θ
               = r (1/2πσ²) e^(−r²cos²θ/2σ²) e^(−r²sin²θ/2σ²)
               = (r/2πσ²) e^(−r²/2σ²).

  ∴ f_R(r) = ∫_0^{2π} f_R,Θ(r, θ) dθ = (r/σ²) e^(−r²/2σ²), 0 ≤ r < ∞,
  f_Θ(θ) = ∫_0^∞ f_R,Θ(r, θ) dr = (1/2πσ²) ∫_0^∞ r e^(−r²/2σ²) dr = 1/2π, 0 ≤ θ ≤ 2π.

Rician distribution (via the transformation method):
• X and Y are independent Gaussian random variables with nonzero means μ_X and μ_Y respectively and constant variance σ² (envelope of a sinusoid plus narrowband Gaussian noise; received signal in a multipath situation). We have to find the density function of Z = √(X² + Y²). Consider the transformation
  Z = √(X² + Y²), Φ = tan⁻¹(Y/X).
We have to find J(x, y) corresponding to z and φ. From z² = x² + y² and tan φ = y/x, we have

  ∂z/∂x = x/z = cos φ, ∂z/∂y = y/z = sin φ,
and also
  ∂φ/∂x = −y/(x² sec²φ) = −(y/x²) cos²φ, ∂φ/∂y = (1/x) cos²φ.
  ∴ J(x, y) = det |      cos φ            sin φ      |
                  | −(y/x²) cos²φ    (1/x) cos²φ     |
            = cos³φ/x + (y sin φ cos²φ)/x²
            = cos²φ (x cos φ + y sin φ)/x²
            = z cos²φ/x² = 1/z,
since x = z cos φ and x cos φ + y sin φ = z. The transformation is shown in the diagram below.

  f_X,Y(x, y) = (1/2πσ²) e^(−[(x−μ_X)² + (y−μ_Y)²]/2σ²).
We have to find the density at the point (x, y) corresponding to z and φ. From the figure, with μ_X = μ cos φ₀ and μ_Y = μ sin φ₀,
  x − μ_X = z cos φ − μ cos φ₀ and y − μ_Y = z sin φ − μ sin φ₀,
  ∴ f_X,Y(x, y) = (1/2πσ²) e^(−[(z cos φ − μ cos φ₀)² + (z sin φ − μ sin φ₀)²]/2σ²)
               = (1/2πσ²) e^(−(z² − 2zμ cos(φ−φ₀) + μ²)/2σ²).

Expected Values of Functions of Random Variables

Recall that
• if Y = g(X) is a function of a continuous random variable X, then EY = Eg(X) = ∫_−∞^∞ g(x) f_X(x) dx;
• if Y = g(X) is a function of a discrete random variable X, then EY = Eg(X) = Σ_{x∈R_X} g(x) p_X(x).

Suppose Z = g(X, Y) is a function of continuous random variables X and Y. Then the expected value of Z is given by
  EZ = Eg(X, Y) = ∫_−∞^∞ ∫_−∞^∞ g(x, y) f_X,Y(x, y) dx dy.
Thus, EZ can be computed without explicitly determining f_Z(z). We can establish this result as follows. Suppose Z = g(X, Y) has n roots (x_i, y_i), i = 1, 2, ..., n, at Z = z. Then
  {z < Z ≤ z + Δz} = ∪_{i=1}^n {(x_i, y_i) ∈ ΔD_i},
where ΔD_i is the differential region containing (x_i, y_i). Hence
  P({z < Z ≤ z + Δz}) = f_Z(z) Δz = Σ_{(x_i,y_i)∈ΔD_i} f_X,Y(x_i, y_i) Δx_i Δy_i,
  ∴ z f_Z(z) Δz = Σ_{(x_i,y_i)∈ΔD_i} z f_X,Y(x_i, y_i) Δx_i Δy_i = Σ_{(x_i,y_i)∈ΔD_i} g(x_i, y_i) f_X,Y(x_i, y_i) Δx_i Δy_i.
As z is varied over the entire Z axis, the corresponding (non-overlapping) differential regions in the X−Y plane cover the entire plane,
  ∴ ∫_−∞^∞ z f_Z(z) dz = ∫_−∞^∞ ∫_−∞^∞ g(x, y) f_X,Y(x, y) dx dy.

[Figure: the strip {z < Z ≤ z + Δz} on the Z axis mapped to the three differential regions ΔD₁, ΔD₂, ΔD₃ in the X−Y plane, for n = 3.]

If Z = g(X, Y) is a function of discrete random variables X and Y, we can similarly show that
  EZ = Eg(X, Y) = Σ Σ_{(x,y)∈R_X×R_Y} g(x, y) p_X,Y(x, y).

Example: The joint pdf of two random variables X and Y is given by
  f_X,Y(x, y) = xy/4, 0 ≤ x ≤ 2, 0 ≤ y ≤ 2,
             = 0 otherwise.
Find the joint expectation of g(X, Y) = X²Y.
  Eg(X, Y) = EX²Y = ∫_−∞^∞ ∫_−∞^∞ g(x, y) f_X,Y(x, y) dx dy
           = ∫_0^2 ∫_0^2 x²y (xy/4) dx dy
           = (1/4) ∫_0^2 x³ dx ∫_0^2 y² dy
           = (1/4) × 4 × (8/3)
           = 8/3.
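A one-line numerical confirmation of this joint expectation (my own check):

from scipy.integrate import dblquad

# E[X^2 Y] for f_{X,Y}(x, y) = xy/4 on [0,2] x [0,2]
val, _ = dblquad(lambda y, x: x**2 * y * (x * y / 4), 0, 2, 0, 2)
print(val, 8 / 3)   # both ~2.6667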

Example: Consider the discrete random variables X and Y discussed earlier, with the joint probability mass function tabulated above. Find the joint expectation of g(X, Y) = XY.
  EXY = Σ Σ_{(x,y)∈R_X×R_Y} xy p_X,Y(x, y) = 1×1×0.35 + 2×1×0.01 = 0.37.

Remark
(1) We have earlier shown that expectation is a linear operator. We can generally write
  E[a₁g₁(X, Y) + a₂g₂(X, Y)] = a₁Eg₁(X, Y) + a₂Eg₂(X, Y).
Thus E(XY + 5 logₑ XY) = EXY + 5E logₑ XY.

Example: If Z = aX + bY, where a and b are constants, then EZ = aEX + bEY.
Proof:
  EZ = ∫_−∞^∞ ∫_−∞^∞ (ax + by) f_X,Y(x, y) dx dy
     = ∫_−∞^∞ ax [∫_−∞^∞ f_X,Y(x, y) dy] dx + ∫_−∞^∞ by [∫_−∞^∞ f_X,Y(x, y) dx] dy
     = a ∫_−∞^∞ x f_X(x) dx + b ∫_−∞^∞ y f_Y(y) dy
     = aEX + bEY.
Thus, expectation is a linear operator.

(2) If X and Y are independent random variables and g(X, Y) = g₁(X)g₂(Y), then

Y ( x.Y ( x. y )∈RX .Y if X and Y are continuous if X and Y are discrete . y ) ⎩( x . y ) dxdy E ( XY ) = ⎨ −∞ −∞ ⎪ ∑ ∑ xyp X .Y E ( X − µ X ) m (Y − µY ) n = ( x .Eg ( X .Y ( x. y ) (2) If m = 1 and n = 1 .Y ( x. For two continuous random variables X and Y the joint moment of order m + n is defined as E ( X m Y n ) = ∫ ∫ x m y n f X .Y ( x.Y ( x. Y ) = Eg1 ( X ) g 2 (Y ) = ∫ ∫ g1 ( X ) g 2 (Y ) f X .Y ( x. y )dx −∞ −∞ ∞ ∞ ∞ ∞ = ∫ ∫ g1 ( X ) g 2 (Y ) f X ( x) fY ( y )dxdy −∞ −∞ ∞ = ∫ g1 ( X ) f X ( x)dx ∫ ∫ g 2 (Y ) fY ( y )dxdy −∞ −∞ −∞ ∞ ∞ = Eg1 ( X ) Eg 2 (Y ) Joint Moments of Random Variables Just like the moments of a random variable provides a summary description of the random variable. y )dxdy −∞ −∞ ∞ ∞ where µ X = EX and µY = EY Remark (1) If X and Y are discrete random variables. we have the second-order moment of the random variables X and Y given by ⎧∞ ∞ ⎪ ∫ ∫ xyf X . y )∈RX .Y ∑ m n ∑ ( x − µ X ) ( y − µY ) p X . the joint expectation of order m and n is defined as E ( X mY n ) = ∑ m n ∑ x y p X . y )∈R X . so also the joint moments provide summary description of two random variables. y )dxdy −∞ −∞ ∞ ∞ and the joint central moment of order m + n is defined as E ( X − µ X ) m (Y − µY ) n = ∫ ∫ ( x − µ X ) m ( y − µY ) n f X . y ) ( x .

(3) If X and Y are independent, E(XY) = EX EY.

Covariance of two random variables

The covariance of two random variables X and Y is defined as
  Cov(X, Y) = E(X − μ_X)(Y − μ_Y).
Expanding the right-hand side, we get
  Cov(X, Y) = E(XY − μ_Y X − μ_X Y + μ_X μ_Y) = EXY − μ_Y EX − μ_X EY + μ_X μ_Y = EXY − μ_X μ_Y.
The ratio ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y) is called the correlation coefficient. We will give an interpretation of Cov(X, Y) and ρ(X, Y) later on.

We will show that |ρ(X, Y)| ≤ 1. To establish the relation, we first prove the following result: for two random variables X and Y,
  E²(XY) ≤ EX² EY².
Proof: Consider the random variable Z = aX + Y. Then
  E(aX + Y)² ≥ 0 ⇒ a²EX² + 2aEXY + EY² ≥ 0.
Non-negativity of the left-hand side means its minimum over a must also be non-negative. For the minimum,
  d/da E(aX + Y)² = 0 ⇒ a = −EXY/EX²,
and the corresponding minimum is EY² − E²XY/EX². The minimum being non-negative gives
  EY² − E²XY/EX² ≥ 0 ⇒ E²XY ≤ EX² EY² ⇒ |EXY| ≤ √(EX² EY²).

Now
  ρ(X, Y) = Cov(X, Y)/(σ_X σ_Y) = E(X − μ_X)(Y − μ_Y)/√(E(X − μ_X)² E(Y − μ_Y)²),
and applying the above result to the random variables X − μ_X and Y − μ_Y,
  |E(X − μ_X)(Y − μ_Y)| ≤ √(E(X − μ_X)² E(Y − μ_Y)²),
  ∴ |ρ(X, Y)| ≤ 1.

Uncorrelated random variables

Two random variables X and Y are called uncorrelated if Cov(X, Y) = 0, which also means E(XY) = μ_X μ_Y. Recall that if X and Y are independent random variables, then f_X,Y(x, y) = f_X(x) f_Y(y). Then, assuming X and Y are continuous,
  EXY = ∫_−∞^∞ ∫_−∞^∞ xy f_X(x) f_Y(y) dx dy = ∫_−∞^∞ x f_X(x) dx ∫_−∞^∞ y f_Y(y) dy = EX EY.
Thus two independent random variables are always uncorrelated. The converse is not always true: two random variables may be dependent, yet uncorrelated. We will discuss this point shortly.

Linear prediction of Y from X

If there exists correlation between two random variables, one may be represented as a linear regression on the other:
  Ŷ = aX + b, with prediction error Y − Ŷ.

The mean-square prediction error is
  E(Y − Ŷ)² = E(Y − aX − b)².
Minimising the error gives the optimal values of a and b:
  ∂/∂a E(Y − aX − b)² = 0, ∂/∂b E(Y − aX − b)² = 0.
Solving for a and b, the optimal solution is the regression line
  Ŷ − μ_Y = (σ_X,Y/σ_X²)(x − μ_X) = ρ_X,Y (σ_Y/σ_X)(x − μ_X),
where ρ_X,Y = σ_X,Y/(σ_X σ_Y) is the correlation coefficient.

Remark
• If ρ_X,Y > 0, then X and Y are called positively correlated.
• If ρ_X,Y < 0, then X and Y are called negatively correlated.
• If ρ_X,Y = 0, then X and Y are uncorrelated; then Ŷ − μ_Y = 0, i.e. Ŷ = μ_Y is the best linear prediction.
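A small sketch of the regression line estimated from data via these moment formulas (the simulated model and names are my own assumptions):

import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(100_000)
y = 2.0 * x + rng.standard_normal(100_000)   # true slope 2, intercept 0

a = np.cov(x, y, ddof=0)[0, 1] / np.var(x)   # a = Cov(X,Y)/Var(X)
b = y.mean() - a * x.mean()                  # b = EY - a EX
print(a, b)                                  # close to 2.0 and 0.0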


Example: Y = X², where X is uniformly distributed between (−1, 1). Then X and Y are dependent, but they are uncorrelated. Because EX = 0,
  Cov(X, Y) = E(X − μ_X)(Y − μ_Y) = EXY − EX EY = EX³ − 0 = 0,
so X and X² are uncorrelated. In fact, for any zero-mean symmetric distribution of X, X and X² are uncorrelated. Note that independence ⇒ uncorrelatedness, but uncorrelatedness generally does not imply independence (except for jointly Gaussian random variables, discussed below).
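A quick simulation of this counterexample (my own illustration): X and X² are perfectly dependent, yet their sample correlation is essentially zero.

import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 500_000)
y = x ** 2                        # deterministic function of x
print(np.corrcoef(x, y)[0, 1])    # ~0, up to sampling noise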

Jointly Gaussian Random Variables

Many random variables occurring in practice are modelled as jointly Gaussian. For example, the noise in a communication system is modelled as jointly Gaussian random variables.

Two random variables X and Y are called jointly Gaussian if their joint density function is
  f_X,Y(x, y) = (1/(2πσ_X σ_Y √(1 − ρ²_X,Y))) exp{ −(1/(2(1 − ρ²_X,Y))) [ (x − μ_X)²/σ_X² − 2ρ_X,Y (x − μ_X)(y − μ_Y)/(σ_X σ_Y) + (y − μ_Y)²/σ_Y² ] },
  −∞ < x < ∞, −∞ < y < ∞.
The joint pdf is determined by 5 parameters:
• the means μ_X and μ_Y,
• the variances σ_X² and σ_Y²,
• the correlation coefficient ρ_X,Y.
We denote the jointly Gaussian random variables X and Y with these parameters as (X, Y) ~ N(μ_X, μ_Y, σ_X², σ_Y², ρ_X,Y). The pdf has a bell shape centred at (μ_X, μ_Y), as shown in the figure below. The variances determine the spread of the pdf surface, and ρ_X,Y determines the orientation of the surface in the X−Y plane.

Y ( x.Y − 2 ρ XY σ Xσ 2 2 ) ⎢ 2 ⎥ σX σY 2 (1− ρ X . X and Y are not necessarily jointly Gaussian.Y Y ⎣ ⎦ ∞ 1 2 2 πσ Y 1− ρ X . y )dy −∞ ∞ 1 2 2 πσ X σ Y 1− ρ X . then X and Y are both Gaussian.Y σ ( x − µ X ⎡ 1 Y ( y − µY − 2 (1− ρ 2 ) ⎢ σ 2 σY X ⎣ X .Properties of jointly Gaussian random variables (1) If X and Y are jointly Gaussian. Suppose . If each of X and Y is Gaussian.Y ) ⎢ X ⎡ ( x− µ ⎣ X )2 σ2 X − 2 ρ XY ( x− µ σ σ X X Y )( y − µ ) ( y − µ )2 ⎤ Y + Y ⎥ 2 σY ⎥ ⎦ dy dy Area under a Gaussian −∞ 1 ⎛ x−µ X − ⎜ 2⎜ σ e ⎝ X ∞ −∞ − 2 2 πσ Y 1− ρ X .Y 2πσ X 1 ⎛ x−µ X − ⎜ 2⎜ σ e ⎝ X ⎞ ⎟ ⎟ ⎠ 2 ∫ ∫ e − ⎡ ρ 2 ( x − µ )2 ( x − µ )( y − µ ) ( y − µ )2 ⎤ X X Y + Y ⎥ 1 ⎢ X .Y ⎢ ) ⎤2 ⎥ ⎥ ⎦ dy −∞ 1 2 πσ X e 1 ⎛ x−µ X − ⎜ 2⎜ σ X ⎝ ⎞ ⎟ ⎟ ⎠ 2 fY ( y ) = 1 2πσ Y e 1 ⎛ y − µY − ⎜ 2⎜ σY ⎝ ⎞ ⎟ ⎟ ⎠ 2 (2) The converse of the above result is not true.Y 2πσ X e ρ X .Y ⎞ ⎟ ⎟ ⎠ 2 = ∫ = = = Similarly − e 1 1 ⎢ 2(1− ρ 2 . We have ∞ f X ( x) = ∫ f X .

(taking μ_X = μ_Y = 0 for simplicity, which the odd-function argument below requires)
  f_X,Y(x, y) = (1/2πσ_X σ_Y) e^(−(1/2)(x²/σ_X² + y²/σ_Y²)) (1 + sin x sin y).
This f_X,Y(x, y) is non-Gaussian but qualifies as a joint pdf, because f_X,Y(x, y) ≥ 0 and
  ∫_−∞^∞ ∫_−∞^∞ f_X,Y(x, y) dy dx
  = ∫∫ (1/2πσ_Xσ_Y) e^(−(1/2)(x²/σ_X² + y²/σ_Y²)) dy dx
    + (1/2πσ_Xσ_Y) ∫ e^(−x²/2σ_X²) sin x dx ∫ e^(−y²/2σ_Y²) sin y dy
  = 1 + 0 = 1,
since each of the last two integrands is an odd function. The marginal density f_X(x) is given by
  f_X(x) = ∫_−∞^∞ f_X,Y(x, y) dy = (1/√(2π)σ_X) e^(−x²/2σ_X²) + 0 = (1/√(2π)σ_X) e^(−x²/2σ_X²),
the sin x sin y term again integrating to zero (odd function in y). Similarly, f_Y(y) = (1/√(2π)σ_Y) e^(−y²/2σ_Y²). Thus X and Y are both Gaussian, but not jointly Gaussian.

(3) If X and Y are jointly Gaussian, then for any constants a and b the random variable Z = aX + bY is Gaussian with mean μ_Z = aμ_X + bμ_Y and variance

  σ_Z² = a²σ_X² + b²σ_Y² + 2ab σ_X σ_Y ρ_X,Y.

(4) Two jointly Gaussian random variables X and Y are independent if and only if they are uncorrelated (ρ_X,Y = 0). Observe that if X and Y are uncorrelated, the joint pdf factors:
  f_X,Y(x, y) = (1/2πσ_Xσ_Y) e^(−(1/2)[(x−μ_X)²/σ_X² + (y−μ_Y)²/σ_Y²])
             = (1/√(2π)σ_X) e^(−(x−μ_X)²/2σ_X²) × (1/√(2π)σ_Y) e^(−(y−μ_Y)²/2σ_Y²)
             = f_X(x) f_Y(y).

Joint Characteristic Functions of Two Random Variables

The joint characteristic function of two random variables X and Y is defined by
  φ_X,Y(ω₁, ω₂) = E e^(jω₁X + jω₂Y).
If X and Y are jointly continuous random variables,
  φ_X,Y(ω₁, ω₂) = ∫_−∞^∞ ∫_−∞^∞ f_X,Y(x, y) e^(jω₁x + jω₂y) dy dx.
Note that φ_X,Y(ω₁, ω₂) is the same as the two-dimensional Fourier transform with the basis function e^(jω₁x + jω₂y) instead of e^(−(jω₁x + jω₂y)). f_X,Y(x, y) is related to the joint characteristic function by the Fourier inversion formula
  f_X,Y(x, y) = (1/4π²) ∫_−∞^∞ ∫_−∞^∞ φ_X,Y(ω₁, ω₂) e^(−jω₁x − jω₂y) dω₁ dω₂.
If X and Y are discrete random variables, we can define the joint characteristic function in terms of the joint probability mass function:
  φ_X,Y(ω₁, ω₂) = Σ Σ_{(x,y)∈R_X×R_Y} p_X,Y(x, y) e^(jω₁x + jω₂y).

Properties of joint characteristic functions of two random variables

The joint characteristic function has properties similar to those of the characteristic function of a single random variable. We can easily establish the following properties:
1. φ_X(ω) = φ_X,Y(ω, 0).
2. φ_Y(ω) = φ_X,Y(0, ω).
3. If X and Y are independent random variables, then

  φ_X,Y(ω₁, ω₂) = E e^(jω₁X + jω₂Y) = E(e^(jω₁X) e^(jω₂Y)) = E e^(jω₁X) E e^(jω₂Y) = φ_X(ω₁) φ_Y(ω₂).
4. The joint characteristic function generates the joint moments. Expanding in a power series,
  φ_X,Y(ω₁, ω₂) = E(1 + j(ω₁X + ω₂Y) + j²(ω₁X + ω₂Y)²/2! + ...)
               = 1 + jω₁EX + jω₂EY + j²ω₁²EX²/2 + j²ω₂²EY²/2 + j²ω₁ω₂EXY + ...
Hence, noting φ_X,Y(0, 0) = 1,
  EX = (1/j) ∂φ_X,Y(ω₁, ω₂)/∂ω₁ |_{ω₁=ω₂=0},
  EY = (1/j) ∂φ_X,Y(ω₁, ω₂)/∂ω₂ |_{ω₁=ω₂=0},
  EXY = (1/j²) ∂²φ_X,Y(ω₁, ω₂)/∂ω₁∂ω₂ |_{ω₁=ω₂=0}.
In general, the (m + n)th-order joint moment is given by
  EX^m Y^n = (1/j^(m+n)) ∂^(m+n) φ_X,Y(ω₁, ω₂)/∂ω₁^m ∂ω₂^n |_{ω₁=ω₂=0}.

Example: Joint characteristic function of jointly Gaussian random variables X and Y, with the joint pdf f_X,Y(x, y) as given earlier. Let us first recall the characteristic function of a single Gaussian random variable X ~ N(μ_X, σ_X²):

  φ_X(ω) = E e^(jωX) = ∫_−∞^∞ (1/√(2π)σ_X) e^(−(x−μ_X)²/2σ_X²) e^(jωx) dx.
Completing the square in the exponent, x² − 2(μ_X + σ_X² jω)x + (μ_X + σ_X² jω)² − (μ_X + σ_X² jω)² + μ_X², the integral reduces to the area under a Gaussian (equal to one) times a constant:
  φ_X(ω) = e^(jμ_Xω − σ_X²ω²/2).
For jointly Gaussian X and Y, we can similarly show that
  φ_X,Y(ω₁, ω₂) = e^(jμ_Xω₁ + jμ_Yω₂ − (1/2)(σ_X²ω₁² + 2ρ_X,Y σ_X σ_Y ω₁ω₂ + σ_Y²ω₂²)).

We can use the joint characteristic function to simplify probabilistic analysis, as illustrated below.

Example 2: Suppose Z = aX + bY. Then
  φ_Z(ω) = E e^(jωZ) = E e^(j(aX+bY)ω) = φ_X,Y(aω, bω).
If X and Y are jointly Gaussian,
  φ_Z(ω) = φ_X,Y(aω, bω) = e^(j(aμ_X + bμ_Y)ω − (1/2)(a²σ_X² + 2ab ρ_X,Y σ_X σ_Y + b²σ_Y²)ω²),
which is the characteristic function of a Gaussian random variable with mean μ_Z = aμ_X + bμ_Y and variance σ_Z² = a²σ_X² + 2ab ρ_X,Y σ_X σ_Y + b²σ_Y², proving property (3) above.

Example 3: If Z = X + Y and X and Y are independent, then
  φ_Z(ω) = φ_X,Y(ω, ω) = φ_X(ω) φ_Y(ω).

Using the convolution property of the Fourier transform, we get f_Z(z) = f_X(z) * f_Y(z), in agreement with the result derived earlier.
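A simulation check of Example 2 (my own sketch; the parameter values are arbitrary choices): the sample mean and variance of Z = aX + bY should match the formulas above.

import numpy as np

rng = np.random.default_rng(5)
mu_x, mu_y, s_x, s_y, rho, a, b = 1.0, -2.0, 1.5, 0.5, 0.6, 2.0, 3.0

cov = [[s_x**2, rho * s_x * s_y], [rho * s_x * s_y, s_y**2]]
x, y = rng.multivariate_normal([mu_x, mu_y], cov, 500_000).T
z = a * x + b * y

print(z.mean(), a * mu_x + b * mu_y)
print(z.var(), a**2 * s_x**2 + b**2 * s_y**2 + 2 * a * b * rho * s_x * s_y)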

Conditional Expectation

Recall that
• if X and Y are continuous random variables, the conditional density function of Y given X = x is f_Y/X(y/x) = f_X,Y(x, y)/f_X(x);
• if X and Y are discrete random variables, the conditional probability mass function of Y given X = x is p_Y/X(y/x) = p_X,Y(x, y)/p_X(x).

The conditional expectation of Y given X = x is defined by
  μ_Y/X=x = E(Y/X = x) = ∫_−∞^∞ y f_Y/X(y/x) dy if X and Y are continuous,
                       = Σ_{y∈R_Y} y p_Y/X(y/x) if X and Y are discrete.

Remark
• The conditional expectation of Y given X = x is also called the conditional mean of Y given X = x.
• We can similarly define the conditional expectation of X given Y = y, denoted by E(X/Y = y).
• Higher-order conditional moments can be defined in a similar manner. In particular, the conditional variance of Y given X = x is
  σ²_Y/X=x = E[(Y − μ_Y/X=x)²/X = x].

Example: Consider the discrete random variables X and Y discussed earlier. Find the conditional expectation E(Y/X = 2). The joint probability mass function of the random variables is tabulated below.

          Y = 0   Y = 1   p_X(x)
  X = 0    0.25    0.14    0.39
  X = 1    0.10    0.35    0.45
  X = 2    0.15    0.01    0.16
  p_Y(y)   0.50    0.50    1

The conditional probability mass function is given by

  p_Y/X(y/2) = p_X,Y(2, y)/p_X(2),
  ∴ p_Y/X(0/2) = p_X,Y(2, 0)/p_X(2) = 0.15/0.16 = 15/16,
  p_Y/X(1/2) = p_X,Y(2, 1)/p_X(2) = 0.01/0.16 = 1/16,
  ∴ E(Y/X = 2) = 0 × p_Y/X(0/2) + 1 × p_Y/X(1/2) = 1/16.

Example: Suppose X and Y are jointly uniform random variables with the joint probability density function
  f_X,Y(x, y) = 1/2, x ≥ 0, y ≥ 0, x + y ≤ 2,
             = 0 otherwise.
Find E(Y/X = x). From the figure, f_X,Y(x, y) = 1/2 in the shaded (triangular) area. We have
  f_X(x) = ∫_0^{2−x} f_X,Y(x, y) dy = ∫_0^{2−x} (1/2) dy = (2 − x)/2, 0 ≤ x ≤ 2,
  ∴ f_Y/X(y/x) = f_X,Y(x, y)/f_X(x) = 1/(2 − x), 0 ≤ y ≤ 2 − x,
  ∴ E(Y/X = x) = ∫_−∞^∞ y f_Y/X(y/x) dy = ∫_0^{2−x} y/(2 − x) dy = (2 − x)/2.
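A Monte Carlo check of this conditional mean (my own sketch; the rejection-sampling scheme and tolerance are my choices): sampling uniformly over the triangle and conditioning on X near x₀ should reproduce (2 − x₀)/2.

import numpy as np

rng = np.random.default_rng(6)
pts = rng.uniform(0, 2, size=(2_000_000, 2))
x, y = pts[pts.sum(axis=1) <= 2].T        # keep points inside the triangle

x0 = 0.5                                   # condition on X near x0
sel = np.abs(x - x0) < 0.01
print(y[sel].mean(), (2 - x0) / 2)         # both ~0.75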

Example: Suppose X and Y are jointly Gaussian random variables with the joint probability density function
  f_X,Y(x, y) = (1/(2πσ_X σ_Y √(1 − ρ²_X,Y))) exp{ −(1/(2(1 − ρ²_X,Y))) [ (x − μ_X)²/σ_X² − 2ρ_X,Y (x − μ_X)(y − μ_Y)/(σ_X σ_Y) + (y − μ_Y)²/σ_Y² ] }.
We have to find E(Y/X = x).

We have
  f_Y/X(y/x) = f_X,Y(x, y)/f_X(x)
             = [ (1/(2πσ_X σ_Y √(1 − ρ²_X,Y))) e^( −(1/(2(1−ρ²_X,Y)))[ (x−μ_X)²/σ_X² − 2ρ_X,Y(x−μ_X)(y−μ_Y)/(σ_Xσ_Y) + (y−μ_Y)²/σ_Y² ] ) ]
               / [ (1/(√(2π)σ_X)) e^( −(x−μ_X)²/2σ_X² ) ]
             = (1/(√(2π)σ_Y√(1 − ρ²_X,Y))) e^( −(1/(2σ_Y²(1−ρ²_X,Y))) [ (y − μ_Y) − (σ_Y ρ_X,Y/σ_X)(x − μ_X) ]² ).
Therefore, the conditional density of Y given X = x is Gaussian with mean μ_Y + (σ_Y ρ_X,Y/σ_X)(x − μ_X), so
  E(Y/X = x) = ∫_−∞^∞ y (1/(√(2π)σ_Y√(1 − ρ²_X,Y))) e^( −(1/(2σ_Y²(1−ρ²_X,Y))) [ (y−μ_Y) − (σ_Yρ_X,Y/σ_X)(x−μ_X) ]² ) dy
             = μ_Y + (σ_Y ρ_X,Y/σ_X)(x − μ_X).

Conditional Expectation as a Random Variable

Note that E(Y/X = x) is a function of x. Using this function, we may define a random variable
  φ(X) = E(Y/X).
Thus we may consider E(Y/X) as a function of the random variable X, and E(Y/X = x) as the value of E(Y/X) at X = x.

An important result: EE(Y/X) = EY.
Proof:

  EE(Y/X) = ∫_−∞^∞ E(Y/X = x) f_X(x) dx
          = ∫_−∞^∞ [∫_−∞^∞ y f_Y/X(y/x) dy] f_X(x) dx
          = ∫_−∞^∞ ∫_−∞^∞ y f_X(x) f_Y/X(y/x) dy dx
          = ∫_−∞^∞ ∫_−∞^∞ y f_X,Y(x, y) dy dx
          = ∫_−∞^∞ y [∫_−∞^∞ f_X,Y(x, y) dx] dy
          = ∫_−∞^∞ y f_Y(y) dy
          = EY.

Thus EE(Y/X) = EY, and similarly EE(X/Y) = EX.
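A small simulation of this tower property (my own example, not from the notes): take X ~ Uniform(0,1) and Y/X = x ~ Uniform(0, x), so E(Y/X) = X/2 and EY = 1/4.

import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 1, 1_000_000)
y = rng.uniform(0, x)              # conditional sampling given each x

print(np.mean(x / 2), y.mean())    # both ~0.25 = EY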

Bayesian Estimation Theory and Conditional Expectation

Consider two random variables X and Y with joint pdf f_X,Y(x, y). Suppose Y is observable and f_X(x) is known. We have to estimate X for a given value of Y in some optimal sense. The prior knowledge is that some values of X are more likely than others (a priori information); we represent this prior information in the form of the prior density function f_X(x). (In the following we omit the suffixes in the density functions where convenient, just for notational simplicity.)

[Block diagram: a random variable X with density f_X(x) passes through a channel described by f_Y/X(y/x), producing the observation Y = y.]

The conditional density function f_Y/X(y/x) is called the likelihood function in estimation terminology, and
  f_X,Y(x, y) = f_X(x) f_Y/X(y/x).

Also, we have the Bayes rule
  f_X/Y(x/y) = f_X(x) f_Y/X(y/x)/f_Y(y),
where f_X/Y(x/y) is the a posteriori density function.

Suppose the optimum estimator X̂(Y) is a function of the random variable Y such that it minimizes the mean-square error E(X̂(Y) − X)², where X̂(Y) − X is the estimation error. Such an estimator is known as the minimum mean-square error (MMSE) estimator.

The estimation problem is:
  minimize ∫_−∞^∞ ∫_−∞^∞ (X̂(y) − x)² f_X,Y(x, y) dx dy with respect to X̂(y).
This is equivalent to minimizing
  ∫_−∞^∞ ∫_−∞^∞ (X̂(y) − x)² f_Y(y) f_X/Y(x/y) dx dy = ∫_−∞^∞ [∫_−∞^∞ (X̂(y) − x)² f_X/Y(x/y) dx] f_Y(y) dy.

Since f_Y(y) is always positive, the above integral will be minimum if the inner integral is minimum for each y. This results in the problem:

  minimize ∫_−∞^∞ (X̂(y) − x)² f_X/Y(x/y) dx with respect to X̂(y).
The minimum is given by
  ∂/∂X̂(y) ∫_−∞^∞ (X̂(y) − x)² f_X/Y(x/y) dx = 0
  ⇒ 2 ∫_−∞^∞ (X̂(y) − x) f_X/Y(x/y) dx = 0
  ⇒ X̂(y) ∫_−∞^∞ f_X/Y(x/y) dx = ∫_−∞^∞ x f_X/Y(x/y) dx
  ⇒ X̂(y) = ∫_−∞^∞ x f_X/Y(x/y) dx = E(X/Y = y).
Thus the minimum mean-square error estimator is the conditional mean E(X/Y = y).

Multiple Random Variables

In many applications we have to deal with more than two random variables. For example, in the navigation problem, the position of a spacecraft is represented by three random variables denoting the x, y and z coordinates; the noise affecting the R, G, B channels of colour video may be represented by three random variables. In such situations it is convenient to define vector-valued random variables, where each component of the vector is a random variable. In this lecture, we extend the concepts of joint random variables to the case of multiple random variables; a generalized analysis will be presented for n random variables defined on the same sample space.

Thus an n-dimensional random vector X is defined by the mapping X: S → ℝⁿ on a probability space (S, F, P). We write
  X = [X₁ X₂ ... X_n]',
where ' indicates the transpose operation. A particular value of the random vector X is denoted by x = [x₁ x₂ ... x_n]'.

Joint CDF of n random variables

Consider n random variables X₁, X₂, ..., X_n defined on the same probability space (S, F, P). The CDF of the random vector X is defined as the joint CDF of X₁, X₂, ..., X_n:
  F_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) = F_X(x) = P({X₁ ≤ x₁, X₂ ≤ x₂, ..., X_n ≤ x_n}).
Some of the most important properties of the joint CDF are listed below; they are mere extensions of the properties of two joint random variables.
(a) F_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) is a non-decreasing function of each of its arguments.
(b) F_X₁,X₂,...,X_n(∞, ∞, ..., ∞) = 1.
(c) F_X₁,X₂,...,X_n(−∞, x₂, ..., x_n) = ... = F_X₁,X₂,...,X_n(x₁, x₂, ..., −∞) = 0.
(d) F_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) is right-continuous in each of its arguments.

(e) The marginal CDF of a random variable X_i is obtained from F_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) by letting all the arguments except x_i tend to ∞. Thus
  F_X₁(x₁) = F_X₁,X₂,...,X_n(x₁, ∞, ..., ∞), F_X₂(x₂) = F_X₁,X₂,...,X_n(∞, x₂, ∞, ..., ∞), and so on.

Joint pmf of n discrete random variables

Suppose X is a discrete random vector defined on the probability space (S, F, P). Then X is completely specified by the joint probability mass function
  p_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) = p_X(x) = P({X₁ = x₁, X₂ = x₂, ..., X_n = x_n}).
Given p_X₁,X₂,...,X_n(x₁, x₂, ..., x_n), we can find the marginal probability mass function by n − 1 summations:
  p_X₁(x₁) = Σ_{x₂} Σ_{x₃} ... Σ_{x_n} p_X₁,X₂,...,X_n(x₁, x₂, ..., x_n).

Joint PDF of n random variables

If X is a continuous random vector, then X can be specified by the joint probability density function
  f_X(x) = f_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) = ∂ⁿ F_X₁,X₂,...,X_n(x₁, x₂, ..., x_n)/∂x₁∂x₂...∂x_n.
The joint pdf of n random variables satisfies the following important properties:
(1) f_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) ≥ 0 for all (x₁, x₂, ..., x_n) ∈ ℝⁿ.
(2) ∫_−∞^∞ ... ∫_−∞^∞ f_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) dx₁ dx₂ ... dx_n = 1.
(3) Given f_X(x), we can find the probability of a Borel set (region) B ⊆ ℝⁿ:
  P({(x₁, x₂, ..., x_n) ∈ B}) = ∫...∫_B f_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) dx₁ dx₂ ... dx_n.
(4) The marginal pdf of a random variable X_i is related to f_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) by the (n − 1)-fold integral

.... X 2 ..... X 1 ... then − 1 ( xi − µi ) 2 σ i2 2 f X1 . X n are called identically distributed if each random variable has the same marginal distribution function........... X n are called iid if X 1 . X n / X1 .. X 2 . X n are iid and ⎛1⎞ p X (1. X n ( x1 . X n are independent Gaussian random variables.. xn ) f X m +1 ..xn ) = ∏ i =1 n 1 e 2πσ i Remark. X n are mutually independent and each of X 1 ......xn ) = ∏ f X i ( xi ) i =1 n For example.. X 2 .... then X 1 ....xi .......... FX1 ( x ) = FX 2 ( x ) = . x2 ...dxn −∞ −∞ −∞ ∞ ∞ ∞ ( n − 2) − fold integral and so on..... xm ) = 1 2 n f X1 .. X ( x1 . Example: If X 1 .... X n may be pair wise independent.. x2 ..dxn −∞ −∞ −∞ ∞ ∞ ∞ where the integral is performed over all the arguments except xi ..1.... X n may be iid random variables generated by n independent throwing of a fair coin and each taking values 0 and 1.. x2 . but may not be mutually independent...... The random variables X 1 . that is.........1) = ⎜ ⎟ ⎝2⎠ n 3 ...... ∫ f X1 ....... X m ( x1 ....... x2 .. X . X 2 .. X n are called (mutually) independent if and only if f X1 .... xm + 2 .................. X 2 ............ x2 ...xn )dx1 dx2 .f X i ( xi ) = ∫ ∫ ...... x2 .xn )dx1 dx2 ........... X 2 . X n has the same marginal distribution function..... X 2 ..... X 2 . x j ) = ∫ ∫ .. X n ( x1 .. X m ( xm +1 ... X n ( x1 .. X 2 ... ∫ f X1 . The conditional density functions are defined in a similar manner.. Identically distributed random variables: The random variables X 1 .... X 2 .... f X i ..xi .......... Similarly... X j ( xi .. X m + 2 .. Thus f X ... X 2 ...... xn / x1 ... xm ) Independent random variables: The random variables X 1 . X 2 .... X 2 ..... X n ( x1 .......... X 2 ... = FX n ( x ) ∀x An important subclass of independent random variables is the independent and identically distributed (iid) random variables.... X 2 ... x2 . if X 1 .

Moments of Multiple Random Variables

Consider n jointly distributed random variables represented by the random vector X = [X₁ X₂ ... X_n]'. The expected value of any scalar-valued function g(X) is defined using the n-fold integral
  E g(X) = ∫_−∞^∞ ... ∫_−∞^∞ g(x₁, x₂, ..., x_n) f_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) dx₁ dx₂ ... dx_n.
The mean vector of X, denoted by μ_X, is defined as
  μ_X = E(X) = [E(X₁) E(X₂) ... E(X_n)]' = [μ_X₁ μ_X₂ ... μ_X_n]'.
Similarly, for each (i, j), i = 1, 2, ..., n, j = 1, 2, ..., n, we can define the covariance
  Cov(X_i, X_j) = E(X_i − μ_X_i)(X_j − μ_X_j).
All the possible covariances can be represented in terms of a matrix called the covariance matrix C_X, defined by
  C_X = E(X − μ_X)(X − μ_X)'
      = | var(X₁)       cov(X₁, X₂)  ...  cov(X₁, X_n) |
        | cov(X₂, X₁)   var(X₂)      ...  cov(X₂, X_n) |
        |  ...                                          |
        | cov(X_n, X₁)  cov(X_n, X₂) ...  var(X_n)     |.
The covariance matrix represents the second-order relationship between each pair of the random variables and plays an important role in applications of random variables.

Properties of the covariance matrix
• C_X is a symmetric matrix, because Cov(X_i, X_j) = Cov(X_j, X_i).
• C_X is non-negative definite, in the sense that for any real vector z ≠ 0 the quadratic form z'C_X z ≥ 0. The result can be proved as follows:
  z'C_X z = z'E((X − μ_X)(X − μ_X)')z = E(z'(X − μ_X)(X − μ_X)'z) = E(z'(X − μ_X))² ≥ 0.
• The n random variables X₁, X₂, ..., X_n are called uncorrelated if, for each (i, j), i ≠ j, Cov(X_i, X_j) = 0. If X₁, X₂, ..., X_n are uncorrelated, C_X is a diagonal matrix.
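A short sketch estimating a covariance matrix from data and checking non-negative definiteness (my own illustration; the injected correlation is arbitrary):

import numpy as np

rng = np.random.default_rng(8)
X = rng.standard_normal((3, 100_000))   # three rows = three random variables
X[1] += 0.8 * X[0]                      # introduce some correlation

C = np.cov(X)                           # 3x3 sample covariance matrix
print(C)
print(np.linalg.eigvalsh(C).min())      # >= 0 up to round-off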

Multiple Jointly Gaussian Random Variables

For any positive integer n, let X₁, X₂, ..., X_n represent n jointly distributed random variables defining the random vector X = [X₁ X₂ ... X_n]'. These random variables are called jointly Gaussian if their joint probability density function is given by
  f_X₁,X₂,...,X_n(x₁, x₂, ..., x_n) = (1/√((2π)ⁿ det(C_X))) e^(−(1/2)(x − μ_X)'C_X⁻¹(x − μ_X)),
where C_X = E(X − μ_X)(X − μ_X)' is the covariance matrix and μ_X = E(X) = [E(X₁) E(X₂) ... E(X_n)]' is the vector formed by the means of the random variables.

Vector-Space Interpretation of Random Variables

Consider a set V with elements called vectors and the field of real numbers ℝ. V is called a vector space if and only if
1. an operation of vector addition '+' is defined in V such that (V, +) is a commutative group, i.e.
   (i) for any pair of elements v, w ∈ V there exists a unique element (v + w) ∈ V;
   (ii) vector addition is associative: v + (w + z) = (v + w) + z for any three vectors v, w, z ∈ V;
   (iii) there is a vector 0 ∈ V such that v + 0 = 0 + v = v for any v ∈ V;
   (iv) for any v ∈ V there is a vector −v ∈ V such that v + (−v) = 0 = (−v) + v;
   (v) v + w = w + v for any v, w ∈ V;
2. for any element v ∈ V and any r ∈ ℝ, the scalar product rv ∈ V, where the scalar product has the following properties for any r, s ∈ ℝ and v, w ∈ V:
3. r(sv) = (rs)v;
4. r(v + w) = rv + rw;
5. (r + s)v = rv + sv;
6. 1v = v.

It is easy to verify that the set of all random variables defined on a probability space (S, F, P) forms a vector space with respect to addition and scalar multiplication; similarly, the set of all n-dimensional random vectors forms a vector space. The interpretation of random variables as elements of a vector space helps in understanding many operations involving random variables.

Linear independence
Consider N random vectors X₁, X₂, ..., X_N. If c₁X₁ + c₂X₂ + ... + c_N X_N = 0 implies that c₁ = c₂ = ... = c_N = 0, then X₁, X₂, ..., X_N are linearly independent.

Inner product
If v and w are real vectors in a vector space V defined over the field ℝ, the inner product <v, w> is a scalar such that, for all v, w, z ∈ V and r ∈ ℝ,
1. <v, v> = ‖v‖² ≥ 0, where ‖v‖ is the norm induced by the inner product;
2. <v, w> = <w, v>;
3. <v + w, z> = <v, z> + <w, z>;
4. <rv, w> = r<v, w>.

In the case of two random variables X and Y, the joint expectation EXY defines an inner product (we can easily verify that EXY satisfies the axioms of the inner product):
  <X, Y> = EXY,
and the norm of a random variable X is given by ‖X‖² = EX². For two n-dimensional random vectors X = [X₁ ... X_n]' and Y = [Y₁ ... Y_n]', the inner product is
  <X, Y> = EX'Y = Σ_{i=1}^n EX_iY_i,
and the norm of a random vector X is ‖X‖² = <X, X> = EX'X = Σ_{i=1}^n EX_i².
• The set of random variables, along with the inner product defined through the joint expectation operation and the corresponding norm, defines a Hilbert space.

|< v. Remark If X and Y are uncorrelated. .Y ) = EXY In this case. w >|≤ v w This means that for any two random variables X and Y E ( XY ) ≤ EX 2 EY 2 Similarly for any two random vectors X and Y E ( X′Y) ≤ EX′X EY′Y Orthogonal Random Variables and Orthogonal Random Vectors Two vectors v and w are called orthogonal if < v. then E ( X − µ X )(Y − µY ) = 0 ∴ ( X − µ X ) is orthogonal to (Y − µY ) If each of X and Y is zero-mean Cov( X . the orthogonal random variables form an important class of random variables. EXY = 0 ⇔ Cov ( XY ) = 0. w > = 0 Two random variables X and Y are called orthogonal if EXY = 0 . Similarly two random vectors X and Y are called orthogonal if EX′Y = ∑ EX iYi = 0 i n Just like the of independent random variables and the uncorrelated random variables.

Minimum Mean-Square-Error Estimation

Suppose X is a random variable which is not observable, and Y is another observable random variable which is statistically dependent on X through the joint probability density function f_X,Y(x, y). We pose the following problem: given a value of Y, what is the best guess for X? This problem is known as the estimation problem and has many practical applications. One application is signal estimation from noisy observations, as illustrated below:

  Signal X → [+ noise] → Noisy observation Y → [Estimation] → Estimated signal X̂

Let X̂(Y) be the estimate of the random variable X based on the random variable Y; clearly X̂(Y) is a function of Y. We have to find the best estimate X̂(Y) in some meaningful sense. Observe that
• X is the unknown random variable,
• X̂(Y) is the estimate of X,
• X − X̂(Y) is the estimation error,
• E(X − X̂(Y))² is the mean of the square error.
One meaningful criterion is to minimize E(X − X̂(Y))² with respect to X̂(Y); the corresponding estimation principle is called the minimum mean-square error (MMSE) principle. Such a function to be minimized is called a cost function in optimization theory. For finding X̂(Y), we have to minimize the cost function

  E(X − X̂(Y))² = ∫_−∞^∞ ∫_−∞^∞ (x − X̂(y))² f_X,Y(x, y) dy dx
               = ∫_−∞^∞ f_Y(y) [∫_−∞^∞ (x − X̂(y))² f_X/Y(x/y) dx] dy.
Since f_Y(y) is always positive, minimizing E(X − X̂(Y))² with respect to X̂(Y) is equivalent to minimizing the inner integral ∫(x − X̂(y))² f_X/Y(x/y) dx with respect to X̂(y). The condition for the minimum is
  ∂/∂X̂(y) ∫_−∞^∞ (x − X̂(y))² f_X/Y(x/y) dx = 0
  ⇒ 2 ∫_−∞^∞ (x − X̂(y)) f_X/Y(x/y) dx = 0
  ⇒ X̂(y) = ∫_−∞^∞ x f_X/Y(x/y) dx = E(X/Y = y).
Thus the minimum mean-square error estimation involves the conditional expectation E(X/Y = y). To find it, we have to determine the a posteriori density f_X/Y(x/y) and perform the integration ∫ x f_X/Y(x/y) dx; these operations are computationally exhaustive when performed numerically.

Example: Consider two zero-mean jointly Gaussian random variables X and Y with joint pdf
  f_X,Y(x, y) = (1/(2πσ_Xσ_Y√(1−ρ²_X,Y))) exp{−(1/(2(1−ρ²_X,Y)))[x²/σ_X² − 2ρ_XY xy/(σ_Xσ_Y) + y²/σ_Y²]}, −∞ < x, y < ∞.
The marginal density f_Y(y) is Gaussian:
  f_Y(y) = (1/√(2π)σ_Y) e^(−y²/2σ_Y²),
  ∴ f_X/Y(x/y) = f_X,Y(x, y)/f_Y(y) = (1/(√(2π)σ_X√(1−ρ²_X,Y))) e^(−(1/(2σ_X²(1−ρ²_X,Y)))[x − (ρ_XY σ_X/σ_Y) y]²),
which is Gaussian with mean (ρ_XY σ_X/σ_Y) y. Therefore,

the MMSE estimator of X given Y = y is
  X̂(y) = E(X/Y = y) = (ρ_XY σ_X/σ_Y) y.
This example illustrates that in the case of jointly Gaussian random variables X and Y, the mean-square estimator of X given Y = y is linearly related to y. This important result gives us a clue to a simpler version of the mean-square error estimation problem, discussed below.

Linear minimum mean-square-error estimation and the orthogonality principle

We assume that X and Y are both zero-mean and X̂(y) = ay. The estimation problem is now to find the optimal value of a. Thus we have the linear minimum mean-square error (LMMSE) criterion, which minimizes E(X − aY)² with respect to a:

  Signal X → [+ noise] → Noisy observation Y → [Estimation X̂ = aY] → Estimated signal X̂

  d/da E(X − aY)² = 0
  ⇒ E d/da (X − aY)² = 0
  ⇒ E(X − aY)Y = 0
  ⇒ EeY = 0,
where e = X − aY is the estimation error. Thus the optimum value of a is such that the estimation error (X − aY) is orthogonal to the observed random variable Y, and the optimal estimator aY is the orthogonal projection of X on Y. This orthogonality principle forms the heart of a class of estimation problems called Wiener filtering. The orthogonality principle is illustrated geometrically in the figure (X, its orthogonal projection aY on Y, and the error e orthogonal to Y).

The optimum value of a is given by
  E(X − aY)Y = 0 ⇒ EXY − aEY² = 0 ⇒ a = EXY/EY².
The corresponding minimum linear mean-square error (LMMSE) is

  LMMSE = E(X − aY)² = E(X − aY)X − aE(X − aY)Y
        = E(X − aY)X − 0   (using the orthogonality principle E(X − aY)Y = 0)
        = EX² − aEXY.

The orthogonality principle can be applied to the optimal estimation of a random variable from more than one observation. We illustrate this in the following example.

Example: Suppose X is a zero-mean random variable which is to be estimated from two zero-mean random variables Y₁ and Y₂. Let the LMMSE estimator be X̂ = a₁Y₁ + a₂Y₂. Then the optimal values of a₁ and a₂ are given by
  ∂E(X − a₁Y₁ − a₂Y₂)²/∂a_i = 0, i = 1, 2.
This results in the orthogonality conditions
  E(X − a₁Y₁ − a₂Y₂)Y₁ = 0 and E(X − a₁Y₁ − a₂Y₂)Y₂ = 0.
Rewriting the above equations, we get
  a₁EY₁² + a₂EY₂Y₁ = EXY₁,
  a₁EY₁Y₂ + a₂EY₂² = EXY₂.
Solving these equations, we can find a₁ and a₂. Further, the corresponding minimum linear mean-square error is
  LMMSE = E(X − a₁Y₁ − a₂Y₂)² = E(X − a₁Y₁ − a₂Y₂)X − 0   (using the orthogonality principle)
        = EX² − a₁EXY₁ − a₂EXY₂.
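A sketch solving these two orthogonality (normal) equations with moments estimated from simulated data (the signal model Y_i = X + noise is my own assumption for illustration):

import numpy as np

rng = np.random.default_rng(9)
x = rng.standard_normal(200_000)
y1 = x + rng.standard_normal(200_000)    # two noisy observations of x
y2 = x + rng.standard_normal(200_000)

R = np.array([[np.mean(y1 * y1), np.mean(y1 * y2)],
              [np.mean(y2 * y1), np.mean(y2 * y2)]])
r = np.array([np.mean(x * y1), np.mean(x * y2)])
a1, a2 = np.linalg.solve(R, r)
print(a1, a2)       # both ~1/3 for this symmetric setup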

Convergence of a Sequence of Random Variables

Let X₁, X₂, ..., X_n be a sequence of n independent and identically distributed random variables. Suppose we want to estimate the mean of the random variable on the basis of the observed data by means of the relation
  μ_n = (1/n) Σ_{i=1}^n X_i.
How closely does μ_n represent the true mean μ_X as n is increased? How do we measure the closeness between μ_n and μ_X? Notice that μ_n is a random variable. What do we mean by the statement "μ_n converges to μ_X"?

• Consider a deterministic sequence of real numbers x₁, x₂, ..., x_n, .... The sequence converges to a limit x if, corresponding to every ε > 0, we can find a positive integer N such that |x − x_n| < ε for n > N. For example, the sequence 1, 1/2, ..., 1/n, ... converges to the number 0.
• The Cauchy criterion gives a condition for convergence of a sequence without actually finding the limit: the sequence x₁, x₂, ..., x_n, ... converges if and only if, for every ε > 0, there exists a positive integer N such that |x_{n+m} − x_n| < ε for all n > N and all m > 0.

A random sequence X₁, X₂, ..., X_n, ... represents a family of sequences of numbers: for each s ∈ S, X₁(s), X₂(s), ..., X_n(s), ... is a sequence of numbers, which may not converge for every s ∈ S. Thus the convergence of a random sequence is defined using different criteria; five of these criteria are explained below.

Convergence everywhere
A sequence of random variables is said to converge everywhere to X if
  |X(s) − X_n(s)| → 0 for n > N and every s ∈ S.
Note here that the sequence of numbers for each sample point must be convergent.

Almost sure (a.s.) convergence, or convergence with probability 1
Consider the event {s | X_n(s) → X(s)}.

The sequence X₁, X₂, ..., X_n, ... is said to converge to X almost surely, or with probability 1, if
  P{s | X_n(s) → X(s)} = 1 as n → ∞,
or equivalently, for every ε > 0 there exists N such that
  P{s | |X_n(s) − X(s)| < ε for all n ≥ N} = 1.
We write X_n →(a.s.) X in this case. One important application is the strong law of large numbers (SLLN): if X₁, X₂, ..., X_n, ... are independent and identically distributed random variables with a finite mean μ_X, then
  (1/n) Σ_{i=1}^n X_i → μ_X with probability 1 as n → ∞.
Remark
• μ_n = (1/n) Σ_{i=1}^n X_i is called the sample mean.
• The SLLN states that the sample mean converges to the true mean as the sample size increases; it is one of the fundamental theorems of probability. There is a weaker version of the law that we will discuss later.

Convergence in mean-square sense
A random sequence X₁, X₂, ..., X_n, ... is said to converge in the mean-square (m.s.) sense to a random variable X if
  E(X_n − X)² → 0 as n → ∞.
X is called the mean-square limit of the sequence, and we write
  l.i.m. X_n = X,
where l.i.m. means "limit in mean-square". We also write X_n →(m.s.) X.
• The following Cauchy criterion gives a condition for m.s. convergence of a random sequence without actually finding the limit: the sequence converges in m.s. if and only if, for every ε > 0, there exists a positive integer N such that E|X_{n+m} − X_n|² → 0 as n → ∞ for all m > 0.

Example:

If X₁, X₂, ..., X_n, ... are iid random variables, then
  (1/n) Σ_{i=1}^n X_i → μ_X in the mean-square sense as n → ∞.
We have to show that lim_{n→∞} E((1/n) Σ_{i=1}^n X_i − μ_X)² = 0. Now,
  E((1/n) Σ_{i=1}^n X_i − μ_X)² = E((1/n) Σ_{i=1}^n (X_i − μ_X))²
  = (1/n²) Σ_{i=1}^n E(X_i − μ_X)² + (1/n²) Σ_{i=1}^n Σ_{j=1, j≠i}^n E(X_i − μ_X)(X_j − μ_X)
  = nσ_X²/n² + 0   (because of independence)
  = σ_X²/n,
  ∴ lim_{n→∞} E((1/n) Σ_{i=1}^n X_i − μ_X)² = 0.

Convergence in probability
Associated with the sequence of random variables X₁, X₂, ..., X_n, ... and a random variable X, we can define a sequence of probabilities P{|X_n − X| > ε}, ε > 0. The sequence X₁, X₂, ..., X_n, ... is said to converge to X in probability if this sequence of probabilities converges to 0, that is,
  P{|X_n − X| > ε} → 0 as n → ∞, for every ε > 0.
We write X_n →(P) X to denote convergence in probability. If a sequence is convergent in the mean-square sense, then it is convergent in probability also, because by Markov's inequality
  P{|X_n − X|² > ε²} ≤ E(X_n − X)²/ε²,
i.e. P{|X_n − X| > ε} ≤ E(X_n − X)²/ε². If E(X_n − X)² → 0 as n → ∞, then P{|X_n − X| > ε} → 0 as n → ∞.

Example: Suppose {X_n} is a sequence of random variables with

  P{X_n = 1} = 1 − 1/n and P{X_n = −1} = 1/n.
Clearly, P{|X_n − 1| > ε} = P{X_n = −1} = 1/n → 0 as n → ∞, so {X_n} →(P) 1: the sequence converges to a constant in probability.
Remark: Convergence in probability is also called stochastic convergence.

Weak law of large numbers (WLLN)
If X₁, X₂, ..., X_n, ... are independent and identically distributed random variables, then the sample mean μ_n = (1/n) Σ_{i=1}^n X_i converges to μ_X in probability:
  Eμ_n = μ_X, E(μ_n − μ_X)² = σ_X²/n (as shown above),
  P{|μ_n − μ_X| ≥ ε} ≤ E(μ_n − μ_X)²/ε² = σ_X²/(nε²),
  ∴ P{|μ_n − μ_X| ≥ ε} → 0 as n → ∞, i.e. μ_n →(P) μ_X.

Convergence in distribution
Consider the random sequence X₁, X₂, ..., X_n, ... and a random variable X. Suppose F_{X_n}(x) and F_X(x) are the distribution functions of X_n and X respectively. The sequence is said to converge to X in distribution if
  F_{X_n}(x) → F_X(x) as n → ∞

for all x at which F_X(x) is continuous. Here the two distribution functions eventually coincide, and we write X_n →(d) X.

Example: Suppose X₁, X₂, ..., X_n, ... is a sequence of random variables, each X_i having the uniform density
  f_{X_i}(x) = 1/b, 0 ≤ x ≤ b,
            = 0 otherwise.
Define Z_n = max(X₁, X₂, ..., X_n). We can show that
  F_{Z_n}(z) = 0, z < 0,
             = (z/b)ⁿ, 0 ≤ z < b,
             = 1, otherwise.
Clearly,
  lim_{n→∞} F_{Z_n}(z) = F_Z(z) = 0 for z < b, and = 1 for z ≥ b,
so {Z_n} converges in distribution (to the constant b).

Relation between types of convergence:
  convergence almost sure (X_n →(a.s.) X) ⇒ convergence in probability (X_n →(P) X) ⇒ convergence in distribution (X_n →(d) X);
  convergence in mean square (X_n →(m.s.) X) ⇒ convergence in probability (X_n →(P) X) ⇒ convergence in distribution (X_n →(d) X).

Central Limit Theorem

Consider n independent random variables X₁, X₂, ..., X_n with EX_i = μ_X_i and var(X_i) = σ²_X_i. Form the random variable
  Y_n = X₁ + X₂ + ... + X_n.
The mean and variance of Y_n are given by
  EY_n = μ_Y_n = μ_X₁ + μ_X₂ + ... + μ_X_n
and
  var(Y_n) = σ²_Y_n = E{Σ_{i=1}^n (X_i − μ_i)}²
           = Σ_{i=1}^n E(X_i − μ_i)² + Σ_{i=1}^n Σ_{j=1, j≠i}^n E(X_i − μ_i)(X_j − μ_j)
           = σ²_X₁ + σ²_X₂ + ... + σ²_X_n,
since X_i and X_j are independent for i ≠ j. Thus we can determine the mean and variance of Y_n; but can we guess its probability distribution? The central limit theorem (CLT) provides an answer: under very general conditions, Y_n = Σ_{i=1}^n X_i converges in distribution to Y ~ N(μ_Y, σ_Y²) as n → ∞. The conditions are:
1. the random variables X₁, X₂, ..., X_n are independent with the same mean and the same variance, but not identically distributed;
2. the random variables X₁, X₂, ..., X_n are independent, with different means and the same variance, and not identically distributed;
3. the random variables X₁, X₂, ..., X_n are independent, with different means and different variances, each variance being neither too small nor too large.
We shall consider the first condition only. In this case, the central limit theorem can be stated as follows:

Suppose X 1 . We can illustrate this by convolving two uniform distributions repeatedly. . X 2 . Then. Consider the sum of two statistically independent random variables. the variables each with mean µ X and variance σ X and Yn = ∑ i i =1 n sequence {Yn } converges in distribution to a Gaussian random variable Y with mean 2 0 and variance σ X .. Further convolution gives a parabolic distribution and so on. Y = X 1 + X 2 . X n is a sequence of independent and identically distributed random n ( X − µX ) 2 . That is. This can be shown with the help of the characteristic functions as follows: φY (ω ) = E ⎡e jω ( x + x ) ⎤ ⎣ ⎦ 1 2 = E ( e jω x1 ) E ( e jω x2 ) = φ x1 (ω )φ x2 (ω ) ∞ ∴ fY ( x ) = f X1 ( y ) * f X 2 ( y ) = −∞ ∫ f X1 (τ ) f X 2 ( y − τ )dτ where * is the convolution operation. Then the pdf fY ( y ) the convolution of f X1 ( x ) and f X 2 ( x ) . y 2 1 e −u lim FYn ( y ) = ∫ −∞ n →∞ 2π σ X 2 2σ X du Remarks • The central limit theorem is really a property of convolution... say. The convolution of two uniform distributions gives a triangular distribution.

Proof of the central limit theorem
We give a less rigorous proof of the theorem with the help of the characteristic function. Without loss of generality we consider each of X1, X2, ..., Xn to have zero mean (otherwise replace Xi by Xi - µX), so that

Yn = (X1 + X2 + ... + Xn)/√n.

Clearly µYn = 0, σYn² = σX², E(Yn³) = E(X³)/√n, and so on. The characteristic function of Yn is

φYn(ω) = E(e^{jωYn}) = E[exp(jω (1/√n) Σ_{i=1}^{n} Xi)].

We will show that as n → ∞ this characteristic function takes the form of the characteristic function of a Gaussian random variable. Expanding e^{jωYn} in a power series:

e^{jωYn} = 1 + jωYn + ((jω)²/2!) Yn² + ((jω)³/3!) Yn³ + ...

Assume all the moments of Yn to be finite. Taking expectations,

φYn(ω) = 1 + jω µYn + ((jω)²/2!) E(Yn²) + ((jω)³/3!) E(Yn³) + ...

Substituting µYn = 0 and E(Yn²) = σX²,

φYn(ω) = 1 - (ω²/2!) σX² + R(ω, n),

where R(ω, n) collects the terms involving ω³ and higher powers of ω. Each term in R(ω, n) involves a ratio of a higher-order moment to a power of √n (for example E(Yn³) = E(X³)/√n), so

lim_{n→∞} R(ω, n) = 0.

Writing φYn(ω) = [φX(ω/√n)]ⁿ and taking the logarithm,

lim_{n→∞} ln φYn(ω) = -ω²σX²/2, so lim_{n→∞} φYn(ω) = e^{-ω²σX²/2},

which is the characteristic function of a Gaussian random variable with mean 0 and variance σX². Therefore

Yn →(d) N(0, σX²).

Remarks
(1) Under the conditions of the CLT, the sample mean µ̂X = (1/n) Σ_{i=1}^{n} Xi is asymptotically Gaussian: if samples are taken from any distribution with mean µX and variance σX², then as the sample size n increases the distribution function of the sample mean approaches that of N(µX, σX²/n).

(2) The CLT states that the distribution function FYn(y) converges to a Gaussian distribution function. It does not say that the pdf fYn(y) is a Gaussian pdf in the limit. For example, suppose each Xi has a Bernoulli distribution. Then the pdf of Yn consists of impulses and can never approach a Gaussian pdf.

(3) The Cauchy distribution does not meet the conditions for the central limit theorem. Suppose each Xi has the Cauchy density

fXi(x) = 1/(π(1 + x²)), -∞ < x < ∞.

This distribution does not have a finite mean or variance. Its characteristic function is φXi(ω) = e^{-|ω|}, and the sample mean µ̂X = (1/n) Σ Xi has characteristic function [φX(ω/n)]ⁿ = e^{-|ω|}: the sample mean of Cauchy random variables is again Cauchy, and the sum of a large number of Cauchy random variables never becomes Gaussian.

(4) The central limit theorem is one of the most widely used results of probability. If a random variable is the result of several independent causes, it can be modelled as Gaussian. For example:
- the observation/measurement error of any process is modelled as Gaussian;
- the thermal noise in a resistor is the result of the independent motion of billions of electrons and is modelled as Gaussian.

(5) The CLT can be used to simulate a Gaussian distribution given a routine to simulate some other random variable.

Normal approximation of the Binomial distribution
One application of the CLT is the approximation of Binomial probabilities. Suppose X1, X2, ..., Xn is a sequence of Bernoulli(p) random variables with P{Xi = 1} = p and P{Xi = 0} = 1 - p. Then Yn = Σ_{i=1}^{n} Xi has a Binomial distribution with µYn = np and σYn² = np(1 - p). By the CLT,

(Yn - np)/√(np(1 - p)) →(d) N(0, 1), or Yn →(d) N(np, np(1 - p)).

Hence, for integer k,

P(Yn = k) = P(k - 1 < Yn ≤ k) = ∫_{k-1}^{k} fYn(y) dy ≈ (1/√(2π np(1 - p))) e^{-(k - np)²/(2np(1 - p))},

treating the Gaussian integrand as approximately constant over the unit interval. This normal approximation to the Binomial probabilities is known as the De Moivre-Laplace approximation.
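A small numeric check of the De Moivre-Laplace approximation (an illustrative sketch; the values of n, p and k are arbitrary choices):

```python
# Exact Binomial(n, p) pmf at k versus the normal approximation.
import math

n, p, k = 100, 0.3, 35
exact = math.comb(n, k) * p**k * (1 - p) ** (n - k)
s2 = n * p * (1 - p)                    # np(1-p)
approx = math.exp(-(k - n * p) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
print(f"exact {exact:.5f}  approx {approx:.5f}")
```

The two values agree to within a few percent already at n = 100.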

RANDOM PROCESSES

In practical problems we deal with time-varying waveforms whose value at any time is random in nature. For example, the speech waveform, the signal received by a communication receiver, and the daily record of stock-market data all represent random quantities that change with time. How do we characterize such data? Such data are characterized as random or stochastic processes. This lecture covers the fundamentals of random processes.

Recall that a random variable maps each sample point in the sample space to a point on the real line. A random process maps each sample point to a waveform. Consider a probability space {S, F, P}. A random process can be defined on {S, F, P} as an indexed family of random variables {X(s, t), s ∈ S, t ∈ Γ}, where Γ is an index set, discrete or continuous, usually denoting time. The random process {X(s, t), s ∈ S, t ∈ Γ} is normally denoted by {X(t)}. Thus a random process is a function of the sample point ξ and of the index variable t, and may be written X(t, ξ).

Remarks
- For a fixed t (= t0), X(t0, ξ) is a random variable.
- For a fixed ξ (= ξ0), X(t, ξ0) is a single realization of the random process and is a deterministic function.
- For a fixed ξ (= ξ0) and a fixed t (= t0), X(t0, ξ0) is a single number.
- When both t and ξ are varying we have the random process X(t, ξ).

[Figure (Random Process): realizations X(t, s1), X(t, s2), X(t, s3) plotted against t for sample points s1, s2, s3 ∈ S]

Continuous-time vs. discrete-time processes
If the index set Γ is continuous, {X(t), t ∈ Γ} is called a continuous-time process.

Example Consider the sinusoidal signal X(t) = A cos ωt, where A is a binary random variable with probability mass functions pA(1) = p and pA(-1) = 1 - p. {X(t), t ∈ Γ} is a random process with two possible realizations, X1(t) = cos ωt and X2(t) = -cos ωt. At a particular time t0, X(t0) is a random variable with the two values cos ωt0 and -cos ωt0.

Example Suppose X(t) = A cos(w0 t + φ), where A and w0 are constants and φ is uniformly distributed between 0 and 2π. X(t) is an example of a continuous-time process.

[Figure: four realizations of X(t) = A cos(w0 t + φ), for φ = 0.8373π, 0.9320π, 1.6924π and 1.8636π]
If the index set Γ is a countable set, {X(t), t ∈ Γ} is called a discrete-time process. Such a random process can be represented as {X[n], n ∈ Z} and is called a random sequence. Sometimes the notation {Xn, n ≥ 0} is used to describe a random sequence indexed by the set of non-negative integers.

We can define a discrete-time random process on discrete points of time. Particularly, we can obtain a discrete-time random process {X[n], n ∈ Z} by sampling a continuous-time process {X(t), t ∈ Γ} at a uniform interval T, so that X[n] = X(nT). The discrete-time random process is the more important one in practical implementations, and advanced statistical signal processing techniques have been developed to process this type of signal.

Example Suppose Xn = √2 cos(ω0 n + Y), where ω0 is a constant and Y is a random variable uniformly distributed between -π and π. Xn is an example of a discrete-time process.

[Figure: realizations of Xn for several values of the random phase, e.g. Y = 0.4623π, 0.9720π, 1.9003π]

Example Consider the random sequence {Xn, n ≥ 0} generated by repeated tossing of a fair coin, where we assign 1 to head and 0 to tail. Clearly Xn can take only the two values 0 and 1. Hence {Xn, n ≥ 0} is a discrete-time, two-state process.

Continuous-state vs. discrete-state processes
The value of a random process X(t) at any time t can be described from its probabilistic model. The state is the value taken by X(t) at a time t, and the set of all such states is called the state space. A random process is discrete-state if the state space is finite or countable; the corresponding sample space of X(t) is then also finite or countable. Otherwise the random process is called continuous-state.

How to describe a random process?
As observed above, X(t) at a specific time t is a random variable and can be described by its probability distribution function FX(t)(x) = P(X(t) ≤ x). This is called the first-order probability distribution function. We can similarly define the first-order probability density function

fX(t)(x) = dFX(t)(x)/dx.

To describe {X(t), t ∈ Γ} we have to use the joint distributions of the random variables at all possible sets of time instants. For any positive integer n, X(t1), X(t2), ..., X(tn) represent n jointly distributed random variables. Thus a random process {X(t), t ∈ Γ} can be described by specifying, for every n ≥ 1 and all t1, ..., tn ∈ Γ, the n-th order joint distribution function

F_{X(t1),...,X(tn)}(x1, x2, ..., xn) = P(X(t1) ≤ x1, X(t2) ≤ x2, ..., X(tn) ≤ xn)

or the n-th order joint density function

f_{X(t1),...,X(tn)}(x1, ..., xn) = ∂ⁿ F_{X(t1),...,X(tn)}(x1, ..., xn) / (∂x1 ∂x2 ... ∂xn).

If {X(t), t ∈ Γ} is a discrete-state random process, it can also be specified by the collection of n-th order joint probability mass functions

p_{X(t1),...,X(tn)}(x1, ..., xn) = P(X(t1) = x1, ..., X(tn) = xn).
Moments of a random process
Just as we defined the moments of a random variable and the joint moments of random variables, we can define all possible moments and joint moments of a random process {X(t), t ∈ Γ}. Particularly, the following moments are important:

- Mean: µX(t) = E(X(t)), the mean of the random process at time t.
- Autocorrelation: RX(t1, t2) = E(X(t1)X(t2)). Note that RX(t1, t2) = RX(t2, t1) and RX(t, t) = EX²(t) is the second moment (mean-square value) at time t.
- Autocovariance: CX(t1, t2) = E(X(t1) - µX(t1))(X(t2) - µX(t2)) = RX(t1, t2) - µX(t1)µX(t2); CX(t, t) = E(X(t) - µX(t))² is the variance of the process at time t.

These moments give partial information about the process. The autocorrelation and autocovariance functions are widely used to characterize a class of random processes called wide-sense stationary processes. We can also define higher-order moments, such as the triple correlation function RX(t1, t2, t3) = E(X(t1)X(t2)X(t3)), and so on. The above definitions are easily extended to a random sequence {Xn, n ≥ 0}.

Example
(a) Gaussian random process. For any positive integer n, X(t1), X(t2), ..., X(tn) represent n jointly distributed random variables, defining the random vector X = [X(t1), X(t2), ..., X(tn)]'. The process X(t) is called Gaussian if this random vector is jointly Gaussian for every n and every choice of t1, ..., tn, with joint density

f_{X(t1),...,X(tn)}(x1, ..., xn) = (1/√((2π)ⁿ det(CX))) exp(-(1/2)(x - µX)' CX⁻¹ (x - µX)),

where µX = E(X) = [E(X(t1)), ..., E(X(tn))]' and CX = E(X - µX)(X - µX)'. The Gaussian random process is completely specified by the mean vector and the autocovariance matrix, and hence by the mean and the autocorrelation matrix RX = E(XX').

(b) Bernoulli random process. A Bernoulli process is a discrete-time random process consisting of a sequence of independent and identically distributed Bernoulli random variables: {Xn, n ≥ 0} is a Bernoulli process if P{Xn = 1} = p and P{Xn = 0} = 1 - p. The coin-tossing sequence above is a Bernoulli process with pX(1) = pX(0) = 1/2.

(c) A sinusoid with a random phase: X(t) = A cos(w0 t + φ), where A and w0 are constants and φ is uniformly distributed between 0 and 2π, i.e. fΦ(φ) = 1/(2π) for 0 ≤ φ ≤ 2π. X(t) at a particular t is a random variable, and it can be shown that

fX(t)(x) = 1/(π√(A² - x²)) for |x| < A, and 0 otherwise.

[Figure: sketch of this pdf, which peaks sharply near x = ±A]
The mean and autocorrelation of X(t):

µX(t) = EX(t) = ∫_{0}^{2π} A cos(w0 t + φ) (1/2π) dφ = 0

RX(t1, t2) = E[A cos(w0 t1 + φ) A cos(w0 t2 + φ)]
= (A²/2) E[cos(w0(t1 - t2)) + cos(w0(t1 + t2) + 2φ)]
= (A²/2) cos(w0(t1 - t2)) + (A²/2)(1/2π) ∫_{0}^{2π} cos(w0(t1 + t2) + 2φ) dφ
= (A²/2) cos(w0(t1 - t2)),

since the term containing 2φ averages to zero over a full period.
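The two ensemble averages just computed can be estimated by simulating many realizations of the phase (an illustrative sketch; the values of A, w0, t1 and t2 are arbitrary choices):

```python
# Monte Carlo estimate of the mean and autocorrelation of
# X(t) = A cos(w0*t + phi), phi ~ Uniform(0, 2*pi).
import numpy as np

rng = np.random.default_rng(2)
A, w0, M = 2.0, 2 * np.pi, 200000
t1, t2 = 0.3, 0.7
phi = rng.uniform(0, 2 * np.pi, M)
x1 = A * np.cos(w0 * t1 + phi)
x2 = A * np.cos(w0 * t2 + phi)
print("mean       ~", x1.mean(), "(exact 0)")
print("R_X(t1,t2) ~", np.mean(x1 * x2),
      "(exact", A**2 / 2 * np.cos(w0 * (t1 - t2)), ")")
```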

Two or More Random Processes
In practical situations we deal with two or more random processes; for example, we often deal with the input and the output processes of a system. To describe two or more random processes we have to use joint distribution functions and joint moments. Consider two random processes {X(t), t ∈ Γ} and {Y(t), t ∈ Γ}. For any positive integers n and m, X(t1), ..., X(tn), Y(t1'), ..., Y(tm') represent n + m jointly distributed random variables. Thus the two processes can be described by the (n + m)-th order joint distribution function

F_{X(t1),...,X(tn),Y(t1'),...,Y(tm')}(x1, ..., xn, y1, ..., ym)
= P(X(t1) ≤ x1, ..., X(tn) ≤ xn, Y(t1') ≤ y1, ..., Y(tm') ≤ ym)

or by the corresponding (n + m)-th order joint density function

f = ∂^{n+m} F / (∂x1 ... ∂xn ∂y1 ... ∂ym).

Two random processes can be partially described by their joint moments:

- Cross-correlation function at times t1, t2: RXY(t1, t2) = E(X(t1)Y(t2)).
- Cross-covariance function at times t1, t2: CXY(t1, t2) = E(X(t1) - µX(t1))(Y(t2) - µY(t2)) = RXY(t1, t2) - µX(t1)µY(t2).
- Cross-correlation coefficient: ρXY(t1, t2) = CXY(t1, t2)/√(CX(t1, t1) CY(t2, t2)).

On the basis of the above definitions, we can study the degree of dependence between two random processes.

Independent processes: {X(t), t ∈ Γ} and {Y(t), t ∈ Γ} are called independent if, for every choice of time instants, the joint distribution of the two processes factors into the product of the joint distributions of the individual processes.

Uncorrelated processes: {X(t), t ∈ Γ} and {Y(t), t ∈ Γ} are called uncorrelated if

CXY(t1, t2) = 0 ∀ t1, t2 ∈ Γ.

This also implies that for such processes RXY(t1, t2) = µX(t1) µY(t2).

Orthogonal processes: {X(t), t ∈ Γ} and {Y(t), t ∈ Γ} are called orthogonal if

RXY(t1, t2) = 0 ∀ t1, t2 ∈ Γ.

Example Suppose X(t) = A cos(w0 t + φ) and Y(t) = A sin(w0 t + φ), where A and w0 are constants and φ is uniformly distributed between 0 and 2π. Proceeding as in the earlier example, RXY(t1, t2) = (A²/2) sin(w0(t2 - t1)): the two processes are neither uncorrelated nor orthogonal in general.

Important Classes of Random Processes
Having characterized a random process by its joint distribution (density) functions and joint moments, we define the following important classes of random processes.

(a) Independent and identically distributed (iid) process
Consider a discrete-time random process {Xn}. If, for any finite choice of time instants n1, n2, ..., nN, the random variables Xn1, Xn2, ..., XnN are jointly independent with a common distribution, then {Xn} is called an independent and identically distributed (iid) random process. Thus, for an iid process,

F_{Xn1,...,XnN}(x1, ..., xN) = FX(x1) FX(x2) ... FX(xN)

and equivalently

p_{Xn1,...,XnN}(x1, ..., xN) = pX(x1) pX(x2) ... pX(xN).

Example Bernoulli process: the Bernoulli process with pX(1) = p and pX(0) = 1 - p is an iid process. Using the iid property, we can obtain joint probability mass functions of any order in terms of p. For example,

p_{X1,X2}(1, 0) = p(1 - p), p_{X1,X2,X3}(0, 0, 1) = (1 - p)² p, and so on.

Moments of the iid process: it is easy to verify that for an iid process {Xn}
- Mean: EXn = µX = constant.
- Variance: E(Xn - µX)² = σX² = constant.
- Autocovariance: CX(n, m) = E(Xn - µX)(Xm - µX) = σX² δ[n, m], where δ[n, m] = 1 for n = m and 0 otherwise; for n ≠ m the expectation factors as E(Xn - µX) E(Xm - µX) = 0.
- Autocorrelation: RX(n, m) = CX(n, m) + µX² = σX² δ[n, m] + µX².

For the Bernoulli process,

µX = EXn = p, var(Xn) = p(1 - p), and RX(n1, n2) = EXn1 Xn2 = EXn1 EXn2 = p² for n1 ≠ n2.

(b) Independent increment process
A random process {X(t)} is called an independent increment process if for any n > 1 and t1 < t2 < ... < tn ∈ Γ, the n random variables X(t1), X(t2) - X(t1), ..., X(tn) - X(tn-1) are jointly independent.

- The independent increment property simplifies the calculation of joint probability distributions from the first-order quantities. For t1 < t2 and x1 < x2,

  F_{X(t1),X(t2)}(x1, x2) = P({X(t1) ≤ x1}) P({X(t2) ≤ x2} | {X(t1) ≤ x1})
  = P({X(t1) ≤ x1}) P({X(t2) - X(t1) ≤ x2 - x1})
  = FX(t1)(x1) F_{X(t2)-X(t1)}(x2 - x1).

- It also simplifies the computation of the autocovariance function. For t1 < t2,

  RX(t1, t2) = EX(t1)X(t2) = EX(t1)(X(t1) + X(t2) - X(t1))
  = EX²(t1) + EX(t1) E(X(t2) - X(t1))
  = EX²(t1) + EX(t1) EX(t2) - (EX(t1))²
  = var(X(t1)) + EX(t1) EX(t2).

  Therefore CX(t1, t2) = RX(t1, t2) - EX(t1)EX(t2) = var(X(t1)). Similarly, for t1 > t2, CX(t1, t2) = var(X(t2)). Hence

  CX(t1, t2) = var(X(min(t1, t2))).

If the probability distribution of X(t + r) - X(t' + r) is the same as that of X(t) - X(t') for every choice of t, t' and r, then {X(t)} is called a stationary increment process. The above definitions of the independent increment process and the stationary increment process are easily extended to discrete-time random processes.

Example Two continuous-time independent increment processes are widely studied: (a) the Wiener process, whose increments follow a Gaussian distribution, and (b) the Poisson process, whose increments follow a Poisson distribution. We shall discuss these processes shortly.

Random walk process
Consider an iid process {Zn} having two states, Zn = 1 and Zn = -1, with probability mass functions pZ(1) = p and pZ(-1) = q = 1 - p. The sum process {Xn} given by

Xn = Σ_{i=1}^{n} Zi = Xn-1 + Zn, with X0 = 0,

is called a random walk process.
- It is an independent increment process. This follows from the fact that Xn - Xn-1 = Zn and {Zn} is an iid process.
- If we call Zn = 1 a success and Zn = -1 a failure, then Xn = n1 - n-1, where n1 is the number of successes and n-1 the number of failures in n trials, with n1 + n-1 = n.
- If p = 1/2, {Xn} is called a symmetric random walk process.
- This is one of the most widely studied random processes.

Probability mass function of the random walk process
At instant n, Xn can take integer values from -n to n. Suppose Xn = k. Since k = n1 - n-1 and n1 + n-1 = n,

n1 = (n + k)/2 and n-1 = (n - k)/2,

and both must be non-negative integers. Therefore

pXn(k) = C(n, (n+k)/2) p^{(n+k)/2} (1 - p)^{(n-k)/2} if (n+k)/2 and (n-k)/2 are non-negative integers, and 0 otherwise.

Mean, variance and covariance of a random walk process
Note that

EZn = 1·p - 1·(1 - p) = 2p - 1, EZn² = 1, var(Zn) = EZn² - (EZn)² = 1 - (2p - 1)² = 4pq.

∴ EXn = Σ EZi = n(2p - 1) and, since the Zi are independent random variables, var(Xn) = Σ var(Zi) = 4npq.

Since the random walk process {Xn} is an independent increment process, the autocovariance function is

CX(n1, n2) = 4pq min(n1, n2).

[Figure: three realizations of a random walk process]

Remark If the increments Zn take the values s and -s instead of ±1, then EXn = n(2p - 1)s and var(Xn) = 4npqs².

(c) Markov process
A process {X(t)} is called a Markov process if for any sequence of times t1 < t2 < ... < tn,

P({X(tn) ≤ x | X(t1) = x1, X(t2) = x2, ..., X(tn-1) = xn-1}) = P({X(tn) ≤ x | X(tn-1) = xn-1}).

Thus, for a Markov process, "the future of the process, given the present, is independent of the past." If {Xn} is a discrete-time, discrete-state random process, the process is Markov if

P({Xn = xn | X0 = x0, X1 = x1, ..., Xn-1 = xn-1}) = P({Xn = xn | Xn-1 = xn-1}).

- An iid random process is a Markov process.
- A discrete-state Markov process is called a Markov chain.
- Many practical signals with strong correlation between neighbouring samples are modelled as Markov processes.

Example Show that the random walk process {Xn} is Markov. Here

P({Xn = xn | X0 = 0, X1 = x1, ..., Xn-1 = xn-1})
= P({Xn-1 + Zn = xn | X0 = 0, X1 = x1, ..., Xn-1 = xn-1})
= P({Zn = xn - xn-1})
= P({Xn = xn | Xn-1 = xn-1}).

Wiener process
Consider a symmetric random walk process {Xn} with Xn = X(nΔ), where the discrete time instants 0, Δ, 2Δ, ..., t = nΔ are separated by Δ and the steps are ±s. Assume Δ to be infinitesimally small. Clearly,

EXn = 0 and var(Xn) = 4·(1/2)·(1/2)·n s² = ns².

For large n, the distribution of Xn approaches the normal distribution with mean 0 and variance ns². Now let Δ → 0 and n → ∞ with t = nΔ held fixed, choosing the step size so that

ns² = (t/Δ) s² = αt.

Then Xn becomes the continuous-time process X(t) with the pdf

fX(t)(x) = (1/√(2παt)) e^{-x²/(2αt)}, t ≥ 0.

This process {X(t)} is called the Wiener process.

A random process {X(t)} is called a Wiener process, or a Brownian motion process, if it satisfies the following conditions:
(1) X(0) = 0;
(2) X(t) is an independent increment process;
(3) for each s ≥ 0 and t ≥ 0, X(s + t) - X(s) has the normal distribution with mean 0 and variance αt:

f_{X(s+t)-X(s)}(x) = (1/√(2παt)) e^{-x²/(2αt)}.

Remarks
- The Wiener process was used to model Brownian motion: microscopic particles suspended in a fluid are subject to continuous molecular impacts, resulting in the zigzag motion named after the British botanist Brown.
- The Wiener process is the integral of the white noise process.

[Figure: a realization of the Wiener process]

Assuming t2 > t1,

RX(t1, t2) = EX(t1)X(t2) = EX(t1){X(t2) - X(t1) + X(t1)}
= EX(t1) E{X(t2) - X(t1)} + EX²(t1)
= EX²(t1) = αt1.

Similarly, if t1 > t2, RX(t1, t2) = αt2. Therefore

RX(t1, t2) = α min(t1, t2),

and, since the mean is zero, CX(t1, t2) = α min(t1, t2). X(t) is a Gaussian process with first-order pdf

fX(t)(x) = (1/√(2παt)) e^{-x²/(2αt)}.
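A Wiener path can be generated by accumulating independent Gaussian increments, and the autocorrelation α·min(t1, t2) checked across many paths (an illustrative sketch; α and the time grid are arbitrary choices):

```python
# Discrete approximation of a Wiener process with parameter alpha.
import numpy as np

rng = np.random.default_rng(4)
alpha, T, N, M = 2.0, 1.0, 1000, 5000
dt = T / N
dX = rng.normal(0.0, np.sqrt(alpha * dt), (M, N))  # increments ~ N(0, alpha*dt)
X = dX.cumsum(axis=1)                              # X(t), with X(0) = 0
t1, t2 = 0.4, 0.9
i1, i2 = int(t1 / dt) - 1, int(t2 / dt) - 1
print("R_X(t1,t2) ~", np.mean(X[:, i1] * X[:, i2]),
      "(exact", alpha * min(t1, t2), ")")
```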

Poisson process
Consider a random process representing the number of occurrences of an event up to time t, i.e. over the interval (0, t]. Such a process is called a counting process, denoted {N(t), t ≥ 0}. Clearly {N(t), t ≥ 0} is a continuous-time, discrete-state process, and any of its realizations is a non-decreasing function of time. Some typical examples are:
- the number of alpha particles emitted by a radioactive substance;
- the number of binary packets received at a switching node of a communication network;
- the number of cars arriving at a petrol pump during a particular interval of time.

The counting process {N(t), t ≥ 0} is called a Poisson process with rate parameter λ if
(i) N(0) = 0;
(ii) N(t) is an independent increment process: for t1 < t2 ≤ t3 < t4, the increments N(t2) - N(t1) and N(t4) - N(t3) are independent;
(iii) P({N(Δt) = 1}) = λΔt + 0(Δt);
(iv) P({N(Δt) ≥ 2}) = 0(Δt),
where 0(Δt) denotes any function such that lim_{Δt→0} 0(Δt)/Δt = 0. These assumptions are valid for many applications.

To find P({N(t) = n}), decompose according to the number of events in (t, t + Δt]:

P({N(t + Δt) = n}) = P({N(t) = n, N(Δt) = 0}) + P({N(t) = n - 1, N(Δt) = 1}) + P({N(t) < n - 1, N(Δt) ≥ 2})
= P({N(t) = n})(1 - λΔt - 0(Δt)) + P({N(t) = n - 1})(λΔt + 0(Δt)) + 0(Δt).

Therefore

lim_{Δt→0} [P({N(t + Δt) = n}) - P({N(t) = n})]/Δt = -λ[P({N(t) = n}) - P({N(t) = n - 1})]

∴ (d/dt) P({N(t) = n}) = -λ[P({N(t) = n}) - P({N(t) = n - 1})]    (1)

This is a first-order linear differential equation with initial conditions P({N(0) = 0}) = 1 and P({N(0) = n}) = 0 for n ≥ 1, and it can be solved recursively.

First consider P({N(t) = 0}). From (1),

(d/dt) P({N(t) = 0}) = -λ P({N(t) = 0}) ⇒ P({N(t) = 0}) = e^{-λt}.

Next, for P({N(t) = 1}),

(d/dt) P({N(t) = 1}) = -λ P({N(t) = 1}) + λ P({N(t) = 0}) = -λ P({N(t) = 1}) + λ e^{-λt},

with initial condition P({N(0) = 1}) = 0. Solving this first-order linear differential equation,

P({N(t) = 1}) = λt e^{-λt}.

By mathematical induction we can then show that

P({N(t) = n}) = (λt)ⁿ e^{-λt}/n!.

Remarks
(1) The parameter λ is called the rate or intensity of the Poisson process.
(2) It can be shown that

P({N(t2) - N(t1) = n}) = (λ(t2 - t1))ⁿ e^{-λ(t2-t1)}/n!.

Thus the probability of an increment depends only on the length of the interval t2 - t1 and not on the absolute times t2 and t1: the Poisson process is a process with stationary increments.

Average arrival = 30 cars/hr = ½ car/min . N (t ) is a Poisson random variable with the parameter λt. we can readily show that CN (t1 . t2 ) = CN (t1 . As N (t ) is a random process with independent increment. Find the probability that during a period of 5 minutes (i) no car comes to the station. Variance and Covariance of the Poisson process We observe that at any time t > 0. t2 ))) = λ min(t1 . t2 ) + EN (t1 ) EN (t2 ) = λ min(t1 . N (t2 ) = n2 } ) = P ({ N (t1 ) = n1}) P ({N (t2 ) = n2 } /{N (t1 ) = n1}) = P ({ N (t1 ) = n1}) P({N (t2 ) − N (t1 ) = n2 − n1} ) (λt1 ) n1 e − λt1 (λ (t2 − t1 )) n2 − n1 e − λ (t2 −t1 ) = (n2 − n1 )! n1 ! Mean. t2 ) + λ 2t1t2 A typical realization of a Poisson process is shown below: Example: A petrol pump serves on the average 30cars per hour. t2 ) = var( N (min(t1 . t2 ) ∴ RN (t1 . EN (t ) = λ t and var N (t ) = λ t Thus both the mean and variance of a Poisson process varies linearly with time.P ({ N (t1 ) = n1 . Therefore. (ii) exactly 3 cars come to the station and (iii) more than 3 cars come to the station.

(ii) P({N(5) = 3}) = (2.5)³ e^{-2.5}/3! ≈ 0.214.

(iii) P({N(5) > 3}) = 1 - Σ_{k=0}^{3} (2.5)^k e^{-2.5}/k! ≈ 1 - 0.758 = 0.242.

For comparison, a crude binomial model with P = probability of a car arriving in any one minute = 1/2 and n = 5 gives P(X = 0) = (1 - P)⁵ = (0.5)⁵ ≈ 0.031, which differs noticeably from the Poisson answer in (i).
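The three Poisson probabilities can be checked in a few lines (a sketch using the Poisson pmf with λt = 2.5 as above):

```python
# Poisson pmf checks for the petrol-pump example (lambda*t = 2.5).
import math

lam_t = 2.5
pmf = lambda k: lam_t**k * math.exp(-lam_t) / math.factorial(k)
print("P(N=0) =", round(pmf(0), 4))                             # ~0.0821
print("P(N=3) =", round(pmf(3), 4))                             # ~0.2138
print("P(N>3) =", round(1 - sum(pmf(k) for k in range(4)), 4))  # ~0.2424
```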

Inter-arrival time and waiting time of a Poisson process

[Figure: timeline with event instants t1, t2, ..., tn-1, tn and inter-arrival intervals T1, T2, ..., Tn]

Let Tn be the time elapsed between the (n-1)-st event and the n-th event; T1 is the time elapsed before the first event takes place. The random sequence {Tn, n = 1, 2, ...} describes the inter-arrival times of the Poisson process. Clearly T1 is a continuous random variable. Let us find P({T1 > t}):

P({T1 > t}) = P({0 events up to time t}) = e^{-λt}
∴ FT1(t) = 1 - P({T1 > t}) = 1 - e^{-λt} and fT1(t) = λ e^{-λt}, t ≥ 0.

Similarly,

P({Tn > t}) = P({0 events occur in (tn-1, tn-1 + t] | (n-1)-th event occurs at tn-1}) = e^{-λt}
∴ fTn(t) = λ e^{-λt}, t > 0.

Thus the inter-arrival times of a Poisson process with parameter λ are exponentially distributed with mean 1/λ.

Remarks
- The inter-arrival times are identically distributed, and, using the independent increment property, we can easily prove that they are also independent; for example,

  FT1,T2(t1, t2) = P({T1 ≤ t1}) P({zero events in (T1, T1 + t2]} | {one event in (0, T1]}) = FT1(t1) FT2(t2).

- The exponential distribution of the inter-arrival times indicates that the arrival process has no memory:

  P({Tn > t0 + t1 | Tn > t1}) = P({Tn > t0}) ∀ t0, t1.

- It is interesting to note that the converse of the above result is also true: if the inter-arrival times between events of a discrete-state counting process {N(t), t ≥ 0} are iid exponential with mean 1/λ, then {N(t), t ≥ 0} is a Poisson process with parameter λ.

Another important quantity is the waiting time Wn, the time that elapses before the n-th event occurs:

Wn = Σ_{i=1}^{n} Ti.

Note that Wn is the sum of n independent and identically distributed exponential random variables. How to find the first-order pdf of Wn is left as an exercise.
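The converse statement suggests a simulation recipe: generate iid exponential inter-arrival times and count events (an illustrative sketch; the rate matches the petrol-pump example):

```python
# Build a Poisson process from exponential inter-arrival times.
import numpy as np

rng = np.random.default_rng(5)
lam, t, M = 0.5, 5.0, 100000
# 30 inter-arrivals per trial is ample for lambda*t = 2.5
W = rng.exponential(1 / lam, (M, 30)).cumsum(axis=1)   # waiting times W_n
N = (W <= t).sum(axis=1)                               # N(t) per trial
print("P(N=0) ~", np.mean(N == 0), "(exact", np.exp(-lam * t), ")")
print("E N(t) ~", N.mean(), "(exact", lam * t, ")")
```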

From the iid property of the Ti,

EWn = Σ ETi = n/λ and var(Wn) = Σ var(Ti) = n/λ².

Example The number of customers arriving at a service station is a Poisson process with a rate of 10 customers per minute.
(a) What is the mean arrival time of the customers?
(b) What is the probability that the second customer will arrive 5 minutes after the first customer has arrived?
(c) What is the average waiting time before the 10th customer arrives?

Semi-random telegraph signal
Consider a two-state random process {X(t)} with the states X(t) = 1 and X(t) = -1. Suppose X(t) = 1 if the number of events of a Poisson process N(t) in the interval (0, t] is even, and X(t) = -1 if it is odd. {X(t)} is called the semi-random telegraph signal because the initial value X(0) is always 1. Then

pX(t)(1) = P({X(t) = 1}) = P({N(t) = 0}) + P({N(t) = 2}) + ... = e^{-λt}(1 + (λt)²/2! + ...) = e^{-λt} cosh λt

pX(t)(-1) = P({X(t) = -1}) = P({N(t) = 1}) + P({N(t) = 3}) + ... = e^{-λt}(λt + (λt)³/3! + ...) = e^{-λt} sinh λt.

We can also find the conditional and joint probability mass functions. For example, for t1 < t2:

p_{X(t1),X(t2)}(1, 1) = P({X(t1) = 1}) P({X(t2) = 1} | {X(t1) = 1})
= e^{-λt1} cosh λt1 · P({N(t2) - N(t1) is even})
= e^{-λt1} cosh λt1 · e^{-λ(t2-t1)} cosh λ(t2 - t1)
= e^{-λt2} cosh λt1 cosh λ(t2 - t1).

Similarly,

p_{X(t1),X(t2)}(1, -1) = e^{-λt2} cosh λt1 sinh λ(t2 - t1)
p_{X(t1),X(t2)}(-1, 1) = e^{-λt2} sinh λt1 sinh λ(t2 - t1)
p_{X(t1),X(t2)}(-1, -1) = e^{-λt2} sinh λt1 cosh λ(t2 - t1).

Mean, autocorrelation and autocovariance function of X(t)

Random Telegraph signal Consider a two-state random process {Y (t )} with the states Y (t ) = 1 and Y (t ) = −1. 1 1 and Y (t ) changes polarity with an Suppose P ({Y (0) = 1}) = and P ({Y (0) = −1}) = 2 2 equal probability with each occurrence of an event in a Poisson process of parameter λ . Such a random process {Y (t )} is called a random telegraph signal and can be expressed as Y (t ) = AX (t ) where { X (t )} is the semirandom telegraph signal and A is a random variable 1 1 independent of X (t ) with P ({ A = 1}) = and P ({ A = −1}) = . 2 2 Clearly, 1 1 EA = (−1) × + 1× = 0 2 2 and 1 1 EA2 = (−1) 2 × + 12 × = 1 2 2 Therefore, EY (t ) = EAX (t ) = EAEX (t ) ∵ A and X (t ) are independent =0 RY (t1 , t2 ) = EAX (t1 ) AX (t2 )

= EA2 EX (t1 ) X (t2 ) = e −2 λ t1 −t2

**Stationary Random Process
**

The concept of stationarity plays an important role in solving practical problems

involving random processes. Just like time-invariance is an important characteristics of many deterministic systems, stationarity describes certain time-invariant property of a random process. Stationarity also leads to frequency-domain description of a random process.

Strict-sense stationary process
A random process {X(t)} is called strict-sense stationary (SSS) if its probability structure is invariant with time. In terms of the joint distribution functions, {X(t)} is called SSS if

F_{X(t1),X(t2),...,X(tn)}(x1, x2, ..., xn) = F_{X(t1+t0),X(t2+t0),...,X(tn+t0)}(x1, x2, ..., xn)

∀ n ∈ N, ∀ t0 ∈ Γ and for all choices of sample points t1, t2, ..., tn ∈ Γ. Thus the joint distribution functions of any set of random variables X(t1), X(t2), ..., X(tn) do not depend on the placement of the origin of the time axis.

This is a very strict requirement, and less strict forms of stationarity may be defined. Particularly, if the above relation holds only for n = 1, 2, ..., k, then {X(t)} is called k-th order stationary.

- If {X(t)} is stationary up to order 1,

  FX(t1)(x1) = FX(t1+t0)(x1) ∀ t0 ∈ Γ.

  Taking t0 = -t1, FX(t1)(x1) = FX(0)(x1), which is independent of time. As a consequence,

  EX(t1) = EX(0) = µX(0) = constant.

- If {X(t)} is stationary up to order 2,

  F_{X(t1),X(t2)}(x1, x2) = F_{X(t1+t0),X(t2+t0)}(x1, x2).

  Putting t0 = -t2,

  F_{X(t1),X(t2)}(x1, x2) = F_{X(t1-t2),X(0)}(x1, x2).

  This implies that the second-order distribution depends only on the time lag t1 - t2.
As a consequence, for such a process,

RX(t1, t2) = E(X(t1)X(t2)) = ∫_{-∞}^{∞} ∫_{-∞}^{∞} x1 x2 f_{X(t1-t2),X(0)}(x1, x2) dx1 dx2 = RX(t1 - t2).

Similarly, CX(t1, t2) = CX(t1 - t2). Therefore the autocorrelation function of an SSS process depends only on the time lag t1 - t2.

We can also define the joint stationarity of two random processes.

Two processes {X(t)} and {Y(t)} are called jointly strict-sense stationary if their joint probability distributions of any order are invariant under the translation of time. A complex process {Z(t) = X(t) + jY(t)} is called SSS if {X(t)} and {Y(t)} are jointly SSS.

Example An iid process is SSS. This is because, for every n,

F_{X(t1),X(t2),...,X(tn)}(x1, x2, ..., xn) = FX(x1) FX(x2) ... FX(xn) = F_{X(t1+t0),X(t2+t0),...,X(tn+t0)}(x1, x2, ..., xn).

Example The Poisson process {N(t), t ≥ 0} is not stationary, because

EN(t) = λt,

which varies with time.
Wide-sense stationary process
It is very difficult to test whether a process is SSS or not. A subclass of the SSS processes, the wide-sense stationary processes, is extremely important from a practical point of view.

A random process {X(t)} is called wide-sense stationary (WSS) if

EX(t) = µX = constant, and EX(t1)X(t2) = RX(t1 - t2) is a function of the time lag t1 - t2 only.

(Equivalently, Cov(X(t1), X(t2)) = CX(t1 - t2) is a function of the time lag only.)

Remarks
(1) For a WSS process {X(t)},

EX²(t) = RX(0) = constant
var(X(t)) = EX²(t) - (EX(t))² = constant
CX(t1, t2) = EX(t1)X(t2) - EX(t1)EX(t2) = RX(t2 - t1) - µX²,

so CX(t1, t2) is a function of the lag t2 - t1.

(2) An SSS process is always WSS, but the converse is not always true.

Example: Sinusoid with random phase
Consider the random process {X(t)} given by X(t) = A cos(w0 t + φ), where A and w0 are constants and φ is uniformly distributed between 0 and 2π.
- This is the model of the carrier wave (a sinusoid of fixed frequency) used to analyse the noise performance of many receivers.
Note that

fΦ(φ) = 1/(2π) for 0 ≤ φ ≤ 2π, and 0 otherwise.

By applying the rule for the transformation of a random variable, we get

fX(t)(x) = 1/(π√(A² - x²)) for -A ≤ x ≤ A, and 0 otherwise,

which is independent of t. Hence {X(t)} is first-order stationary. Moreover, as computed earlier,

EX(t) = ∫_{0}^{2π} A cos(w0 t + φ) (1/2π) dφ = 0, a constant,

RX(t1, t2) = EA² cos(w0 t1 + φ) cos(w0 t2 + φ)
= (A²/2) [E cos(w0(t1 + t2) + 2φ) + cos(w0(t1 - t2))]
= (A²/2) cos(w0(t1 - t2)),

which is a function of the lag t1 - t2 only. Hence {X(t)} is wide-sense stationary.

Example: Sinusoid with random amplitude
Consider the random process {X(t)} given by X(t) = A cos(w0 t + φ), where now φ and w0 are constants and A is a random variable. Here

EX(t) = EA · cos(w0 t + φ),

which is independent of time only if EA = 0, and

RX(t1, t2) = EA² · cos(w0 t1 + φ) cos(w0 t2 + φ) = (EA²/2) [cos(w0(t1 + t2) + 2φ) + cos(w0(t1 - t2))],

which is not a function of (t1 - t2) only. This process is not WSS.

Example: Random binary wave
Consider a binary random process {X(t)} consisting of a sequence of random pulses of duration T with the following features:
- the pulse amplitude Ak is a random variable with the two values pAk(1) = 1/2 and pAk(-1) = 1/2;
- pulse amplitudes in different pulse intervals are independent of each other;
- the start time of the pulse sequence can be any value between 0 and T; the random delay D is uniformly distributed between 0 and T.

Such waveforms are used in binary communication: a pulse of amplitude 1 is used to transmit '1' and a pulse of amplitude -1 is used to transmit '0'. The random process can be written as

X(t) = Σ_{n=-∞}^{∞} An rect((t - nT - D)/T).

[Figure: a realization of the random binary wave]

For any t,

EX(t) = 1·(1/2) + (-1)·(1/2) = 0 and EX²(t) = 1²·(1/2) + (-1)²·(1/2) = 1,

so the mean and variance of the process are constants.

To find the autocorrelation function RX(t1, t2), consider the case 0 < t1 < t2 = t1 + τ with τ ≤ T. Depending on the delay D, the points t1 and t2 may lie in one or two pulse intervals:

X(t1)X(t2) = 1 for 0 < D < t1 or t2 < D < T (both samples in the same pulse),

while for t1 < D < t2 the two samples come from different, independent pulses, so their product is ±1 with equal probability and E(X(t1)X(t2) | t1 < D < t2) = 1·(1/2) - 1·(1/2) = 0. Hence

RX(t1, t2) = E[E(X(t1)X(t2) | D)] = 1·(1 - (t2 - t1)/T) + 0·((t2 - t1)/T) = 1 - (t2 - t1)/T, for t2 - t1 ≤ T.

We also have RX(t2, t1) = EX(t2)X(t1) = EX(t1)X(t2) = RX(t1, t2). For |t2 - t1| > T, t1 and t2 always lie in different pulse intervals, so

EX(t1)X(t2) = EX(t1) EX(t2) = 0.

Thus the autocorrelation function of the random binary waveform depends only on τ = t2 - t1:

RX(τ) = 1 - |τ|/T for |τ| ≤ T, and 0 otherwise.

[Figure: triangular RX(τ), zero outside [-T, T]]

Hence the random binary wave is WSS.

± 2π ω . Rx (0) represents the average power of the signal. For a power signal x(t ). We shall discuss these properties next. Example Suppose x(t ) = A cos ω t. In this sense. The autocorrelation function of x(t) at lag τ is given by Rx (τ ) = lim 1 T ∫ A cos ω (t + τ ) A cos ω tdt T →∞ 2T −T A2 T = lim ∫ [cos(2ω t + τ ) + cos ωτ ]dt T →∞ 4T −T A2 cos ωτ = 2 We see that Rx (τ ) of the above periodic signal is also periodic and its maximum occurs when τ = 0. Particularly. The power of the signal is Rx (0) = A2 . Rx (0) = lim 1 T 2 ∫ x (t ) dt is the mean-square value. then Rx (0) is the average power delivered to the resistance. If x(t ) is a voltage T →∞ 2T −T waveform across a 1 ohm resistance.Properties Autocorrelation Function of a real WSS Random Process Autocorrelation of a deterministic signal Consider a deterministic signal x(t ) such that 0 < lim 1 T 2 ∫ x (t ) dt < ∞ T →∞ 2T − T Such signals are called power signals. 2 The autocorrelation of the deterministic signal gives us insight into the properties of the autocorrelation function of a WSS process. function is defined as Rx (τ ) = lim 1 T ∫ x (t + τ ) x (t ) dt T →∞ 2T −T the autocorrelation Rx (τ ) measures the similarity between a signal and its time-shifted version.± 2π ω . . etc.

For a discrete random sequence. The autocorrelation function is an important function charactersing a WSS random process. RX (τ ) is an RX ( −τ ) = R X (τ ). Thus. RX (τ ) ≤ RX (0). we can redefine a one-parameter autocorrelation function as R X (τ ) = EX (t + τ ) X (t ) If { X (t )} is a complex WSS process. If X (t) is a voltage signal applied across a 1 ohm resistance.Properties of the autocorrelation function of a WSS process Consider a real WSS process { X (t )} . 1. 2. RX (−τ ) = R* (τ ) X 3. We briefly describe them below. X (t + τ ) > ≤ X (t ) 2 2 X (t + τ ) 2 We have . RX (−τ ) = EX (t − τ ) X (t ) = EX (t ) X (t − τ ) = EX (t1 + τ ) X (t1 ) ( Substituting t1 = t − τ ) = RX (τ ) Remark For a complex process X(t). Thus. This follows from the Schwartz inequality < X (t ). then RX (τ ) = EX (t + τ ) X *(t ) where X * (t ) is the complex conjugate of X (t ). RX (0) = EX 2 (t ) ≥ 0. Since the autocorrelation function RX (t1 . even function of the time τ . RX (0) = EX 2 (t ) is the mean-square value of the process. It possesses some general properties. then RX (0) is the ensemble average power delivered to the resistance. t2 ) of such a process is a function of the lag τ = t1 − t2 . we can define the autocorrelation sequence similarly. For a real WSS process X (t).

RX (τ ) is a positive semi-definite function in the sense that for any positive integer n and real a j . If X (t ) is MS periodic. then R X (τ ) is also periodic with the same period. even and positive semidefinite. 5. Proof: Note that a real WSS random process { X (t )} is called mean-square periodic ( MS periodic) with a period T p if for every t ∈ Γ E ( X (t + Tp ) − X (t )) 2 = 0 ⇒ EX 2 (t + Tp ) + EX 2 (t ) − 2 EX (t + Tp ) X (t ) = 0 ⇒ RX (0) + RX (0) − 2 RX (Tp ) = 0 ⇒ RX (Tp ) = RX (0) Again . a j .2 RX (τ ) = {EX (t ) X (t + τ )}2 = EX 2 (t ) EX 2 (t + τ ) = RX (0) RX (0) ∴ RX (τ ) <RX (0) 4. i =1 j =1 ∑ ∑ ai a j RX (ti − t j )>0 n n Proof Define the random variable Y = ∑ ai X (ti ) j =1 n Then we have 0 ≤ EY 2 = ∑ ∑ ai a j EX (ti )X (t j ) i =1 j =1 n n n = ∑ ∑ ai a j RX (ti −t j ) i =1 j =1 n It can be shown that the sufficient condition for a function RX (τ ) to be the autocorrelation function of a real WSS process { X (t )} is that RX (τ ) be real.

We can write the crosscorrelation function . X (t ) = A cos( w0 t + φ ) where A and w0 are constants and 2π φ ~ U [0. Then τ →∞ τ →∞ lim RX (τ ) = µ 2 X Interpretation of the autocorrelation function of a WSS process The autocorrelation function R X (τ ) measures the correlation between two random variables X (t ) and X (t + τ ). Its autocorrelation ω0 function A2 cos ω 0τ 2π is periodic with the same period . 6. If R X (τ ) is periodic with period T p then RX (τ ) = X (t ) is MS periodic with a period T p . Such a signal has high frequency components. This property helps us in determining time period of a MS periodic random process. If R X (τ ) drops slowly. Later on we see that R X (τ ) is directly related to the frequency -domain representation of a WSS process. then the X (t ) and X (t + τ ) will be less correlated for large τ . 2 ω0 The converse of this result is also true.E (( X (t + τ + Tp ) − X (t + τ )) X (t )) 2 ≤ E ( X (t + τ + Tp ) − X (t + τ )) 2 EX 2 (t ) ⇒ ( RX (τ + Tp ) − RX (τ )) 2 ≤ 2( RX (0) − RX (Tp )) RX (0) ⇒ ( RX (τ + Tp ) − RX (τ )) 2 ≤ 0 ∴ RX (τ + Tp ) = RX (τ ) ∵ RX (0) = RX (Tp ) For example. Cross correlation function of jointly WSS processes If { X (t )} and {Y (t )} are two real jointly WSS random processes. the signal samples are highly correlated and such a signal has less high frequency components. This in turn means that the signal has lot of changes with respect to time. 2π ]. their cross-correlation functions are independent of t and depends on the time-lag. Suppose X (t ) = µ X + V (t ) where V (t ) is a zero-mean WSS process and lim RV (τ ) = 0. If R X (τ ) drops quickly with respect to τ . Fig. is MS periodic random process with a period .

then RXY (τ ) = EX (t + τ ) EY (t ) = µ X µY (iv) If X (t) and Y (t) is orthogonal process. RXY (τ ) = RYX (−τ ) (ii ) RXY (τ ) ≤ RX (0) RY (0) We have τ RXY (τ ) = EX (t + τ )Y (t ) 2 2 ≤ EX 2 (t + τ ) EY 2 (t ) using Cauch-Schwarts Inequality = RX (0) RY (0) ∴ RXY (τ ) ≤ RX (0) RY (0) Further.RXY (τ ) = EX (t + τ )Y (t ) The cross correlation function satisfies the following properties: (i ) RXY (τ ) = RYX (−τ ) This is because RXY (τ ) = EX (t + τ )Y (t ) = EY (t ) X (t + τ ) = RYX (−τ ) RYX (τ ) RXY (τ ) O Fig. RXY (τ ) = EX ( t + τ ) Y ( t ) = 0 . 1 RX (0) RY (0) ≤ ( RX (0) + RY (0) ) ∵ Geometric mean ≤ Arithmatic mean 2 1 ∴ RXY (τ ) ≤ RX (0) RY (0) ≤ ( RX (0) + RY (0) ) 2 (iii) If X (t) and Y (t) are uncorrelated.

We have Z (t ) = X (t ) + Y (t ) RZ (τ ) = EZ (t + τ ) Z (t ) = E[ X (t + τ ) + Y (t + τ )][ X (t ) + Y (t )] = RX (τ ) + RY (τ ) + RXY (τ ) + RYX (τ ) If X (t ) and Y (t ) are orthogonal processes.Example Consider a random process Z (t ) which is sum of two real jointly WSS random processes X(t) and Y(t).then RXY (τ ) = RYX (τ ) = 0 ∴ RZ (τ ) = RX (τ ) + RY (τ ) .

mean-square continuity. The continuity of the random process can be defined with the help of convergence and limits of a random process. t0 ) + RX ( t0 . and continuity in probability etc.m. We shall discuss the mean-square continuity and the elementary concepts of corresponding mean-square calculus.i. Proof: E ⎡ X ( t ) − X ( t0 ) ⎤ = E ( X 2 ( t ) − 2 X ( t ) X ( t 0 ) + X 2 ( t 0 ) ) ⎣ ⎦ 2 = R X ( t . the behavior of the RC network is described by a first order linear differential equation with the source voltage as the input and the capacitor voltage as the output. t ) − 2 RX ( t . t2 ) is continuous at (t0 . Mean-square continuity of a random process Recall that a sequence of random variables { X n } converges to a random variable X if lim E [ X n − X ] = 0 2 n →∞ and we write l. X ( t ) = X ( t0 ) or equvalently t →t0 lim E ⎡ X ( t ) − X ( t0 ) ⎤ = 0 ⎣ ⎦ t → t0 2 Mean-square continuity and autocorrelation function (1) A random process { X ( t )} is MS continuous at t0 if its auto correlation function RX ( t1 . t0 ). t0 ).i. For example.Continuity and Differentiation of Random Processes • We know that the dynamic behavior of a system is described by differential equations involving input to and output of the system. We can define continuity with probability 1. then . X n = X n →∞ A random process { X ( t )} is said to be continuous at a point t = t0 in the mean-square sense if l. Our aim is to extend these concepts to the ensemble of realizations and apply calculus to random process. t2 ) is continuous at (t0 . What happens if the input voltage to the network is a random process? Each realization of the random process is a deterministic function and the concepts of differentiation and integration can be applied to it. t0 ) If RX ( t1 . • We discussed the convergence and the limit of a random sequence.m.

.lim E ⎡ X ( t ) − X ( t0 ) ⎤ = lim RX ( t . Example Consider the random binary wave { X (t )} discussed in Eaxmple . X (t ) … 0 1 Tp Tp t −1 … The process has the autocorrelation function given by ⎧ τ τ ≤ Tp ⎪1 − RX (τ ) = ⎨ Tp ⎪0 otherwise ⎩ We observe that RX (τ ) is continuous at τ = 0. Therefore. RX (τ ) is continuous at all τ . This follows from the fact that ( E ⎡ X ( t ) − X ( t )⎤ ) ⎣ ⎦ 0 t →t0 2 ≤ E ⎡ ( X ( t ) − X ( t0 ) ) ⎤ ⎣ ⎦ 2 2 ⎡ ⎣ ∴ lim ⎣ E ⎡ X ( t ) − X ( t0 ) ⎤ ⎦ ≤ lim E ⎡ X ( t ) − X ( t0 ) ⎤ = 0 ⎦⎤ ⎣ ⎦ 2 t → t0 ∴ EX ( t ) is continuous at t0 . Atypical realization of the process is shown in Fig. t0 ) =0 (2) If { X ( t )} is MS continuous at t0 its mean is continuous at t0 . t ) − 2 RX ( t . The realization is a discontinuous function. t0 ) − 2 R X ( t0 . below. t0 ) ⎣ ⎦ 2 t →t0 t → t0 = R X ( t 0 . t0 ) + RX ( t0 . t0 ) + R X ( t 0 .

Then for each φ . t2 ) The m-s derivative of a random process X ( t ) at a point t ∈ Γ exists if ∂t1∂t2 exists at the point (t . the condition for existence of m-s derivative is ⎡ X ( t + ∆t1 ) − X ( t ) X ( t + ∆t2 ) − X ( t ) ⎤ − lim E ⎢ ⎥ =0 ∆t1 . t ) = α t Thus the autocorrelation function of a Wiener process is continuous everywhere implying that a Wiener process is m. 2π ]. Mean-square differentiability The random process { X ( t )} is said to have the mean-square derivative X ' ( t ) at a point t ∈ Γ.s.s. Therefore. continuous everywhere.S. t ) = α min ( t . X (t ) is differentiable.s. . ∆t2 →0 ∆t1 ∆t2 ⎣ ⎦ 2 Expanding the square and taking expectation results. We can similarly show that the Poisson process is m. In other words. the random process { X ( t )} has a m-s derivative X ' ( t ) if provided ⎡ X ( t + ∆t ) − X ( t ) ⎤ lim E ⎢ − X ' ( t )⎥ = 0 ∆t →0 ∆t ⎣ ⎦ 2 Remark (1) If all the sample functions of a random process X ( t ) is differentiable. derivative is X ′(t ) = − Aw0 sin( w0t + φ ) M.Example For a Wiener process { X ( t )} . Applying the Cauchy criterion. Derivative and Autocorrelation functions ∂ 2 RX ( t1 . the m. X ( t + ∆t ) − X ( t ) approaches X ' ( t ) in the mean square sense as ∆t ∆t → 0 . t2 ) = α min ( t1 . RX ( t1 . ∴ RX ( t . t ). t2 ) where α is a constant. Example Consider the random-phase sinusoid { X (t )} given by X (t ) = A cos( w0 t + φ ) where A and w0 are constants and φ ~ U [0. continuous everywhere. then the above condition is satisfied and the m-s derivative exists.

if X ( t ) is WSS. t ) − 2 RX ( t + ∆t1 . ∆t2 → 0 ∆t12 ∆t 2 2 ⎣ ⎦ ⎣ ⎦ ⎡ R ( t + ∆t1 . t ) ⎤ −2⎢ X ⎥ ∆t1∆t2 ⎣ ⎦ 2 Each of the above terms within square bracket converges to second partial derivative exists. t ) ⎤ = lim ⎢ X ⎥+⎢ ⎥ ∆t1 . ∆t2 →0 ∆t1 ∆t 2 ∂t1∂t2 ∂t1∂t2 ∂t1∂t2 ⎦ t =t . ⎜ ⎟ ∂t1 ⎝ dτ ∂t2 ⎠ d 2 RX (τ ) ∂τ dτ 2 ∂t1 =− d 2 RX (τ ) =− dτ 2 . t + ∆t2 ) + RX ( t .t 1 if the 2 =t ⎡ X ( t + ∆t1 ) − X ( t ) X ( t + ∆t2 ) − X ( t ) ⎤ ∂ 2 RX ( t1 . t + ∆t1 ) + RX ( t . we get ∂ 2 RX ( t1 . t2 ) ⎤ ⎥ ∂t1∂t2 ⎦ t =t . { X ( t )} is m-s differentiable at t ∈ Γ if ∂ 2 RX ( t1 . ∂ 2 RX ( t1 . ∂t1∂t2 Particularly. t ) − RX ( t . t + ∆t2 ) − RX ( t + ∆t1 . t ) ∈ Γ × Γ. t2 ) exists at (t .⎡ X ( t + ∆t1 ) − X ( t ) X ( t + ∆t2 ) − X ( t ) ⎤ − lim E ⎢ ⎥ ∆t1 . ∆t2 →0 ∆t1 ∆t 2 ⎣ ⎦ ⎡ R ( t + ∆t1 . RX ( t1 . t2 ) = RX ( t1 − t2 ) Substituting t1 − t2 = τ . t2 ) ∂ 2 RX ( t1 . t2 ) ⎤ ∴ lim E ⎢ − = + −2 ⎥ ⎥ ∆t1 . t2 ) ∂ 2 RX ( t1 − t2 ) = ∂t1∂t2 ∂t1∂t2 = ∂ ⎛ dRX (τ ) ∂ ( t1 − t2 ) ⎞ . t ) − 2 RX ( t + ∆t2 . t ) ⎤ ⎡ RX ( t + ∆t2 . t2 ) ∂ 2 RX ( t1 .t =t ⎣ ⎦ 1 2 2 =0 Thus. t + ∆t2 ) + RX ( t .

t2 ) ∴ does not exist at (t1 = 0. t2 ) = ⎨ 2 ⎩0 other wise ⎧α if t2 < 0 ∂RX ( 0. RX ( t1 . . Mean and Autocorrelation of the Derivative process We have. ⎧α t if t2 < 0 ∴ RX ( 0. t2 ) ⎪ ∴ = ⎨0 if t2 > 0 ∂t1 ⎪does not exist if if t = 0 2 ⎩ 2 ∂ RX ( t1 . a WSS process X ( t ) is m-s differentiable if RX (τ ) has second derivative at τ = 0. differentiable nowhere. Therefore. { X ( t )} is not mean-square differentiable. t2 ) = α min ( t1 . t2 ) where α is a constant.Therefore.. { X ( t )} is not mean-square differentiable. Example Consider a WSS process { X ( t )} with autocorrelation function RX (τ ) = exp ( −a τ ) RX (τ ) does not have the first and second derivative at τ = 0. Example For a Wiener process { X ( t )} . t2 = 0) ∂t1∂t2 Thus a Wiener process is m.s. Example The random binary wave { X ( t )} has the autocorrelation function ⎧ τ ⎪1 − RX (τ ) = ⎨ Tp ⎪0 ⎩ τ ≤ Tp otherwise RX (τ ) does not have the first and second derivative at τ = 0.

EX ( t1 ) X ' ( t2 ) = RXX ' ( t1 . t2 + ∆t2 ) − RX ( t1 . t2 ) = EX ( t1 ) lim = lim = lim = X ( t 2 + ∆t 2 ) − X ( t 2 ) ∆t2 →0 ∆t2 ∆t2 E ⎡ X ( t1 ) X ( t2 + ∆t2 ) − X ( t1 ) X ( t2 ) ⎤ ⎣ ⎦ RX ( t1 . t2 ) EX ' ( t1 ) X ' ( t2 ) = ∂t1∂t2 For a WSS process EX ( t1 ) X ' ( t2 ) = and EX ′ ( t1 ) X ′ ( t2 ) = RX ′ ( t1 − t2 ) = ∴ var( X ′ ( t ) = d 2 RX (τ ) dτ 2 d 2 RX (τ ) dτ 2 τ =0 dR (τ ) ∂ RX ( t1 − t2 ) = X ∂t dτ . t2 ) ∂t2 Similarly we can show that ∂ 2 RXX ( t1 .EX ' ( t ) = E lim X ( t + ∆t ) − X ( t ) ∆t EX ( t + ∆t ) − EX ( t ) = lim ∆t →0 ∆t µ ( t + ∆t ) − µ X ( t ) = lim X ∆t →0 ∆t = µ X '(t ) ∆t →0 For a WSS process EX ' ( t ) = µ X ' ( t ) = 0 as µ X ( t ) = constant. t2 ) ∆t2 ∆t2 →0 ∆t2 →0 ∂RX ( t1 .

.. integrable.τ ) dτ dτ X 1 2 1 t t 2 exists.Mean Square Integral Recall that the definite integral (Riemannian integral) of a function x ( t ) over the interval [t0 . t0 t n →∞ ∆k →0 ∑ X (τ )∆ k =0 k n −1 k Existence of M. < tn −1 < tn = t ∆ k = tk +1 − tk and τ k ∈ [tk −1 .. are partitions on the interval [ t0 . the m-s integral can be similarly defined as the process {Y ( t )} given by Y ( t ) = ∫ X (τ )dτ = l... t0 t0 ∫ ∫ R (τ .. continuous.. t ] is defined as the limiting sum given by t0 ∫ x (τ )dτ = t n →∞ ∆k →0 lim ∑ x (τ ) ∆ k =0 k n −1 k Where t0 < t1 < .S... Integral • It can be shown that the sufficient condition for the m-s integral exist is that the double integral t0 ∫ X (τ )dτ t to • If { X ( t )} is M.S.m.S.. tk ] . then the above condition is satisfied and the process is M. t ] and For a random process { X ( t )} . ......i.

S. Example The random binary wave { X ( t )} has the autocorrelation function .Mean and Autocorrelation of the Integral of a WSS process We have EY ( t ) = E ∫ X (τ )dτ t0 t = ∫ EX (τ )dτ t0 t t = ∫ µ X dτ t0 Therefore. if µ X ≠ 0. integral of a random process has physical importance – the output of an integrator due to stationary noise rises unboundedly. {Y ( t )} is necessarily non-stationary. Fig. t2 ) = EY (t1 )Y (t2 ) t1 t2 = µ X (t − t0 ) = E ∫ ∫ X (τ 1 ) X (τ 2 ) dτ 1dτ 2 t0 t0 = ∫ ∫ EX (τ 1 ) X (τ 2 ) dτ 1dτ 2 t0 t0 t1 t2 = ∫ ∫ RX (τ 1 − τ 2 ) dτ 1dτ 2 t0 t0 t1 t2 which is a function of t1 and t2 . Thus the integral of a WSS process is always non-stationary. RY ( t1 . (a) Realization of a WSS process X (t ) (b) corresponding integral Y (t ) Remark The nonstatinarity of the M.

RX (τ ) is continuous at τ = 0 implying that { X ( t )} is M.⎧ τ ⎪1 − RX (τ ) = ⎨ Tp ⎪0 ⎩ τ ≤ Tp otherwise { X ( t )} is mean-square integrable. Therefore.S. . continuous.

g ( X (t )) T = Similarly. the time-average of a function g ( X n ) of a continuous random process { X n } is defined by g(X n ) N = N 1 ∑ g( Xi ) 2 N + 1 i =− N The above definitions are in contrast to the corresponding ensemble average defined by Eg ( X (t )) = ∫ g ( x) f X ( t ) ( x)dx −∞ ∞ for continuous case for discrete case = i∈RX ( t ) ∑ g ( xi ) p X (t ) ( xi ) The following time averages are of particular interest: (a) Time-averaged mean . Another important average used in electrical engineering is the rms value given by 1 T 2 xrms T = lim x (t )dt T →∞ 2T ∫−T Can µ x T and xrms T represent µ X and EX 2 (t ) respectively ? To answer such a question we have understand various time averages and their properties. For example we can compute the time-mean of a single realization of the random process by the formula 1 T x (t )dt T →∞ 2T ∫−T which is constant for the selected realization. Time averages of a random process The time-average of a function g ( X (t )) of a continuous random process { X ( t )} is defined by 1 T g ( X (t )) dt 2T ∫−T where the integral is defined in the mean-square sense.Time averages and Ergodicity Often we are interested in finding the various ensemble averages of a random process { X ( t )} by means of the corresponding time averages determined from single realization of the random process. µ x µx T = lim T represents the dc value of x(t ).

i ≠ j j =− N ⎦ ( 2 N + 1) ⎣i =− N 2 If the samples X − N .. Mean and Variance of the time averages Let us consider the simplest case of the time averaged mean of a discrete-time WSS random process { X n } given by N 1 ∑ Xi 2 N + 1 i =− N The mean of µ X N µX N = E µX N =E N 1 ∑ Xi 2 N + 1 i =− N N 1 = ∑ EX i 2 N + 1 i =− N =µ X and the variance E µX ( N − µX ) 2 N ⎛ 1 ⎞ = E⎜ ∑ Xi − µX ⎟ ⎝ 2 N + 1 i =− N ⎠ 2 N ⎛ 1 ⎞ = E⎜ ∑ (Xi − µX )⎟ ⎝ 2 N + 1 i =− N ⎠ N N N 1 ⎡ ⎤ 2 = ∑ E( X i − µ X ) + 2 ∑ ∑ E ( X i − µ X )( X j − µ X ) ⎥ 2 ⎢ i =− N . We shall further assume that the random processes { X ( t )} and { X n } are WSS. determination of these distribution functions is difficult and we shall discuss the behaviour of these averages in terms of their mean and variances. X 1 .... .µX µX T = = 1 2T ∫ T −T X (t )dt (continuous case) (discrete case) N N 1 ∑ Xi 2 N + 1 i =− N (b) Time-averaged autocorrelation function RX (τ ) R X [ m] T = = 1 2T ∫ T −T X (t ) X (t + τ )dt (continuous case) (discrete case) N N 1 ∑ X i X i+m 2 N + 1 i =− N Note that. X N are uncorrelated... However.... X − N +1 . X 2 . g ( X (t )) T and g ( X n ) N are functions of random variables and are governed by respective probability distributions..

the above double integral is converted to a single integral as follows: Therefore. we conclude that µ X Let us consider the time-averaged mean for the continuous case. ⎯⎯⎯ µ X → From the above result.E ( µX N − µX ) 2 N ⎛ 1 ⎞ = E⎜ ∑ Xi − µX ⎟ ⎝ 2 N + 1 i =− N ⎠ N 1 ⎡ E ( X i − µ X )2 ⎤ = 2 ⎢ ∑ ⎥ ⎦ ( 2 N + 1) ⎣i =− N 2 = 2 σX 2N +1 N →∞ N We also observe that lim E ( µ X − µX ) 2 =0 N M .S . Putting t1 − t2 = τ and noting that the differential area between t1 − t2 = τ and t1 − t2 = τ + dτ is (2T − τ )dτ . We have 1 T µX T = X (t )dt 2T ∫−T 1 T ∴ E µX T = EX (t )dt 2T ∫−T 1 T µ X dt = µ X = 2T ∫−T and the variance 2 2 ⎛ 1 T ⎞ E ( µX T − µX ) = E ⎜ ∫−T X (t )dt − µ X ⎟ ⎝ 2T ⎠ ⎛ 1 T ⎞ = E⎜ ∫−T ( X (t ) − µ X )dt ⎟ ⎝ 2T ⎠ 1 T T = 2 ∫−T ∫−T E ( X (t1 ) − µ X )( X (t2 ) − µ X )dt1dt2 4T 1 T T = 2 ∫−T ∫−T C X (t1 − t2 )dt1dt2 4T The above double integral is evaluated on the square area bounded by t1 = ±T and t2 = ±T . We divide this square region into sum of trapizoida strips parallel to t1 − t2 = 0. 2 .

Such a principle is the ergodicity principle to be discussed below: Mean ergodic process A WSS process { X ( t )} is said to be ergodic in mean. → . ⎯⎯⎯ µ X as T → ∞ . then a time-average computed from a large realization can be used as the value for the corresponding ensemble average. if µX T M .S .E ( µX T − µX ) = 2 1 T T ∫−T ∫−T C X (t1 − t2 )dt1dt2 4T 2 1 2T = 2 ∫−2T (2T − τ )C X (τ )dτ 4T τ ⎞ 1 2T ⎛ = ∫−2T ⎜1 − ⎟ C X (τ )dτ 2T ⎝ 2T ⎠ t1 t1 − t2 = τ + dτ t1 − t2 = 2T T t1 − t2 = τ dτ −T T t2 −T t1 − t2 = −2T Ergodicity Principle If the time averages converge to the corresponding ensemble averages in the probabilistic sense.

Further.The process has the autocovariance function for τ ≤ Tp given by ⎧ τ ⎪1 − C X (τ ) = ⎨ Tp ⎪0 ⎩ τ ≤ Tp otherwise Here . 2T ⎡ τ ⎤ 1 1 ∫T CX (τ ) ⎢1 − 2T ⎥dτ ≤ 2T 2T −2 ⎣ ⎦ 2T −2T ∫ C X (τ ) dτ Therefore. lim E µ X T = µ X T →∞ and T →∞ lim var µ X T 2 =σX We have earlier shown that E µX T = µX and 2T ⎡ τ ⎤ 1 C X (τ ) ⎢1 − ⎥dτ T 2T −∫T ⎣ 2T ⎦ 2 Therefore. the condition for ergodicity in mean is var µ X = 2T ⎡ τ ⎤ 1 lim CX (τ ) ⎢1− ⎥dτ = 0 T →∞ 2T ∫ ⎣ 2T ⎦ −2T If C X (τ ) decreases to 0 for τ > τ 0 .Thus for a mean ergodic process { X ( t )} . a sufficient condition for mean ergodicity is 2T −2T ∫ C X (τ ) dτ < ∞ Example Consider the random binary waveform { X (t )} discussed in Eaxmple . then the above condition is satisfied.

Thus X (t ) will be autocorrelation ergodic if 1 lim T →∞ 2T 2T −2T ∫ ⎜1 − 2T ⎟ C (τ )dτ → 0 ⎝ ⎠ z ⎛ τ ⎞ . Autocorrelation ergodicity T 1 RX (τ ) T = X (t ) X (t + τ )dt 2T −∫ T If we consider Z (t ) = X (t ) X (t + τ ) so that. Hence found the condition for autocorrelation ergodicity of a jointly Gaussian process. Thus { X (t )} will be autocorrelation ergodic if 1 lim T →∞ 2T where 2T −2T ∫ ⎜1 − 2T ⎟C ⎝ ⎠ ⎛ τ1 ⎞ Z (τ 1 )dτ 1 = 0 CZ (τ 1 ) = EZ (t ) Z (t − τ 1 ) − EZ (t ) EZ (t − τ 1 ) 2 = EX (t ) X (t − τ ) X (t − τ ) X (t − τ − τ 1 ) − RX (τ ) Involves fourth order moment.2T −2T ∫ C X (τ ) dτ = 2 ∫ C X (τ ) dτ 0 2T ⎛ τ ⎞ = 2 ∫ ⎜1 − ⎟dτ ⎜ T ⎟ p ⎠ 0 ⎝ ⎛ Tp3 Tp2 ⎞ = 2 ⎜ Tp + 2 − ⎟ ⎜ 3Tp Tp ⎟ ⎝ ⎠ 2Tp = 3 Tp ∴ ∫ C X (τ ) dτ < ∞ −∞ ∞ Hence { X (t )} is not mean ergodic. µ Z = RX (τ ) Then { X (t )} will be autocorrelation ergodic if {Z (t )} is mean ergodic.

both µx → 0 and Rx (τ ) → Remark A random process { X (t )} is ergodic if its ensemble averages converges in the M. Thus the random-phased sinusoid is ergodic in both mean and autocorrelation.the ensemble averages of all orders of such a process are independent of time. The converse is not true.2 Now CZ (τ ) = EZ (t ) Z (t + τ ) − RX (τ ) Hence. 2π ] is a random variable. both the time-averaged mean and the time-averaged autocorrelation function converges to the corresponding ensemble averages. X (t) will be autocorrelation ergodic 1 If lim T →∞ 2T ⎛ τ ⎞ 2 ⎜1 − ∫T ⎝ 2T ⎟ ( Ez (t ) z (t + α ) − RX (τ ) ) dα → 0 ⎠ −2 2T Example Consider the random –phased sinusoid given by X (t ) = A cos( w0 t + φ ) where A and w0 are constants and φ ~ U [0. This implies that an ergodic process is necessarily stationary in the strict sense. sense to the corresponding time averages. . This is a stronger requirement than stationarity. We see that as T → ∞. RX (τ ) = µx T = 1 T A cos( w0t + φ1 )dt 2T ∫−T 1 A sin( w0T ) = Tw0 = and Rx (τ ) T 1 2T T −T T ∫ A cos(w t + φ ) A cos(w (t + τ ) + φ )dt 0 1 0 1 0 0 1 A2 = 4T = −T ∫ [cos w τ + A cos(w (2t + τ ) + 2φ )]dt A2 cos w0τ A2 sin( w0 (2T + τ )) + 2 4w0T A2 cos w0τ T T 2 For each realization.there are stationary random processes which are not ergodic.S. We have earlier proved that this process is WSS with µ X = 0 and A2 cos w0τ 2 For any particular realization x(t ) = A cos( w0t + φ1 ).

Following Fig. shows a hierarchical classification of random processes. X (t ) = a X (t ) = 3 a 4 X (t ) = 1 a 2 1 a 4 X (t ) = X (t ) = 0 t . Random processes WSS random process WSS Process Ergodic Processes Example Suppose X (t ) = C where C ~ U [0 a]. below. { X (t )} is a family of straight line as illustrated in Fig.

Here µ X = . Hence { X (t )} is T →∞ 2T −T not mean ergodic.a and 2 1 T µ X T = lim ∫ Cdt is a different constants for different realizations.

we can define the Fourier transform and the inverse Fourier transform as follows: G ( f ) = ∫ g (t )e − j 2π ft dt −∞ ∞ and g (t ) = ∫ G ( f )e j 2π ft df −∞ ∞ The Fourier transform is a linear transform and has many interesting properties. that is. particularly a WSS process? The answer is the spectral representation of WSS random process. We can define the Fourier transform also in terms of the frequency variable f = ω . . the energy of the energy of the signal f (t ) is related by the Parseval’s theorem T −T ∫g ∞ 2 (t)dt = −∞ ∫ G( ω ) 2 dω How to have the frequency-domain representation of a random process. The Fourier transform G (ω ) exists if g (t ) satisfies the following Dirichlet conditions 1) g (t ) is absolutely integrable. The signal g (t ) can be obtained from G (ω ) by the inverse Fourier transform (IFT) as follows: g (t ) = IFT (G (ω )) = 1 ∞ jω t ∫ G (ω )e dω 2π −∞ The existence of the inverse Fourier transform implies that we can represent a function g (t ) as a superposition of continuum of complex sinusoids. Wiener (1930) and Khinchin (1934) independently discovered it. Einstein (1914) also used the concept. Particularly. In this 2π case. G (ω ) has the unit of volt/radian. The function G (ω ) is also called the spectrum of g (t ).Spectral Representation of a Wide-sense Stationary Random Process Recall that the Fourier transform (FT) of a real signal g (t ) is given by ∞ G (ω ) = FT ( g (t ) = ∫ g (t )e − jω t dt −∞ = cos ω t + j sin ω t is the complex exponential. where e ∞ −∞ − jω t ∫ g (t ) dt < ∞ 2) g (t ) has only a finite number of discontinuities in any finite interval 3) g (t ) has only finite number of maxima and minima within any finite interval. The Fourier transform G (ω ) is the strength of the sinusoids of frequency ω present in the signal. If g (t ) is a voltage signal measured in volt.

the power associated with X T (t ) is 1 2T 1 ∫T X (t)dt = 2T − 2 T T ∞ 2 −∞ ∫ FTX T ( ω ) d ω and The average power is given by . Recall that the power of a WSS process X (t ) is a constant and given by EX 2 (t ). 2 Therefore. This difficulty is avoided by a frequency-domain representation of X (t ) in terms of the power spectral density (PSD). But the very notion of stationarity demands that the realization does not decay with time and the first condition of Dirichlet is violated. As t → ∞.Difficulty in Fourier Representation of a Random Process We cannot define the Fourier transform of a WSS process integral FT ( X (t )) = ∫ X (t )e − jωt dt −∞ ∞ X (t ) by the mean-square The existence of the above integral would have implied the existence the Fourier transform of every realization of X (t ). The PSD denotes the distribution of this power over frequencies. Define the mean-square integral FTX T (ω ) = T −T ∫X T (t)e − jω t dt Applying the Pareseval’s theorem we find the energy of the signal T −T ∫ ∞ X T2 (t)dt = −∞ ∫ FTX T ( ω ) dω . X T (t ) will represent the random process X (t ). Definition of Power Spectral Density of a WSS Process Let us define X T (t) = X(t) =0 -T < t < T otherwise t ) 2T = X (t )rect ( where rect ( t ) is the unity-amplitude rectangular pulse of width 2T centering the 2T origin.

T ∞ ∞ FTX T ( ω ) 1 1 2 E ∫ X T (t)dt = E ∫ FTX T ( ω ) dω = E ∫ dω 2T −T 2T −∞ 2T −∞ 2 2 where E FTX T (ω ) 2T 2 is the contribution to the average power at frequency ω and represents the power spectral density for X T (t ). the PSD S X (ω ) of the process X (t ) is defined in the limiting sense by S X (ω ) = lim E FTX T (ω ) 2T 2 T →∞ Relation between the autocorrelation function and PSD: Wiener-Khinchin-Einstein theorem We have E | FTX T (ω ) |2 FTX T (ω ) FTX T * (ω ) =E 2T 2T T T 1 − jω t + jω t = ∫T −∫T EX T (t1 )X T (t2 )e 1 e 2 dt1dt2 2T − = 1 2T T T −T −T ∫ ∫R X (t1 − t2 )e − jω (t1 −t2 ) dt1dt2 t1 t1 − t2 = τ + dτ t1 − t2 = 2T T t1 − t2 = τ dτ −T T t2 −T t1 − t2 = −2T . Therefore. the left-hand side in the above expression represents the average power of X (t ). As T → ∞.

The differential area in terms of τ is given by the shaded area and equal to (2T − | τ |)dτ .Note that the above integral is to be performed on a square region bounded by t1 = ±T and t2 = ±T . E FTX T (ω ) X T * (ω ) 1 2T − jωτ = ∫ RX (τ )e (2T − | τ |)dτ 2T 2T −2T 2T |τ | = ∫ Rx (τ )e − jωτ (1 − )dτ −2 T 2T ∞ −∞ If R X (τ ) is integrable then the right hand integral converges to ∫ RX (τ )e − jωτ dτ as T → ∞. Substitute t1 − t 2 = τ so that t 2 = t1 − τ is a family of straight lines parallel to t1 − t2 = 0. ∴ lim T →∞ E FTX T (ω ) 2T 2 = ∫ RX (τ )e − jωτ dτ −∞ ∞ As we have noted earlier. Thus S X (ω ) = ∞ −∞ ∫R X (τ )e − jωτ dτ and using the inverse Fourier transform 1 RX (τ ) = 2π ∞ −∞ ∫ S X (ω )e jωτ dw Example 1 The autocorrelation function of a WSS process X (t ) is given by RX (τ ) = a 2 e −b τ b>0 Find the power spectral density of the process. Therefore. the power spectral density S X (ω ) = lim E FTX T (ω ) 2T 2 T →∞ is the contribution to the average power at frequency ω and is called the power spectral density X (t ) . . The double integral is now replaced by a single integral in τ .

.S ∞ − jωτ (ω ) = ∫ R (τ )e dτ X x −∞ ∞ −b τ − jωτ e dτ = ∫ a2e −∞ 0 ∞ − jωτ − jωτ dτ + ∫ a 2 e−bτ e dτ = ∫ a 2 ebτ e 0 −∞ a2 a2 = + b − jω b + jω 2a 2 b = b2 + ω 2 The autocorrelation function and the PSD are shown in Fig.

Φ ~ U ( 0. Rx (τ ) = EX (t + τ ) X (t ) = E ( A + B sin (ω c (t + τ ) + Φ ))( A + B sin (ω c t + Φ )) = A2 + B2 cos ω cτ 2 ∴ S X (ω ) = A2δ (ω ) + B2 ( δ (ω + ω c ) + δ (ω − ω c ) ) 4 where δ (ω ) is the Dirac Delta function. S X (ω ) A2 B2 2 −ω c o B2 2 ωc ω Example 3 PSD of the amplitude-modulated random-phase sinusoid X (t ) = M (t ) cos ( ω ct + Φ ) . Find RX (τ ) and S X (ω ). ⇒ RX (τ ) = E M (t + τ ) cos (ω c (t + τ ) + Φ ) M (t ) cos (ω cτ + Φ ) = E M (t + τ ) M (t ) E cos (ω c (t + τ ) + Φ ) cos (ω cτ + Φ ) ( Using the independence of M (t ) and the sinusoid) = RM (τ ) A2 cos ω cτ 2 ∴ S X (ω ) = A2 ( SM (ω + ωc ) + SM (ω − ωc ) ) 4 where S M (ω ) is the PSD of M(t) .Example: Suppose X (t ) = A + B sin (ω c t + Φ ) where A is a constant bias and Φ ~ U [0. . 2π ]. 2π ) where M(t) is a WSS process independent of Φ.

S M (ω ) ω S X (ω ) −ω c o ωc ω Example 4 The PSD of a noise process is given by N0 ω ω ± ωc ≤ 2 2 Otherwise = 0 Find the autocorrelation of the process. S N (ω ) = 1 ∴ RN (τ ) = 2π = ∞ −∞ ∫ S (ω ) N e jωτ dω No cos ωτ dτ 2 1 ×2× 2π ωc + ω 2 ωc − ∫ω 2 = No 2π No 2π = ⎡ ⎛ ω⎞ ω ⎞⎤ ⎛ ⎢ sin ⎜ ωc + 2 ⎟ − sin ⎜ ωc − 2 ⎟ ⎥ ⎠ ⎝ ⎠⎥ ⎢ ⎝ τ ⎢ ⎥ ⎢ ⎥ ⎣ ⎦ sin ωτ 2 cos ω o τ ωτ 2 .

The average power of a random process X (t ) is EX 2 (t) = R X (0) = 1 ∞ ∫ S X (ω )dw 2π −∞ The average power in the band [ω1 . it shares the properties of the Fourier transform. Here we discuss important properties of S X (ω ). ω 2 ] is 2 ∫ S X (ω )d ω ω1 ω2 .Properties of the PSD S X (ω ) being the Fourier transform of RX (τ ).

Therefore. . ω 2 ] • If { X (t )} is real.Average Power in the frequency band [ω1 . ∞ S X (ω ) = = = −∞ ∞ ∫R X (τ )e − jωτ dτ (τ )(cos ωτ + j sin ωτ )dτ −∞ ∞ ∫R X −∞ ∫R ∞ 0 X (τ ) cos ωτ dτ = 2∫ RX (τ ) cos ωτ dτ Thus S X (ω ) is a real and even function of ω . R X (τ ) is a real and even function of τ .

S X ( w) = limT →∞ E X T (ω ) is always non-negative. If { X (t )} is M. Thus the non-negative definite property of an autocorrelation function can be tested through its power spectrum. . 3) Recall that a periodic function has the Fourier series expansion. Thus 2T 2 • If { X (t )} has a periodic component. Remark 1) The function SX (ω) is the PSD of a WSS process { X (t )} if and only if ∞ SX (ω) is a non-negative. R X (τ ) is periodic and so S X (ω ) will have impulses.S. periodic we can have an equivalent Fourier series expansion { X (t )} .• From the definition S X (ω ) ≥ 0. real and even function of ω and −∞ ∫ SX (ω) < ∞ 2) The above condition on SX (ω) also ensures that the corresponding autocorrelation function R X (τ ) is non-negative definite.

RZ (τ ) = RX (τ ) + RY (τ ) + RXY (τ ) + RYX (τ ) If we take the Fourier transform of both sides. Definition of Cross Power Spectral Density Given two real jointly WSS random processes X(t) and Y(t).Cross power spectral density Consider a random process Z (t ) which is sum of two real jointly WSS random processes X(t) and Y(t). As we have seen earlier. it can be shown that S XY (ω ) = ∫ RXY (τ )e − jωτ dτ −∞ ∞ and SYX (ω ) = ∫ RYX (τ )e − jωτ dτ −∞ ∞ .) stands for the Fourier transform. the cross power spectral density (CPSD) S XY (ω ) is defined as S XY (ω ) = lim E T →∞ ∗ FTX T (ω ) FTYT (ω ) 2T where FTX T (ω ) and FTYT (ω ) are the Fourier transform of the truncated processes t t ) and YT (t ) = Y(t)rect ( ) respectively and 2T 2T ∗ X T (t ) = X(t)rect ( denotes the complex conjugate operation. We can similarly define SYX (ω ) by SYX (ω ) = lim E T →∞ FTYT∗ (ω ) FTX T (ω ) 2T Proceeding in the same way as the derivation of the Wiener-Khinchin-Einstein theorem for the WSS process. Thus we see that S Z (ω ) includes contribution from the Fourier transform of the crosscorrelation functions RXY (τ ) and RYX (τ ). S Z (ω ) = S X (ω ) + SY (ω ) + FT ( RXY (τ )) + FT ( RYX (τ )) where FT (. These Fourier transforms represent cross power spectral densities.

The cross-correlation function and the cross-power spectral density form a Fourier transform pair and we can write RXY (τ ) = ∫ S XY (ω )e jωτ dω −∞ ∞ and RYX (τ ) = ∫ SYX (ω )e jωτ dω −∞ ∞ Properties of the CPSD The CPSD is a complex function of the frequency ω. Some properties of the CPSD of two jointly WSS processes X(t) and Y(t) are listed below: (1) * S XY (ω ) = SYX (ω ) Note that RXY (τ ) = RYX (−τ ) ∴ S XY (ω ) = ∫ RXY (τ )e − jωτ dτ −∞ ∞ ∞ = ∫ RYX ( −τ )e − jωτ dτ −∞ ∞ = ∫ RYX (τ )e jωτ dτ −∞ * = SYX (ω ) (2) Re( S XY (ω )) is an even function of ω and Im( S XY (ω )) is an odd function of ω We have S XY (ω ) = ∫ RXY (τ )(cos ωτ + j sin ωτ )dτ −∞ ∞ ∞ = ∫ RXY (τ ) cos ωτ dτ + j ∫ RXY (τ )sin ωτ )dτ −∞ −∞ ∞ = Re( S XY (ω )) + j Im( S XY (ω )) where Re( S XY (ω )) = ∫ RXY (τ )cos ωτ dτ is an even function of ω and −∞ ∞ Im( S XY (ω )) = ∫ RXY (τ )sin ωτ dτ is an odd function of ω and −∞ ∞ (3) X(t) and Y(t) are uncorrelated and have constant means. then S XY (ω ) = SYX (ω ) = µ X µY δ (ω ) Observe that .

then S XY (ω ) = SYX (ω ) = 0 If X(t) and Y(t) are orthogonal. RXY (τ ) = EX (t + τ )Y (t ) =0 = RXY (τ ) ∴ S XY (ω ) = SYX (ω ) = 0 (5) The cross power PXY between X(t) and Y(t) is defined by PXY = lim 1 T E ∫ X (t )Y (t )dt T →∞ 2T −T Applying Parseval’s theorem.RXY (τ ) = EX (t + τ )Y (t ) = EX (t + τ ) EY (t ) = µ X µY = µY µ X = RXY (τ ) ∴ S XY (ω ) = SYX (ω ) = µ X µY δ (ω ) (4) If X(t) and Y(t) are orthogonal. we get 1 T E ∫ X (t )Y (t )dt T →∞ 2T −T 1 ∞ E ∫ X T (t )YT (t )dt = lim T →∞ 2T −∞ 1 1 ∞ * E = lim ∫ FTX T (ω ) FTYT (ω )dω T →∞ 2T 2π −∞ * ∞ EFTX T (ω ) FTYT (ω ) 1 dω = ∫ lim 2π −∞ T →∞ 2T 1 ∞ = ∫ S XY (ω )dω 2π −∞ 1 ∞ ∴ PXY = ∫ S XY (ω )dω 2π −∞ PXY = lim Similarly. PYX 1 ∞ ∫ SYX (ω )dω 2π −∞ 1 ∞ * = ∫ S XY (ω )dω 2π −∞ * = PXY = .

... PZ (ω ) = PX (ω ) + PY (ω ) + PXY (ω ) + PYX (ω ) Remark • PXY (ω ) + PYX (ω ) is the additional power contributed by X (t ) and Y (t ) to the resulting power of X (t ) + Y (t ) • If X(t) and Y(t) are orthogonal. n = 0. Assume g[ n] to be obtained by sampling a continuous-time signal g (t ) at an uniform interval T such that g[ n] = g ( nT ). the power of the sum of the processes is equal to the sum of respective powers. ±1. The discrete-time Fourier transform (DTFT) of the signal g[n] is defined by G (ω ) = ∑ g[ n]e − jω n n =−∞ ∞ .Example Consider the random process Z (t ) = X (t ) + Y (t ) discussed in the beginning of the lecture. RZ (τ ) = RX (τ ) + RY (τ ) + RXY (τ ) + RYX (τ ) Taking the Fourier transform of both sides. ±2. Power spectral density of a discrete-time WSS random process Suppose g [ n ] is a discrete-time real signal. then S Z (ω ) = S X (ω ) + SY (ω ) + 0 + 0 = S X (ω ) + SY (ω ) Consequently PZ (ω ) = PX (ω ) + PY (ω ) Thus in the case of two jointly WSS orthogonal processes. S Z (ω ) = S X (ω ) + SY (ω ) + S XY (ω ) + SYX (ω ) ∴ 1 ∞ 1 ∞ 1 ∞ 1 ∞ 1 ∞ ∫ S Z (ω )dω = ∫ S X (ω )dω + ∫ SY (ω )dω + ∫ S XY (ω ) dω + ∫ SYX (ω )dω 2π −∞ 2π −∞ 2π −∞ 2π −∞ 2π −∞ Therefore. Here Z (t ) is the sum of two jointly WSS orthogonal random processes X(t) and Y(t). We have.

that is. ±1. • G (ω ) is always periodic in ω with a period of 2π . Thus G (ω ) is uniquely defined in the interval −π ≤ ω ≤ π . below. The symbol Ω for frequency of a continuous signal is used in the signal-processing literature just to distinguish it from the corresponding frequency of the discrete-time signal. ±2. • Suppose { g[n]} is obtained by sampling a continuous-time signal g a (t ) at a n = 0..G (ω ) exists if { g[n]} is absolutely summable. This is illustrated in the Fig. The signal g[n] is ∞ obtained from G (ω ) by the inverse discrete-time Fourier transform g[n]) = 1 2π π − ∫π g (ω )e jω n dw Following observations about the DTFT are important: • ω is a frequency variable representing the frequency of a discrete sinusoid. uniform interval T such that g[n] = g a (nT ).. n =−∞ ∑ g[ n] < ∞. The frequency ω of the discrete-time signal is related to the frequency Ω of the continuous time signal by the relation Ω = ω T where T is the uniform sampling interval.. Thus the signal g[n] = A cos(ω0 n) has a frequency of ω0 radian/samples. • We can define the Z − transform of the discrete-time signal by the relation .

G ( z ) = ∑ g[n]z − n n =−∞ ∞ where z is a complex variable. The power spectral density S X (ω ) ∞ of the WSS process { X [n]} is the discrete-time Fourier transform of autocorrelation sequence. The difficulty is avoided similar to the case of the continuous-time WSS process by defining the truncated process ⎧ X [ n] ⎪ X N [ n] = ⎨ ⎪0 ⎩ for n ≤ N otherwise The power spectral density S X (ω ) of the process { X [n]} is defined as S X (ω ) = lim N →∞ 1 2 E DTFTX N (ω ) 2N +1 ∞ where DTFTX N (ω ) = ∑ X N [n]e − jwn = ∑ X [ n]e − jwn n =−∞ n =− N N Note that the average power of { X [n]} is RX [0] = E X 2 [n ] and the power spectral density S X (ω ) indicates the contribution to the average power of the sinusoidal component of frequency ω. The very notion of stationarity poses problem in frequency-domain representation of { X [n]} through the Discrete-time Fourier transform. S X (ω ) = ∑ RX [ m ] e − jω m m =−∞ −π ≤ w ≤ π RX [ m] is related to S X (ω ) by the inverse discrete-time Fourier transform and given by RX [ m]) = 1 2π π − ∫π S X (ω )e jω m d ω . Wiener-Einstein-Khinchin theorem The Wiener-Einstein-Khinchin theorem is also valid for discrete-time random processes. G (ω ) is related to G ( z ) by G (ω ) = G ( z ) z = e jω Power spectrum of a discrete-time real WSS process { X [n]} Consider a discrete-time real WSS process { X [ n]}.

A generalized PSD can be defined in terms of z − transform as follows S X ( z) = m =−∞ ∞ ∑ R [m ] z x −m Clearly. ±2. ±3. Example Properties of the PSD of a discrete-time WSS process . ±1. below.. S X (ω ) = S X ( z ) z =e jω Example Suppose RX [m] = 2 S X (ω ) = ∑ RX [ m ] e− jω m m =−∞ ∞ ⎛1⎞ = 1 + ∑ ⎜ ⎟ e − jω m m =−∞ ⎝ 2 ⎠ m≠0 m ∞ −m m = 0.Thus RX [ m] and S X (ω ) forms a discrete-time Fourier transform pair.. Then = 3 5 − 4cos ω The plot of the autocorrelation sequence and the power spectral density is shown in Fig..

n = 0. ±1. The autocorrelation function RX [m] is defined by R X [ m] = E X [ n + m] X [ n ] = E X a (n T + mT ) X a (n T ) = RX a ( mT ) ∴ RX [ m] = RX a ( mT ) m = 0. S X (ω ) is real and even. ±2.. Therefore. Thus the sequence { RX [m] } is obtained by sampling the autocorrelation function RX a ( τ ) at a uniform interval T . the autocorrelation function RX [m] is real and even... Interpretation of the power spectrum of a discrete-time WSS process Assume that the discrete-time WSS process { X [n]} is obtained by sampling a continuoustime random process { X a (t )} at an uniform interval.For the real discrete-time process { X [ n]}.. ω2 ] is given by 2 ∫ S X (ω ) d ω ω1 ω2 • • S X (ω ) is periodic in ω with a period of 2π . that is. • The average power of { X [n]} is given by 1 π EX 2 [ n] = RX [0] = ∫ S X (ω )d ω 2π −π Similarly the average power in the frequency band [ω1 .. • S X (ω ) ≥ 0.. ±2. The frequency ω of the discrete-time WSS process is related to the frequency Ω of the continuous time process by the relation Ω = ω T . X [n] = X a (nT ). ±1.

S W (ω ) N0 2 O ω . corresponding autocorrelation function is given by RW (τ ) = N δ (τ ) where δ (τ ) is the Dirac delta. 2 The The average power of white noise Pavg = EW 2 (t ) = 1 ∞ N ∫ dω → ∞ 2π −∞ 2 The autocorrelation function and the PSD of a white noise process is shown in Fig. below.White noise process A white noise process {W (t )} is defined by SW (ω ) = N0 2 −∞ < ω < ∞ where N 0 is a real constant and called the intensity of the white noise.

RW (τ ) N0 δ (τ ) 2 O τ Remarks • • • The term white noise is analogous to white light which contains all visible light frequencies. we can model it as a white noise process. If the system band-width (BW) is sufficiently narrower than the noise BW and noise PSD is flat . which is the noise generated in resistors due to random motion electrons. t j ) = 0 for ti ≠ t j . We generally consider zero-mean white noise process. since they have very flat psd over very wide band (GHzs) . A white noise process is unpredictable as the noise samples at different instants of time are uncorrelated. Thermal noise. it cannot be realized since it has infinite average power.: CW (ti . • • White noise is a mathematical abstraction. is well modelled as white Gaussian noise.

Particularly. 2 Find RX (τ ) and S X (ω ). N and {W (t ) } is a zero-mean WGN process with PSD of 0 and independent of Φ.• A white noise process can have any probability density function. Rx (τ ) = EX (t + τ ) X (t ) = E ( B sin (ωc (t + τ ) + Φ ) + W (t + τ ) )( W (t ) + B sin (ω c t + Φ )) B2 cos ω cτ + RW (τ ) 2 N B2 ∴ S X (ω ) = (δ (ω + ωc ) + δ (ω − ωc ) ) + 20 4 where δ (ω ) is the Dirac Delta function. because its samples at arbitrarily close intervals also will be independent. Example A random-phase sinusoid corrupted by white noise Suppose X (t ) = B sin (ω c t + Φ ) + W (t ) where A is a constant bias and Φ ~ U [0. if the white noise process {W (t )} is a Gaussian random process. The corresponding autocorrelation function RX (τ ) is given by RX (τ ) = N 0 B sin Bτ 2π Bτ . . Such a process represents a ‘purely’ random process. 2π ]. Thus the WSS process { X (t )} is bandlimited white noise if S X (ω ) = N0 2 −B < ω < B For example. thermal noise which has constant PSD up to very high frequency is better modeled by a band-limited white noise process. = Band-limited white noise A noise process which has non-zero constant PSD over a finite frequency band and zero elsewhere is called band-limited white noise. • A white noise process is called strict-sense white noise process if the noise samples at distinct instants of time are independent. then {W (t )} is called a white Gaussian random process. A white Gaussian noise process is a strictsense white noise process.

S N0 2 X (ω ) −B O B ω Observe that • The average power of the process is EX 2 (t ) = RX (0) = RX (τ ) = 0 for τ = ± N0 B 2π • π B . This means that X (t ) and X (t + ) B B B where n is a non-zero integer are uncorrelated.. Thus we can get uncorrelated .± 2π 3π nπ . below..± .The plot of S X (ω ) and RX (τ ) of a band-limited white noise process is shown in Fig. Further assume that { X (t )} is a zero-mean process..

samples by sampling a band-limited white noise process at a uniform interval of π B . and given by ⎧N ⎪ 0 S X (ω ) = ⎨ 2 ⎪0 ⎩ ω − ω0 < otherwise B 2 The corresponding autocorrelation function is given by Bτ N 0 B sin 2 cos ω 0τ RX (τ ) = Bτ 2π 2 S N0 2 X (ω ) B −ω 0 − 2 −ω 0 B −ω 0 + 2 O B ω0 − 2 ω0 ω0 ω 0 + B 2 ω . • A band-limited white noise process may also be a band-pass process with PSD as shown in the Fig.

Coloured Noise A noise process which is not white is called coloured noise. White Noise Sequence A random sequence {W [n]} is called a white noise sequence if SW (ω ) = Therefore RW (m) = N δ ( m) 2 N 2 −π ≤ ω ≤ π . Thus the noise process { X (t )} with RX (τ ) = a 2 e −b τ b > 0 and PSD S X (ω ) = 2 a 2b is an example of a b2 + ω 2 coloured noise.

π →ω Remark 1 N N × 2π = 2π 2 2 The average power of the white noise sequence is finite and uniformly distributed over all frequencies. • The model white noise sequence looks artificial. Such a sequence may be called strict-sense white noise sequence.d.i. random sequence is always white. SW (ω ) RW [m] N 2 N N 2 NN 22 2 • • • • • m→ −π A realization of a white noise sequence is shown in Fig. below. • An i. but it plays a key role in random signal modelling. It plays the similar role as that of the impulse function in the modeling of deterministic signals. then {W [n]} is • The average power of the white noise sequence is EX 2 [n] = called a white Gaussian noise (WGN) sequence. A class of WSS processes .where δ (m) is the unit impulse sequence. The autocorrelation function and the PSD of a whitenoise sequence is shown in Fig. A WGN sequence is is a strict-sense stationary white noise sequence. • If the white noise sequence {W [n]} is a Gaussian sequence.

d. random variables is also important in statistical inference. • The notion of the sequence of i.i.called regular processes can be considered as the output of a linear system with white noise as input as illustrated in Fig. White noise process Linear System Regular WSS random process .

. We can thus write. In this lecture. The dynamic behavior of an LTI system to deterministic inputs is described by linear differential equations. For example. The purpose of this study is two-folds: (1) Analysis of the response of a system (2) Finding a LTI system that can optionally estimate an unobserved random process from an observed process. ( a1 x1 (t ) + a2 x2 (t ) ) dt = a1 d d x1 ( t ) + a2 x2 (t ) dt dt Hence the linear differentiator is a linear system. We are familiar with time and transfer domain (such as Laplace transform and Fourier transform) techniques to solve these equations.Response of Linear time-invariant system to WSS input: In many applications. physical systems are modeled as linear time invariant (LTI) system. we may have to find LTI system (also called a filter) to estimate the signal from the noisy observations. Basics of Linear Time Invariant Systems: A system is modeled by a transformation T that maps an input signal x(t ) to an output signal y(t). The observed random process is statistically related to the unobserved random process. we develop the technique to analyse the response of an LTI system to WSS random process. given by d x(t ) y (t ) = dx d Then. Thus for a linear system T ⎡ a1 x1 ( t ) + a2 x2 ( t ) ⎤ = a1T ⎡ x1 ( t ) ⎤ + a2T ⎡ x2 ( t ) ⎤ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ Example : Consider the output of a linear differentiator. y(t ) = T [ x(t )] Linear system The system is called linear if superposition applies: the weighted sum of inputs results in the weighted sum of the corresponding outputs.

then y (t ) = T ∞ −∞ ∫ x( s) δ ( t − s ) ds ds = = −∞ ∫ x( s ) T δ ( t − s ) ∞ −∞ [ Using the linearity property ] ∫ x( s) h ( t . t ≤ t0 ) Response of a linear time-invariant system to deterministic input A linear system can be characterised its impulse response h(t ) = T δ (t ) where δ (t ) is the Dirac delta function. s ) = T δ ( t − s ) is the response at time t due to the shifted impulse δ ( t − s ) . ∞ y (t ) = −∞ ∫ x ( s ) h ( t − s ) ds = x(t ) * h(t ) where * denotes the convolution operation. s ) ds Where h ( t . The system is called time-invariant if T x ( t − t0 ) = y ( t − t0 ) ∀ t0 It is easy to check that that the differentiator in the above example is a linear timeinvariant system.Linear time-invariant system Consider a linear system with y(t) = T x(t). s ) = h (t − s ) Therefore for a linear invariant system. If the system is time invariant. Causal system The system is called causal if the output of the system at t = t0 depends only on the present and past values of input. y (t ) = x (t ) * h(t ) = h(t ) * x(t ) . Thus for a LTI System. h ( t. Thus for a causal system y ( t0 ) = T ( x(t ). δ (t ) LTI system h(t ) Recall that any function x(t) can be represented in terms of the Dirac delta function as follows ∞ x(t ) = −∞ ∫ x(τ ) δ ( t − s ) ∞ ds If x(t) is input to the linear system y(t) = T x(t).

x(t ) LTI System h(t ) y (t ) X (ω ) LTI System H (ω ) Y (ω ) Taking the Fourier transform. EY ( t ) = E ∫ h ( s )X ( t − s ) ds −∞ ∞ ∞ = = −∞ ∞ ∫ h ( s )EX ( t − s ) ds ∫ h ( s )µ ∞ −∞ X ds −∞ = µX ∫ h ( s )ds = µ X H (0) where H (0) is the frequency response H (ω ) at 0 frequency ( ω = 0 ) given by .s. The output {Y (t )} of the system is given by Y (t ) = ∞ −∞ ∫ h ( s ) X ( t − s ) ds = ∞ −∞ ∫ h (t − s ) X ( s ) ds where we have assumed that the integrals exist in the mean square (m. Suppose { X (t )} is a WSS process input to the system. Mean and autocorrelation of the output process {Y ( t )} The mean of the output process is given by.) sense. we get Y (ω ) = H (ω ) X (ω ) where H (ω ) = FT h ( t ) = ∞ −∞ ∫ h(t ) e − jω t dt is the frequency response of the system Response of an LTI System to WSS input Consider an LTI system with impulse response h(t).

∴ EY ( t + τ )Y (t ) ) = E ∞ ∞ −∞ ∫ h ( s ) X ( t + τ − s ) dsY (t ) E X ( t + τ − s ) Y (t ) ds R XY (τ − s ) ds = = = −∞ ∞ ∫ h(s) −∞ ∫ h(s) h(τ ) * R XY (τ ) = h(τ ) * h(−τ ) * R X (τ ) Thus the autocorrelation of the output process {Y ( t )} depends on the time-lag τ . Thus RY (τ ) = RX (τ ) * h (τ ) * h ( −τ ) The above analysis indicates that for an LTI system with WSS input (1) the output is WSS and ..e. i. Therefore.H (ω ) ω =0 = ∞ −∞ ∫ h ( t )e − jω t ∞ dt ω =0 = −∞ ∫ h ( t )dt Therefore. the mean of the output process {Y ( t )} is a constant The Cross correlation of the input X(t) and the out put Y(t) is given by ∞ EX ( t + τ )Y t ) = E X (t + τ ) ∞ −∞ ∫ h ( s ) X ( t − s ) ds = = = −∞ ∞ ∫ h (s) ∫ h(s) ∫ h ( −u ) E X ( t + τ ) X ( t − s ) ds RX (τ + s ) ds RX (τ − u ) du [ Put s = − u ] −∞ ∞ = h ( −τ ) * RX (τ ) −∞ ∴ R XY (τ ) = h ( −τ ) * R X (τ ) = h (τ ) * RX (τ ) also RYX (τ ) = R XY ( −τ ) = h (τ ) * R X ( − τ ) Thus we see that RXY (τ ) is a function of lag τ only. The autocorrelation function of the output process Y(t) is given by. EY ( t ) Y ( t + τ ) = RY (τ ) . X ( t ) and Y ( t ) are jointly wide-sense stationary.

we get the power spectral density of the output process given by SY (ω ) = S X (ω ) H (ω ) H * (ω ) = S X (ω ) H (ω ) 2 Also note that R XY (τ ) = h ( −τ ) * R X (τ ) and RYX (τ ) = h (τ ) * RX (τ ) Taking the Fourier transform of RXY (τ ) we get the cross power spectral density S XY (ω ) given by S XY (ω ) = H * (ω ) S X (ω ) and SYX (ω ) = H (ω ) S X (ω ) S X (ω ) S XY (ω ) H (ω ) * H (ω ) SY (ω ) RX (τ ) h(−τ ) RXY (τ ) h(τ ) RY (τ ) . The average power of the output process {Y ( t )} is given by PY = RY ( 0 ) = RX ( 0 ) * h ( 0 ) * h ( 0 ) Power spectrum of the output process Using the property of Fourier transform.(2) the input and output are jointly WSS.

Example: (a) White noise process X ( t ) with power spectral density N0 is input to an ideal low 2 pass filter of band-width B. H (ω ) N 2 −B B ω The input process X ( t ) is white noise with power spectral density S X (ω ) = The output power spectral density SY (ω ) is given by. . 2 = 1× = N0 2 N0 2 −B ≤ω ≤ B − B ≤ω ≤ B ∴ RY (τ ) = Inverse Fourier transform of SY (ω ) = 1 2π N 0 jωτ N sin 2π Bτ e dω = 0 πτ 2 2 −B ∫ B The output PSD SY (ω ) and the output autocorrelation function RY (τ ) are illustrated in Fig. SY (ω ) = H (ω ) S X (ω ) 2 N0 . below. Find the PSD and autocorrelation function of the output process.

2 Find (a) output PSD SY (ω ) (b) output auto correlation function RY (τ ) (c) average output power EY 2 ( t ) R .S Y (ω ) N0 2 −B O B ω Example 2: A random voltage modeled by a white noise process X ( t ) with power spectral density N0 is input to a RC network shown in the fig.

(a) SY (ω ) = H (ω ) S X (ω ) 2 S X (ω ) R C ω 2 +1 N0 1 = 2 2 2 R C ω +1 2 (b) Taking inverse Fourier transform = 2 2 1 RY (τ ) = N 0 − RC e 4 RC τ (c) Average output power EY 2 ( t ) = RY ( 0 ) = N0 4 RC .The frequency response of the system is given by 1 jCω H (ϖ ) = 1 R+ jCω 1 = jRCω + 1 Therefore.

Discrete-time Linear Shift Invariant System with WSS Random Inputs We have seen that the Dirac delta function δ (t ) plays a very important role in the analysis of the response of the continuous-time LTI systems to deterministic and random inputs. δ [n ] LTI h[n] system The transfer function of such a system is given by H (ω ) = ∑ h[ n]e − jω n n =−∞ ∞ An analysis similar to that for the continuous-time LTI system shows that the response y[ n] of a the linear time-invariant system with impulse response h[ n ] to a deterministic input x[ n] is given by y[ n] = ∑ x[ k ]h[ n − k ] = x[ n]* h[n] k =−∞ ∞ Consider a discrete-time linear system with impulse response h[n] and WSS input X [n] h[n] X [n] Y [n] Y [n] = X [n]* h[n] = ∑ h[k ] X [n − k ] k =−∞ ∞ E Y [n] = E X [n]* h[n] For the WSS input X [n] µY = E Y [n] = µ X * h[ n] = µ X ∑ h[ n] = µ X H (0) n =−∞ ∞ . Similar role in the case of the discrete-time LTI system is played by the unit sample sequence δ [n] defined by δ [ n] = ⎨ ⎧1 for n = 0 ⎩0 otherwise Any discrete-time signal x[ n] can be expressed in terms of δ [n] as follows: x[n] = ∑ x[ k ]δ [ n − k ] k =−∞ ∞ A discrete-time linear shift-invariant system is characterized by the unit sample response h[ n ] which is the output of the system to the unit sample sequence δ [n ].

= RX [m]* h[m]* h[− m] RY [m] is a function of lag m only. we get S Y ( z ) = S X ( z ) H ( z ) H ( z −1 ) Notice that if H (z ) is causal. Consider the case of the discrete-time system with a random sequence x[n ] as an input. x[n ] h[n ] y[n ] RY [m] = R X [m] * h[m] * h[ −m] Taking the z − transform. the output is a correlated process. From above we get SY ( ω ) = | Η (ω ) |2 S X (ω ) S X (ω ) S Y (ω ) Hω ) 2 • Note that though the input is an uncorrelated process. then H ( z −1 ) is anti causal. Similarly if H (z ) is minimum-phase then H ( z −1 ) is maximum-phase.where H (0) is the dc gain of the system given by H (0) = H (ω ) ω =0 = ∑ h[n]e − jwn n =−∞ ∞ ∞ ω =0 = ∑ h[n] n =−∞ RY [m] = E Y [n]Y [n − m] = E ( X [n]* h[n])( X [n − m]* h[n − m]) . RX [ m ] S X ( z) H (z ) H ( z −1 ) RY [m] SY ( z ) .

Example If H ( z) = 1 1 − α z −1 and x[n ] is a unity-variance white-noise sequence. If S X (ω ) is an analytic function of ω . we get RY [m ] = 1 a |m| 2 1−α Spectral factorization theorem A WSS π −π random signal X [n ] that satisfies the Paley Wiener condition ∫ | ln S X (ω ) | dω < ∞ can be considered as an output of a linear filter fed by a white noise sequence. then 2 Given EX 2 [n] = σ X ∴ S X (ω ) = 2 σX 2π SY ( z ) = H ( z ) H ( z −1 ) S X ( z ) 1 ⎞⎛ 1 ⎞ 1 ⎛ =⎜ ⎟ −1 ⎟⎜ ⎝ 1 − α z ⎠⎝ 1 − α z ⎠ 2π By partial fraction expansion and inverse z − transform. then S X ( z ) = σ v2 H c ( z ) H a ( z ) −π π where H c ( z ) is the causal minimum phase transfer function H a ( z ) is the anti-causal maximum phase transfer function and σ v2 a constant and interpreted as the variance of a white-noise sequence. Innovation sequence v[n] H c ( z) Figure Innovation Filter X [n ] Minimum phase filter => the corresponding inverse filter exists. and ∫ | ln S X (ω ) | dω < ∞ . X [n ] 1 H c ( z) Figure whitening filter V [ n] .

(∵ hc [0] = Limz →∞ H C ( z ) = 1 H C ( z ) and ln H C ( z ) are both analytic => H C ( z ) is a minimum phase filter. where c[k ] = 1 π iω n ∫ ln S X (ω )e dw is the kth order cepstral coefficient. ∑ c( k ) zk = H C ( z −1 ) z< ρ 1 2 S XX ( z ) = σ V H C ( z ) H C ( z −1 ) 2 where σ V = ec ( 0) Salient points • S XX (z ) can be factorized into a minimum-phase and a maximum-phase factors i. however for a signal with rational power spectrum.e..... spectral factorization can be easily done. 2π −π For a real signal c[k ] = c[−k ] and c[0] = 1 π ∫ ln S XX ( w)dw 2π −π k =−∞ S XX ( z ) = e ∑ c[ k ] z − k ∑ c[ k ] z − k ∞ −1 ∞ =e Let c[0] ∞ e k =1 e k =−∞ ∑ c[ k ] z − k H C ( z ) = ek =1 ∑ c[ k ] z − k z >ρ = 1 + hc (1)z -1 + hc (2) z −2 + ..Since ln S X ( z ) is analytic in an annular region ρ < z < ln S X ( z ) = ∑ c[ k ] z − k k =−∞ ∞ 1 ρ . H C ( z ) and H C ( z −1 ). Similarly let −1 H a ( z ) = ek =−∞ ∞ ∑ c( k ) z− k = ek =1 Therefore. • In general spectral factorization is difficult. .

X p [n] is a predictable process. Example Wold’s Decomposition Any WSS signal X [n ] can be decomposed as a sum of two mutually orthogonal processes • • • a regular process X r [ n] and a predictable process X p [n] . therefore we can have a filter 1 to filter the given signal to get the innovation sequence. . HC ( z) • X [n ] and v[n] are related through an invertible transform. the process can be predicted from its own past with zero prediction error. 1 H C (z) exists (=> stable). that is. X [n ] = X r [n ] + X p [n ] X r [ n ] can be expressed as the output of linear filter using a white noise sequence as input.• Since is a minimum phase filter. so they contain the same information.

- Fully Invariant and Characteristic Interval Valued Intuition is Tic Fuzzy Dual Ideals of BF
- EE-0030 - Probability and Random Processes for Electrical and Computer Engineers
- Intuitionistic Fuzzy Perfectly Regular Weakly Generalized Continuous Mappings
- Stark Woods Solution Chp1-7 3ed
- Gamma Sag Semi Ti Spaces in Topological Spaces
- group
- Brienna Matson Assignment 8
- Einstein
- Ratio and Rates2
- On uniformly continuous uniform space
- f 110258124371169
- UT Dallas Syllabus for math2451.501 06s taught by Viswanath Ramakrishna (vish)
- Book 101
- Week 04 Notes
- IAEJ Template
- 45462507.pdf
- CHAPTER 1 Part 1 Student
- IITM2
- Optimal and Hierarchical Formation Control for UAV
- TESTS 2
- 1-s2.0-S0362546X09000947-main
- Weber - Determinations of Electrodynamic Measure Concerning a Universal Law of Electrical Action 1846
- GDandT Fundamentals
- Perelman's Proof Of The Poincare Conjecture - A Nonlinear PDE Perspective (Terence Tao)
- Mod13_Part3 of Probability and Statistics
- Class_11_SA2
- Absolute Stability of Nonlinear Systems of Automatic Control
- A Tutorial on Preview Control Systems
- MCT Book - Chapter 9
- Tensegrity Systems and Foot Bridge Poster by Andrea Micheletti

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd