
For any Homework related queries, Call us at :- +1 678 648 4277

You can mail us at :- support@statisticshomeworksolver.com


reach us at :- www.statisticshomeworksolver.com
Probabilistic Systems Analysis Homework Help
1. Consider an exponentially distributed random variable X with parameter λ. Let F be the distribution function. Find the real number µ that satisfies F(µ) = 1/2. This number µ is called the median of the random variable.
2. One of two wheels of fortune, A and B, is selected by the flip of a fair
coin, and the wheel chosen is spun once to determine the experimental
value of random variable X . Random variable Y , the reading
obtained with wheel A, and random variable W , the reading obtained
with wheel B, are described by the PDFs

f_Y(y) = 1 for 0 < y ≤ 1, and f_Y(y) = 0 otherwise;
f_W(w) = 3 for 0 < w ≤ 1/3, and f_W(w) = 0 otherwise.
If we are told the experimental value of X was less than 1/4, what is the conditional probability that wheel A was the one selected?
3. A 6.041 graduate opens a new casino in Las Vegas and decides to make
the games more challenging from a probabilistic point of view. In a
new version of roulette, each contestant spins the following kind of
roulette wheel. The wheel has radius r and its perimeter is divided into
20 intervals, alternating red and black. The red intervals (along the
perimeter) are twice the width of the black intervals (also along the
perimeter). The red intervals all have the same length and the black
intervals all have the same length. After the wheel is spun, the center of
the ball is equally likely to settle in any position on the edge of the
wheel; in other words, the angle of the final ball position (marked at
the ball’s center) along the wheel’s perimeter is distributed uniformly
between 0 and 2π radians.
(a) What is the probability that the center of the ball settles in a red interval?
(b) Let B denote the event that the center of the ball settles in a black interval. Find the conditional PDF f_{Z|B}(z), where Z is the distance, along the perimeter of the roulette wheel, between the center of the ball and the edge of the interval immediately clockwise from the center of the ball.
(c) What is the unconditional PDF f_Z(z)?

Another attraction of the casino is the Gaussian slot machine. In this


game, the machine produces independent identically distributed
(IID) numbers X 1 , X 2 , ... that have normal distribution N(0, σ 2 ). For
every i, when the number X i is positive, the player receives from the
casino a sum of money equal to X i . When X i is negative, the player
pays the casino a sum of money equal to
|X i |.
(d)What is the standard deviation of the net total gain of a player after
n plays of the Gaussian slot machine?
(e)What is the probability that the absolute value of the net total gain
after n plays is greater than 2 √ nσ?
4. Let a continuous random variable X be uniformly distributed over the interval [−1, 1]. Derive the PDF f_Y(y) where:
(a) Y = sin(πX/2)
(b) Y = sin(2πX)
5. A Communication Example. During lecture, we looked at a simple point-to-point communication example. The task was to transmit messages from one point to another, by using a noisy channel. We modeled the problem probabilistically as follows:
• A binary message source generates independent successive messages M_1, M_2, ...: each message is a discrete random variable that takes the value 1 with probability p and the value 0 with probability 1 − p. For a single message, we write the PMF as:

p_M(m) = 1 − p if m = 0, and p_M(m) = p if m = 1.
• A binary symmetric channel acts on an input bit to produce an output
bit, by flipping the input with “crossover” probability e, or
transmitting it correctly with probability 1 − e.

For a single transmission, this channel can be modeled using a conditional PMF p_{Y|X}(y|x), where X and Y are random variables for the channel input and output bits respectively. We then have:

p_{Y|X}(y|x) = 1 − e if y = x, and p_{Y|X}(y|x) = e if y ≠ x.
Next, we make a series of assumptions that make the above single-
transmission description enough to describe multiple transmissions as
well. We say that transmissions are:
(a) Independent: Outputs are conditionally independent from each other, given the inputs.
(b) Memoryless: Only the current input affects the current output.
(c) Time-invariant: We always have the same conditional PMF.
We write:

p_{Y_1,...,Y_n | X_1,...,X_n}(y_1, ..., y_n | x_1, ..., x_n) = p_{Y|X}(y_1 | x_1) · p_{Y|X}(y_2 | x_2) · ... · p_{Y|X}(y_n | x_n).
Any transmission scheme through this channel must transform


messages into channel inputs (encoding), and transform channel
outputs into estimates of the transmitted messages (decoding). For this
problem, we will encode each message separately (more elaborate
schemes look at blocks and trails of messages), and generate a
sequence of n channel input bits. The encoder is therefore a map:
{0, 1} −→ {0, 1}^n
M −→ X_1, X_2, ..., X_n
The decoder is simply a reverse map:
{0, 1}^n −→ {0, 1}
Y_1, Y_2, ..., Y_n −→ M̂
Note that we use the "hat" notation to indicate an estimated quantity, but bear in mind that M and M̂ are two distinct random variables. The complete communication problem chains the message source, the encoder, the channel, and the decoder together in sequence.
Finally, to measure the performance of any transmission scheme, we
look at the probability of error, i.e. the event that the estimated
message is different than the transmitted message:

P(error) = P(M̂ ≠ M).
(a) No encoding:
The simplest encoder sends each message bit directly through the channel, i.e. X_1 = X = M. Then, a reasonable decoder is to use the channel output directly as the message estimate: M̂ = Y_1 = Y. What is the probability of error in this case?
(b) Repetition code with majority decoding rule:
The next thing we attempt to do is to send each message n > 1
times through this channel. On the decoder end, we do what seems
natural: decide 0 when there are more 0s than 1s, and decide 1
otherwise. This is a “majority” decoding rule, and we can write it as
follows (making the dependence of Mˆ on the channel outputs
explicit):

M̂(y_1, ..., y_n) = 0 if y_1 + ... + y_n < n/2, and M̂(y_1, ..., y_n) = 1 if y_1 + ... + y_n ≥ n/2.
Analytical results:
i. Find an expression of the probability of error as a function of e, p
and n. [Hint: First use the total probability rule to divide the
problem into solving P(error|M = 0) and P(error|M = 1), as we did
in the lecture.]
ii. Choose p = 0.5, e = 0.3 and use your favorite computational
method to make a plot of
P(error) versus n, for n = 1, · · · , 15.
iii. Is the majority decoding rule optimal (lowest P(error)) for all p? [Hint: What is the best decoding rule if p = 0?]
Simulation:
i. Set p = 0.5, e = 0.3, generate a message of length 20.
ii.Encode your message with n = 3.
iii.Transmit the message, decode it, and write down the value of the
message error rate (the ratio of bits in error, over the total number of
message bits).
iv. Repeat Step 3 several times, and average out all the message error
rates that you obtain. How does this compare to your analytical
expression of P(error) for this value of n?
v. Repeat Steps 3 and 4 for n = 5, 10, 15, and compare the respective
average message error rates to the corresponding analytical P(error).
(c) Repetition code with maximum a posteriori (MAP) rule:
In Part b, majority decoding was chosen almost arbitrarily, and we have
alluded to the fact that it might not be the best choice in some cases.
Here, we claim that the probability of error is in fact minimized when the decoding rule is as follows:

M̂(y_1, ..., y_n) = arg max over m in {0, 1} of P(M = m | Y_1 = y_1, ..., Y_n = y_n).
This decoding rule is called "maximum a posteriori" or MAP, because it chooses the value of M which maximizes the posterior probability of M given the channel output bits Y_1 = y_1, ..., Y_n = y_n (computed via Bayes' rule).
i. Denote by N_0 the number of 0s in y_1, ..., y_n, and by N_1 the number of 1s. Express the MAP rule as an inequality in terms of N_0 and N_1, and as a function of e and p. [Hint: Use Bayes' rule to decompose the posterior probability. Note that p_{Y_1,...,Y_n}(y_1, ..., y_n) is a constant during one decoding.]
ii. Show that the MAP rule reduces to the majority rule if p = 0.5.
iii. Give an interpretation of how, for p ≠ 0.5, the MAP rule deviates from the majority rule. [Hint: Use the simplification steps of your previous answer, but keep p arbitrary.]
iv. (Optional) Prove that the MAP rule minimizes P(error).

G1†. Let X_1, X_2, ..., X_n, ... be a sequence of independent and identically distributed random variables. We define random variables S_1, S_2, ..., S_n, ... (n = 1, 2, ...) in the following manner: S_n = X_1 + X_2 + ... + X_n. Find

E[X_1 | S_n = s_n, S_{n+1} = s_{n+1}, ...]

1. We are given that

F(µ) = ∫ from 0 to µ of λ e^{−λx} dx = 1/2
⇒ −e^{−λx} evaluated from 0 to µ = 1 − e^{−λµ} = 1/2
⇒ e^{−λµ} = 1/2
⇒ λµ = ln 2
⇒ µ = (ln 2)/λ.
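
(Optional numerical check, not part of the original solution: the short Python sketch below draws exponential samples and compares their empirical median with (ln 2)/λ; the particular λ values and the sample size are arbitrary choices.)

import numpy as np

# Empirical median of exponential samples should be close to ln(2)/lam.
rng = np.random.default_rng(0)
for lam in (0.5, 1.0, 2.0):
    samples = rng.exponential(scale=1.0 / lam, size=200_000)
    print(lam, np.median(samples), np.log(2) / lam)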

2. First we can draw a tree with the following branches:


[Figure: probability tree with branches A (Y < 1/4, Y > 1/4) and B (W < 1/4, W > 1/4), shown alongside the PDFs f_Y(y) and f_W(w).]
Then, using the PDFs given in the question, we can determine two probabilities that are clearly relevant for this question and give branch labels for the tree:

P(X < 1/4 | A) = P(Y < 1/4) = 1/4,    P(X < 1/4 | B) = P(W < 1/4) = 3 · (1/4) = 3/4.

Finally, Bayes' rule gives

P(A | X < 1/4) = P(A) P(X < 1/4 | A) / [P(A) P(X < 1/4 | A) + P(B) P(X < 1/4 | B)]
             = (1/2)(1/4) / [(1/2)(1/4) + (1/2)(3/4)] = 1/4.
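
(Optional check, not part of the original solution: a quick Python simulation of the two-wheel experiment, with an arbitrary seed and 10^6 trials, should give a conditional frequency near 1/4.)

import numpy as np

# Wheel A gives Y ~ Uniform(0, 1); wheel B gives W ~ Uniform(0, 1/3).
rng = np.random.default_rng(0)
n = 1_000_000
wheel_is_A = rng.random(n) < 0.5                     # fair coin flip
x = np.where(wheel_is_A, rng.uniform(0, 1, n), rng.uniform(0, 1/3, n))
below = x < 0.25
print(wheel_is_A[below].mean())                      # should be close to 0.25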

3. (a) We know that the total length of the edge for the red intervals is two times that for the black intervals. Since the ball is equally likely to fall in any position of the edge, the probability of falling in a red interval is 2/3.
(b) Conditioned on the ball having fallen in a black interval, the ball is equally likely to fall anywhere in that interval. Each black interval has length 2πr/30 = πr/15 (10 red intervals of width 2w and 10 black intervals of width w make up the perimeter 2πr, so w = πr/15). Thus, the PDF is uniform:

f_{Z|B}(z) = 15/(πr) for 0 ≤ z ≤ πr/15, and 0 otherwise.
(c) Since the ball is equally likely to fall on any point of the edge, we can see it is twice as likely to settle in a red interval as in a black one. Conditioned on a red interval, Z is uniform on [0, 2πr/15], so f_{Z|R}(z) = 15/(2πr) there. By total probability,

f_Z(z) = P(B) f_{Z|B}(z) + P(R) f_{Z|R}(z) = (1/3)(15/(πr)) + (2/3)(15/(2πr)) = 10/(πr) for 0 ≤ z ≤ πr/15,
f_Z(z) = (2/3)(15/(2πr)) = 5/(πr) for πr/15 < z ≤ 2πr/15, and 0 otherwise.
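
(Optional Monte Carlo check of parts (a)–(c), not part of the original solution. It assumes r = 1 and that the red piece comes first within each red/black pair; these are arbitrary modeling choices for the simulation only.)

import numpy as np

# P(red) should be ~2/3; the clockwise distance Z should have density
# 10/(pi*r) on [0, pi*r/15] and 5/(pi*r) on (pi*r/15, 2*pi*r/15].
rng = np.random.default_rng(0)
r = 1.0
black_len, red_len = np.pi * r / 15, 2 * np.pi * r / 15
pair = black_len + red_len
s = rng.uniform(0, 2 * np.pi * r, size=2_000_000)        # ball position (arc length)
offset = s % pair                                         # position within a red+black pair
in_red = offset < red_len                                 # red occupies the first part of each pair
z = np.where(in_red, offset, offset - red_len)            # distance into the current interval
print(in_red.mean())                                                        # ~ 2/3
print((z < black_len).mean() / black_len)                                   # ~ 10/(pi*r) = 3.18
print(((z >= black_len) & (z <= red_len)).mean() / (red_len - black_len))   # ~ 5/(pi*r) = 1.59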
(d) The total gain (or loss) T equals the sum of all the X_i, i.e. T = X_1 + X_2 + ... + X_n. Since all the X_i's are independent of each other, and they have the same Gaussian distribution, the sum will also be Gaussian, with mean 0 and variance nσ².

Therefore, the standard deviation of T is √n σ.


(e) Since T ~ N(0, nσ²), the ratio T/(√n σ) is a standard normal random variable. Therefore

P(|T| > 2√n σ) = 2 P(T > 2√n σ) = 2 (1 − Φ(2)) ≈ 0.046.
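
(Optional simulation, not part of the original solution; the values sigma = 2 and n = 25 are arbitrary.)

import numpy as np

# Net gain after n plays: T ~ N(0, n*sigma^2), so std(T) = sqrt(n)*sigma
# and P(|T| > 2*sqrt(n)*sigma) = 2*(1 - Phi(2)) ~ 0.046.
rng = np.random.default_rng(0)
sigma, n, trials = 2.0, 25, 200_000
T = rng.normal(0.0, sigma, size=(trials, n)).sum(axis=1)
print(T.std(), np.sqrt(n) * sigma)                    # both ~ 10
print((np.abs(T) > 2 * np.sqrt(n) * sigma).mean())    # ~ 0.046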

4. (a) Y = g(X) = sin(πX/2). Because g(x) is a monotonic function for −1 < x < 1, we can define an inverse function h(y) = (2/π) arcsin(y) and use the PDF formula given in lecture:

f_Y(y) = f_X(h(y)) |dh/dy| = (1/2) · (2/π) · 1/√(1 − y²) = 1/(π √(1 − y²)).

The final answer is:

f_Y(y) = 0 for y < −1,
f_Y(y) = 1/(π √(1 − y²)) for −1 ≤ y < 1,
f_Y(y) = 0 for 1 ≤ y.
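
(Optional numerical check, not part of the original solution; bin count and sample size are arbitrary, and the histogram range avoids the endpoints where the density blows up.)

import numpy as np

# Compare a histogram of Y = sin(pi*X/2), X ~ Uniform(-1, 1),
# with the formula f_Y(y) = 1/(pi*sqrt(1 - y^2)).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1_000_000)
y = np.sin(np.pi * x / 2)
hist, edges = np.histogram(y, bins=50, range=(-0.99, 0.99), density=True)
centers = (edges[:-1] + edges[1:]) / 2
analytic = 1 / (np.pi * np.sqrt(1 - centers**2))
print(np.max(np.abs(hist - analytic)))                # small discrepancy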

(b) Y = sin(2πX). Rearranging the equation, we find X = arcsin(Y)/(2π). This is a many-to-one mapping of X to Y. That is, given a value of Y, it is not possible to pinpoint the corresponding value of X. For example, if Y = 1, X could have been 1/4 or −3/4. This means we cannot use the formula used in part (a).
The CDF is still the means to achieving the PDF, but we take a graphical approach instead. First, let's consider the extreme cases. By virtue of the sine function, the value of Y varies from −1 to 1. It is obvious that no value of the random variable Y can be less than −1, so F_Y(y) = 0 for y ≤ −1. Also, every value of Y is at most 1, so F_Y(y) = 1 for y ≥ 1.
For −1 ≤ y ≤ 0, consider the following diagram:
[Figure: plot of y = sin(2πx) for −1 < x < 1, with the solutions of sin(2πx) = y labeled a, b, c, d, and below it the PDF of X with the regions a ≤ x ≤ b and c ≤ x ≤ d shaded.]

The value b corresponds to the principal value of arcsin(y), which lies in [−π/2, π/2]. In other words, by rearranging the original equation: b = arcsin(y)/(2π).
So, we now have F_Y(y) = P((a ≤ X ≤ b) ∪ (c ≤ X ≤ d)), where the event on the right is the shaded region in the above PDF of X. By symmetry, and mutual exclusivity, this shaded region can be expressed as four times the more darkly shaded region of the PDF:

F_Y(y | −1 ≤ y ≤ 0) = P((a ≤ X ≤ b) ∪ (c ≤ X ≤ d))
                    = 4 P(−1/4 ≤ X ≤ b)
                    = 4 (b − (−1/4)) (0.5)
                    = 2b + 1/2
                    = arcsin(y)/π + 1/2.
For 0 ≤ y ≤ 1, consider the following
diagram:
[Figure: plot of y = sin(2πx) for −1 < x < 1, with crossing points labeled d, a, b, c, and below it the PDF of X with the corresponding regions shaded.]

By an analogous argument to the previous case, we find that F_Y(y | 0 ≤ y ≤ 1) is represented by the shaded region from −1 to d, from a to b, and from c to 1. Once again, however, this is exactly four times the more darkly shaded region from −1/4 to b, the same as in the case −1 ≤ y ≤ 0. So, the expression for the CDF is

F_Y(y | 0 ≤ y ≤ 1) = arcsin(y)/π + 1/2.
To summarize:

F_Y(y) = 0 for y ≤ −1,
F_Y(y) = arcsin(y)/π + 1/2 for −1 ≤ y < 1,
F_Y(y) = 1 for 1 ≤ y.

Now use the identity d/dy arcsin(y) = 1/√(1 − y²) and differentiate with respect to y:

f_Y(y) = 1/(π √(1 − y²)) for −1 < y < 1, and 0 otherwise.
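
(Optional numerical check of the CDF formula, not part of the original solution; the test points and sample size are arbitrary.)

import numpy as np

# Compare F_Y(y) = arcsin(y)/pi + 1/2 for Y = sin(2*pi*X), X ~ Uniform(-1, 1),
# against empirical frequencies.
rng = np.random.default_rng(0)
y_samples = np.sin(2 * np.pi * rng.uniform(-1, 1, size=1_000_000))
for y in (-0.9, -0.5, 0.0, 0.5, 0.9):
    empirical = (y_samples <= y).mean()
    analytic = np.arcsin(y) / np.pi + 0.5
    print(y, round(empirical, 4), round(analytic, 4))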
5. Analytical Results

(a) No encoding:
P(error) = P(M̂ = 1, M = 0) + P(M̂ = 0, M = 1)
         = P(Y = 1, X = 0) + P(Y = 0, X = 1)
         = P(Y = 1 | X = 0) P(X = 0) + P(Y = 0 | X = 1) P(X = 1)
         = e(1 − p) + ep = e.
(b) Repetition encoding:
i. Similar to the previous case, the errors can be attributed to the events {M̂ = 1, M = 0} and {M̂ = 0, M = 1}. Assuming we encode each bit by repeating it n = 2m + 1 times (m = 0, 1, ...), we get

P(error) = P(M̂ = 1, M = 0) + P(M̂ = 0, M = 1)
         = P(Y_1 + Y_2 + ... + Y_n ≥ m + 1 | M = 0)(1 − p) + P(Y_1 + Y_2 + ... + Y_n ≤ m | M = 1) p
         = P(at least (m + 1) errors | M = 0)(1 − p) + P(at least (m + 1) errors | M = 1) p
         = Σ from k = m + 1 to n of (n choose k) e^k (1 − e)^{n−k}.

Note: It can be seen that the probability of error is independent of the prior probabilities p and 1 − p. The error varies only as a function of e and n.
ii. The variation of the error probability as a function of n is shown in Figure 1.
iii. Even though the error is independent of p, in degenerate cases such as p = 1 or p = 0 the majority decoding does not give us any advantage: it is better to always output M̂ = 1 or M̂ = 0, respectively.
Simulations: The MATLAB code for simulating the communication channel is attached with the solutions. Figure 2 shows the close agreement between the analytical results (indicated by ◦) and the simulated error rates (indicated by ∗).
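
(The attached MATLAB code is not reproduced in this excerpt. As a stand-in, here is a minimal Python sketch of the same experiment; the seed, the 20,000 messages per value of n, and the restriction to odd n are our own arbitrary choices.)

import numpy as np
from math import comb

rng = np.random.default_rng(0)
e, p, n_messages = 0.3, 0.5, 20_000

def analytic_p_error(n, e):
    # sum_{k=m+1}^{n} C(n,k) e^k (1-e)^(n-k), with n = 2m + 1 odd
    m = (n - 1) // 2
    return sum(comb(n, k) * e**k * (1 - e)**(n - k) for k in range(m + 1, n + 1))

for n in (1, 3, 5, 7, 15):
    msgs = (rng.random(n_messages) < p).astype(int)          # source messages M
    x = np.repeat(msgs[:, None], n, axis=1)                  # repetition encoding
    flips = (rng.random((n_messages, n)) < e).astype(int)    # channel crossovers
    y = x ^ flips                                            # binary symmetric channel outputs
    est = (y.sum(axis=1) >= n / 2).astype(int)               # majority decoding
    print(n, (est != msgs).mean(), round(analytic_p_error(n, e), 4))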
[Plot: analytical P(error) versus n, for n = 1, ..., 15, with e = 0.3 and p = 0.5.]
Figure 1: Analytical Results

[Plot: simulated error rates versus n, compared with the analytical values, for e = 0.3 and p = 0.5.]

Figure 2: Simulation results

(c) Repetition code with maximum a posteriori (MAP) rule:

i. Using Bayes' rule to compute the posterior probabilities, we get

P(M = 0 | Y_1 = y_1, ..., Y_n = y_n) = P(Y_1 = y_1, ..., Y_n = y_n | M = 0) P(M = 0) / P(Y_1 = y_1, ..., Y_n = y_n)
                                     = e^{N_1} (1 − e)^{N_0} (1 − p) / P(Y_1 = y_1, ..., Y_n = y_n).
Similarly,
P(M = 1 | Y_1 = y_1, ..., Y_n = y_n) = P(Y_1 = y_1, ..., Y_n = y_n | M = 1) P(M = 1) / P(Y_1 = y_1, ..., Y_n = y_n)
                                     = e^{N_0} (1 − e)^{N_1} p / P(Y_1 = y_1, ..., Y_n = y_n).
Since the denominators are the same, the MAP decision rule reduces to

M̂(y_1, ..., y_n) = 0 if e^{N_1} (1 − e)^{N_0} (1 − p) ≥ e^{N_0} (1 − e)^{N_1} p, and 1 otherwise.

ii. When p = 0.5, the decision rule reduces to

M̂(y_1, ..., y_n) = 0 if e^{N_1} (1 − e)^{N_0} ≥ e^{N_0} (1 − e)^{N_1}, and 1 otherwise.

The test expression can be rewritten as (1 − e)^{N_0 − N_1} ≥ e^{N_0 − N_1}. If we assume that e ≤ 0.5, then (1 − e) ≥ e. Therefore, we can rewrite the test as

M̂(y_1, ..., y_n) = 0 if N_0 ≥ N_1, and 1 otherwise,

which is exactly the majority rule.
iii. In this case, we see that the decision rule, and hence the probability of error, depends upon the a priori probabilities p and 1 − p. In particular, if we consider the degenerate cases p = 0 and p = 1 again, we see from the rule that we always decide in favor of M̂ = 0 or M̂ = 1 respectively, irrespective of the received bits. This is in contrast to the majority decoding, which still has to count the number of 1s in the output.
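
(As an illustration, not part of the original solution: taking logs of the MAP rule above, we decide 0 when N_0 − N_1 ≥ log(p/(1−p)) / log((1−e)/e), assuming 0 < e < 0.5 and 0 < p < 1. The helper below is a hypothetical Python sketch of this threshold form; the example inputs are arbitrary.)

import numpy as np

def map_decode(y_bits, e, p):
    # MAP rule in log form: decide 0 when N0 - N1 >= log(p/(1-p)) / log((1-e)/e).
    # Assumes 0 < e < 0.5 and 0 < p < 1.
    n0 = sum(1 for b in y_bits if b == 0)
    n1 = len(y_bits) - n0
    threshold = np.log(p / (1 - p)) / np.log((1 - e) / e)
    return 0 if (n0 - n1) >= threshold else 1

print(map_decode([0, 1, 1], e=0.3, p=0.5))   # threshold 0: plain majority, decides 1
print(map_decode([0, 1, 1], e=0.3, p=0.1))   # strong prior for M = 0: decides 0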
iv. Here we resort to a more intuitive proof of the statement, which also illustrates the concept of 'risk' in decision making (a mathematically rigorous proof may be found in 6.011 and 6.432). Consider the case when we have no received data and make the decision based entirely on the prior probabilities P(M = 0) = 1 − p and P(M = 1) = p. If we decide M̂ = 1, then we have a probability 1 − p of being wrong. Similarly, if we choose M̂ = 0, we have a probability p of being wrong. We choose M̂ to minimize the risk of being wrong, or equivalently to maximize the probability of being right. Thus we choose

M̂ = arg max over m in {0, 1} of P(M = m).

When we are given the data Y, we have to deal with the a posteriori probabilities P(M = 1 | Y) and P(M = 0 | Y) instead of the a priori probabilities P(M = 0) and P(M = 1), and the argument remains unchanged. Thus, to minimize the probability of being wrong, or maximize the probability of being right, we choose

M̂ = arg max over m in {0, 1} of P(M = m | Y).

G1†. Note that we can rewrite E[X_1 | S_n = s_n, S_{n+1} = s_{n+1}, ...] as follows:

E[X_1 | S_n = s_n, S_{n+1} = s_{n+1}, ...]
   = E[X_1 | S_n = s_n, X_{n+1} = s_{n+1} − s_n, X_{n+2} = s_{n+2} − s_{n+1}, ...]
   = E[X_1 | S_n = s_n],

where the last equality holds due to the fact that the X_i's are independent. We also note that

E[X_1 + ... + X_n | S_n = s_n] = E[S_n | S_n = s_n] = s_n.

It follows from the linearity of expectations that

E[X_1 + ... + X_n | S_n = s_n] = E[X_1 | S_n = s_n] + ... + E[X_n | S_n = s_n].
Because the X i ’s are identically distributed, we have the following
relationship:

E [X i | S n = s n ] = E [X j | S n = s n ], for any 1 ≤ i ≤ n, 1 ≤ j ≤ n.
Therefore,

E[X_1 + ... + X_n | S_n = s_n] = n E[X_1 | S_n = s_n] = s_n  ⇒  E[X_1 | S_n = s_n] = s_n / n.
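
(Optional Monte Carlo illustration, not part of the original solution: condition on S_n falling in a narrow window around s_n and average X_1 there. The exponential distribution, n = 5, s_n = 2.0, and the window width are arbitrary choices; any IID distribution works.)

import numpy as np

rng = np.random.default_rng(0)
n, s_n, trials = 5, 2.0, 2_000_000
x = rng.exponential(1.0, size=(trials, n))       # IID X_1, ..., X_n
s = x.sum(axis=1)
window = np.abs(s - s_n) < 0.05                  # S_n approximately equal to s_n
print(x[window, 0].mean(), s_n / n)              # both close to 0.4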
