Information Theory Coding and Cryptography Unit 2 by Arun Pratap Singh
5/26/14
UNIT : II
STOCHASTIC PROCESS :
In probability theory, a stochastic process or sometimes random process (widely used) is a
collection of random variables; this is often used to represent the evolution of some random value, or
system, over time. This is the probabilistic counterpart to a deterministic process (or deterministic
system). Instead of describing a process which can only evolve in one way (as in the case, for example,
of solutions of an ordinary differential equation), in a stochastic or random process there is some
indeterminacy: even if the initial condition (or starting point) is known, there are several (often infinitely
many) directions in which the process may evolve.
In the simple case of discrete time, as opposed to continuous time, a stochastic process involves
a sequence of random variables and the time series associated with these random variables (for
example, see Markov chain, also known as discrete-time Markov chain). Another basic type of a
stochastic process is a random field, whose domain is a region of space, in other words, a random
function whose arguments are drawn from a range of continuously changing values. One approach to
stochastic processes treats them as functions of one or several deterministic arguments (inputs, in
most cases regarded as time) whose values (outputs) are random variables: non-deterministic (single)
quantities which have certain probability distributions. Random variables corresponding to various
times (or points, in the case of random fields) may be completely different. The main requirement is
that these different random quantities all have the same type. Type refers to the codomain of the
function. Although the random values of a stochastic process at different times may be independent
random variables, in most commonly considered situations they exhibit complicated statistical
correlations.
Formally, given a probability space (Ω, F, P) and a measurable space (S, Σ), an S-valued stochastic process is a collection of S-valued random variables {X(t) : t ∈ T}, indexed by a totally ordered set T (usually interpreted as time), where each X(t) is an S-valued random variable on Ω. The space S is called the state space of the process.
STATISTICAL INDEPENDENCE :
In probability theory, to say that two events are independent (alternatively called statistically
independent or stochastically independent )[1] means that the occurrence of one does not affect
the probability of the other. Similarly, two random variables are independent if the realization of one
does not affect the probability distribution of the other.
Two events A and B are independent if and only if
P(A ∩ B) = P(A) P(B).
Why this defines independence is made clear by rewriting with conditional probabilities:
P(A | B) = P(A ∩ B) / P(B) = P(A), and similarly P(B | A) = P(B).
Thus, the occurrence of B does not affect the probability of A, and vice versa. Although the derived
expressions may seem more intuitive, they are not the preferred definition, as the conditional
probabilities may be undefined if P(A) or P(B) is 0. Furthermore, the preferred definition makes
clear by symmetry that when A is independent of B, B is also independent of A.
More than two events
A finite set of events {Ai} is pairwise independent if and only if every pair of events is independent,[2] that is, if and only if for all distinct pairs of indices m, n:
P(Am ∩ An) = P(Am) P(An).
A finite set of events is mutually independent if and only if every event is independent of any intersection of the other events,[2] that is, if and only if for every finite subset {A1, ..., An}:
P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2) ... P(An).
Two random variables X and Y are conditionally independent given a third random variable Z if
P(X ≤ x, Y ≤ y | Z = z) = P(X ≤ x | Z = z) P(Y ≤ y | Z = z)
for all x, y and z such that P(Z = z) > 0. On the other hand, if the random variables are continuous and have a joint probability density function p, then X and Y are conditionally independent given Z if
p(x, y | z) = p(x | z) p(y | z)
for all x, y and z such that the conditional densities are defined. That is, the conditional distribution for X given Y and Z is the same as that given Z alone; a similar equation holds for the conditional probability density functions in the continuous case.
Independence can be seen as a special kind of conditional independence, since probability can be
seen as a kind of conditional probability given no events.
Independent σ-algebras
The definitions above are both generalized by the following definition of independence for σ-algebras. Let (Ω, F, Pr) be a probability space and let A and B be two sub-σ-algebras of F. A and B are said to be independent if, whenever A ∈ A and B ∈ B,
Pr(A ∩ B) = Pr(A) Pr(B),
and an infinite family of σ-algebras is said to be independent if all its finite subfamilies are independent.
The new definition relates to the previous ones very directly:
Two events are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by an event E ∈ F is, by definition,
σ({E}) = {∅, E, Ω \ E, Ω}.
Two random variables X and Y defined over Ω are independent (in the old sense) if and only if the σ-algebras that they generate are independent (in the new sense). The σ-algebra generated by a random variable X taking values in some measurable space S consists, by definition, of all subsets of Ω of the form X⁻¹(U), where U is any measurable subset of S.
Using this definition, it is easy to show that if X and Y are random variables and Y is constant, then X and Y are independent, since the σ-algebra generated by a constant random variable is the trivial σ-algebra {∅, Ω}. Probability zero events cannot affect independence, so independence also holds if Y is only Pr-almost surely constant.
If X and Y are independent, then the expectation operator E has the property
E[XY] = E[X] E[Y],
so the covariance cov(X, Y) is zero. (The converse, i.e. the proposition that if two random variables have a covariance of 0 they must be independent, is not true. See uncorrelated.)
Characteristic function
Two random variables X and Y are independent if and only if the characteristic function of the random vector (X, Y) satisfies
φ(X,Y)(t, s) = φX(t) φY(s).
In particular the characteristic function of their sum is the product of their marginal characteristic functions:
φX+Y(t) = φX(t) φY(t),
though the reverse implication is not true. Random variables that satisfy the latter condition are called sub-independent.
Examples :
Rolling a die
The event of getting a 6 the first time a die is rolled and the event of getting a 6 the second time
are independent. By contrast, the event of getting a 6 the first time a die is rolled and the event
that the sum of the numbers seen on the first and second trials is 8 are not independent.
Drawing cards
If two cards are drawn with replacement from a deck of cards, the event of drawing a red card on
the first trial and that of drawing a red card on the second trial are independent. By contrast, if two
cards are drawn without replacement from a deck of cards, the event of drawing a red card on
the first trial and that of drawing a red card on the second trial are not independent.
Pairwise and mutual independence
Consider the two probability spaces shown [figure not reproduced]. In both cases, P(A) = P(B) = 1/2 and P(C) = 1/4. The first space is pairwise independent but not mutually independent. The second space is mutually independent. To illustrate the difference, consider conditioning on two events. In the pairwise independent case, although, for example, A is independent of both B and C, it is not independent of B ∩ C:
P(A | B ∩ C) ≠ P(A).
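The probability spaces from the missing figure are not reproduced here, but the same phenomenon can be sketched with a standard two-fair-coin construction (an illustrative assumption, not the figure's exact spaces): let A be "first coin is heads", B "second coin is heads", and C "both coins agree".

```python
from fractions import Fraction

# Sample space: two fair coin flips, each outcome with probability 1/4.
outcomes = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in outcomes}

A = {w for w in outcomes if w[0] == "H"}   # first flip is heads
B = {w for w in outcomes if w[1] == "H"}   # second flip is heads
C = {w for w in outcomes if w[0] == w[1]}  # both flips agree

def prob(event):
    return sum(P[w] for w in event)

# Pairwise independence: P(X ∩ Y) == P(X) P(Y) for every pair.
for X, Y in [(A, B), (A, C), (B, C)]:
    assert prob(X & Y) == prob(X) * prob(Y)

# But not mutual independence: P(A ∩ B ∩ C) != P(A) P(B) P(C).
print(prob(A & B & C), prob(A) * prob(B) * prob(C))  # 1/4 vs 1/8

# Conditioning on B ∩ C pins A down completely:
print(prob(A & (B & C)) / prob(B & C))  # 1, not P(A) = 1/2
```

Knowing B ∩ C (both heads) forces A to occur, so A is far from independent of the pair even though it is independent of each event separately.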
BERNOULLI PROCESS :
In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random
variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1.
The component Bernoulli variables Xi are identically distributed and independent. Prosaically, a Bernoulli process is repeated coin flipping, possibly with an unfair coin (but with consistent unfairness). Every
variable Xi in the sequence is associated with a Bernoulli trial or experiment. They all have the
same Bernoulli distribution. Much of what can be said about the Bernoulli process can also be
generalized to more than two outcomes (such as the process for a six-sided die); this generalization
is known as the Bernoulli scheme.
A Bernoulli process is a finite or infinite sequence of independent random variables X1, X2, X3, ..., such that
for each i, the value of Xi is either 0 or 1, and
for all values of i, the probability that Xi = 1 is the same number p.
Two other common interpretations of the values are true or false and yes or no. Under any interpretation of the two values, the individual variables Xi may be called Bernoulli trials with parameter p.
In many applications time passes between trials, as the index i increases. In effect, the
trials X1, X2, ... Xi, ... happen at "points in time" 1, 2, ..., i, .... That passage of time and the
associated notions of "past" and "future" are not necessary, however. Most generally,
any Xi and Xj in the process are simply two from a set of random variables indexed by {1, 2, ..., n} or by {1, 2, 3, ...}, in the finite and infinite cases respectively.
Several random variables and probability distributions beside the Bernoullis may be derived from
the Bernoulli process:
The number of successes in the first n trials, which has a binomial distribution B(n, p)
The number of trials needed to get r successes, which has a negative binomial
distribution NB(r, p)
The number of trials needed to get one success, which has a geometric distribution NB(1, p),
a special case of the negative binomial distribution
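These derived distributions can be checked empirically by simulating the process; the parameter values below (p = 0.3, n = 20) are illustrative assumptions.

```python
import random

random.seed(0)
p, n, trials = 0.3, 20, 20_000

# One run of the first n steps of a Bernoulli(p) process.
def run(n, p):
    return [1 if random.random() < p else 0 for _ in range(n)]

# Number of successes in n trials: Binomial(n, p), with mean n*p.
successes = [sum(run(n, p)) for _ in range(trials)]
mean_successes = sum(successes) / trials
print(round(mean_successes, 2))  # close to n*p = 6.0

# Number of trials needed for the first success: Geometric(p), mean 1/p.
def first_success(p):
    k = 1
    while random.random() >= p:
        k += 1
    return k

waits = [first_success(p) for _ in range(trials)]
print(round(sum(waits) / trials, 2))  # close to 1/p ≈ 3.33
```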
Formally, the Bernoulli process may be placed on the space of all sequences of coin flips: one examines the one-sided set Ω = {H, T}^N (or Ω = {H, T}^Z for the two-sided process). Given a cylinder set, that is, a specific finite sequence of coin-flip results [ω1, ω2, ..., ωn] at times 1, 2, ..., n, the probability of observing this particular sequence is

P([ω1, ω2, ..., ωn]) = p^k (1 − p)^(n − k)

where k is the number of times that H appears in the sequence, and n − k is the number of times that T appears in the sequence. There are several different kinds of notations for the above; a common one is to write

P(X1 = x1, X2 = x2, ..., Xn = xn) = p^k (1 − p)^(n − k)

where each Xi is a binary-valued random variable. This probability P is commonly called the Bernoulli measure.[1]

Note that the probability of any specific, infinitely long sequence of coin flips is exactly zero; this is because lim_{n→∞} p^n = 0 for any 0 ≤ p < 1, so any single infinite sequence has measure zero. Nevertheless, one can still say that some classes of infinite sequences of coin flips are far more likely than others; this is given by the asymptotic equipartition property.

To conclude the formal definition, a Bernoulli process is then given by the probability triple (Ω, B, P), as defined above.
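The cylinder-set probability p^k (1 − p)^(n − k) is a one-line computation; the sketch below also illustrates why any single infinite sequence has measure zero.

```python
# Probability of observing a specific finite sequence of coin flips,
# P = p^k * (1-p)^(n-k), where k counts heads ('H') in the sequence.
def seq_prob(seq, p):
    k = seq.count("H")
    return p ** k * (1 - p) ** (len(seq) - k)

print(seq_prob("HHT", 0.5))  # 0.125: each length-3 string has probability 1/8
print(seq_prob("HHT", 0.7))  # 0.7 * 0.7 * 0.3 ≈ 0.147

# The probability of any one fixed sequence decays to zero as its length grows:
for n in (10, 100, 1000):
    print(n, seq_prob("H" * n, 0.7))
```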
BINOMIAL DISTRIBUTION :
The law of large numbers states that, on average, the expectation value of flipping heads for any one coin flip is p. That is, one writes

E[Xi] = p.

One is often interested in knowing how often one will observe H in a sequence of n coin flips. This is given by simply counting: given n successive coin flips, that is, given the set of all possible strings of length n, the number N(k, n) of such strings that contain k occurrences of H is given by the binomial coefficient

N(k, n) = C(n, k) = n! / (k! (n − k)!).

If the probability of flipping heads is given by p, then the total probability of seeing a string of length n with k heads is

P(k, n) = C(n, k) p^k (1 − p)^(n − k).

Of particular interest is the question of the value of P(k, n) for very long sequences of coin flips, that is, in the limit n → ∞. In this case, one may make use of Stirling's approximation to the factorial, and write

n! ≈ √(2πn) (n/e)^n.
Inserting this into the expression for P(k,n), one obtains the Gaussian distribution; this is the
content of the central limit theorem, and this is the simplest example thereof.
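The convergence of P(k, n) to a Gaussian can be seen numerically; the choice n = 1000, p = 1/2 below is an illustrative assumption.

```python
import math

n, p = 1000, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Exact binomial probability P(k, n) = C(n, k) p^k (1-p)^(n-k).
def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Gaussian density with the same mean and variance.
def gauss(k, mu, sigma):
    return math.exp(-((k - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Within three standard deviations of the peak the two curves nearly coincide.
worst = max(abs(binom_pmf(k, n, p) - gauss(k, mu, sigma))
            for k in range(int(mu - 3 * sigma), int(mu + 3 * sigma)))
print(worst)  # tiny compared to the peak value binom_pmf(500, 1000, 0.5) ≈ 0.025
```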
The combination of the law of large numbers, together with the central limit theorem, leads to an
interesting and perhaps surprising result: the asymptotic equipartition property. Put informally,
one notes that, yes, over many coin flips, one will observe H exactly p fraction of the time, and
that this corresponds exactly with the peak of the Gaussian. The asymptotic equipartition property
essentially states that this peak is infinitely sharp, with infinite fall-off on either side. That is, given
the set of all possible infinitely long strings of Hand T occurring in the Bernoulli process, this set
is partitioned into two: those strings that occur with probability 1, and those that occur with
probability 0. This partitioning is known as the Kolmogorov 0-1 law.
The size of this set is interesting, also, and can be explicitly determined: the logarithm of it is exactly the entropy of the Bernoulli process. Once again, consider the set of all strings of length n. The size of this set is 2^n. Of these, only a certain subset are likely; the size of this set is 2^{nH} for H ≤ 1. By using Stirling's approximation, putting it into the expression for P(k, n), solving for the location and width of the peak, and finally taking n → ∞, one finds

H = −p log2 p − (1 − p) log2 (1 − p).

This value is the Bernoulli entropy of a Bernoulli process. Here, H stands for entropy; do not confuse it with the same symbol H standing for heads.
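A quick numerical check of the typical-set count: log2 C(n, pn) / n, the per-symbol log-size of the likely set, approaches the binary entropy H(p). The value p = 0.3 is an illustrative assumption.

```python
import math

def H(p):
    """Binary entropy in bits: H(p) = -p log2 p - (1-p) log2 (1-p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.3
for n in (100, 1000, 10000):
    k = round(p * n)
    # Number of strings with k heads is C(n, k); take log2 per symbol.
    typical_bits = math.log2(math.comb(n, k)) / n
    print(n, round(typical_bits, 4), round(H(p), 4))
# The per-symbol log-count converges to H(0.3) ≈ 0.8813 bits.
```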
von Neumann posed a curious question about the Bernoulli process: is it ever possible that a
given process is isomorphic to another, in the sense of the isomorphism of dynamical systems?
The question long defied analysis, but was finally and completely answered with the Ornstein
isomorphism theorem. This breakthrough resulted in the understanding that the Bernoulli process
is unique and universal; in a certain sense, it is the single most random process possible; nothing
is 'more' random than the Bernoulli process (although one must be careful with this informal
statement; certainly, systems that are mixing are, in a certain sense, 'stronger' than the Bernoulli
process, which is merely ergodic but not mixing. However, such processes do not consist of
independent random variables: indeed, many purely deterministic, non-random systems can be
mixing).
POISSON PROCESS :
In probability theory, a Poisson process is a stochastic process that counts the number of events and the times at which these events occur in a given time interval. The time between each pair of consecutive events has an exponential distribution with parameter λ, and each of these inter-arrival times is assumed to be independent of the other inter-arrival times. The process is named after the French mathematician Siméon Denis Poisson and is a good model of radioactive decay,[1] telephone calls[2] and requests for a particular document on a web server,[3] among many other phenomena.
The Poisson process is a continuous-time process; the sum of a Bernoulli process can be thought of
as its discrete-time counterpart. A Poisson process is a pure-birth process, the simplest example of
a birth-death process. It is also a point process on the real half-line.
The basic form of Poisson process, often referred to simply as "the Poisson process", is a continuous-time counting process {N(t), t ≥ 0} that possesses the following properties:
N(0) = 0
Independent increments (the numbers of occurrences counted in disjoint intervals are independent of each other)
Stationary increments (the probability distribution of the number of occurrences counted in any
time interval only depends on the length of the interval)
The probability distribution of the waiting time until the next occurrence is an exponential
distribution.
The occurrences are distributed uniformly on any interval of time. (Note that N(t), the total number of occurrences, has a Poisson distribution over (0, t], whereas the location of an individual occurrence on t ∈ (a, b] is uniform.)
There are two main forms of the Poisson process:
1. Homogeneous
2. Non-Homogeneous
Thus, the number of arrivals in the time interval (a, b], given as N(b) − N(a), follows a Poisson distribution with associated parameter λ_{a,b} = ∫_a^b λ(t) dt.
A rate function λ(t) in a non-homogeneous Poisson process can be either a deterministic function of time or an independent stochastic process, giving rise to a Cox process. A homogeneous Poisson process may be viewed as a special case when λ(t) = λ, a constant rate.
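A homogeneous Poisson process can be simulated directly from its exponential inter-arrival times; the rate λ = 2 and horizon T = 50 below are illustrative assumptions.

```python
import random

random.seed(1)
lam, T, runs = 2.0, 50.0, 2000

def poisson_arrivals(lam, T):
    """Arrival times of a rate-lam Poisson process on (0, T]."""
    times, t = [], 0.0
    while True:
        t += random.expovariate(lam)  # exponential inter-arrival time
        if t > T:
            return times
        times.append(t)

counts = [len(poisson_arrivals(lam, T)) for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / runs
# For a Poisson(lam * T) count, mean and variance both equal lam * T = 100.
print(round(mean, 1), round(var, 1))
```

Equality of mean and variance is the signature of the Poisson distribution of N(T).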
RENEWAL PROCESS :
Renewal theory is the branch of probability theory that generalizes Poisson processes for
arbitrary holding times. Applications include calculating the expected time for a monkey who is
randomly tapping at a keyboard to type the word Macbeth and comparing the long-term benefits of
different insurance policies.
A renewal process is a generalization of the Poisson process. In essence, the Poisson process is a continuous-time Markov process on the positive integers (usually starting at zero) which has independent identically distributed holding times at each integer (exponentially distributed) before advancing (with probability 1) to the next integer, i + 1. In the same informal spirit, we may define a renewal process to be the same thing, except that the holding times take on a more general distribution. (Note, however, that the independence and identical distribution (IID) property of the holding times is retained.)
Let S1, S2, S3, ... be a sequence of positive independent identically distributed random variables such that 0 < E[Si] < ∞. We refer to Si as the i-th holding time, and define the n-th jump time as Jn = S1 + S2 + ... + Sn. Then the random variable

X_t = Σ_{n ≥ 1} 1{Jn ≤ t}

(where 1 is the indicator function) represents the number of jumps that have occurred by time t, and is called a renewal process.
[Figure: sample evolution of a renewal process with holding times Si and jump times Jn.]
where F_S is the cumulative distribution function of S1 and f_S is the corresponding probability density function. Conditioning on the first holding time then gives the renewal equation for the renewal function m(t) = E[X_t]:

m(t) = F_S(t) + ∫_0^t m(t − s) f_S(s) ds,

as required.
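A renewal process with non-exponential holding times can be simulated the same way as a Poisson process; the elementary renewal theorem predicts E[X_t]/t → 1/E[S]. The Uniform(0, 2) holding-time distribution and parameters below are illustrative assumptions.

```python
import random

random.seed(2)

def renewal_count(t, holding_time):
    """Number of jumps X_t by time t; J_n = S_1 + ... + S_n are the jump times."""
    jumps, elapsed = 0, 0.0
    while True:
        elapsed += holding_time()
        if elapsed > t:
            return jumps
        jumps += 1

# Uniform(0, 2) holding times: E[S] = 1, so X_t / t should approach 1.
t, runs = 1000.0, 200
rate = sum(renewal_count(t, lambda: random.uniform(0, 2)) for _ in range(runs)) / (runs * t)
print(round(rate, 3))  # close to 1 / E[S] = 1.0
```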
RANDOM INCIDENCE :
The Poisson process is one of many stochastic processes that one encounters in urban service
systems. The Poisson process is one example of a "point process" in which discrete events
(arrivals) occur at particular points in time. For a general point process having its zeroth arrival at time T0 and the remaining arrivals at times T1, T2, T3, . . ., the interarrival times are

Yk = Tk − Tk−1,  k = 1, 2, 3, . . . .

Such a stochastic process is fully characterized by the family of joint pdf's

f_{Yn1, Yn2, ..., Ynp}(yn1, yn2, ..., ynp)

for all integer values of p and all possible combinations of distinct n1, n2, . . ., where each ni is a positive integer denoting a particular interarrival time. Maintaining the depiction of a stochastic process at such a general level, although fine in theory, yields an intractable model and one for which the data (to estimate all the joint pdf's) are virtually impossible to obtain. So, in the study of stochastic processes, one is motivated to make assumptions about this family of pdf's that simplify the model.
PREPARED BY ARUN PRATAP SINGH
Thus, for Yk, if we selected any one of the family of joint pdf's f_{Yn1, Yn2, ..., Ynp}(yn1, yn2, ..., yk, ..., ynp) and "integrated out" all variables except yk, we would obtain fY(·). Note that we have said nothing about independence of the Yk's.
They need not be mutually independent, pairwise independent, or conditionally independent in
any way. For the special case in which the Yk's are mutually independent, the point process is
called a renewal process. The Poisson process is a special case of a renewal process, being
the only continuous-time renewal process having "no memory." However, the kind of process
we are considering can exhibit both memory and dependence among the inter-event times. In
fact, the dependence could be so strong that once we know the value of one of the Yk's we
might know a great deal (perhaps even the exact values) of any number of the remaining Yk's.
Example :
Consider a potential bus passenger arriving at a bus stop. The kth bus arrives Yk time units after
the (k - 1)st bus. Here the Yk's are called bus headways. The probabilistic behavior of the Yk's will
determine the probability law for the waiting time of the potential passenger (until the next bus
arrives). Here it is reasonable to assume that the Yk's are identically distributed but not
independent (due to interactions between successive buses). One could estimate the pdf fY(.)
simply by gathering data describing bus interarrival times and displaying the data in the form of a
histogram. (This same model applies to subways and even elevators in a multielevator building.)
Suppose that buses maintain perfect headway; that is, they are always T0 minutes apart. Then the time until the next bus arrives, given random incidence, is uniformly distributed between 0 and T0, with a mean E[V] = T0/2, as we might expect intuitively.
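The perfect-headway case can be checked by simulation: a passenger who arrives at a uniformly random instant waits a Uniform(0, T0) time until the next bus. The headway T0 = 10 below is an illustrative assumption.

```python
import random

random.seed(3)
T0, runs = 10.0, 100_000

# Buses arrive at times T0, 2*T0, 3*T0, ...; a passenger arrives at a
# uniformly random instant and waits for the next bus.
waits = []
for _ in range(runs):
    arrival = random.uniform(0, 100 * T0)
    wait = T0 - (arrival % T0)  # time until the next multiple of T0
    waits.append(wait)

print(round(sum(waits) / runs, 2))  # close to E[V] = T0 / 2 = 5.0
print(min(waits) >= 0, max(waits) <= T0)  # support is (0, T0]
```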
BIRTH-DEATH PROCESS :
The birth-death process is a special case of a continuous-time Markov process where the state transitions are of only two types: "births", which increase the state variable by one, and "deaths", which decrease it by one. Examples include the size of a population of bacteria, the number of people with a disease within a population, or the number of customers in line at the supermarket.
When a birth occurs, the process goes from state n to state n + 1. When a death occurs, the process goes from state n to state n − 1. The process is specified by birth rates {λi} and death rates {μi}.
Example :
A pure birth process is a birth-death process where μi = 0 for all i.
A pure death process is a birth-death process where λi = 0 for all i.
The M/M/1 model and M/M/c model, both used in queueing theory, are birth-death processes used to describe customers in an infinite queue. In queueing theory the birth-death process is the most fundamental example of a queueing model, the M/M/C/K/∞/FIFO (in complete Kendall's notation) queue. This is a queue with Poisson arrivals, drawn from an infinite population, and C servers with exponentially distributed service time, with K places in the queue. Despite the assumption of an infinite population this model is a good model for various telecommunication systems.
M/M/1 queue
The M/M/1 is a single-server queue with an infinite buffer size. In a non-random environment the birth-death processes in queueing models tend to be long-term averages, so the average rate of arrival is given as λ and the average service time as 1/μ. The birth-death process is an M/M/1 queue when

λk = λ and μk = μ for all k.

The differential equations for the probability that the system is in state k at time t are

p0′(t) = μ p1(t) − λ p0(t),
pk′(t) = λ pk−1(t) + μ pk+1(t) − (λ + μ) pk(t)  for k ≥ 1.
M/M/c queue
The M/M/c is a multi-server queue with C servers and an infinite buffer. This differs from the M/M/1 queue only in the service time, which now becomes

μk = kμ for k ≤ C, and
μk = Cμ for k > C,

with λk = λ for all k.
M/M/1/K queue
The M/M/1/K queue is a single-server queue with a buffer of size K. This queue has applications in telecommunications, as well as in biology when a population has a capacity limit. In telecommunication we again use the parameters from the M/M/1 queue with

λk = λ for k < K, λk = 0 for k ≥ K, and μk = μ.

In biology, particularly the growth of bacteria, when the population is zero there is no ability to grow, so

λ0 = 0.

Additionally, if the capacity represents a limit beyond which the population dies from overpopulation, the death rate at capacity is made correspondingly large. The differential equations for the probability that the system is in state k at time t are the general birth-death forward equations

pk′(t) = λk−1 pk−1(t) + μk+1 pk+1(t) − (λk + μk) pk(t),

with the boundary conventions λ−1 = μ0 = 0.
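For the stable M/M/1 queue (ρ = λ/μ < 1), the detailed balance equations λ p_{k−1} = μ p_k can be solved directly, giving the geometric stationary distribution p_k = (1 − ρ) ρ^k. The rates λ = 2, μ = 5 below are illustrative assumptions.

```python
# Stationary distribution of an M/M/1 queue from the detailed balance
# equations lam * p[k-1] = mu * p[k], truncated at a large state N.
lam, mu, N = 2.0, 5.0, 200
rho = lam / mu  # utilization, must be < 1 for stability

p = [1.0]
for k in range(1, N):
    p.append(p[-1] * lam / mu)
total = sum(p)
p = [x / total for x in p]

# Closed form: p_k = (1 - rho) * rho**k
print(round(p[0], 4), round(1 - rho, 4))
print(round(p[3], 6), round((1 - rho) * rho ** 3, 6))

# Mean number in system: L = rho / (1 - rho)
L = sum(k * p[k] for k in range(N))
print(round(L, 4))  # ≈ 0.4 / 0.6 ≈ 0.6667
```

Truncating at N = 200 is harmless here since ρ^200 is negligible for ρ = 0.4.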
MARKOV PROPERTY :
In probability theory and statistics, the term Markov property refers to the memoryless property of
a stochastic process. It is named after the Russian mathematician Andrey Markov.[1]
A stochastic process has the Markov property if the conditional probability distribution of future states
of the process (conditional on both past and present values) depends only upon the present state, not
on the sequence of events that preceded it. A process with this property is called a Markov process.
The term strong Markov property is similar to the Markov property, except that the meaning of
25
26
"present" is defined in terms of a random variable known as a stopping time. Both the terms "Markov
property" and "strong Markov property" have been used in connection with a particular "memoryless"
property of the exponential distribution.[2]
The term Markov assumption is used to describe a model where the Markov property is assumed to
hold, such as a hidden Markov model.
A Markov random field[3] extends this property to two or more dimensions or to random variables
defined for an interconnected network of items. An example of a model for such a field is the Ising
model.
A discrete-time stochastic process satisfying the Markov property is known as a Markov chain.
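A minimal discrete-time Markov chain illustrates the property: the distribution of the next state depends only on the current state, and n-step behaviour comes from powers of the transition matrix. The two-state matrix below is an illustrative assumption.

```python
# Two-state Markov chain: row i of P holds the distribution of the
# next state given that the current state is i.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# n-step transition probabilities are given by the matrix power P^n.
Pn = [[1.0, 0.0], [0.0, 1.0]]  # identity = P^0
for _ in range(50):
    Pn = matmul(Pn, P)

# Both rows of P^n converge to the stationary distribution pi (pi P = pi);
# solving the balance equation for this matrix gives pi = (5/6, 1/6).
print([round(x, 4) for x in Pn[0]])
print([round(x, 4) for x in Pn[1]])
```

That both rows agree after many steps reflects the chain forgetting its starting state, the same memorylessness the Markov property expresses.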
CONTINUOUS-TIME MARKOV CHAIN :
A continuous-time Markov chain on a countable state space is characterized by transition rates qij. In the infinitesimal definition, for small h,

Pr(X(t + h) = j | X(t) = i) = δij + qij h + o(h),

using little-o notation. The qij can be seen as measuring how quickly the transition from i to j happens.
Jump chain/holding time definition
Define a discrete-time Markov chain Yn to describe the nth jump of the process and variables S1, S2, S3, ... to describe holding times in each of the states, where Si follows the exponential distribution with rate parameter −q_{Yi Yi}.
Transition probability definition
For any value n = 0, 1, 2, 3, ... and times indexed up to this value of n: t0, t1, t2, ... and all states recorded at these times i0, i1, i2, i3, ... it holds that

Pr(X(tn+1) = in+1 | X(t0) = i0, X(t1) = i1, ..., X(tn) = in) = p_{in in+1}(tn+1 − tn),

where pij is the solution of the forward equation (a first-order differential equation)

P′(t) = P(t) Q, with initial condition P(0) = I.
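For a two-state chain with generator Q = [[−a, a], [b, −b]], the forward equation P′(t) = P(t) Q has the closed form p11(t) = b/(a+b) + a/(a+b) · e^{−(a+b)t}; a simple Euler integration reproduces it. The rates a = 1, b = 2 are illustrative assumptions.

```python
import math

# Two-state CTMC: generator Q with off-diagonal rates a (1 -> 2) and b (2 -> 1).
a, b = 1.0, 2.0
Q = [[-a, a],
     [b, -b]]

# Euler-integrate the forward equation P'(t) = P(t) Q from P(0) = I.
P = [[1.0, 0.0], [0.0, 1.0]]
dt, t_end = 1e-4, 2.0
for _ in range(int(t_end / dt)):
    P = [[P[i][j] + dt * sum(P[i][k] * Q[k][j] for k in range(2))
          for j in range(2)] for i in range(2)]

# Closed-form solution for p11(t):
exact = b / (a + b) + a / (a + b) * math.exp(-(a + b) * t_end)
print(round(P[0][0], 5), round(exact, 5))  # the two should agree closely
```

Each row of P(t) stays a probability distribution because the rows of Q sum to zero.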
HIDDEN MARKOV MODEL :
Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition,[7] part-of-speech tagging, musical score following,[8] partial discharges[9] and bioinformatics.
A hidden Markov model can be considered a generalization of a mixture model where the hidden
variables (or latent variables), which control the mixture component to be selected for each
observation, are related through a Markov process rather than independent of each other. Recently, hidden Markov models have been generalized to pairwise Markov models and triplet Markov models, which allow one to consider more complex data structures[10][11] and to model nonstationary data.
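As a sketch of how HMMs are used in practice, the forward algorithm computes the likelihood of an observation sequence by summing over all hidden state paths. All parameter values below (states, transition and emission matrices) are illustrative assumptions.

```python
# Forward algorithm for a toy 2-state HMM (all parameters are illustrative).
# Hidden states: 0 = Rainy, 1 = Sunny; observations: 0 = Walk, 1 = Shop.
start = [0.6, 0.4]                 # initial state distribution
trans = [[0.7, 0.3], [0.4, 0.6]]   # trans[i][j] = P(next = j | current = i)
emit  = [[0.1, 0.9], [0.6, 0.4]]   # emit[i][o]  = P(observe o | state = i)

def forward(obs):
    """Return P(obs) by dynamic programming over hidden states."""
    # alpha[j] = P(obs so far, current hidden state = j)
    alpha = [start[i] * emit[i][obs[0]] for i in range(2)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(2)) * emit[j][o]
                 for j in range(2)]
    return sum(alpha)

print(forward([0]))     # P(Walk) = 0.6*0.1 + 0.4*0.6 ≈ 0.3
print(forward([0, 1]))  # likelihood of observing (Walk, Shop)
```

The dynamic programming makes the cost linear in the sequence length, instead of exponential in the number of hidden paths.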