
Contents
Chapter 4: Basics of Statistical Modeling
4.1 Introduction
4.2 Statistical Experiments
4.3 Random Variables
    4.3.1 Basic Concepts
    4.3.2 Properties of the CDF and the PDF of a Random Variable
    4.3.3 The Conditional Distribution
    4.3.4 Statistical Expectation
    4.3.5 Functions of a Random Variable
4.4 Two Random Variables
    4.4.1 Statistical Expectation in Two Dimensions
    4.4.2 Functions of Two Random Variables
    4.4.3 Two Functions of Two Random Variables
4.5 Simulation of Random Variables
4.6 Summary
4.7 Computer Laboratory
4.8 Exercises
4.9 References

Fig. 4.1: Illustration of the random variable as a mapping function from the sample space of the sigma algebra into Borel sets on the real line.
Fig. 4.2: Illustration of the domain and range of a discrete-valued random variable.
Fig. 4.3: The CDF of a mixed random variable.
Fig. 4.4: Illustration of the measurability of functions of random variables.
Fig. 4.5: Illustration of the concept of two-dimensional random vectors for two random variables on the same sample space.
Fig. 4.6: Illustration of the region of integration for the marginal density.



Chapter 4
Basics of Statistical Modeling
4.1 Introduction
Statistical experiments are conducted to infer information about various processes, in order to extract outcomes that guide decision making. The effectiveness of a certain drug on a particular disease, surgical procedures, therapy techniques, and demographic effects on diseases are a few examples of statistical inference based on a statistical experiment. A statistical experiment may be described in terms of a population, a phenomenon to be investigated, and a scaling procedure to quantify the spread of the phenomenon in a population. Traditionally, a statistical experiment $E$ is described in terms of a triple: a sample space $\Omega$ enlisting the elementary outcomes, or alphabet, of the experiment; a collection $\mathcal{F}$ of possible outcomes (events) enlisting the dictionary of possible events; and a scale (measure) $P$ that weighs the occurrences of events in the dictionary. That is, $E = \{\Omega, \mathcal{F}, P\}$ [Papoulis]. The form and level of complexity of each of these components depend on the specifics of the experiment. The power of statistical theory is in its ability to describe experiments of varying levels of complexity. This chapter will discuss the basic elements that describe statistical experiments. In particular, we shall focus on the concepts of random variables and random processes. In the following chapter we discuss their generalizations to random fields.
Navigating through a literature as wide and extensive as statistics is not an easy endeavor; indeed, it is quite a risky task given the space limitations as well as the purpose of the book. Therefore, we hasten to say upfront that there is no unique approach for selecting what subjects to cover and what is an appropriate level of rigor. However, our experience dictates that understanding the foundation of statistical theory is an essential ingredient for many approaches to image analysis. This chapter is a tour through some of the common terminologies and basics of statistical modeling that are essential to appreciate the material covered in the book. The statistical literature is accessible, and readers can learn more beyond what is covered here.
4.2 Statistical Experiments
We follow the tradition of describing a statistical experiment as $E = \{\Omega, \mathcal{F}, P\}$.
- Sample space $\Omega$:
The sample space enumerates the elementary outcomes of an experiment and, by construction, various combinations of them as well. For example, if $\omega_1, \omega_2, \ldots, \omega_N$, for some $N$, are the elementary outcomes of $E$ (i.e., its single-valued possibilities, or the entities of its alphabet), it is a natural requirement that combinations of these outcomes, which may also happen in $E$, be accounted for as well. However, events may be constructed arbitrarily and some may never happen! Hence, it is the tradition to list only the elementary outcomes (elements of the alphabet) in $\Omega$, while the events (all possible combinations of the elementary outcomes, or the dictionary of events of the experiment) are given a special entity, denoted a field or algebra $\mathcal{F}$.
- Field (algebra) $\mathcal{F}$:
As the events need to be scaled or measured, an enumeration procedure is needed to generate the dictionary of events, including those which may never happen; these will be given zero weight or measure. Therefore, a desirable characteristic of $\mathcal{F}$ is that it be closed under all countable (possibly infinite in number) combinations of events, such as unions and complements. Specifically, we define $\mathcal{F}$ for events $A_1, A_2, \ldots$ such that:
a) The empty set $\emptyset$ (or null event) and the sample space $\Omega$ (sure event) must be in $\mathcal{F}$.
b) $A_i \in \mathcal{F}$ implies that the complement $A_i^c \in \mathcal{F}$, for all $i$.
c) If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
The above requirements are all natural for the field $\mathcal{F}$ to be an inclusive space that spans (includes) all measurable events.
- Probability measure $P$:
The probability measure $P$ is a function that assigns weights to the events in $\mathcal{F}$. It is the tradition to have a positive scale normalized to the unit interval. It is natural that events may be related (dependent) or unrelated (independent of each other). Events may also be joint (sharing some elementary outcomes) or disjoint (sharing no elementary outcomes) in their construction. Therefore, a measure $P$ satisfies the following requirements (axioms) [Ash]:
a) $0 \le P(A) \le 1$ for all events $A \in \mathcal{F}$.
b) The empty set (or null event) has zero weight and the sample space (sure event) has a weight of 1; that is, $P(\emptyset) = 0$ and $P(\Omega) = 1$.
c) If $A_1, A_2, \ldots \in \mathcal{F}$ are mutually disjoint, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$.
As a consequence of axioms b and c, if $A^c$ is the complement event of $A$, then $P(A^c) = 1 - P(A)$.
Example 1: Consider a histology experiment conducted on cells from a tissue specimen taken during a procedure to determine the existence of cancer. In this experiment, the tissue is treated and viewed under a microscope, and the viewed cells are classified as cancerous (C) or noncancerous (NC). As biopsies are performed only after reasonable doubts justify doing them, the weights of the C and NC events differ with the procedure. A possible statistical experiment may be described as follows: $\Omega = \{C, NC\}$, $\mathcal{F} = \{\emptyset, \{C\}, \{NC\}, \Omega\}$, and $P(C) = p$, $P(NC) = 1 - p$ for some $0 \le p \le 1$.
Example 2: Consider the same experiment as in Example 1, in which viewing the cells under the microscope revealed a conclusion contradicting the agent used to prepare the tissue. The histologist's chart showed unspecified cells (US) under these circumstances. That is, the sample space became $\Omega = \{C, NC, US\}$. As these outcomes are mutually exclusive, the sigma algebra is the collection of all subsets of $\Omega$ (eight events in total), and the probability measure will be such that $P(C) + P(NC) + P(US) = 1$.
Example 3: Consider the experiment of recording the life span of cells in a specimen under the effect of an agent. Suppose that the viewing domain of the microscope allows individual cell viewing and tracking, and that it is possible to tag each cell as it moves within the specimen until its death. A counting experiment is expected to be tedious, especially with large degrees of motion of the cells; yet a histogram may be constructed indicating the number of cells that die over time.
Events containing common elementary outcomes are called joint, while those with no common elementary outcomes are called disjoint. Probability measures on disjoint events are additive. Events with no bearing on each other are called independent. Probability measures on independent events are multiplicative. As cause and effect is a natural rule of physics, it is also logical to ask about the likelihood of related events with respect to each other. The conditional probability measure addresses this issue, providing the probability of an event, given that a second event took place, as their joint probability normalized by the probability of the second event. We formally define these concepts as follows.
Definition 1: Two events $A$ and $B$ on the same sample space $\Omega$ are said to be disjoint if their intersection is empty; i.e., $A \cap B = \emptyset$.
Definition 2: Two events $A$ and $B$, not necessarily on the same sample space $\Omega$, are said to be independent if their joint probability measure is multiplicative; i.e., $P(A \cap B) = P(A)\,P(B)$.
Definition 3: The conditional probability measure of event $A$ given the occurrence of event $B$, where $A$ and $B$ are not necessarily on the same sample space $\Omega$, is denoted by $P(A|B)$ and is defined by:
$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$, provided that $P(B) \neq 0$.
Rearranging the expression in Definition 3 between $P(A|B)$ and $P(B|A)$ yields what is usually called Bayes' theorem: $P(A|B) = P(B|A)\,P(A)/P(B)$. As a direct corollary to the above definitions, independent events do not affect the value of the conditional probability measure; i.e., $P(A|B) = P(A)$. Likewise, disjoint events have a zero-valued joint probability measure and, subsequently, a zero-valued conditional probability measure.
Example 4: Consider events $A$ and $B$ such that $B$ is a subset of $A$; symbolically, $B \subseteq A$. It follows that the conditional probability $P(A|B) = 1$ and that the joint probability measure $P(A \cap B) = P(B)$.
A stimulus may affect a set of events even though they may be disjoint. In this case, conditional probabilities enable measuring the overall effect of that stimulus. For example, suppose that an experiment produces three disjoint events $A_1, A_2, A_3$ such that $A_1 \cup A_2 \cup A_3 = \Omega$. We can measure the effect of a stimulus (i.e., another event) $B$ on the set of events by noting that $B = (B \cap A_1) \cup (B \cap A_2) \cup (B \cap A_3)$, and since these intersections are disjoint as well, the probability measure is additive with respect to the components $B \cap A_i$; i.e., $P(B) = P(B \cap A_1) + P(B \cap A_2) + P(B \cap A_3)$, which can be written in terms of conditional probabilities as:
$P(B) = P(B|A_1)\,P(A_1) + P(B|A_2)\,P(A_2) + P(B|A_3)\,P(A_3).$
This can be generalized to a larger number of events in what is known as the total probability theorem, which is stated below.
Theorem 1: Consider a set of events $A_1, A_2, \ldots, A_n \in \mathcal{F}$ with $\bigcup_{i=1}^{n} A_i = \Omega$. Suppose the events are mutually disjoint; i.e., they partition the sample space. Consider an event $B \in \mathcal{F}$. We can show that
$P(B) = \sum_{i=1}^{n} P(B|A_i)\,P(A_i).$
The proof of the total probability theorem follows from the fact that the probability measure of disjoint events is additive [Ash].
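To make the theorem concrete, the short Python sketch below evaluates the total probability and the resulting posteriors for a hypothetical three-event partition; the prior and conditional values are illustrative assumptions, not taken from the chapter.

    priors = {"A1": 0.5, "A2": 0.3, "A3": 0.2}     # P(A_i); a made-up partition, sums to 1
    cond_B = {"A1": 0.10, "A2": 0.40, "A3": 0.70}  # P(B | A_i); illustrative values

    # Total probability theorem: P(B) = sum_i P(B|A_i) P(A_i)
    p_B = sum(cond_B[a] * priors[a] for a in priors)

    # Bayes' theorem: P(A_i | B) = P(B|A_i) P(A_i) / P(B)
    posteriors = {a: cond_B[a] * priors[a] / p_B for a in priors}

    print(p_B)         # 0.31
    print(posteriors)  # the posteriors sum to 1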
In general, enlisting the events in $\Omega$ and $\mathcal{F}$ will not be a simple task, if not an impossible one, for real-world problems. Likewise, obtaining the corresponding functional form of the probability measure will be a rough task to accomplish. A logical task that statisticians have faced has been to search for automatic procedures to generate events and their measures for particular experiments. Indeed, a general approach has evolved to accomplish this task through the concept of random functions, commonly referred to as random variables. This type of function maps events on the sample space and the sigma algebra, which may be quite abstract, into another representation on the Cartesian domain, on which assigning a scale (measure or weight) is possible using the common tools of calculus. We examine this concept next as a prelude to describing multivariable experiments and complicated stochastic phenomena.
4.3 Random Variables
4.3.1 Basic Concepts
A random variable $X$ is a mapping from the sample space into the real line such that the set $\{X \le x\}$ is an event for every $x \in \mathbb{R}$. As illustrated in Fig. 4.1, the random variable maps events in $\mathcal{F}$ into other forms of events (sets) defined on the real line; these sets are called Borel sets. For a compact representation, we insist that the inverse mappings of the Borel sets are events on $\mathcal{F}$; i.e., the set $X^{-1}(B) \in \mathcal{F}$ for all Borel sets $B$ [Pfeiffer].

To avoid potential confusion about the difference between the sample space and the sigma algebra vis-à-vis the random variable, we use, as stated before, the alphabet of the experiment to represent $\Omega$ and its dictionary to represent $\mathcal{F}$; hence, $X$ is technically a mapping from $\Omega$ into $\mathbb{R}$. Of course, it is understood that events in $\mathcal{F}$ are formed of elementary events, which must subsequently be contained in $\Omega$. Like any function, $X$ has a domain, which will be $\Omega$, and a range, which is the real line $\mathbb{R}$. Hence, a shorthand for this relationship would be: $X : \Omega \to \mathbb{R}$. As domains and ranges may be continuous or discrete, we will be able to consider cases where the mapping is between continuous entities, in which case $X$ will be called a continuous random variable, and between discrete entities, in which case $X$ will be called a discrete random variable. As the mapping is between events, it is natural that the weights (measures) of the two ends of the mapping be the same. That is, in the illustration shown in Fig. 4.1, we must have $P(A) = P_X(B)$ for the Borel set $B$ that is the image of the event $A$.

Fig. 4.1: Illustration of the random variable as a mapping function from the sample space of the sigma algebra into Borel sets on the real line.
We shall provide a few illustrative examples to demonstrate the power of the random variable concept in describing a statistical experiment.
Example 5: Consider the experiment in Example 1 above. Suppose we have a function $X$ which takes an arbitrary value of 1 for C and $-1$ for NC. It is evident that we can establish the events $\{X \le x\}$. In particular, we have the following possibilities:
$\{X \le x\} = \emptyset$ for $x < -1$;
$\{X \le x\} = \{NC\}$ for $-1 \le x < 1$;
$\{X \le x\} = \{NC, C\} = \Omega$ for $x \ge 1$.
The above example illustrates a discrete-valued random variable, which has a finite range and domain. Fig. 4.2 illustrates this mapping.

Fig. 4. 2: Illustration of the domain and range of a discrete-valued random variable.

Example 6: Suppose a statistical experiment has a sample space $\Omega = \{\omega_1, \omega_2, \ldots, \omega_8\}$ consisting of eight elementary outcomes. Consider, for instance, the function $X(\omega_i) = i$, $i = 1, \ldots, 8$. We may illustrate that $X$ is a discrete-valued random variable on $\Omega$ by examining whether $\{X \le x\}$ is an event. It is evident that:
$\{X \le x\} = \emptyset$ for $x < 1$;
$\{X \le x\} = \{\omega_1, \ldots, \omega_{\lfloor x \rfloor}\}$ for $1 \le x < 8$;
$\{X \le x\} = \Omega$ for $x \ge 8$.
Each of these sets is a combination of elementary outcomes, and hence an event in $\mathcal{F}$. Hence, $X$ is a random variable.
The above illustrates the value of the random variable concept as an event generator. Equally important is to see whether the random variable will also enable measurability of the generated events. In other words, can we evaluate $P(\{X \le x\})$? The ability to evaluate this quantity achieves the measurability requirement for the events in $\mathcal{F}$, and thus completely describes the statistical experiment. Indeed, the quantity $P(\{X \le x\})$ has a special name in mathematical statistics: the cumulative probability distribution function (CDF). From its definition, the event $\{X \le x\}$ shrinks as $x \to -\infty$ and expands as $x \to +\infty$; therefore, the CDF is non-decreasing. It is also continuous from the right; that is:
$F_X(x) = \lim_{\epsilon \to 0^+} F_X(x + \epsilon).$
As the CDF is a fundamental concept in the theory of statistics, we shall examine it in further detail. For ease of notation, we denote the CDF by the special symbol $F_X(x)$. To make the notation consistent, we define the CDF formally as follows:
Definition 4: Given a random variable $X$ defined for a statistical experiment $E = \{\Omega, \mathcal{F}, P\}$, the cumulative probability distribution function (CDF) of $X$ is the function on the real line which measures the event $\{X \le x\}$; we denote this function as
$F_X(x) = P(\{X \le x\})$, for every $x \in \mathbb{R}$.
The CDF, like any function, may be studied with the theory of calculus. Therefore, the description of statistical experiments, and indeed statistical theory itself, is transformed from a seemingly guessing game into a well-founded mathematical theory through the concept of the random variable. The monotonicity of $F_X(x)$ is evident from the definition.

We can easily infer the following properties of the CDF.
Fact: The CDF $F_X(x)$ has the following properties:
(i) $F_X(-\infty) = 0$ and $F_X(+\infty) = 1$;
(ii) $F_X(x)$ is non-decreasing, continuous from the right, and differentiable almost everywhere over $\mathbb{R}$.
Definition 5: The probability density function (PDF) $f_X(x)$ of a random variable $X$ is the derivative of the CDF; i.e.,
$f_X(x) = \dfrac{dF_X(x)}{dx}.$
We shall see later on that both $F_X(x)$ and $f_X(x)$ enable various studies related to the experiment under consideration.
The differentiability of the CDF is a functional property of the distribution, and does not imply that the random variable itself (the function that maps events to Borel sets on the real line) is continuous or discrete. The distinction between continuous and discrete random variables is in the form of the Borel sets generated by the mapping. A discrete random variable takes only countably many values; i.e., the Borel sets are discrete, while a continuous random variable has a continuous range; i.e., the Borel sets are intervals. As a consequence of these characteristics, we may define the probability of a point event for discrete random variables; i.e., $p_X(x_i) = P(X = x_i) > 0$. Point events for continuous random variables have zero measure; i.e., $P(X = x) = 0$. The implications of these characteristics for the differentiability of $F_X(x)$ lead to the following distinction between discrete and continuous random variables:

$F_X(x) = \sum_{x_i \le x} P(X = x_i)$ (discrete); $\quad F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$ (continuous). (1)

For the integer-valued case, the set $\{x_i\}$ will be integer valued and the upper limit of the summation in Eq. (1) is easy to determine. If we use the generalized function representation, we may express the PDF of a discrete random variable in terms of the Dirac delta function; i.e.,
$f_X(x) = \sum_{i} P(X = x_i)\,\delta(x - x_i);$
hence, a unified expression for the CDF may be written for both continuous and discrete random variables in terms of an integral only; i.e.,
$F_X(x) = \int_{-\infty}^{x} f_X(u)\,du.$
In practice, however, a clear distinction will be implicit in the definition of the experiment and the values that events may take. Indeed, in principle, we may define a random variable to describe events that have both continuous and discrete domains. We provide below some illustrative examples of the CDF and PDF for various random variables.
Example 7: Consider a random variable with the CDF shown in Fig. 4.3, which is continuous everywhere except at a point $x_0$, where it has a jump (discontinuity) of size $P(X = x_0)$. We note that the CDF in this example is not differentiable at $x_0$. We can address the behavior of the CDF at the jump by noting that:
$F_X(x_0) - F_X(x_0^-) = P(X = x_0),$
so the density may be written with a Dirac impulse of weight $P(X = x_0)$ at $x_0$ plus an ordinary density elsewhere.

Fig. 4.3: The CDF of a mixed random variable.

Hence, functional analysis enables us to address the various scenarios that the CDF may exhibit.
4.3.2 Properties of the CDF and the PDF of a Random Variable
As the CDF is a probability measure for the event $\{X \le x\}$, the positivity and upper bound of the CDF are evident. Likewise, as the name implies, the CDF is non-decreasing (monotone) and continuous from the right. We can also infer the properties of the CDF for disjoint events as well as independent events using the basic definitions. We also note that the following identity follows straight from the definition of the CDF:
i) $P(a < X \le b) = F_X(b) - F_X(a)$.
The PDF, being the derivative of the CDF, possesses the positivity and summability (integrability) constraints: $f_X(x) \ge 0$ and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$. Writing the PDF for a discrete random variable as $f_X(x) = \sum_i p_X(x_i)\,\delta(x - x_i)$, the integral constraint becomes $\sum_i p_X(x_i) = 1$.
The most important virtue of the CDF is its ability to provide a computational mechanism (or algorithm) to evaluate the probability measure, as the probability mass of the Borel sets corresponding to the events on the sigma algebra, through summation (discrete random variables) or integration (continuous random variables). Some examples will illustrate these concepts.
Example 8: Consider a statistical experiment defined within the framework illustrated in Fig. 4.1. Suppose the mass function on the Borel sets has an exponential form:
$f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $f_X(x) = 0$ for $x < 0$.
We can evaluate the probability measure of an event in $\mathcal{F}$ by the mass of its equivalent Borel set. In particular, we can evaluate the probability of events such as $\{a < X \le b\}$ as:
$P(a < X \le b) = \int_{a}^{b} \lambda e^{-\lambda x}\,dx = e^{-\lambda a} - e^{-\lambda b}, \quad 0 \le a < b.$
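As a numerical check of Example 8, the Python sketch below evaluates $P(a < X \le b)$ three ways; the rate $\lambda = 1$ and the interval endpoints are arbitrary choices.

    import numpy as np
    from scipy import integrate, stats

    lam, a, b = 1.0, 0.5, 2.0                        # illustrative rate and interval

    closed = np.exp(-lam * a) - np.exp(-lam * b)     # F(b) - F(a) in closed form
    numeric, _ = integrate.quad(lambda x: lam * np.exp(-lam * x), a, b)
    via_cdf = stats.expon(scale=1 / lam).cdf(b) - stats.expon(scale=1 / lam).cdf(a)

    print(closed, numeric, via_cdf)                  # all three agree (≈ 0.4712)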




Over the long history of statistics, various functional forms for $f_X(x)$ have resulted from specific practical experiments. Some of the most common PDFs are listed in Table 1.

Table 1: Common PDF functional forms.
Name | Continuous PDF | Discrete PDF
Uniform | $f_X(x) = \frac{1}{b - a}$, $a \le x \le b$ | $p_X(k) = \frac{1}{N}$, $k = 1, \ldots, N$
Exponential | $f_X(x) = \lambda e^{-\lambda x}$, $x \ge 0$ | $p_X(k) = (1 - q)\,q^{k}$, $k = 0, 1, \ldots$ (geometric; the discrete-valued analogue)
Gaussian | $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$ | N/A
Cauchy | $f_X(x) = \frac{\alpha / \pi}{x^2 + \alpha^2}$ | N/A
Laplace | $f_X(x) = \frac{\lambda}{2}\, e^{-\lambda |x|}$ | N/A
Rayleigh | $f_X(x) = \frac{x}{\sigma^2}\, e^{-x^2 / 2\sigma^2}$, $x \ge 0$ | N/A
Maxwell | $f_X(x) = \sqrt{2/\pi}\, \frac{x^2}{\alpha^3}\, e^{-x^2 / 2\alpha^2}$, $x \ge 0$ | N/A
Gamma | $f_X(x) = \frac{\lambda^c x^{c-1} e^{-\lambda x}}{\Gamma(c)}$, $x \ge 0$ | N/A
Beta | $f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1}$, $0 \le x \le 1$ | N/A
Binomial | N/A | $p_X(k) = \binom{n}{k} p^k (1 - p)^{n - k}$, $k = 0, \ldots, n$; $n$: independent trials
Poisson | N/A | $p_X(k) = e^{-\lambda}\, \frac{\lambda^k}{k!}$, $k = 0, 1, 2, \ldots$
The parameters in the PDF forms in Table 1 represent the modes (location) as well as resolution (scale) measures of the functions, and will be addressed later on. It is important to remember the context of the definitions of the CDF and the PDF, and the brilliance of the statisticians, physicists, and mathematicians who, over the years, developed these tools that enable the proper description and characterization of statistical experiments. The random variable concept has been the catalyst for turning the statistical experiment from a guessing game into a functional representation governed by the rules of calculus.
4.3.3 The Conditional Distribution
The CDF and the PDF may be defined for conditional events. For example, we may study the behavior of an event $A$ defined such that $A = \{X \le x\}$ with respect to another event $B$, which may or may not be defined on the same sample space $\Omega$. We can consider the conditional CDF as the measure of the event $\{X \le x\} \cap B$, in the same way we defined the conditional probability before:
$F_X(x|B) = P(X \le x \mid B) = \dfrac{P(\{X \le x\} \cap B)}{P(B)}, \quad P(B) \neq 0.$
We point out that $\{X \le x\} \cap B$ means that both events occur, and that the event $B$ may or may not be a subset of the same sample space $\Omega$. We maintain that the conditional CDF possesses all the properties of the CDF discussed before. In particular, it is straightforward to justify the following properties: $F_X(-\infty|B) = 0$, $F_X(+\infty|B) = 1$, and $P(a < X \le b \mid B) = F_X(b|B) - F_X(a|B)$ for $a < b$.
Indeed, the total probability theorem that we studied before may be expressed as follows: consider a partition $A_1, A_2, \ldots, A_n$ of a sample space $\Omega$ with known a priori measures $P(A_1), \ldots, P(A_n)$. Any event may be defined in terms of the partition, such that $\{X \le x\} = \bigcup_{i=1}^{n} \left(\{X \le x\} \cap A_i\right)$, and the CDF of the random variable $X$ will be defined as follows:
$F_X(x) = \sum_{i=1}^{n} F_X(x|A_i)\, P(A_i).$
In the same fashion that we studied the probability density function (PDF) before, we can define the conditional PDF as well; that is, for continuous random variables we have:
$f_X(x|B) = \dfrac{dF_X(x|B)}{dx}.$
Example 9: Consider the two events $A = \{X \le x\}$ and $B = \{X \le a\}$, for a fixed $a$. We can show that:
$F_X(x|B) = \dfrac{F_X(x)}{F_X(a)}$ for $x < a$, and $F_X(x|B) = 1$ for $x \ge a$.
By taking derivatives, we obtain the conditional PDF:
$f_X(x|B) = \dfrac{f_X(x)}{F_X(a)}$ for $x < a$, and $f_X(x|B) = 0$ for $x > a$.
Example 10: Consider the events $A = \{X \le x\}$ and $B = \{a < X \le b\}$. We can show that:
$F_X(x|B) = 0$ for $x \le a$; $F_X(x|B) = \dfrac{F_X(x) - F_X(a)}{F_X(b) - F_X(a)}$ for $a < x \le b$; and $F_X(x|B) = 1$ for $x > b$.
The conditional PDF will be:
$f_X(x|B) = \dfrac{f_X(x)}{F_X(b) - F_X(a)}$ for $a < x \le b$, and $f_X(x|B) = 0$ otherwise.

The conditional distribution is an important mechanism to measure the effects of a stimulus over
a certain behavior, or the degree of dependency of events over each others. Later on we will
generalize the concepts of random variable to describe experiments that can create indexed set of
random variables, where the index is defined on the set of real numbers or on a graph; we will
exploit the dependency of the random variables to perform clustering of classes into meaningful
description, and we will be able to study various control mechanisms of experiments that can
only be studied over a long period of observations.
4.3.4 Statistical Expectation
The CDFs and PDFs in Table 1 are described in terms of parameters that quantify the basic behavior of these functions, such as their mean value and the dispersion (variance) around that mean value. Over the years of statistical analysis, statisticians have developed metrics that can be computed from the CDF or the PDF and that provide important clues about the behavior of a statistical experiment. In fact, some of these metrics may provide an approximation or an upper bound on the probability of a certain event without actual calculation of the CDF and the PDF; thus a prediction may be possible for the behavior of events based on numerical calculations. Among these metrics is the statistical expectation, which is formally defined below.
Definition 6: Given a random variable $X$ with PDF $f_X(x)$, its expected value is defined by
$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$ (continuous); $\quad E[X] = \sum_{i} x_i\, p_X(x_i)$ (discrete).
The expectation as defined above for a single random variable is a numerical indication of how symmetric the PDF is around the origin. Due to the positivity constraint on the PDF, a zero expected value connotes symmetry, a positive value indicates that the PDF is tilted towards the positive side (right-hand side) of the origin, and a negative value indicates that the PDF is tilted to the left of the origin. Hence, the quantity $E[X]$ provides the mean value of the PDF, or its locus of symmetry, on the $x$-axis; it is the mean value, or average, of the random variable $X$. In the strict mathematical sense, we can define the effect of a general function of the random variable on the PDF. For example, we may define the expected value of a function as follows:
Definition 7: Given a random variable $X$ with PDF $f_X(x)$, suppose we define a mapping $g : \mathbb{R} \to \mathbb{R}$; i.e., the domain and range of $g$ are similar to those of $X$. The expected value of $g(X)$ is defined by
$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx$ (continuous); $\quad E[g(X)] = \sum_{i} g(x_i)\, p_X(x_i)$ (discrete).
In this case, the mean is the special case $g(X) = X$. The expectation of $g(X)$ measures the average of the function. Indeed, we shall study the fact that a function of a random variable is also a measurable function, which defines a random variable as well. Hence, we may calculate $E[g(X)]$ as above, using the PDF of the random variable $X$, or first calculate the PDF of $Y = g(X)$ and then calculate $E[Y]$ as follows:
$E[Y] = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy$ (continuous); $\quad E[Y] = \sum_{j} y_j\, p_Y(y_j)$ (discrete).
The above result is quite remarkable, even though we have not formally derived it, and indeed it has enormous practical implications in statistical analysis; this is due to the fact that calculating the PDF of $g(X)$ is not a simple matter, even for simple transformations. Therefore, the PDF of the random variable $X$ serves a great deal for functions of $X$ as well. We focus on a few definitions which are special cases of $g(X)$ and provide specific clues about the PDF of $X$.
Definition 8: Given a random variable $X$ with PDF $f_X(x)$, the variance of $X$ is defined as:
$\sigma_X^2 = \mathrm{Var}[X] = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2.$
The variance measures the dispersion (width or concentration) of the PDF around the mean value. It serves to estimate the domain of values that the random variable $X$ may take. In fact, an upper bound has been derived, through the Chebyshev inequality, on the probability of a normalized deviation event.
Fact (the Chebyshev inequality) [Papoulis]: Consider a random variable $X$ with a PDF having mean value $\mu$ and variance $\sigma^2$. Define the normalized event $\{|X - \mu| \ge k\sigma\}$ for $k > 0$. We can show that:
$P(|X - \mu| \ge k\sigma) \le \dfrac{1}{k^2}.$
The proof of the Chebyshev inequality follows directly from the definitions and the positivity of the density function. Its importance is in providing a measure of the concentration of the values of the random variable around its mean value.
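The inequality is easy to verify empirically. The Python sketch below draws from an exponential distribution (an arbitrary choice) and compares the empirical tail mass with the $1/k^2$ bound.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=200_000)   # mean 1, variance 1

    mu, sigma = x.mean(), x.std()
    for k in (1.5, 2.0, 3.0):
        empirical = np.mean(np.abs(x - mu) >= k * sigma)
        print(f"k={k}: empirical {empirical:.4f} <= bound {1/k**2:.4f}")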
Definition 9: Given a random variable $X$ with PDF $f_X(x)$, the $n$-th moment of $X$ around the origin is defined as:
$m_n = E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx.$
The $n$-th central moment, $\mu_n = E\left[(X - E[X])^n\right]$, is obtained in a similar fashion by the proper substitution in the above equation. We now provide a few examples to illustrate how the expectations may be calculated for some of the PDFs in Table 1.
Example 11: Evaluate the mean and the variance of the Poisson distribution. This distribution is that of a counting random variable which estimates the number of occurrences of a particular event within a certain period, given its frequency of occurrence $\lambda$.
$E[X] = \sum_{k=0}^{\infty} k\, e^{-\lambda}\, \dfrac{\lambda^k}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \dfrac{\lambda^{k-1}}{(k-1)!} = \lambda.$
Similarly,
$E[X(X - 1)] = \sum_{k=0}^{\infty} k(k - 1)\, e^{-\lambda}\, \dfrac{\lambda^k}{k!} = \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \dfrac{\lambda^{k-2}}{(k-2)!} = \lambda^2,$
so that $E[X^2] = \lambda^2 + \lambda$. On the other hand, we note that $\mathrm{Var}[X] = E[X^2] - (E[X])^2$. Hence, it is immediate that $\mathrm{Var}[X] = \lambda$ for the Poisson distribution.
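The result is easy to confirm with SciPy's Poisson distribution object; the rate value below is arbitrary.

    from scipy import stats

    lam = 3.7                     # arbitrary rate
    X = stats.poisson(mu=lam)
    print(X.mean(), X.var())      # both equal lam, as derived above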
Example 12: Evaluate the mean and the variance of the Cauchy distribution,
$f_X(x) = \dfrac{\alpha / \pi}{x^2 + \alpha^2}, \quad -\infty < x < \infty.$
The mean value is zero (as a principal value) since $f_X(x)$ is an even symmetric function. The difficulty arrives with the variance, since the second-order expectation $E[X^2]$ is infinite. Hence, the variance is undefined. Yet, the parameter $\alpha$ measures the relationship between the maximum height and the span (width) of the PDF.
Knowledge of the PDF (or the CDF) enables computing the expectations. It is natural to ask whether knowledge of the expectations enables the determination of the functional form of the PDF. In general, it is not possible to infer the PDF from a finite set of moments, except in special cases such as the Gaussian PDF, which is explicitly defined in terms of its mean and variance. Yet, statisticians have studied mechanisms to generate the moments automatically through the so-called moment generating function, which is defined below.
Definition 10: Given a random variable $X$ with PDF $f_X(x)$, the moment generating function $M_X(t)$ of $X$ is defined as
$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx.$
The moments may be generated from $M_X(t)$ as follows:
$m_n = E[X^n] = \left.\dfrac{d^n M_X(t)}{dt^n}\right|_{t=0}.$
Example 13: Evaluate the moment generating function for the standard uniform PDF defined over the interval $[0, 1]$.
$M_X(t) = \int_{0}^{1} e^{tx}\,dx = \dfrac{e^{t} - 1}{t}.$
It is evident that the mean value, for example, may be obtained from the first derivative of $M_X(t)$ by taking the limit as $t$ approaches zero (note the need for l'Hôpital's rule): $E[X] = \lim_{t \to 0} M_X'(t) = \frac{1}{2}$.
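The limit computation can be reproduced symbolically; the sketch below applies SymPy to the closed form just derived.

    import sympy as sp

    t = sp.symbols('t')
    M = (sp.exp(t) - 1) / t                   # MGF of Uniform(0, 1) from Example 13

    mean = sp.limit(sp.diff(M, t), t, 0)      # E[X] = 1/2 (the l'Hopital-style limit)
    m2 = sp.limit(sp.diff(M, t, 2), t, 0)     # E[X^2] = 1/3
    print(mean, m2 - mean**2)                 # 1/2 and the uniform variance 1/12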

One form of expectation that mimics the moment generating function, yet provides a greater functional ability to analyze, and indeed construct, the PDF of a random variable, is known as the characteristic function, defined as follows:
Definition 11: Given a random variable $X$ with PDF $f_X(x)$, the characteristic function $\Phi_X(\omega)$ of $X$ is defined as
$\Phi_X(\omega) = E[e^{j\omega X}] = \int_{-\infty}^{\infty} e^{j\omega x} f_X(x)\,dx.$
This is the conjugate of the Fourier transform of $f_X(x)$, which is a unique mapping of the function $f_X(x)$; thus a direct link is established between the PDF and the characteristic function, which is very useful in a number of applications.
Example 14: Evaluate the characteristic function for the standard uniform PDF defined over the interval $[0, 1]$.
$\Phi_X(\omega) = \int_{0}^{1} e^{j\omega x}\,dx = \dfrac{e^{j\omega} - 1}{j\omega}.$
4.3.5 Functions of a Random Variable
It is natural to ask how a transformation $Y = g(X)$ of a random variable $X$ (which is itself a function) should be considered, in the context of the statistical experiment in which the random variable was introduced, in order to characterize $g(X)$ in terms of the theory of calculus instead of enumeration and guessing. The key issue is the measurability of functions of random variables; i.e., whether the mapping is actually constrained by the original domain (and, of course, range) of the original random variable. A random variable $X$ is a mapping from the sample space into the real line such that the set $\{X \le x\}$ is an event, and we also insisted that the inverse mapping of a Borel set is an event as well; i.e., $X^{-1}(B) \in \mathcal{F}$ for all Borel sets $B$. Now let us define a function $Y = g(X)$. It is evident that the transformation will generate sets on $\mathbb{R}$ that must be Borel sets. Likewise, the inverse mapping will generate events that must be in $\mathcal{F}$ and $\Omega$, by construction. Hence, the measurability of $g(X)$ is quite evident and inherent. Fig. 4.4 illustrates this concept, where the mappings of $X$ and $g$ onto the real line produce Borel sets, which may be remapped back to the originating event in $\mathcal{F}$.

Fig. 4.4: Illustration of the measurability of functions of random variables.

Theorem [Pfeiffer]: If $X$ is a random variable and $g$ is a Borel function, then $Y = g(X)$ is a random variable measurable by $X$; i.e., the $\sigma$-algebra generated by $Y$ is a subset of the $\sigma$-algebra generated by $X$.
As we have done before with random variables, we characterize $Y = g(X)$ by its cumulative probability distribution function or by its probability density function. Depending on the functional form of $g$, we may obtain the CDF $F_Y(y)$ in terms of $F_X(x)$. Table 2 lists a few examples of the CDFs and PDFs of functions of random variables [Papoulis].
Table 2: The CDF and PDF of $Y = g(X)$ for various functional forms.
$g(X)$ | CDF $F_Y(y)$ | PDF $f_Y(y)$
$aX + b$ ($a > 0$) | $F_X\left(\frac{y - b}{a}\right)$ | $\frac{1}{|a|}\, f_X\left(\frac{y - b}{a}\right)$
Monotone increasing $g$ | $F_X\left(g^{-1}(y)\right)$ | $f_X\left(g^{-1}(y)\right) \left|\frac{d g^{-1}(y)}{dy}\right|$
Monotone decreasing $g$ | $1 - F_X\left(g^{-1}(y)\right)$ | $f_X\left(g^{-1}(y)\right) \left|\frac{d g^{-1}(y)}{dy}\right|$
Continuous $g$ with a countable number of real roots $x_i$ of $y = g(x)$ | Integrate $f_Y(y)$ | $f_Y(y) = \sum_i \dfrac{f_X(x_i)}{|g'(x_i)|}$
Derivations of the entries in Table 2 are straightforward and follow from the definition of the CDF, which should be obtained first; the derivative operator is then applied to obtain the PDF.
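As a sanity check on the linear row of Table 2, the sketch below compares the table's formula with the known density of $Y = aX + b$ for a standard normal $X$; the constants are arbitrary.

    import numpy as np
    from scipy import stats

    a, b = 2.0, -1.0                            # illustrative constants in Y = aX + b
    y = np.linspace(-8.0, 6.0, 9)

    # Table 2: f_Y(y) = (1/|a|) f_X((y - b)/a), with X ~ N(0, 1)
    f_table = stats.norm.pdf((y - b) / a) / abs(a)

    # Direct form: Y = aX + b is N(b, a^2)
    f_direct = stats.norm.pdf(y, loc=b, scale=abs(a))

    print(np.allclose(f_table, f_direct))       # True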
Functions of random variables are an important concept in mathematical statistics that enables statistical inference about one random variable (or, in general, a random process, as will be studied later) in terms of another. Indeed, simulation of random variables from different distributions may be obtained by simulating a random variable from the standard uniform distribution; i.e., $U \sim \mathrm{Uniform}(0, 1)$, a form that has been well studied and for which various random number generators exist. We will study simulation of random variables after we briefly cover the case of random experiments that may be described by two random variables; this is a step towards generalizations to sequences of unstructured random variables, which will be followed by random variables defined on a graph or having a particular meaningful index. As our intention is to study the random processes and random fields that have been useful for biomedical image analysis, we will not dwell on the details of random variables and statistical inference beyond the basic background needed for our subsequent coverage.
4.4 Two Random Variables
Events in random experiments may take various forms, and may be better understood in terms of higher-dimensional mappings instead of one-dimensional ones. In this case, the measurability of the events in $\mathcal{F}$ would be in terms of the mass of events (Borel sets) on multidimensional spaces. Consider two random variables $X$ and $Y$; the events $\{X \le x\}$ and $\{Y \le y\}$ may be paired with respect to the two parameters such that a new event $\{X \le x, Y \le y\}$ may be defined on the plane $\mathbb{R}^2$, as shown in Fig. 4.5. We may express the random variables with respect to an outcome $\zeta \in \Omega$ as $(X(\zeta), Y(\zeta))$, or as a vector $(X, Y)$. We will be interested mainly in random variables defined on the same sample space; hence, for simplicity, we will write $X$, $Y$ and $(X, Y)$.

Fig. 4.5: Illustration of the concept of two-dimensional random vectors for two random variables on the same sample space. For measurability, the sets involved are Borel sets.


In direct analogy to the single random variable case, we may determine the probability mass distribution on the Borel sets defined on the plane $\mathbb{R}^2$ by the assignment $P_{XY}(B) = P\big((X, Y) \in B\big)$; hence, when dealing with a pair of random variables $(X, Y)$, we have three induced probability measures: $P_{XY}$, $P_X$, and $P_Y$. The mass distribution on $\mathbb{R}^2$ represented by $P_{XY}$ consists of the unit mass distributed over the plane. Similarly, $P_X$ represents a unit mass distributed along the $x$-axis, and $P_Y$ represents a unit mass distributed along the $y$-axis. The measures $P_X$ and $P_Y$ are related to $P_{XY}$ as follows:
$P_X(B_X) = P_{XY}(B_X \times \mathbb{R}), \qquad P_Y(B_Y) = P_{XY}(\mathbb{R} \times B_Y).$
Now if $B_X = \mathbb{R}$, i.e., the entire real line, then $\{X \in B_X\}$ is the sure event, so that $P_{XY}(\mathbb{R} \times B_Y) = P_Y(B_Y)$. Similarly, if $B_Y = \mathbb{R}$, then $P_{XY}(B_X \times \mathbb{R}) = P_X(B_X)$. We formally define these quantities below.
Definition 12: The probability measure $P_{XY}$ defined on the Borel sets in the plane $\mathbb{R}^2$ is called the joint probability measure induced by the joint mapping $(X, Y) : \Omega \to \mathbb{R}^2$. The corresponding probability mass distribution is called the joint distribution. The probability measures $P_X$ and $P_Y$ are called the marginal probability measures induced by the random variables $X$ and $Y$, respectively. The corresponding probability mass distributions are called the marginal probability distributions.
Definition 13: The joint probability distribution function, or the joint cumulative distribution function, for random variables $X$ and $Y$, denoted by $F_{XY}(x, y)$, is defined by
$F_{XY}(x, y) = P(X \le x, Y \le y)$ for all values $(x, y) \in \mathbb{R}^2.$
The special cases $F_X(x) = F_{XY}(x, \infty)$ and $F_Y(y) = F_{XY}(\infty, y)$ are called the marginal distribution functions for random variables $X$ and $Y$, respectively.
We follow the same development as for the single random variable case and define the probability density function for two random variables.
Fact [Feller, Vol. 1]: If the joint probability measure $P_{XY}$ induced by $(X, Y)$ is absolutely continuous, a function $f_{XY}(x, y)$ exists such that:
$F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\,dv\,du;$
$f_{XY}(x, y)$ is called the joint probability density function of $(X, Y)$. Conversely, given an $F_{XY}$ and an $f_{XY}$ that satisfy the previous properties, there exist two random variables $X, Y$ defined on some sample space with joint probability distribution and density functions $F_{XY}$ and $f_{XY}$, respectively.
We may enlist the main properties of $f_{XY}$ in the same manner as we did for single random variables; specifically, we note that:
i. $f_{XY}(x, y) \ge 0$;
ii. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx\,dy = 1$;
iii. $f_{XY}(x, y) = \dfrac{\partial^2 F_{XY}(x, y)}{\partial x\,\partial y}$;
iv. $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dx$ (the marginal densities).
Example 15: Consider two jointly continuous and positive random variables $X, Y$ with the following joint density function, supported on the wedge $0 \le y \le x$:
$f_{XY}(x, y) = 2 e^{-x} e^{-y}$ for $0 \le y \le x < \infty$, and $0$ otherwise.
Find the marginal densities $f_X(x)$ and $f_Y(y)$ and the joint CDF $F_{XY}(x, y)$.
Solution: The required quantities may be obtained by integrating $f_{XY}(x, y)$ over the appropriate region; see Fig. 4.6.
$f_X(x) = \int_{0}^{x} 2 e^{-x} e^{-y}\,dy = 2 e^{-x}\left(1 - e^{-x}\right), \quad x \ge 0.$

Fig. 4.6: Illustration of the region of integration for the marginal density $f_X(x)$.

Also,
$f_Y(y) = \int_{y}^{\infty} 2 e^{-x} e^{-y}\,dx = 2 e^{-2y}, \quad y \ge 0.$
The cumulative distribution function will be computed from the basic definition; integrating over the support $0 \le v \le u$ leads to, for $0 \le y \le x$:
$F_{XY}(x, y) = \int_{0}^{y} \int_{v}^{x} 2 e^{-u} e^{-v}\,du\,dv = 1 - e^{-2y} - 2 e^{-x}\left(1 - e^{-y}\right).$
We may address the issue of dependence of random variables here due to its importance. We maintain, from Definition 2 and Example 9, that the events $\{X \le x\}$ and $\{Y \le y\}$ are independent if their joint probabilities factor; that is,
$F_{XY}(x, y) = F_X(x)\,F_Y(y)$, and equivalently $f_{XY}(x, y) = f_X(x)\,f_Y(y).$
Note that in the above example, $f_{XY}(x, y)$ has a separable form, but the two random variables are not independent due to the restriction $0 \le y \le x$ on the support. The independence concept will come in handy in various applications, including simulation of random variables.
4.4.1 Statistical Expectation in Two Dimensions
Expectations in two dimensions follow the one-dimensional case (see Definitions 6 and 7). In particular, the expected value of a function $g(X, Y)$ of the random variables $X, Y$ is defined as follows:
Definition 14: Given a pair of random variables $(X, Y)$, suppose we define a mapping $g : \mathbb{R}^2 \to \mathbb{R}$; i.e., the domain of $g$ is similar to the range of $(X, Y)$. The expected value of $g(X, Y)$ is defined by
$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f_{XY}(x, y)\,dx\,dy$ (continuous); $\quad E[g(X, Y)] = \sum_{i} \sum_{j} g(x_i, y_j)\, p_{XY}(x_i, y_j)$ (discrete),
where $f_{XY}$ is the joint probability density function for continuous random variables and $p_{XY}$ is the joint probability mass function for discrete random variables.
Definition 15: Given a pair of random variables $(X, Y)$, the covariance is defined as:
$\mathrm{Cov}(X, Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E[XY] - \mu_X \mu_Y,$
where $\mu_X$ and $\mu_Y$ are the marginal means as defined in Definition 6. That is, for any random variable $X$, we have:
$\mu_X = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$ (continuous); $\quad \mu_X = \sum_{i} x_i\, p_X(x_i)$ (discrete),
where $f_X$ is the probability density function for a continuous random variable $X$ and $p_X$ is the probability mass function for a discrete random variable $X$.
A special case of Definition 15 that does not include the marginal means is called the correlation; i.e., $R_{XY} = E[XY]$. We note that all the properties of the expectation operator in higher dimensions are similar to those in one dimension.
In two dimensions (and in multiple dimensions in general), the random variables may possess various degrees of relatedness, which may span the entire scale from fully related to completely unrelated (independent). We quantify the relatedness in terms of the correlation coefficient $\rho$; i.e.,
$\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X\, \sigma_Y},$
where $\sigma_X$ is the standard deviation of random variable $X$ (and similarly for $\sigma_Y$). If the covariance is zero, then the two random variables are said to be uncorrelated. If the two random variables are independent, then their covariance is zero and, therefore, by definition, the two random variables are uncorrelated. We note that the converse is not true, except in the case of Gaussian random variables, which will be considered later.
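The one-way implication is easy to demonstrate numerically: in the sketch below, $Y = X^2$ is completely determined by $X$, yet for a symmetric zero-mean $X$ the covariance $E[X^3]$ vanishes, so the sample correlation is (nearly) zero despite full dependence.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.standard_normal(1_000_000)
    y = x**2                                 # fully dependent on x

    rho = np.corrcoef(x, y)[0, 1]            # sample correlation coefficient
    print(rho)                               # ≈ 0, despite the dependence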
4.4.2 Functions of two Random Variables
It is natural to study the mappings (transformations) of two random variables as we did with
single random variables. Here we consider the transformation

where is a
general function. We used the bold face letter to represent the functional form for which
may be a vector. A given realization (certain value or instant of will be represented by small
bold face letter The key issue is the measurability of which may be understood in a
similar fashion as we did in one-dimensional case; see Fig. 4. We maintain that under normal
conditions, is measurable and, therefore, we may define a complete functional description
for it in terms of the probability distribution function;

). We consider few
cases below.
Example 16: Consider two jointly continuous random variables

with probability density


function

Define a summation function

We maintain that is a
measurable random variable whose CDF may be calculated as follows.


and the density function

would be evaluated as follows:



The inner differentiation may be evaluated using Leibnitz rule; i.e.,


Hence,



As a special case, if

are independent, then joint denisty becomes separable, and

will
have the following form:

,
which is the convolution integral. For this case, the CDF would be:

For illustartion, let us consider the simple case


where both

are independent and have uniform distribution. Let

It is straight forward to show that
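A quick Monte Carlo check of this triangular density (a sketch; the sample size is arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    z = rng.random(1_000_000) + rng.random(1_000_000)        # Z = X + Y, X, Y ~ U(0, 1)

    hist, edges = np.histogram(z, bins=40, range=(0, 2), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    triangle = np.where(centers <= 1, centers, 2 - centers)  # f_Z(z) from Example 16

    print(np.abs(hist - triangle).max())                     # small (≈ 0.01)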







Table 3 lists a few examples of the distributions of functions of two random variables; all can be derived from the basic definitions. These functions map two random variables into a single random variable; i.e., there is a reduction in dimensionality involved in the transformation. We will consider the case where the functions of random variables are vectors next.
Table 3: Some one-dimensional functions of two (independent) random variables and the corresponding CDF and PDF.
$g(X, Y)$ | CDF $F_Z(z)$ | PDF $f_Z(z)$
$X + Y$ | $\int_{-\infty}^{\infty} F_X(z - y)\, f_Y(y)\,dy$ | $\int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\,dx$
$X - Y$ | $\int_{-\infty}^{\infty} F_X(z + y)\, f_Y(y)\,dy$ | $\int_{-\infty}^{\infty} f_X(z + y)\, f_Y(y)\,dy$
$\max(X, Y)$ | $F_X(z)\, F_Y(z)$ | $f_X(z)\, F_Y(z) + F_X(z)\, f_Y(z)$
$\min(X, Y)$ | $1 - \left(1 - F_X(z)\right)\left(1 - F_Y(z)\right)$ | $f_X(z)\left(1 - F_Y(z)\right) + \left(1 - F_X(z)\right) f_Y(z)$

4.4.3 Two Functions of Two Random Variables
As a step towards a generalized representation of functions of random variables, let us consider the vector representation of the random variables $X, Y$; i.e., $\mathbf{Z} = (X, Y)^T$, and a functional form $\mathbf{W} = \mathbf{g}(\mathbf{Z})$ for that vector representation. The analysis will be quite straightforward in the case where $\mathbf{g}$ is a linear mapping of $\mathbf{Z}$:
$\mathbf{W} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \mathbf{Z}$, for some constants $a, b, c, d$.
If the transformation is invertible, $\mathbf{g}^{-1}$ exists; hence, $\mathbf{Z} = \mathbf{g}^{-1}(\mathbf{W})$.
The Jacobian of the transformation is defined as:
$\mathbf{J} = \begin{bmatrix} \partial w_1 / \partial x & \partial w_1 / \partial y \\ \partial w_2 / \partial x & \partial w_2 / \partial y \end{bmatrix}.$
Then the joint probability density function of $\mathbf{W} = (W_1, W_2)$ may be written in terms of the joint density $f_{XY}(x, y)$ of the random variables $X, Y$ as:
$f_{W_1 W_2}(w_1, w_2) = \left.\dfrac{f_{XY}(x, y)}{|\mathbf{J}|}\right|_{(x, y) = \mathbf{g}^{-1}(w_1, w_2)},$
where $|\cdot|$ stands for the determinant.
Example 17: Consider two jointly continuous random variables $X, Y$ with probability density function $f_{XY}(x, y)$, and consider the polar coordinates $(R, \Theta)$: $X = R\cos\Theta$, $Y = R\sin\Theta$, with $R = \sqrt{X^2 + Y^2}$ and $\Theta = \tan^{-1}(Y/X)$. We may define the Jacobian of the polar transformation (from $(r, \theta)$ to $(x, y)$) as follows:
$\mathbf{J} = \begin{bmatrix} \partial x / \partial r & \partial x / \partial \theta \\ \partial y / \partial r & \partial y / \partial \theta \end{bmatrix} = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}, \qquad |\mathbf{J}| = r.$
Hence,
$f_{R\Theta}(r, \theta) = r\, f_{XY}(r\cos\theta,\, r\sin\theta).$
4.5 Simulation of Random Variables
In this section we discuss how to simulate one- and two-dimensional random variables of general distributions, starting from the standard uniform distribution. This will bring about the linkage between a distribution or density function and deterministic functional forms which represent instances, or realizations, of the random variables. Generating such instances may be used in a reverse fashion to estimate the density function using histograms. Simulation is an art and a well-studied subject in statistical design; stochastic systems are studied based on simulations. We only scratch the surface here, to convey some feeling for the outputs of random experiments in the form of realizations of the random variables that map the sample space into Borel sets on the real line. The fundamental step in simulation is the simple observation that, since the distribution function is monotone and non-decreasing, an inverse is possible; hence, a mapping from the uniform distribution to any other form is possible. We illustrate this as follows. Let $X, U$ be two random variables, not necessarily on the same sample space or having the same form of distribution. Suppose the two random variables are related through the monotone transformation $X = g(U)$. The distribution function of $X$ with respect to the distribution of $U$ is as follows:
$F_X(x) = P(X \le x) = P(g(U) \le x) = P\left(U \le g^{-1}(x)\right) = g^{-1}(x),$
if $U$ is uniformly distributed on $[0, 1]$. Hence, we may take $g = F_X^{-1}$. Therefore, if we generate a realization $u$ from $U$, a realization of $X$ is obtained by the inverse mapping $x = F_X^{-1}(u)$.
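A minimal sketch of this inverse-transform recipe, using the exponential distribution as the target (the rate is an arbitrary choice): since $F_X(x) = 1 - e^{-\lambda x}$, we have $F_X^{-1}(u) = -\ln(1 - u)/\lambda$.

    import numpy as np

    rng = np.random.default_rng(4)
    u = rng.random(100_000)          # realizations of U ~ Uniform(0, 1)

    lam = 2.0
    x = -np.log(1.0 - u) / lam       # x = F^{-1}(u) for the Exponential(lam) CDF

    print(x.mean(), x.var())         # ≈ 1/lam = 0.5 and 1/lam^2 = 0.25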
4.6 Summary
This chapter introduced the basic concepts of statistical modeling, which serve as building blocks for the material in the book. We defined the concept of a random variable from an operational point of view and illustrated its significance in describing statistical experiments. Characteristics of a single random variable were studied, including the probability distribution and density functions and the concept of statistical expectation. Extensions to functions of a single random variable and to the joint distribution of two random variables were considered. The chapter concluded with the basics of simulating one- and two-dimensional random variables.
4.7 Computer Laboratory
A standard uniform random variable can be easily converted to any other distribution, such as the normal distribution. The methodology of generating random numbers has several forms; however, the one used in the great majority of cases is the linear congruential generator (LCG), whose recursive formula is given by (A4.1), where $x_i$ is a sequence of integer values ranging from 0 to $m - 1$, starting from a seed point $x_0$:
$x_{i+1} = (a x_i + c) \bmod m,$  (A4.1)
where $m$ is the modulus, $a$ is the multiplier, $c$ is the increment, and all are non-negative integers. In order to obtain numbers in the interval $[0, 1)$, the generated integer sequence should be divided by $m$ to obtain $u_i = x_i / m$. There are many variations of LCGs; the form used in this project is the multiplicative LCG, where the addition of $c$ is not needed, so the recursive formula of the generator is given by
$x_{i+1} = (a x_i) \bmod m.$  (A4.2)
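A minimal Python sketch of the multiplicative LCG in (A4.2); the constants below are the classic Park-Miller "minimal standard" choices, an assumption rather than values prescribed by the text.

    def lcg(seed, n, a=16807, m=2**31 - 1):
        """Yield n uniform variates in (0, 1) from the multiplicative LCG (A4.2)."""
        x = seed
        for _ in range(n):
            x = (a * x) % m
            yield x / m

    print(list(lcg(seed=12345, n=5)))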
We may convert the generated uniform streams to follow a Gaussian distribution of zero mean and unit variance using the Box-Muller approach, described as follows. If $u_1$ and $u_2$ are independent random variates (realizations from a random variable) drawn from $U(0, 1)$, then $z_1$ and $z_2$ are independent standard normal deviates (realizations from $N(0, 1)$) obtained as follows:
$z_1 = \sqrt{-2 \ln u_1}\, \cos(2\pi u_2),$  (A4.3a)
$z_2 = \sqrt{-2 \ln u_1}\, \sin(2\pi u_2).$  (A4.3b)
The approach is to run the random generator to get a very large number of samples of a uniform random variable, then extract the even-ordered terms to be $u_1$ and the odd-ordered terms to be $u_2$, in order to guarantee their independence.
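The following sketch implements (A4.3a-b) on a uniform stream, splitting it into even- and odd-ordered terms as described:

    import numpy as np

    rng = np.random.default_rng(5)
    u = rng.random(200_000)
    u1, u2 = u[0::2], u[1::2]              # even- and odd-ordered terms

    r = np.sqrt(-2.0 * np.log(u1))         # Box-Muller radius
    z1 = r * np.cos(2.0 * np.pi * u2)      # (A4.3a)
    z2 = r * np.sin(2.0 * np.pi * u2)      # (A4.3b)

    z = np.concatenate([z1, z2])
    print(z.mean(), z.std())               # ≈ 0 and ≈ 1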
Task 1:
(a) Generate three streams (1D, 2D and 3D) of random numbers with 100,000 samples; you can use the Matlab function for random number generation.
(b) Visualize the generated samples. Hint: you can use a scatter plot.
(c) Compute the histograms of the three streams, and then normalize them to become probability density functions (pdfs).
(d) Visualize the pdfs of the three streams. Are the samples uniformly distributed? Do the pdfs represent a standard uniform distribution? Comment.

Task 2: Use the streams generated in Task 1 to do the following.
(a) Convert the generated streams to a standard normal distribution.
(b) Visualize the data samples. Are they normally distributed with mean 0 and unit variance?
(c) Compute the normalized histograms (pdfs) of the data samples from (a), visualize the distribution, and comment.
Task 3: Use the streams generated in Task 2 to do the following.
(a) Convert the 1D normal variate to a normal distribution with mean $-2$ and standard deviation 3.
(b) Convert the 2D normal variate to a normal distribution with mean vector
$\mathbf{M} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and covariance matrix $\boldsymbol{\Sigma} = \begin{bmatrix} 9 & 4 \\ 4 & 4 \end{bmatrix}$ (see the sketch after this task).
(c) Convert the 3D normal variate to a normal distribution with mean vector
$\mathbf{M} = \begin{bmatrix} 5 \\ 5 \\ 6 \end{bmatrix}$ and covariance matrix $\boldsymbol{\Sigma} = \begin{bmatrix} 5 & 2 & 1 \\ 2 & 5 & 0 \\ 1 & 0 & 4 \end{bmatrix}.$
(d) For (a), (b) and (c), estimate the mean and covariance matrix and compare the estimated values with the ground-truth ones; comment.
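One standard way to carry out the conversions in Task 3 (a sketch, not a prescribed solution): if $\mathbf{z}$ is a standard normal vector and $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T$ is a Cholesky factorization, then $\mathbf{x} = \mathbf{M} + \mathbf{L}\mathbf{z}$ has mean $\mathbf{M}$ and covariance $\boldsymbol{\Sigma}$. The Python sketch below uses the 2D case of part (b).

    import numpy as np

    rng = np.random.default_rng(6)
    z = rng.standard_normal((2, 100_000))     # 2D standard normal stream

    M = np.array([[2.0], [1.0]])              # target mean (Task 3b)
    Sigma = np.array([[9.0, 4.0],
                      [4.0, 4.0]])            # target covariance (Task 3b)

    L = np.linalg.cholesky(Sigma)             # Sigma = L @ L.T
    x = M + L @ z                             # x ~ N(M, Sigma)

    print(x.mean(axis=1))                     # ≈ [2, 1]
    print(np.cov(x))                          # ≈ Sigma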
4.8 Exercises
1. Given a probability space $\{\Omega, \mathcal{F}, P\}$, let $A$ and $B$ be events such that … . Prove/disprove the following:
(a) …
(b) …
(c) …

2. Given a sample space $\Omega$:
(a) Construct the smallest $\sigma$-algebra $\mathcal{F}_1$ that contains the two sets … and … .
(b) Construct the smallest $\sigma$-algebra $\mathcal{F}_2$ that contains the two sets … and … .
(c) What members of … are measurable with respect to $\mathcal{F}_1$ and $\mathcal{F}_2$?

3. Let $P$ be a nonnegative, finitely additive set function on the field $\mathcal{F}$. Show that if $A_1, A_2, \ldots$ are disjoint sets in $\mathcal{F}$ whose union is also in $\mathcal{F}$, then $P\left(\bigcup_{n=1}^{\infty} A_n\right) \ge \sum_{n=1}^{\infty} P(A_n)$.
4. Let $A_1, A_2, \ldots, A_n$ be arbitrary subsets of a set $\Omega$. Describe explicitly the smallest $\sigma$-field $\mathcal{F}$ containing $A_1, A_2, \ldots, A_n$. How many sets are there in $\mathcal{F}$? (i.e., give an upper bound that is attainable.)
5. Derive the entries in Table 2.
6. Derive the entries in Table 3.
4.9 References
1. Robert Ash, Real Analysis and Probability, Academic Press, New York, 1972.
2. Athanasios Papoulis, Probability, Random Variables and Stochastic Processes, 3rd Edition, McGraw-Hill, New York, 1991.
3. Paul E. Pfeiffer, Concepts of Probability Theory, 2nd Edition, Dover, New York, 1978.
4. Wilbur Davenport, Random Processes, McGraw-Hill, New York, 1970.
5. William Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, Wiley, New York, 1968.
6. Reuven Rubinstein, Simulation and the Monte Carlo Method, Wiley, New York, 1981.
