UNIVERSITY OF ILLINOIS
Department of Economics
Course: Econ 506, Fall 2012 (August 28, 2012)
Instructor: Anil K. Bera (abera@illinois.edu), 225E DKH
Class Hours: 1:30-3:10 TuTh
Class Room: 215 DKH
Oﬃce Hours: 12:00-1:00 TuTh
TA: Yu-Hsien Kao (kao21@illinois.edu)
Prologue
April 1242
Baghdad, Iraq
Baghdad took no note of the arrival of Shams (Sun) of Tabriz, a wandering Sufi saint, from Samarkand to the city's famous Dervish Lodge. Shams told the master of the lodge, Baba Zaman, that he wanted to share his accumulated knowledge with the most competent student. Why? Because, Shams said, "Knowledge is like brackish water at the bottom of an old vase unless it flows somewhere." Baba Zaman got serious and asked a bizarre question: "You say you are ready to deliver all your knowledge to another person. You want to hold the Truth in your palm as if it were a precious pearl and offer it to someone special. That is no small task for a human being. Are you not asking too much? What are you willing to pay in return?"
Raising an eyebrow, Shams of Tabriz said firmly, "I am willing to give my head."
This is an introductory course in mathematical statistics, and its purpose is to
prepare you for the econometrics course, Econ 507 (Spring 2013). To carry out a
good applied econometrics study, it is necessary to master the econometric theory.
Econometric theory requires a good knowledge of statistical theory, which in turn has its foundation in probability theory. Finally, one cannot study probability without set theory. Therefore, we will begin at the beginning. We will start with set theory, and discuss probability and the basic structure for statistics. Then we will
slowly move into diﬀerent probability distributions, asymptotic theory, estimation
and hypothesis testing.
After doing all these, the whole course will be just like a candle. "It will provide us much valuable light. But let us not forget that a candle will help us to go from one
place to another in the dark. If we, however, forget where we are headed and instead
concentrate on the candle, what good will it be?”
As you may have guessed, the course materials will be highly theoretical. No statistical
background will be assumed. However, I will take it for granted that you already
know diﬀerential and integral calculus and linear algebra. Good Luck!
Course Outline:
1. Introduction
(a) Why statistics?
(b) Statistical data analysis: Life by numbers
2. Probability Theory
(a) Algebra of sets
(b) Random variable
(c) Distribution function of a random variable
(d) Probability mass and density functions
(e) Conditional probability distribution
(f) Bayes theorem and its applications
(g) More on conditional probability distribution
(h) Mathematical expectation
(i) Bivariate moments
(j) Generating functions
(k) Distribution of a function of a random variable
3. Univariate Discrete and Continuous Distributions
(a) The basic distribution–hypergeometric
(b) Binomial distribution (as a limit of hypergeometric)
(c) Poisson distribution (as a limit of binomial)
(d) Normal distribution
(e) Properties of normal distribution
(f) Distributions derived from normal (χ², t and F)
(g) Distributions of sample mean and variance
4. Asymptotic Theory
(a) Law of large numbers
(b) Central limit theorems
5. Estimation
(a) Properties of an estimator
(b) Cramér-Rao inequality
(c) Suﬃciency and minimal suﬃciency
(d) Minimum variance unbiased estimator and Rao-Blackwell theorem
(e) Maximum likelihood estimation
(f) Nonparametric method and density estimation
6. Hypothesis Testing
(a) Notion of statistical hypothesis testing
(b) Type I and II errors
(c) Uniformly most powerful test and Neyman-Pearson lemma
(d) Likelihood ratio (LR) test
(e) Examples on hypothesis testing
(f) Rao’s score or the Lagrange multiplier (LM) test
(g) Wald (W) test
Recommended Text:
A First Course in Probability and Statistics by B.L.S. Prakasa Rao, 2008, World
Scientiﬁc.
However, I will not follow this book closely. For your convenience, detailed notes (in four volumes) on the whole course will be made available on the course web page. As you will notice, the lecture notes, given the subject matter, are very dry and mechanical. We will try to make things more lively by analyzing some interesting data sets (some even depicting your lives) and contemporary real-world problems.
Yu-Hsien Kao, the TA for this course, will meet with the class on Fridays, 1:30-2:50pm, 215 DKH. Her office hours will be 11:00am-12:30pm on Mondays.
Course Webpage: Please check Compass regularly for announcements/updates on homeworks, exams, etc.
Assessment: There will be two closed book examinations. You will also receive four
homework assignments. The grading of the course will be based on:
Homework 20%
First Exam (around mid-semester, on a Thursday) 40%
Second Exam (on the last day of class) 40%
Epilogue
In late October of 1244 in Konya, Turkey, Shams found the student he was looking
for: Jalaluddin Rumi, already a famous Islamic scholar in Turkey. Under the tutelage
of Shams, Rumi became one of the most revered poets in the world. As Rumi said, "I was raw. I was cooked. I was burned."
March 1248
Konya, Turkey
Rumi’s son Aladdin hired a killer who did not require much convincing.
It was a windy night, unusually chilly for this time of the year. A few nocturnal animals hooted and howled from afar. The killer was waiting. Shams of Tabriz came
out of the house holding an oil lamp in his hand and walked in the direction of the
killer and stopped only a few steps away from the bush where the killer was hiding.
“It is a lovely night, isn’t it?” Shams asked.
Did he know the killer was there? Soon six others joined the killer. The seven
of them knocked Shams to the ground, and the killer pulled his dagger out of his
belt......
Together they lifted Shams' body, which was strangely light, and dumped him into
a well. Gasping loudly for air, each of them took a step back and waited to hear the
sound of Shams’ body hitting the water.
It never came.
Taken from: Elif Shafak (2010), The Forty Rules of Love, Penguin Books.
CONTENTS
1. Introduction
(a) Structure of the course
2. Probability Theory
(a) Algebra of sets
(b) Random variable
(c) Distribution function of a random variable
(d) Probability mass and density functions
(e) Conditional probability distribution
(f) Bayes theorem and its applications
(g) More on conditional probability distribution
(h) Mathematical expectation
(i) Bivariate moments
(j) Generating functions
(k) Distribution of a function of a random variable
3. Univariate Discrete and Continuous Distributions
(a) The basic distribution–hypergeometric
(b) Binomial distribution (as a limit of hypergeometric)
(c) Poisson distribution (as a limit of binomial)
(d) Normal distribution
(e) Properties of normal distribution
(f) Distributions derived from normal (χ², t and F)
Introduction to Statistics for
Econometricians
by
Anil K. Bera
1.1 Introduction.
If you look around, you will notice the world is full of uncertainty. Even with the enormous amount of past information, we can never tell the exact weather condition of tomorrow. The same is true for many economic variables, such as stock prices, exchange rates, inflation, unemployment, interest rates, mortgage rates, etc. [If you knew the exact future price of a major stock, you could make a million! In that case, of course, you wouldn't be taking this course.] Then, what is
the role of statistics in this uncertain world? The basic foundation of statistics is the idea that there is an underlying principle or common rule in the midst of all the chaos and irregularities. Statistics is a science to formulate these common rules in a systematic way. Econometrics is that field of science which deals with the application of statistics to economics. Statistics is applicable to all branches of science and the humanities. You might have heard of fields like sociometry, psychometry, cliometrics and biometrics. These are applications of statistics to sociology, psychology, history and biology, respectively. The application of statistics in economics is somewhat controversial since, unlike the physical or biological sciences, in economics we can't conduct purely random experiments. In most cases, what we have is historical data on certain economic variables. For all practical purposes, we can view these data as the result of some random experiment and then use statistical tools to analyze the data. For example, regarding stock price movements, based on the available data we can try to find the underlying probability distribution. This distribution will depend on some unknown parameters which can be estimated using the data. We can also test some hypotheses regarding the parameters, or we can even test whether the (assumed) probability distribution is valid or not.
Just like any other science, in statistics there are many approaches: classical and Bayesian, parametric and nonparametric, etc. These are not always substitutes for each other, and in many cases they can be successfully used as complements of each other. However, in this course, we will concentrate on the classical parametric approach.
2.1 Basic Set Theory.
The objective of econometrics is "advancement of economic theory in its relation to statistics and mathematics." This course is concerned with the "statistics" part, and the foundation of statistics is in probability theory. Again, "probability" is defined for events, and these events can be described as sets or subsets. To see the link:

Econometrics → Statistics → Probability → Event → Set.

Let us start with a definition of a "set".
Definition 2.1.1: A set is any (well defined) collection of objects.
Example 2.1.1:
(i) C = {1, 2, 3, ...}, the set of all positive integers.
(ii) D = {2, 4, 6, ...}, the set of all positive even integers.
(iii) F = {students attending Econ 472}
(iv) G = {students attending Econ 402}
An object in a set is an element; e.g., 1 is an element of the set C. We will denote this as 1 ∈ C; "∈" means "belongs to." Note that the set C contains more elements than the set D. We can say that D is a "subset" of C and will denote this as D ⊂ C. Formally,
Definition 2.1.2: Set B is a subset of set A, denoted by B ⊂ A, if x ∈ B implies x ∈ A.
You know that with real numbers we can do a lot of operations, like addition (+), subtraction (−), multiplication (×), etc. Similar operations can also be done with sets; e.g., we can "add" (sort of) two sets, subtract one set from another, etc. Two very important operations are "union" and "intersection" of sets.
Definition 2.1.3: The union of two sets A and B is C, denoted by C = A ∪ B, if C = {x | x ∈ A and/or x ∈ B}.
In other words, by the union of two sets we mean the collection of elements which belong to at least one of the two sets.
Example 2.1.2: In Example 2.1.1, C ∪ D = C. If we define another set E = {1, 3, 5, ...}, the set of all positive odd integers, then C = D ∪ E.
The operation can be defined for more than two sets. Suppose we have n sets A_1, A_2, ..., A_n. Then A_1 ∪ A_2 ∪ ... ∪ A_n, denoted by ∪_{i=1}^{n} A_i, is defined as

∪_{i=1}^{n} A_i = {x | x ∈ at least one A_i, i = 1, 2, ..., n}.

In a similar fashion, we can define the union of an infinite (but countable) number of sets A_1, A_2, A_3, ... as ∪_{i=1}^{∞} A_i = A_1 ∪ A_2 ∪ A_3 ∪ ....

Example 2.1.3: Let A_i = {i}, i.e., A_1 = {1}, A_2 = {2}, ... etc. Then ∪_{i=1}^{∞} A_i = C of Example 2.1.1.

Or let A_i = [−i, i], an interval in the real line R; then ∪_{i=1}^{∞} A_i = R.
The next concept we discuss is "intersection". The intersection of two sets A and B, denoted by A ∩ B, consists of all the common elements of A and B. Formally,
Definition 2.1.4: The intersection of two sets A and B is C, denoted by C = A ∩ B, if C = {x | x ∈ A and x ∈ B}.
Example 2.1.4: In Example 2.1.1, C ∩ D = D, and F ∩ G = {students attending both Econ 472 and 402}.
As in the case of "union", we can also define the operation "∩" for more than two sets. For example,

∩_{i=1}^{n} A_i = A_1 ∩ A_2 ∩ ... ∩ A_n = {x | x ∈ A_i for all i = 1, 2, ..., n}

∩_{i=1}^{∞} A_i = A_1 ∩ A_2 ∩ A_3 ∩ ... = {x | x ∈ A_i for all i = 1, 2, 3, ...}

It is easy to represent the above two concepts diagrammatically in a Venn diagram [see Figure 2.1.1].
[Figure 2.1.1: Venn diagram of two sets A and B, showing A ∪ B and A ∩ B]

[Figure 2.1.2: Venn diagram of the set difference A − B]
Continuing with Example 2.1.1, suppose those students taking Econ 472 have already attended Econ 402, i.e., there is no student in the Econ 472 class who is taking Econ 402 now. Then if we talk about F ∩ G, the set will be empty. We will call such a set a null set and will denote it by φ. By definition, for any set A, A ∪ φ = A and A ∩ φ = φ.
Example 2.1.5: In Examples 2.1.1 and 2.1.2, D ∩ E = φ.
Earlier we noted that, in Example 2.1.1, D ⊂ C. Now remove the elements of D from the set C; what we are left with is the set E of Example 2.1.2. We write this as E = C − D; i.e., the difference between two sets consists of the elements of one set after removing those elements that belong to the other set. Formally,
Definition 2.1.5: The difference between two sets A and B, denoted by A − B, is defined as C = A − B = {x | x ∈ A and x ∉ B}. Note, "∉" means "does not belong to." In a Venn diagram, A − B can be represented as in Figure 2.1.2.
Now it is clear that a set consists of elements satisfying certain properties. We can imagine a big set which consists of elements with very little restriction. For example, in Example 2.1.1, regarding the sets C and D, we can think of R, the set of all real numbers. We will vaguely call such a big (reference) set a space and will denote it by S. Note here, C ⊂ S and D ⊂ S. So let S = R. Define Q = {set of all rational numbers}; then S − Q = {set of all irrational numbers}. Another way to think about S − Q is as the "complement" of Q in S, which is denoted as Q^c|S.
Definition 2.1.6: The complement of a set A with respect to a space S, denoted by A^c|S, is A^c|S = {x ∈ S | x ∉ A}.
In most cases, the reference set S will be obvious from the context; we will then omit S from the notation and write A^c|S simply as A^c.
Example 2.1.6: In Examples 2.1.1 and 2.1.2, D^c|C = E and E^c|C = D. See the Venn diagrams in Figure 2.1.3.
Consider the identity (A ∪ B)^c = A^c ∩ B^c. Even without the diagram, we can easily prove this. The trick is: if we want to show that a set C is equal to another set D, we show the following:
[Figure 2.1.3: Venn diagrams of the complements D^c|C and E^c|C]
If for every x, x ∈ C then x ∈ D ⟹ C ⊂ D.
If for every x, x ∈ D then x ∈ C ⟹ D ⊂ C.
Combining these two, we obtain C = D.

Let us prove the above identity. Let x ∈ (A ∪ B)^c, so x ∉ (A ∪ B), i.e., x ∉ A and x ∉ B. In other words, x ∈ A^c and x ∈ B^c, i.e., x ∈ A^c ∩ B^c. Therefore, (A ∪ B)^c ⊂ A^c ∩ B^c. Next assume x ∈ A^c ∩ B^c; reversing the above arguments, we see that x ∈ (A ∪ B)^c. So we have A^c ∩ B^c ⊂ (A ∪ B)^c. Hence (A ∪ B)^c = A^c ∩ B^c.
Now try to prove the following identity:

(A ∩ B)^c = A^c ∪ B^c.

These identities are known as De Morgan's laws. Try to prove the following generalizations:

(∪_i A_i)^c = ∩_i A_i^c,   (∩_i A_i)^c = ∪_i A_i^c.
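Identities like De Morgan's laws can also be verified mechanically on small finite sets; a Python sketch (the space S and the sets A, B here are arbitrary choices for illustration):

```python
# Check De Morgan's laws on a small reference space S (complements taken in S).
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(X):
    return S - X  # X^c|S (Definition 2.1.6)

assert complement(A | B) == complement(A) & complement(B)  # (A ∪ B)^c = A^c ∩ B^c
assert complement(A & B) == complement(A) | complement(B)  # (A ∩ B)^c = A^c ∪ B^c
```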
Let us now link up set theory with the concepts of "event" and "probability". Suppose we toss one coin twice. The coin has two sides, head (H) and tail (T). What are the possible outcomes?

Both tails (TT)
Both heads (HH)
Tail, head (TH)
Head, tail (HT)

Collect these together in a set Ω = {(TT), (HH), (TH), (HT)}; this is the collection of all possible outcomes. We may be interested in the following special outcomes:

A_1 = {outcomes with a head in the first toss} = {(HH), (HT)}.
A_2 = {outcomes with at least one head} = {(HH), (HT), (TH)}.
A_3 = {outcomes with no tail} = {(HH)}.
A_1, A_2, A_3, ... are all events, and note that A_1, A_2, A_3 ⊂ Ω. We can think of a collection of subsets of Ω, and a particular event will be an element of that collection. Under this framework we can define the probabilities of different events.
So far we have considered sets which are collections of single elements; e.g., we had a set C = {1, 2, 3, ...}. We can also think of a set whose elements are themselves sets, i.e., a set of sets. We will call this a collection or a class of sets. By giving different structure to this class of sets, we can define many concepts, such as ring and field. For our future purpose, all we need is the concept of a σ-field (sigma-field). This will be denoted by A (script A). A σ-field is nothing but a collection of sets A_1, A_2, A_3, ... satisfying the following properties:

(i) A_1, A_2, ... ∈ A ⟹ ∪_{i=1}^{∞} A_i ∈ A.

(ii) If A ∈ A, then A^c ∈ A.

In other words, A is closed under the formation of countable unions and under complementation. From the above two conditions, it is clear that for A to be a σ-field, the null set φ and the space Ω must belong to A.

Example 2.1.7: Ω = {1, 2, 3, 4}. A σ-field on Ω can be written as

A = {φ, (1, 2), (3, 4), (1, 2, 3, 4)}
Example 2.1.8: Ω = R (the real line), A = {countable unions of intervals like (a, b]}. A is called the Borel field, and members of A are called Borel sets in R.
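For the finite σ-field of Example 2.1.7, the two closure properties can be checked directly; a small Python sketch (with a finite collection, closure under pairwise unions already covers all countable unions):

```python
from itertools import combinations

# The sigma-field of Example 2.1.7 on Omega = {1, 2, 3, 4}.
Omega = frozenset({1, 2, 3, 4})
A = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), Omega}

# Closed under complementation: if E is in A, so is Omega - E.
assert all(Omega - E in A for E in A)

# Closed under unions: for a finite collection, pairwise closure suffices.
assert all(E1 | E2 in A for E1, E2 in combinations(A, 2))

# The null set and the whole space belong to A, as required.
assert frozenset() in A and Omega in A
```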
2.2 Random Variable.
As you can guess, the word "random" is associated with some sort of uncertainty. If we toss a coin, we know the possibilities: head (H) or tail (T); but we are uncertain about exactly which one will appear. Therefore, "tossing a coin" can be regarded as a random experiment where the possibilities are known but not the exact outcome. In probability theory, the collection of all possible outcomes is known as the sample space.
Example 2.2.1:
(i) Toss a coin. The sample space is Ω = {H, T}.
(ii) Toss two coins, or one coin twice; the sample space is Ω = {(HH), (TT), (HT), (TH)}.
(iii) Throw a die; the sample space Ω consists of the six faces of the die.
Instead of assigning symbols, we can give these outcomes some numbers (real numbers). For example, for Example (i) above, we can define

X = 0 if the outcome is T
  = 1 if the outcome is H

For Example (iii) above, X can take the values 1, 2, 3, 4, 5, 6. An X defined in such a way is called a random variable. Once a random variable is defined, we can talk about the probability distribution of the random variable.
Let us first formally define "probability". For Example (i), we have the sample space Ω = {H, T}. The σ-field defined on Ω is A = {φ, Ω, (H), (T)}. Elements of A are called events. "Probability" is nothing but an assignment of real numbers (satisfying some conditions) to each of these events.

Definition 2.2.1: Probability, denoted by P, is a function from A to [0, 1],

P : A → [0, 1],

satisfying the following axioms:

(i) P(Ω) = 1.

(ii) If A_1, A_2, A_3, ... ∈ A are disjoint (i.e., A_i ∩ A_j = φ for all i ≠ j), then P(∪_{i=1}^{∞} A_i) = Σ_{i=1}^{∞} P(A_i).
Example 2.2.2:

Ω = {H, T}
A = {φ, Ω, (H), (T)}
P(φ) = 0, P(Ω) = 1, P(H) = 1/2, P(T) = 1/2.
Earlier we indicated that a random variable can be defined by assigning real numbers to the elements of Ω. Now define a σ-field on the real line R and denote it by B. Formally, we can define a random variable X as

Definition 2.2.2: A random variable X is a function X : Ω → R such that for all B ∈ B, X^{−1}(B) ∈ A.

Note that here X^{−1}(B) = {ω ∈ Ω | X(ω) ∈ B}. For a diagrammatic representation of the random variable X as a function, see Figure 2.2.1.

In other words, X(·) is a measurable function from the sample space to the real line. "Measurability" is defined by requiring that the inverse image under X of every Borel set is an element of the σ-field, i.e., an event. Recall that probability is defined only for events. By requiring that X is measurable, in a sense we are assuring its probability distribution.
Example 2.2.3: Toss a coin twice; then the sample space Ω and a σ-field A can be defined as

Ω = {(HH), (TT), (HT), (TH)}

A = {φ, Ω, (HH), (TT), (HT), (TH), ((HH)(TT)), ((TT)(HT)), ((HT)(TH)), ((HH)(TH)), ((HH)(HT)), ((TT)(TH)), ((HH)(TT)(HT)), ((TT)(HT)(TH)), ((HH)(TT)(TH)), ((HH)(HT)(TH))}

Define X = number of heads. Then X takes 3 values: x = 0, 1, 2.
[Figure 2.2.1: The random variable X as a mapping from Ω to the real line R]
First assign the following probabilities:

P(HH) = 1/4, P(TT) = 1/4, P(HT) = 1/4, P(TH) = 1/4.
The triplet (Ω, A, P) is called a probability space, and P(A) is the probability of the event A.

Corresponding to (Ω, A, P), there exists another probability space (R, B, P^X), where P^X is defined as

P^X(B) = P[X^{−1}(B)] for B ∈ B.
In the above example, take B = {1}; then

P^X(1) = P[X^{−1}(1)]
       = P[(HT), (TH)]
       = P[(HT) ∪ (TH)]
       = P(HT) + P(TH)   (why?)
       = 1/4 + 1/4 = 1/2.

Similarly, we can show that P^X(0) = 1/4 and P^X(2) = 1/4.
P^X(·) is called the probability measure induced by X. To summarize, we have defined two functions:

X : Ω → R
P^X : B → [0, 1],

where B is a σ-field defined on R [see Example 2.1.8].

For the above example, the two functions can be described as:

ω            X(ω)   P^X
(TT)         0      1/4
(HT), (TH)   1      1/2
(HH)         2      1/4
The last two columns describe the probability distribution of the random
variable X. Sometimes we will simply denote it by P(X).
x P(X)
0 1/4
1 1/2
2 1/4
Most of the time, probability distributions (of discrete random variables) are presented this way. From the above discussion, it is clear that each such probability distribution originates from an Ω, the sample space of a random experiment.
Definition 2.2.3: The listing of the values along with the corresponding probabilities is called the probability distribution of a random variable.
Note: Strictly speaking, this definition applies to "discrete" random variables only. Later, we will define "discrete" and "continuous" random variables.
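The induced measure P^X can be computed by brute force for the two-toss example above; a small Python sketch (the names mirror the text, and `Fraction` keeps the arithmetic exact):

```python
from collections import Counter
from fractions import Fraction

# Two coin tosses: each outcome omega has probability 1/4, and
# X(omega) = number of heads induces the measure P^X on {0, 1, 2}.
Omega = ["HH", "HT", "TH", "TT"]
P = {w: Fraction(1, 4) for w in Omega}

def X(w):
    return w.count("H")

PX = Counter()
for w in Omega:
    PX[X(w)] += P[w]              # P^X(x) = P[X^{-1}(x)]

assert PX[0] == Fraction(1, 4)
assert PX[1] == Fraction(1, 2)
assert PX[2] == Fraction(1, 4)
assert sum(PX.values()) == 1      # total probability mass
```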
2.3 Distribution Function of a Random Variable.
Sometimes it is also called the cumulative probability distribution and is denoted by F(·). Let us denote by "x" the value(s) X can take; then F(·) is simply defined as

F(x) = probability of the event X ≤ x
     = Pr(X ≤ x).

Note: We will use "Pr(·)" to denote the probability of an event without defining the set explicitly, and P(·) or P^X(·) when the set is explicitly stated in the argument. Also note that the probability spaces for P and P^X are, respectively, (Ω, A, P) and (R, B, P^X).
Let us now provide a formal definition of the distribution function. Let

W(x) = {ω ∈ Ω | X(ω) ≤ x}.

Since X is measurable, W(x) ∈ A. In the probability space (R, B, P^X), we can write the probability of W(x) as

P(W(x)) = P^X[(−∞, x]].

This is well defined since (−∞, x] ∈ B. This probability is called the distribution function of X, i.e.,

F(x) = Pr(X ≤ x) = P(W(x)) = P^X[(−∞, x]].
For our example:

ω            x    P^X = Pr(X = x)    F(x) = Pr(X ≤ x)
(TT)         0    1/4                1/4
(HT), (TH)   1    1/2                1/4 + 1/2 = 3/4
(HH)         2    1/4                3/4 + 1/4 = 1

Or simply:

x    F(x)
0    1/4
1    3/4
2    1
If we plot it, F(x) will look as in Figure 2.3.1. Note that it is a step function. Also notice the discontinuities at x = 0, 1 and 2.
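The step-function behavior of F(x) can be checked numerically; a minimal Python sketch using the pmf of the two-toss example (`Fraction` keeps the values exact):

```python
from fractions import Fraction

# pmf of X = number of heads in two tosses (from the table above).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """Distribution function F(x) = Pr(X <= x)."""
    return sum(p for xi, p in pmf.items() if xi <= x)

assert F(-1) == 0                  # nothing to the left of 0
assert F(0) == Fraction(1, 4)
assert F(0.5) == Fraction(1, 4)    # flat between the jump points
assert F(1) == Fraction(3, 4)
assert F(2) == 1 == F(100)         # F reaches 1 and stays there
```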
2.3.1 Properties of the Distribution Function.

(i) 0 ≤ F(x) ≤ 1. Since F(x) is nothing but a probability, the result follows from the definition of probability.

(ii) F(x) is a nondecreasing function of x, i.e., if x_1 > x_2, then F(x_1) ≥ F(x_2).

Proof:
F(x_1) = Pr(X ≤ x_1) = P^X[(−∞, x_1]] = P^X(A_1) (say)
F(x_2) = Pr(X ≤ x_2) = P^X[(−∞, x_2]] = P^X(A_2) (say)

Since A_2 ⊂ A_1, we have P^X(A_1) ≥ P^X(A_2) (why?), i.e., F(x_1) ≥ F(x_2).
[Figure 2.3.1: Graph of the step function F(x), with jumps at x = 0, 1 and 2]
(iii) F(−∞) = 0, where F(−∞) = lim_{n→∞} F(−n).

Proof: Define the event

A_n = {ω ∈ Ω | X(ω) ≤ −n}

Note that P(A_n) = Pr(X ≤ −n) = P^X[(−∞, −n]] = F(−n). Now lim_{n→∞} A_n = φ, so

F(−∞) = lim_{n→∞} F(−n) = lim_{n→∞} P(A_n)
      = P(lim_{n→∞} A_n)   (why?)
      = P(φ) = 0.   (why?)

Note: The first (why?) follows from the "continuity" property of P(·). It says: if {A_n} is a monotone sequence of events, then P(lim_{n→∞} A_n) = lim_{n→∞} P(A_n). (Try to prove this; see Workout Examples-I, Question 6.)
(iv) F(∞) = 1, where F(∞) = lim_{n→∞} F(n).

The proof is similar to (iii). Define

A_n = {ω ∈ Ω | X(ω) ≤ n}

F(∞) = lim_{n→∞} P(A_n) = P(lim_{n→∞} A_n) = P(Ω) = 1.
(v) For all x, F(x) is continuous to the right, or right continuous. [What this really means is that F(x + 0) = F(x), where F(x + 0) = lim_{ε↓0} F(x + ε).]

Proof: Define the set

A_n = {ω ∈ Ω | X(ω) ≤ x + 1/n}

so that

F(x + 1/n) = P(A_n)

lim_{n→∞} F(x + 1/n) = lim_{n→∞} P^X[(−∞, x + 1/n]] = P^X[(−∞, x]] = F(x).

Also,

F(x + 0) = lim_{ε↓0} F(x + ε) = lim_{n→∞} F(x + 1/n).

Therefore, F(x + 0) = F(x).
We can show that F(x) may not be continuous to the left, i.e., F(x − 0) ≠ F(x), where F(x − 0) = lim_{ε↓0} F(x − ε). To see this, define

B_n = {ω ∈ Ω | X(ω) ≤ x − 1/n}

F(x − 0) = lim_{n→∞} F(x − 1/n) = lim_{n→∞} P(B_n)
        = P(lim_{n→∞} B_n) = P(ω ∈ Ω | X(ω) < x) = Pr(X < x).

However,

F(x) = Pr(X ≤ x) = Pr(X < x) + Pr(X = x)   (why?)

Hence,

F(x) − F(x − 0) = Pr(X = x).

Therefore, whenever Pr(X = x) > 0, there will be a jump in F(x) at X = x, i.e., a discontinuity at X = x. In Figure 2.3.1, we noted the discontinuities at x = 0, 1, and 2. Also note that

Pr(X = 0) = 1/4 > 0
Pr(X = 1) = 1/2 > 0
Pr(X = 2) = 1/4 > 0

If Pr(X = x) = 0 for all x, then F(x) will be continuous since, in that case, F(x) = F(x + 0) = F(x − 0) for all x.
2.4 Probability Mass and Density Functions.

Once we have defined the distribution function, we can talk about the "probability mass function" (for discrete variables) and the "probability density function" (for continuous variables).

Let Ω contain a finite (or countably infinite) number of elements. Here, by countably infinite we mean a one-to-one correspondence with the set of natural numbers, N = {1, 2, 3, ...}. To see an example, consider an experiment of tossing a coin until we get a head. Then Ω = {H, TH, TTH, ...}. If we define X as the number of trials to get a head, then X = 1, 2, 3, .... Denote Ω = {ω_1, ω_2, ω_3, ...}. Therefore, Ω contains discrete points. For any event A ∈ A, we define the probability

P(A) = Σ_{ω_i ∈ A} P(ω_i).

A random variable X constructed on Ω will then also take discrete values. Let us now denote the range of X as 𝒳 and the associated probability space as (𝒳, B, P^X). Therefore, we will have a discrete random variable X with a discrete probability distribution P^X. Given that

P^X(𝒳) = 1,

the total mass will be distributed on a discrete number of points. Therefore, the probability distribution of X, P^X, is sometimes called the probability mass function (pmf).
Example 2.4.1:

Ω = {(HH), (TT), (HT), (TH)}
X = # heads

Then

x    P^X
0    1/4
1    1/2
2    1/4

i.e., P^X(𝒳) = Σ_{i=1}^{3} Pr(X = x_i) = 1.
Example 2.4.2:

(i) Toss a coin n times and let X = # heads. Then X takes (n + 1) values, namely X = 0, 1, 2, ..., n. The probability distribution of X, with the corresponding points in the sample space, can be written as

ω              X      P^X
TTTT...TTT     0      (1/2)^n
HTTT...TTT     1      (1/2)^n
THTT...TTT     1      (1/2)^n    (these n rows add up to n(1/2)^n)
...
TTTT...TTH     1      (1/2)^n
HHTT...TTT     2      (1/2)^n
THHT...TTT     2      (1/2)^n    (these (n choose 2) rows add up to (n choose 2)(1/2)^n)
...
TTTT...THH     2      (1/2)^n
...
THHH...HHH     n−1    (1/2)^n
HTHH...HHH     n−1    (1/2)^n    (these n rows add up to n(1/2)^n)
...
HHHH...HHT     n−1    (1/2)^n
HHHH...HHH     n      (1/2)^n

So here Pr(X = 1) = n(1/2)^n, Pr(X = 2) = (n choose 2)(1/2)^n, and so on. Later we will derive this probability distribution simply as a special case of the binomial distribution. Check here that if we add P^X over all the values of X, it is equal to one.
(ii) Let us now consider our earlier example of tossing a coin until we get a head, and define X = # tosses needed to get the head. Then X will take a countably infinite number of values, with the following probability distribution:

ω      x    P^X
H      1    1/2
TH     2    (1/2)^2
TTH    3    (1/2)^3
...    ...  ...

It is easy to check that here the total probability is equal to 1/2 + (1/2)^2 + (1/2)^3 + ... = 1.
(iii) Now suppose X takes n values, (x_1, x_2, ..., x_n) = {x_i, i = 1, 2, ..., n}. Let Pr(X = x_i) = p_i, i = 1, 2, ..., n. The distribution function for this probability mass function is

F(x) = Pr(X ≤ x) = Σ_{x_i ≤ x} p_i.

Any set of p_i's can serve our purpose. All we need is to satisfy the following two conditions:

(i) p_i ≥ 0 for all i.
(ii) Σ_i p_i = 1.
As we noted before, when the distribution is discrete, there will be jumps in F(x); therefore it will not be continuous and hence not differentiable. Now suppose F(x) is continuous and differentiable except at a few points, and

f(x) = dF(x)/dx,

where f(x) is a continuous function (except at a few points). We will then call X a continuous random variable with probability density function (p.d.f.) f(x). Therefore, the relation between f(x) and F(x) can also be written as

F(x) = ∫_{−∞}^{x} f(t) dt.
Recall F(∞) = 1; therefore

∫_{−∞}^{∞} f(x) dx = 1.

Also, we noted earlier that F(x) is nondecreasing; therefore we should have f(x) ≥ 0 for all x. We define f(x) to be a pdf of a continuous random variable X if the following two conditions are satisfied:

(i) f(x) ≥ 0 for all x ∈ 𝒳.
(ii) ∫_{−∞}^{∞} f(x) dx = ∫_{𝒳} f(x) dx = 1.

Note: Here 𝒳 denotes the range of X.

For a continuous random variable X,

Pr(a ≤ X ≤ b) = Pr[X ≤ b] − Pr[X ≤ a]
             = F(b) − F(a)
             = ∫_{−∞}^{b} f(x) dx − ∫_{−∞}^{a} f(x) dx
             = ∫_{a}^{b} f(x) dx.
Note that for the discrete case, this probability can be written as

Pr(a ≤ X ≤ b) = Σ_{a ≤ x_i ≤ b} Pr(X = x_i).

When F is continuous, Pr(X = a) = F(a) − F(a − 0) = 0. Therefore, for the continuous case, Pr(a ≤ X ≤ b) = Pr(a < X ≤ b) = Pr(a ≤ X < b) = Pr(a < X < b). [See Figure 2.4.1.]
[Figure 2.4.1: Pr(a ≤ X ≤ b) as the area under f(x) between a and b]
Example 2.4.3: Let

F(x) = 0 for x < 0
     = x for x ∈ [0, 1]
     = 1 for x > 1,

as given in Figure 2.4.2. Here F(x) is "differentiable," so we can construct f(x) as

f(x) = 0 for x < 0
     = 1 for x ∈ [0, 1]
     = 0 for x > 1.

Simply, we can write this as [see Figure 2.4.3]

f(x) = 1 for x ∈ [0, 1]
     = 0 elsewhere.

Here X is a continuous random variable; however, note the discontinuities of f(x) at 0 and 1. This distribution is known as the uniform distribution [since for x ∈ [0, 1], f(x) is constant].
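The uniform example can be verified numerically; a Python sketch comparing Pr(a ≤ X ≤ b) = F(b) − F(a) with a midpoint Riemann sum of f over [a, b] (the endpoints a = 0.2, b = 0.7 are arbitrary choices):

```python
# Uniform distribution on [0, 1]: F(x) = x there, and f(x) = F'(x) = 1.
def F(x):
    return 0.0 if x < 0 else (x if x <= 1 else 1.0)

def f(x):
    return 1.0 if 0 <= x <= 1 else 0.0

# Pr(a <= X <= b) = F(b) - F(a); compare with a midpoint Riemann sum of f.
a, b, N = 0.2, 0.7, 100_000
step = (b - a) / N
riemann = sum(f(a + (i + 0.5) * step) for i in range(N)) * step

assert abs((F(b) - F(a)) - riemann) < 1e-9
assert abs((F(b) - F(a)) - (b - a)) < 1e-12   # here the probability is just b - a
```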
So far we have talked about variables which are either discrete or continuous. A random variable, however, could be of mixed type. Let

X = expenditure on cars.

If we assume X is continuous, then Pr(X = 0) = 0. But there will be many individuals who do not have any expenditure on cars. Suppose half of the people have no expenditure on cars during a certain period; then it is reasonable to put Pr(X = 0) = 0.5. Suppose we assume F(x) = 0.5 + 0.5(1 − e^{−x}) for x > 0, and F(x) = 0 for x < 0. The corresponding probability function is [see Figure 2.4.4]

Pr(X < 0) = 0
Pr(X = 0) = 0.5
f(x) = 0.5e^{−x} for x > 0.
[Figure 2.4.2: The distribution function F(x) of the uniform distribution]

[Figure 2.4.3: The density f(x) of the uniform distribution]

[Figure 2.4.4: The density and the probability mass at zero for the mixed distribution]
Note that here f(x) ≥ 0, and the total probability is

Pr(X = 0) + ∫_{0}^{∞} f(x) dx = 0.5 + 0.5 ∫_{0}^{∞} e^{−x} dx = 1.0.

Hence, this is a well-defined probability distribution.
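The same check works for the mixed distribution; a Python sketch approximating the integral of f(x) = 0.5e^{−x} numerically and adding the probability mass at zero (the truncation at x = 40 and the grid size are arbitrary choices):

```python
from math import exp

# Mixed distribution: an atom Pr(X = 0) = 0.5 plus density f(x) = 0.5 e^{-x} on x > 0.
atom = 0.5

def f(x):
    return 0.5 * exp(-x)

# Approximate the integral of f over (0, infinity) by a midpoint rule on (0, 40];
# the tail beyond 40 is negligible (about 0.5 e^{-40}).
N, upper = 100_000, 40.0
h = upper / N
integral = sum(f((i + 0.5) * h) for i in range(N)) * h

assert abs(atom + integral - 1.0) < 1e-6    # total probability is 1
```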
2.5 Conditional Probability Distribution.

Let us consider two events A, B ∈ A. We are interested in evaluating the probability of A only for those cases when B also occurs. We will denote this probability as P(A|B) and will assume P(B) > 0. We can treat B as the (total) sample space. First note that

P(A|B) = P(A ∩ B|B).

This is true because, when Ω is the sample space,

P(A) = P(A|Ω) = P(A ∩ Ω|Ω).

Here B is our sample space. Also note that P(B|B) = 1. Now,

P(A|B) = P(A ∩ B|B) = P(A ∩ B|B)/P(B|B) = P(A ∩ B|Ω)/P(B|Ω)   (why?)
       = P(A ∩ B)/P(B).

We will write this conditional probability simply as P(A|B) = P(A ∩ B)/P(B). This is called the conditional probability of (event) A given (event) B.

Note: (Above why?) Use the old (classical) definition of probability:

P(A ∩ B|B) = #cases for A ∩ B / #cases for B
           = (#cases for A ∩ B / #cases in Ω) / (#cases for B / #cases in Ω)
           = P(A ∩ B|Ω) / P(B|Ω).
Example 2.5.1: Let

Ω = {(HH), (TT), (HT), (TH)}

and A = {(HT)}, B = {(HT), (TH)}, so that A ∩ B = {(HT)}. Therefore,

P(A) = 1/4, P(B) = 1/2, P(A ∩ B) = 1/4.

Let us first intuitively find the conditional probabilities. For (A|B), we know that either (HT) or (TH) has appeared, and we want to find the probability that (HT) has occurred. Since all the elements of Ω have equal probability, P(A|B) = 1/2. Similarly, P(B|A) = 1, since (HT) has already occurred. Now let us use the formula to get the conditional probabilities:

P(A|B) = P(A ∩ B)/P(B) = (1/4)/(1/2) = 1/2 ≠ P(A)
P(B|A) = P(A ∩ B)/P(A) = (1/4)/(1/4) = 1 ≠ P(B)   (Interpret this result.)

Here the probability of A (or B) changes after it has been given that B (or A) has appeared. In such a case, we say that the two events A and B are dependent.
Example 2.5.2: Let us continue with the same sample space

Ω = {(HH), (TT), (HT), (TH)}

but now assume A = {(TT), (HT)} and B = {(HT), (TH)}. We have A ∩ B = {(HT)}. Therefore,

P(A) = 1/2, P(B) = 1/2, P(A ∩ B) = 1/4.

P(A|B) = P(A ∩ B)/P(B) = (1/4)/(1/2) = 1/2 = P(A)   (Interpret this result.)
P(B|A) = P(A ∩ B)/P(A) = (1/4)/(1/2) = 1/2 = P(B)

Therefore, we have P(A|B) = P(A), i.e., P(A ∩ B) = P(A)·P(B). In this case, we say that A and B are independent.
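Both examples can be reproduced with a few lines of Python; a sketch of the conditional-probability formula P(A|B) = P(A ∩ B)/P(B) on the two-toss sample space:

```python
from fractions import Fraction

# Two-toss sample space with equally likely outcomes (Examples 2.5.1-2.5.2).
Omega = {"HH", "TT", "HT", "TH"}

def P(E):
    return Fraction(len(E), len(Omega))     # classical probability

def cond(A, B):
    """P(A|B) = P(A ∩ B) / P(B), assuming P(B) > 0."""
    return P(A & B) / P(B)

B = {"HT", "TH"}

A1 = {"HT"}                                 # Example 2.5.1: dependent events
assert cond(A1, B) == Fraction(1, 2) != P(A1)
assert cond(B, A1) == 1 != P(B)

A2 = {"TT", "HT"}                           # Example 2.5.2: independent events
assert cond(A2, B) == P(A2) == Fraction(1, 2)
assert P(A2 & B) == P(A2) * P(B)
```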
Result 2.5.1: Conditional probability satisfies the axioms of probability.
20
28
Proof:
(i) $P(A|B) = \frac{P(A \cap B)}{P(B)} \geq 0.$
(ii) $P(\Omega|B) = \frac{P(\Omega \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1.$
(iii) Let $A_1, A_2, A_3, \ldots$ be a sequence of disjoint events. Then
$$P\left(\bigcup_{i=1}^{\infty} A_i \,\Big|\, B\right) = \frac{P\left(\bigcup_{i=1}^{\infty} (A_i \cap B)\right)}{P(B)} = \frac{\sum_{i=1}^{\infty} P(A_i \cap B)}{P(B)} = \sum_{i=1}^{\infty} \frac{P(A_i \cap B)}{P(B)} = \sum_{i=1}^{\infty} P(A_i|B).$$
Note that the $(A_i \cap B)$'s are disjoint, since $(A_i \cap B) \cap (A_j \cap B) = A_i \cap A_j \cap B = \phi$ for $i \neq j$.
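Axiom (iii) can be spot-checked on a finite sample space. The sketch below (the die-roll events are chosen purely for illustration) verifies that $P(\cup_i A_i | B) = \sum_i P(A_i|B)$ for disjoint $A_i$, again with exact rational arithmetic:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # throw a fair die
B = {2, 4, 6}               # condition on an even outcome

def cond_prob(A, B):
    """P(A|B) by counting, assuming equally likely outcomes and P(B) > 0."""
    return Fraction(len(A & B), len(B))

# three disjoint events
A1, A2, A3 = {1, 2}, {3, 4}, {5}

lhs = cond_prob(A1 | A2 | A3, B)
rhs = cond_prob(A1, B) + cond_prob(A2, B) + cond_prob(A3, B)
print(lhs == rhs)  # True: additivity over disjoint events holds
```

Here both sides equal $\frac{2}{3}$: the union meets $B$ in $\{2, 4\}$, while the individual terms contribute $\frac{1}{3} + \frac{1}{3} + 0$.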
Note: Conditional distributions are very useful in many practical applications, such as:
(i) Forecasting: Given data on $T$ periods, $X_1, X_2, \ldots, X_T$, if we want to forecast the value in the $(T+1)$th period, that could be obtained from the conditional distribution $P(X_{T+1}|X_1, X_2, \ldots, X_T)$.
(ii) Duration dependence: We can consider the conditional probability of getting a job given the duration of unemployment.
(iii) Wage differential: Wage distributions could be different for unionized and nonunionized workers.
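Application (i) can be made concrete with a toy example. If we are willing to assume the series is Markov (an assumption, not something the notes impose), the forecast distribution $P(X_{T+1}|X_1, \ldots, X_T)$ reduces to $P(X_{T+1}|X_T)$, which can be estimated from transition counts; the data and state labels below are invented for illustration:

```python
from collections import Counter

# toy binary series, e.g. 1 = "recession quarter", 0 = "expansion quarter"
x = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0]

# count transitions (x_t, x_{t+1}) to estimate P(next = j | current = i)
pairs = Counter(zip(x[:-1], x[1:]))

def cond_forecast(i, j):
    """Estimated P(X_{T+1} = j | X_T = i) from relative transition frequencies."""
    total = sum(count for (a, _), count in pairs.items() if a == i)
    return pairs.get((i, j), 0) / total

last = x[-1]
print({j: cond_forecast(last, j) for j in (0, 1)})  # forecast distribution for X_{T+1}
```

This is the simplest instance of forecasting via a conditional distribution; richer models condition on more of the history $X_1, \ldots, X_T$.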