


Probability and
Stochastic Processes

Features of this Text

Who will benefit from using this text?

This text can be used in junior- or senior-level courses in probability and stochastic
processes. The mathematical exposition will appeal to students and practitioners in
many areas. The examples, quizzes, and problems are typical of those encountered
by practicing electrical and computer engineers. Professionals in the telecommuni-
cations and wireless industry will find it particularly useful.

What's New?

This text has been expanded with new introductory material:

- Over 160 new homework problems.
- New chapters on Sequential Trials, Derived Random Variables, and Conditional Probability Models.
- MATLAB examples and problems give students hands-on access to theory and applications. Every chapter includes guidance on how to use MATLAB to perform calculations and simulations relevant to the subject of the chapter.
- Advanced material online in the Signal Processing and Markov Chains supplements.
Notable Features

The Friendly Approach

The friendly and accessible writing style gives students an intuitive feeling for
the formal mathematics.

Quizzes and Homework Problems

An extensive collection of in-chapter quizzes provides checkpoints for read-
ers to gauge their understanding. Hundreds of end-of-chapter problems are
clearly marked as to their degree of difficulty from beginner to expert.

Student Companion Website www.wiley.com/college/yates

Available for download: all MATLAB m-files in the text, the Quiz Solutions
Manual, a Student Solutions Manual, the Signal Processing Supplement, and
the Markov Chains Supplement.

Instructor Support

Instructors can register for the Instructor Companion Site at
www.wiley.com/college/yates.

Probability and
Stochastic Processes
A Friendly Introduction
for Electrical and Computer Engineers

Third Edition

Roy D. Yates
Rutgers, The State University of New Jersey

David J. Goodman
New York University


V.P. & Executive Publisher   Don Fowley

Executive Editor    Dan Sayre
Sponsoring Editor   Mary O'Sullivan
Project Editor      Ellen Keohane
Production Editor   Eugenia Lee
Cover Designer      Samantha Low

This book was set in Computer Modern by the authors using LaTeX and printed and bound
by RR Donnelley. The cover was printed by RR Donnelley.

About the cover: The cover shows a circumhorizontal arc. As noted in Wikipedia, this is an
ice halo formed by plate-shaped ice crystals in high-level cirrus clouds. The misleading term
"fire rainbow" is sometimes used to describe this rare phenomenon, although it is neither a
rainbow, nor related in any way to fire.

This book is printed on acid-free paper.

Founded in 1807, John Wiley & Sons, Inc. has been a valued source of knowledge and
understanding for more than 200 years, helping people around the world meet their needs
and fulfill their aspirations. Our company is built on a foundation of principles that include
responsibility to the communities we serve and where we live and work. In 2008, we
launched a Corporate Citizenship Initiative, a global effort to address the environmental,
social, economic, and ethical challenges we face in our business. Among the issues we are
addressing are carbon impact, paper specifications and procurement, ethical conduct within
our business and among our vendors, and community and charitable support. For more
information, please visit our website: www.wiley.com/go/citizenship.

Copyright 2014 John Wiley & Sons, Inc. All rights reserved. No part of this publication may be
reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic,
mechanical, photocopying, recording, scanning or otherwise, except as permitted under
Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written
permission of the Publisher, or authorization through payment of the appropriate per-copy fee
to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, website
www.copyright.com. Requests to the Publisher for permission should be addressed to the
Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-
5774, (201) 748-6011, fax (201) 748-6008, website www.wiley.com/go/permissions.

Evaluation copies are provided to qualified academics and professionals for review purposes
only, for use in their courses during the next academic year. These copies are licensed and
may not be sold or transferred to a third party. Upon completion of the review period,
please return the evaluation copy to Wiley. Return instructions and a free-of-charge return
mailing label are available at www.wiley.com/go/returnlabel. If you have chosen to adopt
this textbook for use in your course, please accept this book as your complimentary desk
copy. Outside of the United States, please contact your local sales representative.

ISBN 978-1-118-32456-1
Printed in t he United States of _A.n1erica
10 9 8 7 6 5 4 3 2 1

To Alissa, Brett, Daniel, Hannah, Leila, Milo, Theresa,

Tony, and Zach

Welcome to the third edition

You are reading the third edition of our textbook. Although the fundamentals of
probability and stochastic processes have not changed since we wrote the first edi-
tion, the world inside and outside universities is different now than it was in 1998.
Outside of academia, applications of probability theory have expanded enormously
in the past 16 years. Think of the 20 billion+ Web searches each month and the bil-
lions of daily computerized stock exchange transactions, each based on probability
models, many of them devised by electrical and computer engineers.

Universities and secondary schools, recognizing the fundamental importance of
probability theory to a wide range of subject areas, are offering courses in the sub-
ject to younger students than the ones who studied probability 16 years ago. At
Rutgers, probability is now a required course for Electrical and Computer Engi-
neering sophomores.

We have responded in several ways to these changes and to the suggestions of
students and instructors who used the earlier editions. The first and second editions
contain material found in postgraduate as well as advanced undergraduate courses.
By contrast, the printed and e-book versions of this third edition focus on the
needs of undergraduates studying probability for the first time. The more advanced
material in the earlier editions, covering random signal processing and Markov
chains, is available at the companion website (www.wiley.com/college/yates).

To promote intuition into the practical applications of the mathematics, we have
expanded the number of examples, quizzes, and homework problems to about
600, an increase of about 35 percent compared to the second edition. Many of the
examples are mathematical exercises. Others are questions that are simple versions
of the ones encountered by professionals working on practical applications.

How the book is organized

Motivated by our teaching experience, we have rearranged the sequence in which
we present the elementary material on probability models, counting methods, con-
ditional probability models, and derived random variables. In this edition, the first
chapter covers fundamentals, including axioms and probability of events, and the
second chapter covers counting methods and sequential experiments. As before, we
introduce discrete random variables and continuous random variables in separate
chapters. The subject of Chapter 5 is multiple discrete and continuous random
variables. The first and second editions present derived random variables and con-
ditional random variables in the introductions to discrete and continuous random
variables. In this third edition, derived random variables and conditional random
variables appear in their own chapters, which cover both discrete and continuous
random variables.

Chapter 8 introduces random vectors. It extends the material on multiple ran-
dom variables in Chapter 5 and relies on principles of linear algebra to derive
properties of random vectors that are useful in real-world data analysis and simula-
tions. Chapter 12 on estimation relies on the properties of random vectors derived
in Chapter 8. Chapters 9 through 12 cover subjects relevant to data analysis, in-
cluding Gaussian approximations based on the central limit theorem, estimates of
model parameters, hypothesis testing, and estimation of random variables. Chap-
ter 13 introduces stochastic processes in the context of the probability model that
guides the entire book: an experiment consisting of a procedure and observations.

Each of the 92 sections of the 13 chapters ends with a quiz. By working on
the quiz and checking the solution at the book's website, students will get quick
feedback on how well they have grasped the material in each section.

We think that 60-80% (7 to 10 chapters) of the book would fit into a one-semester
undergraduate course for beginning students in probability. We anticipate that all
courses will cover the first five chapters, and that instructors will select the remain-
ing course content based on the needs of their students. The "roadmap" on page ix
displays the thirteen chapter titles and suggests a few possible undergraduate syllabi.

The Signal Processing Supplement (SPS) and Markov Chains Supplement (MCS)
are the final chapters of the third edition. They are now available at the book's
website. They contain postgraduate-level material. We, and colleagues at other uni-
versities, have used these two chapters in graduate courses that move very quickly
through the early chapters to review material already familiar to students and to
fill in gaps in learning of diverse postgraduate populations.

What is distinctive about this book?

- The entire text adheres to a single model that begins with an experiment
  consisting of a procedure and observations.
- The mathematical logic is apparent to readers. Every fact is identified clearly
  as a definition, an axiom, or a theorem. There is an explanation, in simple
  English, of the intuition behind every concept when it first appears in the
  text.
- The mathematics of discrete random variables is introduced separately from
  the mathematics of continuous random variables.
- Stochastic processes and statistical inference fit comfortably within the uni-
  fying model of the text.
- An abundance of exercises puts the theory to use. New ideas are augmented
  with detailed solutions of numerical examples.
- Each section begins with a brief statement of the important concepts intro-
  duced in the section and concludes with a simple quiz to help students gauge
  their grasp of the new material.


1. Experiments, models, probabilities
2. Sequential experiments
3. Discrete random variables
4. Continuous random variables
5. Multiple random variables
6. Derived random variables
7. Conditional probability models

Possible continuations after Chapter 7:
- 9. Sums of random variables; 10. The sample mean
- 8. Random vectors; 12. Estimation
- 8. Random vectors; 11. Hypothesis testing; 13. Stochastic processes

A road map for the text.

Each problem at the end of a chapter is labeled with a reference to a section in
the chapter and a degree of difficulty ranging from "easy" to "experts only."
For example, Problem 3.4.5 requires material from Section 3.4 but not from
later sections. Each problem also has a label that reflects our estimate of
degree of difficulty. Skiers will recognize the following symbols:

Easy    Moderate    Difficult    Experts Only

Every ski area emphasizes that these designations are relative to the trails
at that area. Similarly, the difficulty of our problems is relative to the other
problems in this text.

There is considerable support on the World Wide Web for students and in-
structors, including MATLAB programs and solutions to the quizzes and problems.

Further Reading

Libraries and bookstores contain an endless collection of textbooks at all levels cov-
ering the topics presented in this textbook. We know of two in comic book format
[GS93, Pos01]. The reference list on page 489 is a brief sampling of books that
can add breadth or depth to the material in this text. Most books on probability,
statistics, stochastic processes, and random signal processing contain expositions of
the basic principles of probability and random variables, covered in Chapters 1-5.

In advanced texts, these expositions serve mainly to establish notation for more
specialized topics. [LG11] and [Gub06] share our focus on electrical and computer
engineering applications. [BT08], [Ros12], and [Dra67] introduce the funda-
mentals of probability and random variables to a general audience of students with
a calculus background. [KMT12] is a comprehensive graduate-level textbook with
a thorough presentation of fundamentals of probability, stochastic processes, and
data analysis. It uses the basic theory to develop techniques including hidden
Markov models, queuing theory, and machine learning used in many practical ap-
plications. [Bil12] is more advanced mathematically; it presents probability as a
branch of measure theory. [MR10] and [SMM10] introduce probability theory in
the context of data analysis. [Dav10] and [HL11] are beginners' introductions to
MATLAB. [Ber98] is in a class by itself. It presents the concepts of probability from
a historical perspective, focusing on the lives and contributions of mathematicians
and others who stimulated major advances in probability and statistics and their
application in various fields including psychology, economics, government policy,
and risk management.


We are grateful for assistance and suggestions from many sources including our stu-
dents at Rutgers and New York universities, instructors who adopted the previous
editions, reviewers, and the Wiley team.

At Wiley, we are pleased to acknowledge the encouragement and enthusiasm
of our executive editor Daniel Sayre and the support of sponsoring editor Mary
O'Sullivan, project editor Ellen Keohane, production editor Eugenia Lee, and cover
designer Samantha Low.

We also convey special thanks to Ivan Seskar of WINLAB at Rutgers University
for exercising his magic to make the WINLAB computers particularly hospitable
to the electronic versions of the book and to the supporting material on the World
Wide Web.

The organization and content of the second edition has benefited considerably
from the input of many faculty colleagues including Alhussein Abouzeid at Rens-
selaer Polytechnic Institute, Krishna Arora at Florida State University, Frank
Candocia at Florida International University, Robin Carr at Drexel University,
Keith Chugg at USC, Charles Doering at University of Michigan, Roger Green
at North Dakota State University, Witold Krzymien at University of Alberta,
Edl Schamiloglu at University of New Mexico, Arthur David Snider at Univer-
sity of South Florida, Junshan Zhang at Arizona State University, and colleagues
Narayan Mandayam, Leo Razumov, Christopher Rose, Predrag Spasojevic, and
Wade Trappe at Rutgers.
Unique among our teaching assistants, Dave Famolari took the course as an
undergraduate. Later as a teaching assistant, he did an excellent job writing home-
work solutions with a tutorial flavor. Other graduate students who provided valu-
able feedback and suggestions on the first edition include Ricki Abboudi, Zheng
Cai, Pi-Chun Chen, Sorabh Gupta, Vahe Hagopian, Amar Mahboob, Ivana Maric,
David Pandian, Mohammad Saquib, Sennur Ulukus, and Aylin Yener.

The first edition also benefited from reviews and suggestions conveyed to the
publisher by D.L. Clark at California State Polytechnic University at Pomona,
Mark Clements at Georgia Tech, Gustavo de Veciana at the University of Texas at
Austin, Fred Fontaine at Cooper Union, Rob Frohne at Walla Walla College, Chris
Genovese at Carnegie Mellon, Simon Haykin at McMaster, and Ratnesh Kumar at
the University of Kentucky.

Finally, we acknowledge with respect and gratitude the inspiration and guidance
of our teachers and mentors who conveyed to us when we were students the im-
portance and elegance of probability theory. We cite in particular Robert Gallager
and the late Alvin Drake of MIT and the late Colin Cherry of Imperial College of
Science and Technology.

A Message to Students from the Authors

A lot of students find it hard to do well in this course. We think there are a few
reasons for this difficulty. One reason is that some people find the concepts hard
to use and understand. Many of them are successful in other courses but find
the ideas of probability difficult to grasp. Usually these students recognize that
learning probability theory is a struggle, and most of them work hard enough to do
well. However, they find themselves putting in more effort than in other courses to
achieve similar results.

Other people have the opposite problem. The work looks easy to them, and
they understand everything they hear in class and read in the book. There are
good reasons for assuming this is easy material. There are very few basic concepts
to absorb. The terminology (like the word probability), in most cases, contains
familiar words. With a few exceptions, the mathematical manipulations are not
complex. You can go a long way solving problems with a four-function calculator.

For many people, this apparent simplicity is dangerously misleading because it
is very tricky to apply the math to specific problems. A few of you will see things
clearly enough to do everything right the first time. However, most people who
do well in probability need to practice with a lot of examples to get comfortable
with the work and to really understand what the subject is about. Students in
this course end up like elementary school children who do well with multiplication
tables and long division but bomb out on word problems. The hard part is figuring
out what to do with the numbers, not actually doing it. Most of the work in this
course is that way, and the only way to do well is to practice a lot. Taking the
midterm and final are similar to running in a five-mile race. Most people can do it
in a respectable time, provided they train for it. Some people look at the runners
who do it and say, "I'm as strong as they are. I'll just go out there and join in."
Without the training, most of them are exhausted and walking after a mile or two.

So, our advice to students is, if this looks really weird to you, keep working at
it. You will probably catch on. If it looks really simple, don't get too complacent.
It may be harder than you think. Get into the habit of doing the quizzes and
problems, and if you don't answer all the quiz questions correctly, go over them
until you understand each one.
We can't resist commenting on the role of probability and stochastic processes
in our careers. The theoretical material covered in this book has helped both of
us devise new communication techniques and improve the operation of practical
systems. We hope you find the subject intrinsically interesting. If you master the
basic ideas, you will have many opportunities to apply them in other courses and
throughout your career.

We have worked hard to produce a text that will be useful to a large population
of students and instructors. We welcome comments, criticism, and suggestions.
Feel free to send us e-mail at ryates@winlab.rutgers.edu or dgoodman@poly.edu. In
addition, the website www.wiley.com/college/yates provides a variety of supple-
mental materials, including the MATLAB code used to produce the examples in the
text.

Roy D. Yates David J. Goodman

Rutgers, The State University of New Jersey    New York University

September 27, 2013


Features of this Text i
Preface vii

1 Experiments, Models, and Probabilities 1

Getting Started with Probability 1
1.1 Set Theory 3
1.2 Applying Set Theory to Probability 7
1.3 Probability Axioms 11
1.4 Conditional Probability 15
1.5 Partitions and the Law of Total Probability 18
1.6 Independence 24
1.7 MATLAB 27
Problems 29

2 Sequential Experiments 35
2.1 Tree Diagrams 35
2.2 Counting Methods 40
2.3 Independent Trials 49
2.4 Reliability Analysis 52
2.5 MATLAB 55
Problems 57

3 Discrete Random Variables 62

3.1 Definitions 62
3.2 Probability Mass Function 65
3.3 Families of Discrete Random Variables 68
3.4 Cumulative Distribution Function (CDF) 77
3.5 Averages and Expected Value 80
3.6 Functions of a Random Variable 86
3.7 Expected Value of a Derived Random Variable 90
3.8 Variance and Standard Deviation 93
3.9 MATLAB 99
Problems 106


4 Continuous Random Variables 118

4.1 Continuous Sample Space 118
4.2 The Cumulative Distribution Function 121
4.3 Probability Density Function 123
4.4 Expected Values 128
4.5 Families of Continuous Random Variables 132
4.6 Gaussian Random Variables 138
4.7 Delta Functions, Mixed Random Variables 145
4.8 MATLAB 152
Problems 154

5 Multiple Random Variables 162

5.1 Joint Cumulative Distribution Function 163
5.2 Joint Probability Mass Function 166
5.3 Marginal PMF 169
5.4 Joint Probability Density Function 171
5.5 Marginal PDF 177
5.6 Independent Random Variables 178
5.7 Expected Value of a Function of Two Random Variables 181
5.8 Covariance, Correlation and Independence 184
5.9 Bivariate Gaussian Random Variables 191
5.10 Multivariate Probability Models 195
5.11 MATLAB 201
Problems 206

6 Probability Models of Derived Random Variables 218

6.1 PMF of a Function of Two Discrete Random Variables 219
6.2 Functions Yielding Continuous Random Variables 220
6.3 Functions Yielding Discrete or Mixed Random Variables 226
6.4 Continuous Functions of Two Continuous Random Variables 229
6.5 PDF of the Sum of Two Random Variables 232
6.6 MATLAB 234
Problems 236

7 Conditional Probability Models 242

7.1 Conditioning a Random Variable by an Event 242
7.2 Conditional Expected Value Given an Event 248
7.3 Conditioning Two Random Variables by an Event 252
7.4 Conditioning by a Random Variable 256
7.5 Conditional Expected Value Given a Random Variable 262
7.6 Bivariate Gaussian Random Variables: Conditional PDFs 265
7.7 MATLAB 268
Problems 269

8 Random Vectors 277

8.1 Vector Notation 277
8.2 Independent Random Variables and Random Vectors 280
8.3 Functions of Random Vectors 281
8.4 Expected Value Vector and Correlation Matrix 285
8.5 Gaussian Random Vectors 291
8.6 MATLAB 298
Problems 300

9 Sums of Random Variables 306

9.1 Expected Values of Sums 306
9.2 Moment Generating Functions 310
9.3 MGF of the Sum of Independent Random Variables 314
9.4 Random Sums of Independent Random Variables 317
9.5 Central Limit Theorem 321
9.6 MATLAB 328
Problems 331

10 The Sample Mean 337

10.1 Sample Mean: Expected Value and Variance 337
10.2 Deviation of a Random Variable from the Expected Value 339
10.3 Laws of Large Numbers 343
10.4 Point Estimates of Model Parameters 345
10.5 Confidence Intervals 352
10.6 MATLAB 358
Problems 360

11 Hypothesis Testing 366

11.1 Significance Testing 367
11.2 Binary Hypothesis Testing 370
11.3 Multiple Hypothesis Test 384
11.4 MATLAB 387
Problems 389

12 Estimation of a Random Variable 399

12.1 Minimum Mean Square Error Estimation 400
12.2 Linear Estimation of X given Y 404
12.3 MAP and ML Estimation 409
12.4 Linear Estimation of Random Variables from Random Vectors 414
12.5 MATLAB 421
Problems 423

13 Stochastic Processes 429

13.1 Definitions and Examples 430
13.2 Random Variables from Random Processes 435
13.3 Independent, Identically Distributed Random Sequences 437
13.4 The Poisson Process 439
13.5 Properties of the Poisson Process 443
13.6 The Brownian Motion Process 446
13.7 Expected Value and Correlation 448
13.8 Stationary Processes 452
13.9 Wide Sense Stationary Stochastic Processes 455
13.10 Cross-Correlation 459
13.11 Gaussian Processes 462
13.12 MATLAB 464
Problems 468

Appendix A Families of Random Variables 471

A.1 Discrete Random Variables 471
A.2 Continuous Random Variables 479

Appendix B A Few Math Facts 483

References 489

Index 491

Experiments, Models,
and Probabilities

Getting Started with Probability

The title of this book is Probability and Stochastic Processes. We say and hear and
read the word probability and its relatives (possible, probable, probably) in many
contexts. Within the realm of applied mathematics, the meaning of probability is
a question that has occupied mathematicians, philosophers, scientists, and social
scientists for hundreds of years.

Everyone accepts that the probability of an event is a number between 0 and
1. Some people interpret probability as a physical property (like mass or volume
or temperature) that can be measured. This is tempting when we talk about the
probability that a coin flip will come up heads. This probability is closely related
to the nature of the coin. Fiddling around with the coin can alter the probability
of heads.

Another interpretation of probability relates to the knowledge that we have about
something. We might assign a low probability to the truth of the statement, It is
raining now in Phoenix, Arizona, because we know that Phoenix is in the desert.
However, our knowledge changes if we learn that it was raining an hour ago in
Phoenix. This knowledge would cause us to assign a higher probability to the
truth of the statement, It is raining now in Phoenix.

Both views are useful when we apply probability theory to practical problems.
Whichever view we take, we will rely on the abstract mathematics of probability,
which consists of definitions, axioms, and inferences (theorems) that follow from
the axioms. While the structure of the subject conforms to principles of pure logic,
the terminology is not entirely abstract. Instead, it reflects the practical origins
of probability theory, which was developed to describe phenomena that cannot be
predicted with certainty. The point of view is different from the one we took when
we started studying physics. There we said that if we do the same thing in the
same way over and over again - send a space shuttle into orbit, for example -



the result will always be the same. To predict the result, we have to take account
of all relevant facts.

The mathematics of probability begins when the situation is so complex that we
just can't replicate everything important exactly, like when we fabricate and test
an integrated circuit. In this case, repetitions of the same procedure yield different
results. The situation is not totally chaotic, however. While each outcome may
be unpredictable, there are consistent patterns to be observed when we repeat the
procedure a large number of times. Understanding these patterns helps engineers
establish test procedures to ensure that a factory meets quality objectives. In this
repeatable procedure (making and testing a chip) with unpredictable outcomes (the
quality of individual chips), the probability is a number between 0 and 1 that states
the proportion of times we expect a certain thing to happen, such as the proportion
of chips that pass a test.
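This relative-frequency idea is easy to see in a short simulation. The sketch below is ours, not the text's (the book's own examples use MATLAB), and the pass probability of 0.9 is an invented illustration value:

```python
import random

def relative_frequency(p_pass, n_trials, seed=0):
    """Simulate n_trials independent chip tests, each passing with
    probability p_pass, and return the observed proportion of passes."""
    rng = random.Random(seed)
    passes = sum(1 for _ in range(n_trials) if rng.random() < p_pass)
    return passes / n_trials

# The observed proportion fluctuates for small n but settles
# near the model probability as the number of repetitions grows.
for n in (10, 1000, 100_000):
    print(n, relative_frequency(0.9, n))
```

For small n the observed proportion can be far from 0.9; by 100,000 repetitions it is very close, which is exactly the "consistent pattern" the paragraph describes.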
As an introduction to probability and stochastic processes, this book serves three
purposes:

- It introduces students to the logic of probability theory.
- It helps students develop intuition into how the theory relates to practical
  situations.
- It teaches students how to apply probability theory to solving engineering
  problems.
To exhibit the logic of the subject, we show clearly in the text three categories
of theoretical material: definitions, axioms, and theorems. Definitions establish
the logic of probability theory, and axioms are facts that we accept without proof.
Theorems are consequences that follow logically from definitions and axioms. Each
theorem has a proof that refers to definitions, axioms, and other theorems. Al-
though there are dozens of definitions and theorems, there are only three axioms
of probability theory. These three axioms are the foundation on which the entire
subject rests. To meet our goal of presenting the logic of the subject, we could
set out the material as dozens of definitions followed by three axioms followed by
dozens of theorems. Each theorem would be accompanied by a complete proof.

While rigorous, this approach would completely fail to meet our second aim of
conveying the intuition necessary to work on practical problems. To address this
goal, we augment the purely mathematical material with a large number of examples
of practical phenomena that can be analyzed by means of probability theory. We
also interleave definitions and theorems, presenting some theorems with complete
proofs, presenting others with partial proofs, and omitting some proofs altogether.
We find that most engineering students study probability with the aim of using it
to solve practical problems, and we cater mostly to this goal. We also encourage
students to take an interest in the logic of the subject - it is very elegant - and
we feel that the material presented is sufficient to enable these students to fill in
the gaps we have left in the proofs.
Therefore, as you read this book you will find a progression of definitions, axioms,
theorems, more definitions, and more theorems, all interleaved with examples and
comments designed to contribute to your understanding of the theory. We also
include brief quizzes that you should try to solve as you read the book. Each one


This notation tells us to form a set by performing the operation to the left of the vertical bar, |, on the numbers to the right of the bar. Therefore,

C = {1, 4, 9, 16, 25}. (1.4)

Some sets have an infinite number of elements. For example,

D = {x^2 | x = 1, 2, 3, ...}. (1.5)

The dots tell us to continue the sequence to the left of the dots. Since there is no number to the right of the dots, we continue the sequence indefinitely, forming an infinite set containing all perfect squares except 0. The definition of D implies that 144 ∈ D and 10 ∉ D.
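The set-builder definitions of C and D translate directly into code. The text's companion software is MATLAB; the following is a short Python sketch for illustration, with names of our own choosing:

```python
import math

# Set C from Equations (1.3)/(1.4): squares of the integers 1 through 5
C = {x**2 for x in range(1, 6)}
assert C == {1, 4, 9, 16, 25}

# D is infinite, so we test membership with a predicate instead of listing it
def in_D(n):
    """True if n is a perfect square of a positive integer (n = x^2, x = 1, 2, 3, ...)."""
    if n < 1:
        return False
    r = math.isqrt(n)
    return r * r == n

assert in_D(144)       # 144 = 12^2, so 144 is in D
assert not in_D(10)    # 10 is not a perfect square, so 10 is not in D
```

Representing an infinite set by a membership test, rather than an explicit listing, is the natural computational counterpart of the set-builder notation.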
In addition to set inclusion, we also have the notion of a subset, which describes a relationship between two sets. By definition, A is a subset of B if every member of A is also a member of B. We use the symbol ⊂ to denote subset. Thus A ⊂ B is mathematical notation for the statement "the set A is a subset of the set B." Using the definitions of sets C and D in Equations (1.3) and (1.5), we observe that C ⊂ D. If

I = {all positive integers, negative integers, and 0}, (1.6)

it follows that C ⊂ I and D ⊂ I.
The definition of set equality, A = B, is

A = B if and only if B ⊂ A and A ⊂ B.

This is the mathematical way of stating that A and B are identical if and only if every element of A is an element of B and every element of B is an element of A. This definition implies that a set is unaffected by the order of the elements in a definition. For example, {0, 17, 46} = {17, 0, 46} = {46, 0, 17} are all the same set.
To work with sets mathematically it is necessary to define a universal set. This is the set of all things that we could possibly consider in a given context. In any study, all set operations relate to the universal set for that study. The members of the universal set include all of the elements of all of the sets in the study. We will use the letter S to denote the universal set. For example, the universal set for A could be S = {all universities in the United States, all planets}. The universal set for C could be S = I = {0, 1, 2, ...}. By definition, every set is a subset of the universal set. That is, for any set X, X ⊂ S.
The null set, which is also important, may seem like it is not a set at all. By definition it has no elements. The notation for the null set is ∅. By definition ∅ is a subset of every set. For any set A, ∅ ⊂ A.

It is customary to refer to Venn diagrams to display relationships among sets. By convention, the region enclosed by the large rectangle is the universal set S. Closed surfaces within this rectangle denote sets. A Venn diagram depicting the relationship A ⊂ B is shown on the left.


When we do set algebra, we form new sets from existing sets. There are three operations for doing this: union, intersection, and complement. Union and intersection combine two existing sets to produce a third set. The complement operation forms a new set from one existing set. The notation and definitions follow.

The union of sets A and B is the set of all elements that are either in A or in B, or in both. The union of A and B is denoted by A ∪ B. In this Venn diagram, A ∪ B is the complete shaded area. Formally,

x ∈ A ∪ B if and only if x ∈ A or x ∈ B.

The set operation union corresponds to the logical "or" operation.

The intersection of two sets A and B is the set of all elements that are contained both in A and B. The intersection is denoted by A ∩ B. Another notation for intersection is AB. Formally, the definition is

x ∈ A ∩ B if and only if x ∈ A and x ∈ B.

The set operation intersection corresponds to the logical "and" function.

The complement of a set A, denoted by A^c, is the set of all elements in S that are not in A. The complement of S is the null set ∅. Formally,

x ∈ A^c if and only if x ∉ A.

In working with probability we will often refer to two important properties of collections of sets. Here are the definitions.

A collection of sets A1, ..., An is mutually exclusive if and only if

Ai ∩ Aj = ∅ for i ≠ j. (1.7)

The word disjoint is sometimes used as a synonym for mutually exclusive.

A collection of sets A1, ..., An is collectively exhaustive if and only if

A1 ∪ A2 ∪ ... ∪ An = S. (1.8)

In the definition of collectively exhaustive, we used the somewhat cumbersome notation A1 ∪ A2 ∪ ... ∪ An for the union of n sets. Just as ∑_{i=1}^{n} x_i is a shorthand for x1 + x2 + ... + xn, we will use a shorthand for unions and intersections of n sets:

∪_{i=1}^{n} Ai = A1 ∪ A2 ∪ ... ∪ An, (1.9)

∩_{i=1}^{n} Ai = A1 ∩ A2 ∩ ... ∩ An. (1.10)

We will see that collections of sets that are both mutually exclusive and collectively exhaustive are sufficiently useful to merit a definition.

A collection of sets A1, ..., An is a partition if it is both mutually exclusive and collectively exhaustive.

From the definition of set operations, we can derive many important relationships between sets and other sets derived from them. One example is

A ∩ B ⊂ A. (1.11)

To prove that this is true, it is necessary to show that if x ∈ A ∩ B, then it is also true that x ∈ A. A proof that sets are equal, for example, X = Y, requires two separate proofs: X ⊂ Y and Y ⊂ X. As we see in the following theorem, this can be complicated to show.

=== Theorem 1.1 ===

De Morgan's law relates all three basic operations:

(A ∪ B)^c = A^c ∩ B^c.

Proof There are two parts to the proof:

To show (A ∪ B)^c ⊂ A^c ∩ B^c, suppose x ∈ (A ∪ B)^c. That implies x ∉ A ∪ B. Hence, x ∉ A and x ∉ B, which together imply x ∈ A^c and x ∈ B^c. That is, x ∈ A^c ∩ B^c.

To show A^c ∩ B^c ⊂ (A ∪ B)^c, suppose x ∈ A^c ∩ B^c. In this case, x ∈ A^c and x ∈ B^c. Equivalently, x ∉ A and x ∉ B so that x ∉ A ∪ B. Hence, x ∈ (A ∪ B)^c.
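For finite sets, De Morgan's law can be verified exhaustively by machine. A small Python sketch (not from the text; the universal set and names are ours) checks the identity over every pair of subsets of an 8-element universal set:

```python
from itertools import combinations

S = set(range(8))  # a small universal set

def complement(X):
    """Complement relative to the universal set S."""
    return S - X

def subsets(s):
    """All subsets of s (the power set), as a list of sets."""
    s = list(s)
    return [set(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Check (A ∪ B)^c == A^c ∩ B^c for every pair of subsets of S
ok = all(complement(A | B) == complement(A) & complement(B)
         for A in subsets(S) for B in subsets(S))
assert ok
```

This is not a proof, of course; the two-part subset argument above is. But an exhaustive check over 256 × 256 pairs is a useful sanity test when manipulating set identities.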

=== Example 1.1 ===

Phonesmart offers customers two kinds of smartphones, Apricot (A) and Banana (B). It is possible to buy a Banana phone with an optional external battery E. Apricot customers can buy a phone with an external battery (E) or an extra memory card (C) or both. Draw a Venn diagram that shows the relationship among the items A, B, C, and E available to Phonesmart customers.

Since each phone is either Apricot or Banana, A and B form a partition. Since the external battery E is available for both kinds of phones, E intersects both A and B. However, since the memory card C is available only to Apricot customers, C ⊂ A. A Venn diagram representing these facts is shown on the right.

Quiz 1.1
Gerlandas offers customers two kinds of pizza crust, Tuscan (T) and Neapolitan (N). In addition, each pizza may have mushrooms (M) or onions (O) as described by the Venn diagram at right. For the sets specified below, shade the corresponding region of the Venn diagram.
(a) N            (b) N ∪ M
(c) N ∩ M        (d) T^c ∩ M^c

1.2 Applying Set Theory to Probability

Probability is based on a repeatable experiment that consists of a procedure and observations. An outcome is an observation. An event is a set of outcomes.

The mathematics we study is a branch of measure theory. Probability is a number that describes a set. The higher the number, the more probability there is. In this sense probability is like a quantity that measures a physical phenomenon; for example, a weight or a temperature. However, it is not necessary to think about probability in physical terms. We can do all the math abstractly, just as we defined sets and set operations in the previous paragraphs without any reference to physical phenomena.
Fortunately for engineers, the language of probability (including the word probability itself) makes us think of things that we experience. The basic model is a


repeatable experiment. An experiment consists of a procedure and observations. There is uncertainty in what will be observed; otherwise, performing the experiment would be unnecessary. Some examples of experiments include:

1. Flip a coin. Did it land with heads or tails facing up?
2. Walk to a bus stop. How long do you wait for the arrival of a bus?
3. Give a lecture. How many students are seated in the fourth row?
4. Transmit one of a collection of waveforms over a channel. What waveform arrives at the receiver?
5. Transmit one of a collection of waveforms over a channel. Which waveform does the receiver identify as the transmitted waveform?

For the most part, we will analyze models of actual physical experiments. We create models because real experiments generally are too complicated to analyze. For example, to describe all of the factors affecting your waiting time at a bus stop, you may consider:

- The time of day. (Is it rush hour?)
- The speed of each car that passed by while you waited.
- The weight, horsepower, and gear ratios of each kind of bus used by the bus company.
- The psychological profile and work schedule of each bus driver. (Some drivers drive faster than others.)
- The status of all road construction within 100 miles of the bus stop.

It should be apparent that it would be difficult to analyze the effect of each of these factors on the likelihood that you will wait less than five minutes for a bus. Consequently, it is necessary to study a model of the experiment that captures the important part of the actual physical experiment. Since we will focus on the model of the experiment almost exclusively, we often will use the word experiment to refer to the model of an experiment.

Example 1.2
An experiment consists of the following procedure, observation, and model:
- Procedure: Monitor activity at a Phonesmart store.
- Observation: Observe which type of phone (Apricot or Banana) the next customer purchases.
- Model: Apricots and Bananas are equally likely. The result of each purchase is unrelated to the results of previous purchases.

As we have said, an experiment consists of both a procedure and observations. It is important to understand that two experiments with the same procedure but with different observations are different experiments. For example, consider these two experiments:

=== Example 1.3 ===

Monitor the Phonesmart store until three customers purchase phones. Observe the sequence of Apricots and Bananas.

=== Example 1.4 ===

Monitor the Phonesmart store until three customers purchase phones. Observe the number of Apricots.

These two experiments have the same procedure: monitor the Phonesmart store until three customers purchase phones. They are different experiments because they require different observations. We will describe models of experiments in terms of a set of possible experimental outcomes. In the context of probability, we give precise meaning to the word outcome.

Definition 1.1 == Outcome

An outcome of an experiment is any possible observation of that experiment.

Implicit in the definition of an outcome is the notion that each outcome is distinguishable from every other outcome. As a result, we define the universal set of all possible outcomes. In probability terms, we call this universal set the sample space.

Definition 1.2 == Sample Space

The sample space of an experiment is the finest-grain, mutually exclusive, collectively exhaustive set of all possible outcomes.

The finest-grain property simply means that all possible distinguishable outcomes are identified separately. The requirement that outcomes be mutually exclusive says that if one outcome occurs, then no other outcome also occurs. For the set of outcomes to be collectively exhaustive, every outcome of the experiment must be in the sample space.

=== Example 1.5 ===

The sample space in Example 1.2 is S = {a, b} where a is the outcome "Apricot sold," and b is the outcome "Banana sold."
The sample space in Example 1.3 is

S = {aaa, aab, aba, abb, baa, bab, bba, bbb}. (1.12)

The sample space in Example 1.4 is S = {0, 1, 2, 3}.
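Sample spaces built from sequences of binary observations are easy to enumerate by machine. A Python sketch (our own illustration; the book's companion code is MATLAB) constructs the sample spaces of Examples 1.3 and 1.4:

```python
from itertools import product

# Example 1.3: sequences of three purchases, each 'a' (Apricot) or 'b' (Banana)
S3 = {''.join(w) for w in product('ab', repeat=3)}
assert S3 == {'aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba', 'bbb'}

# Example 1.4: the observation is the number of Apricots among the three purchases
S4 = {w.count('a') for w in S3}
assert S4 == {0, 1, 2, 3}
```

Note how the two sample spaces differ even though the procedure is identical: S4 is obtained from S3 by collapsing outcomes that share the same Apricot count.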

Example 1.6
Manufacture an integrated circuit and test it to determine whether it meets quality objectives. The possible outcomes are "accepted" (a) and "rejected" (r). The sample space is S = {a, r}.


Set Algebra        Probability

Set                Event
Universal set      Sample space
Element            Outcome

Table 1.1 The terminology of set theory and probability.

In common speech, an event is something that occurs. In an experiment, we may say that an event occurs when a certain phenomenon is observed. To define an event mathematically, we must identify all outcomes for which the phenomenon is observed. That is, for each outcome, either the particular event occurs or it does not. In probability terms, we define an event in terms of the outcomes in the sample space.

Definition 1.3 == Event

An event is a set of outcomes of an experiment.

Table 1.1 relates the terminology of probability to set theory. All of this may seem so simple that it is boring. While this is true of the definitions themselves, applying them is a different matter. Defining the sample space and its outcomes are key elements of the solution of any probability problem. A probability problem arises from some practical situation that can be modeled as an experiment. To work on the problem, it is necessary to define the experiment carefully and then derive the sample space. Getting this right is a big step toward solving the problem.

=== Example 1.7 ===

Suppose we roll a six-sided die and observe the number of dots on the side facing upwards. We can label these outcomes i = 1, ..., 6 where i denotes the outcome that i dots appear on the up face. The sample space is S = {1, 2, ..., 6}. Each subset of S is an event. Examples of events are

- The event E1 = {Roll 4 or higher} = {4, 5, 6}.
- The event E2 = {The roll is even} = {2, 4, 6}.
- The event E3 = {The roll is the square of an integer} = {1, 4}.
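Since events are just subsets of the sample space, they can be written as set comprehensions. A Python sketch of Example 1.7 (an illustration of ours, not from the text):

```python
S = {1, 2, 3, 4, 5, 6}          # sample space for one roll of a six-sided die

E1 = {s for s in S if s >= 4}                # roll 4 or higher
E2 = {s for s in S if s % 2 == 0}            # the roll is even
E3 = {s for s in S if int(s**0.5)**2 == s}   # the roll is a perfect square

assert E1 == {4, 5, 6}
assert E2 == {2, 4, 6}
assert E3 == {1, 4}
assert all(E <= S for E in (E1, E2, E3))     # every event is a subset of S
```

The final assertion expresses the defining property of an event: it is a subset of the sample space.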

=== Example 1.8 ===

Observe the number of minutes a customer spends in the Phonesmart store. An outcome T is a nonnegative real number. The sample space is S = {T | T ≥ 0}. The event "the customer stays longer than five minutes" is {T | T > 5}.


Example 1.9
Monitor three customers in the Phonesmart store. Classify the behavior as buying (b) if a customer purchases a smartphone. Otherwise the behavior is no purchase (n). An outcome of the experiment is a sequence of three customer decisions. We can denote each outcome by a three-letter word such as bnb indicating that the first and third customers buy a phone and the second customer does not. We denote the event that customer i buys a phone by Bi and the event customer i does not buy a phone by Ni. The event B2 = {nbn, nbb, bbn, bbb}. We can also express an outcome as an intersection of events Bi and Ni. For example, the outcome bnb = B1 N2 B3.

Quiz 1.2
Monitor three consecutive packets going through an Internet router. Based on the packet header, each packet can be classified as either video (v) if it was sent from a YouTube server or as ordinary data (d). Your observation is a sequence of three letters (each letter is either v or d). For example, two video packets followed by one data packet corresponds to vvd. Write the elements of the following sets:

A1 = {second packet is video},      B1 = {second packet is data},
A2 = {all packets are the same},    B2 = {video and data alternate},
A3 = {one or more video packets},   B3 = {two or more data packets}.

For each pair of events A1 and B1, A2 and B2, and so on, identify whether the pair of events is either mutually exclusive or collectively exhaustive or both.

1.3 Probability Axioms

A probability model assigns a number between 0 and 1 to every event. The probability of the union of mutually exclusive events is the sum of the probabilities of the events in the union.

Thus far our model of an experiment consists of a procedure and observations. This leads to a set-theory representation with a sample space (universal set S), outcomes (s that are elements of S), and events (A that are sets of elements). To complete the model, we assign a probability P[A] to every event, A, in the sample space. With respect to our physical idea of the experiment, the probability of an event is the proportion of the time that event is observed in a large number of runs of the experiment. This is the relative frequency notion of probability. Mathematically, this is expressed in the following axioms.

--- Definition 1.4 == Axioms of Probability

A probability measure P[·] is a function that maps events in the sample space to real numbers such that

Axiom 1 For any event A, P[A] ≥ 0.

Axiom 2 P[S] = 1.

Axiom 3 For any countable collection A1, A2, ... of mutually exclusive events,

P[A1 ∪ A2 ∪ ...] = P[A1] + P[A2] + ....

We will build our entire theory of probability on these three axioms. Axioms 1 and 2 simply establish a probability as a number between 0 and 1. Axiom 3 states that the probability of the union of mutually exclusive events is the sum of the individual probabilities. We will use this axiom over and over in developing the theory of probability and in solving problems. In fact, it is really all we have to work with. Everything else follows from Axiom 3. To use Axiom 3 to solve a practical problem, we will learn in Section 1.5 to analyze a complicated event as the union of mutually exclusive events whose probabilities we can calculate. Then, we will add the probabilities of the mutually exclusive events to find the probability of the complicated event we are interested in.
A useful extension of Axiom 3 applies to the union of two mutually exclusive events.

--- Theorem 1.2 ---

For mutually exclusive events A1 and A2,

P[A1 ∪ A2] = P[A1] + P[A2].

Although it may appear that Theorem 1.2 is a trivial special case of Axiom 3, this is not so. In fact, a simple proof of Theorem 1.2 may also use Axiom 2! If you are curious, Problem 1.3.13 gives the first steps toward a proof. It is a simple matter to extend Theorem 1.2 to any finite union of mutually exclusive sets.

--- Theorem 1.3 ---
If A = A1 ∪ A2 ∪ ... ∪ Am and Ai ∩ Aj = ∅ for i ≠ j, then

P[A] = ∑_{i=1}^{m} P[Ai].

In Chapter 10, we show that the probability measure established by the axioms corresponds to the idea of relative frequency. The correspondence refers to a sequential experiment consisting of n repetitions of the basic experiment. We refer to each repetition of the experiment as a trial. In these n trials, N_A(n) is the number of times that event A occurs. The relative frequency of A is the fraction N_A(n)/n. Theorem 10.7 proves that lim_{n→∞} N_A(n)/n = P[A].
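The relative-frequency interpretation is easy to see in simulation. A Python sketch of our own (the text's simulations use MATLAB) estimates the probability of the event "a fair die shows 4 or higher," for which P[A] = 1/2:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Event A: a fair six-sided die shows 4 or higher; P[A] = 1/2
n = 100_000
N_A = sum(1 for _ in range(n) if random.randint(1, 6) >= 4)
rel_freq = N_A / n          # the relative frequency N_A(n)/n

assert abs(rel_freq - 0.5) < 0.01   # close to P[A] for large n
```

As Theorem 10.7 states, the relative frequency N_A(n)/n converges to P[A] as the number of trials grows.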


Here we list some properties of probabilities that follow directly from the three axioms. While we do not supply the proofs, we suggest that students prove at least some of these theorems in order to gain experience working with the axioms.

--- Theorem 1.4 ---

The probability measure P[·] satisfies
(a) P[∅] = 0.
(b) P[A^c] = 1 - P[A].
(c) For any A and B (not necessarily mutually exclusive),

P[A ∪ B] = P[A] + P[B] - P[A ∩ B].

(d) If A ⊂ B, then P[A] ≤ P[B].
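For an equally likely finite sample space, each of these properties can be checked by counting. A Python sketch (our illustration; the events chosen are arbitrary) uses exact rational arithmetic:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}               # equally likely outcomes of one die roll

def P(E):
    """Probability of event E under the equally likely model."""
    return Fraction(len(E), len(S))

A, B = {2, 4, 6}, {4, 5, 6}

# Theorem 1.4(b): P[A^c] = 1 - P[A]
assert P(S - A) == 1 - P(A)
# Theorem 1.4(c): P[A ∪ B] = P[A] + P[B] - P[A ∩ B]
assert P(A | B) == P(A) + P(B) - P(A & B)
# Theorem 1.4(d): A ∩ B ⊂ A implies P[A ∩ B] <= P[A]
assert P(A & B) <= P(A)
```

Using `Fraction` rather than floating point keeps every probability exact, so the identities hold with equality rather than to within rounding error.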

Another consequence of the axioms can be expressed as the following theorem:

--- Theorem 1.5 ---
The probability of an event B = {s1, s2, ..., sm} is the sum of the probabilities of the outcomes contained in the event:

P[B] = ∑_{i=1}^{m} P[{si}].

Proof Each outcome si is an event (a set) with the single element si. Since outcomes by definition are mutually exclusive, B can be expressed as the union of m mutually exclusive sets:

B = {s1} ∪ {s2} ∪ ... ∪ {sm}, (1.13)

with {si} ∩ {sj} = ∅ for i ≠ j. Applying Theorem 1.3 with Bi = {si} yields

P[B] = ∑_{i=1}^{m} P[{si}]. (1.14)

Comments on Notation
We use the notation P[·] to indicate the probability of an event. The expression in the square brackets is an event. Within the context of one experiment, P[A] can be viewed as a function that transforms event A to a number between 0 and 1.

Note that {si} is the formal notation for a set with the single element si. For convenience, we will sometimes write P[si] rather than the more complete P[{si}] to denote the probability of this outcome.
We will also abbreviate the notation for the probability of the intersection of events, P[A ∩ B]. Sometimes we will write it as P[A, B] and sometimes as P[AB]. Thus by definition, P[A ∩ B] = P[A, B] = P[AB].

Equally Likely Outcomes

A large number of experiments have a sample space S = {s1, ..., sn} in which our knowledge of the practical situation leads us to believe that no one outcome is any more likely than any other. In these experiments we say that the n outcomes are equally likely. In such a case, the axioms of probability imply that every outcome has probability 1/n.

--- Theorem 1.6 ---
For an experiment with sample space S = {s1, ..., sn} in which each outcome si is equally likely,

P[si] = 1/n,    1 ≤ i ≤ n.

Proof Since all outcomes have equal probability, there exists p such that P[si] = p for i = 1, ..., n. Theorem 1.5 implies

P[S] = P[s1] + ... + P[sn] = np. (1.15)

Since Axiom 2 says P[S] = 1, p = 1/n.

Example 1.10
As in Example 1.7, roll a six-sided die in which all faces are equally likely. What is the probability of each outcome? Find the probabilities of the events: "Roll 4 or higher," "Roll an even number," and "Roll the square of an integer."

The probability of each outcome is P[i] = 1/6 for i = 1, 2, ..., 6. The probabilities of the three events are

P[Roll 4 or higher] = P[4] + P[5] + P[6] = 1/2.
P[Roll an even number] = P[2] + P[4] + P[6] = 1/2.
P[Roll the square of an integer] = P[1] + P[4] = 1/3.

Quiz 1.3
A student's test score T is an integer between 0 and 100 corresponding to the experimental outcomes s0, ..., s100. A score of 90 to 100 is an A, 80 to 89 is a B, 70 to 79 is a C, 60 to 69 is a D, and below 60 is a failing grade of F. If all scores between 51 and 100 are equally likely and a score of 50 or less never occurs, find the following probabilities:
(a) P[{s100}]                    (b) P[A]
(c) P[F]                         (d) P[T < 90]
(e) P[a C grade or better]       (f) P[student passes]

1.4 Conditional Probability

Conditional probabilities correspond to a modified probability model that reflects partial information about the outcome of an experiment. The modified model has a smaller sample space than the original model.

As we suggested earlier, it is sometimes useful to interpret P[A] as our knowledge of the occurrence of event A before an experiment takes place. If P[A] ≈ 1, we have advance knowledge that A will almost certainly occur. P[A] ≈ 0 reflects strong knowledge that A is unlikely to occur when the experiment takes place. With P[A] ≈ 1/2, we have little knowledge about whether or not A will occur. Thus P[A] reflects our knowledge of the occurrence of A prior to performing an experiment. Sometimes, we refer to P[A] as the a priori probability, or the prior probability, of A.
In many practical situations, it is not possible to find out the precise outcome of an experiment. Rather than the outcome si itself, we obtain information that the outcome is in the set B. That is, we learn that some event B has occurred, where B consists of several outcomes. Conditional probability describes our knowledge of A when we know that B has occurred but we still don't know the precise outcome. The notation for this new probability is P[A|B]. We read this as "the probability of A given B." Before going to the mathematical definition of conditional probability, we provide an example that gives an indication of how conditional probabilities can be used.

=== Example 1.11 ===

Consider an experiment that consists of testing two integrated circuits (IC chips) that come from the same silicon wafer and observing in each case whether a chip is accepted (a) or rejected (r). The sample space of the experiment is S = {rr, ra, ar, aa}. Let B denote the event that the first chip tested is rejected. Mathematically, B = {rr, ra}. Similarly, let A = {rr, ar} denote the event that the second chip is a failure.
The chips come from a high-quality production line. Therefore the prior probability P[A] is very low. In advance, we are pretty certain that the second circuit will be accepted. However, some wafers become contaminated by dust, and these wafers have a high proportion of defective chips. When the first chip is a reject, the outcome of the experiment is in event B and P[A|B], the probability that the second chip will also be rejected, is higher than the a priori probability P[A] because of the likelihood that dust contaminated the entire wafer.

Definition 1.5 == Conditional Probability

The conditional probability of the event A given the occurrence of the event B is

P[A|B] = P[AB] / P[B].

Conditional probability is defined only when P[B] > 0. In most experiments, P[B] = 0 means that it is certain that B never occurs. In this case, it is illogical to speak of the probability of A given that B occurs. Note that P[A|B] is a respectable probability measure relative to a sample space that consists of all the outcomes in B. This means that P[A|B] has properties corresponding to the three axioms of probability.

Theorem 1.7
A conditional probability measure P[A|B] has the following properties that correspond to the axioms of probability.

Axiom 1: P[A|B] ≥ 0.
Axiom 2: P[B|B] = 1.
Axiom 3: If A = A1 ∪ A2 ∪ ... with Ai ∩ Aj = ∅ for i ≠ j, then

P[A|B] = P[A1|B] + P[A2|B] + ...

You should be able to prove these statements using Definition 1.5.

Example 1.12
With respect to Example 1.11, consider the a priori probability model

P[rr] = 0.01,   P[ra] = 0.01,   P[ar] = 0.01,   P[aa] = 0.97. (1.16)

Find the probability of A = "second chip rejected" and B = "first chip rejected." Also find the conditional probability that the second chip is a reject given that the first chip is a reject.

We saw in Example 1.11 that A is the union of two mutually exclusive events (outcomes) rr and ar. Therefore, the a priori probability that the second chip is rejected is

P[A] = P[rr] + P[ar] = 0.02. (1.17)

This is also the a priori probability that the first chip is rejected:

P[B] = P[rr] + P[ra] = 0.02. (1.18)

The conditional probability of the second chip being rejected given that the first chip is rejected is, by definition, the ratio of P[AB] to P[B], where, in this example,

P[AB] = P[both rejected] = P[rr] = 0.01. (1.19)

Thus

P[A|B] = P[AB] / P[B] = 0.01/0.02 = 0.5. (1.20)

The information that the first chip is a reject drastically changes our state of knowledge about the second chip. We started with near certainty, P[A] = 0.02, that the second chip would not fail and ended with complete uncertainty about the quality of the second chip, P[A|B] = 0.5.
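The calculation in Example 1.12 follows mechanically once the model of Equation (1.16) is written down. A Python sketch of ours, representing the model as a dictionary mapping outcomes to probabilities:

```python
# A priori model from Equation (1.16); each outcome is (first chip, second chip)
P = {'rr': 0.01, 'ra': 0.01, 'ar': 0.01, 'aa': 0.97}

A = {'rr', 'ar'}   # second chip rejected
B = {'rr', 'ra'}   # first chip rejected

P_A  = sum(P[s] for s in A)        # Equation (1.17)
P_B  = sum(P[s] for s in B)        # Equation (1.18)
P_AB = sum(P[s] for s in A & B)    # Equation (1.19): A ∩ B = {rr}
P_A_given_B = P_AB / P_B           # Equation (1.20)

assert abs(P_A - 0.02) < 1e-12
assert abs(P_A_given_B - 0.5) < 1e-12
```

Summing the model's outcome probabilities over an event is exactly Theorem 1.5, and the final ratio is Definition 1.5.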

=== Example 1.13 ===
Shuffle a deck of cards and observe the bottom card. What is the conditional probability that the bottom card is the ace of clubs given that the bottom card is a black card?

The sample space consists of the 52 cards that can appear on the bottom of the deck. Let A denote the event that the bottom card is the ace of clubs. Since all cards are equally likely to be at the bottom, the probability that a particular card, such as the ace of clubs, is at the bottom is P[A] = 1/52. Let B be the event that the bottom card is a black card. The event B occurs if the bottom card is one of the 26 clubs or spades, so that P[B] = 26/52. Given B, the conditional probability of A is

P[A|B] = P[AB]/P[B] = P[A]/P[B] = (1/52)/(26/52) = 1/26. (1.21)

The key step was observing that AB = A, because if the bottom card is the ace of clubs, then the bottom card must be a black card. Mathematically, this is an example of the fact that A ⊂ B implies that AB = A.

Example 1.14
Roll two fair four-sided dice. Let X1 and X2 denote the number of dots that appear on die 1 and die 2, respectively. Let A be the event X1 ≥ 2. What is P[A]? Let B denote the event X2 > X1. What is P[B]? What is P[A|B]?

We begin by observing that the sample space has 16 elements corresponding to the four possible values of X1 and the same four values of X2. Since the dice are fair, the outcomes are equally likely, each with probability 1/16. We draw the sample space as a set of black circles in a two-dimensional diagram, in which the axes represent the events X1 and X2. Each outcome is a pair of values (X1, X2). The rectangle represents A. It contains 12 outcomes, each with probability 1/16.

To find P[A], we add up the probabilities of outcomes in A, so P[A] = 12/16 = 3/4. The triangle represents B. It contains six outcomes. Therefore P[B] = 6/16 = 3/8. The event AB has three outcomes, (2,3), (2,4), (3,4), so P[AB] = 3/16. From the definition of conditional probability, we write

P[A|B] = P[AB]/P[B] = (3/16)/(6/16) = 1/2. (1.22)

We can also derive this fact from the diagram by restricting our attention to the six outcomes in B (the conditioning event) and noting that three of the six outcomes in B (one-half of the total) are also in A.
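The counting argument of Example 1.14 can be reproduced by enumerating the 16 equally likely outcomes. A Python sketch of ours, using exact fractions:

```python
from fractions import Fraction
from itertools import product

# Sample space: pairs (X1, X2), each die showing 1..4; 16 equally likely outcomes
S = set(product(range(1, 5), repeat=2))

def P(E):
    """Probability of event E under the equally likely model."""
    return Fraction(len(E), len(S))

A = {(x1, x2) for (x1, x2) in S if x1 >= 2}   # the rectangle: X1 >= 2
B = {(x1, x2) for (x1, x2) in S if x2 > x1}   # the triangle:  X2 > X1

assert P(A) == Fraction(3, 4)
assert P(B) == Fraction(3, 8)
assert P(A & B) / P(B) == Fraction(1, 2)      # P[A|B], Equation (1.22)
```

Enumerating outcomes this way is a useful check on diagram-based counting, especially as sample spaces grow beyond what a picture can comfortably show.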

Quiz 1.4
Monitor three consecutive packets going through an Internet router. Classify each one as either video (v) or data (d). Your observation is a sequence of three letters (each one is either v or d). For example, three video packets corresponds to vvv. The outcomes vvv and ddd each have probability 0.2 whereas each of the other outcomes vvd, vdv, vdd, dvv, dvd, and ddv has probability 0.1. Count the number of video packets Nv in the three packets you have observed. Describe in words and also calculate the following probabilities:
(a) P[Nv = 2]              (b) P[Nv ≥ 1]
(c) P[{vvd} | Nv = 2]      (d) P[{ddv} | Nv = 2]
(e) P[Nv = 2 | Nv ≥ 1]     (f) P[Nv ≥ 1 | Nv = 2]

1.5 Partitions and the Law of Total Probability

A partition divides the sample space into mutually exclusive sets. The law of total probability expresses the probability of an event as the sum of the probabilities of outcomes that are in the separate sets of a partition.

Example 1.15
Flip four coins, a penny, a nickel, a dime, and a quarter. Examine the coins in order (penny, then nickel, then dime, then quarter) and observe whether each coin shows a head (h) or a tail (t). What is the sample space? How many elements are in the sample space?

The sample space consists of 16 four-letter words, with each letter either h or t. For example, the outcome tthh refers to the penny and the nickel showing tails and the dime and quarter showing heads. There are 16 members of the sample space.


Figure 1.1 In this example of Theorem 1.8, the partition is B = {B1, B2, B3, B4} and Ci = A ∩ Bi for i = 1, ..., 4. It should be apparent that A = C1 ∪ C2 ∪ C3 ∪ C4.

=== Example 1.16 ===

Continuing Example 1.15, let Bi = {outcomes with i heads}. Each Bi is an event containing one or more outcomes. For example, B1 = {ttth, ttht, thtt, httt} contains four outcomes. The set B = {B0, B1, B2, B3, B4} is a partition. Its members are mutually exclusive and collectively exhaustive. It is not a sample space because it lacks the finest-grain property. Learning that an experiment produces an event B1 tells you that one coin came up heads, but it doesn't tell you which coin it was.

The experiment in Example 1.15 and Example 1.16 refers to a "toy problem,"
one that is easily visualized but isn't something we would do in the course of our
professional work. Mathematically, however, it is equivalent to many real engi-
neering problems. For example, observe a pair of modems transmitting four bits
from one computer to another. For each bit, observe whether the receiving modem
detects the bit correctly (c) or makes an error (e). Or test four integrated circuits.
For each one, observe whether the circuit is acceptable (a) or a reject (r). In all
of these examples, the sample space contains 16 four-letter words formed with an
alphabet containing two letters. If we are interested only in the number of times
one of the letters occurs, it is sufficient to refer only to the partition B, which does
not contain all of the information about the experiment but does contain all of
the information we need. The partition is simpler to deal with than the sample
space because it has fewer members (there are five events in the partition and 16
outcomes in the sample space). The simplification is much more significant when
the complexity of the experiment is higher. For example, in testing 20 circuits the
sample space has 2^20 = 1,048,576 members, while the corresponding partition has
only 21 members.
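The savings from working with the partition instead of the sample space is easy to verify numerically. The sketch below (Python, used here only for illustration) counts the members of each partition event Bi = {outcomes with i of one letter} with binomial coefficients and confirms that the 21 events account for every outcome.

```python
import math

n = 20                                   # circuits tested
sample_space_size = 2 ** n               # every length-20 word over a 2-letter alphabet
block_sizes = [math.comb(n, i) for i in range(n + 1)]   # |Bi| = C(n, i)

print(sample_space_size)    # 1048576
print(len(block_sizes))     # 21 partition events
print(sum(block_sizes))     # 1048576: the Bi cover the sample space exactly
```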


We observed in Section 1.3 that the entire theory of probability is based on
unions of mutually exclusive events. The following theorem shows how to use a
partition to represent an event as a union of mutually exclusive events.

Theorem 1.8

For a partition B = {B1, B2, ...} and any event A in the sample space, let Ci =
A ∩ Bi. For i ≠ j, the events Ci and Cj are mutually exclusive and

A = C1 ∪ C2 ∪ ··· .

Figure 1.1 is a picture of Theorem 1.8.

Example 1.17
In the coin-tossing experiment of Example 1.15, let A equal the set of outcomes with
less than three heads:

A = {tttt, httt, thtt, ttht, ttth, hhtt, htht, htth, tthh, thth, thht}. (1.23)

From Example 1.16, let Bi = {outcomes with i heads}. Since {B0, ..., B4} is a par-
tition, Theorem 1.8 states that

A = (A ∩ B0) ∪ (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3) ∪ (A ∩ B4). (1.24)

In this example, Bi ⊂ A for i = 0, 1, 2. Therefore A ∩ Bi = Bi for i = 0, 1, 2. Also,
for i = 3 and i = 4, A ∩ Bi = ∅, so that A = B0 ∪ B1 ∪ B2, a union of mutually
exclusive sets. In words, this example states that the event "less than three heads" is
the union of the events "zero heads," "one head," and "two heads."

We advise you to make sure you understand Theorem 1.8 and Example 1.17.
Many practical problems use the mathematical technique contained in the theorem.
For example, find the probability that there are three or more bad circuits in a batch
that comes from a fabrication machine.
The following theorem refers to a partition {B1, B2, ..., Bm} and any event, A.
It states that we can find the probability of A by adding the probabilities of the
parts of A that are in the separate components of the event space.

Theorem 1.9

For any event A and partition {B1, B2, ..., Bm},

P[A] = Σ_{i=1}^{m} P[A ∩ Bi].

Proof The proof follows directly from Theorem 1.8 and Theorem 1.3. In this case, the
mutually exclusive sets are Ci = A ∩ Bi.


Theorem 1.9 is often used when the sample space can be written in the form of a
table. In this table, the rows and columns each represent a partition. This method
is shown in the following example.

Example 1.18
A company has a model of email use. It classifies all emails as either long (l), if they
are over 10 MB in size, or brief (b). It also observes whether the email is just text
(t), has attached images (i), or has an attached video (v). This model implies an
experiment in which the procedure is to monitor an email and the observation consists
of the type of email, t, i, or v, and the length, l or b. The sample space has six
outcomes: S = {lt, bt, li, bi, lv, bv}. In this problem, each email is classified in two
ways: by length and by type. Using L for the event that an email is long and B for the
event that an email is brief, {L, B} is a partition. Similarly, the text (T), image (I), and
video (V) classification is a partition {T, I, V}. The sample space can be represented
by a table in which the rows and columns are labeled by events and the intersection of
each row and column event contains a single outcome. The corresponding table entry
is the probability of that outcome. In this case, the table is

         T      I      V
    L   0.30   0.12   0.15                  (1.25)
    B   0.20   0.08   0.15

For example, from the table we can read that the probability of a brief image email is
P[bi] = P[BI] = 0.08. Note that {T, I, V} is a partition corresponding to {B1, B2, B3}
in Theorem 1.9. Thus we can apply Theorem 1.9 to find the probability of a long email:

P[L] = P[LT] + P[LI] + P[LV] = 0.57. (1.26)
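A row-sum over such a table is a one-line computation. The following sketch (Python, used here only for illustration) stores the table of (1.25) and recovers P[L]:

```python
# P[outcome] for the six outcomes of Example 1.18,
# indexed by (length, type) with length in {l, b} and type in {t, i, v}.
p = {('l', 't'): 0.30, ('l', 'i'): 0.12, ('l', 'v'): 0.15,
     ('b', 't'): 0.20, ('b', 'i'): 0.08, ('b', 'v'): 0.15}

# Theorem 1.9: P[L] is the sum over the partition {T, I, V}
# of the probabilities in the row labeled L.
p_long = sum(prob for (length, _), prob in p.items() if length == 'l')
print(round(p_long, 2))   # 0.57, matching (1.26)
```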

Law of Total Probability

In many applications, we begin with information about conditional probabilities
and use the law of total probability to calculate unconditional probabilities.

Theorem 1.10    Law of Total Probability

For a partition {B1, B2, ..., Bm} with P[Bi] > 0 for all i,

P[A] = Σ_{i=1}^{m} P[A|Bi] P[Bi].

Proof This follows from Theorem 1.9 and the identity P[ABi] = P[A|Bi] P[Bi], which is a
direct consequence of the definition of conditional probability.

The usefulness of the result can be seen in the next example.



Example 1.19

A company has three machines B1, B2, and B3 making 1 kΩ resistors. Resistors
within 50 Ω of the nominal value are considered acceptable. It has been observed that
80% of the resistors produced by B1 and 90% of the resistors produced by B2 are
acceptable. The percentage for machine B3 is 60%. Each hour, machine B1 produces
3000 resistors, B2 produces 4000 resistors, and B3 produces 3000 resistors. All of the
resistors are mixed together at random in one bin and packed for shipment. What is
the probability that the company ships an acceptable resistor?

Let A = {resistor is acceptable}. Using the resistor accuracy information to formulate
a probability model, we write

P[A|B1] = 0.8,    P[A|B2] = 0.9,    P[A|B3] = 0.6. (1.27)

The production figures state that 3000 + 4000 + 3000 = 10,000 resistors per hour are
produced. The fraction from machine B1 is P[B1] = 3000/10,000 = 0.3. Similarly,
P[B2] = 0.4 and P[B3] = 0.3. Now it is a simple matter to apply the law of total
probability to find the acceptable probability for all resistors shipped by the company:

P[A] = P[A|B1] P[B1] + P[A|B2] P[B2] + P[A|B3] P[B3] (1.28)
     = (0.8)(0.3) + (0.9)(0.4) + (0.6)(0.3) = 0.78. (1.29)

For the whole factory, 78% of resistors are within 50 Ω of the nominal value.
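The weighted average in (1.28)-(1.29) translates directly into code. A short sketch (Python, used only for illustration) with the numbers of this example:

```python
# Conditional acceptance probabilities P[A|Bi] and machine shares P[Bi]
# from Example 1.19.
p_accept_given = {'B1': 0.8, 'B2': 0.9, 'B3': 0.6}
p_machine      = {'B1': 0.3, 'B2': 0.4, 'B3': 0.3}

# Law of total probability: P[A] = sum over i of P[A|Bi] P[Bi].
p_accept = sum(p_accept_given[b] * p_machine[b] for b in p_machine)
print(round(p_accept, 2))   # 0.78
```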

Bayes' Theorem
When we have advance information about P[A|B] and need to calculate P[B|A],
we refer to the following formula:

Theorem 1.11    Bayes' theorem

P[B|A] = P[AB]/P[A] = P[A|B] P[B] / P[A]. (1.30)

Bayes' theorem is a simple consequence of the definition of conditional probability.

It has a name because it is extremely useful for making inferences about phenomena
that cannot be observed directly. Sometimes these inferences are described as "rea-
soning about causes when we observe effects." For example, let {B1, ..., Bm} be
a partition that includes all possible states of something that interests us but that
we cannot observe directly (for example, the machine that made a particular resis-
tor). For each possible state, Bi, we know the prior probability P[Bi] and P[A|Bi],
the probability that an event A occurs (the resistor meets a quality criterion) if
Bi is the actual state. Now we observe the actual event (either the resistor passes
or fails a test), and we make inferences about the thing we are interested in (the machines
that might have produced the resistor). That is, we use Bayes' theorem to find
P[B1|A], P[B2|A], ..., P[Bm|A]. In performing the calculations, we use the law of
total probability to calculate the denominator in Theorem 1.11. Thus for state Bi,

P[Bi|A] = P[A|Bi] P[Bi] / Σ_{i=1}^{m} P[A|Bi] P[Bi]. (1.31)

Example 1.20

In Example 1.19 about a shipment of resistors from the factory, we learned that:
  The probability that a resistor is from machine B3 is P[B3] = 0.3.
  The probability that a resistor is acceptable, i.e., within 50 Ω of the nominal
  value, is P[A] = 0.78.
  Given that a resistor is from machine B3, the conditional probability that it is
  acceptable is P[A|B3] = 0.6.
What is the probability that an acceptable resistor comes from machine B3?

Now we are given the event A that a resistor is within 50 Ω of the nominal value, and
we need to find P[B3|A]. Using Bayes' theorem, we have

P[B3|A] = P[A|B3] P[B3] / P[A]. (1.32)

Since all of the quantities we need are given in the problem description, our answer is

P[B3|A] = (0.6)(0.3)/(0.78) = 0.23. (1.33)

Similarly we obtain P[B1|A] = 0.31 and P[B2|A] = 0.46. Of all resistors within 50 Ω
of the nominal value, only 23% come from machine B3 (even though this machine
produces 30% of all resistors). Machine B1 produces 31% of the resistors that meet
the 50 Ω criterion and machine B2 produces 46% of them.
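All three posterior probabilities reuse the same denominator P[A], so they can be computed in one pass. A sketch in Python (used only for illustration) with the numbers of Examples 1.19 and 1.20:

```python
# Priors P[Bi] and likelihoods P[A|Bi] from Example 1.19.
prior      = {'B1': 0.3, 'B2': 0.4, 'B3': 0.3}
likelihood = {'B1': 0.8, 'B2': 0.9, 'B3': 0.6}

# Denominator of Bayes' theorem via the law of total probability.
p_a = sum(likelihood[b] * prior[b] for b in prior)        # P[A] = 0.78

# Bayes' theorem: P[Bi|A] = P[A|Bi] P[Bi] / P[A].
posterior = {b: likelihood[b] * prior[b] / p_a for b in prior}
for b in sorted(posterior):
    print(b, round(posterior[b], 2))   # B1 0.31, B2 0.46, B3 0.23
```

Note that the posteriors necessarily sum to 1, since {B1, B2, B3} is a partition.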

Quiz 1.5
Monitor customer behavior in the Phonesmart store. Classify the behavior as buy-
ing (B) if a customer purchases a smartphone. Otherwise the behavior is no pur-
chase (N). Classify the time a customer is in the store as long (L) if the customer
stays more than three minutes; otherwise classify the amount of time as rapid
(R). Based on experience with many customers, we use the probability model
P[N] = 0.7, P[L] = 0.6, P[NL] = 0.35. Find the following probabilities:
(a) P[B ∪ L] (b) P[N ∪ L]
(c) P[N ∪ B] (d) P[LR]


1.6 Independence

Two events are independent if observing one event does not change
the probability of observing the other event.

Definition 1.6    Two Independent Events

Events A and B are independent if and only if

P[AB] = P[A] P[B].

When events A and B have nonzero probabilities, the following formulas are equiv-
alent to the definition of independent events:

P[A|B] = P[A],    P[B|A] = P[B]. (1.34)

To interpret independence, consider probability as a description of our knowledge
of the result of the experiment. P[A] describes our prior knowledge (before the
experiment is performed) that the outcome is included in event A. The fact that
the outcome is in B is partial information about the experiment. P[A|B] reflects our
knowledge of A when we learn that B occurs. P[A|B] = P[A] states that learning
that B occurs does not change our information about A. It is in this sense that the
events are independent.
Problem 1.6.11 asks the reader to prove that if A and B are independent, then
A and Bc are also independent. The logic behind this conclusion is that if learning
that event B occurs does not alter the probability of event A, then learning that B
does not occur also should not alter the probability of A.
Keep in mind that independent and mutually exclusive are not syn-
onyms. In some contexts these words can have similar meanings, but this is not
the case in probability. Mutually exclusive events A and B have no outcomes in
common and therefore P[AB] = 0. In most situations independent events are not
mutually exclusive! Exceptions occur only when P[A] = 0 or P[B] = 0. When we
have to calculate probabilities, knowledge that events A and B are mutually
exclusive is very helpful. Axiom 3 enables us to add their probabilities to obtain
the probability of the union. Knowledge that events C and D are independent is
also very useful. Definition 1.6 enables us to multiply their probabilities to obtain
the probability of the intersection.

Example 1.21

Suppose that for the experiment monitoring three purchasing decisions in Example 1.9,
each outcome (a sequence of three decisions, each either buy or not buy) is equally
likely. Are the events B2 that the second customer purchases a phone and N2 that the
second customer does not purchase a phone independent? Are the events B1 and B2
independent?

Each element of the sample space S = {bbb, bbn, bnb, bnn, nbb, nbn, nnb, nnn} has
probability 1/8. Each of the events

B2 = {bbb, bbn, nbb, nbn}   and   N2 = {bnb, bnn, nnb, nnn} (1.35)

contains four outcomes, so P[B2] = P[N2] = 4/8. However, B2 ∩ N2 = ∅ and
P[B2N2] = 0. That is, B2 and N2 are mutually exclusive because the second cus-
tomer cannot both purchase a phone and not purchase a phone. Since P[B2N2] ≠
P[B2] P[N2], B2 and N2 are not independent. Learning whether or not the event B2
(second customer buys a phone) occurs drastically affects our knowledge of whether
or not the event N2 (second customer does not buy a phone) occurs. Each of the
events B1 = {bnn, bnb, bbn, bbb} and B2 = {bbn, bbb, nbn, nbb} has four outcomes,
so P[B1] = P[B2] = 4/8 = 1/2. In this case, the intersection B1 ∩ B2 = {bbn, bbb}
has probability P[B1B2] = 2/8 = 1/4. Since P[B1B2] = P[B1] P[B2], events B1 and
B2 are independent. Learning whether or not the event B2 (second customer buys
a phone) occurs does not affect our knowledge of whether or not the event B1 (first
customer buys a phone) occurs.
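For a finite equiprobable sample space, testing independence is a mechanical comparison of P[AB] with P[A] P[B]. The sketch below (Python, used only for illustration; exact fractions avoid floating-point surprises) repeats the two comparisons of this example:

```python
from fractions import Fraction

# Equiprobable sample space of three buy/no-buy decisions.
S = ['bbb', 'bbn', 'bnb', 'bnn', 'nbb', 'nbn', 'nnb', 'nnn']

def prob(event):
    """Probability of an event (a set of outcomes), outcomes equiprobable."""
    return Fraction(len(event), len(S))

B1 = {s for s in S if s[0] == 'b'}   # first customer buys
B2 = {s for s in S if s[1] == 'b'}   # second customer buys
N2 = {s for s in S if s[1] == 'n'}   # second customer does not buy

# B2 and N2 are mutually exclusive, hence not independent.
print(prob(B2 & N2) == prob(B2) * prob(N2))   # False
# B1 and B2: P[B1 B2] = 1/4 = P[B1] P[B2], hence independent.
print(prob(B1 & B2) == prob(B1) * prob(B2))   # True
```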

In this example we have analyzed a probability model to determine whether two
events are independent. In many practical applications we reason in the opposite
direction. Our knowledge of an experiment leads us to assume that certain pairs of
events are independent. We then use this knowledge to build a probability model
for the experiment.

Example 1.22
Integrated circuits undergo two tests. A mechanical test determines whether pins have
the correct spacing, and an electrical test checks the relationship of outputs to inputs.
We assume that electrical failures and mechanical failures occur independently. Our
information about circuit production tells us that mechanical failures occur with prob-
ability 0.05 and electrical failures occur with probability 0.2. What is the probability
model of an experiment that consists of testing an integrated circuit and observing the
results of the mechanical and electrical tests?

To build the probability model, we note that the sample space contains four outcomes:

S = {(ma, ea), (ma, er), (mr, ea), (mr, er)} (1.36)

where m denotes mechanical, e denotes electrical, a denotes accept, and r denotes
reject. Let M and E denote the events that the mechanical and electrical tests are
acceptable. Our prior information tells us that P[Mc] = 0.05 and P[Ec] = 0.2.
This implies P[M] = 0.95 and P[E] = 0.8. Using the independence assumption and
Definition 1.6, we obtain the probabilities of the four outcomes:

P[(ma, ea)] = P[ME] = P[M] P[E] = 0.95 × 0.8 = 0.76, (1.37)
P[(ma, er)] = P[MEc] = P[M] P[Ec] = 0.95 × 0.2 = 0.19, (1.38)
P[(mr, ea)] = P[McE] = P[Mc] P[E] = 0.05 × 0.8 = 0.04, (1.39)
P[(mr, er)] = P[McEc] = P[Mc] P[Ec] = 0.05 × 0.2 = 0.01. (1.40)


Thus far, we have considered independence as a property of a pair of events.
Often we consider larger sets of independent events. For more than two events to
be independent, the probability model has to meet a set of conditions. To define
mutual independence, we begin with three sets.

Definition 1.7    Three Independent Events

A1, A2, and A3 are mutually independent if and only if
(a) A1 and A2 are independent,
(b) A2 and A3 are independent,
(c) A1 and A3 are independent,
(d) P[A1 ∩ A2 ∩ A3] = P[A1] P[A2] P[A3].

The final condition is a simple extension of Definition 1.6. The following example
shows why this condition is insufficient to guarantee that "everything is independent
of everything else," the idea at the heart of independence.

Example 1.23

In an experiment with equiprobable outcomes, the sample space is S = {1, 2, 3, 4} and
P[s] = 1/4 for all s ∈ S. Are the events A1 = {1, 3, 4}, A2 = {2, 3, 4}, and A3 = ∅
mutually independent?

These three sets satisfy the final condition of Definition 1.7 because A1 ∩ A2 ∩ A3 = ∅,
and

P[A1 ∩ A2 ∩ A3] = 0 = P[A1] P[A2] P[A3]. (1.41)

However, A1 and A2 are not independent because, with all outcomes equiprobable,

P[A1 ∩ A2] = P[{3, 4}] = 1/2 ≠ P[A1] P[A2] = 3/4 × 3/4. (1.42)

Hence the three events are not mutually independent.
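All four conditions of Definition 1.7 can be enumerated in code. The following sketch (Python, used only for illustration) checks the events of this example and shows that condition (d) holds while a pairwise condition fails:

```python
from fractions import Fraction
from itertools import combinations

S = {1, 2, 3, 4}                     # equiprobable sample space
A = [{1, 3, 4}, {2, 3, 4}, set()]    # A1, A2, A3 from Example 1.23

def prob(event):
    return Fraction(len(event), len(S))

# Pairwise conditions (a)-(c) of Definition 1.7.
pairwise = all(prob(X & Y) == prob(X) * prob(Y)
               for X, Y in combinations(A, 2))

# Condition (d): the probability of the triple intersection factors.
triple = prob(A[0] & A[1] & A[2]) == prob(A[0]) * prob(A[1]) * prob(A[2])

print(pairwise)   # False: A1 and A2 fail, since P[A1 A2] = 1/2, not 9/16
print(triple)     # True: both sides are 0 because A3 is empty
```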

The definition of an arbitrary number of mutually independent events is an
extension of Definition 1.7.

Definition 1.8    More than Two Independent Events

If n ≥ 3, the events A1, A2, ..., An are mutually independent if and only if
(a) all collections of n - 1 events chosen from A1, A2, ..., An are mutually inde-
pendent,
(b) P[A1 ∩ A2 ∩ ··· ∩ An] = P[A1] P[A2] ··· P[An].

This definition and Example 1.23 show us that when n > 2 it is a complex matter
to determine whether or not n events are mutually independent. On the other
hand, if we know that n events are mutually independent, it is a simple matter to
determine the probability of the intersection of any subset of the n events. Just
multiply the probabilities of the events in the subset.

Quiz 1.6
Monitor two consecutive packets going through a router. Classify each one as video
(v) if it was sent from a YouTube server or as ordinary data (d) otherwise. Your
observation is a sequence of two letters (either v or d). For example, two video
packets corresponds to vv. The two packets are independent and the probability
that any one of them is a video packet is 0.8. Denote the identity of packet i by Ci.
If packet i is a video packet, then Ci = v; otherwise, Ci = d. Count the number
Nv of video packets in the two packets you have observed. Determine whether the
following pairs of events are independent:
(a) {Nv = 2} and {Nv ≥ 1} (b) {Nv ≥ 1} and {C1 = v}
(c) {C2 = v} and {C1 = d} (d) {C2 = v} and {Nv is even}

1.7 MATLAB

The MATLAB programming environment can be used for studying
probability models by performing numerical calculations, simulat-
ing experiments, and drawing graphs. Simulations make extensive
use of the MATLAB random number generator rand. In addition
to introducing aspects of probability theory, each chapter of this
book concludes with a section that uses MATLAB to demonstrate
with numerical examples the concepts presented in the chapter.
All of the MATLAB programs in this book can be downloaded from
the companion website. On the other hand, the MATLAB sections
are not essential to understanding the theory. You can use this
text to learn probability without using MATLAB.
Engineers studied and applied probability theory long before the invention of MAT-
LAB. Nevertheless, MATLAB provides a convenient programming environment for
solving probability problems and for building models of probabilistic systems. Ver-
sions of MATLAB, including a low-cost student edition, are available for most com-
puter systems.
At the end of each chapter, we include a MATLAB section (like this one) that
introduces ways that MATLAB can be applied to the concepts and problems of the
chapter. We assume you already have some familiarity with the basics of running
MATLAB. If you do not, we encourage you to investigate the built-in tutorial, books
dedicated to MATLAB, and various Web resources.


MATLAB can be used two ways to study and apply probability theory. Like a
sophisticated scientific calculator, it can perform complex numerical calculations
and draw graphs. It can also simulate experiments with random outcomes. To
simulate experiments, we need a source of randomness. MATLAB uses a computer
algorithm, referred to as a pseudorandom number generator, to produce a sequence
of numbers between 0 and 1. Unless someone knows the algorithm, it is impossible
to examine some of the numbers in the sequence and thereby calculate others.
The calculation of each random number is similar to an experiment in which all
outcomes are equally likely and the sample space is all binary numbers of a certain
length. (The length depends on the machine running MATLAB.) Each number
is interpreted as a fraction, with a binary point preceding the bits in the binary
number. To use the pseudorandom number generator to simulate an experiment
that contains an event with probability p, we examine one number, r, produced by
the MATLAB algorithm and say that the event occurs if r < p; otherwise it does
not occur.
A MATLAB simulation of an experiment starts with rand: the random number
generator rand(m,n) returns an m × n array of pseudorandom numbers. Similarly,
rand(n) produces an n × n array and rand(1) is just a scalar random number.
Each number produced by rand(1) is in the interval (0, 1). Each time we use rand,
we get new, seemingly unpredictable numbers. Suppose p is a number between 0
and 1. The comparison rand(1) < p produces a 1 if the random number is less
than p; otherwise it produces a zero. Roughly speaking, the function rand(1) < p
simulates a coin flip with P[tail] = p.

Example 1.24
>> X=rand(1,4)
X =
    0.0879    0.9626    0.6627    0.2023
>> X<0.5
ans =
     1     0     0     1

Since rand(1,4) < 0.5 compares four random numbers against 0.5, the result is a
random sequence of zeros and ones that simulates a sequence of four flips of a fair
coin. We associate the outcome 1 with {head} and 0 with {tail}.

MATLAB also has some convenient variations on rand. For example, randi(k)
generates a random integer from the set {1, 2, ..., k} and randi(k,m,n) generates
an m × n array of such random integers.

Example 1.25
Use MATLAB to generate 12 random student test scores T as described in Quiz 1.3.

Since randi(50,1,12) generates 12 test scores from the set {1, ..., 50}, we need
only to add 50 to each score to obtain test scores in the range {51, ..., 100}.
>> 50+randi(50,1,12)
ans =
69 78 60 68 93 99 77 95 88 57 51 90


Finally, note that MATLAB's random numbers are only seemingly unpredictable.
In fact, MATLAB maintains a seed value that determines the subsequent "random"
numbers that will be returned. This seed is controlled by the rng function; s=rng
saves the current seed and rng(s) restores a previously saved seed. Initializing the
random number generator with the same seed always generates the same sequence:

Example 1.26
>> s=rng;
>> 50+randi(50,1,12)
ans =
89 76 80 80 72 92 58 56 77 78 59 58
>> rng(s);
>> 50+randi(50,1,12)
ans =
89 76 80 80 72 92 58 56 77 78 59 58

When you run a simulation that uses rand, it normally doesn't matter how the
rng seed is initialized. However, it can be instructive to use the same repeatable
sequence of rand values when you are debugging your simulation.

Quiz 1.7
The number of characters in a tweet is equally likely to be any integer between 1
and 140. Simulate an experiment that generates 1000 tweets and counts the number
of "long" tweets that have over 120 characters. Repeat this experiment 5 times.

Difficulty: Easy · Moderate · Difficult · Experts Only

1.1.1 Continuing Quiz 1.1, write Gerlanda's entire menu in words (supply prices
if you wish).

1.1.2 For Gerlanda's pizza in Quiz 1.1, answer these questions:
(a) Are N and M mutually exclusive?
(b) Are N, T, and M collectively exhaustive?
(c) Are T and O mutually exclusive? State this condition in words.
(d) Does Gerlanda's make Tuscan pizzas with mushrooms and onions?
(e) Does Gerlanda's make Neapolitan pizzas that have neither mushrooms nor
onions?

1.1.3 Ricardo's offers customers two kinds of pizza crust, Roman (R) and Neapolitan
(N). All pizzas have cheese but not all pizzas have tomato sauce. Roman pizzas can
have tomato sauce or they can be white (W); Neapolitan pizzas always have tomato
sauce. It is possible to order a Roman pizza with mushrooms (M) added. A Neapolitan
pizza can contain mushrooms or onions (O) or both, in addition to the tomato sauce
and cheese. Draw a Venn diagram that shows the relationship among the ingredients
N, M, O, T, and W in the menu of Ricardo's.

1.2.1 A hypothetical wi-fi transmission can take place at any of three speeds
depending on the condition of the radio channel between a laptop and an access
point. The speeds are high (h) at 54 Mb/s, medium (m) at 11 Mb/s, and low (l) at
1 Mb/s. A user of the wi-fi connection can transmit a short signal corresponding to
a mouse click (c), or a long signal corresponding to a tweet (t). Consider the
experiment of monitoring wi-fi signals and observing the transmission speed and the
length. An observation is a two-letter word, for example, a high-speed, mouse-click
transmission is hm.
(a) What is the sample space of the experiment?
(b) Let A1 be the event "medium speed connection." What are the outcomes in A1?
(c) Let A2 be the event "mouse click." What are the outcomes in A2?
(d) Let A3 be the event "high speed connection or low speed connection." What are
the outcomes in A3?
(e) Are A1, A2, and A3 mutually exclusive?
(f) Are A1, A2, and A3 collectively exhaustive?

1.2.2 An integrated circuit factory has three machines X, Y, and Z. Test one
integrated circuit produced by each machine. Either a circuit is acceptable (a) or it
fails (f). An observation is a sequence of three test results corresponding to the
circuits from machines X, Y, and Z, respectively. For example, aaf is the observation
that the circuits from X and Y pass the test and the circuit from Z fails the test.
(a) What are the elements of the sample space of this experiment?
(b) What are the elements of the sets
    ZF = {circuit from Z fails},
    XA = {circuit from X is acceptable}.
(c) Are ZF and XA mutually exclusive?
(d) Are ZF and XA collectively exhaustive?
(e) What are the elements of the sets
    C = {more than one circuit acceptable},
    D = {at least two circuits fail}.
(f) Are C and D mutually exclusive?
(g) Are C and D collectively exhaustive?

1.2.3 Shuffle a deck of cards and turn over the first card. What is the sample space
of this experiment? How many outcomes are in the event that the first card is a
heart?

1.2.4 Find out the birthday (month and day but not year) of a randomly chosen
person. What is the sample space of the experiment? How many outcomes are in the
event that the person is born in July?

1.2.5 The sample space of an experiment consists of all undergraduates at a
university. Give four examples of partitions.

1.2.6 The sample space of an experiment consists of the measured resistances of two
resistors. Give four examples of partitions.

1.3.1 Find P[B] in each case:
(a) Events A and B are a partition and P[A] = 3 P[B].
(b) For events A and B, P[A ∪ B] = P[A] and P[A ∩ B] = 0.
(c) For events A and B, P[A ∪ B] = P[A] - P[B].

1.3.2 You roll two fair six-sided dice; one die is red, the other is white. Let Ri be the
event that the red die rolls i. Let Wj be the event that the white die rolls j.
(a) What is P[R3W2]?
(b) What is the probability P[S5] that the sum of the two rolls is 5?

1.3.3 You roll two fair six-sided dice. Find the probability P[D3] that the absolute
value of the difference of the dice is 3.

1.3.4 Indicate whether each statement is true or false.
(a) If P[A] = 2P[Ac], then P[A] = 1/2.
(b) For all A and B, P[AB] ≤ P[A] P[B].


(c) If P[A] < P[B], then P[AB] < P[B].
(d) If P[A ∩ B] = P[A], then P[A] ≥ P[B].

1.3.5 Computer programs are classified by the length of the source code and by the
execution time. Programs with more than 150 lines in the source code are big (B).
Programs with ≤ 150 lines are little (L). Fast programs (F) run in less than 0.1
seconds. Slow programs (W) require at least 0.1 seconds. Monitor a program
executed by a computer. Observe the length of the source code and the run time.
The probability model for this experiment contains the following information:
P[LF] = 0.5, P[BF] = 0.2, and P[BW] = 0.2. What is the sample space of the
experiment? Calculate the following probabilities: P[W], P[B], and P[W ∪ B].

1.3.6 There are two types of cellular phones, handheld phones (H) that you carry
and mobile phones (M) that are mounted in vehicles. Phone calls can be classified
by the traveling speed of the user as fast (F) or slow (W). Monitor a cellular phone
call and observe the type of telephone and the speed of the user. The probability
model for this experiment has the following information: P[F] = 0.5, P[HF] = 0.2,
P[MW] = 0.1. What is the sample space of the experiment? Find the following
probabilities P[W], P[MF], and P[H].

1.3.7 Shuffle a deck of cards and turn over the first card. What is the probability
that the first card is a heart?

1.3.8 You have a six-sided die that you roll once and observe the number of dots
facing upwards. What is the sample space? What is the probability of each sample
outcome? What is the probability of E, the event that the roll is even?

1.3.9 A student's score on a 10-point quiz is equally likely to be any integer between
0 and 10. What is the probability of an A, which requires the student to get a score
of 9 or more? What is the probability the student gets an F by getting less than 4?

1.3.10 Use Theorem 1.4 to prove the following facts:
(a) P[A ∪ B] ≥ P[A]
(b) P[A ∪ B] ≥ P[B]
(c) P[A ∩ B] ≤ P[A]
(d) P[A ∩ B] ≤ P[B]

1.3.11 Use Theorem 1.4 to prove by induction the union bound: For any collection
of events A1, ..., An,

P[A1 ∪ A2 ∪ ··· ∪ An] ≤ Σ_{i=1}^{n} P[Ai].

1.3.12 Using only the three axioms of probability, prove P[∅] = 0.

1.3.13 Using the three axioms of probability and the fact that P[∅] = 0, prove
Theorem 1.3. Hint: Define Ai = Bi for i = 1, ..., m and Ai = ∅ for i > m.

1.3.14 For each fact stated in Theorem 1.4, determine which of the three axioms of
probability are needed to prove the fact.

1.4.1 Mobile telephones perform handoffs as they move from cell to cell. During a
call, a telephone either performs zero handoffs (H0), one handoff (H1), or more than
one handoff (H2). In addition, each call is either long (L), if it lasts more than three
minutes, or brief (B). The following table describes the probabilities of the possible
types of calls.

         H0    H1    H2
    L   0.1   0.1   0.2
    B   0.4   0.1   0.1

(a) What is the probability that a brief call will have no handoffs?
(b) What is the probability that a call with one handoff will be long?
(c) What is the probability that a long call will have one or more handoffs?

1.4.2 You have a six-sided die that you roll once. Let Ri denote the event that


the roll is greater than j. Let E denote the event that the roll of the die is even-numbered.
(a) What is P[R3|G1], the conditional probability that 3 is rolled given that the roll is greater than 1?
(b) What is the conditional probability that 6 is rolled given that the roll is greater than 3?
(c) What is P[G3|E], the conditional probability that the roll is greater than 3 given that the roll is even?
(d) Given that the roll is greater than 3, what is the conditional probability that the roll is even?

1.4.3 You have a shuffled deck of three cards: 2, 3, and 4. You draw one card. Let Ci denote the event that card i is picked. Let E denote the event that the card chosen is an even-numbered card.
(a) What is P[C2|E], the probability that the 2 is picked given that an even-numbered card is chosen?
(b) What is the conditional probability that an even-numbered card is picked given that the 2 is picked?

1.4.4 Phonesmart is having a sale on Bananas. If you buy one Banana at full price, you get a second at half price. When couples come in to buy a pair of phones, sales of Apricots and Bananas are equally likely. Moreover, given that the first phone sold is a Banana, the second phone is twice as likely to be a Banana rather than an Apricot. What is the probability that a couple buys a pair of Bananas?

1.4.5 The basic rules of genetics were discovered in mid-1800s by Mendel, who found that each characteristic of a pea plant, such as whether the seeds were green or yellow, is determined by two genes, one from each parent. In his pea plants, Mendel found that yellow seeds were a dominant trait over green seeds. A yy pea with two yellow genes has yellow seeds; a gg pea with two recessive genes has green seeds; a hybrid gy or yg pea has yellow seeds. In one of Mendel's experiments, he started with a parental generation in which half the pea plants were yy and half the plants were gg. The two groups were crossbred so that each pea plant in the first generation was gy. In the second generation, each pea plant was equally likely to inherit a y or a g gene from each first-generation parent. What is the probability P[Y] that a randomly chosen pea plant in the second generation has yellow seeds?

1.4.6 From Problem 1.4.5, what is the conditional probability of yy, that a pea plant has two dominant genes given the event Y that it has yellow seeds?

1.4.7 You have a shuffled deck of three cards: 2, 3, and 4, and you deal out the three cards. Let Ei denote the event that the ith card dealt is even numbered.
(a) What is P[E2|E1], the probability the second card is even given that the first card is even?
(b) What is the conditional probability that the first two cards are even given that the third card is even?
(c) Let Oi represent the event that the ith card dealt is odd numbered. What is P[E2|O1], the conditional probability that the second card is even given that the first card is odd?
(d) What is the conditional probability that the second card is odd given that the first card is odd?

1.4.8 Deer ticks can carry both Lyme disease and human granulocytic ehrlichiosis (HGE). In a study of ticks in the Midwest, it was found that 16% carried Lyme disease, 10% had HGE, and that 10% of the ticks that had either Lyme disease or HGE carried both diseases.
(a) What is the probability P[LH] that a tick carries both Lyme disease (L) and HGE (H)?
(b) What is the conditional probability that a tick has HGE given that it has Lyme disease?


1.5.1 Given the model of handoffs and call lengths in Problem 1.4.1,
(a) What is the probability P[H0] that a phone makes no handoffs?
(b) What is the probability a call is brief?
(c) What is the probability a call is long or there are at least two handoffs?

1.5.2 For the telephone usage model of Example 1.18, let Bm denote the event that a call is billed for m minutes. To generate a phone bill, observe the duration of the call in integer minutes (rounding up). Charge for M minutes, M = 1, 2, 3, ..., if the exact duration T is M − 1 < T ≤ M. A more complete probability model shows that for m = 1, 2, ... the probability of each event Bm is

P[Bm] = α(1 − α)^(m−1),

where α = 1 − (0.57)^(1/3) = 0.171.
(a) Classify a call as long, L, if the call lasts more than three minutes. What is P[L]?
(b) What is the probability that a call will be billed for nine minutes or less?

1.5.3 Suppose a cellular telephone is equally likely to make zero handoffs (H0), one handoff (H1), or more than one handoff (H2). Also, a caller is either on foot (F) with probability 5/12 or in a vehicle (V).
(a) Given the preceding information, find three ways to fill in the following probability table:

         H0    H1    H2
    F
    V

(b) Suppose we also learn that 1/4 of all callers are on foot making calls with no handoffs and that 1/6 of all callers are vehicle users making calls with a single handoff. Given these additional facts, find all possible ways to fill in the table of probabilities.

1.6.1 Is it possible for A and B to be independent events yet satisfy A = B?

1.6.2 Events A and B are equiprobable, mutually exclusive, and independent. What is P[A]?

1.6.3 At a Phonesmart store, each phone sold is twice as likely to be an Apricot as a Banana. Also each phone sale is independent of any other phone sale. If you monitor the sale of two phones, what is the probability that the two phones sold are the same?

1.6.4 Use a Venn diagram in which the event areas are proportional to their probabilities to illustrate two events A and B that are independent.

1.6.5 In an experiment, A and B are mutually exclusive events with probabilities P[A] = 1/4 and P[B] = 1/8.
(a) Find P[A ∩ B], P[A ∪ B], P[A ∩ Bᶜ], and P[A ∪ Bᶜ].
(b) Are A and B independent?

1.6.6 In an experiment, C and D are independent events with probabilities P[C] = 5/8 and P[D] = 3/8.
(a) Determine the probabilities P[C ∩ D], P[C ∩ Dᶜ], and P[Cᶜ ∩ Dᶜ].
(b) Are Cᶜ and Dᶜ independent?

1.6.7 In an experiment, A and B are mutually exclusive events with probabilities P[A ∪ B] = 5/8 and P[A] = 3/8.
(a) Find P[B], P[A ∩ Bᶜ], and P[A ∪ Bᶜ].
(b) Are A and B independent?

1.6.8 In an experiment, C and D are independent events with probabilities P[C ∩ D] = 1/3 and P[C] = 1/2.
(a) Find P[D], P[C ∩ Dᶜ], and P[Cᶜ ∩ Dᶜ].
(b) Find P[C ∪ D] and P[C ∪ Dᶜ].
(c) Are C and Dᶜ independent?

1.6.9 In an experiment with equiprobable outcomes, the sample space is S = {1, 2, 3, 4} and P[s] = 1/4 for all s ∈ S. Find three events in S that are pairwise independent but are not independent. (Note:


Pairwise independent events meet the first three conditions of Definition 1.7).

1.6.10 (Continuation of Problem 1.4.5) One of Mendel's most significant results was the conclusion that genes determining different characteristics are transmitted independently. In pea plants, Mendel found that round peas (r) are a dominant trait over wrinkled peas (w). Mendel crossbred a group of (rr, yy) peas with a group of (ww, gg) peas. In this notation, rr denotes a pea with two "round" genes and ww denotes a pea with two "wrinkled" genes. The first generation were either (rw, yg), (rw, gy), (wr, yg), or (wr, gy) plants with both hybrid shape and hybrid color. Breeding among the first generation yielded second-generation plants in which genes for each characteristic were equally likely to be either dominant or recessive. What is the probability P[Y] that a second-generation pea plant has yellow seeds? What is the probability P[R] that a second-generation plant has round peas? Are R and Y independent events? How many visibly different kinds of pea plants would Mendel observe in the second generation? What are the probabilities of each of these kinds?

1.6.11 For independent events A and B, prove that
(a) A and Bᶜ are independent.
(b) Aᶜ and B are independent.
(c) Aᶜ and Bᶜ are independent.

1.6.12 Use a Venn diagram in which the event areas are proportional to their probabilities to illustrate three events A, B, and C that are independent.

1.6.13 Use a Venn diagram in which event areas are in proportion to their probabilities to illustrate events A, B, and C that are pairwise independent but not independent.

1.7.1 Following Quiz 1.3, use MATLAB, but not the randi function, to generate a vector T of 200 independent test scores such that all scores between 51 and 100 are equally likely.

Sequential Experiments

Many applications of probability refer to sequential experiments in which the procedure consists of many actions performed in sequence, with an observation taken after each action. Each action in the procedure together with the outcome associated with it can be viewed as a separate experiment with its own probability model. In analyzing sequential experiments we refer to the separate experiments in the sequence as subexperiments.

2.1 Tree Diagrams

Tree diagrams display the outcomes of the subexperiments in a
sequential experiment. The labels of the branches are probabilities
and conditional probabilities. The probability of an outcome of the
entire experiment is the product of the probabilities of branches
going from the root of the tree to a leaf.

Many experiments consist of a sequence of subexperiments. The procedure followed for each subexperiment may depend on the results of the previous subexperiments. We often find it useful to use a type of graph referred to as a tree diagram to represent the sequence of subexperiments. To do so, we assemble the outcomes of each subexperiment into sets in a partition. Starting at the root of the tree,¹ we represent each event in the partition of the first subexperiment as a branch and we label the branch with the probability of the event. Each branch leads to a node. The events in the partition of the second subexperiment appear as branches growing from every node at the end of the first subexperiment. The labels of the branches

¹ Unlike biological trees, which grow from the ground up, probability trees usually grow from left to right. Some of them have their roots on top and leaves on the bottom.



of the second subexperiment are the conditional probabilities of the events in the second subexperiment. We continue the procedure taking the remaining subexperiments in order. The nodes at the end of the final subexperiment are the leaves of the tree. Each leaf corresponds to an outcome of the entire sequential experiment. The probability of each outcome is the product of the probabilities and conditional probabilities on the path from the root to the leaf. We usually label each leaf with a name for the event and the probability of the event.
This is a complicated description of a simple procedure as seen in the following five examples.

Example 2.1
For the resistors of Example 1.19, we used A to denote the event that a randomly chosen resistor is "within 50 Ω of the nominal value." This could mean "acceptable." We use the notation N ("not acceptable") for the complement of A. The experiment of testing a resistor can be viewed as a two-step procedure. First we identify which machine (B1, B2, or B3) produced the resistor. Second, we find out if the resistor is acceptable. Draw a tree for this sequential experiment. What is the probability of choosing a resistor from machine B2 that is not acceptable?

[Tree diagram: from the root, branches B1 (0.3), B2 (0.4), and B3 (0.3); from each Bi, branches A and N with conditional probabilities P[A|B1] = 0.8, P[A|B2] = 0.9, P[A|B3] = 0.6, giving leaf probabilities B1A = 0.24, B1N = 0.06, B2A = 0.36, B2N = 0.04, B3A = 0.18, B3N = 0.12.]

This two-step procedure is shown in the tree on the left. To use the tree to find the probability of the event B2N, a nonacceptable resistor from machine B2, we start at the left and find that the probability of reaching B2 is P[B2] = 0.4. We then move to the right to B2N and multiply P[B2] by P[N|B2] = 0.1 to obtain P[B2N] = (0.4)(0.1) = 0.04.

We observe in this example a general property of all tree diagrams that represent sequential experiments. The probabilities on the branches leaving any node add up to 1. This is a consequence of the law of total probability and the property of conditional probabilities that corresponds to Axiom 3 (Theorem 1.7). Moreover, Axiom 2 implies that the probabilities of all of the leaves add up to 1.
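Both properties are easy to verify numerically. The sketch below is our own addition (the variable names are invented, and it assumes the probabilities of Example 1.19: P[B1] = 0.3, P[B2] = 0.4, P[B3] = 0.3 and P[A|B1] = 0.8, P[A|B2] = 0.9, P[A|B3] = 0.6). It multiplies branch probabilities along each root-to-leaf path:

```python
# Branch probabilities of the resistor tree (Example 2.1),
# taken from Example 1.19.
p_machine = {"B1": 0.3, "B2": 0.4, "B3": 0.3}   # P[Bi]
p_accept = {"B1": 0.8, "B2": 0.9, "B3": 0.6}    # P[A|Bi]

# Each leaf probability is the product of the probabilities on the
# path from the root to that leaf.
leaves = {}
for b, pb in p_machine.items():
    leaves[b + "A"] = pb * p_accept[b]          # P[Bi A] = P[Bi] P[A|Bi]
    leaves[b + "N"] = pb * (1 - p_accept[b])    # P[Bi N] = P[Bi] P[N|Bi]

print(leaves["B2N"])           # P[B2N] = (0.4)(0.1) = 0.04
print(sum(leaves.values()))    # the leaf probabilities add up to 1
```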

Example 2.2
Traffic engineers have coordinated the timing of two traffic lights to encourage a run of green lights. In particular, the timing was designed so that with probability 0.8 a driver will find the second light to have the same color as the first. Assuming the first light is equally likely to be red or green, what is the probability P[G2] that the second light is green? Also, what is P[W], the probability that you wait for at least one of the first two lights? Lastly, what is P[G1|R2], the conditional probability of a green first light given a red second light?


[Tree diagram: first light G1 (0.5) or R1 (0.5); the second light matches the first with probability 0.8, giving leaf probabilities G1G2 = 0.4, G1R2 = 0.1, R1G2 = 0.1, R1R2 = 0.4.]

The tree for the two-light experiment is shown on the left. The probability that the second light is green is

P[G2] = P[G1G2] + P[R1G2] = 0.4 + 0.1 = 0.5.   (2.1)

The event W that you wait for at least one light is the event that at least one light is red.

The probability that you wait for at least one light is

P[W] = P[R1G2] + P[G1R2] + P[R1R2] = 0.1 + 0.1 + 0.4 = 0.6.

An alternative way to the same answer is to observe that W is also the complement of the event that both lights are green. Thus,

P[W] = 1 − P[G1G2] = 1 − 0.4 = 0.6.

To find P[G1|R2], we need P[R2] = 1 − P[G2] = 0.5. Since P[G1R2] = 0.1, the conditional probability that you have a green first light given a red second light is

P[G1|R2] = P[G1R2]/P[R2] = 0.1/0.5 = 0.2.
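All three answers can be checked mechanically from the four leaf probabilities. This short sketch is our own (the variable names are invented, not from the text):

```python
# Leaf probabilities of the two-light tree: P[G1] = P[R1] = 0.5, and
# the second light matches the first with probability 0.8.
p = {"G1G2": 0.5 * 0.8, "G1R2": 0.5 * 0.2,
     "R1G2": 0.5 * 0.2, "R1R2": 0.5 * 0.8}

P_G2 = p["G1G2"] + p["R1G2"]                         # P[G2] = 0.5
P_W = 1 - p["G1G2"]                                  # complement of "both green"
P_G1_given_R2 = p["G1R2"] / (p["G1R2"] + p["R1R2"])  # P[G1|R2] = 0.2
print(P_G2, P_W, P_G1_given_R2)
```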


Example 2.3
Suppose you have two coins, one biased, one fair, but you don't know which coin is which. Coin 1 is biased. It comes up heads with probability 3/4, while coin 2 comes up heads with probability 1/2. Suppose you pick a coin at random and flip it. Let Ci denote the event that coin i is picked. Let H and T denote the possible outcomes of the flip. Given that the outcome of the flip is a head, what is P[C1|H], the probability that you picked the biased coin? Given that the outcome is a tail, what is the probability P[C1|T] that you picked the biased coin?

First, we construct the sample tree shown on the left. [Tree diagram: C1 (1/2) or C2 (1/2); given C1, H (3/4) or T (1/4); given C2, H (1/2) or T (1/2); leaf probabilities C1H = 3/8, C1T = 1/8, C2H = 1/4, C2T = 1/4.] To find the conditional probabilities, we see

P[C1|H] = P[C1H]/P[H] = P[C1H]/(P[C1H] + P[C2H]).

From the leaf probabilities in the sample tree,

P[C1|H] = (3/8)/(3/8 + 1/4) = 3/5.

Similarly,

P[C1|T] = P[C1T]/(P[C1T] + P[C2T]) = (1/8)/(1/8 + 1/4) = 1/3.

As we would expect, we are more likely to have chosen coin 1 when the first flip is heads, but we are more likely to have chosen coin 2 when the first flip is tails.
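The same computation can be done exactly with rational arithmetic. This sketch is our own addition (not from the text) and uses Python's fractions module to avoid rounding:

```python
from fractions import Fraction as F

# Leaf probabilities of the two-coin tree in Example 2.3.
p_C1H = F(1, 2) * F(3, 4)   # P[C1] P[H|C1] = 3/8
p_C1T = F(1, 2) * F(1, 4)   # P[C1] P[T|C1] = 1/8
p_C2H = F(1, 2) * F(1, 2)   # P[C2] P[H|C2] = 1/4
p_C2T = F(1, 2) * F(1, 2)   # P[C2] P[T|C2] = 1/4

# Bayes' rule: divide a leaf probability by the total probability
# of the observed outcome.
P_C1_given_H = p_C1H / (p_C1H + p_C2H)   # 3/5
P_C1_given_T = p_C1T / (p_C1T + p_C2T)   # 1/3
print(P_C1_given_H, P_C1_given_T)
```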

The next example is the "Monty Hall" game, a famous problem with a solution that many regard as counterintuitive. Tree diagrams provide a clear explanation of the answer.

Example 2.4   Monty Hall

In the Monty Hall game, a new car is hidden behind one of three closed doors while a goat is hidden behind each of the other two doors. Your goal is to select the door that hides the car. You make a preliminary selection and then a final selection. The game proceeds as follows:
You select a door.
The host, Monty Hall (who knows where the car is hidden), opens one of the two doors you didn't select to reveal a goat.
Monty then asks you if you would like to switch your selection to the other unopened door.
After you make your choice (either staying with your original door, or switching doors), Monty reveals the prize behind your chosen door.
To maximize your probability P[C] of winning the car, is switching to the other door either (a) a good idea, (b) a bad idea or (c) makes no difference?
To solve this problem, we will consider the "switch" and "do not switch" policies separately. That is, we will construct two different tree diagrams: The first describes what happens if you switch doors while the second describes what happens if you do not switch.
First we describe what is the same no matter what policy you follow. Suppose the doors are numbered 1, 2, and 3. Let Hi denote the event that the car is hidden behind door i. Also, let's assume you first choose door 1. (Whatever door you do choose, that door can be labeled door 1 and it would not change your probability of winning.) Now let Ri denote the event that Monty opens door i that hides a goat. If the car is behind door 1, Monty can choose to open door 2 or door 3 because both hide goats. He chooses door 2 or door 3 by flipping a fair coin. If the car is behind door 2, Monty opens door 3, and if the car is behind door 3, Monty opens door 2. Let C denote the event that you win the car and G the event that you win a goat. After Monty opens one of the doors, you decide whether to change your choice or stay with your choice of door 1. Finally, Monty opens the door of your final choice, either door 1 or the door you switched to.
The tree diagram in Figure 2.1(a) applies to the situation in which you change your choice. From this tree we learn that when the car is behind door 1 (event H1) and


[Figure 2.1: two tree diagrams. (a) Switch: H1 (1/3) branches to R2 (1/2) and R3 (1/2), each leading to G with leaf probability 1/6; H2 (1/3) leads to R3 and then C with probability 1/3; H3 (1/3) leads to R2 and then C with probability 1/3. (b) Do Not Switch: the same branch structure with C and G interchanged: the H1 leaves are C (1/6 each) and the H2 and H3 leaves are G (1/3 each).]

Figure 2.1   Tree Diagrams for Monty Hall

Monty opens door 2 (event R2), you switch to door 3 and then Monty opens door 3 to reveal a goat (event G). On the other hand, if the car is behind door 2, Monty reveals the goat behind door 3 and you switch to door 2 and win the car. Similarly, if the car is behind door 3, Monty reveals the goat behind door 2 and you switch to door 3 and win the car. For always switch, we see that

P[C] = P[H2R3C] + P[H3R2C] = 1/3 + 1/3 = 2/3.

If you do not switch, the tree is shown in Figure 2.1(b). In this tree, when the car is behind door 1 (event H1) and Monty opens door 2 (event R2), you stay with door 1 and then Monty opens door 1 to reveal the car. On the other hand, if the car is behind door 2, Monty will open door 3 to reveal the goat. Since your final choice was door 1, Monty opens door 1 to reveal the goat. For do not switch,

P[C] = P[H1R2C] + P[H1R3C] = 1/6 + 1/6 = 1/3.

Thus switching is better; if you don't switch, you win the car only if you initially guessed the location of the car correctly, an event that occurs with probability 1/3. If you switch, you win the car when your initial guess was wrong, an event with probability 2/3.
Note that the two trees look largely the same because the key step where you make a choice is somewhat hidden because it is implied by the first two branches followed in the tree.
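A Monte Carlo simulation is a quick sanity check on the tree analysis. The simulation below is entirely our own sketch (the function and variable names are invented): you always pick door 1, and Monty opens a goat door at random.

```python
import random

def play(switch, rng):
    """Play one Monty Hall game; return True if you win the car."""
    car = rng.randrange(3)       # door hiding the car (0, 1, or 2)
    pick = 0                     # you always choose door 1 (index 0)
    goats = [d for d in range(3) if d != pick and d != car]
    opened = rng.choice(goats)   # Monty reveals a goat
    if switch:
        pick = next(d for d in range(3) if d not in (pick, opened))
    return pick == car

rng = random.Random(2014)
n = 100_000
p_switch = sum(play(True, rng) for _ in range(n)) / n   # close to 2/3
p_stay = sum(play(False, rng) for _ in range(n)) / n    # close to 1/3
print(p_switch, p_stay)
```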

Quiz 2.1
In a cellular phone system, a mobile phone must be paged to receive a phone call. However, paging attempts don't always succeed because the mobile phone may not receive the paging signal clearly. Consequently, the system will page a phone up to three times before giving up. If the results of all paging attempts are independent and a single paging attempt succeeds with probability 0.8, sketch a probability tree for this experiment and find the probability P[F] that the phone receives the paging signal clearly.


2.2 Counting Methods

In all applications of probability theory it is important to understand the sample space of an experiment. The methods in this section determine the number of outcomes in the sample space of a sequential experiment.

Understanding the sample space is a key step in formulating and solving a probability problem. To begin, it is often useful to know the number of outcomes in the sample space. This number can be enormous as in the following simple example.

Example 2.5
Choose 7 cards at random from a deck of 52 different cards. Display the cards in the order in which you choose them. How many different sequences of cards are possible?

The procedure consists of seven subexperiments. In each subexperiment, the observation is the identity of one card. The first subexperiment has 52 possible outcomes corresponding to the 52 cards that could be drawn. For each outcome of the first subexperiment, the second subexperiment has 51 possible outcomes corresponding to the 51 remaining cards. Therefore there are 52 × 51 outcomes of the first two subexperiments. The total number of outcomes of the seven subexperiments is

52 × 51 × ⋯ × 46 = 674,274,182,400.   (2.8)

Although many practical experiments are more complicated, the techniques for determining the size of a sample space all follow from the fundamental principle of counting in Theorem 2.1:

Theorem 2.1
An experiment consists of two subexperiments. If one subexperiment has k outcomes and the other subexperiment has n outcomes, then the experiment has nk outcomes.

Example 2.6
There are two subexperiments. The first subexperiment is "Flip a coin and observe either heads H or tails T." The second subexperiment is "Roll a six-sided die and observe the number of spots." It has six outcomes, 1, 2, ..., 6. The experiment, "Flip a coin and roll a die," has 2 × 6 = 12 outcomes:

(H, 1), (H, 2), (H, 3), (H, 4), (H, 5), (H, 6),
(T, 1), (T, 2), (T, 3), (T, 4), (T, 5), (T, 6).

Generally, if an experiment E has k subexperiments E1, ..., Ek where Ei has ni outcomes, then E has n1 × n2 × ⋯ × nk outcomes.
In Example 2.5, we chose an ordered sequence of seven objects out of a set of 52 distinguishable objects. In general, an ordered sequence of k distinguishable objects is called a k-permutation. We will use the notation (n)k to denote the number of possible k-permutations of n distinguishable objects. To find (n)k, suppose we have n distinguishable objects, and the experiment is to choose a sequence of k of these objects. There are n choices for the first object, n − 1 choices for the second object, etc. Therefore, the total number of possibilities is

(n)k = n(n − 1)(n − 2) ⋯ (n − k + 1).   (2.9)

Multiplying the right side by (n − k)!/(n − k)! yields our next theorem.

Theorem 2.2
The number of k-permutations of n distinguishable objects is

(n)k = n(n − 1)(n − 2) ⋯ (n − k + 1) = n!/(n − k)!.
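Theorem 2.2 can be checked directly in code. The sketch below is our own (the helper name falling is invented); it compares the product form with the factorial form and with the standard library's math.perm:

```python
from math import factorial, perm

def falling(n, k):
    """(n)_k = n(n-1)...(n-k+1), the number of k-permutations."""
    result = 1
    for i in range(k):
        result *= n - i
    return result

# Both forms of Theorem 2.2 agree, and match Example 2.5.
assert falling(52, 7) == factorial(52) // factorial(52 - 7) == perm(52, 7)
print(falling(52, 7))   # 674,274,182,400
```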

Sampling without Replacement

Sampling without replacement corresponds to a sequential experiment in which the sample space of each subexperiment depends on the outcomes of previous subexperiments. Choosing objects randomly from a collection is called sampling, and the chosen objects are known as a sample. A k-permutation is a type of sample obtained by specific rules for selecting objects from the collection. In particular, once we choose an object for a k-permutation, we remove the object from the collection and we cannot choose it again. Consequently, this procedure is called sampling without replacement.
Different outcomes in a k-permutation are distinguished by the order in which objects arrive in a sample. By contrast, in many practical problems, we are concerned only with the identity of the objects in a sample, not their order. For example, in many card games, only the set of cards received by a player is of interest. The order in which they arrive is irrelevant.

Example 2.7
Suppose there are four objects, A, B, C, and D, and we define an experiment in which the procedure is to choose two objects without replacement, arrange them in alphabetical order, and observe the result. In this case, to observe AD we could choose A first or D first or both A and D simultaneously. The possible outcomes of the experiment are AB, AC, AD, BC, BD, and CD.

In contrast to this example with six outcomes, the next example shows that the k-permutation corresponding to an experiment in which the observation is the sequence of two letters has 4!/2! = 12 outcomes.


Example 2.8
Suppose there are four objects, A, B, C, and D, and we define an experiment in which the procedure is to choose two objects without replacement and observe the result. The 12 possible outcomes of the experiment are AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, and DC.

In Example 2.7, each outcome is a subset of the outcomes of a k-permutation. Each subset is called a k-combination. We want to find the number of k-combinations. We use the notation $\binom{n}{k}$ to denote the number of k-combinations. The words for this number are "n choose k," the number of k-combinations of n objects. To find $\binom{n}{k}$, we perform the following two subexperiments to assemble a k-permutation of n distinguishable objects:
1. Choose a k-combination out of the n objects.
2. Choose a k-permutation of the k objects in the k-combination.
Theorem 2.2 tells us that the number of outcomes of the combined experiment is (n)k. The first subexperiment has $\binom{n}{k}$ possible outcomes, the number we have to derive. By Theorem 2.2, the second experiment has (k)k = k! possible outcomes. Since there are (n)k possible outcomes of the combined experiment,

(n)k = $\binom{n}{k}$ k!.   (2.10)

Rearranging the terms yields our next result.

Theorem 2.3
The number of ways to choose k objects out of n distinguishable objects is

$\binom{n}{k} = \frac{(n)_k}{k!} = \frac{n!}{k!(n-k)!}$.

We encounter $\binom{n}{k}$ in other mathematical studies. Sometimes it is called a binomial coefficient because it appears (as the coefficient of x^k y^(n−k)) in the expansion of the binomial (x + y)^n. In addition, we observe that

$\binom{n}{k} = \binom{n}{n-k}$.   (2.11)

The logic behind this identity is that choosing k out of n elements to be part of a subset is equivalent to choosing n − k elements to be excluded from the subset.
In most contexts, $\binom{n}{k}$ is undefined except for integers n and k with 0 ≤ k ≤ n. Here, we adopt the following definition that applies to all nonnegative integers n and all real numbers k:

Definition 2.1   n choose k

For an integer n ≥ 0, we define

$\binom{n}{k}$ = n!/(k!(n − k)!)   for k = 0, 1, ..., n,
              = 0                 otherwise.

This definition captures the intuition that given, say, n = 33 objects, there are zero ways of choosing k = −5 objects, zero ways of choosing k = 8.7 objects, and zero ways of choosing k = 87 objects. Although this extended definition may seem unnecessary, and perhaps even silly, it will make many formulas in later chapters more concise and easier for students to grasp.

Example 2.9
The number of combinations of seven cards chosen from a deck of 52 cards is

$\binom{52}{7} = \frac{52 \times 51 \times \cdots \times 46}{2 \times 3 \times \cdots \times 7}$ = 133,784,560,   (2.12)

which is the number of 7-combinations of 52 objects. By contrast, we found in Example 2.5 674,274,182,400 7-permutations of 52 objects. (The ratio is 7! = 5040.)
There are 11 players on a basketball team. The starting lineup consists of five players. There are $\binom{11}{5}$ = 462 possible starting lineups.
There are $\binom{120}{60}$ ≈ 10^35 ways of dividing 120 students enrolled in a probability course into two sections with 60 students in each section.
A baseball team has 15 field players and ten pitchers. Each field player can take any of the eight nonpitching positions. The starting lineup consists of one pitcher and eight field players. Therefore, the number of possible starting lineups is N = $\binom{10}{1}\binom{15}{8}$ = 64,350. For each choice of starting lineup, the manager must submit to the umpire a batting order for the 9 starters. The number of possible batting orders is N × 9! = 23,351,328,000 since there are N ways to choose the 9 starters, and for each choice of 9 starters, there are 9! = 362,880 possible batting orders.
Example 2.10
There are four queens in a deck of 52 cards. You are given seven cards at random from the deck. What is the probability that you have no queens?

Consider an experiment in which the procedure is to select seven cards at random from a set of 52 cards and the observation is to determine if there are one or more queens in the selection. The sample space contains H = $\binom{52}{7}$ possible combinations of seven cards, each with probability 1/H. There are H_NQ = $\binom{52-4}{7}$ combinations with no queens. The probability of receiving no queens is the ratio of the number of outcomes with no queens to the number of outcomes in the sample space: H_NQ/H = 0.5504.
Another way of analyzing this experiment is to consider it as a sequence of seven subexperiments. The first subexperiment consists of selecting a card at random and observing whether it is a queen. If it is a queen, an outcome with probability 4/52 (because there are 52 outcomes in the sample space and four of them are in the event {queen}), stop looking for queens. Otherwise, with probability 48/52, select another card from the remaining 51 cards and observe whether it is a queen. The queen outcome of this subexperiment has probability 4/51. If the second card is not a queen, an outcome with probability 47/51, continue until you select a queen or you have seven cards with no queen. Using Qi and Ni to indicate a "Queen" or "No queen" on subexperiment i, the tree for this experiment is

[Tree diagram: N1 (48/52) → N2 (47/51) → ⋯ → N7 (42/46), with a Qi branch leaving each node.]

The probability of the event N7 that no queen is received in your seven cards is the product of the probabilities of the branches leading to N7:

(48/52) × (47/51) × ⋯ × (42/46) = 0.5504.   (2.13)
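Both analyses of Example 2.10 can be reproduced exactly; the sketch below is our own and uses rational arithmetic to show the two answers agree:

```python
from fractions import Fraction as F
from math import comb

# Counting argument: C(48, 7) no-queen hands out of C(52, 7) hands.
p_count = F(comb(48, 7), comb(52, 7))

# Tree argument: multiply the no-queen branch probabilities.
p_tree = F(1)
for i in range(7):
    p_tree *= F(48 - i, 52 - i)   # (48/52)(47/51)...(42/46)

assert p_count == p_tree
print(float(p_count))   # 0.5504 to four decimal places
```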

Sampling with Replacement

Consider selecting an object from a collection of objects, replacing the selected object, and repeating the process several times, each time replacing the selected object before making another selection. We refer to this situation as sampling with replacement. Each selection is the procedure of a subexperiment. The subexperiments are referred to as independent trials. In this section we consider the number of possible outcomes that result from sampling with replacement. In the next section we derive probability models for experiments that specify sampling with replacement.

Example 2.11
There are four queens in a deck of 52 cards. You are given seven cards at random from the deck. After receiving each card you return it to the deck and receive another card at random. Observe whether you have not received any queens among the seven cards you were given. What is the probability that you have received no queens?

The sample space contains 52^7 outcomes. There are 48^7 outcomes with no queens. The ratio is (48/52)^7 = 0.5710, the probability of receiving no queens. If this experiment is considered as a sequence of seven subexperiments, the tree looks the same as the tree in Example 2.10, except that all the horizontal branches have probability 48/52 and all the diagonal branches have probability 4/52.
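With replacement, the seven draws are independent, so the answer in Example 2.11 reduces to a single power; this one-line check is our own addition:

```python
# P[no queen in 7 draws with replacement] = (48/52)**7
p = (48 / 52) ** 7
print(p)   # approximately 0.5710
```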

Example 2.12
A laptop computer has USB slots A and B. Each slot can be used for connecting a memory card (m), a camera (c) or a printer (p). It is possible to connect two memory cards, two cameras, or two printers to the laptop. How many ways can we use the two USB slots?

This example corresponds to sampling two times with replacement from the set {m, c, p}. Let xy denote the outcome that device type x is used in slot A and device type y is used in slot B. The possible outcomes are S = {mm, mc, mp, cm, cc, cp, pm, pc, pp}. The sample space S contains nine outcomes.


The fact that Example 2.12 has nine possible outcomes should not be surprising.
Since we were sampling with replacement, there were always three possible out-
comes for each of the subexperiments to attach a device to a USB slot. Hence, by
the fundamental theorem of counting, Example 2.12 must have 3 x 3 = 9 possible
outcomes.
In Example 2.12, mc and cm are distinct outcomes. This result generalizes nat-
urally when we want to choose with replacement a sample of n objects out of a
collection of m distinguishable objects. The experiment consists of a sequence of n
identical subexperiments with m outcomes in the sample space of each subexperi-
ment. Hence there are m^n ways to choose with replacement a sample of n objects.

Theorem 2.4
Given m distinguishable objects, there are m^n ways to choose with replacement an
ordered sample of n objects.

Example 2.13
There are 2^10 = 1024 binary sequences of length 10.

Example 2.14
The letters A through Z can produce 26^4 = 456,976 four-letter words.
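The counts in Theorem 2.4 and the two examples above can be verified by brute-force enumeration. The sketch below is in Python rather than the text's MATLAB, and the helper name `count_with_replacement` is ours, not the text's:

```python
import itertools

# Theorem 2.4: with replacement, an ordered sample of n objects from m
# distinguishable objects can be chosen in m**n ways. Check by
# enumerating all length-n tuples over an m-symbol alphabet.
def count_with_replacement(symbols, n):
    return sum(1 for _ in itertools.product(symbols, repeat=n))

assert count_with_replacement("01", 10) == 2**10 == 1024   # Example 2.13
assert count_with_replacement("mcp", 2) == 3**2 == 9       # Example 2.12
assert 26**4 == 456976                                     # Example 2.14
```

Enumerating 26^4 four-letter words directly would also work, but the closed-form m^n is the point of the theorem.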

Sampling with replacement corresponds to performing n repetitions of an iden-
tical subexperiment. Using x_i to denote the outcome of the ith subexperiment, the
result for n repetitions of the subexperiment is a sequence x_1, ..., x_n.

Example 2.15

A chip fabrication facility produces microprocessors. Each microprocessor is tested to
determine whether it runs reliably at an acceptable clock speed. A subexperiment to
test a microprocessor has sample space S_sub = {0, 1} to indicate whether the test was
a failure (0) or a success (1). For test i, we record x_i = 0 or x_i = 1 to indicate the
result. In testing four microprocessors, the observation sequence, x_1 x_2 x_3 x_4, is one of
16 possible outcomes:

S = {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111,
     1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111}.

Note that we can think of the observation sequence x_1, ..., x_n as the result of
sampling with replacement n times from a sample space S_sub. For sequences of
identical subexperiments, we can express Theorem 2.4 as

Theorem 2.5
For n repetitions of a subexperiment with sample space S_sub = {s_0, ..., s_{m-1}}, the
sample space S of the sequential experiment has m^n outcomes.

Example 2.16
There are ten students in a probability class. Each earns a grade s in S_sub = {A, B, C, F}.
We use the notation x_i to denote the grade of the ith student. For example, the grades
for the class could be

x_1 x_2 ... x_10 = CBBACFBACF   (2.14)

The sample space S of possible sequences contains 4^10 = 1,048,576 outcomes.

In Example 2.12 and Example 2.16, repeating a subexperiment n times and record-
ing the observation consists of constructing a word with n letters. In general, n
repetitions of the same subexperiment consists of choosing symbols from the alpha-
bet {s_0, ..., s_{m-1}}. In Example 2.15, m = 2 and we have a binary alphabet with
symbols s_0 = 0 and s_1 = 1.
A more challenging problem than finding the number of possible combinations
of n objects sampled with replacement from a set of m objects is to calculate the
number of observation sequences such that each object appears a specified number
of times. We start with the case in which each subexperiment is a trial with sample
space S_sub = {0, 1} indicating failure or success.

Example 2.17
For five subexperiments with sample space S_sub = {0, 1}, what is the number of obser-
vation sequences in which 0 appears n_0 = 2 times and 1 appears n_1 = 3 times?

The 10 five-letter words with 0 appearing twice and 1 appearing three times are:

{00111, 01011, 01101, 01110, 10011, 10101, 10110, 11001, 11010, 11100}.
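The listing in Example 2.17 is easy to reproduce mechanically. A Python sketch (illustrative only; the text's own code examples use MATLAB) that enumerates the words and checks the count against the binomial coefficient C(5,3):

```python
import itertools
from math import comb

# Example 2.17 by brute force: count length-5 binary sequences in which
# 0 appears exactly twice (equivalently, 1 appears three times), and
# compare with C(5,3) = 10.
words = ["".join(w) for w in itertools.product("01", repeat=5)
         if w.count("0") == 2]
assert len(words) == comb(5, 3) == 10
```

For five trials the enumeration is trivial; the counting formulas below are what make larger cases tractable.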

Example 2.17 determines the number of outcomes in the sample space of an
experiment with five subexperiments by listing all of the outcomes. Even in this
simple example it is not a simple matter to determine all of the outcomes, and in
most practical applications of probability there are far more than ten outcomes in
the sample space of an experiment and listing them all is out of the question. On
the other hand, the counting methods covered in this chapter provide formulas for
quickly calculating the number of outcomes in a sample space.
In Example 2.17 each outcome corresponds to the position of three ones in a
five-letter binary word. That is, each outcome is completely specified by choosing
three positions that contain 1. There are C(5,3) = 10 ways to choose three positions
in a word. More generally, for length n binary words with n_1 1's, we choose C(n, n_1)
slots to hold a 1.


Theorem 2.6
The number of observation sequences for n subexperiments with sample space S =
{0, 1} with 0 appearing n_0 times and 1 appearing n_1 = n - n_0 times is C(n, n_1).
Theorem 2.6 can be generalized to subexperiments with m > 2 elements in
the sample space. For n trials of a subexperiment with sample space S_sub =
{s_0, ..., s_{m-1}}, we want to find the number of outcomes in which s_0 appears n_0
times, s_1 appears n_1 times, and so on. Of course, there are no such outcomes unless
n_0 + ... + n_{m-1} = n. The notation for the number of outcomes is

(n; n_0, ..., n_{m-1})

It is referred to as the multinomial coefficient. To derive a formula for the multi-
nomial coefficient, we generalize the logic used in deriving the formula for the bi-
nomial coefficient. With n subexperiments, representing the observation sequence
by n slots, we first choose n_0 positions in the observation sequence to hold s_0, then
n_1 positions to hold s_1, and so on. The details can be found in the proof of the
following theorem:

Theorem 2.7
For n repetitions of a subexperiment with sample space S = {s_0, ..., s_{m-1}}, the
number of length n = n_0 + ... + n_{m-1} observation sequences with s_i appearing n_i
times is

(n; n_0, ..., n_{m-1}) = n! / (n_0! n_1! ... n_{m-1}!)

Proof Let M = (n; n_0, ..., n_{m-1}). Start with n empty slots and perform the following sequence
of subexperiments:

Subexperiment   Procedure
0               Label n_0 slots as s_0.
1               Label n_1 slots as s_1.
...
m-1             Label the remaining n_{m-1} slots as s_{m-1}.

There are C(n, n_0) ways to perform subexperiment 0. After n_0 slots have been labeled, there
are C(n - n_0, n_1) ways to perform subexperiment 1. After subexperiment j - 1, n_0 + ... + n_{j-1}
slots have already been filled, leaving C(n - (n_0 + ... + n_{j-1}), n_j) ways to perform subexperiment j.
From the fundamental counting principle,

M = [n! / ((n - n_0)! n_0!)] x [(n - n_0)! / ((n - n_0 - n_1)! n_1!)] x ...
    x [(n - n_0 - ... - n_{m-2})! / ((n - n_0 - ... - n_{m-1})! n_{m-1}!)].   (2.15)

Canceling the common factors, we obtain the formula of the theorem.

Note that a binomial coefficient is the special case of the multinomial coefficient for
an alphabet with m = 2 symbols. In particular, for n = n_0 + n_1,

(n; n_0, n_1) = C(n, n_0) = C(n, n_1).   (2.16)

Lastly, in the same way that we extended the definition of the binomial coeffi-
cient, we will employ an extended definition for the multinomial coefficient.

Definition 2.2    Multinomial Coefficient

For an integer n >= 0, we define

(n; n_0, ..., n_{m-1}) =
    n! / (n_0! n_1! ... n_{m-1}!)   if n_0 + ... + n_{m-1} = n and
                                    n_i in {0, 1, ..., n}, i = 0, 1, ..., m-1;
    0                               otherwise.
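Definition 2.2 translates directly into a few lines of code. The sketch below is in Python (the text's own examples use MATLAB), and the function name `multinomial` is our own choice:

```python
from math import factorial

# Definition 2.2 (extended multinomial coefficient): n!/(n0! ... n_{m-1}!)
# when the counts are nonnegative and sum to n, and 0 otherwise.
def multinomial(n, counts):
    if any(k < 0 for k in counts) or sum(counts) != n:
        return 0
    result = factorial(n)
    for k in counts:
        result //= factorial(k)   # exact integer division
    return result

assert multinomial(10, [2, 3, 3, 2]) == 25200   # Example 2.18 below
assert multinomial(5, [2, 3]) == 10             # binomial special case
assert multinomial(10, [2, 3, 3, 4]) == 0       # counts do not sum to n
```

The integer division is exact at every step because each partial product is itself a multinomial coefficient.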

Example 2.18
In Example 2.16, the professor uses a curve in determining student grades. When there
are ten students in a probability class, the professor always issues two grades of A, three
grades of B, three grades of C and two grades of F. How many different ways can the
professor assign grades to the ten students?

In Example 2.16, we determine that with four possible grades there are 4^10 = 1,048,576
ways of assigning grades to ten students. However, now we are limited to choosing
n_0 = 2 students to receive an A, n_1 = 3 students to receive a B, n_2 = 3 students to
receive a C and n_3 = 2 students to receive an F. The number of ways that fit the
curve is the multinomial coefficient

(10; 2, 3, 3, 2) = 10! / (2! 3! 3! 2!) = 25,200.   (2.17)

Quiz 2.2
Consider a binary code with 4 bits (0 or 1) in each code word. An example of a
code word is 0110.

(a) How many different code words are there?
(b) How many code words have exactly two zeroes?
(c) How many code words begin with a zero?
(d) In a constant-ratio binary code, each code word has N bits. In every word, M
of the N bits are 1 and the other N - M bits are 0. How many different code
words are in the code with N = 8 and M = 3?


2.3 Independent Trials

Independent trials are identical subexperiments in a sequential ex-
periment. The probability models of all the subexperiments are
identical and independent of the outcomes of previous subexperi-
ments. Sampling with replacement is one category of experiments
with independent trials.

We now apply the counting methods of Section 2.2 to derive probability models
for experiments consisting of independent repetitions of a subexperiment. We start
with a simple subexperiment in which there are two outcomes: a success (1) occurs
with probability p; otherwise, a failure (0) occurs with probability 1 - p. The
results of all trials of the subexperiment are mutually independent. An outcome
of the complete experiment is a sequence of successes and failures denoted by a
sequence of ones and zeroes. For example, 10101... is an alternating sequence of
successes and failures. Let E_{n0,n1} denote the event of n_0 failures and n_1 successes in
n = n_0 + n_1 trials. To find P[E_{n0,n1}], we first consider an example.

Example 2.19
What is the probability P[E_{2,3}] of two failures and three successes in five independent
trials with success probability p?

To find P[E_{2,3}], we observe that the outcomes with three successes in five trials are
11100, 11010, 11001, 10110, 10101, 10011, 01110, 01101, 01011, and 00111. We
note that the probability of each outcome is a product of five probabilities, each related
to one subexperiment. In outcomes with three successes, three of the probabilities
are p and the other two are 1 - p. Therefore each outcome with three successes has
probability (1 - p)^2 p^3.
From Theorem 2.6, we know that the number of such sequences is C(5,3). To find
P[E_{2,3}], we add up the probabilities associated with the 10 outcomes with 3 successes,

P[E_{2,3}] = 10 (1 - p)^2 p^3.   (2.18)

In general, for n = n_0 + n_1 independent trials we observe that

Each outcome with n_0 failures and n_1 successes has probability (1 - p)^{n_0} p^{n_1}.

There are C(n, n_0) = C(n, n_1) outcomes that have n_0 failures and n_1 successes.

Therefore the probability of n_1 successes in n independent trials is the sum of C(n, n_1)
terms, each with probability (1 - p)^{n_0} p^{n_1} = (1 - p)^{n - n_1} p^{n_1}.

Theorem 2.8
The probability of n_0 failures and n_1 successes in n = n_0 + n_1 independent trials is

P[E_{n0,n1}] = C(n, n_1) (1 - p)^{n - n_1} p^{n_1} = C(n, n_0) (1 - p)^{n_0} p^{n - n_0}.


The second formula in this theorem is the result of multiplying the probability of
n_0 failures in n trials by the number of outcomes with n_0 failures.

Example 2.20
In Example 1.19, we found that a randomly tested resistor was acceptable with proba-
bility P[A] = 0.78. If we randomly test 100 resistors, what is the probability of T_i, the
event that i resistors test acceptable?

Testing each resistor is an independent trial with a success occurring when a resistor is
acceptable. Thus for 0 <= i <= 100,

P[T_i] = C(100, i) (0.78)^i (1 - 0.78)^{100-i}.   (2.19)

We note that our intuition says that since 78% of the resistors are acceptable, then
in testing 100 resistors, the number acceptable should be near 78. However, P[T_78] ~
0.096, which is fairly small. This shows that although we might expect the number
acceptable to be close to 78, that does not mean that the probability of exactly 78
acceptable is high.
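The value P[T_78] ~ 0.096 quoted above is quick to reproduce. A Python sketch (the text's examples use MATLAB; the function name here is ours):

```python
from math import comb

# Example 2.20: probability that exactly i of 100 independently tested
# resistors are acceptable, with P[A] = 0.78 (Theorem 2.8).
def p_acceptable(i, n=100, p=0.78):
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Even at the "expected" value i = 78, the probability is only about 0.096.
assert abs(p_acceptable(78) - 0.096) < 0.001
```

Summing p_acceptable(i) over all i from 0 to 100 gives 1, which is a useful sanity check on any such calculation.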

Example 2.21
To communicate one bit of information reliably, cellular phones transmit the same binary
symbol five times. Thus the information "zero" is transmitted as 00000 and "one" is
11111. The receiver detects the correct information if three or more binary symbols are
received correctly. What is the information error probability P[E], if the binary symbol
error probability is q = 0.1?

In this case, we have five trials corresponding to the five times the binary symbol is
sent. On each trial, a success occurs when a binary symbol is received correctly. The
probability of a success is p = 1 - q = 0.9. The error event E occurs when the number
of successes is strictly less than three:

P[E] = P[E_{5,0}] + P[E_{4,1}] + P[E_{3,2}]   (2.20)
     = C(5,0) q^5 + C(5,1) p q^4 + C(5,2) p^2 q^3 = 0.00856.   (2.21)

By increasing the number of binary symbols per information bit from 1 to 5, the cellular
phone reduces the probability of error by more than one order of magnitude, from 0.1
to 0.0086.
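The arithmetic in Equation (2.21) can be checked in one line. An illustrative Python sketch (not the text's MATLAB):

```python
from math import comb

# Example 2.21: with per-symbol error probability q = 0.1 and 5
# repetitions, the information bit is decoded wrongly when fewer than
# 3 of the 5 symbols are received correctly.
q = 0.1
p = 1 - q
P_error = sum(comb(5, k) * p**k * q**(5 - k) for k in range(3))
assert abs(P_error - 0.00856) < 1e-6
```

The three terms of the sum are q^5 = 0.00001, 5pq^4 = 0.00045, and 10p^2q^3 = 0.0081, matching Equation (2.21) term by term.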

Now suppose we perform n independent repetitions of a subexperiment for which
there are m possible outcomes for any subexperiment. That is, the sample space
for each subexperiment is {s_0, ..., s_{m-1}} and every event in one subexperiment is
independent of the events in all the other subexperiments. Therefore, in every
subexperiment the probabilities of corresponding events are the same and we can
use the notation P[s_k] = p_k for all of the subexperiments.


An outcome of the experiment consists of a sequence of n subexperiment out-
comes. In the probability tree of the experiment, each node has m branches and
branch i has probability p_i. The probability of an outcome of the sequential experi-
ment is just the product of the n branch probabilities on a path from the root of the
tree to the leaf representing the outcome. For example, with n = 5, the outcome
s_2 s_0 s_3 s_2 s_4 occurs with probability p_2 p_0 p_3 p_2 p_4. We want to find the probability of
the event

E_{n_0,...,n_{m-1}} = {s_0 occurs n_0 times, ..., s_{m-1} occurs n_{m-1} times}   (2.22)

Note that the notation E_{n_0,...,n_{m-1}} implies that the experiment consists of a se-
quence of n = n_0 + ... + n_{m-1} trials.
To calculate P[E_{n_0,...,n_{m-1}}], we observe that the probability of the outcome

s_0 ... s_0  s_1 ... s_1  ...  s_{m-1} ... s_{m-1}   (2.23)
(s_0 repeated n_0 times, s_1 repeated n_1 times, ..., s_{m-1} repeated n_{m-1} times)

is

p_0^{n_0} p_1^{n_1} ... p_{m-1}^{n_{m-1}}.   (2.24)

Next, we observe that any other experimental outcome that is a reordering of the
preceding sequence has the same probability because on each path through the tree
to such an outcome there are n_i occurrences of s_i. As a result,

P[E_{n_0,...,n_{m-1}}] = M p_0^{n_0} p_1^{n_1} ... p_{m-1}^{n_{m-1}},   (2.25)

where M, the number of such outcomes, is the multinomial coefficient (n; n_0, ..., n_{m-1})
of Definition 2.2. Applying Theorem 2.7, we have the following theorem:

Theorem 2.9
A subexperiment has sample space S_sub = {s_0, ..., s_{m-1}} with P[s_i] = p_i. For
n = n_0 + ... + n_{m-1} independent trials, the probability of n_i occurrences of s_i,
i = 0, 1, ..., m-1, is

P[E_{n_0,...,n_{m-1}}] = (n; n_0, ..., n_{m-1}) p_0^{n_0} ... p_{m-1}^{n_{m-1}}.

Example 2.22
A packet processed by an Internet router carries either audio information with probability
7/10, video with probability 2/10, or text with probability 1/10. Let E_{a,v,t} denote the
event that the router processes a audio packets, v video packets, and t text packets in
a sequence of 100 packets. In this case,

P[E_{a,v,t}] = (100; a, v, t) (7/10)^a (2/10)^v (1/10)^t.   (2.26)

Keep in mind that by the extended definition of the multinomial coefficient, P[E_{a,v,t}]
is nonzero only if a + v + t = 100 and a, v, and t are nonnegative integers.
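One useful check on Equation (2.26) is that the probabilities P[E_{a,v,t}], summed over every valid (a, v, t), must equal 1, since the events partition the sample space. A Python sketch (illustrative, not the text's MATLAB; the helper name is ours):

```python
from math import factorial

# Theorem 2.9 / Example 2.22 sanity check: P[E_{a,v,t}] summed over all
# (a, v, t) with a + v + t = 100 must be 1.
def multinomial(n, counts):
    out = factorial(n)
    for k in counts:
        out //= factorial(k)
    return out

total = 0.0
for a in range(101):
    for v in range(101 - a):
        t = 100 - a - v
        total += multinomial(100, (a, v, t)) * 0.7**a * 0.2**v * 0.1**t
assert abs(total - 1.0) < 1e-9
```

The double loop visits each of the 5151 compositions of 100 into three nonnegative parts exactly once.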

Example 2.23
Continuing with Example 2.16, suppose in testing a microprocessor that all four grades
have probability 0.25, independent of any other microprocessor. In testing n = 100
microprocessors, what is the probability of exactly 25 microprocessors of each grade?

Let E_{25,25,25,25} denote the probability of exactly 25 microprocessors of each grade.
From Theorem 2.9,

P[E_{25,25,25,25}] = (100; 25, 25, 25, 25) (0.25)^{100} = 0.0010.   (2.27)
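Equation (2.27) is easy to evaluate exactly, since the multinomial coefficient for equal counts reduces to 100!/(25!)^4. A Python sketch (illustrative; the text uses MATLAB):

```python
from math import factorial

# Example 2.23: probability of exactly 25 microprocessors of each of the
# four equally likely grades in n = 100 independent tests (Theorem 2.9).
coeff = factorial(100) // (factorial(25) ** 4)   # (100; 25,25,25,25)
p = coeff * 0.25**100
assert abs(p - 0.0010) < 1e-4
```

Even though each grade occurs 25 times on average, the probability of hitting exactly 25 of each is only about 0.001, echoing the lesson of Example 2.20.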

Quiz 2.3
Data packets containing 100 bits are transmitted over a communication link. A
transmitted bit is received in error (either a 0 sent is mistaken for a 1, or a 1 sent
is mistaken for a 0) with probability ε = 0.01, independent of the correctness of
any other bit. The packet has been coded in such a way that if three or fewer bits
are received in error, then those bits can be corrected. If more than three bits are
received in error, then the packet is decoded with errors.

(a) Let E_{k,100-k} denote the event that a received packet has k bits in error and
100 - k correctly decoded bits. What is P[E_{k,100-k}] for k = 0, 1, 2, 3?
(b) Let C denote the event that a packet is decoded correctly. What is P[C]?

2.4 Reliability Analysis

To find the success probability of a complicated process with com-
ponents in series and components in parallel, it is helpful to con-
sider a group of components in series as one equivalent component
and a group of components in parallel as another equivalent com-
ponent.

Sequential experiments are models for practical processes that depend on several
operations to succeed. Examples are manufacturing processes that go through sev-
eral stages, and communications systems that relay packets through several routers
between a source and destination. In some cases, the processes contain redundant
components that protect the entire process from the failure of one or more com-
ponents. In this section we describe the simple case in which all operations in a


Components in Series                    Components in Parallel

Figure 2.2 Serial and parallel devices.

process succeed with probability p independent of the success or failure of other
components. Let W_i denote the event that component i succeeds. As depicted in Figure 2.2,
there are two basic types of operations.

Components in series. The operation succeeds if all of its components succeed.
One example of such an operation is a sequence of computer programs in
which each program after the first one uses the result of the previous pro-
gram. Therefore, the complete operation fails if any component program
fails. Whenever the operation consists of k components in series, we need
all k components to succeed in order to have a successful operation. The
probability that the operation succeeds is

P[W] = P[W_1 W_2 ... W_k] = p x p x ... x p = p^k.   (2.28)

If the independent components in series have different success probabilities
p_1, p_2, ..., p_n, the operation succeeds with probability

P[W] = p_1 x p_2 x ... x p_n.   (2.29)

With components in series, the probability of a successful operation is lower
than the success probability of the weakest component.

Components in parallel. The operation succeeds if any component works.
This operation occurs when we introduce redundancy to promote reliability.
In a redundant system, such as a space shuttle, there are n computers on board
so that the shuttle can continue to function as long as at least one computer
operates successfully. If the components are in parallel, the operation fails
when all elements fail, so we have

P[W^c] = P[W_1^c W_2^c ... W_n^c] = (1 - p)^n.   (2.30)

The probability that the parallel operation succeeds is

P[W] = 1 - P[W^c] = 1 - (1 - p)^n.   (2.31)



W1 - W2 -> W5          W3 - W4 -> W6          (W5 and W6 in parallel)

Figure 2.3 The operation described in Example 2.24. On the left is the original operation.
On the right is the equivalent operation with each pair of series components replaced with
an equivalent component.

If the independent components in parallel have different success probabilities
p_1, p_2, ..., p_n, the operation fails with probability

P[W^c] = P[W_1^c W_2^c ... W_n^c] = (1 - p_1) x (1 - p_2) x ... x (1 - p_n).   (2.32)

The probability that the parallel operation succeeds is

P[W] = 1 - P[W^c] = 1 - (1 - p_1) x (1 - p_2) x ... x (1 - p_n).   (2.33)

With components in parallel, the probability that the operation succeeds is
higher than the probability of success of the strongest component.

We can analyze complicated combinations of components in series and in parallel
by reducing several components in parallel or components in series to a single
equivalent component.

Example 2.24
An operation consists of two redundant parts. The first part has two components in
series (W_1 and W_2) and the second part has two components in series (W_3 and W_4).
All components succeed with probability p = 0.9. Draw a diagram of the operation
and calculate the probability that the operation succeeds.

A diagram of the operation is shown in Figure 2.3. We can create an equivalent
component, W_5, with probability of success p_5 by observing that for the combination
of W_1 and W_2,

p_5 = P[W_1 W_2] = p^2 = 0.81.   (2.34)

Similarly, the combination of W_3 and W_4 in series produces an equivalent component,
W_6, with probability of success p_6 = p^2 = 0.81. The entire operation then consists of
W_5 and W_6 in parallel, which is also shown in Figure 2.3. The success probability of
the operation is

P[W] = 1 - (1 - p_5)^2 = 0.964.   (2.35)

We could consider the combination of W_5 and W_6 to be an equivalent component W_7
with success probability p_7 = 0.964 and then analyze a more complex operation that
contains W_7 as a component.
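The series-then-parallel reduction in Example 2.24 takes only a few lines to verify numerically. An illustrative Python sketch (the text's own code is MATLAB):

```python
# Example 2.24: two series pairs (W1-W2 and W3-W4) placed in parallel,
# every component succeeding independently with probability p = 0.9.
p = 0.9
p_series = p * p                        # equivalent components W5, W6: 0.81
p_parallel = 1 - (1 - p_series) ** 2    # W5 and W6 in parallel
assert abs(p_parallel - 0.9639) < 1e-4  # rounds to 0.964, as in (2.35)
```

Note the order of reduction matters: putting all four components directly in parallel, or all four in series, would model a different system and give a different answer.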


Note that in Equation (2.29) we computed the probability of success of a pro-
cess with components in series as the product of the success probabilities of the
components. The reason is that for the process to be successful, all components
must be successful. The event {all components successful} is the intersection of
the individual success events, and the probability of the intersection of independent
events is the product of the individual success probabilities. On the other hand, with components
in parallel, the process is successful when one or more components is successful.
The event {one or more components successful} is the union of individual success
events. Recall that the probability of the union of two events is the differ-
ence between the sum of the individual probabilities and the probability of their
intersection. The formula for the probability of more than two events is even more
complicated. On the other hand, with components in parallel, the process fails
when all of the components fail. The event {all components fail} is the intersec-
tion of the individual failure events. Each failure probability is the difference
between 1 and the success probability. Hence in Equation (2.30) and Example 2.24
we first compute the failure probability of a process with components in parallel.
In general, De Morgan's law (Theorem 1.1) allows us to express a union as the
complement of an intersection and vice versa. Therefore, in many applications of
probability, when it is difficult to calculate directly the probability we need, we can
often calculate the probability of the complementary event and then subtract this
probability from 1 to find the answer. This is how we calculated the probability of
success of a process with components in parallel.

Quiz 2.4
A memory module consists of nine chips. The device is designed with redundancy
so that it works even if one of its chips is defective. Each chip contains n transistors
and functions properly only if all of its transistors work. A transistor works with
probability p independent of any other transistor.

(a) What is the probability P[C] that a chip works?
(b) What is the probability P[M] that the memory module works?
(c) If p = 0.999, what is the maximum number of transistors per chip n that
produces P[M] > 0.9 (a 90% success probability for the memory module)?
(d) If the memory module can tolerate two defective chips, what is the maximum
number of transistors per chip n that produces P[M] > 0.9?

2.5 MATLAB

Two or three lines of MATLAB code are sufficient to simulate an
arbitrary number of sequential trials.

We recall from Section 1.7 that rand(1,m)<p simulates m coin flips with P[heads] =
p. Because MATLAB can simulate these coin flips much faster than we can actu-
ally flip coins, a few lines of MATLAB code can yield quick simulations of many
experiments.


y =
Columns 1 through 12
47 52 48 46 54 48 47 48 59 44 49 48
Columns 13 through 24
42 52 40 40 47 48 48 48 53 49 45 61
Columns 25 through 36
60 59 49 47 49 45 48 51 48 53 52 53
Columns 37 through 48
56 54 60 53 52 51 58 47 50 48 44 49
Columns 49 through 60
50 46 52 50 51 51 57 50 49 56 44 56

Figure 2.4 The sin1ulation output of 60 rep eated experin1ents of 100 coin flips .

Example 2.25
Using MATLAB, perform 60 experiments. In each experiment, flip a coin 100 times and
record the number of heads in a vector Y such that the jth element Y_j is the number
of heads in subexperiment j.

>> X=rand(100,60)<0.5;
>> Y=sum(X,1)

The MATLAB code for this task appears above. The
100 x 60 matrix X has i,jth element X(i,j)=0 (tails)
or X(i,j)=1 (heads) to indicate the result of flip i of
subexperiment j. Since Y sums X across the first dimension, Y(j) is the number of
heads in the jth subexperiment. Each Y(j) is between 0 and 100 and generally in the
neighborhood of 50. The output of a sample run is shown in Figure 2.4.
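For readers working outside MATLAB, the same simulation is a one-liner in most languages. A Python sketch of the same experiment (the fixed seed is our addition, used only to make a run reproducible):

```python
import random

# Python equivalent of the MATLAB fragment above: 60 experiments of 100
# fair coin flips each; Y[j] counts the heads in experiment j.
random.seed(1)  # fixed seed so repeated runs give the same sample
Y = [sum(random.random() < 0.5 for _ in range(100)) for _ in range(60)]
assert len(Y) == 60 and all(0 <= y <= 100 for y in Y)
```

As in Figure 2.4, each count lands somewhere between 0 and 100 and typically near 50.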

Example 2.26
Simulate the testing of 100 microprocessors as described in Example 2.23. Your output
should be a 4 x 1 vector X such that X_i is the number of grade i microprocessors.

%chiptest.m
G=ceil(4*rand(1,100));
T=1:4;
X=hist(G,T);

The first line generates a row vector G of random grades
for 100 microprocessors. The possible test scores are in
the vector T. Lastly, X=hist(G,T) returns a histogram
vector X such that X(j) counts the number of elements
G(i) that equal T(j).

Note that "help hist" will show the variety of ways that the hist function can be
called. Moreover, X=hist(G,T) does more than just count the number of elements of
G that equal each element of T. In particular, hist(G,T) creates bins centered around
each T(j) and counts the number of elements of G that fall into each bin.

Note that in MATLAB all variables are assumed to be matrices. In writing
MATLAB code, X may be an n x m matrix, an n x 1 column vector, a 1 x m row
vector, or a 1 x 1 scalar. In MATLAB, we write X(i,j) to index the i,jth element.
By contrast, in this text, we vary the notation depending on whether we have a
scalar X, or a vector or matrix X. In addition, we use X_{i,j} to denote the i,jth
element. Thus, X and X (in a MATLAB code fragment) may both refer to the same
variable.

Quiz 2.5
The flip of a thick coin yields heads with probability 0.4, tails with probability 0.5,
or lands on its edge with probability 0.1. Simulate 100 thick coin flips. Your output
should be a 3 x 1 vector X such that X_1, X_2, and X_3 are the number of occurrences
of heads, tails, and edge.

Problems

Difficulty: Easy / Moderate / Difficult / Experts Only

2.1 .1 Suppose you flip a coin t'vice. On 2.1.5 Suppose that for the general popula-
any flip , the coin comes up heads with prob- t ion, 1 in 5000 people carries the human im-
ability 1/4. Use H i and Ti to denote the munodeficiency virus (HIV). A test for the
result of flip i. presence of HIV yields either a positive ( +)
(a) What is t he probability, P[H1 IH2], that or negative (-) response. Suppose t he test
the first flip is heads given that the sec- gives the correct ans,ver 993 of the t ime.
ond flip is heads? What is P [- IHJ , the conditional probabil-
ity that a person tests negative given that
(b) What is the probabilit:y that t he first
the person does have the HIV virus? What
flip is heads and the second flip is tails?
is P [HI+], the condit ional probability that
2.1 .2 For Example 2.2 , suppose P[G1) = a randomly chosen person has the HIV virus
1/ 2, P [G2 IG1) = 3/4, and P[G2 IR1] = 1/4. given that the person tests positive?
F ind P[G2), P[G2 IG1), and P[G1 IG2).
2.1.6 A machine produces photo detectors
2.1 .3 At the end of regulation time, a bas- in pairs. Tests show that the first photo
ketball team is trailing by one point and a detector is acceptable with probability 3 /5.
player goes to the line for t\vo free throvvs. W hen the first photo detector is accept-
If the player inakes exactly one free throw, able, the second photo detector is accept-
the game goes into overtime. The proba- able with probability 4/5. If the first photo
bility that the first free throw is good is detector is defective, the second photo de-
1/ 2. However , if the first attempt is good, tector is acceptable \vit h probability 2/5.
the player relaxes and the second attempt is
good \Vi th probability 3 / 4. However, if the (a) F ind the probability that exactly one
photo detector of a pair is acceptable.
player misses the first attempt, the added
pressure reduces the success probability to (b) Find the probability t hat both photo
1/4. What is the probability that the game detectors in a pair are defective.
goes into overtime?
2.1 .4 You have t\vo biased coins. Coin A 2.1.7 You have two biased coins. Coin .4
comes up heads with probability 1 /4. Coin comes up heads \Vith probability 1/ 4. Coin
B comes up heads \v ith probability 3/4. B comes up heads w ith probability 3/4.
However, you are not sure which is \Vhich, Ho,vever , you are not sure which is w hich
so you choose a coin randomly and you flip so you flip each coin once, choosing the first
it. If t he flip is heads, yo u guess that the coin randomly. Use H i and Ti to denote the
flipped coin is B; otherwise, you guess that result of flip i. Let .41 be the event t hat coin
t he flipped coin is .4. \tVhat is the probabil- A was flipped first. Let B1 be the event that
ity P[C) that your guess is correct? coin B was flipped first. \tVhat is P[H1 H 2)?


Are H1 and H2 independent? Explain your answer.

2.1.8 A particular birth defect of the heart is rare; a newborn infant will have the defect D with probability P[D] = 10^-4. In the general exam of a newborn, a particular heart arrhythmia A occurs with probability 0.99 in infants with the defect. However, the arrhythmia also appears with probability 0.1 in infants without the defect. When the arrhythmia is present, a lab test for the defect is performed. The result of the lab test is either positive (event T+) or negative (event T-). In a newborn with the defect, the lab test is positive with probability p = 0.999, independent from test to test. In a newborn without the defect, the lab test is negative with probability p = 0.999. If the arrhythmia is present and the test is positive, then heart surgery (event H) is performed.

(a) Given the arrhythmia A is present, what is the probability the infant has the defect D?

(b) Given that an infant has the defect, what is the probability P[H|D] that heart surgery is performed?

(c) Given that the infant does not have the defect, what is the probability q = P[H|D^c] that an unnecessary heart surgery is performed?

(d) Find the probability P[H] that an infant has heart surgery performed for the arrhythmia.

(e) Given that heart surgery is performed, what is the probability that the newborn does not have the defect?

2.1.9 Suppose Dagwood (Blondie's husband) wants to eat a sandwich but needs to go on a diet. Dagwood decides to let the flip of a coin determine whether he eats. Using an unbiased coin, Dagwood will postpone the diet (and go directly to the refrigerator) if either (a) he flips heads on his first flip or (b) he flips tails on the first flip but then proceeds to get two heads out of the next three flips. Note that the first flip is not counted in the attempt to win two of three and that Dagwood never performs any unnecessary flips. Let Hi be the event that Dagwood flips heads on try i. Let Ti be the event that tails occurs on flip i.

(a) Draw the tree for this experiment. Label the probabilities of all outcomes.

(b) What are P[H3] and P[T4]?

(c) Let D be the event that Dagwood must diet. What is P[D]? What is P[H1|D]?

(d) Are H3 and H2 independent events?

2.1.10 The quality of each pair of photo detectors produced by the machine in Problem 2.1.6 is independent of the quality of every other pair of detectors.

(a) What is the probability of finding no good detectors in a collection of n pairs produced by the machine?

(b) How many pairs of detectors must the machine produce to reach a probability of 0.99 that there will be at least one acceptable photo detector?

2.1.11 In Steven Strogatz's New York Times blog http://opinionator.blogs.nytimes.com/2010/04/25/chances-are/?ref=opinion, the following problem was posed to highlight the confusing character of conditional probabilities.

Before going on vacation for a week, you ask your spacey friend to water your ailing plant. Without water, the plant has a 90 percent chance of dying. Even with proper watering, it has a 20 percent chance of dying. And the probability that your friend will forget to water it is 30 percent. (a) What's the chance that your plant will survive the week? (b) If it's dead when you return, what's the chance that your friend forgot to water it? (c) If your friend forgot to water it, what's the chance it'll be dead when you return?

Solve parts (a), (b) and (c) of this problem.

2.1.12 Each time a fisherman casts his line, a fish is caught with probability p, independent of whether a fish is caught on any other cast of the line. The fisherman will fish all day until a fish is caught and


then he will quit and go home. Let Ci denote the event that on cast i the fisherman catches a fish. Draw the tree for this experiment and find P[C1], P[C2], and P[Cn] as functions of p.

2.2.1 On each turn of the knob, a gumball machine is equally likely to dispense a red, yellow, green or blue gumball, independent from turn to turn. After eight turns, what is the probability P[R2Y2G2B2] that you have received 2 red, 2 yellow, 2 green and 2 blue gumballs?

2.2.2 A Starburst candy package contains 12 individual candy pieces. Each piece is equally likely to be berry, orange, lemon, or cherry, independent of all other pieces.

(a) What is the probability that a Starburst package has only berry or cherry pieces and zero orange or lemon pieces?

(b) What is the probability that a Starburst package has no cherry pieces?

(c) What is the probability P[F1] that all twelve pieces of your Starburst are the same flavor?

2.2.3 Your Starburst candy has 12 pieces, three pieces of each of four flavors: berry, lemon, orange, and cherry, arranged in a random order in the pack. You draw the first three pieces from the pack.

(a) What is the probability they are all the same flavor?

(b) What is the probability they are all different flavors?

2.2.4 Your Starburst candy has 12 pieces, three pieces of each of four flavors: berry, lemon, orange, and cherry, arranged in a random order in the pack. You draw the first four pieces from the pack.

(a) What is the probability P[F1] they are all the same flavor?

(b) What is the probability P[F4] they are all different flavors?

(c) What is the probability P[F2] that your Starburst has exactly two pieces of each of two different flavors?

2.2.5 In a game of rummy, you are dealt a seven-card hand.

(a) What is the probability P[R7] that your hand has only red cards?

(b) What is the probability P[F] that your hand has only face cards?

(c) What is the probability P[R7F] that your hand has only red face cards? (The face cards are jack, queen, and king.)

2.2.6 In a game of poker, you are dealt a five-card hand.

(a) What is the probability P[R5] that your hand has only red cards?

(b) What is the probability of a "full house" with three-of-a-kind and two-of-a-kind?

2.2.7 Consider a binary code with 5 bits (0 or 1) in each code word. An example of a code word is 01010. How many different code words are there? How many code words have exactly three 0's?

2.2.8 Consider a language containing four letters: A, B, C, D. How many three-letter words can you form in this language? How many four-letter words can you form if each letter appears only once in each word?

2.2.9 On an American League baseball team with 15 field players and 10 pitchers, the manager selects a starting lineup with 8 field players, 1 pitcher, and 1 designated hitter. The lineup specifies the players for these positions and the positions in a batting order for the 8 field players and designated hitter. If the designated hitter must be chosen among all the field players, how many possible starting lineups are there?

2.2.10 Suppose that in Problem 2.2.9, the designated hitter can be chosen from among all the players. How many possible starting lineups are there?

2.2.11 At a casino, the only game is numberless roulette. On a spin of the wheel, the ball lands in a space with color red (r), green (g), or black (b). The wheel has 19 red spaces, 19 green spaces and 2 black spaces.


(a) In 40 spins of the wheel, find the probability of the event

A = {19 reds, 19 greens, and 2 blacks}.

(b) In 40 spins of the wheel, find the probability of G19 = {19 greens}.

(c) The only bets allowed are red and green. Given that you randomly choose to bet red or green, what is the probability p that your bet is a winner?

2.2.12 A basketball team has three pure centers, four pure forwards, four pure guards, and one swingman who can play either guard or forward. A pure position player can play only the designated position. If the coach must start a lineup with one center, two forwards, and two guards, how many possible lineups can the coach choose?

2.2.13 An instant lottery ticket consists of a collection of boxes covered with gray wax. For a subset of the boxes, the gray wax hides a special mark. If a player scratches off the correct number of the marked boxes (and no boxes without the mark), then that ticket is a winner. Design an instant lottery game in which a player scratches five boxes and the probability that a ticket is a winner is approximately 0.01.

2.3.1 Consider a binary code with 5 bits (0 or 1) in each code word. An example of a code word is 01010. In each code word a bit is a zero with probability 0.8, independent of any other bit.

(a) What is the probability of the code word 00111?

(b) What is the probability that a code word contains exactly three ones?

2.3.2 The Boston Celtics have won 16 NBA championships over approximately 50 years. Thus it may seem reasonable to assume that in a given year the Celtics win the title with probability p = 16/50 = 0.32, independent of any other year. Given such a model, what would be the probability of the Celtics winning eight straight championships beginning in 1959? Also, what would be the probability of the Celtics winning the title in 10 out of 11 years, starting in 1959? Given your answers, do you trust this simple probability model?

2.3.3 Suppose each day that you drive to work a traffic light that you encounter is either green with probability 7/16, red with probability 7/16, or yellow with probability 1/8, independent of the status of the light on any other day. If over the course of five days, G, Y, and R denote the number of times the light is found to be green, yellow, or red, respectively, what is the probability P[G = 2, Y = 1, R = 2]? Also, what is the probability P[G = R]?

2.3.4 In a game between two equal teams, the home team wins with probability p > 1/2. In a best of three playoff series, a team with the home advantage has a game at home, followed by a game away, followed by a home game if necessary. The series is over as soon as one team wins two games. What is P[H], the probability that the team with the home advantage wins the series? Is the home advantage increased by playing a three-game series rather than a one-game playoff? That is, is it true that P[H] > p for all p > 1/2?

2.3.5 A collection of field goal kickers are divided into groups 1 and 2. Group i has 3i kickers. On any kick, a kicker from group i will kick a field goal with probability 1/(i + 1), independent of the outcome of any other kicks.

(a) A kicker is selected at random from among all the kickers and attempts one field goal. Let K be the event that a field goal is kicked. Find P[K].

(b) Two kickers are selected at random; Kj is the event that kicker j kicks a field goal. Are K1 and K2 independent?

(c) A kicker is selected at random and attempts 10 field goals. Let M be the number of misses. Find P[M = 5].


2.4.1 A particular operation has six components. Each component has a failure probability q, independent of any other component. A successful operation requires both of the following conditions:

Components 1, 2, and 3 all work, or component 4 works.
Component 5 or component 6 works.

Draw a block diagram for this operation similar to those of Figure 2.2 on page 53. Derive a formula for the probability P[W] that the operation is successful.

2.4.2 We wish to modify the cellular telephone coding system in Example 2.21 in order to reduce the number of errors. In particular, if there are two or three zeroes in the received sequence of 5 bits, we will say that a deletion (event D) occurs. Otherwise, if at least 4 zeroes are received, the receiver decides a zero was sent, or if at least 4 ones are received, the receiver decides a one was sent. We say that an error occurs if i was sent and the receiver decides j ≠ i was sent. For this modified protocol, what is the probability P[E] of an error? What is the probability P[D] of a deletion?

2.4.3 Suppose a 10-digit phone number is transmitted by a cellular phone using four binary symbols for each digit, using the model of binary symbol errors and deletions given in Problem 2.4.2. Let C denote the number of bits sent correctly, D the number of deletions, and E the number of errors. Find P[C = c, D = d, E = e] for all c, d, and e.

2.4.4 Consider the device in Problem 2.4.1. Suppose we can replace any one component with an ultrareliable component that has a failure probability of q/2 = 0.05. Which component should we replace?

2.5.1 Build a MATLAB simulation of 50 trials of the experiment of Example 2.3. Your output should be a pair of 50 x 1 vectors C and H. For the ith trial, Hi will record whether it was heads (Hi = 1) or tails (Hi = 0), and Ci ∈ {1, 2} will record which coin was picked.

2.5.2 Following Quiz 2.3, suppose the communication link has different error probabilities for transmitting 0 and 1. When a 1 is sent, it is received as a 0 with probability 0.01. When a 0 is sent, it is received as a 1 with probability 0.03. Each bit in a packet is still equally likely to be a 0 or 1. Packets have been coded such that if five or fewer bits are received in error, then the packet can be decoded. Simulate the transmission of 100 packets, each containing 100 bits. Count the number of packets decoded correctly.

2.5.3 For a failure probability q = 0.2, simulate 100 trials of the six-component test of Problem 2.4.1. How many devices were found to work? Perform 10 repetitions of the 100 trials. What do you learn from 10 repetitions of 100 trials compared to a simulated experiment with 100 trials?

2.5.4 Write a MATLAB function

N=countequal(G,T)

that duplicates the action of hist(G,T) in Example 2.26. Hint: Use ndgrid.

2.5.5 In this problem, we use a MATLAB simulation to "solve" Problem 2.4.4. Recall that a particular operation has six components. Each component has a failure probability q independent of any other component. The operation is successful if both

Components 1, 2, and 3 all work, or component 4 works.
Component 5 or component 6 works.

With q = 0.2, simulate the replacement of a component with an ultrareliable component. For each replacement of a regular component, perform 100 trials. Are 100 trials sufficient to decide which component should be replaced?

Discrete Random Variables

3.1 Definitions
A random variable assigns numbers to outcomes in the sample space of an experiment.

Chapter 1 defines a probability model. It begins with a physical model of an experiment. An experiment consists of a procedure and observations. The set of all possible observations, S, is the sample space of the experiment. S is the beginning of the mathematical probability model. In addition to S, the mathematical model includes a rule for assigning numbers between 0 and 1 to sets A in S. Thus for every A ⊂ S, the model gives us a probability P[A], where 0 ≤ P[A] ≤ 1.

In this chapter and for most of the remainder of this book, we examine probability models that assign numbers to the outcomes in the sample space. When we observe one of these numbers, we refer to the observation as a random variable. In our notation, the name of a random variable is always a capital letter, for example, X. The set of possible values of X is the range of X. Since we often consider more than one random variable at a time, we denote the range of a random variable by the letter S with a subscript that is the name of the random variable. Thus SX is the range of random variable X, SY is the range of random variable Y, and so forth. We use SX to denote the range of X because the set of all possible values of X is analogous to S, the set of all possible outcomes of an experiment.

A probability model always begins with an experiment. Each random variable is related directly to this experiment. There are three types of relationships.

1. The random variable is the observation.

Example 3.1
The experiment is to attach a photo detector to an optical fiber and count the number of photons arriving in a one-microsecond time interval. Each observation is a random variable X. The range of X is SX = {0, 1, 2, ...}. In this case, SX, the range of X, and the sample space S are identical.

2. The random variable is a function of the observation.

Example 3.2
The experiment is to test six integrated circuits and after each test observe whether the circuit is accepted (a) or rejected (r). Each observation is a sequence of six letters where each letter is either a or r. For example, s8 = aaraaa. The sample space S consists of the 64 possible sequences. A random variable related to this experiment is N, the number of accepted circuits. For outcome s8, N = 5 circuits are accepted. The range of N is SN = {0, 1, ..., 6}.

3. The random variable is a function of another random variable.

Example 3.3
In Example 3.2, the net revenue R obtained for a batch of six integrated circuits is $5 for each circuit accepted minus $7 for each circuit rejected. (This is because for each bad circuit that goes out of the factory, it will cost the company $7 to deal with the customer's complaint and supply a good replacement circuit.) When N circuits are accepted, 6 - N circuits are rejected so that the net revenue R is related to N by the function

R = g(N) = 5N - 7(6 - N) = 12N - 42 dollars.  (3.1)

Since SN = {0, ..., 6}, the range of R is

SR = {-42, -30, -18, -6, 6, 18, 30}.  (3.2)

The revenue associated with s8 = aaraaa and all other outcomes for which N = 5 is

g(5) = 12 × 5 - 42 = 18 dollars.  (3.3)
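The mapping in Example 3.3 is simple enough to check numerically. The sketch below is in Python rather than the MATLAB used elsewhere in the text, and the names g and S_N are illustrative, not from the text.

```python
# Net revenue for the six-circuit batch of Example 3.3:
# $5 per accepted circuit, minus $7 per rejected circuit.
def g(n):
    """Revenue in dollars when n of the 6 circuits are accepted."""
    return 5 * n - 7 * (6 - n)  # simplifies to 12n - 42, as in (3.1)

S_N = range(7)                  # S_N = {0, 1, ..., 6}
S_R = sorted(g(n) for n in S_N)
print(S_R)                      # [-42, -30, -18, -6, 6, 18, 30], matching (3.2)
```

Because g is increasing in n, sorting the images of S_N reproduces the range SR of (3.2) in order.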

If we have a probability model for the integrated circuit experiment in Example 3.2, we can use that probability model to obtain a probability model for the random variable. The remainder of this chapter will develop methods to characterize probability models for random variables. We observe that in the preceding examples, the value of a random variable can always be derived from the outcome of the underlying experiment. This is not a coincidence. The formal definition of a random variable reflects this fact.


Definition 3.1    Random Variable
A random variable consists of an experiment with a probability measure P[·] defined on a sample space S and a function that assigns a real number to each outcome in the sample space of the experiment.

This definition acknowledges that a random variable is the result of an underlying experiment, but it also permits us to separate the experiment, in particular, the observations, from the process of assigning numbers to outcomes. As we saw in Example 3.1, the assignment may be implicit in the definition of the experiment, or it may require further analysis.

In some definitions of experiments, the procedures contain variable parameters. In these experiments, there can be values of the parameters for which it is impossible to perform the observations specified in the experiments. In these cases, the experiments do not produce random variables. We refer to experiments with parameter settings that do not produce random variables as improper experiments.

Example 3.4
The procedure of an experiment is to fire a rocket in a vertical direction from Earth's surface with initial velocity V km/h. The observation is T seconds, the time elapsed until the rocket returns to Earth. Under what conditions is the experiment improper?

At low velocities, V, the rocket will return to Earth at a random time T seconds that depends on atmospheric conditions and small details of the rocket's shape and weight. However, when V > v* ≈ 40,000 km/hr, the rocket will not return to Earth. Thus, the experiment is improper when V > v* because it is impossible to perform the specified observations.
On occasion, it is important to identify the random variable X by the function X(s) that maps the sample outcome s to the corresponding value of the random variable X. As needed, we will write {X = x} to emphasize that there is a set of sample points s ∈ S for which X(s) = x. That is, we have adopted the shorthand

{X = x} = {s ∈ S | X(s) = x}.  (3.4)
Here are some more random variables:

A, the number of students asleep in the next probability lecture;
C, the number of texts you receive in the next hour;
M, the number of minutes you wait until the next text arrives.

Random variables A and C are discrete random variables. The possible values of these random variables form a countable set. The underlying experiments have sample spaces that are discrete. The random variable M can be any nonnegative real number. It is a continuous random variable. Its experiment has a continuous sample space. In this chapter, we study the properties of discrete random variables. Chapter 4 covers continuous random variables.

Definition 3.2    Discrete Random Variable
X is a discrete random variable if the range of X is a countable set.

The defining characteristic of a discrete random variable is that the set of possible values can (in principle) be listed, even though the list may be infinitely long. Often, but not always, a discrete random variable takes on integer values. An exception is the random variable related to your probability grade. The experiment is to take this course and observe your grade. At Rutgers, the sample space is

S = {F, D, C, C+, B, B+, A}.  (3.5)

We use a function G1(·) to map this sample space into a random variable. For example, G1(A) = 4 and G1(F) = 0. The table

Outcomes   F   D   C   C+    B   B+    A
G1         0   1   2   2.5   3   3.5   4

is a concise description of the entire mapping.

G1 is a discrete random variable with range SG1 = {0, 1, 2, 2.5, 3, 3.5, 4}. Have you thought about why we transform letter grades to numerical values? We believe the principal reason is that it allows us to compute averages. This is also an important motivation for creating random variables by assigning numbers to the outcomes in a sample space. Unlike probability models defined on arbitrary sample spaces, random variables have expected values, which are closely related to averages of data sets. We introduce expected values formally in Section 3.5.

Quiz 3.1
A student takes two courses. In each course, the student will earn either a B or a C. To calculate a grade point average (GPA), a B is worth 3 points and a C is worth 2 points. The student's GPA G2 is the sum of the points earned for each course divided by 2. Make a table of the sample space of the experiment and the corresponding values of the GPA, G2.

3.2 Probability Mass Function

The PMF of random variable X expresses the probability model of an experiment as a mathematical function. The function is the probability P[X = x] for every number x.

Recall that the probability model of a discrete random variable assigns a number between 0 and 1 to each outcome in a sample space. When we have a discrete random variable X, we express the probability model as a probability mass function (PMF) PX(x). The argument of a PMF ranges over all real numbers.

Definition 3.3    Probability Mass Function (PMF)
The probability mass function (PMF) of the discrete random variable X is

PX(x) = P[X = x].

Note that X = x is an event consisting of all outcomes s of the underlying experiment for which X(s) = x. On the other hand, PX(x) is a function ranging over all real numbers x. For any value of x, the function PX(x) is the probability of the event X = x.

Observe our notation for a random variable and its PMF. We use an uppercase letter (X in the preceding definition) for the name of a random variable. We usually use the corresponding lowercase letter (x) to denote a possible value of the random variable. The notation for the PMF is the letter P with a subscript indicating the name of the random variable. Thus PR(r) is the notation for the PMF of random variable R. In these examples, r and x are dummy variables. The same random variables and PMFs could be denoted PR(u) and PX(u) or, indeed, PR(·) and PX(·).

We derive the PMF from the sample space, the probability model, and the rule that maps outcomes to values of the random variable. We then graph a PMF by marking on the horizontal axis each value with nonzero probability and drawing a vertical bar with length proportional to the probability.

Example 3.5
When the basketball player Wilt Chamberlain shot two free throws, each shot was equally likely either to be good (g) or bad (b). Each shot that was good was worth 1 point. What is the PMF of X, the number of points that he scored?

There are four outcomes of this experiment: gg, gb, bg, and bb. A simple tree diagram indicates that each outcome has probability 1/4. The sample space and probabilities of the experiment and the corresponding values of X are given in the table:

Outcomes   bb    bg    gb    gg
P[·]       1/4   1/4   1/4   1/4
X          0     1     1     2

The random variable X has three possible values corresponding to three events:

{X = 0} = {bb},   {X = 1} = {gb, bg},   {X = 2} = {gg}.  (3.6)

Since each outcome has probability 1/4, these three events have probabilities

P[X = 0] = 1/4,   P[X = 1] = 1/2,   P[X = 2] = 1/4.  (3.7)


We can express the probabilities of these events in terms of the probability mass function

         { 1/4   x = 0,
PX(x) =  { 1/2   x = 1,        (3.8)
         { 1/4   x = 2,
         { 0     otherwise.

It is often useful or convenient to depict PX(x) in two other display formats: as a bar plot or as a table.

[Bar plot of PX(x): bars of height 1/4, 1/2, 1/4 at x = 0, 1, 2]

x        0     1     2
PX(x)    1/4   1/2   1/4

Each PMF display format has its uses. The function definition (3.8) is best when PX(x) is given in terms of algebraic functions of x for various subsets of SX. The bar plot is best for visualizing the probability masses. The table can be a convenient compact representation when the PMF is a long list of sample values and corresponding probabilities.
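The PMF of Example 3.5 can also be obtained by brute-force enumeration of the sample space. The following sketch is in Python (the text's computational examples use MATLAB), and its variable names are illustrative:

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two free throws
# and accumulate P_X(x), where X(s) counts the good shots in outcome s.
P_X = {}
for s in product('bg', repeat=2):        # bb, bg, gb, gg
    x = s.count('g')                     # value X(s) of the random variable
    P_X[x] = P_X.get(x, Fraction(0)) + Fraction(1, 4)

for x in sorted(P_X):
    print(x, P_X[x])                     # 0 1/4, 1 1/2, 2 1/4, as in (3.8)
```

Exact rational arithmetic (Fraction) keeps the result identical to the table rather than a floating-point approximation.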

No matter how the PMF is formatted, the PMF of X states the value of PX(x) for every real number x. The first three lines of Equation (3.8) give the function for the values of X associated with nonzero probabilities: x = 0, x = 1 and x = 2. The final line is necessary to specify the function at all other numbers. Although it may look silly to see "PX(x) = 0 otherwise" included in most formulas for a PMF, it is an essential part of the PMF. It is helpful to keep this part of the definition in mind when working with the PMF. However, in the bar plot and table representations of the PMF, it is understood that PX(x) is zero except for those values x explicitly shown.

The PMF contains all of our information about the random variable X. Because PX(x) is the probability of the event {X = x}, PX(x) has a number of important properties. The following theorem applies the three axioms of probability to discrete random variables.

Theorem 3.1
For a discrete random variable X with PMF PX(x) and range SX:

(a) For any x, PX(x) ≥ 0.

(b) Σx∈SX PX(x) = 1.

(c) For any event B ⊂ SX, the probability that X is in the set B is

P[B] = Σx∈B PX(x).

Proof  All three properties are consequences of the axioms of probability (Section 1.3). First, PX(x) ≥ 0 since PX(x) = P[X = x]. Next, we observe that every outcome s ∈ S is associated with a number x ∈ SX. Therefore, P[x ∈ SX] = Σx∈SX PX(x) = P[s ∈ S] = P[S] = 1. Since the events {X = x} and {X = y} are mutually exclusive when x ≠ y, B can be written as the union of mutually exclusive events B = ∪x∈B {X = x}. Thus we can use Axiom 3 (if B is countably infinite) or Theorem 1.3 (if B is finite) to write

P[B] = Σx∈B P[X = x] = Σx∈B PX(x).
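As a small numerical illustration of the theorem (a Python sketch; the PMF is the one from Example 3.5, and everything else here is illustrative rather than from the text):

```python
from fractions import Fraction

# PMF of Example 3.5: X is the number of points from two free throws.
P_X = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# (a) every point mass is nonnegative; (b) the masses sum to 1.
assert all(p >= 0 for p in P_X.values())
assert sum(P_X.values()) == 1

# (c) for the event B = {X >= 1}, P[B] is the sum of P_X(x) over x in B.
B = [x for x in P_X if x >= 1]
P_B = sum(P_X[x] for x in B)
print(P_B)   # 3/4
```

This is exactly the computation in part (c): the event {X ≥ 1} = {X = 1} ∪ {X = 2} is a union of mutually exclusive events, so its probability is 1/2 + 1/4.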

Quiz 3.2
The random variable N has PMF

         { c/n   n = 1, 2, 3,
PN(n) =  {
         { 0     otherwise.

Find:

(a) The value of the constant c    (b) P[N = 1]
(c) P[N > 2]                       (d) P[N > 3]

3.3 Families of Discrete Random Variables

In applications of probability, many experiments have similar probability mass functions. In a family of random variables, the PMFs of the random variables have the same mathematical form, differing only in the values of one or two parameters.

Thus far in our discussion of random variables we have described how each random variable is related to the outcomes of an experiment. We have also introduced the probability mass function, which contains the probability model of the experiment. In practical applications, certain families of random variables appear over and over again in many experiments. In each family, the probability mass functions of all the random variables have the same mathematical form. They differ only in the values of one or two parameters. This enables us to study in advance each family of random variables and later apply the knowledge we gain to specific practical applications. In this section, we define six families of discrete random variables. There is one formula for the PMF of all the random variables in a family. Depending on the family, the PMF formula contains one or two parameters. By assigning numerical values to the parameters, we obtain a specific random variable. Our nomenclature for a family consists of the family name followed by one or two parameters in parentheses. For example, binomial (n, p) refers in general to the family of binomial random variables. Binomial (7, 0.1) refers to the binomial random variable with parameters n = 7 and p = 0.1. Appendix A summarizes important properties of 17 families of random variables.

Example 3.6
Consider the following experiments:

Flip a coin and let it land on a table. Observe whether the side facing up is heads or tails. Let X be the number of heads observed.
Select a student at random and find out her telephone number. Let X = 0 if the last digit is even. Otherwise, let X = 1.
Observe one bit transmitted by a modem that is downloading a file from the Internet. Let X be the value of the bit (0 or 1).

All three experiments lead to the probability mass function

         { 1/2   x = 0,
PX(x) =  { 1/2   x = 1,        (3.11)
         { 0     otherwise.

Because all three experiments lead to the same probability mass function, they can all be analyzed the same way. The PMF in Example 3.6 is a member of the family of Bernoulli random variables.

Definition 3.4    Bernoulli (p) Random Variable
X is a Bernoulli (p) random variable if the PMF of X has the form

         { 1 - p   x = 0,
PX(x) =  { p       x = 1,
         { 0       otherwise,

where the parameter p is in the range 0 < p < 1.

Many practical applications of probability produce sequential experiments with independent trials in which each subexperiment has two possible outcomes. A Bernoulli PMF represents the probability model for each subexperiment. We refer to subexperiments with two possible outcomes as Bernoulli trials.

In the following examples, we refer to tests of integrated circuits with two possible outcomes: accept (a) and reject (r). Each test in a sequence of tests is an independent trial with probability p of a reject. Depending on the observation, sequential experiments with Bernoulli trials have probability models represented by Bernoulli, binomial, geometric, and Pascal random variables. Other experiments produce discrete uniform random variables and Poisson random variables. These six families of random variables occur often in practical applications.

Example 3.7
Test one circuit and observe X, the number of rejects. What is PX(x), the PMF of random variable X?

Because there are only two outcomes in the sample space, X = 1 with probability p and X = 0 with probability 1 - p,

         { 1 - p   x = 0,
PX(x) =  { p       x = 1,      (3.12)
         { 0       otherwise.

Therefore, the number of circuits rejected in one test is a Bernoulli (p) random variable.

Example 3.8
If there is a 0.2 probability of a reject, the PMF of the Bernoulli (0.2) random variable
is

Px(x) = 0.8   x = 0,
        0.2   x = 1,          (3.13)
        0     otherwise.

[Plot of Px(x) for -1 ≤ x ≤ 2.]

Example 3.9
In a sequence of independent tests of integrated circuits, each circuit is rejected with
probability p. Let Y equal the number of tests up to and including the first test that
results in a reject. What is the PMF of Y?

The procedure is to keep testing circuits until a reject appears. Using a to denote an
accepted circuit and r to denote a reject, the tree is

[Tree diagram: each test branches to r with probability p or a with probability 1 - p;
the first r after y - 1 acceptances corresponds to Y = y.]

From the tree, we see that P[Y = 1] = p, P[Y = 2] = p(1 - p), P[Y = 3] = p(1 - p)^2,
and, in general, P[Y = y] = p(1 - p)^(y-1). Therefore,

Py(y) = p(1 - p)^(y-1)   y = 1, 2, ...          (3.14)
        0                otherwise.

Y is referred to as a geometric random variable because the probabilities in the PMF
constitute a geometric series.

In general, the number of Bernoulli trials that take place until the first observation
of one of the two outcomes is a geometric random variable.


Definition 3.5  Geometric (p) Random Variable

X is a geometric (p) random variable if the PMF of X has the form

Px(x) = p(1 - p)^(x-1)   x = 1, 2, ...
        0                otherwise,

where the parameter p is in the range 0 < p < 1.

Example 3.10
If there is a 0.2 probability of a reject, the PMF of the geometric (0.2) random variable
is

Py(y) = (0.2)(0.8)^(y-1)   y = 1, 2, ...
        0                  otherwise.

[Plot of Py(y) for 0 ≤ y ≤ 20.]
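The hands-on examples in this text use MATLAB; as an informal cross-check of Example 3.10, the geometric PMF of Definition 3.5 can also be tabulated in a few lines of Python (the function name here is our own, not from the text):

```python
def geometric_pmf(p, y):
    """P_Y(y) = p * (1 - p)**(y - 1) for y = 1, 2, ...; zero otherwise."""
    return p * (1 - p) ** (y - 1) if y >= 1 else 0.0

# Geometric (0.2): the first few probabilities form a geometric series
# 0.2, 0.16, 0.128, ... that sums to 1 over all y >= 1.
probs = [geometric_pmf(0.2, y) for y in range(1, 6)]
```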

Example 3.11
In a sequence of n independent tests of integrated circuits, each circuit is rejected with
probability p. Let K equal the number of rejects in the n tests. Find the PMF PK(k).

Adopting the vocabulary of Section 2.3, we call each discovery of a defective circuit
a success, and each test is an independent trial with success probability p. The event
K = k corresponds to k successes in n trials. We refer to Theorem 2.8 to determine
that the PMF of K is

PK(k) = (n choose k) p^k (1 - p)^(n-k).          (3.15)

K is an example of a binomial random variable.

We do not state the values of k for which PK(k) = 0 in Equation (3.15) because

(n choose k) = 0 for k ∉ {0, 1, ..., n}.

Definition 3.6  Binomial (n, p) Random Variable

X is a binomial (n, p) random variable if the PMF of X has the form

Px(x) = (n choose x) p^x (1 - p)^(n-x),

where 0 < p < 1 and n is an integer such that n ≥ 1.

Whenever we have a sequence of n independent Bernoulli trials each with success
probability p, the number of successes is a binomial random variable. Note that a
Bernoulli random variable is a binomial random variable with n = 1.


Example 3.12
If there is a 0.2 probability of a reject and we perform 10 tests, the PMF of the binomial
(10, 0.2) random variable is

PK(k) = (10 choose k) (0.2)^k (0.8)^(10-k).          (3.16)

[Plot of PK(k) for 0 ≤ k ≤ 10.]
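Equation (3.16) is easy to evaluate numerically. The following Python sketch (our own, outside the text's MATLAB examples) tabulates the binomial (10, 0.2) PMF from the formula:

```python
from math import comb

def binomial_pmf(n, p, k):
    """P_K(k) = C(n, k) p^k (1 - p)^(n - k); zero for k outside 0..n."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Binomial (10, 0.2): the eleven probabilities sum to 1,
# and the most likely number of rejects is k = 2 = np.
pmf = [binomial_pmf(10, 0.2, k) for k in range(11)]
```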

Example 3.13
Perform independent tests of integrated circuits in which each circuit is rejected with
probability p. Observe L, the number of tests performed until there are k rejects. What
is the PMF of L?

For large values of k, it is not practical to draw the tree. In this case, L = l if and only
if there are k - 1 rejects in the first l - 1 trials and there is a reject on trial l, so
that

P[L = l] = P[k - 1 rejects in l - 1 attempts, reject on attempt l] = P[AB],          (3.17)

where A is the event of k - 1 rejects in the first l - 1 attempts and B is the event of a
reject on attempt l. The events A and B are independent since the outcome of attempt
l is not affected by the previous l - 1 attempts. Note that P[A] is the binomial
probability of k - 1 successes (i.e., rejects) in l - 1 trials, so that

P[A] = (l-1 choose k-1) p^(k-1) (1 - p)^(l-1-(k-1)).          (3.18)

Finally, since P[B] = p,

PL(l) = P[A] P[B] = (l-1 choose k-1) p^k (1 - p)^(l-k).          (3.19)

L is an example of a Pascal random variable.

Definition 3.7  Pascal (k, p) Random Variable

X is a Pascal (k, p) random variable if the PMF of X has the form

Px(x) = (x-1 choose k-1) p^k (1 - p)^(x-k),

where 0 < p < 1 and k is an integer such that k ≥ 1.

In general, the number of Bernoulli trials that take place until one of the two
outcomes is observed k times is a Pascal random variable. For a Pascal (k, p)
random variable X, Px(x) is nonzero only for x = k, k + 1, .... Definition 3.7 does
not state the values of x for which Px(x) = 0 because, as in Definition 3.6, we have
(x-1 choose k-1) = 0 for x ∉ {k, k+1, ...}. Also note that the Pascal (1, p) random variable is
the geometric (p) random variable.

Example 3.14
If there is a 0.2 probability of a reject and we seek four defective circuits, the random
variable L is the number of tests necessary to find the four circuits. The PMF of the
Pascal (4, 0.2) random variable is

PL(l) = (l-1 choose 3) (0.2)^4 (0.8)^(l-4)   l = 4, 5, ...
        0                                    otherwise.

[Plot of PL(l) for 0 ≤ l ≤ 40.]
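Again outside the text's MATLAB examples, a short Python sketch can evaluate the Pascal (4, 0.2) PMF of Example 3.14 and check it against Theorem 3.7(b), which gives E[L] = k/p = 20 (the function name is ours):

```python
from math import comb

def pascal_pmf(k, p, x):
    """P_L(x) = C(x-1, k-1) p^k (1 - p)^(x - k) for x = k, k+1, ..."""
    if x < k:
        return 0.0
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

# Pascal (4, 0.2): tests needed to find four rejects.  Truncating the
# support at l = 399 loses negligible probability for these parameters.
pl = {l: pascal_pmf(4, 0.2, l) for l in range(4, 400)}
mean = sum(l * prob for l, prob in pl.items())  # should be near k/p = 20
```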

Example 3.15
In an experiment with equiprobable outcomes, the random variable N has the range
SN = {k, k+1, k+2, ..., l}, where k and l are integers with k < l. The range
contains l - k + 1 numbers, each with probability 1/(l - k + 1). Therefore, the PMF
of N is

PN(n) = 1/(l - k + 1)   n = k, k+1, k+2, ..., l,
        0               otherwise.

N is an example of a discrete uniform random variable.

Definition 3.8  Discrete Uniform (k, l) Random Variable

X is a discrete uniform (k, l) random variable if the PMF of X has the form

Px(x) = 1/(l - k + 1)   x = k, k+1, k+2, ..., l,
        0               otherwise,

where the parameters k and l are integers such that k < l.

To describe this discrete uniform random variable, we use the expression "X is
uniformly distributed between k and l."

Example 3.16
Roll a fair die. The random variable N is the number of spots on the side facing up.
Therefore, N is a discrete uniform (1, 6) random variable with PMF

PN(n) = 1/6   n = 1, 2, 3, 4, 5, 6,          (3.21)
        0     otherwise.

[Plot of PN(n) for 0 ≤ n ≤ 6.]

The probability model of a Poisson random variable describes phenomena that
occur randomly in time. While the time of each occurrence is completely random,
there is a known average number of occurrences per unit time. The Poisson model
is used widely in many fields. For example, the arrival of information requests at
a World Wide Web server, the initiation of telephone calls, and the emission of
particles from a radioactive source are often modeled as Poisson random variables.
We will return to Poisson random variables many times in this text. At this point,
we consider only the basic properties.

Definition 3.9  Poisson (α) Random Variable

X is a Poisson (α) random variable if the PMF of X has the form

Px(x) = α^x e^(-α) / x!   x = 0, 1, 2, ...,
        0                 otherwise,

where the parameter α is in the range α > 0.

To describe a Poisson random variable, we will call the occurrence of the phe-
nomenon of interest an arrival. A Poisson model often specifies an average rate,
λ arrivals per second, and a time interval, T seconds. In this time interval, the
number of arrivals X has a Poisson PMF with α = λT.

Example 3.17
The number of hits at a website in any time interval is a Poisson random variable. A
particular site has on average λ = 2 hits per second. What is the probability that there
are no hits in an interval of 0.25 seconds? What is the probability that there are no
more than two hits in an interval of one second?

In an interval of 0.25 seconds, the number of hits H is a Poisson random variable with
α = λT = (2 hits/s) × (0.25 s) = 0.5 hits. The PMF of H is

PH(h) = (0.5)^h e^(-0.5) / h!   h = 0, 1, 2, ...,
        0                       otherwise.

The probability of no hits is

P[H = 0] = PH(0) = (0.5)^0 e^(-0.5) / 0! = 0.607.          (3.22)

In an interval of 1 second, α = λT = (2 hits/s) × (1 s) = 2 hits. Letting J denote the
number of hits in one second, the PMF of J is

PJ(j) = 2^j e^(-2) / j!   j = 0, 1, 2, ...,
        0                 otherwise.

To find the probability of no more than two hits, we note that

{J ≤ 2} = {J = 0} ∪ {J = 1} ∪ {J = 2}          (3.23)

is the union of three mutually exclusive events. Therefore,

P[J ≤ 2] = P[J = 0] + P[J = 1] + P[J = 2]
         = PJ(0) + PJ(1) + PJ(2)
         = e^(-2) + 2^1 e^(-2)/1! + 2^2 e^(-2)/2! = 0.677.          (3.24)
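The two numbers computed in Example 3.17 can be reproduced directly from Definition 3.9. This Python sketch (our own; the text's examples use MATLAB) evaluates the Poisson PMF and the two probabilities:

```python
from math import exp, factorial

def poisson_pmf(alpha, x):
    """P_X(x) = alpha^x e^(-alpha) / x! for x = 0, 1, 2, ..."""
    return alpha**x * exp(-alpha) / factorial(x) if x >= 0 else 0.0

# Interval of 0.25 s: alpha = 2 * 0.25 = 0.5, probability of no hits.
p_no_hits = poisson_pmf(0.5, 0)
# Interval of 1 s: alpha = 2, probability of no more than two hits.
p_at_most_two = sum(poisson_pmf(2, j) for j in range(3))
```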

Example 3.18
The number of database queries processed by a computer in any 10-second interval is
a Poisson random variable, K, with α = 5 queries. What is the probability that there
will be no queries processed in a 10-second interval? What is the probability that at
least two queries will be processed in a 2-second interval?

The PMF of K is

PK(k) = 5^k e^(-5) / k!   k = 0, 1, 2, ...,
        0                 otherwise.

Therefore, P[K = 0] = PK(0) = e^(-5) = 0.0067. To answer the question about the
2-second interval, we note in the problem definition that α = 5 queries = λT with
T = 10 seconds. Therefore, λ = 0.5 queries per second. If N is the number of queries
processed in a 2-second interval, α = 2λ = 1 and N is the Poisson (1) random variable
with PMF

PN(n) = e^(-1) / n!   n = 0, 1, 2, ...,
        0             otherwise.

Therefore,

P[N ≥ 2] = 1 - PN(0) - PN(1) = 1 - e^(-1) - e^(-1) = 0.264.          (3.26)

Note that the units of λ and T have to be consistent. Instead of λ = 0.5 queries
per second for T = 10 seconds, we could use λ = 30 queries per minute for the time
interval T = 1/6 minutes to obtain the same α = 5 queries, and therefore the same
probability model.

In the following examples, we see that for a fixed rate λ, the shape of the Poisson
PMF depends on the duration T over which arrivals are counted.

Example 3.19
Calls arrive at random times at a telephone switching office with an average of λ = 0.25
calls/second. The PMF of the number of calls that arrive in a T = 2-second interval
is the Poisson (0.5) random variable with PMF

PJ(j) = (0.5)^j e^(-0.5) / j!   j = 0, 1, ...,
        0                       otherwise.

Note that we obtain the same PMF if we define the arrival rate as λ = 60 × 0.25 = 15
calls per minute and derive the PMF of the number of calls that arrive in 2/60 = 1/30
minutes.

Example 3.20
Calls arrive at random times at a telephone switching office with an average of λ = 0.25
calls per second. The PMF of the number of calls that arrive in any T = 20-second
interval is the Poisson (5) random variable with PMF

PJ(j) = 5^j e^(-5) / j!   j = 0, 1, ...,
        0                 otherwise.

Quiz 3.3
Each time a modem transmits one bit, the receiving modem analyzes the signal
that arrives and decides whether the transmitted bit is 0 or 1. It makes an error
with probability p, independent of whether any other bit is received correctly.

(a) If the transmission continues until the receiving modem makes its first error,
what is the PMF of X, the number of bits transmitted?
(b) If p = 0.1, what is the probability that X = 10? What is the probability that
X > 10?
(c) If the modem transmits 100 bits, what is the PMF of Y, the number of errors?
(d) If p = 0.01 and the modem transmits 100 bits, what is the probability of Y = 2
errors at the receiver? What is the probability that Y ≤ 2?
(e) If the transmission continues until the receiving modem makes three errors,
what is the PMF of Z, the number of bits transmitted?
(f) If p = 0.25, what is the probability of Z = 12 bits transmitted until the modem
makes three errors?

3.4 Cumulative Distribution Function (CDF)

Like the PMF, the CDF of random variable X expresses the prob-
ability model of an experiment as a mathematical function. The
function is the probability P[X ≤ x] for every number x.

The PMF and CDF are closely related. Each can be obtained easily from the other.

Definition 3.10  Cumulative Distribution Function (CDF)

The cumulative distribution function (CDF) of random variable X is

Fx(x) = P[X ≤ x].

For any real number x, the CDF is the probability that the random variable X
is no larger than x. All random variables have cumulative distribution functions,
but only discrete random variables have probability mass functions. The notation
convention for the CDF follows that of the PMF, except that we use the letter F
with a subscript corresponding to the name of the random variable. Because Fx(x)
describes the probability of an event, the CDF has a number of properties.

Theorem 3.2
For any discrete random variable X with range Sx = {x1, x2, ...} satisfying x1 <
x2 < ...,
(a) Fx(-∞) = 0 and Fx(∞) = 1.
(b) For all x' ≥ x, Fx(x') ≥ Fx(x).
(c) For xi ∈ Sx and ε, an arbitrarily small positive number,

Fx(xi) - Fx(xi - ε) = Px(xi).

(d) Fx(x) = Fx(xi) for all x such that xi ≤ x < xi+1.

Each property of Theorem 3.2 has an equivalent statement in words:
(a) Going from left to right on the x-axis, Fx(x) starts at zero and ends at one.
(b) The CDF never decreases as it goes from left to right.
(c) For a discrete random variable X, there is a jump (discontinuity) at each
value of xi ∈ Sx. The height of the jump at xi is Px(xi).

(d) Between jumps, the graph of the CDF of the discrete random variable X is a
horizontal line.
Another important consequence of the definition of the CDF is that the differ-
ence between the CDF evaluated at two points is the probability that the random
variable takes on a value between these two points:

Theorem 3.3
For all b ≥ a,
Fx(b) - Fx(a) = P[a < X ≤ b].

Proof  To prove this theorem, express the event Eab = {a < X ≤ b} as a part of a union
of mutually exclusive events. Start with the event Eb = {X ≤ b}. Note that Eb can be
written as the union

Eb = {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} = Ea ∪ Eab.          (3.27)

Note also that Ea and Eab are mutually exclusive, so that P[Eb] = P[Ea] + P[Eab]. Since
P[Eb] = Fx(b) and P[Ea] = Fx(a), we can write Fx(b) = Fx(a) + P[a < X ≤ b]. Therefore,
P[a < X ≤ b] = Fx(b) - Fx(a).

In working with the CDF, it is necessary to pay careful attention to the nature
of inequalities, strict (<) or loose (≤). The definition of the CDF contains a loose
(less than or equal to) inequality, which means that the function is continuous from
the right. To sketch a CDF of a discrete random variable, we draw a graph with
the vertical value beginning at zero at the left end of the horizontal axis (negative
numbers with large magnitude). It remains zero until x1, the first value of x with
nonzero probability. The graph jumps by an amount Px(xi) at each xi with nonzero
probability. We draw the graph of the CDF as a staircase with jumps at each xi with
nonzero probability. The CDF is the upper value of every jump in the staircase.

Example 3.21
In Example 3.5, random variable X has PMF

Px(x) = 1/4   x = 0,
        1/2   x = 1,          (3.28)
        1/4   x = 2,
        0     otherwise.

Find and sketch the CDF of random variable X.

Referring to the PMF Px(x), we derive the CDF of random variable X:

Fx(x) = P[X ≤ x] = 0     x < 0,
                   1/4   0 ≤ x < 1,
                   3/4   1 ≤ x < 2,
                   1     x ≥ 2.

Keep in mind that at the discontinuities x = 0, x = 1, and x = 2, the values of Fx(x)
are the upper values: Fx(0) = 1/4, Fx(1) = 3/4, and Fx(2) = 1. Math texts call this
the right-hand limit of Fx(x).
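The staircase CDF of Example 3.21 can be evaluated mechanically by summing the PMF over all sample values no larger than x, which also makes the right-hand-limit behavior at the jumps explicit. A Python sketch (our own naming):

```python
# PMF of Example 3.21
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def cdf(x):
    """F_X(x) = P[X <= x]: sum the PMF over all sample values <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

# At each discontinuity the CDF takes the upper value of the jump:
values = [cdf(-0.5), cdf(0), cdf(0.5), cdf(1), cdf(2)]
```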

Consider any finite random variable X with all elements of Sx between xmin
and xmax. For this random variable, the numerical specification of the CDF begins
with

Fx(x) = 0,   x < xmin,

and ends with

Fx(x) = 1,   x ≥ xmax.

Like the statement "Px(x) = 0 otherwise," the description of the CDF is incomplete
without these two statements. The next example displays the CDF of an infinite
discrete random variable.

Example 3.22
In Example 3.9, let the probability that a circuit is rejected equal p = 1/4. The PMF
of Y, the number of tests up to and including the first reject, is the geometric (1/4)
random variable with PMF

Py(y) = (1/4)(3/4)^(y-1)   y = 1, 2, ...          (3.29)
        0                  otherwise.

What is the CDF of Y?

Random variable Y has nonzero probabilities for all positive integers. For any integer
n ≥ 1, the CDF is

Fy(n) = Σ_{j=1}^{n} Py(j) = Σ_{j=1}^{n} (1/4)(3/4)^(j-1).          (3.30)

Equation (3.30) is a geometric series. Familiarity with the geometric series is essen-
tial for calculating probabilities involving geometric random variables. Appendix B
summarizes the most important facts. In particular, Math Fact B.4 implies
(1 - x) Σ_{j=1}^{n} x^(j-1) = 1 - x^n. Substituting x = 3/4, we obtain

Fy(n) = 1 - (3/4)^n.          (3.31)

The complete expression for the CDF of Y must show Fy(y) for all integer and nonin-
teger values of y. For an integer-valued random variable Y, we can do this in a simple
way using the floor function ⌊y⌋, which is the largest integer less than or equal to y. In
particular, if n ≤ y < n + 1 for some integer n, then ⌊y⌋ = n and

Fy(y) = P[Y ≤ y] = P[Y ≤ n] = Fy(n) = Fy(⌊y⌋).          (3.32)

In terms of the floor function, we can express the CDF of Y as

Fy(y) = 0                y < 1,          (3.33)
        1 - (3/4)^⌊y⌋    y ≥ 1.

To find the probability that Y takes a value in the set {4, 5, 6, 7, 8}, we refer to Theo-
rem 3.3 and compute

P[3 < Y ≤ 8] = Fy(8) - Fy(3) = (3/4)^3 - (3/4)^8 = 0.322.          (3.34)
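The floor-function form of the CDF in Equation (3.33) translates directly into code. This Python sketch (ours, not from the text) implements it for a general geometric (p) random variable and reproduces the probability computed in Equation (3.34):

```python
from math import floor

def geometric_cdf(p, y):
    """F_Y(y) = 0 for y < 1, else 1 - (1 - p)**floor(y)  (Eq. 3.33 with p = 1/4)."""
    return 0.0 if y < 1 else 1 - (1 - p) ** floor(y)

p = 0.25
# Theorem 3.3: P[3 < Y <= 8] = F_Y(8) - F_Y(3)
prob = geometric_cdf(p, 8) - geometric_cdf(p, 3)
```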

Quiz 3.4
Use the CDF Fy(y) shown in the accompanying graph to find the following probabilities:

[Graph of a staircase CDF Fy(y) on 0 ≤ y ≤ 5, with visible levels including 0.2, 0.6, and 0.8.]

(a) P[Y < 1]    (b) P[Y ≤ 1]
(c) P[Y > 2]    (d) P[Y ≥ 2]
(e) P[Y = 1]    (f) P[Y = 3]

3.5 Averages and Expected Value

An average is a number that describes a set of experimental ob-
servations. The expected value is a number that describes the
probability model of an experiment.

The average value of a set of n numbers is a statistic of the set of numbers.
The average is a single number that describes the entire set. Statisticians work
with several kinds of averages. The ones that are used the most are the mean, the
median, and the mode.

The mean value of n numbers is the sum of the n numbers divided by n. An
example is the mean value of the numerical grades of the students taking a mid-
term exam. The mean indicates the performance of the entire class. The median
is another statistic that describes a set of numbers.

The median is a number in the middle of a data set. There is an equal number
of data items below the median and above the median.

A third average is the mode of a set of numbers. The mode is the most common
number in the set. There are as many or more numbers with that value than any
other value. If there are two or more numbers with this property, the set of numbers
is called multimodal.


Example 3.23

For one quiz, 10 students have the following grades (on a scale of 0 to 10):


Find the mean, the median, and the mode.

The sum of the ten grades is 68. The mean value is 68/10 = 6.8. The median is 7,
because there are four grades below 7 and four grades above 7. The mode is 5, because
three students have a grade of 5, more than the number of students who received any
other grade.

Example 3.23 and the preceding comments on averages apply to a set of num-
bers observed in a practical situation. The probability models of random variables
characterize experiments with numerical outcomes, and in practical applications
of probability, we assume that the probability models are related to the numbers
observed in practice. Just as a statistic describes a set of numbers observed in
practice, a parameter describes a probability model. Each parameter is a number
that can be computed from the PMF or CDF of a random variable. When we use a
probability model of a random variable to represent an application that results in a
set of numbers, the expected value of the random variable corresponds to the mean
value of the set of numbers. Expected values appear throughout the remainder of
this textbook. The notations for the expected value of random variable X are E[X]
and μX.

Corresponding to the other two averages, we have the following definitions:

Definition 3.11  Mode

A mode of random variable X is a number xmod satisfying Px(xmod) ≥ Px(x) for
all x.

Definition 3.12  Median

A median, xmed, of random variable X is a number that satisfies

P[X ≤ xmed] ≥ 1/2,    P[X ≥ xmed] ≥ 1/2.

Neither the mode nor the median of a random variable X is necessarily unique.
There are random variables that have several modes or medians.

Definition 3.13  Expected Value

The expected value of X is

E[X] = μX = Σ_{x ∈ Sx} x Px(x).

Expectation is a synonym for expected value. Sometimes the term mean value is
also used as a synonym for expected value. We prefer to use mean value to refer
to a statistic of a set of experimental data (the sum divided by the number of data
items) to distinguish it from expected value, which is a parameter of a probability
model. If you recall your studies of mechanics, the form of Definition 3.13 may
look familiar. Think of point masses on a line with a mass of Px(x) kilograms at
a distance of x meters from the origin. In this model, μX in Definition 3.13 is the
center of mass. This is why Px(x) is called a probability mass function.

Example 3.24
Random variable X in Example 3.5 has PMF

Px(x) = 1/4   x = 0,
        1/2   x = 1,          (3.36)
        1/4   x = 2,
        0     otherwise.

What is E[X]?

E[X] = μX = 0 · Px(0) + 1 · Px(1) + 2 · Px(2)
     = 0(1/4) + 1(1/2) + 2(1/4) = 1.          (3.37)
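Definition 3.13 is a one-line computation once the PMF is stored as a table. A Python sketch of Example 3.24 (our own naming):

```python
# PMF of Example 3.24 (same random variable as Example 3.5)
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

# Definition 3.13: E[X] = sum over x in S_X of x * P_X(x)
expected_value = sum(x * p for x, p in pmf.items())
```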

To understand how this definition of expected value corresponds to the notion
of adding up a set of measurements, suppose we have an experiment that produces
a random variable X and we perform n independent trials of this experiment. We
denote the value that X takes on the ith trial by x(i). We say that x(1), ..., x(n)
is a set of n sample values of X. We have, after n trials of the experiment, the
sample average

m_n = (1/n) Σ_{i=1}^{n} x(i).          (3.38)

Each x(i) takes values in the set Sx. Out of the n trials, assume that each x ∈ Sx
occurs Nx times. Then the sum (3.38) becomes

m_n = (1/n) Σ_{x ∈ Sx} Nx x = Σ_{x ∈ Sx} (Nx/n) x.          (3.39)

Recall our discussion in Section 1.3 of the relative frequency interpretation of
probability. There we pointed out that if in n observations of an experiment, the
event A occurs NA times, we can interpret the probability of A as

P[A] = lim_{n→∞} NA/n.          (3.40)

NA/n is the relative frequency of A. In the notation of random variables, we have
the corresponding observation that

Px(x) = lim_{n→∞} Nx/n.          (3.41)

From Equation (3.39), this suggests that

lim_{n→∞} m_n = Σ_{x ∈ Sx} x (lim_{n→∞} Nx/n) = Σ_{x ∈ Sx} x Px(x) = E[X].          (3.42)

Equation (3.42) says that the definition of E[X] corresponds to a model of doing
the same experiment repeatedly. After each trial, add up all the observations to
date and divide by the number of trials. We prove in Chapter 10 that the result
approaches the expected value as the number of trials increases without limit. We
can use Definition 3.13 to derive the expected value of each family of random
variables defined in Section 3.3.

Theorem 3.4
The Bernoulli (p) random variable X has expected value E[X] = p.

Proof  E[X] = 0 · Px(0) + 1 · Px(1) = 0(1 - p) + 1(p) = p.

Theorem 3.5
The geometric (p) random variable X has expected value E[X] = 1/p.

Proof  Let q = 1 - p. The PMF of X becomes

Px(x) = p q^(x-1)   x = 1, 2, ...,
        0           otherwise.

The expected value E[X] is the infinite sum

E[X] = Σ_{x=1}^{∞} x Px(x) = Σ_{x=1}^{∞} x p q^(x-1).          (3.44)

Applying the identity of Math Fact B.7, we have

E[X] = p Σ_{x=1}^{∞} x q^(x-1) = (p/q) Σ_{x=1}^{∞} x q^x = (p/q) · q/(1 - q)^2 = p/p^2 = 1/p.          (3.45)

This result is intuitive if you recall the integrated circuit testing experiments
and consider some numerical values. If the probability of rejecting an integrated
circuit is p = 1/5, then on average, you have to perform E[Y] = 1/p = 5 tests until
you observe the first reject. If p = 1/10, the average number of tests until the first
reject is E[Y] = 1/p = 10.
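The intuition E[Y] = 1/p can also be checked by simulating the circuit-testing procedure many times and comparing the sample average of Equation (3.38) with the expected value, in the spirit of the relative-frequency argument of Equation (3.42). A Python sketch (ours; seed and trial count are arbitrary choices):

```python
import random

def first_reject(p, rng):
    """Number of Bernoulli trials up to and including the first reject."""
    y = 1
    while rng.random() >= p:  # accept with probability 1 - p
        y += 1
    return y

rng = random.Random(7)
p = 0.2
n = 200000
# Sample average of n independent trials; should be near E[Y] = 1/p = 5.
sample_average = sum(first_reject(p, rng) for _ in range(n)) / n
```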

Theorem 3.6
The Poisson (α) random variable in Definition 3.9 has expected value E[X] = α.

Proof

E[X] = Σ_{x=0}^{∞} x Px(x) = Σ_{x=0}^{∞} x (α^x e^(-α) / x!).          (3.46)

We observe that x/x! = 1/(x-1)! and also that the x = 0 term in the sum is zero. In
addition, we substitute α^x = α · α^(x-1) to factor α from the sum to obtain

E[X] = α Σ_{x=1}^{∞} α^(x-1) e^(-α) / (x-1)!.          (3.47)

Next we substitute l = x - 1, with the result

E[X] = α Σ_{l=0}^{∞} α^l e^(-α) / l! = α.          (3.48)

We can conclude that the sum in this formula equals 1 either by referring to the identity
e^α = Σ_{l=0}^{∞} α^l / l! or by applying Theorem 3.1(b) to the fact that the sum is the sum of the
PMF of a Poisson random variable L over all values in SL and P[SL] = 1.

In Section 3.3, we modeled the number of random arrivals in an interval of
duration T by a Poisson random variable with parameter α = λT. We referred to
λ as the average rate of arrivals with little justification. Theorem 3.6 provides the
justification by showing that λ = α/T is the expected number of arrivals per unit
time.

The next theorem provides, without derivations, the expected values of binomial,
Pascal, and discrete uniform random variables.

Theorem 3.7
(a) For the binomial (n, p) random variable X of Definition 3.6,

E[X] = np.

(b) For the Pascal (k, p) random variable X of Definition 3.7,

E[X] = k/p.

(c) For the discrete uniform (k, l) random variable X of Definition 3.8,

E[X] = (k + l)/2.


In t he follovvir1g theorem , we shovv t h at the P oisson P 1!fF is a limiting case of

a binorr1ial PNIF vvhen t he n t1mber of Bernoulli t rials, 'n, grows vvithout lirnit but
t he exp ect ed n11mber of st1ccesses Tl/fJ rem ains constant at cv., t he expect ed value of
the Poisson PMF. Ir1 the t heorern, vve let a = >.T and divide t11e T -second interval
into ri tirne slots eac11 v.rith duration T /ri. In each slot , as r1, wit hout lirr1it
and t he durat ion, T /'n, of each slot get s sm aller and srnaller \Ve assurne t 11at there
is eit her one arrival, wit 11 probability p = >..T/r1, = a/11,, or there is no arrival in t he
t im e slot, wit 11 probabilit y 1 - '[J.

Theorem 3.8
Perform n Bernoulli trials. In each trial, let the probability of success be α/n,
where α > 0 is a constant and n > α. Let the random variable Kn be the number
of successes in the n trials. As n → ∞, PKn(k) converges to the PMF of a Poisson
(α) random variable.

Proof  We first note that Kn is the binomial (n, α/n) random variable with PMF

PKn(k) = (n choose k) (α/n)^k (1 - α/n)^(n-k).          (3.49)

For k = 0, ..., n, we can write

PKn(k) = [n(n-1)···(n-k+1) / n^k] · (α^k / k!) · (1 - α/n)^(n-k).          (3.50)

Notice that in the first fraction, there are k terms in the numerator. The denominator is
n^k, also a product of k terms, all equal to n. Therefore, we can express this fraction as
the product of k fractions, each of the form (n - j)/n. As n → ∞, each of these fractions
approaches 1. Hence,

lim_{n→∞} n(n-1)···(n-k+1) / n^k = 1.          (3.51)

Furthermore, we have

(1 - α/n)^(n-k) = (1 - α/n)^n / (1 - α/n)^k.          (3.52)

As n grows without bound, the denominator approaches 1 and, in the numerator, we
recognize the identity lim_{n→∞} (1 - α/n)^n = e^(-α). Putting these three limits together leads
us to the result that for any integer k ≥ 0,

lim_{n→∞} PKn(k) = α^k e^(-α) / k!   k = 0, 1, ...,          (3.53)
                   0                 otherwise,

which is the Poisson PMF.
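The convergence claimed in Theorem 3.8 is easy to observe numerically. The Python sketch below (our own, with arbitrarily chosen α = 5 and k = 3) compares the binomial (n, α/n) probability with the Poisson limit for increasing n:

```python
from math import comb, exp, factorial

def binomial_pmf(n, p, k):
    """Binomial probability C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

alpha, k = 5.0, 3
poisson = alpha**k * exp(-alpha) / factorial(k)  # Poisson (5) PMF at k = 3

# Binomial (n, alpha/n) probabilities approach the Poisson value as n grows.
approx = {n: binomial_pmf(n, alpha / n, k) for n in (10, 100, 10000)}
```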



==-- Quiz 3. 5,__...;;=:::i

In a pay-as-}rou go cellpl1one plan, t he cost of sending a n Sl\IIS t ext rr1essage is
10 cer1ts and t he cost of r eceiving a text is 5 cents . For a certain Sl1bscriber, the
probability of ser1ding a text is 1/3 a nd t11e probability of receivir1g a text is 2/3 .
Let C eqllal t11e cost (ir1 cents) of one text rr1essage and find
(a) The P 11!F Pc(c) (b) T11e expect ed vall1e E[C]
(c) The probability t hat the subscriber (d) The expected number of texts re-
receives four texts before sending ceived by the subscriber before
a text. t11e subscriber ser1ds a text.

3.6 Functions of a Random Variable

A function Y = g(X) of random variable X is another random
variable. The PMF Py(y) can be derived from Px(x) and g(X).

In many practical situations, we observe sample values of a random variable and
use these sample values to compute other quantities. One example that occurs
frequently is an experiment in which the procedure is to monitor the data activity
of a cellular telephone subscriber for a month and observe x, the total number of
megabytes sent and received. The telephone company refers to the price plan of
the subscriber and calculates y dollars, the amount to be paid by the subscriber.
If x is a sample value of a random variable X, Definition 3.1 implies that y is a
sample value of a random variable Y. Because we obtain Y from another random
variable, we refer to Y as a derived random variable.

Definition 3.14  Derived Random Variable

Each sample value y of a derived random variable Y is a mathematical function
g(x) of a sample value x of another random variable X. We adopt the notation
Y = g(X) to describe the relationship of the two random variables.

Example 3.25
A parcel shipping company offers a charging plan: $1.00 for the first pound, $0.90
for the second pound, etc., down to $0.60 for the fifth pound, with rounding up for a
fraction of a pound. For all packages between 6 and 10 pounds, the shipper will charge
$5.00 per package. (It will not accept shipments over 10 pounds.) Find a function
Y = g(X) for the charge in cents for sending one package.

When the package weight is an integer X ∈ {1, 2, ..., 10} that specifies the number
of pounds with rounding up for a fraction of a pound, the function

Y = g(X) = 105X - 5X^2   X = 1, 2, 3, 4, 5,
           500           X = 6, 7, 8, 9, 10,

In this section we determine the probability model of a derived random variable
from the probability model of the original random variable. We start with Px(x)
and a function Y = g(X). We use this information to obtain Py(y).

Before we present the procedure for obtaining Py(y), we alert students to the
different nature of the functions Px(x) and g(x). Although they are both functions
with the argument x, they are entirely different. Px(x) describes the probability
model of a random variable. It has the special structure prescribed in Theorem 3.1.
On the other hand, g(x) can be any function at all. When we combine Px(x) and
g(x) to derive the probability model for Y, we arrive at a PMF that also conforms
to Theorem 3.1.
To describe Y in terms of our basic model of probability, we specify an experiment
consisting of the following procedure and observation:

Sample value of Y = g(X)

Perform an experiment and observe an outcome s.

From s, find x, the corresponding value of random variable X.
Observe y by calculating y = g(x).

This procedure maps each experimental outcome to a number, y, a sample value of
a random variable, Y. To derive Py(y) from Px(x) and g(·), we consider all of the
possible values of x. For each x ∈ Sx, we compute y = g(x). If g(x) transforms
different values of x into different values of y (g(x1) ≠ g(x2) if x1 ≠ x2), we simply
have

Py(g(x)) = Px(x).          (3.55)

The situation is a little more complicated when g(x) transforms several values of x
to the same y. For each y ∈ Sy, we add the probabilities of all of the values x ∈ Sx
for which g(x) = y. Theorem 3.9 applies in general. It reduces to Equation (3.55)
when g(x) is a one-to-one transformation.

Theorem 3.9
For a discrete random variable X, the PMF of Y = g(X) is

    PY(y) = Σ_{x: g(x)=y} PX(x).

If we view X = x as the outcome of an experiment, then Theorem 3.9 says that
PY(y) is the sum of the probabilities of all the outcomes X = x for which Y = y.
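Theorem 3.9 translates directly into code: group the probabilities PX(x) by the value g(x) and sum within each group. A Python sketch (the helper name derived_pmf is ours, not from the text):

```python
from collections import defaultdict

def derived_pmf(px, g):
    """Apply Theorem 3.9: P_Y(y) is the sum of P_X(x) over all x with g(x) = y.

    px is a dict mapping sample values x to probabilities P_X(x);
    g is the derivation function.
    """
    py = defaultdict(float)
    for x, p in px.items():
        py[g(x)] += p  # probabilities of all x mapping to the same y add up
    return dict(py)
```

When g is one-to-one, each group holds a single x, and the loop reduces to the relabeling of Equation (3.55).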

Example 3.26
In Example 3 .25, suppose all packages weigh 1, 2, 3, or 4 pounds w ith equal probabi lity .
Find the PMF and expected va lue of Y, the shipping charge for a package.


[Figure 3.1: tree diagram mapping the weights X = 1, ..., 8, with probabilities 0.15 and 0.10, to the charges Y = 100, 190, 270, 340, 400, 500.]

Figure 3.1  The derived random variable Y = g(X) for Example 3.27.

From the problem statement, the weight X has PMF

    PX(x) = 1/4    x = 1, 2, 3, 4,                                     (3.56)
            0      otherwise.

The charge for a shipment, Y, has range SY = {100, 190, 270, 340} corresponding to
SX = {1, ..., 4}. Here each value of Y derives from a unique value of X. Hence,
we can use Equation (3.55) to find PY(y):

    PY(y) = 1/4    y = 100, 190, 270, 340,
            0      otherwise.

The expected shipping bill is

    E[Y] = (1/4)(100 + 190 + 270 + 340) = 225 cents.

Example 3.27
Suppose the probability model for the weight in pounds X of a package in Example 3.25
is

    PX(x) = 0.15    x = 1, 2, 3, 4,
            0.1     x = 5, 6, 7, 8,
            0       otherwise.

For the pricing plan given in Example 3.25, what is the PMF and expected value of Y,
the cost of shipping a package?

Now we have three values of X, specifically (6, 7, 8), transformed by g(·) into y = 500.
For this situation we need the more general view of the PMF of Y, given by Theorem 3.9.
In particular, g(6) = g(7) = g(8) = 500, and we have to add the probabilities of the
outcomes X = 6, X = 7, and X = 8 to find PY(500). That is,

    PY(500) = PX(6) + PX(7) + PX(8) = 0.30.                            (3.57)

The steps in the procedure are illustrated in the diagram of Figure 3.1. Applying
Theorem 3.9, we have

    PY(y) = 0.15    y = 100, 190, 270, 340,
            0.10    y = 400,
            0.30    y = 500,
            0       otherwise.

For this probability model, the expected cost of shipping a package is

    E[Y] = 0.15(100 + 190 + 270 + 340) + 0.10(400) + 0.30(500) = 325 cents.

Example 3.28
The amplitude V (volts) of a sinusoidal signal is a random variable with PMF

    PV(v) = 1/7    v = -3, -2, ..., 3,
            0      otherwise.

Let Y = V^2/2 watts denote the power of the transmitted signal. Find PY(y).

The possible values of Y are SY = {0, 0.5, 2, 4.5}. Since Y = y when V = √(2y)
or V = -√(2y), we see that PY(0) = PV(0) = 1/7. For y = 0.5, 2, 4.5, PY(y) =
PV(√(2y)) + PV(-√(2y)) = 2/7. Therefore,

    PY(y) = 1/7    y = 0,
            2/7    y = 0.5, 2, 4.5,                                    (3.58)
            0      otherwise.
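The grouping in Example 3.28 is easy to reproduce numerically. A Python sketch (an illustration only, not code from the text):

```python
def power_pmf():
    """Derive the PMF of the power Y = V^2/2 from the PMF of the amplitude V."""
    # P_V(v) = 1/7 for v = -3, -2, ..., 3.
    pv = {v: 1 / 7 for v in range(-3, 4)}
    py = {}
    for v, p in pv.items():
        y = v ** 2 / 2
        # v and -v map to the same power, so their probabilities add.
        py[y] = py.get(y, 0.0) + p
    return py
```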

Quiz 3.6
Monitor three customers purchasing smartphones at the Phonesmart store and
observe whether each buys an Apricot phone for $450 or a Banana phone for $300.
The random variable N is the number of customers purchasing an Apricot phone.
Assume N has PMF

    PN(n) = 0.4    n = 0,
            0.2    n = 1, 2, 3,                                        (3.59)
            0      otherwise.

M dollars is the amount of money paid by three customers.

(a) Express M as a function of N.        (b) Find PM(m) and E[M].

3.7 Expected Value of a Derived Random Variable

If Y = g(X), E[Y] can be calculated from PX(x) and g(X) without
deriving PY(y).

We encounter many situations in which we need to know only the expected value
of a derived random variable rather than the entire probability model. Fortunately,
to obtain this average, it is not necessary to compute the PMF or CDF of the new
random variable. Instead, we can use the following property of expected values.

Theorem 3.10
Given a random variable X with PMF PX(x) and the derived random variable
Y = g(X), the expected value of Y is

    E[Y] = μY = Σ_{x∈SX} g(x) PX(x).

Proof  From the definition of E[Y] and Theorem 3.9, we can write

    E[Y] = Σ_{y∈SY} y PY(y) = Σ_{y∈SY} y Σ_{x:g(x)=y} PX(x)
         = Σ_{y∈SY} Σ_{x:g(x)=y} g(x) PX(x),                           (3.60)

where the last double summation follows because g(x) = y for each x in the inner sum.
Since g(x) transforms each possible outcome x ∈ SX to a value y ∈ SY, the preceding
double summation can be written as a single sum over all possible values x ∈ SX. That
is,

    E[Y] = Σ_{x∈SX} g(x) PX(x).                                        (3.61)

Example 3.29
In Example 3.26,

    PX(x) = 1/4    x = 1, 2, 3, 4,
            0      otherwise,

and

    Y = g(X) = 105X - 5X^2    1 ≤ X ≤ 5,
               500            6 ≤ X ≤ 10.

What is E[Y]?

Applying Theorem 3.10 we have

    E[Y] = Σ_{x=1}^{4} PX(x) g(x)
         = (1/4)[(105)(1) - (5)(1)^2] + (1/4)[(105)(2) - (5)(2)^2]
           + (1/4)[(105)(3) - (5)(3)^2] + (1/4)[(105)(4) - (5)(4)^2]
         = (1/4)[100 + 190 + 270 + 340] = 225 cents.                   (3.63)

This of course is the same answer obtained in Example 3.26 by first calculating
PY(y) and then applying Definition 3.13. As an exercise, you might want to compute
E[Y] in Example 3.27 directly from Theorem 3.10.
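Theorem 3.10 needs only one pass over SX; no derived PMF is required. A Python sketch (the function name is ours):

```python
def expected_value_of_g(px, g):
    """Theorem 3.10: E[g(X)] = sum over x in S_X of g(x) * P_X(x).

    px is a dict mapping sample values x to probabilities P_X(x).
    """
    return sum(g(x) * p for x, p in px.items())
```

With the PMF and charge function of Example 3.29, this returns 225 cents, matching the result obtained through PY(y).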
From this theorem we can derive some important properties of expected values.
The first one has to do with the difference between a random variable and its
expected value. When students learn their own grades on a midterm exam, they
are quick to ask about the class average. Let's say one student has 73 and the class
average is 80. She may be inclined to think of her grade as "seven points below
average," rather than "73." In terms of a probability model, we would say that
the random variable X points on the midterm has been transformed to the random
variable

    Y = g(X) = X - μX points above average.                            (3.64)

The expected value of X - μX is zero, regardless of the probability model of X.

Theorem 3.11
For any random variable X,

    E[X - μX] = 0.

Proof  Defining g(X) = X - μX and applying Theorem 3.10 yields

    E[g(X)] = Σ_{x∈SX} (x - μX) PX(x) = Σ_{x∈SX} x PX(x) - μX Σ_{x∈SX} PX(x).   (3.65)

The first term on the right side is μX by definition. In the second term, Σ_{x∈SX} PX(x) = 1,
so both terms on the right side are μX and the difference is zero.

Another property of the expected value of a function of a random variable applies
to linear transformations.^1

^1 We call the transformation aX + b linear although, strictly speaking, it should be called affine.


Theorem 3.12
For any random variable X,

    E[aX + b] = a E[X] + b.

This follows directly from Definition 3.13 and Theorem 3.10. A linear transformation
is essentially a scale change of a quantity, like a transformation from inches
to centimeters or from degrees Fahrenheit to degrees Celsius. If we express the
data (random variable X) in new units, the new average is just the old average
transformed to the new units. (If the professor adds five points to everyone's grade,
the average goes up by five points.)

This is a rare example of a situation in which E[g(X)] = g(E[X]). It is tempting,
but usually wrong, to apply it to other transformations. For example, if Y = X^2,
it is usually the case that E[Y] ≠ (E[X])^2. Expressing this in general terms, it is
usually the case that E[g(X)] ≠ g(E[X]).

Example 3.30
Recall from Examples 3.5 and 3.24 that X has PMF

    PX(x) = 1/4    x = 0,
            1/2    x = 1,                                              (3.66)
            1/4    x = 2,
            0      otherwise.

What is the expected value of V = g(X) = 4X + 7?

From Theorem 3.12,

    E[V] = E[g(X)] = E[4X + 7] = 4 E[X] + 7 = 4(1) + 7 = 11.           (3.67)

We can verify this result by applying Theorem 3.10:

    E[V] = g(0) PX(0) + g(1) PX(1) + g(2) PX(2)
         = 7(1/4) + 11(1/2) + 15(1/4) = 11.                            (3.68)

Example 3.31
Continuing Example 3.30, let W = h(X) = X^2. What is E[W]?

Theorem 3.10 gives

    E[W] = Σ_x h(x) PX(x) = (1/4)0^2 + (1/2)1^2 + (1/4)2^2 = 1.5.      (3.69)

Note that this is not the same as h(E[X]) = (1)^2 = 1.
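The gap between E[X^2] and (E[X])^2 in Examples 3.30 and 3.31 can be checked directly. A Python sketch:

```python
def compare_square(px):
    """Return (E[X^2], (E[X])^2) for a PMF given as a dict of x -> P_X(x).

    The two values generally differ, illustrating E[g(X)] != g(E[X]).
    """
    ex = sum(x * p for x, p in px.items())
    ex2 = sum(x ** 2 * p for x, p in px.items())
    return ex2, ex ** 2
```

For the PMF of Example 3.30 this returns (1.5, 1.0): the second moment exceeds the square of the mean.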



Quiz 3.7
The number of memory chips M needed in a personal computer depends on how
many application programs, A, the owner wants to run simultaneously. The number
of chips M and the number of application programs A are described by

        4 chips for 1 program,
    M = 4 chips for 2 programs,       PA(a) = 0.1(5 - a)    a = 1, 2, 3, 4,    (3.70)
        6 chips for 3 programs,                0            otherwise.
        8 chips for 4 programs,

(a) What is the expected number of programs μA = E[A]?

(b) Express M, the number of memory chips, as a function M = g(A) of the
number of application programs A.

(c) Find E[M] = E[g(A)]. Does E[M] = g(E[A])?

3.8 Variance and Standard Deviation

The variance Var[X] measures the dispersion of sample values of
X around the expected value E[X]. When we view E[X] as an
estimate of X, Var[X] is the mean square error.

In Section 3.5, we describe an average as a typical value of a random variable.
It is one number that summarizes an entire probability model. After finding an
average, someone who wants to look further into the probability model might ask,
"How typical is the average?" or "What are the chances of observing an event far
from the average?" In the example of the midterm exam, after you find out your
score is 7 points above average, you are likely to ask, "How good is that? Is it near
the top of the class or somewhere near the middle?" A measure of dispersion is
an answer to these questions wrapped up in a single number. If this measure is
small, observations are likely to be near the average. A high measure of dispersion
suggests that it is not unusual to observe events that are far from the average.

The most important measures of dispersion are the standard deviation and its
close relative, the variance. The variance of random variable X describes the
difference between X and its expected value. This difference is the derived random
variable Y = X - μX. Theorem 3.11 states that μY = 0, regardless of the
probability model of X. Therefore μY provides no information about the dispersion of X
around μX. A useful measure of the likely difference between X and its expected
value is the expected absolute value of the difference, E[|Y|]. However, this
parameter is not easy to work with mathematically in many situations, and it is not used
frequently.

Instead we focus on E[Y^2] = E[(X - μX)^2], which is referred to as Var[X], the
variance of X. The square root of the variance is σX, the standard deviation of X.

Definition 3.15          Variance
The variance of random variable X is

    Var[X] = E[(X - μX)^2].

Definition 3.16          Standard Deviation
The standard deviation of random variable X is

    σX = √Var[X].

It is useful to take the square root of Var[X] because σX has the same units (for
example, exam points) as X. The units of the variance are squares of the units of
the random variable (exam points squared). Thus σX can be compared directly
with the expected value. Informally, we think of sample values within σX of the
expected value, x ∈ [μX - σX, μX + σX], as "typical" values of X and other values
as "unusual." In many applications, about 2/3 of the observations of a random
variable are within one standard deviation of the expected value. Thus if the
standard deviation of exam scores is 12 points, the student with a score of +7 with
respect to the mean can think of herself in the middle of the class. If the standard
deviation is 3 points, she is likely to be near the top.
The variance is also useful when you guess or predict the value of a random
variable X. Suppose you are asked to make a prediction x̂ before you perform an
experiment and observe a sample value of X. The prediction x̂ is also called a blind
estimate of X since your prediction is an estimate of X without the benefit of any
observation. Since you would like the prediction error X - x̂ to be small, a popular
approach is to choose x̂ to minimize the expected square error

    e = E[(X - x̂)^2].                                                  (3.71)

Another name for e is the mean square error or MSE. With knowledge of the PMF
PX(x), we can choose x̂ to minimize the MSE.

Theorem 3.13
In the absence of observations, the minimum mean square error estimate of random
variable X is

    x̂ = E[X].

Proof  We expand the square in Equation (3.71) to write

    e = E[X^2] - 2x̂ E[X] + x̂^2.                                       (3.72)

To minimize e, we solve

    de/dx̂ = -2 E[X] + 2x̂ = 0,                                         (3.73)

yielding x̂ = E[X].

When the estimate of X is x̂ = E[X], the MSE is

    e* = E[(X - E[X])^2] = Var[X].                                     (3.74)

Therefore, E[X] is a best estimate of X and Var[X] is the MSE associated with
this best estimate.
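Theorem 3.13 can be verified numerically by scanning candidate guesses x̂ and comparing their mean square errors. A Python sketch (helper names ours):

```python
def mse(px, guess):
    """Mean square error E[(X - guess)^2] for a PMF given as a dict."""
    return sum((x - guess) ** 2 * p for x, p in px.items())

def best_blind_estimate(px, candidates):
    """Return the candidate guess with the smallest MSE."""
    return min(candidates, key=lambda c: mse(px, c))
```

For the PMF of Example 3.30 (E[X] = 1), scanning a fine grid of guesses picks out x̂ = 1, and the minimum MSE equals Var[X] = 1/2.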
Because (X - μX)^2 is a function of X, Var[X] can be computed according to
Theorem 3.10:

    Var[X] = σX^2 = Σ_{x∈SX} (x - μX)^2 PX(x).                         (3.75)

By expanding the square in this formula, we arrive at the most useful approach to
computing the variance.

Theorem 3.14

    Var[X] = E[X^2] - μX^2 = E[X^2] - (E[X])^2.

Proof  Expanding the square in (3.75), we have

    Var[X] = Σ_{x∈SX} x^2 PX(x) - Σ_{x∈SX} 2μX x PX(x) + Σ_{x∈SX} μX^2 PX(x)
           = E[X^2] - 2μX Σ_{x∈SX} x PX(x) + μX^2 Σ_{x∈SX} PX(x)
           = E[X^2] - 2μX^2 + μX^2 = E[X^2] - μX^2.                    (3.76)

We note that E[X] and E[X^2] are examples of moments of the random variable X.
Var[X] is a central moment of X.

Definition 3.17          Moments
For random variable X:
(a) The nth moment is E[X^n].
(b) The nth central moment is E[(X - μX)^n].

Thus, E[X] is the first moment of random variable X. Similarly, E[X^2] is the
second moment. Theorem 3.14 says that the variance of X is the second moment
of X minus the square of the first moment.

Like the PMF and the CDF of a random variable, the set of moments of X is
a complete probability model. We learn in Section 9.2 that the model based on
moments can be expressed as a moment generating function.

Example 3.32
Continuing Examples 3.5, 3.24, and 3.30, we recall that X has PMF

    PX(x) = 1/4    x = 0,
            1/2    x = 1,                                              (3.77)
            1/4    x = 2,
            0      otherwise,

and expected value E[X] = 1. What is the variance of X?

In order of increasing simplicity, we present three ways to compute Var[X].
From Definition 3.15, define

    W = (X - μX)^2 = (X - 1)^2.                                        (3.78)

We observe that W = 0 if and only if X = 1; otherwise, if X = 0 or X = 2, then
W = 1. Thus P[W = 0] = PX(1) = 1/2 and P[W = 1] = PX(0) + PX(2) = 1/2.
The PMF of W is

    PW(w) = 1/2    w = 0, 1,                                           (3.79)
            0      otherwise.

Then

    Var[X] = E[W] = (1/2)(0) + (1/2)(1) = 1/2.                         (3.80)

Recall that Theorem 3.10 produces the same result without requiring the derivation
of PW(w):

    Var[X] = E[(X - μX)^2]
           = (0 - 1)^2 PX(0) + (1 - 1)^2 PX(1) + (2 - 1)^2 PX(2)
           = 1/2.                                                      (3.81)

To apply Theorem 3.14, we find that

    E[X^2] = (1/4)0^2 + (1/2)1^2 + (1/4)2^2 = 1.5.                     (3.82)

Thus Theorem 3.14 yields

    Var[X] = E[X^2] - μX^2 = 1.5 - 1^2 = 1/2.                          (3.83)
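Both routes to Var[X], the direct definition (Definition 3.15) and the moment formula (Theorem 3.14), are one-liners in code, and they must agree. A Python sketch:

```python
def variance_two_ways(px):
    """Compute Var[X] by Definition 3.15 and by Theorem 3.14.

    px is a dict mapping sample values x to probabilities P_X(x);
    both returned values must be equal.
    """
    mu = sum(x * p for x, p in px.items())
    by_definition = sum((x - mu) ** 2 * p for x, p in px.items())
    by_moments = sum(x ** 2 * p for x, p in px.items()) - mu ** 2
    return by_definition, by_moments
```

For the PMF of Example 3.32, both computations return 1/2.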



Note that (X - μX)^2 ≥ 0. Therefore, its expected value is also nonnegative.
That is, for any random variable X,

    Var[X] ≥ 0.                                                        (3.84)

The following theorem is related to Theorem 3.12.

Theorem 3.15

    Var[aX + b] = a^2 Var[X].

Proof  We let Y = aX + b and apply Theorem 3.14. We first expand the second moment
to obtain

    E[Y^2] = E[a^2 X^2 + 2abX + b^2] = a^2 E[X^2] + 2ab μX + b^2.      (3.85)

Expanding the right side of Theorem 3.12 yields

    μY^2 = a^2 μX^2 + 2ab μX + b^2.                                    (3.86)

Because Var[Y] = E[Y^2] - μY^2, Equations (3.85) and (3.86) imply that

    Var[Y] = a^2 E[X^2] - a^2 μX^2 = a^2 (E[X^2] - μX^2) = a^2 Var[X].  (3.87)

If we let a = 0 in this theorem, we have Var[b] = 0 because there is no dispersion
around the expected value of a constant. If we let a = 1, we have Var[X + b] =
Var[X] because shifting a random variable by a constant does not change the
dispersion of outcomes around the expected value.
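Theorem 3.15 is easy to spot-check by transforming the sample values directly. A Python sketch (assumes a ≠ 0 so that the transformed sample values remain distinct):

```python
def var_of_linear(px, a, b):
    """Return (Var[aX + b], a^2 * Var[X]); Theorem 3.15 says they match.

    px is a dict mapping sample values x to probabilities P_X(x); a != 0.
    """
    def var(pmf):
        mu = sum(x * p for x, p in pmf.items())
        return sum((x - mu) ** 2 * p for x, p in pmf.items())

    # The PMF of Y = aX + b relabels each sample value x as a*x + b.
    py = {a * x + b: p for x, p in px.items()}
    return var(py), a ** 2 * var(px)
```

With the PMF of Example 3.30 and V = 4X + 7, both values are 16 · (1/2) = 8.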
Example 3.33
A printer automatically prints an initial cover page that precedes the regular printing of
an X page document. Using this printer, the number of printed pages is Y = X + 1.
Express the expected value and variance of Y as functions of E[X] and Var[X].

The expected number of printed pages is E[Y] = E[X] + 1. The variance of the
number of pages sent is Var[Y] = Var[X].

If we let b = 0 in Theorem 3.15, we have Var[aX] = a^2 Var[X] and σ_aX = a σX.
Multiplying a random variable by a constant is equivalent to a scale change in the
units of measurement of the random variable.
Example 3.34
In Example 3.28, the amplitude V in volts has PMF

    PV(v) = 1/7    v = -3, -2, ..., 3,                                 (3.88)
            0      otherwise.

A new voltmeter records the amplitude U in millivolts. Find the variance and standard
deviation of U.

Note that U = 1000V. To use Theorem 3.15, we first find the variance of V. The
expected value of the amplitude is

    μV = (1/7)[-3 + (-2) + (-1) + 0 + 1 + 2 + 3] = 0 volts.            (3.89)

The second moment is

    E[V^2] = (1/7)[(-3)^2 + (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 + 3^2] = 4 volts^2.   (3.90)

Therefore the variance is Var[V] = E[V^2] - μV^2 = 4 volts^2. By Theorem 3.15,

    Var[U] = 1000^2 Var[V] = 4,000,000 millivolts^2,                   (3.91)

and thus σU = 2000 millivolts.

The following theorem states the variances of the families of random variables
defined in Section 3.3.

Theorem 3.16
(a) If X is Bernoulli (p), then              (b) If X is geometric (p), then
    Var[X] = p(1 - p).                           Var[X] = (1 - p)/p^2.

(c) If X is binomial (n, p), then            (d) If X is Pascal (k, p), then
    Var[X] = np(1 - p).                          Var[X] = k(1 - p)/p^2.

(e) If X is Poisson (α), then                (f) If X is discrete uniform (k, l), then
    Var[X] = α.                                  Var[X] = (l - k)(l - k + 2)/12.
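Entries of Theorem 3.16 can be sanity-checked with truncated sums. For example, for a geometric (p) random variable with PMF PX(x) = (1 - p)^(x-1) p, x = 1, 2, ... (a Python sketch; truncating the sum at a large n_max leaves only negligible tail error):

```python
def geometric_var_numeric(p, n_max=10000):
    """Approximate Var[X] for a geometric (p) random variable.

    Sums x * P_X(x) and x^2 * P_X(x) over x = 1..n_max, then applies
    Theorem 3.14: Var[X] = E[X^2] - (E[X])^2.
    """
    ex = sum(x * (1 - p) ** (x - 1) * p for x in range(1, n_max + 1))
    ex2 = sum(x ** 2 * (1 - p) ** (x - 1) * p for x in range(1, n_max + 1))
    return ex2 - ex ** 2
```

For p = 1/4 the numerical value approaches (1 - p)/p^2 = 12, matching Theorem 3.16(b).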

Quiz 3.8
In an experiment with three customers entering the Phonesmart store, the observation
is N, the number of phones purchased. The PMF of N is

    PN(n) = (4 - n)/10    n = 0, 1, 2, 3,
            0             otherwise.

Find:

(a) The expected value E[N]                  (b) The second moment E[N^2]

(c) The variance Var[N]                      (d) The standard deviation σN

3.9 MATLAB

MATLAB programs calculate values of functions including PMFs
and CDFs. Other MATLAB functions simulate experiments by
generating random sample values of random variables.

This section presents two types of MATLAB programs based on random variables
with arbitrary probability models and random variables in the families presented
in Section 3.3. We start by calculating probabilities for any finite random variable
with arbitrary PMF PX(x). We then compute PMFs and CDFs for the families of
random variables introduced in Section 3.3. Based on the calculation of the CDF, we
then develop a method for generating random sample values. Generating a random
sample simulates performing an experiment that conforms to the probability model
of a specific random variable. In subsequent chapters, we will see that MATLAB
functions that generate random samples are building blocks for the simulation of
more-complex systems. The MATLAB functions described in this section can be
downloaded from the companion website.

PMFs and CDFs

For the most part, the PMF and CDF functions are straightforward. We start
with a simple finite discrete random variable X defined by the set of sample
values SX = {s1, ..., sn} and corresponding probabilities pi = PX(si) = P[X = si].
In MATLAB, we represent SX, the sample space of X, by the column vector
s = [s1 · · · sn]' and the corresponding probabilities by the vector p = [p1 · · · pn]'.^2
The function y=finitepmf(sx,px,x) generates the probabilities of the elements of
the m-dimensional vector x = [x1 · · · xm]'. The output is y = [y1 · · · ym]',
where yi = PX(xi). That is, for each requested xi, finitepmf returns the value
PX(xi). If xi is not in the sample space of X, yi = 0.

^2 Although column vectors are supposed to appear as columns, we generally write a column vector
x in the form of a transposed row vector [x1 · · · xm]' to save space.


Example 3.35
In Example 3.27, the random variable X, the weight of a package, has PMF

    PX(x) = 0.15    x = 1, 2, 3, 4,
            0.1     x = 5, 6, 7, 8,                                    (3.93)
            0       otherwise.

Write a MATLAB function that calculates PX(x). Calculate the probability of an xi
pound package for x1 = 2, x2 = 2.5, and x3 = 6.

The MATLAB function shipweightpmf(x) implements PX(x). We can then use
shipweightpmf to calculate the desired probabilities:

    function y=shipweightpmf(x)
    s=(1:8)';
    p=[0.15*ones(4,1); 0.1*ones(4,1)];
    y=finitepmf(s,p,x);

    >> shipweightpmf([2 2.5 6])'
    ans =
        0.1500         0    0.1000

We also can use MATLAB to calculate a PMF in a family of random variables by
specifying the parameters of the PMF to be calculated. Although a PMF PX(x) is a
scalar function of one variable, the nature of MATLAB makes it desirable to perform
MATLAB PMF calculations with vector inputs and vector outputs. If y=xpmf(x)
calculates PX(x), then for a vector input x, we produce a vector output y such
that y(i)=xpmf(x(i)). That is, for vector input x, the output vector y is defined
by yi = PX(xi).

Example 3.36
Write a MATLAB function geometricpmf(p,x) to calculate, for the sample values in
vector x, PX(x) for a geometric (p) random variable.

    function pmf=geometricpmf(p,x)
    %geometric(p) rv X
    %out: pmf(i)=Prob[X=x(i)]
    x=x(:);
    pmf= p*((1-p).^(x-1));
    pmf= (x>0).*(x==floor(x)).*pmf;

In geometricpmf.m, the last line ensures that values xi ∉ SX are assigned zero
probability. Because x=x(:) reshapes x to be a column vector, the output pmf is
always a column vector.

Example 3.37
Write a MATLAB function that calculates the Poisson (α) PMF.

For an integer x, we could calculate PX(x) by the direct calculation

    px= ((alpha^x)*exp(-alpha))/factorial(x)

This will yield the right answer as long as the argument x for the factorial function is
not too large. In MATLAB version 6, factorial(171) causes an overflow. In addition,
for α > 1, calculating the ratio α^x/x! for large x can cause numerical problems because
both α^x and x! will be very large numbers, possibly with a small quotient. Another
shortcoming of the direct calculation is apparent if you want to calculate PX(x) for
the set of possible values x = [0, 1, ..., n]. Calculating factorials is a lot of work for
a computer and the direct approach fails to exploit the fact that if we have already
calculated (x - 1)!, we can easily compute x! = x(x - 1)!. A more efficient calculation
makes use of the observation

    PX(x) = α^x e^{-α}/x! = (α/x) PX(x - 1).                           (3.94)

The poissonpmf.m function uses Equation (3.94) to calculate PX(x). Even this code
is not perfect because MATLAB has limited range.

    function pmf=poissonpmf(alpha,x)
    %output: pmf(i)=P[X=x(i)]
    x=x(:); k=(1:max(x))';
    ip=[1;((alpha*ones(size(k)))./k)];
    pb=exp(-alpha)*cumprod(ip);
    %pb= [P(X=0) ... P(X=n)]
    pmf=pb(x+1); %pb(1)=P[X=0]
    pmf=(x>=0).*(x==floor(x)).*pmf;
    %pmf(i)=0 for zero-prob x(i)

In MATLAB, exp(-alpha) returns zero for alpha > 745.13. For these large values
of alpha, poissonpmf(alpha,x) returns zero for all x. Problem 3.9.9 outlines a
solution that is used in the version of poissonpmf.m on the companion website.
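The recursion of Equation (3.94) works just as well outside MATLAB. A Python sketch (the function name is ours):

```python
import math

def poisson_pmf(alpha, n):
    """Return [P_X(0), ..., P_X(n)] for a Poisson (alpha) random variable.

    Uses the recursion P_X(x) = (alpha / x) * P_X(x - 1), starting from
    P_X(0) = exp(-alpha), so no factorials are ever computed.
    """
    pmf = [math.exp(-alpha)]
    for x in range(1, n + 1):
        pmf.append(pmf[-1] * alpha / x)
    return pmf
```

Like the MATLAB version, this underflows to zero for very large alpha, since exp(-alpha) is below the smallest positive double.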

For the Poisson CDF, there is no simple way to avoid summing the PMF. The
following example shows an implementation of the Poisson CDF. The code for a
CDF tends to be more complicated than that for a PMF because if x is not an
integer, FX(x) may still be nonzero. Other CDFs are easily developed following the
same approach.

Example 3.38
Write a MATLAB function that calculates the CDF of a Poisson random variable.

Here we present the MATLAB code for the Poisson CDF. Since the sample values of a
Poisson random variable X are integers, we observe that FX(x) = FX(⌊x⌋), where ⌊x⌋,
equivalent to the MATLAB function floor(x), denotes the largest integer less than or
equal to x.

    function cdf=poissoncdf(alpha,x)
    %output cdf(i)=Prob[X<=x(i)]
    x=floor(x(:));
    sx=0:max(x);
    cdf=cumsum(poissonpmf(alpha,sx));
    %cdf from 0 to max(x)
    okx=(x>=0);%x(i)<0 -> cdf=0
    x=(okx.*x);%set negative x(i)=0
    cdf= okx.*cdf(x+1);
    %cdf=0 for x(i)<0

Example 3.39
In Example 3.17 a website has on average λ = 2 hits per second. What is the probability
of no more than 130 hits in one minute? What is the probability of more than 110 hits
in one minute?

Let M equal the number of hits in one minute (60 seconds). Note that M is a Poisson
(α) random variable with α = 2 × 60 = 120 hits. The PMF of M is

    PM(m) = (120)^m e^{-120}/m!    m = 0, 1, 2, ...                    (3.95)
            0                      otherwise.

The MATLAB solution shown below executes the following math calculations:

    P[M ≤ 130] = Σ_{m=0}^{130} PM(m),                                  (3.96)

    P[M > 110] = 1 - P[M ≤ 110] = 1 - Σ_{m=0}^{110} PM(m).             (3.97)

    >> poissoncdf(120,130)
    ans =
    >> 1-poissoncdf(120,110)
    ans =
        0.8061
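The same recursion gives the CDF by a running sum, which lets us repeat the computation of this example without MATLAB. A Python sketch (an illustration only; for α = 120 it should agree with the MATLAB output above):

```python
import math

def poisson_cdf(alpha, x):
    """P[X <= x] for a Poisson (alpha) random variable.

    Accumulates the PMF terms with the recursion of Equation (3.94);
    non-integer x is handled by summing up to floor(x).
    """
    term = math.exp(-alpha)  # P_X(0)
    total = term
    for k in range(1, int(math.floor(x)) + 1):
        term *= alpha / k    # P_X(k) = (alpha/k) P_X(k-1)
        total += term
    return total
```

With alpha = 120, 1 - poisson_cdf(120, 110) reproduces the value 0.8061 computed above.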

Generating Random Samples

The programs described thus far in this section perform the familiar task of
calculating a function of a single variable. Here, the functions are PMFs and CDFs. As
described in Section 2.5, MATLAB can also be used to simulate experiments. In this
section we present MATLAB programs that generate data conforming to families of
discrete random variables. When many samples are generated by these programs,
the relative frequency of data in an event in the sample space converges to the probability
of the event. As in Chapter 2, we use rand() as a source of randomness.
Let R = rand(1). Recall that rand(1) simulates an experiment that is equally
likely to produce any real number in the interval [0, 1]. We will learn in Chapter 4
that to express this idea in mathematics, we say that for any interval [a, b] ⊂ [0, 1],

    P[a < R ≤ b] = b - a.                                              (3.98)

For example, P[0.4 < R ≤ 0.53] = 0.13. Now suppose we wish to generate samples
of discrete random variable K with SK = {0, 1, ...}. Since 0 ≤ FK(k - 1) ≤
FK(k) ≤ 1, for all k, we observe that

    P[FK(k - 1) < R ≤ FK(k)] = FK(k) - FK(k - 1) = PK(k).              (3.99)

This fact leads to the following approach (as shown in pseudocode) to using
rand() to produce a sample of random variable K:

Random Sample of random variable K

    Generate R = rand(1)
    Find k* ∈ SK such that FK(k* - 1) < R ≤ FK(k*)
    Set K = k*
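The pseudocode maps directly onto any language with a uniform random number generator: accumulate the CDF and return the first k with R ≤ FK(k). A Python sketch (the helper name is ours, not one of the text's MATLAB functions):

```python
import random

def sample_from_pmf(pmf, rng=random):
    """Draw one sample of K given pmf = [P_K(0), P_K(1), ...].

    Generates R uniform on [0, 1], then returns the first k whose
    cumulative probability F_K(k) meets or exceeds R.
    """
    r = rng.random()
    cdf = 0.0
    for k, p in enumerate(pmf):
        cdf += p
        if r <= cdf:
            return k
    return len(pmf) - 1  # guard against floating-point round-off
```

Over many draws, the relative frequency of each k converges to PK(k), which is exactly the property Equation (3.99) guarantees.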

MATLAB Functions

    PMF                      CDF                      Random Samples
    finitepmf(sx,p,x)        finitecdf(sx,p,x)        finiterv(sx,p,m)
    bernoullipmf(p,x)        bernoullicdf(p,x)        bernoullirv(p,m)
    binomialpmf(n,p,x)       binomialcdf(n,p,x)       binomialrv(n,p,m)
    geometricpmf(p,x)        geometriccdf(p,x)        geometricrv(p,m)
    pascalpmf(k,p,x)         pascalcdf(k,p,x)         pascalrv(k,p,m)
    poissonpmf(alpha,x)      poissoncdf(alpha,x)      poissonrv(alpha,m)
    duniformpmf(k,l,x)       duniformcdf(k,l,x)       duniformrv(k,l,m)

Table 3.1  MATLAB functions for discrete random variables.

A MATLAB function that uses rand() in this way simulates an experiment that
produces samples of random variable K. Generally, this implies that before we can
produce a sample of random variable K, we need to generate the CDF of K. We
can reuse the work of this computation by defining our MATLAB functions such as
geometricrv(p,m) to generate m sample values each time. We now present the
details associated with generating binomial random variables.

Example 3.40
Write a function that generates m samples of a binomial (n, p) random variable.

    function x=binomialrv(n,p,m)
    % m binomial(n,p) samples
    r=rand(m,1);
    cdf=binomialcdf(n,p,0:n);
    x=count(cdf,r);

For vectors x and y, c=count(x,y) returns a vector c such that c(i) is the number
of elements of x that are less than or equal to y(i). In terms of our earlier pseudocode,
k* = count(cdf,r). If count(cdf,r) = 0, then r ≤ PX(0) and k* = 0.

Generating binomial random variables is easy because the range is simply {0, ..., n}
and the minimum value is zero. The MATLAB code for geometricrv, poissonrv,
and pascalrv is slightly more complicated because we need to generate enough
terms of the CDF to ensure that we find k*.
Table 3.1 contains a collection of functions for an arbitrary probability model and
the six families of random variables introduced in Section 3.3. As in Example 3.35,
the functions in the first row can be used for any discrete random variable X with
a finite sample space. The argument s is the vector of sample values si of X, and p
is the corresponding vector of probabilities P[si] of those sample values. For PMF
and CDF calculations, x is the vector of numbers for which the calculation is to
be performed. In the function finiterv, m is the number of random samples
returned by the function. Each of the final six rows of the table contains for one
family the pmf function for calculating values of the PMF, the cdf function for
calculating values of the CDF, and the rv function for generating random samples.
In each function description, x denotes a column vector x = [x1 · · · xm]'. The
pmf function output is a vector y such that yi = PX(xi). The cdf function output
is a vector y such that yi = FX(xi). The rv function output is a vector X =


[Figure 3.2: three bar plots over y = 0, 0.5, 2, 4.5 showing, from left to right, the PMF PY(y), Sample Run 1, and Sample Run 2.]

Figure 3.2  The PMF of Y and the relative frequencies found in two sample runs of
voltpower(100). Note that in each run, the relative frequencies are close to (but not
exactly equal to) the corresponding PMF.

[X1 · · · Xm]' such that each Xi is a sample value of the random variable X. If
m = 1, then the output is a single sample value of random variable X.

We present an additional example, partly because it demonstrates some useful
MATLAB functions, and also because it shows how to generate the relative frequencies
of random samples.

Example 3.41
Simulate n = 100 trials of the experiment producing the power measurement Y in
Example 3.28. Compare the relative frequency of each y ∈ SY to PY(y).

    function voltpower(n)
    v=duniformrv(-3,3,n);
    y=(v.^2)/2;
    yrange=0:max(y);
    yfreq=(hist(y,yrange)/n)';
    pmfplot(yrange,yfreq);

In voltpower.m, we calculate Y = V^2/2 for each of n samples of the voltage V.
As in Example 2.26, the function hist(y,yrange) produces a vector with jth
element equal to the number of occurrences of yrange(j) in the vector y. The
function pmfplot.m is a utility for producing PMF bar plots in the style of this
text. Figure 3.2 shows PY(y) along with the results of two runs of voltpower(100).

Derived Random Variables

MATLAB can also calculate PMFs and CDFs of derived random variables. For
this section, we assume X is a finite random variable with sample space SX =
{x1, ..., xn} such that PX(xi) = pi. We represent the properties of X by the
vectors sX = [x1 ··· xn]' and pX = [p1 ··· pn]'. In MATLAB notation, sx
and px represent the vectors sX and pX.
For derived random variables, we exploit a feature of finitepmf(sx,px,x) that
allows the elements of sx to be repeated. Essentially, we use (sx,px), or equiv-
alently (sX, pX), to represent a random variable X described by the following
experimental procedure:


Finite sample space

Roll an n-sided die such that side i has probability pi.
If side j appears, set X = xj.

A consequence of this approach is that if x2 = 3 and x5 = 3, then the probability
of observing X = 3 is PX(3) = p2 + p5.

Example 3.42

>> sx=[1 3 5 7 3];
>> px=[0.1 0.2 0.2 0.3 0.2];
>> pmfx=finitepmf(sx,px,1:7);
>> pmfx'
ans =
    0.10         0    0.40         0    0.20         0    0.30

finitepmf() accounts for multiple occurrences of a sample value. In this example,
pmfx(3)=px(2)+px(5)=0.4.
It may seem unnecessary and perhaps even bizarre to allow these repeated values.
However, we see in the next example that it is quite convenient for derived random
variables Y = g(X) with the property that g(xi) is the same for multiple xi.

Example 3.43

Recall that in Example 3.27 the weight in pounds X of a package and the cost Y = g(X)
of shipping a package were described by

        PX(x) = { 0.15   x = 1, 2, 3, 4,
                { 0.1    x = 5, 6, 7, 8,
                { 0      otherwise,

        Y = g(X) = { 105X - 5X^2   1 ≤ X ≤ 5,
                   { 500           6 ≤ X ≤ 10.

Write a function y=shipcostrv(m) that outputs m samples of the shipping cost Y.

function y=shipcostrv(m)
sx=(1:8)';
px=[0.15*ones(4,1); ...
    0.1*ones(4,1)];
gx=(sx<=5).* ...
   (105*sx-5*(sx.^2)) ...
   + ((sx>5).*500);
y=finiterv(gx,px,m);

The vector gx is the mapping g(x) for each x ∈ SX. In gx, the element 500 appears three times, corresponding to x = 6, x = 7, and x = 8. The function y=finiterv(gx,px,m) produces m samples of the shipping cost Y.

>> shipcostrv(9)'
   270   190   500   270   500   190   190   100   500
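As a cross-check on such a sample run, the repeated-value feature of finitepmf gives the exact PMF of Y directly from (gx, px). The session below is a sketch that assumes the sx, px, and gx definitions of shipcostrv.m and the text's finitepmf.m utility:

```matlab
% Exact PMF of the shipping cost Y = g(X), reusing (gx,px)
sx=(1:8)'; px=[0.15*ones(4,1); 0.1*ones(4,1)];
gx=(sx<=5).*(105*sx-5*(sx.^2))+((sx>5).*500);
sy=unique(gx);              % S_Y = [100 190 270 340 400 500]'
pmfy=finitepmf(gx,px,sy);   % repeated 500 collects p_6+p_7+p_8 = 0.3
[sy pmfy]                   % display support and PMF side by side
```

Comparing the relative frequencies of many shipcostrv samples to pmfy mirrors the comparison made in Figure 3.2.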

Quiz 3.9

In Section 3.5, it was argued that the average

        mn = (1/n) Σ(i=1..n) x(i)                                (3.100)

of samples x(1), x(2), ..., x(n) of a random variable X will converge to E[X] as n
becomes large. For a discrete uniform (0, 10) random variable X, use MATLAB to
examine this convergence.

(a) For 100 sample values of X, plot the sequence m1, m2, ..., m100. Repeat this
experiment five times, plotting all five mn curves on common axes.
(b) Repeat part (a) for 1000 sample values of X.

Difficulty:   Easy   Moderate   Difficult   Experts Only

3.2.1  The random variable N has PMF

        PN(n) = { c(1/2)^n   n = 0, 1, 2,
                { 0          otherwise.

(a) What is the value of the constant c?
(b) What is P[N ≤ 1]?

3.2.2  The random variable V has PMF

        PV(v) = { cv^2   v = 1, 2, 3, 4,
                { 0      otherwise.

(a) Find the value of the constant c.
(b) Find P[V ∈ {u^2 | u = 1, 2, 3, ...}].
(c) Find the probability that V is even.
(d) Find P[V > 2].

3.2.3  The random variable X has PMF

        PX(x) = { c/x   x = 2, 4, 8,
                { 0     otherwise.

(a) What is the value of the constant c?
(b) What is P[X = 4]?
(c) What is P[X < 4]?
(d) What is P[3 ≤ X ≤ 9]?

3.2.4  In each at-bat in a baseball game, mighty Casey swings at every pitch. The result is either a home run (with probability q = 0.05) or a strike. Of course, three strikes and Casey is out.
(a) What is the probability P[H] that Casey hits a home run?
(b) For one at-bat, what is the PMF of N, the number of times Casey swings his bat?

3.2.5  A tablet computer transmits a file over a wi-fi link to an access point. Depending on the size of the file, it is transmitted as N packets where N has PMF

        PN(n) = { c/n   n = 1, 2, 3,
                { 0     otherwise.

(a) Find the constant c.
(b) What is the probability that N is odd?
(c) Each packet is received correctly with probability p, and the file is received correctly if all N packets are received correctly. Find P[C], the probability that the file is received correctly.

3.2.6  In college basketball, when a player is fouled while not in the act of shooting and the opposing team is "in the penalty," the player is awarded a "1 and 1." In the 1 and 1, the player is awarded one free throw, and if that free throw goes in the player is awarded a second free throw. Find the PMF of Y, the number of points scored in


a 1 and 1 given that any free throw goes in with probability p, independent of any other free throw.

3.2.7  You roll a 6-sided die repeatedly. Starting with roll i = 1, let Ri denote the result of roll i. If Ri > i, then you will roll again; otherwise you stop. Let N denote the number of rolls.
(a) What is P[N > 3]?
(b) Find the PMF of N.

3.2.8  You are manager of a ticket agency that sells concert tickets. You assume that people will call three times in an attempt to buy tickets and then give up. You want to make sure that you are able to serve at least 95% of the people who want tickets. Let p be the probability that a caller gets through to your ticket agency. What is the minimum value of p necessary to meet your goal?

3.2.9  In the ticket agency of Problem 3.2.8, each telephone ticket agent is available to receive a call with probability 0.2. If all agents are busy when someone calls, the caller hears a busy signal. What is the minimum number of agents that you have to hire to meet your goal of serving 95% of the customers who want tickets?

3.2.10  Suppose when a baseball player gets a hit, a single is twice as likely as a double, which is twice as likely as a triple, which is twice as likely as a home run. Also, the player's batting average, i.e., the probability the player gets a hit, is 0.300. Let B denote the number of bases touched safely during an at-bat. For example, B = 0 when the player makes an out, B = 1 on a single, and so on. What is the PMF of B?

3.2.11  When someone presses SEND on a cellular phone, the phone attempts to set up a call by transmitting a SETUP message to a nearby base station. The phone waits for a response, and if none arrives within 0.5 seconds it tries again. If it doesn't get a response after n = 6 tries, the phone stops transmitting messages and generates a busy signal.
(a) Draw a tree diagram that describes the call setup procedure.
(b) If all transmissions are independent and the probability is p that a SETUP message will get through, what is the PMF of K, the number of messages transmitted in a call attempt?
(c) What is the probability that the phone will generate a busy signal?
(d) As manager of a cellular phone system, you want the probability of a busy signal to be less than 0.02. If p = 0.9, what is the minimum value of n necessary to achieve your goal?

3.3.1  In a package of M&Ms, Y, the number of yellow M&Ms, is uniformly distributed between 5 and 15.
(a) What is the PMF of Y?
(b) What is P[Y < 10]?
(c) What is P[Y > 12]?
(d) What is P[8 ≤ Y ≤ 12]?

3.3.2  In a bag of 25 M&Ms, each piece is equally likely to be red, green, orange, blue, or brown, independent of the color of any other piece. Find the PMF of R, the number of red pieces. What is the probability a bag has no red M&Ms?

3.3.3  When a conventional paging system transmits a message, the probability that the message will be received by the pager it is sent to is p. To be confident that a message is received at least once, a system transmits the message n times.
(a) Assuming all transmissions are independent, what is the PMF of K, the number of times the pager receives the same message?
(b) Assume p = 0.8. What is the minimum value of n that produces a probability of 0.95 of receiving the message at least once?

3.3.4  You roll a pair of fair dice until you roll "doubles" (i.e., both dice are the same). What is the expected number, E[N], of rolls?


3.3.5  When you go fishing, you attach m hooks to your line. Each time you cast your line, each hook will be swallowed by a fish with probability h, independent of whether any other hook is swallowed. What is the PMF of K, the number of fish that are hooked on a single cast of the line?

3.3.6  Any time a child throws a Frisbee, the child's dog catches the Frisbee with probability p, independent of whether the Frisbee is caught on any previous throw. When the dog catches the Frisbee, it runs away with the Frisbee, never to be seen again. The child continues to throw the Frisbee until the dog catches it. Let X denote the number of times the Frisbee is thrown.
(a) What is the PMF PX(x)?
(b) If p = 0.2, what is the probability that the child will throw the Frisbee more than four times?

3.3.7  When a two-way paging system transmits a message, the probability that the message will be received by the pager it is sent to is p. When the pager receives the message, it transmits an acknowledgment signal (ACK) to the paging system. If the paging system does not receive the ACK, it sends the message again.
(a) What is the PMF of N, the number of times the system sends the same message?
(b) The paging company wants to limit the number of times it has to send the same message. It has a goal of P[N ≤ 3] ≥ 0.95. What is the minimum value of p necessary to achieve the goal?

3.3.8  The number of bytes B in an HTML file is the geometric (2.5 × 10^-5) random variable. What is the probability P[B > 500,000] that a file has over 500,000 bytes?

3.3.9
(a) Starting on day 1, you buy one lottery ticket each day. Each ticket is a winner with probability 0.1. Find the PMF of K, the number of tickets you buy up to and including your fifth winning ticket.
(b) L is the number of flips of a fair coin up to and including the 33rd occurrence of tails. What is the PMF of L?
(c) Starting on day 1, you buy one lottery ticket each day. Each ticket is a winner with probability 0.01. Let M equal the number of tickets you buy up to and including your first winning ticket. What is the PMF of M?

3.3.10  The number of buses that arrive at a bus stop in T minutes is a Poisson random variable B with expected value T/5.
(a) What is the PMF of B, the number of buses that arrive in T minutes?
(b) What is the probability that in a two-minute interval, three buses will arrive?
(c) What is the probability of no buses arriving in a 10-minute interval?
(d) How much time should you allow so that with probability 0.99 at least one bus arrives?

3.3.11  In a wireless automatic meter-reading system, a base station sends out a wake-up signal to nearby electric meters. On hearing the wake-up signal, a meter transmits a message indicating the electric usage. Each message is repeated eight times.
(a) If a single transmission of a message is successful with probability p, what is the PMF of N, the number of successful message transmissions?
(b) I is an indicator random variable such that I = 1 if at least one message is transmitted successfully; otherwise I = 0. Find the PMF of I.

3.3.12  A Zipf (n, α = 1) random variable X has PMF

        PX(x) = { c(n)/x   x = 1, 2, ..., n,
                { 0        otherwise.

The constant c(n) is set so that Σ(x=1..n) PX(x) = 1. Calculate c(n) for n = 1, 2, ..., 6.

3.3.13  In a bag of 64 "holiday season" M&Ms, each M&M is equally likely to be red or green, independent of any other M&M in the bag.
(a) If you randomly grab four M&Ms, what is the probability P[E] that you grab an equal number of red and green M&Ms?
(b) What is the PMF of G, the number of green M&Ms in the bag?
(c) You begin eating randomly chosen M&Ms one by one. Let R equal the number of red M&Ms you eat before you eat your first green M&M. What is the PMF of R?

3.3.14  A radio station gives a pair of concert tickets to the sixth caller who knows the birthday of the performer. For each person who calls, the probability is 0.75 of knowing the performer's birthday. All calls are independent.
(a) What is the PMF of L, the number of calls necessary to find the winner?
(b) What is the probability of finding the winner on the tenth call?
(c) What is the probability that the station will need nine or more calls to find a winner?

3.3.15  In a packet voice communications system, a source transmits packets containing digitized speech to a receiver. Because transmission errors occasionally occur, an acknowledgment (ACK) or a negative acknowledgment (NAK) is transmitted back to the source to indicate the status of each received packet. When the transmitter gets a NAK, the packet is retransmitted. Voice packets are delay sensitive, and a packet can be transmitted a maximum of d times. If a packet transmission is an independent Bernoulli trial with success probability p, what is the PMF of T, the number of times a packet is transmitted?

3.3.16  At Newark airport, your jet joins a line as the tenth jet waiting for takeoff. At Newark, takeoffs and landings are synchronized to the minute. In each one-minute interval, an arriving jet lands with probability p = 2/3, independent of an arriving jet in any other minute. Such an arriving jet blocks any waiting jet from taking off in that one-minute interval. However, if there is no arrival, then the waiting jet at the head of the line takes off. Each take-off requires exactly one minute.
(a) Let L1 denote the number of jets that land before the jet at the front of the line takes off. Find the PMF PL1(l).
(b) Let W denote the number of minutes you wait until your jet takes off. Find P[W = 10]. (Note that if no jets land for ten minutes, then one waiting jet will take off each minute and W = 10.)
(c) What is the PMF of W?

3.3.17  Suppose each day (starting on day 1) you buy one lottery ticket with probability 1/2; otherwise, you buy no tickets. A ticket is a winner with probability p independent of the outcome of all other tickets. Let Ni be the event that on day i you do not buy a ticket. Let Wi be the event that on day i, you buy a winning ticket. Let Li be the event that on day i you buy a losing ticket.
(a) What are P[W33], P[L81], and P[N99]?
(b) Let K be the number of the day on which you buy your first lottery ticket. Find the PMF PK(k).
(c) Find the PMF of R, the number of losing lottery tickets you have purchased in m days.
(d) Let D be the number of the day on which you buy your jth losing ticket. What is PD(d)? Hint: If you buy your jth losing ticket on day d, how many losers did you have after d - 1 days?

3.3.18  The Sixers and the Celtics play a best out of five playoff series. The series ends as soon as one of the teams has won three games. Assume that either team is equally likely to win any game independently of any other game played. Find
(a) The PMF PN(n) for the total number N of games played in the series;


(b) The PMF PW(w) for the number W of Celtics wins in the series;
(c) The PMF PL(l) for the number L of Celtics losses in the series.

3.3.19  For a binomial random variable K representing the number of successes in n trials, Σ(k=0..n) PK(k) = 1. Use this fact to prove the binomial theorem for any a > 0 and b > 0. That is, show that

        (a + b)^n = Σ(k=0..n) (n choose k) a^k b^(n-k).

3.4.1  Discrete random variable Y has the CDF FY(y) as shown:

[Figure: staircase plot of FY(y) versus y for 0 ≤ y ≤ 5, with levels 0, 0.25, 0.5, 0.75, and 1.]

Use the CDF to find the following probabilities:
(a) P[Y < 1] and P[Y ≤ 1]
(b) P[Y > 2] and P[Y ≥ 2]
(c) P[Y = 3] and P[Y > 3]
(d) PY(y)

3.4.2  The random variable X has CDF

        FX(x) = { 0     x < -1,
                { 0.2   -1 ≤ x < 0,
                { 0.7   0 ≤ x < 1,
                { 1     x ≥ 1.

(a) Draw a graph of the CDF.
(b) Write PX(x), the PMF of X. Be sure to write the value of PX(x) for all x from -∞ to ∞.

3.4.3  The random variable X has CDF

        FX(x) = { 0     x < -3,
                { 0.4   -3 ≤ x < 5,
                { 0.8   5 ≤ x < 7,
                { 1     x ≥ 7.

(a) Draw a graph of the CDF.
(b) Write PX(x), the PMF of X.

3.4.4  Following Example 3.22, show that a geometric (p) random variable K has CDF

        FK(k) = { 0                 k < 1,
                { 1 - (1-p)^⌊k⌋    k ≥ 1.

3.4.5  At the One Top Pizza Shop, a pizza sold has mushrooms with probability p = 2/3. On a day in which 100 pizzas are sold, let N equal the number of pizzas sold before the first pizza with mushrooms is sold. What is the PMF of N? What is the CDF of N?

3.4.6  In Problem 3.2.10, find and sketch the CDF of B, the number of bases touched safely during an at-bat.

3.4.7  In Problem 3.2.6, find and sketch the CDF of Y, the number of points scored in a 1 and 1 for p = 1/4, p = 1/2, and p = 3/4.

3.4.8  In Problem 3.2.11, find and sketch the CDF of N, the number of attempts made by the cellular phone for p = 1/2.

3.5.1  Let X have the uniform PMF

        PX(x) = { 0.01   x = 1, 2, ..., 100,
                { 0      otherwise.

(a) Find a mode xmod of X. If the mode is not unique, find the set Xmod of all modes of X.
(b) Find a median xmed of X. If the median is not unique, find the set Xmed of all numbers x that are medians of X.

3.5.2  It costs 20 cents to receive a photo and 30 cents to send a photo from a cellphone. C is the cost of one photo (either sent or received). The probability of receiving a photo is 0.6. The probability of sending a photo is 0.4.
(a) Find PC(c), the PMF of C.

(b) What is E[C], the expected value of C?

3.5.3
(a) The number of trains J that arrive at the station in time t minutes is a Poisson random variable with E[J] = t. Find t such that P[J > 0] = 0.9.
(b) The number of buses K that arrive at the station in one hour is a Poisson random variable with E[K] = 10. Find P[K = 10].
(c) In a 1 ms interval, the number of hits L on a Web server is a Poisson random variable with expected value E[L] = 2 hits. What is P[L < 1]?

3.5.4  You simultaneously flip a pair of fair coins. Your friend gives you one dollar if both coins come up heads. You repeat this ten times and your friend gives you X dollars. Find E[X], the expected number of dollars you receive. What is the probability that you do "worse than average"?

3.5.5  A packet received by your smartphone is error-free with probability 0.95, independent of any other packet.
(a) Out of 10 packets received, let X equal the number of packets received with errors. What is the PMF of X?
(b) In one hour, your smartphone receives 12,000 packets. Let X equal the number of packets received with errors. What is E[X]?

3.5.6  Find the expected value of the random variable Y in Problem 3.4.1.

3.5.7  Find the expected value of the random variable X in Problem 3.4.2.

3.5.8  Find the expected value of the random variable X in Problem 3.4.3.

3.5.9  Use Definition 3.13 to calculate the expected value of a binomial (4, 1/2) random variable X.

3.5.10  X is the discrete uniform (1, 5) random variable.
(a) What is P[X = E[X]]?
(b) What is P[X > E[X]]?

3.5.11  K is the geometric (1/11) random variable.
(a) What is P[K = E[K]]?
(b) What is P[K > E[K]]?
(c) What is P[K < E[K]]?

3.5.12  At a casino, people line up to pay $20 each to be a contestant in the following game: The contestant flips a fair coin repeatedly. If she flips heads 20 times in a row, she walks away with R = 20 million dollars; otherwise she walks away with R = 0 dollars.
(a) Find the PMF of R, the reward earned by the contestant.
(b) The casino counts "losing contestants" who fail to win the 20 million dollar prize. Let L equal the number of losing contestants before the first winning contestant. What is the PMF of L?
(c) Why does the casino offer this game?

3.5.13  Give examples of practical applications of probability theory that can be modeled by the following PMFs. In each case, state an experiment, the sample space, the range of the random variable, the PMF of the random variable, and the expected value:
(a) Bernoulli
(b) Binomial
(c) Pascal
(d) Poisson
Make up your own examples. (Don't copy examples from the text.)

3.5.14  Find P[K < E[K]] when
(a) K is geometric (1/3).
(b) K is binomial (6, 1/2).
(c) K is Poisson (3).
(d) K is discrete uniform (0, 6).

3.5.15  Suppose you go to a casino with exactly $63. At this casino, the only game is roulette and the only bets allowed are red and green. The payoff for a winning bet


is the amount of the bet. In addition, the wheel is fair so that P[red] = P[green] = 1/2. You have the following strategy: First, you bet $1. If you win the bet, you quit and leave the casino with $64. If you lose, you then bet $2. If you win, you quit and go home. If you lose, you bet $4. In fact, whenever you lose, you double your bet until either you win a bet or you lose all of your money. However, as soon as you win a bet, you quit and go home. Let Y equal the amount of money that you take home. Find PY(y) and E[Y]. Would you like to play this game every day?

3.5.16  In a TV game show, there are three identical-looking suitcases. The first suitcase has 3 dollars, the second has 30 dollars and the third has 300 dollars. You start the game by randomly choosing a suitcase. Between the two unchosen suitcases, the game show host opens the suitcase with more money. The host then asks you if you want to keep your suitcase or switch to the other remaining suitcase. After you make your decision, you open your suitcase and keep the D dollars inside. Should you switch suitcases? To answer this question, solve the following subproblems and use the following notation:

    Ci is the event that you first choose the suitcase with i dollars.
    Oi denotes the event that the host opens a suitcase with i dollars.

In addition, you may wish to go back and review the Monty Hall problem in Example 2.4.
(a) Suppose you never switch; you always stick with your original choice. Use a tree diagram to find the PMF PD(d) and expected value E[D].
(b) Suppose you always switch. Use a tree diagram to find the PMF PD(d) and expected value E[D].
(c) Perhaps your rule for switching should depend on how many dollars are in the suitcase that the host opens? What is the optimal strategy to maximize E[D]? Hint: Consider making a random decision; if the host opens a suitcase with i dollars, let ai denote the probability that you switch.

3.5.17  You are a contestant on a TV game show; there are four identical-looking suitcases containing $100, $200, $400, and $800. You start the game by randomly choosing a suitcase. Among the three unchosen suitcases, the game show host opens the suitcase that holds the median amount of money. (For example, if the unopened suitcases contain $100, $400 and $800, the host opens the $400 suitcase.) The host then asks you if you want to keep your suitcase or switch to one of the other remaining suitcases. For your analysis, use the following notation for events:

    Ci is the event that you choose a suitcase with i dollars.
    Oi denotes the event that the host opens a suitcase with i dollars.
    R is the reward in dollars that you keep.

(a) You refuse the host's offer and open the suitcase you first chose. Find the PMF of R and the expected value E[R].
(b) You always switch and randomly choose one of the two remaining suitcases with equal probability. You receive the R dollars in this chosen suitcase. Sketch a tree diagram for this experiment, and find the PMF and expected value of R.
(c) Can you do better than either always switching or always staying with your original choice? Explain.

3.5.18  You are a contestant on a TV game show; there are four identical-looking suitcases containing $200, $400, $800, and $1600. You start the game by randomly choosing a suitcase. Among the three unchosen suitcases, the game show host opens the suitcase that holds the least money. The host then asks you if you want to keep


your suitcase or switch to one of the other remaining suitcases. For the following analysis, use the following notation for events:

    Ci is the event that you choose a suitcase with i dollars.
    Oi denotes the event that the host opens a suitcase with i dollars.
    R is the reward in dollars that you keep.

(a) You refuse the host's offer and open the suitcase you first chose. Find the PMF of R and the expected value E[R].
(b) You switch and randomly choose one of the two remaining suitcases. You receive the R dollars in this chosen suitcase. Sketch a tree diagram for this experiment, and find the PMF and expected value of R.

3.5.19  Let binomial random variable Xn denote the number of successes in n Bernoulli trials with success probability p. Prove that E[Xn] = np. Hint: Use the fact that Σ(k=0..n-1) PXn-1(k) = 1.

3.5.20  Prove that if X is a nonnegative integer-valued random variable, then

        E[X] = Σ(k=0..∞) P[X > k].

3.5.21  At the gym, a weightlifter can bench press a maximum of 100 kg. For a mass of m kg (1 ≤ m ≤ 100), the maximum number of repetitions she can complete is R, a geometric random variable with expected value 100/m.
(a) In terms of the mass m, what is the PMF of R?
(b) When she performs one repetition, she lifts the m kg mass a height h = 5/9.8 meters and thus does work w = mgh = 5m Joules. For R repetitions, she does W = 5mR Joules of work. What is the expected work E[W] that she will complete?
(c) A friend offers to pay her 1000 dollars if she can perform 1000 Joules of weightlifting work. What mass m in the range 1 ≤ m ≤ 100 should she use to maximize her probability of winning the money? For the best choice of m, what is the probability that she wins the money?

3.5.22  At the gym, a weightlifter can bench press a maximum of 200 kg. For a mass of m kg (1 ≤ m ≤ 200), the maximum number of repetitions she can complete is R, a geometric random variable with expected value 200/m.
(a) In terms of the mass m, what is the PMF of R?
(b) When she performs one repetition, she lifts the m kg mass a height h = 4/9.8 meters and thus does work w = mgh = 4m Joules. For R repetitions, she does W = 4mR Joules of work. What is the expected work E[W] that she will complete?
(c) A friend offers to pay her 1000 dollars if she can perform 1000 Joules of weightlifting work. What mass m in the range 1 ≤ m ≤ 200 should she use to maximize her probability of winning the money?

3.6.1  Given the random variable Y in Problem 3.4.1, let U = g(Y) = Y^2.
(a) Find PU(u).
(b) Find FU(u).
(c) Find E[U].

3.6.2  Given the random variable X in Problem 3.4.2, let V = g(X) = |X|.
(a) Find PV(v).
(b) Find FV(v).
(c) Find E[V].

3.6.3  Given the random variable X in Problem 3.4.3, let W = g(X) = -X.
(a) Find PW(w).
(b) Find FW(w).
(c) Find E[W].


3.6.4  At a discount brokerage, a stock purchase or sale worth less than $10,000 incurs a brokerage fee of 1% of the value of the transaction. A transaction worth more than $10,000 incurs a fee of $100 plus 0.5% of the amount exceeding $10,000. Note that for a fraction of a cent, the brokerage always charges the customer a full penny. You wish to buy 100 shares of a stock whose price D in dollars has PMF

        PD(d) = { 1/3   d = 99.75, 100, 100.25,
                { 0     otherwise.

What is the PMF of C, the cost of buying the stock (including the brokerage fee)?

3.6.5  A source transmits data packets to a receiver over a radio link. The receiver uses error detection to identify packets that have been corrupted by radio noise. When a packet is received error free, the receiver sends an acknowledgment (ACK) back to the source. When the receiver gets a packet with errors, a negative acknowledgment (NAK) message is sent back to the source. Each time the source receives a NAK, the packet is retransmitted. We assume that each packet transmission is independently corrupted by errors with probability q.
(a) Find the PMF of X, the number of times that a packet is transmitted by the source.
(b) Suppose each packet takes 1 millisecond to transmit and that the source waits an additional millisecond to receive the acknowledgment message (ACK or NAK) before retransmitting. Let T equal the time required until the packet is successfully received. What is the relationship between T and X? What is the PMF of T?

3.6.6  Suppose that a cellular phone costs $20 per month with 30 minutes of use included and that each additional minute of use costs $0.50. If the number of minutes you use in a month is a geometric random variable M with expected value E[M] = 1/p = 30 minutes, what is the PMF of C, the cost of the phone for one month?

3.6.7  A professor tries to count the number of students attending lecture. For each student in the audience, the professor either counts the student properly (with probability p) or overlooks (and does not count) the student with probability 1 - p. The exact number of attending students is 70.
(a) The number of students counted by the professor is a random variable N. What is the PMF of N?
(b) Let U = 70 - N denote the number of uncounted students. What is the PMF of U?
(c) What is the probability that the undercount U is 2 or more?
(d) For what value of p does E[U] = 2?

3.6.8  A forgetful professor tries to count the M&Ms in a package; however, the professor often loses his place and double counts an M&M. For each M&M in the package, the professor counts the M&M and then, with probability p, counts the M&M again. The exact number of M&Ms in the pack is 20.
(a) Find the PMF of R, the number of double-counted M&Ms.
(b) Find the PMF of N, the number of M&Ms counted by the professor.

3.7.1  Starting on day n = 1, you buy one lottery ticket each day. Each ticket costs 1 dollar and is independently a winner that can be cashed for 5 dollars with probability 0.1; otherwise the ticket is worthless. Let Xn equal your net profit after n days. What is E[Xn]?

3.7.2  For random variable T in Quiz 3.6, first find the expected value E[T] using Theorem 3.10. Next, find E[T] using Definition 3.13.

3.7.3  In a certain lottery game, the chance of getting a winning ticket is exactly one in a thousand. Suppose a person buys one ticket each day (except on the leap year day of February 29) over a period of fifty years.

What is the expected number E[T] of winning tickets in fifty years? If each winning ticket is worth $1000, what is the expected amount E[R] collected on these winning tickets? Lastly, if each ticket costs $2, what is your expected net profit E[Q]?

3.7.4  Suppose an NBA basketball player shooting an uncontested 2-point shot will make the basket with probability 0.6. However, if you foul the shooter, the shot will be missed, but two free throws will be awarded. Each free throw is an independent Bernoulli trial with success probability p. Based on the expected number of points the shooter will score, for what values of p may it be desirable to foul the shooter?

3.7.5  It can take up to four days after you call for service to get your computer repaired. The computer company charges for repairs according to how long you have to wait. The number of days D until the service technician arrives and the service charge C, in dollars, are described by

        d         1     2     3     4
        PD(d)    0.2   0.4   0.3   0.1

and

        C = { 90   for 1-day service,
            { 70   for 2-day service,
            { 40   for 3-day service,
            { 40   for 4-day service.

(a) What is the expected waiting time µD = E[D]?
(b) What is the expected deviation E[D - µD]?
(c) Express C as a function of D.
(d) What is the expected value E[C]?

3.7.6  True or False: For any random variable X, E[1/X] = 1/E[X].

3.7.7  For the cellular phone in Problem 3.6.6, express the monthly cost C as a function of M, the number of minutes used.

3.7.8  A new cellular phone billing plan costs $15 per month plus $1 for each minute of use. If the number of minutes you use the phone in a month is a geometric random variable with expected value 1/p, what is the expected monthly cost E[C] of the phone? For what values of p is this billing plan preferable to the billing plan of Problem 3.6.6 and Problem 3.7.7?

3.7.9  A particular circuit works if all 10 of its component devices work. Each circuit is tested before leaving the factory. Each working circuit can be sold for k dollars, but each nonworking circuit is worthless and must be thrown away. Each circuit can be built with either ordinary devices or ultrareliable devices. An ordinary device has a failure probability of q = 0.1 and costs $1. An ultrareliable device has a failure probability of q/2 but costs $3. Assuming device failures are independent, should you build your circuit with ordinary devices or ultrareliable devices in order to maximize your expected profit E[R]? Keep in mind that your answer will depend on k.

3.7.10  In the New Jersey state lottery, each $1 ticket has six randomly marked numbers out of 1, ..., 46. A ticket is a winner if the six marked numbers match six numbers drawn at random at the end of a week. For each ticket sold, 50 cents is added to the pot for the winners. If there are k winning tickets, the pot is divided equally among the k winners. Suppose you bought a winning ticket in a week in which 2n tickets are sold and the pot is n dollars.
(a) What is the probability q that a random ticket will be a winner?
(b) Find the PMF of Kn, the number of other (besides your own) winning tickets.
(c) What is the expected value of Wn, the prize for your winning ticket?

3.7.11  If there is no winner for the lottery described in Problem 3.7.10, then the pot is carried over to the next week. Suppose that in a given week, an r dollar pot
\i\fhat is the expected month ly cost E[C]? is carried over from the previous 'veek and


2n, tickets sold. Ans,ver the following ques- 3.8. 7 Sho\v that the variance of Y
tions. aX +b is Var[Y] = a 2 Var[X].
(a) What is the probability q that a ran- 3.8.8 Given a rando1n variable X 'vi th ex-
dom t icket 'vill be a \Vinner? pected value JJ,x and variance a~ , find the
(b) If you own one of the 2n, tickets sold, expected value and variance of
w hat is the expected value of V, the
value (i.e., the amount you win) of
t hat t icket? Is it ever possible that
E [V] > 1?
3.8.9 In real-time packet data transmis-
( c) S u ppose that in the instant before the sion, the time between successfully received
t icket sales are stopped, you are given packets is called t he interarrival tim,e, and
t he opportunity to buy one of each pos- randomness in packet interarrival t imes is
sible ticket. For what values (if any) of called .fitter. J itter is undes irab le. One
ri and r should you do it? measure of j itter is the standard deviation
of t he packet interarrival time. From Prob-
3.8.1 In an experiment to monitor t\vo lem 3.6.5 , calculate the j itter ar. How large
packets, the PI\l[F of N, the number of video must the successful transmission probabil-
packets, is it y q be to ensure that the jitter is less than
2 milliseconds?
1 2
0.7 0.1 3.8.10 Random variable K has a Pois-
son (a) distribution. Derive the proper-
F ind E [N], E[N2], Var[J\T], and a N . t ies E[K] = Var [K] = a. Hint : E[K2] =
3.8.2 F ind the variance of the random var- E[K(I< - 1)] + E[I<].
iable Yin Problem 3.4.l. 3.8.11 For t he delay D in Problem 3.7.5,
what is the standard deviat ion an of t he
3.8.3 F ind the variance of the random var-
wait ing time?
iable X in Problem 3.4.2.
3.9.1 Let X be t he binomial (100, 1/ 2)
3.8.4 F ind the variance of the random var-
random variable. Let E2 denote the event
iable X in Problem 3.4.3.
that Xis a perfect square. Calculate P[E2].
3.8.5 Let X have t he bino1nial PI\l[F
3.9.2 Write a MATLAB function

(~)(1/2) 4
x=s hipwe ight8 (m) that produces m ran-
Px(x) = dom sample values of the package \veight
X with PI\l[F given in Example 3.27.
3.9.3 u se the unique function to \vrite
(a) F ind the standard deviation of X.
a lVIATLAB script s hip cos tpmf . m that out-
(b) What is P[x - ax< X < JJ,x +ax], puts the pair of vectors sy and py repre-
t he probability t hat X is w ithin one senting t he PMF Py(y) of the shipping cost
standard dev iation of the expected Y in Example 3.27.
3.9.4 For m = 10, m, = 100, and m =
3.8.6 X is the b inomia l (5, 0.5) random 1000, use I\IIATLAB to find the average cost
variable. of sending m, packages using the model of
Example 3.27. Your program input should
(a) F ind the standard deviation of X. have the number of trials m, as t he input.
(b) F ind 1=>[1),x - ax < X < JJ,x + ax], the The output should be Y = -: I::n 1 Yi,
probability that X is \vithin one stan- where Yi is the cost of the i th package. As
dard deviation of the expected value. m becomes large, 'vhat do you observe?


3.9.5 The Zipf (n, alpha = 1) random variable X introduced in Problem 3.3.12 is often used to model the "popularity" of a collection of n objects. For example, a Web server can deliver one of n Web pages. The pages are numbered such that page 1 is the most requested page, page 2 is the second most requested page, and so on. If page k is requested, then X = k.
To reduce external network traffic, an ISP gateway caches copies of the k most popular pages. Calculate, as a function of n for 1 <= n <= 1000, how large k must be to ensure that the cache can deliver a page with probability 0.75.

3.9.6 Generate n independent samples of the Poisson (5) random variable Y. For each y in S_Y, let n(y) denote the number of times that y was observed. Thus sum_{y in S_Y} n(y) = n, and the relative frequency of y is R(y) = n(y)/n. Compare the relative frequency of y against P_Y(y) by plotting R(y) and P_Y(y) on the same graph as functions of y for n = 100, n = 1000, and n = 10,000. How large should n be to have reasonable agreement?

3.9.7 Test the convergence of Theorem 3.8. For alpha = 10, plot the PMF of K_n for (n, p) = (10, 1), (n, p) = (100, 0.1), and (n, p) = (1000, 0.01) and compare each result with the Poisson (alpha) PMF.

3.9.8 Use the result of Problem 3.4.4 and the Random Sample Algorithm on Page 102 to write a MATLAB function k=geometricrv(p,m) that generates m samples of a geometric (p) random variable.

3.9.9 Find n*, the smallest value of n for which the function poissonpmf(n,n) shown in Example 3.37 reports an error. What is the source of the error? Write a function bigpoissonpmf(alpha,n) that calculates poissonpmf(n,n) for values of n much larger than n*. Hint: For a Poisson (alpha) random variable K,

P_K(k) = exp(-alpha + k ln(alpha) - sum_{i=1}^{k} ln(i)).

Continuous Random Variables

4.1 Continuous Sample Space

A random variable X is continuous if the range S_X consists of one or more intervals. For each x in S_X, P[X = x] = 0.

Until now, we have studied discrete random variables. By definition, the range of a discrete random variable is a countable set of numbers. This chapter analyzes random variables that range over continuous sets of numbers. A continuous set of numbers, sometimes referred to as an interval, contains all of the real numbers between two limits. Many experiments lead to random variables with a range that is a continuous interval. Examples include measuring T, the arrival time of a particle (S_T = {t | 0 <= t < infinity}); measuring V, the voltage across a resistor (S_V = {v | -infinity < v < infinity}); and measuring the phase angle A of a sinusoidal radio wave (S_A = {a | 0 <= a < 2*pi}). We will call T, V, and A continuous random variables, although we will defer a formal definition until Section 4.2.
Consistent with the axioms of probability, we assign numbers between zero and one to all events (sets of elements) in the sample space. A distinguishing feature of the models of continuous random variables is that the probability of each individual outcome is zero! To understand this intuitively, consider an experiment in which the observation is the arrival time of the professor at a class. Assume this professor always arrives between 8:55 and 9:05. We model the arrival time as a random variable T minutes relative to 9:00 o'clock. Therefore, S_T = {t | -5 <= t <= 5}. Think about predicting the professor's arrival time. The more precise the prediction, the lower the chance it will be correct. For example, you might guess the interval -1 <= T <= 1 minute (8:59 to 9:01). Your probability of being correct is higher than if you guess -0.5 <= T <= 0.5 minute (8:59:30 to 9:00:30). As your prediction becomes more and more precise, the probability that it will be correct gets smaller and smaller. The chance that the professor will arrive within a femtosecond of 9:00 is microscopically small (on the order of 10^-15), and the probability of a precise 9:00 arrival is zero.
One way to think about continuous random variables is that the amount of probability in an interval gets smaller and smaller as the interval shrinks. This is like the mass in a continuous volume. Even though any finite volume has some mass, there is no mass at a single point. In physics, we analyze this situation by referring to densities of matter. Similarly, we refer to probability density functions to describe probabilities related to continuous random variables. The next section introduces these ideas formally by describing an experiment in which the sample space contains all numbers between zero and one.

In many practical applications of probability, we encounter uniform random variables. The range of a uniform random variable is an interval with finite limits. The probability model of a uniform random variable states that any two intervals of equal size within the range have equal probability. To introduce many concepts of continuous random variables, we will refer frequently to a uniform random variable with limits 0 and 1. Most computer languages include a random number generator. In MATLAB, this is the rand function introduced in Chapter 1. These random number generators produce a sequence of pseudorandom numbers that approximate the properties of outcomes of repeated trials of an experiment with a probability model that is a continuous uniform random variable.

In the following example, we examine this random variable by defining an experiment in which the procedure is to spin a pointer in a circle of circumference one meter. This model is very similar to the model of the phase angle of the signal that arrives at the radio receiver of a cellular telephone. Instead of a pointer with stopping points that can be anywhere between 0 and 1 meter, the phase angle can have any value between 0 and 2*pi radians. By referring to the spinning pointer in the examples in this chapter, we arrive at mathematical expressions that illustrate the main properties of continuous random variables. The formulas that arise from analyzing phase angles in communications engineering models have factors of 2*pi that do not appear in the examples in this chapter. Example 4.1 defines the sample space of the pointer experiment and demonstrates that all outcomes have probability zero.

Example 4.1
Suppose we have a wheel of circumference one meter and we mark a point on the perimeter at the top of the wheel. In the center of the wheel is a radial pointer that we spin. After spinning the pointer, we measure the distance, X meters, around the circumference of the wheel going clockwise from the marked point to the pointer position as shown in Figure 4.1. Clearly, 0 <= X < 1. Also, it is reasonable to believe that if the spin is hard enough, the pointer is just as likely to arrive at any part of the circle as at any other. For a given x, what is the probability P[X = x]?

This problem is surprisingly difficult. However, given that we have developed methods for discrete random variables in Chapter 3, a reasonable approach is to find a discrete approximation to X. As shown on the right side of Figure 4.1, we can mark the perimeter with n equal-length arcs numbered 1 to n and let Y denote the number

Figure 4.1 The random pointer on disk of circumference 1.

of the arc in which the pointer stops. Y is a discrete random variable with range S_Y = {1, 2, ..., n}. Since all parts of the wheel are equally likely, all arcs have the same probability. Thus the PMF of Y is

P_Y(y) = { 1/n  y = 1, 2, ..., n,     (4.1)
           0    otherwise.

From the wheel on the right side of Figure 4.1, we can deduce that if X = x, then Y = ceil(nx), where the notation ceil(a) is defined as the smallest integer greater than or equal to a. Note that the event {X = x} is a subset of {Y = ceil(nx)}, which implies that

P[X = x] <= P[Y = ceil(nx)] = 1/n.    (4.2)

We observe this is true no matter how finely we divide up the wheel. To find P[X = x], we consider larger and larger values of n. As n increases, the arcs on the circle decrease in size, approaching a single point. The probability of the pointer arriving in any particular arc decreases until we have in the limit,

P[X = x] <= lim_{n -> infinity} P[Y = ceil(nx)] = lim_{n -> infinity} 1/n = 0.    (4.3)

This demonstrates that P[X = x] <= 0. The first axiom of probability states that P[X = x] >= 0. Therefore, P[X = x] = 0. This is true regardless of the outcome, x. It follows that every outcome has probability zero.
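The limiting argument of Example 4.1 is easy to check numerically. The sketch below (in Python rather than the book's MATLAB; the point x = 0.312 is an arbitrary assumed value) evaluates the bound P[X = x] <= P[Y = ceil(nx)] = 1/n for increasing n:

```python
import math

def arc_prob(n):
    """P[Y = ceil(n*x)]: each of the n equal arcs has probability 1/n."""
    return 1.0 / n

x = 0.312  # arbitrary point on the circumference (assumed example value)
for n in [10, 100, 1000, 10**6]:
    k = math.ceil(n * x)  # index of the arc containing x
    print(f"n = {n:>7}:  arc {k},  P[X = x] <= {arc_prob(n):.6f}")
```

Whatever point x we pick, the bound is 1/n, which can be made as small as we like by dividing the wheel more finely.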

Just as in the discussion of the professor arriving in class, similar reasoning can be applied to other experiments to show that for any continuous random variable, the probability of any individual outcome is zero. This is a fundamentally different situation than the one we encountered in our study of discrete random variables. Clearly a probability mass function defined in terms of probabilities of individual outcomes has no meaning in this context. For a continuous random variable, the interesting probabilities apply to intervals.


4.2 The Cumulative Distribution Function

The CDF F_X(x) is a probability model for any random variable. The CDF F_X(x) is a continuous function if and only if X is a continuous random variable.

We saw in Example 4.1 that when X is a continuous random variable, P[X = x] = 0 for x in S_X. This implies that when X is continuous, it is impossible to define a probability mass function P_X(x). On the other hand, we will see that the cumulative distribution function, F_X(x) in Definition 3.10, is a very useful probability model for a continuous random variable. We repeat the definition here.

Definition 4.1 Cumulative Distribution Function (CDF)
The cumulative distribution function (CDF) of random variable X is

F_X(x) = P[X <= x].

The key properties of the CDF, described in Theorem 3.2 and Theorem 3.3, apply to all random variables. Graphs of all cumulative distribution functions start at zero on the left and end at one on the right. All are nondecreasing, and, most importantly, the probability that the random variable is in an interval is the difference in the CDF evaluated at the ends of the interval.

Theorem 4.1
For any random variable X,
(a) F_X(-infinity) = 0
(b) F_X(infinity) = 1
(c) P[x1 < X <= x2] = F_X(x2) - F_X(x1)

Although these properties apply to any CDF, there is one important difference between the CDF of a discrete random variable and the CDF of a continuous random variable. Recall that for a discrete random variable X, F_X(x) has zero slope everywhere except at values of x with nonzero probability. At these points, the function has a discontinuity in the form of a jump of magnitude P_X(x). By contrast, the defining property of a continuous random variable X is that F_X(x) is a continuous function of x.

Definition 4.2 Continuous Random Variable
X is a continuous random variable if the CDF F_X(x) is a continuous function.

Example 4.2
In the wheel-spinning experiment of Example 4.1, find the CDF of X.

We begin by observing that any outcome x is in S_X = [0, 1). This implies that F_X(x) = 0 for x < 0, and F_X(x) = 1 for x >= 1. To find the CDF for x between 0 and 1 we consider the event {X <= x}, with x growing from 0 to 1. Each event corresponds to an arc on the circle in Figure 4.1. The arc is small when x is near 0 and it includes nearly the whole circle when x is near 1. F_X(x) = P[X <= x] is the probability that the pointer stops somewhere in the arc. This probability grows from 0 to 1 as the arc increases to include the whole circle. Given our assumption that the pointer has no preferred stopping places, it is reasonable to expect the probability to grow in proportion to the fraction of the circle occupied by the arc X <= x. This fraction is simply x. To be more formal, we can refer to Figure 4.1 and note that with the circle divided into n arcs,

{Y <= ceil(nx) - 1} subset of {X <= x} subset of {Y <= ceil(nx)}.    (4.4)

Therefore, the probabilities of the three events are related by

P[Y <= ceil(nx) - 1] <= F_X(x) <= P[Y <= ceil(nx)].    (4.5)

Note that Y is a discrete random variable with CDF

F_Y(y) = { 0     y < 0,
           k/n   (k - 1)/n < y <= k/n, k = 1, 2, ..., n,    (4.6)
           1     y > 1.

Thus for x in (0, 1) and for all n, we have

(ceil(nx) - 1)/n <= F_X(x) <= ceil(nx)/n.    (4.7)

In Problem 4.2.3, we ask the reader to verify that lim_{n -> infinity} ceil(nx)/n = x. This implies that as n -> infinity, both fractions approach x. The CDF of X is

F_X(x) = { 0   x < 0,
           x   0 <= x <= 1,    (4.8)
           1   x > 1.
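The sandwich bound of Equation (4.7) can be checked numerically. A small sketch (Python in place of the book's MATLAB; the test point is an arbitrary assumed value):

```python
import math

def cdf_bounds(x, n):
    """Equation (4.7): (ceil(nx) - 1)/n <= F_X(x) <= ceil(nx)/n."""
    k = math.ceil(n * x)
    return (k - 1) / n, k / n

x = 0.62  # assumed test point in (0, 1)
for n in [4, 100, 10**5]:
    lo, hi = cdf_bounds(x, n)
    print(f"n = {n:>6}:  {lo:.5f} <= F_X({x}) <= {hi:.5f}")
# both bounds close in on F_X(x) = x as n grows
```

The gap between the bounds is exactly 1/n, so the sandwich forces F_X(x) = x in the limit.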

Quiz 4.2
The cumulative distribution function of the random variable Y is

F_Y(y) = { 0     y < 0,
           y/4   0 <= y <= 4,    (4.9)
           1     y > 4.

Sketch the CDF of Y and calculate the following probabilities:

Figure 4.2 The graph of an arbitrary CDF F_X(x).

(a) P[Y <= -1]  (b) P[Y <= 1]
(c) P[2 < Y <= 3]  (d) P[Y > 1.5]

4.3 Probability Density Function

Like the CDF, the PDF f_X(x) is a probability model for a continuous random variable X. f_X(x) is the derivative of the CDF. It is proportional to the probability that X is close to x.

The slope of the CDF contains the most interesting information about a continuous random variable. The slope at any point x indicates the probability that X is near x. To understand this intuitively, consider the graph of a CDF F_X(x) given in Figure 4.2. Theorem 4.1(c) states that the probability that X is in the interval of width Delta to the right of x1 is

P[x1 < X <= x1 + Delta] = F_X(x1 + Delta) - F_X(x1).    (4.10)

Note in Figure 4.2 that this is less than the probability of the interval of width Delta to the right of x2,

P[x2 < X <= x2 + Delta] = F_X(x2 + Delta) - F_X(x2).    (4.11)

The comparison makes sense because both intervals have the same length. If we reduce Delta to focus our attention on outcomes nearer and nearer to x1 and x2, both probabilities get smaller. However, their relative values still depend on the average slope of F_X(x) at the two points. This is apparent if we rewrite Equation (4.10) in the form

P[x1 < X <= x1 + Delta] = [(F_X(x1 + Delta) - F_X(x1)) / Delta] * Delta.    (4.12)

Here the fraction on the right side is the average slope, and Equation (4.12) states that the probability that a random variable is in an interval near x1 is the average

Figure 4.3 The PDF of the modem receiver voltage X.

slope over the interval times the length of the interval. By definition, the limit of the average slope as Delta -> 0 is the derivative of F_X(x) evaluated at x1.

We conclude from the discussion leading to Equation (4.12) that the slope of the CDF in a region near any number x is an indicator of the probability of observing the random variable X near x. Just as the amount of matter in a small volume is the density of the matter times the size of the volume, the amount of probability in a small region is the slope of the CDF times the size of the region. This leads to the term probability density, defined as the slope of the CDF.

Definition 4.3 Probability Density Function (PDF)
The probability density function (PDF) of a continuous random variable X is

f_X(x) = dF_X(x)/dx.

This definition displays the conventional notation for a PDF. The name of the function is a lowercase f with a subscript that is the name of the random variable. As with the PMF and the CDF, the argument is a dummy variable: f_X(x), f_X(u), and f_X(.) are all the same PDF.

The PDF is a complete probability model of a continuous random variable. While there are other functions that also provide complete models (the CDF and the moment generating function that we study in Chapter 9), the PDF is the most useful. One reason for this is that the graph of the PDF provides a good indication of the likely values of observations.

Example 4.3
Figure 4.3 depicts the PDF of a random variable X that describes the voltage at the receiver in a modem. What are probable values of X?

Note that there are two places where the PDF has high values and that it is low elsewhere. The PDF indicates that the random variable is likely to be near -5 V (corresponding to the symbol 0 transmitted) and near +5 V (corresponding to a 1 transmitted). Values far from +/-5 V (due to strong distortion) are possible but much less likely.

Another reason the PDF is the most useful probability model is that it plays a key role in calculating the expected value of a continuous random variable, the subject of the next section. Important properties of the PDF follow directly from Definition 4.3 and the properties of the CDF.

Theorem 4.2
For a continuous random variable X with PDF f_X(x),
(a) f_X(x) >= 0 for all x,
(b) F_X(x) = integral from -infinity to x of f_X(u) du,
(c) integral from -infinity to infinity of f_X(x) dx = 1.

Proof The first statement is true because F_X(x) is a nondecreasing function of x and therefore its derivative, f_X(x), is nonnegative. The second fact follows directly from the definition of f_X(x) and the fact that F_X(-infinity) = 0. The third statement follows from the second one and Theorem 4.1(b).
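Theorem 4.2 can be spot-checked by numerical integration. The sketch below uses a hypothetical density f_X(x) = 2x on [0, 1] (an assumed example, not one from the text) and midpoint Riemann sums for parts (b) and (c):

```python
def f(x):
    """Hypothetical PDF f_X(x) = 2x on [0, 1], zero elsewhere (assumed example)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

N = 100_000
dx = 1.0 / N
# Theorem 4.2(c): the area under the entire PDF is one
total = sum(f((i + 0.5) * dx) * dx for i in range(N))
# Theorem 4.2(b): F_X(0.5) as the integral of the PDF up to 0.5; exact value 0.5**2 = 0.25
cdf_half = sum(f((i + 0.5) * dx) * dx for i in range(N // 2))
print(round(total, 6), round(cdf_half, 6))
```

Any valid PDF substituted for f should leave the first sum at one; the second sum reproduces the CDF value at the chosen point.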

Given these properties of the PDF, we can prove the next theorem, which relates the PDF to the probabilities of events.

Theorem 4.3

P[x1 < X <= x2] = integral from x1 to x2 of f_X(x) dx.

Proof From Theorem 4.1(c) and Theorem 4.2(b),

P[x1 < X <= x2] = F_X(x2) - F_X(x1)
               = integral from -infinity to x2 of f_X(x) dx - integral from -infinity to x1 of f_X(x) dx
               = integral from x1 to x2 of f_X(x) dx.    (4.13)

Theorem 4.3 states that the probability of observing X in an interval is the area under the PDF graph between the two endpoints of the interval. This property of the PDF is depicted in Figure 4.4. Theorem 4.2(c) states that the area under the entire PDF graph is one. Note that the value of the PDF can be any nonnegative number. It is not a probability and need not be between zero and one. To gain further insight into the PDF, it is instructive to reconsider Equation (4.12). For very small values of Delta, the right side of Equation (4.12) approximately equals f_X(x1) Delta. When Delta becomes the infinitesimal dx, we have

P[x < X <= x + dx] = f_X(x) dx.    (4.14)

Equation (4.14) is useful because it permits us to interpret the integral of Theorem 4.3 as the limiting case of a sum of probabilities of events {x < X <= x + dx}.

Figure 4.4 The PDF and CDF of X.

Example 4.4
For the experiment in Examples 4.1 and 4.2, find the PDF of X and the probability of the event {1/4 < X < 3/4}.

Taking the derivative of the CDF in Equation (4.8), f_X(x) = 0 when x < 0 or x > 1. For x between 0 and 1 we have f_X(x) = dF_X(x)/dx = 1. Thus the PDF of X is

f_X(x) = { 1   0 <= x <= 1,    (4.15)
           0   otherwise.

The fact that the PDF is constant over the range of possible values of X reflects the fact that the pointer has no favorite stopping places on the circumference of the circle. To find the probability that X is between 1/4 and 3/4, we can use either Theorem 4.1 or Theorem 4.3. Thus

P[1/4 < X < 3/4] = F_X(3/4) - F_X(1/4) = 1/2,    (4.16)

and equivalently,

P[1/4 < X < 3/4] = integral from 1/4 to 3/4 of f_X(x) dx = integral from 1/4 to 3/4 of dx = 1/2.    (4.17)

When the PDF and CDF are both known, it is easier to use the CDF to find the probability of an interval. However, in many cases we begin with the PDF, in which case it is usually easiest to use Theorem 4.3 directly. The alternative is to find the CDF explicitly by means of Theorem 4.2(b) and then to use Theorem 4.1.

Example 4.5
Consider an experiment that consists of spinning the pointer in Example 4.1 three times and observing Y meters, the maximum value of X in the three spins. In Example 8.3, we show that the CDF of Y is

F_Y(y) = { 0     y < 0,
           y^3   0 <= y <= 1,    (4.18)
           1     y > 1.

Find the PDF of Y and the probability that Y is between 1/4 and 3/4.

We apply Definition 4.3 to the CDF F_Y(y). When F_Y(y) is piecewise differentiable, we take the derivative of each piece:

f_Y(y) = dF_Y(y)/dy = { 3y^2   0 < y < 1,    (4.19)
                        0      otherwise.

Note that the PDF has values between 0 and 3. Its integral between any pair of numbers is less than or equal to 1. The graph of f_Y(y) shows that there is a higher probability of finding Y at the right side of the range of possible values than at the left side. This reflects the fact that the maximum of three spins produces higher numbers than individual spins. Either Theorem 4.1 or Theorem 4.3 can be used to calculate the probability of observing Y between 1/4 and 3/4:

P[1/4 < Y < 3/4] = F_Y(3/4) - F_Y(1/4) = (3/4)^3 - (1/4)^3 = 13/32,    (4.20)

and equivalently,

P[1/4 < Y < 3/4] = integral from 1/4 to 3/4 of f_Y(y) dy = integral from 1/4 to 3/4 of 3y^2 dy = 13/32.    (4.21)

Note that this probability is less than 1/2, which is the probability of 1/4 < X < 3/4 calculated in Example 4.4 for one spin of the pointer.
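Because the three spins are independent uniform samples, Example 4.5 is easy to corroborate by simulation. A Monte Carlo sketch (Python rather than the book's MATLAB; seed and trial count are arbitrary choices):

```python
import random

random.seed(1)          # fixed seed for repeatability (arbitrary)
trials = 200_000
count = 0
for _ in range(trials):
    y = max(random.random() for _ in range(3))  # maximum of three spins
    if 0.25 < y <= 0.75:
        count += 1
print(count / trials)   # should land near 13/32 = 0.40625
```

The empirical frequency settles near 13/32, noticeably below the 1/2 obtained for a single spin.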

When we work with continuous random variables, it is usually not necessary to be precise about specifying whether or not a range of numbers includes the endpoints. This is because individual numbers have probability zero. In Example 4.2, there are four different events defined by the words X is between 1/4 and 3/4:

A = {1/4 < X < 3/4},    B = {1/4 < X <= 3/4},
C = {1/4 <= X < 3/4},   D = {1/4 <= X <= 3/4}.

While they are all different events, they all have the same probability because they differ only in whether they include {X = 1/4}, {X = 3/4}, or both. Since these two events have zero probability, their inclusion or exclusion does not affect the probability of the range of numbers. This is quite different from the situation we encounter with discrete random variables. For example, suppose random variable X has PMF

P_X(x) = { 1/6   x = 1/4, x = 1/2,
           2/3   x = 3/4,    (4.22)
           0     otherwise.

For this random variable X, the probabilities of the four sets are

P[A] = 1/6,  P[B] = 5/6,  P[C] = 1/3,  P[D] = 1.    (4.23)

So we see that the nature of an inequality in the definition of an event does not affect the probability when we examine continuous random variables. With discrete random variables, it is critically important to examine the inequality carefully.

If we compare other characteristics of discrete and continuous random variables, we find that with discrete random variables, many facts are expressed as sums. With continuous random variables, the corresponding facts are expressed as integrals. For example, when X is discrete,

P[B] = sum over x in B of P_X(x).    (Theorem 3.1(c))

When X is continuous and B = (x1, x2],

P[x1 < X <= x2] = integral from x1 to x2 of f_X(x) dx.    (Theorem 4.3)

Quiz 4.3
Random variable X has probability density function

f_X(x) = { ...   x >= 0,
           0     otherwise.

Sketch the PDF and find the following:

(a) the constant c  (b) the CDF F_X(x)
(c) P[0 <= X <= 4]  (d) P[-2 <= X <= 2]

4.4 Expected Values

Like the expected value of a discrete random variable, the expected value, E[X], of a continuous random variable X is a typical value of X. It is an important property of the probability model of X.

The primary reason that random variables are useful is that they permit us to compute averages. For a discrete random variable Y, the expected value,

E[Y] = sum over y_i in S_Y of y_i P_Y(y_i),    (4.24)

is a sum of the possible values y_i, each multiplied by its probability. For a continuous random variable X, this definition is inadequate because all possible values of X have probability zero. However, we can develop a definition for the expected value of the continuous random variable X by examining a discrete approximation of X. For a small Delta, let

Y = Delta * floor(X / Delta),    (4.25)

where the notation floor(a) denotes the largest integer less than or equal to a. Y is an approximation to X in that Y = k Delta if and only if k Delta <= X < k Delta + Delta. Since the range of Y is S_Y = {..., -Delta, 0, Delta, 2 Delta, ...}, the expected value is

E[Y] = sum over k from -infinity to infinity of k Delta P[Y = k Delta]
     = sum over k from -infinity to infinity of k Delta P[k Delta <= X < k Delta + Delta].    (4.26)

As Delta approaches zero and the intervals under consideration grow smaller, Y more closely approximates X. Furthermore, P[k Delta <= X < k Delta + Delta] approaches f_X(k Delta) Delta so that for small Delta,

E[X] ~ sum over k from -infinity to infinity of k Delta f_X(k Delta) Delta.    (4.27)

In the limit as Delta goes to zero, the sum converges to the integral in Definition 4.4.

Definition 4.4 Expected Value
The expected value of a continuous random variable X is

E[X] = integral from -infinity to infinity of x f_X(x) dx.

When we consider Y, the discrete approximation of X, the intuition developed in Section 3.5 suggests that E[Y] is what we will observe if we add up a very large number n of independent observations of Y and divide by n. This same intuition holds for the continuous random variable X. As n -> infinity, the average of n independent samples of X will approach E[X]. In probability theory, this observation is known as the Law of Large Numbers, Theorem 10.6.
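The discrete approximation Y = Delta * floor(X/Delta) behind Equation (4.27) can be evaluated directly. A sketch for the uniform spinner PDF, where the sums approach E[X] = 1/2 as Delta shrinks:

```python
def f(x):
    """Uniform spinner PDF: f_X(x) = 1 on [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def approx_mean(delta):
    """Equation (4.27): E[X] ~ sum over k of (k*delta) * f_X(k*delta) * delta."""
    k, total = 0, 0.0
    while k * delta < 1.0:
        total += (k * delta) * f(k * delta) * delta
        k += 1
    return total

for delta in [0.1, 0.01, 0.001]:
    print(delta, round(approx_mean(delta), 4))
# the sums climb toward E[X] = 1/2 as delta -> 0
```

With this left-endpoint sum the approximation is E[X] - delta/2, so halving Delta halves the error.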

Example 4.6
In Example 4.4, we found that the stopping point X of the spinning wheel experiment was a uniform random variable with PDF

f_X(x) = { 1   0 <= x <= 1,    (4.28)
           0   otherwise.

Find the expected stopping point E[X] of the pointer.

E[X] = integral from -infinity to infinity of x f_X(x) dx = integral from 0 to 1 of x dx = 1/2 meter.    (4.29)

With no preferred stopping points on the circle, the average stopping point of the pointer is exactly halfway around the circle.

Example 4.7
In Example 4.5, find the expected value of the maximum stopping point Y of the three spins:

E[Y] = integral from -infinity to infinity of y f_Y(y) dy = integral from 0 to 1 of y (3y^2) dy = 3/4 meter.    (4.30)
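The Law-of-Large-Numbers intuition above can be illustrated by averaging simulated spins. A Python sketch (sample size and seed are arbitrary assumed choices):

```python
import random

random.seed(2)          # arbitrary fixed seed
n = 100_000
xs = [random.random() for _ in range(n)]                         # single spins
ys = [max(random.random() for _ in range(3)) for _ in range(n)]  # max of three spins
print(round(sum(xs) / n, 3))  # near E[X] = 1/2
print(round(sum(ys) / n, 3))  # near E[Y] = 3/4
```

The sample averages land close to the integrals computed in Examples 4.6 and 4.7.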

Corresponding to ft1nctions of discrete randorr1 variables described in Section 3.6,

"''e have functions g(X) of a cont int1ous r ar1dom v ar iable X. A functior1 of a cor1-
tinuous r andom variable is also a r andorn variable; howe\rer , t 11is r andorn \rariable
is not r1ecessarily continuo us!

Example 4.8
Let X be a un iform random variable w ith PDF

1 0 <1'; < 1,
f x (x) = (4.31 )
0 otherwise.

Let VT! = g(X ) = 0 if X < 1/ 2, and W = g(X) = 1 if X > 1/ 2. vT! is a discrete

rando m variable with range S w = {O, 1} .

Regardless of the nature of the random variable W = g(X), its expected value can be calculated by an integral that is analogous to the sum in Theorem 3.10 for discrete random variables.

Theorem 4.4
The expected value of a function, g(X), of random variable X is

E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx.

Many of the properties of expected values of discrete random variables also apply to continuous random variables. Definition 3.15 and Theorems 3.11, 3.12, 3.14, and 3.15 apply to all random variables. All of these relationships are written in terms of expected values in the following theorem, where we use both notations for expected value, E[X] and μ_X, to make the expressions clear and concise.

Theorem 4.5
For any random variable X,

(a) E[X − μ_X] = 0,   (b) E[aX + b] = a E[X] + b,

(c) Var[X] = E[X²] − μ_X²,   (d) Var[aX + b] = a² Var[X].

The method of calculating expected values depends on the type of random variable, discrete or continuous. Theorem 4.4 states that E[X²], the mean square value of X, and Var[X] are the integrals

E[X²] = ∫_{−∞}^{∞} x² f_X(x) dx,   Var[X] = ∫_{−∞}^{∞} (x − μ_X)² f_X(x) dx. (4.32)

Our interpretation of expected values of discrete random variables carries over to continuous random variables. First, E[X] represents a typical value of X, and the variance describes the dispersion of outcomes relative to the expected value. Second, E[X] is a best guess for X in the sense that it minimizes the mean square error (MSE), and Var[X] is the MSE associated with the guess. Furthermore, if we view the PDF f_X(x) as the density of a mass distributed on a line, then E[X] is the center of mass.

Example 4.9
Find the variance and standard deviation of the pointer position in Example 4.1.

To compute Var[X], we use Theorem 4.5(c): Var[X] = E[X²] − μ_X². We calculate E[X²] directly from Theorem 4.4 with g(X) = X²:

E[X²] = ∫_{−∞}^{∞} x² f_X(x) dx = ∫_0^1 x² dx = 1/3 m². (4.33)

In Example 4.6, we have E[X] = 1/2. Thus Var[X] = 1/3 − (1/2)² = 1/12, and the standard deviation is σ_X = √Var[X] = 1/√12 = 0.289 meters.
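As a numerical cross-check of Example 4.9, a midpoint Riemann sum recovers E[X²] = 1/3 and Var[X] = 1/12 for the uniform (0, 1) random variable. This Python sketch stands in for the book's MATLAB examples; `riemann` is an illustrative helper, not a library routine.

```python
def riemann(f, a, b, n=10_000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# E[X^2] for the uniform (0, 1) PDF: integral of x^2 * 1 over [0, 1].
second_moment = riemann(lambda x: x * x, 0.0, 1.0)
# Theorem 4.5(c): Var[X] = E[X^2] - (E[X])^2.
variance = second_moment - 0.5 ** 2
```

The same helper can check any of the PDF integrals in this section by swapping in a different integrand.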

Example 4.10

Find the variance and standard deviation of Y, the maximum pointer position after three spins, in Example 4.5.

We proceed as in Example 4.9. We have f_Y(y) from Example 4.5 and E[Y] = 3/4 from Example 4.7:

E[Y²] = ∫_{−∞}^{∞} y² f_Y(y) dy = ∫_0^1 y²(3y²) dy = 3/5 m². (4.34)

Thus the variance is

Var[Y] = 3/5 − (3/4)² = 3/80 m², (4.35)

and the standard deviation is σ_Y = 0.194 meters.



Quiz 4.4
The probability density function of the random variable Y is

f_Y(y) = 3y²/2 for −1 ≤ y ≤ 1, and 0 otherwise. (4.36)

Sketch the PDF and find the following:

(a) the expected value E[Y]   (b) the second moment E[Y²]

(c) the variance Var[Y]   (d) the standard deviation σ_Y

4.5 Families of Continuous Random Variables

The families of continuous uniform random variables, exponential random variables, and Erlang random variables are related to the families of discrete uniform random variables, geometric random variables, and Pascal random variables, respectively.

Section 3.3 introduces several families of discrete random variables that arise in a wide variety of practical applications. In this section, we introduce three important families of continuous random variables: uniform, exponential, and Erlang. We devote all of Section 4.6 to Gaussian random variables. Like the families of discrete random variables, the PDFs of the members of each family all have the same mathematical form. They differ only in the values of one or two parameters. We have already encountered an example of a continuous uniform random variable in the wheel-spinning experiment. The general definition is

Definition 4.5    Uniform Random Variable

X is a uniform (a, b) random variable if the PDF of X is

f_X(x) = 1/(b − a) for a ≤ x < b, and 0 otherwise,

where the two parameters are b > a.

Expressions that are synonymous with X is a uniform random variable are X is uniformly distributed and X has a uniform distribution.
If X is a uniform random variable, there is an equal probability of finding an outcome x in any interval of length Δ < b − a within S_X = [a, b). We can use Theorem 4.2(b), Theorem 4.4, and Theorem 4.5 to derive the following properties of a uniform random variable.


Theorem 4.6

If X is a uniform (a, b) random variable,

The CDF of X is F_X(x) = 0 for x ≤ a; (x − a)/(b − a) for a < x ≤ b; and 1 for x > b.
The expected value of X is E[X] = (b + a)/2.
The variance of X is Var[X] = (b − a)²/12.

Example 4.11

The phase angle, Θ, of the signal at the input to a modem is uniformly distributed between 0 and 2π radians. What are the PDF, CDF, expected value, and variance of Θ?

From the problem statement, we identify the parameters of the uniform (a, b) random variable as a = 0 and b = 2π. Therefore the PDF and CDF of Θ are

f_Θ(θ) = 1/(2π) for 0 ≤ θ < 2π, and 0 otherwise;

F_Θ(θ) = 0 for θ ≤ 0; θ/(2π) for 0 < θ ≤ 2π; and 1 for θ > 2π. (4.37)

The expected value is E[Θ] = b/2 = π radians, and the variance is Var[Θ] = (2π)²/12 = π²/3 rad².

The relationship between the family of discrete uniform random variables and the family of continuous uniform random variables is fairly direct. The following theorem expresses the relationship formally.

Theorem 4.7
Let X be a uniform (a, b) random variable, where a and b are both integers. Let K = ⌈X⌉. Then K is a discrete uniform (a + 1, b) random variable.

Proof Recall that for any x, ⌈x⌉ is the smallest integer greater than or equal to x. It follows that the event {K = k} = {k − 1 < X ≤ k}. Therefore,

P[K = k] = P_K(k) = ∫_{k−1}^{k} f_X(x) dx = 1/(b − a) for k = a + 1, a + 2, ..., b, and 0 otherwise. (4.38)

This expression for P_K(k) conforms to Definition 3.8 of a discrete uniform (a + 1, b) PMF.
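Theorem 4.7 is easy to test by simulation: round uniform samples up to the nearest integer and tabulate the results. The sketch below is a Python stand-in for the book's MATLAB, with illustrative variable names and an arbitrary choice of a = 0, b = 4.

```python
import math
import random
from collections import Counter

a, b, n = 0, 4, 100_000
rng = random.Random(7)
# K = ceil(X) for X uniform on (a, b); Theorem 4.7 says K is
# discrete uniform on a+1, ..., b, each value with probability 1/(b-a).
counts = Counter(math.ceil(rng.uniform(a, b)) for _ in range(n))
probs = {k: counts[k] / n for k in range(a + 1, b + 1)}
```

Each of the four empirical probabilities should be close to 1/(b − a) = 1/4.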

The continuous relatives of the family of geometric random variables, Definition 3.5, are the members of the family of exponential random variables.


Definition 4.6    Exponential Random Variable

X is an exponential (λ) random variable if the PDF of X is

f_X(x) = λ e^{−λx} for x ≥ 0, and 0 otherwise,

where the parameter λ > 0.

Example 4.12
The probability that a telephone call lasts no more than t minutes is often modeled as an exponential CDF:

F_T(t) = 1 − e^{−t/3} for t ≥ 0, and 0 otherwise.

What is the PDF of the duration in minutes of a telephone conversation? What is the probability that a conversation will last between 2 and 4 minutes?

We find the PDF of T by taking the derivative of the CDF:

f_T(t) = dF_T(t)/dt = (1/3)e^{−t/3} for t ≥ 0, and 0 otherwise.

From Definition 4.6, we recognize that T is an exponential (λ = 1/3) random variable. The probability that a call lasts between 2 and 4 minutes is

P[2 < T ≤ 4] = F_T(4) − F_T(2) = e^{−2/3} − e^{−4/3} = 0.250. (4.39)

Example 4.13
In Example 4.12, what is E[T], the expected duration of a telephone call? What are the variance and standard deviation of T? What is the probability that a call duration is within 1 standard deviation of the expected call duration?

Using the PDF f_T(t) in Example 4.12, we calculate the expected duration of a call:

E[T] = ∫_{−∞}^{∞} t f_T(t) dt = ∫_0^∞ t (1/3) e^{−t/3} dt. (4.40)

Integration by parts (Appendix B, Math Fact B.10) yields

E[T] = [−t e^{−t/3}]_0^∞ + ∫_0^∞ e^{−t/3} dt = 3 minutes. (4.41)

To calculate the variance, we begin with the second moment of T:

E[T²] = ∫_{−∞}^{∞} t² f_T(t) dt = ∫_0^∞ t² (1/3) e^{−t/3} dt. (4.42)

Again integrating by parts, we have

E[T²] = [−t² e^{−t/3}]_0^∞ + ∫_0^∞ (2t) e^{−t/3} dt = 2 ∫_0^∞ t e^{−t/3} dt. (4.43)

With the knowledge that E[T] = 3, we observe that ∫_0^∞ t e^{−t/3} dt = 3 E[T] = 9. Thus E[T²] = 6 E[T] = 18 and

Var[T] = E[T²] − (E[T])² = 18 − 3² = 9 minutes². (4.44)

The standard deviation is σ_T = √Var[T] = 3 minutes. The probability that the call duration is within 1 standard deviation of the expected value is

P[0 ≤ T ≤ 6] = F_T(6) − F_T(0) = 1 − e^{−2} = 0.865. (4.45)
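The probabilities in Examples 4.12 and 4.13 follow directly from the exponential CDF, so they are easy to reproduce programmatically. A Python sketch (the book's own computational examples use MATLAB; the function name here is illustrative):

```python
import math

def exponential_cdf(t, lam):
    """CDF of an exponential (lambda) random variable (Theorem 4.8)."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0

lam = 1 / 3                     # E[T] = 1/lam = 3 minutes
# Example 4.12: a call lasting between 2 and 4 minutes.
p_2_to_4 = exponential_cdf(4, lam) - exponential_cdf(2, lam)
# Example 4.13: duration within one standard deviation (0 to 6 minutes).
p_within_sigma = exponential_cdf(6, lam) - exponential_cdf(0, lam)
```

The second probability evaluates to 1 − e⁻², the 0.865 figure in Equation (4.45).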

To derive general expressions for the CDF, the expected value, and the variance of an exponential random variable, we apply Theorem 4.2(b), Theorem 4.4, and Theorem 4.5 to the exponential PDF in Definition 4.6.

Theorem 4.8

If X is an exponential (λ) random variable,

The CDF of X is F_X(x) = 1 − e^{−λx} for x ≥ 0, and 0 otherwise.
The expected value of X is E[X] = 1/λ.
The variance of X is Var[X] = 1/λ².

The following theorem shows the relationship between the family of exponential random variables and the family of geometric random variables.

Theorem 4.9
If X is an exponential (λ) random variable, then K = ⌈X⌉ is a geometric (p) random variable with p = 1 − e^{−λ}.

Proof As in the Theorem 4.7 proof, the definition of K implies P_K(k) = P[k − 1 < X ≤ k]. Referring to the CDF of X in Theorem 4.8, we observe

P_K(k) = F_X(k) − F_X(k − 1) = e^{−λ(k−1)} − e^{−λk} = (e^{−λ})^{k−1}(1 − e^{−λ}) for k = 1, 2, ..., and 0 otherwise. (4.46)

If we let p = 1 − e^{−λ}, we have

P_K(k) = (1 − p)^{k−1} p for k = 1, 2, ..., (4.47)

which conforms to Definition 3.5 of a geometric (p) random variable with p = 1 − e^{−λ}.
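Theorem 4.9 can also be checked by simulation: generate exponential samples, take ceilings, and compare the empirical PMF with the geometric PMF. A Python sketch standing in for the book's MATLAB, with illustrative names:

```python
import math
import random
from collections import Counter

lam = 1 / 3
p = 1 - math.exp(-lam)          # geometric parameter from Theorem 4.9
n = 100_000
rng = random.Random(3)
# K = ceil(X) for X exponential (lam); tabulate the empirical PMF.
counts = Counter(math.ceil(rng.expovariate(lam)) for _ in range(n))
pmf_estimate = {k: counts[k] / n for k in (1, 2, 3)}
# The geometric PMF predicts P[K = k] = (1 - p)^(k-1) * p.
```

With this many samples, the empirical probabilities track the geometric PMF to within about one percentage point.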

Example 4.14
Phone company A charges $0.15 per minute for telephone calls. For any fraction of a minute at the end of a call, they charge for a full minute. Phone company B also charges $0.15 per minute. However, phone company B calculates its charge based on the exact duration of a call. If T, the duration of a call in minutes, is an exponential (λ = 1/3) random variable, what are the expected revenues per call E[R_A] and E[R_B] for companies A and B?

Because T is an exponential random variable, we have in Theorem 4.8 (and in Example 4.13) E[T] = 1/λ = 3 minutes per call. Therefore, for phone company B, which charges for the exact duration of a call,

E[R_B] = 0.15 E[T] = $0.45 per call. (4.48)

Company A, by contrast, collects $0.15⌈T⌉ for a call of duration T minutes. Theorem 4.9 states that K = ⌈T⌉ is a geometric random variable with parameter p = 1 − e^{−1/3}. Therefore, the expected revenue for company A is

E[R_A] = 0.15 E[K] = 0.15/p = (0.15)(3.53) = $0.529 per call. (4.49)
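The two revenue calculations in Example 4.14 reduce to a few lines of arithmetic. A Python sketch (MATLAB in the book's own convention; variable names are illustrative):

```python
import math

rate = 0.15                     # dollars per minute
lam = 1 / 3
# Company B bills the exact duration: E[R_B] = rate * E[T] = rate / lam.
revenue_B = rate * (1 / lam)
# Company A rounds up: E[R_A] = rate * E[K] = rate / p, p = 1 - e^(-lam).
p = 1 - math.exp(-lam)
revenue_A = rate / p
```

Rounding up buys company A about 8 cents of extra expected revenue per call.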

In Theorem 9.9, we show that the sum of a set of independent identically distributed exponential random variables is an Erlang random variable.

Definition 4.7    Erlang Random Variable

X is an Erlang (n, λ) random variable if the PDF of X is

f_X(x) = λⁿ x^{n−1} e^{−λx} / (n − 1)! for x ≥ 0, and 0 otherwise,

where the parameter λ > 0, and the parameter n ≥ 1 is an integer.

The parameter n is often called the order of an Erlang random variable. Problem 4.5.16 outlines a procedure to verify that the integral of the Erlang PDF over all x is 1. The Erlang (n = 1, λ) random variable is identical to the exponential (λ) random variable. Just as the exponential (λ) random variable is related to the


Procedure | Observation | Parameter(s) | Probability model

Monitor customer arrivals at one-minute intervals | X is the first interval in which one or more customers arrive | p = 0.095 is the probability of one or more arrivals in a one-minute interval | X ~ Geometric (0.095), E[X] = 1/p = 10.5 intervals

Continuously monitor customer arrivals | T is the time that the first customer arrives | 1/λ = 10 minutes is the expected arrival time of the first customer | T ~ Exponential (0.1), E[T] = 10 minutes

Monitor customer arrivals at one-minute intervals | Y is the fifth interval with one or more arrivals | p = 0.095 is the probability of one or more arrivals in a one-minute interval | Y ~ Pascal (5, 0.095), E[Y] = 5/p = 52.5 intervals

Continuously monitor customer arrivals | V is the arrival time of the fifth customer | 1/λ = 10 minutes is the expected arrival time of the first customer | V ~ Erlang (5, 0.1), E[V] = 50 minutes

Monitor the arrival of customers for T = 50 minutes | N is the number of customers who arrive in T = 50 minutes | α = λT = 5 is the average number of arrivals in 50 minutes | N ~ Poisson (5), E[N] = 5 customers

Table 4.1 Five probability models all describing the same pattern of arrivals at the Phonesmart store. The expected arrival rate is λ = 0.1 customers/minute. When we monitor arrivals in discrete one-minute intervals, the probability we observe a nonempty interval (with one or more arrivals) is p = 1 − e^{−λ} = 0.095.

geometric (1 − e^{−λ}) random variable, the Erlang (n, λ) continuous random variable is related to the Pascal (n, 1 − e^{−λ}) discrete random variable.

Theorem 4.10

If X is an Erlang (n, λ) random variable,

(a) E[X] = n/λ,   (b) Var[X] = n/λ².

By comparing Theorem 4.8 and Theorem 4.10, we see for X, an Erlang (n, λ) random variable, and Y, an exponential (λ) random variable, that E[X] = n E[Y] and Var[X] = n Var[Y]. In the following theorem, we can also connect Erlang and Poisson random variables.


Theorem 4.11

Let K_α denote a Poisson (α) random variable. For any x > 0, the CDF of an Erlang (n, λ) random variable X satisfies

F_X(x) = 1 − F_{K_{λx}}(n − 1) = 1 − Σ_{k=0}^{n−1} (λx)^k e^{−λx} / k! for x ≥ 0, and 0 otherwise.

Problem 4.5.18 outlines a proof of Theorem 4.11. Theorem 4.11 states that the probability that the Erlang (n, λ) random variable is ≤ x is the probability that the Poisson (λx) random variable is ≥ n, because the sum in Theorem 4.11 is the CDF of the Poisson (λx) random variable evaluated at n − 1.

The mathematical relationships between the geometric, Pascal, exponential, Erlang, and Poisson random variables derive from the widely used Poisson process model for arrivals of customers at a service facility. Formal definitions and theorems for the Poisson process appear in Section 13.4. The arriving customers can be, for example, shoppers at the Phonesmart store, packets at an Internet router, or requests to a Web server. In this model, the number of customers that arrive in a T-minute time period is a Poisson (λT) random variable. Under continuous monitoring, the time that we wait for one arrival is an exponential (λ) random variable and the time we wait for n arrivals is an Erlang (n, λ) random variable. On the other hand, when we monitor arrivals in discrete one-minute intervals, the number of intervals we wait until we observe a nonempty interval (with one or more arrivals) is a geometric (p = 1 − e^{−λ}) random variable and the number of intervals we wait for n nonempty intervals is a Pascal (n, p) random variable. Table 4.1 summarizes these properties for experiments that monitor customer arrivals to the Phonesmart store.
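Theorem 4.11 can be verified numerically for the Table 4.1 parameters (n = 5, λ = 0.1, x = 50 minutes): integrating the Erlang PDF should match one minus the Poisson CDF evaluated at n − 1. A Python sketch with illustrative helper names (the book's examples use MATLAB):

```python
import math

def erlang_pdf(x, n, lam):
    """Erlang (n, lam) PDF from Definition 4.7."""
    return lam**n * x**(n - 1) * math.exp(-lam * x) / math.factorial(n - 1)

def erlang_cdf_numeric(x, n, lam, steps=50_000):
    """Midpoint-rule integral of the Erlang PDF from 0 to x."""
    h = x / steps
    return sum(erlang_pdf((k + 0.5) * h, n, lam) for k in range(steps)) * h

def erlang_cdf_poisson(x, n, lam):
    """Theorem 4.11: one minus the Poisson (lam*x) CDF evaluated at n - 1."""
    return 1.0 - sum((lam * x)**k * math.exp(-lam * x) / math.factorial(k)
                     for k in range(n))

cdf_num = erlang_cdf_numeric(50.0, 5, 0.1)
cdf_closed = erlang_cdf_poisson(50.0, 5, 0.1)
```

Both computations give P[V ≤ 50] ≈ 0.56: even though E[V] = 50 minutes, the fifth customer arrives within 50 minutes only slightly more than half the time.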

Quiz 4.5
Continuous random variable X has E[X] = 3 and Var[X] = 9. Find the PDF, f_X(x), if
(a) X is an exponential random variable,   (b) X is a continuous uniform random variable,
(c) X is an Erlang random variable.

4.6 Gaussian Random Variables

The family of Gaussian random variables appears in more practical applications than any other family. The graph of a Gaussian PDF is a bell-shaped curve.

Bell-shaped curves appear in many applications of probability theory. The probability models in these applications are members of the family of Gaussian random

Figure 4.5 Two examples of the PDF of a Gaussian random variable X with expected value μ and standard deviation σ: (a) μ = 2, σ = 1/2; (b) μ = 2, σ = 2.

variables. Chapter 9 contains a mathematical explanation for the prevalence of Gaussian random variables in models of practical phenomena. Because they occur so frequently in practice, Gaussian random variables are sometimes referred to as normal random variables.

Definition 4.8    Gaussian Random Variable

X is a Gaussian (μ, σ) random variable if the PDF of X is

f_X(x) = (1 / √(2πσ²)) e^{−(x−μ)² / (2σ²)},

where the parameter μ can be any real number and the parameter σ > 0.

Many statistics texts use the notation X is N[μ, σ²] as shorthand for X is a Gaussian (μ, σ) random variable. In this notation, the N denotes normal. The graph of f_X(x) has a bell shape, where the center of the bell is x = μ and σ reflects the width of the bell. If σ is small, the bell is narrow, with a high, pointy peak. If σ is large, the bell is wide, with a low, flat peak. (The height of the peak is 1/(σ√(2π)).) Figure 4.5 contains two examples of Gaussian PDFs with μ = 2. In Figure 4.5(a), σ = 0.5, and in Figure 4.5(b), σ = 2. Of course, the area under any Gaussian PDF is ∫_{−∞}^{∞} f_X(x) dx = 1. Furthermore, the parameters of the PDF are the expected value of X and the standard deviation of X.

Theorem 4.12
If X is a Gaussian (μ, σ) random variable,

E[X] = μ,   Var[X] = σ².

The proof of Theorem 4.12, as well as the proof that the area under a Gaussian PDF is 1, employs integration by parts and other calculus techniques. We leave them as an exercise for the reader in Problem 4.6.13.


It is impossible to express the integral of a Gaussian PDF between noninfinite limits as a function that appears on most scientific calculators. Instead, we usually find integrals of the Gaussian PDF by referring to tables, such as Table 4.2 (p. 143), that have been obtained by numerical integration. To learn how to use this table, we introduce the following important property of Gaussian random variables.

Theorem 4.13
If X is Gaussian (μ, σ), then Y = aX + b is Gaussian (aμ + b, aσ).

The theorem states that any linear transformation of a Gaussian random variable produces another Gaussian random variable. This theorem allows us to relate the properties of an arbitrary Gaussian random variable to the properties of a specific random variable.

Definition 4.9    Standard Normal Random Variable

The standard normal random variable Z is the Gaussian (0, 1) random variable.

Theorem 4.12 indicates that E[Z] = 0 and Var[Z] = 1. The tables that we use to find integrals of Gaussian PDFs contain values of F_Z(z), the CDF of Z. We introduce the special notation Φ(z) for this function.

Definition 4.10    Standard Normal CDF

The CDF of the standard normal random variable Z is

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−u²/2} du.

Given a table of values of Φ(z), we use the following theorem to find probabilities of a Gaussian random variable with parameters μ and σ.

Theorem 4.14
If X is a Gaussian (μ, σ) random variable, the CDF of X is

F_X(x) = Φ((x − μ)/σ).

The probability that X is in the interval (a, b] is

P[a < X ≤ b] = Φ((b − μ)/σ) − Φ((a − μ)/σ).

In using this theorem, we transform values of a Gaussian random variable, X, to equivalent values of the standard normal random variable, Z. For a sample value x of the random variable X, the corresponding sample value of Z is

z = (x − μ)/σ. (4.50)


Figure 4.6 Symmetry properties of the Gaussian (0, 1) PDF: (a) the shaded area under the PDF is Φ(z); (b) the shaded area on the left is Φ(−z) and the shaded area on the right is 1 − Φ(z).

Note that z is dimensionless. It represents x as a number of standard deviations relative to the expected value of X. Table 4.2 presents Φ(z) for 0 ≤ z ≤ 2.99. People working with probability and statistics spend a lot of time referring to tables like Table 4.2. It seems strange to us that Φ(z) isn't included in every scientific calculator. For many people, it is far more useful than many of the functions included in ordinary scientific calculators.

Example 4.15

Suppose your score on a test is x = 46, a sample value of the Gaussian (61, 10) random variable. Express your test score as a sample value of the standard normal random variable, Z.

Equation (4.50) indicates that z = (46 − 61)/10 = −1.5. Therefore your score is 1.5 standard deviations less than the expected value.

To find probabilities of Gaussian random variables, we use the values of Φ(z) presented in Table 4.2. Note that this table contains entries only for z ≥ 0. For negative values of z, we apply the following property of Φ(z).

Theorem 4.15

Φ(−z) = 1 − Φ(z).

Figure 4.6 displays the symmetry properties of Φ(z). Both graphs contain the standard normal PDF. In Figure 4.6(a), the shaded area under the PDF is Φ(z). Since the area under the PDF equals 1, the unshaded area under the PDF is 1 − Φ(z). In Figure 4.6(b), the shaded area on the right is 1 − Φ(z) and the shaded area on the left is Φ(−z). This graph demonstrates that Φ(−z) = 1 − Φ(z).

Example 4.16
If X is the Gaussian (61, 10) random variable, what is P[X ≤ 46]?

Applying Theorem 4.14, Theorem 4.15, and the result of Example 4.15, we have

P[X ≤ 46] = F_X(46) = Φ(−1.5) = 1 − Φ(1.5) = 1 − 0.933 = 0.067. (4.51)

This suggests that if your test score is 1.5 standard deviations below the expected value, you are in the lowest 6.7% of the population of test takers.
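Instead of interpolating in Table 4.2, Φ(z) can be computed from the error function via the standard identity Φ(z) = (1 + erf(z/√2))/2, though the book itself relies on tables. A Python sketch reproducing Example 4.16 (function names are illustrative):

```python
import math

def phi(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_cdf(x, mu, sigma):
    """Theorem 4.14: F_X(x) = Phi((x - mu) / sigma)."""
    return phi((x - mu) / sigma)

# Example 4.16: P[X <= 46] for the Gaussian (61, 10) random variable.
p_below_46 = gaussian_cdf(46, 61, 10)
```

This yields about 0.0668, matching the table-based 0.067 in Equation (4.51), and the same helper also reproduces the symmetry Φ(−z) = 1 − Φ(z) of Theorem 4.15.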

Example 4.17
If X is a Gaussian (μ = 61, σ = 10) random variable, what is P[51 < X ≤ 71]?

Applying Equation (4.50), Z = (X − 61)/10 and

{51 < X ≤ 71} = {−1 < (X − 61)/10 ≤ 1} = {−1 < Z ≤ 1}. (4.52)

The probability of this event is

P[−1 < Z ≤ 1] = Φ(1) − Φ(−1) = Φ(1) − [1 − Φ(1)] = 2Φ(1) − 1 = 0.683. (4.53)

The solution to Example 4.17 reflects the fact that in an experiment with a Gaussian probability model, 68.3% (about two thirds) of the outcomes are within 1 standard deviation of the expected value. About 95% (2Φ(2) − 1) of the outcomes are within two standard deviations of the expected value.

Tables of Φ(z) are useful for obtaining numerical values of integrals of a Gaussian PDF over intervals near the expected value. Regions farther than three standard deviations from the expected value (corresponding to |z| > 3) are in the tails of the PDF. When |z| > 3, Φ(z) is very close to one; for example, Φ(3) = 0.9987 and Φ(4) = 0.9999768. The properties of Φ(z) for extreme values of z are apparent in the standard normal complementary CDF.

Definition 4.11    Standard Normal Complementary CDF

The standard normal complementary CDF is

Q(z) = P[Z > z] = (1/√(2π)) ∫_z^∞ e^{−u²/2} du = 1 − Φ(z).

Although we may regard both Φ(3) = 0.9987 and Φ(4) = 0.9999768 as being very close to one, we see in Table 4.3 that Q(3) = 1.35·10^-3 is almost two orders of magnitude larger than Q(4) = 3.17·10^-5.
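Q(z) is likewise available through the complementary error function, Q(z) = erfc(z/√2)/2 — a standard identity rather than the book's table-based method. A Python sketch reproducing the Q(3) and Q(4) values quoted above:

```python
import math

def q(z):
    """Standard normal complementary CDF: Q(z) = erfc(z / sqrt(2)) / 2."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

q3 = q(3.0)     # about 1.35e-3
q4 = q(4.0)     # about 3.17e-5
```

For large z, erfc avoids the catastrophic cancellation that computing 1 − Φ(z) directly would suffer, which is why it is the preferred route in the tails.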

Example 4.18
In an optical fiber transmission system, the probability of a bit error is Q(√(r/2)), where r is the signal-to-noise ratio. What is the minimum value of r that produces a bit error rate not exceeding 10^-6?


z Φ(z) z Φ(z) z Φ(z) z Φ(z) z Φ(z) z Φ(z)

0.00 0.5000 0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.97725 2.50 0.99379
0.01 0.5040 0.51 0.6950 1.01 0.8438 1.51 0.9345 2.01 0.97778 2.51 0.99396
0.02 0.5080 0.52 0.6985 1.02 0.8461 1.52 0.9357 2.02 0.97831 2.52 0.99413
0.03 0.5120 0.53 0.7019 1.03 0.8485 1.53 0.9370 2.03 0.97882 2.53 0.99430
0.04 0.5160 0.54 0.7054 1.04 0.8508 1.54 0.9382 2.04 0.97932 2.54 0.99446
0.05 0.5199 0.55 0.7088 1.05 0.8531 1.55 0.9394 2.05 0.97982 2.55 0.99461
0.06 0.5239 0.56 0.7123 1.06 0.8554 1.56 0.9406 2.06 0.98030 2.56 0.99477
0.07 0.5279 0.57 0.7157 1.07 0.8577 1.57 0.9418 2.07 0.98077 2.57 0.99492
0.08 0.5319 0.58 0.7190 1.08 0.8599 1.58 0.9429 2.08 0.98124 2.58 0.99506
0.09 0.5359 0.59 0.7224 1.09 0.8621 1.59 0.9441 2.09 0.98169 2.59 0.99520
0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.98214 2.60 0.99534
0.11 0.5438 0.61 0.7291 1.11 0.8665 1.61 0.9463 2.11 0.98257 2.61 0.99547
0.12 0.5478 0.62 0.7324 1.12 0.8686 1.62 0.9474 2.12 0.98300 2.62 0.99560
0.13 0.5517 0.63 0.7357 1.13 0.8708 1.63 0.9484 2.13 0.98341 2.63 0.99573
0.14 0.5557 0.64 0.7389 1.14 0.8729 1.64 0.9495 2.14 0.98382 2.64 0.99585
0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.98422 2.65 0.99598
0.16 0.5636 0.66 0.7454 1.16 0.8770 1.66 0.9515 2.16 0.98461 2.66 0.99609
0.17 0.5675 0.67 0.7486 1.17 0.8790 1.67 0.9525 2.17 0.98500 2.67 0.99621
0.18 0.5714 0.68 0.7517 1.18 0.8810 1.68 0.9535 2.18 0.98537 2.68 0.99632
0.19 0.5753 0.69 0.7549 1.19 0.8830 1.69 0.9545 2.19 0.98574 2.69 0.99643
0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.98610 2.70 0.99653
0.21 0.5832 0.71 0.7611 1.21 0.8869 1.71 0.9564 2.21 0.98645 2.71 0.99664
0.22 0.5871 0.72 0.7642 1.22 0.8888 1.72 0.9573 2.22 0.98679 2.72 0.99674
0.23 0.5910 0.73 0.7673 1.23 0.8907 1.73 0.9582 2.23 0.98713 2.73 0.99683
0.24 0.5948 0.74 0.7704 1.24 0.8925 1.74 0.9591 2.24 0.98745 2.74 0.99693
0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.98778 2.75 0.99702
0.26 0.6026 0.76 0.7764 1.26 0.8962 1.76 0.9608 2.26 0.98809 2.76 0.99711
0.27 0.6064 0.77 0.7794 1.27 0.8980 1.77 0.9616 2.27 0.98840 2.77 0.99720
0.28 0.6103 0.78 0.7823 1.28 0.8997 1.78 0.9625 2.28 0.98870 2.78 0.99728
0.29 0.6141 0.79 0.7852 1.29 0.9015 1.79 0.9633 2.29 0.98899 2.79 0.99736
0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.98928 2.80 0.99744
0.31 0.6217 0.81 0.7910 1.31 0.9049 1.81 0.9649 2.31 0.98956 2.81 0.99752
0.32 0.6255 0.82 0.7939 1.32 0.9066 1.82 0.9656 2.32 0.98983 2.82 0.99760
0.33 0.6293 0.83 0.7967 1.33 0.9082 1.83 0.9664 2.33 0.99010 2.83 0.99767
0.34 0.6331 0.84 0.7995 1.34 0.9099 1.84 0.9671 2.34 0.99036 2.84 0.99774
0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.99061 2.85 0.99781
0.36 0.6406 0.86 0.8051 1.36 0.9131 1.86 0.9686 2.36 0.99086 2.86 0.99788
0.37 0.6443 0.87 0.8078 1.37 0.9147 1.87 0.9693 2.37 0.99111 2.87 0.99795
0.38 0.6480 0.88 0.8106 1.38 0.9162 1.88 0.9699 2.38 0.99134 2.88 0.99801
0.39 0.6517 0.89 0.8133 1.39 0.9177 1.89 0.9706 2.39 0.99158 2.89 0.99807
0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.99180 2.90 0.99813
0.41 0.6591 0.91 0.8186 1.41 0.9207 1.91 0.9719 2.4 1 0.99202 2.91 0.99819
0.42 0.6628 0.92 0.8212 1.42 0.9222 1.92 0.9726 2.42 0.99224 2.92 0.99825
0.43 0.6664 0.93 0.8238 1.43 0.9236 1.93 0.9732 2.43 0.99245 2.93 0.99831
0.44 0.6700 0.94 0.8264 1.44 0.9251 1.94 0.9738 2.44 0.99266 2.94 0.99836
0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.99286 2.95 0.99841
0.46 0.6772 0.96 0.8315 1.46 0.9279 1.96 0.9750 2.46 0.99305 2.96 0.99846
0.47 0.6808 0.97 0.8340 1.47 0.9292 1.97 0.9756 2.47 0.99324 2.97 0.99851
0.48 0.6844 0.98 0.8365 1.48 0.9306 1.98 0.9761 2.48 0.99343 2.98 0.99856
0.49 0.6879 0.99 0.8389 1.49 0.9319 1.99 0.9767 2.49 0.99361 2.99 0.99861

Table 4.2 The standard normal CDF Φ(z).



z Q(z) z Q(z) z Q(z) z Q(z) z Q(z)

3.00 1.35·10^-3 3.40 3.37·10^-4 3.80 7.23·10^-5 4.20 1.33·10^-5 4.60 2.11·10^-6
3.01 1.31·10^-3 3.41 3.25·10^-4 3.81 6.95·10^-5 4.21 1.28·10^-5 4.61 2.01·10^-6
3.02 1.26·10^-3 3.42 3.13·10^-4 3.82 6.67·10^-5 4.22 1.22·10^-5 4.62 1.92·10^-6
3.03 1.22·10^-3 3.43 3.02·10^-4 3.83 6.41·10^-5 4.23 1.17·10^-5 4.63 1.83·10^-6
3.04 1.18·10^-3 3.44 2.91·10^-4 3.84 6.15·10^-5 4.24 1.12·10^-5 4.64 1.74·10^-6
3.05 1.14·10^-3 3.45 2.80·10^-4 3.85 5.91·10^-5 4.25 1.07·10^-5 4.65 1.66·10^-6
3.06 1.11·10^-3 3.46 2.70·10^-4 3.86 5.67·10^-5 4.26 1.02·10^-5 4.66 1.58·10^-6
3.07 1.07·10^-3 3.47 2.60·10^-4 3.87 5.44·10^-5 4.27 9.77·10^-6 4.67 1.51·10^-6
3.08 1.04·10^-3 3.48 2.51·10^-4 3.88 5.22·10^-5 4.28 9.34·10^-6 4.68 1.43·10^-6
3.09 1.00·10^-3 3.49 2.42·10^-4 3.89 5.01·10^-5 4.29 8.93·10^-6 4.69 1.37·10^-6
3.10 9.68·10^-4 3.50 2.33·10^-4 3.90 4.81·10^-5 4.30 8.54·10^-6 4.70 1.30·10^-6
3.11 9.35·10^-4 3.51 2.24·10^-4 3.91 4.61·10^-5 4.31 8.16·10^-6 4.71 1.24·10^-6
3.12 9.04·10^-4 3.52 2.16·10^-4 3.92 4.43·10^-5 4.32 7.80·10^-6 4.72 1.18·10^-6
3.13 8.74·10^-4 3.53 2.08·10^-4 3.93 4.25·10^-5 4.33 7.46·10^-6 4.73 1.12·10^-6
3.14 8.45·10^-4 3.54 2.00·10^-4 3.94 4.07·10^-5 4.34 7.12·10^-6 4.74 1.07·10^-6
3.15 8.16·10^-4 3.55 1.93·10^-4 3.95 3.91·10^-5 4.35 6.81·10^-6 4.75 1.02·10^-6
3.16 7.89·10^-4 3.56 1.85·10^-4 3.96 3.75·10^-5 4.36 6.50·10^-6 4.76 9.68·10^-7
3.17 7.62·10^-4 3.57 1.78·10^-4 3.97 3.59·10^-5 4.37 6.21·10^-6 4.77 9.21·10^-7
3.18 7.36·10^-4 3.58 1.72·10^-4 3.98 3.45·10^-5 4.38 5.93·10^-6 4.78 8.76·10^-7
3.19 7.11·10^-4 3.59 1.65·10^-4 3.99 3.30·10^-5 4.39 5.67·10^-6 4.79 8.34·10^-7
3.20 6.87·10^-4 3.60 1.59·10^-4 4.00 3.17·10^-5 4.40 5.41·10^-6 4.80 7.93·10^-7
3.21 6.64·10^-4 3.61 1.53·10^-4 4.01 3.04·10^-5 4.41 5.17·10^-6 4.81 7.55·10^-7
3.22 6.41·10^-4 3.62 1.47·10^-4 4.02 2.91·10^-5 4.42 4.94·10^-6 4.82 7.18·10^-7
3.23 6.19·10^-4 3.63 1.42·10^-4 4.03 2.79·10^-5 4.43 4.71·10^-6 4.83 6.83·10^-7
3.24 5.98·10^-4 3.64 1.36·10^-4 4.04 2.67·10^-5 4.44 4.50·10^-6 4.84 6.49·10^-7
3.25 5.77·10^-4 3.65 1.31·10^-4 4.05 2.56·10^-5 4.45 4.29·10^-6 4.85 6.17·10^-7
3.26 5.57·10^-4 3.66 1.26·10^-4 4.06 2.45·10^-5 4.46 4.10·10^-6 4.86 5.87·10^-7
3.27 5.38·10^-4 3.67 1.21·10^-4 4.07 2.35·10^-5 4.47 3.91·10^-6 4.87 5.58·10^-7
3.28 5.19·10^-4 3.68 1.17·10^-4 4.08 2.25·10^-5 4.48 3.73·10^-6 4.88 5.30·10^-7
3.29 5.01·10^-4 3.69 1.12·10^-4 4.09 2.16·10^-5 4.49 3.56·10^-6 4.89 5.04·10^-7
3.30 4.83·10^-4 3.70 1.08·10^-4 4.10 2.07·10^-5 4.50 3.40·10^-6 4.90 4.79·10^-7
3.31 4.66·10^-4 3.71 1.04·10^-4 4.11 1.98·10^-5 4.51 3.24·10^-6 4.91 4.55·10^-7
3.32 4.50·10^-4 3.72 9.96·10^-5 4.12 1.89·10^-5 4.52 3.09·10^-6 4.92 4.33·10^-7
3.33 4.34·10^-4 3.73 9.57·10^-5 4.13 1.81·10^-5 4.53 2.95·10^-6 4.93 4.11·10^-7
3.34 4.19·10^-4 3.74 9.20·10^-5 4.14 1.74·10^-5 4.54 2.81·10^-6 4.94 3.91·10^-7
3.35 4.04·10^-4 3.75 8.84·10^-5 4.15 1.66·10^-5 4.55 2.68·10^-6 4.95 3.71·10^-7
3.36 3.90·10^-4 3.76 8.50·10^-5 4.16 1.59·10^-5 4.56 2.56·10^-6 4.96 3.52·10^-7
3.37 3.76·10^-4 3.77 8.16·10^-5 4.17 1.52·10^-5 4.57 2.44·10^-6 4.97 3.35·10^-7
3.38 3.62·10^-4 3.78 7.84·10^-5 4.18 1.46·10^-5 4.58 2.32·10^-6 4.98 3.18·10^-7
3.39 3.49·10^-4 3.79 7.53·10^-5 4.19 1.39·10^-5 4.59 2.22·10^-6 4.99 3.02·10^-7

Table 4.3 The standard normal complementary CDF Q(z).



Referring to Table 4.3, we find that Q(z) ≤ 10^-6 when z ≥ 4.75. Therefore, if √(r/2) ≥ 4.75, or r ≥ 45, the probability of error is less than 10^-6. Although 10^-6 seems a very small number, most practical optical fiber transmission systems have considerably lower binary error rates.
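The threshold in Example 4.18 can also be found by direct search instead of a table lookup. This Python sketch (a stand-in for the book's MATLAB) steps z upward until Q(z) ≤ 10^-6; it stops at z = 4.76, since Table 4.3 gives Q(4.75) = 1.02·10^-6, just above 10^-6, so r = 2z² ≈ 45.3, consistent with the example's r ≥ 45 after table rounding.

```python
import math

def q(z):
    """Standard normal complementary CDF."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Step z upward until the bit error probability Q(z) drops to 1e-6.
z = 0.0
while q(z) > 1e-6:
    z += 0.01
r_min = 2 * z * z       # since the error probability is Q(sqrt(r / 2))
```

A coarser step of 1 in r itself would give the same answer to within one unit of signal-to-noise ratio.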

Keep in mind that Q(z) is the probability that a Gaussian random variable ex-
ceeds its expected value by more than z standard deviations. We can observe from
Table 4.3 that Q(3) = 0.0013. This means that the probability that a Gaussian random
variable is more than three standard deviations above its expected value is approxi-
mately one in a thousand. In conversation we refer to the event {X − μ_X > 3σ_X} as
a three-sigma event. It is unlikely to occur. Table 4.3 indicates that the probability
of a 5σ event is on the order of 10⁻⁷.
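These tail values are easy to cross-check numerically. The sketch below (in Python rather than the text's MATLAB, purely as an illustration) evaluates Q(z) through the complementary error function, the same identity that Equation (4.74) later uses for Φ(x):

```python
import math

def Q(z):
    # Standard normal complementary CDF:
    # Q(z) = 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2))

print(Q(3))  # a three-sigma event: roughly 1.3e-3, about one in a thousand
print(Q(5))  # a five-sigma event: on the order of 1e-7
```

The printed values agree with the entries of Table 4.3 to the displayed precision.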

Quiz 4.6
X is the Gaussian (0, 1) random variable and Y is the Gaussian (0, 2) random
variable. Sketch the PDFs f_X(x) and f_Y(y) on the same axes and find:
(a) P[−1 < X < 1],    (b) P[−1 < Y < 1],
(c) P[X > 3.5],       (d) P[Y > 3.5].

4.7 Delta Functions, Mixed Random Variables

X is a mixed random variable if S_X has at least one sample value
with nonzero probability (like a discrete random variable) and also
has sample values that cover an interval (like a continuous random
variable). The PDF of a mixed random variable contains finite
nonzero values and delta functions multiplied by probabilities.

Thus far, our analysis of continuous random variables parallels our analysis of
discrete random variables in Chapter 3. Because of the different nature of discrete
and continuous random variables, we represent the probability model of a discrete
random variable as a PMF and we represent the probability model of a continuous
random variable as a PDF. These functions are important because they enable us
to calculate probabilities of events and parameters of probability models (such as
the expected value and the variance). Calculations containing a PMF involve sums.
The corresponding calculations for a PDF contain integrals.
In this section, we introduce the unit impulse function δ(x) as a mathematical
tool that unites the analyses of discrete and continuous random variables. The
unit impulse, often called the delta function, allows us to use the same formulas
to describe calculations with both types of random variables. It does not alter the
calculations, it just provides a new notation for describing them. This is especially
convenient when we refer to a mixed random variable, which has properties of both
continuous and discrete random variables.


Figure 4.7  As ε → 0, d_ε(x) approaches the delta function δ(x). For each ε, the area under
the curve of d_ε(x) equals 1.

The delta function is not completely respectable mathematically because it is
zero everywhere except at one point, and there it is infinite. Thus at its most inter-
esting point it has no numerical value at all. While δ(x) is somewhat disreputable,
it is extremely useful. There are various definitions of the delta function. All of
them share the key property presented in Theorem 4.16. Here is the definition
adopted in this book.
Definition 4.12    Unit Impulse (Delta) Function
Let

    d_ε(x) = { 1/ε   −ε/2 ≤ x ≤ ε/2,
             { 0     otherwise.

The unit impulse function is

    δ(x) = lim_{ε→0} d_ε(x).

The mathematical problem with Definition 4.12 is that d_ε(x) has no limit at x = 0.
As indicated in Figure 4.7, d_ε(0) just gets bigger and bigger as ε → 0. Although
this makes Definition 4.12 somewhat unsatisfactory, the useful properties of the
delta function are readily demonstrated when δ(x) is approximated by d_ε(x) for
very small ε. We now present some properties of the delta function. We state these
properties as theorems even though they are not theorems in the usual sense of this
text because we cannot prove them. Instead of theorem proofs, we refer to d_ε(x)
for small values of ε to indicate why the properties hold.
Although d_ε(0) blows up as ε → 0, the area under d_ε(x) is the integral

    ∫_{−∞}^{∞} d_ε(x) dx = ∫_{−ε/2}^{ε/2} (1/ε) dx = 1.    (4.54)

That is, the area under d_ε(x) is always 1, no matter how small the value of ε. We
conclude that the area under δ(x) is also 1:

    ∫_{−∞}^{∞} δ(x) dx = 1.    (4.55)


This result is a special case of the following property of the delta function.

Theorem 4.16
For any continuous function g(x),

    ∫_{−∞}^{∞} g(x) δ(x − x₀) dx = g(x₀).

Theorem 4.16 is often called the sifting property of the delta function. We can
see that Equation (4.55) is a special case of the sifting property for g(x) = 1 and
x₀ = 0. To understand Theorem 4.16, consider the integral

    ∫_{−∞}^{∞} g(x) d_ε(x − x₀) dx = (1/ε) ∫_{x₀−ε/2}^{x₀+ε/2} g(x) dx.    (4.56)

On the right side, we have the average value of g(x) over the interval [x₀ − ε/2, x₀ +
ε/2]. As ε → 0, this average value must converge to g(x₀).
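This limiting argument can be seen numerically. The following Python sketch (an illustration outside the text, which uses MATLAB) approximates the right side of Equation (4.56), the average of g over [x₀ − ε/2, x₀ + ε/2], and shows it approaching g(x₀) as ε shrinks; the test function cos(x) is an arbitrary choice:

```python
import math

def interval_average(g, x0, eps, steps=1000):
    # Midpoint-rule approximation of
    # (1/eps) * integral of g over [x0 - eps/2, x0 + eps/2]
    h = eps / steps
    total = sum(g(x0 - eps / 2 + (k + 0.5) * h) for k in range(steps))
    return total * h / eps

x0 = 1.0
for eps in (1.0, 0.1, 0.001):
    print(eps, interval_average(math.cos, x0, eps))  # tends to cos(1) ~ 0.5403
```

As ε decreases, the printed averages converge to g(x₀) = cos(1), exactly as the sifting property predicts.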
The delta function has a close connection to the unit step function.

Definition 4.13    Unit Step Function

The unit step function is

    u(x) = { 0   x < 0,
           { 1   x ≥ 0.

Theorem 4.17

    ∫_{−∞}^{x} δ(v) dv = u(x).

To understand Theorem 4.17, observe that for any x > 0, we can choose ε < 2x
so that

    ∫_{−∞}^{x} d_ε(v) dv = 1.    (4.57)

Thus for any x ≠ 0, in the limit as ε → 0, ∫_{−∞}^{x} d_ε(v) dv = u(x). Note that we
have not yet considered x = 0. In fact, it is not completely clear what the value
of ∫_{−∞}^{0} δ(v) dv should be. Reasonable arguments can be made for 0, 1/2, or 1.
We have adopted the convention that ∫_{−∞}^{0} δ(x) dx = 1. We will see that this is a
particularly convenient choice when we reexamine discrete random variables.


Theorem 4.17 allows us to write

    δ(x) = du(x)/dx.    (4.58)

Equation (4.58) embodies a certain kind of consistency in its inconsistency. That
is, δ(x) does not really exist at x = 0. Similarly, the derivative of u(x) does not
really exist at x = 0. However, Equation (4.58) allows us to use δ(x) to define a
generalized PDF that applies to discrete random variables as well as to continuous
random variables.
Consider the CDF of a discrete random variable, X. Recall that it is constant
everywhere except at points xᵢ ∈ S_X, where it has jumps of height P_X(xᵢ). Using
the definition of the unit step function, we can write the CDF of X as

    F_X(x) = Σ_{xᵢ ∈ S_X} P_X(xᵢ) u(x − xᵢ).    (4.59)

From Definition 4.3, we take the derivative of F_X(x) to find the PDF f_X(x). Refer-
ring to Equation (4.58), the PDF of the discrete random variable X is

    f_X(x) = Σ_{xᵢ ∈ S_X} P_X(xᵢ) δ(x − xᵢ).    (4.60)

When the PDF includes delta functions of the form δ(x − xᵢ), we say there is
an impulse at xᵢ. When we graph a PDF f_X(x) that contains an impulse at xᵢ, we
draw a vertical arrow labeled by the constant that multiplies the impulse. We draw
each arrow representing an impulse at the same height because the PDF is always
infinite at each such point. For example, the graph of f_X(x) from Equation (4.60)
is a set of labeled arrows at the points xᵢ ∈ S_X.
Using delta functions in the PDF, we can apply the formulas in this chapter
to all random variables. In the case of discrete random variables, these formulas
are equivalent to the ones presented in Chapter 3. For example, if X is a discrete
random variable, Definition 4.4 becomes

    E[X] = ∫_{−∞}^{∞} x Σ_{xᵢ ∈ S_X} P_X(xᵢ) δ(x − xᵢ) dx.    (4.61)

By writing the integral of the sum as a sum of integrals and using the sifting
property of the delta function,

    E[X] = Σ_{xᵢ ∈ S_X} ∫_{−∞}^{∞} x P_X(xᵢ) δ(x − xᵢ) dx = Σ_{xᵢ ∈ S_X} xᵢ P_X(xᵢ),    (4.62)


Figure 4.8  The PMF, CDF, and PDF of the discrete random variable Y.

which is Definition 3.13.

Example 4.19
Suppose Y takes on the values 1, 2, 3 with equal probability. The PMF and the corre-
sponding CDF of Y are

    P_Y(y) = { 1/3   y = 1, 2, 3,        F_Y(y) = { 0     y < 1,
             { 0     otherwise,                   { 1/3   1 ≤ y < 2,    (4.63)
                                                  { 2/3   2 ≤ y < 3,
                                                  { 1     y ≥ 3.

Using the unit step function u(y), we can write F_Y(y) more compactly as

    F_Y(y) = (1/3)u(y − 1) + (1/3)u(y − 2) + (1/3)u(y − 3).    (4.64)

The PDF of Y is

    f_Y(y) = dF_Y(y)/dy = (1/3)δ(y − 1) + (1/3)δ(y − 2) + (1/3)δ(y − 3).    (4.65)

We see that the discrete random variable Y can be represented graphically either by a
PMF P_Y(y) with bars at y = 1, 2, 3, by a CDF with jumps at y = 1, 2, 3, or by a PDF
f_Y(y) with impulses at y = 1, 2, 3. These three representations are shown in Figure 4.8.
The expected value of Y can be calculated either by summing over the PMF P_Y(y) or
integrating over the PDF f_Y(y). Using the PDF, we have

    E[Y] = ∫_{−∞}^{∞} y f_Y(y) dy
         = ∫_{−∞}^{∞} (y/3) δ(y − 1) dy + ∫_{−∞}^{∞} (y/3) δ(y − 2) dy + ∫_{−∞}^{∞} (y/3) δ(y − 3) dy
         = 1/3 + 2/3 + 1 = 2.    (4.66)
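Because the impulses only mark the jump locations and weights of the CDF, every quantity in Example 4.19 can be checked from the unit-step form of F_Y(y). A small Python sketch (the text's own examples use MATLAB; this is only an illustration):

```python
def u(x):
    # Unit step with the text's convention u(0) = 1
    return 1.0 if x >= 0 else 0.0

def F_Y(y):
    # Equation (4.64)
    return (u(y - 1) + u(y - 2) + u(y - 3)) / 3

# Each jump of the CDF reproduces a PMF value P_Y(y) = 1/3
jumps = {y: F_Y(y) - F_Y(y - 1e-9) for y in (1, 2, 3)}

# Sifting property: integrating y * f_Y(y) reduces to the sum of y_i * P_Y(y_i)
EY = sum(y * jumps[y] for y in (1, 2, 3))
print(jumps, EY)  # jumps of 1/3 each; EY = 2.0
```

The computed jumps recover the PMF, and the weighted sum reproduces E[Y] = 2 from Equation (4.66).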


When F_X(x) has a discontinuity at x, we use F_X(x⁺) and F_X(x⁻) to denote the
upper and lower limits at x. That is,

    F_X(x⁻) = lim_{h→0⁺} F_X(x − h),    F_X(x⁺) = lim_{h→0⁺} F_X(x + h).    (4.67)

Using this notation, we can say that if the CDF F_X(x) has a jump at x₀, then f_X(x)
has an impulse at x₀ weighted by the height of the discontinuity F_X(x₀⁺) − F_X(x₀⁻).

Example 4.20
For the random variable Y of Example 4.19,

    F_Y(2⁺) − F_Y(2⁻) = 2/3 − 1/3 = 1/3 = P_Y(2).    (4.68)

Theorem 4.18
For a random variable X, we have the following equivalent statements:
(a) P[X = x₀] = q        (b) P_X(x₀) = q
(c) F_X(x₀⁺) − F_X(x₀⁻) = q        (d) f_X(x₀) = qδ(0)

In Example 4.19, we saw that f_Y(y) consists of a series of impulses. The value
of f_Y(y) is either 0 or ∞. By contrast, the PDF of a continuous random variable
has nonzero, finite values over intervals of x. In the next example, we encounter a
random variable that has continuous parts and impulses.

Definition 4.14    Mixed Random Variable

X is a mixed random variable if and only if f_X(x) contains both impulses and
nonzero, finite values.

Example 4.21
Observe someone dialing a telephone and record the duration of the call. In a simple
model of the experiment, 1/3 of the calls never begin either because no one answers or
the line is busy. The duration of these calls is 0 minutes. Otherwise, with probability
2/3, a call duration is uniformly distributed between 0 and 3 minutes. Let Y denote
the call duration. Find the CDF F_Y(y), the PDF f_Y(y), and the expected value E[Y].

Let A denote the event that the phone was answered. P[A] = 2/3 and P[Aᶜ] = 1/3.
Since Y ≥ 0, we know that for y < 0, F_Y(y) = 0. Similarly, we know that for y > 3,
F_Y(y) = 1. For 0 ≤ y ≤ 3, we apply the law of total probability to write

    F_Y(y) = P[Y ≤ y] = P[Y ≤ y|Aᶜ] P[Aᶜ] + P[Y ≤ y|A] P[A].    (4.69)

When Aᶜ occurs, Y = 0, so that for 0 ≤ y ≤ 3, P[Y ≤ y|Aᶜ] = 1. When A
occurs, the call duration is uniformly distributed over [0, 3], so that for 0 ≤ y ≤ 3,
P[Y ≤ y|A] = y/3. So, for 0 ≤ y ≤ 3,

    F_Y(y) = (1/3)(1) + (2/3)(y/3) = 1/3 + 2y/9.    (4.70)

The complete CDF of Y is

    F_Y(y) = { 0             y < 0,
             { 1/3 + 2y/9   0 ≤ y < 3,
             { 1             y ≥ 3.

Consequently, the corresponding PDF f_Y(y) is

    f_Y(y) = { δ(y)/3 + 2/9   0 ≤ y ≤ 3,
             { 0              otherwise.

For the mixed random variable Y, it is easiest to calculate E[Y] using the PDF:

    E[Y] = ∫_{−∞}^{∞} y (δ(y)/3) dy + ∫₀³ (2/9) y dy = 0 + (2/9)(y²/2)|₀³ = 1 minute.    (4.71)
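A quick way to sanity-check a mixed model like this is simulation. The following Python sketch (an illustration, not part of the text) draws samples of Y and confirms that the empirical mean is near E[Y] = 1 and that the atom at zero has probability near 1/3:

```python
import random

random.seed(421)

def call_duration():
    # With probability 1/3 the call never begins (Y = 0);
    # otherwise Y is uniform on (0, 3) minutes, as in Example 4.21.
    if random.random() < 1 / 3:
        return 0.0
    return random.uniform(0, 3)

n = 100_000
samples = [call_duration() for _ in range(n)]
mean = sum(samples) / n
p_zero = sum(1 for y in samples if y == 0.0) / n
print(mean, p_zero)  # near 1.0 and near 1/3
```

The fraction of exact zeros estimates the impulse weight, while the sample mean estimates the integral in Equation (4.71).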

In Example 4.21, we see that with probability 1/3, Y resembles a discrete random
variable; otherwise, Y behaves like a continuous random variable. This behavior is
reflected in the impulse in the PDF of Y. In many practical applications of prob-
ability, mixed random variables arise as functions of continuous random variables.
Electronic circuits perform many of these functions. Example 6.8 in Section 6.3
gives one example.
Before going any further, we review what we have learned about random vari-
ables. For any random variable X,

X always has a CDF F_X(x) = P[X ≤ x].

If F_X(x) is piecewise flat with discontinuous jumps, then X is discrete.

If F_X(x) is a continuous function, then X is continuous.

If F_X(x) is a piecewise continuous function with discontinuities, then X is
mixed.

When X is discrete or mixed, the PDF f_X(x) contains one or more delta
functions.

Quiz 4.7
The cumulative distribution function of random variable X is

    F_X(x) = { 0            x < −1,
             { (x + 1)/4   −1 ≤ x < 1,    (4.72)
             { 1            x ≥ 1.

Sketch the CDF and find the following:
(a) P[X < 1]    (b) P[X ≤ 1]
(c) P[X = 1]    (d) the PDF f_X(x)

4.8 MATLAB

Built-in MATLAB functions, either alone or with additional code,
can be used to calculate PDFs and CDFs of several random variable
families. The rand and randn functions simulate experiments
that generate sample values of continuous uniform (0, 1) random
variables and Gaussian (0, 1) random variables, respectively.

Probability Functions
Table 4.4 describes MATLAB functions related to four families of continuous random
variables introduced in this chapter: uniform, exponential, Erlang, and Gaussian.
The functions calculate directly the CDFs and PDFs of uniform and exponential
random variables.

function F=erlangcdf(n,lambda,x)
F=1.0-poissoncdf(lambda*x,n-1);

For Erlang and Gaussian random variables,
the PDFs can be calculated directly but the CDFs require numerical integration. For Er-
lang random variables, erlangcdf uses Theorem 4.11. For the Gaussian CDF,
we use the built-in MATLAB error function

    erf(x) = (2/√π) ∫₀ˣ e^{−u²} du.    (4.73)

It is related to the Gaussian CDF by

    Φ(x) = 1/2 + (1/2) erf(x/√2),    (4.74)

which is how we implement the MATLAB function phi(x). In each function
description in Table 4.4, x denotes a vector x = [x₁ ⋯ x_m]′. The pdf function
output is a vector y such that yᵢ = f_X(xᵢ). The cdf function output is a vector y


Random Variable          MATLAB Function                Function Output
X Uniform (a, b)         y=uniformpdf(a,b,x)            yᵢ = f_X(xᵢ)
                         y=uniformcdf(a,b,x)            yᵢ = F_X(xᵢ)
                         x=uniformrv(a,b,m)             X = [X₁ ⋯ X_m]
X Exponential (λ)        y=exponentialpdf(lambda,x)     yᵢ = f_X(xᵢ)
                         y=exponentialcdf(lambda,x)     yᵢ = F_X(xᵢ)
                         x=exponentialrv(lambda,m)      X = [X₁ ⋯ X_m]
X Erlang (n, λ)          y=erlangpdf(n,lambda,x)        yᵢ = f_X(xᵢ)
                         y=erlangcdf(n,lambda,x)        yᵢ = F_X(xᵢ)
                         x=erlangrv(n,lambda,m)         X = [X₁ ⋯ X_m]
X Gaussian (μ, σ²)       y=gausspdf(mu,sigma,x)         yᵢ = f_X(xᵢ)
                         y=gausscdf(mu,sigma,x)         yᵢ = F_X(xᵢ)
                         x=gaussrv(mu,sigma,m)          X = [X₁ ⋯ X_m]

Table 4.4  MATLAB functions for continuous random variables.

such that yᵢ = F_X(xᵢ). The rv function output is a vector X = [X₁ ⋯ X_m]′
such that each Xᵢ is a sample value of the random variable X. If m = 1, then the
output is a single sample value of random variable X.
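Equation (4.74) is straightforward to implement outside MATLAB as well. As an illustrative sketch, here is a Python version of the same computation (the function name mirrors the text's phi(x); Python's math.erf plays the role of MATLAB's erf):

```python
import math

def phi(x):
    # Gaussian (0, 1) CDF via the error function, Equation (4.74)
    return 0.5 + 0.5 * math.erf(x / math.sqrt(2))

print(phi(0.0))      # 0.5 by symmetry
print(phi(1.0))      # about 0.8413
print(1 - phi(3.0))  # about 0.0013, matching Q(3) in the text
```

The complementary value 1 − Φ(3) reproduces the three-sigma probability discussed in Section 4.6.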

Random Samples
Now that we have introduced continuous random variables, we can say that the
built-in function y=rand(m,n) is MATLAB's approximation to a uniform (0, 1)
random variable. It is an approximation for two reasons. First, rand produces
pseudorandom numbers; the numbers seem random but are actually the output of
a deterministic algorithm. Second, rand produces a double precision floating point
number, represented in the computer by 64 bits. Thus MATLAB distinguishes no
more than 2⁶⁴ unique double precision floating point numbers. By comparison,
there are uncountably infinitely many real numbers in (0, 1). Even though rand is not
random and does not have a continuous range, we can for all practical purposes use
it as a source of independent sample values of the uniform (0, 1) random variable.
We have already employed the rand function to generate random samples of
uniform (0, 1) random variables. Conveniently, MATLAB also includes the built-in
function randn to generate random samples of standard normal random variables.

function x=gaussrv(mu,sigma,m)
x=mu+(sigma*randn(m,1));

Thus gaussrv generates Gaussian (μ, σ) ran-
dom variables by stretching and shifting stan-
dard normal random variables. For other con-
tinuous random variables, we use a technique described in Theorem 6.5 that trans-
forms a uniform (0, 1) random variable U into other types of random variables.
This is explained in the MATLAB section of Chapter 6.
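The stretch-and-shift idea in gaussrv is language independent. A Python sketch of the same transformation (using random.gauss in place of randn; the sample sizes and parameters here are illustrative, not from the text):

```python
import random

random.seed(1)

def gaussrv(mu, sigma, m):
    # Stretch and shift standard normal samples: X = mu + sigma * Z
    return [mu + sigma * random.gauss(0.0, 1.0) for _ in range(m)]

x = gaussrv(3.0, 2.0, 50_000)
mean = sum(x) / len(x)
var = sum((v - mean) ** 2 for v in x) / len(x)
print(mean, var)  # near mu = 3 and sigma^2 = 4
```

The sample mean and variance land close to the target μ and σ², confirming that the linear transformation of a standard normal yields a Gaussian (μ, σ) random variable.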


Quiz 4.8

Write a MATLAB function t=t2rv(m) that generates m samples of a random var-
iable with the PDF f_{T|T>2}(t) as given in Example 7.10.

Difficulty:  Easy   Moderate   Difficult   + Experts Only

4.2.1  The cumulative distribution function of random variable X is

    F_X(x) = { 0            x < −1,
             { (x + 1)/2   −1 ≤ x < 1,
             { 1            x ≥ 1.

(a) What is P[X > 1/2]?
(b) What is P[−1/2 < X ≤ 3/4]?
(c) What is P[|X| ≤ 1/2]?
(d) What is the value of a such that P[X ≤ a] = 0.8?

4.2.2  The CDF of the continuous random variable V is

    F_V(v) = { 0            v < −5,
             { c(v + 5)²   −5 ≤ v < 7,
             { 1            v ≥ 7.

(a) What is c?
(b) What is P[V > 4]?
(c) What is P[−3 < V ≤ 0]?
(d) What is the value of a such that P[V > a] = 2/3?

4.2.3  In this problem, we verify that lim_{n→∞} ⌈nx⌉/n = x.
(a) Verify that nx ≤ ⌈nx⌉ ≤ nx + 1.
(b) Use part (a) to show lim_{n→∞} ⌈nx⌉/n = x.
(c) Use a similar argument to show that lim_{n→∞} ⌊nx⌋/n = x.

4.2.4  The CDF of random variable W is

    F_W(w) = { 0                    w < −5,
             { (w + 5)/8           −5 ≤ w < −3,
             { 1/4                 −3 ≤ w < 3,
             { 1/4 + 3(w − 3)/8    3 ≤ w < 5,
             { 1                    w ≥ 5.

(a) What is P[W ≤ 4]?
(b) What is P[−2 < W ≤ 2]?
(c) What is P[W > 0]?
(d) What is the value of a such that P[W ≤ a] = 1/2?

4.3.1  The random variable X has probability density function

    f_X(x) = { cx   0 ≤ x ≤ 2,
             { 0    otherwise.

Use the PDF to find
(a) the constant c,
(b) P[0 ≤ X ≤ 1],
(c) P[−1/2 ≤ X ≤ 1/2],
(d) the CDF F_X(x).

4.3.2  The cumulative distribution function of random variable X is

    F_X(x) = { 0            x < −1,
             { (x + 1)/2   −1 ≤ x < 1,
             { 1            x ≥ 1.

Find the PDF f_X(x) of X.

4.3.3  Find the PDF f_U(u) of the random variable U in Problem 4.2.4.


4.3.4  For a constant parameter a > 0, a Rayleigh random variable X has PDF

    f_X(x) = { a²x e^{−a²x²/2}   x > 0,
             { 0                 otherwise.

What is the CDF of X?

4.3.5  Random variable X has a PDF of the form f_X(x) = (1/2)f₁(x) + (1/2)f₂(x), where

    f₁(x) = { c₁   0 ≤ x ≤ 2,        f₂(x) = { c₂e^{−x}   x ≥ 0,
            { 0    otherwise,                 { 0          otherwise.

What conditions must c₁ and c₂ satisfy so that f_X(x) is a valid PDF?

4.3.6  For constants a and b, random variable X has PDF

    f_X(x) = { ax² + bx   0 ≤ x ≤ 1,
             { 0          otherwise.

What conditions on a and b are necessary and sufficient to guarantee that f_X(x) is a valid PDF?

4.4.1  Random variable X has PDF

    f_X(x) = { 1/4   −1 ≤ x ≤ 3,
             { 0     otherwise.

Define the random variable Y by Y = h(X) = X².
(a) Find E[X] and Var[X].
(b) Find h(E[X]) and E[h(X)].
(c) Find E[Y] and Var[Y].

4.4.2  Let X be a continuous random variable with PDF

    f_X(x) = { 1/8   1 ≤ x ≤ 9,
             { 0     otherwise.

Let Y = h(X) = 1/√X.
(a) Find E[X] and Var[X].
(b) Find h(E[X]) and E[h(X)].
(c) Find E[Y] and Var[Y].

4.4.3  Random variable X has CDF

    F_X(x) = { 0     x < 0,
             { x/2   0 ≤ x ≤ 2,
             { 1     x > 2.

(a) What is E[X]?
(b) What is Var[X]?

4.4.4  The probability density function of random variable Y is

    f_Y(y) = { y/2   0 ≤ y < 2,
             { 0     otherwise.

What are E[Y] and Var[Y]?

4.4.5  The cumulative distribution function of the random variable Y is

    F_Y(y) = { 0            y < −1,
             { (y + 1)/2   −1 ≤ y < 1,
             { 1            y ≥ 1.

What are E[Y] and Var[Y]?

4.4.6  The cumulative distribution function of random variable V is

    F_V(v) = { 0                v < −5,
             { (v + 5)²/144    −5 ≤ v < 7,
             { 1                v ≥ 7.

(a) What are E[V] and Var[V]?
(b) What is E[V³]?

4.4.7  The cumulative distribution function of random variable U is

    F_U(u) = { 0                    u < −5,
             { (u + 5)/8           −5 ≤ u < −3,
             { 1/4                 −3 ≤ u < 3,
             { 1/4 + 3(u − 3)/8    3 ≤ u < 5,
             { 1                    u ≥ 5.

(a) What are E[U] and Var[U]?
(b) What is E[2^U]?


4.4.8  X is a Pareto (α, μ) random variable, as defined in Appendix A. What is
the largest value of n for which the nth moment E[Xⁿ] exists? For all feasible values
of n, find E[Xⁿ].

4.5.1  Y is a continuous uniform (1, 5) random variable.
(a) What is P[Y > E[Y]]?
(b) What is P[Y ≤ Var[Y]]?

4.5.2  The current Y across a 1 kΩ resistor is a continuous uniform (−10, 10) random
variable. Find P[|Y| < 3].

4.5.3  Radars detect flying objects by measuring the power reflected from them. The
reflected power of an aircraft can be modeled as a random variable Y with PDF

    f_Y(y) = { (1/P₀) e^{−y/P₀}   y ≥ 0,
             { 0                  otherwise,

where P₀ > 0 is some constant. The aircraft is correctly identified by the radar if the
reflected power of the aircraft is larger than its average value. What is the probability
P[C] that an aircraft is correctly identified?

4.5.4  Y is an exponential random variable with variance Var[Y] = 25.
(a) What is the PDF of Y?
(b) What is E[Y]?
(c) What is P[Y > 5]?

4.5.5  The time delay Y (in milliseconds) that your computer needs to connect to an
access point is an exponential random variable.
(a) Find P[Y > E[Y]].
(b) Find P[Y > 2E[Y]].

4.5.6  X is an Erlang (n, λ) random variable with parameter λ = 1/3 and expected
value E[X] = 15.
(a) What is the value of the parameter n?
(b) What is the PDF of X?
(c) What is Var[X]?

4.5.7  Y is an Erlang (n = 2, λ = 2) random variable.
(a) What is E[Y]?
(b) What is Var[Y]?
(c) What is P[0.5 ≤ Y < 1.5]?

4.5.8  U is a zero mean continuous uniform random variable. What is P[U² < Var[U]]?

4.5.9  U is a continuous uniform random variable such that E[U] = 10 and
P[U > 12] = 1/4. What is P[U < 9]?

4.5.10  X is a continuous uniform (−5, 5) random variable.
(a) What is the PDF f_X(x)?
(b) What is the CDF F_X(x)?
(c) What is E[X]?
(d) What is E[X⁵]?
(e) What is E[e^X]?

4.5.11  X is a continuous uniform (−a, a) random variable. Find P[|X| < Var[X]].

4.5.12  X is a uniform random variable with expected value μ_X = 7 and variance
Var[X] = 3. What is the PDF of X?

4.5.13  The probability density function of random variable X is

    f_X(x) = { (1/2)e^{−x/2}   x ≥ 0,
             { 0               otherwise.

(a) What is P[1 ≤ X ≤ 2]?
(b) What is F_X(x), the cumulative distribution function of X?
(c) What is E[X], the expected value of X?
(d) What is Var[X], the variance of X?

4.5.14  Verify parts (b) and (c) of Theorem 4.6 by directly calculating the expected
value and variance of a uniform random variable with parameters a < b.

4.5.15  Long-distance calling plan A offers flat-rate service at 10 cents per minute.
Calling plan B charges 99 cents for every call under 20 minutes; for calls over 20 min-
utes, the charge is 99 cents for the first 20 minutes plus 10 cents for every additional


minute. (Note that these plans measure your call duration exactly, without round-
ing to the next minute or even second.) If your long-distance calls have exponential
distribution with expected value τ minutes, which plan offers a lower expected cost per
call?

4.5.16  In this problem we verify that an Erlang (n, λ) PDF integrates to 1. Let the
integral of the nth order Erlang PDF be denoted by

    Iₙ = ∫₀^∞ λⁿ xⁿ⁻¹ e^{−λx} / (n − 1)!  dx.

First, show directly that the Erlang PDF with n = 1 integrates to 1 by verifying that
I₁ = 1. Second, use integration by parts (Appendix B, Math Fact B.10) to show that
Iₙ = Iₙ₋₁.

4.5.17  Calculate the kth moment E[X^k] of an Erlang (n, λ) random variable X. Use
your result to verify Theorem 4.10. Hint: Remember that the Erlang (n + k, λ) PDF
integrates to 1.

4.5.18  In this problem, we outline the proof of Theorem 4.11.
(a) Let Xₙ denote an Erlang (n, λ) random variable. Use the definition of the Er-
lang PDF to show that for any x ≥ 0,

    F_{Xₙ}(x) = ∫₀ˣ λⁿ tⁿ⁻¹ e^{−λt} / (n − 1)!  dt.

(b) Apply integration by parts (see Appendix B, Math Fact B.10) to this in-
tegral to show that for x ≥ 0,

    F_{Xₙ}(x) = F_{Xₙ₋₁}(x) − (λx)ⁿ⁻¹ e^{−λx} / (n − 1)!.

(c) Use the fact that F_{X₁}(x) = 1 − e^{−λx} for x ≥ 0 to verify the claim of Theo-
rem 4.11.

4.5.19  Prove by induction that an exponential random variable X with expected
value 1/λ has nth moment

    E[Xⁿ] = n! / λⁿ.

Hint: Use integration by parts (Appendix B, Math Fact B.10).

4.5.20  This problem outlines the steps needed to show that a nonnegative contin-
uous random variable X has expected value

    E[X] = ∫₀^∞ [1 − F_X(x)] dx.

(a) For any r ≥ 0, show that

    r P[X > r] ≤ ∫_r^∞ x f_X(x) dx.

(b) Use part (a) to argue that if E[X] < ∞, then

    lim_{r→∞} r P[X > r] = 0.

(c) Now use integration by parts (Appendix B, Math Fact B.10) to evaluate

    ∫₀^∞ [1 − F_X(x)] dx.

4.6.1  The peak temperature T, as measured in degrees Fahrenheit, on a July
day in New Jersey is the Gaussian (85, 10) random variable. What is P[T > 100],
P[T < 60], and P[70 ≤ T ≤ 100]?

4.6.2  What is the PDF of Z, the standard normal random variable?

4.6.3  Find each probability.
(a) V is a Gaussian (μ = 0, σ = 2) random variable. Find P[V > 4].
(b) W is a Gaussian (μ = 2, σ = 5) random variable. What is P[W ≤ 2]?
(c) For a Gaussian (μ, σ = 2) random variable X, find P[X ≤ μ + 1].
(d) Y is a Gaussian (μ = 50, σ = 10) random variable. Calculate P[Y > 65].

4.6.4  In each of the following cases, Y is a Gaussian random variable. Find the ex-
pected value μ = E[Y].
(a) Y has standard deviation σ = 10 and P[Y ≤ 10] = 0.933.
(b) Y has standard deviation σ = 10 and P[Y ≤ 0] = 0.067.
(c) Y has standard deviation σ and P[Y ≤ 10] = 0.977. (Find μ as a func-
tion of σ.)
(d) P[Y > 5] = 1/2.

4.6.5  Your internal body temperature T in degrees Fahrenheit is a Gaussian (μ =
98.6, σ = 0.4) random variable. In terms of the Φ(·) function, find P[T > 100]. Does
this model seem reasonable?

4.6.6  The temperature T in this thermostatically controlled lecture hall is a Gaus-
sian random variable with expected value μ = 68 degrees Fahrenheit. In addition,
P[T < 66] = 0.1587. What is the variance of T?

4.6.7  X is a Gaussian random variable with E[X] = 0 and P[|X| ≤ 10] = 0.1.
What is the standard deviation σ_X?

4.6.8  A function commonly used in communications textbooks for the tail proba-
bilities of Gaussian random variables is the complementary error function, defined as

    erfc(z) = (2/√π) ∫_z^∞ e^{−x²} dx.

Show that

    Q(z) = (1/2) erfc(z/√2).

4.6.9  The peak temperature T, in degrees Fahrenheit, on a July day in Antarctica is
a Gaussian random variable with a variance of 225. With probability 1/2, the tem-
perature T exceeds −75 degrees. What is P[T > 0]? What is P[T < −100]?

4.6.10  A professor pays 25 cents for each blackboard error made in lecture to the stu-
dent who points out the error. In a career of n years filled with blackboard errors, the
total amount in dollars paid can be approximated by a Gaussian random variable Yₙ
with expected value 40n and variance 100n. What is the probability that Y₂₀ exceeds
1000? How many years n must the professor teach in order that P[Yₙ > 1000] > 0.99?

4.6.11  Suppose that out of 100 million men in the United States, 23,000 are at
least 7 feet tall. Suppose that the heights of U.S. men are independent Gaussian random
variables with an expected value of 5′10″. Let N equal the number of men who are
at least 7′6″ tall.
(a) Calculate σ_X, the standard deviation of the height of U.S. men.
(b) In terms of the Φ(·) function, what is the probability that a randomly chosen
man is at least 8 feet tall?
(c) What is the probability that no man alive in the United States today is at
least 7′6″ tall?
(d) What is E[N]?

4.6.12  In this problem, we verify that for x ≥ 0,

    Φ(x) = 1/2 + (1/2) erf(x/√2).

(a) Let Y have a Gaussian (0, 1/√2) distribution and show that

    F_Y(y) = ∫_{−∞}^{y} f_Y(u) du = 1/2 + (1/2) erf(y).

(b) Observe that Z = √2 Y is Gaussian (0, 1) and show that

    Φ(z) = F_Z(z) = F_Y(z/√2).

4.6.13  This problem outlines the steps needed to show that the Gaussian PDF in-
tegrates to unity. For a Gaussian (μ, σ) random variable W, we will show that

    I = ∫_{−∞}^{∞} f_W(w) dw = 1.


(a) Use the substitution x = (w − μ)/σ to show that

    I = (1/√(2π)) ∫_{−∞}^{∞} e^{−x²/2} dx.

(b) Show that

    I² = (1/2π) ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x²+y²)/2} dx dy.

(c) Change to polar coordinates to show that I² = 1.

4.6.14  At time t = 0, the price of a stock is a constant k dollars. At time t > 0 the
price of a stock is a Gaussian random variable X with E[X] = k and Var[X] = t. At
time t, a Call Option at Strike k has value V = (X − k)⁺, where the operator (·)⁺ is
defined as (z)⁺ = max(z, 0).
(a) Find the expected value E[V].
(b) Suppose you can buy the call option for d dollars at time t = 0. At time t, you
can sell the call for V dollars and earn a profit (or loss perhaps) of R = V − d
dollars. Let d₀ denote the value of d such that P[R > 0] = 1/2. Your strat-
egy is that you buy the option if d < d₀ so that your probability of a profit is
P[R > 0] > 1/2. Find d₀.
(c) Let d₁ denote the value of d such that E[R] = 0.01 × d. Now your strategy is
to buy the option if d < d₁ so that your expected return is at least one percent
of the option cost. Find d₁.
(d) Are the strategies "Buy the option if d < d₀" and "Buy the option if d < d₁"
reasonable strategies?

4.6.15  In mobile radio communications, the radio channel can vary randomly. In
particular, in communicating with a fixed transmitter power over a "Rayleigh fading"
channel, the receiver signal-to-noise ratio Y is an exponential random variable with ex-
pected value γ. Moreover, when Y = y, the probability of an error in decoding a trans-
mitted bit is Pₑ(y) = Q(√(2y)) where Q(·) is the standard normal complementary CDF.
The average probability of bit error, also known as the bit error rate or BER, is

    P̄ₑ = E[Pₑ(Y)] = ∫_{−∞}^{∞} Q(√(2y)) f_Y(y) dy.

Find a simple formula for the BER P̄ₑ as a function of the average SNR γ.

4.6.16  At time t = 0, the price of a stock is a constant k dollars. At some future time
t > 0, the price X of the stock is a uniform (k − t, k + t) random variable. At this time
t, a Put Option at Strike k (which is the right to sell the stock at price k) has value
(k − X)⁺ dollars where the operator (·)⁺ is defined as (z)⁺ = max(z, 0). Similarly
a Call Option at Strike k (the right to buy the stock at price k) at time t has value
(X − k)⁺.
(a) At time 0, you sell the put and receive d dollars. At time t, you purchase the
put for (k − X)⁺ dollars to cancel your position. Your gain is

    R = g_P(X) = d − (k − X)⁺.

Find the central moments E[R] and Var[R].
(b) In a short straddle, you sell the put for d dollars and you also sell the call
for d dollars. At a future time t > 0, you purchase the put for (k − X)⁺ dol-
lars and the call for (X − k)⁺ dollars to cancel both positions. Your gain
on the put is g_P(X) = d − (k − X)⁺ dollars and your gain on the call is
g_C(X) = d − (X − k)⁺ dollars. Your net gain is

    R′ = g_P(X) + g_C(X).

Find the expected value E[R′] and variance Var[R′].
(c) Explain why selling the straddle might be attractive compared to selling just
the put or just the call.

4.6.17  Continuing Problem 4.6.16, suppose you sell the straddle at time t = 0 and
liquidate your position at time t, generating a profit (or perhaps a loss) R′. Find the


PDF f_{R′}(r) of R′. Suppose d is sufficiently large that E[R′] > 0. Would you be inter-
ested in selling the short straddle? Are you getting something, namely E[R′] dollars, for
nothing?

4.7.1  Let X be a random variable with CDF

    F_X(x) = { 0              x < −1,
             { x/3 + 1/3     −1 ≤ x < 0,
             { x/3 + 2/3      0 ≤ x < 1,
             { 1              1 ≤ x.

Sketch the CDF and find
(a) P[X < −1] and P[X ≤ −1],
(b) P[X < 0] and P[X ≤ 0],
(c) P[0 < X ≤ 1] and P[0 ≤ X ≤ 1].

4.7.2  Let X be a random variable with CDF

    F_X(x) = { 0             x < −1,
             { x/4 + 1/2    −1 ≤ x < 1,
             { 1             1 ≤ x.

Sketch the CDF and find
(a) P[X < −1] and P[X ≤ −1].
(b) P[X < 0] and P[X ≤ 0].
(c) P[X > 1] and P[X ≥ 1].

4.7.3  For random variable X of Problem 4.7.2, find f_X(x), E[X], and Var[X].

4.7.4  X is a Bernoulli random variable with expected value p. What is the PDF f_X(x)?

4.7.5  X is a geometric random variable with expected value 1/p. What is the PDF
f_X(x)?

4.7.6  When you make a phone call, the line is busy with probability 0.2 and no
one answers with probability 0.3. The random variable X describes the conversation
time (in minutes) of a phone call that is answered. X is an exponential random var-
iable with E[X] = 3 minutes. Let the random variable W denote the conversation
time (in seconds) of all calls (W = 0 when the line is busy or there is no answer).
(a) What is F_W(w)?
(b) What is f_W(w)?
(c) What are E[W] and Var[W]?

4.7.7  For 80% of lectures, Professor X arrives on time and starts lecturing with delay
T = 0. When Professor X is late, the starting time delay T is uniformly distributed
between 0 and 300 seconds. Find the CDF and PDF of T.

4.7.8  With probability 0.7, the toss of an Olympic shot-putter travels D = 60 +
X feet, where X is an exponential random variable with expected value μ = 10. Oth-
erwise, with probability 0.3, a foul is committed by stepping outside of the shot-put
circle and we say D = 0. What are the CDF and PDF of random variable D?

4.7.9  For 70% of lectures, Professor Y arrives on time. When Professor Y is late,
the arrival time delay is a continuous random variable uniformly distributed from 0
to 10 minutes. Yet, as soon as Professor Y is 5 minutes late, all the students get
up and leave. (It is unknown if Professor Y still conducts the lecture.) If a lecture
starts when Professor Y arrives and always ends 80 minutes after the scheduled start-
ing time, what is the PDF of T, the length of time that the students observe a lecture?

4.8.1  Write a function y=quiz31rv(m) that produces m samples of random var-
iable Y defined in Quiz 4.2.

4.8.2  For the Gaussian (0, 1) complementary CDF Q(z), a useful numerical approx-
imation for z ≥ 0 is

    Q̂(z) = (Σ_{n=1}^{5} aₙ tⁿ) e^{−z²/2},

where

    t = 1/(1 + 0.231641888z),
    a₁ = 0.127414796,      a₂ = −0.142248368,    a₃ = 0.7107068705,
    a₄ = −0.7265760135,    a₅ = 0.5307027145.
To compare this approximation to Q(z), use MATLAB to graph

e(z) = (Q(z) − Q̂(z)) / Q(z).

4.8.3 Use exponentialrv.m and Theorem 4.9 to write a MATLAB function k=georv(p,m) that generates m samples of a geometric (p) random variable K. Compare the resulting algorithm to the technique employed in Problem 3.9.8.

4.8.4 Applying Equation (4.14) with x replaced by iΔ and dx replaced by Δ, we obtain

P[iΔ < X ≤ iΔ + Δ] = f_X(iΔ) Δ.

If we generate a large number n of samples of random variable X, let n_i denote the number of occurrences of the event

{iΔ < X ≤ (i + 1)Δ}.

We would expect that

lim_{n→∞} n_i/n = f_X(iΔ) Δ,

or equivalently,

lim_{n→∞} n_i/(nΔ) = f_X(iΔ).

Use MATLAB to confirm this with Δ = 0.01 for
(a) an exponential (λ = 1) random variable X and for i = 0, ..., 500;
(b) a Gaussian (3, 1) random variable X and for i = 0, ..., 600.
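The polynomial approximation of Problem 4.8.2 is easy to check numerically. Problems 4.8.2-4.8.4 ask for MATLAB; the sketch below uses Python instead, with the same coefficients, and compares Q̂(z) against the exact Q(z) = ½ erfc(z/√2).

```python
import math

# Coefficients a1..a5 from Problem 4.8.2
A = [0.127414796, -0.142248368, 0.7107068705, -0.7265760135, 0.5307027145]

def q_exact(z):
    """Standard normal complementary CDF, Q(z) = 0.5*erfc(z/sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def q_approx(z):
    """Polynomial-in-t approximation Q-hat(z) of Problem 4.8.2, for z >= 0."""
    t = 1.0 / (1.0 + 0.231641888 * z)
    poly = sum(a * t ** (n + 1) for n, a in enumerate(A))
    return poly * math.exp(-z * z / 2)

if __name__ == "__main__":
    for z in [0.0, 0.5, 1.0, 2.0, 4.0]:
        e = (q_exact(z) - q_approx(z)) / q_exact(z)
        print(f"z={z:.1f}  Q={q_exact(z):.6e}  Qhat={q_approx(z):.6e}  rel err={e:.2e}")
```

Running this shows the relative error e(z) staying below about 10⁻⁶ over the tabulated range, which is the behavior the problem asks you to graph.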

Multiple Random Variables

Chapter 3 and Chapter 4 analyze experiments in which an outcome is one number. Beginning with this chapter, we analyze experiments in which an outcome is a collection of numbers. Each number is a sample value of a random variable. The probability model for such an experiment contains the properties of the individual random variables and it also contains the relationships among the random variables. Chapter 3 considers only discrete random variables and Chapter 4 considers only continuous random variables. The present chapter considers all random variables because a high proportion of the definitions and theorems apply to both discrete and continuous random variables. However, just as with individual random variables, the details of numerical calculations depend on whether random variables are discrete or continuous. Consequently, we find that many formulas come in pairs. One formula, for discrete random variables, contains sums, and the other formula, for continuous random variables, contains integrals.
In this chapter, we consider experiments that produce a collection of random variables, X1, X2, ..., Xn, where n can be any integer. For most of this chapter, we study n = 2 random variables: X and Y. A pair of random variables is enough to show the important concepts and useful problem-solving techniques. Moreover, the definitions and theorems we introduce for X and Y generalize to n random variables. These generalized definitions appear near the end of this chapter in Section 5.10.
We also note that a pair of random variables X and Y is the same as the two-dimensional vector [X Y]′. Similarly, the random variables X1, ..., Xn can be written as the n-dimensional vector X = [X1 ··· Xn]′. Since the components of X are random variables, X is called a random vector. Thus this chapter begins our study of random vectors. This subject is continued in Chapter 8, which uses techniques of linear algebra to develop further the properties of random vectors.
We begin here with the definition of F_X,Y(x, y), the joint cumulative distribution function of two random variables, a generalization of the CDF introduced in


Section 3.4 and again in Section 4.2. The joint CDF is a complete probability model for any experiment that produces two random variables. However, it is not very useful for analyzing practical experiments. More useful models are P_X,Y(x, y), the joint probability mass function for two discrete random variables, presented in Sections 5.2 and 5.3, and f_X,Y(x, y), the joint probability density function of two continuous random variables, presented in Sections 5.4 and 5.5. Section 5.7 considers functions of two random variables and expectations. We extend the definition of independent events to define independent random variables. The subject of Section 5.9 is the special case in which X and Y are Gaussian.
Pairs of random variables appear in a wide variety of practical situations. An example is the strength of the signal at a cellular telephone base station receiver (Y) and the distance (X) of the telephone from the base station. Another example of two random variables that we encounter all the time in our research is the signal (X), emitted by a radio transmitter, and the corresponding signal (Y) that eventually arrives at a receiver. In practice we observe Y, but we really want to know X. Noise and distortion prevent us from observing X directly, and we use a probability model to estimate X.

Example 5.1
We would like to measure random variable X, but we instead observe

Y = X + Z.   (5.1)

The noise Z prevents us from perfectly observing X. In some settings, Z is an interfering signal. In the simplest setting, Z is just noise inside the circuitry of your measurement device that is unrelated to X. In this case, it is appropriate to assume that the signal and noise are independent; that is, the events X = x and Z = z are independent. This simple model produces three random variables, X, Y and Z, but any pair completely specifies the remaining random variable. Thus we will see that a probability model for the pair (X, Z) or for the pair (X, Y) will be sufficient to analyze experiments related to this system.

5.1 Joint Cumulative Distribution Function

The joint CDF F_X,Y(x, y) = P[X ≤ x, Y ≤ y] is a complete probability model for any pair of random variables X and Y.

In an experiment that produces one random variable, events are points or intervals on a line. In an experiment that leads to two random variables X and Y, each outcome (x, y) is a point in a plane and events are points or areas in the plane.
Just as the CDF of one random variable, F_X(x), is the probability of the interval to the left of x, the joint CDF F_X,Y(x, y) of two random variables is the probability of the area below and to the left of (x, y). This is the infinite region that includes the shaded area in Figure 5.1 and everything below and to the left of it.



Figure 5.1  The area {X ≤ x, Y ≤ y} of the (X, Y) plane, with corner point (x, y), corresponding to the joint cumulative distribution function F_X,Y(x, y).

Definition 5.1    Joint Cumulative Distribution Function (CDF)

The joint cumulative distribution function of random variables X and Y is

F_X,Y(x, y) = P[X ≤ x, Y ≤ y].

The joint CDF is a complete probability model. The notation is an extension of the notation convention adopted in Chapter 3. The subscripts of F, separated by a comma, are the names of the two random variables. Each name is an uppercase letter. We usually write the arguments of the function as the lowercase letters associated with the random variable names.
The joint CDF has properties that are direct consequences of the definition. For example, we note that the event {X ≤ x} suggests that Y can have any value so long as the condition on X is met. This corresponds to the joint event {X ≤ x, Y < ∞}. Therefore,

F_X(x) = P[X ≤ x] = P[X ≤ x, Y < ∞] = lim_{y→∞} F_X,Y(x, y) = F_X,Y(x, ∞).   (5.2)

We obtain a similar result when we consider the event {Y ≤ y}. The following theorem summarizes some basic properties of the joint CDF.

Theorem 5.1

For any pair of random variables, X, Y,
(a) 0 ≤ F_X,Y(x, y) ≤ 1,          (b) F_X,Y(∞, ∞) = 1,
(c) F_X(x) = F_X,Y(x, ∞),         (d) F_Y(y) = F_X,Y(∞, y),
(e) F_X,Y(x, −∞) = 0,             (f) F_X,Y(−∞, y) = 0,
(g) If x ≤ x₁ and y ≤ y₁, then F_X,Y(x, y) ≤ F_X,Y(x₁, y₁).

Although its definition is simple, we rarely use the joint CDF to study probability models. It is easier to work with a probability mass function when the random variables are discrete or with a probability density function if they are continuous. Consider the joint CDF in the following example.

Example 5.2
X years is the age of children entering first grade in a school. Y years is the age of children entering second grade. The joint CDF of X and Y is

F_X,Y(x, y) =
  0                 x < 5,
  0                 y < 6,
  (x − 5)(y − 6)    5 ≤ x < 6, 6 ≤ y < 7,
  y − 6             x ≥ 6, 6 ≤ y < 7,
  x − 5             5 ≤ x < 6, y ≥ 7,
  1                 otherwise.           (5.3)

Find F_X(x) and F_Y(y).

Using Theorem 5.1(c) and Theorem 5.1(d), we find

F_X(x) =                  F_Y(y) =
  0      x < 5,             0      y < 6,
  x − 5  5 ≤ x < 6,         y − 6  6 ≤ y < 7,   (5.4)
  1      x ≥ 6,             1      y ≥ 7.

Referring to Theorem 4.6, we see from Equation (5.4) that X is a continuous uniform (5, 6) random variable and Y is a continuous uniform (6, 7) random variable.

In this example, we need to refer to six different regions in the x, y plane and three different formulas to express a probability model as a joint CDF. Section 5.4 introduces the joint probability density function f_X,Y(x, y) as another representation of the probability model of a pair of random variables. For children's ages X and Y in Example 5.2, we will show in Example 5.6 that the CDF F_X,Y(x, y) implies that the joint PDF is the simple expression

f_X,Y(x, y) =
  1   5 ≤ x < 6, 6 ≤ y < 7,
  0   otherwise.             (5.5)

To get another idea of the complexity of using the joint CDF, try proving the following theorem, which expresses the probability that an outcome is in a rectangle in the X, Y plane in terms of the joint CDF.

Theorem 5.2

P[x₁ < X ≤ x₂, y₁ < Y ≤ y₂] = F_X,Y(x₂, y₂) − F_X,Y(x₂, y₁) − F_X,Y(x₁, y₂) + F_X,Y(x₁, y₁).

The steps needed to prove the theorem are outlined in Problem 5.1.5. The theorem says that to find the probability that an outcome is in a rectangle, it is necessary to evaluate the joint CDF at all four corners. When the probability of interest corresponds to a nonrectangular area, using the joint CDF is even more complex.
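Theorem 5.2 is easy to exercise numerically. The sketch below (in Python rather than the book's MATLAB; the function names are ours, not the text's) encodes the joint CDF of Example 5.2 and evaluates the four-corner formula for one rectangle. Because X and Y in that example are independent uniforms, the answer can be checked by multiplying interval lengths.

```python
def F(x, y):
    """Joint CDF of Example 5.2 (children's ages X, Y), Equation (5.3)."""
    if x < 5 or y < 6:
        return 0.0
    u = min(x - 5, 1.0)  # clamping reproduces the x >= 6 and y >= 7 cases
    v = min(y - 6, 1.0)
    return u * v

def rect_prob(x1, x2, y1, y2):
    """Theorem 5.2: probability of the rectangle (x1, x2] x (y1, y2]."""
    return F(x2, y2) - F(x2, y1) - F(x1, y2) + F(x1, y1)

# Independent uniforms: P[5.2 < X <= 5.5, 6.2 < Y <= 6.8] = 0.3 * 0.6 = 0.18
print(rect_prob(5.2, 5.5, 6.2, 6.8))
```

The four-corner evaluation agrees with the product 0.3 × 0.6 = 0.18, up to floating-point rounding.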

Quiz 5.1
Express the following extreme values of the joint CDF F_X,Y(x, y) as numbers or in terms of the CDFs F_X(x) and F_Y(y).
(a) F_X,Y(−∞, 2)    (b) F_X,Y(∞, ∞)
(c) F_X,Y(∞, y)     (d) F_X,Y(∞, −∞)

5.2 Joint Probability Mass Function

For discrete random variables X and Y, the joint PMF P_X,Y(x, y) is the probability that X = x and Y = y. It is a complete probability model for X and Y.

Corresponding to the PMF of a single discrete random variable, we have a probability mass function of two variables.

Definition 5.2    Joint Probability Mass Function (PMF)

The joint probability mass function of discrete random variables X and Y is

P_X,Y(x, y) = P[X = x, Y = y].

For a pair of discrete random variables, the joint PMF P_X,Y(x, y) is a complete probability model. For any pair of real numbers, the PMF is the probability of observing these numbers. The notation is consistent with that of the joint CDF. The uppercase subscripts of P, separated by a comma, are the names of the random variables. We usually write the arguments of the function as the lowercase letters associated with the random variable names. Corresponding to S_X, the range of a single discrete random variable, we use the notation S_X,Y to denote the set of possible values of the pair (X, Y). That is,

S_X,Y = {(x, y) | P_X,Y(x, y) > 0}.   (5.6)

Keep in mind that {X = x, Y = y} is an event in an experiment. That is, for this experiment, there is a set of observations that leads to both X = x and Y = y. For any x and y, we find P_X,Y(x, y) by summing the probabilities of all outcomes of the experiment for which X = x and Y = y.


There are various ways to represent a joint PMF. We use three of them in the following example: a graph, a list, and a table.

Example 5.3
Test two integrated circuits one after the other. On each test, the possible outcomes are a (accept) and r (reject). Assume that all circuits are acceptable with probability 0.9 and that the outcomes of successive tests are independent. Count the number of acceptable circuits X and count the number of successful tests Y before you observe the first reject. (If both tests are successful, let Y = 2.) Draw a tree diagram for the experiment and find the joint PMF P_X,Y(x, y).

The tree diagram has a first branch a (probability 0.9) or r (probability 0.1), each followed by a second test with the same two branches, giving four leaves:

aa: X = 2, Y = 2;   ar: X = 1, Y = 1;   ra: X = 1, Y = 0;   rr: X = 0, Y = 0.

The sample space of the experiment is

S = {aa, ar, ra, rr}.   (5.7)

Observing the tree diagram, we compute

P[aa] = 0.81,   P[ar] = 0.09,   (5.8)
P[ra] = 0.09,   P[rr] = 0.01.   (5.9)

Each outcome specifies a pair of values X and Y. Let g(s) be the function that transforms each outcome s in the sample space S into the pair of random variables (X, Y). Then

g(aa) = (2, 2),   g(ar) = (1, 1),   g(ra) = (1, 0),   g(rr) = (0, 0).   (5.10)

For each pair of values x, y, P_X,Y(x, y) is the sum of the probabilities of the outcomes for which X = x and Y = y. For example, P_X,Y(1, 1) = P[ar].

The joint PMF can be represented by the table

P_X,Y(x, y)   y = 0   y = 1   y = 2
x = 0         0.01    0       0
x = 1         0.09    0.09    0
x = 2         0       0       0.81

or as a set of labeled points in the x, y plane, where each point is a possible value (probability > 0) of the pair (x, y), or as a simple list:

P_X,Y(x, y) =
  0.81   x = 2, y = 2,
  0.09   x = 1, y = 1,
  0.09   x = 1, y = 0,
  0.01   x = 0, y = 0,
  0      otherwise.

Note that all of the probabilities add up to 1. This reflects the second axiom of probability (Section 1.3), which states P[S] = 1. Using the notation of random variables, we write this as

Σ_{x∈S_X} Σ_{y∈S_Y} P_X,Y(x, y) = 1.   (5.11)

As defined in Chapter 3, the range S_X is the set of all values of X with nonzero probability, and similarly for S_Y. It is easy to see the role of the first axiom of probability in the PMF: P_X,Y(x, y) ≥ 0 for all pairs x, y. The third axiom, which has to do with the union of mutually exclusive events, takes us to another important property of the joint PMF.
We represent an event B as a region in the X, Y plane. Figure 5.2 shows two examples of events. We would like to find the probability that the pair of random variables (X, Y) is in the set B. When (X, Y) ∈ B, we say the event B occurs. Moreover, we write P[B] as a shorthand for P[(X, Y) ∈ B]. The next theorem says that we can find P[B] by adding the probabilities of all points (x, y) that are in B.

Theorem 5.3
For discrete random variables X and Y and any set B in the X, Y plane, the probability of the event {(X, Y) ∈ B} is

P[B] = Σ_{(x,y)∈B} P_X,Y(x, y).

The following example uses Theorem 5.3.

Example 5.4
Continuing Example 5.3, find the probability of the event B that X, the number of acceptable circuits, equals Y, the number of tests before observing the first failure.

Mathematically, B is the event {X = Y}. The elements of B with nonzero probability are

B ∩ S_X,Y = {(0, 0), (1, 1), (2, 2)}.   (5.12)

Hence,

P[B] = P_X,Y(0, 0) + P_X,Y(1, 1) + P_X,Y(2, 2)
     = 0.01 + 0.09 + 0.81 = 0.91.   (5.13)

If we view x, y as the outcome of an experiment, then Theorem 5.3 simply says that to find the probability of an event, we sum over all the outcomes in that event. In essence, Theorem 5.3 is a restatement of Theorem 1.5 in terms of random variables X and Y and joint PMF P_X,Y(x, y).
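The enumeration in Examples 5.3 and 5.4 is mechanical enough to automate. A small Python sketch (the book itself uses MATLAB; the helper name is ours) stores the joint PMF of Example 5.3 as a dictionary and applies Theorem 5.3 to the event B = {X = Y}:

```python
# Joint PMF of Example 5.3, stored as {(x, y): probability}
pmf = {(0, 0): 0.01, (1, 0): 0.09, (1, 1): 0.09, (2, 2): 0.81}

def prob_event(pmf, in_B):
    """Theorem 5.3: sum the PMF over all sample points (x, y) in B."""
    return sum(p for (x, y), p in pmf.items() if in_B(x, y))

total = prob_event(pmf, lambda x, y: True)   # second axiom: should be 1
p_B = prob_event(pmf, lambda x, y: x == y)   # Example 5.4: P[X = Y]
print(total, p_B)  # approximately 1.0 and 0.91
```

Any event B is just a predicate on (x, y), so the same one-line sum handles the regions of Figure 5.2 as well.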


Figure 5.2  Subsets B of the (X, Y) plane: B = {X + Y ≤ 3} and B = {X² + Y² ≤ 9}. Points (X, Y) ∈ S_X,Y are marked by bullets.

Quiz 5.2
The joint PMF P_Q,G(q, g) for random variables Q and G is given in the following table:

P_Q,G(q, g)   g = 0   g = 1   g = 2   g = 3
q = 0         0.06    0.18    0.24    0.12
q = 1         0.04    0.12    0.16    0.08

Calculate the following probabilities:
(a) P[Q = 0]    (b) P[Q = G]
(c) P[G > 1]    (d) P[G > Q]

5.3 Marginal PMF

For discrete random variables, the marginal PMFs P_X(x) and P_Y(y) are probability models for the individual random variables X and Y, but they do not provide a complete probability model for the pair X, Y.

In an experiment that produces two random variables X and Y, it is always possible to consider one of the random variables, Y, and ignore the other one, X. In this case, we can use the methods of Chapter 3 to analyze the experiment and derive P_Y(y), which contains the probability model for the random variable of interest. On the other hand, if we have already analyzed the experiment to derive the joint PMF P_X,Y(x, y), it would be convenient to derive P_Y(y) from P_X,Y(x, y) without reexamining the details of the experiment.
To do so, we view x, y as the outcome of an experiment and observe that P_X,Y(x, y) is the probability of an outcome. Moreover, {Y = y} is an event, so that P_Y(y) = P[Y = y] is the probability of an event. Theorem 5.3 relates the probability of an event to the joint PMF. It implies that we can find P_Y(y) by summing P_X,Y(x, y) over all points in S_X,Y with the property Y = y. In the sum, y is a constant, and each term corresponds to a value of x ∈ S_X. Similarly, we can find P_X(x) by summing P_X,Y(x, y) over all points X, Y such that X = x. We state this mathematically in the next theorem.

Theorem 5.4
For discrete random variables X and Y with joint PMF P_X,Y(x, y),

P_X(x) = Σ_{y∈S_Y} P_X,Y(x, y),    P_Y(y) = Σ_{x∈S_X} P_X,Y(x, y).

Theorem 5.4 shows us how to obtain the probability model (PMF) of X, and the probability model of Y, given a probability model (joint PMF) of X and Y. When a random variable X is part of an experiment that produces two random variables, we sometimes refer to its PMF as a marginal probability mass function. This terminology comes from the matrix representation of the joint PMF. By adding rows and columns and writing the results in the margins, we obtain the marginal PMFs of X and Y. We illustrate this by reference to the experiment in Example 5.3.

Example 5.5
In Example 5.3, we found that X and Y have the joint PMF shown in this table:

P_X,Y(x, y)   y = 0   y = 1   y = 2
x = 0         0.01    0       0
x = 1         0.09    0.09    0
x = 2         0       0       0.81

Find the marginal PMFs for the random variables X and Y.

We note that both X and Y have range {0, 1, 2}. Theorem 5.4 gives

P_X(0) = Σ_{y=0}^{2} P_X,Y(0, y) = 0.01,    P_X(1) = Σ_{y=0}^{2} P_X,Y(1, y) = 0.18,   (5.14)
P_X(2) = Σ_{y=0}^{2} P_X,Y(2, y) = 0.81,    P_X(x) = 0 for x ≠ 0, 1, 2.                (5.15)

Referring to the table representation of P_X,Y(x, y), we observe that each value of P_X(x) is the result of adding all the entries in one row of the table. Similarly, the formula for the PMF of Y in Theorem 5.4, P_Y(y) = Σ_{x∈S_X} P_X,Y(x, y), is the sum of all the entries in one column of the table. We display P_X(x) and P_Y(y) by rewriting the table and placing the row sums and column sums in the margins.

P_X,Y(x, y)   y = 0   y = 1   y = 2   P_X(x)
x = 0         0.01    0       0       0.01
x = 1         0.09    0.09    0       0.18
x = 2         0       0       0.81    0.81
P_Y(y)        0.10    0.09    0.81

Thus the column in the right margin shows P_X(x) and the row in the bottom margin shows P_Y(y). Note that the sum of all the entries in the bottom margin is 1 and so is the sum of all the entries in the right margin. This is simply a verification of Theorem 3.1(b), which states that the PMF of any random variable must sum to 1.
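The row and column sums of Example 5.5 are easy to compute programmatically. A Python sketch of Theorem 5.4 (the book works in MATLAB; the function name here is ours):

```python
from collections import defaultdict

# Joint PMF of Example 5.5 as {(x, y): probability}
pmf = {(0, 0): 0.01, (1, 0): 0.09, (1, 1): 0.09, (2, 2): 0.81}

def marginals(pmf):
    """Theorem 5.4: marginal PMFs are row sums (P_X) and column sums (P_Y)."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in pmf.items():
        px[x] += p   # row sum    -> P_X(x)
        py[y] += p   # column sum -> P_Y(y)
    return dict(px), dict(py)

px, py = marginals(pmf)
print(px)  # P_X: 0.01, 0.18, 0.81 at x = 0, 1, 2 (up to float rounding)
print(py)  # P_Y: 0.10, 0.09, 0.81 at y = 0, 1, 2 (up to float rounding)
```

Each marginal sums to 1, the verification of Theorem 3.1(b) noted at the end of the example.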

Quiz 5.3
The probability mass function P_H,B(h, b) for the random variables H and B is given in the following table. Find the marginal PMFs P_H(h) and P_B(b).

P_H,B(h, b)   b = 0   b = 2   b = 4
h = −1        0       0.4     0.2
h = 0         0.1     0       0.1
h = 1         0.1     0.1     0

5.4 Joint Probability Density Function

The most useful probability model of continuous random variables X and Y is the joint PDF f_X,Y(x, y). It is a generalization of the PDF of a single random variable.

Definition 5.3    Joint Probability Density Function (PDF)

The joint PDF of the continuous random variables X and Y is a function f_X,Y(x, y) with the property

F_X,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_X,Y(u, v) dv du.

Given F_X,Y(x, y), Definition 5.3 implies that f_X,Y(x, y) is a derivative of the CDF.

Theorem 5.5

f_X,Y(x, y) = ∂²F_X,Y(x, y) / (∂x ∂y).

For a single random variable X, the PDF f_X(x) is a measure of probability per unit length. For two random variables X and Y, the joint PDF f_X,Y(x, y) measures probability per unit area. In particular, from the definition of the PDF,

P[x < X ≤ x + dx, y < Y ≤ y + dy] = f_X,Y(x, y) dx dy.   (5.17)

Definition 5.3 and Theorem 5.5 demonstrate that the joint CDF F_X,Y(x, y) and the joint PDF f_X,Y(x, y) represent the same probability model for random variables X


and Y. In the case of one random variable, we found in Chapter 4 that the PDF is typically more useful for problem solving. The advantage is even stronger for a pair of random variables.

Example 5.6
Use the joint CDF for children's ages X and Y given in Example 5.2 to derive the joint PDF presented in Equation (5.5).

Referring to Equation (5.3) for the joint CDF F_X,Y(x, y), we must evaluate the partial derivative ∂²F_X,Y(x, y)/∂x∂y for each of the six regions specified in Equation (5.3). However, ∂²F_X,Y(x, y)/∂x∂y is nonzero only if F_X,Y(x, y) is a function of both x and y. In this example, only the region {5 ≤ x < 6, 6 ≤ y < 7} meets this requirement. Over this region,

f_X,Y(x, y) = ∂²/(∂x∂y) [(x − 5)(y − 6)] = (∂/∂x)[x − 5] · (∂/∂y)[y − 6] = 1.   (5.18)

Over all other regions, the joint PDF f_X,Y(x, y) is zero.

Of course, not every function f_X,Y(x, y) is a joint PDF. Properties (e) and (f) of Theorem 5.1 for the CDF F_X,Y(x, y) imply corresponding properties for the PDF.

Theorem 5.6

A joint PDF f_X,Y(x, y) has the following properties corresponding to the first and second axioms of probability (see Section 1.3):
(a) f_X,Y(x, y) ≥ 0 for all (x, y);    (b) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_X,Y(x, y) dx dy = 1.

Given an experiment that produces a pair of continuous random variables X and Y, an event A corresponds to a region of the X, Y plane. The probability of A is the double integral of f_X,Y(x, y) over the region A of the X, Y plane.

Theorem 5.7

The probability that the continuous random variables (X, Y) are in A is

P[A] = ∫∫_A f_X,Y(x, y) dx dy.


Example 5.7
Random variables X and Y have joint PDF

f_X,Y(x, y) =
  c   0 ≤ x ≤ 5, 0 ≤ y ≤ 3,
  0   otherwise.             (5.19)

Find the constant c and P[A] = P[2 ≤ X < 3, 1 ≤ Y < 3].

The large rectangle in the diagram is the area of nonzero probability. Theorem 5.6 states that the integral of the joint PDF over this rectangle is 1:

∫_0^5 ∫_0^3 c dy dx = 15c.   (5.20)

Therefore, c = 1/15. The small dark rectangle in the diagram is the event A = {2 ≤ X < 3, 1 ≤ Y < 3}. P[A] is the integral of the PDF over this rectangle, which is

P[A] = ∫_2^3 ∫_1^3 (1/15) dv du = 2/15.   (5.21)

This probability model is an example of a pair of random variables uniformly distributed over a rectangle in the X, Y plane.
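Because (X, Y) of Example 5.7 is uniform over a rectangle, P[A] can also be estimated by simulation: draw points uniformly over the rectangle and count the fraction landing in A. This Python sketch (our own, not the text's MATLAB code) checks the answer 2/15:

```python
import random

random.seed(1)  # fixed seed for a reproducible estimate
n = 200_000

# (X, Y) uniform over [0, 5] x [0, 3], i.e. f(x, y) = 1/15 on the rectangle
hits = 0
for _ in range(n):
    x, y = random.uniform(0, 5), random.uniform(0, 3)
    if 2 <= x < 3 and 1 <= y < 3:   # the event A of Example 5.7
        hits += 1

print(hits / n)  # close to 2/15 = 0.1333...
```

With 200,000 samples the relative-frequency estimate typically lands within about 0.002 of 2/15, consistent with Equation (5.21).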

The following example derives the CDF of a pair of random variables that has a joint PDF that is easy to write mathematically. The purpose of the example is to introduce techniques for analyzing a more complex probability model than the one in Example 5.7. Typically, we extract interesting information from a model by integrating the PDF or a function of the PDF over some region in the X, Y plane. In performing this integration, the most difficult task is to identify the limits. The PDF in the example is very simple, just a constant over a triangle in the X, Y plane. However, to evaluate its integral over the region in Figure 5.1 we need to consider five different situations depending on the values of (x, y). The solution of the example demonstrates the point that the PDF is usually a more concise probability model that offers more insights into the nature of an experiment than the CDF.

Example 5.8
Find the joint CDF F_X,Y(x, y) when X and Y have joint PDF

f_X,Y(x, y) =
  2   0 ≤ y ≤ x ≤ 1,
  0   otherwise.       (5.22)

We can derive the joint CDF using Definition 5.3, in which we integrate the joint PDF f_X,Y(x, y) over the area shown in Figure 5.1. To perform the integration it is extremely useful to draw a diagram that clearly shows the area with nonzero probability and then to use the diagram to derive the limits of the integral in Definition 5.3.
The difficulty with this integral is that the nature of the region of integration depends critically on x and y. In this apparently simple example, there are five cases to consider! The five cases are shown in Figure 5.3:

(a) x < 0 or y < 0;  (b) 0 ≤ y ≤ x ≤ 1;  (c) 0 ≤ x < y, 0 ≤ x ≤ 1;  (d) 0 ≤ y ≤ 1, x > 1;  (e) x > 1 and y > 1.

Figure 5.3  Five cases for the CDF F_X,Y(x, y) of Example 5.8.

First, we note that with x < 0 or y < 0, the triangle is completely outside the region of integration, as shown in Figure 5.3a. Thus we have F_X,Y(x, y) = 0 if either x < 0 or y < 0. Another simple case arises when x > 1 and y > 1. In this case, we see in Figure 5.3e that the triangle is completely inside the region of integration, and we infer from Theorem 5.6 that F_X,Y(x, y) = 1. The other cases we must consider are more complicated. In each case, since f_X,Y(x, y) = 2 over the triangular region, the value of the integral is two times the indicated area. When (x, y) is inside the area of nonzero probability (Figure 5.3b), the integral is

F_X,Y(x, y) = ∫_0^y ∫_v^x 2 du dv = 2xy − y²   (Figure 5.3b).   (5.23)

In Figure 5.3c, (x, y) is above the triangle, and the integral is

F_X,Y(x, y) = ∫_0^x ∫_v^x 2 du dv = x²   (Figure 5.3c).   (5.24)

The remaining situation to consider is shown in Figure 5.3d, when (x, y) is to the right of the triangle of nonzero probability, in which case the integral is

F_X,Y(x, y) = ∫_0^y ∫_v^1 2 du dv = 2y − y²   (Figure 5.3d).   (5.25)

The resulting CDF, corresponding to the five cases of Figure 5.3, is

F_X,Y(x, y) =
  0          x < 0 or y < 0         (a),
  2xy − y²   0 ≤ y ≤ x ≤ 1          (b),
  x²         0 ≤ x < y, 0 ≤ x ≤ 1   (c),    (5.26)
  2y − y²    0 ≤ y ≤ 1, x > 1       (d),
  1          x > 1, y > 1           (e).

In Figure 5.4, the surface plot of F_X,Y(x, y) shows that cases (a) through (e) correspond to contours on the "hill" that is F_X,Y(x, y). In terms of visualizing the random variables, the surface plot of F_X,Y(x, y) is less instructive than the simple triangle characterizing the PDF f_X,Y(x, y).
Because the PDF in this example is f_X,Y(x, y) = 2 over (x, y) ∈ S_X,Y, each probability is just two times the area of the region shown in one of the diagrams (either a triangle or a trapezoid). You may want to apply some high school geometry to verify that the results obtained from the integrals are indeed twice the areas of the regions indicated. The approach taken in our solution, integrating over S_X,Y to obtain the CDF, works for any PDF.
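The five cases of Equation (5.26) can be checked numerically. In this Python sketch (ours, not the text's), two clamping steps collapse cases (c), (d), and (e) into the formula for case (b); the result is then compared against a simulation that uses the fact that (max(U₁, U₂), min(U₁, U₂)) of two independent uniform (0, 1) variables has exactly the density 2 on the triangle 0 ≤ y ≤ x ≤ 1.

```python
import random

def F(x, y):
    """Joint CDF (5.26) of Example 5.8: f = 2 on the triangle 0 <= y <= x <= 1."""
    if x < 0 or y < 0:
        return 0.0          # case (a)
    x = min(x, 1.0)         # case (d)/(e): the triangle ends at x = 1
    y = min(y, x)           # case (c): no probability above the line y = x
    return 2 * x * y - y * y  # case (b) formula now covers all remaining cases

# Simulation check at one point (x0, y0) inside the triangle
random.seed(2)
n = 100_000
x0, y0 = 0.7, 0.4
count = 0
for _ in range(n):
    u, v = random.random(), random.random()
    xs, ys = max(u, v), min(u, v)   # (max, min) pair lies in the triangle, f = 2
    if xs <= x0 and ys <= y0:
        count += 1

print(F(x0, y0), count / n)  # 2(0.7)(0.4) - 0.4^2 = 0.40; estimate is close
```

Evaluating F at points in each region reproduces all five formulas of (5.26), e.g. F(0.5, 2) = 0.25 = x² for case (c) and F(2, 0.5) = 0.75 = 2y − y² for case (d).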

In Example 5.8, it takes careful study to verify that F_X,Y(x, y) is a valid CDF that satisfies the properties of Theorem 5.1, or even that it is defined for all values x and y. Comparing the joint PDF with the joint CDF, we see that the PDF indicates clearly that X, Y occurs with equal probability in all areas of the same size in the triangular region 0 ≤ y ≤ x ≤ 1. The joint CDF completely hides this simple, important property of the probability model.
In the previous example, the triangular shape of the area of nonzero probability demanded our careful attention. In the next example, the area of nonzero probability is a rectangle. However, the area corresponding to the event of interest is more complicated.

Figure 5.4  A graph of the joint CDF F_X,Y(x, y) of Example 5.8.

Example 5.9
As in Example 5.7, random variables X and Y have joint PDF

f_X,Y(x, y) =
  1/15   0 ≤ x ≤ 5, 0 ≤ y ≤ 3,
  0      otherwise.

What is P[A] = P[Y > X]?

Applying Theorem 5.7, we integrate f_X,Y(x, y) over the part of the X, Y plane satisfying Y > X. In this case,

P[A] = ∫_0^3 ( ∫_x^3 (1/15) dy ) dx   (5.28)

     = ∫_0^3 (3 − x)/15 dx = −(3 − x)²/30 |_0^3 = 3/10.   (5.29)

In this example, it makes little difference whether we integrate first over y and then over x or the other way around. In general, however, an initial effort to decide the simplest way to integrate over a region can avoid a lot of complicated mathematical maneuvering in performing the integration.
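When the limits are awkward to set up by hand, a brute-force Riemann sum over the whole rectangle is a useful sanity check on an answer such as Equation (5.29). This Python sketch (ours, not the book's MATLAB) approximates the double integral of Theorem 5.7 over the region y > x:

```python
# Midpoint Riemann sum of f over the part of [0,5] x [0,3] where y > x
N = 600
dx, dy = 5 / N, 3 / N
f = 1 / 15                      # the uniform joint PDF of Example 5.9
total = 0.0
for i in range(N):
    x = (i + 0.5) * dx
    for j in range(N):
        y = (j + 0.5) * dy
        if y > x:               # the event A = {Y > X}
            total += f * dx * dy

print(total)  # approaches 3/10 as N grows
```

The sum converges to 3/10; the only error comes from grid cells cut by the boundary line y = x, and it shrinks like 1/N.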

Quiz 5.4
The joint probability density function of random variables X and Y is

f_X,Y(x, y) =
  cxy   0 ≤ x ≤ 1, 0 ≤ y ≤ 2,
  0     otherwise.

Find the constant c. What is the probability of the event A = {X² + Y² ≤ 1}?



5 .5 Marginal P OF

For contint1ous ra ndorr1 v ariables, t.he marginal PDFs f'x( x) and

j y(y) are probabilit}' rnodels for the individual random variables
X and y , but they do not provide a complet e probability model
for the pair X , Y.
Suppose we perform an experiment that produces a pair of random variables X
and Y with joint PDF f_{X,Y}(x, y). For certain purposes we may be interested only in
the random variable X. We can imagine that we ignore Y and observe only X. Since
X is a random variable, it has a PDF f_X(x). It should be apparent that there is
a relationship between f_X(x) and f_{X,Y}(x, y). In particular, if f_{X,Y}(x, y) completely
summarizes our knowledge of joint events of the form X = x, Y = y, then we should
be able to derive the PDFs of X and Y from f_{X,Y}(x, y). The situation parallels
(with integrals replacing sums) the relationship in Theorem 5.4 between the joint
PMF P_{X,Y}(x, y) and the marginal PMFs P_X(x) and P_Y(y). Therefore, we refer to
f_X(x) and f_Y(y) as the marginal probability density functions of f_{X,Y}(x, y).
Theorem 5.8
If X and Y are random variables with joint PDF f_{X,Y}(x, y),

    f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx.

Proof From the definition of the joint PDF, we can write

    F_X(x) = P[X \le x] = \int_{-\infty}^{x} \left( \int_{-\infty}^{\infty} f_{X,Y}(u, y) \, dy \right) du.

Taking the derivative of both sides with respect to x (which involves differentiating an
integral with variable limits), we obtain f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy. A similar argument
holds for f_Y(y).

Example 5.10
The joint PDF of X and Y is

    f_{X,Y}(x, y) = \begin{cases} 5y/4 & -1 \le x \le 1,\; x^2 \le y \le 1, \\ 0 & \text{otherwise}. \end{cases}   (5.32)

Find the marginal PDFs f_X(x) and f_Y(y).

We use Theorem 5.8 to find the marginal PDF f_X(x). In the figure that accompanies
Equation (5.33) below, the gray bowl-shaped region depicts those values of X and Y
for which f_{X,Y}(x, y) > 0. When x < -1 or when x > 1, f_{X,Y}(x, y) = 0, and therefore
f_X(x) = 0. For -1 \le x \le 1,

    f_X(x) = \int_{x^2}^{1} \frac{5y}{4} \, dy = \frac{5(1 - x^4)}{8}.   (5.33)

The complete expression for the marginal PDF of X is

    f_X(x) = \begin{cases} 5(1 - x^4)/8 & -1 \le x \le 1, \\ 0 & \text{otherwise}. \end{cases}
For the marginal PDF of Y, we note that for y < 0 or y > 1, f_Y(y) = 0. For
0 \le y \le 1, we integrate over the horizontal bar marked Y = y. The boundaries of the
bar are x = -\sqrt{y} and x = \sqrt{y}. Therefore, for 0 \le y \le 1,

    f_Y(y) = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{5y}{4} \, dx = \left. \frac{5y}{4} x \right|_{x=-\sqrt{y}}^{x=\sqrt{y}} = \frac{5 y^{3/2}}{2}.   (5.35)

The complete marginal PDF of Y is

    f_Y(y) = \begin{cases} (5/2) y^{3/2} & 0 \le y \le 1, \\ 0 & \text{otherwise}. \end{cases}
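The marginals of Example 5.10 can be sanity-checked numerically. This Python sketch (an illustration, not from the text, which derives the marginals analytically) integrates the joint PDF over the "other" variable with a midpoint Riemann sum and compares with the closed forms.

```python
# Numerical check of the marginal PDFs of Example 5.10 (a sketch).
# Joint PDF: f_XY(x, y) = 5y/4 on -1 <= x <= 1, x^2 <= y <= 1.

def f_joint(x, y):
    return 5.0 * y / 4.0 if (-1 <= x <= 1 and x * x <= y <= 1) else 0.0

def f_x(x, n=4000):
    """Integrate over y in [0, 1], which covers the support."""
    dy = 1.0 / n
    return sum(f_joint(x, (j + 0.5) * dy) * dy for j in range(n))

def f_y(y, n=4000):
    """Integrate over x in [-1, 1]."""
    dx = 2.0 / n
    return sum(f_joint(-1.0 + (i + 0.5) * dx, y) * dx for i in range(n))

# Compare with the closed forms 5(1 - x^4)/8 and (5/2) y^(3/2):
x0, y0 = 0.5, 0.25
print(f_x(x0), 5 * (1 - x0**4) / 8)   # both close to 0.5859...
print(f_y(y0), 2.5 * y0**1.5)         # both close to 0.3125
```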

Quiz 5.5
The joint probability density function of random variables X and Y is

    f_{X,Y}(x, y) = \begin{cases} 6(x + y^2)/5 & 0 \le x \le 1,\; 0 \le y \le 1, \\ 0 & \text{otherwise}. \end{cases}   (5.37)

Find f_X(x) and f_Y(y), the marginal PDFs of X and Y.

5.6 Independent Random Variables

Random variables X and Y are independent if and only if the
events {X = x} and {Y = y} are independent for all x, y in S_{X,Y}.
Discrete random variables X and Y are independent if and only if
P_{X,Y}(x, y) = P_X(x)P_Y(y). Continuous random variables X and Y
are independent if and only if f_{X,Y}(x, y) = f_X(x)f_Y(y).

Chapter 1 presents the concept of independent events. Definition 1.6 states that
events A and B are independent if and only if the probability of the intersection is
the product of the individual probabilities, P[AB] = P[A] P[B].

Applying the idea of independence to random variables, we say that X and Y are
independent random variables if and only if the events {X = x} and {Y = y} are
independent for all x ∈ S_X and all y ∈ S_Y. In terms of probability mass functions
and probability density functions, we have the following definition.

Definition 5.4 Independent Random Variables
Random variables X and Y are independent if and only if

Discrete: P_{X,Y}(x, y) = P_X(x) P_Y(y);

Continuous: f_{X,Y}(x, y) = f_X(x) f_Y(y).

Example 5.11
Are the children's ages X and Y in Example 5.2 independent?

In Example 5.2, we derived the CDFs F_X(x) and F_Y(y), which showed that X is uniform
(5, 6) and Y is uniform (6, 7). Thus X and Y have marginal PDFs

    f_X(x) = \begin{cases} 1 & 5 \le x \le 6, \\ 0 & \text{otherwise}, \end{cases} \qquad f_Y(y) = \begin{cases} 1 & 6 \le y \le 7, \\ 0 & \text{otherwise}. \end{cases}

Referring to Equation (5.5), we observe that f_{X,Y}(x, y) = f_X(x)f_Y(y). Thus X and
Y are independent.

Because Definition 5.4 is an equality of functions, it must be true for all values of
x and y.

Example 5.12
Random variables X and Y have joint PDF

    f_{X,Y}(x, y) = \begin{cases} 4xy & 0 \le x \le 1,\; 0 \le y \le 1, \\ 0 & \text{otherwise}. \end{cases}

Are X and Y independent?

The marginal PDFs of X and Y are

    f_X(x) = \begin{cases} 2x & 0 \le x \le 1, \\ 0 & \text{otherwise}, \end{cases} \qquad f_Y(y) = \begin{cases} 2y & 0 \le y \le 1, \\ 0 & \text{otherwise}. \end{cases}   (5.39)

It is easily verified that f_{X,Y}(x, y) = f_X(x)f_Y(y) for all pairs (x, y), and so we conclude
that X and Y are independent.

Example 5.13
Random variables U and V have joint PDF

    f_{U,V}(u, v) = \begin{cases} 24uv & u \ge 0,\; v \ge 0,\; u + v \le 1, \\ 0 & \text{otherwise}. \end{cases}   (5.40)

Are U and V independent?

Since f_{U,V}(u, v) looks similar in form to f_{X,Y}(x, y) in the previous example, we might
suppose that U and V can also be factored into marginal PDFs f_U(u) and f_V(v).
However, this is not the case. Owing to the triangular shape of the region of nonzero
probability, the marginal PDFs are

    f_U(u) = \begin{cases} 12u(1 - u)^2 & 0 \le u \le 1, \\ 0 & \text{otherwise}, \end{cases} \qquad f_V(v) = \begin{cases} 12v(1 - v)^2 & 0 \le v \le 1, \\ 0 & \text{otherwise}. \end{cases}

Clearly, U and V are not independent. Learning U changes our knowledge of V. For
example, learning U = 1/2 informs us that P[V \le 1/2] = 1.
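The contrast between Examples 5.12 and 5.13 can be illustrated with a short Python sketch (an illustration written for this discussion, not from the text): in the first case the joint PDF equals the product of the marginals at every point, while in the second a point with u + v > 1 has zero joint density but a nonzero product of marginals.

```python
# Checking the factorization test of Definition 5.4 at sample points.

def f_xy(x, y):          # Example 5.12
    return 4 * x * y if (0 <= x <= 1 and 0 <= y <= 1) else 0.0

def f_uv(u, v):          # Example 5.13
    return 24 * u * v if (u >= 0 and v >= 0 and u + v <= 1) else 0.0

def f_marg(t):           # marginal from Example 5.13: 12t(1 - t)^2
    return 12 * t * (1 - t) ** 2 if 0 <= t <= 1 else 0.0

x, y = 0.5, 0.75
print(f_xy(x, y), (2 * x) * (2 * y))   # equal: the joint PDF factors

u, v = 0.5, 0.75                        # here u + v > 1, so f_UV = 0 ...
print(f_uv(u, v), f_marg(u) * f_marg(v))  # ... but the product of marginals is not 0
```

A single point of disagreement is enough to rule out independence; establishing independence requires the equality to hold everywhere.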

In these two examples, we see that the region of nonzero probability plays a
crucial role in determining whether random variables are independent. Once again,
we emphasize that to infer that X and Y are independent, it is necessary to verify
the functional equalities in Definition 5.4 for all x ∈ S_X and y ∈ S_Y. There are
many cases in which some events of the form {X = x} and {Y = y} are independent
and others are not independent. If this is the case, the random variables X and Y
are not independent.
In Examples 5.12 and 5.13, we are given a joint PDF and asked to determine
whether the random variables are independent. By contrast, in many applications
of probability, the nature of an experiment leads to a model in which X and Y are
independent. In these applications we examine an experiment and determine that
it is appropriate to model a pair of random variables X and Y as independent.
To analyze the experiment, we start with the PDFs f_X(x) and f_Y(y), and then
construct the joint PDF f_{X,Y}(x, y) = f_X(x)f_Y(y).

Example 5.14
Consider again the noisy observation model of Example 5.1. Suppose X is a Gaussian
(0, σ_X) information signal sent by a radio transmitter and Y = X + Z is the output
of a low-noise amplifier attached to the antenna of a radio receiver. The noise Z is
a Gaussian (0, σ_Z) random variable that is generated within the receiver. What is the
joint PDF f_{X,Z}(x, z)?

From the information given, we know that X and Z have PDFs

    f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} e^{-x^2/2\sigma_X^2}, \qquad f_Z(z) = \frac{1}{\sqrt{2\pi}\,\sigma_Z} e^{-z^2/2\sigma_Z^2}.   (5.41)

The signal X depends on the information being transmitted by the sender and the noise
Z depends on electrons bouncing around in the receiver circuitry. As there is no reason
for these to be related, we model X and Z as independent. Thus, the joint PDF is

    f_{X,Z}(x, z) = f_X(x) f_Z(z) = \frac{1}{2\pi\sigma_X\sigma_Z} \exp\left[ -\frac{1}{2}\left( \frac{x^2}{\sigma_X^2} + \frac{z^2}{\sigma_Z^2} \right) \right].   (5.42)
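The product construction of Equation (5.42) is easy to verify numerically. This Python sketch (an illustration; the sigma values are arbitrary choices, not from the text) builds the joint PDF from the two Gaussian marginals and checks it against the closed form.

```python
# Sketch of Equation (5.42): for independent X and Z, the joint PDF
# is the product of the two Gaussian marginals.
import math

def gaussian_pdf(x, sigma):
    return math.exp(-x * x / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def f_xz(x, z, sigma_x, sigma_z):
    # Valid only because X and Z are modeled as independent.
    return gaussian_pdf(x, sigma_x) * gaussian_pdf(z, sigma_z)

sx, sz = 4.0, 3.0                       # illustrative values
val = f_xz(1.0, -2.0, sx, sz)
direct = math.exp(-0.5 * (1.0**2 / sx**2 + (-2.0)**2 / sz**2)) / (2 * math.pi * sx * sz)
print(abs(val - direct) < 1e-12)        # True: matches the closed form
```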


Quiz 5.6
(A) Random variables X and Y in Example 5.3 and random variables Q and G
in Quiz 5.2 have joint PMFs:

P_{X,Y}(x, y)   y = 0   y = 1   y = 2        P_{Q,G}(q, g)   g = 0   g = 1   g = 2   g = 3

x = 0           0.01    0       0            q = 0           0.06    0.18    0.24    0.12
x = 1           0.09    0.09    0            q = 1           0.04    0.12    0.16    0.08
x = 2           0       0       0.81

(a) Are X and Y independent? (b) Are Q and G independent?

(B) Random variables X_1 and X_2 are independent and identically distributed
with probability density function

    f_X(x) = \begin{cases} x/2 & 0 \le x \le 2, \\ 0 & \text{otherwise}. \end{cases}

What is the joint PDF f_{X_1,X_2}(x_1, x_2)?

5.7 Expected Value of a Function of Two Random Variables

g(X, Y), a function of two random variables, is also a random vari-
able. As with one random variable, it is convenient to calculate the
expected value, E[g(X, Y)], without deriving a probability model
of g(X, Y).

There are many situations in which we observe two random variables and use
their values to compute a new random variable. For example, we can model the
amplitude of the signal transmitted by a radio station as a random variable, X.
We can model the attenuation of the signal as it travels to the antenna of a moving
car as another random variable, Y. In this case the amplitude of the signal at the
radio receiver in the car is the random variable W = X/Y.
Formally, we have the following situation. We perform an experiment and ob-
serve sample values of two random variables X and Y. Based on our knowledge
of the experiment, we have a probability model for X and Y embodied in a joint
PMF P_{X,Y}(x, y) or a joint PDF f_{X,Y}(x, y). After performing the experiment, we
calculate a sample value of the random variable W = g(X, Y). W is referred to
as a derived random variable. This section identifies important properties of the
expected value, E[W]. The probability model for W, embodied in P_W(w) or f_W(w),
is the subject of Chapter 6.
As with a function of one random variable, we can calculate E[W] directly from
P_{X,Y}(x, y) or f_{X,Y}(x, y) without deriving P_W(w) or f_W(w). Corresponding to The-
orems 3.10 and 4.4, we have:


Theorem 5.9
For random variables X and Y, the expected value of W = g(X, Y) is

Discrete: E[W] = \sum_{x \in S_X} \sum_{y \in S_Y} g(x, y) P_{X,Y}(x, y);

Continuous: E[W] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f_{X,Y}(x, y) \, dx \, dy.

Theorem 5.9 is surprisingly powerful. For example, it lets us easily calculate the
expected value of a linear combination of several functions.

Theorem 5.10

    E[a_1 g_1(X, Y) + \cdots + a_n g_n(X, Y)] = a_1 E[g_1(X, Y)] + \cdots + a_n E[g_n(X, Y)].


Proof Let g(X, Y) = a_1 g_1(X, Y) + \cdots + a_n g_n(X, Y). For discrete random variables X, Y,
Theorem 5.9 states

    E[g(X, Y)] = \sum_{x \in S_X} \sum_{y \in S_Y} \bigl( a_1 g_1(x, y) + \cdots + a_n g_n(x, y) \bigr) P_{X,Y}(x, y).   (5.44)

We can break the double summation into n weighted double summations:

    E[g(X, Y)] = a_1 \sum_{x \in S_X} \sum_{y \in S_Y} g_1(x, y) P_{X,Y}(x, y) + \cdots + a_n \sum_{x \in S_X} \sum_{y \in S_Y} g_n(x, y) P_{X,Y}(x, y).

By Theorem 5.9, the ith double summation on the right side is E[g_i(X, Y)]; thus,

    E[g(X, Y)] = a_1 E[g_1(X, Y)] + \cdots + a_n E[g_n(X, Y)].   (5.45)

For continuous random variables, Theorem 5.9 says

    E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \bigl( a_1 g_1(x, y) + \cdots + a_n g_n(x, y) \bigr) f_{X,Y}(x, y) \, dx \, dy.   (5.46)

To complete the proof, we express this integral as the sum of n integrals and recognize
that each of the new integrals is a weighted expected value, a_i E[g_i(X, Y)].
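The linearity property can be spot-checked numerically. This Python sketch (an illustration with a hypothetical joint PMF chosen for this sketch, not from the text) compares the two sides of Theorem 5.10 for the discrete case.

```python
# A small numeric illustration of Theorem 5.10 (linearity of E[.]),
# using a hypothetical joint PMF chosen for this sketch.

P = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """E[g(X, Y)] computed by the double sum of Theorem 5.9."""
    return sum(g(x, y) * p for (x, y), p in P.items())

a1, a2 = 3.0, -2.0
g1 = lambda x, y: x * y
g2 = lambda x, y: x + y

lhs = expect(lambda x, y: a1 * g1(x, y) + a2 * g2(x, y))
rhs = a1 * expect(g1) + a2 * expect(g2)
print(abs(lhs - rhs) < 1e-12)   # True: the expectation is linear
```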

In words, Theorem 5.10 says that the expected value of a linear combination equals
the linear combination of the expected values. We will have many occasions to
apply this theorem. The following theorem describes the expected sum of two
random variables, a special case of Theorem 5.10.

Theorem 5.11
For any two random variables X and Y,

    E[X + Y] = E[X] + E[Y].


This theorem implies that we can find the expected sum of random variables
from the separate probability models: P_X(x) and P_Y(y) or f_X(x) and f_Y(y). We do
not need a complete probability model embodied in P_{X,Y}(x, y) or f_{X,Y}(x, y).
By contrast, the variance of X + Y depends on the entire joint PMF or joint PDF.

Theorem 5.12
The variance of the sum of two random variables is

    Var[X + Y] = Var[X] + Var[Y] + 2 E[(X - \mu_X)(Y - \mu_Y)].

Proof Since E[X + Y] = \mu_X + \mu_Y,

    Var[X + Y] = E[(X + Y - (\mu_X + \mu_Y))^2]
               = E[((X - \mu_X) + (Y - \mu_Y))^2]
               = E[(X - \mu_X)^2 + 2(X - \mu_X)(Y - \mu_Y) + (Y - \mu_Y)^2].   (5.47)

We observe that each of the three terms in the preceding expected values is a function of
X and Y. Therefore, Theorem 5.10 implies

    Var[X + Y] = E[(X - \mu_X)^2] + 2 E[(X - \mu_X)(Y - \mu_Y)] + E[(Y - \mu_Y)^2].   (5.48)

The first and last terms are, respectively, Var[X] and Var[Y].

The expression E[(X - \mu_X)(Y - \mu_Y)] in the final term of Theorem 5.12 is a pa-
rameter of the probability model of X and Y. It reveals important properties of
the relationship of X and Y. This quantity appears over and over in practical
applications, and it has its own name, covariance.

Example 5.15
A company website has three pages. They require 750 kilobytes, 1500 kilobytes, and
2500 kilobytes for transmission. The transmission speed can be 5 Mb/s for exter-
nal requests or 10 Mb/s for internal requests. Requests arrive randomly from inside
and outside the company independently of page length, which is also random. The
probability models for transmission speed, R, and page length, L, are:

    P_R(r) = \begin{cases} 0.4 & r = 5, \\ 0.6 & r = 10, \\ 0 & \text{otherwise}, \end{cases} \qquad P_L(l) = \begin{cases} 0.3 & l = 750, \\ 0.5 & l = 1500, \\ 0.2 & l = 2500, \\ 0 & \text{otherwise}. \end{cases}   (5.49)

Write an expression for the transmission time g(R, L) seconds. Derive the expected
transmission time E[g(R, L)]. Does E[g(R, L)] = g(E[R], E[L])?

The transmission time T seconds is the page length (in kilobits) divided by the trans-
mission speed (in kilobits per second), or T = 8L/1000R. Because R and L are independent,
P_{R,L}(r, l) = P_R(r)P_L(l) and

    E[g(R, L)] = \sum_r \sum_l P_R(r) P_L(l) \frac{8l}{1000r}
               = \frac{8}{1000} \left( \sum_r \frac{P_R(r)}{r} \right) \left( \sum_l l \, P_L(l) \right)
               = \frac{8}{1000} \left( \frac{0.4}{5} + \frac{0.6}{10} \right) \bigl( 0.3(750) + 0.5(1500) + 0.2(2500) \bigr)
               = 1.652 \text{ s}.   (5.50)

By comparison, E[R] = \sum_r r P_R(r) = 8 Mb/s and E[L] = \sum_l l P_L(l) = 1475 kilobytes.
This implies

    g(E[R], E[L]) = \frac{8 E[L]}{1000 E[R]} = 1.475 \text{ s} \ne E[g(R, L)].   (5.51)
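The double sum in Equation (5.50) can be computed directly in a few lines. This Python sketch (the book's companion code is MATLAB) uses the PMF values from Example 5.15 and also shows that evaluating g at the expected values gives a different answer.

```python
# The double sum of Example 5.15, computed directly (a sketch of
# Theorem 5.9 for the discrete case; values are from the text).

P_R = {5: 0.4, 10: 0.6}                    # speed in Mb/s
P_L = {750: 0.3, 1500: 0.5, 2500: 0.2}     # length in kilobytes

def g(r, l):
    return 8 * l / (1000 * r)              # transmission time in seconds

E_g = sum(P_R[r] * P_L[l] * g(r, l) for r in P_R for l in P_L)
E_R = sum(r * p for r, p in P_R.items())   # 8 Mb/s
E_L = sum(l * p for l, p in P_L.items())   # 1475 kilobytes

print(round(E_g, 3))          # 1.652
print(round(g(E_R, E_L), 3))  # 1.475, not equal to E[g(R, L)]
```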

5.8 Covariance, Correlation and Independence

The covariance Cov[X, Y], the correlation coefficient ρ_{X,Y}, and
the correlation r_{X,Y} are parameters of the probability model of X
and Y. For independent random variables X and Y, Cov[X, Y] =
ρ_{X,Y} = 0.

Definition 5.5 Covariance
The covariance of two random variables X and Y is

    Cov[X, Y] = E[(X - \mu_X)(Y - \mu_Y)].

Sometimes, the notation σ_{XY} is used to denote the covariance of X and Y. We
have already learned that the expected value parameter, E[X], is a typical value of
X and that the variance parameter, Var[X], is a single number that describes how
samples of X tend to be spread around the expected value E[X]. In an analogous
way, the covariance parameter Cov[X, Y] is a single number that describes how the
pair of random variables X and Y vary together.
The key to understanding covariance is the random variable

    W = (X - \mu_X)(Y - \mu_Y).   (5.52)

Since Cov[X, Y] = E[W], we observe that Cov[X, Y] > 0 tells us that the typical
values of (X - \mu_X)(Y - \mu_Y) are positive. However, this is equivalent to saying that
X - \mu_X and Y - \mu_Y typically have the same sign. That is, if X > \mu_X then we would
typically expect Y > \mu_Y; and if X < \mu_X then we would expect to observe Y < \mu_Y.
In short, if Cov[X, Y] > 0, we would expect X and Y to go up or down together.
On the other hand, if Cov[X, Y] < 0, we would expect X - \mu_X and Y - \mu_Y to
typically have opposite signs. In this case, when X goes up, Y typically goes down.
Finally, if Cov[X, Y] ≈ 0, we might expect that the sign of X - \mu_X doesn't provide
much of a clue about the sign of Y - \mu_Y.
While this casual argument may be reasonably clear, it may also be some-
what unsatisfactory. For example, would Cov[X, Y] = 0.1 be fairly described as
Cov[X, Y] ≈ 0? The answer to this question depends on the measurement units of
X and Y.
Example 5.16
Suppose we perform an experiment in which we measure X and Y in centimeters
(for example, the heights of two sisters). However, if we change units and measure
height in meters, we will perform the same experiment except we observe \hat{X} = X/100
and \hat{Y} = Y/100. In this case, \hat{X} and \hat{Y} have expected values \mu_{\hat{X}} = \mu_X/100 m,
\mu_{\hat{Y}} = \mu_Y/100 m and

    Cov[\hat{X}, \hat{Y}] = E[(\hat{X} - \mu_{\hat{X}})(\hat{Y} - \mu_{\hat{Y}})] = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{10{,}000} = \frac{\mathrm{Cov}[X, Y]}{10{,}000} \text{ m}^2.   (5.53)

Changing the unit of measurement from cm² to m² reduces the covariance by a factor
of 10,000. However, the tendency of \hat{X} - \mu_{\hat{X}} and \hat{Y} - \mu_{\hat{Y}} to have the same sign is
the same as the tendency of X - \mu_X and Y - \mu_Y to have the same sign. (Both are an
indication of how likely it is that a girl is taller than average if her sister is taller than
average.)
A parameter that indicates the relationship of two random variables regardless
of measurement units is a normalized version of Cov[X, Y], called the correlation
coefficient.
Definition 5.6 Correlation Coefficient
The correlation coefficient of two random variables X and Y is

    \rho_{X,Y} = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y}.

Note that the covariance has units equal to the product of the units of X and Y.
Thus, if X has units of kilograms and Y has units of seconds, then Cov[X, Y] has
units of kilogram-seconds. By contrast, ρ_{X,Y} is a dimensionless quantity that is
not affected by scale changes.
Theorem 5.13
If \hat{X} = aX + b and \hat{Y} = cY + d, then
(a) \rho_{\hat{X},\hat{Y}} = \rho_{X,Y},
(b) Cov[\hat{X}, \hat{Y}] = ac \, Cov[X, Y].



(a) ρ_{X,Y} = -0.9    (b) ρ_{X,Y} = 0    (c) ρ_{X,Y} = 0.9

Figure 5.5 Each graph has 200 samples, each marked by a dot, of the random variable pair
(X, Y) such that E[X] = E[Y] = 0, Var[X] = Var[Y] = 1.

The proof steps are outlined in Problem 5.8.9. Related to this insensitivity of ρ_{X,Y}
to scale changes, an important property of the correlation coefficient is that it is
bounded by -1 and 1:

Theorem 5.14

    -1 \le \rho_{X,Y} \le 1.

Proof Let \sigma_X^2 and \sigma_Y^2 denote the variances of X and Y, and for a constant a, let W =
X - aY. Then,

    Var[W] = E[(X - aY)^2] - (E[X - aY])^2.   (5.54)

Since E[X - aY] = \mu_X - a\mu_Y, expanding the squares yields

    Var[W] = E[X^2 - 2aXY + a^2 Y^2] - (\mu_X^2 - 2a\mu_X\mu_Y + a^2\mu_Y^2)
           = Var[X] - 2a \, Cov[X, Y] + a^2 \, Var[Y].

Since Var[W] \ge 0 for any a, we have 2a Cov[X, Y] \le Var[X] + a^2 Var[Y]. Choosing
a = \sigma_X/\sigma_Y yields Cov[X, Y] \le \sigma_Y \sigma_X, which implies \rho_{X,Y} \le 1. Choosing a = -\sigma_X/\sigma_Y
yields Cov[X, Y] \ge -\sigma_Y \sigma_X, which implies \rho_{X,Y} \ge -1.

When ρ_{X,Y} > 0, we say that X and Y are positively correlated, and when ρ_{X,Y} < 0
we say X and Y are negatively correlated. If |ρ_{X,Y}| is close to 1, say |ρ_{X,Y}| > 0.9,
then X and Y are highly correlated. Note that high correlation can be positive or
negative. Figure 5.5 shows outcomes of independent trials of an experiment that
produces random variables X and Y for random variable pairs with (a) negative
correlation, (b) zero correlation, and (c) positive correlation. The following theorem
demonstrates that |ρ_{X,Y}| = 1 when there is a linear relationship between X and Y.


Theorem 5.15
If X and Y are random variables such that Y = aX + b,

    \rho_{X,Y} = \begin{cases} -1 & a < 0, \\ 0 & a = 0, \\ 1 & a > 0. \end{cases}
The proof is left as an exercise for the reader (Problem 5.5.7). Some examples of
positive, negative, and zero correlation coefficients include:
• X is a student's height. Y is the same student's weight. 0 < ρ_{X,Y} < 1.
• X is the distance of a cellular phone from the nearest base station. Y is the
power of the received signal at the cellular phone. -1 < ρ_{X,Y} < 0.
• X is the temperature of a resistor measured in degrees Celsius. Y is the
temperature of the same resistor measured in Kelvins. ρ_{X,Y} = 1.
• X is the gain of an electrical circuit measured in decibels. Y is the attenuation,
measured in decibels, of the same circuit. ρ_{X,Y} = -1.
• X is the telephone number of a cellular phone. Y is the Social Security
number of the phone's owner. ρ_{X,Y} = 0.
The correlation of two random variables, denoted r_{X,Y}, is another parameter of
the probability model of X and Y. r_{X,Y} is a close relative of the covariance.

Definition 5.7 Correlation
The correlation of X and Y is r_{X,Y} = E[XY].

The following theorem contains useful relationships among three expected values:
the covariance of X and Y, the correlation of X and Y, and the variance of X + Y.

Theorem 5.16
(a) Cov[X, Y] = r_{X,Y} - \mu_X \mu_Y.
(b) Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
(c) If X = Y, Cov[X, Y] = Var[X] = Var[Y] and r_{X,Y} = E[X^2] = E[Y^2].

Proof Cross-multiplying inside the expected value of Definition 5.5 yields

    Cov[X, Y] = E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y].   (5.56)

Since the expected value of the sum equals the sum of the expected values,

    Cov[X, Y] = E[XY] - E[\mu_X Y] - E[\mu_Y X] + E[\mu_Y \mu_X].   (5.57)

Note that in the expression E[\mu_Y X], \mu_Y is a constant. Referring to Theorem 3.12, we
set a = \mu_Y and b = 0 to obtain E[\mu_Y X] = \mu_Y E[X] = \mu_Y \mu_X. The same reasoning
demonstrates that E[\mu_X Y] = \mu_X E[Y] = \mu_X \mu_Y. Therefore,

    Cov[X, Y] = E[XY] - \mu_X \mu_Y - \mu_Y \mu_X + \mu_Y \mu_X = r_{X,Y} - \mu_X \mu_Y.   (5.58)

The other relationships follow directly from the definitions and Theorem 5.12.

Example 5.17
For the integrated circuits tests in Example 5.3, we found in Example 5.5 that the
probability model for X and Y is given by the following matrix.

P_{X,Y}(x, y)   y = 0   y = 1   y = 2   P_X(x)
x = 0           0.01    0       0       0.01
x = 1           0.09    0.09    0       0.18
x = 2           0       0       0.81    0.81
P_Y(y)          0.10    0.09    0.81

Find r_{X,Y} and Cov[X, Y].

By Definition 5.7,

    r_{X,Y} = E[XY] = \sum_{x=0}^{2} \sum_{y=0}^{2} xy \, P_{X,Y}(x, y)   (5.59)
            = (1)(1)(0.09) + (2)(2)(0.81) = 3.33.   (5.60)

To use Theorem 5.16(a) to find the covariance, we find

    E[X] = (1)(0.18) + (2)(0.81) = 1.80,
    E[Y] = (1)(0.09) + (2)(0.81) = 1.71.   (5.61)

Therefore, by Theorem 5.16(a), Cov[X, Y] = 3.33 - (1.80)(1.71) = 0.252.
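The sums in Example 5.17 are easy to automate. This Python sketch (the book's companion code is MATLAB) stores only the nonzero PMF entries and computes r_{X,Y}, the marginal expected values, and the covariance via Theorem 5.16(a).

```python
# Direct computation of r_XY, E[X], E[Y] and Cov[X, Y] for the PMF
# matrix of Example 5.17 (a sketch of Definition 5.7 and Theorem 5.16(a)).

P = {(0, 0): 0.01, (1, 0): 0.09, (1, 1): 0.09, (2, 2): 0.81}

r_xy = sum(x * y * p for (x, y), p in P.items())   # E[XY]
E_X  = sum(x * p for (x, y), p in P.items())
E_Y  = sum(y * p for (x, y), p in P.items())
cov  = r_xy - E_X * E_Y                            # Theorem 5.16(a)

print(round(r_xy, 2), round(E_X, 2), round(E_Y, 2), round(cov, 3))
# 3.33 1.8 1.71 0.252
```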

The terms orthogonal and uncorrelated describe random variables for which
r_{X,Y} = 0 and random variables for which Cov[X, Y] = 0, respectively.

Definition 5.8 Orthogonal Random Variables
Random variables X and Y are orthogonal if r_{X,Y} = 0.

Definition 5.9 Uncorrelated Random Variables
Random variables X and Y are uncorrelated if Cov[X, Y] = 0.

This terminology, while widely used, is somewhat confusing, since orthogonal means
zero correlation and uncorrelated means zero covariance.


We have already noted that if X and Y are highly correlated, then observing X
tells us a lot about the accompanying observation Y. Graphically, this is visible in
Figure 5.5 when we compare the correlated cases (a) and (c) to the uncorrelated
case (b). On the other hand, if Cov[X, Y] = 0, it is often the case that learning X
tells us little about Y. We have used nearly the same words to describe independent
random variables X and Y.
The following theorem contains several important properties of expected values
of independent random variables. It states that independent random variables are
uncorrelated but not necessarily orthogonal.

Theorem 5.17
For independent random variables X and Y,
(a) E[g(X)h(Y)] = E[g(X)] E[h(Y)];
(b) r_{X,Y} = E[XY] = E[X] E[Y];
(c) Cov[X, Y] = ρ_{X,Y} = 0;
(d) Var[X + Y] = Var[X] + Var[Y].

Proof We present the proof for discrete random variables. By replacing PMFs and sums
with PDFs and integrals we arrive at essentially the same proof for continuous random
variables. Since P_{X,Y}(x, y) = P_X(x)P_Y(y),

    E[g(X)h(Y)] = \sum_{x \in S_X} \sum_{y \in S_Y} g(x) h(y) P_X(x) P_Y(y)
                = \left( \sum_{x \in S_X} g(x) P_X(x) \right) \left( \sum_{y \in S_Y} h(y) P_Y(y) \right) = E[g(X)] E[h(Y)].   (5.62)

If g(X) = X and h(Y) = Y, this equation implies r_{X,Y} = E[XY] = E[X] E[Y]. This
equation and Theorem 5.16(a) imply Cov[X, Y] = 0. As a result, Theorem 5.16(b) implies
Var[X + Y] = Var[X] + Var[Y]. Furthermore, \rho_{X,Y} = Cov[X, Y]/(\sigma_X \sigma_Y) = 0.

These results all follow directly from the joint PMF for independent random
variables. We observe that Theorem 5.17(c) states that independent random vari-
ables are uncorrelated. We will have many occasions to refer to this property. It is
important to know that while Cov[X, Y] = 0 is a necessary property for indepen-
dence, it is not sufficient. There are many pairs of uncorrelated random variables
that are not independent.

Example 5.18
For the noisy observation Y = X + Z of Example 5.1, find the covariances Cov[X, Z]
and Cov[X, Y] and the correlation coefficients ρ_{X,Z} and ρ_{X,Y}.

We recall from Example 5.1 that the signal X is Gaussian (0, σ_X), that the noise Z is
Gaussian (0, σ_Z), and that X and Z are independent. We know from Theorem 5.17(c)
that independence of X and Z implies

    Cov[X, Z] = \rho_{X,Z} = 0.   (5.63)

In addition, by Theorem 5.17(d),

    Var[Y] = Var[X] + Var[Z] = \sigma_X^2 + \sigma_Z^2.   (5.64)

Since E[X] = E[Z] = 0, Theorem 5.11 tells us that E[Y] = E[X] + E[Z] = 0 and
Theorem 5.17(b) says that E[XZ] = E[X] E[Z] = 0. This permits us to write

    Cov[X, Y] = E[XY] = E[X(X + Z)] = E[X^2 + XZ] = E[X^2] + E[XZ] = E[X^2] = \sigma_X^2.

This implies

    \rho_{X,Y} = \frac{\mathrm{Cov}[X, Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{\sigma_X^2}{\sigma_X \sqrt{\sigma_X^2 + \sigma_Z^2}} = \frac{1}{\sqrt{1 + \sigma_Z^2/\sigma_X^2}}.   (5.65)

We see in Example 5.18 that the covariance between the transmitted signal X and
the received signal Y depends on the ratio σ_X²/σ_Z². This ratio, referred to as the
signal-to-noise ratio, has a strong effect on communication quality. If σ_X²/σ_Z² << 1,
the correlation of X and Y is weak and the noise dominates the signal at the receiver.
Learning y, a sample of the received signal, is not very helpful in determining
the corresponding sample of the transmitted signal, x. On the other hand, if
σ_X²/σ_Z² >> 1, the transmitted signal dominates the noise and ρ_{X,Y} ≈ 1, an indication
of a close relationship between X and Y. When there is strong correlation between
X and Y, learning y is very helpful in determining x.
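The dependence of ρ_{X,Y} on the signal-to-noise ratio in Equation (5.65) is easy to explore numerically. This Python sketch (an illustration; the sigma values are arbitrary choices, except that σ_X = 4, σ_Z = 3 matches Example 5.20 below) evaluates the formula at a few ratios.

```python
# The correlation coefficient of Example 5.18 as a function of the
# signal-to-noise ratio (a sketch; sigma values are illustrative).
import math

def rho_xy(sigma_x, sigma_z):
    # rho = 1 / sqrt(1 + sigma_z^2 / sigma_x^2), Equation (5.65)
    return 1.0 / math.sqrt(1.0 + (sigma_z / sigma_x) ** 2)

print(rho_xy(4.0, 3.0))     # 0.8
print(rho_xy(100.0, 1.0))   # near 1: signal dominates the noise
print(rho_xy(1.0, 100.0))   # near 0: noise dominates the signal
```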
Quiz 5.8
(A) Random variables L and T have joint PMF

P_{L,T}(l, t)    t = 40 sec    t = 60 sec
l = 1 page       0.15          0.1
l = 2 pages      0.30          0.2
l = 3 pages      0.15          0.1

Find the following quantities.

(a) E[L] and Var[L]   (b) E[T] and Var[T]
(c) The covariance Cov[L, T]   (d) The correlation coefficient ρ_{L,T}

(B) The joint probability density function of random variables X and Y is

    f_{X,Y}(x, y) = \begin{cases} xy & 0 \le x \le 1,\; 0 \le y \le 2, \\ 0 & \text{otherwise}. \end{cases}   (5.66)

Find the following quantities.
(a) E[X] and Var[X]   (b) E[Y] and Var[Y]
(c) The covariance Cov[X, Y]   (d) The correlation coefficient ρ_{X,Y}

5.9 Bivariate Gaussian Random Variables

The bivariate Gaussian PDF of X and Y has five parameters:
the expected values and standard deviations of X and Y and the
correlation coefficient of X and Y. The marginal PDF of X and
the marginal PDF of Y are both Gaussian.

For a PDF representing a family of random variables, one or more parameters define
a specific PDF. Properties such as E[X] and Var[X] depend on the parameters. For
example, a continuous uniform (a, b) random variable has expected value (a + b)/2
and variance (b - a)²/12. For the bivariate Gaussian PDF, the parameters μ_X,
μ_Y, σ_X, σ_Y and ρ_{X,Y} are equal to the expected values, standard deviations, and
correlation coefficient of X and Y.

Definition 5.10 Bivariate Gaussian Random Variables
Random variables X and Y have a bivariate Gaussian PDF with parameters
\mu_X, \mu_Y, \sigma_X > 0, \sigma_Y > 0, and \rho_{X,Y} satisfying -1 < \rho_{X,Y} < 1 if

    f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho_{X,Y}^2}} \exp\left[ -\frac{ \left(\frac{x - \mu_X}{\sigma_X}\right)^2 - \frac{2\rho_{X,Y}(x - \mu_X)(y - \mu_Y)}{\sigma_X \sigma_Y} + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2 }{2(1 - \rho_{X,Y}^2)} \right].

Figure 5.6 illustrates the bivariate Gaussian PDF for μ_X = μ_Y = 0, σ_X = σ_Y =
1, and three values of ρ_{X,Y} = ρ. When ρ = 0, the joint PDF has the circular
symmetry of a sombrero. When ρ = 0.9, the joint PDF forms a ridge over the line
x = y, and when ρ = -0.9 there is a ridge over the line x = -y. The ridge becomes
increasingly steep as ρ → 1. Adjacent to each PDF, we repeat the graphs in
Figure 5.5; each graph shows 200 sample pairs (X, Y) drawn from that bivariate
Gaussian PDF. We see that the sample pairs are clustered in the region of the x, y
plane where the PDF is large.
To examine mathematically the properties of the bivariate Gaussian PDF, we define

    \tilde{\mu}_Y(x) = \mu_Y + \rho_{X,Y} \frac{\sigma_Y}{\sigma_X}(x - \mu_X), \qquad \tilde{\sigma}_Y = \sigma_Y \sqrt{1 - \rho_{X,Y}^2},


Figure 5.6 The joint Gaussian PDF f_{X,Y}(x, y) for μ_X = μ_Y = 0, σ_X = σ_Y = 1, and three
values of ρ_{X,Y} = ρ. Next to each PDF, we plot 200 sample pairs (X, Y) generated with that
bivariate Gaussian PDF.
and manipulate the formula in Definition 5.10 to obtain the following expression
for the joint Gaussian PDF:

    f_{X,Y}(x, y) = \frac{1}{\sigma_X\sqrt{2\pi}} e^{-(x - \mu_X)^2/2\sigma_X^2} \cdot \frac{1}{\tilde{\sigma}_Y\sqrt{2\pi}} e^{-(y - \tilde{\mu}_Y(x))^2/2\tilde{\sigma}_Y^2}.   (5.68)

Equation (5.68) expresses f_{X,Y}(x, y) as the product of two Gaussian PDFs, one
with parameters \mu_X and \sigma_X and the other with parameters \tilde{\mu}_Y(x) and \tilde{\sigma}_Y. This
formula plays a key role in the proof of the following theorem.


Theorem 5.18
If X and Y are the bivariate Gaussian random variables in Definition 5.10, X is
the Gaussian (\mu_X, \sigma_X) random variable and Y is the Gaussian (\mu_Y, \sigma_Y) random
variable.
Proof Integrating f_{X,Y}(x, y) in Equation (5.68) over all y, we have

    f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy = \frac{1}{\sigma_X\sqrt{2\pi}} e^{-(x - \mu_X)^2/2\sigma_X^2} \int_{-\infty}^{\infty} \frac{1}{\tilde{\sigma}_Y\sqrt{2\pi}} e^{-(y - \tilde{\mu}_Y(x))^2/2\tilde{\sigma}_Y^2} \, dy.   (5.69)

The integral over y equals 1 because it is the integral of a Gaussian PDF.
The remainder of the formula is the PDF of the Gaussian (\mu_X, \sigma_X) random variable. The
same reasoning with the roles of X and Y reversed leads to the formula for f_Y(y).

The next theorem identifies ρ_{X,Y} in Definition 5.10 as the correlation coefficient of
X and Y.

Theorem 5.19
Bivariate Gaussian random variables X and Y in Definition 5.10 have correlation
coefficient ρ_{X,Y}.

The proof of Theorem 5.19 involves algebra that is more easily digested with some
insight from Chapter 7; see Section 7.6 for the proof.
From Theorem 5.19, we observe that if X and Y are uncorrelated, then ρ_{X,Y} = 0
and, by evaluating the PDF in Definition 5.10 with ρ_{X,Y} = 0, we have f_{X,Y}(x, y) =
f_X(x)f_Y(y). Thus we have the following theorem.

Theorem 5.20
Bivariate Gaussian random variables X and Y are uncorrelated if and only if they
are independent.

Another important property of bivariate Gaussian random variables X and Y is
that a pair of linear combinations of X and Y forms a pair of bivariate Gaussian
random variables.

Theorem 5.21
If X and Y are bivariate Gaussian random variables with PDF given by Defini-
tion 5.10, and W_1 and W_2 are given by the linearly independent equations

    W_1 = a_1 X + b_1 Y, \qquad W_2 = a_2 X + b_2 Y,

then W_1 and W_2 are bivariate Gaussian random variables such that

    E[W_i] = a_i \mu_X + b_i \mu_Y, \quad i = 1, 2,

    Var[W_i] = a_i^2 \sigma_X^2 + b_i^2 \sigma_Y^2 + 2 a_i b_i \rho_{X,Y} \sigma_X \sigma_Y, \quad i = 1, 2,

    Cov[W_1, W_2] = a_1 a_2 \sigma_X^2 + b_1 b_2 \sigma_Y^2 + (a_1 b_2 + a_2 b_1) \rho_{X,Y} \sigma_X \sigma_Y.

Theorem 5.21 is a special case of Theorem 8.11 when we have n = 2 jointly Gaussian
random variables. We omit the proof since the proof of Theorem 8.11 for n jointly
Gaussian random variables is, with some knowledge of linear algebra, simpler. The
requirement that the equations for W_1 and W_2 be "linearly independent" is linear
algebra terminology that excludes degenerate cases such as W_1 = X + 2Y and
W_2 = 3X + 6Y, where W_2 = 3W_1 is just a scaled replica of W_1.
Theorem 5.21 is powerful. Even the partial result that W_i by itself is Gaussian
is a nontrivial conclusion. When an experiment produces linear combinations of
Gaussian random variables, knowing that these combinations are Gaussian simpli-
fies the analysis because all we need to do is calculate the expected values, variances,
and covariances of the outputs in order to derive probability models.
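The variance formula of Theorem 5.21 can be spot-checked by simulation. This Python sketch (an illustration with arbitrary parameter choices, not from the text) generates correlated standard Gaussians via Y = ρX + √(1−ρ²)Z and compares the sample variance of W₁ = a₁X + b₁Y with the closed form.

```python
# Monte Carlo spot check of Var[W_1] in Theorem 5.21 (a sketch with
# illustrative parameters and sigma_x = sigma_y = 1).
import math, random

random.seed(1)
rho, a1, b1 = 0.5, 2.0, -1.0
n = 100_000

samples = []
for _ in range(n):
    x = random.gauss(0, 1)
    y = rho * x + math.sqrt(1 - rho * rho) * random.gauss(0, 1)
    samples.append(a1 * x + b1 * y)        # W1 = a1*X + b1*Y

mean = sum(samples) / n
var_est = sum((w - mean) ** 2 for w in samples) / n

# Theorem 5.21 with sigma_x = sigma_y = 1:
var_formula = a1**2 + b1**2 + 2 * a1 * b1 * rho
print(var_formula)   # 3.0; var_est is close for this sample size
```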

Example 5.19
For the noisy observation in Example 5.14, find the PDF of Y = X + Z.

Since X is Gaussian (0, σ_X) and Z is Gaussian (0, σ_Z) and X and Z are independent,
X and Z are jointly Gaussian. It follows from Theorem 5.21 that Y is Gaussian with
E[Y] = E[X] + E[Z] = 0 and variance σ_Y² = σ_X² + σ_Z². The PDF of Y is

    f_Y(y) = \frac{1}{\sqrt{2\pi(\sigma_X^2 + \sigma_Z^2)}} \, e^{-y^2/2(\sigma_X^2 + \sigma_Z^2)}.   (5.70)

Example 5.20
Continuing Example 5.19, find the joint PDF of X and Y when σX = 4 and σZ = 3.
From Theorem 5.21, we know that X and Y are bivariate Gaussian. We also know that
μX = μY = 0 and that Y has variance σY² = σX² + σZ² = 25. Substituting σX = 4 and
σZ = 3 in the formula for the correlation coefficient derived in Example 5.18, we have

ρX,Y = σX / √(σX² + σZ²) = 4/5.    (5.71)

Applying these parameters to Definition 5.10, we obtain

fX,Y(x, y) = (1/(24π)) e^{−(25x²/16 − 2xy + y²)/18}.    (5.72)
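As a cross-check on Example 5.20 (our addition, in Python rather than the text's MATLAB), the following sketch confirms that Equation (5.72) agrees pointwise with the general bivariate Gaussian PDF of Definition 5.10 and integrates to approximately 1.

```python
import math

sx, sz = 4.0, 3.0
sy = math.sqrt(sx ** 2 + sz ** 2)   # sigma_Y = 5
rho = sx / sy                        # 4/5, from Example 5.18

def f(x, y):
    # Joint PDF from Equation (5.72)
    return math.exp(-(25 * x ** 2 / 16 - 2 * x * y + y ** 2) / 18) / (24 * math.pi)

def f_general(x, y):
    # Definition 5.10 with mu_X = mu_Y = 0
    c = 1 / (2 * math.pi * sx * sy * math.sqrt(1 - rho ** 2))
    q = (x ** 2 / sx ** 2 - 2 * rho * x * y / (sx * sy) + y ** 2 / sy ** 2)
    return c * math.exp(-q / (2 * (1 - rho ** 2)))

# The two forms agree at an arbitrary point
print(f(1.0, 2.0), f_general(1.0, 2.0))

# Midpoint Riemann sum over [-30, 30] x [-30, 30]: should be close to 1
h = 0.1
total = sum(f(-30 + i * h, -30 + j * h)
            for i in range(600) for j in range(600)) * h * h
print(round(total, 3))
```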


Quiz 5.9
Let X and Y be jointly Gaussian (0, 1) random variables with correlation coefficient
1/2. What is the joint PDF of X and Y?

5.10 Multivariate Probability Models

The probability model of an experiment that produces n random
variables can be represented as an n-dimensional CDF. If all of
the random variables are discrete, there is a corresponding n-
dimensional PMF. If all of the random variables are continuous,
there is an n-dimensional PDF. The PDF is the nth partial deriva-
tive of the CDF with respect to all n variables. The probability
model (CDF, PMF, or PDF) of n independent random variables is
the product of the univariate probability models of the n random
variables.

This chapter has emphasized probability models of two random variables X and
Y. We now generalize the definitions and theorems to experiments that yield an
arbitrary number of random variables X1, ..., Xn. This section is heavy on n-
dimensional definitions and theorems but relatively light on examples. However,
the ideas are straightforward extensions of concepts for a pair of random variables.
If you have trouble with a theorem or definition, rewrite it for the special case of
n = 2 random variables. This will yield a familiar result for a pair of random
variables.
To express a complete probability model of X1, ..., Xn, we define the joint cu-
mulative distribution function.

Definition 5.11    Multivariate Joint CDF

The joint CDF of X1, ..., Xn is

FX1,...,Xn(x1, ..., xn) = P[X1 ≤ x1, ..., Xn ≤ xn].

Definition 5.11 is concise and general. It provides a complete probability model re-
gardless of whether any or all of the Xi are discrete, continuous, or mixed. However,
the joint CDF is usually not convenient to use in analyzing practical probability
models. Instead, we use the joint PMF or the joint PDF.

Definition 5.12    Multivariate Joint PMF

The joint PMF of the discrete random variables X1, ..., Xn is

PX1,...,Xn(x1, ..., xn) = P[X1 = x1, ..., Xn = xn].


Definition 5.13    Multivariate Joint PDF

The joint PDF of the continuous random variables X1, ..., Xn is the function

fX1,...,Xn(x1, ..., xn) = ∂ⁿ FX1,...,Xn(x1, ..., xn) / (∂x1 ··· ∂xn).

Theorems 5.22 and 5.23 indicate that the joint PMF and the joint PDF have prop-
erties that are generalizations of the axioms of probability.

Theorem 5.22
If X1, ..., Xn are discrete random variables with joint PMF PX1,...,Xn(x1, ..., xn),
(a) PX1,...,Xn(x1, ..., xn) ≥ 0,
(b) Σ_{x1∈SX1} ··· Σ_{xn∈SXn} PX1,...,Xn(x1, ..., xn) = 1.

Theorem 5.23
If X1, ..., Xn are continuous random variables with joint PDF fX1,...,Xn(x1, ..., xn),
(a) fX1,...,Xn(x1, ..., xn) ≥ 0,
(b) FX1,...,Xn(x1, ..., xn) = ∫_{−∞}^{x1} ··· ∫_{−∞}^{xn} fX1,...,Xn(u1, ..., un) du1 ··· dun,
(c) ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} fX1,...,Xn(x1, ..., xn) dx1 ··· dxn = 1.

Often we consider an event A described in terms of a property of X1, ..., Xn, such
as |X1 + X2 + ··· + Xn| ≤ 1, or max_i Xi ≤ 100. To find the probability of the event
A, we sum the joint PMF or integrate the joint PDF over all x1, ..., xn that belong
to A.

Theorem 5.24
The probability of an event A expressed in terms of the random variables X1, ..., Xn
is

Discrete:     P[A] = Σ_{(x1,...,xn)∈A} PX1,...,Xn(x1, ..., xn),

Continuous:   P[A] = ∫···∫_A fX1,...,Xn(x1, ..., xn) dx1 dx2 ··· dxn.

Although we have written the discrete version of Theorem 5.24 with a single
summation, we must remember that in fact it is a multiple sum over the n variables
x1, ..., xn.


x          y          z          PX,Y,Z(x,y,z)   Total    Events
(1 Page)   (2 Pages)  (3 Pages)                  Pages
0          0          4          1/1296          12       B
0          1          3          1/108           11       B
0          2          2          1/24            10       B
0          3          1          1/12             9       B
0          4          0          1/16             8       AB
1          0          3          1/162           10       B
1          1          2          1/18             9       B
1          2          1          1/6              8       AB
1          3          0          1/6              7       B
2          0          2          1/54             8       AB
2          1          1          1/9              7       B
2          2          0          1/6              6       B
3          0          1          2/81             6
3          1          0          2/27             5
4          0          0          1/81             4

Table 5.1   The PMF PX,Y,Z(x,y,z) and the events A and B for Example 5.22.

Example 5.21
Consider a set of n independent trials in which there are r possible outcomes s1, ..., sr
for each trial. In each trial, P[si] = pi. Let Ni equal the number of times that outcome
si occurs over n trials. What is the joint PMF of N1, ..., Nr?

The solution to this problem appears in Theorem 2.9 and is repeated here:

PN1,...,Nr(n1, ..., nr) = (n! / (n1! n2! ··· nr!)) p1^{n1} p2^{n2} ··· pr^{nr},  n1 + ··· + nr = n.    (5.73)
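The multinomial PMF of Equation (5.73) is easy to evaluate exactly. The following Python sketch (our addition, not the text's MATLAB; the function name multinomial_pmf is our own) uses rational arithmetic so the probabilities come out as exact fractions.

```python
from fractions import Fraction
from math import factorial

def multinomial_pmf(counts, probs):
    """P[N1 = n1, ..., Nr = nr] for n = sum(counts) independent trials,
    where outcome i has probability probs[i] on each trial."""
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)          # multinomial coefficient n!/(n1!...nr!)
    p = Fraction(coef)
    for k, q in zip(counts, probs):
        p *= Fraction(q) ** k
    return p

# Four downloads with page-count probabilities (1/3, 1/2, 1/6), as in Example 5.22
probs = (Fraction(1, 3), Fraction(1, 2), Fraction(1, 6))
print(multinomial_pmf((0, 4, 0), probs))
print(multinomial_pmf((1, 2, 1), probs))
```

The two printed values match the rows (0, 4, 0) and (1, 2, 1) of Table 5.1.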

Example 5.22

For each product that a company sells, a company website has a tech support document
available for download. The PMF of L, the number of pages in one document, is given
in the following table.

l          1     2     3
PL(l)     1/3   1/2   1/6

For a set of four independent information requests, find:
(a) the joint PMF of the random variables X, Y, and Z, the number of 1-page,
2-page, and 3-page downloads, respectively,
(b) P[A] = P[total length of four downloads is 8 pages],
(c) P[B] = P[at least half of the four downloads have more than 1 page].

The downloads are independent trials, each with three possible outcomes: L = 1,
L = 2, and L = 3. Hence, the probability model of the number of downloads of each
length in the set of four downloads is the multinomial PMF of Example 5.21:

PX,Y,Z(x, y, z) = (4! / (x! y! z!)) (1/3)^x (1/2)^y (1/6)^z.    (5.74)

The PMF is displayed numerically in Table 5.1. The final column of the table indicates
that there are three outcomes in event A and 12 outcomes in event B. Adding the
probabilities in the two events, we have P[A] = 107/432 and P[B] = 8/9.
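The values P[A] = 107/432 and P[B] = 8/9 can be verified by enumerating the 15 outcomes of Table 5.1, as in this Python sketch (our addition, not part of the text).

```python
from fractions import Fraction
from math import factorial

# Page-count probabilities for one download: P[L=1], P[L=2], P[L=3]
probs = (Fraction(1, 3), Fraction(1, 2), Fraction(1, 6))

def pmf(x, y, z):
    # Multinomial PMF (5.74) for four independent downloads
    coef = factorial(4) // (factorial(x) * factorial(y) * factorial(z))
    return coef * probs[0] ** x * probs[1] ** y * probs[2] ** z

pA = pB = Fraction(0)
for x in range(5):
    for y in range(5 - x):
        z = 4 - x - y
        p = pmf(x, y, z)
        if x + 2 * y + 3 * z == 8:    # event A: total length is 8 pages
            pA += p
        if y + z >= 2:                 # event B: at least half have > 1 page
            pB += p

print(pA, pB)
```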

In analyzing an experiment, we might wish to study some of the random vari-
ables and ignore other ones. To accomplish this, we can derive marginal PMFs
or marginal PDFs that are probability models for a fraction of the random vari-
ables in the complete experiment. Consider an experiment with four random vari-
ables W, X, Y, Z. The probability model for the experiment is the joint PMF,
PW,X,Y,Z(w, x, y, z), or the joint PDF, fW,X,Y,Z(w, x, y, z). The following theorems
give examples of marginal PMFs and PDFs.

Theorem 5.25

For a joint PMF PW,X,Y,Z(w, x, y, z) of discrete random variables W, X, Y, Z, some
marginal PMFs are

PX,Y,Z(x, y, z) = Σ_{w∈SW} PW,X,Y,Z(w, x, y, z),

PW,Z(w, z) = Σ_{x∈SX} Σ_{y∈SY} PW,X,Y,Z(w, x, y, z).

Theorem 5.26
For a joint PDF fW,X,Y,Z(w, x, y, z) of continuous random variables W, X, Y, Z,
some marginal PDFs are

fW,X,Y(w, x, y) = ∫_{−∞}^{∞} fW,X,Y,Z(w, x, y, z) dz,

fX(x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} fW,X,Y,Z(w, x, y, z) dw dy dz.

Theorems 5.25 and 5.26 can be generalized in a straightforward way to any marginal
PMF or marginal PDF of an arbitrary number of random variables. For a probabil-
ity model described by the set of random variables {X1, ..., Xn}, each nonempty
strict subset of those random variables has a marginal probability model. There
are 2^n subsets of {X1, ..., Xn}. After excluding the entire set and the null set ∅,
we find that there are 2^n − 2 marginal probability models.


Example 5.23
As in Quiz 5.10, the random variables Y1, ..., Y4 have the joint PDF

fY1,...,Y4(y1, ..., y4) = { 4   0 < y1 < y2 < 1, 0 < y3 < y4 < 1,    (5.75)
                            0   otherwise.

Find the marginal PDFs fY1,Y4(y1, y4), fY2,Y3(y2, y3), and fY3(y3).

Applying Theorem 5.26,

fY1,Y4(y1, y4) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fY1,...,Y4(y1, ..., y4) dy2 dy3.    (5.76)

In the foregoing integral, the hard part is identifying the correct limits. These limits
will depend on y1 and y4. For 0 < y1 < 1 and 0 < y4 < 1,

fY1,Y4(y1, y4) = ∫_{y1}^{1} ∫_{0}^{y4} 4 dy3 dy2 = 4(1 − y1)y4.    (5.77)

The complete expression for fY1,Y4(y1, y4) is

fY1,Y4(y1, y4) = { 4(1 − y1)y4   0 < y1 < 1, 0 < y4 < 1,    (5.78)
                   0             otherwise.

Similarly, for 0 < y2 < 1 and 0 < y3 < 1,

fY2,Y3(y2, y3) = ∫_{0}^{y2} ∫_{y3}^{1} 4 dy4 dy1 = 4y2(1 − y3).    (5.79)

The complete expression for fY2,Y3(y2, y3) is

fY2,Y3(y2, y3) = { 4y2(1 − y3)   0 < y2 < 1, 0 < y3 < 1,    (5.80)
                   0             otherwise.

Lastly, for 0 < y3 < 1,

fY3(y3) = ∫_{−∞}^{∞} fY2,Y3(y2, y3) dy2 = ∫_{0}^{1} 4y2(1 − y3) dy2 = 2(1 − y3).    (5.81)

The complete expression is

fY3(y3) = { 2(1 − y3)   0 < y3 < 1,    (5.82)
            0           otherwise.
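As a numerical cross-check of Equation (5.78) (our addition, in Python rather than the text's MATLAB), a midpoint Riemann sum of the (y2, y3) integral at one point (y1, y4) reproduces 4(1 − y1)y4.

```python
def joint(y1, y2, y3, y4):
    # Joint PDF (5.75)
    return 4.0 if 0 < y1 < y2 < 1 and 0 < y3 < y4 < 1 else 0.0

y1, y4 = 0.3, 0.8
h = 0.002
# Midpoint Riemann sum over y2, y3 in (0, 1) x (0, 1)
approx = sum(joint(y1, (i + 0.5) * h, (j + 0.5) * h, y4)
             for i in range(500) for j in range(500)) * h * h

exact = 4 * (1 - y1) * y4          # Equation (5.78)
print(round(approx, 3), round(exact, 3))
```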

Example 5.22 demonstrates that a fairly simple experiment can generate a joint
PMF that, in table form, is perhaps surprisingly long. In fact, a practical experi-
ment often generates a joint PMF or PDF that is forbiddingly complex. The im-
portant exception is an experiment that produces n independent random variables.
The following definition extends the definition of independence of two random vari-
ables. It states that X1, ..., Xn are independent when the joint PMF or PDF can
be factored into a product of n marginal PMFs or PDFs.


Definition 5.14    N Independent Random Variables

Random variables X1, ..., Xn are independent if for all x1, ..., xn,

Discrete:     PX1,...,Xn(x1, ..., xn) = PX1(x1) PX2(x2) ··· PXn(xn),

Continuous:   fX1,...,Xn(x1, ..., xn) = fX1(x1) fX2(x2) ··· fXn(xn).


Independence of n random variables is typically a property of an experiment
consisting of n independent subexperiments, in which subexperiment i produces
the random variable Xi. If all subexperiments follow the same procedure and have
the same observation, all of the Xi have the same PMF or PDF. In this case, we
say the random variables Xi are identically distributed.

Definition 5.15    Independent and Identically Distributed (iid)

X1, ..., Xn are independent and identically distributed (iid) if

Discrete:     PX1,...,Xn(x1, ..., xn) = PX(x1) PX(x2) ··· PX(xn),

Continuous:   fX1,...,Xn(x1, ..., xn) = fX(x1) fX(x2) ··· fX(xn).


Example 5.24
The random variables X1, ..., Xn have the joint PDF

fX1,...,Xn(x1, ..., xn) = { 1   0 ≤ xi ≤ 1, i = 1, ..., n,
                            0   otherwise.

Let A denote the event that max_i Xi ≤ 1/2. Find P[A].
We can solve this problem by applying Theorem 5.24:

P[A] = ∫_{0}^{1/2} ··· ∫_{0}^{1/2} dx1 ··· dxn = (1/2)^n.    (5.84)

As n grows, the probability that the maximum is less than 1/2 rapidly goes to 0.
We note that inspection of the joint PDF reveals that X1, ..., Xn are iid continuous
uniform (0, 1) random variables. The integration in Equation (5.84) is easy because
independence implies

P[A] = P[X1 ≤ 1/2, ..., Xn ≤ 1/2]

     = P[X1 ≤ 1/2] × ··· × P[Xn ≤ 1/2] = (1/2)^n.    (5.85)
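A short simulation (our addition, in Python rather than the text's MATLAB) confirms that the relative frequency of the event max_i Xi ≤ 1/2 matches (1/2)^n.

```python
import random

random.seed(5)

def p_max_below_half(n, trials=100_000):
    # Relative frequency of the event max(X_1, ..., X_n) < 1/2
    # for X_i iid uniform (0, 1)
    hits = sum(1 for _ in range(trials)
               if all(random.random() < 0.5 for _ in range(n)))
    return hits / trials

for n in (2, 4, 8):
    print(n, round(p_max_below_half(n), 4), 0.5 ** n)
```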


Quiz 5.10

The random variables Y1, ..., Y4 have the joint PDF

fY1,...,Y4(y1, ..., y4) = { 4   0 < y1 < y2 < 1, 0 < y3 < y4 < 1,    (5.86)
                            0   otherwise.

Let C denote the event that max_i Yi < 1/2. Find P[C].

5.11 MATLAB

It is convenient to use MATLAB to generate pairs of discrete ran-
dom variables X and Y with an arbitrary joint PMF. There are
no generally applicable techniques for generating sample pairs of
a continuous random variable. There are techniques tailored to
specific joint PDFs, for example, bivariate Gaussian.
MATLAB is a useful tool for studying experiments that produce a pair of ran-
dom variables X, Y. Simulation experiments often depend on the generation of
sample pairs of random variables with specific probability models. That is, given a
joint PMF PX,Y(x, y) or PDF fX,Y(x, y), we need to produce a collection of pairs
{(x1, y1), (x2, y2), ..., (xm, ym)}. For finite discrete random variables, we are able
to develop some general techniques. For continuous random variables, we give some
specific examples.

Discrete Random Variables

We start with the case when X and Y are finite random variables with ranges

SX = {x1, ..., xn},    SY = {y1, ..., ym}.    (5.87)

In this case, we can take advantage of MATLAB techniques for surface plots of g(x, y)
over the x, y plane. In MATLAB, we represent SX and SY by the n-element vector
sx and m-element vector sy. The function [SX,SY]=ndgrid(sx,sy) produces the
pair of n × m matrices

SX = [ x1 ··· x1         SY = [ y1 ··· ym
        ⋮      ⋮                 ⋮      ⋮      (5.88)
       xn ··· xn ],             y1 ··· ym ].

We refer to matrices SX and SY as a sample space grid because they are a grid
representation of the joint sample space

SX,Y = {(x, y) | x ∈ SX, y ∈ SY}.    (5.89)

That is, [SX(i,j) SY(i,j)] is the pair (xi, yj).

To complete the probability model for X and Y in MATLAB, we employ the n ×
m matrix PXY such that PXY(i,j) = PX,Y(xi, yj). To make sure that probabilities
have been generated properly, we note that [SX(:) SY(:) PXY(:)] is a matrix
whose rows list all possible pairs xi, yj and corresponding probabilities PX,Y(xi, yj).
Given a function g(x, y) that operates on the elements of vectors x and y,
the advantage of this grid approach is that the MATLAB function g(SX,SY) will
calculate g(x, y) for each x ∈ SX and y ∈ SY. In particular, g(SX,SY) produces
an n × m matrix with i,jth element g(xi, yj).
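The sample space grid has a direct analogue outside MATLAB. This Python sketch (our addition; the helper ndgrid is our own, named after the MATLAB function) builds SX and SY as nested lists.

```python
def ndgrid(sx, sy):
    """Return matrices SX, SY with SX[i][j] = sx[i] and SY[i][j] = sy[j],
    mirroring MATLAB's [SX,SY] = ndgrid(sx, sy)."""
    SX = [[x for _ in sy] for x in sx]
    SY = [[y for y in sy] for _ in sx]
    return SX, SY

SX, SY = ndgrid([800, 1200, 1600], [400, 800, 1200])
print(SX[0])   # first row of SX: [800, 800, 800]
print(SY[0])   # first row of SY: [400, 800, 1200]
```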

Example 5.25
An Internet photo developer website prints compressed photo images. Each image file
contains a variable-sized image of X × Y pixels described by the joint PMF

PX,Y(x,y)     y = 400   y = 800   y = 1200
x = 800         0.2       0.05      0.1        (5.90)
x = 1200        0.05      0.2       0.1
x = 1600        0         0.1       0.2

For random variables X, Y, write a script imagepmf.m that defines the sample space
grid matrices SX, SY, and PXY.

In the script imagepmf.m, the matrix SX has [800 1200 1600]' for each column and
SY has [400 800 1200] for each row. After running imagepmf.m, we can inspect
the variables:

%imagepmf.m
PXY=[0.2 0.05 0.1;
     0.05 0.2 0.1;
     0 0.1 0.2];
[SX,SY]=ndgrid([800 1200 1600], ...
               [400 800 1200]);

>> imagepmf; SX
SX =
         800         800         800
        1200        1200        1200
        1600        1600        1600
>> SY
SY =
         400         800        1200
         400         800        1200
         400         800        1200

Example 5.26
At 24 bits (3 bytes) per pixel, a 10:1 image compression factor yields image files with
B = 0.3XY bytes. Find the expected value E[B] and the PMF PB(b).

%imagesize.m
imagepmf;
SB=0.3*(SX.*SY);
eb=sum(sum(SB.*PXY))
sb=unique(SB)'
pb=finitepmf(SB,PXY,sb)'

The script imagesize.m produces the expected value
as eb, and produces the PMF, which is represented
by the vectors sb and pb. The 3 × 3 matrix SB has
i,jth element g(xi, yj) = 0.3 xi yj. The calculation
of eb is simply a MATLAB implementation of The-
orem 5.9. Since some elements of SB are identical,
sb=unique(SB) extracts the unique elements. Although SB and PXY are both 3 × 3


>> imagesize
eb =
      319200
sb =
       96000   144000   192000   288000   384000   432000   576000
pb =
      0.2000   0.0500   0.0500   0.3000   0.1000   0.1000   0.2000

Figure 5.7   Output resulting from imagesize.m in Example 5.26.

matrices, each is stored internally by MATLAB as a 9-element vector. Hence, we can
pass SB and PXY to the finitepmf() function, which was designed to handle a finite
random variable described by a pair of column vectors. Figure 5.7 shows one result
of running the program imagesize. The vectors sb and pb comprise PB(b). For
example, PB(288000) = 0.3.
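The calculation in Example 5.26 does not depend on MATLAB. This Python sketch (our addition) computes E[B] and collects the PMF of B = 0.3XY by merging duplicate values of B, mirroring what unique and finitepmf do.

```python
# Joint PMF of (X, Y) from Equation (5.90)
pxy = {
    (800, 400): 0.2,   (800, 800): 0.05,  (800, 1200): 0.1,
    (1200, 400): 0.05, (1200, 800): 0.2,  (1200, 1200): 0.1,
    (1600, 400): 0.0,  (1600, 800): 0.1,  (1600, 1200): 0.2,
}

# PMF of B = 0.3 X Y (bytes), merging duplicate values of B
pb = {}
eb = 0.0
for (x, y), p in pxy.items():
    b = 0.3 * x * y
    pb[b] = pb.get(b, 0.0) + p
    eb += b * p

print(eb)                      # expected file size in bytes
print(sorted(pb.items()))      # the PMF of B, as (value, probability) pairs
```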

Random Sample Pairs

For finite random variables X, Y described by SX, SY and joint PMF PX,Y(x, y),
or equivalently SX, SY, and PXY in MATLAB, we can generate random sample
pairs using the function finiterv(s,p,m) defined in Chapter 3. Recall that
x=finiterv(s,p,m) returned m samples (arranged as a column vector x) of a ran-
dom variable X such that a sample value is s(i) with probability p(i). In fact,
to support random variable pairs X, Y, the function w=finiterv(s,p,m) permits
s to be a k × 2 matrix where the rows of s enumerate all pairs (x, y) with nonzero
probability. Given the grid representation SX, SY, and PXY, we generate m sample
pairs via

xy=finiterv([SX(:) SY(:)],PXY(:),m)

In particular, the ith pair, SX(i),SY(i), will occur with probability PXY(i). The
output xy will be an m × 2 matrix such that each row represents a sample pair
x, y.

Example 5.27

Write a function xy=imagerv(m) that generates m sample pairs of the image size
random variables X, Y of Example 5.26.

The function imagerv uses the imagepmf.m script to define the matrices SX, SY,
and PXY. It then calls the finiterv.m function. Here is the code imagerv.m and a
sample run:

function xy = imagerv(m);
imagepmf;
S=[SX(:) SY(:)];
xy=finiterv(S,PXY(:),m);

>> xy=imagerv(3)
xy =
         800         400
        1200         800
        1600         800


Example 5.27 can be generalized to produce sample pairs for any discrete random
variable pair X, Y. However, given a collection of, for example, m = 10,000 samples
of X, Y, it is desirable to be able to check whether the code generates the sample
pairs properly. In particular, we wish to check for each x ∈ SX and y ∈ SY whether
the relative frequency of x, y in m samples is close to PX,Y(x, y). In the following
example, we develop a program to calculate a matrix of relative frequencies that
corresponds to the matrix PXY.

Example 5.28
Given a list xy of sample pairs of random variables X, Y with MATLAB range grids
SX and SY, write a MATLAB function fxy=freqxy(xy,SX,SY) that calculates the
relative frequency of every pair x, y. The output fxy should correspond to the matrix
[SX(:) SY(:) PXY(:)].

function fxy = freqxy(xy,SX,SY)
xy=[xy; SX(:) SY(:)];
[U,I,J]=unique(xy,'rows');
N=hist(J,1:max(J))-1;
N=N/sum(N);
fxy=[U N(:)];
fxy=sortrows(fxy,[2 1 3]);

The matrix [SX(:) SY(:)] in freqxy has rows that list all possible pairs x, y. We
append this matrix to xy to ensure that the new xy has every possible pair x, y.
Next, the unique function copies all unique rows of xy to the matrix U and also
provides the vector J that indexes the rows of xy in U; that is, xy=U(J). In addi-
tion, the number of occurrences of j in J indicates the number of occurrences in xy of
row j in U. Thus we use the hist function on J to calculate the relative frequencies.
We include the correction factor −1 because we had appended [SX(:) SY(:)] to
xy at the start. Lastly, we reorder the rows of fxy because the output of unique
produces the rows of U in a different order from [SX(:) SY(:) PXY(:)].
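The relative-frequency check performed by freqxy can be sketched in Python (our addition; rel_freq is our own helper), including the zero-count pairs that the appended [SX(:) SY(:)] rows account for in the MATLAB version.

```python
from collections import Counter

def rel_freq(pairs, support):
    """Relative frequency of each (x, y) in `support`, including pairs
    that never occur (analogous to freqxy's zero entries)."""
    counts = Counter(pairs)
    m = len(pairs)
    return {xy: counts.get(xy, 0) / m for xy in support}

support = [(x, y) for x in (800, 1200, 1600) for y in (400, 800, 1200)]
samples = [(800, 400)] * 3 + [(1200, 800)] * 5 + [(1600, 1200)] * 2
fxy = rel_freq(samples, support)
print(fxy[(1200, 800)])    # 0.5: five of the ten samples
```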

MATLAB provides the function stem3(x,y,z), where x, y, and z are length-n
vectors, for visualizing a bivariate PMF PX,Y(x, y) or for visualizing relative fre-
quencies of sample values of a pair of random variables. At each position x(i), y(i)
on the x,y plane, the function draws a stem of height z(i).

Example 5.29

Generate m = 10,000 samples of random variables X, Y of Example 5.26. Calculate
the relative frequencies and use stem3 to graph them.

The script imagestem.m generates the following relative frequency stem plot.

%imagestem.m
imagepmf;
xy=imagerv(10000);
fxy=freqxy(xy,SX,SY);
stem3(fxy(:,1), ...
    fxy(:,2),fxy(:,3));
xlabel('\it x');
ylabel('\it y');

[Relative frequency stem plot over the x, y grid, with stem heights close to PX,Y(x,y).]


Continuous Random Variables

For continuous random variables, MATLAB can be useful in a variety of ways. Some
of these are obvious. For example, a joint PDF fX,Y(x, y) or CDF FX,Y(x, y) can
be viewed using the function plot3. Figure 5.4 was generated this way. However,
for generating sample pairs of continuous random variables, there are no general
techniques such as the sample space grids we employed with discrete random vari-
ables.
When we introduced continuous random variables in Chapter 4, we also intro-
duced families of widely used random variables. In Section 4.8, we provided a
collection of MATLAB functions such as x=erlangrv(n,lambda,m) to generate m
samples from the corresponding PDF. However, for pairs of continuous random
variables, we introduced only one family of probability models, namely the bivari-
ate Gaussian random variables X and Y. For the bivariate Gaussian model, we
can use Theorem 5.21 and the randn function to generate sample values. The
command Z=randn(2,1) returns the vector Z = [Z1 Z2]' where Z1 and Z2 are
iid Gaussian (0, 1) random variables. Next we form the linear combinations

W1 = σ1 Z1,    (5.91a)

W2 = ρ σ2 Z1 + √(1 − ρ²) σ2 Z2.    (5.91b)

From Theorem 5.21 we know that W1 and W2 are a bivariate Gaussian pair. In
addition, from the formulas given in Theorem 5.21, we can show that E[W1] =
E[W2] = 0, Var[W1] = σ1², Var[W2] = σ2², and ρW1,W2 = ρ. This implies that

Xi = μi + Wi,    i = 1, 2,

is a pair of bivariate Gaussian random variables with E[Xi] = μi, Var[Xi] = σi²,
and ρX1,X2 = ρ. We implement this algorithm that transforms the iid pair Z1, Z2
into the bivariate Gaussian pair X1, X2 in the MATLAB function

xy=gauss2rv(mx,sdx,my,sdy,r,m)

The output xy is a 2 × m matrix in which each 2-element column is a sample of
a bivariate Gaussian pair X, Y with parameters μX = mx, μY = my, σX = sdx,
σY = sdy, and correlation coefficient ρX,Y = r.

function xy=gauss2rv(mx,sdx,my,sdy,r,m)
mu=[mx my]';
cxy=r*sdx*sdy;
C=[sdx^2 cxy; cxy sdy^2];
xy=gaussvector(mu,C,m);

In this code, mu holds the pair of means mx, my, and C is the covariance matrix of
the pair X1, X2. Each column of randn(2,m) is a pair Z1, Z2 of independent Gaussian
(0, 1) random variables; inside gaussvector, a calculation of the form A*randn(2,m)
implements Equation (5.91) for m different pairs Z1, Z2.


The sample output of gauss2rv shown here is produced with the commands

>> xy=gauss2rv(3,3,5,1,0.5,500);
>> plot(xy(1,:),xy(2,:),'.');

[Scatter plot of the 500 sample pairs.] We observe that the center of the cloud is
(μX, μY) = (3, 5). In addition, we note that the X and Y axes are scaled differently
because σX = 3 and σY = 1.
We observe that this example with ρX,Y = 0.5 produces random variables that are
less correlated than the examples in Figure 5.5 with |ρ| = 0.9.
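The transformation of Equation (5.91) can also be sketched outside MATLAB. This Python function (our addition; this gauss2rv is our stdlib-only analogue of the MATLAB function, returning a list of pairs rather than a 2 × m matrix) generates bivariate Gaussian sample pairs directly from iid Gaussian (0, 1) samples.

```python
import math
import random

def gauss2rv(mx, sdx, my, sdy, r, m, seed=None):
    """Generate m sample pairs of bivariate Gaussian (X, Y) with means
    mx, my, standard deviations sdx, sdy, and correlation coefficient r,
    using the linear combinations of Equation (5.91)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(m):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        w1 = sdx * z1                                        # (5.91a)
        w2 = r * sdy * z1 + math.sqrt(1 - r * r) * sdy * z2  # (5.91b)
        pairs.append((mx + w1, my + w2))
    return pairs

# Same parameters as the MATLAB sample run, but with more samples
xy = gauss2rv(3, 3, 5, 1, 0.5, 50_000, seed=1)
mean_x = sum(x for x, _ in xy) / len(xy)
mean_y = sum(y for _, y in xy) / len(xy)
print(round(mean_x, 2), round(mean_y, 2))   # near (3, 5)
```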
We note that bivariate Gaussian random variables are a special case of n-
dimensional Gaussian random vectors, which are introduced in Chapter 8. Based
on linear algebra techniques, Chapter 8 introduces the gaussvector function to
generate samples of Gaussian random vectors; it generalizes gauss2rv to n di-
mensions.
Beyond bivariate Gaussian pairs, there exist a variety of techniques for generat-
ing sample values of pairs of continuous random variables of specific types. A basic
approach is to generate X based on the marginal PDF fX(x) and then generate
Y using a conditional probability model that depends on the value of X. Condi-
tional probability models and MATLAB techniques that employ these models are
the subject of Chapter 7.

Difficulty:   Easy   ·   Moderate   ·   Difficult   ·   Experts Only

5.1.1 Random variables X and Y have the joint CDF

FX,Y(x, y) = { (1 − e^{−x})(1 − e^{−y})   x ≥ 0, y ≥ 0,
               0                          otherwise.

(a) What is P[X ≤ 2, Y ≤ 3]?
(b) What is the marginal CDF, FX(x)?
(c) What is the marginal CDF, FY(y)?

5.1.2 Express the following extreme values of FX,Y(x, y) in terms of the marginal
cumulative distribution functions FX(x) and FY(y).
(a) FX,Y(x, −∞)
(b) FX,Y(x, ∞)
(c) FX,Y(−∞, ∞)
(d) FX,Y(−∞, y)
(e) FX,Y(∞, y)

5.1.3 For continuous random variables X, Y with joint CDF FX,Y(x, y) and
marginal CDFs FX(x) and FY(y), find P[x1 ≤ X < x2 ∪ y1 ≤ Y < y2]. This is the
probability of the shaded "cross" region in the accompanying diagram. [Diagram of
the cross-shaped region in the x, y plane.]


5.1.4 Random variables X and Y have CDFs FX(x) and FY(y). Is F(x, y) =
FX(x)FY(y) a valid CDF? Explain your answer.

5.1.5 In this problem, we prove Theorem 5.2.
(a) Sketch the following events on the X, Y plane:

A = {X ≤ x1, y1 < Y ≤ y2},
B = {x1 < X ≤ x2, Y ≤ y1},
C = {x1 < X ≤ x2, y1 < Y ≤ y2}.

(b) Express the probability of the events A, B, and A ∪ B ∪ C in terms of the
joint CDF FX,Y(x, y).
(c) Use the observation that events A, B, and C are mutually exclusive to prove
Theorem 5.2.

5.1.6 Can the following function be the joint CDF of random variables X and Y?

F(x, y) = { 1 − e^{−(x+y)}   x ≥ 0, y ≥ 0,
            0                otherwise.

5.2.1 Random variables X and Y have the joint PMF

PX,Y(x, y) = { cxy   x = 1, 2, 4; y = 1, 3,
               0     otherwise.

(a) What is the value of the constant c?
(b) What is P[Y < X]?
(c) What is P[Y > X]?
(d) What is P[Y = X]?
(e) What is P[Y = 3]?

5.2.2 Random variables X and Y have the joint PMF

PX,Y(x, y) = { c|x + y|   x = −2, 0, 2; y = −1, 0, 1,
               0          otherwise.

(a) What is the value of the constant c?
(b) What is P[Y < X]?
(c) What is P[Y > X]?
(d) What is P[Y = X]?
(e) What is P[X < 1]?

5.2.3 Test two integrated circuits. In each test, the probability of rejecting the circuit
is p, independent of the other test. Let X be the number of rejects (either 0 or 1) in
the first test and let Y be the number of rejects in the second test. Find the joint
PMF PX,Y(x, y).

5.2.4 For two independent flips of a fair coin, let X equal the total number of tails
and let Y equal the number of heads on the last flip. Find the joint PMF PX,Y(x, y).

5.2.5 In Figure 5.2, the axes of the figures are labeled X and Y because the figures
depict possible values of the random variables X and Y. However, the figure at the
end of Example 5.3 depicts PX,Y(x, y) on axes labeled with lowercase x and y. Should
those axes be labeled with the uppercase X and Y? Hint: Reasonable arguments can
be made for both views.

5.2.6 As a generalization of Example 5.3, consider a test of n circuits such that each
circuit is acceptable with probability p, independent of the outcome of any other test.
Show that the joint PMF of X, the number of acceptable circuits, and Y, the number
of acceptable circuits found before observing the first reject, is

PX,Y(x, y) = { (n−y−1 choose x−y) p^x (1 − p)^{n−x}   0 ≤ y ≤ x < n,
               p^n                                     x = y = n,
               0                                       otherwise.

Hint: For 0 ≤ y ≤ x < n, show that {X = x, Y = y} = A ∩ B ∩ C, where
A: The first y tests are acceptable.
B: Test y + 1 is a rejection.
C: The remaining n − y − 1 tests yield x − y acceptable circuits.

5.2.7 With two minutes left in a five-minute overtime, the score is 0–0 in a Rutgers
soccer match versus Villanova. (Note that the overtime is NOT sudden-death.) In
the next-to-last minute of the game, either (1) Rutgers scores a goal with probability
p = 0.2, (2) Villanova scores with probability p = 0.2, or (3) neither team scores with
probability 1 − 2p = 0.6. If neither team scores in the next-to-last minute, then in
the final minute, either (1) Rutgers scores a goal with probability q = 0.3, (2)
Villanova scores with probability q = 0.3, or (3) neither team scores with probability
1 − 2q = 0.4. However, if a team scores in the next-to-last minute, the trailing team
goes for broke so that in the last minute, either (1) the leading team scores with prob-
ability 0.5, or (2) the trailing team scores with probability 0.5. For the final two min-
utes of overtime:
(a) Sketch a probability tree and construct a table for PR,V(r, v), the joint PMF of
R, the number of Rutgers goals scored, and V, the number of Villanova goals
scored.
(b) What is the probability P[T] that the overtime ends in a tie?
(c) What is the PMF of R, the number of goals scored by Rutgers?
(d) What is the PMF of G, the total number of goals scored?

5.2.8 Each test of an integrated circuit produces an acceptable circuit with proba-
bility p, independent of the outcome of the test of any other circuit. In testing n cir-
cuits, let K denote the number of circuits rejected and let X denote the number of ac-
ceptable circuits (either 0 or 1) in the last test. Find the joint PMF PK,X(k, x).

5.2.9 Each test of an integrated circuit produces an acceptable circuit with proba-
bility p, independent of the outcome of the test of any other circuit. In testing n cir-
cuits, let K denote the number of circuits rejected and let X denote the number of
acceptable circuits that appear before the first reject is found. Find the joint PMF
PK,X(k, x).

5.3.1 Given the random variables X and Y in Problem 5.2.1, find
(a) The marginal PMFs PX(x) and PY(y),
(b) The expected values E[X] and E[Y],
(c) The standard deviations σX and σY.

5.3.2 Given the random variables X and Y in Problem 5.2.2, find
(a) The marginal PMFs PX(x) and PY(y),
(b) The expected values E[X] and E[Y],
(c) The standard deviations σX and σY.

5.3.3 For n = 0, 1, ... and 0 ≤ k ≤ 100, the joint PMF of random variables N and
K is

PN,K(n, k) = (100^n e^{−100} / n!) (100 choose k) p^k (1 − p)^{100−k}.

Otherwise, PN,K(n, k) = 0. Find the marginal PMFs PN(n) and PK(k).

5.3.4 Random variables X and Y have joint PMF

PX,Y(x, y) = { 1/21   x = 0, 1, 2, 3, 4, 5; y = 0, 1, ..., x,
               0      otherwise.

Find the marginal PMFs PX(x) and PY(y) and the expected values E[X] and E[Y].

5.3.5 Random variables N and K have the joint PMF

PN,K(n, k) = { (1 − p)^{n−1} p / n   k = 1, ..., n; n = 1, 2, ...,
               0                     otherwise.

Find the marginal PMFs PN(n) and PK(k).

5.3.6 Random variables N and K have the joint PMF

PN,K(n, k) = { 100^n e^{−100} / (n + 1)!   k = 0, 1, ..., n; n = 0, 1, ...,
               0                           otherwise.

Find the marginal PMF PN(n). Show that the marginal PMF PK(k) satisfies
PK(k) = P[N > k]/100.

5.4.1 Random variables X and Y have the joint PDF

fX,Y(x, y) = { c   x ≥ 0, y ≥ 0, x + y ≤ 1,
               0   otherwise.

(a) What is the value of the constant c?
(b) What is P[X ≤ Y]?
(c) What is P[X + Y ≤ 1/2]?

5.4.2 Random variables X and Y have joint PDF

fX,Y(x, y) = { cxy²   0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
               0      otherwise.

(a) Find the constant c.
(b) Find P[X > Y] and P[Y < X²].
(c) Find P[min(X, Y) ≤ 1/2].
(d) Find P[max(X, Y) ≤ 3/4].

5.4.3 Random variables X and Y have joint PDF

fX,Y(x, y) = { 6e^{−(2x+3y)}   x ≥ 0, y ≥ 0,
               0               otherwise.

(a) Find P[X > Y] and P[X + Y ≤ 1].
(b) Find P[min(X, Y) ≥ 1].
(c) Find P[max(X, Y) ≤ 1].

5.4.4 Random variables X and Y have joint PDF

fX,Y(x, y) = { 8xy   0 ≤ y ≤ x ≤ 1,
               0     otherwise.

Following the method of Example 5.8, find the joint CDF FX,Y(x, y).

5.5.1 Random variables X and Y have the joint PDF

fX,Y(x, y) = { 1/2   −1 ≤ x ≤ y ≤ 1,
               0     otherwise.

Sketch the region of nonzero probability and answer the following questions.
(a) What is P[X > 0]?
(b) What is fX(x)?
(c) What is E[X]?

5.5.2 Random variables X and Y have joint PDF

fX,Y(x, y) = { cx   0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
               0    otherwise.

(a) Find the constant c.
(b) Find the marginal PDF fX(x).
(c) Are X and Y independent? Justify your answer.

5.5.3 X and Y are random variables with the joint PDF

fX,Y(x, y) = { 2   x + y ≤ 1, x ≥ 0, y ≥ 0,
               0   otherwise.

(a) What is the marginal PDF fX(x)?
(b) What is the marginal PDF fY(y)?

5.5.4 Over the circle X² + Y² ≤ r², random variables X and Y have the uniform
PDF

fX,Y(x, y) = { 1/(πr²)   x² + y² ≤ r²,
               0         otherwise.

(a) What is the marginal PDF fX(x)?
(b) What is the marginal PDF fY(y)?

5.5.5 X and Y are random variables with the joint PDF

fX,Y(x, y) = { 5x²/2   −1 ≤ x ≤ 1, 0 ≤ y ≤ x²,
               0       otherwise.

(a) What is the marginal PDF fX(x)?
(b) What is the marginal PDF fY(y)?


5.5.6 Over the circle X² + Y² ≤ r², random variables X and Y have the PDF

f_X,Y(x,y) = { 2|xy|/r⁴   x² + y² ≤ r²,
            { 0          otherwise.

(a) What is the marginal PDF f_X(x)?
(b) What is the marginal PDF f_Y(y)?

5.5.7 For a random variable X, let Y = aX + b. Show that if a > 0 then ρ_X,Y = 1. Also show that if a < 0, then ρ_X,Y = −1.

5.5.8 Random variables X and Y have joint PDF

f_X,Y(x,y) = { (x + y)/3   0 ≤ x ≤ 1, 0 ≤ y ≤ 2,
            { 0           otherwise.

(a) Find the marginal PDFs f_X(x) and f_Y(y).
(b) What are E[X] and Var[X]?
(c) What are E[Y] and Var[Y]?

5.5.9 Random variables X and Y have the joint PDF

f_X,Y(x,y) = { cy   0 ≤ y ≤ x ≤ 1,
            { 0    otherwise.

(a) Draw the region of nonzero probability.
(b) What is the value of the constant c?
(c) What is F_X(x)?
(d) What is F_Y(y)?
(e) What is P[Y < X/2]?

5.6.1 An ice cream company needs to order ingredients from its suppliers. Depending on the size of the order, the weight of the shipment can be either

1 kg for a small order,
2 kg for a big order.

The company has three different suppliers. The vanilla supplier is 20 miles away. The chocolate supplier is 100 miles away. The strawberry supplier is 300 miles away. An experiment consists of monitoring an order and observing W, the weight of the order, and D, the distance the shipment must be sent. The following probability model describes the experiment:

        van.   choc.   straw.
small   0.2    0.2     0.2
big     0.1    0.2     0.1

(a) What is the joint PMF P_W,D(w,d) of the weight and the distance?
(b) Find the expected shipping distance E[D].
(c) Are W and D independent?

5.6.2 A company receives shipments from two factories. Depending on the size of the order, a shipment can be in

1 box for a small order,
2 boxes for a medium order,
3 boxes for a large order.

The company has two different suppliers. Factory Q is 60 miles from the company. Factory R is 180 miles from the company. An experiment consists of monitoring a shipment and observing B, the number of boxes, and M, the number of miles the shipment travels. The following probability model describes the experiment:

               Factory Q   Factory R
small order    0.3         0.2
medium order   0.1         0.2
large order    0.1         0.1

(a) Find P_B,M(b,m), the joint PMF of the number of boxes and the distance.
(b) What is E[B], the expected number of boxes?
(c) Are B and M independent?

5.6.3 Observe 100 independent flips of a fair coin. Let X equal the number of heads in the first 75 flips. Let Y equal the number of heads in the remaining 25 flips. Find P_X(x) and P_Y(y). Are X and Y independent? Find P_X,Y(x,y).

5.6.4 Observe independent flips of a fair coin until heads occurs twice. Let X1 equal the number of flips up to and including the

first H. Let X2 equal the number of additional flips up to and including the second H. What are P_X1(x1) and P_X2(x2)? Are X1 and X2 independent? Find P_X1,X2(x1,x2).

5.6.5 X is the continuous uniform (0, 2) random variable. Y has the continuous uniform (0, 5) PDF, independent of X. What is the joint PDF f_X,Y(x,y)?

5.6.6 X1 and X2 are independent random variables such that Xi has PDF

f_Xi(x) = { λi e^(−λi x)   x ≥ 0,
          { 0              otherwise.

What is P[X2 < X1]?

5.6.7 In terms of a positive constant k, random variables X and Y have joint PDF

f_X,Y(x,y) = { k + 3x²   −1/2 ≤ x ≤ 1/2, −1/2 ≤ y ≤ 1/2,
            { 0         otherwise.

(a) What is k?
(b) What is the marginal PDF of X?
(c) What is the marginal PDF of Y?
(d) Are X and Y independent?

5.6.8 X1 and X2 are independent, identically distributed random variables with PDF

f_X(x) = { x/2   0 ≤ x ≤ 2,
         { 0     otherwise.

(a) Find the CDF, F_X(x).
(b) What is P[X1 ≤ 1, X2 ≤ 1], the probability that X1 and X2 are both less than or equal to 1?
(c) Let W = max(X1, X2). What is F_W(1), the CDF of W evaluated at w = 1?
(d) Find the CDF F_W(w).

5.6.9 Prove that random variables X and Y are independent if and only if

F_X,Y(x,y) = F_X(x) F_Y(y).

5.7.1 Continuing Problem 5.6.1, the price per kilogram for shipping the order is one cent per mile. C cents is the shipping cost of one order. What is E[C]?

5.7.2 Continuing Problem 5.6.2, the price per mile of shipping each box is one cent per mile the box travels. C cents is the price of one shipment. What is E[C], the expected price of one shipment?

5.7.3 A random ECE sophomore has height X (rounded to the nearest foot) and GPA Y (rounded to the nearest integer). These random variables have joint PMF

P_X,Y(x,y)   y = 1   y = 2   y = 3   y = 4
x = 5        0.05    0.1     0.2     0.05
x = 6        0.1     0.1     0.3     0.1

Find E[X + Y] and Var[X + Y].

5.7.4 X and Y are independent, identically distributed random variables with

P_X(k) = P_Y(k) = { 3/4   k = 0,
                  { 1/4   k = 20,
                  { 0     otherwise.

Find the following quantities:

E[X], Var[X], E[X + Y], Var[X + Y], E[XY 2^(XY)].

5.7.5 X and Y are random variables with E[X] = E[Y] = 0 and Var[X] = 1, Var[Y] = 4 and correlation coefficient ρ = 1/2. Find Var[X + Y].

5.7.6 X and Y are random variables such that X has expected value μX = 0 and standard deviation σX = 3 while Y has expected value μY = 1 and standard deviation σY = 4. In addition, X and Y have covariance Cov[X,Y] = −3. Find the expected value and variance of W = 2X + 2Y.

5.7.7 Observe independent flips of a fair coin until heads occurs twice. Let X1 equal the number of flips up to and including the first H. Let X2 equal the number of additional flips up to and including the sec-


ond H. Let Y = X1 − X2. Find E[Y] and Var[Y]. Hint: Don't try to find P_Y(y).

5.7.8 X1 and X2 are independent identically distributed random variables with expected value E[X] and variance Var[X].

(a) What is E[X1 − X2]?
(b) What is Var[X1 − X2]?

5.7.9 X and Y are identically distributed random variables with E[X] = E[Y] = 0 and covariance Cov[X,Y] = 3 and correlation coefficient ρ_X,Y = 1/2. For nonzero constants a and b, U = aX and V = bY.

(a) Find Cov[U,V].
(b) Find the correlation coefficient ρ_U,V.
(c) Let W = U + V. For what values of a and b are X and W uncorrelated?

5.7.10 True or False: For identically distributed random variables Y1 and Y2 with E[Y1] = E[Y2] = 0, Var[Y1 + Y2] ≥ Var[Y1].

5.7.11 X and Y are random variables with E[X] = E[Y] = 0 such that X has standard deviation σX = 2 while Y has standard deviation σY = 4.

(a) For V = X − Y, what are the smallest and largest possible values of Var[V]?
(b) For W = X − 2Y, what are the smallest and largest possible values of Var[W]?

5.7.12 Random variables X and Y have joint PDF

f_X,Y(x,y) = { 4xy   0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
            { 0     otherwise.

(a) What are E[X] and Var[X]?
(b) What are E[Y] and Var[Y]?
(c) What is Cov[X,Y]?
(d) What is E[X + Y]?
(e) What is Var[X + Y]?

5.7.13 Random variables X and Y have joint PDF

f_X,Y(x,y) = { 5x²/2   −1 ≤ x ≤ 1, 0 ≤ y ≤ x²,
            { 0       otherwise.

Answer the following questions.

(a) What are E[X] and Var[X]?
(b) What are E[Y] and Var[Y]?
(c) What is Cov[X,Y]?
(d) What is E[X + Y]?
(e) What is Var[X + Y]?

5.7.14 Random variables X and Y have joint PDF

f_X,Y(x,y) = { 2   0 ≤ y ≤ x ≤ 1,
            { 0   otherwise.

(a) What are E[X] and Var[X]?
(b) What are E[Y] and Var[Y]?
(c) What is Cov[X,Y]?
(d) What is E[X + Y]?
(e) What is Var[X + Y]?

5.7.15 A transmitter sends a signal X and a receiver makes the observation Y = X + Z, where Z is a receiver noise that is independent of X and E[X] = E[Z] = 0. Since the average power of the signal is E[X²] and the average power of the noise is E[Z²], a quality measure for the received signal is the signal-to-noise ratio

Γ = E[X²]/E[Z²].

How is Γ related to the correlation coefficient ρ_X,Y?

5.8.1 X and Z are independent random variables with E[X] = E[Z] = 0 and variance Var[X] = 1 and Var[Z] = 16. Let Y = X + Z. Find the correlation coefficient ρ of X and Y. Are X and Y independent?

5.8.2 For the random variables X and Y in Problem 5.2.1, find

(a) The expected value of W = Y/X,
(b) The correlation, r_X,Y = E[XY],
(c) The covariance, Cov[X,Y],


(d) The correlation coefficient, ρ_X,Y,
(e) The variance of X + Y, Var[X + Y].

(Refer to the results of Problem 5.3.1 to answer some of these questions.)

5.8.3 For the random variables X and Y in Problem 5.2.2 find

(a) The expected value of W = 2^(XY),
(b) The correlation, r_X,Y = E[XY],
(c) The covariance, Cov[X,Y],
(d) The correlation coefficient, ρ_X,Y,
(e) The variance of X + Y, Var[X + Y].

(Refer to the results of Problem 5.3.2 to answer some of these questions.)

5.8.4 Let H and B be the random variables in Quiz 5.3. Find r_H,B and Cov[H,B].

5.8.5 X and Y are independent random variables with PDFs

f_X(x) = { (1/3)e^(−x/3)   x ≥ 0,
         { 0               otherwise,

f_Y(y) = { (1/2)e^(−y/2)   y ≥ 0,
         { 0               otherwise.

(a) Find the correlation r_X,Y.
(b) Find the covariance Cov[X,Y].

5.8.6 The random variables X and Y have joint PMF P_X,Y(x,y) given by a grid of points (x,y) with 0 ≤ x ≤ 4 and 1 ≤ y ≤ 4. [Figure: grid of PMF values; the nonzero entries are the fractions 1/16, 1/12, and 1/8.] Find

(a) The expected values E[X] and E[Y],
(b) The variances Var[X] and Var[Y],
(c) The correlation, r_X,Y = E[XY],
(d) The covariance, Cov[X,Y],
(e) The correlation coefficient, ρ_X,Y.

5.8.7 For X and Y with PMF P_X,Y(x,y) given in Problem 5.8.6, let W = min(X,Y) and V = max(X,Y). Find

(a) The expected values, E[W] and E[V],
(b) The variances, Var[W] and Var[V],
(c) The correlation, r_W,V,
(d) The covariance, Cov[W,V],
(e) The correlation coefficient, ρ_W,V.

5.8.8 Random variables X and Y have joint PDF

f_X,Y(x,y) = { 1/2   −1 ≤ x ≤ y ≤ 1,
            { 0     otherwise.

Find r_X,Y and E[e^(X+Y)].

5.8.9 This problem outlines a proof of Theorem 5.13.

(a) Show that

X̂ − E[X̂] = a(X − E[X]),
Ŷ − E[Ŷ] = c(Y − E[Y]).

(b) Use part (a) to show that

Cov[X̂, Ŷ] = ac Cov[X,Y].

(c) Show that Var[X̂] = a² Var[X] and Var[Ŷ] = c² Var[Y].

(d) Combine parts (b) and (c) to relate ρ_X̂,Ŷ and ρ_X,Y.

5.8.10 Random variables N and K have the joint PMF

P_N,K(n,k) = { (1−p)^(n−1) p/n   k = 1,...,n; n = 1, 2, ...,
             { 0                 otherwise.

Find the marginal PMF P_N(n) and the expected values E[N], Var[N], E[N²], E[K], Var[K], E[N + K], r_N,K, Cov[N,K].


5.9.1 Random variables X and Y have joint PDF

f_X,Y(x,y) = c e^(−(x²/8)−(y²/18)).

What is the constant c? Are X and Y independent?

5.9.2 X is the Gaussian (μ = 1, σ = 2) random variable. Y is the Gaussian (μ = 2, σ = 4) random variable. X and Y are independent.

(a) What is the PDF of V = X + Y?
(b) What is the PDF of W = 3X + 2Y?

5.9.3 TRUE OR FALSE: X1 and X2 are bivariate Gaussian random variables. For any constant y, there exists a constant a such that P[X1 + aX2 ≤ y] = 1/2.

5.9.4 X1 and X2 are identically distributed Gaussian (0, 1) random variables. Moreover, they are jointly Gaussian. Under what conditions are X1, X2 and X1 + X2 identically distributed?

5.9.5 Random variables X and Y have joint PDF

f_X,Y(x,y) = c e^(−(2x² − 4xy + 4y²)).

(a) What are E[X] and E[Y]?
(b) Find the correlation coefficient ρ_X,Y.
(c) What are Var[X] and Var[Y]?
(d) What is the constant c?
(e) Are X and Y independent?

5.9.6 An archer shoots an arrow at a circular target of radius 50 cm. The arrow pierces the target at a random position (X, Y), measured in centimeters from the center of the disk at position (X, Y) = (0, 0). The bullseye is a solid black circle of radius 2 cm, at the center of the target. Calculate the probability P[B] of the event that the archer hits the bullseye under each of the following models:

(a) X and Y are iid continuous uniform (−50, 50) random variables.
(b) The PDF f_X,Y(x,y) is uniform over the 50 cm circular target.
(c) X and Y are iid Gaussian (μ = 0, σ = 10) random variables.

5.9.7 A person's white blood cell (WBC) count W (measured in thousands of cells per microliter of blood) and body temperature T (in degrees Celsius) can be modeled as bivariate Gaussian random variables such that W is Gaussian (7, 2) and T is Gaussian (37, 1). To determine whether a person is sick, first the person's temperature T is measured. If T > 38, then the person's WBC count is measured. If W > 10, the person is declared ill (event I).

(a) Suppose W and T are uncorrelated. What is P[I]? Hint: Draw a tree diagram for the experiment.
(b) Now suppose W and T have correlation coefficient ρ_W,T = 1/√2. Find the conditional probability P[I|T = t] that a person is declared ill given that the person's temperature is T = t.

5.9.8 Suppose your grade in a probability course depends on your exam scores X1 and X2. The professor, a fan of probability, releases exam scores in a normalized fashion such that X1 and X2 are iid Gaussian (μ = 0, σ = √2) random variables. Your semester average is X = 0.5(X1 + X2).

(a) You earn an A grade if X > 1. What is P[A]?
(b) To improve his SIRS (Student Instructional Rating Service) score, the professor decides he should award more A's. Now you get an A if max(X1, X2) > 1. What is P[A] now?
(c) The professor found out he is unpopular at ratemyprofessor.com and decides to award an A if either X > 1 or max(X1, X2) > 1. Now what is P[A]?
(d) Under criticism of grade inflation from the department chair, the professor adopts a new policy. An A is awarded if max(X1, X2) > 1 and min(X1, X2) > 0. Now what is P[A]?


5.9.9 Your course grade depends on two test scores: X1 and X2. Your score Xi on test i is a Gaussian (μ = 74, σ = 16) random variable, independent of any other test score.

(a) With equal weighting, grades are determined by Y = X1/2 + X2/2. You earn an A if Y > 90. What is P[A] = P[Y > 90]?
(b) A student asks the professor to choose a weight factor w, 0 ≤ w ≤ 1, such that

Y = wX1 + (1 − w)X2.

Find P[A] as a function of the weight w. What value or values of w maximize P[A] = P[Y > 90]?
(c) A different student proposes that the better exam is the one that should count and that grades should be based on M = max(X1, X2). In a fit of generosity, the professor agrees! Now what is P[A] = P[M > 90]?
(d) How generous was the professor? In a class of 100 students, what is the expected increase in the number of A's awarded?

5.9.10 Under what conditions on the constants a, b, c, and d is

f(x,y) = d e^(−(ax² + bxy + cy²))

a joint Gaussian PDF?

5.9.11 Show that the joint Gaussian PDF f_X,Y(x,y) given by Definition 5.10 satisfies

∫_−∞^∞ ∫_−∞^∞ f_X,Y(x,y) dx dy = 1.

Hint: Use Equation (5.68) and the result of Problem 4.6.13.

5.9.12 Random variables X1 and X2 are independent identical Gaussian (0, 1) random variables. Let

Y1 = X1 sgn(X2),    Y2 = X2 sgn(X1),

where

sgn(x) = { 1    x ≥ 0,
         { −1   x < 0.

(a) Find the CDF F_Y1(y1) in terms of the Φ(·) function.
(b) Show that Y1 and Y2 are both Gaussian random variables.
(c) Are Y1 and Y2 bivariate Gaussian random variables?

5.10.1 Every laptop returned to a repair center is classified according to its needed repairs: (1) LCD screen, (2) motherboard, (3) keyboard, or (4) other. A random broken laptop needs a type i repair with probability p_i = 2^(4−i)/15. Let N_i equal the number of type i broken laptops returned on a day in which four laptops are returned.

(a) Find the joint PMF of N1, N2, N3, N4.
(b) What is the probability that two laptops require LCD repairs?
(c) What is the probability that more laptops require motherboard repairs than keyboard repairs?

5.10.2 When ordering a personal computer, a customer can add the following features to the basic configuration: (1) additional memory, (2) flat panel display, (3) professional software, and (4) wireless modem. A random computer order has feature i with probability p_i = 2^(−i) independent of other features. In an hour in which three computers are ordered, let N_i equal the number of computers with feature i.

(a) Find the joint PMF of N1, N2, N3, N4.
(b) What is the probability of selling a computer with no additional features?
(c) What is the probability of selling a computer with at least three additional features?

5.10.3 The random variables X1, ..., Xn have the joint PDF

f_X1,...,Xn(x1,...,xn) = { 1   0 ≤ xi ≤ 1, i = 1,...,n,
                        { 0   otherwise.

Find

(a) The joint CDF, F_X1,...,Xn(x1,...,xn),
(b) P[min(X1, X2, X3) ≤ 3/4].

5.10.4 Are N1, N2, N3, N4 in Problem 5.10.1 independent?

5.10.5 In a compressed data file of 10,000 bytes, each byte is equally likely to be any one of 256 possible characters b0, ..., b255 independent of any other byte. If Ni is the number of times bi appears in the file, find the joint PMF of N0, ..., N255. Also, what is the joint PMF of N0 and N1?

5.10.6 In Example 5.22, we derived the joint PMF of the number of pages in each of four downloads:

P_X,Y,Z(x,y,z) = (4!/(x! y! z!)) (1/3)^x (1/2)^y (1/6)^z.

(a) In a group of four downloads, what is the PMF of the number of 3-page documents?
(b) In a group of four downloads, what is the expected number of 3-page documents?
(c) Given that there are two 3-page documents in a group of four, what is the joint PMF of the number of 1-page documents and the number of 2-page documents?
(d) Given that there are two 3-page documents in a group of four, what is the expected number of 1-page documents?
(e) In a group of four downloads, what is the joint PMF of the number of 1-page documents and the number of 2-page documents?

5.10.7 X1, X2, X3 are iid exponential (λ) random variables. Find:

(a) the PDF of V = min(X1, X2, X3),
(b) the PDF of W = max(X1, X2, X3).

5.10.8 In a race of 10 sailboats, the finishing times of all boats are iid Gaussian random variables with expected value 35 minutes and standard deviation 5 minutes.

(a) What is the probability that the winning boat will finish the race in less than 25 minutes?
(b) What is the probability that the last boat will cross the finish line in more than 50 minutes?
(c) Given this model, what is the probability that a boat will finish before it starts (negative finishing time)?

5.10.9 Random variables X1, X2, ..., Xn are iid; each Xj has CDF F_X(x) and PDF f_X(x). Consider

Ln = min(X1, ..., Xn),    Un = max(X1, ..., Xn).

In terms of F_X(x) and/or f_X(x):

(a) Find the CDF F_Un(u).
(b) Find the CDF F_Ln(l).
(c) Find the joint CDF F_Ln,Un(l,u).

5.10.10 Suppose you have n suitcases and suitcase i holds Xi dollars where X1, X2, ..., Xn are iid continuous uniform (0, m) random variables. (Think of a number like one million for the symbol m.) Unfortunately, you don't know Xi until you open suitcase i.

Suppose you can open the suitcases one by one, starting with suitcase n, and going down to suitcase 1. After opening suitcase i, you can either accept or reject Xi dollars. If you accept suitcase i, the game ends. If you reject, then you get to choose only from the still unopened suitcases.

What should you do? Perhaps it is not so obvious? In fact, you can decide before the game on a policy, a set of rules to follow. We will specify a policy by a vector (τ1, ..., τn) of threshold parameters.


After opening suitcase i, you accept the amount Xi if Xi > τi. Otherwise, you reject suitcase i and open suitcase i − 1.

If you have rejected suitcases n down through 2, then you must accept the amount X1 in suitcase 1. Thus the threshold τ1 = 0 since you never reject the amount in the last suitcase.

(a) Suppose you reject suitcases n through i + 1, but then you accept suitcase i. Find E[Xi | Xi > τi].
(b) Let Wk denote your reward given that there are k unopened suitcases remaining. What is E[W1]?
(c) As a function of τk, find a recursive relationship for E[Wk] in terms of τk and E[W(k−1)].
(d) For n = 4 suitcases, find the policy (τ1*, ..., τ4*) that maximizes E[W4].

5.10.11 Given the set {U1, ..., Un} of iid uniform (0, T) random variables, we define Xk as the kth "smallest" element of the set. That is, X1 is the minimum element, X2 is the second smallest, and so on, up to Xn, which is the maximum element of {U1, ..., Un}. Note that X1, ..., Xn are known as the order statistics of U1, ..., Un. Prove that

f_X1,...,Xn(x1,...,xn) = { n!/T^n   0 ≤ x1 < ··· < xn ≤ T,
                        { 0        otherwise.

5.11.1 For random variables X and Y in Example 5.26, use MATLAB to generate a list of the form

x1   y1   P_X,Y(x1,y1)
x2   y2   P_X,Y(x2,y2)
...

that includes all possible pairs (x,y).

5.11.2 For random variables X and Y in Example 5.26, use MATLAB to calculate E[X], E[Y], the correlation E[XY], and the covariance Cov[X,Y].

5.11.3 You generate random variable W by typing W=sum(4*randn(1,2)) in a MATLAB Command window. What is Var[W]?

5.11.4 Write trianglecdfplot.m, a script that graphs F_X,Y(x,y) of Figure 5.4.

5.11.5 Problem 5.2.6 extended Example 5.3 to a test of n circuits and identified the joint PDF of X, the number of acceptable circuits, and Y, the number of successful tests before the first reject. Write a MATLAB function

[SX,SY,PXY]=circuits(n,p)

that generates the sample space grid for the n circuit test. Check your answer against Equation (5.11) for the p = 0.9 and n = 2 case. For p = 0.9 and n = 50, calculate the correlation coefficient ρ_X,Y.

Probability Models of Derived Random Variables

There are many situations in which we observe one or more random variables and use their values to compute a new random variable. For example, when the voltage across an r0-ohm resistor is a random variable X, the power dissipated in that resistor is Y = X²/r0. Circuit designers need a probability model for Y to evaluate the power consumption of the circuit. Similarly, if the amplitude (current or voltage) of a radio signal is X, the received signal power is proportional to Y = X². A probability model for Y is essential in evaluating the performance of a radio receiver. The output of a limiter or rectifier is another random variable that a circuit designer may need to analyze.

Radio systems also provide practical examples of functions of two random variables. For example, we can describe the amplitude of the signal transmitted by a radio station as a random variable, X. We can describe the attenuation of the signal as it travels to the antenna of a moving car as another random variable, Y. In this case the amplitude of the signal at the radio receiver in the car is the random variable W = X/Y. Other practical examples appear in cellular telephone base stations with two antennas. The amplitudes of the signals arriving at the two antennas are modeled as random variables X and Y. The radio receiver connected to the two antennas can use the received signals in a variety of ways.

It can choose the signal with the larger amplitude and ignore the other one. In this case, the receiver produces the random variable W = X if |X| > |Y| and W = Y otherwise. This is an example of selection diversity combining.

The receiver can add the two signals and use W = X + Y. This process is referred to as equal gain combining because it treats both signals equally.

A third alternative is to combine the signals unequally in order to give less weight to the signal considered to be more distorted. In this case W = aX + bY. If a and b are optimized, the receiver performs maximal ratio combining.

All three combining processes appear in practical radio receivers.

Formally, we have the following situations.

We perform an experiment and observe a sample value of random variable X. Based on our knowledge of the experiment, we have a probability model for X embodied in the PMF P_X(x) or PDF f_X(x). After performing the experiment, we calculate a sample value of the random variable W = g(X).

We perform an experiment and observe sample values of two random variables X and Y. Based on our knowledge of the experiment, we have a probability model for X and Y embodied in a joint PMF P_X,Y(x,y) or a joint PDF f_X,Y(x,y). After performing the experiment, we calculate a sample value of the random variable W = g(X,Y).

In both cases, the mathematical problem is to determine the properties of W.

Previous chapters address aspects of this problem. Theorem 3.9 provides a formula for P_W(w), the PMF of W = g(X), and Theorem 3.10 provides a formula for E[W] given P_X(x) and g(X). Chapter 4, on continuous random variables, provides, in Theorem 4.4, a formula for E[W] given f_X(x) and g(X) but defers to this chapter examining the probability model of W. Similarly, Chapter 5 examines E[g(X,Y)] but does not explain how to find the PMF or PDF of W = g(X,Y). In this chapter, we develop methods to derive the distribution (PMF, CDF or PDF) of a function of one or two random variables.

Prior chapters have a lot of new ideas and concepts, each illustrated by a relatively small number of examples. In contrast, this chapter has relatively few new concepts but many examples to illustrate the techniques. In particular, Sections 6.2 and 6.3 advocate a single approach: find the CDF F_W(w) = P[W ≤ w] by finding those values of X such that W = g(X) ≤ w. Similarly, Section 6.4 uses the same basic idea: find those values of X, Y such that W = g(X,Y) ≤ w. While this idea is simple, the derivations can be complicated.

6.1 PMF of a Function of Two Discrete Random Variables

P_W(w), the PMF of a function of discrete random variables X and Y, is the sum of the probabilities of all sample values (x,y) for which g(x,y) = w.

When X and Y are discrete random variables, S_W, the range of W, is a countable set corresponding to all possible values of g(X,Y). Therefore, W is a discrete random variable and has a PMF P_W(w). We can apply Theorem 5.3 to find P_W(w) = P[W = w]. Since {W = w} is another name for the event {g(X,Y) = w}, we obtain P_W(w) by adding the values of P_X,Y(x,y) corresponding to the x,y pairs for which g(x,y) = w.

Theorem 6.1

For discrete random variables X and Y, the derived random variable W = g(X,Y) has PMF

P_W(w) = Σ_{(x,y): g(x,y)=w} P_X,Y(x,y).

Example 6.1

A firm sends out two kinds of newsletters. One kind contains only text and grayscale images and requires 40 cents to print each page. The other kind contains color pictures that cost 60 cents per page. Newsletters can be 1, 2, or 3 pages long. Let the random variable L represent the length of a newsletter in pages, S_L = {1, 2, 3}. Let the random variable X represent the cost in cents to print each page, S_X = {40, 60}. After observing many newsletters, the firm has derived the following probability model:

P_L,X(l,x)   x = 40   x = 60
l = 1        0.15     0.1
l = 2        0.3      0.2
l = 3        0.15     0.1

Let W = g(L,X) = LX be the total cost in cents of a newsletter. Find the range S_W and the PMF P_W(w).

For each of the six possible combinations of L and X, we record W = LX under the corresponding entry in the PMF table:

P_L,X(l,x)   x = 40         x = 60
l = 1        0.15 (W=40)    0.1 (W=60)
l = 2        0.3 (W=80)     0.2 (W=120)
l = 3        0.15 (W=120)   0.1 (W=180)

The range of W is S_W = {40, 60, 80, 120, 180}. With the exception of W = 120, there is a unique pair L, X such that W = LX. For W = 120, P_W(120) = P_L,X(3,40) + P_L,X(2,60). The corresponding probabilities yield the PMF

w        40     60    80    120    180
P_W(w)   0.15   0.1   0.3   0.35   0.1
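The tabulation in Example 6.1 is easy to mechanize. The following sketch (the text's own computational examples use MATLAB; Python is used here purely as an illustration) applies Theorem 6.1 directly, summing P_L,X(l,x) over all pairs (l,x) with lx = w:

```python
from collections import defaultdict

# Joint PMF P_{L,X}(l, x) from Example 6.1
P_LX = {(1, 40): 0.15, (1, 60): 0.1,
        (2, 40): 0.3,  (2, 60): 0.2,
        (3, 40): 0.15, (3, 60): 0.1}

# Theorem 6.1: P_W(w) is the sum of P_{L,X}(l, x) over all (l, x) with l*x = w
P_W = defaultdict(float)
for (l, x), p in P_LX.items():
    P_W[l * x] += p

print({w: round(p, 2) for w, p in sorted(P_W.items())})
# {40: 0.15, 60: 0.1, 80: 0.3, 120: 0.35, 180: 0.1}
```

Only w = 120 collects contributions from two (l,x) pairs, matching P_W(120) = 0.15 + 0.2 = 0.35 in the example.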

6.2 Functions Yielding Continuous Random Variables

To obtain the PDF of W = g(X), a continuous function of a continuous random variable, derive the CDF of W and then differentiate. The procedure is straightforward when g(X) is a linear function. It is more complex for other functions.

When X and W = g(X) are continuous random variables, we develop a two-step procedure to derive the PDF f_W(w):

1. Find the CDF F_W(w) = P[W ≤ w].
2. The PDF is the derivative f_W(w) = dF_W(w)/dw.

This procedure always works and is easy to remember. When g(X) is a linear function of X, the method is straightforward. Otherwise, as we shall see in examples, finding F_W(w) can be tricky.


Before proceeding to the examples and theorems, we add one reminder. It is easier to calculate E[g(X)] directly from the PDF f_X(x) using Theorem 4.4 than it is to derive the PDF of Y = g(X) and then use the definition of expected value, Definition 4.4. This section applies to situations in which it is necessary to find a complete probability model of W = g(X).

Example 6.2

In Example 4.2, W centimeters is the location of the pointer on the 1-meter circumference of the circle. Use the solution of Example 4.2 to derive f_W(w).

The function is W = 100X, where X in Example 4.2 is the location of the pointer measured in meters. To find the CDF F_W(w) = P[W ≤ w], the first step is to translate the event {W ≤ w} into an event described by X. Each outcome of the experiment is mapped to an (X, W) pair on the line W = 100X. [Figure: the line w = g(x) = 100x in the (X, W) plane, with the event {W ≤ w} highlighted on the vertical axis and the event {X ≤ w/100} highlighted on the horizontal axis.] Thus the event {W ≤ w}, shown with gray highlight on the vertical axis, is the same event as {X ≤ w/100}, which is shown with gray highlight on the horizontal axis. Both of these events correspond in the figure to observing an (X, W) pair along the highlighted section of the line w = g(x) = 100x. This translation of the event {W ≤ w} to an event described in terms of X depends only on the function g(X). Specifically, it does not depend on the probability model for X. From the figure, we see that

F_W(w) = P[W ≤ w] = P[100X ≤ w] = P[X ≤ w/100] = F_X(w/100). (6.1)

The calculation of F_X(w/100) depends on the probability model for X. For this problem, we recall that Example 4.2 derives the CDF of X,

F_X(x) = { 0   x < 0,
         { x   0 ≤ x ≤ 1,      (6.2)
         { 1   x > 1.

From this result, we can use algebra to find

F_W(w) = F_X(w/100) = { 0       w < 0,
                      { w/100   0 ≤ w ≤ 100,      (6.3)
                      { 1       w > 100.

We take the derivative of the CDF of W over each of the intervals to find the PDF:

f_W(w) = dF_W(w)/dw = { 1/100   0 ≤ w ≤ 100,      (6.4)
                      { 0       otherwise.

We see that W is the uniform (0, 100) random variable.
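A quick simulation supports this conclusion. In the sketch below (Python, used here only as an illustration; the text's own numerical work is in MATLAB), we sample X uniformly on (0, 1), scale by 100, and compare the empirical CDF of W with F_W(w) = w/100:

```python
import random

random.seed(7)
n = 200_000
# X is the pointer position in meters, uniform on (0, 1); W = 100 X
w = [100 * random.random() for _ in range(n)]

# The empirical CDF should track F_W(w) = w/100 on [0, 100]
for point in (25.0, 50.0, 90.0):
    emp = sum(1 for v in w if v <= point) / n
    print(point, round(emp, 2))
```

With 200,000 samples the empirical values agree with 0.25, 0.50, and 0.90 to within a few thousandths.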

We use this two-step procedure in the following theorem to generalize Example 6.2 by deriving the CDF and PDF for any scale change and any continuous random variable.

Theorem 6.2

If W = aX, where a > 0, then W has CDF and PDF

F_W(w) = F_X(w/a),    f_W(w) = (1/a) f_X(w/a).

Proof First, we find the CDF of W,

F_W(w) = P[aX ≤ w] = P[X ≤ w/a] = F_X(w/a). (6.5)

We take the derivative of F_W(w) to find the PDF:

f_W(w) = dF_W(w)/dw = (1/a) f_X(w/a). (6.6)

Theorem 6.2 states that multiplying a random variable by a positive constant stretches (a > 1) or shrinks (a < 1) the original PDF.

Example 6.3

The triangular PDF of X is

f_X(x) = { 2x   0 ≤ x ≤ 1,      (6.7)
         { 0    otherwise.

Find the PDF of W = aX. Sketch the PDF of W for a = 1/2, 1, 2.

For any a > 0, we use Theorem 6.2 to find the PDF

f_W(w) = (1/a) f_X(w/a) = { 2w/a²   0 ≤ w ≤ a,      (6.8)
                          { 0       otherwise.

[Figure: f_W(w) plotted for a = 1/2, a = 1, and a = 2.] As a increases, the PDF stretches horizontally.
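The scaling result can be checked numerically. In this sketch (Python, as an illustration only; the choice a = 2 is arbitrary), X is sampled by inverting its CDF F_X(x) = x², and the empirical CDF of W = aX is compared with the prediction F_W(w) = F_X(w/a) = (w/a)²:

```python
import random

random.seed(1)
a = 2.0
n = 100_000
# X has the triangular PDF 2x on [0, 1]; its CDF is x^2, so X = sqrt(U) for U uniform
x = [random.random() ** 0.5 for _ in range(n)]
w = [a * xi for xi in x]

# Theorem 6.2 predicts F_W(w) = F_X(w/a) = (w/a)^2 for 0 <= w <= a
for point in (0.5, 1.0, 1.5):
    emp = sum(1 for v in w if v <= point) / n
    print(point, round(emp, 3), round((point / a) ** 2, 3))
```

The empirical and predicted CDF values agree to within sampling error, consistent with Equation (6.8).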

For the families of continuous random variables in Sections 4.5 and 4.6, we can use Theorem 6.2 to show that multiplying a random variable by a constant produces a new family member with transformed parameters.

Theorem 6.3

W = aX, where a > 0.

(a) If X is uniform (b, c), then W is uniform (ab, ac).
(b) If X is exponential (λ), then W is exponential (λ/a).
(c) If X is Erlang (n, λ), then W is Erlang (n, λ/a).
(d) If X is Gaussian (μ, σ), then W is Gaussian (aμ, aσ).
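Part (b) is easy to check by simulation. In this sketch (Python, as an illustration only; λ = 2 and a = 3 are arbitrary choices), W = aX should be exponential (λ/a) and therefore have expected value a/λ:

```python
import random

random.seed(3)
lam, a = 2.0, 3.0
n = 200_000
# X is exponential (lambda); Theorem 6.3(b) predicts W = aX is exponential (lambda/a)
w = [a * random.expovariate(lam) for _ in range(n)]

mean_w = sum(w) / n
print(round(mean_w, 2))  # should be close to a/lam = 1.5
```

The sample mean lands near 1.5, the mean of an exponential (λ/a = 2/3) random variable, as the theorem predicts.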

The next theorem shows that adding a constant to a random variable simply
shifts the CDF and the PDF by that constant.
Theorem 6.4
If W = X + b, then

F_W(w) = F_X(w − b),        f_W(w) = f_X(w − b).

Proof First, we find the CDF F_W(w) = P[X + b ≤ w] = P[X ≤ w − b] = F_X(w − b). We
take the derivative of F_W(w) to find the PDF: f_W(w) = dF_W(w)/dw = f_X(w − b).

In contrast to the linear transformations of Theorem 6.2 and Theorem 6.4, the
following example is tricky because g(X) transforms more than one value of X to
the same W.
Example 6.4
Suppose X is the continuous uniform (−1, 3) random variable and W = X². Find the
CDF F_W(w) and PDF f_W(w).

Although X can be negative, W is always nonnegative.
Thus F_W(w) = 0 for w < 0. To find the CDF F_W(w)
for w ≥ 0, the figure [omitted] shows that the event
{W ≤ w}, marked on the vertical axis,
is the same as the event {−√w ≤ X ≤ √w} marked on
the horizontal axis. Both events correspond to (X, W)
pairs on the highlighted segment of the function W =
g(X). The corresponding algebra is

F_W(w) = P[W ≤ w] = P[−√w ≤ X ≤ √w].                             (6.9)

We can take one more step by writing the probability (6.9) as an integral using the
PDF f_X(x):

F_W(w) = P[−√w ≤ X ≤ √w] = ∫_{−√w}^{√w} f_X(x) dx.              (6.10)


So far, we have used no properties of the PDF f_X(x). However, to evaluate the integral
(6.10), we now recall from the problem statement and Definition 4.5 that the PDF of
X is

           1/4,   −1 ≤ x ≤ 3,
f_X(x) =
           0,     otherwise.

[Figure: f_X(x) equals 1/4 on −1 ≤ x ≤ 3.]
The integral (6.10) is somewhat tricky because the limits depend on the value of w.
We first observe that −1 ≤ X ≤ 3 implies 0 ≤ W ≤ 9. Thus F_W(w) = 0 for w < 0,
and F_W(w) = 1 for w > 9. For 0 ≤ w < 1,

F_W(w) = ∫_{−√w}^{√w} (1/4) dx = √w / 2.                        (6.12)

For 1 ≤ w ≤ 9,

F_W(w) = ∫_{−1}^{√w} (1/4) dx = (√w + 1)/4.                     (6.13)

By combining the separate pieces, we can write a complete expression for F_W(w):

           0,             w < 0,
           √w / 2,        0 ≤ w < 1,
F_W(w) =                                                         (6.14)
           (√w + 1)/4,    1 ≤ w < 9,
           1,             w ≥ 9.

To find f_W(w), we take the derivative of F_W(w) over each interval:

           1/(4√w),   0 < w < 1,
f_W(w) =   1/(8√w),   1 < w < 9,
           0,         otherwise.
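The piecewise CDF derived in Example 6.4 can be verified by Monte Carlo simulation. The following Python sketch (an illustration; the text's own code is MATLAB) draws samples of X uniform on (−1, 3), squares them, and compares the empirical CDF of W = X² with the derived formula.

```python
import random
import math

# A Monte Carlo check of the CDF (6.14) from Example 6.4 (a Python sketch
# for illustration): X is uniform (-1, 3) and W = X^2.
random.seed(3)
n = 300_000
w = [random.uniform(-1.0, 3.0) ** 2 for _ in range(n)]

def F_W(t):
    # the piecewise CDF derived in Example 6.4
    if t < 0:
        return 0.0
    if t < 1:
        return math.sqrt(t) / 2
    if t < 9:
        return (math.sqrt(t) + 1) / 4
    return 1.0

for t in (0.5, 2.0, 4.0, 8.0):
    empirical = sum(ws <= t for ws in w) / n
    assert abs(empirical - F_W(t)) < 0.01
```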

We end this section with a useful application of derived random variables. The
following theorem shows how to derive sample values of random variables using
the transformation X = g(U), where U is a uniform (0, 1) random variable. In
Section 4.8, we used this technique with the MATLAB rand function to generate
sample values of a random variable X.

Theorem 6.5
Let U be a uniform (0, 1) random variable and let F(x) denote a cumulative distri-
bution function with an inverse F⁻¹(u) defined for 0 < u < 1. The random variable
X = F⁻¹(U) has CDF F_X(x) = F(x).

Proof First, we verify that F⁻¹(u) is a nondecreasing function. To show this, suppose
that for u ≥ u′, x = F⁻¹(u) and x′ = F⁻¹(u′). In this case, u = F(x) and u′ = F(x′).
Since F(x) is nondecreasing, F(x) ≥ F(x′) implies that x ≥ x′. Hence, for the random
variable X = F⁻¹(U), we can write

F_X(x) = P[F⁻¹(U) ≤ x] = P[U ≤ F(x)] = F(x).                    (6.16)

We observe that the requirement that F_X(x) have an inverse for 0 < u < 1 limits
the applicability of Theorem 6.5. For example, this requirement is not met by the
mixed random variables of Section 4.7. A generalization of the theorem that does
hold for mixed random variables is given in Problem 6.3.13. The following examples
demonstrate the utility of Theorem 6.5.

Example 6.5
U is the un iform (0, 1) random variable and X = g(U). Derive g(U) such that X is
the exponential (1) rando m variable.

The CDF of X is

           0,          x < 0,
F_X(x) =                                                         (6.17)
           1 − e^(−x),  x ≥ 0.

Note that if u = F_X(x) = 1 − e^(−x), then x = −ln(1 − u). That is, F_X⁻¹(u) = −ln(1 − u)
for 0 < u < 1. Thus, by Theorem 6.5,

X = g(U) = −ln(1 − U)                                            (6.18)

is the exponential random variable with parameter λ = 1. Problem 6.2.7 asks the
reader to derive the PDF of X = −ln(1 − U) directly from first principles.
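The transformation in Example 6.5 is easy to check numerically. The following Python sketch (an illustration; the text's own code is MATLAB) generates X = −ln(1 − U) and verifies two properties of the exponential (1) random variable: E[X] = 1 and P[X ≤ 1] = 1 − e^(−1).

```python
import random
import math

# Numerical check of Example 6.5 (a Python illustration):
# X = -ln(1 - U) should be exponential (1), so E[X] = 1 and
# P[X <= 1] = 1 - e^{-1}.
random.seed(4)
n = 300_000
x = [-math.log(1.0 - random.random()) for _ in range(n)]
assert abs(sum(x) / n - 1.0) < 0.02
assert abs(sum(xi <= 1.0 for xi in x) / n - (1 - math.exp(-1))) < 0.01
```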
Example 6.6
For a uniform (0, 1) random variable U, find a function g(·) such that X = g(U) has
a uniform (a, b) distribution.

The CDF of X is

           0,                  x < a,
F_X(x) =   (x − a)/(b − a),    a ≤ x ≤ b,                        (6.19)
           1,                  x > b.

For any u satisfying 0 < u < 1, u = F_X(x) = (x − a)/(b − a) if and only if

x = F_X⁻¹(u) = a + (b − a)u.                                     (6.20)

Thus by Theorem 6.5, X = a + (b − a)U is a uniform (a, b) random variable. Note
that we could have reached the same conclusion by observing that Theorem 6.3 implies
(b − a)U has a uniform (0, b − a) distribution and that Theorem 6.4 implies a + (b − a)U
has a uniform (a, (b − a) + a) distribution. Another approach, taken in Problem 6.2.11,
is to derive the CDF and PDF of a + (b − a)U.
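As a quick sanity check of Example 6.6, the following Python sketch (an illustration; the text's own code is MATLAB) generates X = a + (b − a)U and confirms that the samples stay inside (a, b) with the expected value (a + b)/2 of a uniform (a, b) random variable.

```python
import random

# Numerical check of Example 6.6 (a Python illustration):
# X = a + (b - a)U should be uniform (a, b).
random.seed(5)
a, b, n = 2.0, 5.0, 300_000
x = [a + (b - a) * random.random() for _ in range(n)]
assert a <= min(x) and max(x) <= b           # samples stay inside (a, b)
assert abs(sum(x) / n - (a + b) / 2) < 0.02  # E[X] = (a + b)/2
```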

The technique of Theorem 6.5 is particularly useful when the CDF is an easily
invertible function. Unfortunately, there are many random variables, including
Gaussian and Erlang, in which the CDF and its inverse are difficult to compute. In
these cases, we need to develop other methods for transforming sample values of a
uniform random variable to sample values of a random variable of interest.

Quiz 6.2
X is an exponential (λ) random variable. Show that Y = √X is a Rayleigh random
variable (see Appendix A.2). Express the Rayleigh parameter a in terms of the
exponential parameter λ.

6.3 Functions Y ielding Discrete or Mixed Random Variables

A hard limiter electronic circuit has two possible output voltages.
If the input voltage is a sample value of a continuous random var-
iable, the output voltage is a sample value of a discrete random
variable. The output of a soft limiter circuit is a sample value of a
mixed random variable. The probability models of the limiters de-
pend on the probability model of the input and on the two limiting
voltages.

In Section 6.2, our examples and theorems relate to a continuous random variable
derived from another continuous random variable. By contrast, in the following ex-
ample, the function g(X) transforms a continuous random variable to a discrete
random variable.
Example 6.7
Let X be a random variable with CDF F_X(x). Let Y be the output of a clipping circuit,
also referred to as a hard limiter, with the characteristic Y = g(X) where

          1,   x < 0,
g(x) =                                                           (6.21)
          3,   x ≥ 0.

Express F_Y(y) and f_Y(y) in terms of F_X(x) and f_X(x).

Before going deeply into the math, it is helpful to think about the nature of the derived
random variable Y. The definition of g(x) tells us that Y has only two possible values,
Y = 1 and Y = 3. Thus Y is a discrete random variable. Furthermore, the CDF,
F_Y(y), has jumps at y = 1 and y = 3; it is zero for y < 1 and it is one for y ≥ 3. Our
job is to find the heights of the jumps at y = 1 and y = 3. In particular,

F_Y(1) = P[Y ≤ 1] = P[X < 0] = F_X(0).                           (6.22)

The complete CDF is

           0,        y < 1,
F_Y(y) =   F_X(0),   1 ≤ y < 3,                                  (6.23)
           1,        y ≥ 3.

The PDF consists of impulses at y = 1 and y = 3. The weights of the impulses are the
sizes of the two jumps in the CDF: F_X(0) and 1 − F_X(0), respectively:

f_Y(y) = F_X(0) δ(y − 1) + [1 − F_X(0)] δ(y − 3).
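Example 6.7 leaves the distribution of X general. To illustrate the result numerically, the following Python sketch (an illustration; the text's own code is MATLAB) ASSUMES a particular input, X Gaussian (0, 1), so that F_X(0) = 1/2, and checks that the hard limiter output Y takes only the two values 1 and 3 with the predicted probabilities.

```python
import random

# Simulation of the hard limiter in Example 6.7.  The example leaves F_X
# general; here we ASSUME X is Gaussian (0, 1), so F_X(0) = 1/2 and the
# output Y = g(X) should satisfy P[Y = 1] = 1/2 and P[Y = 3] = 1/2.
random.seed(6)
n = 200_000
y = [1 if random.gauss(0, 1) < 0 else 3 for _ in range(n)]
p1 = y.count(1) / n
assert abs(p1 - 0.5) < 0.01
assert set(y) == {1, 3}   # Y is discrete with exactly two values
```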

The following example contains a function that transforms a continuous random
variable to a mixed random variable.

Example 6.8
The output voltage of a microphone is a Gaussian random variable V with expected
value μ_V = 0 and standard deviation σ_V = 5 V. The microphone signal is the input
to a soft limiter circuit with cutoff value 10 V. The random variable W is the output
of the limiter:

               −10,   V < −10,
W = g(V) =     V,     −10 ≤ V ≤ 10,                              (6.24)
               10,    V > 10.

What are the CDF and PDF of W?

To find the CDF, we need to find F_W(w) = P[W ≤ w] for all values of w. The key is
that all possible pairs (V, W) satisfy W = g(V). This implies each w belongs to one
of three cases:

(a) w < −10: From the function W = g(V) we see that no possible pairs (V, W)
satisfy W ≤ w < −10. Hence F_W(w) = P[W ≤ w] = 0 in this case. This is
perhaps a roundabout way of observing that W = −10 is the minimum possible value of W.
(b) −10 ≤ w < 10: In this case we see that the event {W ≤ w}, marked in gray
on the vertical axis, corresponds to the event {V ≤ w}, marked in gray on the
horizontal axis. The corresponding (V, W) pairs are shown in the highlighted
segment of the function W = g(V). In this case, F_W(w) = P[W ≤ w] =
P[V ≤ w] = F_V(w).
(c) w ≥ 10: Here we see that the event {W ≤ w} corresponds to all values of V
and P[W ≤ w] = P[V ≤ ∞] = 1. This is another way of saying W = 10 is the
maximum possible value of W.
We combine these separate cases in the CDF

                         0,         w < −10,
F_W(w) = P[W ≤ w] =      F_V(w),    −10 ≤ w < 10,                (6.25)
                         1,         w ≥ 10.

These conclusions are based solely on the structure of the limiter function g(V) without
regard for the probability model of V. Now we observe that because V is Gaussian
(0, 5), Theorem 4.14 states that F_V(v) = Φ(v/5). Therefore,

           0,          w < −10,
F_W(w) =   Φ(w/5),     −10 ≤ w < 10,                             (6.26)
           1,          w ≥ 10.

Note that the CDF jumps from 0 to Φ(−10/5) = 0.023 at w = −10 and that it jumps
from Φ(10/5) = 0.977 to 1 at w = 10. Therefore,

                          0.023 δ(w + 10),               w = −10,
           dF_W(w)        (1/(5√(2π))) e^(−w²/50),       −10 < w < 10,
f_W(w) =   ------- =                                             (6.27)
             dw           0.023 δ(w − 10),               w = 10,
                          0,                             otherwise.
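The jump heights in (6.26) can be confirmed by simulation. The following Python sketch (an illustration; the text's own code is MATLAB) clips Gaussian (0, 5) samples to [−10, 10] and checks that the mass accumulated at each clipping point matches Φ(−2) ≈ 0.023, computed here via the complementary error function.

```python
import random
import math

# Simulation of the soft limiter in Example 6.8 (a Python illustration).
# V is Gaussian (0, 5); W clips V to [-10, 10].  The CDF (6.26) predicts
# P[W = -10] = Phi(-2) ~= 0.023 and P[W = 10] = 1 - Phi(2) ~= 0.023.
random.seed(7)
n = 400_000
w = [min(max(random.gauss(0, 5), -10.0), 10.0) for _ in range(n)]
phi_m2 = 0.5 * math.erfc(2 / math.sqrt(2))   # Phi(-2)
assert abs(w.count(-10.0) / n - phi_m2) < 0.005
assert abs(w.count(10.0) / n - phi_m2) < 0.005
```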


Quiz 6.3
Random variable X is passed to a hard limiter that outputs Y. The PDF of X and
the limiter output Y are

           1 − x/2,   0 ≤ x ≤ 2,              X,   X < 1,
f_X(x) =                                Y =
           0,         otherwise,              1,   X ≥ 1.

(a) What is the CDF F_X(x)?        (b) What is P[Y = 1]?

(c) What is F_Y(y)?                (d) What is f_Y(y)?

6.4 Continuous Functions of Two Continuous Random Variables

To obtain the PDF of W = g(X, Y), a continuous function of
two continuous random variables, derive the CDF of W and then
differentiate. The procedure is straightforward when g(x, y) is a
linear function. It is more complex for other functions.

At the start of this chapter, we described three ways radio receivers can use signals
from two antennas. These techniques are examples of the following situation. We
perform an experiment and observe sample values of two random variables X and
Y. After performing the experiment, we calculate a sample value of the random
variable W = g(X, Y). Based on our knowledge of the experiment, we have a
probability model for X and Y embodied in a joint PMF P_{X,Y}(x, y) or a joint PDF
f_{X,Y}(x, y).
In this section, we present methods for deriving a probability model for W. When
X and Y are continuous random variables and g(x, y) is a continuous function,
W = g(X, Y) is a continuous random variable. To find the PDF, f_W(w), it is
usually helpful to first find the CDF F_W(w) and then calculate the derivative.
Viewing {W ≤ w} as an event A, we can apply Theorem 5.7.

Theorem 6.6

For continuous random variables X and Y, the CDF of W = g(X, Y) is

F_W(w) = P[W ≤ w] = ∫∫_{g(x,y) ≤ w} f_{X,Y}(x, y) dx dy.

Theorem 6.6 is analogous to our approach in Sections 6.2 and 6.3 for functions
W = g(X). There we used the function g(X) to translate the event {W ≤ w} into
an event {g(X) ≤ w} that was a subset of the X-axis. We then calculated F_W(w)
by integrating f_X(x) over that subset.

In Theorem 6.6, we translate the event {g(X, Y) ≤ w} into a region of the X, Y
plane. Integrating the joint PDF f_{X,Y}(x, y) over that region will yield the CDF
F_W(w). Once we obtain F_W(w), it is generally straightforward to calculate the
derivative f_W(w) = dF_W(w)/dw. However, for most functions g(x, y), performing
the integration to find F_W(w) can be a tedious process. Fortunately, there are
convenient techniques for finding f_W(w) for certain functions that arise in many
applications. Section 6.5 and Chapter 9 consider the function g(X, Y) = X + Y.
The following theorem addresses W = max(X, Y), the maximum of two random
variables. It follows from the fact that {max(X, Y) ≤ w} = {X ≤ w} ∩ {Y ≤ w}.

Theorem 6.7
For continuous random variables X and Y, the CDF of W = max(X, Y) is

F_W(w) = F_{X,Y}(w, w).                                          (6.28)

Example 6.9
In Examples 5.7 and 5.9, X and Y have joint PDF

                1/15,   0 ≤ x ≤ 5, 0 ≤ y ≤ 3,
f_{X,Y}(x,y) =                                                   (6.29)
                0,      otherwise.

Find the PDF of W = max(X, Y).

Because X ≥ 0 and Y ≥ 0, W ≥ 0. Therefore, F_W(w) = 0 for w < 0. Because X ≤ 5
and Y ≤ 3, W ≤ 5. Thus F_W(w) = 1 for w > 5. For 0 ≤ w ≤ 5, diagrams showing
the regions of integration provide a guide to calculating F_W(w). Two cases, 0 ≤ w ≤ 3
and 3 ≤ w ≤ 5, have to be considered separately. When 0 ≤ w ≤ 3, Theorem 6.7
yields

F_W(w) = ∫_0^w ∫_0^w (1/15) dx dy = w²/15.                      (6.30)

Because the joint PDF is uniform, we see this probability is the area w² times the
value of the joint PDF over that area. When 3 ≤ w ≤ 5, the integral over the region
{X ≤ w, Y ≤ w} is

F_W(w) = ∫_0^w ( ∫_0^3 (1/15) dy ) dx = ∫_0^w (1/5) dx = w/5.   (6.31)


which is the area 3w times the value of the joint PDF over that area. Combining the
parts, we can write the complete CDF:

           0,       w < 0,
           w²/15,   0 ≤ w ≤ 3,
F_W(w) =                                                         (6.32)
           w/5,     3 < w ≤ 5,
           1,       w > 5.

By taking the derivative, we find the corresponding PDF:

           2w/15,   0 ≤ w ≤ 3,
f_W(w) =   1/5,     3 < w ≤ 5,                                   (6.33)
           0,       otherwise.
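Because the uniform joint PDF on the rectangle factors into marginals, X and Y in Example 6.9 are independent uniforms, which makes the CDF (6.32) easy to check by simulation. The following Python sketch (an illustration; the text's own code is MATLAB) does exactly that.

```python
import random

# Numerical check of Example 6.9 (a Python illustration).  The uniform joint
# PDF 1/15 factors, so X ~ uniform (0, 5) and Y ~ uniform (0, 3) are
# independent; W = max(X, Y) should have F_W(w) = w^2/15 for 0 <= w <= 3
# and F_W(w) = w/5 for 3 <= w <= 5.
random.seed(8)
n = 300_000
w = [max(random.uniform(0, 5), random.uniform(0, 3)) for _ in range(n)]

def F_W(t):
    # the piecewise CDF (6.32)
    if t <= 3:
        return t * t / 15
    return t / 5

for t in (1.0, 2.0, 4.0):
    empirical = sum(ws <= t for ws in w) / n
    assert abs(empirical - F_W(t)) < 0.01
```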

In the following example, W is the quotient of two positive random variables.

Example 6.10
X and Y have the joint PDF

                 λμ e^(−(λx + μy)),   x ≥ 0, y ≥ 0,
f_{X,Y}(x,y) =                                                   (6.34)
                 0,                  otherwise.

Find the PDF of W = Y/X.
First we find the CDF:

F_W(w) = P[Y/X ≤ w] = P[Y ≤ wX].                                 (6.35)

For 'W < 0, Fw('w) = 0. For 'W > 0, we integrate t he jo int PDF f'x ,Y(x , y) over t he
region of the X , Y plane for wh ich Y < 'WX, X > 0, and Y > 0 as shown:

y P [Y < wX] =lo=(lo""f xy(x, y) dy) dx

Y wX
=lo=.\e- (lo"'" dy) dx 1
"" 1w- 1'Y

=lo=.\e- (1 - dx 1
"" e-:wx)

= 1- (6.36)
>.. + ,'llJ
0 'ID < 0,
F\;\1 ( 'IJJ) = >.. (6.37)
1- w > o.
A+ ,VJ


Differentiating with respect to VJ, we obta in

'ID > 0 ,
f'w ('ID ) = (6.38)
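The ratio CDF (6.37) is simple to verify by simulation. The following Python sketch (an illustration; the text's own code is MATLAB) draws independent exponential samples for X and Y and compares the empirical CDF of W = Y/X with 1 − λ/(λ + μw).

```python
import random

# Numerical check of Example 6.10 (a Python illustration): for independent
# exponential X (rate lam) and Y (rate mu), W = Y/X should have
# F_W(w) = 1 - lam/(lam + mu*w) for w >= 0, as in (6.37).
random.seed(9)
lam, mu, n = 2.0, 3.0, 300_000
w = [random.expovariate(mu) / random.expovariate(lam) for _ in range(n)]
for t in (0.5, 1.0, 2.0):
    empirical = sum(ws <= t for ws in w) / n
    predicted = 1 - lam / (lam + mu * t)
    assert abs(empirical - predicted) < 0.01
```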

Quiz 6.4

(A) A smartphone runs an application that downloads Internet news every 15
minutes. At the start of a download, the radio modems negotiate a connection
speed that depends on the radio channel quality. When the negotiated speed
is low, the smartphone reduces the amount of news that it transfers to avoid
wasting its battery. The number of kilobytes transmitted, L, and the speed
B in kb/s have the joint PMF

P_{L,B}(l, b)     b = 512    b = 1,024    b = 2,048

l = 256           0.2        0.1          0.05
l = 768           0.05       0.1          0.2
l = 1536          0          0.1          0.2

Let T denote the number of seconds needed for the transfer. Express T as a
function of L and B. What is the PMF of T?
(B) Find the CDF and the PDF of W = XY when random variables X and Y
have joint PDF

                 1,   0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
f_{X,Y}(x,y) =
                 0,   otherwise.

6.5 PDF of the Sum of Two Random Variables

The PDF of the sum of two independent continuous random vari-
ables X and Y is the convolution of the PDF of X and the PDF of
Y. The PMF of the sum of independent integer-valued random
variables is the discrete convolution of the two PMFs.

We now examine the sum W = X + Y of two continuous
random variables. As we see in Theorem 6.6, the PDF of W
depends on the joint PDF f_{X,Y}(x, y). In particular, in the
proof of the next theorem, we find the PDF of W using the
two-step procedure in which we first find the CDF F_W(w) by
integrating the joint PDF f_{X,Y}(x, y) over the region X + Y ≤
w, as shown in the figure [omitted].


Theorem 6.8
The PDF of W = X + Y is

f_W(w) = ∫_{−∞}^{∞} f_{X,Y}(x, w − x) dx = ∫_{−∞}^{∞} f_{X,Y}(w − y, y) dy.

Proof First, we find the CDF by integrating the joint PDF over the region X + Y ≤ w:

F_W(w) = P[X + Y ≤ w] = ∫_{−∞}^{∞} ( ∫_{−∞}^{w−x} f_{X,Y}(x, y) dy ) dx.     (6.40)

Taking the derivative of the CDF to find the PDF, we have

f_W(w) = dF_W(w)/dw = ∫_{−∞}^{∞} ( d/dw ∫_{−∞}^{w−x} f_{X,Y}(x, y) dy ) dx
       = ∫_{−∞}^{∞} f_{X,Y}(x, w − x) dx.                                    (6.41)

By making the substitution y = w − x, we obtain

f_W(w) = ∫_{−∞}^{∞} f_{X,Y}(w − y, y) dy.                                    (6.42)

Example 6.11
Find the PDF of W = X + Y when X and Y have the joint PDF

                 2,   0 ≤ y ≤ 1, 0 ≤ x ≤ 1, x + y ≤ 1,
f_{X,Y}(x,y) =                                                   (6.43)
                 0,   otherwise.

The PDF of W = X + Y can be found using Theorem 6.8.
The possible values of X, Y are in the shaded triangular region
where 0 ≤ X + Y = W ≤ 1. Thus f_W(w) = 0 for w < 0 or
w > 1. For 0 ≤ w ≤ 1, applying Theorem 6.8 yields

f_W(w) = ∫_0^w 2 dx = 2w.                                        (6.44)

The complete expression for the PDF of W is

           2w,   0 ≤ w ≤ 1,
f_W(w) =                                                         (6.45)
           0,    otherwise.
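The PDF (6.45) implies the CDF F_W(w) = w² on [0, 1], which can be checked by simulation. The following Python sketch (an illustration; the text's own code is MATLAB) samples the triangular region by rejection from the unit square, where the uniform joint PDF equals 2.

```python
import random

# Numerical check of Example 6.11 (a Python illustration).  Accepting
# uniform points of the unit square with x + y < 1 samples the joint PDF
# f_{X,Y} = 2 on the triangle; W = X + Y should then satisfy
# F_W(w) = w^2 on [0, 1], consistent with f_W(w) = 2w.
random.seed(10)
n = 300_000
w = []
while len(w) < n:
    x, y = random.random(), random.random()
    if x + y < 1:          # rejection step: keep points in the triangle
        w.append(x + y)

for t in (0.3, 0.5, 0.8):
    empirical = sum(ws <= t for ws in w) / n
    assert abs(empirical - t * t) < 0.01
```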

When X and Y are independent, the joint PDF of X and Y is the product of
the marginal PDFs: f_{X,Y}(x, y) = f_X(x) f_Y(y). Applying Theorem 6.8 to this special
case, we obtain the following theorem.

Theorem 6.9

When X and Y are independent random variables, the PDF of W = X + Y is

f_W(w) = ∫_{−∞}^{∞} f_X(w − y) f_Y(y) dy = ∫_{−∞}^{∞} f_X(x) f_Y(w − x) dx.

In Theorem 6.9, we combine univariate functions, f_X(·) and f_Y(·), in order to
produce a third function, f_W(·). The combination in Theorem 6.9, referred to as a
convolution, arises in many branches of applied mathematics.
When X and Y are independent integer-valued discrete random variables, the
PMF of W = X + Y is a convolution (see Problem 6.5.1):

P_W(w) = Σ_{k=−∞}^{∞} P_X(k) P_Y(w − k).                         (6.46)

You may have encountered convolutions already in studying linear systems. Some-
times, we use the notation f_W(w) = f_X(x) * f_Y(y) to denote convolution.
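The discrete convolution (6.46) can be carried out directly. The following Python sketch (an illustration; the text's own code is MATLAB) convolves the PMFs of two fair dice, a standard worked case where the sum W ranges over 2, ..., 12.

```python
# Discrete convolution of PMFs, Equation (6.46), illustrated with the sum
# of two independent fair dice (an illustration, not from the text).
px = {k: 1 / 6 for k in range(1, 7)}
py = {k: 1 / 6 for k in range(1, 7)}
pw = {}
for i, pi in px.items():
    for j, pj in py.items():
        # each (i, j) pair contributes P_X(i) * P_Y(j) to P_W(i + j)
        pw[i + j] = pw.get(i + j, 0.0) + pi * pj

assert abs(pw[7] - 6 / 36) < 1e-12   # seven is the most likely sum
assert abs(pw[2] - 1 / 36) < 1e-12
assert abs(sum(pw.values()) - 1.0) < 1e-12
```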

Quiz 6.5
Let X and Y be independent exponential random variables with expected values
E[X] = 1/3 and E[Y] = 1/2. Find the PDF of W = X + Y.

6.6 MATLAB

Theorem 6.5 and the rand function can be employed to generate
sample values of continuous random variables.

Example 6.12
Use Example 6.5 to write a MATLAB program that generates m samples of an expo-
nential (λ) random variable.

function x=exponentialrv(lambda,m)
x=-(1/lambda)*log(1-rand(m,1));

In Example 6.5, we found that if U is a uniform (0, 1) random variable, then Y =
−ln(1 − U) is the exponential (1) random variable. By Theorem 6.3(b), X = Y/λ is
an exponential (λ) random variable.

Example 6.13
Use Example 6.6 to write a MATLAB function that generates m samples of a uniform
(a, b) random variable.

function x=uniformrv(a,b,m)
x=a+(b-a)*rand(m,1);

Example 6.6 says that Y = a + (b − a)U is a uniform (a, b) random variable. We use
this in uniformrv.


function x=erlangrv(n,lambda,m)
y=exponentialrv(lambda,m*n);
x=sum(reshape(y,m,n),2);

Theorem 9.9 will demonstrate that the sum of
n independent exponential (λ) random vari-
ables is an Erlang random variable. The func-
tion erlangrv generates m sample values of
the Erlang (n, λ) random variable. Note that we first generate nm exponential
random variables. The reshape function arranges these samples in an m × n array.
Summing across the rows yields m Erlang samples.
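The same sum-of-exponentials construction is easy to reproduce and check outside MATLAB. The following Python sketch (an illustration; the text's function erlangrv is MATLAB) builds each Erlang (n, λ) sample as the sum of n independent exponential (λ) samples and checks the mean n/λ.

```python
import random

# A Python analog of erlangrv (an illustration; the text's code is MATLAB):
# each Erlang (n, lam) sample is the sum of n independent exponential (lam)
# samples, so E[X] = n/lam.
random.seed(12)
n_stages, lam, m = 4, 2.0, 200_000
x = [sum(random.expovariate(lam) for _ in range(n_stages)) for _ in range(m)]
assert abs(sum(x) / m - n_stages / lam) < 0.02
assert min(x) > 0.0   # Erlang samples are positive
```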
function x=icdfrv(icdfhandle,m)
%Usage: x=icdfrv(icdf,m)
%returns m samples of rv X
%with inverse CDF icdf.m
u=rand(m,1);
x=feval(icdfhandle,u);

Finally, for a random variable X with an arbi-
trary CDF F_X(x), we implement the function
icdfrv.m, which uses Theorem 6.5 for gener-
ating random samples. The key is to define
a MATLAB function x=icdfx(u) that calcu-
lates x = F_X⁻¹(u). The function icdfx(u) is
then passed as an argument to icdfrv.m, which generates samples of X. Note
that MATLAB passes a function as an argument to another function using a func-
tion handle, which is a kind of pointer. The following example shows how to use
icdfrv.m.

Example 6.14
Write a MATLAB function that uses icdfrv.m to generate samples of Y, the maximum
of three pointer spins, in Example 4.5.

function y = icdf3spin(u);
y=u.^(1/3);

From Equation (4.18), we see that for 0 ≤ y ≤ 1,
F_Y(y) = y³. If u = F_Y(y) = y³, then y = F_Y⁻¹(u) =
u^(1/3). So we define (and save to disk) icdf3spin.m.
Now, the function call y=icdfrv(@icdf3spin,1000) generates a vector holding 1000
samples of random variable Y. The notation @icdf3spin is the function handle for
the function icdf3spin.m.
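The inverse-CDF step in icdf3spin can be checked numerically. The following Python sketch (an illustration; the text's code is MATLAB) applies y = u^(1/3) to uniform samples and verifies E[Y] = 3/4, which follows from f_Y(y) = 3y² on [0, 1].

```python
import random

# A Python analog of icdf3spin/icdfrv (an illustration; the text's code is
# MATLAB): with F_Y(y) = y^3 on [0, 1], the inverse CDF is y = u^(1/3), so
# Y = U^(1/3) should have E[Y] = 3/4.
random.seed(13)
m = 200_000
y = [random.random() ** (1 / 3) for _ in range(m)]
assert abs(sum(y) / m - 0.75) < 0.01
assert 0.0 <= min(y) and max(y) <= 1.0
```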

Keep in mind that for the MATLAB code to run quickly, it is best for the inverse
CDF function (icdf3spin.m in the case of the last example) to process the vector
u without using a for loop to find the inverse CDF for each element u(i). We
also note that this same technique can be extended to cases where the inverse CDF
F_X⁻¹(u) does not exist for all 0 < u < 1. For example, the inverse CDF does not
exist if X is a mixed random variable or if f_X(x) is constant over an interval (a, b).
How to use icdfrv.m in these cases is addressed in Problems 6.3.13 and 6.6.4.

Quiz 6.6
Write a MATLAB function V=Vsample(m) that returns m samples of random variable
V with PDF

           (v + 5)/72,   −5 ≤ v ≤ 7,
f_V(v) =                                                         (6.47)
           0,            otherwise.


Difficulty:   Easy   Moderate   Difficult   Experts Only

6.1.1 Random variables X and Y have joint PMF

                 |x + y|/14,   x = −2, 0, 2; y = −1, 0, 1,
P_{X,Y}(x,y) =
                 0,            otherwise.

Find the PMF of W = X − Y.

6.1.2 For random variables X and Y in Problem 6.1.1, find the PMF of W = X + 2Y.

6.1.3 N is a binomial (n = 100, p = 0.4) random variable. M is a binomial (n = 50, p = 0.4) random variable. Given that M and N are independent, what is the PMF of L = M + N?

6.1.4 Let X and Y be discrete random variables with joint PMF P_{X,Y}(x, y) that is zero except when x and y are integers. Let W = X + Y and show that the PMF of W satisfies

P_W(w) = Σ_{x=−∞}^{∞} P_{X,Y}(x, w − x).

6.1.5 Let X and Y be discrete random variables with joint PMF

                 0.01,   x = 1, 2, ..., 10; y = 1, 2, ..., 10,
P_{X,Y}(x,y) =
                 0,      otherwise.

What is the PMF of W = min(X, Y)?

6.1.6 For random variables X and Y in Problem 6.1.5, what is the PMF of V = max(X, Y)?

6.2.1 The voltage X across a 1 Ω resistor is a uniform random variable with parameters 0 and 1. The instantaneous power is Y = X². Find the CDF F_Y(y) and the PDF f_Y(y) of Y.

6.2.2 X is the Gaussian (0, 1) random variable. Find the CDF of Y = |X| and its expected value E[Y].

6.2.3 In a 50 km Tour de France time trial, a rider's time T, measured in minutes, is the continuous uniform (60, 75) random variable. Let V = 3000/T denote the rider's speed over the course in km/hr. Find the PDF of V.

6.2.4 In the presence of a headwind of normalized intensity W, your speed on your bike is V = g(W) = 20 − 10W^(1/3) mi/hr. The wind intensity W is the continuous uniform (−1, 1) random variable. (Note: If W is negative, then the headwind is actually a tailwind.) Find the PDF f_V(v).

6.2.5 If X has an exponential (λ) PDF, what is the PDF of W = X²?

6.2.6 Let X denote the position of the pointer after a spin on a wheel of circumference 1. For that same spin, let Y denote the area within the arc defined by the stopping position of the pointer. [Figure omitted.]
(a) What is the relationship between X and Y?
(b) What is F_Y(y)?
(c) What is f_Y(y)?
(d) What is E[Y]?

6.2.7 U is the uniform (0, 1) random variable and X = −ln(1 − U).
(a) What is F_X(x)?
(b) What is f_X(x)?
(c) What is E[X]?

6.2.8 X is the uniform (0, 1) random variable. Find a function g(x) such that the PDF of Y = g(X) is

           3y²,   0 ≤ y ≤ 1,
f_Y(y) =
           0,     otherwise.

6.2.9 An amplifier circuit has power consumption Y that grows nonlinearly with the input signal voltage X. When the input signal is X volts, the instantaneous power consumed by the amplifier is Y = 20 + 15X² Watts. The input signal X is the continuous uniform (−1, 1) random variable. Find the PDF f_Y(y).

6.2.10 Use Theorem 6.2 to prove Theorem 6.3.

6.2.11 For the uniform (0, 1) random variable U, find the CDF and PDF of Y = a + (b − a)U with a < b. Show that Y is the uniform (a, b) random variable.

6.2.12 Theorem 6.5 required the inverse CDF F⁻¹(u) to exist for 0 < u < 1. Why was it not necessary that F⁻¹(u) exist at either u = 0 or u = 1?

6.2.13 X is a continuous random variable. Y = aX + b, where a, b ≠ 0. Prove that

f_Y(y) = f_X((y − b)/a) / |a|.

Hint: Consider the cases a < 0 and a > 0 separately.

6.2.14 Let continuous random variable X have a CDF F(x) such that F⁻¹(u) exists for all u in [0, 1]. Show that U = F(X) is the uniform (0, 1) random variable. Hint: U is a random variable such that when X = x′, U = F(x′). That is, we evaluate the CDF of X at the observed value of X.

6.3.1 X has CDF

           0,            x < −1,
           x/3 + 1/3,    −1 ≤ x < 0,
F_X(x) =
           x/3 + 2/3,    0 ≤ x < 1,
           1,            1 ≤ x.

Y = g(X) where

          −100,   x < 0,
g(X) =
          100,    x ≥ 0.

(a) What is F_Y(y)?
(b) What is f_Y(y)?
(c) What is E[Y]?

6.3.2 In a 50 km cycling time trial, a rider's exact time T, measured in minutes, is the continuous uniform (50, 60) random variable. However, a rider's recorded time R in seconds is obtained by rounding up T to the next whole second. That is, if T is 50 minutes, 27.001 seconds, then R = 3028 seconds. On the other hand, if T is exactly 50 minutes 27 seconds, then R = 3027. What is the PMF of R?

6.3.3 The voltage V at the output of a microphone is the continuous uniform (−1, 1) random variable. The microphone voltage is processed by a clipping rectifier with output

      |V|,   |V| < 0.5,
L =
      0.5,   otherwise.

(a) What is P[L = 0.5]?
(b) What is F_L(l)?
(c) What is E[L]?

6.3.4 U is the uniform random variable with parameters 0 and 2. The random variable W is the output of the clipper:

               U,   U ≤ 1,
W = g(U) =
               1,   U > 1.

Find the CDF F_W(w), the PDF f_W(w), and the expected value E[W].


6.3.5 X is a random variable with CDF F_X(x). Let Y = g(X) where

          10,    x < 0,
g(x) =
          −10,   x ≥ 0.

Express F_Y(y) in terms of F_X(x).

6.3.6 Suppose that a cellular phone costs $30 per month with 300 minutes of use included and that each additional minute of use costs $0.50. The number of minutes you use the phone in a month is an exponential random variable T with expected value E[T] = 200 minutes. The telephone company charges you for exactly how many minutes you use without any rounding of fractional minutes. Let C denote the cost in dollars of one month of service.
(a) What is P[C = 30]?
(b) What is the PDF of C?
(c) What is E[C]?

6.3.7 The input voltage to a rectifier is the continuous uniform (0, 1) random variable U. The rectifier output is a random variable W defined by

               0,   U < 0,
W = g(U) =
               U,   U ≥ 0.

Find the CDF F_W(w) and the expected value E[W].

6.3.8 Random variable X has PDF

           x/2,   0 ≤ x ≤ 2,
f_X(x) =
           0,     otherwise.

X is processed by a clipping circuit with output Y. The circuit output is

      0.5,   X ≤ 1,
Y =
      X,     X > 1.

(a) What is P[Y = 0.5]?
(b) Find the CDF F_Y(y).

6.3.9 Given an input voltage V, the output voltage of a half-wave rectifier is given by

      0,    V < 0,
W =   V,    0 ≤ V ≤ 10,
      10,   V > 10.

Suppose the input V is the continuous uniform (−15, 15) random variable. Find the PDF of W.

6.3.10 The current X across a resistor is the continuous uniform (−2, 2) random variable. The power dissipated in the resistor is Y = 9X² Watts.
(a) Find the CDF and PDF of Y.
(b) A power measurement circuit is range-limited so that its output is

      Y,    Y < 16,
W =
      16,   otherwise.

Find the PDF of W.

6.3.11 A defective voltmeter measures small voltages as zero. In particular, when the input voltage is V, the measured voltage is

      0,   |V| < 0.6,
W =
      V,   otherwise.

If V is the continuous uniform (−5, 5) random variable, what is the PDF of W?

6.3.12 X is the continuous uniform (−3, 3) random variable. When X is passed through a limiter, the output is the discrete random variable

               −c,   X < 0,
X̂ = g(X) =
               c,    X ≥ 0,

where c is an unspecified positive constant.
(a) What is the PMF P_X̂(x) of X̂?
(b) When the limiter input is X, the distortion D between the input X and the limiter output X̂ is

D = d(X) = (X − g(X))².

In terms of c, find the expected distortion E[D] = E[d(X)]. What value of c minimizes E[D]?
(c) Y is a Gaussian random variable with the same expected value and variance as X. What is the PDF of Y?
(d) Suppose Y is passed through the limiter, yielding the output Ŷ = g(Y). The distortion D between the input Y and the limiter output Ŷ is

D = d(Y) = (Y − g(Y))².

In terms of c, find the expected distortion E[D] = E[d(Y)]. What value of c minimizes E[D]?

6.3.13 In this problem we prove a generalization of Theorem 6.5. Given a random variable X with CDF F_X(x), define

F̃(u) = min{x | F_X(x) ≥ u}.

This problem proves that for a continuous uniform (0, 1) random variable U, X̂ = F̃(U) has CDF F_X̂(x) = F_X(x).
(a) Show that when F_X(x) is a continuous, strictly increasing function (i.e., X is not mixed, F_X(x) has no jump discontinuities, and F_X(x) has no "flat" intervals (a, b) where F_X(x) = c for a ≤ x ≤ b), then F̃(u) = F_X⁻¹(u) for 0 < u < 1.
(b) Show that if F_X(x) has a jump at x = x₀, then F̃(u) = x₀ for all u in the interval F_X(x₀⁻) < u ≤ F_X(x₀), where F_X(x₀⁻) denotes the limit of F_X(x) as x approaches x₀ from below.
(c) Prove that X̂ = F̃(U) has CDF F_X̂(x) = F_X(x).

6.4.1 Random variables X and Y have joint PDF

                 6xy²,   0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
f_{X,Y}(x,y) =
                 0,      otherwise.

Let V = max(X, Y). Find the CDF and PDF of V.

6.4.2 For random variables X and Y in Problem 6.4.1, find the CDF and PDF of W = min(X, Y).

6.4.3 X and Y have joint PDF

                 2,   x ≥ 0, y ≥ 0, x + y ≤ 1,
f_{X,Y}(x,y) =
                 0,   otherwise.

(a) Are X and Y independent?
(b) Let U = min(X, Y). Find the CDF and PDF of U.
(c) Let V = max(X, Y). Find the CDF and PDF of V.

6.4.4 Random variables X and Y have joint PDF

                 x + y,   0 ≤ x, y ≤ 1,
f_{X,Y}(x,y) =
                 0,       otherwise.

Let W = max(X, Y).
(a) What is S_W, the range of W?
(b) Find F_W(w) and f_W(w).

6.4.5 Random variables X and Y have joint PDF

                 6y,   0 ≤ y ≤ x ≤ 1,
f_{X,Y}(x,y) =
                 0,    otherwise.

Let W = Y − X.
(a) What is S_W, the range of W?
(b) Find F_W(w) and f_W(w).

6.4.6 Random variables X and Y have joint PDF

                 2,   0 ≤ y ≤ x ≤ 1,
f_{X,Y}(x,y) =
                 0,   otherwise.

Let W = Y/X.
(a) What is S_W, the range of W?
(b) Find F_W(w), f_W(w), and E[W].

6.4.7 Random variables X and Y have joint PDF

                 2,   0 ≤ y ≤ x ≤ 1,
f_{X,Y}(x,y) =
                 0,   otherwise.


Let W = X/Y.
(a) What is S_W, the range of W?
(b) Find F_W(w), f_W(w), and E[W].

6.4.8 In a simple model of a cellular telephone system, a portable telephone is equally likely to be found anywhere in a circular cell of radius 4 km. (See Problem 5.5.4.) Find the CDF F_R(r) and PDF f_R(r) of R, the distance (in km) between the telephone and the base station at the center of the cell.

6.4.9 X and Y are independent identically distributed Gaussian (0, 1) random variables. Find the CDF of W = X² + Y².

6.4.10 X is the exponential (2) random variable and Z is the Bernoulli (1/2) random variable that is independent of X. Find the PDF of Y = ZX.

6.4.11 X is the Gaussian (0, 1) random variable and Z, independent of X, has PMF

           1 − p,   z = −1,
P_Z(z) =
           p,       z = 1.

Find the PDF of Y = ZX.

6.4.12 You are waiting on the platform of the first stop of a Manhattan subway line. You could ride either a local or express train to your destination, which is the last stop on the line. The waiting time X for the next express train is the exponential random variable with E[X] = 10 minutes. The waiting time Y for the next local train is the exponential random variable with E[Y] = 5 minutes. Although the arrival times X and Y of the trains are random and independent, the trains' travel times are deterministic; the local train travels from first stop to last stop in exactly 15 minutes while the express travels from first to last stop in exactly 5 minutes.
(a) What is the joint PDF f_{X,Y}(x, y)?
(b) Find P[L], the probability that the local train arrives first at the platform.
(c) Suppose you board the first train that
(d) The time until the first train (express or local) reaches the final stop is T = min(X + 5, Y + 15). Find f_T(t).
(e) Suppose the local train does arrive first at your platform. Should you board the local train? Justify your answer. (There may be more than one correct answer.)

6.4.13 For a constant a > 0, random variables X and Y have joint PDF

                 1/a²,   0 ≤ x, y ≤ a,
f_{X,Y}(x,y) =
                 0,      otherwise.

Find the CDF and PDF of random variable

W = max(X/Y, Y/X).

Hint: Is it possible to observe W < 1?

6.4.14 The joint PDF of X and Y is

                 λ²e^(−λy),   0 ≤ x ≤ y,
f_{X,Y}(x,y) =
                 0,          otherwise.

What is the PDF of W = Y − X?

6.4.15 Consider random variables X, Y, and W from Problem 6.4.14.
(a) Are W and X independent?
(b) Are W and Y independent?

6.4.16 X and Y are independent random variables with CDFs F_X(x) and F_Y(y). Let U = min(X, Y) and V = max(X, Y).
(a) What is F_{U,V}(u, v)?
(b) What is f_{U,V}(u, v)?
Hint: To find the joint CDF, let A = {U ≤ u} and B = {V ≤ v} and note that P[AB] = P[B] − P[AᶜB].

6.5.1 Let X and Y be independent discrete random variables such that P_X(k) = P_Y(k) = 0 for all non-integer k. Show that the PMF of W = X + Y satisfies

P_W(w) = Σ_{k=−∞}^{∞} P_X(k) P_Y(w − k).

arrives. F ind t he PDF of your 'vait ing k =- oo
t ime W = min(X, Y).


6.5.2 X and Y have join t PDF expected values a and (3, respectively, and
sho'v t hat N = J + J{ is a Poisson random
. (
f x ' y x, y
= {2 ::e> O, y > O,x +y < l , var iable 'vi t h ex pected value a + (3 . Hin t:
Show t hat
0 o ther wise.
F ind t he PDF of vV = X + Y. PN (n,) = L PK (m) PJ (n, - m),
6.5.3 F ind t he PDF of vV = X +Y vvhen ni = O

X and Y have t he joint PDF and t hen simplify t he summation b y ex-

.X .Y (x, y ) =
O <:::r< y < l ,
other wise.
t r acting t he sum of a binomial P MF over
all possible values.

6.6.1 Use i cdfrv . m t o writ e a function

6.5.4 F ind t he PDF of vV = X +Y when w=wrv1 (m) t hat generates m, sample.s of
X and Y have t he joint PDF random var iable W from Problem 4 .2.4.
Not e t hat Fw 1 ('11,) do es no t exist for 7L =
. (
j x y x, y
= {1 0
O <x <l , O < y < l ,
ot her,vise.
1/4; however, yo u must d efine a func-
t ion i cdfw(u) t hat r et1uns a value for
icdfw( 0 . 25). Does it m atter w hat v alue
you r eturn for u=O. 25?
6.5.5 Random variables X and Y are
independen t exponen t ial ra ndom variables 6.6.2 Write a MATLAB funct ion u=urv (m)
'vit h expected values E [X ] = 1/ .:\ and t hat generates m samp les of r a ndom var-
E[Y] = l / JJ,. If, f= .\, 'vh at is t h e PDF iable U defined in Problem 4.4.7.
of W = X + Y ? If JJ, = .:\ ,what is f vv(1u)?
6.6.3 For random variable W of Exa m-
6.5.6 R andom variables X and Y have ple 6.10, we can generate random samples
joint PDF in t'vo d ifferen t ways:

O < y <::e < l , 1. G ener ate s amp les of X and Y and

f .X ,Y (x, y) = { 8xy
ot her,vise. calculate W = Y / X.
2. F ind t he CDP Fvv(1D) and generate
\i\1hat is t he J)D F of vV = X + Y? samples using T heorem 6.5.
6.5.7 Cont inuous random v ariables X and \tVrite JVIATLAB functions w=wr v1 (m) and
Y have joint PDF f x ,Y(1;, y) . Show t hat w=wrv2 (m) t o i1nplemen t t hese inethods.
W = X -Yhas PDF Does one met hod run much faster? If s o ,

f w('111) = 1: f x,Y(Y + 111, y ) dy .

wh y? (Use cputime to make comparisons.)

6.6.4 \i\frite a function y=del tarv (m) t hat

Use a variable substit u t ion t o sho'v returns m, samples of t he random variable
X 'vit h PDF

x < -1 ,
Fx (x) = -1 < x < 1,
6.5.8 In t his problem 've show directly x > 1.
t hat t he s um of independen t Poisson r a n-
dom variables is Poisson. Let J and K be Since F x 1 ( 11,) is not d efined for 1 / 2 < 11, < 1,
independent Poisson random variables wit h use t he result of Problem 6.3.13.

Conditional Probability Models

In many applications of probability, we have a probability model of an experiment but it is impossible to observe the outcome of the experiment. Instead, we observe an event that is related to the outcome. In some applications, the outcome of interest, for example a sample value of a random voltage X, can be obscured by random noise N, and we observe only a sample value of X + N. In other examples, we obtain information about a random variable before it is possible to observe the random variable. For example, we might learn the nature of an email (whether it contains images or only text) before we observe the number of bytes that need to be transmitted. In another example, we observe that the beginning of a lecture is delayed by two minutes and we want to predict the actual starting time. In these situations, we obtain a conditional probability model by modifying the original probability model (for the voltage, or the email size, or the starting time) to take into account the information gained from the event we have observed.

7.1 Conditioning a Random Variable by an Event

The conditional PMF P_{X|B}(x) and conditional PDF f_{X|B}(x) are probability models that use the definition of conditional probability, Definition 1.5, to incorporate partial knowledge of the outcome of an experiment. The partial knowledge is that the outcome is an element of an event B.

Recall from Section 1.4 that the conditional probability

P[A|B] = P[AB] / P[B]     (7.1)

is a number that expresses our new knowledge about the occurrence of event A, when we learn that another event B occurs. In this section, we consider an event A related to the observation of a random variable X. When X is discrete, we are usually interested in A = {X = x} for some x. When X is continuous, we may consider A = {x1 < X <= x2} or A = {x < X <= x + dx}. The conditioning event B contains information about X but not the precise value of X.

Example 7.1
Let N equal the number of bytes in an email. A conditioning event might be the event I that the email contains an image. A second kind of conditioning would be the event {N > 100,000}, which tells us that the email required more than 100,000 bytes. Both events I and {N > 100,000} give us information that the email is likely to have many bytes.

Example 7.2
Recall the experiment in which you wait for the professor to arrive for the probability lecture. Let X denote the arrival time in minutes either before (X < 0) or after (X > 0) the scheduled lecture time. When you observe that the professor is already two minutes late but has not yet arrived, you have learned that X > 2 but you have not learned the precise value of X.

Knowledge of the conditioning event B changes the probability of the event A. Given this information and a probability model, we can use Definition 1.5 to find the conditional probability P[A|B]. A starting point is the event A = {X <= x}; we would find

P[A|B] = P[X <= x | B]     (7.2)

for all real numbers x. This formula is a function of x. It is the conditional cumulative distribution function.

Definition 7.1    Conditional CDF
Given the event B with P[B] > 0, the conditional cumulative distribution function of X is

F_{X|B}(x) = P[X <= x | B].

The definition of the conditional CDF applies to discrete, continuous, and mixed random variables. However, just as we have found in prior chapters, the conditional CDF is not the most convenient probability model for many calculations. Instead we have definitions for the special cases of discrete X and continuous X that are more useful.

Definition 7.2    Conditional PMF Given an Event
Given the event B with P[B] > 0, the conditional probability mass function of X is

P_{X|B}(x) = P[X = x | B].

In Chapter 4 we defined the PDF of a continuous random variable as the derivative of the CDF. Similarly, with the knowledge that x ∈ B, we define the conditional PDF as the derivative of the conditional CDF.

Definition 7.3    Conditional PDF Given an Event
For a random variable X and an event B with P[B] > 0, the conditional PDF of X given B is

f_{X|B}(x) = dF_{X|B}(x) / dx.

The functions P_{X|B}(x) and f_{X|B}(x) are probability models for a new random variable related to X. Here we have extended our notation convention for probability functions. We continue the old convention that a CDF is denoted by the letter F, a PMF by P, and a PDF by f, with the subscript containing the name of the random variable. However, with a conditioning event, the subscript contains the name of the random variable followed by a vertical bar and a statement of the conditioning event. The argument of the function is usually the lowercase letter corresponding to the variable name. The argument is a dummy variable: it could be any letter, so that P_{X|B}(x) and f_{Y|B}(y) are the same functions as P_{X|B}(u) and f_{Y|B}(v). Sometimes we write the function with no specified argument at all: P_{X|B}(·).
When a conditioning event B ⊂ S_X, both P[B] and P[AB] in Equation (7.1) are properties of the PMF P_X(x) or PDF f_X(x). Now either the event A = {X = x} is contained in the event B or it is not. If X is discrete and x ∈ B, then AB = {X = x} ∩ B = {X = x} and P[X = x, B] = P_X(x). Otherwise, if x ∉ B, then {X = x} ∩ B = ∅ and P[X = x, B] = 0. Similar observations apply when X is continuous. The next theorem uses these observations to calculate the conditional probability models.
Theorem 7.1
For a random variable X and an event B ⊂ S_X with P[B] > 0, the conditional PMF and conditional PDF of X given B are

Discrete:    P_{X|B}(x) = { P_X(x)/P[B],  x ∈ B,
                          { 0,            otherwise;

Continuous:  f_{X|B}(x) = { f_X(x)/P[B],  x ∈ B,
                          { 0,            otherwise.

The theorem states that when we learn that an outcome x ∈ B, the probabilities of all x ∉ B are zero in our conditional model, and the probabilities of all x ∈ B are proportionally higher than they were before we learned x ∈ B.


Example 7.3
A website distributes instructional videos on bicycle repair. The length of a video in minutes X has PMF

P_X(x) = { 0.15,  x = 1, 2, 3, 4,
         { 0.1,   x = 5, 6, 7, 8,     (7.3)
         { 0,     otherwise.

Suppose the website has two servers, one for videos shorter than five minutes and the other for videos of five or more minutes. What is the PMF of video length on the second server?

We seek a conditional PMF for the condition x ∈ L = {5, 6, 7, 8}. From Theorem 7.1,

P_{X|L}(x) = { P_X(x)/P[L],  x = 5, 6, 7, 8,     (7.4)
             { 0,            otherwise.

From the definition of L, we have

P[L] = Σ_{x=5}^{8} P_X(x) = 0.4.     (7.5)

With P_X(x) = 0.1 for x ∈ L,

P_{X|L}(x) = { 0.1/0.4 = 0.25,  x = 5, 6, 7, 8,     (7.6)
             { 0,               otherwise.

Thus the lengths of long videos are equally likely. Among the long videos, each length has probability 0.25.
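The two-step calculation in this example, finding P[L] and then scaling P_X(x) by 1/P[L] inside L, is easy to check numerically. The following sketch is in Python rather than the MATLAB used elsewhere in the text, purely for illustration; the function name conditional_pmf is ours, not part of the text's software.

```python
# Conditional PMF via Theorem 7.1, applied to Example 7.3.
# Illustrative Python sketch; the text's own software is MATLAB.

def conditional_pmf(pmf, B):
    """pmf: dict mapping x -> P_X(x); B: set of outcomes in the conditioning event."""
    p_B = sum(p for x, p in pmf.items() if x in B)
    assert p_B > 0, "conditioning event must have positive probability"
    return {x: (p / p_B if x in B else 0.0) for x, p in pmf.items()}

# PMF of video length X, Equation (7.3)
px = {x: 0.15 for x in (1, 2, 3, 4)}
px.update({x: 0.10 for x in (5, 6, 7, 8)})

L = {5, 6, 7, 8}                        # the "long video" event
px_given_L = conditional_pmf(px, L)

print(round(px_given_L[5], 6))          # prints 0.25, matching Equation (7.6)
```

Each probability inside L is scaled up by the same factor 1/P[L] = 2.5, and everything outside L drops to zero, exactly as Theorem 7.1 prescribes.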

Sometimes, instead of a letter such as B or L that denotes the subset of S_X that forms the condition, we write the condition itself in the PMF. In the preceding example we could use the notation P_{X|X>=5}(x) for the conditional PMF.
Example 7.4
For the pointer-spinning experiment of Example 4.1, find the conditional PDF of the pointer position for spins in which the pointer stops on the left side of the circle.

Let L denote the left side of the circle. In terms of the stopping position, L = [1/2, 1). Recalling from Example 4.4 that the pointer position X has a uniform PDF over [0, 1),

P[L] = ∫_{1/2}^{1} f_X(x) dx = ∫_{1/2}^{1} dx = 1/2.     (7.7)

Therefore,

f_{X|L}(x) = { 2,  1/2 <= x < 1,     (7.8)
             { 0,  otherwise.


Example 7.5
Suppose X, the time in integer minutes you wait for a bus, has the discrete uniform PMF

P_X(x) = { 1/20,  x = 1, 2, ..., 20,     (7.9)
         { 0,     otherwise.

Suppose the bus has not arrived by the eighth minute; what is the conditional PMF of your waiting time X?

Let A denote the event X > 8. Observing that P[A] = 12/20, we can write the conditional PMF of X as

P_{X|A}(x) = { (1/20)/(12/20) = 1/12,  x = 9, 10, ..., 20,     (7.10)
             { 0,                      otherwise.
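Theorem 7.1 also suggests a simulation view of this example: conditioning on A = {X > 8} amounts to discarding every sample with X <= 8 and renormalizing what remains. The sketch below, in Python rather than the text's MATLAB and with names of our own choosing, estimates the conditional PMF this way.

```python
import random

# Simulation check of Example 7.5: sample X uniform on {1,...,20},
# keep only the samples in the conditioning event A = {X > 8},
# and histogram the survivors.
random.seed(1)
n = 200_000
samples = [random.randint(1, 20) for _ in range(n)]
kept = [x for x in samples if x > 8]           # condition on A

# Empirical conditional PMF; each value 9,...,20 should be near 1/12.
est = {x: kept.count(x) / len(kept) for x in range(9, 21)}
```

With 200,000 trials, each of the twelve surviving values appears with relative frequency within a few tenths of a percent of 1/12 ≈ 0.0833.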

Example 7.6
The continuous uniform (−r/2, r/2) random variable X is processed by a b-bit uniform quantizer to produce the quantized output Y. Random variable X is rounded to the nearest quantizer level. With a b-bit quantizer, there are n = 2^b quantization levels. The quantization step size is Δ = r/n, and Y takes on values in the set

Q_Y = {y_{−n/2}, y_{−n/2+1}, ..., y_{n/2−1}}     (7.11)

where y_i = Δ/2 + iΔ. This relationship is shown for b = 3 in the figure on the left. Given the event B_i that Y = y_i, find the conditional PDF of X given B_i.

In terms of X, we observe that B_i = {iΔ <= X < (i + 1)Δ}. Thus,

P[B_i] = ∫_{iΔ}^{(i+1)Δ} f_X(x) dx = Δ/r = 1/n.     (7.12)

By Definition 7.3,

f_{X|B_i}(x) = { f_X(x)/P[B_i] = 1/Δ,  iΔ <= x < (i + 1)Δ,     (7.13)
              { 0,                     otherwise.

Given B_i, the conditional PDF of X is uniform over the ith quantization interval.
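The quantizer model is straightforward to simulate. In this illustrative Python sketch (the text's software is MATLAB; the parameter values r = 8 and b = 3 and all names are our own choices), each of the n = 2^b levels occurs with relative frequency near P[B_i] = 1/n, consistent with Equation (7.12).

```python
import math
import random
from collections import Counter

r, b = 8.0, 3
n_levels = 2 ** b              # n = 2^b quantization levels
delta = r / n_levels           # step size: delta = r/n (here, 1.0)

def quantize(x):
    """Round x in (-r/2, r/2) to the nearest level y_i = delta/2 + i*delta."""
    i = math.floor(x / delta)  # cell index i: i*delta <= x < (i+1)*delta
    return delta / 2 + i * delta

random.seed(2)
xs = [random.uniform(-r / 2, r / 2) for _ in range(100_000)]
freq = Counter(quantize(x) for x in xs)

# Each of the 8 levels should appear with relative frequency near 1/8.
```

Sorting the keys of freq also shows that the observed levels are exactly y_{−4}, ..., y_{3} = −3.5, ..., 3.5, the set Q_Y of Equation (7.11) for these parameter choices.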


In some applications, we begin with a set of conditional probability models such as the PMFs P_{X|B_i}(x), i = 1, 2, ..., m, where B_1, B_2, ..., B_m is a partition. We then use the law of total probability to find the PMF P_X(x).

Theorem 7.2
For random variable X resulting from an experiment with partition B_1, ..., B_m,

Discrete:    P_X(x) = Σ_{i=1}^{m} P_{X|B_i}(x) P[B_i];

Continuous:  f_X(x) = Σ_{i=1}^{m} f_{X|B_i}(x) P[B_i].

Proof  The theorem follows directly from Theorem 1.10 with A = {X = x} for discrete X or A = {x < X <= x + dx} when X is continuous.

Example 7.7
Let X denote the number of additional years that a randomly chosen 70-year-old person will live. If the person has high blood pressure, denoted as event H, then X is a geometric (p = 0.1) random variable. Otherwise, if the person's blood pressure is normal, event N, X has a geometric (p = 0.05) PMF. Find the conditional PMFs P_{X|H}(x) and P_{X|N}(x). If 40 percent of all 70-year-olds have high blood pressure, what is the PMF of X?

The problem statement specifies the conditional PMFs in words. Mathematically, the two conditional PMFs are

P_{X|H}(x) = { 0.1(0.9)^{x−1},  x = 1, 2, ...,
             { 0,               otherwise,

P_{X|N}(x) = { 0.05(0.95)^{x−1},  x = 1, 2, ...,
             { 0,                 otherwise.

Since H, N is a partition, we can use Theorem 7.2 to write

P_X(x) = P_{X|H}(x) P[H] + P_{X|N}(x) P[N]

       = { (0.4)(0.1)(0.9)^{x−1} + (0.6)(0.05)(0.95)^{x−1},  x = 1, 2, ...,     (7.14)
         { 0,                                                otherwise.
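Theorem 7.2 reduces the answer to a weighted sum of the two conditional PMFs, which we can sanity-check numerically. The sketch below is in Python rather than the text's MATLAB, for illustration only; the helper names are ours.

```python
# Law-of-total-probability mixture of Example 7.7.
# Illustrative Python sketch; helper names are our own.

def geometric_pmf(p, x):
    """P[X = x] for a geometric (p) random variable, x = 1, 2, ..."""
    return p * (1 - p) ** (x - 1)

def px(x):
    # Theorem 7.2: P_X(x) = P_{X|H}(x) P[H] + P_{X|N}(x) P[N]
    return 0.4 * geometric_pmf(0.1, x) + 0.6 * geometric_pmf(0.05, x)

# Sanity checks: the mixture is a valid PMF, and
# P_X(1) = (0.4)(0.1) + (0.6)(0.05) = 0.07.
total = sum(px(x) for x in range(1, 2000))
print(round(total, 6))        # prints 1.0 (geometric tails beyond 2000 are negligible)
```

The same two lines of code generalize to any finite partition: a mixture of conditional PMFs weighted by the partition probabilities.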

Example 7.8
Random variable X is a voltage at the receiver of a modem. When symbol "0" is transmitted (event B_0), X is the Gaussian (−5, 2) random variable. When symbol "1" is transmitted (event B_1), X is the Gaussian (5, 2) random variable. Given that symbols "0" and "1" are equally likely to be sent, what is the PDF of X?

The problem statement implies that P[B_0] = P[B_1] = 1/2 and

f_{X|B_0}(x) = (1/(2√(2π))) e^{−(x+5)²/8},    f_{X|B_1}(x) = (1/(2√(2π))) e^{−(x−5)²/8}.     (7.15)

By Theorem 7.2,

f_X(x) = f_{X|B_0}(x) P[B_0] + f_{X|B_1}(x) P[B_1]

       = (1/(4√(2π))) ( e^{−(x+5)²/8} + e^{−(x−5)²/8} ).     (7.16)

Problem 7.7.1 asks the reader to graph f_X(x) to show its similarity to Figure 4.3.
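Equation (7.16) describes an equal mixture of two Gaussian densities, and a quick numerical check confirms that it integrates to 1 and is symmetric about x = 0. This is an illustrative Python sketch (the text's examples use MATLAB); the grid and integration range are arbitrary choices, wide enough to capture both bells.

```python
import math

# PDF of Example 7.8: equal mixture of Gaussian (-5, 2) and Gaussian (5, 2).

def gaussian_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fx(x):
    # Theorem 7.2: f_X(x) = f_{X|B0}(x) P[B0] + f_{X|B1}(x) P[B1]
    return 0.5 * gaussian_pdf(x, -5, 2) + 0.5 * gaussian_pdf(x, 5, 2)

# Trapezoidal integration on [-20, 20]; both means are 7.5 standard
# deviations inside the interval, so the truncated tails are negligible.
xs = [-20 + 0.01 * k for k in range(4001)]
area = sum(0.01 * (fx(a) + fx(b)) / 2 for a, b in zip(xs[:-1], xs[1:]))
```

Plotting fx over this grid reproduces the two-peaked shape that Problem 7.7.1 asks the reader to graph.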


(A) On the Internet, data is transmitted in packets. In a simple model for World Wide Web traffic, the number of packets N needed to transmit a Web page depends on whether the