the Theory of
Random Signals
and Noise
OTHER IEEE PRESS BOOKS
Advances in Local Area Networks, Edited by K. Kümmerle, J. O. Limb, and
F. A. Tobagi
Undersea Lightwave Communications, Edited by P. K. Runge and P. R.
Trischitta
Multidimensional Digital Signal Processing, Edited by the IEEE Multidimensional
Signal Processing Committee
Residue Number System Arithmetic, Edited by M. A. Soderstrand, W. K.
Jenkins, G. A. Jullien, and F. J. Taylor
Getting the Picture, by S. B. Weinstein
Modern Spectrum Analysis, Edited by S. B. Kesler
The Calculus Tutoring Book, by C. Ash and R. Ash
Phase-Locked Loops, Edited by W. C. Lindsey and C. M. Chie
Spectrum Management and Engineering, Edited by F. Matos
Land-Mobile Communications Engineering, Edited by D. Bodson, G. F.
McClure, and S. R. McConaughey
Electronic Switching: Digital Central Systems of the World, Edited by A. E.
Joel, Jr.
Satellite Communications, Edited by H. L. Van Trees
Phase-Locked Loops & Their Application, Edited by W. C. Lindsey and M.
K. Simon
Spread Spectrum Techniques, Edited by R. C. Dixon
Electronic Switching: Central Office Systems of the World, Edited by A. E.
Joel, Jr.
Waveform Quantization and Coding, Edited by N. S. Jayant
Communications Channels: Characterization and Behavior, Edited by B.
Goldberg
Computer Communications, Edited by P. E. Green and R. W. Lucky
A NOTE TO THE READER

This book has been electronically reproduced from digital information stored at John Wiley & Sons, Inc. We are pleased that the use of this new technology will enable us to keep works of enduring scholarly value in print as long as there is a reasonable demand for them. The content of this book is identical to previous printings.
An Introduction to
the Theory of
Random Signals
and Noise
Wilbur B. Davenport, Jr.
William L. Root
Published under the sponsorship of the
IEEE Communications Society
The Institute of Electrical and Electronics Engineers, Inc., New York
A JOHN WILEY & SONS, INC., PUBLICATION
New York / Chichester / Weinheim / Brisbane / Singapore / Toronto
IEEE PRESS
1987 Editorial Board
R. F. Cotellessa, Editor in Chief
J. K. Aggarwal, Editor, Selected Reprint Series
Glen Wade, Editor, Special Issue Series

James Aylor, F. S. Barnes, J. E. Brittain, B. D. Carrol, Aileen Cavanagh, D. G. Childers, H. W. Colburn, J. F. Hayes, W. K. Jenkins, A. E. Joel, Jr., Shlomo Karni, R. W. Lucky, R. G. Meyer, Seinosuke Narita, J. D. Ryder, A. C. Schell, L. G. Shaw, M. I. Skolnik, P. W. Smith, M. A. Soderstrand, M. E. Van Valkenburg, John Zaborsky

W. R. Crone, Managing Editor
Hans P. Leander, Technical Editor
Laura J. Kelly, Administrative Assistant
Allen Appel, Associate Editor
Copyright 1987 by
THE INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, INC.
All rights reserved.
This is the IEEE PRESS edition of a book published by the McGraw-Hill Book
Company in 1958 under the same title.
Library of Congress Cataloging-in-Publication Data
Davenport, Wilbur B.
An introduction to the theory of random signals and noise.
"Published under the sponsorship of the IEEE Communications Society."
Bibliography: p.
Includes index.
1. Statistical communication theory. 2. Signal theory (Telecommunications)
I. Root, William L. II. IEEE Communications Society. III. Title.
TK5101.D3 1987    621.38'043    87-22594
ISBN 0-87942-235-1
CONTENTS
PREFACE TO THE IEEE PRESS EDITION ix
PREFACE x
ERRATA xi
CHAPTER 1. INTRODUCTION 1
1-1. Communications Systems and Statistics, 1-2. The Book
CHAPTER 2. PROBABILITY 5
2-1. Introduction, 2-2. Fundamentals, 2-3. Joint Probabilities, 2-4. Conditional Probabilities, 2-5. Statistical Independence, 2-6. Examples, 2-7. Problems
CHAPTER 3. RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 19
3-1. Definitions, 3-2. Probability Distribution Functions, 3-3. Discrete Random Variables, 3-4. Continuous Random Variables, 3-5. Statistically Independent Random Variables, 3-6. Functions of Random Variables, 3-7. Random Processes, 3-8. Problems
CHAPTER 4. AVERAGES 45
4-1. Statistical Averages, 4-2. Moments, 4-3. Characteristic Functions, 4-4. Correlation, 4-5. Correlation Functions, 4-6. Convergence, 4-7. Integrals of Random Processes, 4-8. Time Averages, 4-9. Problems
CHAPTER 5. SAMPLING 76
5-1. Introduction, 5-2. The Sample Mean, 5-3. Convergence of the Sample Mean, 5-4. The Central Limit Theorem, 5-5. Relative Frequency, 5-6. Problems
CHAPTER 6. SPECTRAL ANALYSIS 87
6-1. Introduction, 6-2. The Spectral Density of a Periodic Function, 6-3. Spectral Density of a Periodic Random Process, 6-4. Orthogonal Series Expansion of a Random Process, 6-5. Spectral Density for an Arbitrary Function, 6-6. Spectral Analysis of a Wide-sense Stationary Random Process, 6-7. Cross-spectral Densities, 6-8. Problems
CHAPTER 7. SHOT NOISE 112
7-1. Electronics Review, 7-2. The Probability Distribution of Electron-emission Times, 7-3. Average Current through a Temperature-limited Diode, 7-4. Shot-noise Spectral Density for a Temperature-limited Diode, 7-5. Shot-noise Probability Density for a Temperature-limited Diode, 7-6. Space-charge Limiting of Diode Current, 7-7. Shot Noise in a Space-charge-limited Diode, 7-8. Shot Noise in Space-charge-limited Triodes and Pentodes, 7-9. Problems
CHAPTER 8. THE GAUSSIAN PROCESS 145
8-1. The Gaussian Random Variable, 8-2. The Bivariate Distribution, 8-3. The Multivariate Distribution, 8-4. The Gaussian Random Process, 8-5. The Narrow-band Gaussian Random Process, 8-6. Sine Wave plus Narrow-band Gaussian Random Process, 8-7. Problems
CHAPTER 9. LINEAR SYSTEMS 171
9-1. Elements of Linear-system Analysis, 9-2. Random Inputs, 9-3. Output Correlation Functions and Spectra, 9-4. Thermal Noise, 9-5. Output Probability Distributions, 9-6. Problems
CHAPTER 10. NOISE FIGURE 204
10-1. Definitions, 10-2. Noise Figure, 10-3. Cascaded Stages, 10-4. Example, 10-5. Problems
CHAPTER 11. OPTIMUM LINEAR SYSTEMS 219
11-1. Introduction, 11-2. Smoothing and Predicting of Stationary Inputs Using the Infinite Past (Wiener Theory), 11-3. Pure Prediction: Nondeterministic Processes, 11-4. Solution of the Equation for Predicting and Filtering, 11-5. Other Filtering Problems Using Least-mean-square Error Criterion, 11-6. Smoothing and Predicting with a Finite Observation Time, 11-7. Maximizing Signal-to-Noise Ratio: The Matched Filter, 11-8. Problems
CHAPTER 12. NONLINEAR DEVICES: THE DIRECT METHOD 250
12-1. General Remarks, 12-2. The Square-law Detector, 12-3. The Square-law Detector: Signal-plus-Noise Input, 12-4. The Half-wave Linear Detector, 12-5. Problems
CHAPTER 13. NONLINEAR DEVICES: THE TRANSFORM METHOD 277
13-1. The Transfer Function, 13-2. νth-Law Devices, 13-3. The Output Autocorrelation Function and Spectral Density, 13-4. The Output Spectral Density, 13-5. Narrow-band Inputs, 13-6. νth-law Detectors, 13-7. Problems
CHAPTER 14. STATISTICAL DETECTION OF SIGNALS 312
14-1. Application of Statistical Notions to Radio and Radar, 14-2. Testing of Statistical Hypotheses, 14-3. Likelihood Tests, 14-4. Statistical Estimation, 14-5. Communication with Fixed Signals in Gaussian Noise, 14-6. Signals with Unknown Parameters in Gaussian Noise, 14-7. Radar Signals in Gaussian Noise, 14-8. Problems
APPENDIX 1. THE IMPULSE FUNCTION 365
A1-1. Definitions, A1-2. The Sifting Integral, A1-3. Fourier Transforms, A1-4. Derivatives of Impulse Functions
APPENDIX 2. INTEGRAL EQUATIONS 371
A2-1. Definitions, A2-2. Theorems, A2-3. Rational Spectra
BIBLIOGRAPHY 383
INDEX 391
PREFACE TO THE IEEE PRESS EDITION
We want to thank the IEEE PRESS for republishing this book, nearly thirty
years after it first appeared, and to thank the IEEE Communications Society for
being willing to serve as Sponsor. We are very pleased that the book will again
be available; perhaps some students and engineers working on problems not
formulated until years after the book was written will find it useful.
The book is being printed exactly as in the original edition with, of course, all
the errors still there. However, an errata list has been provided to rectify those
errors of which we are aware.
We point out to the reader that the modifier "realizable" as used in Chapter
11 means simply causal, or nonanticipative, in current systems theory language.
We also point out that the problems at the end of each chapter, except the first,
are exercises for the serious reader and, as for the 1958 edition, no solutions
manual is available.
WILBUR B. DAVENPORT, JR.
WILLIAM L. ROOT
PREFACE
During the past decade there has been a rapidly growing realization amongst engineers of the power of the methods and concepts of mathematical statistics when applied to problems arising in the transmission and processing of information. The use of the statistical approach has resulted not only in a better understanding of the theory of communications but also in the practical development of new equipment. We have tried here to provide an introduction to the statistical theory underlying a study of signals and noises in communications systems; in particular, we have tried to emphasize techniques as well as results.
This book is an outgrowth of a set of lecture notes prepared by the first-named author in 1952 for a first-year graduate course on the statistical theory of noise and modulation given in the Department of Electrical Engineering at M.I.T. The material has recently been used in substantially its present form as a text for an informal course at the M.I.T. Lincoln Laboratory. With some expansion, it could form the basis of a two-semester, first-year graduate course. Alternatively, by cutting out some parts, say Arts. 6-4 and 9-5, all of Chap. 11, and parts of Chap. 14, the remaining material could probably be covered in a single semester. Prerequisites would include some electrical network theory and an advanced engineering calculus course containing Fourier series and integrals.
It would be impracticable to acknowledge all those who have contributed to this book. However, we do wish to mention David Middleton, with whom we have had many stimulating discussions, and our colleagues at M.I.T., who have made many valuable comments and criticisms. In particular, we wish to thank Morton Loewenthal for the time and effort he spent in making a critical review of the entire manuscript.
WILBUR B. DAVENPORT, JR.
WILLIAM L. ROOT
ERRATA
p. 33 Replace the sentence above Eq. (3-42) with: "If the probability density function of y exists it may then be obtained by differentiation."
p. 34 At top of page, in the equation for p2(y), change the rightmost side to: "p1(x) |dx/dy|". Make the same kind of change in Eq. (3-43).
p. 35 Change the last two lines to read: "...sets in the sample space of the x_n that are preimages of given points in the sample space of the y_m, and equating probabilities. Note that... ."
p. 42 In the line just above Eq. (3-62) change p(x_n | x_1, . . . , x_n-1) to:
p. 43 At the end of Problem 8 delete the phrase: "random variables." In Problem 11 change p(θ) to "p1(θ)," and change part b to read: "Determine p2(x1), the probability density function for x1."
p. 44 In Problem 12 replace p(z), p(x), and p(y) with "p3(z)," "p1(x)," and "p2(y)," respectively.
In Problem 13 replace the equation for p(x) with
    for |x| ≤ 1
    0, otherwise
and change p(y) to "p2(y)."
In Problems 14 and 15 replace p(x) and p(y) with "p1(x)" and "p2(y)," respectively.
p. 49 Change Eq. (4-13) to read: "σ² = μ2 = E[(x − m)²] = E(x²) − m²"
p. 57 At the end of the second line add: ", and has slope μ11/σ1²."
p. 71 Problem 3. Change the first sentence to read: "Let x be a random
variable with finite second moment and c an arbitrary constant."
p. 73 Problem 15. Change the first sentence to read: "Consider the three jointly stationary random... ."
p. 75 Problem 21. Change part a to read: "Show that if |a| is not constant with probability 1, E(x_t) ≠ <x(t)>."
Problem 22. Add the phrase "..., where p = 1 − a."
p. 83 Change Eq. (5-24) to read:
"F(Y) = P(y ≤ Y) = (2π)^(−1/2) ∫ from −∞ to Y of exp(−y²/2) dy"
p. 86 Problem 3. In the formula expression (1 :) should be changed to:
"( Ir
l)"
1
ro
p. 91 Add to the last sentence in the first paragraph of Sec. 6-3: "..., but it may not be wide-sense stationary."
p. 95 In the second integral in the expression for TE (xnx:), the argument for
the exponential should be:
"(
p. 101 The first equation on the page should be: "b cot bT = −1"
p. 110 Problem 6. Add to the specification of the autocorrelation functions: "... where a > 0 and b > 0."
p. 111 Problem 9. The formula for the autocorrelation function should read:
p, 149 In the second line above Eq. (822), UJ and U2 should be replaced by:
and respectively.
p. 151 The final expression in Eq. (8-37) should be: "My(jω)".
p. 153 The line after Eq. (8-53) should read: "Therefore if N jointly Gaussian random... ."
p. 200 Problem 1. Change |h(τ)|² to: "|h(τ)|."
Problem 2. In Eqs. (9-99) and (9-100) change Rxy(τ) to: "Ryx(τ)."
p. 201 Problem 6. Add to the end of the first paragraph: " ...and where
RCT."
Problem 8. The first sentence should read: "Consider a lossless bilateral
coupling.... "
p. 202 In Eq. (9107) change e
j 2rjT
to: "ej2ffjT."
p. 238 In Eq. (11-54), on the left-hand side, H(τ) should be changed to: "h(τ)".
p. 252 In Eqs. (129) and (1210), should be changed to: u,.JYt1a"
everywhere it occurs.
p.271 In Eq. (1299), (I + should be replaced by: "(I )".
p. 289 In Eq. (13-45) the expression in the second integral should be changed to:
p. 341 The reference to Eq. (14-25) just above Eq. (14-61) should be to Eq. (14-23).
p. 342 In the first line of Eq. (14-64), delete the 1/2 in front of the second summation sign.
p. 372 In Eq. (A2-5) "lim" should be changed to: "l.i.m."
CHAPTER 1
INTRODUCTION
1-1. Communications Systems and Statistics
A simple communications system is the cascade of an information source, a communication link, and an information user where, as shown in Fig. 1-1, the communication link consists of a transmitter, a channel, and a receiver.

FIG. 1-1. A communications system. (Information source → transmitter → channel → receiver → information user; the transmitter, channel, and receiver together constitute the communication link.)

The transmitter modulates or encodes information supplied by the source into a signal form suitable to the channel, the channel
conveys the signal over the space intervening between the transmitter
and receiver, and the receiver demodulates or decodes the signal into a
form suitable to the user. Typical communication links are (1) the
frequency-modulated, very-high-frequency radio link that brings entertainment to the listener, (2) the time-division multiplex radio link that transmits various data to some controlled device, e.g., a ship or an airplane, (3) the Teletype machine, connecting wire or cable, and associated
terminal apparatus that convey telegrams from one point to another, and
(4) the eye and pertinent portions of the nervous system that transmit a
visual image to the brain.
Randomness or unpredictability can enter a communications system in three ways: the information generated by the source may not be completely predictable, the communication link may be randomly disturbed,
and the user may misinterpret the information presented to it. It is in
fact a fundamental tenet of information theory that the output of a source
must be unpredictable to some degree in order to impart information at
all; if the source output were completely predictable, the user could,
without communication from the source, state at any instant the entire
future output of the source. Disturbances in the communication link
can occur in many ways. Both the transmitter and the receiver can add
noise. If the channel is a radio channel, it can, for example, add atmospheric noise, galactic noise, or man-made interference. In addition, it
might be subject to a randomly variable multipath propagation mode
which causes a single transmitted signal to appear as a multiplicity of
interfering signals at the receiver.
Even though the signals and noises in a communications system are random, the behavior of that system can be determined to a considerable degree if various average properties of the signals and noises are known. Such properties might include the average intensity or power, the distribution of power with frequency, and the distribution of the instantaneous amplitude. The determination of the relations among the various average properties falls into the domain of probability theory and statistics. The purpose of this book is to introduce to the reader the application of statistical techniques to the study of communications systems.† In general we shall assume the statistical properties of the signals and noises to be given; the study of optimum statistical properties for signals may be found in works on information theory‡ and will not be discussed here.

† An excellent short introduction to the subject matter of this book may be found in Bennett (II). (See Bibliography.)
‡ For example, Shannon (I) and (II).

1-2. The Book

Approximately the first half of this book is devoted to a development of those elements of probability theory and statistics which are particularly pertinent to a study of random signals and noises in communications systems. The remainder is devoted to applications.

Survey of the Text. In Chap. 2, the probability of an event, i.e., the outcome of an experiment, is introduced in terms of the relative frequency of occurrence of that event and a set of axioms of probability theory is presented. The probabilities of multiple events are then discussed and the concept of statistical independence is mentioned. The representation of an event by a point in a sample space is made in Chap. 3, and a random variable is defined there as a function over a sample space. Probabilities are then introduced on the sample space, and probability distribution functions and probability density functions of random variables are defined. The probability concepts are next extended to random functions of time by the definition of a random process as a family of random variables indexed by the parameter t.

The notion of statistical average is introduced in Chap. 4 in terms of the common arithmetic average, and certain statistical averages are considered. In particular, the characteristic function of the probability distribution function of the random variable x is defined to be the statistical average of exp(jvx) and is shown to be the Fourier transform of the probability density function of x. The correlation coefficient of the two random variables x and y is next stated to be the statistical average of a normalized product of x and y and is related to the problem of obtaining the best mean-square prediction of y given x. The random variables x and y are then said to be linearly independent if their correlation coefficient is zero. Finally, in Chap. 4 the relation between time averages and statistical averages is investigated.
In Chap. 5, sampling is introduced and the sample mean in particular
is discussed at some length. A simple form of the central limit theorem
is derived, and the relation between the relative frequency of occurrence
of an event and the probability of that event is further studied.
The spectral density, i.e., the distribution of power with frequency, of a
function of time is considered in Chap. 6 and is shown to be the Fourier
transform of the autocorrelation function of the time function. The
concept of spectral density is then extended to random processes, and
the problem of representing a random process by a series of orthogonal
time functions with linearly independent random coefficients is discussed.
The determination of the statistical properties of a physical process is
illustrated in Chap. 7 by a study of the shot noise generated in thermionic
vacuum tubes. First, the properties of shot noise in temperature-limited diodes are found in a manner similar to that used by Rice (I), and then the results so obtained are extended to space-charge-limited tubes.
One of the most commonly occurring and thoroughly studied classes
of random processes is that of the gaussian processes. The statistical
properties of these processes are reviewed in Chap. 8. In particular, the
properties of a narrowband gaussian random process are considered in
some detail, as are the joint statistics of a sine wave and a narrowband
gaussian process.
The techniques developed in Chaps. 2 through 8 are applied to a study
of the passage of random signals and noises through linear systems in
Chaps. 9, 10, and 11. The analysis of the response of a linear system
to an ordinary function of time is reviewed in Chap. 9 and extended there
to random time functions. The correlation functions and spectral densities at the output of a linear system in response to random inputs are
then determined; and the problem of obtaining the probability density
function of the output of a linear system is treated. These results are
applied in Chap. 10 to a study of noise in amplifiers. Noise figure is
defined, and some of its properties are discussed. The synthesis of
optimum linear systems is discussed in Chap. 11. In particular, the
theory of least-mean-square error smoothing and predicting, using either
the infinite past of the input or only a finite portion of the past, is
investigated.
The passage of random processes through a class of nonlinear devices
which have no memory is considered in Chaps. 12 and 13. In Chap. 12,
this problem is treated directly as a transformation of variables using the
nonlinear transfer characteristic of the device in question, and specific
results are obtained for the full-wave square-law detector and the half-wave linear detector. In Chap. 13, the transfer function of a nonlinear
device is defined to be the Fourier transform of the transfer characteristic
of the device. The transfer function is then used to determine the autocorrelation function and spectral density of the output of a nonlinear
device in response to an input consisting of the sum of a sine wave and a
gaussian random process. Particular results are then obtained for the
class of νth-law nonlinear devices.
Chap. 14 presents an introduction to the application of statistical
hypothesis testing and parameter estimation to problems of signal detection and extraction. The statistical principles needed are developed, including the Neyman-Pearson hypothesis test and other tests involving the likelihood ratio and the maximum-likelihood method of parameter estimation. Applications are made to radar and to radio-communications systems using a binary alphabet.
The Bibliography. The various sources referred to in this book have
been collected in the Bibliography at the end of the book. Only those
sources were included which seemed to us to be particularly pertinent
either to the text itself or to the problems at the ends of the chapters;
no attempt was made to be all-inclusive. Extensive lists of references may be found in the bibliographies of Chessin, Green, and Stumpers and in the books of Blanc-Lapierre and Fortet, Bunimovich, Cramér, Doob, Gnedenko and Kolmogorov, and Solodovnikov (see Bibliography).
CHAPTER 2
PROBABILITY
The fields of mathematics pertinent to a study of random signals and
noise are probability theory and statistics. The purpose of Chaps. 2, 3,
and 4 is to treat the relevant portions of the probability calculus in sufficient detail so that a reader previously unfamiliar with the subject may
become acquainted with the tools he needs for the main part of the book.
This will be done without any pretense of mathematical rigor. For anyone wanting to devote time to a careful study of mathematical probability, there are several excellent texts, particularly Cramér (I), Feller (I), and Loève (I).
21. Introduction
One way to approach the notion of probability is through the phenomenon of statistical regularity. There are many repeating situations
in nature for which we can predict in advance from previous experience
roughly what will happen, or what will happen on the average, but not
exactly what will happen. We say in such cases that the occurrences
are random; in fact, the reason for our inability to predict exactly may be
that (1) we do not know all the causal forces at work, (2) we do not have
enough data about the conditions of the problem, (3) the forces are so complicated that calculation of their combined effect is unfeasible, or
possibly (4) there is some basic indeterminacy in the physical world.
Whatever the reason for the randomness, in very many situations leading
to random occurrences, a definite average pattern of results may be
observed when the situation is recreated a great number of times. For
example, it is a common observation that if a good coin is flipped many
times it will turn up heads on about half the flips. The tendency of
repeated similar experiments to result in the convergence of overall
averages as more and more trials are made is called statistical regularity.
It should be realized, however, that our belief in statistical regularity is
an induction and is not subject to mathematical proof.
A conventional method of introducing mathematics into the study of
random events is, first, to suppose that there are identifiable systems
subject to statistical regularity; second, to form a mathematical model
(i.e., a set of axioms and the propositions they imply) incorporating the
features of statistical regularity; and, last, to apply deductions made with
the mathematics to real systems. The mathematical model most widely
accepted and used is called mathematical probability theory and results
from the axioms stated by Kolmogorov (I, Chap. 1).
Let us now use the idea of statistical regularity to explain probability.
We first of all prescribe a basic experiment, for example, the observation
of the result of the throw of a die or the measurement of the amplitude
of a noise voltage at a given instant of time. Secondly, we specify all the
possible outcomes of our basic experiment. For example, the possible
outcomes of the throw of a die are that the upper face shows one, two,
three, four, five, or six dots, whereas in the case of a noise-voltage measurement, the instantaneous amplitude might assume any value between
plus and minus infinity. Next, we repeat the basic experiment a large number of times under uniform conditions and observe the results.
Consider now one of the possible outcomes of our basic experiment, say the rolling of a two with a die. To this event we want to attach a
nonnegative real number called the probability of occurrence. Suppose
that in a large number N of repetitions of our basic experiment, the event
of interest (A) occurs n(A) times. The relative frequency of occurrence of that event in those N repetitions is then n(A)/N. If there is a practical certainty (i.e., a strong belief based upon practice) that the measured
relative frequency will tend to a limit as the number of repetitions of
the experiment increases without limit, we would like to say that the
event (A) has a definite probability of occurrence P(A) and take P(A)
to be that limit, i.e.,
n(A)/N → P(A)   as N → ∞
Unfortunately, this simple approach involves many difficulties. For
example, one obvious difficulty is that, strictly speaking, the limit can
never be found (since one would never live long enough), even though in some cases (as in gambling problems) there may be good reason to suppose that the limit exists and is known. Therefore, rather than define a
probability as the limit of a relative frequency, we shall define probability
abstractly so that probabilities behave like limits of relative frequencies.
An important after-the-fact justification of this procedure is that it leads to so-called laws of large numbers, according to which, roughly, in certain very general circumstances the mathematical counterpart of an empirical relative frequency does converge to the appropriate probability, and hence an empirical relative frequency may be used to estimate a
probability.† We shall discuss one form of the law of large numbers in
Art. 5-5.
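The limiting behavior of the relative frequency n(A)/N described above is easy to observe numerically. The following sketch is not from the text (the function name and seed are illustrative); it simulates repeated flips of a fair coin and prints the relative frequency of heads for increasing N:

```python
import random

def relative_frequency(n_trials, seed=0):
    """Empirical n(A)/N for the event A = "heads" of a fair coin."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

# The relative frequency settles near P(A) = 1/2 as N grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```

With a fixed seed the run is reproducible; the printed frequencies settle near 1/2, although, as the text cautions, no finite run can establish the limit itself.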
22. Fundamentals
Having introduced the concept of probability and its relation to relative frequency, we shall now define probability and consider some of its
properties.
First, however, we need to extend our notion of an event. It makes
perfectly good sense in speaking of the roll of a die to say that "the face with three dots or the face with four dots turned up." Thus we want to speak of compound events (A or B) where (A) and (B) are events.‡ Also we want to speak of events (A and B), i.e., the simultaneous occurrence of events (A) and (B). For example, in the roll of a die, let (A) be the event "at most four dots turn up" and (B) the event "at least
four dots turn up." Then (A and B) is the event "four dots turn up."
Finally, it is useful, if (A) is an event, to consider the event (not A).
Thus if (A) is the event that one dot turns up in the roll of a die, (not A)
is the event that two or more dots turn up.
In a succession of repetitions of an experiment, if after each trial we
can determine whether or not (A) happened and whether or not (B)
happened, then we can also determine whether or not (A and B), (A or B),
(not A), and (not B) happened. One may then calculate an empirical
frequency ratio for the occurrence of each of the events (A), (B), (A and
B), (A or B), (not A), and (not B). It therefore seems reasonable to
require:
AXIOM I. To each event (A) of a class of possible events of a basic experiment, there is assigned a nonnegative real number P(A) called the probability of that event. If this class includes the event (A) and the event (B), it also includes the events (A and B), (A or B), and (not A).
It follows from this axiom that a probability is defined for the certain
event (i.e., an event which must occur), since for any event (A), (A or
not A) is the certain event. Also, a probability is defined for the "null event," since for any event (A), (A and not A) is the null event.
The relative frequency of occurrence of a certain event is unity. Thus,
it seems reasonable to require:
AXIOM II. The probability of the certain event is unity.
We say two events (A) and (B) are disjoint or mutually exclusive if
† The above discussion touches on a difficult and controversial subject, the foundations of probability, a detailed discussion of which is beyond the scope of this book. For brief readable treatments of this subject see Carnap (I, Chap. II), Cramér (I, Chap. 13), and Jeffreys (I, Chap. I).
‡ We shall generally take "A or B" to mean "either A or B or both."
they are in such relation to each other that if one occurs, the other cannot
possibly occur. In the roll of a die, the events that the die turns up two
dots and that it turns up three dots are disjoint. In any case, (A) and
(not A) are disjoint. Suppose (A) and (B) are disjoint events that can
occur as the result of a given basic experiment. Let the basic experiment be repeated N times with (A) occurring n(A) times and (B) occurring n(B) times. Since (A) cannot occur when (B) does, and vice versa,
the number of times (A or B) occurred is n(A) + n(B). Thus,

n(A or B)/N = n(A)/N + n(B)/N
This relation holds as N → ∞; hence we are led to require:
AXIOM III. If (A) and (B) are mutually exclusive events, then

P(A or B) = P(A) + P(B)   (2-1)
A consequence of this axiom is that if A1, A2, . . . , AK are K mutually exclusive events, then

P(A1 or A2 or . . . or AK) = Σ(k=1 to K) P(Ak)   (2-2)

This is easily shown by successive applications of Axiom III. A consequence of Axioms II and III is
0 ≤ P(A) ≤ 1   (2-3)

for any event (A); for

P(A) + P(not A) = P(certain event) = 1

and P(not A) is nonnegative. It also follows from Axioms II and III that

P(null event) = 0   (2-4)

Note that if it is possible to decompose the certain event into a set of mutually exclusive events A1, . . . , AK, then

Σ(k=1 to K) P(Ak) = 1   (2-5)
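For a finite sample space these properties can be checked mechanically. The sketch below is illustrative and not from the text: it assigns probability 1/6 to each face of a fair die and verifies Axioms II and III together with Eqs. (2-4) and (2-5), using exact rational arithmetic:

```python
from fractions import Fraction

# A fair die: six mutually exclusive outcomes, each assigned probability 1/6.
p = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """P(event), where an event is a set of outcomes (finite additivity)."""
    return sum((p[face] for face in event), Fraction(0))

certain = set(p)                       # the certain event
assert prob(certain) == 1              # Axiom II
assert prob(set()) == 0                # the null event, Eq. (2-4)

# Axiom III: for disjoint events, P(A or B) = P(A) + P(B).
A, B = {2}, {3}
assert prob(A | B) == prob(A) + prob(B)

# Eq. (2-5): the six mutually exclusive outcomes decompose the certain event.
assert sum(p.values()) == 1
```

Using Fraction rather than floating point keeps the additivity checks exact, so each assertion is a literal instance of the corresponding axiom.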
These axioms are self-consistent and are adequate for a satisfactory theory of probability to cover cases in which the number of events is finite. However, if the number of possible events is infinite, these axioms are inadequate and some additional properties are required. The following axiom suffices:†
† See Kolmogorov (I, Chap. II).
PROBABILITY 9
AXIOM IV. If P(A_i) is defined for each of a class of events (A_1), (A_2), ...,
then P(A_1 or A_2 or ...) is defined; if (A_1), (A_2), ... are mutually
exclusive events and the probability of each one is defined, then

P(A_1 or A_2 or ...) = Σ_{i=1}^{∞} P(A_i)    (2-6)
One point should perhaps be emphasized. Although the axioms imply
that the probability of the null event is zero, they do not imply that if
the probability of an event is zero, it is the null event. The null event is
the mathematical counterpart of an impossible event; thus, in interpretation,
the mathematical theory assigns probability zero to anything
impossible, but does not say that if an event has probability zero it is
impossible. That this is a reasonable state of things can be seen from
the frequency-ratio interpretation of probability. It is entirely conceivable
that there be an event (A) such that n(A)/N → 0 even though n(A)
does not remain zero. A common example of this is the following:
Let the basic experiment consist of choosing randomly and without bias
a point on a line one unit long. Then the choice of a particular point is
an event which we require to be exactly as probable as the choice of any
other point. Thus if one point has nonzero probability, all must have
nonzero probability; but this cannot be, for then the sum of the probabilities
of these disjoint events would add up to more than one, in violation
of Axiom II. Thus every choice one makes must be an event of
probability zero.
It follows also, of course, that although the certain event must have
probability one, an event of probability one need not necessarily happen.
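The point-on-a-line argument can be illustrated numerically. The sketch below (a Python illustration, not part of the original text) draws points uniformly on the unit interval: the relative frequency of hitting any one fixed point stays essentially zero, while an interval of positive length is hit with positive frequency.

```python
import random

def frequency_of_exact_point(target=0.5, trials=100_000, seed=2):
    """Relative frequency with which a uniform draw on [0, 1) equals one
    fixed point.  Every individual point must carry probability zero,
    yet each draw always produces *some* point."""
    rng = random.Random(seed)
    hits = sum(rng.random() == target for _ in range(trials))
    return hits / trials

def frequency_in_interval(lo, hi, trials=100_000, seed=2):
    """Relative frequency of landing in an interval, which is positive."""
    rng = random.Random(seed)
    return sum(lo <= rng.random() < hi for _ in range(trials)) / trials

assert frequency_of_exact_point() < 1e-4              # tends to zero
assert 0.15 < frequency_in_interval(0.4, 0.6) < 0.25  # near 0.2
```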
We conclude with a simple example. Let a basic experiment have six possible
mutually exclusive outcomes, which we shall call simply events 1, 2, 3, 4, 5, and 6.
The class of events to which we shall assign probabilities consists of these six events
plus any combination of these events of the form (- or - or ...). It will be
noted that this class satisfies Axiom I. Now if we assign probabilities to each of the
events 1, 2, 3, 4, 5, and 6, then by Axiom III a probability will be defined for every
event in the class considered. We are free to assign any probabilities we choose to
events 1, 2, 3, 4, 5, and 6 as long as they are nonnegative and add to one. Thus, for
example, we may take
P(1) = 1/4  = probability of event 1
P(2) = 1/4
P(3) = 1/4
P(4) = 1/8
P(5) = 1/8
P(6) = 0

Then P(1 or 3 or 6) = 1/2, P(2 or 3) = 1/2, etc.
Another consistent assignment of probabilities to this same class of events is

P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
We cannot decide on the basis of mathematical probability theory which of these
assignments applies to the experiment of the rolling of a particular die; both choices
are valid mathematically.
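Such an assignment can be written out directly. The sketch below (Python, not part of the original text) uses one illustrative assignment with P(6) = 0; the specific fractions are an assumption, and any nonnegative values summing to one would serve. Compound-event probabilities then follow from Axiom III.

```python
from fractions import Fraction as F

# One admissible assignment (illustrative): nonnegative, summing to one,
# with the possible outcome 6 deliberately given probability zero.
P = {1: F(1, 4), 2: F(1, 4), 3: F(1, 4), 4: F(1, 8), 5: F(1, 8), 6: 0}
assert all(p >= 0 for p in P.values()) and sum(P.values()) == 1

def prob(event):
    """Probability of a compound event (a set of the mutually exclusive
    outcomes), computed by Axiom III as a sum of elementary probabilities."""
    return sum(P[k] for k in event)

assert prob({1, 3, 6}) == F(1, 2)
assert prob({2, 3}) == F(1, 2)
```

Exact fractions (rather than floats) keep the Axiom II check `sum == 1` exact.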
2-3. Joint Probabilities
So far, we have been concerned primarily with the outcomes of a single
basic experiment. In many interesting problems, however, we might
instead be concerned with the outcomes of several different basic experiments,
for example, the amplitudes of a noise-voltage wave at several
different instants of time, or the outcome of the throw of a pair of dice.
In the first case, we might wish to know the probability that the noise
at one instant of time t_1 exceeds a certain value x_1 and that the noise at
another time t_2 exceeds another value x_2. In the second case we might
wish to know the probability that one die shows two dots and that the
other shows five dots.
Probabilities relating to such combined experiments are known as joint
probabilities. That these probabilities have the same basic properties as
those discussed in the previous section can be seen by realizing that a
joint experiment consisting of the combination of one experiment having
the possible outcomes (A_k) with another having the possible outcomes
(B_m) might just as well be considered as a single experiment having the
possible outcomes (A_k and B_m). Therefore, if the probability that the
kth outcome of experiment A and the mth outcome of experiment B both
occur is denoted by P(A_k,B_m), it follows from Eq. (2-3) that

0 ≤ P(A_k,B_m) ≤ 1    (2-7)
It further follows from the discussion leading to Eq. (2-5) that if there
are K possible outcomes (A_k) and M possible outcomes (B_m), all of which
are mutually exclusive, we must then obtain the result that

Σ_{m=1}^{M} Σ_{k=1}^{K} P(A_k,B_m) = 1    (2-8)

as we are dealing with an event that must occur. Both these results
may obviously be extended to cases in which we deal with combinations
of more than just two basic experiments.
A new problem now arises, however, when we become interested in the
relations between the joint probabilities of the combined experiment
and the elementary probabilities of the basic experiments making up the
combined experiment. For example, in the above case, we might wish
to know the relations between the joint probabilities P(A_k,B_m) and the
elementary probabilities P(A_k) and P(B_m). To this end, let us consider
the probability that the kth outcome of experiment A occurs and that
any one of the possible outcomes of experiment B occurs. If all the
possible outcomes of experiment B are mutually exclusive, it then follows
from Axiom III that

P(A_k, B_1 or B_2 or ... or B_M) = Σ_{m=1}^{M} P(A_k,B_m)

This is simply the probability that the event A_k occurs irrespective of
the outcome of experiment B; i.e., it is simply the probability P(A_k).
Thus

P(A_k) = Σ_{m=1}^{M} P(A_k,B_m)    (2-9a)

when all the possible outcomes of experiment B are mutually exclusive.
In a similar manner,

P(B_m) = Σ_{k=1}^{K} P(A_k,B_m)    (2-9b)

when all the possible outcomes of experiment A are mutually exclusive.
Thus we have shown the important fact that the elementary probabilities
of the component basic experiments making up a combined experiment
may be derived from the joint probabilities of that combined experiment.
It further follows from Eqs. (2-9a and b) that

P(A_k) ≥ P(A_k,B_m)  and  P(B_m) ≥ P(A_k,B_m)    (2-10)

for any value of k and m, since the joint probabilities P(A_k,B_m) are
nonnegative.
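Equations (2-9) and (2-10) can be exercised on a small table of joint probabilities. The sketch below (Python, not part of the original text) uses the joint probabilities of Prob. 2 at the end of this chapter as the worked numbers.

```python
# Joint probabilities P(A_k,B_m), taken from Prob. 2 of this chapter,
# stored as a table; marginals are computed by Eq. (2-9).
joint = {
    ("A1", "B1"): 0.2, ("A2", "B1"): 0.1, ("A3", "B1"): 0.1,
    ("A1", "B2"): 0.1, ("A2", "B2"): 0.2, ("A3", "B2"): 0.3,
}

def marginal_A(a):
    """Eq. (2-9a): P(A_k) = sum over m of P(A_k,B_m)."""
    return sum(p for (ak, _), p in joint.items() if ak == a)

def marginal_B(b):
    """Eq. (2-9b): P(B_m) = sum over k of P(A_k,B_m)."""
    return sum(p for (_, bm), p in joint.items() if bm == b)

assert abs(marginal_A("A1") - 0.3) < 1e-12
assert abs(marginal_B("B2") - 0.6) < 1e-12
# Eq. (2-10): a joint probability never exceeds either of its marginals.
assert all(p <= marginal_A(a) and p <= marginal_B(b)
           for (a, b), p in joint.items())
```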
2-4. Conditional Probabilities
In the preceding section, we introduced joint probabilities pertaining
to the results of combined experiments and showed the relation between
the joint probabilities of the combined events and the elementary probabilities
of the basic events. It is also of interest to answer the question:
"What is the probability of occurrence of the event (A) if we know that
the event (B) has occurred?" We will study probabilities of this type,
"conditional probabilities," in this section.
Consider now a combined experiment consisting of two basic experiments,
one giving rise to the basic events (A_k) and the other to the basic
events (B_m). Suppose that the combined experiment is repeated N times,
that the basic event (A_k) occurs n(A_k) times in that sequence of N experiment
repetitions, that the basic event (B_m) occurs n(B_m) times, and that
the joint event (A_k,B_m) occurs n(A_k,B_m) times.
For the moment, let us focus our attention on those n(A_k) experiments
in each of which the event (A_k) has occurred. In each of these,
some one of the events (B_m) has also occurred; in particular the event
(B_m) occurred n(A_k,B_m) times in this subset of experiments. Thus the
relative frequency of occurrence of the event (B_m) under the assumption
that the event (A_k) also occurred is n(A_k,B_m)/n(A_k). Such a relative
frequency is called a conditional relative frequency, since it deals with a
specified condition or hypothesis. It may also be expressed in the form

n(A_k,B_m)/n(A_k) = [n(A_k,B_m)/N] / [n(A_k)/N]

and hence is equal to the ratio of the relative frequency of occurrence
of the joint event (A_k,B_m) to the relative frequency of occurrence of the
hypothesis event (A_k). With this fact in mind, we are able to define
conditional probability.†
DEFINITION. The conditional probability P(B|A) of occurrence of the
event (B) subject to the hypothesis of the occurrence of the event (A) is
defined as the ratio of the probability of occurrence of the joint event (A,B)
to the probability of occurrence of the hypothesis event (A):

P(B|A) = P(A,B)/P(A)    (2-11)

The conditional probability is undefined if P(A) is zero.
On rewriting Eq. (2-11) as

P(A,B) = P(B|A)P(A)    (2-12)
we see that the joint probability of two events may be expressed as the
product of the conditional probability of one event, given the other,
times the elementary probability of the other.
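The defining ratio of Eq. (2-11) and its rearrangement (2-12) can be put in executable form. The following Python sketch is not part of the original text, and the numbers in it are hypothetical, chosen only to exercise the definition.

```python
def conditional(p_joint, p_hyp):
    """Eq. (2-11): P(B|A) = P(A,B) / P(A); undefined when P(A) = 0."""
    if p_hyp == 0:
        raise ValueError("conditional probability undefined for P(A) = 0")
    return p_joint / p_hyp

# Hypothetical values: P(A,B) = 0.1 and P(A) = 0.3.
p = conditional(0.1, 0.3)
assert abs(p * 0.3 - 0.1) < 1e-12   # Eq. (2-12): P(A,B) = P(B|A) P(A)
assert 0.1 <= p <= 1                # consistent with Eqs. (2-15), (2-16)
```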
Conditional probability as defined above has essentially the same
properties as the various probabilities previously introduced. For example,
consider the combined experiment giving rise to the joint events
(A_k,B_m) in which the basic events (B_m) are mutually exclusive. From
the definition above, the conditional probability of the event (B_j or B_m)
subject to the hypothesis of the occurrence of the event (A_k) is

P(B_j or B_m|A_k) = P(A_k, B_j or B_m)/P(A_k)

From our previous discussion of the joint probabilities of mutually exclusive
events, it follows that

P(A_k, B_j or B_m)/P(A_k) = [P(A_k,B_j) + P(A_k,B_m)]/P(A_k)

† There are some subtleties connected with the definition of conditional probability
which can only be discussed in terms of measure theory and hence are beyond the
scope of this book. See Kolmogorov (I, Chap. V).
The right-hand side is simply the sum of P(B_j|A_k) and P(B_m|A_k); hence

P(B_j or B_m|A_k) = P(B_j|A_k) + P(B_m|A_k)    (2-13)

i.e., conditional probabilities of mutually exclusive events are additive.
Furthermore, if the events (B_m) form a set of M mutually exclusive
events, we obtain

P(B_1 or ... or B_M|A_k) = Σ_{m=1}^{M} P(A_k,B_m)/P(A_k) = Σ_{m=1}^{M} P(B_m|A_k)

and if these M events (B_m) comprise the certain event, it follows from
Eq. (2-9a) that the numerator of the middle term is simply P(A_k); hence
in this case

Σ_{m=1}^{M} P(B_m|A_k) = 1    (2-14)
It follows from Eq. (2-10) and the defining equation (2-11) that conditional
probabilities are also bounded by zero and one:

0 ≤ P(B|A) ≤ 1    (2-15)

and that a conditional probability is at least equal to the corresponding
joint probability:

P(B|A) ≥ P(A,B)    (2-16)

since the hypothesis probability P(A) is bounded by zero and one.
2-5. Statistical Independence
The conditional probability P(B|A) is the probability of occurrence
of the event (B) assuming the occurrence of the event (A). Suppose
now that this conditional probability is simply equal to the elementary
probability of occurrence of the event (B):

P(B|A) = P(B)

It then follows from Eq. (2-12) that the probability of occurrence of the
joint event (A,B) is equal to the product of the elementary probabilities
of the events (A) and (B):

P(A,B) = P(A)P(B)    (2-17)

and hence that

P(A|B) = P(A)

i.e., the conditional probability of the event (A) assuming the occurrence
of the event (B) is simply equal to the elementary probability of the
event (A). Thus we see that in this case a knowledge of the occurrence
of one event tells us no more about the probability of occurrence of the
other event than we knew without that knowledge. Events (A) and
(B) which satisfy such relations are said to be statistically independent
events.
When more than two events are to be considered, the situation becomes
more complicated. For example,† consider an experiment having four
mutually exclusive outcomes (A_1), (A_2), (A_3), and (A_4), all with the same
probability of occurrence, 1/4. Let us now define three new events (B_j)
by the relations

(B_1) = (A_1 or A_2)
(B_2) = (A_1 or A_3)
(B_3) = (A_1 or A_4)

Since the events (A_m) are mutually exclusive, it follows that

P(B_1) = P(A_1) + P(A_2) = 1/2

and similarly that

P(B_2) = 1/2 = P(B_3)
Consider now the joint occurrence of the events (B_1) and (B_2). Since
the events (A_m) are mutually exclusive, the event (B_1,B_2) occurs if and
only if (A_1) occurs. Hence

P(B_1,B_2) = P(A_1) = 1/4
P(B_1,B_3) = 1/4 = P(B_2,B_3)

Since the elementary probabilities of the events (B_j) are all 1/2, we have
thus shown that

P(B_1,B_2) = P(B_1)P(B_2)
P(B_1,B_3) = P(B_1)P(B_3)
P(B_2,B_3) = P(B_2)P(B_3)
and hence that the events (B_j) are independent by pairs. However, note
that if we know that any two of the events (B_j) have occurred, we also
know that the experiment outcome was (A_1) and hence that the remaining
(B_j) event must also have occurred. Thus, for example,

P(B_3|B_1,B_2) = 1 ≠ P(B_3) = 1/2

Thus we see that the knowledge that N (N > 2) events in a given set
are pairwise statistically independent is not sufficient to guarantee that
three or more of those events are independent in a sense that satisfies
our intuition. We must therefore extend the definition of statistical
independence to cover the case of more than two events.
† This example was first pointed out by Serge Bernstein. See Kolmogorov (I,
p. 11, footnote 12).
If, in addition to obtaining pairwise independence, the joint probability
of the three events in the preceding example was equal to the
product of the elementary probabilities of those events, it would also
have turned out that

P(B_1|B_2,B_3) = P(B_1,B_2,B_3)/P(B_2,B_3) = P(B_1)P(B_2)P(B_3)/[P(B_2)P(B_3)] = P(B_1)

and similarly for the other conditional probabilities. In this case, then,
the three events could justifiably be called statistically independent.
With this fact in mind, statistical independence for the case of N events
is defined as follows:
DEFINITION. N events (A_n) are said to be statistically independent events
if for all combinations 1 ≤ i < j < k ≤ ... ≤ N the following relations are
satisfied:

P(A_i,A_j) = P(A_i)P(A_j)
P(A_i,A_j,A_k) = P(A_i)P(A_j)P(A_k)    (2-18)
. . . . . . . . . . . . . . . . . . .
P(A_1,A_2, ...,A_N) = P(A_1)P(A_2) ··· P(A_N)
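Bernstein's example can be checked against this definition directly. The sketch below (Python, not part of the original text) encodes the four equally likely outcomes and the three events (B_j) as sets: every pair passes the product test, but the triple does not.

```python
from fractions import Fraction as F
from itertools import combinations

def prob(event):
    """Each of the four outcomes has probability 1/4, so for a compound
    event (a set of outcomes) P(event) = |event| / 4."""
    return F(len(event), 4)

# Bernstein's events over the outcomes A1..A4:
B1, B2, B3 = {"A1", "A2"}, {"A1", "A3"}, {"A1", "A4"}

# Pairwise independent: P(Bi,Bj) = P(Bi)P(Bj) for every pair...
for bi, bj in combinations([B1, B2, B3], 2):
    assert prob(bi & bj) == prob(bi) * prob(bj)

# ...but the triple fails the product requirement of Eq. (2-18):
assert prob(B1 & B2 & B3) != prob(B1) * prob(B2) * prob(B3)
```

Each pairwise intersection, and the triple intersection, is the single outcome A_1 with probability 1/4, while the triple product is 1/8.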
Let us now turn our attention to the experiments giving rise to our
events. In particular, let us consider the case of M experiments A^(m),
the mth of which has N_m mutually exclusive outcomes. If we so desired,
we could consider the entire set of outcomes as a set of N events where

N = Σ_{m=1}^{M} N_m

The conditions of the preceding definition would then apply to the determination
of whether or not these events are statistically independent.
However, we are also interested in deciding whether or not the experiments
themselves are statistically independent. The following definition
applies to this case:
DEFINITION. M experiments A^(m), the mth of which has N_m mutually
exclusive outcomes A_{n_m}^(m), are said to be statistically independent experiments
if for each set of M integers n_1, n_2, ..., n_M, the following relation is
satisfied:

P[A_{n_1}^(1),A_{n_2}^(2), ...,A_{n_M}^(M)] = P[A_{n_1}^(1)]P[A_{n_2}^(2)] ··· P[A_{n_M}^(M)]    (2-19)

The simplicity of this set of relations as compared to the similar relations,
Eqs. (2-18), for events follows from the fact that the joint probabilities
for any K − 1 of K experiments may be derived from the joint
probabilities of the outcomes of the K experiments. For example, suppose
that we have M experiments for which Eqs. (2-19) are satisfied. Let
us sum these equations over n_M. It then follows from Eq. (2-9a) that

Σ_{n_M=1}^{N_M} P[A_{n_1}^(1),A_{n_2}^(2), ...,A_{n_M}^(M)] = P[A_{n_1}^(1),A_{n_2}^(2), ...,A_{n_{M-1}}^(M-1)]

On summing the right-hand side of Eq. (2-19), remembering that

Σ_{n_M=1}^{N_M} P[A_{n_M}^(M)] = 1

we see that if Eq. (2-19) is satisfied, it then follows that the relation

P[A_{n_1}^(1),A_{n_2}^(2), ...,A_{n_{M-1}}^(M-1)] = P[A_{n_1}^(1)]P[A_{n_2}^(2)] ··· P[A_{n_{M-1}}^(M-1)]

is also satisfied. This result is simply Eq. (2-19) for M − 1 experiments.
The process may be continued in order to show that, if Eq. (2-19) is
satisfied for M experiments, probability relations of the form of Eq. (2-19)
are satisfied for any K < M experiments.
2-6. Examples
Let us now consider a few examples involving the concepts introduced
in the preceding sections. These examples are of the so-called "combinatorial"
type and thus are typical of a large class of probability problems.
Feller (I) contains an excellent detailed discussion of problems of
this kind.
Example 2-6.1. Card Drawing. Consider the problem of drawing from a deck of
cards. The deck has 52 cards and is divided into 4 different suits, with 13 cards in
each suit, ranging from the two up through the ace. We will assume that the deck
has been well shuffled and that all the cards present are equally likely to be drawn.
Suppose that a single card is drawn from a full deck. What is the probability that
that card is the king of diamonds? We assumed above that the various events (A_i)
representing the drawing of particular cards are all equally likely to occur; hence all
P(A_i) equal some number p. The drawings of different cards are mutually exclusive
events, hence

Σ_{i=1}^{52} P(A_i) = 52p = 1

Therefore, p = 1/52 for any card, and in particular,

P(king of diamonds) = 1/52
Suppose now that we ask, "What is the probability that the single card drawn is a
king of any one of the four suits?" Since there are four kings, and since these events
are mutually exclusive,

P(king) = P(king of spades) + P(king of hearts) + P(king of diamonds)
+ P(king of clubs)

Hence:

P(king) = 4/52 = 1/13

In general we see that when we are dealing with a set of mutually exclusive basic
events, all of which are equally likely, the probability of any event (basic or compound)
is equal to the ratio of the number of basic events satisfying the conditions of the
event in question to the total number of possible basic events.
Suppose next that we draw two cards from a full deck. What is the probability
that we have drawn a king and a queen, not necessarily of the same suit? This event
can occur in two ways: either a king may be drawn first and then a queen, or a queen
may be drawn first and then a king. In symbols:

P(king and queen) = P(king, queen) + P(queen, king)

From our discussion of conditional probability it follows that

P(king, queen) = P(queen|king)P(king)
and
P(queen, king) = P(king|queen)P(queen)

Assuming that a king (queen) has been drawn, 51 cards remain in which are contained
all four queens (kings). Hence

P(queen|king) = 4/51 = P(king|queen)

and, using our previous results,

P(king and queen) = (4/51)(1/13) + (4/51)(1/13) = 8/663

This result, of course, could also have been obtained directly by taking the ratio of
the number of favorable basic events to the total number of possible basic events.
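The direct ratio-of-counts check mentioned in the last sentence can be carried out by enumeration. The sketch below (Python, not part of the original text) lists all ordered two-card draws and counts the favorable ones; the rank encoding (12 = queen, 13 = king) is an arbitrary convention of the sketch.

```python
from fractions import Fraction as F
from itertools import permutations

# A 52-card deck: ranks 2..14 (12 = queen, 13 = king, 14 = ace), 4 suits.
deck = [(rank, suit) for rank in range(2, 15) for suit in "SHDC"]
assert len(deck) == 52

# Count ordered two-card draws containing one king and one queen.
favorable = sum(1 for a, b in permutations(deck, 2)
                if {a[0], b[0]} == {12, 13})
p = F(favorable, 52 * 51)
assert p == F(8, 663)   # agrees with the conditional-probability result
```

The 32 favorable ordered pairs (16 king-then-queen, 16 queen-then-king) out of 52 · 51 = 2652 give 8/663.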
Example 2-6.2. Coin Tossing. Let us consider next coin tossings, for which we shall
assume that successive tossings are statistically independent experiments. However,
we shall make no assumption as to whether the coin is fair or not and shall write

P(H) = p  and  P(T) = q = 1 − p

since a (head) and a (tail) are mutually exclusive events. Such tossings are known as
Bernoulli trials.
Suppose that the coin is tossed N times. What is the probability P(nH) that n
heads will appear? In order to answer this question, let us first consider a particular
sequence of N tosses in which "head" occurred n times. Since the successive experiments
making up this sequence were assumed to be statistically independent, the probability
of occurrence of our particular sequence P_s(nH) is simply the product of the
probabilities of occurrence of n heads and (N − n) tails:

P_s(nH) = p^n q^(N−n)

The particular sequence above is not the only possible sequence having n heads in
N tosses. The probability of obtaining any one of the various possible sequences of
this kind is equal to the total number possible of such sequences times the probability
of obtaining a particular one, since we are dealing with mutually exclusive (compound)
events all having the same probability of occurrence.
Let us now determine the number of different possible sequences of N tosses, each
of which results in n heads. If the results of the N tosses were all different, there
would have been

N(N − 1) ··· 3 · 2 · 1 = N!

different possible sequences. However, not all the results of the N tosses are different;
n are heads and (N − n) are tails. Thus in the N! sequences, there are n! duplications,
since it is not possible to tell one head from another, and there are (N − n)!
duplications, since it is not possible to tell one tail from another. The total possible
number of different sequences in which there are n heads in N tosses is therefore given
by the binomial coefficient

(N over n) = N!/[n!(N − n)!]    (2-20)

The total probability of obtaining any one of the various possible sequences of n heads
in N tosses is therefore

P(nH) = (N over n) p^n q^(N−n)    (2-21)

The set of probabilities corresponding to the various possible values of n (i.e., n = 0,
1, ..., N) is known as the binomial distribution.
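Equation (2-21) translates directly into code. The sketch below (Python, not part of the original text) computes the binomial probabilities and checks that they sum to one over n = 0, ..., N, as Eq. (2-5) requires.

```python
from math import comb

def p_n_heads(n, N, p):
    """Eq. (2-21): probability of exactly n heads in N Bernoulli trials,
    each with head probability p and tail probability q = 1 - p."""
    return comb(N, n) * p**n * (1 - p)**(N - n)

# The binomial distribution sums to one over n = 0..N (Eq. (2-5)).
assert abs(sum(p_n_heads(n, 10, 0.3) for n in range(11)) - 1) < 1e-12
# Spot check: 2 heads in 4 fair tosses is (4 choose 2) / 2^4 = 6/16.
assert abs(p_n_heads(2, 4, 0.5) - 6 / 16) < 1e-12
```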
2-7. Problems
An extensive collection of problems covering the topics of this chapter
may be found in Feller (I, Chaps. 1-5).
1. Experiment A has three mutually exclusive possible outcomes (A_m), with the
probabilities of occurrence P(A_m). Let the compound events (B) and (C) be defined
by

(B) = (A_1 or A_2)
(C) = (A_1 or A_3)

Determine the relation among P(B), P(C), and P(B or C).
2. Experiment A has three mutually exclusive possible outcomes (A_m), and experiment
B has two mutually exclusive possible outcomes (B_n). The joint probabilities
P(A_m,B_n) are:

P(A_1,B_1) = 0.2     P(A_1,B_2) = 0.1
P(A_2,B_1) = 0.1     P(A_2,B_2) = 0.2
P(A_3,B_1) = 0.1     P(A_3,B_2) = 0.3

Determine the probabilities P(A_m) and P(B_n) for all values of m and n.
3. For the experiments of Prob. 2, determine the conditional probabilities P(A_m|B_n)
and P(B_n|A_m) for all values of m and n.
4. Show whether or not the experiments of Prob. 2 are statistically independent.
5. Let K be the total number of dots showing up when a pair of unbiased dice are
thrown. Determine P(K) for each possible value of K.
6. Evaluate the probability of occurrence of n heads in 10 independent tosses of a
coin for each possible value of n when the probability of occurrence p of a head in a
single toss is 1/10.
7. Repeat Prob. 6 for the case of an unbiased coin (p = 1/2).
8. Determine the probability that at most n < N heads occur in N independent
tosses of a coin. Evaluate for N = 10, n = 5, and p = 1/2.
9. Determine the probability that at least n < N heads occur in N independent
tosses of a coin. Evaluate for N = 10, n = 5, and p = 1/2.
10. Show the relation between the probabilities of Probs. 8 and 9.
11. The experiment A has M mutually exclusive possible outcomes A_m, and the
experiment B has N mutually exclusive possible outcomes B_n. Show that P[B_n|A_m]
may be expressed in terms of P[A_m|B_n] and P[B_n] by the relation

P[B_n|A_m] = P[A_m|B_n]P[B_n] / Σ_{i=1}^{N} P[A_m|B_i]P[B_i]    (2-22)

This relation is known as Bayes' rule.
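Bayes' rule as stated in Eq. (2-22) can be sketched in a few lines. The Python code below is not part of the original text, and the worked numbers (a test that flags 90% of true cases and 20% of the rest, with prior 0.1) are a hypothetical illustration.

```python
def bayes(p_a_given_b, p_b, n):
    """Eq. (2-22): posterior P(B_n | A_m) for one observed outcome A_m.

    p_a_given_b[i] holds P(A_m | B_i) for the fixed m in question, and
    p_b[i] holds P(B_i); the events (B_i) are assumed mutually exclusive
    and to comprise the certain event."""
    total = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))
    return p_a_given_b[n] * p_b[n] / total

# Hypothetical numbers: P(A|B_0) = 0.9, P(A|B_1) = 0.2, P(B_0) = 0.1.
posterior = bayes([0.9, 0.2], [0.1, 0.9], 0)
assert abs(posterior - 1 / 3) < 1e-12   # 0.09 / (0.09 + 0.18)
```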
CHAPTER 3
RANDOM VARIABLES AND
PROBABILITY DISTRIBUTIONS
3-1. Definitions
Sample Points and Sample Spaces. In the preceding chapter, we discussed
experiments, events (i.e., possible outcomes of experiments), and
probabilities of events. In such discussions, it is often convenient to
think of an experiment and its possible outcomes as defining a space
and its points. With each basic possible outcome we may associate a
point called the sample point. The totality of sample points, corresponding
to the aggregate of all possible outcomes of the experiment, is called
the sample space corresponding to the experiment in question. In
general, an event may correspond either to a single sample point or to a
set of sample points. For example, there are six possible outcomes of
the throw of a die: the showing of one, two, three, four, five, or six dots
on the upper face. To each of these we assign a sample point; our sample
space then consists of six sample points. The basic event "a six shows"
corresponds to a single sample point, whereas the compound event "an
even number of dots shows" corresponds to the set of three sample points
representing the showing of two, four, and six dots respectively.
In many problems of interest it is possible to represent numerically
each possible outcome of an experiment. In some cases a single number
will suffice, as in the measurement of a noise voltage or the throw of a die.
In others a set of numbers may be required. For example, three numbers
are needed for the specification of the instantaneous position of a
moving gas particle (the three spatial coordinate values). When such
a numerical representation of outcomes is possible, the set of numbers
specifying a given outcome may be thought of as the coordinate values
of a vector which specifies the position of the corresponding sample point
in the sample space. Thus if K numbers are required to specify each
possible outcome, each sample point will have K coordinate values, and
the sample space will be a K-dimensional vector space. We will generally
confine ourselves to sample spaces of this type.
The probability of a given event may now be thought of as assigning
a weight or mass to the corresponding sample point (or set of sample
points). In our sample-point and sample-space terminology, the probability
P(A) that the outcome of a given experiment will be the event (A)
may be expressed as the probability P(s ε S_A) that the sample point s,
corresponding to the outcome of the experiment, falls in the subset of
sample points S_A corresponding to the event (A).†
Random Variable. A real-valued function x(s) defined on a sample space
of points s will be called a random variable‡ if for every real number a
the set of points s for which x(s) ≤ a is one of the class of admissible sets
for which a probability is defined. This condition is called measurability
and is almost always satisfied in practice. A complex-valued function
z(s) = x(s) + jy(s) defined on a sample space will be called a complex
random variable if x(s) and y(s) are both measurable. Similarly, a function
which assigns a vector to each point of a sample space will be called
a vector random variable or a random vector.
It was pointed out above that a sample space representing the outcomes
of the throw of a die is a set of six points which may be taken
to be the integers 1, ..., 6. If now we identify the point k with the
event that k dots show when the die is thrown, the function x(k) = k is a
random variable such that x(k) equals the number of dots which show
when the die is thrown. The functions g(k) = k² and h(k) = exp(k²)
are also random variables on this space.
Another example of a random variable is the real-valued function of a
real variable whose value represents a noise voltage as measured at a
given instant of time. Here one takes the real line as a sample space.
A related variable is that defined to be unity when the noise voltage
being measured lies between V and V + ΔV volts and defined to be zero
otherwise. It should be realized that a function of a random variable
is a random variable.
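The die example makes the definition concrete: a random variable is just a real-valued function of the sample point. A minimal Python sketch (not part of the original text):

```python
import math

# The six-point sample space of a die throw, and three random
# variables on it -- each simply a real-valued function of k.
space = [1, 2, 3, 4, 5, 6]
x = lambda k: k                  # number of dots shown
g = lambda k: k**2
h = lambda k: math.exp(k**2)

# A function of a random variable is again a random variable:
assert [g(k) for k in space] == [1, 4, 9, 16, 25, 36]
assert all(h(k) == math.exp(g(k)) for k in space)
```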
3-2. Probability Distribution Functions
Consider now the real random variable x(s) such that the range of x
is the real line (i.e., −∞ ≤ x ≤ +∞). Consider a point X on the real
line. The function of X whose value is the probability P(x ≤ X) that
the random variable x is less than or equal to X is called the probability
distribution function of the random variable x. Since probabilities are
always bounded by zero and one, the extremal values of the probability
distribution function must also be zero and one:
† The notation s ε S_A means that the point s is an element of the point set S_A.
‡ The use of the term "random variable" for a function is dictated by tradition.
P(x ≤ −∞) = 0  and  P(x ≤ +∞) = 1    (3-1)

It further follows that the probability that the random variable x falls in
the interval a < x ≤ b is simply the difference between the values of the
probability distribution function obtained at the endpoints of the interval:

P(x ≤ b) − P(x ≤ a) = P(a < x ≤ b) ≥ 0    (3-2)

The right-hand inequality follows from the nonnegativeness of probabilities.
Equation (3-2) shows that P(x ≤ b) ≥ P(x ≤ a) when b ≥ a
and hence that the probability distribution function is a nondecreasing
function of X.
Paralleling the single-variable case, we may now define a joint probability
distribution function by P(x ≤ X, y ≤ Y), the probability that the
random variable x is less than or equal to a specified value X and that
the random variable y is less than or equal to a specified value Y. Such
a distribution function may be defined whether or not x and y are random
variables on the same sample space. Furthermore, whether x and y are
considered to be two separate one-dimensional random variables or the
components of a two-dimensional random variable is immaterial. In
either case, the joint sample space is a two-dimensional sample space
(the xy plane), and the joint probability distribution function is the
probability that a result of an experiment will correspond to a sample
point falling in the quadrant (−∞ ≤ x ≤ X, −∞ ≤ y ≤ Y) of that
sample space.
The extremal values of the joint probability distribution function are
obviously

P(x ≤ −∞, y ≤ Y) = 0 = P(x ≤ X, y ≤ −∞)    (3-3a)
and
P(x ≤ +∞, y ≤ +∞) = 1    (3-3b)

as each of the first extremes represents an impossibility, whereas the
second represents a certainty.
The probability that the random variable x is less than or equal to a
specified value X while the random variable y assumes any possible
value whatever is simply the probability that x ≤ X irrespective of the
value of y. The latter probability is, by definition, the value at X of the
probability distribution function of the random variable x. Thus

P(x ≤ X, y ≤ +∞) = P(x ≤ X)    (3-4a)

Geometrically, this is the probability that a sample point falls in the
half-plane (−∞ ≤ x ≤ X). Similarly,

P(x ≤ +∞, y ≤ Y) = P(y ≤ Y)    (3-4b)
Thus we see that, as with the joint probabilities, the joint probability
distribution function determines the probability distribution functions of
the component variables. The distribution functions of the component
variables are often designated as the marginal distribution functions.
These definitions and results may be extended in a more or less obvious
way to the case of K-dimensional random variables.
3-3. Discrete Random Variables
We will call the random variable x a discrete random variable if x can
take on only a finite number of values in any finite interval. Thus, for
example, the random variable defined as the number of heads appearing
in N tosses of a coin is a discrete random variable. The complete set of
probabilities P(x_i) associated with the possible values x_i of x is called the
probability distribution of the discrete random variable x. It follows from
the definition of the probability distribution function that in the discrete
case

P(x ≤ X) = Σ_{x_i ≤ X} P(x_i)    (3-5)

and hence that

P(x ≤ +∞) = Σ_{all i} P(x_i) = 1    (3-6)

The probability distribution and associated probability distribution function
for one particular discrete random variable are shown in Fig. 3-1.

FIG. 3-1. A discrete probability distribution (a) and probability distribution function (b).
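The staircase character of the discrete distribution function follows directly from Eq. (3-5). The sketch below (Python, not part of the original text) builds the distribution function of an assumed discrete distribution; the particular values x_i and probabilities are illustrative, not those of Fig. 3-1.

```python
from bisect import bisect_right

xs = [1.0, 2.0, 3.5, 4.0, 6.0]   # assumed possible values x_i, sorted
ps = [0.1, 0.2, 0.3, 0.2, 0.2]   # their probabilities P(x_i)
assert abs(sum(ps) - 1) < 1e-12  # Eq. (3-6)

def cdf(X):
    """Eq. (3-5): P(x <= X) = sum of P(x_i) over x_i <= X -- a
    nondecreasing staircase with a jump of P(x_i) at each x_i."""
    return sum(ps[:bisect_right(xs, X)])

assert cdf(0.0) == 0                   # below the smallest x_i
assert abs(cdf(3.5) - 0.6) < 1e-12     # the jump at x_i is included
assert abs(cdf(100.0) - 1) < 1e-12     # extremal value, Eq. (3-1)
```

`bisect_right` makes the inequality x_i ≤ X inclusive, matching the "less than or equal" in the definition.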
If the probability distribution of a two-dimensional discrete random
variable is given by the set of probabilities P(x_i,y_m), the corresponding
joint probability distribution function is given by the expression

P(x ≤ X, y ≤ Y) = Σ_{x_i ≤ X} Σ_{y_m ≤ Y} P(x_i,y_m)    (3-7)

Therefore

P(x ≤ +∞, y ≤ +∞) = Σ_{all i} Σ_{all m} P(x_i,y_m) = 1    (3-8)

An example of the joint probability distribution and the corresponding
joint probability distribution function of a two-dimensional discrete random
variable is shown in Fig. 3-2.

FIG. 3-2. A two-dimensional discrete probability distribution (a) and corresponding probability distribution function (b).
It follows from our definition of a sample point that distinct sample
points correspond to mutually exclusive events. The various results
24 RANDOM SIGNALS AND NOISE
obtained in Chap. 2 for such events may therefore be applied directly to the study of discrete random variables. Thus Eq. (3-6) follows from Eq. (2-5), and Eq. (3-8) follows from Eq. (2-8). The relations between the joint probability distribution and the marginal distributions for discrete random variables follow directly from Eq. (2-9), and are
$$P(x_n) = \sum_{\text{all } m} P(x_n, y_m) \qquad \text{and} \qquad P(y_m) = \sum_{\text{all } n} P(x_n, y_m) \tag{3-9}$$
Similarly we see from Eq. (2-12) that

$$P(x_n, y_m) = P(y_m \mid x_n)\, P(x_n) = P(x_n \mid y_m)\, P(y_m) \tag{3-10}$$

and from Eq. (2-14) that

$$\sum_{\text{all } n} P(x_n \mid y_m) = 1 = \sum_{\text{all } m} P(y_m \mid x_n) \tag{3-11}$$
3-4. Continuous Random Variables
Not all random variables are discrete. For example, the random variable which represents the value of a thermal noise voltage at a specified instant of time may assume any value between plus and minus infinity. Nevertheless it can be shown that, since a probability distribution function is a bounded, nondecreasing function, any probability distribution function may be decomposed into two parts:† a staircase function having jump discontinuities at those points X for which P(x = X) > 0 (i.e., a discrete probability distribution function as shown in Fig. 3-1), and a part which is everywhere continuous. A random variable for which the probability distribution function is everywhere continuous will be called a continuous random variable. Thus we see that any random variable may be thought of as having a discrete part and a continuous part.
Probability Density Functions. Any continuous distribution function can be approximated as closely as we like by a nondecreasing staircase function, which can then be regarded as the probability distribution function of a discrete random variable. Thus a continuous random variable can always be approximated by a discrete one. However, a more direct method of analysis is made possible when the probability distribution function is not only continuous but also differentiable with a continuous derivative everywhere except possibly at a discrete set of points. In such a case we define a probability density function p(x) as the derivative of the probability distribution function
† See Cramér (I, Arts. 6.2 and 6.6).
$$p(X) = \frac{dP(x \le X)}{dX} \tag{3-12}$$

such that

$$P(x \le X) = \int_{-\infty}^{X} p(x)\, dx \tag{3-13}$$
Although it is necessary to realize that there exist continuous random variables which do not have a probability density function,† we may generally safely ignore such pathological cases.
From the definition of the derivative as a limit, it follows that

$$p(X) = \lim_{\Delta X \to 0} \frac{P(x \le X) - P(x \le X - \Delta X)}{\Delta X} \tag{3-14a}$$

or, using Eq. (3-2),

$$p(X) = \lim_{\Delta X \to 0} \frac{P(X - \Delta X < x \le X)}{\Delta X} \tag{3-14b}$$

We may therefore write in differential notation

$$p(X)\, dX = P(X - dX < x \le X) \tag{3-15}$$
and interpret p(X) dX as the probability that the random variable x has a value falling in the interval (X − dX < x ≤ X). Since the probability distribution function is a nondecreasing function, the probability density function is always nonnegative:

$$p(x) \ge 0 \tag{3-16}$$
An example of the probability distribution function and the corresponding probability density function of a continuous random variable is shown in Fig. 3-3.
From Eq. (3-2) and Eq. (3-13) it follows that the probability that a continuous random variable has a value falling in the interval (a < x ≤ b) is given by the integral of the probability density function over that interval:

$$P(a < x \le b) = \int_a^b p(x)\, dx \tag{3-17}$$

In particular, when the interval in question is the infinite interval (−∞, +∞), we obtain, using Eq. (3-1),

$$\int_{-\infty}^{+\infty} p(x)\, dx = 1 \tag{3-18}$$
This result is the continuous-random-variable expression of the fact that the probability of a certain event is unity. Since by definition the probability distribution function of a continuous random variable has no jumps, it is apparent that, for any x_0,
† See, for example, Titchmarsh (I, Art. 11.72).
$$P(x = x_0) = 0 \tag{3-19}$$
That is, the probability that a continuous random variable takes on any specific value is zero. Obviously, however, this event is not impossible.
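Equations (3-17) through (3-19) can be verified for any concrete density. The sketch below (a standard exponential-density example, not one used in the text) checks numerically that the integral of p over an interval equals the difference of distribution-function values, and that the probability mass at a single point vanishes:

```python
import math

# Exponential density p(x) = e^{-x} for x >= 0, with distribution
# function P(x <= X) = 1 - e^{-X}.
def p(x):
    return math.exp(-x) if x >= 0 else 0.0

def P(X):
    return 1.0 - math.exp(-X) if X > 0 else 0.0

# Eq. (3-17): P(a < x <= b) is the integral of p over (a, b];
# approximate it with the trapezoid rule.
a, b, n = 0.5, 2.0, 100000
h = (b - a) / n
integral = sum(0.5 * (p(a + i * h) + p(a + (i + 1) * h)) * h for i in range(n))
assert abs(integral - (P(b) - P(a))) < 1e-6

# Eq. (3-19): the probability of any single value is zero -- the
# probability over a shrinking interval about x0 vanishes.
x0, eps = 1.0, 1e-9
assert P(x0 + eps) - P(x0 - eps) < 1e-8
```

The interval probability here is P(0.5 < x ≤ 2) = e^{-0.5} − e^{-2} ≈ 0.471, and the quadrature reproduces it to the stated tolerance.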
Impulse Functions. It would be convenient to use a single system of
notation to cover both the case of the discrete random variable and that
of the continuous random variable and thus to handle in a simple way
the situation in which we must deal with mixed random variables having
FIG. 3-3. An example of the probability density function (a) and probability distribution function (b) of a continuous random variable. [Figure]
both discrete and continuous components. These aims may be accomplished either by introducing a more general form of integration† or by admitting the use of impulse functions and ignoring the cases in which the continuous part of the distribution does not have a derivative. We shall use the second of these methods.
Let us now apply the limiting expression for the probability density function to a discrete random variable. Suppose, for convenience, that
† Specifically the Stieltjes integral. An averaging operation with respect to any probability distribution can be written in terms of Stieltjes integrals even when the continuous part of the probability distribution function does not possess a derivative. See Cramér (I, Chap. 7).
the discrete random variable in question has M possible values x_m, with probabilities P(x_m). Then

$$\lim_{\Delta X \to 0} P(X - \Delta X < x \le X) = \begin{cases} P(x_m) & \text{if } X = x_m \\ 0 & \text{otherwise} \end{cases}$$

Using this result in Eq. (3-14b) we see that

$$p(X) = \begin{cases} \infty & \text{if } X = x_m \\ 0 & \text{otherwise} \end{cases}$$

Further, if ε is an arbitrarily small positive number (and smaller than the smallest distance between adjacent values of x_m), we also have the result

$$\int_{X-\epsilon}^{X+\epsilon} p(x)\, dx = \begin{cases} P(x_m) & \text{if } X = x_m \\ 0 & \text{otherwise} \end{cases}$$
Appendix 1 contains an account of some of the properties of impulse and step functions. Reference to this material shows that the results presented above are just the defining equations for impulse functions. Thus the probability density function of a discrete random variable may be thought of as consisting of an impulse of weight (i.e., area) P(x_m) at each of the possible values x_m of that variable. We may therefore write
$$p(x) = \sum_{m=1}^{M} P(x_m)\, \delta(x - x_m) \tag{3-20}$$

as the probability density function for a discrete random variable which has M possible values x_m with probabilities P(x_m). The function δ(x − x_m) in Eq. (3-20) is the unit impulse function centered upon x = x_m.
The probability distribution function for a discrete random variable may now be obtained by substituting Eq. (3-20) into Eq. (3-13). Interchanging order of integration and finite summation, we find that

$$P(x \le X) = \sum_{m=1}^{M} P(x_m) \int_{-\infty}^{X} \delta(x - x_m)\, dx$$

As shown in Appendix 1, the integral in this equation is the unit step function U(X − x_m) defined by the relations

$$U(X - x_m) = \begin{cases} 1 & \text{when } X \ge x_m \\ 0 & \text{otherwise} \end{cases}$$

Thus we may write

$$P(x \le X) = \sum_{m=1}^{M} P(x_m)\, U(X - x_m) \tag{3-21}$$
as the probability distribution function of our discrete random variable. The graph of such a probability distribution function has the form of a staircase, as shown in Fig. 3-1b.
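The step-function representation of Eq. (3-21) is easy to exercise directly. In this sketch the three values and weights are hypothetical:

```python
# Eq. (3-21): the distribution function of a discrete random variable is a
# weighted sum of unit step functions U(X - x_m). Values and weights below
# are hypothetical.
values = [1.0, 2.0, 4.0]     # possible values x_m
weights = [0.2, 0.5, 0.3]    # probabilities P(x_m)

def U(t):
    """Unit step function: 1 when t >= 0, else 0."""
    return 1.0 if t >= 0 else 0.0

def dist(X):
    return sum(P * U(X - xm) for xm, P in zip(values, weights))

assert dist(0.0) == 0.0                  # below all of the jumps
assert abs(dist(2.5) - 0.7) < 1e-12      # jumps at x = 1 and x = 2 included
assert abs(dist(10.0) - 1.0) < 1e-12     # total probability is unity
```

Evaluating `dist` on a grid traces out exactly the staircase of Fig. 3-1b: flat between the values x_m, with a jump of height P(x_m) at each.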
We have shown above that we may extend the concept of the probability density function to cover the case of the discrete random variable by admitting the use of impulse functions. Henceforth we will therefore use the probability density function, where convenient, whether we are concerned with continuous, discrete, or mixed random variables.
Joint Probability Density Functions. In the case of a single random variable, the probability density function was defined as the derivative of the probability distribution function. Similarly, in the two-dimensional case, if the joint probability distribution function is everywhere continuous and possesses a continuous mixed second partial derivative everywhere except possibly on a finite set of curves, we may define the joint probability density function as this second derivative
$$p(X,Y) = \frac{\partial^2}{\partial X\, \partial Y} P(x \le X,\, y \le Y) \tag{3-22}$$

Then

$$P(x \le X,\, y \le Y) = \int_{-\infty}^{X} \int_{-\infty}^{Y} p(x,y)\, dy\, dx \tag{3-23}$$
As before, we will ignore the pathological cases and restrict ourselves to those cases in which the partial derivatives exist either in the usual sense or in the form of impulse functions (e.g., in the case of discrete random variables).
From the limiting definition of the partial derivative, we may write

$$p(X,Y) = \lim_{\substack{\Delta X \to 0 \\ \Delta Y \to 0}} \frac{1}{\Delta X\, \Delta Y} \Big[ P(x \le X,\, y \le Y) - P(x \le X - \Delta X,\, y \le Y) - P(x \le X,\, y \le Y - \Delta Y) + P(x \le X - \Delta X,\, y \le Y - \Delta Y) \Big]$$

and hence

$$p(X,Y) = \lim_{\substack{\Delta X \to 0 \\ \Delta Y \to 0}} \frac{P(X - \Delta X < x \le X,\; Y - \Delta Y < y \le Y)}{\Delta X\, \Delta Y}$$

or, in differential notation,

$$p(X,Y)\, dX\, dY = P(X - dX < x \le X,\; Y - dY < y \le Y) \tag{3-24}$$
Thus p(X,Y) dX dY may be interpreted as the probability that a sample point falls in an incremental area dX dY about the point (X,Y) in our sample space. It follows that the joint probability density function is always nonnegative,

$$p(x,y) \ge 0 \tag{3-25}$$

since the joint probability distribution function is a nondecreasing function of its arguments.
From our definition of the joint probability density function, it follows that the probability P(s ∈ R) that a sample point s falls in a region R of the sample space is given by the integral of the joint probability density function over that region:

$$P(s \in R) = \iint_R p(x,y)\, dx\, dy \tag{3-26}$$
In particular, when the region in question is the entire sample space (i.e., the entire xy plane), we obtain

$$\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} p(x,y)\, dx\, dy = 1 \tag{3-27}$$

as we are concerned here with a certainty. On the other hand, if we allow only one of the upper limits to recede to infinity, we obtain, on application of Eqs. (3-4),

$$\int_{-\infty}^{X} \int_{-\infty}^{+\infty} p(x,y)\, dy\, dx = P(x \le X) \tag{3-28a}$$

and

$$\int_{-\infty}^{+\infty} \int_{-\infty}^{Y} p(x,y)\, dy\, dx = P(y \le Y) \tag{3-28b}$$
The derivatives of the right-hand sides of these equations are simply the probability density functions p(X) and p(Y) respectively. Therefore, by differentiating both sides of Eqs. (3-28), we obtain

$$\int_{-\infty}^{+\infty} p(X,y)\, dy = p(X) \tag{3-29a}$$

and

$$\int_{-\infty}^{+\infty} p(x,Y)\, dx = p(Y) \tag{3-29b}$$

since here the derivative of an integral with respect to the upper limit is the integrand evaluated at that limit. Eqs. (3-29) are the continuous-random-variable equivalents of the discrete-random-variable Eqs. (3-9).
As with probability distribution functions, the various definitions and results above may be extended to the case of k-dimensional random variables.†
Conditional Probability Density Functions. Let us consider now the probability that the random variable y is less than or equal to a particular value Y, subject to the hypothesis that a second random variable x has a value falling in the interval (X − ΔX < x ≤ X). It follows from the definition of conditional probability, Eq. (2-11), that

$$P(y \le Y \mid X - \Delta X < x \le X) = \frac{P(X - \Delta X < x \le X,\; y \le Y)}{P(X - \Delta X < x \le X)}$$
† See, for example, Cramér (I, Arts. 8.4 and 2~.1).
The probabilities on the right-hand side may be expressed in terms of the probability densities defined above. Thus

$$P(y \le Y \mid X - \Delta X < x \le X) = \frac{\displaystyle\int_{X-\Delta X}^{X} \int_{-\infty}^{Y} p(x,y)\, dy\, dx}{\displaystyle\int_{X-\Delta X}^{X} p(x)\, dx}$$

Assuming that the probability densities are continuous functions of x in the stated interval and that p(X) > 0, the integrals over x can be replaced by the integrands evaluated at x = X and multiplied by ΔX as ΔX → 0. The incremental width ΔX is a factor of both the numerator and the denominator and therefore may be canceled. The limit of the left-hand side as ΔX → 0 becomes the probability that y ≤ Y subject to the hypothesis that the random variable x assumes the value X. Thus we get

$$P(y \le Y \mid X) = \frac{\int_{-\infty}^{Y} p(X,y)\, dy}{p(X)} \tag{3-30}$$
The probability P(y ≤ Y | X) defined here will be called the conditional probability distribution function of the random variable y subject to the hypothesis that x = X.
If now the usual continuity requirements for the joint probability density are satisfied, we may define a conditional probability density function p(Y|X) to be the derivative of the conditional probability distribution function:

$$p(Y \mid X) = \frac{dP(y \le Y \mid X)}{dY} \tag{3-31}$$

Then

$$P(y \le Y \mid X) = \int_{-\infty}^{Y} p(y \mid X)\, dy \tag{3-32}$$

Differentiating Eq. (3-30) with respect to Y, we get

$$p(Y \mid X) = \frac{p(X,Y)}{p(X)} \tag{3-33a}$$

or, in product form,

$$p(X,Y) = p(Y \mid X)\, p(X) \tag{3-33b}$$
This result is the continuous-random-variable version of the factoring of a joint probability into the product of a conditional probability and an elementary probability.
As with all distribution functions, the conditional probability distribution function P(y ≤ Y|X) is a nondecreasing function (of Y). Therefore the conditional probability density function is nonnegative:

$$p(Y \mid X) \ge 0 \tag{3-34}$$
From Eq. (3-32) it follows that the probability that y has a value falling in the interval (a < y ≤ b), subject to the hypothesis that x = X, is given by the integral of the conditional probability density function over that interval:

$$P(a < y \le b \mid X) = \int_a^b p(y \mid X)\, dy \tag{3-35}$$

If now the limits recede to infinity, we obtain

$$\int_{-\infty}^{+\infty} p(y \mid X)\, dy = 1 \tag{3-36}$$
as here we have a certainty.

The conditional probability density function p(X|Y) may, of course, be defined in the same way as was p(Y|X), and corresponding results may be obtained for it.
3-5. Statistically Independent Random Variables
In Art. 2-5 we defined experiments to be statistically independent when their joint probabilities were expressible as products of their respective elementary probabilities. Thus if the experiments A and B have K and M mutually exclusive outcomes A_k and B_m respectively, we called them statistically independent if the relations

$$P(A_k, B_m) = P(A_k)\, P(B_m)$$

were satisfied for all values of k and m. Suppose now that we characterize the outcomes of the experiments A and B by the discrete random variables x and y, so that x assumes the value x_k whenever the event A_k occurs and y assumes the value y_m whenever the event B_m occurs. Then P(x_k) = P(A_k), P(y_m) = P(B_m), and P(x_k, y_m) = P(A_k, B_m).
We now say that x and y are statistically independent random variables if and only if the experiments they characterize are statistically independent. Thus, the discrete random variables x and y, having K values x_k and M values y_m respectively, are said to be statistically independent if and only if the relation

$$P(x_k, y_m) = P(x_k)\, P(y_m) \tag{3-37}$$

is satisfied for all values of k and m.
From Eqs. (3-7) and (3-37) we find that

$$P(x \le X,\, y \le Y) = \sum_{x_k \le X} \sum_{y_m \le Y} P(x_k)\, P(y_m)$$

when x and y are statistically independent. Then, using Eq. (3-5), it follows that the equation

$$P(x \le X,\, y \le Y) = P(x \le X)\, P(y \le Y) \tag{3-38}$$
is satisfied for all values of X and Y so long as x and y are statistically independent random variables. Conversely, it is easy to show that if Eq. (3-38) is satisfied for all X and Y, then Eq. (3-37) is satisfied for all k and m. Thus the discrete random variables x and y are statistically independent if and only if Eq. (3-38) is satisfied for all values of X and Y.

Although Eq. (3-37) applies only to discrete random variables, Eq. (3-38) may be satisfied whether the random variables x and y are discrete, continuous, or mixed. We will therefore base our general definition of statistically independent random variables on Eq. (3-38). Thus:
DEFINITION. The random variables x, y, ..., and z are said to be statistically independent random variables if and only if the equation

$$P(x \le X,\, y \le Y,\, \ldots,\, z \le Z) = P(x \le X)\, P(y \le Y) \cdots P(z \le Z) \tag{3-39}$$

is satisfied for all values of X, Y, ..., and Z.
Suppose now that the random variables x and y are continuous random variables and that the various probability densities exist. Eq. (3-38) may then be written

$$\int_{-\infty}^{X} \int_{-\infty}^{Y} p(x,y)\, dy\, dx = \int_{-\infty}^{X} p(x)\, dx \int_{-\infty}^{Y} p(y)\, dy$$

If now we take the mixed second partial derivative of both sides (with respect to X and Y), we get

$$p(X,Y) = p(X)\, p(Y)$$

Conversely, if we integrate this result, we can obtain Eq. (3-38). The factoring of the probability densities is therefore a necessary and sufficient condition for the statistical independence of x and y. Similarly it may be shown that the random variables x, y, ..., and z are statistically independent if and only if the equation

$$p(x, y, \ldots, z) = p(x)\, p(y) \cdots p(z) \tag{3-40}$$

is satisfied for all values of x, y, ..., and z.
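The factoring test of Eq. (3-40) is easy to apply pointwise. The sketch below checks it for a joint density that does factor (the product of two exponential densities) and for one that does not (the density x + y on the unit square, used as a hypothetical counterexample):

```python
import math

# For p(x, y) = e^{-x} e^{-y} (x, y >= 0) the factoring of Eq. (3-40)
# holds identically, so x and y are statistically independent.
def joint(x, y):
    return math.exp(-x) * math.exp(-y)

def px(x):
    return math.exp(-x)

def py(y):
    return math.exp(-y)

for x, y in [(0.1, 2.0), (1.5, 0.3), (3.0, 3.0)]:
    assert abs(joint(x, y) - px(x) * py(y)) < 1e-15

# By contrast, p(x, y) = x + y on the unit square has marginals x + 1/2
# and y + 1/2, and the product test fails, so those variables are dependent.
x, y = 0.2, 0.4
assert abs((x + y) - (x + 0.5) * (y + 0.5)) > 1e-3
```

A single point of failure suffices to establish dependence, whereas independence requires the factoring to hold everywhere.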
3-6. Functions of Random Variables
One of the basic questions arising in applications of probability theory is, "Given a random variable x and its probability functions, what are the probability functions of a new random variable y defined as some function of x, say y = g(x)?" This question arises, for example, in the determination of the probability density function of the output of the detector in a radio receiver. In this section we shall attempt to show how such a question may be answered.
Single Variable Case. Suppose that x is a real random variable. We can take it to be the coordinate variable of the sample space S_x, which consists of the real line with some assignment of probabilities. Let the points in S_x be denoted by s. Then x is the function x(s) = s. Let y be a single-valued real function of a real variable, and consider the function of s given by y[x(s)]. Since for each s, y[x(s)] is a real number, y[x(s)] is (subject, of course, to the condition that it be measurable) a new random variable defined on the sample space S_x. On the other hand, y(x) could also be considered as providing a mapping between S_x and a new sample space S_y, where S_y is the set of real numbers which constitutes the range of values taken on by y[x(s)] as s ranges over S_x. Then if A is a set of points in S_x, there corresponds to A a set of points B in S_y given by the condition: s_0 ∈ A if and only if y[x(s_0)] ∈ B. Hence

$$P(s \in A) = P[x(s) \in A] = P\{y[x(s)] \in B\}$$
Since we usually suppress the sample space variable s in writing expressions with random variables, we may write P(x ∈ A) = P[y(x) ∈ B]. This result establishes the fundamental relation between the probability functions of x and those of y(x). It is important to note that it doesn't matter whether we regard y(x) as a random variable on S_x or y as the coordinate random variable on S_y.
The probability distribution function of y may now be obtained by choosing B to be the semi-infinite interval (−∞ < y ≤ Y). If then A(Y) is the set of points in S_x which corresponds to the set of points (−∞ < y ≤ Y) in S_y, we obtain

$$P(y \le Y) = P[x \in A(Y)] \tag{3-41}$$

as the relation between the probability distribution function of y and a probability function of x. The probability density function of y may then be obtained by differentiation:

$$p(Y) = \frac{dP[x \in A(Y)]}{dY} \tag{3-42}$$
Suppose now that the probability density function of x, p_1(x), exists and is continuous almost everywhere, that y is a differentiable monotonic function of x, and that its derivative dy/dx vanishes only at isolated points. In this case we can obtain a direct relation between p_1(x) and the probability density function of y, p_2(y), for it follows from our assumptions that x is a single-valued function of y. Thus to the point Y in S_y there corresponds a unique point X in S_x, and the mapping is said to be one-to-one. Assuming, for the moment, that y is a monotonically increasing function of x, it then follows that the interval (−∞ < x ≤ X(Y)) in
S_x corresponds to the interval (−∞ < y ≤ Y) in S_y. Using this fact in Eq. (3-42) we get

$$p_2(Y) = \frac{d}{dY} \int_{-\infty}^{X(Y)} p_1(x)\, dx = p_1(X) \frac{dX}{dY}$$

Since a similar result (except for a minus sign) is obtained when y is a monotonically decreasing function of x, we can write

$$p_2(Y) = p_1(X) \left| \frac{dX}{dY} \right| \tag{3-43}$$

as the desired relation between the probability density functions of y and x.

If the various conditions specified above are not satisfied, incorrect results may well be obtained through the use of Eq. (3-43). For example, suppose that y is constant over a certain interval in S_x (as would be the case for a half-wave rectifier); then x is not a single-valued function of y and Eq. (3-43) is meaningless as it stands.

Example 3-6.1. The technique of determining the probability density function of a function of a random variable can perhaps best be understood by studying a specific example. Consider the case of the square-law transformation

$$y = x^2 \tag{3-44}$$

which is plotted in Fig. 3-4. This transformation function would be obtained, for example, in the study of a full-wave square-law detector. [Fig. 3-4. The square-law transformation function.] An examination of Fig. 3-4 shows, first of all, that y can never be negative. Therefore

$$P(y \le Y) = 0 \qquad \text{for } Y < 0$$

and hence

$$p_2(Y) = 0 \qquad \text{for } Y < 0 \tag{3-45}$$

Further reference to Fig. 3-4 shows that A(Y) is the set of points (−√Y ≤ x ≤ +√Y) when Y ≥ 0. Therefore

$$P(y \le Y) = P[-\sqrt{Y} \le x \le +\sqrt{Y}] = P[x \le +\sqrt{Y}] - P[x < -\sqrt{Y}]$$

and hence

$$P(y \le Y) = \int_{-\infty}^{+\sqrt{Y}} p_1(x)\, dx - \int_{-\infty}^{-\sqrt{Y}} p_1(x)\, dx$$

If now we take the derivative of both sides with respect to Y, we obtain

$$p_2(Y) = \frac{p_1(x = +\sqrt{Y}) + p_1(x = -\sqrt{Y})}{2\sqrt{Y}} \qquad \text{for } Y \ge 0 \tag{3-46}$$
The combination of Eq. (3-45) and Eq. (3-46) then gives the required probability density function for y in terms of that of x.

If next we specify the form of p_1(x), say as the gaussian probability density

$$p_1(x) = \frac{\exp(-x^2/2)}{\sqrt{2\pi}}$$

we may evaluate Eqs. (3-45) and (3-46) and get an explicit result

$$p_2(y) = \begin{cases} \dfrac{\exp(-y/2)}{\sqrt{2\pi y}} & \text{for } y \ge 0 \\[4pt] 0 & \text{for } y < 0 \end{cases} \tag{3-47}$$

The probability density function given by Eq. (3-47) is called a chi-squared density function when written as a function of y = x². The densities p_1(x) and p_2(y) are shown in Fig. 3-5.
FIG. 3-5. Gaussian and chi-squared probability density functions. [Figure]
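The result of Example 3-6.1 is readily confirmed by simulation. The sketch below (an illustrative Monte Carlo check, not part of the text) squares a large number of unit-gaussian samples and compares the empirical distribution of y = x² with the distribution function implied by Eq. (3-47), namely P(y ≤ Y) = P(−√Y ≤ x ≤ +√Y) = erf(√(Y/2)):

```python
import math
import random

# Draw unit-gaussian samples x and form y = x^2, as in Eq. (3-44).
random.seed(1)
n = 200000
samples = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]

# Analytic distribution function of y from the gaussian density:
# P(y <= Y) = erf(sqrt(Y/2)), about 0.6827 at Y = 1.
Y = 1.0
empirical = sum(1 for y in samples if y <= Y) / n
analytic = math.erf(math.sqrt(Y / 2.0))

assert abs(empirical - analytic) < 0.01
assert min(samples) >= 0.0   # Eq. (3-45): y is never negative
```

The agreement to within sampling error, together with the nonnegativity of the samples, reflects both pieces of the result, Eqs. (3-45) and (3-46).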
Multiple Variable Case. The ideas involved in determining the probability functions for functions of multiple random variables are the same as those in the single variable case. Thus if (x_1, ..., x_N) are a set of real random variables whose joint probability functions are known, the equations

$$y_m = g_m(x_1, x_2, \ldots, x_N) \qquad m = 1, 2, \ldots, M$$

define a mapping from the sample space of the random variables x_n to the sample space of the random variables y_m. The probability functions of the random variables y_m may then be found by determining the point sets in the sample space of the y_m which are the mappings of given point sets in the sample space of the x_n, and equating probabilities. Note that the number of random variables y_m need not equal the number of random variables x_n.
Example 3-6.2. Suppose that a random variable z is defined as the sum of the real random variables x and y:

$$z = g(x,y) = x + y \tag{3-48}$$

and that the joint probability functions for x and y are known. The problem is to determine the probability density function p_3(z) of the random variable z.

Since x and y are real random variables, their joint sample space may be taken as the (x,y) plane consisting of the points (−∞ ≤ x ≤ +∞, −∞ ≤ y ≤ +∞). Also, since x and y are real, z must be real, too, and its sample space may be taken as the real line (−∞ ≤ z ≤ +∞). As usual, we shall determine the probability density function by taking the derivative of the probability distribution function. Hence we need to know the set of points in the (x,y) plane which corresponds to the set of points (z ≤ Z) on the real z line. It follows from the definition of z that the points in question are those located in the half-plane (y ≤ Z − x), which is shown in Fig. 3-6. [Fig. 3-6.] Thus

$$P(z \le Z) = P(y \le Z - x)$$

If the joint probability density function for x and y exists and is continuous, we may write

$$P(z \le Z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{Z-x} p(x,y)\, dy\, dx$$

and, on differentiating with respect to Z, we get

$$p_3(Z) = \int_{-\infty}^{+\infty} p(x,\, y = Z - x)\, dx \tag{3-49}$$

If now x and y are statistically independent random variables, their joint probability density function is given by the product of their individual probability density functions. Hence in this case,

$$p_3(Z) = \int_{-\infty}^{+\infty} p_1(x)\, p_2(y = Z - x)\, dx \tag{3-50}$$

The probability density function of the sum of two statistically independent random variables is therefore given by the convolution of their respective probability density functions.
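The convolution result of Eq. (3-50) can be tested by simulation on a case where the convolution is known in closed form: the sum of two independent uniform(0, 1) variables has the triangular density, whose distribution function is Z²/2 for 0 ≤ Z ≤ 1. The sketch below is an illustrative check, not from the text:

```python
import random

# Sum of two independent uniform(0, 1) random variables: by Eq. (3-50) its
# density is the convolution of two rectangular densities, which is the
# triangular density with P(z <= Z) = Z^2 / 2 for 0 <= Z <= 1.
random.seed(2)
n = 200000
z = [random.random() + random.random() for _ in range(n)]

Z = 0.8
empirical = sum(1 for v in z if v <= Z) / n
assert abs(empirical - Z * Z / 2.0) < 0.01   # Z^2/2 = 0.32
```

The empirical fraction agrees with the triangular-law value to within sampling error, as Eq. (3-50) predicts.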
The Jacobian. As in the corresponding single-variable case, a direct relationship may be obtained between the probability density functions of the old and new random variables when they are related by a one-to-one mapping. For example, suppose that not only are the new random variables y_m defined as single-valued continuous functions of the old random variables x_n (with continuous partial derivatives everywhere):

$$y_m = g_m(x_1, \ldots, x_N) \qquad m = 1, 2, \ldots, N \tag{3-51}$$

but also the old random variables may be expressed as single-valued continuous functions of the new:

$$x_n = f_n(y_1, \ldots, y_N) \qquad n = 1, 2, \ldots, N \tag{3-52}$$

(where the numbers of old and new variables are now equal). In this case, to each point in the sample space of the x_n there corresponds one and only one point in the sample space of the y_m, and the functional relations between the old and new random variables provide a one-to-one mapping.
Further, suppose that A is an arbitrary closed region in the sample space of the x_n and that B is its mapping in the sample space of the y_m. The probability that a sample point falls in A then equals the probability that its mapping falls in B. Assuming that the joint probability density function of the x_n exists, we may write

$$\int \cdots \int_A p_1(x_1, \ldots, x_N)\, dx_1 \cdots dx_N = \int \cdots \int_B p_2(y_1, \ldots, y_N)\, dy_1 \cdots dy_N$$

The problem before us now is simply that of evaluation of a multiple integral by means of a change of variables. It therefore follows from our assumed properties of the mapping that†

$$\int \cdots \int_A p_1(x_1, \ldots, x_N)\, dx_1 \cdots dx_N = \int \cdots \int_B p_1(x_1 = f_1, \ldots, x_N = f_N)\, |J|\, dy_1 \cdots dy_N$$

† E.g., see Courant (I, Vol. 2, pp. 247-254).
in which the f_n are given by Eq. (3-52) and in which J is the Jacobian of the transformation

$$J = \frac{\partial(x_1, \ldots, x_N)}{\partial(y_1, \ldots, y_N)} = \begin{vmatrix} \dfrac{\partial f_1}{\partial y_1} & \cdots & \dfrac{\partial f_N}{\partial y_1} \\ \vdots & & \vdots \\ \dfrac{\partial f_1}{\partial y_N} & \cdots & \dfrac{\partial f_N}{\partial y_N} \end{vmatrix} \tag{3-53}$$

The old and new joint probability density functions are therefore related by the equation

$$p_2(y_1, \ldots, y_N) = p_1(x_1 = f_1, \ldots, x_N = f_N)\, |J| \tag{3-54}$$

When N = 1, Eq. (3-54) reduces to the result previously derived in the single-variable case, Eq. (3-43).
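Equations (3-53) and (3-54) can be exercised on a simple linear mapping chosen for illustration: y_1 = x_1 + x_2, y_2 = x_1 − x_2, with inverse x_1 = (y_1 + y_2)/2, x_2 = (y_1 − y_2)/2. For independent unit-gaussian x_1, x_2 the transformed density is known in closed form, so Eq. (3-54) can be checked pointwise:

```python
import math

# Jacobian of Eq. (3-53) for the inverse mapping x_1 = (y1+y2)/2,
# x_2 = (y1-y2)/2: the determinant of the matrix of partials df_n/dy_m.
a11, a12 = 0.5, 0.5     # dx1/dy1, dx1/dy2
a21, a22 = 0.5, -0.5    # dx2/dy1, dx2/dy2
J = a11 * a22 - a12 * a21        # = -0.5, so |J| = 1/2
assert abs(abs(J) - 0.5) < 1e-15

# Gaussian density with zero mean and the given variance.
def g(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

# Eq. (3-54): p_2(y1, y2) = p_1(x1 = f1, x2 = f2) |J|. For independent
# unit-gaussian x1, x2 this equals the product of two gaussian densities
# of variance 2, so y1 and y2 are again independent gaussians.
y1, y2 = 0.7, -0.4
x1, x2 = (y1 + y2) / 2, (y1 - y2) / 2
p_new = g(x1, 1.0) * g(x2, 1.0) * abs(J)
assert abs(p_new - g(y1, 2.0) * g(y2, 2.0)) < 1e-12
```

The factor |J| = 1/2 is exactly what keeps the new density normalized after the area-stretching of the linear map.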
3-7. Random Processes†
Up to this point we have not mentioned the possible time dependence of any of our probability functions. This omission was deliberate, for in many cases time enters the picture only implicitly, if at all, and has no particular bearing on the problem at hand. However, there are many cases of interest in which the time dependence of the probability functions is of importance, as, for example, in problems involving random signals and noises. We will here attempt to show how our previous notions of probability and random variable may be extended to cover such situations.

One of the basic ideas involved in our concept of probability is the determination of the relative frequency of occurrence of a given outcome in a large number (→ +∞) of repetitions of the basic experiment. The idea of "repetition" of an experiment commonly implies performing the experiment, repeating it at a later time, repeating it at a still later time, etc. However, the factor of importance in a relative-frequency determination is not performance of experiments in time succession but rather performance of a large number of experiments. We could satisfy our requirements by simultaneously performing a large number of identically prepared experiments as well as by performing them in time sequence.
Let us now consider a sequence of N throws of a die. Suppose that on the nth throw k(n) dots show on the upper face of the die; k(n) can assume any integer value between one and six. An appropriate sample
† A detailed discussion of the various possible definitions of a random process is beyond the scope of this book. For a thorough treatment see Doob (II, Chap. I, Arts. 5 and 6; Chap. II, Arts. 1 and 2). See also Grenander (I, Arts. 1.2-1.4).
space for the nth throw therefore consists of the six points 1, ..., 6, and an appropriate random variable is k(n) itself. Similarly, an appropriate joint sample space for a sequence of N throws is the N-dimensional vector space in which the sample point coordinates are k(1), k(2), ..., k(N), and the appropriate random variable is the position vector of the sample point. As the number of throws in a sequence increases, so does the dimensionality of the joint sample space. The probability that k(1) dots will appear on the first throw can be estimated from the results of simultaneously throwing a large number of dice by determining the corresponding relative frequency. Similarly, by repeating the process, we can estimate the probability that k(n) dots will appear on the nth throw and also the joint probability that k(1) dots will appear on the first throw, that k(2) dots will appear on the second throw, ..., and that k(N) dots will appear on the Nth throw. The specification of such a set of experiments with the corresponding probability functions and random variables (for all N) is said to define a random (stochastic) process and, in particular, a discrete-parameter random process.†
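The ensemble viewpoint described above — estimating a probability by the relative frequency over many simultaneously performed experiments — can be sketched as follows (an illustrative simulation, not from the text):

```python
import random

# Estimate P[k(1) = j] for the die-throwing process by the relative
# frequency over a large ensemble of "simultaneously" thrown dice:
# one trial per ensemble member, all at the same throw index.
random.seed(3)
ensemble_size = 120000
counts = [0] * 7    # counts[j] for faces j = 1..6; index 0 unused
for _ in range(ensemble_size):
    counts[random.randint(1, 6)] += 1

# Each relative frequency should approach 1/6 for a fair die.
for j in range(1, 7):
    assert abs(counts[j] / ensemble_size - 1 / 6) < 0.01
```

The same procedure applied at throw index n, or jointly at indices 1 through N, estimates the higher-order probability functions that define the discrete-parameter process.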
Consider next a particular sequence of throws of a die. The numbers k(n) obtained in this sequence of throws are called sample values and may be thought of as defining a particular function of the index n. Such a function, several examples of which are shown in Fig. 3-7a, is called a sample function of the random process in question. The set of all possible such sample functions, together with a probability law, is called the ensemble of the sample functions. It will often be convenient to consider a random process to be defined by its ensemble of sample functions as well as by the random variables and probability functions discussed above.
The concepts introduced in our discussion of a discrete-parameter random process may be extended to apply to a continuous-parameter random process in a more or less obvious manner. For example, consider the thermal-agitation noise voltage x(t) generated in a resistor. In particular, consider the measurement x(t_1) of this voltage at an instant of time t_1. This voltage could take on any value between plus and minus infinity. The sample space corresponding to our measurement would then be the real line (−∞ ≤ x ≤ +∞), and an appropriate random variable x_1 would be the measured voltage. If we had available a large number of identical resistors, we could simultaneously measure a set of values x(t_1) and from these data determine the relative frequency of occurrence of events of the type [X − ΔX < x(t_1) ≤ X]. The relative frequencies so obtained would provide measures of the probabilities P[X − ΔX < x_1 ≤ X] and hence of the probability density function p(x_1). Similarly, we could make measurements at N instants of time, say t_1 through t_N, giving sample values of the random variables x_1 through x_N, and obtain

† The discrete parameter being the "throw" index n.
FIG. 3-7. (a) Typical sample functions of a discrete-parameter random process. (b) Typical sample functions of a continuous-parameter random process. [Figure]
a measure of the joint probability density function p(x_1, ..., x_N). The specification of all such sets of random variables and of their joint probability distribution functions for all values of N defines a continuous-parameter† random process. The output voltage, as a function of time, of a particular resistor would then be a sample function of our continuous-parameter random process. Typical examples of such sample functions are shown in Fig. 3-7b. The ensemble of all possible such sample functions could, as in the discrete-parameter case, be thought of as defining the random process.

† "Continuous parameter" refers to the fact that the parameter t may assume any real value and has nothing to do with the fact that the random variables in question may be continuous random variables.
Probability Relations. Once we have defined a random process and its various probability functions and random variables as outlined above, we may apply directly all the probability relations we have previously defined and derived. Thus, if x_1 through x_N are the random variables pertaining to the random process in question, the probability density functions p(x_1, x_2, …, x_N) have the following properties. First of all,

p(x_1, x_2, …, x_N) ≥ 0    (3-55)

for all values of N, since all probability densities must be nonnegative. Further, we must have the relation

∫_{−∞}^{+∞} ⋯ ∫_{−∞}^{+∞} p(x_1, x_2, …, x_N) dx_1 ⋯ dx_N = 1    (3-56)

corresponding to the fact that the probability that any possible event occurs is unity. Also, we must have the relation, following from Eq. (3-29),

∫_{−∞}^{+∞} ⋯ ∫_{−∞}^{+∞} p(x_1, …, x_k, x_{k+1}, …, x_N) dx_{k+1} ⋯ dx_N = p(x_1, …, x_k)    k < N    (3-57)

The conditional probability density functions are given, as usual, by

p(x_N | x_1, …, x_{N−1}) = p(x_1, …, x_N) / p(x_1, …, x_{N−1})    (3-58)

and must satisfy the relation

∫_{−∞}^{+∞} p(x_N | x_1, …, x_{N−1}) dx_N = 1    (3-59)

as again we are dealing with a certainty.
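The discrete analogue of these relations can be checked numerically. The sketch below (the 3 × 4 joint mass function is an arbitrary illustrative choice, not from the text) verifies Eqs. (3-55) through (3-59) with sums in place of integrals:

```python
import numpy as np

# Hypothetical discrete analogue of Eqs. (3-55)-(3-59): a joint
# probability mass function on a 3 x 4 grid of sample values.
rng = np.random.default_rng(0)
p_joint = rng.random((3, 4))
p_joint /= p_joint.sum()          # enforce Eq. (3-56): total probability is 1

assert (p_joint >= 0).all()       # Eq. (3-55): densities are nonnegative

# Eq. (3-57): summing out x2 leaves the marginal distribution of x1.
p_x1 = p_joint.sum(axis=1)

# Eq. (3-58): conditional distribution p(x2 | x1) = p(x1, x2) / p(x1).
p_x2_given_x1 = p_joint / p_x1[:, None]

# Eq. (3-59): each conditional distribution sums to one (a certainty).
assert np.allclose(p_x2_given_x1.sum(axis=1), 1.0)
print("all probability relations hold")
```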
Suppose that x_1 through x_N are the random variables pertaining to one random process and y_1 through y_M are the random variables pertaining to another random process. The two processes are said to be statistically independent random processes if

p(x_1, …, x_N, y_1, …, y_M) = p(x_1, …, x_N) p(y_1, …, y_M)    (3-60)

for all sets of random variables x_n and y_m, and for all values of N and M.
Stationary Random Processes. Let us now reconsider the various outputs of the large number of identically prepared thermal-agitation noise voltage sources discussed above. Just as we could previously estimate the probability density function p(x_{t_1}) from the data obtained at the time instant t_1, so now we can estimate the probability density function p(x_{t_1+τ}) from data obtained at a different time instant t_1 + τ. Similarly, just as we estimate the joint probability density function p(x_{t_1}, …, x_{t_N}) from the data obtained at the N instants of time t_1 through t_N, so may we now estimate the joint probability density function p(x_{t_1+τ}, …, x_{t_N+τ}) from the data obtained at the N instants of time (t_1 + τ) through (t_N + τ). Obviously one of two situations exists: either these two sets of probability density functions are identical or they are not. If the probability functions corresponding to the time instants t_n are identical to the probability functions corresponding to the time instants (t_n + τ) for all values of N and τ, then those probability functions are invariant under a shift of the time origin. A random process characterized by such a set of probability functions is said to be a stationary random process; others are said to be nonstationary.
As an example of a stationary random process, consider the thermal-agitation noise voltage generated in a resistor of resistance R at a temperature T, which is connected in parallel with a condenser of capacitance C. In our later studies, we shall see that the probability functions of the noise voltage developed across the parallel RC combination are completely determined by the values of R, T, and C. If the values of R, T, and C are fixed for all time, the noise voltage corresponds to a stationary random process. On the other hand, if, say, T varies with time, then the process is nonstationary.
3-8. Problems
1. The random variable x has an exponential probability density function

p(x) = a exp(−b|x|)

where a and b are constants.
a. Determine the required relation between the constants a and b.
b. Determine the probability distribution function of x, P(x ≤ X).
c. Plot p(x) and P(x ≤ X) for the case a = 1.
2. The random variable y has the Cauchy probability density function

p(y) = c / (d² + y²)    (3-61)

where c and d are constants.
a. Determine the relation necessary between the constants c and d.
b. Determine the probability distribution function of y, P(y ≤ Y).
c. Plot p(y) and P(y ≤ Y) for the case d = 1.
3. Show that the joint probability density function of the random variables (x_1, …, x_N) may be expressed in terms of the conditional probability density functions p(x_n | x_1, …, x_{n−1}) by the relation

p(x_1, …, x_N) = p(x_N | x_1, …, x_{N−1}) p(x_{N−1} | x_1, …, x_{N−2}) ⋯ p(x_2 | x_1) p(x_1)    (3-62)
4. Consider the discrete random variables x and y, having K possible values x_k and M possible values y_m respectively. Show that if the equation

P(x ≤ X, y ≤ Y) = P(x ≤ X) P(y ≤ Y)

is satisfied for all values of X and Y, then the equations

P(x_k, y_m) = P(x_k) P(y_m)

are satisfied for all values of k and m, and hence the random variables x and y are statistically independent.
5. Let x and y be statistically independent discrete random variables; x has three possible values (0, 1, 3) with the probabilities …, and y has the two possible values (0, 1) with the probabilities …. Determine, and plot, the joint probability distribution function of the random variables x and y.
6. Let the random variables x and y be those defined in Prob. 5 above. A new random variable v is defined as the sum of x and y:

v = x + y

Determine the probability distribution function of v, P(v ≤ V), and plot.
7. Let the probability density function of the random variable x be p(x). The random variable y is defined in terms of x as

y = ax + b

where a and b are constants. Determine the relation between p(y) and p(x).
8. Let x and y be statistically independent random variables. The new random variables u and v are defined by the equations

u = ax + b    and    v = cy + d

where a, b, c, and d are constants. Show that the random variables u and v are also statistically independent random variables.
9. Let x be a gaussian random variable having the probability density function

p(x) = exp(−x²/2) / (2π)^{1/2}

The random variable y is defined as y = x². Determine and plot the probability density function of the random variable y.
10. Let x be the gaussian random variable of Prob. 9. The random variable y is defined by the equation

y = x^{1/n} when x ≥ 0
y = −(−x)^{1/n} when x < 0

where n is a positive constant. Determine and plot the probability density function of the random variable y for the cases n = 1, 2, and infinity.
11. The random process x(t) is defined by

x(t) = sin(ωt + θ)

where ω is constant and where θ is a random variable having the probability density function

p(θ) = 1/2π for 0 ≤ θ ≤ 2π, and 0 otherwise

a. Determine the probability distribution function P(x_t ≤ X).
b. Determine the probability density function p(x_t).
c. Plot the results of a and b.
12. Let x and y be statistically independent random variables. A new random variable z is defined as the product of x and y:

z = xy

Derive an expression for the probability density function p(z) of the new random variable z in terms of the probability density functions p(x) and p(y) of the original random variables x and y.
13. Let x and y be statistically independent random variables with the probability density functions

p(x) = 1 / (π √(1 − x²))    for |x| < 1

and

p(y) = y exp(−y²/2) for y ≥ 0, and 0 for y < 0

Show that their product has a gaussian probability density function.
14. Let x and y be statistically independent random variables with the uniform probability density functions

p(x) = 1/A for 0 ≤ x ≤ A, and 0 otherwise

and

p(y) = 1/B for 0 ≤ y ≤ B, and 0 otherwise

Determine and plot for A = 1 and B = 2 the probability density function of their product.
15. Let x and y be statistically independent random variables with the probability density functions

p(x) = exp(−x²/2) / (2π)^{1/2}

and

p(y) = y exp(−y²/2) for y ≥ 0, and 0 for y < 0

Show that their product has an exponential probability density function (as in Prob. 1).
CHAPTER 4
AVERAGES
In the two preceding chapters we introduced the concept of probability, defined various probability functions and random variables, and studied some of the properties of those functions. We pointed out that probability functions describe the long-run behavior of random variables (i.e., behavior on the average). In fact it can be shown that the various averages of random variables (e.g., the mean, the mean square, etc.) can be determined through the use of the probability functions. It is this aspect of probability theory which we shall investigate in this chapter. In particular, we shall define the process of statistical averaging, study a few averages of interest, and finally investigate the relation between time and statistical averages.
4-1. Statistical Averages
Let us first consider a discrete random variable y = g(x), where x is a discrete random variable taking on any one of M possible values x_m, and g(x) is a single-valued function of x. We want now to determine the average of g(x). To this end, let us return for the moment to the relative-frequency interpretation of probability. Suppose that we repeat N times a basic experiment which defines the random variable x and that the event corresponding to x_m occurs n(x_m) times in these N repetitions. The arithmetic average of g(x) is, in this case,

(1/N) Σ_{m=1}^{M} g(x_m) n(x_m)

If we believe that repetitions of this experiment will exhibit statistical regularity, we would expect (as pointed out in Chap. 2) this average to converge to a limit as N → ∞. Since the probability P(x_m) represents the limit of the relative frequency n(x_m)/N, we are led to define the statistical average† (or expectation) E[g(x)] of the discrete random variable g(x) by the equation

E[g(x)] = Σ_{m=1}^{M} g(x_m) P(x_m)    (4-1)

The statistical average then represents the limit of the arithmetic average.
If x_{k1}, …, x_{ki} are all the mutually exclusive values of x for which g(x) takes on the value y_k, then the probability of y_k is P(x_{k1}) + ⋯ + P(x_{ki}), and Eq. (4-1) may be written

E[g(x)] = E(y) = Σ_{k=1}^{K} y_k P(y_k)    (4-2)

Eqs. (4-1) and (4-2) give the same value for the statistical average of g(x); conceptually they are different because Eq. (4-1) refers to the sample space of x and Eq. (4-2) refers to the sample space of y. The simplest application of Eq. (4-1) is that in which g(x) = x. E(x) is usually called the mean value of x and denoted by m_x.
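As a rough numerical sketch of Eq. (4-1) and of the relative-frequency argument above (the values, probabilities, and g are illustrative choices, not from the text):

```python
import numpy as np

# Hypothetical discrete random variable: x takes the values 0, 1, 3,
# and we average g(x) = x**2.
values = np.array([0.0, 1.0, 3.0])
probs = np.array([0.5, 0.25, 0.25])

def g(x):
    return x**2

# Eq. (4-1): E[g(x)] = sum over m of g(x_m) P(x_m).
expectation = np.sum(g(values) * probs)

# Relative-frequency check: the arithmetic average over N repetitions
# of the experiment converges to the statistical average as N grows.
rng = np.random.default_rng(1)
samples = rng.choice(values, size=200_000, p=probs)
arithmetic_avg = g(samples).mean()

print(expectation, arithmetic_avg)
```

With these choices E[g(x)] = 0·(1/2) + 1·(1/4) + 9·(1/4) = 2.5, and the sample average lands close to it.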
Now suppose x is a continuous random variable with the probability density p_1(x) and g(x) is a single-valued function of x. Again we want to determine the average of the random variable g(x). Let x be approximated by a discrete random variable x′ which takes on the values x_m with probability p_1(x_m) Δx_m, where the M intervals Δx_m partition the sample space of x. Then by Eq. (4-1)

E[g(x′)] = Σ_{m=1}^{M} g(x_m) p_1(x_m) Δx_m

If we let all the Δx_m → 0, thus forcing M → ∞, the limiting value of this sum is the integral given below in Eq. (4-3), at least for piecewise continuous g(x) and p_1(x). This procedure suggests that we define the statistical average of the continuous random variable g(x) by the equation

E[g(x)] = ∫_{−∞}^{+∞} g(x) p_1(x) dx    (4-3)

Equation (4-3) may be extended to apply to the case where x is a mixed discrete and continuous random variable by allowing p_1(x) to contain impulse functions. The statistical average of y = g(x) can also be expressed in terms of probabilities on the sample space of y; if the probability density p_2(y) is defined, then

E[g(x)] = E(y) = ∫_{−∞}^{+∞} y p_2(y) dy    (4-4)

† Also known as the mean, mathematical expectation, stochastic average, and ensemble average. For convenience, we will sometimes denote the statistical average by a bar over the quantity being averaged, thus: ḡ(x).
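The integral in Eq. (4-3) can be sketched numerically as the limiting sum that motivated it; here p_1 is taken, purely for illustration, to be the zero-mean unit-variance gaussian density and g(x) = x²:

```python
import numpy as np

# Approximate E[g(x)] of Eq. (4-3) by the discrete sum that precedes
# it, on a fine partition of the (effectively finite) sample space.
x = np.linspace(-8.0, 8.0, 200_001)
dx = x[1] - x[0]
p1 = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # illustrative density p1(x)

expectation = np.sum(x**2 * p1) * dx          # Riemann sum for the integral
print(expectation)
```

The result is very nearly 1, the second moment of this gaussian density.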
Multiple Random Variables. So far we have studied only statistical averages of functions of a single random variable. Let us next consider those cases where two or more random variables are involved. First suppose that the random variables x and y are discrete random variables having the M possible values x_m and K possible values y_k respectively. The problem before us is to determine the average of a single-valued function, say g(x,y), of the two random variables x and y. Functions of interest might be the sum x + y, the product xy, and so forth.
Even though we are here talking about two random variables x and y, the statistical average of g(x,y) is really already defined by Eq. (4-1). Since x can take on M possible values and y can take on K, the pair (x,y) can take on MK possible values and the double indices (m,k) can be replaced by a single index which runs from 1 to MK. Equation (4-1) then applies. However, there is no point in actually changing the indices; evidently

E[g(x,y)] = Σ_{k=1}^{K} Σ_{m=1}^{M} g(x_m, y_k) P(x_m, y_k)    (4-5)

The corresponding expression when x and y are continuous (or mixed) random variables is

E[g(x,y)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x,y) p(x,y) dx dy    (4-6)

Equations (4-5) and (4-6) may be extended to more than two random variables in a fairly obvious manner.
Example 4-1.1. Let us now apply the above expressions to several functions g(x,y) of interest. First, let g(x,y) = ax + by where a and b are constants. Direct application of Eq. (4-6) gives the result that

E(ax + by) = a ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x p(x,y) dx dy + b ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} y p(x,y) dx dy

The first term on the right-hand side is a times the statistical average of the function g_1(x,y) = x, and the second is b times the statistical average of the function g_2(x,y) = y. Thus

E(ax + by) = aE(x) + bE(y)    (4-7)

That this result is consistent with our original definition, Eq. (4-3), follows from the properties of the joint probability density function, specifically from application of Eq. (3-29a). We may easily extend the result in Eq. (4-7) to the case of N random variables x_n to show that

E(Σ_{n=1}^{N} a_n x_n) = Σ_{n=1}^{N} a_n E(x_n)    (4-8)

where the a_n are constants. Thus, the mean of a weighted sum of random variables is equal to the weighted sum of their means.
Example 4-1.2. Next let us determine the statistical average of the product of a function of the random variable x times a function of the random variable y. From Eq. (4-6),

E[h(x)f(y)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} h(x) f(y) p(x,y) dx dy

Generally, this is as far as one may proceed until the form of the joint probability density is specified. However, a simplification does occur when x and y are statistically independent. In this case, the joint probability density factors into the product of the probability densities of x and y alone, and we get

E[h(x)f(y)] = ∫_{−∞}^{+∞} h(x) p_1(x) dx ∫_{−∞}^{+∞} f(y) p_2(y) dy

Therefore

E[h(x)f(y)] = E[h(x)] E[f(y)]    (4-9)

when x and y are statistically independent random variables. This result may easily be extended to the case of more than two random variables; hence the mean of a product of statistically independent random variables is equal to the product of the means of those random variables.
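A Monte Carlo sketch of Eq. (4-9), with illustrative choices h(x) = x² and f(y) = cos y for independent x and y (none of these choices come from the text):

```python
import numpy as np

# Independent samples: x gaussian, y uniform on (0, pi/2).
rng = np.random.default_rng(2)
n = 500_000
x = rng.normal(0.0, 1.0, n)
y = rng.uniform(0.0, np.pi / 2, n)

lhs = (x**2 * np.cos(y)).mean()          # E[h(x) f(y)] estimated directly
rhs = (x**2).mean() * np.cos(y).mean()   # E[h(x)] E[f(y)]
print(lhs, rhs)                          # the two averages agree closely
```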
Random Processes. The preceding comments about statistical averages of random variables apply equally well to statistical averages of random processes. For example, let x_t be the random variable that refers to the possible values which can be assumed at the time instant t by the sample functions x(t) of a given random process. We then define the statistical average of a function of the given random process to be the statistical average of that function of x_t:

E[g(x_t)] = ∫_{−∞}^{+∞} g(x_t) p(x_t) dx_t

It should be noted that the statistical average of a random process may well be a variable function of time, since, in general, the probability density of x_t need not be the same as that of x_{t+τ}. For example, suppose the sample functions x(t) of one random process are related to those y(t) of another by the equation

x(t) = y(t) cos t

Since cos t is simply a number for fixed t,

E(x_t) = E(y_t) cos t

Suppose now that the process with sample functions y(t) is stationary. The probability density p(y_t) is then the same for all values of t, and E(y_t) has the same value m_y for all t (i.e., the statistical average of a stationary random process is a constant function of time). Then

E(x_t) = m_y cos t    and    E(x_{t+τ}) = m_y cos(t + τ)

Hence E(x_{t+τ}) ≠ E(x_t) unless τ is a multiple of 2π, and E(x_t) is not a constant function of t as it would be if the corresponding process were stationary.
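The time-varying mean E(x_t) = m_y cos t can be sketched by ensemble averaging. Below, as an illustrative stand-in for a stationary process, each sample function of y(t) is simply a constant random level A drawn from a gaussian distribution with mean m_y = 2:

```python
import numpy as np

# Each ensemble member is a sample function y(t) = A, A ~ N(2, 1),
# so the y process is stationary with m_y = 2.  Then x(t) = y(t) cos t.
rng = np.random.default_rng(3)
A = rng.normal(2.0, 1.0, 400_000)   # one value of A per ensemble member

m_y = 2.0
for t in (0.0, 1.0, np.pi):
    ensemble_mean = (A * np.cos(t)).mean()   # average of x_t over the ensemble
    print(f"t={t:.2f}  E(x_t)~{ensemble_mean:+.3f}  m_y cos t={m_y*np.cos(t):+.3f}")
```

The ensemble mean tracks m_y cos t, so the x process is not stationary even though the y process is.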
4-2. Moments
One set of averages of functions of a random variable x which is of particular interest is that consisting of the moments of x, in which the nth moment of the probability distribution of the random variable x is defined to be the statistical average of the nth power of x:

E(x^n) = ∫_{−∞}^{+∞} x^n p(x) dx    (4-10)

or, for a discrete random variable,

E(x^n) = Σ_m x_m^n P(x_m)    (4-11)

and the nth moment of a random process is similarly defined. In a stationary random process, the first moment m = E(x) (i.e., the mean) gives the steady (or d-c) component; the second moment E(x²) (i.e., the mean square) gives the intensity and provides a measure of the average power carried by a sample function.
The nth central moment μ_n of the probability distribution of the random variable x is defined as

μ_n = E[(x − m)^n] = ∫_{−∞}^{+∞} (x − m)^n p(x) dx

It then follows from the binomial expansion of (x − m)^n that the central moments and the moments are related by the equation

μ_n = Σ_{k=0}^{n} (n choose k) (−m)^{n−k} E(x^k)    (4-12)

The second central moment, which is often of particular interest, is commonly denoted by a special symbol σ²:

σ² = μ_2 = E[(x − m)²]    (4-13)

which, from Eq. (4-12), may also be written

σ² = E(x²) − m²    (4-14)

and given a special name, variance or dispersion. The variance of a stationary random process is the intensity of the varying component (i.e., the a-c component). The positive square root σ of the variance is known as the standard deviation of the random variable x. The standard deviation of a random process is then the root-mean-square (rms) value of the a-c component of that process.
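A numerical sketch of the variance relation of Eq. (4-14) (the exponential sampling distribution, with mean 2 and variance 4, is an illustrative choice):

```python
import numpy as np

# Estimate the first moment, second moment, and second central moment
# from a large sample, and check sigma^2 = E(x^2) - m^2.
rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=1_000_000)   # mean 2, variance 4

m = x.mean()                        # first moment (mean)
second_moment = (x**2).mean()       # second moment (mean square)
variance = ((x - m)**2).mean()      # second central moment

print(m, variance, second_moment - m**2)
```

The last two printed values coincide, as Eq. (4-14) requires.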
Multiple Random Variables. The concept of moments of a probability distribution may, of course, be extended to joint probability distributions. For example, an (n + k)th-order joint moment (or cross moment) of the joint probability distribution of the random variables x and y is

E(x^n y^k) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x^n y^k p(x,y) dx dy

The corresponding joint central moments are

μ_{nk} = E[(x − m_x)^n (y − m_y)^k]    (4-15)

where m_x is the mean of x and m_y is the mean of y. The joint moment

μ_11 = E[(x − m_x)(y − m_y)]

is called the covariance of the random variables x and y. It is of particular interest to us and will be discussed in some detail in Art. 4-4.
4-3. Characteristic Functions†
Another statistical average of considerable importance is the characteristic function M_x(jv) of the probability distribution of the real random variable x, which is defined to be the statistical average of exp(jvx):

M_x(jv) = E[exp(jvx)] = ∫_{−∞}^{+∞} exp(jvx) p(x) dx    (4-16)

where v is real. Since p(x) is nonnegative and exp(jvx) has unit magnitude,

|∫_{−∞}^{+∞} exp(jvx) p(x) dx| ≤ ∫_{−∞}^{+∞} p(x) dx = 1

Hence the characteristic function always exists, and

|M_x(jv)| ≤ M_x(0) = 1    (4-17)

It follows from the definition of the Fourier integral that the characteristic function of the probability distribution of a random variable x is the Fourier transform of the probability density function of that random variable. Therefore, under suitable conditions,‡ we may use the inverse Fourier transformation

p(x) = (1/2π) ∫_{−∞}^{+∞} M_x(jv) exp(−jvx) dv    (4-18)

to obtain the probability density function of a random variable when we know its characteristic function. It is often easier to determine the characteristic function first and then transform to obtain the probability density than it is to determine the probability density directly.
When x is a discrete random variable, Eq. (4-16) becomes

M_x(jv) = Σ_m P(x_m) exp(jvx_m)    (4-19)
Moment Generation. Suppose now that we take the derivative of the characteristic function with respect to v,

dM_x(jv)/dv = j ∫_{−∞}^{+∞} x exp(jvx) p(x) dx

If both sides of this equation are evaluated at v = 0, the integral becomes the first moment of the random variable x. Hence, on solving for that moment, we get

E(x) = −j dM_x(jv)/dv ]_{v=0}    (4-20)

and we see that the first moment may be obtained by differentiation of the characteristic function.
Similarly, by taking the nth derivative of the characteristic function with respect to v, we obtain

d^n M_x(jv)/dv^n = j^n ∫_{−∞}^{+∞} x^n exp(jvx) p(x) dx

Evaluating at v = 0, the integral becomes the nth moment of the random variable x. Therefore

E(x^n) = (−j)^n d^n M_x(jv)/dv^n ]_{v=0}    (4-21)

The differentiation required by Eq. (4-21) is sometimes easier to carry out than the integration required by Eq. (4-10).
† For a more detailed discussion of the properties of the characteristic function, see Cramér (I, Chap. 10) and Doob (II, Chap. I, Art. 11).
‡ See, for example, Courant (I, Vol. II, Chap. 4, Appendix Art. 5) or Guillemin (II, Chap. VII, Art. 19).
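Equation (4-21) can be sketched numerically for a discrete random variable, using Eq. (4-19) for M_x(jv) and central differences for the derivatives at v = 0 (the values and probabilities below are illustrative choices):

```python
import numpy as np

# Hypothetical discrete random variable with E(x) = 1 and E(x^2) = 2.5.
values = np.array([0.0, 1.0, 3.0])
probs = np.array([0.5, 0.25, 0.25])

def M(v):
    # Eq. (4-19): M_x(jv) = sum over m of P(x_m) exp(j v x_m)
    return np.sum(probs * np.exp(1j * v * values))

h = 1e-4
dM = (M(h) - M(-h)) / (2 * h)               # first derivative at v = 0
d2M = (M(h) - 2 * M(0.0) + M(-h)) / h**2    # second derivative at v = 0

Ex = (-1j) * dM                             # Eq. (4-20)
Ex2 = (-1j)**2 * d2M                        # Eq. (4-21) with n = 2
print(Ex.real, Ex2.real)
```

The recovered moments match the direct sums Σ x_m P(x_m) = 1 and Σ x_m² P(x_m) = 2.5.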
Suppose that the characteristic function has a Taylor-series expansion. It then follows, using Eq. (4-21), that

M_x(jv) = Σ_{n=0}^{∞} E(x^n) (jv)^n / n!    (4-22)

Therefore, if the characteristic function of a random variable has a Taylor-series expansion valid in some interval about the origin, it is uniquely determined in this interval by the moments of the random variable. If the moments of all orders of the distribution of a random variable do not exist but the moment of order n does exist, then all moments of order less than n also exist and the characteristic function can be expanded in a Taylor series with remainder of order n.† If the moments do uniquely determine the characteristic function, they also uniquely determine the probability distribution function.‡ A simple condition for this uniqueness is the following: if a probability distribution has moments of all orders, and if the series in Eq. (4-22) converges absolutely for some v > 0, then this is the only probability distribution with these moments.†
† See Courant (I, Vol. I, Chap. VI, Art. 2).
‡ See Cramér (I, p. 93), or Loève (I, p. 186).
Another average closely related to the characteristic function is often discussed in statistics literature.‡ This function, called the moment-generating function M_x(u), is defined to be the statistical average of exp(ux):

M_x(u) = E[exp(ux)]    (4-23)

where u is taken to be real. This function has essentially the same moment-generating properties as the characteristic function. In particular, it is easy to show that

E(x^n) = d^n M_x(u)/du^n ]_{u=0}    (4-24)

An essential difference between the characteristic function and the moment-generating function is that the characteristic function always exists, whereas the moment-generating function exists only if all of the moments exist.
Joint Characteristic Functions. The statistical average of exp(jv_1x + jv_2y),

M(jv_1, jv_2) = E[exp(jv_1x + jv_2y)]    (4-25)

is the joint characteristic function of the joint probability distribution of the random variables x and y.§ It therefore follows that the joint characteristic function is the two-dimensional Fourier transform of the joint probability density function of x and y, and we may use the inverse Fourier transformation

p(x,y) = (1/(2π)²) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} M(jv_1, jv_2) exp(−jv_1x − jv_2y) dv_1 dv_2    (4-26)

to obtain the joint probability density function of a pair of random variables when we know their joint characteristic function. Similarly, the Nth-order joint characteristic function and the Nth-order joint probability density function form an N-dimensional Fourier transform pair.
Having defined the joint characteristic function, let us determine some of its properties. First we find that

M(0,0) = 1

It further follows from Eq. (4-25) that

|M(jv_1, jv_2)| ≤ M(0,0) = 1    (4-27)

† See Cramér (I, p. 176).
‡ See, for example, Mood (I, Art. 5.3).
§ See Cramér (I, Art. 10.6).
Therefore, the joint characteristic function always exists, and it assumes its greatest magnitude at the origin of the (v_1, v_2) plane.
Suppose now that we set only v_2 equal to zero. We may then write

M(jv_1, 0) = ∫_{−∞}^{+∞} exp(jv_1x) [∫_{−∞}^{+∞} p(x,y) dy] dx

From the properties of the joint probability density function, it follows that the integral over y is simply the probability density function p_1(x). Thus we obtain the relation

M(jv_1, 0) = M_x(jv_1)    (4-28)

between the joint characteristic function for x and y and the characteristic function for x alone. Similarly, it can be shown that

M(0, jv_2) = M_y(jv_2)    (4-29)

Suppose next that we take the nth partial derivative of the joint characteristic function with respect to v_1 and the kth partial derivative with respect to v_2; thus

∂^{n+k} M(jv_1, jv_2) / ∂v_1^n ∂v_2^k = j^{n+k} ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x^n y^k exp(jv_1x + jv_2y) p(x,y) dx dy    (4-30)

Setting both v_1 and v_2 equal to zero, we note that the double integral becomes the joint moment E(x^n y^k). Therefore

E(x^n y^k) = (−j)^{n+k} ∂^{n+k} M(jv_1, jv_2) / ∂v_1^n ∂v_2^k ]_0

Thus the various joint moments of the random variables x and y may be obtained from their joint characteristic function through successive differentiations.
A series expansion of the joint characteristic function in terms of the various joint moments may be obtained through the use of power-series expansions of the exponentials in the integral expression for M(jv_1, jv_2). Thus

M(jv_1, jv_2) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} [Σ_{n=0}^{∞} (jv_1x)^n / n!] [Σ_{k=0}^{∞} (jv_2y)^k / k!] p(x,y) dx dy

or, on interchanging orders of integration and summation,

M(jv_1, jv_2) = Σ_{n=0}^{∞} Σ_{k=0}^{∞} [(jv_1)^n (jv_2)^k / (n! k!)] ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x^n y^k p(x,y) dx dy

The double integral in this equation is simply the joint moment E(x^n y^k); hence

M(jv_1, jv_2) = Σ_{n=0}^{∞} Σ_{k=0}^{∞} E(x^n y^k) (jv_1)^n (jv_2)^k / (n! k!)    (4-31)
Statistically Independent Random Variables. The N-dimensional joint characteristic function for the N random variables x_n is by definition

M(jv_1, …, jv_N) = E[exp(j Σ_{n=1}^{N} v_n x_n)] = E[Π_{n=1}^{N} exp(jv_n x_n)]

If the N random variables x_n are statistically independent, the average of the product is equal to the product of the respective averages. Since the statistical average of exp(jv_n x_n) is the characteristic function for x_n, we obtain the result that

M(jv_1, …, jv_N) = Π_{n=1}^{N} M_{x_n}(jv_n)    (4-32)

Thus the N-dimensional joint characteristic function for N independent random variables is equal to the N-fold product of their individual characteristic functions. Conversely, it may easily be shown that the N random variables x_n are necessarily statistically independent if their joint characteristic function is equal to the product of their respective characteristic functions.
Example 4-3.1. Let us next consider the random variable y, defined to be the sum of N statistically independent random variables x_n:

y = Σ_{n=1}^{N} x_n

The characteristic function for y is then

M_y(jv) = E[exp(jv Σ_{n=1}^{N} x_n)] = E[Π_{n=1}^{N} exp(jvx_n)]

Since the random variables x_n are statistically independent,

M_y(jv) = Π_{n=1}^{N} M_{x_n}(jv)    (4-33)

and we see that the characteristic function for a sum of statistically independent random variables is equal to the product of their respective characteristic functions. Do not, however, confuse the results contained in Eq. (4-32) and Eq. (4-33) because of their similarity of form. Equation (4-32) expresses the joint characteristic function of the N random variables x_n and is a function of the N variables v_n. On the other hand, Eq. (4-33) expresses the characteristic function of a single random variable y and is a function of the single variable v.
Example 4-3.2. Let a random variable z be defined as the sum of two statistically independent random variables x and y: z = x + y. It follows from Eq. (4-33) that

M_z(jv) = M_x(jv) M_y(jv)    (4-34)

The probability density function for z may now be obtained by taking the inverse Fourier transform of both sides of Eq. (4-34). Thus

p_3(z) = (1/2π) ∫_{−∞}^{+∞} M_x(jv) M_y(jv) exp(−jvz) dv

On substituting for M_x(jv) from Eq. (4-16) and rearranging the order of integration, we get

p_3(z) = ∫_{−∞}^{+∞} p_1(x) [(1/2π) ∫_{−∞}^{+∞} M_y(jv) exp(−jv(z − x)) dv] dx

Hence, using Eq. (4-18),

p_3(z) = ∫_{−∞}^{+∞} p_1(x) p_2(z − x) dx    (4-35)

This result is precisely the same as that previously derived, Eq. (3-50), directly from probability distribution functions.
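Equation (4-35) can be sketched numerically. For the illustrative case of two independent random variables each uniform on (0, 1), the convolution should reproduce the triangular density of their sum:

```python
import numpy as np

# Evaluate p3(z) = integral of p1(x) p2(z - x) dx by a Riemann sum,
# for p1 = p2 = the uniform density on (0, 1).
grid = np.linspace(0.0, 2.0, 2001)
dz = grid[1] - grid[0]

def p_uniform(u):
    return np.where((u >= 0.0) & (u <= 1.0), 1.0, 0.0)

p3 = np.array([np.sum(p_uniform(grid) * p_uniform(z - grid)) * dz
               for z in grid])

# Known result: the triangular density z on (0,1), 2 - z on (1,2).
triangle = np.where(grid <= 1.0, grid, 2.0 - grid)
print(np.abs(p3 - triangle).max())   # small discretization error only
```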
4-4. Correlation
As we observed earlier in our discussion of conditional probabilities, it is often desirable to know something about the dependence of one random variable upon another. One way of bringing to light a possible dependence relationship between two real random variables x and y would be to plot the outcomes of various performances of the basic experiment defining these random variables as points in the (x,y) plane (i.e., plot the various sample points measured) and study the resultant figure.

FIG. 4-1. A scatter diagram.

Such a plot, which might have the form shown in Fig. 4-1, is known as a scatter diagram. If the two random variables x and y were not dependent upon each other, it might be expected that the sample points would be more or less scattered throughout the plane. On the other hand, if the variables were strongly dependent upon each other, we might then expect the sample points to be clustered in the immediate vicinity of a curve describing their functional dependence. The simplest form of dependence is linear dependence, which is of considerable practical importance. In this case, we would expect the sample points to be concentrated along some straight line, as shown in Fig. 4-1, for example.
Suppose that a scatter diagram indicates that the random variables x and y have a strong linear dependence. It is then of interest to determine which straight line

y_p = a + bx    (4-36)

gives the best predicted value y_p of the random variable y based upon a sample value of the random variable x. In order to make such a determination, we must specify just what we mean by "best." One convenient criterion of goodness which is useful in many applications is the mean-square difference (error) E_ms between the true sample value of the random variable y and its predicted value:

E_ms = E[(y − y_p)²] = E{[y − (a + bx)]²}    (4-37)

Under this criterion of goodness, the best line of prediction is that line which minimizes the mean-square error. Such a line is often known as the linear mean-square regression line.†
Let us now determine those values of the intercept a and the slope b which give the least mean-square error. Differentiating the expression for the mean-square error with respect to a and b and equating the results to zero gives

∂E_ms/∂a = −2E(y) + 2a + 2bE(x) = 0
∂E_ms/∂b = −2E(xy) + 2aE(x) + 2bE(x²) = 0

Solving for a and b, we obtain

b = E[(x − m_x)(y − m_y)] / σ_x² = μ_11 / σ_x²
a = m_y − b m_x = m_y − (μ_11 / σ_x²) m_x

If next we substitute these values in Eq. (4-36) and verify the achievement of a true minimum, we find

y_p = m_y + (μ_11 / σ_x²)(x − m_x)    (4-38)

† See Cramér (I, Arts. 21.5-7).
as the equation of the best prediction line. From this equation, it follows that the best line of prediction passes through the point (m_x, m_y).
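The regression formulas above can be sketched on synthetic data (the line y = 3x + 1 plus gaussian noise is an illustrative choice); the slope μ_11/σ_x² and intercept m_y − b m_x should agree with an ordinary least-squares fit:

```python
import numpy as np

# Synthetic linearly dependent data: y = 3x + 1 + noise.
rng = np.random.default_rng(5)
x = rng.normal(2.0, 1.0, 100_000)
y = 3.0 * x + 1.0 + rng.normal(0.0, 0.5, 100_000)

m_x, m_y = x.mean(), y.mean()
mu_11 = ((x - m_x) * (y - m_y)).mean()   # covariance of x and y
var_x = ((x - m_x)**2).mean()            # sigma_x squared

b = mu_11 / var_x                         # slope from Eq. (4-38)
a = m_y - b * m_x                         # intercept from Eq. (4-38)

b_ls, a_ls = np.polyfit(x, y, 1)          # independent least-squares check
print(b, a, b_ls, a_ls)
```

Both routes give the same line, with the slope and intercept near the true 3 and 1.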
It is convenient to introduce here the standardized variable corresponding to the random variable x and defined by the equation

ξ = (x − m_x) / σ_x    (4-39)

Now

E(ξ) = 0    and    E(ξ²) = 1    (4-40)

i.e., the standardized variable has a zero mean and a unit standard deviation. In terms of the standardized variable, on defining a standardized prediction η_p = (y_p − m_y)/σ_y, the previously derived expression for the best prediction line assumes the particularly simple form

η_p = ρξ    (4-41)

where ρ is the correlation coefficient defined by the equation

ρ = E(ξη) = E[(x − m_x)(y − m_y)] / (σ_x σ_y) = μ_11 / (σ_x σ_y)    (4-42)

in which η is the standardized variable corresponding to y. The correlation coefficient is often known as the normalized covariance of the random variables x and y. Equation (4-41) shows that the correlation coefficient is the slope of the line which gives the best predicted value of η based upon a sample value of ξ.
Consider for the moment the mean square of (ξ − η). Since ξ and η are both real, the square (and hence the mean square) of this function must be nonnegative. Hence

E[(ξ − η)²] = E(ξ²) − 2E(ξη) + E(η²) = 2(1 − ρ) ≥ 0    (4-43)

The greatest possible positive and negative values of the correlation coefficient, therefore, have unit magnitude:

−1 ≤ ρ ≤ +1

Suppose that ρ = +1. It then follows that

E[(ξ − η)²] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} (ξ − η)² p(x,y) dx dy = 0    (4-44)

Since both factors of the integrand are nonnegative, this result can hold if p(x,y) is continuous only if η = ξ for all values of x and y for which p(x,y) > 0; the set of all pairs of values of ξ and η for which p(x,y) = 0 occurs with probability zero. Similarly, if ρ = −1, then η = −ξ for all values of x and y for which p(x,y) > 0. The extremal values of ρ = ±1 therefore correspond to those cases in which η = ±ξ with probability one.
Linear Independence and Statistical Independence. If the correlation
coefficient of the real random variables x and 11 is zero,
p=o
then those random variables are said to he uncorrelated or linearly inde
pendent. From the defining equation of the correlation coefficient (442),
it follows that if the joint moment E(xy) of the random variables x and
y factors into the product of their respective means
E(xy) = E(x)E(y)
then the correlation coefficient is zero, and hence x and yare linearly
independent. Therefore, if two random variables are statistically inde
pendent they are also linearly independent.
On the other hand, two random variables that are linearly independent may or may not be statistically independent. This can be shown as follows: It was pointed out in Art. 4-3 that a necessary and sufficient condition for the statistical independence of two random variables is that their joint characteristic function factor into the product of their respective individual characteristic functions:

$M_{xy}(jv_1,jv_2) = M_x(jv_1)\,M_y(jv_2)$

Suppose now that the joint characteristic function has a Taylor series valid in some region about the origin of the $(v_1,v_2)$ plane. Then $M_x(jv_1)$ and $M_y(jv_2)$ will also have Taylor's expansions, and, substituting these expansions into the above equation, we get

$\sum_{n=0}^{\infty}\sum_{k=0}^{\infty} \frac{(jv_1)^n(jv_2)^k}{n!\,k!}\,E(x^ny^k) = \left[\sum_{n=0}^{\infty} \frac{(jv_1)^n}{n!}E(x^n)\right]\left[\sum_{k=0}^{\infty} \frac{(jv_2)^k}{k!}E(y^k)\right]$

Since these series must be equal term by term, it follows that, if a Taylor-series expansion exists for the joint characteristic function, then a necessary and sufficient condition for the statistical independence of the two random variables $x$ and $y$ is that their joint moments factor:

$E(x^ny^k) = E(x^n)E(y^k)$   (4-45)

for all positive integral values of $n$ and $k$. Since linear independence guarantees only the factoring of $E(xy)$, it appears that linearly independent random variables are not necessarily also statistically independent, as
shown by Prob. 12 of this chapter. However, in the special case where $x$ and $y$ are jointly gaussian, linear independence does imply statistical independence, as we shall see later.
4-5. Correlation Functions

Let $x_1$ and $x_2$ be the random variables that refer to the possible values which can be assumed at the time instants $t_1$ and $t_2$, respectively, by the sample functions $x(t)$ of a given random process. Our remarks about correlation in the preceding article apply just as well to these two random variables as to any other pair. However, since the joint probability distribution of $x_1$ and $x_2$ may change as $t_1$ and $t_2$ change, the average $E(x_1x_2)$ relating $x_1$ and $x_2$ may well be a function of both time instants. In order to point up this time dependence, we call that average the autocorrelation function of the random process and denote it by the symbol $R_x(t_1,t_2)$. Therefore, if the random process in question is real, we have

$R_x(t_1,t_2) = E(x_1x_2)$   (4-46a)

If the random process is complex, we define

$R_x(t_1,t_2) = E(x_1x_2^*)$   (4-46b)

where the asterisk denotes the complex conjugate. Eq. (4-46b), of course, reduces to Eq. (4-46a) for a real random process.
Just as $E(x_1x_2)$ is a function of the time instants $t_1$ and $t_2$, so is the corresponding correlation coefficient. We shall call the correlation coefficient of $x_1$ and $x_2$ the normalized autocorrelation function of their random process and denote it by the symbol $\rho_x(t_1,t_2)$. If the random process in question is real,

$\rho_x(t_1,t_2) = \dfrac{E[(x_1 - m_1)(x_2 - m_2)]}{\sigma_1\sigma_2}$   (4-47)

and hence

$\rho_x(t_1,t_2) = \dfrac{R_x(t_1,t_2) - m_1m_2}{\sigma_1\sigma_2}$   (4-48)

where $m_1 = E(x_1)$, $m_2 = E(x_2)$, $\sigma_1 = \sigma(x_1)$, and $\sigma_2 = \sigma(x_2)$.
If the given random process is stationary, the joint probability distribution for $x_1$ and $x_2$ depends only on the time difference $\tau = t_1 - t_2$ and not on the particular values of $t_1$ and $t_2$. The autocorrelation function is then a function only of the time difference $\tau$, and we write

$R_x(t, t - \tau) = R_x(\tau) = E(x_t x_{t-\tau})$   (4-49)

for any $t$. The corresponding normalized autocorrelation function is

$\rho_x(\tau) = \dfrac{R_x(\tau) - m_x^2}{\sigma_x^2}$   (4-50)
where $m_x = m_1 = m_2$ and $\sigma_x = \sigma_1 = \sigma_2$, since we are dealing with a stationary random process.

Some random processes are not stationary in the strict sense (i.e., their probability distributions are not invariant under shifts of the time origin) but nevertheless have autocorrelation functions which satisfy Eq. (4-49) and means which are constant functions of time. Such random processes are said to be stationary in the wide sense.† Obviously, any random process which is stationary in the strict sense is also stationary in the wide sense.
Consider next two random processes, not necessarily real, whose sample functions are $x(t)$ and $y(t)$, respectively. For each process there exists an autocorrelation function, $R_x(t_1,t_2)$ and $R_y(t_1,t_2)$, respectively. In addition, we may define two cross-correlation functions

$R_{xy}(t_1,t_2) = E(x_1y_2^*)$
and
$R_{yx}(t_1,t_2) = E(y_1x_2^*)$   (4-51)

The correlations between the values of the sample functions $x(t)$ and $y(t)$ at two different instants of time are then specified by the correlation matrix

$\mathbf{R} = \begin{bmatrix} R_x(t_1,t_2) & R_{xy}(t_1,t_2) \\ R_{yx}(t_1,t_2) & R_y(t_1,t_2) \end{bmatrix}$

In general, an $N$-by-$N$ correlation matrix is required to specify the correlations for two instants of time among $N$ random processes, or for $N$ instants of time between two random processes.
Some General Properties. Suppose that the random processes with the sample functions $x(t)$ and $y(t)$ are individually and jointly stationary (at least in the wide sense). Then, defining $t' = t - \tau$,

$R_{xy}(\tau) = E(x_{t'+\tau}\,y_{t'}^*) = E^*(y_{t'}\,x_{t'+\tau}^*)$

Since both processes are stationary, the indicated averages are invariant under a translation of the time origin. Therefore

$R_{xy}(\tau) = R_{yx}^*(-\tau)$   (4-52)
and
$R_x(\tau) = R_x^*(-\tau)$   (4-53)

Hence the autocorrelation function of a stationary real random process is an even function of its argument. The cross-correlation functions of two such random processes may or may not be.
Suppose, for the moment, that the real random processes with the sample functions $x(t)$ and $y(t)$ are not necessarily stationary. Since $x(t)$ and $y(t)$ are real functions of time, and since the square of a real function is nonnegative, it follows that

$E\left\{\left[\dfrac{x_1}{\sqrt{E(x_1^2)}} \pm \dfrac{y_2}{\sqrt{E(y_2^2)}}\right]^2\right\} \ge 0$

On expanding, we can then obtain the result

$|R_{xy}(t_1,t_2)| \le \sqrt{E(x_1^2)E(y_2^2)}$   (4-54a)

Similarly,

$|R_x(t_1,t_2)| \le \sqrt{E(x_1^2)E(x_2^2)}$   (4-54b)

For stationary random processes these inequalities become

$|R_{xy}(\tau)| \le \sqrt{R_x(0)R_y(0)}$   (4-55a)
$|R_x(\tau)| \le R_x(0)$   (4-55b)

† Cf. Doob (II, Chap. II, Art. 8).
Example 4-5.1. To get a better understanding of the autocorrelation function, let us now consider a specific example: that of the "random telegraph" wave pictured in Fig. 4-2. This wave may with equal probability assume at any instant of time either of the values zero or one, and it makes independent random traversals from one value to the other. Therefore

$m_x = 0 \cdot P(x = 0) + 1 \cdot P(x = 1) = P(x = 1) = \tfrac{1}{2}$   (4-56)

FIG. 4-2. A random telegraph wave.

We shall assume that the probability that $k$ traversals occur in a time interval of length $T$ is given by the Poisson distribution:†

$P(k,T) = \dfrac{(aT)^k}{k!}\exp(-aT)$   (4-57)

where $a$ is the average number of traversals per unit time.

Now $x_1 = x_t$ and $x_2 = x_{t-\tau}$ are both discrete random variables having the two possible values zero and one. Therefore

$R_x(\tau) = (0\cdot0)P(x_1{=}0,x_2{=}0) + (0\cdot1)P(x_1{=}0,x_2{=}1) + (1\cdot0)P(x_1{=}1,x_2{=}0) + (1\cdot1)P(x_1{=}1,x_2{=}1) = P(x_1{=}1,x_2{=}1)$

The probability that $x_1 = 1$ and that $x_2 = 1$ is the same as the probability that $x_1 = 1$ and that an even number of traversals occur between $t$ and $t - \tau$. Hence

$P(x_1{=}1,x_2{=}1) = P(x_1{=}1,\,k\ \text{even}) = P(x_1{=}1)P(k\ \text{even})$

† A derivation of the Poisson distribution is given in Chap. 7, Art. 7-2.
since the probability that $k$ assumes any particular value is independent of the value of $x_1$. Since $x_1$ assumes the values zero and one with equal probability,

$R_x(\tau) = \tfrac{1}{2}P(k\ \text{even})$

Then, using the Poisson distribution for $k$,

$R_x(\tau) = \frac{1}{2}\sum_{k=0,\,k\ \text{even}}^{\infty} \frac{(a|\tau|)^k}{k!}\exp(-a|\tau|)$

We may evaluate the series as follows:

$\sum_{k\ \text{even}} \frac{(a|\tau|)^k}{k!} = \frac{1}{2}\left[\sum_{k=0}^{\infty}\frac{(a|\tau|)^k}{k!} + \sum_{k=0}^{\infty}\frac{(-a|\tau|)^k}{k!}\right] = \frac{1}{2}\left[\exp(a|\tau|) + \exp(-a|\tau|)\right]$

Using this result in the last expression for the autocorrelation function, we get

$R_x(\tau) = \tfrac{1}{4}\left[1 + \exp(-2a|\tau|)\right]$   (4-58)

as an expression for the autocorrelation function of the random telegraph wave. A plot is given in Fig. 4-3.

FIG. 4-3. Autocorrelation function of the random telegraph wave.

The limiting values of this autocorrelation function are

$R_x(0) = \tfrac{1}{2} = E(x_t^2)$   (4-59)

for $\tau = 0$, and

$R_x(\infty) = \tfrac{1}{4} = m_x^2$   (4-60)

for $\tau = \infty$. The right-hand equality in Eq. (4-60) follows from Eq. (4-56).
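As a check on Eq. (4-58), the sketch below simulates an ensemble of random telegraph waves on a fine time grid (each grid step flips the wave with probability $\approx a\,\Delta t$, approximating Poisson traversals) and compares the estimated $E[x(t)x(t-\tau)]$ with $\tfrac{1}{4}[1 + \exp(-2a|\tau|)]$. The grid step, lag, and ensemble size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
a = 1.0            # average number of traversals per unit time
dt = 0.01          # grid step; a flip occurs in a step with probability ~ a*dt
n_steps = 400      # waves are simulated over [0, 4]
n_waves = 20_000   # ensemble size

flips = rng.random((n_waves, n_steps)) < a * dt
start = rng.integers(0, 2, size=(n_waves, 1))   # 0 or 1, equally likely, at t = 0
x = (start + np.cumsum(flips, axis=1)) % 2      # sample functions on the grid

tau = 2.0
lag = int(round(tau / dt))
R_est = np.mean(x[:, lag] * x[:, 0])            # estimate of E[x(t) x(t - tau)]
R_theory = 0.25 * (1.0 + np.exp(-2.0 * a * tau))
print(R_est, R_theory)
```

The two printed values should agree to within the Monte Carlo fluctuation of a few thousandths; at $\tau = 0$ the same estimator reproduces $R_x(0) = \tfrac{1}{2}$, and for large $a\tau$ it settles near $m_x^2 = \tfrac{1}{4}$, as in Eqs. (4-59) and (4-60).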
4-6. Convergence

Let us digress from our discussion of averages per se long enough to derive a few results needed in the remainder of the book. In this article we shall derive an important inequality and then consider the problem of convergence of a sequence of random variables. In the following article we shall discuss integrals of random processes. In Art. 4-8 we shall return to the topic of averages and consider time averages and their relation to statistical averages.

The Chebyshev Inequality. Suppose that $y$ is an arbitrary random variable with a probability density function $p(y)$ such that

$E(|y|^2) = \int_{-\infty}^{+\infty} |y|^2 p(y)\,dy < \infty$

Since both $|y|^2$ and $p(y)$ are nonnegative,

$E(|y|^2) \ge \int_{|y| \ge \epsilon} |y|^2 p(y)\,dy$

where $\epsilon$ is an arbitrary positive number. Further, since $|y|^2 \ge \epsilon^2$ at every point in the region of integration,

$E(|y|^2) \ge \epsilon^2 \int_{|y| \ge \epsilon} p(y)\,dy$

The integral here is the probability that $|y| \ge \epsilon$. Therefore, on solving for this probability, we obtain

$P(|y| \ge \epsilon) \le \dfrac{E(|y|^2)}{\epsilon^2}$   (4-61)

In particular, if $y$ is taken to be the difference between a random variable $x$ and its mean $m_x$, we get the Chebyshev inequality

$P(|x - m_x| \ge \epsilon) \le \dfrac{\sigma_x^2}{\epsilon^2}$   (4-62)
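A quick empirical illustration of the bound (4-62) follows (a sketch; the exponential distribution is an arbitrary test case, chosen because it has $m_x = 1$ and $\sigma_x^2 = 1$, so the bound is simply $1/\epsilon^2$):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)   # m_x = 1, sigma_x^2 = 1

for eps in (1.0, 2.0, 3.0):
    p_tail = np.mean(np.abs(x - 1.0) >= eps)     # P(|x - m_x| >= eps)
    bound = 1.0 / eps**2                         # sigma_x^2 / eps^2
    print(eps, p_tail, bound)                    # the bound always holds
```

For this distribution the bound is loose (the true tail probability decays exponentially), which reflects the generality of Chebyshev's inequality: it uses only the variance, not the shape of $p(y)$.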
Convergence in Mean and in Probability. On a number of subsequent occasions it will be of importance to know whether or not a sequence of random variables $x_n$ converges to a random variable $x$, and if so, in what sense. We shall define here several kinds of convergence and show some of their interrelations.

Suppose that $E(|x_n|^2) < \infty$ for all $n$ and that $E(|x|^2) < \infty$. The sequence of random variables $x_n$ is said to converge in the mean to the random variable $x$ if

$\lim_{n\to\infty} E(|x_n - x|^2) = 0$   (4-63)

in which case we write

$\underset{n\to\infty}{\mathrm{l.i.m.}}\ x_n = x$   (4-64)

where "l.i.m." denotes limit in the mean.

If the sequence of random variables $x_n$ is such that

$\lim_{n\to\infty} P[|x_n - x| \ge \epsilon] = 0$   (4-65)

for arbitrary $\epsilon > 0$, that sequence of random variables $x_n$ is said to converge in probability to the random variable $x$, and we write

$\underset{n\to\infty}{p\ \mathrm{lim}}\ x_n = x$   (4-66)
It then follows from Eqs. (4-61), (4-63), and (4-65), on setting $y = x - x_n$ in Eq. (4-61), that if a sequence of random variables $x_n$ converges in the mean to a random variable $x$, that sequence also converges in probability to $x$.
Convergence in Distribution. It can next be shown that, if a sequence of random variables $x_n$ converges in probability to a random variable $x$, then the probability distribution functions of the $x_n$ converge to the probability distribution function of $x$ at every point of continuity of the latter. The random variables $x_n$ are then said to converge in distribution to the random variable $x$.

Consider first the probability distribution function $P(x_n \le X)$. The event $(x_n \le X)$ may occur either when $(x \le X + \epsilon)$ or when $(x > X + \epsilon)$. Since the latter events are mutually exclusive,

$P(x_n \le X) = P(x_n \le X,\,x \le X + \epsilon) + P(x_n \le X,\,x > X + \epsilon)$

Similarly,

$P(x \le X + \epsilon) = P(x \le X + \epsilon,\,x_n \le X) + P(x \le X + \epsilon,\,x_n > X)$

Subtraction then gives

$P(x_n \le X) - P(x \le X + \epsilon) = P(x_n \le X,\,x > X + \epsilon) - P(x_n > X,\,x \le X + \epsilon) \le P(x_n \le X,\,x > X + \epsilon)$

If $(x_n \le X)$ and $(x > X + \epsilon)$, then $(|x_n - x| > \epsilon)$. This is one way in which the event $(|x_n - x| > \epsilon)$ can occur; hence

$P(x_n \le X,\,x > X + \epsilon) \le P(|x_n - x| > \epsilon)$

Therefore, on defining $\delta_n = P(|x_n - x| > \epsilon)$,

$P(x_n \le X) \le P(x \le X + \epsilon) + \delta_n$

In a similar manner it can be shown that

$P(x \le X - \epsilon) - \delta_n \le P(x_n \le X)$

and hence that

$P(x \le X - \epsilon) - \delta_n \le P(x_n \le X) \le P(x \le X + \epsilon) + \delta_n$

Now $\delta_n \to 0$ as $n \to \infty$, since $x_n$ converges in probability to $x$. Hence

$P(x \le X - \epsilon) \le \lim_{n\to\infty} P(x_n \le X) \le P(x \le X + \epsilon)$

for every $\epsilon > 0$. It therefore follows that at every point of continuity of $P(x \le X)$,

$\lim_{n\to\infty} P(x_n \le X) = P(x \le X)$   (4-67)

which was to be shown.
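The chain "convergence in mean implies convergence in probability" can be watched numerically. In the sketch below (an illustration, not from the text), $x_n$ is the mean of $n$ independent uniform variables, which converges in the mean to $0.5$ since $E(|x_n - 0.5|^2) = 1/(12n) \to 0$; the tail probability $P(|x_n - 0.5| \ge \epsilon)$ shrinks accordingly.

```python
import numpy as np

rng = np.random.default_rng(4)
eps = 0.05
results = []
for n in (10, 100, 1000):
    # 10,000 realizations of x_n, the mean of n independent uniform variables
    xn = rng.uniform(0.0, 1.0, size=(10_000, n)).mean(axis=1)
    mean_sq = np.mean((xn - 0.5) ** 2)        # E|x_n - 0.5|^2, theoretically 1/(12n)
    p_dev = np.mean(np.abs(xn - 0.5) >= eps)  # P(|x_n - 0.5| >= eps)
    results.append((n, mean_sq, p_dev))
    print(n, mean_sq, p_dev)
```

Both columns decrease with $n$, the first like $1/n$, in agreement with Eqs. (4-63) and (4-65).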
4-7. Integrals of Random Processes

We shall use integrals of random processes frequently. They arise naturally in many places; for example, as we shall see in Chap. 9, if a system operates on an input function in a way which involves integration and we want to consider what happens when the input is a noise wave, we must deal with integrals of random processes. It is clear what the integration of a random process should mean in such an example; a particular noise input is a sample function of its random process, so the integration of the process should amount to ordinary integration of functions when a particular sample function is considered. For each sample function of the random process in the integrand, the value of the integral is a number; but this number is generally different for each sample function, and over the ensemble of sample functions which make up the random process, the integral takes on a whole set of values. The probability that these values lie in a certain range is equal to the probability of having a sample function in the integrand which would yield a value in this range. Thus we can assign a probability law in a natural way to the values of the integral; that is, the integral of the random process may be considered to be a random variable. In symbols, if we temporarily denote a random process by $x(s,t)$, where $s$ is the probability variable taking values in a sample space $S$ and $t$ is time, we may write

$y(s) = \int_a^b x(s,t)\,dt$   (4-68)

For each $s$ this may be regarded as an ordinary integral (of a sample function). Since $s$ ranges over $S$, $y(s)$ is a function on $S$, i.e., a random variable.

It can, in fact, be shown that under reasonable conditions it is possible to treat the integral of a random process as the ensemble of integrals of its sample functions, so that it has the interpretation which we have just discussed. In particular,† under an appropriate measurability condition, if

$\int_a^b E[|x(s,t)|]\,dt < \infty$   (4-69)

then all the sample functions except a set of probability zero are absolutely integrable, and, in addition,

$E\left[\int_a^b x(s,t)\,dt\right] = \int_a^b E[x(s,t)]\,dt$   (4-70)

The limits $a$ and $b$ may be finite or infinite. The measurability condition is one which we may fairly assume to be satisfied in practice. Thus we are free to consider integrals of the sample functions of a random process

† Cf. Doob (II, Theorem 2.7).
whenever the mean value of the process is integrable. Further, we can actually calculate averages involving these integrals by use of Eq. (4-70). It is customary to suppress the probability variable $s$, so Eq. (4-68) is written simply

$y = \int_a^b x(t)\,dt$   (4-71)

and Eq. (4-70) is

$E(y) = \int_a^b E(x_t)\,dt$   (4-72)

Often we have to deal with the integral of a random process weighted by some function. For example, suppose we have

$y = \int_a^b h(t)x(t)\,dt$   (4-73)

where $h(t)$ is a real- or complex-valued function of $t$. Then, according to the theorem referred to, this integral exists with probability one if

$\int_a^b E[|h(t)x(t)|]\,dt = \int_a^b |h(t)|\,E[|x(t)|]\,dt < \infty$   (4-74)

If the weighting function is also a function of a parameter, say another real variable $\tau$, then we have

$y(\tau) = \int_a^b h(t,\tau)x(t)\,dt$   (4-75)

which defines $y(\tau)$ as a random process.
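Equation (4-70) and its weighted form can be checked by Monte Carlo. In the sketch below, the process $x(t) = 1 + u\sin t$ with $u \sim N(0,1)$ and the weight $h(t) = e^{-t}$ on $[0,\pi]$ are arbitrary illustrative choices; the average of the sample-function integrals agrees with the integral of the mean, here $1 - e^{-\pi} \approx 0.957$.

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0.0, np.pi, 501)
h = np.exp(-t)

def trap(f):
    # trapezoidal integration over the grid t, applied along the last axis
    return np.sum(0.5 * (f[..., 1:] + f[..., :-1]) * np.diff(t), axis=-1)

u = rng.standard_normal(size=(20_000, 1))
x = 1.0 + u * np.sin(t)                 # 20,000 sample functions x(t) = 1 + u sin t

lhs = np.mean(trap(h * x))              # E[ integral of h(t) x(t) dt ]
rhs = trap(h * np.mean(x, axis=0))      # integral of h(t) E[x(t)] dt
print(lhs, rhs)                         # both ~ 1 - exp(-pi) ~ 0.957
```

Because integration and ensemble averaging are both linear, the two computed numbers agree essentially exactly; the interchange in Eq. (4-70) is what licenses this for the underlying process, subject to the integrability condition (4-69).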
4-8. Time Averages

In Art. 4-1 we defined the statistical average of a random process with sample functions $x(t)$ to be the function of time $E(x_t)$. This is an average "across the process"; for each $t_o$ it is the mean value of that random variable which describes the possible values which can be assumed by the sample functions at $t = t_o$. It seems natural to consider also averages "along the process," that is, time averages of individual sample functions, and to inquire what relation these averages have to the statistical average. Since in most cases of interest to us the sample functions extend in time to infinity, we define the time average $\langle x(t)\rangle$ of a sample function $x(t)$ by

$\langle x(t)\rangle = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{+T} x(t)\,dt$   (4-76)

if this limit exists. It should be noted that $\langle x(t)\rangle$ may not exist (i.e., the finite average may not converge) for all or even for any of the sample functions, and that even if the time averages do exist they may be different for different sample functions. Because of these facts and because
the statistical average of a nonstationary random process is not generally a constant function of time, we cannot hope to say in general that "time average equals statistical average." Yet in certain cases it seems as though time averages of almost all the sample functions ought to exist and be equal to a constant statistical average.

For example, suppose that the voltage of a noisy diode operating under fixed conditions in time is observed for a long period of time $T$ and that its time average is approximated by measuring the voltage at times $kT/K$, $k = 1, \ldots, K$, where $K$ is very large, and by averaging over this set of $K$ values. Suppose also that the voltages of $K$ other diodes, identical to the first and operating simultaneously under identical conditions, are measured at one instant and averaged. If in the first case $T/K$ is sufficiently large that the noise voltages measured at times $T/K$ apart have very little interdependence, it appears there is no physical mechanism causing the $K$ measurements on the single diode to average either higher or lower than the simultaneous measurements on the $K$ diodes, and we would expect all the time averages and statistical averages to be equal (in the limit). Note that in this example there is statistical stationarity.
Satisfactory statements of the connection between time and statistical averages have in fact been obtained for stationary random processes. The most important result is the ergodic theorem† of Birkhoff, which states, primarily, that for a stationary random process, the limit $\langle x(t)\rangle$ exists for every sample function except for a set of probability zero. Hence we are on safe ground when we speak of the time averages of the sample functions of a stationary process. The time average of a stationary random process is a random variable, but the ergodic theorem states in addition that under a certain condition called ergodicity it is equal with probability one to the constant statistical average $E(x_t)$. A precise description of the ergodicity condition is beyond the scope of this book; however, its essence is that each sample function must eventually take on nearly all the modes of behavior of each other sample function in order for a random process to be ergodic. One simple condition which implies ergodicity for the important class of gaussian random processes with continuous autocorrelation functions is‡

$\int_{-\infty}^{+\infty} |R(\tau)|\,d\tau < \infty$   (4-77)

where $R(\tau)$ is the autocorrelation function of the random process and the

† See Khinchin (I, Chaps. 2 and 3), Doob (II, Chaps. 10 and 11, Arts. 1 and 2), or Loève (I, Chap. 9).
‡ Condition (4-77) may be inferred from Doob (II, Chap. 11, Arts. 1 and 8). See also Grenander (I, Art. 5.10). (See Art. 8-4 of this book for the definition of a gaussian random process.)
process is assumed to be stationary with zero mean. In general, however, no condition on $R(\tau)$ can guarantee ergodicity.

It should be noted that if a random process is ergodic, any function of the process (satisfying certain measurability conditions) is also an ergodic random process and therefore has equal time and statistical averages. In particular, if the random process with sample functions $x(t)$ is ergodic, then

$E(x_t^2) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{+T} x^2(t)\,dt$

with probability one.
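For a concrete and standard ergodic example, take the random-phase cosine $x(t) = \cos(t + \theta)$ with $\theta$ uniform over $[0, 2\pi)$. The sketch below compares the time average of $x^2(t)$ along one sample function with the ensemble average $E(x_t^2)$ at a fixed instant; both come out near $\tfrac{1}{2}$, as the ergodic equality above requires.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1000.0, 100_000)

# Time average of x^2(t) along a single sample function.
theta = rng.uniform(0.0, 2.0 * np.pi)
time_avg = np.mean(np.cos(t + theta) ** 2)

# Ensemble average E(x_t^2) at one fixed instant, across many sample functions.
thetas = rng.uniform(0.0, 2.0 * np.pi, size=100_000)
ensemble_avg = np.mean(np.cos(t[0] + thetas) ** 2)

print(time_avg, ensemble_avg)   # both ~ 0.5
```

Contrast this with Probs. 21 and 22 at the end of the chapter, where the random-amplitude analogues fail to be ergodic and the two averages disagree.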
Convergence in Probability. Although we cannot prove the ergodic theorem here, we can prove a lesser statement relating time and statistical averages. Let $x(t)$ be a sample function of a wide-sense stationary random process with a finite mean $m_x$ and a finite variance $\sigma_x^2$. The finite-interval time average $A_x(T)$ of a sample function is defined by the equation

$A_x(T) = \frac{1}{2T}\int_{-T}^{+T} x(t)\,dt$   (4-78)

Then, if the limit exists,

$\lim_{T\to\infty} A_x(T) = \langle x(t)\rangle$   (4-79)

For fixed $T$, as $x(t)$ is allowed to range over all possible sample functions, $A_x(T)$ defines a random variable which we shall also denote by $A_x(T)$. We shall show below that if the autocorrelation function of the random process is absolutely integrable about the square of the mean of the process, i.e., if

$\int_{-\infty}^{+\infty} |R_x(\tau) - m_x^2|\,d\tau < \infty$   (4-80)

then the random variables $A_x(T)$ converge in probability to $m_x$ as $T \to \infty$.
First, by taking the statistical average of Eq. (4-78) and interchanging the order of averaging and integration, we find that

$E[A_x(T)] = \frac{1}{2T}\int_{-T}^{+T} E(x_t)\,dt = m_x$   (4-81)

since $E(x_t) = m_x$. As this result is independent of $T$, it follows that

$E[\langle x(t)\rangle] = m_x$   (4-82)

i.e., the statistical average of the time averages of the sample functions of a wide-sense stationary random process is equal to the mean of that process.
The variance of $A_x(T)$ may now be obtained as follows: From Eqs. (4-78) and (4-81), interchanging the order of averaging and integration, we obtain

$\sigma^2[A_x(T)] = \frac{1}{4T^2}\int_{-T}^{+T} dt' \int_{-T}^{+T} E(x_{t'}x_t)\,dt - m_x^2 = \frac{1}{4T^2}\int_{-T}^{+T} dt' \int_{-T}^{+T} [R_x(t' - t) - m_x^2]\,dt$

where $R_x(t' - t)$ is the autocorrelation function of the random process in question. Upon defining $\tau' = t' - t$, this becomes an integration over the area of the parallelogram in Fig. 4-4.

FIG. 4-4. Domain of integration.

Since the autocorrelation function is an even function of $\tau'$ (and is not a function of $t'$ here), the integral over the entire parallelogram is simply twice the integral over that part of the parallelogram which lies to the right of the $t'$ axis. Thus

$\sigma^2[A_x(T)] = \frac{1}{2T^2}\int_0^{2T} [R_x(\tau') - m_x^2]\,d\tau' \int_{-T+\tau'}^{+T} dt'$

and hence

$\sigma^2[A_x(T)] = \frac{1}{T}\int_0^{2T} \left(1 - \frac{\tau'}{2T}\right)[R_x(\tau') - m_x^2]\,d\tau'$   (4-83)

Since

$\left|\frac{1}{T}\int_0^{2T}\left(1 - \frac{\tau'}{2T}\right)[R_x(\tau') - m_x^2]\,d\tau'\right| \le \frac{1}{T}\int_0^{2T}\left|1 - \frac{\tau'}{2T}\right|\,|R_x(\tau') - m_x^2|\,d\tau' \le \frac{1}{T}\int_0^{2T} |R_x(\tau') - m_x^2|\,d\tau'$
and

$\frac{1}{T}\int_0^{2T} |R_x(\tau') - m_x^2|\,d\tau' \le \frac{1}{T}\int_{-\infty}^{+\infty} |R_x(\tau) - m_x^2|\,d\tau$

it therefore follows that if

$\int_{-\infty}^{+\infty} |R_x(\tau) - m_x^2|\,d\tau < \infty$   (4-84)

then

$\lim_{T\to\infty} \sigma^2[A_x(T)] = 0$   (4-85)

and hence, from Art. 4-6, that

$\underset{T\to\infty}{\mathrm{l.i.m.}}\ A_x(T) = m_x$   (4-86)
and
$\underset{T\to\infty}{p\ \mathrm{lim}}\ A_x(T) = m_x$   (4-87)

It should be noted that this absolute-integrability condition is a sufficient condition for the limit in Eq. (4-85), but it may not be a necessary one.
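The rate at which $A_x(T)$ settles onto $m_x$ can be checked against Eq. (4-83) for the random telegraph wave of this chapter, where $R_x(\tau) - m_x^2 = \tfrac{1}{4}e^{-2a|\tau|}$, so that $\sigma^2[A_x(T)] \approx 1/(8aT)$ for large $T$. The simulation parameters below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
a, dt, T = 1.0, 0.01, 10.0
n_steps = int(2 * T / dt)        # each sample function covers an interval of length 2T
n_waves = 5000

flips = rng.random((n_waves, n_steps)) < a * dt
x = (rng.integers(0, 2, size=(n_waves, 1)) + np.cumsum(flips, axis=1)) % 2
A = x.mean(axis=1)               # finite-time averages A_x(T), one per sample function

var_sim = A.var()
tau = dt * np.arange(n_steps)    # numerical evaluation of Eq. (4-83)
var_theory = np.sum((1.0 - tau / (2 * T)) * 0.25 * np.exp(-2.0 * a * tau)) * dt / T
print(A.mean(), var_sim, var_theory)
```

The ensemble mean of the $A_x(T)$ sits at $m_x = \tfrac{1}{2}$, in agreement with Eq. (4-81), while the two variances agree near $1/(8aT) \approx 0.012$; doubling $T$ halves them, which is the convergence in the mean asserted by Eq. (4-86).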
Time Correlation Functions. We shall define the time autocorrelation function $\mathcal{R}_x(\tau)$ of a sample function of a random process by

$\mathcal{R}_x(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{+T} x(t + \tau)x^*(t)\,dt$   (4-88)

If the random process is ergodic, the time autocorrelation function of a sample function of a random process equals the statistical autocorrelation function of that process,

$\mathcal{R}_x(\tau) = R_x(\tau)$   (4-89)

with probability one. Similarly, we shall define a time cross-correlation function $\mathcal{R}_{xy}(\tau)$ of the sample functions $x(t)$ and $y(t)$ of two random processes by

$\mathcal{R}_{xy}(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{+T} x(t + \tau)y^*(t)\,dt$   (4-90)

If then the given processes are jointly ergodic, it follows that

$\mathcal{R}_{xy}(\tau) = R_{xy}(\tau)$   (4-91)

with probability one.
Example 4-8.1. The definition equation (4-88) may be applied to an arbitrary function of time as well as to a sample function of a random process, so long as the indicated limit exists. For example, suppose that $x(t)$ is a complex periodic function of time such that its Fourier series converges. We may then write

$x(t) = \sum_{n=-\infty}^{+\infty} \alpha(jn\omega_o)\exp(jn\omega_o t)$   (4-92)

where $n$ is an integer, $\omega_o$ is the fundamental angular frequency of $x(t)$, and the coefficients $\alpha(jn\omega_o)$ are complex constants given by†

$\alpha(jn\omega_o) = \frac{\omega_o}{2\pi}\int_{-\pi/\omega_o}^{+\pi/\omega_o} x(t)\exp(-jn\omega_o t)\,dt$   (4-93)

It then follows from Eq. (4-88) that

$\mathcal{R}_x(\tau) = \lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{+T} \sum_n\sum_m \alpha(jn\omega_o)\alpha^*(jm\omega_o)\exp(jn\omega_o\tau)\exp[j(n - m)\omega_o t]\,dt$

On interchanging orders of integration and summation and using the fact that

$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{+T} \exp[j(n - m)\omega_o t]\,dt = \lim_{T\to\infty} \frac{\sin[(n - m)\omega_o T]}{(n - m)\omega_o T} = \begin{cases} 1 & \text{when } m = n \\ 0 & \text{when } m \ne n \end{cases}$

we get

$\mathcal{R}_x(\tau) = \sum_{n=-\infty}^{+\infty} |\alpha(jn\omega_o)|^2\exp(jn\omega_o\tau)$   (4-94)

When $x(t)$ is a real function of time, Eq. (4-94) becomes

$\mathcal{R}_x(\tau) = \alpha^2(0) + 2\sum_{n=1}^{+\infty} |\alpha(jn\omega_o)|^2\cos(n\omega_o\tau)$   (4-95)

Thus we see that the autocorrelation function of a periodic function of time is itself periodic, and that the periodicities of the autocorrelation function are identical with those of the original time function. Equation (4-95) further shows that all information as to the relative phases of the various components of $x(t)$ is missing (since only the magnitudes of the Fourier coefficients appear). It therefore follows that all periodic time functions which have the same Fourier-coefficient magnitudes and periodicities also have the same autocorrelation function, even though their Fourier-coefficient phases (and hence their actual time structures) may be different. This result points up the fact that the correspondence between time functions and autocorrelation functions is a "many-to-one" correspondence.

† See, for example, Guillemin (II, Chap. VII, Art. 7) or Churchill (I, Sec. 30).
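The phase blindness of Eq. (4-94) is easy to exhibit numerically: the two waves below (an arbitrary illustrative pair) have identical Fourier-coefficient magnitudes but different phases, and their time autocorrelation functions, computed circularly over one exact period, coincide.

```python
import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)

x1 = np.cos(t) + 0.5 * np.cos(3.0 * t)
x2 = np.cos(t + 1.0) + 0.5 * np.cos(3.0 * t + 2.5)   # same magnitudes, new phases

def time_autocorr(x, lag):
    # <x(t + tau) x(t)> over one full period, via a circular shift on the grid
    return np.mean(np.roll(x, -lag) * x)

lags = range(0, 2048, 97)
R1 = np.array([time_autocorr(x1, k) for k in lags])
R2 = np.array([time_autocorr(x2, k) for k in lags])
print(np.max(np.abs(R1 - R2)))   # ~0: the phases have dropped out
```

By Eq. (4-95) both autocorrelations equal $\tfrac{1}{2}\cos\tau + \tfrac{1}{8}\cos 3\tau$, even though the two waveforms themselves look quite different.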
4-9. Problems

1. The random variable $x$ has an exponential probability density function

$p(x) = a\exp(-2a|x|)$

a. Determine the mean and variance of $x$.
b. Determine the $n$th moment of $x$.

2. Prove that if $x$ is a bounded random variable, then the $n$th moment of $x$ exists for all $n$.

3. Let $x$ be a random variable and $c$ an arbitrary constant. Determine the value of $c$ which makes $E[(x - c)^2]$ a minimum.
4. The random variable $y$ is defined as the sum of $N$ random variables $x_n$:

$y = \sum_{n=1}^{N} x_n$

Derive an expression for the variance of $y$ when
a. the random variables $x_n$ are uncorrelated, and
b. when they are correlated.
Express your results in terms of the variances $\sigma^2(x_n)$.

5. The random variable $x$ may assume only nonnegative integral values, and the probability that it assumes the specific value $m$ is given by the Poisson probability distribution

$P(x = m) = \dfrac{\lambda^m\exp(-\lambda)}{m!}$   (4-96)

a. Determine the mean and variance of $x$.
b. Determine the characteristic function of $x$.

6. The discrete random variables $x$ and $y$ each have a Poisson probability distribution (as in Prob. 5). If $x$ and $y$ are statistically independent random variables, determine the probability distribution of their sum $z = x + y$.

7. Let $x$ be a gaussian random variable with the probability density function

$p(x) = \dfrac{1}{\sqrt{2\pi}\,b}\exp[-(x - a)^2/2b^2]$   (4-97)

This is also known as the normal probability density function.
a. Derive an expression for the characteristic function of $x$.
b. Using a, derive an expression for the mean and standard deviation of $x$, and express $p(x)$ in terms of these quantities and $x$.
c. Using a and b, derive an expression for the $n$th moment of $x$ for the particular case in which $x$ has a zero mean.

8. Let $x$ and $y$ be statistically independent gaussian random variables.
a. Derive an expression for the joint probability density function $p(x,y)$.
b. Derive an expression for the joint characteristic function for $x$ and $y$.

9. Let $y$ be a random variable having the probability density function

$p(y) = \begin{cases} \dfrac{y^{(n-2)/2}\exp(-y/2)}{2^{n/2}\,\Gamma(n/2)} & \text{for } y \ge 0 \\ 0 & \text{for } y < 0 \end{cases}$   (4-98)

where $n$ is a constant and $\Gamma$ is the gamma function. Normally, this probability density function is written as a function of $y = x^2$ and is called the chi-squared probability density function of $n$ degrees of freedom.
a. Determine the mean of $y$.
b. Determine the characteristic function of $y$.
10. Let $x$ and $y$ be statistically independent random variables, each having a chi-squared probability density function. Let the number of degrees of freedom be $n$ for $x$ and $m$ for $y$. Derive an expression for the probability density function of their sum $z = x + y$.

11. Let $x$ be a gaussian random variable with zero mean and standard deviation of unity. Consider the results of making $N$ statistically independent measurements $x_n$ of the random variable $x$. A new random variable $y$ is then used to characterize this combined experiment and is defined by the equation

$y = \sum_{n=1}^{N} x_n^2$

Derive an expression for the probability density function of $y$.

12. Let $z$ be a random variable such that

$p(z) = \begin{cases} \dfrac{1}{2\pi} & \text{for } 0 \le z \le 2\pi \\ 0 & \text{otherwise} \end{cases}$

and let the random variables $x$ and $y$ be defined by the equations

$x = \sin z$   and   $y = \cos z$

Show that the random variables $x$ and $y$ are linearly independent but not statistically independent.

13. Let the random variable $y$ be defined as the sum of $N$ statistically independent random variables $x_n$:

$y = \sum_{n=1}^{N} x_n$

where each of the random variables $x_n$ has two possible values: unity, with probability $p$, and zero, with probability $q = 1 - p$.
a. Determine the characteristic function of $x_n$.
b. Determine the characteristic function of $y$.
c. Determine the probability distribution for $y$.

14. Let $x$ and $y$ be discrete random variables each having the two equally probable values $+1$ and $-1$.
a. Show that their joint probabilities are symmetric; i.e., show that

$P(x = 1, y = 1) = P(x = -1, y = -1)$
and
$P(x = 1, y = -1) = P(x = -1, y = 1)$

b. Derive expressions for the joint probabilities in terms of the correlation coefficient $\rho_{xy}$ relating $x$ and $y$.
15. Consider the three stationary random processes having the sample functions $x(t)$, $y(t)$, and $z(t)$, respectively. Derive an expression for the autocorrelation function of the sum of these three processes under the assumptions
a. that the processes are correlated,
b. that the processes are all uncorrelated,
c. that the processes are uncorrelated and all have zero means.
16. Let $x(t)$ be a periodic square-wave function of time as shown in Fig. 4-5. Derive an expression for the autocorrelation function of $x(t)$.

FIG. 4-5. Periodic square wave.
17. Consider an ensemble of pulsed sample functions defined by the equation

$x(t) = \begin{cases} a_n & \text{for } n \le t < n + 1 \\ 0 & \text{for } n - 1 \le t < n \end{cases}$

where $n$ may assume odd integral values only and where the coefficients $a_n$ are statistically independent random variables each having the two equally likely values zero and one. Determine and plot $\langle R_x(t, t + \tau)\rangle$.

18. Repeat Prob. 17 for an ensemble in which the coefficients $a_n$ are statistically independent random variables each having the two equally probable values plus one and minus one.
19. Let $x(t)$ be a bounded periodic function of time with a fundamental period of $T_1$ seconds. Prove that

$\lim_{T\to\infty} \frac{1}{2T}\int_{-T}^{+T} x(t)\,dt = \frac{1}{T_1}\int_0^{T_1} x(t)\,dt$
20. Consider the random process defined by the sample functions

$y(t) = a\cos(t + \phi)$

where $a$ and $\phi$ are statistically independent random variables and where

$p(\phi) = \begin{cases} \dfrac{1}{2\pi} & \text{for } 0 \le \phi \le 2\pi \\ 0 & \text{otherwise} \end{cases}$

a. Derive an expression for the autocorrelation function of this process.
b. Show that $E(y_t) = \langle y(t)\rangle$.
21. Consider the random process defined by the sample functions

$x(t) = y^2(t)$

where the sample functions $y(t)$ are those defined in Prob. 20.
a. Show that if $a$ is not constant with probability one,

$E(x_t) \ne \langle x(t)\rangle$

b. Show that the integrability condition stated by Eq. (4-80) is not satisfied by this random process.
22. Consider the constant random process defined by the sample functions

$x(t) = a$

where $a$ is a random variable having the two possible values $+1$ and $-1$ with the probabilities $p$ and $q$, respectively.
a. Determine the autocorrelation function of this process.
b. Show by direct evaluation that

$E(x_t) \ne \langle x(t)\rangle$

c. Show that the integrability condition stated by Eq. (4-80) is not satisfied by this random process.
23. Consider the stationary (strict-sense) random process whose autocorrelation function is $R_x(\tau)$. Assuming that the various following derivatives exist, show that

24. Let $x(t)$ and $y(t)$ be periodic functions of time with incommensurable fundamental periods and zero average values. Show that their time cross-correlation function is zero.

25. Let $x(t)$ and $y(t)$ be different periodic functions of time having the same fundamental period. Show that their cross-correlation function contains only those harmonics which are present in both $x(t)$ and $y(t)$.
CHAPTER 5

SAMPLING

5-1. Introduction
So far we have assumed that the probability distributions of given
random variables, or random processes, were known a priori and that
the problem was to calculate various functions (e.g., moments, charac
teristic functions, and so forth) of those probability distributions. In
practical situations, however, we often face a quite different problem.
After making a set of measurements of a random variable, or random
process, having unknown probability distribution functions, we may ask
whether or not some of the statistical properties of that random variable
or process can be determined from the results of those measurements.
For example, we might wish to know if we can determine from measure
ments the value of the mean or variance.
Consider now the results of N performances of the basic experiment
which defines a given random variable; or, alternatively, consider the
results of measuring the value of a sample function of a given random
process at N different instants of time. The set of N sample values so
obtained specifies the result of such a combined experiment and hence
a sample point in the $N$-dimensional sample space which characterizes
the combined experiment. Since a given set of results determines only
a single point in the joint sample space, we should not expect that it
would completely determine the statistical properties of the random vari
able or random process in question. At best we can hope only to get
an estimate of those properties. A basic problem, then, is to find a function, or statistic, of the results of the measurements which will enable us to obtain a good estimate of the particular statistical property in which
we are interested.
In this chapter we shall introduce some of the principles of the theory
of sampling by considering the problem of estimating the mean value
of a random variable or process and by studying further the relation
between relative frequency and probability. For a detailed discussion of the field of sampling theory the reader is referred to statistical literature.†
† See for example Cramér (I, Part III).
5-2. The Sample Mean
Since the problem of estimating the mean of a random variable is essentially identical to that of estimating the mean of a wide-sense stationary random process, we shall consider both problems at the same time. Let $x$ be a real random variable with the finite mean $m_x$ and the finite variance $\sigma_x^2$, and let $x(t)$ be the sample functions of a wide-sense stationary real random process which has the same finite mean $m_x$ and finite variance $\sigma_x^2$. Suppose now that we make $N$ measurements of either the random variable or a sample function of the random process. Let $x(n)$, where $n = 1, \ldots, N$, be the value of the $n$th sample of the random variable; let $x(t_n)$ be the value of a sample function of the random process at $t = t_n$; and let $x_n$ be the random variable which describes the possible values that can be taken on by either $x(n)$ or $x(t_n)$. Note that $x_n$ has the same statistical properties as $x$ or as $x_t$. In particular,

$$E(x_n) = m_x \qquad \text{and} \qquad \sigma^2(x_n) = \sigma_x^2 \tag{5-1}$$

for all $n$.
We pointed out in Art. 4-1 that the mean of a random variable was a conceptual extension of the arithmetic average of a sample function. With this fact in mind, let us consider the sample mean $\hat{m}$, defined to be the arithmetic average of the $N$ random variables $x_n$,

$$\hat{m} = \frac{1}{N}\sum_{n=1}^{N} x_n \tag{5-2}$$

as a statistic to be used in the estimation of the desired mean. The statistical average of this sample mean is

$$E(\hat{m}) = \frac{1}{N}\sum_{n=1}^{N} E(x_n) = m_x \tag{5-3}$$

i.e., the statistical average of the sample mean is equal to the mean value of the random variable (or process) under study. A statistic whose expectation equals the quantity being estimated is said to be an unbiased estimator; the sample mean is therefore an unbiased estimator for the mean.
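Eq. (5-3) is easy to illustrate numerically. The following sketch is our own (it is not from the text, and the distribution, sample size, and number of trials below are arbitrary choices): averaging the sample-mean statistic over many repeated experiments yields a value close to the true mean, as unbiasedness requires.

```python
import random
import statistics

random.seed(1)

TRUE_MEAN = 3.0     # m_x of the sampled random variable (arbitrary choice)
N = 10              # samples per experiment
TRIALS = 20000      # number of repeated experiments

# One sample mean: the arithmetic average of N independent samples, Eq. (5-2).
def sample_mean(n):
    return sum(random.gauss(TRUE_MEAN, 2.0) for _ in range(n)) / n

# Average the statistic over many repeated experiments to estimate E(m-hat).
estimates = [sample_mean(N) for _ in range(TRIALS)]
avg_of_means = statistics.fmean(estimates)

print(avg_of_means)   # settles near TRUE_MEAN, illustrating unbiasedness
```

Note that each individual estimate still scatters about the mean; only the *expectation* of the statistic equals $m_x$.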
5-3. Convergence of the Sample Mean
The sample mean is not the only unbiased estimator of the desired mean; the single-sample random variable $x_n$ is also unbiased, as shown by Eq. (5-1), and there are others as well. Whether or not any given statistic is good depends mainly on how likely it is that its sample value is close to the actual value of the statistical property in question. One
statistic is better than another if its sample value is more likely to be close to the desired value. Accordingly, we shall now study the convergence properties of the sample mean.
The variance of the sample mean is, from Eqs. (5-2) and (5-3),

$$\sigma^2(\hat{m}) = \left[\frac{1}{N^2}\sum_{n=1}^{N}\sum_{m=1}^{N} E(x_n x_m)\right] - m_x^2 \tag{5-4}$$
where we have interchanged the operations of statistical averaging and summation. A somewhat more convenient form is

$$\sigma^2(\hat{m}) = \frac{1}{N^2}\sum_{n=1}^{N}\sum_{m=1}^{N}\left[E(x_n x_m) - m_x^2\right]$$
In order to proceed further, we must have data as to the correlation between the $n$th and $m$th samples, i.e., as to $E(x_n x_m)$. For sampling of a random process, this correlation is simply the autocorrelation function of that process:

$$E(x_n x_m) = R_x(t_m - t_n)$$

For sampling of a random variable, however, no statistical property of the given random variable alone gives us the required information; the correlation must be obtained in one way or another from a knowledge of the method of sampling.
To see the effect of the correlation between samples, suppose first that the samples are so highly correlated that, approximately,

$$E(x_n x_m) = E(x_n^2) \tag{5-5}$$

for all values of $n$ and $m$. In this case, Eq. (5-4) reduces approximately to

$$\sigma^2(\hat{m}) = \frac{1}{N^2}\sum_{n=1}^{N}\sum_{m=1}^{N}\sigma_x^2 = \sigma_x^2 \tag{5-6}$$

Therefore, when the samples are highly correlated, the variance of the sample mean is approximately equal to the variance of the random variable (or process) under study, whatever the number of samples. In this case, then, a single measurement provides just as good (or bad) an estimate of the desired mean as does any other number of measurements. Whether a good estimate is obtained or not here depends strictly on the properties of the random variable (or process) under study.
Uncorrelated Samples. Next let us suppose that the various samples are uncorrelated. In this case,

$$E(x_n x_m) = E(x_n)E(x_m) = m_x^2 \qquad n \ne m \tag{5-7}$$
and all the terms of the double summation in Eq. (5-4) vanish except those for which $n = m$. There are $N$ of these terms, and for each the bracketed expression is simply $\sigma_x^2$. Equation (5-4) therefore reduces to

$$\sigma^2(\hat{m}) = \frac{\sigma_x^2}{N} \tag{5-8}$$

when the samples are uncorrelated. Such samples will be obtained, in particular, when the successive basic experiments are statistically independent.
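The $1/N$ behavior of Eq. (5-8) can be checked directly. In the sketch below (our own construction; the distribution and sample sizes are arbitrary), the empirical variance of the sample mean over many repeated experiments falls off in proportion to $1/N$ when the samples are independent:

```python
import random
import statistics

random.seed(2)

SIGMA2 = 4.0   # variance of each sample (sigma_x = 2.0); arbitrary choice
TRIALS = 20000

def var_of_sample_mean(n):
    # Empirical variance of the sample mean over many repeated experiments.
    means = [sum(random.gauss(0.0, 2.0) for _ in range(n)) / n
             for _ in range(TRIALS)]
    return statistics.pvariance(means)

v10 = var_of_sample_mean(10)    # Eq. (5-8) predicts SIGMA2 / 10 = 0.4
v40 = var_of_sample_mean(40)    # Eq. (5-8) predicts SIGMA2 / 40 = 0.1
print(v10, v40)
```

Quadrupling the number of independent samples cuts the variance of the estimate by a factor of four, i.e., it halves the typical error.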
If now we allow the number of measurements to increase without limit, it follows from Eq. (5-8) that

$$\lim_{N\to\infty} \sigma^2(\hat{m}) = 0 \tag{5-9}$$
and hence from Art. 4-6 that the sample mean converges in the mean, and hence in probability, to the desired mean:

$$\operatorname*{l.i.m.}_{N\to\infty}\ \hat{m} = m_x \tag{5-10a}$$

and

$$\operatorname*{p\ lim}_{N\to\infty}\ \hat{m} = m_x \tag{5-10b}$$

Such an estimator is said to be a consistent estimator.
According to Eq. (5-10), which is one form of the law of large numbers, the sample mean becomes a better and better estimator of the desired mean as the number of measurements increases without limit. By this we mean strictly what Eq. (5-10b) says: the probability that the sample mean differs from the desired mean by more than a fixed quantity becomes smaller and smaller as the number of measurements increases. However, it is perfectly possible that the value of the sample mean as obtained from a particular sequence of measurements continues to differ from the desired mean by more than some fixed amount as $N \to \infty$.
An estimate of the expected measurement error for finite values of $N$ may be obtained from the Chebyshev inequality, Eq. (4-62):

$$P[|\hat{m} - E(\hat{m})| \ge \epsilon] \le \frac{\sigma^2(\hat{m})}{\epsilon^2}$$

Using the above expressions for the statistical average and variance of the sample mean, this inequality becomes

$$P[|\hat{m} - m_x| \ge \epsilon] \le \frac{\sigma_x^2}{N\epsilon^2} \tag{5-11}$$

This result holds regardless of the particular probability distribution of the random variable or process being sampled and hence may well give only an excessively high upper bound to the expected error.
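That the bound of Eq. (5-11) is distribution-free but loose is easy to see by simulation. In this sketch (our own; the values of $N$, $\epsilon$, and the gaussian sampling distribution are arbitrary choices), the empirical probability of a large deviation of the sample mean sits well below the Chebyshev bound:

```python
import random

random.seed(3)

SIGMA2 = 1.0   # variance of each sample
N = 25
EPS = 0.3
TRIALS = 20000

bound = SIGMA2 / (N * EPS * EPS)   # Chebyshev upper bound, Eq. (5-11)

# Empirical frequency of |m_hat - m_x| >= eps, with m_x = 0.
hits = 0
for _ in range(TRIALS):
    m_hat = sum(random.gauss(0.0, 1.0) for _ in range(N)) / N
    if abs(m_hat) >= EPS:
        hits += 1
empirical = hits / TRIALS

print(empirical, bound)   # the empirical probability is far below the bound
```

For these numbers the bound is about 0.44 while the actual deviation probability is roughly a third of that, which is the "excessively high upper bound" the text warns about.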
Periodic Sampling. For a random process, the case of greatest practical interest is probably that in which the sampling is done periodically in time. Here the $N$ measurements are spaced equally throughout a total measurement interval of length $T$, with a sampling period $t_0 = T/N$. If the measurement interval starts for convenience at $t = 0$, the $n$th sampling instant occurs at $t_n = nt_0$. Equation (5-4) then becomes

$$\sigma^2(\hat{m}) = \frac{1}{N^2}\sum_{n=1}^{N}\sum_{m=1}^{N}\left\{R_x[(m-n)t_0] - m_x^2\right\}$$
Consider now the various terms of the double summation. It is perhaps helpful to imagine an $N$-by-$N$ array of the elements $(n,m)$ making up the double sum. The following can then be observed: First, each of the principal-diagonal terms $(n,n)$ is the same and is equal to $\sigma_x^2$. Next, each element $(n,m)$ is equal to the element $(m,n)$, since

$$R_x[(m-n)t_0] = R_x[(n-m)t_0]$$

that is, every term above the principal diagonal has an equal counterpart below that diagonal. Further, each element $(n,m)$ is equal to an element $(n+j,\,m+j)$, as we are sampling a stationary random process, and hence all the $(N-k)$ elements in the $k$th diagonal above the principal diagonal are equal. It therefore follows that the variance of the sample mean is

$$\sigma^2(\hat{m}) = \frac{\sigma_x^2}{N} + \frac{2}{N}\sum_{k=1}^{N-1}\left(1 - \frac{k}{N}\right)\left[R_x(kt_0) - m_x^2\right] \tag{5-12}$$
when a wide-sense stationary random process is periodically sampled.
Let us next determine the effect of increasing the number of measurements without limit, at the same time keeping the total duration of the measurement interval fixed. Here it is convenient to rewrite Eq. (5-12) in the form

$$\sigma^2(\hat{m}) = \frac{\sigma_x^2}{N} + \frac{2}{T}\sum_{k=1}^{N-1}\left(1 - \frac{kt_0}{T}\right)\left[R_x(kt_0) - m_x^2\right]t_0$$

If we now hold $T$ fixed and allow $N \to \infty$ (and hence $t_0 \to 0$) in such a way that $kt_0 = \tau$, the summation becomes an integral and the term $\sigma_x^2/N$ vanishes. Therefore

$$\lim_{N\to\infty}\sigma^2(\hat{m}) = \frac{2}{T}\int_0^T \left(1 - \frac{\tau}{T}\right)\left[R_x(\tau) - m_x^2\right]d\tau \tag{5-13}$$

and we see that the variance of the sample mean here will not in general become vanishingly small as $N \to \infty$ unless $T$ also increases without limit.
Since when $N \to \infty$ and $T$ remains constant the sample mean becomes a finite time average, i.e., since

$$\lim_{N\to\infty}\hat{m} = \frac{1}{T}\int_0^T x(t)\,dt$$

it is not surprising that Eq. (5-13) gives the same result as that previously obtained in the study of the statistical properties of time averages (Eq. (4-83)).
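Eq. (5-12) can be put to work on a concrete correlated process. The sketch below is our own construction (not from the text): it uses a first-order autoregressive sequence, for which $R_x(kt_0) = \rho^{|k|}$ with $m_x = 0$ and $\sigma_x^2 = 1$, and compares the variance of the sample mean predicted by Eq. (5-12) against simulation.

```python
import math
import random
import statistics

random.seed(4)

RHO = 0.6     # correlation between adjacent samples (arbitrary choice)
N = 20
TRIALS = 20000

# Predicted variance of the sample mean, Eq. (5-12), with R_x(k t0) = rho^k.
predicted = 1.0 / N + (2.0 / N) * sum(
    (1.0 - k / N) * RHO**k for k in range(1, N))

def run():
    # Stationary AR(1) sequence: zero mean, unit variance, R(k) = rho^|k|.
    x = random.gauss(0.0, 1.0)
    total = 0.0
    for _ in range(N):
        total += x
        x = RHO * x + math.sqrt(1.0 - RHO * RHO) * random.gauss(0.0, 1.0)
    return total / N

empirical = statistics.pvariance([run() for _ in range(TRIALS)])
print(predicted, empirical)
```

With positively correlated samples the predicted variance exceeds the uncorrelated value $\sigma_x^2/N$, which is exactly the penalty the correlation terms in Eq. (5-12) impose.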
5-4. The Central Limit Theorem
For statistically independent samples, the probability distribution of the sample mean tends to become gaussian as the number of statistically independent samples is increased without limit, regardless of the probability distribution of the random variable or process being sampled, as long as it has a finite mean and a finite variance. This result is known as the equal-components case of the central limit theorem.
We shall now prove this result by making use of the fact that the characteristic function of a sum of statistically independent random variables is the product of their respective characteristic functions. Let $y$ be the normalized sample mean:

$$y = \frac{\hat{m} - m_x}{\sigma(\hat{m})} \tag{5-14}$$

Since the samples are statistically independent, it follows from Eq. (5-8) and the definition of the sample mean that

$$y = \frac{\dfrac{1}{N}\displaystyle\sum_{n=1}^{N}x_n - m_x}{\sigma_x/\sqrt{N}} = \frac{1}{\sqrt{N}}\sum_{n=1}^{N}\frac{x_n - m_x}{\sigma_x}$$

If then we define $\xi_n$ to be the normalized random variable $(x_n - m_x)/\sigma_x$, we may write

$$y = \frac{1}{\sqrt{N}}\sum_{n=1}^{N}\xi_n \tag{5-15}$$
The characteristic function of $y$ is therefore

$$M_y(jv) = E\left[\exp\left(\frac{jv}{\sqrt{N}}\sum_{n=1}^{N}\xi_n\right)\right] = E\left[\prod_{n=1}^{N}\exp\left(\frac{jv\,\xi_n}{\sqrt{N}}\right)\right]$$
Since the samples, and hence the normalized samples, are statistically independent random variables, all having the same probability distribution (that of the random variable or process being sampled), it follows from Eq. (4-33) that

$$M_y(jv) = E^N\left[\exp\left(\frac{jv\,\xi}{\sqrt{N}}\right)\right] = M_\xi^N\left(\frac{jv}{\sqrt{N}}\right) \tag{5-16}$$

where $\xi = (x - m_x)/\sigma_x$.
We next wish to determine the behavior of the characteristic function of the normalized sample mean as the number of samples increases without limit. To this end, let us expand the characteristic function of $\xi$ in a Taylor series with a remainder.† Thus, since $\xi$ has a zero mean and a unit variance,

$$M_\xi\left(\frac{jv}{\sqrt{N}}\right) = 1 - \frac{v^2}{2N} + A\left(\frac{v}{\sqrt{N}}\right) \tag{5-17}$$

where the remainder term $A(v/\sqrt{N})$ is given by‡

$$A\left(\frac{v}{\sqrt{N}}\right) = \frac{v^2}{2N}\left[M_\xi''\left(\frac{\theta v}{\sqrt{N}}\right) - M_\xi''(0)\right] \tag{5-18}$$

in which $M_\xi''$ is the second derivative of the characteristic function of $\xi$ and $0 \le \theta \le 1$. The existence of the second moment of $\xi$ implies the existence of a continuous second derivative§ of $M_\xi$; hence

$$\lim_{z\to 0}\frac{A(z)}{z^2} = 0 \tag{5-19}$$

in particular, as $N \to \infty$ for fixed $v$. Taking the logarithm of $M_y(jv)$ and using Eq. (5-17), we then get

$$\log M_y(jv) = N\log\left[1 - \frac{v^2}{2N} + A\left(\frac{v}{\sqrt{N}}\right)\right]$$

The logarithm on the right side of this equation can be expanded by using the expression

$$\log(1+z) = z + B(z) \tag{5-20}$$

in which the remainder term is¶

$$B(z) = -\int_0^z \frac{t}{1+t}\,dt \tag{5-21}$$

† Cf. Courant (I, Chap. VI, Art. 2).
‡ Ibid., Art. 2, Sec. 3.
§ Cf. Cramér (I, Art. 10.1).
¶ Cf. Courant (I, Chap. VI, Art. 1).
Thus we obtain

$$\log M_y(jv) = -\frac{v^2}{2} + NA\left(\frac{v}{\sqrt{N}}\right) + NB\left[-\frac{v^2}{2N} + A\left(\frac{v}{\sqrt{N}}\right)\right] \tag{5-22}$$

It follows from Eq. (5-19) that the second term on the right side of Eq. (5-22) vanishes as $N \to \infty$. The third term also vanishes as $N \to \infty$, since, from Eq. (5-21),

$$\left|\frac{B(z)}{z}\right| \le \frac{1}{z}\int_0^z t\,dt = \frac{z}{2} \to 0 \qquad \text{as } z \to 0$$

Therefore

$$\lim_{N\to\infty}\log M_y(jv) = -\frac{v^2}{2}$$

and hence

$$\lim_{N\to\infty} M_y(jv) = \exp\left(-\frac{v^2}{2}\right) \tag{5-23}$$

The right side of Eq. (5-23) is the characteristic function of a gaussian random variable which has a zero mean and a variance of unity. Since that limit function is continuous at $v = 0$, it therefore follows that†

$$\lim_{N\to\infty} p(y) = \frac{1}{(2\pi)^{1/2}}\exp\left(-\frac{y^2}{2}\right) \tag{5-24}$$

Thus the limiting form of the probability distribution of the sample mean is gaussian when the individual samples are statistically independent and have a finite mean and a finite variance.
Comments. The tendency of the probability distribution of a sum of random variables to become gaussian as the number of components increases without limit can be shown to hold under much less restrictive assumptions.‡ For example, if the components $x_n$, having means $m_n$ and variances $\sigma_n^2$, are statistically independent and if

$$\lim_{N\to\infty}\frac{\displaystyle\sum_{n=1}^{N}E\left(|x_n - m_n|^{2+\delta}\right)}{\left(\displaystyle\sum_{n=1}^{N}\sigma_n^2\right)^{(2+\delta)/2}} = 0 \tag{5-25}$$

where $\delta > 0$, then it can be shown that the normalized sum random variable has a gaussian probability distribution in the limit as $N \to \infty$. This result is known as Liapounoff's theorem.§ Even the requirement of statistical independence of the components can be waived under certain

† See for example Cramér (I, Art. 10.4).
‡ See for example Gnedenko and Kolmogorov (I, Chap. 4).
§ See Uspensky (I, Chap. 14, Secs. 2, 3, and 4) or Loève (I, Sec. 20).
conditions, and the limiting distribution of the sum will still be gaussian.† It should be emphasized, however, that the limiting distribution of a sum of random variables is not always gaussian, and each individual case should be investigated to see whether or not such a theorem applies.
The above statements refer to behavior in the limit as $N \to \infty$. It should be realized that when $N$ is finite, even though apparently large, the gaussian distribution may well give a poor approximation to the tails of the probability distribution of the sum, even though the limiting form of the sum distribution is in fact gaussian.‡ This would be true, for example, for the case of a sum of statistically independent Poisson random variables, since the probability distribution of such a sum is also Poisson for any $N$.
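The central limit tendency is easy to watch numerically. In this sketch (our own illustration; the exponential distribution, $N$, and trial count are arbitrary choices), sample means of a strongly skewed random variable are standardized as in Eq. (5-14), and the fraction falling within one standard deviation comes out near the gaussian value of about 0.683:

```python
import math
import random

random.seed(5)

N = 50          # samples per mean
TRIALS = 20000

# Exponential with rate 1: mean 1, variance 1 -- decidedly non-gaussian.
def standardized_mean():
    m_hat = sum(random.expovariate(1.0) for _ in range(N)) / N
    return (m_hat - 1.0) / (1.0 / math.sqrt(N))   # normalization of Eq. (5-14)

ys = [standardized_mean() for _ in range(TRIALS)]
frac_within_1 = sum(1 for y in ys if abs(y) <= 1.0) / TRIALS

print(frac_within_1)
```

For small $N$ the skewness of the exponential would still be visible, especially in the tails, in line with the caution in the Comments above.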
5-5. Relative Frequency
An important special case of sampling is that in which the random variable or process being sampled can assume only the values unity and zero, with the probabilities $p$ and $(1-p)$ respectively. Consider, for example, the determination of the relative frequency of occurrence of some given event $(A)$. Here we define a random variable $x_n$ to have the value unity when the event $(A)$ occurs in the $n$th sampling and zero when it does not. It therefore follows from the definition of the sample mean, Eq. (5-2), that

$$\hat{m} = \frac{n(A)}{N} \tag{5-26}$$

where $n(A)$ is the random variable which specifies the number of times the event $(A)$ occurs in $N$ samples. In this case, then, the sample mean is the random variable corresponding to the relative frequency of occurrence of the event $(A)$ in $N$ samples.
Let $p$ be the probability of occurrence of event $(A)$. Then

$$E(x_n) = p \tag{5-27}$$

and hence, from Eq. (5-3),

$$E(\hat{m}) = p \tag{5-28}$$

Thus the statistical average of the relative frequency of occurrence of an event is equal to the probability of occurrence of that event.
Uncorrelated Samples. Since

$$\sigma^2(x_n) = E(x_n^2) - E^2(x_n) = p - p^2 \tag{5-29}$$

† See for example Loève (I, Sec. 28), Uspensky (I, Chap. 14, Sec. 8), or Lévy (I, Chap. 8).
‡ This point is discussed at length in Fry (I, Arts. 82 and 89).
it follows from Eq. (5-8) that the variance of the relative frequency becomes

$$\sigma^2(\hat{m}) = \frac{p(1-p)}{N} \tag{5-30}$$

when the various samples are uncorrelated. An upper bound to this variance may be found by taking the derivative with respect to $p$, equating to zero, and solving for $p$. In this way the maximum value of the variance is found to occur when $p = \tfrac{1}{2}$ and is

$$\sigma^2(\hat{m}) \le \frac{1}{4N} \tag{5-31}$$

If now we substitute these various results into the Chebyshev inequality, Eq. (5-11), we get

$$P\left[\left|\frac{n(A)}{N} - p\right| \ge \epsilon\right] \le \frac{p(1-p)}{N\epsilon^2} \le \frac{1}{4N\epsilon^2} \tag{5-32}$$

Since the variance of the relative frequency tends to zero as $N \to \infty$, it follows from Art. 4-6 that the relative frequency of occurrence of an event converges in probability to the probability of occurrence of that event as the number of uncorrelated samples increases without limit. This result, known as the Bernoulli theorem, justifies in essence the relative-frequency approach to probability theory introduced in Chap. 2.
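A small simulation makes the Bernoulli theorem concrete. In this sketch (our own; the event probability and the sample sizes are arbitrary), the relative frequency $n(A)/N$ tightens around $p$ as $N$ grows, with an empirical variance tracking $p(1-p)/N$ of Eq. (5-30):

```python
import random
import statistics

random.seed(6)

P = 0.3        # probability of event (A); arbitrary choice
TRIALS = 2000  # repeated experiments per value of N

def rel_freq(n):
    # Relative frequency of (A) in n independent Bernoulli samplings.
    return sum(1 for _ in range(n) if random.random() < P) / n

variances = {}
for n in (10, 100, 1000):
    freqs = [rel_freq(n) for _ in range(TRIALS)]
    variances[n] = statistics.pvariance(freqs)
    # Compare with the theoretical value p(1-p)/N of Eq. (5-30).
    print(n, variances[n], P * (1 - P) / n)
```

Each tenfold increase in $N$ cuts the spread of the relative frequency by roughly a factor of ten, as Eq. (5-30) predicts.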
Independent Samples. The determination of the probability distribution of the relative frequency becomes simple when the various samples are statistically independent. The situation then becomes the same as that of the coin tossings discussed in Example 2-6.2, and the successive samplings therefore form Bernoulli trials. By paralleling the development of Eq. (2-21), it can be shown that the probability distribution of the relative frequency is the binomial distribution. Thus

$$P\left[\frac{n(A)}{N} = \frac{n}{N}\right] = \binom{N}{n}p^n(1-p)^{N-n} \tag{5-33}$$

when the samples are statistically independent.
5-6. Problems
1. The discrete random variable $x$ has the Poisson probability distribution

$$P(x = m) = \frac{a^m\exp(-a)}{m!} \qquad m = 0, 1, 2, \ldots$$

Let $\hat{m}$ be the sample mean formed from $N$ statistically independent samplings of $x$.
a. Determine the mean and variance of the sample mean.
b. Determine the probability distribution of the sample mean.
c. Plot the results of b for $N = 3$ and $N = 10$.
2. The random variable $x$ has a Cauchy probability density function

$$p(x) = \frac{1}{\pi}\left(\frac{1}{1+x^2}\right)$$

Let $\hat{m}$ be the sample mean formed from $N$ statistically independent samplings of $x$.
a. Determine the mean and variance of the sample mean.
b. Determine the probability density function of the sample mean.
c. How does this case fit into the discussions of Arts. 5-3 and 5-4?
3. Consider the stationary random process defined by the ensemble of sample functions $x(t)$. Let the autocorrelation function of this process be

$$R_x(\tau) = \begin{cases} \sigma_x^2\left(1 - \dfrac{|\tau|}{\tau_0}\right) & \text{for } 0 \le |\tau| \le \tau_0 \\[4pt] 0 & \text{for } \tau_0 \le |\tau| \end{cases}$$

Suppose that the sample mean of this process is obtained by integrating a sample function over a measurement interval of length $T \ge \tau_0$:

$$\hat{m} = \frac{1}{T}\int_0^T x(t)\,dt$$

Determine the variance of this sample mean.
4. The probability $P_N$ that a measured relative frequency $n(A)/N$ differs by more than $\epsilon$ from its corresponding probability $P(A)$ is given by Eq. (5-32).
a. Determine, as a function of $P(A)$, a bound on the required number of measurements $N$ such that

$$P\left[\left|\frac{n(A)}{N} - P(A)\right| > \epsilon\right] \le P_N$$

b. Plot $N$ of part a as a function of $P(A)$, assuming $\epsilon = 0.05$ and $P_N = 0.1$, for values of $P(A)$ in the interval (0.01, 0.99).
5.† Show that the binomial distribution

$$P(n) = \binom{N}{n}p^n(1-p)^{N-n}$$

tends to the gaussian distribution

$$P(n) = \frac{1}{\sqrt{2\pi}\,\sigma(n)}\exp\left[-\frac{(n-m_n)^2}{2\sigma^2(n)}\right]$$

where $m_n = E(n)$, when $N \to \infty$ and $p$ remains constant.
6.† Show that the binomial distribution of Prob. 5 tends to the Poisson distribution

$$P(n) = \frac{m_n^n}{n!}\exp(-m_n)$$

as $N \to \infty$ and $p \to 0$ such that $Np$ remains constant.
7.† Show that the Poisson distribution of Prob. 6 tends to the gaussian distribution of Prob. 5 as $m_n \to \infty$.
† See Feller (I, Chaps. 6 and 7) and Fry (I, Chap. 8).
CHAPTER 6
SPECTRAL ANALYSIS
6-1. Introduction
The resolution of a known electrical signal into components of different frequencies by means of a Fourier series or Fourier transform is such a useful technique in circuit analysis that one naturally asks if a similar process can be applied to noise waves. To a large extent it can; that is, noise waves are represented by random processes, and at least for stationary (or wide-sense stationary) processes a quite satisfactory theory of Fourier analysis has been developed. In this chapter we shall discuss some parts of this theory.
Fourier Series. It is assumed that the reader has some knowledge of Fourier series and integrals,† so we shall state here only a few basic facts as a reminder and for reference. If $x(t)$ is a real or complex-valued periodic function of a real variable $t$ (e.g., time, if $x(t)$ is to represent a signal) and if $x(t)$ is absolutely integrable over a period $T$, i.e., if

$$\int_0^T |x(t)|\,dt < \infty \tag{6-1}$$

then $x(t)$ has associated with it a Fourier series

$$x(t) = \sum_{n=-\infty}^{\infty} a_n e^{jn\omega_0 t} \qquad \omega_0 = \frac{2\pi}{T} \tag{6-2}$$

where $a_n$ is called the $n$th Fourier coefficient and is given by

$$a_n = \frac{1}{T}\int_0^T x(t)e^{-jn\omega_0 t}\,dt \tag{6-3}$$

If any of various conditions are satisfied by $x(t)$, then the sum on the right side of Eq. (6-2) converges in some sense to $x(t)$. In particular, if $x(t)$ is of bounded variation‡ in the interval $0 \le t \le T$, then that sum converges to $x(t)$ everywhere that $x(t)$ is continuous. Since for $x(t)$ to

† For example, see Churchill (I, Chaps. III, IV, and V), Guillemin (II, Chap. VII), or Titchmarsh (I, Chap. 13).
‡ Titchmarsh (I, Art. 13.232).
be of bounded variation means roughly that the total rise plus the total fall of the function $x(t)$ is finite in the interval $0 \le t \le T$, this condition on $x(t)$ is usually satisfied in practical problems.
An alternative condition on $x(t)$ is that it be of integrable square for $0 \le t \le T$, i.e., that

$$\int_0^T |x(t)|^2\,dt < \infty \tag{6-4}$$

The sum on the right side of Eq. (6-2) then converges to $x(t)$ in the sense that

$$\lim_{N\to\infty}\int_0^T \left|x(t) - \sum_{n=-N}^{N} a_n e^{jn\omega_0 t}\right|^2 dt = 0 \tag{6-5}$$

This kind of convergence, which means that the mean-square error is going to zero, is called convergence in the mean† and is written

$$x(t) = \operatorname*{l.i.m.}_{N\to\infty}\sum_{n=-N}^{N} a_n e^{jn\omega_0 t} \tag{6-6}$$

Since the condition of Eq. (6-4) is the requirement that the energy in one period of $x(t)$ is finite, we almost always are dealing with the case where it is satisfied. In fact, we shall assume henceforth that every function we introduce satisfies Eq. (6-4) for every finite interval $T$, unless we state otherwise, whether $T$ is the period of a periodic function or not. For convenience we shall write Fourier series with a simple equals sign rather than as a limit in the mean, as in Eq. (6-6).
When the condition of Eq. (6-4) is satisfied, the result is Parseval's theorem,‡ which states that the average energy in the signal is equal to the sum of the average energies in each frequency component; i.e.,

$$\frac{1}{T}\int_0^T |x(t)|^2\,dt = \sum_{n=-\infty}^{\infty} |a_n|^2 \tag{6-7}$$
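Eqs. (6-3) and (6-7) can be checked numerically for a concrete periodic signal. The sketch below is our own (the test signal, period, and truncation order are arbitrary choices): it approximates the Fourier coefficients by Riemann sums over one period and verifies Parseval's relation for a signal whose power is $1.625$ by inspection.

```python
import cmath
import math

T = 2.0                      # period (arbitrary)
W0 = 2 * math.pi / T
M = 4000                     # time samples per period for the numeric integrals

def x(t):
    # A simple real periodic test signal: a_0 = 1, a_(+-1) = 0.5, a_(+-2) = -+0.25j.
    return 1.0 + math.cos(W0 * t) + 0.5 * math.sin(2 * W0 * t)

ts = [k * T / M for k in range(M)]

# Fourier coefficients a_n, Eq. (6-3), approximated by a Riemann sum.
def a(n):
    return sum(x(t) * cmath.exp(-1j * n * W0 * t) for t in ts) / M

# Both sides of Parseval's theorem, Eq. (6-7).
power_time = sum(x(t) ** 2 for t in ts) / M
power_freq = sum(abs(a(n)) ** 2 for n in range(-3, 4))

print(power_time, power_freq)   # both near 1.625
```

For a trigonometric polynomial sampled on a uniform grid over a full period, these sums are essentially exact, so the two sides agree to floating-point precision.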
Fourier Transforms. The Fourier transform $x(t)$ of a function $X(f)$ is defined by

$$x(t) = \int_{-\infty}^{\infty} X(f)e^{j\omega t}\,df \qquad \omega = 2\pi f \tag{6-8}$$

† We have already used in Art. 4-6 the term "convergence in the mean" in connection with convergence of random variables considered as functions on a sample space. It will be observed that the definition here is the same as the previous definition except that the averaging, or integration, is here with respect to the real variable $t$, whereas there it was with respect to the sample-space variable. The use of the term in both these ways is standard.
‡ Titchmarsh (I, Art. 13.63).
if the integral exists in some sense. The inverse Fourier transform of $x(t)$ is defined by

$$X(f) = \int_{-\infty}^{\infty} x(t)e^{-j\omega t}\,dt \tag{6-9}$$

if it exists. One of the most basic results of Fourier-transform theory is Plancherel's theorem,† which states that if $X(f)$ is of integrable square on the whole line, $-\infty < f < \infty$, i.e., if

$$\int_{-\infty}^{\infty} |X(f)|^2\,df < \infty \tag{6-10}$$

then there is a function $x(t)$ which is also of integrable square on the whole line which is related to $X(f)$ by the equations

$$x(t) = \operatorname*{l.i.m.}_{A\to\infty}\int_{-A}^{A} X(f)e^{j\omega t}\,df \tag{6-11a}$$

and

$$X(f) = \operatorname*{l.i.m.}_{A\to\infty}\int_{-A}^{A} x(t)e^{-j\omega t}\,dt \tag{6-11b}$$

Under the condition of Eq. (6-10), an analogue of Parseval's theorem holds; i.e.,

$$\int_{-\infty}^{\infty} |X(f)|^2\,df = \int_{-\infty}^{\infty} |x(t)|^2\,dt \tag{6-12}$$

We shall speak of $x(t)$ and $X(f)$ as a Fourier-transform pair. If in addition $X(f)$ is absolutely integrable, $x(t)$ is given by Eq. (6-8); and if $x(t)$ is absolutely integrable, $X(f)$ is given by Eq. (6-9). As with Fourier series, we shall commonly write Fourier transforms as in Eqs. (6-8) and (6-9), even in some cases where the correct mathematical terminology requires a limit in the mean.
6-2. The Spectral Density of a Periodic Function
Our chief concern in this chapter is with the function called the power spectral density (or simply spectral density), which gives the distribution in frequency of the power of a signal or a noise, and with its relation to the autocorrelation function. The situation where one considers a signal which is a known function of time is different from that where one considers a signal or noise which is represented as a sample function of a random process. Although there are strong analogies, the two must be treated separately. We shall begin by discussing the very simplest case, the power spectral density of a single periodic function.
Let $x(t)$ be a periodic function of period $T$ which has finite "energy" per period, i.e., which satisfies the condition of Eq. (6-4). Then by Parseval's theorem the time average of its energy, i.e., its power, is equal

† Titchmarsh (II, Art. 3.1).
to a sum of terms, each term associated with one frequency in its Fourier-series expansion. Each term, in fact, can be interpreted as the time average of the energy of one particular component frequency. Thus we are led to define a function $S(f)$, the power spectral density, by

$$S(f) = \sum_{n=-\infty}^{\infty} |a_n|^2\,\delta(f - nf_0) \tag{6-13}$$

Hence $S(f)$ consists of a series of impulses at the component frequencies of $x(t)$, each impulse having a strength equal to the power in that component frequency, and clearly is a measure of the distribution of the power in $x(t)$. The total power in $x(t)$ is
$$\int_{-\infty}^{\infty} S(f)\,df = \sum_{n=-\infty}^{\infty} |a_n|^2 \tag{6-14}$$

The time autocorrelation function of the periodic function $x(t)$ can be expressed in terms of its Fourier coefficients as

$$\mathcal{R}(\tau) = \sum_{n=-\infty}^{\infty} |a_n|^2 e^{+jn\omega_0\tau} \tag{6-15}$$

If we take the Fourier transform of $\mathcal{R}(\tau)$, we get

$$\int_{-\infty}^{\infty} \mathcal{R}(\tau)e^{-j\omega\tau}\,d\tau = \sum_{n=-\infty}^{\infty} |a_n|^2\int_{-\infty}^{\infty} e^{+jn\omega_0\tau}e^{-j\omega\tau}\,d\tau \tag{6-16}$$
$$\tau_a = \frac{2d}{v_a} \tag{7-9}$$

The electron velocity and position can now be expressed in terms of this transit time as

$$v = \frac{v_a t}{\tau_a} \tag{7-10}$$

and

$$z = \frac{v_a t^2}{2\tau_a} \tag{7-11}$$

where $v_a$ is the electron velocity at the anode.
The current pulse induced in the anode circuit of the diode by the flight of an electron through the cathode-anode space can be obtained by finding the charge induced on the anode by the electron in motion and taking the time derivative of this induced charge. The charge $q$ induced on the anode may be found as follows: The energy $U$ gained by an electron moving through a potential difference $V$ is

$$U = eV = \frac{eV_a z}{d}$$

This energy is equal to the amount of work $W$ that must be done to induce the charge $q$ on the anode when the anode potential is $V_a$:

$$W = qV_a$$

On equating these energies and solving for $q$, we get

$$q = \frac{ez}{d}$$

The anode current pulse is therefore

$$i_e(t) = \frac{dq}{dt} = \frac{ev}{d} \tag{7-12}$$
during the flight of the electron, and zero before the electron is emitted from the cathode and after its arrival at the anode. Thus

$$i_e(t) = \begin{cases} \dfrac{2et}{\tau_a^2} & \text{for } 0 \le t \le \tau_a \\[4pt] 0 & \text{otherwise} \end{cases} \tag{7-13}$$

This current pulse is shown in Fig. 7-1.

Fig. 7-1. An anode current pulse in a temperature-limited parallel-plane diode.
7-2. The Probability Distribution of Electron-emission Times†
In order to study the statistical properties of shot noise in a thermionic vacuum tube, we first need to determine the probability $P(K,\tau)$ that exactly $K$ electrons are emitted from the tube's cathode during a time interval of length $\tau$. It seems reasonable in the temperature-limited case to assume that the probability of emission of an electron during a given interval is statistically independent of the number of electrons emitted previously; that this probability varies as the length of the interval for short intervals, i.e., that as $\Delta\tau \to 0$,

$$P(1,\Delta\tau) = a\,\Delta\tau \tag{7-14}$$

where $a$ is an as yet undetermined constant; and that the probability is negligible that more than one electron is emitted during such a short interval, i.e., approximately

$$P(0,\Delta\tau) + P(1,\Delta\tau) = 1 \tag{7-15}$$

for small $\Delta\tau$.
The probability that no electrons are emitted during an interval of length $\tau$ may be determined as follows: Consider an interval of length $\tau + \Delta\tau$ to be broken up into two subintervals, one of length $\tau$ and one of length $\Delta\tau$. Since the emission of an electron during $\Delta\tau$ is independent of the number of electrons emitted during $\tau$, it follows that

$$P(0,\tau+\Delta\tau) = P(0,\tau)P(0,\Delta\tau)$$

If we substitute in this equation for $P(0,\Delta\tau)$ from Eqs. (7-14) and (7-15), we get for small $\Delta\tau$ the result

$$P(0,\tau+\Delta\tau) - P(0,\tau) = -a\,\Delta\tau\,P(0,\tau)$$

† Cf. Feller (I, Art. 17.2) and Doob (II, Chap. VIII).
As $\Delta\tau \to 0$, this difference equation becomes the differential equation

$$\frac{dP(0,\tau)}{d\tau} = -aP(0,\tau) \tag{7-16}$$

which has the solution

$$P(0,\tau) = \exp(-a\tau) \tag{7-17}$$

where the boundary condition

$$P(0,0) = \lim_{\Delta\tau\to 0} P(0,\Delta\tau) = 1 \tag{7-18}$$

follows from Eqs. (7-14) and (7-15). Thus we have obtained a probability as the solution of a differential equation. This is an important technique.
Consider next the probability that $K$ electrons are emitted during an interval of length $\tau + \Delta\tau$. Again we may break up that interval into two adjacent subintervals, one of length $\tau$ and the other of length $\Delta\tau$. If $\Delta\tau$ is small enough, there are only two possibilities during the subinterval of length $\Delta\tau$: either one electron is emitted during that interval, or none are. Therefore, for small $\Delta\tau$,

$$P(K,\tau+\Delta\tau) = P(K-1,\tau;1,\Delta\tau) + P(K,\tau;0,\Delta\tau)$$

Since again the emission of an electron during $\Delta\tau$ is independent of the number emitted during $\tau$, it follows that

$$P(K,\tau+\Delta\tau) = P(K-1,\tau)P(1,\Delta\tau) + P(K,\tau)P(0,\Delta\tau)$$

On substituting for $P(1,\Delta\tau)$ and $P(0,\Delta\tau)$, we find for small values of $\Delta\tau$ that

$$\frac{P(K,\tau+\Delta\tau) - P(K,\tau)}{\Delta\tau} + aP(K,\tau) = aP(K-1,\tau)$$

Therefore as $\Delta\tau \to 0$ we obtain the differential equation

$$\frac{dP(K,\tau)}{d\tau} + aP(K,\tau) = aP(K-1,\tau) \tag{7-19}$$

as a recursion equation relating $P(K,\tau)$ to $P(K-1,\tau)$. Since $P(K,0) = 0$, the solution of this first-order linear differential equation is†

$$P(K,\tau) = a\exp(-a\tau)\int_0^\tau \exp(at)P(K-1,t)\,dt \tag{7-20}$$

If now we take $K = 1$, we may use our previous result for $P(0,\tau)$ to obtain $P(1,\tau)$. This result can then be used in Eq. (7-20) to obtain

† E.g., see Courant (I, Chap. 6, Art. 3.1).
$P(2,\tau)$. Continuing this process of determining $P(K,\tau)$ from $P(K-1,\tau)$ gives

$$P(K,\tau) = \frac{(a\tau)^K\exp(-a\tau)}{K!} \qquad \text{for } K = 0, 1, 2, \ldots \tag{7-21}$$

The probability that $K$ electrons are emitted during a time interval of length $\tau$ is thus given by the Poisson probability distribution.
The average number of electrons emitted during an interval of length $\tau$ is

$$E_\tau(K) = \sum_{K=0}^{\infty}\frac{K(a\tau)^K\exp(-a\tau)}{K!} = a\tau \tag{7-22}$$

since the possible number of electrons emitted during that interval ranges from zero to infinity. If we define $\bar{n} = E_\tau(K)/\tau$ as the average number of electrons emitted per second,† it follows that

$$a = \bar{n} \tag{7-23}$$

and that $P(K,\tau)$ may be expressed as

$$P(K,\tau) = \frac{(\bar{n}\tau)^K\exp(-\bar{n}\tau)}{K!} \tag{7-24}$$

Since the exponential tends to unity as $\Delta\tau \to 0$, for $K = 1$ and small $\bar{n}\,\Delta\tau$ this equation reduces approximately to

$$P(1,\Delta\tau) = \bar{n}\,\Delta\tau \tag{7-25}$$

which checks with Eq. (7-14). The probability that a single electron is emitted during a very short time interval is therefore approximately equal to the product of the average number of electrons emitted per second and the duration of that interval.
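The derivation above can be mimicked numerically: if emission times form a purely random stream, the counts in a fixed window should follow the Poisson law of Eq. (7-24). The sketch below is our own construction (the emission rate and window length are arbitrary choices); it generates exponentially distributed interarrival gaps and compares the empirical count distribution with the Poisson formula.

```python
import math
import random

random.seed(7)

RATE = 4.0       # n-bar: average emissions per second (arbitrary)
TAU = 1.0        # observation window length
TRIALS = 20000

def count_in_window():
    # Successive emission times built from exponential interarrival gaps.
    t, k = 0.0, 0
    while True:
        t += random.expovariate(RATE)
        if t > TAU:
            return k
        k += 1

counts = [count_in_window() for _ in range(TRIALS)]

# Empirical P(K, tau) for K = 4 versus the Poisson value of Eq. (7-24).
mean_count = sum(counts) / TRIALS            # should be near RATE * TAU
p4_empirical = counts.count(4) / TRIALS
p4_poisson = (RATE * TAU)**4 * math.exp(-RATE * TAU) / math.factorial(4)
print(mean_count, p4_empirical, p4_poisson)
```

The agreement reflects the equivalence, shown next in the text, between the Poisson counting law and statistically independent, uniformly distributed emission times.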
Independence of the Emission Times. Suppose that the interval $(t, t+\tau)$ is partitioned into $M$ adjacent subintervals by the times $t = t_m$, where $m = 1, \ldots, M-1$. Define $t_0 = t$, $t_M = t + \tau$, and

$$\tau_m = t_m - t_{m-1}$$

Then

$$\sum_{m=1}^{M}\tau_m = \tau$$

Consider now the probability $P(K_1,\tau_1;\ldots;K_M,\tau_M|K,\tau)$ that if $K$ electrons are emitted during the total interval of duration $\tau$, then $K_m$ electrons are emitted during the subinterval of duration $\tau_m$, where

$$K = \sum_{m=1}^{M}K_m$$

† The assumption that $a$, and hence $\bar{n}$, are constant for all time amounts to the assumption that the random process in question is stationary.
From the definition of conditional probability,

$$P(K_1,\tau_1;\ldots;K_M,\tau_M|K,\tau) = \frac{P(K,\tau;K_1,\tau_1;\ldots;K_M,\tau_M)}{P(K,\tau)}$$

Since the probability that $K_M$ electrons are emitted during the last subinterval (that of length $\tau_M$) is independent of the number of electrons emitted previously, we may write

$$P(K,\tau;K_1,\tau_1;\ldots;K_M,\tau_M) = P(K,\tau;K_1,\tau_1;\ldots;K_{M-1},\tau_{M-1})\,P(K_M,\tau_M)$$

Continuing this process, it then follows that

$$P(K,\tau;K_1,\tau_1;\ldots;K_M,\tau_M) = \prod_{m=1}^{M}P(K_m,\tau_m)$$

If now we use this result and the fact that the number of electrons emitted during a given interval has a Poisson distribution, we get

$$P(K_1,\tau_1;\ldots;K_M,\tau_M|K,\tau) = \frac{K!}{K_1!\cdots K_M!}\prod_{m=1}^{M}\left(\frac{\tau_m}{\tau}\right)^{K_m} \tag{7-26}$$
Suppose next that $N \le M$ of the $K_m$ are unity and that the rest are zero, and let those subintervals for which $K_m = 1$ be indexed by $n$, where $n = 1, \ldots, N$. Here $K = N$, and we desire the probability that if $N$ electrons are emitted during an interval of duration $\tau$, then one is emitted in each of $N$ nonoverlapping subintervals of duration $\tau_n$ such that

$$\sum_{n=1}^{N}\tau_n \le \tau$$

In this case, from Eq. (7-26),

$$P(K_1,\tau_1;\ldots;K_M,\tau_M|N,\tau) = \frac{N!}{\tau^N}\prod_{n=1}^{N}\tau_n \tag{7-27}$$

It can be shown† that the same result may be obtained if we assume that

† Prob. 2 of this chapter.
the various electron-emission times are statistically independent random variables, each with the uniform probability density function

$$p(t_n) = \begin{cases} \dfrac{1}{\tau} & \text{for } t \le t_n \le t + \tau \\[4pt] 0 & \text{otherwise} \end{cases} \tag{7-28}$$

For this reason the Poisson process is sometimes called a "purely random" process.
7-3. Average Current through a Temperature-limited Diode
The total current flowing through a thermionic vacuum tube is the resultant of the current pulses produced by the individual electrons which pass through the tube. It was pointed out in Art. 7-1 that the space-charge effect of electrons in the cathode-anode space of a temperature-limited diode is negligible; hence there is effectively no interaction between the various electrons passing through such a diode. The total current, then, is simply the sum of the individual electronic current pulses, all of which have the same shape and differ only by translations in time due to the different emission times. If, therefore, $K$ electrons are emitted in a particular temperature-limited diode during a time interval $(-T,+T)$ which is much larger than the transit time $\tau_a$ of an electron, we may write

$$I(t) = \sum_{k=1}^{K} i_e(t - t_k) \qquad \text{for } -T \le t \le T \tag{7-29}$$

where $i_e(t)$ is the current pulse produced at the anode by an electron emitted at $t = 0$ and where $t_k$ is the emission time of the $k$th electron emitted during the interval. This expression is not valid for values of $t$ within $\tau_a$ of the left end of the given interval, but the end effects are usually negligible, since $2T \gg \tau_a$.
Time Averaging. Let us now determine the time average ⟨I(t)⟩ of the total current flowing through a diode. If K electrons are emitted by the given diode during the interval (−T,+T), it follows from Eq. (7-29) and the definition of time average that

⟨I(t)⟩ = lim_{T→∞} (1/2T) ∫_{−T}^{+T} Σ_{k=1}^{K} i_e(t − t_k) dt

Since all the current pulses have the same shape and differ only by translations in time,

∫_{−T}^{+T} i_e(t − t_k) dt = ∫_{−T}^{+T} i_e(t) dt = e
where e is the charge of the electron. Each of the K terms of the above summation, therefore, has the same value e, and we obtain the hardly unexpected result that the time average of the total diode current is equal to the charge of an electron times the time average of the number of electrons passing through the diode per second; i.e.,

⟨I(t)⟩ = e⟨n⟩   (7-30)

where

⟨n⟩ = lim_{T→∞} K/2T   (7-31)
Statistical Averaging.† It is of interest next to determine the statistical average of the total diode current. If we consider the particular diode studied above to be one member of an ensemble of temperature-limited diodes all having the same physical characteristics, its current waveform may be considered to be one sample function of an ergodic ensemble of diode current waveforms, as we have already assumed a statistically stationary system. The result presented in Eq. (7-30) for the time average of the diode current is, therefore, equal with probability one to the statistical average of the total diode current. However, we are not so much interested here in the result as in the method used to obtain that result.

Let us again consider the total diode current during the time interval (−T,+T). The number K of electrons emitted during this interval is a random variable, as are the various emission times t_k. We may therefore obtain the statistical average E(I_t) of the total diode current by averaging Eq. (7-29) with respect to the K + 1 random variables consisting of the K emission times t_k and K itself:

E(I_t) = ∫_{−∞}^{+∞} ⋯ ∫_{−∞}^{+∞} [Σ_{k=1}^{K} i_e(t − t_k)] p(t_1, … ,t_K, K) dt_1 ⋯ dt_K dK   (7-32)
       = ∫_{−∞}^{+∞} ⋯ ∫_{−∞}^{+∞} [Σ_{k=1}^{K} i_e(t − t_k)] p(t_1, … ,t_K | K) p(K) dt_1 ⋯ dt_K dK

It was pointed out in Art. 7-2 that the emission times are uniformly distributed independent random variables. Therefore, using Eq. (7-28):

E(I_t) = ∫_{−∞}^{+∞} p(K) dK [Σ_{k=1}^{K} ∫_{−T}^{+T} (dt_1/2T) ⋯ ∫_{−T}^{+T} (dt_K/2T) i_e(t − t_k)]
       = ∫_{−∞}^{+∞} p(K) dK [Σ_{k=1}^{K} (1/2T) ∫_{−T}^{+T} i_e(t − t_k) dt_k]

† Cf. Rice (I, Arts. 1.2 and 1.3).
As above, all the electronic current pulses have the same shape and integrate to e. Therefore all the K terms in the bracketed sum have the same value e/2T. Hence

E(I_t) = (e/2T) ∫_{−∞}^{+∞} K p(K) dK

From Art. 7-2 it follows that K has a Poisson distribution with a statistical average n̄2T, where n̄ is the statistical average of the number of electrons emitted per second. Thus, on defining Ī = E(I_t), we get

Ī = e n̄   (7-33)

which is the statistical equivalent of Eq. (7-30).

The process of statistical averaging with respect to the K + 1 random variables which consist of the K emission times t_k and K itself will be used several times in the remainder of this chapter.
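The result Ī = e n̄ is easy to confirm by building the pulse sum of Eq. (7-29) directly. The sketch below uses a rectangular current pulse of height e/τ_e (an illustrative choice; the triangular pulse of a parallel-plane diode would do equally well), with rate, transit time, and interval values that are likewise hypothetical:

```python
import bisect
import random

e = 1.602e-19    # electron charge, coulombs
n_bar = 2000.0   # mean emission rate, electrons per second (illustrative)
tau_e = 1e-3     # transit time, seconds; rectangular pulse of height e/tau_e (illustrative)
T = 200.0        # observation interval, seconds

rng = random.Random(7)
# Poisson emission times on (0, T) from exponential inter-arrival times
times, t = [], 0.0
while True:
    t += rng.expovariate(n_bar)
    if t > T:
        break
    times.append(t)

def current(u):
    """I(t) of Eq. (7-29) for a rectangular pulse: (e/tau_e) times the
    number of emissions in (t - tau_e, t]."""
    return (bisect.bisect_right(times, u) - bisect.bisect_right(times, u - tau_e)) * e / tau_e

# time average over a uniform grid (end effects ignored, as in the text)
grid = [tau_e + k * (T - tau_e) / 100000 for k in range(100000)]
I_avg = sum(current(u) for u in grid) / len(grid)
print(I_avg / (e * n_bar))   # should be close to 1, confirming I-bar = e * n-bar
```

The ratio printed is the measured time average divided by e n̄; it approaches unity as the observation interval grows, as Eqs. (7-30) and (7-33) require.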
7-4. Shot-noise Spectral Density for a Temperature-limited Diode
The autocorrelation function of the total diode current may be found by again considering the total current flowing during the interval (−T,+T) to be a sum of electronic current pulses and performing the operation of statistical averaging as in Art. 7-3. Thus

R_I(τ) = ∫_{−∞}^{+∞} ⋯ ∫_{−∞}^{+∞} [Σ_{k=1}^{K} i_e(t − t_k)][Σ_{j=1}^{K} i_e(t + τ − t_j)] p(t_1, … ,t_K, K) dt_1 ⋯ dt_K dK   (7-34)

As before, the (K + 1)-fold joint probability density may be decomposed into product form because of the statistical independence of K and the various t_k:

R_I(τ) = ∫_{−∞}^{+∞} p(K) dK [Σ_{k=1}^{K} Σ_{j=1}^{K} ∫_{−T}^{+T} (dt_1/2T) ⋯ ∫_{−T}^{+T} (dt_K/2T) i_e(t − t_k) i_e(t + τ − t_j)]
The K² terms of the double summation in this equation may be formed into two groups: the K terms for which k = j and the K² − K terms for which k ≠ j. When k = j, the K-fold integral over the various t_k becomes

∫_{−T}^{+T} (dt_1/2T) ⋯ ∫_{−T}^{+T} (dt_K/2T) i_e(t − t_k) i_e(t + τ − t_k) = (1/2T) ∫_{−∞}^{+∞} i_e(t) i_e(t + τ) dt
for each value of k, since all the current pulses have the same shape (differing only by translations in time) and are all zero outside the interval (−T,+T). When k ≠ j, the integral over the t_k becomes

∫_{−T}^{+T} (dt_1/2T) ⋯ ∫_{−T}^{+T} (dt_K/2T) i_e(t − t_k) i_e(t + τ − t_j)
   = (1/2T) ∫_{−T}^{+T} i_e(t − t_k) dt_k · (1/2T) ∫_{−T}^{+T} i_e(t + τ − t_j) dt_j = e²/(2T)²

Substitution of these results into the above equation for R_I(τ) gives

R_I(τ) = [E(K)/2T] ∫_{−∞}^{+∞} i_e(t) i_e(t + τ) dt + E(K² − K) e²/(2T)²

Since K is a discrete random variable having a Poisson probability distribution with mean n̄2T,

E(K)/2T = n̄   and   E(K² − K) = Σ_{K=0}^{∞} (K² − K) [(n̄2T)^K/K!] exp(−n̄2T) = (n̄2T)²   (7-35)

We therefore obtain the result that the autocorrelation function of the total diode current is
R_I(τ) = n̄ ∫_{−∞}^{+∞} i_e(t) i_e(t + τ) dt + (e n̄)²   (7-36)

If now we define i(t) to be the fluctuation of the total diode current about its mean value, i.e., the diode shot-noise current,

i(t) = I(t) − Ī   (7-37)

we see that the autocorrelation function of the shot-noise current is

R_i(τ) = n̄ ∫_{−∞}^{+∞} i_e(t) i_e(t + τ) dt   (7-38)

since Ī = e n̄. We shall evaluate R_i(τ) later for the particular case of a parallel-plane diode.

The Spectral Density. The spectral density S_i(f) of the diode shot-noise current may be obtained by taking the Fourier transform of Eq. (7-38). Thus

S_i(f) = n̄ ∫_{−∞}^{+∞} exp(−jωτ) dτ ∫_{−∞}^{+∞} i_e(t) i_e(t + τ) dt

where ω = 2πf. Letting τ = t′ − t,

S_i(f) = n̄ ∫_{−∞}^{+∞} i_e(t) exp(jωt) dt ∫_{−∞}^{+∞} i_e(t′) exp(−jωt′) dt′
If now we define G(f) to be the Fourier transform of the electronic current pulse i_e(t),

G(f) = ∫_{−∞}^{+∞} i_e(t) exp(−jωt) dt   (7-39)

we obtain

S_i(f) = n̄ |G(f)|²   (7-40)

The spectral density of the total diode current is therefore

S_I(f) = n̄ |G(f)|² + (e n̄)² δ(f)

the impulse at zero frequency arising from the average component of the total diode current.
In many applications, the frequency of operation is small compared to the reciprocal of the transit time of the tube being used. In these cases, the spectral density of the shot-noise current may be assumed to be approximately equal to its value at zero frequency.† From the definition of G(f), it follows that G(0) = e. From Eq. (7-40) it therefore follows that a low-frequency approximation for the spectral density of the shot-noise current generated by a temperature-limited diode is

S_i(f) = n̄e² = eĪ   (7-41)

This result is known as the Schottky formula.

FIG. 7-2. Spectral density of shot noise in a temperature-limited parallel-plane diode.
The Parallel-plane Diode. Let us now apply these results to the case of a temperature-limited parallel-plane diode. In Art. 7-1 we found that the electronic current pulse in such a tube has the form of a triangle as given by Eq. (7-13). On substituting this into Eq. (7-38) we get

R_i(τ) = (4eĪ/3τ_e)[1 − 3|τ|/2τ_e + |τ|³/2τ_e³]   when −τ_e ≤ τ ≤ τ_e
R_i(τ) = 0   otherwise   (7-42)

† See, for example, Fig. 7-1.
as an expression for the autocorrelation function of the shot-noise current in such a tube. Taking the Fourier transform of R_i(τ) gives†

S_i(f) = eĪ [4/(ωτ_e)⁴][(ωτ_e)² + 2(1 − cos ωτ_e − ωτ_e sin ωτ_e)]   (7-43)

where ω = 2πf, for the spectral density of the shot-noise current. This result is plotted in Fig. 7-2.
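Equations (7-41) and (7-43) can be cross-checked numerically: compute G(f) for the triangular pulse by direct integration, form n̄|G(f)|² as in Eq. (7-40), and compare with the closed form. A sketch of such a check follows; all parameter values (transit time, emission rate, test frequencies) are illustrative assumptions, not taken from the text:

```python
import cmath
import math

e = 1.602e-19    # electron charge, coulombs
tau_e = 1e-9     # transit time, seconds (illustrative)
n_bar = 1e15     # emissions per second (illustrative); I-bar = e * n_bar
I_bar = e * n_bar

def i_e(t):
    """Triangular current pulse of Eq. (7-13): i_e(t) = 2et/tau_e^2 on (0, tau_e)."""
    return 2 * e * t / tau_e ** 2 if 0.0 <= t <= tau_e else 0.0

def S_numeric(f):
    """n_bar * |G(f)|^2, Eq. (7-40), with G(f) computed by midpoint-rule integration."""
    w = 2 * math.pi * f
    n = 2000
    dt = tau_e / n
    G = sum(i_e((k + 0.5) * dt) * cmath.exp(-1j * w * (k + 0.5) * dt) for k in range(n)) * dt
    return n_bar * abs(G) ** 2

def S_closed(f):
    """Closed form of Eq. (7-43)."""
    th = 2 * math.pi * f * tau_e
    return e * I_bar * 4 * (th ** 2 + 2 * (1 - math.cos(th) - th * math.sin(th))) / th ** 4

ratio = S_numeric(0.3 / tau_e) / S_closed(0.3 / tau_e)   # mid-band check of Eq. (7-43)
low = S_numeric(1e-4 / tau_e) / (e * I_bar)              # Schottky limit, Eq. (7-41)
print(round(ratio, 3), round(low, 3))                    # both should be ≈ 1.0
```

The mid-band ratio confirms Eq. (7-43), and the low-frequency value reduces to the Schottky result eĪ, as the text states.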
7-5. Shot-noise Probability Density for a Temperature-limited Diode‡
Since the total current flowing through a temperature-limited diode is a sum of statistically independent electronic current pulses, it might be expected from our previous discussion of the central limit theorem that the probability distribution of the total current would tend to become gaussian as the number of electrons flowing through the tube increased without limit. In fact, since approximately 6.25 × 10^18 electrons per second flow through a tube per ampere of average current, we should be able to assume the total diode current to be a gaussian random process. However, rather than apply the central limit theorem directly, we shall derive a general expression for the probability distribution function of the total diode current and then study its limiting behavior as the average number of electrons passing through the tube increases without limit.
The General Form. As before, it is convenient to represent the total diode current flowing during the time interval (−T,+T) as a function of the K + 1 random variables t_k and K. The probability density function p(I_t) of the total diode current may be determined from the conditional probability density function p(I_t | K) of the total current, subject to the hypothesis that K electrons have passed through the tube during the given interval, by use of the relation

p(I_t) = Σ_{K=0}^{∞} p(I_t | K) P(K)   (7-44)

which follows from Eqs. (3-20), (3-29a), and (3-33b). The conditional probability density p(I_t | K) may most easily be obtained from its corresponding characteristic function M_(I_t|K)(ju), where

M_(I_t|K)(ju) = E{exp[ju Σ_{k=1}^{K} i_e(t − t_k)]}

† Cf. Rack (I, Part III).
‡ Cf. Rice (I, Arts. 1.4, 1.6, and 3.11).
The emission times t_k are statistically independent random variables; hence

M_(I_t|K)(ju) = Π_{k=1}^{K} E{exp[ju i_e(t − t_k)]}

Since the emission times are uniformly distributed,

E{exp[ju i_e(t − t_k)]} = (1/2T) ∫_{−T}^{+T} exp[ju i_e(t − τ)] dτ

All K terms of the product hence have the same form; we may therefore write

M_(I_t|K)(ju) = {(1/2T) ∫_{−T}^{+T} exp[ju i_e(t − τ)] dτ}^K

The corresponding conditional probability density is

p(I_t | K) = (1/2π) ∫_{−∞}^{+∞} exp(−juI_t) {(1/2T) ∫_{−T}^{+T} exp[ju i_e(t − τ)] dτ}^K du
Substituting this result into Eq. (7-44) and making use of the fact that K has a Poisson probability distribution, we get

p(I_t) = (1/2π) ∫_{−∞}^{+∞} exp(−juI_t − n̄2T) Σ_{K=0}^{∞} {n̄ ∫_{−T}^{+T} exp[ju i_e(t − τ)] dτ}^K / K!  du

The summation over K in this expression is a power-series expansion of an exponential function, hence

p(I_t) = (1/2π) ∫_{−∞}^{+∞} exp(−juI_t) exp{−n̄2T + n̄ ∫_{−T}^{+T} exp[ju i_e(t − τ)] dτ} du

If, for convenience, we now express 2T as the integral with respect to τ of unity over the interval (−T,+T), we obtain

p(I_t) = (1/2π) ∫_{−∞}^{+∞} exp(−juI_t) exp(n̄ ∫_{−T}^{+T} {exp[ju i_e(t − τ)] − 1} dτ) du   (7-45)
The electronic current pulse i_e(t) is zero for values of t in excess of the transit time τ_e. Therefore

exp[ju i_e(t − τ)] − 1 = 0   (7-46)
for |t − τ| > τ_e and for all t not within τ_e of the end points of the given interval. Both limits of the integral in Eq. (7-45) may hence be extended to infinity, and t − τ may be replaced by t′, giving

p(I_t) = (1/2π) ∫_{−∞}^{+∞} exp(−juI_t) exp(n̄ ∫_{−∞}^{+∞} {exp[ju i_e(t′)] − 1} dt′) du   (7-47)

as a general expression for the probability density function of the total current flowing through a temperature-limited diode. Note that this result is not a function of t, which confirms our previous assertion that the shot-noise process under study is a statistically stationary random process.
The Limiting Form. As it is not very evident from Eq. (7-47) just how p(I_t) varies with I_t, let us now determine the limiting behavior of that result as the average number of electrons passing through the diode per second, n̄, increases without limit. Equation (7-33) shows that the average of the total diode current increases without limit as n̄ → ∞, and it follows from Eqs. (7-33) and (7-36) that the variance σ² of the total diode current does so also, since

σ² = n̄ ∫_{−∞}^{+∞} i_e²(t) dt   (7-48)

To avoid such difficulties, we shall study the limiting behavior of the normalized random variable x_t, corresponding to the diode current, where

x_t = (I_t − Ī)/σ   (7-49)

rather than of I_t itself. The normalized diode current has a mean of zero and an rms value of unity, whatever the value of n̄. The characteristic function of the normalized diode current is

M_x_t(jv) = exp(−jvĪ/σ) M_I_t(jv/σ)   (7-50)

where M_I_t(ju) is the characteristic function of the total diode current, and is, from Eq. (7-47),

M_I_t(ju) = exp(n̄ ∫_{−∞}^{+∞} {exp[ju i_e(t′)] − 1} dt′)   (7-51)

Therefore

M_x_t(jv) = exp(−jvĪ/σ + n̄ ∫_{−∞}^{+∞} {exp[jv i_e(t′)/σ] − 1} dt′)   (7-52)
If now we expand in power series the argument of the integral in this equation and integrate the result term by term, we obtain

n̄ ∫_{−∞}^{+∞} {exp[jv i_e(t′)/σ] − 1} dt′
   = (jv n̄/σ) ∫_{−∞}^{+∞} i_e(t′) dt′ − (v² n̄/2σ²) ∫_{−∞}^{+∞} i_e²(t′) dt′ − (jv³ n̄/3!σ³) ∫_{−∞}^{+∞} i_e³(t′) dt′ + (v⁴ n̄/4!σ⁴) ∫_{−∞}^{+∞} i_e⁴(t′) dt′ + ⋯

Substituting this result into Eq. (7-52) and using our previous results for Ī and σ² in terms of n̄, it follows that

M_x_t(jv) = exp[−v²/2 − (jv³ n̄/3!σ³) ∫_{−∞}^{+∞} i_e³(t′) dt′ + (v⁴ n̄/4!σ⁴) ∫_{−∞}^{+∞} i_e⁴(t′) dt′ + ⋯]   (7-53)

Since σ varies as (n̄)^{1/2}, the term of the bracketed sum which contains v^k varies as (n̄)^{−(k−2)/2} when k ≥ 3. All the bracketed terms except the first therefore vanish as n̄ → ∞; hence

lim_{n̄→∞} M_x_t(jv) = exp(−v²/2)

The characteristic function, and hence the probability density, of the normalized diode current therefore becomes gaussian as the average number of electrons passing through the diode per second increases without limit.
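The rate at which the higher-order terms of Eq. (7-53) die out can be seen directly in the simplest special case. For a rectangular electronic pulse the instantaneous current is proportional to a Poisson count K, and the normalized current of Eq. (7-49) becomes (K − λ)/√λ; its skewness should fall off as λ^(−1/2), which is exactly the behavior of the v³ term. The Monte Carlo sketch below checks this (the sample sizes and the two λ values are arbitrary illustrative choices):

```python
import math
import random

rng = random.Random(3)
EXP_NEG1 = math.exp(-1.0)

def poisson1():
    """One Poisson(1) sample, Knuth's product-of-uniforms method."""
    k, p = 0, rng.random()
    while p > EXP_NEG1:
        p *= rng.random()
        k += 1
    return k

def skewness_of_normalized_count(lam, trials):
    """Skewness of x = (K - lam)/sqrt(lam), where K is Poisson with integer mean lam
    (the normalization of Eq. (7-49)); theory gives skewness 1/sqrt(lam)."""
    xs = [(sum(poisson1() for _ in range(lam)) - lam) / math.sqrt(lam)
          for _ in range(trials)]
    m1 = sum(xs) / len(xs)
    m2 = sum((x - m1) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m1) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

s_small = skewness_of_normalized_count(4, 40000)     # theory: 1/sqrt(4) = 0.50
s_large = skewness_of_normalized_count(100, 40000)   # theory: 1/sqrt(100) = 0.10
print(round(s_small, 2), round(s_large, 2))
```

As the mean count grows, the skewness shrinks toward zero and the distribution of the normalized current approaches the gaussian form, as the limiting argument above asserts.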
Joint Probability Distribution. The various joint probability density functions which specify the statistical properties of the current flowing through a temperature-limited diode may be obtained from an analysis similar to that used above to determine p(I_t). For example, in order to determine the joint probability density function p(I_1,I_2) of the random variables I_1 = I_{t+τ} and I_2 = I_t, we may express the currents I(t) and I(t + τ) as sums of electronic current pulses. Thus, assuming that K electrons are emitted during the interval (−T,+T), we may write

I(t) = Σ_{k=1}^{K} i_e(t − t_k)   and   I(t + τ) = Σ_{k=1}^{K} i_e(t + τ − t_k)

where −T ≤ t ≤ +T as before. Using the probability distributions of
the emission times, we can determine first the conditional joint characteristic function

M_(I_1,I_2|K)(ju,jv) = E{exp[ju Σ_{k=1}^{K} i_e(t − t_k) + jv Σ_{k=1}^{K} i_e(t + τ − t_k)]}

and then determine the corresponding conditional joint probability density p(I_1,I_2 | K). The joint probability density p(I_1,I_2) can be obtained by averaging p(I_1,I_2 | K) with respect to K. If we examine the limiting behavior of the result as n̄ → ∞, we find that p(I_1,I_2) tends to a two-dimensional gaussian probability density function. In a similar manner, it can also be shown that the Nth-order joint probability density function of the diode current tends to become gaussian as n̄ → ∞ and hence that the diode current is a gaussian process.
7-6. Space-charge Limiting of Diode Current
Although the temperature-limited diode is of considerable theoretical interest, the space-charge-limited diode is more common in practice. The current flowing through the latter is limited by the action of the cloud of electrons in the cathode-anode space rather than by a limited supply of emitted electrons. In this section we shall derive the relation between the current flowing through a space-charge-limited diode and the voltage applied to it. In several of the following sections, we shall study the effect of space-charge limiting on shot noise. As before, we shall restrict our discussion to conventional receiving tubes as introduced in Art. 7-1.
Let us, for convenience, again consider a parallel-plane diode with cathode plane at x = 0, anode plane at x = d, a cathode voltage of zero, and an anode voltage V_a. Since the charge density in the cathode-anode space is not to be neglected here, the electric potential in that region must satisfy Poisson's equation

d²V/dx² = −ρ/ε₀   (7-54)

The space-charge density at a point is a function of the current density flowing through the diode and the electron velocity at that point. The velocity v of an electron at a given point in the cathode-anode space is determined by its emission velocity v_0 and the potential at that point, since the conservation of energy requires that

mv²/2 = mv_0²/2 + eV   (7-55)
Zero Emission Velocity.† Before discussing the case in which the emission velocity of an electron may have any positive value, let us first
† Cf. Harman (I, Art. 5.1).
study that in which all electrons have a zero emission velocity. Although the latter case is not very realistic physically, its analysis contains many of the steps of the more general case and yet is simple enough that the ideas involved are not obscured by the details.

Since the emission velocity of each electron is zero, each electron has, from Eq. (7-55), the velocity

v(x′) = [2eV(x′)/m]^{1/2}

at the plane x = x′ in the cathode-anode space. All electrons at x′ have the same velocity; the charge density is, therefore, from Eq. (7-5),

−ρ = J/v = J(m/2eV)^{1/2}

and hence Poisson's equation may be written in the form

d²V/dx² = (J/ε₀)(m/2eV)^{1/2}   (7-56)

On multiplying both sides of Eq. (7-56) by 2dV/dx and integrating with respect to x, we get

(dV/dx)² = (4J/ε₀)(mV/2e)^{1/2}   (7-57)

since V = 0 and dV/dx = 0 at x = 0. The second condition results from differentiating Eq. (7-55) to obtain

mv dv/dx = e dV/dx

and using the fact (from above) that v = 0 at x = 0. Next, taking the square root of both sides of Eq. (7-57), integrating with respect to x, and solving for the current density, gives

J = (4ε₀/9)(2e/m)^{1/2} V_a^{3/2}/d²   (7-58)

since V = V_a at x = d. This result is known as the Langmuir-Child equation and shows that the current density varies as the three-halves power of the applied voltage. Because we are dealing with a parallel-plane diode, the total current is simply the product of the current density and the anode area.
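Equation (7-58) is simple to evaluate numerically. In the sketch below the voltage and spacing are illustrative values, chosen only to show the three-halves-power scaling; they are not taken from the text:

```python
import math

EPS0 = 8.854e-12   # permittivity of free space, F/m
E = 1.602e-19      # electron charge, C
M = 9.109e-31      # electron mass, kg

def child_current_density(Va, d):
    """Langmuir-Child law, Eq. (7-58): space-charge-limited current density (A/m^2)
    for a parallel-plane diode with anode voltage Va (volts) and spacing d (meters),
    assuming zero emission velocity."""
    return (4 * EPS0 / 9) * math.sqrt(2 * E / M) * Va ** 1.5 / d ** 2

J = child_current_density(100.0, 1e-3)   # 100 V across a 1 mm gap (illustrative)
print(round(J))                          # ≈ 2.3e3 A/m^2, i.e. about 0.23 A/cm^2
# three-halves-power law: doubling the voltage multiplies J by 2**1.5
print(round(child_current_density(200.0, 1e-3) / J, 3))   # → 2.828
```

Multiplying J by the anode area gives the total space-charge-limited current, as noted above.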
Distributed Emission Velocities.† Although the zero-emission-velocity case is attractive in its simplicity, the situation in an actual thermionic
† Cf. Langmuir (I).
diode is far more complicated. In such a diode, the emission velocity of an electron may take on any positive value and is in fact a random variable having the Maxwell or Rayleigh probability density:

p(v_0) = (mv_0/kT_c) exp(−mv_0²/2kT_c)   for 0 ≤ v_0
p(v_0) = 0   otherwise   (7-59)

where k is Boltzmann's constant (1.38 × 10⁻²³ joule per degree Kelvin) and T_c is the cathode temperature (in degrees Kelvin).†
Since the space charge in the cathode-anode region is due to the electrons in that region and since the charge of an electron is negative,

FIG. 7-3. The potential distribution in a space-charge-limited parallel-plane diode.
Poisson's equation (7-54) shows that the slope of the potential distribution in the cathode-anode region must increase with distance from the cathode. If the slope itself is positive, the corresponding electric field will exert an accelerating force on an electron and thus increase its velocity. If the slope is negative, an electron will experience a decelerating force and be slowed down. Therefore, if the space charge is to limit the flow of electrons to the anode, there must be a potential minimum between the cathode and the anode, and the potential distribution must have a form as shown in Fig. 7-3. Only those electrons having emission velocities great enough to enable them to overcome the decelerating field between the cathode and the potential minimum will be able to reach the anode. Since, from Eq. (7-55), the velocity v_m of an electron at the potential minimum is

v_m = (v_0² + 2eV_m/m)^{1/2}   (7-60)

where V_m is the minimum potential (and is negative), the critical value
† Lindsay (I, Chap. V, Sec. 5), or van der Ziel (I, Art. 11.2).
v_c of emission velocity (i.e., the emission velocity of an electron which just comes to rest at the potential minimum) is

v_c = (−2eV_m/m)^{1/2}   (7-61)

If v_0 > v_c, the electron will pass on to the anode; if v_0 < v_c, the electron will return to the cathode.
Since for any reasonable value of cathode current there will be a very large number of electrons emitted per second, any measured velocity distribution will probably differ but little from the Maxwell distribution.† The emission current increment due to electrons with emission velocities in the range (v_0, v_0 + dv_0) therefore is

dJ(v_0) = J_S p(v_0) dv_0 = J_S (mv_0/kT_c) exp(−mv_0²/2kT_c) dv_0   (7-62)

where J_S is the saturation or total value of emission current density. The current density J flowing past the potential minimum is then

J = J_S ∫_{v_c}^{+∞} (mv_0/kT_c) exp(−mv_0²/2kT_c) dv_0 = J_S exp(eV_m/kT_c)   (7-63)

where the lower limit is v_c, since only those electrons with v_0 > v_c contribute to the anode current.
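A quick numeric illustration of Eq. (7-63) follows; the potential minimum and cathode temperature used are hypothetical values chosen only for illustration:

```python
import math

K_B = 1.381e-23   # Boltzmann's constant, J/K
E = 1.602e-19     # electron charge, C

def fraction_passing(V_m, T_c):
    """J/J_S = exp(e*V_m / (k*T_c)), Eq. (7-63); V_m, the potential minimum, is negative."""
    return math.exp(E * V_m / (K_B * T_c))

f = fraction_passing(-0.2, 1000.0)   # a -0.2 V minimum at a 1000 K cathode (illustrative)
print(round(f, 3))   # ≈ 0.098: roughly 10 per cent of the emitted electrons reach the anode
```

Even a potential minimum of a fraction of a volt thus turns back the large majority of the emitted electrons, which is what makes the space-charge smoothing discussed later so effective.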
In order to be able to solve Poisson's equation (7-54), we need to express the charge density in the plane at x in terms of the current density there. The space-charge cloud is composed of electrons having a distribution of velocities. The contribution dρ(v_0) to the space-charge density at x by electrons emitted with velocities in the range (v_0, v_0 + dv_0) is

−dρ(v_0) = dJ(v_0)/v(v_0)   (7-64)
where v(v_0) is the velocity at x of an electron which is emitted with a velocity v_0. Substitution in this expression for dJ(v_0) from Eq. (7-62) gives

−dρ(v_0) = [J_S m v_0 / v(v_0) kT_c] exp(−mv_0²/2kT_c) dv_0

which becomes

−dρ(v_0) = (Jm/kT_c) exp[e(V − V_m)/kT_c] exp[−mv²(v_0)/2kT_c] v_0 dv_0 / v(v_0)

on substituting from Eq. (7-55) for v_0 in the exponential and for J_S from
† Cf. Art. 5-5, in particular Eq. (5-32).
Eq. (7-63). Since V is constant at x, it follows from Eq. (7-55) that

v dv = v_0 dv_0

Hence

−dρ = (Jm/kT_c) exp[e(V − V_m)/kT_c] exp(−mv²/2kT_c) dv   (7-65)
The calculations are simplified if we now introduce a normalized potential energy

η = e(V − V_m)/kT_c   (7-66)

and a normalized velocity

u = (m/2kT_c)^{1/2} v   (7-67)

The equation above for the incremental charge density at x due to electrons having velocities in the range (v, v + dv) can then be rewritten as

−dρ = J (2m/kT_c)^{1/2} e^η e^{−u²} du   (7-68)
The total charge density at x can be obtained by integrating Eq. (7-68) over the appropriate range of velocities.

Suppose that the plane x is located in the region between the potential minimum and the anode; we shall denote this region as the β region. The only electrons in the β region are those whose emission velocities exceed the critical value, i.e., those for which v_0 satisfies the inequalities

mv_c²/2 = −eV_m ≤ mv_0²/2 < +∞

The corresponding inequalities for the electron velocity at the plane x are, from Eq. (7-55),

e(V − V_m) ≤ mv²/2 < +∞

or, in terms of the normalized velocity at x,

e(V − V_m)/kT_c = η ≤ u² < +∞

The total charge density at a plane in the β region is therefore

ρ_β = J (2m/kT_c)^{1/2} e^η ∫_{√η}^{+∞} e^{−u²} du
The integral in this equation can be evaluated in terms of the erf function

erf z = (2/√π) ∫_0^z e^{−u²} du   (7-69)

where erf(+∞) = 1. The equation for ρ_β may be rewritten as

ρ_β = J (2m/kT_c)^{1/2} e^η [∫_0^{+∞} e^{−u²} du − ∫_0^{√η} e^{−u²} du]

hence

ρ_β = J (πm/2kT_c)^{1/2} e^η [1 − erf √η]   (7-70)
Suppose now that the plane x is located in the region between the cathode and the potential minimum; we shall denote this region as the α region. There are now two classes of electrons passing the plane x: those passing x from the cathode going towards the potential minimum, and those passing x in the opposite direction returning to the cathode. The velocities at x of the electrons in the first class satisfy the inequalities

0 ≤ mv²/2 < +∞   and hence   0 ≤ u² < +∞

The electrons in the second class have emission velocities great enough to reach x but not great enough to pass the potential minimum. For these,

0 ≤ mv²/2 ≤ e(V − V_m)   and hence   0 ≤ u² ≤ η

Since the charge densities due to these two classes of electrons both have the same sign, the total charge density at x is

ρ_α = J (2m/kT_c)^{1/2} e^η [∫_0^{+∞} e^{−u²} du + ∫_0^{√η} e^{−u²} du]

Therefore, using Eq. (7-69),

ρ_α = J (πm/2kT_c)^{1/2} e^η [1 + erf √η]   (7-71)
Let us now substitute the expressions for space-charge density obtained above into the Poisson equation. Thus

d²V/dx² = (J/ε₀)(πm/2kT_c)^{1/2} e^η [1 ± erf √η]   (7-72)

where the positive sign corresponds to values of x in the α region and the negative sign corresponds to values of x in the β region. On multiplying
both sides of Eq. (7-72) by 2dV/dx and integrating from x_m to x, we get

(dV/dx)² = (2J/ε₀)(πm/2kT_c)^{1/2} ∫_{V_m}^{V} e^η [1 ± erf √η] dV

since (dV/dx) = 0 at x = x_m. From the definition of η, it follows that

dη = (e/kT_c) dV

Hence

(dη/dx)² = (2Je/ε₀kT_c)(πm/2kT_c)^{1/2} ∫_0^η e^η [1 ± erf √η] dη   (7-73)
In terms of a new space variable

ξ = [(2Je/ε₀kT_c)(πm/2kT_c)^{1/2}]^{1/2} (x − x_m)   (7-74)

Eq. (7-73) can be rewritten as

(dη/dξ)² = ∫_0^η e^η [1 ± erf √η] dη
         = e^η − 1 ± ∫_0^η e^η erf √η dη

The second integral can be integrated by parts, giving the result

(dη/dξ)² = φ(η)   (7-75)

where

φ(η) = e^η − 1 ± [e^η erf √η − 2(η/π)^{1/2}]   (7-76)
The problem has thus been reduced to one of integration, as the solution of Eq. (7-75) is the integral

ξ = ∫_0^η dη/[φ(η)]^{1/2}   (7-77)

Unfortunately, a closed-form solution of Eq. (7-77) in terms of elementary functions is not possible, and numerical integration is required. The results of such a numerical solution are tabulated in Langmuir (I).
An approximate solution of Eq. (7-77) can, however, be readily obtained under the assumption that η is large compared with unity. In this case one can show that Eq. (7-76) reduces approximately to

φ(η) ≈ 2(η/π)^{1/2} − 1   (7-78)

for values of x in the β region. Equation (7-77) then becomes approximately

ξ = (√π/2)^{1/2} ∫_0^η η^{−1/4} [1 − (1/2)(π/η)^{1/2}]^{−1/2} dη
A power-series expansion of the bracketed term gives

[1 − (1/2)(π/η)^{1/2}]^{−1/2} = 1 + (1/4)(π/η)^{1/2} + ⋯

On substituting this expansion into our expression for ξ and integrating, we get

ξ = (√π/2)^{1/2} (4/3) η^{3/4} [1 + (3/4)(π/η)^{1/2} + ⋯]
The corresponding expression for current density in terms of anode potential can be obtained by substituting in this result for ξ(x = d) from Eq. (7-74) and for η(V = V_a) from Eq. (7-66) and then solving for J. Thus

J = (4ε₀/9)(2e/m)^{1/2} [(V_a − V_m)^{3/2}/(d − x_m)²] {1 + (3/2)[πkT_c/e(V_a − V_m)]^{1/2} + ⋯}   (7-79)

which, under the assumption that η ≫ 1, reduces to

J = (4ε₀/9)(2e/m)^{1/2} (V_a − V_m)^{3/2}/(d − x_m)²   (7-80)
If we compare this result with that obtained for the zero-emission-velocity case, we see that the space-charge-limited thermionic diode with distributed electron-emission velocities acts (when its anode potential is large compared with kT_c/e) as though its cathode were located at the potential minimum and each of its electrons had a zero emission velocity.
7-7. Shot Noise in a Space-charge-limited Diode
Fluctuations in emission of electrons from a heated cathode give rise to fluctuations in the current flowing through a space-charge-limited diode as well as in a temperature-limited diode. However, even though the electron-emission times in a space-charge-limited diode are statistically independent random variables, the arrival of an electron at the anode is certainly dependent upon the previously emitted electrons. For, as discussed in the previous section, whether or not a given electron reaches the anode depends upon whether or not its emission velocity is sufficient to enable it to pass the potential minimum, and the depth of the potential minimum depends upon the electrons previously emitted.

The effect of space-charge limiting on the shot-noise spectral density is not too difficult to determine qualitatively if we restrict our discussion
to frequencies which are small compared to the reciprocal of the average transit time. Suppose that the rate of emission of electrons increases momentarily. The added electrons will increase the space charge present; hence the depth of the potential minimum as well as the critical value of the emission velocity will be increased. Thus, although more electrons are emitted, the proportion of emitted electrons which reach the anode is reduced. On the other hand, if there is a momentary decrease in the rate of electron emission, the depth of the potential minimum will decrease and the proportion of emitted electrons reaching the anode will be increased. The effect of space-charge limiting is, therefore, to smooth out the current fluctuations and hence to reduce the shot noise to a value below that in a temperature-limited diode with the same average current. This fluctuation-smoothing action of the space charge is essentially the same phenomenon as that which limits the diode current to a value below saturation.

Quantitative determination of the effect of space-charge limiting on the low-frequency spectral density of shot noise is rather complicated, and we shall not attempt it here. However, the steps involved are as follows:†
Given that the electron-emission-velocity distribution is Maxwellian, the analysis of the preceding section determines the anode current density. Suppose now that there is an increase in emission of electrons with initial velocities in the range (v_0, v_0 + dv_0) and hence an increase δJ_0(v_0) in emission current density. The new value of anode current density may be obtained, as in the preceding section, by determining the space-charge density present, substituting in Poisson's equation, and solving that equation. If the increase in emission is small enough, the differential equation relating the normalized potential energy η and the space variable ξ will differ from Eq. (7-78) only by the addition of a perturbation term to φ(η). However, a numerical solution is again required, though an approximate solution can again be obtained which is valid for large η. For small changes, the resulting increase in anode current density δJ will be proportional to the increase in emission current density; i.e.,

δJ = γ(v_0) δJ_0(v_0)

where γ(v_0) is called the linear reduction factor and is a function of v_0.

The actual distribution of electron-emission velocities fluctuates about the Maxwell distribution from instant to instant, giving rise to emission-current fluctuations. The basic assumptions in an analysis of shot noise are that the emission current is temperature-limited and that any fixed division of the emission current is temperature-limited. The spectral density of the emission-current-density fluctuations caused by variations
† Cf. Rack (I) or Thompson, North, and Harris (I, pp. 75-125, Part II, "Diodes and Negative-grid Triodes").
in electron emission in the fixed velocity range (v_0, v_0 + dv_0) is therefore e dJ_0(v_0). Since a fluctuation δJ_0(v_0) in emission current density produces a fluctuation γ(v_0) δJ_0(v_0) in anode current density, the spectral density S_δJ(f) of the resultant anode-current-density fluctuations is

S_δJ(f) = e γ²(v_0) dJ_0(v_0)

The spectral density S_J(f) of the total fluctuation of the anode current density may be obtained by summing up this result over all velocity groups; thus

S_J(f) = e ∫_0^∞ γ²(v_0) dJ_0(v_0) = e ∫_0^∞ γ²(v_0) J_S p(v_0) dv_0

where p(v_0) is the velocity distribution given by Eq. (7-59) and dJ_0(v_0) is given by Eq. (7-62). Using Eq. (7-63), this may be rewritten as

S_J(f) = eJ ∫_0^∞ γ²(v_0) exp(−eV_m/kT_c) p(v_0) dv_0
The spectral density of the anode current fluctuations is given by this result multiplied by the anode area. Thus we see that the effect of space-charge limiting on the low-frequency spectral density of the anode current fluctuations may be accounted for by multiplying the Schottky formula, Eq. (7-41), by a space-charge-smoothing factor Γ²:

S_i(f) = eĪΓ²   (7-81)

where Γ² is given by the above integral and 0 ≤ Γ² ≤ 1. A plot of Γ² vs. η is given in Fig. 7-4. This plot is based upon North's results, which assume that the diode current is well below its saturation value. An asymptotic expression for Γ² obtained by North is

Γ² = 9(1 − π/4)/η = 9(1 − π/4) kT_c / e(V_a − V_m)   (7-82)
This result, also plotted in Fig. 7-4, is approximately valid for values of η large compared with unity but small enough that the diode still operates in the space-charge-limited region (and not in the temperature-limited region). The asymptotic form of Γ² may also be expressed as

Γ² = 3(1 − π/4)(2kT_c/e)(g_d/Ī)   (7-83)

where g_d is the dynamic conductance of the diode:

g_d = ∂I/∂V_a = (3/2) Ī/(V_a − V_m)   (7-84)

following from Eq. (7-80).
138 BANDOM SIGNALS AND NOISE
0.1
001'
'I
50 o
100 ecY. _ r.)
, ~ T .
Flo. 74. Spacechargereduction factor and effective temperature versus normalized
potential energy lor a Ipacechargelimited diode. (Adopted from Fig. 5, Pari, I I J
of nom,.tm, NortA, orul Boma (1). B'l/permilrion 01 BCA.)
Substituting Eq. (7-83) into the modified Schottky formula, Eq. (7-81), we find

    S_i(f) = 2k T_eff g_d    (7-85)

where

    T_eff = 3(1 − π/4) T_c ≈ 0.644 T_c    (7-86)

Equation (7-86) gives an asymptotic expression for T_eff; a plot of T_eff/T_c vs. η based on North's results is given in Fig. 7-4. In Art. 9-4 we shall
see that a conductance g operating at a temperature T generates a gaussian thermal noise current which has a low-frequency spectral density
    S(f) = 2kTg
From the noise viewpoint, then, a space-charge-limited diode acts as though it were a conductance g_d operating at a temperature equal to 0.644 times the cathode temperature.
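As a numerical sketch of this result, the following evaluates Eqs. (7-85) and (7-86); the cathode temperature and diode conductance are assumed illustrative values, not taken from the text:

```python
import math

k = 1.380649e-23  # Boltzmann's constant, J/K

def diode_shot_noise_density(t_cathode, g_d):
    """Low-frequency anode noise density of a space-charge-limited diode,
    Eqs. (7-85)-(7-86): S_i(f) = 2k * T_eff * g_d, T_eff = 3(1 - pi/4) T_c."""
    t_eff = 3.0 * (1.0 - math.pi / 4.0) * t_cathode
    return 2.0 * k * t_eff * g_d

# Assumed illustrative operating point: T_c = 1000 K, g_d = 1 mS.
S_i = diode_shot_noise_density(1000.0, 1.0e-3)
# Since T_eff is about 0.644 T_c, this lies roughly 36 percent below the
# thermal density 2kT_c g_d of a real conductance at cathode temperature.
```

The comparison in the last comment is exactly the "equivalent conductance at reduced temperature" picture of the paragraph above.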
7-8. Shot Noise in Space-charge-limited Triodes and Pentodes

It is beyond our scope to give a comprehensive account of the noise properties of multielectrode thermionic vacuum tubes, whether for shot or other types of noise. The reader is referred to other sources† for such studies. We do, however, wish to derive some low-frequency shot-noise results for the triode and the pentode.

The High-mu Triode. Neglecting the noise induced in the grid circuit by the random fluctuations of the electron current passing through the grid plane, a reasonably valid low-frequency analysis of the shot noise generated in a negative-grid high-mu triode may be made as follows: If it is assumed that no electrons are intercepted by the grid of the triode, and that all electrons are emitted from the cathode with zero initial velocity, it may be shown‡ that the anode current density of a parallel-plane negative-grid triode can be expressed as

    J = (4ε/9)(2e/m)^{1/2} [σ(V_g + V_p/μ)]^{3/2} / d_cg²    (7-87)

where V_g is the cathode-grid potential difference of the triode, V_p is the cathode-anode potential difference, μ is the amplification factor, and σ is a geometrical factor which has the form

    σ = 1 / [1 + (1/μ)(d_cp/d_cg)^{4/3}]    (7-88)

where d_cg is the cathode-grid spacing of the triode, and d_cp is the cathode-anode spacing. For common receiving tubes, 0.5 ≤ σ ≤ 1.
A comparison of Eq. (7-87) with the equivalent expression for the parallel-plane diode, Eq. (7-58), shows that the behavior of the triode can be expressed in terms of that of an equivalent diode which has a cathode-anode spacing equal to the cathode-grid spacing of the triode and a cathode-anode potential difference V_d given by the equation

    V_d = σ(V_g + V_p/μ)    (7-89)

The results of the previous article can now be applied to the equivalent diode in order to obtain the noise properties of the triode. For example, the low-frequency spectral density of the anode-current shot noise generated in a triode can be expressed from Eq. (7-85) as

    S_i(f) = 2k(0.644 T_c) g_eq    (7-90)

† E.g., van der Ziel (I, Chaps. 5, 6, 14, and 15).
‡ See Spangenberg (I, Art. 8-5).
where g_eq is the dynamic conductance of the equivalent diode. Since

    g_eq = ∂Ī/∂V_d = (∂Ī/∂V_g)(∂V_g/∂V_d) = g_m/σ    (7-91)

where g_m is the dynamic transconductance of the triode, we may rewrite Eq. (7-90) in terms of the triode parameters as

    S_i(f) = 2k(0.644 T_c) g_m/σ    (7-92)
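The triode result can be exercised the same way; the transconductance, geometry factor σ, and cathode temperature below are assumed illustrative values:

```python
import math

k = 1.380649e-23  # Boltzmann's constant, J/K

def triode_shot_noise_density(t_cathode, g_m, sigma):
    """Low-frequency anode shot-noise density of a negative-grid high-mu
    triode, Eq. (7-92): S_i(f) = 2k(0.644 T_c) g_m / sigma."""
    g_eq = g_m / sigma  # equivalent-diode conductance, Eq. (7-91)
    t_eff = 3.0 * (1.0 - math.pi / 4.0) * t_cathode
    return 2.0 * k * t_eff * g_eq

# Assumed illustrative values: g_m = 5 mS, sigma = 0.8, T_c = 1000 K.
S_i = triode_shot_noise_density(1000.0, 5.0e-3, 0.8)
# A smaller sigma raises the equivalent conductance g_m/sigma and hence
# the noise density.
```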
The Pentode.† Let us now determine the spectral density of the shot noise present in the anode current of a pentode. We shall assume throughout that the voltages, with respect to the cathode, of all electrodes are maintained constant and that the control grid is biased negatively to the extent that it does not intercept any electrons. The region between the cathode and the screen grid then acts as a space-charge-limited triode, and our triode results above apply to the shot noise generated in the cathode current stream of the pentode. However, an additional effect takes place in a pentode as a result of the division of cathode current between the screen grid and the anode. If the screen grid is fine enough, whether or not a particular electron is intercepted by it is independent of what happens to other electrons; the random division of cathode current between the screen grid and the anode generates an additional noise called partition noise.
In order to study partition noise, it is most convenient to represent the cathode current (before it reaches the screen grid) as a sum of electronic current pulses in a manner similar to that used in Arts. 7-3 and 7-4. Thus, assuming that K electronic current pulses occur during the time interval (−T, +T), we write for the cathode current

    I_c(t) = Σ_{k=1}^{K} i(t − t_k)

where the "emission" times t_k correspond, say, to the times the electrons pass the potential minimum. The t_k may be assumed to be uniformly distributed; i.e.,

    p(t_k) = 1/2T  for −T ≤ t_k ≤ +T
    p(t_k) = 0     otherwise    (7-93)

but they are not independent random variables, because of the action of the potential minimum. Since the interception of an electron by the screen

† Cf. Thompson, North, and Harris (I, Part III, "Multicollectors," pp. 126-142).
grid is independent from one electron to another, we can write for the anode current

    I_a(t) = Σ_{k=1}^{K} Y_k i(t − t_k)    (7-94)

where the Y_k are independent random variables, each of which may assume either of the two possible values: unity, with probability p; and zero, with probability (1 − p).

Since the Y_k and t_k are independent random variables, the average value of the anode current is

    Ī_a = E[Σ_{k=1}^{K} Y_k i(t − t_k)] = p E[Σ_{k=1}^{K} i(t − t_k)] = p Ī_c    (7-95)
therefore

    p = Ī_a/Ī_c  and  (1 − p) = Ī_2/Ī_c    (7-96)

where I_2(t) = I_c(t) − I_a(t) is the screen-grid current.
The autocorrelation function of the anode current is

    R_a(τ) = E[Σ_{k=1}^{K} Y_k i(t − t_k) Σ_{j=1}^{K} Y_j i(t + τ − t_j)]
           = E[Σ_{k=1}^{K} Σ_{j=1}^{K} Y_kY_j i(t − t_k) i(t + τ − t_j)]    (7-97)

where E(Y_kY_j) is the average of Y_kY_j. Since the Y_k are independent random variables,

    E(Y_kY_j) = p   when k = j
    E(Y_kY_j) = p²  when k ≠ j    (7-98)

Therefore, using these results in Eq. (7-97), it follows that

    R_a(τ) = p E[Σ_{k=1}^{K} i(t − t_k) i(t + τ − t_k)]
           + p² E[Σ_{k=1}^{K} Σ_{j=1, j≠k}^{K} i(t − t_k) i(t + τ − t_j)]
This becomes, on adding and subtracting p² times the first term,

    R_a(τ) = p(1 − p) E[Σ_{k=1}^{K} i(t − t_k) i(t + τ − t_k)]
           + p² E[Σ_{k=1}^{K} Σ_{j=1}^{K} i(t − t_k) i(t + τ − t_j)]    (7-99)

Since all current pulses have (approximately) the same shape,

    E[Σ_{k=1}^{K} i(t − t_k) i(t + τ − t_k)]
        = Σ_K p(K) Σ_{k=1}^{K} ∫_{−T}^{+T} i(t − t_k) i(t + τ − t_k) p(t_k) dt_k
        = n̄ ∫_{−∞}^{+∞} i(t) i(t + τ) dt    (7-100)

where n̄ = K̄/2T is the average number of electrons passing the potential minimum per second. The autocorrelation function of the anode current is therefore

    R_a(τ) = p(1 − p) n̄ ∫_{−∞}^{+∞} i(t) i(t + τ) dt + p² R_c(τ)    (7-101)

since the second term in Eq. (7-99) is simply p² times the autocorrelation function R_c(τ) of the cathode current.
The spectral density of the anode current can now be obtained by taking the Fourier transform of Eq. (7-101). From a development paralleling that in the section on spectral density in Art. 7-4, it follows that the low-frequency approximation for the Fourier integral of the first term in Eq. (7-101) is p(1 − p)eĪ_c. Since the cathode current stream is space-charge limited, its shot-noise spectral density has a low-frequency approximation eĪ_cΓ². Using Eqs. (7-96), we can then express the low-frequency spectral density of the anode current as

    S_a(f) = p(1 − p)eĪ_c + p²[eĪ_cΓ² + Ī_c²δ(f)]

the impulse term being due to the average value of the cathode current. The shot-noise portion S_as(f) of the low-frequency spectral density of the anode current is therefore

    S_as(f) = (Ī_a/Ī_c)[eĪ_2 + eĪ_aΓ²]    (7-102)

which may be rewritten as

    S_as(f) = eĪ_a[Γ² + (Ī_2/Ī_c)(1 − Γ²)]    (7-103)
SHOT NOISE 143
Since 0 r
t
S 1 and 1
2
+ 1.r! s t; it follows from Eqs. (7102) and
(7103) that
(7104)
i.e., a spacechargelimited pentode generates more anode current noise
than does a spacechargelimited diode but less than a temperature
limited diode when all have the same average value of anode current.
The results presented above are based upon the assumption that the
screen grid voltage was maintained constant with respect to the cathode.
If this condition is not satisfied, a correction term must be added to
account for a correlation between screen grid and anode noise currents.
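The partition-noise bound can be checked with a small Monte Carlo experiment. The sketch below models a space-charge-smoothed cathode stream by sub-Poisson counts (a binomial stand-in with Fano factor Γ², chosen purely for illustration) and divides each electron at random between screen grid and anode; the anode-count Fano factor should land between Γ² and 1, mirroring Eqs. (7-103) and (7-104):

```python
import random

random.seed(1)

def anode_fano(num_frames, n_emit, q_emit, p_anode):
    """Cathode counts per frame: Binomial(n_emit, q_emit), a sub-Poisson
    stream with Fano factor (1 - q_emit) standing in for a smoothed
    stream with Gamma^2 = 1 - q_emit.  Each cathode electron reaches the
    anode independently with probability p_anode (screen-grid division)."""
    anode = []
    for _ in range(num_frames):
        n_c = sum(1 for _ in range(n_emit) if random.random() < q_emit)
        n_a = sum(1 for _ in range(n_c) if random.random() < p_anode)
        anode.append(n_a)
    mean = sum(anode) / len(anode)
    var = sum((x - mean) ** 2 for x in anode) / len(anode)
    return var / mean  # Fano factor of the anode counts

gamma_sq = 0.5   # assumed cathode smoothing factor
p = 0.7          # fraction of cathode current reaching the anode
fano = anode_fano(10000, 200, 1.0 - gamma_sq, p)
predicted = gamma_sq * p + (1.0 - p)   # the factor in Eq. (7-103)
# fano should approximate 'predicted' (0.65 here), between gamma_sq and 1.
```

By the law of total variance, randomly thinning a stream with Fano factor Γ² gives anode counts with Fano factor pΓ² + (1 − p), which is exactly the bracketed factor in Eq. (7-103).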
7-9. Problems

1.† Suppose that the emission times t_k are statistically independent random variables, that the probability that one electron is emitted during the short time interval Δτ is aΔτ ≪ 1, and that the probability that more than one electron is emitted during that interval is zero. Derive Eq. (7-17) by breaking the interval (0,τ) up into M subintervals of length Δτ = τ/M, finding the probability that no electron is emitted in any one of the subintervals, and letting M → ∞.

2. Assuming that the emission times t_k are statistically independent random variables, each with the probability density

    p(t_k) = 1/τ  for t ≤ t_k ≤ t + τ
    p(t_k) = 0    otherwise    (7-105)

derive Eq. (7-27).

3. Suppose that the electronic current pulse in a temperature-limited diode had the shape

    i_e(t) = …  for 0 ≤ t
    i_e(t) = 0  for t < 0    (7-106)

What would be the autocorrelation function and spectral density of the resultant shot noise?

4. The random process x(t) is a linear superposition of identical pulses i(t − t_k), where

    ∫_{−∞}^{+∞} i(t) dt = 0

The emission times t_k are statistically independent and have a uniform probability density, as in Eq. (7-105). The probability that K emission times occur in an interval of length τ is given by the Poisson probability distribution, Eq. (7-24). A new random process y(t) is defined by

    y(t) = x(t)x(t + a)    (7-107)

where a is fixed. Show that

    R_y(τ) = R_x²(τ) + R_x²(a) + R_x(a + τ)R_x(a − τ)    (7-108)

† Cf. Rice (I, Sec. 1.1).
5. Suppose that a parallel-plane thermionic diode with a Maxwell distribution of electron-emission velocities is operated with its anode potential V_a sufficiently negative with respect to the cathode that the minimum value of potential in the cathode-anode space occurs at the anode. Show that the anode current Ī_a is given by

    Ī_a = Ī_s exp(eV_a/kT_c)    (7-109)

where Ī_s is the saturation value of the emission current, that the space-charge-smoothing factor is unity, and hence that the low-frequency spectral density of the anode-current fluctuations is given by

    S_i(f) = eĪ_a = 2k(T_c/2)g_d    (7-110)

where g_d = ∂Ī_a/∂V_a is the diode conductance.

6. Suppose that an electron entering an electron multiplier produces n electrons out with a probability p_n and that the arrival time of an input electron has a uniform probability density, as in Eq. (7-105). Show that

    Ī_o = n̄ Ī_i

where Ī_o and Ī_i are the output and input currents, respectively, of the electron multiplier. Show further that the low-frequency spectral density of the output-current fluctuations is given by

    S_o(f) = σ²(n)eĪ_i + n̄²S_i(f)

where S_i(f) is the low-frequency spectral density of the input-current fluctuations.
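The electron-multiplier result of Problem 6 can be sanity-checked by simulation of the corresponding count statistics. The secondary-emission distribution p_n below is an assumed illustrative one, and the input is taken as a pure (temperature-limited) shot-noise stream, so that S_i(f) = eĪ_i:

```python
import random

random.seed(2)

# Assumed illustrative secondary-emission distribution p_n:
values = (2, 3, 4)
probs = (0.25, 0.5, 0.25)
n_bar = sum(v * p for v, p in zip(values, probs))                  # mean gain
var_n = sum(v * v * p for v, p in zip(values, probs)) - n_bar ** 2  # gain variance

lam = 50.0  # mean number of input electrons per counting frame

def poisson(mean):
    """Poisson sample by Knuth's multiplication method."""
    import math
    limit, k, prod = math.exp(-mean), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

totals = []
for _ in range(40000):
    n_in = poisson(lam)
    totals.append(sum(random.choices(values, probs, k=n_in)) if n_in else 0)

mean_out = sum(totals) / len(totals)
var_out = sum((x - mean_out) ** 2 for x in totals) / len(totals)
# Compound-Poisson statistics: mean_out ~ n_bar*lam and
# var_out ~ lam*(var_n + n_bar**2), mirroring
# S_o(f) = sigma^2(n) e I_i + n_bar^2 S_i(f) with S_i(f) = e I_i.
```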
CHAPTER 8
THE GAUSSIAN PROCESS
We saw in Chap. 5 that the probability distribution of a sum of independent random variables tends to become gaussian as the number of random variables being summed increases without limit. In Chap. 7 we saw that the shot noise generated in a thermionic vacuum tube is a gaussian process, and we shall observe in Chap. 9 that the voltage fluctuation produced by thermal agitation of electrons in a resistor is also a gaussian process. The gaussian random variable and the gaussian random process are therefore of special importance and warrant special study. We shall investigate some of their properties in this chapter.
8-1. The Gaussian Random Variable

A real random variable x having the probability density function

    p(x) = (1/√(2π)) exp(−x²/2)    (8-1)

shown in Fig. 8-1 is a gaussian random variable and has the characteristic function

    M_x(jv) = exp(−v²/2)    (8-2)

The corresponding probability distribution function† is

    P(x ≤ X) = (1/√(2π)) ∫_{−∞}^{X} e^{−x²/2} dx    (8-3)

An asymptotic expansion‡ for P(x ≤ X), valid for large values of X, is

    P(x ≤ X) = 1 − [e^{−X²/2}/√(2π)X][1 − 1/X² + 3/X⁴ − ···]    (8-4)

† Extensive tables of the gaussian p(x) and 2P(x ≤ X) − 1 are given in National Bureau of Standards (I).
‡ Cf. Dwight (I, Eq. 580).
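The quality of the expansion in Eq. (8-4) is easy to check against the exact tail probability, which Python exposes through the complementary error function:

```python
import math

def gauss_tail_exact(X):
    """P(x > X) for a standardized gaussian random variable."""
    return 0.5 * math.erfc(X / math.sqrt(2.0))

def gauss_tail_asymptotic(X, terms=3):
    """Leading terms of the expansion in Eq. (8-4):
    P(x > X) ~ [e^{-X^2/2}/(sqrt(2 pi) X)] [1 - 1/X^2 + 3/X^4 - ...]."""
    series, term = 0.0, 1.0
    for k in range(terms):
        series += term
        term *= -(2 * k + 1) / (X * X)  # next coefficient 1, 1, 3, 15, ...
    return math.exp(-X * X / 2.0) / (math.sqrt(2.0 * math.pi) * X) * series

# At X = 4 the three-term expansion is already within a few tenths of a
# percent of the exact tail probability:
exact = gauss_tail_exact(4.0)
approx = gauss_tail_asymptotic(4.0)
```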
Fig. 8-1. Gaussian probability functions.
Since the gaussian probability density function defined above is an even function of x, the nth moment

    E(x^n) = (1/√(2π)) ∫_{−∞}^{+∞} x^n e^{−x²/2} dx

is zero for odd values of n:

    E(x^n) = 0    (n odd)    (8-5a)

When n ≥ 2 is even, we obtain

    E(x^n) = 1·3·5 ··· (n − 1)    (8-5b)

either by direct evaluation of the integral,† or by successive differentiations of the gaussian characteristic function. In particular,

    E(x) = 0  and  σ²(x) = E(x²) = 1

Consider the new random variable

    y = σx + m    (8-6)

where x is a gaussian random variable as above. From Eqs. (8-5) and (8-6), it follows that

    E(y) = m    (8-7)
and
    σ²(y) = σ²    (8-8)

† See, for example, Cramér (I, Art. 10.5) or Dwight (I, Eq. 861.7).
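The moment formulas (8-5a) and (8-5b) can be verified by direct numerical evaluation of the defining integral:

```python
import math

def gaussian_moment(n, xmax=12.0, steps=100000):
    """E(x^n) for the standardized gaussian density, by the trapezoidal
    rule on [-xmax, xmax] (the tails beyond are negligible)."""
    h = 2.0 * xmax / steps
    total = 0.0
    for i in range(steps + 1):
        x = -xmax + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * (x ** n) * math.exp(-x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

def double_factorial_odd(n):
    """1*3*5*...*(n-1) for even n, per Eq. (8-5b)."""
    out = 1
    for k in range(1, n, 2):
        out *= k
    return out

# E(x^2) = 1, E(x^4) = 3, E(x^6) = 15; all odd moments vanish.
```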
The probability density function of y is, using Eq. (3-43),

    p(y) = [1/√(2π)σ] exp[−(y − m)²/2σ²]    (8-9)

The corresponding characteristic function is

    M_y(jv) = exp(jvm − v²σ²/2)    (8-10)

The random variable y is a gaussian random variable with mean m and variance σ².

The nth central moment of the random variable y is given by

    μ_n = E[(y − m)^n] = σ^n E(x^n)

It therefore follows, from Eqs. (8-5), that

    μ_n = 0    for n odd    (8-11a)
and
    μ_n = 1·3·5 ··· (n − 1)σ^n    when n ≥ 2 is even    (8-11b)
8-2. The Bivariate Distribution

Suppose that x₁ and x₂ are statistically independent gaussian random variables with zero means and with variances σ₁² and σ₂², respectively. Their joint probability density function is, from Eqs. (3-40) and (8-9),

    p(x₁,x₂) = (1/2πσ₁σ₂) exp(−x₁²/2σ₁² − x₂²/2σ₂²)    (8-12)

Let us now define two random variables y₁ and y₂ in terms of x₁ and x₂ by the rotational transformation:

    y₁ = x₁ cos θ − x₂ sin θ
    y₂ = x₁ sin θ + x₂ cos θ    (8-13)

The means of y₁ and y₂ are both zero, and the variances are

    μ₂₀ = E(y₁²) = σ₁² cos²θ + σ₂² sin²θ
    μ₀₂ = E(y₂²) = σ₁² sin²θ + σ₂² cos²θ    (8-14)

The covariance is

    μ₁₁ = E(y₁y₂) = (σ₁² − σ₂²) sin θ cos θ    (8-15)

which in general is not zero.
The joint probability density function of 1/1 and YIcan now be obtained
148 RANDOM SIGNALS AND NOISE
from that of XI andzs. On solving Eq. (813) for Xl and lnterma of
Yl and yt, we get
Xl = Yl COS 8 + Yt sin 8
X2 = Yl sin 8 +Y" cos 8
Le., the inverse of a rotation is a rotation. The Jacobian of this trans
formation is
aXI aX2
cos 8  sin 8
IJI =
aYl
Yl
= = 1
aXt ax!
aY2
ay"
sin 6 cos 8
and it therefore follows, using Eqs. (354) and (812), that
( )
_ 1 [(Yl cos 8 +Y2 sin 8)2 (Ylsin 8 +Y2 cos 6)2]
P Yl,Y2  0:: exp  2 2  2 "
0"1 0'2
This result may be expressed in terms of the various secondorder
moments of Yl and Y2 &s
p(Yt Y2) = 1 exp [ + 2P.llYIY2  P."OY2
2
] (816)
, 2""("'20"'02  2("'20"'02  #0111") ..
The random variables Yl and YI are said to have a biuariate gaussian
probability density function. Hence two random variables u, and y"
which have a bivariate gaussian probability density function, as in Eq.
(816), can be transformed into two independent gaussian random vari
ables by a rotation of coordinates.
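Equations (8-14) and (8-15) are easy to tabulate directly; the variances and angle below are arbitrary illustrative values. A byproduct worth noticing is that the quantity μ₂₀μ₀₂ − μ₁₁², which appears in the normalization of Eq. (8-16), always equals σ₁²σ₂², since a rotation does not change the determinant of the covariance matrix:

```python
import math

def rotated_moments(var1, var2, theta):
    """Second-order moments of y1, y2 from Eqs. (8-14) and (8-15) when
    independent zero-mean gaussians x1, x2 (variances var1, var2) are
    rotated through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    mu20 = var1 * c * c + var2 * s * s
    mu02 = var1 * s * s + var2 * c * c
    mu11 = (var1 - var2) * s * c
    return mu20, mu02, mu11

# Arbitrary illustrative values: var1 = 4, var2 = 1, theta = 0.6 rad.
mu20, mu02, mu11 = rotated_moments(4.0, 1.0, 0.6)
# mu20 + mu02 equals var1 + var2, and mu20*mu02 - mu11**2 equals
# var1*var2, for every theta.  With var1 == var2 the covariance mu11
# vanishes identically, as Eq. (8-15) shows.
```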
The joint characteristic function of y₁ and y₂ can similarly be obtained from that of x₁ and x₂, and is

    M(jv₁,jv₂) = exp[−½(μ₂₀v₁² + 2μ₁₁v₁v₂ + μ₀₂v₂²)]    (8-17)

General Formulas. The random variables

    Y₁ = y₁/μ₂₀^{1/2}  and  Y₂ = y₂/μ₀₂^{1/2}    (8-18)

are standardized random variables, since

    E(Y₁) = 0 = E(Y₂)
    σ²(Y₁) = 1 = σ²(Y₂)    (8-19)
Their joint probability density function is, from Eq. (8-16),

    p(Y₁,Y₂) = [1/2π(1 − ρ²)^{1/2}] exp[−(Y₁² − 2ρY₁Y₂ + Y₂²)/2(1 − ρ²)]    (8-20)

where ρ is the correlation coefficient of Y₁ and Y₂. Equation (8-20) gives the general form of the joint probability density function of two gaussian standardized random variables. The corresponding characteristic function is

    M(jv₁,jv₂) = exp[−½(v₁² + 2ρv₁v₂ + v₂²)]    (8-21)

which follows from Eq. (8-17).
The general gaussian joint probability density function of the two real random variables y₁ and y₂ with means m₁ and m₂, variances σ₁² and σ₂², and correlation coefficient ρ is

    p(y₁,y₂) = [1/2πσ₁σ₂(1 − ρ²)^{1/2}]
               exp{−[σ₂²(y₁ − m₁)² − 2σ₁σ₂ρ(y₁ − m₁)(y₂ − m₂) + σ₁²(y₂ − m₂)²]/2σ₁²σ₂²(1 − ρ²)}    (8-22)

Their joint characteristic function is

    M(jv₁,jv₂) = exp[j(m₁v₁ + m₂v₂) − ½(σ₁²v₁² + 2σ₁σ₂ρv₁v₂ + σ₂²v₂²)]    (8-23)
Dependence and Independence. Let Y₁ and Y₂ be standardized gaussian random variables. The conditional probability density function of Y₂ given Y₁ may be obtained by dividing Eq. (8-20) by p(Y₁) and is

    p(Y₂|Y₁) = {1/[2π(1 − ρ²)]^{1/2}} exp[−(Y₂ − ρY₁)²/2(1 − ρ²)]    (8-24)

The conditional probability density of Y₂ given Y₁ is therefore gaussian with mean ρY₁ and variance (1 − ρ²).

Suppose that Y₁ and Y₂ are uncorrelated, i.e., that ρ = 0. It follows from Eq. (8-20) that

    p(Y₁,Y₂) = (1/2π) exp[−(Y₁² + Y₂²)/2] = p(Y₁)p(Y₂)    (8-25)

Thus we see that if two standardized gaussian random variables are uncorrelated, they are also statistically independent. It follows from this that if any two gaussian random variables are uncorrelated, they are independent.
Linear Transformations.† It often becomes convenient to use a matrix notation‡ when dealing with multiple random variables. For example, let y be the column matrix of the two random variables y₁ and y₂:

    y = [y₁ y₂]'    (8-26)

and let v be the column matrix of v₁ and v₂:

    v = [v₁ v₂]'    (8-27)

whose transpose is the row matrix v' = [v₁ v₂]. Then since

    v'y = v₁y₁ + v₂y₂

we may express the characteristic function of y₁ and y₂ as

    M(jv₁,jv₂) = M(jv) = E[exp(jv'y)]    (8-28)

In particular, if y₁ and y₂ are gaussian random variables, we can write, from Eq. (8-23),

    M(jv) = exp(jm'v − ½v'Λv)    (8-29)

where m is the column matrix of the means

    m = [m₁ m₂]'    (8-30)

and Λ is the covariance matrix:

    Λ = [ σ₁²     ρσ₁σ₂
          ρσ₁σ₂   σ₂²  ]    (8-31)

† Cf. Cramér (I, Arts. 22.6 and 24.4) and Courant and Hilbert (I, Chap. 1).
‡ See, for example, Hildebrand (I, Chap. 1).
Suppose now that the random variables y₁ and y₂ have zero means and that they are transformed into the random variables z₁ and z₂ by the linear transformation

    z₁ = a₁₁y₁ + a₁₂y₂
    z₂ = a₂₁y₁ + a₂₂y₂    (8-32)

This transformation may be expressed in matrix form as

    z = Ay    (8-33)

where A is the transformation matrix

    A = [ a₁₁  a₁₂
          a₂₁  a₂₂ ]    (8-34)

and z is the column matrix of z₁ and z₂.

Since y₁ and y₂ have zero means, so do z₁ and z₂:

    E(z₁) = 0 = E(z₂)

The variances and covariance of z₁ and z₂ are

    σ²(z₁) = a₁₁²σ₁² + 2a₁₁a₁₂σ₁σ₂ρ + a₁₂²σ₂²
    σ²(z₂) = a₂₁²σ₁² + 2a₂₁a₂₂σ₁σ₂ρ + a₂₂²σ₂²    (8-35)
    E(z₁z₂) = a₁₁a₂₁σ₁² + (a₁₁a₂₂ + a₂₁a₁₂)σ₁σ₂ρ + a₁₂a₂₂σ₂²

Direct calculation then shows that the covariance matrix V of z₁ and z₂ may be expressed in terms of Λ by

    V = AΛA'    (8-36)

The characteristic function of z₁ and z₂ may be written as

    M_z(ju) = E[exp(ju'z)]    (8-37)

from Eq. (8-28). This becomes, from Eq. (8-33),

    M_z(ju) = E[exp(ju'Ay)] = E[exp(jw'y)] = M_y(jw)

where we have defined

    w = A'u

If y₁ and y₂ are gaussian random variables with zero means, their characteristic function is

    M_y(jw) = exp(−½w'Λw)

from Eq. (8-29). It follows from the definition of w, and from Eq. (8-36), that

    w'Λw = u'AΛA'u = u'Vu    (8-38)

From this result, and from Eq. (8-37), we then see that the characteristic function of z₁ and z₂ is

    M_z(ju) = exp(−½u'Vu)    (8-39)

which is the characteristic function of a pair of gaussian random variables with a covariance matrix V. Thus we have shown that a linear transformation of a pair of gaussian random variables results in a pair of gaussian random variables. The development leading to Eq. (8-16) is a special case of this result.
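The covariance-transformation rule (8-36) can be checked by sampling: draw correlated gaussian pairs, apply a transformation A, and compare the sample covariance of z with AΛA'. The matrices below are arbitrary illustrative choices:

```python
import math
import random

random.seed(3)

def matmul2(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(P):
    return [[P[j][i] for j in range(2)] for i in range(2)]

# Covariance matrix Lambda of (y1, y2): variances 2 and 1, correlation 0.5.
Lam = [[2.0, 0.5 * math.sqrt(2.0)], [0.5 * math.sqrt(2.0), 1.0]]
A = [[1.0, 1.0], [1.0, -2.0]]              # an arbitrary transformation

V_pred = matmul2(matmul2(A, Lam), transpose2(A))   # Eq. (8-36)

# Sample y by a Cholesky-style construction, then form z = A y.
l11 = math.sqrt(Lam[0][0])
l21 = Lam[0][1] / l11
l22 = math.sqrt(Lam[1][1] - l21 * l21)
acc = [[0.0, 0.0], [0.0, 0.0]]
n = 100000
for _ in range(n):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    y = (l11 * x1, l21 * x1 + l22 * x2)
    z = (A[0][0] * y[0] + A[0][1] * y[1], A[1][0] * y[0] + A[1][1] * y[1])
    for i in range(2):
        for j in range(2):
            acc[i][j] += z[i] * z[j]
V_sample = [[acc[i][j] / n for j in range(2)] for i in range(2)]
# V_sample approximates V_pred entry by entry.
```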
8-3. The Multivariate Distribution†

The multivariate joint probability density function of N standardized gaussian real random variables Y_n is defined by

    p(Y₁, . . . , Y_N) = exp[−(1/2|ρ|) Σ_{n=1}^{N} Σ_{m=1}^{N} |ρ|_{nm} Y_nY_m] / (2π)^{N/2}|ρ|^{1/2}    (8-40)

where |ρ|_{nm} is the cofactor of the element ρ_{nm} in the determinant |ρ|, and where |ρ| is the determinant of the correlation matrix

    ρ = [ ρ₁₁  ρ₁₂  ···  ρ₁N
          ρ₂₁  ρ₂₂  ···  ρ₂N
          ···
          ρN₁  ρN₂  ···  ρNN ]    (8-41)

in which

    ρ_{nm} = E(Y_nY_m)  and  ρ_{nn} = 1    (8-42)

The corresponding gaussian joint characteristic function is

    M_Y(jv₁, . . . , jv_N) = exp(−½ Σ_{n=1}^{N} Σ_{m=1}^{N} ρ_{nm}v_nv_m)    (8-43)

which may be expressed in matrix form as

    M_Y(jv) = exp(−½v'ρv)    (8-44)

where v is the column matrix

    v = [v₁ ··· v_N]'    (8-45)

The joint probability density function of N gaussian real random variables y_n with means m_n and variances σ_n² can be obtained from Eq. (8-40) and is

    p(y₁, . . . , y_N) = exp[−(1/2|Λ|) Σ_{n=1}^{N} Σ_{m=1}^{N} |Λ|_{nm}(y_n − m_n)(y_m − m_m)] / (2π)^{N/2}|Λ|^{1/2}    (8-46)

where |Λ|_{nm} is the cofactor of the element λ_{nm} in the determinant |Λ| of the covariance matrix

    Λ = [ λ₁₁  λ₁₂  ···  λ₁N
          λ₂₁  λ₂₂  ···  λ₂N
          ···
          λN₁  λN₂  ···  λNN ]    (8-47)

in which

    λ_{nm} = E[(y_n − m_n)(y_m − m_m)]    (8-48)

† Cf. Cramér (I, Chap. 24).
The corresponding multivariate gaussian characteristic function is

    M_y(jv₁, . . . , jv_N) = exp(j Σ_{n=1}^{N} m_nv_n − ½ Σ_{n=1}^{N} Σ_{m=1}^{N} λ_{nm}v_nv_m)    (8-49)

which may be expressed in matrix form as

    M_y(jv) = exp(jm'v − ½v'Λv)    (8-50)

where m is the column matrix of the means:

    m = [m₁ ··· m_N]'    (8-51)

It should be noted that Eq. (8-50) contains Eq. (8-29) as a special case.

Suppose now that the N standardized gaussian random variables Y_n are uncorrelated, i.e., that

    ρ_{nm} = 1  when n = m
    ρ_{nm} = 0  when n ≠ m    (8-52)

The joint probability density function of the Y_n then becomes, from Eq. (8-40),

    p(Y₁, . . . , Y_N) = [1/(2π)^{N/2}] exp(−½ Σ_{n=1}^{N} Y_n²)    (8-53)
                      = Π_{n=1}^{N} p(Y_n)    (8-54)

Therefore if N gaussian random variables are all uncorrelated, they are also statistically independent.
Suppose next that N gaussian random variables y_n, with zero means, are transformed into the N random variables z_n by the linear transformation

    z₁ = a₁₁y₁ + a₁₂y₂ + ··· + a₁Ny_N
    z₂ = a₂₁y₁ + a₂₂y₂ + ··· + a₂Ny_N
    ···
    z_N = a_N1y₁ + a_N2y₂ + ··· + a_NNy_N

Let A be the matrix of this transformation:

    A = [ a₁₁  ···  a₁N
          ···
          a_N1 ···  a_NN ]    (8-55)

We can then write

    z = Ay    (8-56)

where z is the column matrix of the z's and y is the column matrix of the y's. It can then be shown by direct calculation that the covariance matrix V of the z's is related to the covariance matrix Λ of the y's by

    V = AΛA'    (8-57)

as in the bivariate case. The matrix argument used in the bivariate case to show that the linearly transformed variables are gaussian applies without change to the multivariate case. Hence the z_n are also gaussian random variables. That is, a linear transformation of gaussian random variables yields gaussian random variables.
8-4. The Gaussian Random Process

A random process is said to be a gaussian random process if, for every finite set of time instants t_n, the random variables x_n = x_{t_n} have a gaussian joint probability density function. If the given process is real, the joint probability density function of the N random variables x_n is, from Eq. (8-46),

    p(x₁, . . . , x_N) = exp[−(1/2|Λ|) Σ_{n=1}^{N} Σ_{m=1}^{N} |Λ|_{nm}(x_n − m_n)(x_m − m_m)] / (2π)^{N/2}|Λ|^{1/2}    (8-58)

where

    m_n = E(x_n) = E(x_{t_n})    (8-59)

and where Λ is the covariance matrix of elements

    λ_{nm} = E[(x_n − m_n)(x_m − m_m)] = R_x(t_n,t_m) − m_nm_m    (8-60)

If the process in question is a wide-sense stationary random process, then R_x(t_n,t_m) = R_x(t_n − t_m) and m_n = m_m = m. The covariances, and hence the joint probability densities, then become functions of the time differences (t_n − t_m) and not of t_n and t_m separately. It therefore follows that a wide-sense stationary gaussian random process is also stationary in the strict sense.

If the given process is a complex random process, then the N complex random variables

    z_n = z_{t_n} = x_n + jy_n

and the real random variables x_n and y_n together have a 2N-dimensional gaussian joint probability density function.
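The defining property above also gives a practical recipe for sampling a gaussian process at a finite set of instants: build the covariance matrix from R via Eq. (8-60), factor it, and transform independent standardized gaussians. The covariance kernel below is an assumed illustrative choice:

```python
import math
import random

random.seed(4)

def cholesky(M):
    """Lower-triangular L with L L' = M (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

# Sample a zero-mean stationary process with R(t, s) = exp(-|t - s|)
# (an assumed illustrative kernel) at N time instants:
times = [0.1 * n for n in range(20)]
R = [[math.exp(-abs(t - s)) for s in times] for t in times]
L = cholesky(R)
x = [random.gauss(0.0, 1.0) for _ in times]
sample = [sum(L[i][k] * x[k] for k in range(i + 1)) for i in range(len(times))]
# 'sample' is one draw of (x_{t_1}, ..., x_{t_N}) with exactly the
# covariance matrix of Eqs. (8-58)-(8-60), since L x is a linear
# transformation of independent gaussians.
```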
Linear Transformations. Suppose that the random process having the sample functions x(t) is a real gaussian random process and that the integral

    y = ∫_a^b x(t)a(t) dt    (8-61)

exists in the sense of Art. 4-7, where a(t) is a continuous real function of t. We will now show that the random variable y is a real gaussian random variable.

Let the interval a < t ≤ b be partitioned into N subintervals t_{n−1} < t ≤ t_n, n = 1, . . . , N, where t₀ = a and t_N = b, and consider the approximating sum

    y_N = Σ_{n=1}^{N} x(t_n)a(t_n) Δt_n    (8-62)

where Δt_n = t_n − t_{n−1}. The average value of the approximating sum is

    E(y_N) = Σ_{n=1}^{N} E(x_n)a(t_n) Δt_n

If the average of the random variable y exists, i.e., if

    E(y) = ∫_a^b E(x_t)a(t) dt < +∞    (8-63)

and if the average of the random process E(x_t) is a continuous function of t, then†

    lim_{N→∞} E(y_N) = E(y)    (8-64)

where all the Δt_n go to zero as N → ∞. The mean square of the approximating sum is

    E(y_N²) = Σ_{n=1}^{N} Σ_{m=1}^{N} E(x_nx_m)a(t_n)a(t_m) Δt_n Δt_m

If the mean-square value of y exists, i.e., if

    E(y²) = ∫_a^b ∫_a^b E(x_tx_s)a(t)a(s) dt ds < +∞    (8-65)

and if the correlation E(x_tx_s) is a continuous function of t and s, then

    lim_{N→∞} E(y_N²) = E(y²)    (8-66)

It therefore follows from Eqs. (8-64) and (8-66) that

    lim_{N→∞} σ²(y_N) = σ²(y)    (8-67)

† Courant (I, Vol. I, pp. 131ff.).
We can now show that y_N converges in the mean to y. Consider

    E[(y − y_N)²] = E(y²) − 2E(yy_N) + E(y_N²)    (8-68)

We may express E(yy_N) in the form

    E(yy_N) = Σ_{n=1}^{N} [∫_a^b E(x_tx_n)a(t) dt] a(t_n) Δt_n

Since E(x_tx_n) is continuous in t_n, so is the term in square brackets. Hence, from Eq. (8-65),

    lim_{N→∞} E(yy_N) = ∫_a^b a(s) ∫_a^b E(x_tx_s)a(t) dt ds = E(y²)    (8-69)

It then follows, on substituting from Eqs. (8-66) and (8-69) into Eq. (8-68), that

    lim_{N→∞} E[(y − y_N)²] = 0    (8-70)

Thus the approximating sums converge in the mean, and hence in probability, to the random variable y. It further follows, from the results of Art. 4-6, that the probability distribution functions of the approximating sums converge to the probability distribution function of y. It now remains to find the form of the probability distribution function of y.

Equation (8-62) defines y_N as a linear transformation of a set of gaussian random variables. The approximating sum y_N is therefore a gaussian random variable and has the characteristic function, from Eq. (8-10),

    M_{y_N}(jv) = exp[jvE(y_N) − v²σ²(y_N)/2]

It then follows from Eqs. (8-64) and (8-67) that the limiting form of this characteristic function is also gaussian:

    lim_{N→∞} M_{y_N}(jv) = exp[jvE(y) − v²σ²(y)/2]    (8-71)

Since we have already shown that the probability distribution functions of the approximating sums converge to the probability distribution function of y, and since the limiting form of the characteristic function of y_N is continuous in v about v = 0, it follows that†

    lim_{N→∞} M_{y_N}(jv) = M_y(jv)    (8-72)

and hence, from Eq. (8-71), that y is a real gaussian random variable.

† Cramér (I, Art. 10.4).
If the random process having the sample functions x(t) is a complex gaussian random process, the above argument may be applied separately to the real and imaginary parts of x(t) to show that the random variable y defined by Eq. (8-61) is a complex gaussian random variable.

Under suitable integrability conditions, it can similarly be shown that if the random process having the sample functions x(t) is a gaussian random process, and if the integral

    y(t) = ∫_{−∞}^{+∞} x(τ)b(t,τ) dτ    (8-73)

exists, where b(t,τ) is a continuous function of t and τ, then the random process having the sample functions y(t) is also a gaussian random process.

Orthogonal Expansions. Let x(t) be a sample function from a stationary gaussian random process. It then follows, from Art. 6-4 and the preceding section, that we may represent that process over the interval a < t ≤ b by the Fourier series

    x(t) = Σ_{n=−∞}^{+∞} x_n e^{jnω₀t}    where ω₀ = 2π/(b − a)    (8-74)

where the coefficients

    x_n = [1/(b − a)] ∫_a^b x(t)e^{−jnω₀t} dt    (8-75)

are complex gaussian random variables. Since these coefficients become uncorrelated in the limit as (b − a) → ∞, they also become statistically independent.

It also follows from the results of Art. 6-4 that the gaussian random process having the sample function x(t) may be represented, whether stationary or not, over the interval [a,b] by the general Fourier series

    x(t) = Σ_n σ_nx_nφ_n(t)    (8-76)

where the |σ_n|² and the φ_n(t) are the characteristic values and characteristic functions, respectively, of the integral equation

    ∫_a^b R(t,s)φ(s) ds = λφ(t)    (8-77)

and where the coefficients

    x_n = (1/σ_n) ∫_a^b x(t)φ_n*(t) dt    (8-78)

are uncorrelated (and hence statistically independent) gaussian random variables.
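The integral equation (8-77) can be attacked numerically by discretizing the kernel and extracting characteristic values and functions. The sketch below uses the Wiener-process kernel min(t, s) as an assumed illustrative example, for which the exact answers are known:

```python
import math

# Discretize the integral equation on [0, 1] by the midpoint rule for the
# kernel R(t, s) = min(t, s) (the Wiener-process covariance, an assumed
# illustrative choice whose exact characteristic functions are
# sin[(k - 1/2) pi t] with characteristic values 1/[(k - 1/2) pi]^2).
N = 100
h = 1.0 / N
t = [(i + 0.5) * h for i in range(N)]
K = [[min(ti, sj) * h for sj in t] for ti in t]  # kernel times quadrature weight

def power_iteration(M, iters=100):
    """Largest characteristic value and vector of M by power iteration."""
    v = [1.0] * len(M)
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(M))) for row in M]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam, v

lam, phi = power_iteration(K)
# The largest exact characteristic value is (2/pi)^2, about 0.405; the
# discretized value approaches it as N grows, and phi approximates
# sin(pi t / 2) up to normalization.
```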
158 RANDOM SIGNALS AND NOISl)
85. The Narrowband Gaussian Random Process]
A random process is said to be a narrowband random proce,. if the
width III of the significant region of its spectral density is small com
pared to the center frequency Ie of that region. A typical narrowhand
spectral density is pictured in Fig. 82. If a sample function of such a
random process is examined with an oscilloscope, what appears to be
sz.Y]
4 , f
FIG. 82. A narrowband spectral density.
a sinusoidal wave with slowly varying envelope (amplitude) and phase
will be seen. That is, a sample function x(t) of a narrowband random
process appears to be expressible in the form
x(t) = V(t) cos [Wet + t1'(t)] (879)
where Ie = w
e
/ 21r is the mean frequency of the spectral band and where
the envelope Vet) and the phase cP(t) are slowly varying functions of time.
Formally, such a representation need not be restricted to narrowband
waves; the concept of an envelope has significance, however, only when
the variations of Vet) and q,(t) are slow compared to those of cos Cl)et.
Envelope and Phase Probability Distributions. Let us now determine some of the statistical properties of the envelope and phase when the narrow-band random process in question is a stationary gaussian random process. To this end, it is convenient to represent the given process over the interval 0 < t ≤ T by the Fourier series

    x(t) = Σ_{n=1}^{∞} (x_cn cos nω₀t + x_sn sin nω₀t)    (8-80)

where ω₀ = 2π/T,

    x_cn = (2/T) ∫₀^T x(t) cos nω₀t dt    (8-81a)
and
    x_sn = (2/T) ∫₀^T x(t) sin nω₀t dt    (8-81b)

It then follows from the results of Art. 6-4 and the preceding article that these coefficients are gaussian random variables which become uncorrelated as the duration of the expansion interval increases without limit. The mean frequency of the narrow spectral band may be introduced by

† Cf. Rice (I, Art. 3.7).
writing nω₀ in Eq. (8-80) as (nω₀ − ω_c) + ω_c, where ω_c = 2πf_c, and expanding the sine and cosine factors. In this way, we can obtain the expression

    x(t) = x_c(t) cos ω_ct − x_s(t) sin ω_ct    (8-82)

where we have defined

    x_c(t) = Σ_{n=1}^{∞} [x_cn cos(nω₀ − ω_c)t + x_sn sin(nω₀ − ω_c)t]    (8-83a)

    x_s(t) = Σ_{n=1}^{∞} [x_cn sin(nω₀ − ω_c)t − x_sn cos(nω₀ − ω_c)t]    (8-83b)

It then follows from Eqs. (8-79) and (8-82) that

    x_c(t) = V(t) cos φ(t)    (8-84a)
and
    x_s(t) = V(t) sin φ(t)    (8-84b)

and hence that

    V(t) = [x_c²(t) + x_s²(t)]^{1/2}    (8-85a)
and
    φ(t) = tan⁻¹[x_s(t)/x_c(t)]    (8-85b)

where 0 ≤ V(t) and 0 ≤ φ(t) ≤ 2π. Since the only nonvanishing terms in the sums in Eqs. (8-83) are those for which the values of nf₀ fall in the given narrow spectral band, the sample functions x_c(t) and x_s(t) have frequency components only in a band of width Δf centered on zero frequency. The frequency components of the envelope and phase are therefore confined to a similar region about zero frequency.
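The decomposition into x_c(t) and x_s(t) suggests a simple numerical demodulation: mix with cos ω_ct and sin ω_ct and average out the double-frequency terms. The test wave below has a known constant envelope and phase (assumed values), so the recovery can be checked exactly:

```python
import math

def quadrature_components(x, fc, fs, navg):
    """Recover x_c(t) and x_s(t) from samples of a narrow-band wave by
    mixing with cos/sin at the center frequency and averaging over an
    integer number of carrier cycles (a crude low-pass filter)."""
    wc = 2.0 * math.pi * fc
    n = len(x)
    xc_raw = [2.0 * x[i] * math.cos(wc * i / fs) for i in range(n)]
    xs_raw = [-2.0 * x[i] * math.sin(wc * i / fs) for i in range(n)]
    def smooth(sig):
        return [sum(sig[i:i + navg]) / navg for i in range(n - navg)]
    return smooth(xc_raw), smooth(xs_raw)

# Test wave with a known constant envelope and phase (assumed values):
fs, fc = 1000.0, 100.0
V0, phi0 = 2.0, 0.7
x = [V0 * math.cos(2 * math.pi * fc * i / fs + phi0) for i in range(2000)]
xc, xs = quadrature_components(x, fc, fs, navg=int(fs / fc))
# Per Eqs. (8-84)-(8-85): xc ~ V0*cos(phi0), xs ~ V0*sin(phi0), so
# sqrt(xc**2 + xs**2) recovers the envelope V0.
```

Averaging over a full carrier period removes the 2ω_c mixing products exactly for this discrete test signal, which is why the recovery below is accurate to machine precision.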
The random variables x_ct and x_st, which refer to the possible values of x_c(t) and x_s(t), respectively, are defined as sums of gaussian random variables, and hence are gaussian random variables. Their means are zero,

    E(x_ct) = 0 = E(x_st)    (8-86)

since the original process has a zero mean. The mean square of x_ct is, from Eq. (8-83a),

    E(x_ct²) = Σ_{n=1}^{∞} Σ_{m=1}^{∞} [E(x_cnx_cm) cos(nω₀ − ω_c)t cos(mω₀ − ω_c)t
             + E(x_cnx_sm) cos(nω₀ − ω_c)t sin(mω₀ − ω_c)t
             + E(x_snx_cm) sin(nω₀ − ω_c)t cos(mω₀ − ω_c)t
             + E(x_snx_sm) sin(nω₀ − ω_c)t sin(mω₀ − ω_c)t]

It then follows, on using the limiting properties of the coefficients, that as T → ∞,

    E(x_ct²) = lim_{T→∞} Σ_n E(x_cn²)[cos²(nω₀ − ω_c)t + sin²(nω₀ − ω_c)t]
             = 2 ∫₀^∞ S_x(f) df = E(x_t²)    (8-87)
where S_x(f) is the spectral density of the gaussian random process. Similarly, it may be shown that

    E(x_st²) = E(x_t²)

and hence, using Eq. (8-86), that

    σ²(x_ct) = σ²(x_st) = σ_x²

where σ_x = σ(x_t). The covariance of x_ct and x_st is

    E(x_ctx_st) = Σ_{n=1}^{∞} Σ_{m=1}^{∞} [E(x_cnx_cm) cos(nω₀ − ω_c)t sin(mω₀ − ω_c)t
                − E(x_cnx_sm) cos(nω₀ − ω_c)t cos(mω₀ − ω_c)t
                + E(x_snx_cm) sin(nω₀ − ω_c)t sin(mω₀ − ω_c)t
                − E(x_snx_sm) sin(nω₀ − ω_c)t cos(mω₀ − ω_c)t]

which becomes, as T → ∞,

    E(x_ctx_st) = lim_{T→∞} Σ_n E(x_cn²)[cos(nω₀ − ω_c)t sin(nω₀ − ω_c)t − sin(nω₀ − ω_c)t cos(nω₀ − ω_c)t]

Hence

    E(x_ctx_st) = 0    (8-88)

The random variables x_ct and x_st are therefore independent gaussian random variables with zero means and variances σ_x². Their joint probability density then is, from Eq. (8-12),

    p(x_ct,x_st) = (1/2πσ_x²) exp[−(x_ct² + x_st²)/2σ_x²]    (8-89)
The joint probability density function of the envelope and phase random
variables V_t and φ_t, respectively, may now be found from that of
x_ct and x_st. From Eq. (8-84), the Jacobian of the transformation from
x_ct and x_st to V_t and φ_t is seen to be

|J| = V_t

It therefore follows from Eqs. (3-54) and (8-89) that

p(V_t, φ_t) = (V_t/2πσ_x²) exp(−V_t²/2σ_x²)   for V_t ≥ 0 and 0 ≤ φ_t ≤ 2π    (8-90)
            = 0   otherwise
The probability density function of V_t can be obtained by integrating
this result over φ_t from 0 to 2π and is†

p(V_t) = (V_t/σ_x²) exp(−V_t²/2σ_x²)   for V_t ≥ 0    (8-91)
       = 0   otherwise

† Cf. Chap. 3, Prob. 13.
THE GAUSSIAN PROCESS 161
This is the Rayleigh probability density function and is shown in Fig. 8-3.
The probability density function of φ_t can be obtained by integrating
Eq. (8-90) over V_t and is†

p(φ_t) = 1/2π   if 0 ≤ φ_t ≤ 2π    (8-92)
       = 0   otherwise
The random phase angle is therefore uniformly distributed. It then
follows from Eqs. (8-90), (8-91), and (8-92) that

p(V_t, φ_t) = p(V_t)p(φ_t)

and hence that V_t and φ_t are independent random variables. As we shall
show later, however, the envelope and phase random processes, which have
the sample functions V(t) and φ(t), respectively, are not independent
random processes.

FIG. 8-3. The Rayleigh probability density function.
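The Rayleigh result of Eq. (8-91) is easy to check by simulation. The following is a minimal Python sketch (not part of the text; the seed and sample size are arbitrary choices) that draws the two independent zero-mean gaussian quadrature components and compares the empirical distribution of the envelope with the integrated Rayleigh density:

```python
import math
import random

random.seed(1)
sigma = 1.0
n = 200_000

# Envelope V = (x_c^2 + x_s^2)^(1/2) of two independent zero-mean
# gaussian components, as in Eq. (8-84).
vals = [math.hypot(random.gauss(0.0, sigma), random.gauss(0.0, sigma))
        for _ in range(n)]

# Compare the empirical distribution with the Rayleigh distribution
# function F(v) = 1 - exp(-v^2 / 2 sigma^2), obtained by integrating
# the density of Eq. (8-91).
for v in (0.5, 1.0, 2.0):
    empirical = sum(1 for x in vals if x <= v) / n
    theoretical = 1.0 - math.exp(-v * v / (2.0 * sigma * sigma))
    assert abs(empirical - theoretical) < 0.01
```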
Joint Densities. Next let us determine the joint probability densities of
V₁ = V_t and V₂ = V_{t−τ} and of φ₁ = φ_t and φ₂ = φ_{t−τ} from the joint
probability density of the random variables x_c1 = x_ct, x_s1 = x_st,
x_c2 = x_c(t−τ), and x_s2 = x_s(t−τ). It follows from Eqs. (8-83) that
x_c1, x_c2, x_s1, and x_s2 are gaussian; it remains only to find their
covariances in order to determine their joint probability density function.
From Eq. (8-87), we have

σ²(x_c1) = σ²(x_s1) = σ²(x_c2) = σ²(x_s2) = σ_x²    (8-93)

and from Eq. (8-88),

E(x_c1 x_s1) = 0 = E(x_c2 x_s2)    (8-94)

† Cf. Chap. 3, Prob. 13.
The covariance of x_c1 and x_c2 is

R_c(τ) = E(x_c1 x_c2)
       = Σ_n Σ_m [ E(x_cn x_cm) cos(nω₀ − ω_c)t cos(mω₀ − ω_c)(t − τ)
                 + E(x_cn x_sm) cos(nω₀ − ω_c)t sin(mω₀ − ω_c)(t − τ)
                 + E(x_sn x_cm) sin(nω₀ − ω_c)t cos(mω₀ − ω_c)(t − τ)
                 + E(x_sn x_sm) sin(nω₀ − ω_c)t sin(mω₀ − ω_c)(t − τ) ]

which becomes, using Eqs. (6-31), as T → ∞,

R_c(τ) = lim_{T→∞} Σ_n E(x_cn²) cos(nω₀ − ω_c)τ

Hence

R_c(τ) = 2 ∫₀^∞ S_x(f) cos 2π(f − f_c)τ df    (8-95)

In a similar manner it may be shown that

R_s(τ) = E(x_s1 x_s2) = 2 ∫₀^∞ S_x(f) cos 2π(f − f_c)τ df = R_c(τ)    (8-96)

and that

R_cs(τ) = E(x_c1 x_s2) = −E(x_s1 x_c2) = 2 ∫₀^∞ S_x(f) sin 2π(f − f_c)τ df    (8-97)
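Equations (8-95) and (8-97) can be evaluated numerically for a concrete spectrum. The Python sketch below (an illustration, not from the text) assumes a rectangular spectral density of height S₀ on the band (f_c − B, f_c + B); since this spectrum is even about f_c, R_cs(τ) should vanish, while R_c(τ) evaluates in closed form to 2S₀ sin(2πBτ)/πτ:

```python
import math

S0, B = 1.0, 2.0   # assumed band height and half-width about f_c

def band_integral(tau, trig, n=100_000):
    # 2 * integral of S_x(f) * trig(2*pi*(f - f_c)*tau) df over the band,
    # written in the offset variable u = f - f_c (midpoint rule).
    d = 2.0 * B / n
    return sum(2.0 * S0 * trig(2.0 * math.pi * (-B + (k + 0.5) * d) * tau) * d
               for k in range(n))

for tau in (0.05, 0.2, 1.0):
    rc = band_integral(tau, math.cos)    # Eq. (8-95)
    rcs = band_integral(tau, math.sin)   # Eq. (8-97)
    exact = 2.0 * S0 * math.sin(2.0 * math.pi * B * tau) / (math.pi * tau)
    assert abs(rc - exact) < 1e-6
    assert abs(rcs) < 1e-9               # even spectrum: R_cs vanishes
```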
The covariance matrix of the random variables x_c1, x_s1, x_c2, and x_s2 then is

    | σ_x²      0         R_c(τ)    R_cs(τ) |
Λ = | 0         σ_x²     −R_cs(τ)   R_c(τ)  |    (8-98)
    | R_c(τ)   −R_cs(τ)   σ_x²      0       |
    | R_cs(τ)   R_c(τ)    0         σ_x²    |

The determinant of this matrix is

|Λ| = [σ_x⁴ − R_c²(τ) − R_cs²(τ)]²    (8-99)

The various cofactors are

Λ₁₁ = Λ₂₂ = Λ₃₃ = Λ₄₄ = σ_x² |Λ|^½
Λ₁₂ = Λ₂₁ = Λ₃₄ = Λ₄₃ = 0    (8-100)
Λ₁₃ = Λ₃₁ = Λ₂₄ = Λ₄₂ = −R_c(τ) |Λ|^½
Λ₁₄ = Λ₄₁ = −Λ₂₃ = −Λ₃₂ = −R_cs(τ) |Λ|^½

It then follows from Eq. (8-46) that the joint probability density function
of x_c1, x_s1, x_c2, and x_s2 is

p(x_c1, x_s1, x_c2, x_s2) = (1/4π²|Λ|^½) exp{ −[ σ_x²(x_c1² + x_s1² + x_c2² + x_s2²)
    − 2R_c(τ)(x_c1x_c2 + x_s1x_s2) − 2R_cs(τ)(x_c1x_s2 − x_s1x_c2) ] / 2|Λ|^½ }    (8-101)
The Jacobian of the transformation from x_c1, x_s1, x_c2, and x_s2 to
V₁, φ₁, V₂, and φ₂ is, from Eq. (8-84),

|J| = V₁V₂

It therefore follows from Eqs. (3-54) and (8-101) that

p(V₁,φ₁,V₂,φ₂) = (V₁V₂/4π²|Λ|^½) exp{ −[ σ_x²(V₁² + V₂²)
    − 2R_c(τ)V₁V₂ cos(φ₂ − φ₁) − 2R_cs(τ)V₁V₂ sin(φ₂ − φ₁) ] / 2|Λ|^½ }
        for V₁, V₂ ≥ 0 and 0 ≤ φ₁, φ₂ ≤ 2π    (8-102)
    = 0   otherwise
The joint probability density function of V₁ and V₂ can now be obtained
by integrating this result over φ₁ and φ₂. Since the exponential in Eq.
(8-102) is periodic in φ₂, we get, where α = φ₂ − φ₁ − tan⁻¹[R_cs(τ)/R_c(τ)],

(1/4π²) ∫₀^{2π} dφ₁ ∫₀^{2π} dφ₂ exp{ (V₁V₂/|Λ|^½)[R_c(τ) cos(φ₂ − φ₁) + R_cs(τ) sin(φ₂ − φ₁)] }
    = (1/2π) ∫₀^{2π} dφ₁ (1/2π) ∫₀^{2π} exp{ (V₁V₂[R_c²(τ) + R_cs²(τ)]^½/|Λ|^½) cos α } dα
    = I₀( V₁V₂[R_c²(τ) + R_cs²(τ)]^½ / |Λ|^½ )

where I₀(z) is the zero-order modified Bessel function of the first kind.†
The joint probability density function of V₁ and V₂ is therefore

p(V₁,V₂) = (V₁V₂/|Λ|^½) I₀( V₁V₂[R_c²(τ) + R_cs²(τ)]^½/|Λ|^½ )
               exp[ −σ_x²(V₁² + V₂²)/2|Λ|^½ ]   for V₁, V₂ ≥ 0    (8-103)
         = 0   otherwise

where |Λ| is given by Eq. (8-99).
The joint probability density function of φ₁ and φ₂ may be gotten by
integrating p(V₁,φ₁,V₂,φ₂) over V₁ and V₂.‡ On defining

β = [R_c(τ)/σ_x²] cos(φ₂ − φ₁) + [R_cs(τ)/σ_x²] sin(φ₂ − φ₁)    (8-104)

we obtain from Eq. (8-102)

p(φ₁,φ₂) = (1/4π²|Λ|^½) ∫₀^∞ dV₁ ∫₀^∞ dV₂ V₁V₂
               exp[ −σ_x²(V₁² + V₂² − 2V₁V₂β)/2|Λ|^½ ]   for 0 ≤ φ₁, φ₂ ≤ 2π

† See, for example, Magnus and Oberhettinger (I, Chap. III, Arts. 1 and 6) or
Watson (I, Arts. 3.7 and 3.71).
‡ Cf. (I).
In order to facilitate evaluation of this double integral, let us introduce
two new variables y and z by the relations

V₁² = (2|Λ|^½/σ_x²) y e^{2z}   and   V₂² = (2|Λ|^½/σ_x²) y e^{−2z}    (8-105)

The magnitude of the Jacobian of the transformation from V₁ and V₂
to y and z is such that

V₁V₂ dV₁ dV₂ = (2|Λ|^½/σ_x²)² y dy dz

The domain of y is (0,+∞) and that of z is (−∞,+∞). After some
manipulation we can therefore obtain

p(φ₁,φ₂) = (|Λ|^½/4π²σ_x⁴) ∫₀^∞ y e^{βy} K₀(y) dy

where K₀(y) is the zero-order modified Hankel function.† It is known‡
that the formula

∫₀^∞ e^{−ay} K₀(y) dy = cos⁻¹ a / (1 − a²)^½

is valid for a > −1. Hence

∫₀^∞ e^{βy} K₀(y) dy = (π − cos⁻¹ β) / (1 − β²)^½

is valid for β < 1. Differentiation of this result with respect to β gives

∫₀^∞ y e^{βy} K₀(y) dy = 1/(1 − β²) + β(π − cos⁻¹ β)/(1 − β²)^{3/2}

The joint probability density function of φ₁ and φ₂ is therefore

p(φ₁,φ₂) = |Λ|^½ [ (1 − β²)^½ + β(π − cos⁻¹ β) ] / [ 4π²σ_x⁴ (1 − β²)^{3/2} ]
               where 0 ≤ φ₁, φ₂ ≤ 2π    (8-106)
         = 0   otherwise

where |Λ| is given by Eq. (8-99) and β is given by Eq. (8-104).

† See, for example, Magnus and Oberhettinger (I, Chap. III, Arts. 1 and 5) or
Watson (I, Arts. 3.7 and 6.22).
‡ See, for example, Magnus and Oberhettinger (I, Chap. III, Art. 7), or Watson
(I, Art. 13.21).
Evaluation of Eqs. (8-102), (8-103), and (8-106) at φ₁ = φ₂, for example,
shows that

p(V₁,φ₁,V₂,φ₂) ≠ p(V₁,V₂)p(φ₁,φ₂)    (8-107)

The envelope and phase random processes are therefore not statistically
independent.
8-6. Sine Wave Plus Narrow-band Gaussian Random Process†
For our final results of this chapter we shall derive expressions for the
probability density functions of the envelope and phase angle of the sum
of a sine wave and a narrow-band gaussian random process.
Let x(t) be a sample function of a stationary narrow-band gaussian
random process and let

y(t) = P cos(ω_c t + ψ) + x(t)    (8-108)

where P is a constant and the random variable ψ is uniformly distributed
over the interval (0,2π) and independent of the gaussian random process.
Using Eq. (8-82), we can write

y(t) = X_c(t) cos ω_c t − X_s(t) sin ω_c t    (8-109)

where

X_c(t) = P cos ψ + x_c(t)   and   X_s(t) = P sin ψ + x_s(t)    (8-110)

If we express y(t) in terms of an envelope and phase,

y(t) = V(t) cos[ω_c t + φ(t)]    (8-111)

it follows that

X_c(t) = V(t) cos φ(t)   and   X_s(t) = V(t) sin φ(t)    (8-112)

and hence that

V(t) = [X_c²(t) + X_s²(t)]^½    (8-113)
As in Art. 8-5, the random variables x_ct and x_st are independent gaussian
random variables with zero means and variance σ_x². The joint probability
density function of X_ct, X_st, and ψ is therefore

p(X_ct, X_st, ψ) = (1/4π²σ_x²) exp{ −[ (X_ct − P cos ψ)² + (X_st − P sin ψ)² ] / 2σ_x² }
    for 0 ≤ ψ ≤ 2π

† Cf. Rice (I, Art. 3.10) and Middleton (II, Art. 5).
Hence the joint probability density function of V_t, φ_t, and ψ is

p(V_t, φ_t, ψ) = (V_t/4π²σ_x²) exp{ −[ V_t² + P² − 2PV_t cos(φ_t − ψ) ] / 2σ_x² }
        for V_t ≥ 0 and 0 ≤ φ_t, ψ ≤ 2π    (8-114)
    = 0   otherwise

We can now determine the probability density function of V_t by integrating
this result over φ_t and ψ. Thus, for V_t ≥ 0,

p(V_t) = (V_t/σ_x²) exp[ −(V_t² + P²)/2σ_x² ] (1/2π) ∫₀^{2π} dψ (1/2π) ∫ exp[ (PV_t/σ_x²) cos θ ] dθ

where θ = φ_t − ψ. Since the exponential integrand is periodic in θ, we
can integrate over the interval (0,2π) in θ, and so get

p(V_t) = (V_t/σ_x²) exp[ −(V_t² + P²)/2σ_x² ] I₀(PV_t/σ_x²)   for V_t ≥ 0    (8-115)
       = 0   otherwise

for the probability density function of the envelope of the sum of a sine
wave and a narrow-band gaussian random process. This result reduces
to that given by Eq. (8-91) when P = 0.
An asymptotic series expansion† of the modified Bessel function, valid
for large values of the argument, is

I₀(x) = e^x/(2πx)^½ [ 1 + 1/8x + 9/128x² + ··· ]    (8-116)

It therefore follows that when PV_t >> σ_x², we get approximately

p(V_t) = (1/(2π)^½ σ_x) exp[ −(V_t − P)²/2σ_x² ]    (8-117)

for V_t ≥ 0. Hence, when the magnitude P of the sine wave is large compared
to σ_x and when V_t is near P, the probability density function of the
envelope of the sum process is approximately gaussian.
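This gaussian limit is easy to check by simulation. In the Python sketch below (not from the text; P and σ_x are arbitrary choices with P >> σ_x), the envelope of Eq. (8-113) is formed directly from the quadrature components of Eq. (8-110); its mean should fall near P (more precisely, near P + σ_x²/2P) and its variance near σ_x²:

```python
import math
import random

random.seed(2)
sigma, P = 1.0, 5.0        # assumed values with P >> sigma
n = 200_000

# Eq. (8-110): quadrature components of sine wave plus noise;
# Eq. (8-113): the envelope.
samples = []
for _ in range(n):
    psi = random.uniform(0.0, 2.0 * math.pi)
    Xc = P * math.cos(psi) + random.gauss(0.0, sigma)
    Xs = P * math.sin(psi) + random.gauss(0.0, sigma)
    samples.append(math.hypot(Xc, Xs))

mean_v = sum(samples) / n
var_v = sum((v - mean_v) ** 2 for v in samples) / n

# Large-P behavior implied by Eq. (8-117):
assert abs(mean_v - (P + sigma ** 2 / (2.0 * P))) < 0.02
assert abs(var_v - sigma ** 2) < 0.05
```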
The joint probability density function of the phase angles φ_t and ψ can
be gotten from that of V_t, φ_t, and ψ by integrating over V_t. Thus, from
Eq. (8-114),

p(φ_t, ψ) = (1/4π²σ_x²) ∫₀^∞ V_t exp{ −[ V_t² + P² − 2PV_t cos(φ_t − ψ) ] / 2σ_x² } dV_t

This becomes, on completing the square,

p(φ_t, ψ) = (1/4π²σ_x²) exp( −P² sin²θ/2σ_x² ) ∫₀^∞ V_t exp[ −(V_t − P cos θ)²/2σ_x² ] dV_t

† Dwight (I, Eq. 814.1).
where θ = φ_t − ψ. Hence, setting u = (V_t − P cos θ)/σ_x,

p(φ_t, ψ) = (1/4π²) exp( −P² sin²θ/2σ_x² ) ∫_{−P cos θ/σ_x}^∞ ( u + P cos θ/σ_x ) exp( −u²/2 ) du

The first integral evaluates to exp(−P² cos²θ/2σ_x²). Since the integrand
of the second is even, we therefore get

p(φ_t, ψ) = (1/4π²) [ exp(−P²/2σ_x²)
    + (P cos θ/σ_x) exp(−P² sin²θ/2σ_x²) ∫_{−∞}^{P cos θ/σ_x} exp(−u²/2) du ]    (8-118)

where θ = φ_t − ψ and 0 ≤ φ_t, ψ ≤ 2π. The integral here is (2π)^½ times
the probability distribution function of a gaussian random variable.
When the amplitude of the sine wave is zero, Eq. (8-118) reduces to

p(φ_t, ψ) = 1/4π²   when P = 0    (8-119)

as it should.
An approximate expression for the joint probability density function
of the phase angles φ_t and ψ can be obtained by using Eq. (8-4) in Eq.
(8-118). Thus we get, approximately,

p(φ_t, ψ) = [ P cos(φ_t − ψ)/(2π)^{3/2} σ_x ] exp[ −P² sin²(φ_t − ψ)/2σ_x² ]
        when P cos(φ_t − ψ) >> σ_x    (8-120)

where 0 ≤ φ_t, ψ ≤ 2π.
We have tried in this and the preceding article to present a few of the
significant statistical properties of a narrow-band gaussian random process.
More extensive results may be found in the technical literature,
particularly in the papers of Rice.†
8-7. Problems
1. Let x and y be independent gaussian random variables with means m_x and m_y
and variances σ_x² and σ_y², respectively. Let

z = x + y

a. Determine the characteristic function of z.
b. Determine the probability density function of z.

† E.g., Rice (I) and Rice (II).
2. Let x₁, x₂, x₃, and x₄ be real random variables with a gaussian joint probability
density function, and let their means all be zero. Show that†

E(x₁x₂x₃x₄) = E(x₁x₂)E(x₃x₄) + E(x₁x₃)E(x₂x₄) + E(x₁x₄)E(x₂x₃)    (8-121)
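This fourth-moment factoring is easy to verify by Monte Carlo. The Python sketch below (the mixing coefficients are arbitrary choices for illustration) builds four correlated zero-mean jointly gaussian variables as linear combinations of independent ones and checks Eq. (8-121) numerically:

```python
import random

random.seed(3)
n = 300_000

sums = [0.0] * 7  # x1x2x3x4, x1x2, x3x4, x1x3, x2x4, x1x4, x2x3
for _ in range(n):
    g0, g1, g2, g3 = (random.gauss(0.0, 1.0) for _ in range(4))
    # Linear combinations of independent gaussians are jointly gaussian.
    x1 = g0
    x2 = 0.6 * g0 + 0.8 * g1
    x3 = 0.3 * g1 + 0.9 * g2
    x4 = 0.5 * g2 + 0.7 * g3
    prods = (x1*x2*x3*x4, x1*x2, x3*x4, x1*x3, x2*x4, x1*x4, x2*x3)
    for i, p in enumerate(prods):
        sums[i] += p

m = [s / n for s in sums]
lhs = m[0]                                   # E(x1 x2 x3 x4)
rhs = m[1]*m[2] + m[3]*m[4] + m[5]*m[6]      # Eq. (8-121)
assert abs(lhs - rhs) < 0.05
```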
3. Let x(t) be a sample function of a stationary real gaussian random process with a
zero mean. Let a new random process be defined with the sample functions

y(t) = x²(t)

Show that

R_y(τ) = R_x²(0) + 2R_x²(τ)    (8-122)
4. Let x be a gaussian random variable with mean zero and unit variance. Let a
new random variable y be defined as follows: If x = x₀, then

y = { x₀ with probability ½
    { −x₀ with probability ½

a. Determine the joint probability density function of x and y.
b. Determine the probability density function of y alone.
Note that although x and y are gaussian random variables, the joint probability
density function of x and y is not gaussian.
5. Derive Eq. (8-17).
6. Let the random variables x₁ and x₂ have a gaussian joint probability density
function. Show that if the random variables y₁ and y₂ are defined by a rotational
transformation of x₁ and x₂ about the point [E(x₁),E(x₂)], then y₁ and y₂ are
independent gaussian random variables if the angle of rotation is chosen such that

    (8-123)
7. Let the autocorrelation function of the stationary gaussian random process with
sample functions x(t) be expanded over the interval [−T/2, +T/2] in a Fourier series

R(τ) = Σ_{n=−∞}^{+∞} a_n e^{jnω₀τ}   where ω₀ = 2π/T

Let the independent gaussian random variables x_n (n = −∞, . . . , −1, 0, +1,
. . . , +∞) have zero means and unit variances, and consider the random process
defined by the sample functions

y(t) = Σ_{n=−∞}^{+∞} b_n x_n e^{jnω₀t}   where ω₀ = 2π/T

and where the b_n are complex constants. Show‡ that if the b_n are chosen so that

|b_n|² = a_n

then the random process defined by the sample functions y(t) has the same multivariate
probability density functions as does the process defined by the sample functions
x(t) so long as 0 ≤ t ≤ T/2.

† Cf. Chap. 7, Prob. 4.
‡ Cf. Root and Pitcher (I, Theorem 2).
8. Let V_t be the envelope of a stationary narrow-band real gaussian random process.
Show that

E(V_t) = (π/2)^½ σ_x    (8-124)
E(V_t²) = 2σ_x²    (8-125)
σ²(V_t) = (2 − π/2)σ_x²    (8-126)

where σ_x² is the variance of the gaussian random process.
9. Let x(t) be a sample function of a stationary narrow-band real gaussian random
process. Consider a new random process defined with the sample functions

y(t) = x(t) cos ω₀t

where f₀ = ω₀/2π is small compared to the center frequency f_c of the original process
but large compared to the spectral width of the original process. If we write

x(t) = V(t) cos[ω_c t + φ(t)]

then we may define

y_L(t) = [V(t)/2] cos[(ω_c − ω₀)t + φ(t)]

to be the sample functions of the "lower sideband" of the new process, and

y_U(t) = [V(t)/2] cos[(ω_c + ω₀)t + φ(t)]

to be the sample functions of the "upper sideband" of the new process.
a. Show that the upper and lower sideband random processes are each stationary
random processes even though their sum is nonstationary.
b. Show that the upper and lower sideband random processes are not statistically
independent.
10. Let

x(t) = x_c(t) cos ω_c t − x_s(t) sin ω_c t

be a sample function of a stationary narrow-band real gaussian random process, where
f_c = ω_c/2π is the center frequency of the narrow spectral band.
a. Show that

R_x(τ) = R_c(τ) cos ω_c τ − R_cs(τ) sin ω_c τ    (8-127a)

where R_c(τ) = E[x_ct x_c(t−τ)] and R_cs(τ) = E[x_ct x_s(t−τ)].    (8-127b)
b. Show further that, on defining

R_E(τ) = [R_c²(τ) + R_cs²(τ)]^½   and   θ(τ) = tan⁻¹[R_cs(τ)/R_c(τ)]    (8-128)

we may write

R_x(τ) = R_E(τ) cos[ω_c τ + θ(τ)]    (8-129)

c. Show that if S_x(f) is even about f_c for f ≥ 0, then

R_cs(τ) = 0,   θ(τ) = 0,   R_E(τ) = R_c(τ)

and hence

R_x(τ) = R_c(τ) cos ω_c τ
11. Let the spectral density of the random process of Prob. 10 be given by

S_x(f) = [σ_x²/2(2πσ²)^½] { exp[ −(f − f_c)²/2σ² ] + exp[ −(f + f_c)²/2σ² ] }

where σ << f_c.
a. Evaluate R_c(τ).
b. Evaluate R_cs(τ).
12. Let x(t) be a sample function of a stationary narrow-band real gaussian random
process. Define σ_x and Λ(τ) by

    (8-130)

where σ_x is chosen to be real and nonnegative. Show that the autocorrelation
function of the envelope of the narrow-band process is†

    (8-131)

where K and E are complete elliptic integrals of the first and second kind, respectively,
and where ₂F₁ is a hypergeometric function.‡
13. Let the sample functions of the random process of Prob. 12 be expressed in the
form

x(t) = V(t)y(t)

where V(t) is the envelope and where

y(t) = cos[ω_c t + φ(t)]

Show that the autocorrelation function of the phase-modulated carrier y(t) of the given
narrow-band process is given by§

    (8-132)

14. Let V(t) and φ(t) be the envelope and phase, respectively, of a sample function
of a stationary narrow-band real gaussian process. Show, by evaluating Eqs. (8-102),
(8-103), and (8-106) at φ₁ = φ₂, that

p(V₁,φ₁,V₂,φ₂) ≠ p(V₁,V₂)p(φ₁,φ₂)

and hence that the pairs of random variables (V₁,V₂) and (φ₁,φ₂) are not statistically
independent.

† Cf. Price (II) or Middleton (II, Arts. 6 and 7).
‡ Cf. Magnus and Oberhettinger (I, Chap. II, Art. 1).
§ Cf. Price (II).
CHAPTER 9
LINEAR SYSTEMS
The concepts of random variable and random process were developed in
the preceding chapters, and some of their statistical properties were
discussed. We shall use these ideas in the remaining chapters of this book
to determine the effects of passing random processes through various
kinds of systems. Thus, for example, we shall make such a study of
linear systems in this and the two following chapters; in particular, we
shall introduce linear-system analysis in this chapter, investigate some
of the problems of noise in amplifiers in the next, and consider the
optimization of linear systems in Chap. 11.
9-1. Elements of Linear-system Analysis
It is assumed that the reader is generally familiar with the methods of
analysis of linear systems.† Nevertheless, we shall review here some of
the elements of that theory.
The System Function. Suppose that x(t) and y(t), as shown in Fig. 9-1,
are the input and output, respectively, of a fixed-parameter linear system.

FIG. 9-1. The linear system.

By fixed-parameter we mean that if the input x(t) produces the
output y(t), then the input x(t + τ) produces the output y(t + τ). By
linear we mean that if the input x_i(t) produces the output y_i(t), then the
input

x(t) = a₁x₁(t) + a₂x₂(t)

produces the output

y(t) = a₁y₁(t) + a₂y₂(t)

An example of such a system would be one governed by a set of linear
differential equations with constant coefficients.
If the time function

x(t) = e^{jωt}

† E.g., as presented in Bode (I), Guillemin (I), or James, Nichols, and Phillips (I).
where ω is a real number, is applied for an infinitely long time to the
input of a fixed-parameter linear system, and if an output exists after
this infinite time, it is of the same form; i.e.,

y(t) = Ae^{jωt}

where A does not depend on t. For suppose that an input of the form
exp(jωt) which has been present since t = −∞ produces a well-defined
output y(t), which is called the steady-state response. Then, since the
system is a fixed-parameter system, the input

x(t + t′) = e^{jω(t+t′)} = e^{jωt′} e^{jωt}

produces an output y(t + t′). However, t′ does not depend on t, and the
system is linear; therefore, the input exp(jωt′) exp(jωt) gives the output
exp(jωt′)y(t). Hence

y(t + t′) = e^{jωt′} y(t)

Upon setting t = 0, it follows that

y(t′) = y(0)e^{jωt′}

is the system response to the input exp(jωt′), thus proving our assertion
with A = y(0). If now the input has a complex amplitude X(jω), that is, if

x(t) = X(jω)e^{jωt}

the output is

y(t) = AX(jω)e^{jωt}    (9-1)

which we write

y(t) = Y(jω)e^{jωt}    (9-2)
The ratio A of complex amplitudes of y(t) and x(t) is a function of ω,
and we denote it henceforth by H(jω); thus

H(jω) = Y(jω)/X(jω)    (9-3)

H(jω) is called the system function of the fixed-parameter linear system.
It should be noted that a fixed-parameter linear system may not have
a well-defined output if it has been excited by an input of the form
exp(jωt) for an infinitely long time, e.g., a lossless LC circuit with
resonant frequency ω. This fact is connected with the idea of stability,
which we shall discuss briefly a little later on.
Suppose that the system input is a periodic time function whose
Fourier series converges. Then we can write

x(t) = Σ_{n=−∞}^{+∞} a(jnω₀) exp(jnω₀t)    (9-4)

where

T₀ = 1/f₀ = 2π/ω₀    (9-5)
is the fundamental period, and the complex coefficients a(jnω₀) are
given by

a(jnω₀) = (1/T₀) ∫_{−T₀/2}^{T₀/2} x(t) exp(−jnω₀t) dt    (9-6)

It follows from Eq. (9-3) that

y_n(t) = Y(jnω₀) exp(jnω₀t) = a(jnω₀)H(jnω₀) exp(jnω₀t)

is the system output in response to the input component a(jnω₀) exp(jnω₀t).
Since the system is linear, its total output in response to x(t) is the sum
of the component outputs y_n(t). Thus

y(t) = Σ_{n=−∞}^{+∞} a(jnω₀)H(jnω₀) exp(jnω₀t)    (9-7)

is the steady-state output of a fixed-parameter linear system when its
input is a periodic function of time as given by Eq. (9-4).
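Equation (9-7) can be illustrated with a short numerical example. The Python sketch below is not from the text: it assumes a first-order system H(jω) = 1/(1 + jωτ), whose impulse response is h(t) = e^{−t/τ}/τ for t ≥ 0, and a raised-cosine periodic input whose only nonzero Fourier coefficients are a(0) = 1 and a(±jω₀) = ½. The steady-state output from the coefficient sum is then checked against direct convolution with h:

```python
import cmath
import math

tau = 0.5                 # assumed time constant: H(jw) = 1/(1 + j*w*tau),
T0 = 2.0                  # so h(t) = exp(-t/tau)/tau for t >= 0
w0 = 2.0 * math.pi / T0

def H(w):
    return 1.0 / (1.0 + 1j * w * tau)

def x(t):                 # periodic input 1 + cos(w0 t):
    return 1.0 + math.cos(w0 * t)   # a(0) = 1, a(+-1) = 1/2, others 0

def y_series(t):          # Eq. (9-7), three nonzero terms
    return (H(0.0)
            + 0.5 * H(w0) * cmath.exp(1j * w0 * t)
            + 0.5 * H(-w0) * cmath.exp(-1j * w0 * t)).real

def y_conv(t, n=200_000, span=20.0):   # same output by direct convolution
    d = span / n
    return sum(math.exp(-k * d / tau) / tau * x(t - k * d) * d
               for k in range(n))

for t in (0.0, 0.3, 1.1):
    assert abs(y_series(t) - y_conv(t)) < 1e-3
```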
Next, let us assume that the system input is a transient function of
time for which a Fourier transform exists. Our series result above may
be extended heuristically to this case as follows: Suppose that we replace
the transient input x(t) by a periodic input

x_{T₀}(t) = Σ_{n=−∞}^{+∞} a(jnω₀) exp(jnω₀t)

where the coefficients are determined from x(t) by Eq. (9-6).
The corresponding periodic output is, from Eq. (9-7),

y_{T₀}(t) = Σ_{n=−∞}^{+∞} a(jnω₀)H(jnω₀) exp(jnω₀t)

On multiplying and dividing both equations by T₀ and realizing that
ω₀ is the increment in nω₀, we get

x_{T₀}(t) = (1/2π) Σ_{n=−∞}^{+∞} T₀ a(jnω₀) exp(jnω₀t) ω₀

and

y_{T₀}(t) = (1/2π) Σ_{n=−∞}^{+∞} T₀ a(jnω₀)H(jnω₀) exp(jnω₀t) ω₀
If now we let T₀ → ∞ (and let n → ∞ while ω₀ → 0 so that nω₀ = ω),
it follows from Eq. (9-6) that

lim_{T₀→∞} T₀ a(jnω₀) = ∫_{−∞}^{+∞} x(t)e^{−jωt} dt = X(jω)    (9-8)

where X(jω) is the Fourier transform of the transient input, and hence
that

lim_{T₀→∞} x_{T₀}(t) = (1/2π) ∫_{−∞}^{+∞} X(jω)e^{jωt} dω = x(t)    (9-9)

and

lim_{T₀→∞} y_{T₀}(t) = y(t) = (1/2π) ∫_{−∞}^{+∞} X(jω)H(jω)e^{jωt} dω    (9-10)

The system output is thus given by the Fourier transform of

Y(jω) = X(jω)H(jω)    (9-11)

i.e., the Fourier transform of the output of a fixed-parameter linear system
in response to a transient input is equal to the Fourier transform
of the input times the system function (when the integral in Eq. (9-10)
converges).
Unit Impulse Response. A transient input of special importance is the
unit impulse

x(t) = δ(t)    (9-12)

As shown in Appendix 1, the Fourier transform of this input is equal to
unity for all ω:

X(jω) = 1    (9-13)

The Fourier transform of the corresponding output is, from Eq. (9-11),

Y(jω) = 1 · H(jω) = H(jω)    (9-14)

and the output h(t) is, from Eq. (9-10),

h(t) = (1/2π) ∫_{−∞}^{+∞} H(jω)e^{jωt} dω    (9-15)

The unit impulse response h(t) of a fixed-parameter linear system is therefore
given by the Fourier transform of the system function H(jω) of that
system; conversely,

H(jω) = ∫_{−∞}^{+∞} h(t)e^{−jωt} dt    (9-16)

If the unit impulse response is zero for negative values of t, i.e., if

h(t) = 0   when t < 0    (9-17)

the corresponding linear system is said to be physically realizable.
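The transform pair (9-15)–(9-16) is easy to verify numerically. The Python sketch below (the RC-type impulse response is an assumed example, not taken from the text) integrates Eq. (9-16) by the midpoint rule and compares the result with the known system function 1/(1 + jωτ):

```python
import cmath
import math

tau = 0.3   # assumed: h(t) = exp(-t/tau)/tau for t >= 0, else 0

def H_numeric(w, n=100_000, span=10.0):
    # Midpoint-rule evaluation of Eq. (9-16); the tail beyond `span`
    # is negligible since h decays exponentially.
    d = span / n
    return sum(math.exp(-(k + 0.5) * d / tau) / tau
               * cmath.exp(-1j * w * (k + 0.5) * d) * d
               for k in range(n))

for w in (0.0, 2.0, 10.0):
    exact = 1.0 / (1.0 + 1j * w * tau)
    assert abs(H_numeric(w) - exact) < 1e-4
```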
The Convolution Integral. The response of a fixed-parameter linear
system to a transient input can be expressed in terms of the system's
unit impulse response by substituting for the system function from Eq.
(9-16) into Eq. (9-10). Thus we get

y(t) = ∫_{−∞}^{+∞} h(τ) dτ (1/2π) ∫_{−∞}^{+∞} X(jω)e^{jω(t−τ)} dω

The integral over ω is the inverse Fourier transform of X(jω) evaluated
at the time t − τ (i.e., the input at t − τ). Hence

y(t) = ∫_{−∞}^{+∞} h(τ)x(t − τ) dτ    (9-18)

Alternatively, by expressing X(jω) in Eq. (9-10) in terms of x(t), or by
simply changing variables in Eq. (9-18), we get

y(t) = ∫_{−∞}^{+∞} x(τ)h(t − τ) dτ    (9-19)

These integrals, known as convolution integrals, show that a fixed-parameter
linear system can be characterized by an integral operator. Since
the system output is thus expressed as a mean of the past values of the
input weighted by the unit impulse response of the system, the unit
impulse response is sometimes called the weighting function of the
corresponding linear system.
As the unit impulse response of a physically realizable linear system
is zero for negative values of its argument, we can replace the infinite
lower limit of the integral in Eq. (9-18) by zero:

y(t) = ∫₀^{+∞} h(τ)x(t − τ) dτ

Further, if we assume that the input was zero before t = −a (as shown
in Fig. 9-2), the upper limit may be changed from +∞ to t + a. Thus
we get

y(t) = ∫₀^{t+a} h(τ)x(t − τ) dτ    (9-20)

for the output of a physically realizable linear system in response to an
input starting at t = −a.
An intuitive feeling for the convolution integral may perhaps be
obtained as follows: Consider the output of a physically realizable linear
system at some specific instant of time t₁ > 0. Let the system input
x(t) be approximated over the interval (−a,t₁) by a set of N nonoverlapping
rectangular pulses of width Δτ (as shown in Fig. 9-2), where

N Δτ = t₁ + a
Since the system is linear, the system output at t₁ is given by the sum of
the outputs produced by the N previous rectangular input pulses.
Consider now the output at t₁ produced by the single input pulse
occurring at the earlier time (t₁ − nΔτ), where n ≤ N. As Δτ → 0, the
width of the input pulse approaches zero (and N → ∞) and the output
due to this pulse approaches that which would be obtained from an
impulse input applied at the same time and equal in area to that of the
given rectangular input pulse, i.e., to x(t₁ − nΔτ)Δτ.

FIG. 9-2. The convolution integral.

Since the system output at t₁ due to a unit impulse input at (t₁ − nΔτ)
is h(nΔτ), the output at t₁ due to the given rectangular input pulse is
approximately h(nΔτ)x(t₁ − nΔτ)Δτ. The total output at t₁ is then
approximately

y(t₁) = Σ_{n=1}^{N} h(nΔτ)x(t₁ − nΔτ) Δτ

If now we let Δτ → 0 (and n → ∞ so that nΔτ = τ), we obtain

y(t₁) = ∫₀^{t₁+a} h(τ)x(t₁ − τ) dτ

which is Eq. (9-20) evaluated at t = t₁.
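The rectangular-pulse argument above can be carried out numerically. In this Python sketch (the exponential impulse response and step input are assumed for illustration, not taken from the text), the sum Σ h(nΔτ)x(t₁ − nΔτ)Δτ is seen to converge to the exact convolution output as Δτ → 0:

```python
import math

a, t1 = 1.0, 2.0   # assumed: input switched on at t = -a, output at t1

def h(t):          # assumed impulse response exp(-t) for t >= 0
    return math.exp(-t) if t >= 0.0 else 0.0

def x(t):          # unit step starting at t = -a
    return 1.0 if t >= -a else 0.0

exact = 1.0 - math.exp(-(t1 + a))   # closed-form value of Eq. (9-20) here

for N in (100, 1_000, 10_000):
    dtau = (t1 + a) / N
    approx = sum(h(n * dtau) * x(t1 - n * dtau) * dtau for n in range(N))
    # The rectangular-pulse sum approaches the convolution integral.
    assert abs(approx - exact) < 2.0 / N
```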
A fixed-parameter linear system is said to be stable if every bounded
input function produces a bounded output function. A stability requirement
on the unit impulse response can be obtained as follows: It follows
from Eq. (9-18) that

|y(t)| = | ∫_{−∞}^{+∞} h(τ)x(t − τ) dτ | ≤ ∫_{−∞}^{+∞} |h(τ)||x(t − τ)| dτ

If the input is bounded, there exists some positive constant A such that

|x(t)| ≤ A < +∞

for all t. Hence, for such an input,

|y(t)| ≤ A ∫_{−∞}^{+∞} |h(τ)| dτ

for all t. Therefore, if the unit impulse response is absolutely integrable,
i.e., if

∫_{−∞}^{+∞} |h(τ)| dτ < +∞

then the output is bounded and the system is stable. On the other hand,
it may be shown† that if h(t) is not integrable, then the system is unstable.
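The bound |y(t)| ≤ A ∫|h(τ)|dτ derived above holds term by term in any discretized version of the convolution. A small Python sketch (discretization step, truncation, and signals are arbitrary choices for illustration):

```python
import math
import random

random.seed(4)
A = 1.0
d = 0.01
# Discretized integrable impulse response h(t) = exp(-t) on [0, 5).
h = [math.exp(-k * d) * d for k in range(500)]
# Any input bounded by A; here, uniform random values in [-A, A].
x = [random.uniform(-A, A) for _ in range(2_000)]

bound = A * sum(h)   # discrete analogue of A * integral |h|
for t in range(len(h), len(x)):
    y_t = sum(h[k] * x[t - k] for k in range(len(h)))
    assert abs(y_t) <= bound
```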
It is often useful to extend the system function as we have defined it
to a function of the complex variable p = σ + jω. Let H(p) be defined
by the complex Fourier transform

H(p) = ∫_{−∞}^{+∞} h(t)e^{−(σ+jω)t} dt    (9-21)

in that region of the p plane where the integral converges. Equation
(9-21) reduces to our earlier expression for the system function, Eq.
(9-16), when σ = 0. In particular, if the filter is realizable and stable,
i.e., if h(t) = 0 for t < 0 and h(t) is integrable, then the integral in Eq.
(9-21) converges uniformly for all σ ≥ 0. It follows that H(p) is
defined as a uniformly bounded function for all p with σ ≥ 0 (i.e.,
|H(p)| ≤ A = constant for all σ ≥ 0). Under the same conditions it
may be shown by standard methods‡ that H(p) is in fact an analytic
function for all p with σ > 0. Thus H(p) can have no poles on the
jω axis or in the right half p plane. Conversely, it may be proved§ that
if H(p) is a bounded analytic function of p for σ ≥ 0 and if H(jω) satisfies
an appropriate condition regulating its behavior for large ω, then the
Fourier transform of H(p) is a function of t which vanishes for t < 0;
i.e., its Fourier transform is a realizable impulse response function. In
fact, if H(jω) is of integrable square along the jω axis, then

H(p) = ∫₀^∞ h(t)e^{−(σ+jω)t} dt

(where the convergence of the integral is in the sense of convergence
in the mean) for some h(t). For a more detailed discussion of H(p) as

† James, Nichols, and Phillips (I, Secs. 2.8 and 5.9).
‡ Titchmarsh (I, Chap. II).
§ Paley and Wiener (I, Theorem VIII, p. 11).
a function of a complex variable, the reader is referred to other sources.†
Generally we shall consider only stable linear systems in the discussions
that follow.
Evaluation of Network-system Functions. In some of our later discussions,
it will be necessary to consider multiple-input linear systems.
Let us therefore review some of the input-output relations for an N-input
single-output stable linear system. Since the system is linear, we may
express its output y(t) as a sum of terms y_n(t):

y(t) = Σ_{n=1}^{N} y_n(t)    (9-22)

where y_n(t) is defined to be that part of the output which is produced
by the nth input x_n(t); i.e., y(t) is equal to y_n(t) when all the inputs are
zero except the nth. The problem now is to determine y_n(t) given x_n(t).
Suppose that the nth input is a sinusoidal function of time X_n(jω)
exp(jωt) and that all other inputs are zero. We can then define a system
function H_n(jω), relating the output to the nth input, by

H_n(jω) = Y_n(jω)/X_n(jω)    (9-23)

where Y_n(jω) is the complex amplitude of the resulting response. We
can also define a unit impulse response function h_n(t) as the Fourier
transform of H_n(jω):

h_n(t) = (1/2π) ∫_{−∞}^{+∞} H_n(jω)e^{jωt} dω    (9-24)

This impulse response is the system output in response to a unit impulse
at the nth input when all the other inputs are zero.
A particular case of interest is that in which the system under discussion
is an N + 1 terminal-pair electrical network as shown in Fig. 9-3.
It is desirable to evaluate the system functions H_n in terms of the more
familiar network impedances and admittances. Assuming sinusoidal
inputs, we may write

v_n(t) = V_n exp(jωt)   and   i_n(t) = I_n exp(jωt)

where generally V_n and I_n are complex functions of frequency. The
voltage and current relations at the N + 1 terminal pairs can then be

† See James, Nichols, and Phillips (I, Chap. 2) and Bode (I, Chap. VII). For
mathematical discussion of the complex Fourier transform see Paley and Wiener (I),
especially the Introduction and Sec. 8.
FIG. 9-3. The N + 1 terminal-pair network.

expressed by the N + 1 simultaneous equations†

I_0 = y_00 V_0 + y_01 V_1 + ··· + y_0N V_N
I_1 = y_10 V_0 + y_11 V_1 + ··· + y_1N V_N
· · · · · · · · · · · · · · · · · · · ·
I_N = y_N0 V_0 + y_N1 V_1 + ··· + y_NN V_N    (9-25)

where the coefficients y_nk are the so-called short-circuit transfer admittances.
If we set all the input voltages to zero except V_k (i.e., if we short-circuit
all inputs except the kth), Eqs. (9-25) become

I_0 = y_0k V_k
I_1 = y_1k V_k
· · · · · ·
I_N = y_Nk V_k
Then, solving for the admittances y_nk, we obtain

y_nk = I_n/V_k    (9-26)

Thus the admittances y_nk are given by the ratio of the complex amplitude
of the short-circuit current flowing in the nth terminal pair to the complex
amplitude of the voltage applied to the kth terminal pair (when all
other terminal pairs are short-circuited); whence the name "short-circuit
transfer admittance." If

y_nk = y_kn    (9-27)

for all values of n and k, the network is said to be bilateral.
It is convenient at this time to introduce the Thévenin equivalent
circuit of our network as seen from the output terminals. This equivalent
network, shown in Fig. 9-4, consists of a generator whose voltage
equals the open-circuit output voltage of the original network, in series
with an impedance Z_0, which is the impedance seen looking back into

† Cf. Guillemin (I, Vol. I, Chap. 4).
the original network when all of the inputs are short-circuited. From
this definition of output impedance, and from the network equations
(9-25), it follows that

I_0 = y_00 V_0 = V_0/Z_0

Therefore

Z_0 = 1/y_00    (9-28)

i.e., the output impedance equals the reciprocal of the short-circuit
driving-point admittance of the output terminal pair.

FIG. 9-4. The equivalent network.

Let us now determine the open-circuit output voltage which results from
the application of the input voltages V_1, V_2, . . . , V_N. On short-circuiting
the output terminals, we get from the Thévenin equivalent network the
relations

V_0(OC) = −I_0(SC) Z_0 = −I_0(SC)/y_00

The short-circuit output current I_0(SC) flowing in response to the given
set of input voltages follows from the first of Eqs. (9-25):

I_0(SC) = Σ_{n=1}^{N} y_0n V_n

Hence the open-circuit output voltage V_0(OC) can be expressed in terms
of the input voltages by the equation

V_0(OC) = −Σ_{n=1}^{N} (y_0n/y_00) V_n

From the definition of the system functions H_n(jω) it then follows that

V_0(OC) = Σ_{n=1}^{N} H_n(jω) V_n

On comparing these last two equations, we see that the system functions
can be evaluated in terms of the short-circuit transfer admittances by
the relations

H_n(jω) = −y_0n/y_00    (9-29)

We shall use these results later in our studies of thermal noise in linear
networks.
92. Random Inputs
The inputoutput relations discussed in the preceding section deter
mine the output of a fixedparameter linear system when the input is 8,
known function of time. Suppose now that the input is a sample func
tion of some random process. If we know what function of time the
given sample function is, we can apply directly the results of Art. 91.
More often, however, the only facts known about the input sample func
tion are the properties of its random process; e.g., it might be stated only
that the input is a sample function of a gaussian random process whose
mean and autocorrelation function are given. A question then arises as
to whether the results of the preceding article are applicable in such a case.
Suppose that the input to a stable fixedparameter linear system is a
sample function of a bounded random process. Since all sample func
tions of such a random process are bounded, it follows from the definition
of stability that the convolution integral
.:
y(t) = _aD h(r)x(t  r] dr (930)
converges for every sample function and hence gives the relation between
the input and output sample functions. More generally, it follows from
the facts stated in Art. 47 that the integral of Eq. (930) converges for
all sample functions except a set of probability zero if
(931)
Since the filter is assumed to be stable, this condition is satisfied if E(|x_τ|) is bounded for all τ. In particular, if the input random process is stationary or wide-sense stationary and has an absolute first moment, then E(|x_τ|) = constant and the condition above is satisfied. It may be noted in passing that from the Schwarz inequality

$$E(|x|) \le \sqrt{E(|x|^2)} \qquad (9\text{-}32)$$

so that if x(t) is a sample function from any stationary random process with a finite second moment, the integral of Eq. (9-30) converges for all
RANDOM SIGNALS AND NOISE
sample functions except a set of probability zero. Furthermore, if the input process is stationary,

$$E(y_t) = E(x_t)\int_{-\infty}^{+\infty} h(\tau)\,d\tau = \text{constant} \qquad (9\text{-}33)$$
9-3. Output Correlation Functions and Spectra
Suppose that the input to a fixed-parameter stable linear system is a sample function x(t) of a given random process and that y(t) is the corresponding sample function of the output random process. The output autocorrelation function is, by definition,

$$R_y(t_1,t_2) = E(y_{t_1}\,y_{t_2})$$

Since the system is stable, y_t may be expressed in terms of x_t by the convolution integral in Eq. (9-30). Hence we can write

$$R_y(t_1,t_2) = E\left[\int_{-\infty}^{+\infty} h(\alpha)\,x_{t_1-\alpha}\,d\alpha \int_{-\infty}^{+\infty} h(\beta)\,x_{t_2-\beta}\,d\beta\right]$$

which, on interchanging the order of averaging and integration, becomes

$$R_y(t_1,t_2) = \int_{-\infty}^{+\infty} h(\alpha)\,d\alpha \int_{-\infty}^{+\infty} h(\beta)\,d\beta\; E(x_{t_1-\alpha}\,x_{t_2-\beta})$$

or

$$R_y(t_1,t_2) = \int_{-\infty}^{+\infty} h(\alpha)\,d\alpha \int_{-\infty}^{+\infty} h(\beta)\,d\beta\; R_x(t_1-\alpha,\,t_2-\beta) \qquad (9\text{-}34)$$

The autocorrelation function of the output of a stable linear system is therefore given by the double convolution of the input autocorrelation function with the unit impulse response of the system.
If the input random process is stationary in the wide sense, we have

$$R_x(t_1-\alpha,\,t_2-\beta) = R_x(\tau+\beta-\alpha)$$

where τ = t_1 − t_2. In this case the output autocorrelation function becomes

$$R_y(\tau) = \int_{-\infty}^{+\infty} h(\alpha)\,d\alpha \int_{-\infty}^{+\infty} h(\beta)\,d\beta\; R_x(\tau+\beta-\alpha) \qquad (9\text{-}35)$$

The times t_1 and t_2 enter into Eq. (9-35) only through their difference τ. This result, in conjunction with Eq. (9-33), shows that if the random-process input to a stable linear system is stationary in the wide sense, then so is the output random process.
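In discrete time, Eq. (9-35) can be exercised directly. The sketch below (our illustration, not part of the text; the impulse response and variance are invented) evaluates the double sum R_y[m] = Σ_α Σ_β h[α] h[β] R_x[m + β − α] for a white input, whose autocorrelation is σ² at zero lag and zero elsewhere, and checks the result against the deterministic autocorrelation of the impulse response:

```python
import numpy as np

# Discrete-time analog of Eq. (9-35): R_y[m] = sum_a sum_b h[a] h[b] R_x[m + b - a]
h = np.array([1.0, 0.5, 0.25])   # unit-sample response of a stable filter (invented)
sigma2 = 2.0                     # white input: R_x[m] = sigma2 * delta[m]

K = len(h)
R_y = []
for m in range(-(K - 1), K):
    total = 0.0
    for a in range(K):
        for b in range(K):
            if m + b - a == 0:   # R_x vanishes except at zero lag
                total += h[a] * h[b] * sigma2
    R_y.append(total)

# For a white input, R_y must equal sigma2 times the deterministic
# autocorrelation of h, which np.correlate supplies directly.
expected = sigma2 * np.correlate(h, h, mode="full")
assert np.allclose(R_y, expected)
print(R_y)   # symmetric in m, as an autocorrelation function must be
```

The nested loop is deliberately literal; in practice one would call np.correlate (or convolve twice) directly.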
The spectral density S_y(f) of the system output is given by

$$S_y(f) = \int_{-\infty}^{+\infty} R_y(\tau)\,e^{-j\omega\tau}\,d\tau$$

where ω = 2πf. Hence, from Eq. (9-35),

$$S_y(f) = \int_{-\infty}^{+\infty} h(\alpha)\,d\alpha \int_{-\infty}^{+\infty} h(\beta)\,d\beta \int_{-\infty}^{+\infty} R_x(\tau+\beta-\alpha)\,e^{-j\omega\tau}\,d\tau$$
On introducing a new variable γ = τ + β − α, we get
$$S_y(f) = \int_{-\infty}^{+\infty} h(\alpha)\,e^{-j\omega\alpha}\,d\alpha \int_{-\infty}^{+\infty} h(\beta)\,e^{+j\omega\beta}\,d\beta \int_{-\infty}^{+\infty} R_x(\gamma)\,e^{-j\omega\gamma}\,d\gamma$$

The integrals in this equation may be identified in terms of the system function and the input spectral density. Thus

$$S_y(f) = H(j\omega)\,H^*(j\omega)\,S_x(f) = |H(j2\pi f)|^2\,S_x(f) \qquad (9\text{-}36)$$
The spectral density of the output of a stable linear system, in response to a wide-sense stationary input random process, is therefore equal to the square of the magnitude of the system function times the spectral density of the system input.
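Equation (9-36) is easy to verify numerically. In the sketch below (an illustrative simulation; the one-pole filter and all parameter values are our own choices, not from the text), white gaussian noise is passed through the recursion y[n] = a·y[n−1] + x[n], whose squared system-function magnitude is 1/(1 − 2a cos ω + a²), and an averaged periodogram of the output is compared with |H|²S_x:

```python
import numpy as np

rng = np.random.default_rng(0)
a, n_seg, seg = 0.6, 400, 1024
S_x = 1.0                              # spectral density of the unit-variance white input

psd = np.zeros(seg)
for _ in range(n_seg):
    x = rng.standard_normal(seg + 200)
    y = np.zeros_like(x)
    for n in range(1, len(x)):         # one-pole filter y[n] = a y[n-1] + x[n]
        y[n] = a * y[n - 1] + x[n]
    y = y[200:]                        # discard the transient so the output is (nearly) stationary
    psd += np.abs(np.fft.fft(y)) ** 2 / seg
psd /= n_seg                           # averaged periodogram estimate of S_y

w = 2 * np.pi * np.arange(seg) / seg
H2 = 1.0 / (1.0 - 2.0 * a * np.cos(w) + a * a)   # |H(e^{jw})|^2 for the one-pole filter
ratio = psd / (H2 * S_x)
print(ratio.mean())                    # close to 1, as Eq. (9-36) predicts
assert abs(ratio.mean() - 1.0) < 0.05
```

Averaging many segments is essential: a single periodogram has a standard deviation comparable to the spectrum itself.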
Multiple-input Systems. Let us consider next a fixed-parameter stable linear system having N inputs x_n(t) and a single output y(t). We showed in Art. 9-1 that since the system is linear, the superposition principle may be applied and the system output can be expressed as

$$y(t) = \sum_{n=1}^{N} y_n(t)$$

where y_n(t) equals y(t) when all the inputs are zero except the nth. We also showed there that we may define a system function H_n(jω) and an impulse response h_n(t) relating y_n(t) and x_n(t). From the definition of h_n(t), it follows that y_n(t) can be expressed in terms of x_n(t) by the convolution integral

$$y_n(t) = \int_{-\infty}^{+\infty} h_n(\tau)\,x_n(t-\tau)\,d\tau$$

The total output may therefore be expressed in terms of the various inputs by the equation

$$y(t) = \sum_{n=1}^{N} \int_{-\infty}^{+\infty} h_n(\tau)\,x_n(t-\tau)\,d\tau$$
When the various inputs are sample functions of random processes, we can determine the output autocorrelation function by paralleling the steps followed in the single-input single-output case above. In this way we obtain

$$R_y(t_1,t_2) = \sum_{n=1}^{N}\sum_{m=1}^{N} \int_{-\infty}^{+\infty} h_n(\alpha)\,d\alpha \int_{-\infty}^{+\infty} h_m(\beta)\,d\beta\; R_{nm}(t_1-\alpha,\,t_2-\beta) \qquad (9\text{-}37)$$

where R_nm is the cross-correlation function of the nth and mth input processes. If the various inputs are mutually uncorrelated and all have zero means,

$$R_{nm}(t,t') = \begin{cases} R_n(t,t') & \text{when } n = m \\ 0 & \text{when } n \ne m \end{cases}$$
where R_n is the autocorrelation function of the nth input. The output autocorrelation function then becomes

$$R_y(t_1,t_2) = \sum_{n=1}^{N} \int_{-\infty}^{+\infty} h_n(\alpha)\,d\alpha \int_{-\infty}^{+\infty} h_n(\beta)\,d\beta\; R_n(t_1-\alpha,\,t_2-\beta) \qquad (9\text{-}38)$$
If the various input random processes are all stationary in the wide sense, then

$$R_{nm}(t,t') = R_{nm}(t-t')$$

On setting τ = t_1 − t_2, the output autocorrelation function therefore becomes

$$R_y(\tau) = \sum_{n=1}^{N}\sum_{m=1}^{N} \int_{-\infty}^{+\infty} h_n(\alpha)\,d\alpha \int_{-\infty}^{+\infty} h_m(\beta)\,d\beta\; R_{nm}(\tau+\beta-\alpha) \qquad (9\text{-}39)$$

when the input processes are correlated, and becomes

$$R_y(\tau) = \sum_{n=1}^{N} \int_{-\infty}^{+\infty} h_n(\alpha)\,d\alpha \int_{-\infty}^{+\infty} h_n(\beta)\,d\beta\; R_n(\tau+\beta-\alpha) \qquad (9\text{-}40)$$
when the input processes are uncorrelated and have zero means. The corresponding output spectral densities may be obtained by taking the Fourier transforms of these results. Thus we obtain, where S_nm is the cross-spectral density of the nth and mth inputs,

$$S_y(f) = \sum_{n=1}^{N}\sum_{m=1}^{N} H_n(j2\pi f)\,H_m^*(j2\pi f)\,S_{nm}(f) \qquad (9\text{-}41)$$

when the input processes are correlated, and

$$S_y(f) = \sum_{n=1}^{N} |H_n(j2\pi f)|^2\,S_n(f) \qquad (9\text{-}42)$$

when the input processes are uncorrelated and have zero means (where S_n is the spectral density of the nth input process).
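Integrated over frequency, Eq. (9-42) says that uncorrelated zero-mean inputs contribute additively to the output power. A quick numerical sketch (illustrative only; the impulse responses are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
h1 = np.array([1.0, -0.5, 0.25])      # impulse response seen by input 1 (invented)
h2 = np.array([0.8, 0.6])             # impulse response seen by input 2 (invented)
N = 200_000

x1 = rng.standard_normal(N)           # uncorrelated, zero-mean, unit-variance white inputs
x2 = rng.standard_normal(N)
y1 = np.convolve(x1, h1, mode="same")
y2 = np.convolve(x2, h2, mode="same")
y = y1 + y2                           # single output driven by both inputs

# Powers add: E(y^2) = E(y1^2) + E(y2^2) = sum_k h1[k]^2 + sum_k h2[k]^2
predicted = np.sum(h1**2) + np.sum(h2**2)
print(y.var(), predicted)
assert abs(y.var() - predicted) / predicted < 0.02
```

Had the two inputs been correlated, the cross-spectral terms of Eq. (9-41) would contribute as well and the simple sum would fail.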
A particular case of interest is that in which the stable fixed-parameter linear system under study is an N + 1 terminal-pair electrical network as shown in Fig. 9-2. We saw in Art. 9-1 that the system function of such a network could be expressed in terms of the various short-circuit transfer admittances by the relations

$$H_n(j2\pi f) = -\frac{y_{0n}}{y_{00}}$$
On substituting these relations into Eqs. (9-41) and (9-42), we get

$$S_o(f) = \sum_{n=1}^{N}\sum_{m=1}^{N} \frac{y_{0n}\,y_{0m}^*}{|y_{00}|^2}\,S_{nm}(f) \qquad (9\text{-}43)$$

when the inputs are correlated, and

$$S_o(f) = \sum_{n=1}^{N} \left|\frac{y_{0n}}{y_{00}}\right|^2 S_n(f) \qquad (9\text{-}44)$$

when the input processes are uncorrelated and have zero means.
9-4. Thermal Noise†
The randomness of the thermally excited motion of free electrons in a resistor gives rise to a fluctuating voltage which appears across the terminals of the resistor. This fluctuation is known as thermal noise. Since the total noise voltage is given by the sum of a very large number of electronic voltage pulses, one might expect from the central limit theorem that the total noise voltage would be a gaussian process. This can indeed be shown to be true. It can also be shown that the individual voltage pulses last for only very short intervals in time, and hence the noise voltage spectral density can be assumed practically to be flat; i.e., approximately

$$S_v(f) = S_v(0) \qquad (9\text{-}45)$$

as in the study of shot noise.
It is often convenient in analyses of thermal noise in electrical networks to represent a noisy resistor either by a Thévenin equivalent circuit consisting of a noise voltage generator in series with a noiseless resistor or by an equivalent circuit consisting of a noise current generator in parallel with a noiseless conductance. These equivalent circuits are shown in Fig. 9-5.
The zero-frequency value of the noise-voltage spectral density may be determined by studying the thermal-equilibrium behavior of the system composed of a noisy resistor connected in parallel with a lossless inductor. Such a system with its equivalent circuit is shown in Fig. 9-6.
† Cf. Lawson and Uhlenbeck (I, Arts. 4.1 through 4.5).
Fig. 9-5. A noisy resistor and its equivalent circuits.
Fig. 9-6. The resistor-inductor system.
From the equipartition theorem of statistical mechanics,† it follows that the average free energy in the current flowing around the loop must equal kT/2, where k is Boltzmann's constant and T is the temperature of the system in degrees Kelvin. Hence

$$E\left(\frac{Li^2}{2}\right) = \frac{L}{2}\,E(i^2) = \frac{kT}{2} \qquad (9\text{-}46)$$
The mean-square value of the current can be evaluated from its spectral density:

$$E(i^2) = \int_{-\infty}^{+\infty} S_i(f)\,df$$
Defining H(j2πf) to be the system function relating loop current to noise voltage, we have

$$H(j2\pi f) = \frac{1}{R + j2\pi f L}$$

and hence, from Eqs. (9-36) and (9-45),

$$S_i(f) = |H(j2\pi f)|^2\,S_v(f) = \frac{S_v(0)}{R^2 + (2\pi f L)^2}$$
The meansquare value of the current is, therefore,
E(
12)  S (0) 1+.0 dJ _ 8",,(0)
t  k v" 00 R2 + (21rjL)2  2RL
On substituting this result into Eq. (9-46) and solving for S_v(0), we get

$$S_v(0) = 2kTR \qquad (9\text{-}47)$$

† See Valley and Wallman (I, pp. 515–519) or van der Ziel (I, Art. 11.2).
The spectral density of the equivalent noise current generator, shown in Fig. 9-5, is then†

$$S_i(f) = 2kTg \qquad (9\text{-}48)$$
The frequency above which it is unreasonable to assume a flat spectral density for thermal noise is subject to some question.‡ However, it is undoubtedly above the frequency range in which the concept of a physical resistor as a lumped-constant circuit element breaks down.
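As a numerical sanity check on Eqs. (9-47) and (9-48) (our illustration; the resistor value and bandwidth are invented): integrating the two-sided density S_v(f) = 2kTR over a band from −Δf to +Δf gives E(v²) = 4kTR Δf, the familiar engineering form.

```python
import math

k = 1.380649e-23      # Boltzmann's constant, joules per kelvin
T = 290.0             # standard temperature, kelvins
R = 1.0e6             # a 1-megohm resistor
df = 1.0e4            # a 10-kilohertz band

# Integrate the two-sided density S_v(f) = 2kTR over the band (-df, +df):
mean_square = 2.0 * k * T * R * (2.0 * df)   # equals 4*k*T*R*df
v_rms = math.sqrt(mean_square)
print(v_rms)          # about 1.27e-5 volt, i.e. roughly 13 microvolts

# Equivalent current-generator form, Eq. (9-48), with g = 1/R:
# the two representations give the same open-circuit voltage density.
g = 1.0 / R
assert abs((2.0 * k * T * g) * R * R / (2.0 * k * T * R) - 1.0) < 1e-9
assert 1.2e-5 < v_rms < 1.3e-5
```

A dozen microvolts across a megohm in a 10-kHz band is why high-impedance front ends are noisy.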
Thermal Noise in Linear Networks. Let us now determine the spectral
density of the noise voltage appearing across any two terminals of a
lossy linear electrical network. Suppose, for the moment, that the loss
Fig. 9-7. An N-resistor lossy network.
in the network under consideration is due entirely to the presence of N resistors contained within the network. Let the resistance and temperature of the nth resistor be R_n and T_n, respectively. We pointed out above that a noisy resistor could be represented by a noiseless resistor in series with a noise-voltage source. The network under study may be considered therefore to be an N + 1 terminal-pair network having as inputs the noise-voltage sources of the N resistors. Aside from the N resistors, the network is lossless; the system can be represented schematically as in Fig. 9-7.
The noise voltages generated by the different resistors are statistically independent; hence, from Eq. (9-44), the spectral density of the output noise voltage is

$$S_o(f) = \sum_{n=1}^{N} \left|\frac{y_{0n}}{y_{00}}\right|^2 S_n(f)$$

† These results were first derived by Nyquist. See Nyquist (I).
‡ See Lawson and Uhlenbeck (I, Art. 4.5).
where the short-circuit transfer admittance y_{0n} relates the output terminals to the terminals of the nth noise voltage source, and where S_n(f) is the spectral density of the nth noise voltage source. Since, from Eq. (9-47),

$$S_n(f) = 2kT_nR_n$$

it follows that the spectral density of the output noise voltage of the lossy network can be expressed as

$$S_o(f) = 2k\sum_{n=1}^{N} \left|\frac{y_{0n}}{y_{00}}\right|^2 T_nR_n \qquad (9\text{-}49)$$
This result is sometimes known as Williams's theorem.† When all resistors in the network are at the same temperature T, Eq. (9-49) simplifies to

$$S_o(f) = 2kT\sum_{n=1}^{N} \left|\frac{y_{0n}}{y_{00}}\right|^2 R_n \qquad (9\text{-}50)$$
If, in addition, the requirement that the lossy network be bilateral is added, at least in the sense that

$$y_{0n} = y_{n0}$$

for all n, the spectral density of the output noise voltage becomes

$$S_o(f) = 2kTR_o(f) \qquad (9\text{-}51)$$

where R_o(f) is the real part of the impedance seen looking back into the output terminals of the lossy network. This result is commonly known as the generalized Nyquist theorem; by suitably defining "resistance," it can be extended to apply to linear dissipative systems in general,‡ e.g., to Brownian motion, gas-pressure fluctuations, etc.
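The generalized Nyquist theorem can be checked by hand on the simplest lossy network, a two-resistor voltage divider (our example, not from the text; the element values are invented). Summing the contributions of the two series noise-voltage generators as in Eq. (9-50) reproduces 2kT times the real part of the output impedance, here R₁R₂/(R₁ + R₂):

```python
k = 1.380649e-23
T = 290.0
R1, R2 = 600.0, 1800.0   # series arm and shunt arm of the divider (invented values)

# Transfer from each resistor's series noise-voltage source to the open-circuit output:
H1 = R2 / (R1 + R2)      # source in series with R1 divides onto R2
H2 = R1 / (R1 + R2)      # source in series with the shunt arm R2 drives the loop the other way

# Sum of |H_n|^2 * 2kT*R_n over the two resistors, in the manner of Eq. (9-50):
S_o = 2 * k * T * (H1**2 * R1 + H2**2 * R2)

# Generalized Nyquist theorem, Eq. (9-51): S_o = 2kT * Re(Z_o)
R_out = R1 * R2 / (R1 + R2)   # output impedance with the ideal noise sources shorted
assert abs(S_o - 2 * k * T * R_out) / (2 * k * T * R_out) < 1e-12
print(S_o)
```

The algebra behind the assertion is just R₂²R₁ + R₁²R₂ = R₁R₂(R₁ + R₂), so the agreement is exact, not approximate.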
In order to prove the generalized Nyquist theorem, let us suppose that we can short-circuit all the idealized noise voltage sources v_n and apply a sinusoidal voltage across the output terminals of the lossy network. The complex amplitude I_n of the current flowing in the resistor R_n is related to the complex amplitude V_o of the applied voltage by

$$I_n = y_{n0}\,V_o$$

The average power P_n dissipated in R_n under these conditions is

$$P_n = \frac{|I_n|^2}{2}\,R_n = \frac{|V_o|^2}{2}\,|y_{n0}|^2R_n$$

† Williams (I).
‡ Callen and Welton (I).
The total average power P_o dissipated in the lossy network is the sum of the average powers dissipated in the N resistors:

$$P_o = \frac{|V_o|^2}{2}\sum_{n=1}^{N} |y_{n0}|^2R_n$$

This total average power can also be expressed in terms of the conditions at the output terminals; thus

$$P_o = \frac{1}{2}\,\Re[V_oI_o^*] = \frac{|V_o|^2}{2}\,\Re\left[\frac{1}{Z_o^*}\right] = \frac{|V_o|^2}{2}\,\frac{R_o}{|Z_o|^2}$$

where ℜ denotes real part and R_o is the real part of the output impedance Z_o of the lossy network. Comparing the last two results, we find that

$$\sum_{n=1}^{N} |y_{n0}|^2R_n = \frac{R_o}{|Z_o|^2} \qquad (9\text{-}52)$$

If y_{n0} = y_{0n} for all n, then

$$\sum_{n=1}^{N} \left|\frac{y_{0n}}{y_{00}}\right|^2 R_n = R_o(f) \qquad (9\text{-}53)$$

If this result is substituted into Eq. (9-50), the generalized Nyquist theorem, Eq. (9-51), is obtained.
9-5. Output Probability Distributions
Our concern in this chapter is to characterize the random-process output of a linear system which is excited by a random-process input. Up to this point we have concerned ourselves chiefly with the problem of finding the autocorrelation function of the output when the autocorrelation function of the input and the impulse response of the system are known; we have also considered the problem, which is equivalent for a fixed-parameter system excited by a stationary process, of finding the spectrum of the output when the spectrum of the input and the system function are known. We have seen that these problems can always be solved, at least in principle, if the system can be characterized by an integral operator as in Eq. (9-18). If the input random process is gaussian, the output random process is also gaussian, as discussed in Art. 8-4; hence, from a knowledge of the mean and autocorrelation functions for the output, any n-variate probability density function for the output process can be written explicitly.† If the input random process is nongaussian, the probability distributions characterizing the output process are not, in general, determined by the autocorrelation and mean of the output.
† Cf. Wang and Uhlenbeck (I, Arts. 10 and 11).
This being so, it is natural to ask what can be done toward obtaining output probability distributions for a linear system driven by a nongaussian random process. The answer is, usually very little. There seems to be no general method for finding even a single-variate probability distribution of the output, beyond the brute-force method of calculating all its moments. This is unsatisfactory, of course, first because not all distributions are determined by their moments, and second because it is usually a practical impossibility to get all the moments in an easily calculable form. However, a few examples of this type of problem have been worked out by one device or another; in most of these examples the input random process, although not gaussian, is some particular nonlinear functional or class of functionals of a gaussian process,† such as the envelope, the absolute value, or the square.
The Square-law Detector and Video Amplifier. An interesting and important calculation is that for the probability distribution at time t of the output of a system consisting of an IF filter, square-law detector, and video filter, and for which the system input is either white gaussian noise or signal plus white gaussian noise. This was first done by Kac and Siegert,‡ and the results have been extended by others. We shall follow essentially Emerson's treatment.§

First we note that this example is indeed one in which a linear system is driven by a nongaussian random process; for although the input to the IF filter is gaussian and hence the output of that filter is gaussian, the video filter input is the output of the detector, which is the square of a gaussian process. The essential difficulties of the problem arise because of the inclusion of the video filter; the probability distribution of the output of the square-law detector at time t can be calculated directly (e.g., see Arts. 3-6 and 12-2). The IF filter is included in the problem setup for two reasons: engineering verisimilitude and mathematical expediency. The problem can be treated by starting with a stationary gaussian process into the square-law detector, but to give a rigorous justification of some of the steps in the analysis (which we do not do here) it is necessary to impose some conditions on the nature of the input process; a sufficient condition is to suppose it has a spectrum which is the square of the system function of a realizable stable filter (e.g., such a spectrum as would be obtained by passing white gaussian noise through an IF filter).
† Cf. Siegert (I, pp. 12–25).
‡ Kac and Siegert (I).
§ See Emerson (I) and Meyer and Middleton (I).
Let H_1(jω) and H_2(jω) be the system functions of the IF and video filters, respectively, and let h_1(t) and h_2(t) be their unit impulse responses (Fig. 9-8). Let v_0(t) be the IF filter input, v_1(t) the IF filter output, and v_2(t) the output from the video filter. Assuming both filters to be stable, we can write

$$v_1(t) = \int_{-\infty}^{+\infty} h_1(\tau)\,v_0(t-\tau)\,d\tau \qquad (9\text{-}54)$$

and, since the square-law detector output is v_1^2(t) (9-55),

$$v_2(t) = \int_{-\infty}^{+\infty} h_2(\tau)\,v_1^2(t-\tau)\,d\tau \qquad (9\text{-}56)$$

Hence

$$v_2(t) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} h_2(t-r)\,h_1(r-s)\,h_1(r-u)\,v_0(s)\,v_0(u)\,ds\,du\,dr \qquad (9\text{-}57)$$

Upon letting σ = t − r, ν = t − u, and μ = t − s, Eq. (9-57) becomes

$$v_2(t) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} A(\mu,\nu)\,v_0(t-\mu)\,v_0(t-\nu)\,d\mu\,d\nu \qquad (9\text{-}58)$$

where

$$A(\mu,\nu) = \int_{-\infty}^{+\infty} h_2(\sigma)\,h_1(\mu-\sigma)\,h_1(\nu-\sigma)\,d\sigma \qquad (9\text{-}59)$$
The central idea in the solution is to expand the input signal and noise
in a series of orthogonal functions chosen so that the output at any time
Fig. 9-8. A linear system with a nongaussian input.
t can be written as a series of independent random variables. It turns out, as we shall show below, that this can be done by choosing for the orthogonal functions an orthonormal set of characteristic functions of the integral equation

$$\int_{-\infty}^{+\infty} A(\mu,\nu)\,\phi(\nu)\,d\nu = \lambda\,\phi(\mu) \qquad (9\text{-}60)$$
From Eq. (9-59) we see that A(μ,ν) is a symmetric function of μ and ν. Also, if h_1(t) is square integrable and h_2(t) is absolutely integrable, it can be shown that

$$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} A^2(\mu,\nu)\,d\mu\,d\nu < +\infty \qquad (9\text{-}61)$$

Hence Eq. (9-60) has a discrete set of characteristic values and functions
under these conditions.† Furthermore, A(μ,ν) is nonnegative definite if in addition h_2(t) ≥ 0 for all t; the condition that A(μ,ν) be nonnegative definite‡ is equivalent to the condition that the right side of Eq. (9-58) be nonnegative, i.e., that v_2(t) be nonnegative. But since v_1^2(t) ≥ 0, it follows from Eq. (9-56) that v_2(t) ≥ 0 if h_2(t) ≥ 0.§ Hence it follows from Mercer's theorem† that

$$A(\mu,\nu) = \sum_k \lambda_k\,\phi_k(\mu)\,\phi_k(\nu) \qquad (9\text{-}62)$$

where the λ_k and φ_k are the characteristic values and orthonormal characteristic functions of Eq. (9-60).
On using Eq. (9-62) in Eq. (9-58) we obtain

$$v_2(t) = \sum_k \lambda_k \left[\int_{-\infty}^{+\infty} v_0(t-\mu)\,\phi_k(\mu)\,d\mu\right]^2 \qquad (9\text{-}63)$$
Suppose now that the system input is composed of an input signal function s(t) and a sample function n(t) of a white gaussian random process:

$$v_0(t) = s(t) + n(t) \qquad (9\text{-}64)$$

If then we define

$$s_k(t) = \int_{-\infty}^{+\infty} s(t-\mu)\,\phi_k(\mu)\,d\mu \qquad (9\text{-}65)$$

and

$$n_k(t) = \int_{-\infty}^{+\infty} n(t-\mu)\,\phi_k(\mu)\,d\mu \qquad (9\text{-}66)$$

the output of the video filter can be expressed as

$$v_2(t) = \sum_k \lambda_k\,[s_k(t) + n_k(t)]^2 \qquad (9\text{-}67)$$
Let v_2 be the random variable defined by v_2(t) at time t, n_k the random variable defined by n_k(t), and s_k the value of s_k(t). Each n_k is a gaussian random variable, since it is the integral of a gaussian process. Hence each (s_k + n_k) is a gaussian random variable with mean value s_k. Furthermore, the (s_k + n_k) are mutually independent and have variance S_n, since

$$E(n_jn_k) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} E[n(t-u)\,n(t-v)]\,\phi_j(u)\,\phi_k(v)\,du\,dv$$
$$= \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} S_n\,\delta(u-v)\,\phi_j(u)\,\phi_k(v)\,du\,dv$$
$$= \begin{cases} S_n & \text{if } k = j \\ 0 & \text{if } k \ne j \end{cases} \qquad (9\text{-}68)$$

† See Appendix 2, Art. A2-2.
‡ See Appendix 2, Art. A2-1.
§ Note that we have shown only that h_2(t) ≥ 0 is a sufficient condition for the nonnegativeness of A(μ,ν) and not that it is a necessary condition.
where S_n is the spectral density of the input noise, and δ(u − v) is the impulse function.

The key part of the problem of finding the single-variate probability distribution of the output is now done. Equation (9-67) gives v_2 as a sum of squares of independent gaussian random variables, so the determination of the characteristic function of v_2 is straightforward. However, after getting the characteristic function, there will still remain the difficulty of obtaining the probability density function of v_2, i.e., of finding the Fourier transform of the characteristic function.
The characteristic function of v_2 is

$$M_{v_2}(jz) = E\left\{\exp\left[jz\sum_k \lambda_k(s_k+n_k)^2\right]\right\} = \prod_k E\{\exp[jz\lambda_k(s_k+n_k)^2]\}$$

Since the averaging is over the probability distribution of the gaussian random variable n_k,

$$M_{v_2}(jz) = \prod_k \frac{1}{\sqrt{2\pi S_n}}\int_{-\infty}^{+\infty} \exp\left[jz\lambda_k(s_k+x)^2 - \frac{x^2}{2S_n}\right]dx$$

The integrals may be evaluated by completing the square to give

$$M_{v_2}(jz) = \prod_k (1-2jz\lambda_kS_n)^{-1/2}\exp\left[\frac{jz\lambda_ks_k^2}{1-2jz\lambda_kS_n}\right] \qquad (9\text{-}70)$$

If the input to the system is noise alone, s_k = 0 and Eq. (9-70) becomes

$$M_{v_2}(jz) = \prod_k (1-2jz\lambda_kS_n)^{-1/2} \qquad (9\text{-}71)$$
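Equation (9-71) lends itself to a Monte Carlo check. The sketch below (illustrative only; the characteristic values λ_k are invented) draws v₂ = Σ λ_k n_k² with independent gaussian n_k of variance S_n and compares the sample mean of exp(jzv₂) with the product form:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([1.0, 0.5, 0.25])   # hypothetical characteristic values lambda_k
S_n = 0.8                           # variance of each n_k, per Eq. (9-68)
z = 0.3
trials = 400_000

n = rng.normal(0.0, np.sqrt(S_n), size=(trials, lam.size))
v2 = (lam * n**2).sum(axis=1)       # noise-alone output, Eq. (9-67) with s_k = 0

mc = np.exp(1j * z * v2).mean()     # Monte Carlo estimate of the characteristic function
exact = np.prod(1.0 / np.sqrt(1.0 - 2j * z * lam * S_n))   # Eq. (9-71)
print(mc, exact)
assert abs(mc - exact) < 0.01
```

Each term of the product is the familiar characteristic function of a scaled chi-square variable with one degree of freedom, which is exactly what each λ_k n_k² is.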
Consider the amplifier shown in Fig. 10-1. From this figure, it follows that the incremental available powers of the source and the amplifier output are

$$P_{as} = \frac{(\overline{e_s^2})_{df}}{4R_s} \qquad\text{and}\qquad P_{ao} = \frac{(\overline{e_o^2})_{df}}{4R_o} \qquad (10\text{-}9)$$
respectively. The available power gain of the amplifier is therefore

$$G_a = \left(\frac{R_s}{R_o}\right)\frac{(\overline{e_o^2})_{df}}{(\overline{e_s^2})_{df}} = \frac{R_s\,S_o(f)}{R_o\,S_s(f)}$$

where S_o(f) is the spectral density of e_o and S_s(f) is the spectral density of e_s. The output spectral density can be evaluated by applying the techniques developed in Chap. 9; it is

$$S_o(f) = \left|\frac{H(j2\pi f)\,Z_i}{Z_i+Z_s}\right|^2 S_s(f)$$

where H(j2πf) is the system function of the amplifier relating e_o to e_i. Hence

$$G_a = \frac{R_s}{R_o}\left|\frac{H(j2\pi f)\,Z_i}{Z_i+Z_s}\right|^2 \qquad (10\text{-}10)$$
It should be noted that the available power gain of the amplifier depends
upon the relation between the amplifier input impedance and the source
output impedance.
10-2. Noise Figure

We are now in a position to define the noise figure of an amplifier: The operating noise figure† (or operating noise factor) F_o of an amplifier is the ratio of the incremental available noise power output N_ao from the amplifier to that part N_aos of the incremental available noise power output which is due solely to the noise generated by the source which drives the amplifier:

$$F_o = \frac{N_{ao}}{N_{aos}} \qquad (10\text{-}11)$$

The noise figure of an amplifier is thus a measure of the noisiness of the amplifier relative to the noisiness of the source.
† Also called operating spot noise figure.
An alternative definition of noise figure may be obtained as follows: The amplifier available noise power output due solely to the source is equal to the available noise power of the source N_as times the available power gain of the amplifier G_a. If we substitute this result into Eq. (10-11) and then multiply and divide by the available signal power from the source S_as, we get

$$F_o = \frac{N_{ao}}{G_aN_{as}} = \frac{S_{as}/N_{as}}{(G_aS_{as})/N_{ao}}$$

Now G_aS_as is the available signal power output S_ao of the amplifier. Hence we can write

$$F_o = \frac{(S/N)_{as}}{(S/N)_{ao}} \qquad (10\text{-}12)$$

The noise figure of an amplifier therefore equals the incremental available signal-to-noise power ratio of the source (S/N)_as divided by the incremental available signal-to-noise power ratio of the amplifier output (S/N)_ao. This relation is essentially that originally introduced by Friis† as a definition of noise figure.
Let us now determine some of the properties of noise figure. First of all, we note that the noise figure of an amplifier is a function of frequency, since the available powers used in its definition refer to incremental frequency bandwidths.

Next, we note that the amplifier output noise is due to two independent causes: the noise generated by the source and the noise generated within the amplifier itself. The incremental available noise power output is hence the sum of the incremental available noise power output due to the source N_aos and the incremental available noise power output due to internal noises N_aoi. On using this fact in Eq. (10-11), we get

$$F_o = 1 + \frac{N_{aoi}}{N_{aos}} \qquad (10\text{-}13)$$

Since both components of the output noise power must be nonnegative, the noise figure of an amplifier must always be greater than or equal to unity:

$$F_o \ge 1 \qquad (10\text{-}14)$$
The incremental available noise power output due to the source may be expressed in terms of the available power gain of the amplifier and the effective noise temperature of the source. Thus Eq. (10-13) can be rewritten as

$$F_o = 1 + \frac{N_{aoi}}{G_akT_s\,df} \qquad (10\text{-}15)$$

The available noise power output due to internal noise sources is generally independent of the effective noise temperature of the source. It
† Friis (I).
NOISE FIGURE
therefore follows from Eq. (10-15) that the operating noise figure of an amplifier depends upon the effective noise temperature of the source; the lower is T_s, the higher is F_o.

$$F_{o,12} = 1 + \frac{N_{aoi,12}}{G_{a,12}\,kT_{e1}\,df}$$
where N_aoi,12 is the incremental available noise power output from the cascade due to noise sources internal to both stages, G_a,12 is the available power gain of the cascade, and T_e1 is the effective noise temperature of the source which drives the cascade. N_aoi,12 equals the sum of the incremental available noise power output due to second-stage internal noises N_aoi,2 and the incremental available noise power output from the first stage due to its internal noise N_aoi,1 multiplied by the available power gain of the second stage. Since

$$F_{o1} = 1 + \frac{N_{aoi,1}}{G_{a1}\,kT_{e1}\,df} \qquad\text{and}\qquad F_{o2} = 1 + \frac{N_{aoi,2}}{G_{a2}\,kT_{e2}\,df}$$

where T_e2 is the effective noise temperature of the source and first-stage combination, it follows that we can express the two-stage noise figure in terms of the single-stage noise figures by the equation

$$F_{o,12} = F_{o1} + \frac{(F_{o2}-1)\,T_{e2}}{G_{a1}\,T_{e1}} \qquad (10\text{-}25a)$$
This becomes

$$F_{o,12} = F_{o1} + \frac{F_{o2}-1}{G_{a1}} \qquad (10\text{-}25b)$$

when T_e1 = T_e2. In particular, when both effective noise temperatures are standard, we get

$$F_{12} = F_1 + \frac{F_2-1}{G_{a1}} \qquad (10\text{-}26)$$

This is the cascading relation for standard noise figures.
Consider now a cascade of N amplifier stages, as shown in Fig. 10-3. Again we can split up the N-stage cascade into two parts. If we let F_o,1N be the over-all operating noise figure of the cascade, F_o,1(N−1) the over-all operating noise figure of the first N − 1 stages of the cascade, T_e(N−1) the effective noise temperature of the combination of the source and the first N − 1 stages, and F_oN the noise figure of the last stage when driven by the first N − 1 stages, we obtain, from Eqs. (10-24) and (10-25),
Successive applications of this result enable us to express F_o,1N in terms of the noise figures of the individual stages as

$$F_{o,1N} = F_{o1} + \sum_{n=2}^{N} \frac{(F_{on}-1)\,T_{en}}{T_{e1}} \bigg/ \prod_{m=1}^{n-1} G_{am} \qquad (10\text{-}27a)$$

which becomes

$$F_{o,1N} = F_{o1} + \sum_{n=2}^{N} (F_{on}-1) \bigg/ \prod_{m=1}^{n-1} G_{am} \qquad (10\text{-}27b)$$

when all T_en = T_e1. It therefore follows that when all the effective noise temperatures are standard, we obtain

$$F_{1N} = F_1 + \sum_{n=2}^{N} (F_n-1) \bigg/ \prod_{m=1}^{n-1} G_{am} \qquad (10\text{-}28)$$
This is the cascading relation for the standard noise figures. The operating noise figure of the cascade may then be obtained either by substitution of the individual-stage operating noise figures into Eqs. (10-27), or by substitution of the individual-stage standard noise figures into Eq. (10-28) and substitution of the result into Eq. (10-17). The latter procedure may well be more practical.

Equation (10-28) shows that when the available power gains of all stages after the kth are much larger than unity, the summation converges rapidly after the (n = k + 1)th term. Only the noise figures of the first k + 1 stages are then particularly significant. For example, if every stage has a large available power gain, the noise figure of the cascade is essentially that of the first stage.
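The cascading relation of Eq. (10-28) is straightforward to implement. In the sketch below (the stage values are invented), a quiet, high-gain first stage pins the cascade noise figure near its own, illustrating the first-stage dominance noted above:

```python
def cascade_noise_figure(F, G):
    """Standard noise figure of a cascade, per Eq. (10-28).

    F[n] and G[n] are the stage noise figures and available power
    gains as power ratios (not decibels), in cascade order.
    """
    total = F[0]
    gain = 1.0
    for n in range(1, len(F)):
        gain *= G[n - 1]              # product of gains of the stages ahead of stage n
        total += (F[n] - 1.0) / gain
    return total

# Three stages: a quiet 20-dB-gain first stage, then two noisier stages.
F = [1.5, 6.0, 10.0]                  # noise figures as power ratios
G = [100.0, 10.0, 10.0]               # available power gains as power ratios
F_1N = cascade_noise_figure(F, G)
print(F_1N)                           # 1.559: dominated by the first stage
assert abs(F_1N - (1.5 + 5.0 / 100.0 + 9.0 / 1000.0)) < 1e-12
```

Reversing the stage order here would give 10 + 5/10 + 0.5/100 = 10.505, a dramatic illustration of why the low-noise stage goes first.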
10-4. Example†

As an illustration of the application of the results above, let us determine the noise figure of a single-stage triode amplifier such as that shown in Fig. 10-4. We shall assume that any reactive coupling between grid and plate has been neutralized over the amplifier pass band and that the frequency of operation is high enough that we must consider the effect of both transit-time loading of the grid circuit and induced grid noise.‡ Under these conditions the equivalent circuit of the system is that shown in Fig. 10-5.

All the passive elements in Fig. 10-5 are assumed to be noiseless (except R_eq). The source noise current i_s is assumed to result from the source conductance g_s with an effective temperature T_s. The amplifier input admittance is split into its real and imaginary parts g_i and B_i, respectively; the noise generated by the real part is represented by i_i. The induced grid noise is represented by the noise current source i_T and
† Cf. Valley and Wallman (I, Arts. 13.5 through 13.7).
‡ See, for example, van der Ziel (I, Art. 6.3) or Lawson and Uhlenbeck (I, Art. 4.10).
Fig. 10-4. The triode amplifier.

Fig. 10-5. Noise-equivalent circuit of the triode amplifier.
is assumed to result from the grid loading conductance g_T with an effective temperature T_T. The plate-current noise of the triode is accounted for by the equivalent noise resistor† R_eq. For convenience, we have lumped the tube input susceptance with B_i and the tube output reactance with Z_L.

The incremental available noise power from the source is kT_s df. The incremental available noise power output of the amplifier due to the input source noise is therefore G_a kT_s df. The available power gain of the amplifier is

$$G_a = \frac{\mu^2 g_s}{r_p[(g_s+g_i+g_T)^2+B_i^2]} \qquad (10\text{-}30)$$

Here the most convenient of our various expressions for noise figure is that given by Eq. (10-15). In order to apply that expression, we must determine the incremental
† It is often convenient to represent the plate-current noise of a tube as being due to a noisy resistor R_eq, with a standard effective temperature T_o, in the grid circuit of the tube (as in Fig. 10-5). In this case the spectral density of the plate-current noise may be written as

$$S_v(f) = 2kT_oR_{eq}$$

It then follows from Eq. (7-91) that, for a triode,

$$R_{eq} = (0.644)\left(\frac{T_c}{T_o}\right)\frac{1}{g_m} \qquad (10\text{-}29)$$

It should be emphasized that no current flows through R_eq at any time; only its noise voltage is significant.
available noise power output due to internal noises N_aoi. Assuming that the internal noises are uncorrelated, their contributions to the output noise power add, and hence (assuming that the temperature of g_i is T_i) we obtain

$$N_{aoi} = \frac{G_akT_o\,df}{g_s}\,\{t_ig_i + t_Tg_T + R_{eq}[(g_s+g_i+g_T)^2+B_i^2]\} \qquad (10\text{-}31)$$

where t_i and t_T are the relative noise temperatures of the grid input circuit and loading conductances:

$$t_i = \frac{T_i}{T_o} \qquad\text{and}\qquad t_T = \frac{T_T}{T_o}$$

Upon substituting these results into Eq. (10-15) we obtain

$$F_o = 1 + \frac{T_o}{T_s}\cdot\frac{t_ig_i + t_Tg_T + R_{eq}[(g_s+g_i+g_T)^2+B_i^2]}{g_s} \qquad (10\text{-}32)$$

for the operating noise figure of the single-stage triode amplifier.
Optimization of F_o. It is interesting to determine what can be done to minimize the noise figure of the given amplifier. First, assuming fixed circuit parameters, we see from Eq. (10-32) that F_o is a minimum when the frequency of operation is such that

$$B_i = 0 \qquad (10\text{-}33)$$

Although this condition is satisfied only at the resonant frequency of the input circuit, it may be approximated over the bandwidth of the input signal when that bandwidth is much narrower than the bandwidth of the amplifier input circuit.

Of the various other amplifier parameters, certain ones (t_T, g_T, and R_eq) are determined by the type of tube used. The best one can do with these is to select a type for which they are as small as possible. The minimum value of the input-circuit conductance g_i is determined by the maximum Q's which are physically obtainable for the circuit elements and by the required bandwidth of the input circuit. The relative noise temperature of the grid input circuit t_i might well be reduced by cooling the input circuit.

Since the effective noise temperature of the source is generally not under control, the only remaining possibility is to adjust the source conductance g_s so as to minimize F_o. That there is an optimum value of g_s follows from Eq. (10-32); F_o → +∞ when either g_s → 0 or g_s → ∞. The optimum value can be found by equating to zero the partial derivative of F_o with respect to g_s. In this way we obtain, assuming B_i = 0,

$$g_{s,\mathrm{opt}} = \left[(g_i+g_T)^2 + \frac{t_ig_i+t_Tg_T}{R_{eq}}\right]^{1/2}$$

Substituting this result into Eq. (10-32) and setting B_i = 0, we get

$$F_{o,\min} = 1 + \frac{2T_o}{T_s}\,R_{eq}\,(g_{s,\mathrm{opt}}+g_i+g_T) \qquad (10\text{-}34)$$
In certain practical amplifiers, the condition†

$$R_{eq}(g_i+g_T)^2 \ll t_ig_i + t_Tg_T \qquad (10\text{-}36)$$

is satisfied. In these cases, the optimum value of the source conductance becomes, approximately,

$$g_{s,\mathrm{opt}} = \left[\frac{t_ig_i+t_Tg_T}{R_{eq}}\right]^{1/2} \qquad (10\text{-}37)$$

and the minimum value of the operating noise figure becomes

$$F_{o,\min} = 1 + 2\left(\frac{T_o}{T_s}\right)[R_{eq}(t_ig_i+t_Tg_T)]^{1/2} \qquad (10\text{-}38)$$

Thus we see that matching the source impedance to the amplifier input impedance in the usual sense (i.e., for maximum average power transfer) does not necessarily lead to the lowest possible value of the amplifier noise figure.
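The optimization above can be confirmed numerically. The sketch below (the element values are invented) evaluates the noise-figure expression of Eq. (10-32) with B_i = 0 over a grid of source conductances and checks both the minimizing g_s against the closed form and the minimum value against Eq. (10-34):

```python
import math

# Hypothetical element values (conductances in siemens, temperatures in kelvins).
g_i, g_T, R_eq = 1e-4, 3e-4, 1500.0
t_i, t_T = 1.0, 1.0
T_o, T_s = 290.0, 290.0

def F_o(g_s):
    # Operating noise figure, Eq. (10-32), with B_i = 0
    return 1.0 + (T_o / T_s) * (t_i * g_i + t_T * g_T
            + R_eq * (g_s + g_i + g_T) ** 2) / g_s

# Closed-form optimum source conductance
g_opt = math.sqrt((g_i + g_T) ** 2 + (t_i * g_i + t_T * g_T) / R_eq)

# Brute-force check on a fine grid bracketing g_opt
grid = [g_opt * (0.2 + 0.001 * i) for i in range(3600)]
g_best = min(grid, key=F_o)
assert abs(g_best - g_opt) / g_opt < 0.01

# Minimum value matches Eq. (10-34): F_min = 1 + (2 T_o / T_s) R_eq (g_opt + g_i + g_T)
assert abs(F_o(g_opt) - (1.0 + 2.0 * (T_o / T_s) * R_eq * (g_opt + g_i + g_T))) < 1e-9
print(g_opt, F_o(g_opt))
```

A grid search is crude but adequate here, since F_o(g_s) has a single interior minimum between its blow-ups at g_s → 0 and g_s → ∞.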
The purpose of the above example is mainly to illustrate the application of our previous noise-figure results. The conclusions reached are only as valid as the noise-equivalent circuit shown in Fig. 10-5.
10-5. Problems

1. Consider the system shown in Fig. 10-1. Let the internal noise current i_A be due to a conductance g_A at an effective temperature T_A, and let Z_A be the transfer impedance relating the amplifier Thévenin equivalent output voltage e_o to the current i_A. Derive an expression for the operating noise figure of the amplifier in terms of the various system parameters.
2. Consider the system shown in Fig. 10-6. Suppose that a ≪ 1 and b ≫ 1 and that all resistors are at the standard temperature.

Fig. 10-6. An attenuator cascade.

a. Determine the available power gains of the networks A and B in terms of R, a, and b.
b. Determine the standard noise figures of networks A and B.
3. Referring to Prob. 2 above and Fig. 10-6,
a. Determine the available power gain of the cascade of networks A and B.
b. Determine the standard noise figure of the cascade.
c. Determine the relative noise temperature of the cascade.
† Cf. Valley and Wallman (I, Art. 13.6).
4. Suppose that the effective noise temperature $T_s$ of the source driving a given
amplifier does not vary with frequency. Show that the average operating noise
figure $\bar{F}_o$ of the amplifier is related to the average standard noise figure $\bar{F}$ of the ampli-
fier by the equation
$$\bar{F}_o = 1 + \frac{T_o}{T_s}\left( \bar{F} - 1 \right) \tag{10-39}$$
5. Consider the system shown in Fig. 10-7. Suppose that $R_s$, $R$, and $R_L$ are at the
standard temperature.
FIG. 10-7. An amplifier circuit.
a. Determine the available power gain of the amplifier.
b. Determine the standard noise figure of the amplifier.
c. Assuming $R \gg R_s$ and a high-$g_m$ tube, obtain approximations to the results in
a and b.
6. Suppose the circuit elements in Fig. 10-7 have the values
$$R_s = 600 \text{ ohms}, \qquad R = 0.5 \text{ megohm}, \qquad R_L = 100{,}000 \text{ ohms}$$
and that the tube parameters are
$$g_m = 1600 \text{ micromhos}, \qquad r_p = 44{,}000 \text{ ohms}, \qquad R_{eff} = 1560 \text{ ohms}$$
For these values
a. Evaluate the available power gain of the amplifier.
b. Evaluate the standard noise figure.
7. For the system of Prob. 5,
a. Determine the optimum value of the source resistance $R_s$.
b. Using the values stated in Prob. 6, evaluate a.
c. Using b, determine the available power gain and noise figure of the amplifier
when the source resistance has its optimum value.
8. Consider the system shown in Fig. 10-8. The power meter measures the power
output from the amplifier in a frequency band of width $\Delta f$ centered on $f$. The source
resistor is at the standard temperature.
FIG. 10-8. Noise-figure measuring system: a temperature-limited diode connected across the source resistor $R_s$ at the input of the amplifier, whose output drives a power meter.
a. Show that the ratio of the power-meter reading $P_d$ with the diode connected
across $R_s$ to the reading $P_o$ when the diode is disconnected is
$$\frac{P_d}{P_o} = 1 + \frac{e\bar{I}R_s}{2kT_oF} \tag{10-40}$$
where $\bar{I}$ is the average current flowing through the temperature-limited diode.
b. Show that when $P_d/P_o = 2$, the standard noise figure of the amplifier is given by
$$F = 20\,\bar{I}R_s \tag{10-41}$$
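The constant in Eq. (10-41) is simply $e/2kT_o$ evaluated at the standard temperature $T_o = 290°$K. A quick numerical check (the diode current and source resistance below are illustrative):

```python
# Check of the constant in Eq. (10-41): when P_d/P_o = 2,
# F = e*I*R_s/(2*k*T_o), and e/(2*k*T_o) is approximately 20 at T_o = 290 K.
e = 1.602176634e-19   # electron charge, coulombs
k = 1.380649e-23      # Boltzmann constant, joules per kelvin
T_o = 290.0           # standard temperature, kelvin

c = e / (2 * k * T_o)
print(round(c, 2))    # ≈ 20.01

# Illustrative case: I = 1 mA of diode current across R_s = 50 ohms
F = c * 1e-3 * 50.0
print(F)              # ≈ 1.0, i.e. a noise-free amplifier would need this I*R_s
```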
9.† Consider the system shown in Fig. 10-9. The inputs of N identical high-gain
amplifiers of the type shown in Figs. 10-4 and 10-5 are connected to the source through
a linear passive coupling network, and their outputs are added to drive a load.

FIG. 10-9. An amplifier combination.
Show that the standard noise figure of the system is greater than, or equal to, the
optimum standard noise figure of one of the amplifiers.
10. It has been suggested‡ that the noise performance of an amplifier is better
characterized by the noise measure $M$ of the amplifier, where
$$M = \frac{F - 1}{1 - 1/G_a} \tag{10-42}$$
than by the noise figure alone.
Consider a cascade of two identical amplifiers. Show that the noise measure of the
cascade is equal to the noise measure of a single stage, whereas the noise figure of the
cascade exceeds that of a single stage.
† Cf. Bose and Pezaris (I).
‡ Haus and Adler (I).
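The invariance asserted in Prob. 10 can be verified numerically with the standard cascade formulas (available gains multiply, and the cascade noise figure is $F_{12} = F_1 + (F_2 - 1)/G_1$); the stage values below are illustrative:

```python
# Prob. 10 checked for a cascade of two identical stages.
# Illustrative single-stage values (assumptions, not from the text):
F, G = 4.0, 10.0

M = (F - 1) / (1 - 1 / G)                 # noise measure, Eq. (10-42)

F_casc = F + (F - 1) / G                  # cascade noise figure
G_casc = G * G                            # cascade available gain
M_casc = (F_casc - 1) / (1 - 1 / G_casc)  # cascade noise measure

print(F_casc > F)       # cascade noise figure exceeds that of one stage
print(abs(M_casc - M))  # but the noise measures agree
```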
CHAPTER 11
OPTIMUM LINEAR SYSTEMS
11-1. Introduction
One application of the theory developed in Chaps. 6 and 9 is to the
design of linear systems to perform operations in an optimum way when
some or all of the inputs are sample functions from random processes.
Thus, for example, a so-called smoothing filter is designed to extract as
well as possible a wanted signal from a mixture of signal and noise; a
predicting filter is designed to yield a future value of a signal, where the
signal may again be mixed with noise. In this chapter we shall consider
how to find certain optimum linear systems. For convenience, we shall
suppose throughout that signal and noise terms are real-valued.
Before treating particular problems, let us consider briefly four con-
ditions which largely determine any problem in system optimization.
These are the purpose of the system, the nature of the inputs, the cri-
terion of goodness of performance to be used, and the freedom of choice
to be allowed in the design. Whenever these four conditions are speci-
fied, some kind of optimization problem is defined, although the specifi-
cations may, of course, be such that the problem has no solution at all,
or no best solution, or no unique best solution. In practical problems
another consideration is usually added, the cost of the system (perhaps
in some generalized sense of the word cost). For our present purpose,
however, we assume that cost is not to be a factor.
Rather than discuss these four conditions completely abstractly, let us
observe, as an illustration, how they apply in a particular situation, e.g.,
the design of an optimum smoothing filter. We suppose that there is
available a corrupted signal y(t) which is the sum of a wanted signal s(t)
and unwanted noise n(t),
$$y(t) = s(t) + n(t)$$
The first condition on the problem is the purpose of the system. Here
we are assuming that it is to recover the wanted signal s(t) from the
corrupted signal y(t).
Next we have to specify the nature of both inputs, s(t) and n(t). There
are various possibilities, including some uninteresting ones. For example,
if s(t) is known exactly, there is no problem, at least in principle; if n(t) is
known exactly, it is trivial to obtain s(t) = y(t) − n(t), and there is again
no problem. At the other extreme, if there is no a priori information
at all about s(t) and n(t), there is no hope for extracting s(t) even approxi-
mately. Obviously the interesting cases are those in which there is some
uncertainty about the inputs but not too much. It is often reasonable
to assume that the noise input n(t) is caused by some random physical
phenomenon whose statistics are known, as, for example, when n(t) is
thermal noise or shot noise. If this is true, we may suppose n(t) to be
a sample function of a random process. Depending on its origin, the
signal s(t) might be represented, to mention only some of the possibilities,
by a polynomial of degree m with unknown coefficients, by a finite trigo-
nometric series with unknown coefficients, by a sample function of a ran-
dom process, or by a combination of these. One common specification
of the nature of the signal and noise inputs is that both signal and noise
are sample functions of stationary random processes and that both the
autocorrelation functions and the cross-correlation function are known.
These are the assumptions made by Wiener in his analysis of the linear
smoothing filter, which we shall discuss below.
There are as many possibilities as one desires for a criterion of good-
ness of performance, that is, for a measure of how well the system per-
forms its intended task. The desired output of the system is s(t); if the
actual output is z(t), then any functional of s(t) and z(t) is some kind of
measure of how well the system operates. Usually, however, one thinks
of a measure of system performance as being some quantity which
depends on the error z(t) − s(t), which is a minimum (maximum) when
the error is zero and becomes larger (smaller) when the error is increased.
Since we are discussing systems which have random inputs and hence
random outputs, it is appropriate to use probabilities and statistical
averages. Some reasonable choices for a measure of how good the per-
formance is, or how small the error is, are
1. $p(z_t = s_t \mid y(\tau),\ \tau \le t)$
2. $P(|z_t - s_t| > \epsilon)$
3. $E(|z_t - s_t|)$
4. $E(|z_t - s_t|^2)$
The reader can readily add to this list. If we use (1), we ask for that
system whose output has the largest conditional probability, using all
the past history of the signal, of being the right value. If the con-
ditional probability densities are continuous and our only interest is that
the output have a small error as much of the time as possible, all errors
larger than a certain value being roughly equally bad, such a criterion is
attractive. It has the disadvantage, however, that it requires a com-
plete statistical knowledge of the inputs, and this is often not available.
Choice (2) is a measure of performance for which all errors greater than
some threshold are considered exactly equally bad, while small errors are
tolerated. In this case, of course, one asks for a system which minimizes
the specified probability. In the criteria given by (3) and (4), errors are
weighted according to their magnitude; in (4), large errors are weighted
quite heavily. Choice (4) provides no better criterion for many appli-
cations than various other choices, and often it gives a worse one. How-
ever, it has the advantage that it leads generally to a workable analysis.
The calculation of $E(|z_t - s_t|^2)$ can be made straightforwardly, given the
input correlation functions and the system function of the linear system.
This criterion, of least mean-square error, is used in the Wiener theory
of linear smoothing and in most of the extensions of that theory.
Finally, we must consider the freedom of choice to be allowed in the
design. We have assumed from the beginning of this chapter that the
systems were to be linear. This restriction is imposed not because linear
systems are necessarily best but simply because it is too hard mathe-
matically in most instances to allow a wider class of possibilities. In
addition, we shall usually require that the systems be fixed-parameter
linear systems and that they be realizable. The restriction to fixed-
parameter systems is sometimes not necessary, but when the inputs are
stationary random processes, nothing is lost by making this restriction.
The restriction of realizability is introduced to guarantee that the results
have practical value rather than to simplify the analysis. In fact, the
subtleties of the Wiener theory of smoothing filters exist because of this
restriction. At times it is assumed that the system is to have been
excited over all past time; at other times it is assumed that it has been
excited only for a finite time. We shall discuss the smoothing filter
under both these conditions.
We have talked only about the smoothing filter, but it is clear that a
similar set of considerations applies to any problem in system optimi-
zation. Some specification under each of the four conditions mentioned
must be made before a problem in the mathematical sense is even defined.
Three of these conditions, namely purpose, nature of the inputs, and freedom
of choice allowed, are governed pretty clearly by what the system is to
be used for and where, and by theoretical and practical limitations on
design techniques. The other condition, criterion of goodness of per-
formance, is also influenced by these things, but it is more intangible.
We shall suppose that intuition, backed by experience and the knowledge
mentioned, is sufficient to lead to a suitable criterion. If one looks more
closely into this matter, he is led to the subject of decision theory,† the
† See, for example, Middleton and Van Meter (I), which contains an extensive
bibliography.
basic problem of which is to establish rules for making a decision as to
when one thing is better than another. In this chapter we shall use
consistently a least-mean-square error criterion, except in Art. 11-8,
where we shall use something closely related to it. In Chap. 14, how-
ever, we shall again consider what may be regarded as system optimi-
zation problems, and various performance criteria will be used.
In the remainder of this chapter we consider a more or less arbitrary
collection of problems concerned with optimum linear procedures for
smoothing, predicting, and maximizing signal-to-noise ratio. The tech-
niques used are representative of those for a wider class of problems
than we actually discuss.
11-2. Smoothing and Predicting of Stationary Inputs Using the Infinite
Past (Wiener Theory)
For this problem we assume an input signal y(t) which consists of a
wanted signal s(t) added to a noise n(t). Both s(t) and n(t) are taken
to be sample functions from real-valued wide-sense stationary random
processes with a stationary cross-correlation. We want to find a weight-
ing function h(t) for a fixed-parameter realizable linear filter which acts
on the entire past of y(t) in such a way that the output of the filter at
time t is a best mean-square approximation to $s(t + \eta)$, $\eta \ge 0$. This is
a combined smoothing and prediction problem, and problems of smooth-
ing, and of predicting when there is no noise, are special cases. The
solution of this problem is due to Kolmogoroff and to Wiener,† and we
shall first treat it essentially as Wiener has. It makes no significant
difference in the mathematics (although it may in the conceptual foun-
dations) whether s(t) and n(t) are taken to be sample functions of random
processes and statistical averages are used (this is the procedure we will
follow here) or whether s(t) and n(t) are taken to be individual functions,
unknown but with known time correlations, and time averages are used.
The reader is referred particularly to a concise development by Levinson‡
of the theory using the latter assumption.
The input to the filter is
$$y(t) = s(t) + n(t) \tag{11-1}$$
The output for any filter weighting function h(t) is
$$\int_{-\infty}^{\infty} h(t - \tau)y(\tau)\,d\tau = \int_{-\infty}^{\infty} h(\tau)y(t - \tau)\,d\tau \tag{11-2}$$
where h(t) = 0, t < 0. The average squared error $\mathcal{E}$ is
$$\mathcal{E} = E\left\{ \left[ s(t + \eta) - \int_{-\infty}^{\infty} h(\tau)y(t - \tau)\,d\tau \right]^2 \right\} \tag{11-3}$$
† Wiener (III). See p. 59 for a reference to Kolmogoroff.
‡ Levinson (I).
and it is desired to minimize this if possible by a suitable choice of h(t).
There may not actually exist an h(t) which gives a minimum from among
the class of all h(t) of realizable filters. This point is discussed later.
Since s(t) and n(t) have stationary autocorrelation functions and a sta-
tionary cross-correlation function, the mean-squared error expression
given by Eq. (11-3) can be expanded to give
$$\mathcal{E} = E[s^2(t + \eta)] - 2\int_{-\infty}^{\infty} h(\tau)E[s(t + \eta)y(t - \tau)]\,d\tau + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)h(\mu)E[y(t - \tau)y(t - \mu)]\,d\tau\,d\mu$$
$$= R_s(0) - 2\int_{-\infty}^{\infty} h(\tau)R_{sy}(\eta + \tau)\,d\tau + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)h(\mu)R_y(\tau - \mu)\,d\tau\,d\mu \tag{11-4}$$
We now find a necessary condition which h(t) must satisfy to make $\mathcal{E}$ a
minimum. Suppose g(t) is a weighting function for any realizable filter
whatsoever. Then $h(t) + \epsilon g(t)$ is the weighting function of a realizable
filter, and if h(t) is to provide a minimum mean-squared error, then the
expression on the right side of Eq. (11-4) must have at least as great a
value when h(t) is replaced by $h(t) + \epsilon g(t)$ as it does with h(t). This
must be true for any real $\epsilon$ and any g(t) of the class considered. With
h(t) replaced by $h(t) + \epsilon g(t)$, the right side of Eq. (11-4) becomes
$$R_s(0) - 2\int_{-\infty}^{\infty} h(\tau)R_{sy}(\eta + \tau)\,d\tau - 2\epsilon\int_{-\infty}^{\infty} g(\tau)R_{sy}(\eta + \tau)\,d\tau + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)h(\mu)R_y(\tau - \mu)\,d\tau\,d\mu$$
$$+\; 2\epsilon\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)g(\mu)R_y(\tau - \mu)\,d\tau\,d\mu + \epsilon^2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(\tau)g(\mu)R_y(\tau - \mu)\,d\tau\,d\mu \tag{11-5}$$
For h(t) to give a minimum, this expression minus the right side of Eq.
(11-4) must be greater than or equal to zero, i.e.,
$$2\epsilon\left\{ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)g(\mu)R_y(\tau - \mu)\,d\tau\,d\mu - \int_{-\infty}^{\infty} g(\tau)R_{sy}(\eta + \tau)\,d\tau \right\} + \epsilon^2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(\tau)g(\mu)R_y(\tau - \mu)\,d\tau\,d\mu \ge 0 \tag{11-6}$$
The last term on the left side of Eq. (11-6) is always nonnegative because
$R_y(t)$ is nonnegative definite.† If the expression in braces is different
from zero, then a suitable choice of $\epsilon$, positive or negative, will make the
whole left side of Eq. (11-6) negative. Hence, a necessary condition
that the inequality be satisfied is that
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\tau)g(\mu)R_y(\tau - \mu)\,d\tau\,d\mu - \int_{-\infty}^{\infty} g(\tau)R_{sy}(\eta + \tau)\,d\tau = 0 \tag{11-7}$$
Equation (11-7) can be written
$$\int_0^{\infty} g(\tau)\left[ \int_0^{\infty} h(\mu)R_y(\tau - \mu)\,d\mu - R_{sy}(\eta + \tau) \right] d\tau = 0 \tag{11-8}$$
† See Chap. 6, Art. 6-6.
since g(t) and h(t) must vanish for negative values of their arguments.
But now Eq. (11-8) can hold for all $g(\tau)$ only if
$$R_{sy}(\tau + \eta) = \int_0^{\infty} h(\mu)R_y(\tau - \mu)\,d\mu \qquad \tau \ge 0 \tag{11-9}$$
This integral equation must be satisfied by h(t) in order that the realiza-
ble filter with weighting function h(t) give a minimum mean-squared error
prediction for $s(t + \eta)$ amongst the class of all realizable fixed-parameter
filters.
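A finite-memory (FIR) discrete-time analogue of Eq. (11-9) makes the structure concrete: with h restricted to N taps, the condition $R_{sy}[k + \eta] = \sum_m h[m]\,R_y[k - m]$, $k = 0, \ldots, N-1$, is a Toeplitz system of linear equations rather than a semi-infinite integral equation. The correlation functions in the sketch below are synthetic assumptions (a signal with $R_s[k] = a^{|k|}$ in independent white noise of variance q), chosen only for illustration:

```python
import numpy as np

# FIR analogue of the integral equation (11-9): with h restricted to
# N taps, R_sy[k + eta] = sum_m h[m] R_y[k - m], k = 0..N-1, is a
# Toeplitz linear system.  Synthetic correlations: R_s[k] = a^|k|,
# independent white noise of variance q (illustrative assumptions).
a, q, eta, N = 0.9, 0.5, 2, 64

idx = np.arange(N)
T = a ** np.abs(idx[:, None] - idx[None, :])  # R_s[k - m]
T[np.diag_indices(N)] += q                    # R_y = R_s + R_n on the diagonal
rhs = a ** (idx + eta)                        # R_sy[k + eta] = R_s[k + eta]

h = np.linalg.solve(T, rhs)                   # optimum weighting sequence

# Discretized mean-square error, as in Eq. (11-4); here R_s[0] = 1.
def mse(w):
    return 1.0 - 2.0 * w @ rhs + w @ T @ w

print(mse(h))                  # minimum error, between 0 and 1
print(mse(1.01 * h) > mse(h))  # perturbing h can only increase the error
```

Because the correlation matrix is nonnegative definite, any perturbation of the solution increases the quadratic error form, mirroring the sufficiency argument that follows.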
We have just shown that for h(t) to satisfy Eq. (11-9) is a necessary
condition in order that h(t) provide a minimum; it is also sufficient.
For if h(t) satisfies Eq. (11-9), Eq. (11-8) is satisfied for all g(t). Now
suppose f(t) is the weighting function of any other realizable filter; we
show that h(t) yields a smaller mean-square error than does f(t). Let
g(t) = f(t) − h(t). The inequality of Eq. (11-6) is satisfied because Eq.
(11-8) is. In particular, Eq. (11-6) is satisfied for this g(t) and for $\epsilon = 1$.
But the left-hand side of Eq. (11-6) with $\epsilon = 1$ is simply the difference
between the mean-square error using the filter with weighting function
g(t) + h(t) = f(t) and that using the filter with weighting function h(t).
Hence the error with h(t) is less than or equal to that with any other
admissible function.
The problem of finding the optimum smoothing and predicting filter
is now very neatly specified by the integral equation (11-9). In Art. 11-4
we shall solve this equation under the restriction that the cross-spectral
density $S_{sy}(f)$ be a rational function. But first it seems worthwhile to
discuss Eq. (11-9) and its source in some detail, in particular to consider
the special cases of pure smoothing and pure prediction and the effect of
the realizability requirement.
Infinite-lag Smoothing Filter. Suppose we had not required in the deri-
vation of Eq. (11-9) that the optimum filter be realizable. Then we
should have been looking for a weighting function h(t) which would pro-
vide a minimum mean-squared error from among the class of all weighting
functions, with no requirement that these functions vanish for negative
values of their arguments. A little reflection will show that Eq. (11-7) is
still a necessary condition for a best h(t) under these circumstances, with
g(t) now any weighting function. Hence Eq. (11-8), modified so that the
lower limits of both integrals are $-\infty$, is still a necessary condition, and
Eq. (11-9) is replaced by the equation
$$R_{sy}(\tau + \eta) = \int_{-\infty}^{\infty} h(\mu)R_y(\tau - \mu)\,d\mu \qquad -\infty < \tau < \infty \tag{11-10}$$
which is easily seen to give a sufficient as well as necessary condition
that h(t) yield a minimum. Equation (11-10) is readily solved by taking
Fourier transforms. Since the right side is a convolution, one has
$$e^{j2\pi f\eta}S_{sy}(f) = \int_{-\infty}^{\infty} R_{sy}(\tau + \eta)e^{-j2\pi f\tau}\,d\tau = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\mu)R_y(\tau - \mu)e^{-j2\pi f\tau}\,d\mu\,d\tau$$
$$= \int_{-\infty}^{\infty} h(\mu)e^{-j2\pi f\mu}\,d\mu \int_{-\infty}^{\infty} R_y(\sigma)e^{-j2\pi f\sigma}\,d\sigma = H(j2\pi f)S_y(f)$$
whence
$$H(j2\pi f) = \frac{S_{sy}(f)}{S_y(f)}\,e^{j2\pi f\eta} \tag{11-11}$$
If the signal and noise are uncorrelated and either the signal or the noise
has zero mean, Eq. (11-11) becomes
$$H(j2\pi f) = \frac{S_s(f)}{S_s(f) + S_n(f)}\,e^{j2\pi f\eta} \tag{11-12}$$
A filter specified by a system function as given by Eq. (11-12) will not
be realizable, but it can be approximated more and more closely by real-
izable filters as an increasing lag time for the output is allowed.† We
call the filter given by Eq. (11-11) with $\eta = 0$ the optimum infinite-lag
smoothing filter. The solution given by Eq. (11-11) actually has some
practical value; it gives approximately the best filter when a long record
of input data (signal-plus-noise waveforms) is available before processing
(filtering) of the data is begun. In such a situation t enters the problem
as a parameter of the recorded data which has no particular connection
with real time. In addition, the solution given by Eq. (11-11) is of
interest because it has a connection, which will be pointed out later,
with the best realizable filter for smoothing and predicting. Equation
(11-12) has an especially simple interpretation, according to which the
best "smoothing" results when the input signal-plus-noise spectrum is
most heavily weighted at frequencies for which the ratio of signal spectral
density to noise spectral density is largest. The best the filter can do
to recover the signal from the noise is to treat favorably those frequency
bands which contain largely signal energy and unfavorably those fre-
quency bands which contain largely noise energy.
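This interpretation can be illustrated numerically by applying the infinite-lag filter of Eq. (11-12) (with $\eta = 0$) in the frequency domain to a long recorded sample. The sketch below uses an illustrative first-order autoregressive signal model in white noise; the spectra and parameters are assumptions made for the example, and a finite record introduces small end effects not present in the idealized theory:

```python
import numpy as np

# Infinite-lag smoothing filter of Eq. (11-12), eta = 0:
#   H(f) = S_s(f) / (S_s(f) + S_n(f)),
# applied in the frequency domain to a long record.  Signal model and
# noise level are illustrative assumptions.
rng = np.random.default_rng(0)
n = 1 << 14

# First-order autoregressive signal (rational spectrum) plus white noise
a, sig_n = 0.95, 1.0
x = rng.standard_normal(n)
s = np.zeros(n)
for i in range(1, n):
    s[i] = a * s[i - 1] + x[i]
noise = sig_n * rng.standard_normal(n)
y = s + noise

# Theoretical spectra: S_s(f) = 1/|1 - a e^{-j 2 pi f}|^2, S_n(f) = sig_n^2
f = np.fft.fftfreq(n)
S_s = 1.0 / np.abs(1.0 - a * np.exp(-2j * np.pi * f)) ** 2
S_n = np.full(n, sig_n ** 2)
H = S_s / (S_s + S_n)   # weights bands where the signal dominates

s_hat = np.real(np.fft.ifft(H * np.fft.fft(y)))

mse_raw = np.mean((y - s) ** 2)
mse_filt = np.mean((s_hat - s) ** 2)
print(mse_raw, mse_filt)  # the filtered error is well below the raw error
```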
11-3. Pure Prediction: Nondeterministic Processes‡
We shall return in Art. 114 to the consideration of smoothing and
predicting filters, which must operate on input signals mixed with noise.
† See Wallman (I).
‡ The treatment of the pure prediction problem in this article is similar to that in
Bode and Shannon (I). It is essentially a heuristic analogue of the rigorous mathe-
matical treatment which deals with linear prediction theory in terms of projections on
Hilbert spaces. See, e.g., Doob (II, Chap. XII) and the references there.
In this article, however, we shall consider only the special case in which
there is no input noise, so that y(t) = s(t). This is the pure prediction
problem. The optimum filter specification which we shall arrive at in
this article is contained in the results of Art. 11-4, which are got straight-
forwardly from Eq. (11-9). Here, however, the pure prediction problem
is done in a different way, so as to throw a different light on what is
involved in prediction and so as to obtain certain ancillary results.
The only information we have been using about the input signal to
the predictor is, first, the fact that the signal is from a wide-sense sta-
tionary random process, and second, the autocorrelation function of this
random process. The mean-square error depends on the signal input
FIG. 11-1. The optimum predicting filter. White noise drives a hypothetical filter A, with system function $G(j\omega)$, to produce s(t); the predictor is the cascade of filter B, with system function $G^{-1}(j\omega)$, and filter C, so that $H(j\omega) = G^{-1}(j\omega)H_1(j\omega)$.
only through this autocorrelation function (see Eq. (11-4) with $R_{sy}(t)$ set
equal to $R_s(t)$). Hence, if the input-signal random process is replaced
by a different random process with the same autocorrelation function,
the mean-square error $\mathcal{E}$ for any particular filter is unchanged and the
optimum filter is unchanged. This fact provides the basis for the follow-
ing analysis.
Let us outline briefly the ideas we shall use to find the optimum pre-
dicting filter. First, it is imagined that s(t) is replaced at the input to
the predicting filter by filtered white noise. That is, a hypothetical filter
A is postulated with a system function $G(j\omega)$ chosen so that if A is driven
by white noise, its output has exactly the same autocorrelation function
(and hence the same spectral density) as has s(t). The output of A is
to be the input to the predictor. Then the predictor itself is imagined
to be a cascade of two filters, B and C. Filter B is to be the inverse to A
and hence is to have system function $G^{-1}(j\omega)$; its purpose is to turn the
input signal back into white noise. Filter C is to make the optimum
prediction of $s(t + \eta)$ by operating on the white noise; this operation
turns out to be quite simple. The system function $H(j\omega)$ of the optimum
predictor is then the product of the system functions of the hypothetical
filters B and C. See Fig. 11-1.
We shall now treat the problem in more detail. It has to be supposed
first that the input signal belongs to that class of random processes which
have autocorrelation functions which can be realized with filtered white
noise. That is, $R_s(t)$ must be such that for some function g(t), z(t) as
defined by
$$z(t) = \int_{-\infty}^{t} g(t - \tau)x(\tau)\,d\tau \tag{11-13}$$
has autocorrelation function $R_z(t) = R_s(t)$, when x(t) is white noise.
This means that the system function $G(j2\pi f)$, which is the Fourier trans-
form of g(t), must satisfy†
$$|G(j2\pi f)|^2 = S_s(f) \tag{11-14}$$
This restriction is significant, and we shall discuss it further below. For
now, it may simply be remarked that for a large class of wide-sense sta-
tionary processes, the spectral density can be factored in terms of the
system function of a realizable filter, as in Eq. (11-14). Since for the
purposes of the prediction theory there is no point in distinguishing
between s(t) and z(t), it will be assumed henceforth that
$$s(t) = \int_{-\infty}^{t} g(t - \tau)x(\tau)\,d\tau \tag{11-15}$$
where x(t) is white noise and the Fourier transform $G(j2\pi f)$ of g(t) satis-
fies Eq. (11-14).
From Eq. (11-15) we have, for the future value of s(t),
$$s(t + \eta) = \int_{-\infty}^{t+\eta} g(t + \eta - \tau)x(\tau)\,d\tau \tag{11-16}$$
This is the quantity it is desired to predict. Since $s_{t+\eta}$ is a random vari-
able which is composed of a linear superposition of white noise, since at
time t part of that white noise is past history, and since the white noise
which is yet to occur after time t and before time $t + \eta$ has mean zero
and is uncorrelated with what has gone before, it is plausible to suppose
that the best linear prediction is got by taking that part of $s_{t+\eta}$ that is
composed of white noise which has occurred before time t. Let us now
show that this is in fact true. Equation (11-16) can be rewritten
$$s(t + \eta) = \int_{-\infty}^{t} g(t + \eta - \tau)x(\tau)\,d\tau + \int_{t}^{t+\eta} g(t + \eta - \tau)x(\tau)\,d\tau$$
$$= \hat{s}(t + \eta) + \int_{t}^{t+\eta} g(t + \eta - \tau)x(\tau)\,d\tau \tag{11-17}$$
$\hat{s}(t + \eta)$ is exactly that part of $s(t + \eta)$ which is composed of white noise
which has occurred before time t; so $\hat{s}(t + \eta)$ is taken to be the prediction
† See Chap. 9, Art. 9-3.
of $s(t + \eta)$, which we want to prove is optimum. The mean-square error
of the prediction $\hat{s}(t + \eta)$ is
$$E\{[s_{t+\eta} - \hat{s}_{t+\eta}]^2\} = E\left\{ \left[ \int_{t}^{t+\eta} g(t + \eta - \tau)x(\tau)\,d\tau \right]^2 \right\} \tag{11-18}$$
In order to prove that this mean-squared error is a minimum, we shall
calculate the mean-squared error using any other linear predicting filter
which operates only on the past record of white noise, $x(\tau)$, $\tau \le t$, and
show that it cannot be less than that given by Eq. (11-18). Since any
linear operation on the past of s(t) is necessarily also a linear operation
on $x(\tau)$, $\tau \le t$, it will follow a fortiori that $\hat{s}(t + \eta)$ is as good as any linear
prediction which can be made using only the past of s(t).
Let $\hat{s}'_{t+\eta}$ be another linear prediction, given by
$$\hat{s}'_{t+\eta} = \int_{-\infty}^{t} f(t - \tau)x(\tau)\,d\tau$$
with some function f(t). Let $h(t) = f(t) - g(t + \eta)$. Then
$$E\{[s_{t+\eta} - \hat{s}'_{t+\eta}]^2\} = E\left\{ \left[ \int_{-\infty}^{t+\eta} g(t + \eta - \tau)x(\tau)\,d\tau - \int_{-\infty}^{t} g(t + \eta - \tau)x(\tau)\,d\tau - \int_{-\infty}^{t} h(t - \tau)x(\tau)\,d\tau \right]^2 \right\}$$
$$= E\left\{ \left[ \int_{t}^{t+\eta} g(t + \eta - \tau)x(\tau)\,d\tau - \int_{-\infty}^{t} h(t - \tau)x(\tau)\,d\tau \right]^2 \right\} \tag{11-19}$$
When this is expanded and averaged, the cross-product term is zero, for,
since $x(\tau)$, $\tau \le t$, is uncorrelated with $x(\tau)$, $\tau > t$, the first integral is
uncorrelated with the second. The first integral is just the error for the
prediction $\hat{s}_{t+\eta}$, so Eq. (11-19) becomes
$$E\{[s_{t+\eta} - \hat{s}'_{t+\eta}]^2\} = E\{[s_{t+\eta} - \hat{s}_{t+\eta}]^2\} + E\left\{ \left[ \int_{-\infty}^{t} h(t - \tau)x(\tau)\,d\tau \right]^2 \right\} \tag{11-20}$$
Since the last term is nonnegative, Eq. (11-20) shows that the prediction
error for $\hat{s}'_{t+\eta}$ is at least as great as that for $\hat{s}_{t+\eta}$. This demonstrates that $\hat{s}_{t+\eta}$ is
optimum, as was claimed.
Up to this point, the optimum prediction $\hat{s}(t + \eta)$ has been defined
only in terms of the hypothetical white noise x(t) by the equation
$$\hat{s}(t + \eta) = \int_{-\infty}^{t} g(t + \eta - \tau)x(\tau)\,d\tau \tag{11-21}$$
and not in terms of the input s(t). Let us now obtain an expression for
the system function $H(j\omega)$ of the optimum predicting filter, that is, for
the filter that will yield $\hat{s}(t + \eta)$ as output when its input is s(t). In
Eq. (11-21), if we set
$$h_1(u) = 0 \qquad u < 0$$
$$= g(u + \eta) \qquad u \ge 0 \tag{11-22}$$
then
$$\hat{s}(t + \eta) = \int_{-\infty}^{t} h_1(t - \tau)x(\tau)\,d\tau \tag{11-23}$$
which gives $\hat{s}(t + \eta)$ as filtered white noise. If now s(t) is put into a
filter (filter B in Fig. 11-1) which is the inverse to one with impulse
response g(t) in order to recover the white noise x(t) at its output, and
then x(t) is put into a filter (filter C in Fig. 11-1) with impulse response
$h_1(t)$, the over-all action of the two filters is to yield the best prediction
$\hat{s}(t + \eta)$ from the input s(t). The predicting filter is therefore obtainable
as the cascade of a filter with system function $G^{-1}(j\omega)$ and a filter with
system function $H_1(j\omega)$, and the system function of the predicting filter is
$$H(j\omega) = G^{-1}(j\omega)H_1(j\omega) \tag{11-24}$$
$G^{-1}(j\omega)$ is never a proper system function; since s(t) is always smoothed
white noise, g(t) is the impulse response of a filter which performs some
integration, so the inverse filter must differentiate. This means that
$G^{-1}(j\omega)$ does not vanish at infinity. However, the product system func-
tion $H(j\omega)$ may or may not vanish at infinity. If it does not, then the
best linear prediction cannot be accomplished exactly by a filter, that is,
by a device characterized by an integral operator as in Eq. (11-23).
Even when this happens, the optimum prediction can often be approxi-
mated by approximating $H(j\omega)$ to arbitrarily high frequencies. These
remarks will be illustrated in the simple example below and by examples
in the next article. We could, from Eq. (11-24), write an explicit formula
for $H(j\omega)$ in terms of the spectral density of the signal input process; we
defer this to the next article, in which the general case of predicting and
filtering is considered.
Example 11-3.1. Suppose the signal s(t) is a sample function from a random process
with spectral density
$$S_s(f) = \frac{1}{1 + f^2} \tag{11-25}$$
Then if we take for $G(j2\pi f)$
$$G(j2\pi f) = \frac{1}{1 + jf} \tag{11-26}$$
$$|G(j2\pi f)|^2 = \frac{1}{1 + jf}\cdot\frac{1}{1 - jf} = S_s(f)$$
so the condition Eq. (11-14) is satisfied. Furthermore
$$g(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} G(j\omega)e^{j\omega t}\,d\omega = 2\pi e^{-2\pi t} \qquad t \ge 0$$
$$= 0 \qquad t < 0 \tag{11-27}$$
so g(t) is the impulse response of a realizable filter, as is required by Eq. (11-15).
Now
$$h_1(t) = 0 \qquad t < 0$$
$$= 2\pi e^{-2\pi(t+\eta)} \qquad t \ge 0$$
and
$$H_1(j\omega) = \int_{-\infty}^{\infty} h_1(t)e^{-j\omega t}\,dt = 2\pi\int_0^{\infty} e^{-2\pi(t+\eta)}e^{-j\omega t}\,dt = \frac{e^{-2\pi\eta}}{1 + jf}$$
Since
$$G^{-1}(j2\pi f) = 1 + jf \tag{11-28}$$
we have for the system function of the predicting filter
$$H(j2\pi f) = e^{-2\pi\eta} \tag{11-29}$$
The best predicting filter in this case is simply an attenuator. This example is suffi-
ciently simple that the result given by Eq. (11-29) can be seen intuitively. The
spectral density given in Eq. (11-25) can be realized by exciting an RC filter with
time constant $1/2\pi$ by white noise from a zero-impedance source, as shown in Fig. 11-2.

FIG. 11-2. Predictor example. A zero-impedance white-noise source drives an RC filter with time constant $1/2\pi$; the condenser voltage s(t) is the input to the predictor.

The voltage across the condenser at time t is $s_t$. The best prediction of this voltage
at time $t + \eta$, since $x(\tau)$ has zero mean and is unpredictable for $\tau > t$, is the voltage
which would be left after $s_t$ has discharged through the resistance R for $\eta$ seconds.
This is the value which the predicting filter, as specified by Eq. (11-29), will yield.
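A sampled version of this example can be checked by simulation. A first-order autoregressive sequence $s_{k+1} = a\,s_k + x_k$ is the discrete analogue of the RC-filtered white noise above, and the analogue of the attenuator of Eq. (11-29) is the prediction $\hat{s}_{k+m} = a^m s_k$ at lag m. The parameter values are illustrative assumptions:

```python
import numpy as np

# Discrete sketch of Example 11-3.1: a first-order autoregressive
# process s[k+1] = a*s[k] + x[k] (sampled RC-filtered white noise).
# The analogue of Eq. (11-29) is a pure attenuator: predict a**m * s[k].
# Parameter values are illustrative assumptions.
rng = np.random.default_rng(1)
a, m, n = 0.9, 5, 200_000

x = rng.standard_normal(n)
s = np.zeros(n)
for i in range(1, n):
    s[i] = a * s[i - 1] + x[i]

pred_atten = (a ** m) * s[:-m]   # attenuator prediction (best linear)
pred_naive = s[:-m]              # "no change" prediction, for comparison
target = s[m:]

mse_atten = np.mean((target - pred_atten) ** 2)
mse_naive = np.mean((target - pred_naive) ** 2)
var_s = np.var(s)

# Theoretical prediction error for the attenuator: var(s) * (1 - a**(2m))
print(mse_atten, var_s * (1 - a ** (2 * m)), mse_naive)
```

The attenuator beats both the "no change" prediction and the trivial zero prediction (whose error is the signal variance), in agreement with the discharge picture above.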
Integral Equation for Prediction. We have found that $\hat{s}_{t+\eta}$ is the best
estimate that can be made for $s_{t+\eta}$ by a linear operation on the past of s(t).
If $\hat{s}_{t+\eta}$ can be realized by a linear filter operating on s(t) with weighting
function h(t), then it follows, of course, from the minimization done
earlier in this section that h(t) must satisfy the integral equation (11-9).
It is interesting to observe that the necessity of Eq. (11-9) as a condition
on h(t) can be shown directly, however, using $\hat{s}(t + \eta)$. Suppose
$$\hat{s}(t + \eta) = \int_0^{\infty} h(\tau)s(t - \tau)\,d\tau \tag{11-30}$$
The error is
$$s(t + \eta) - \hat{s}(t + \eta) = s(t + \eta) - \int_0^{\infty} h(\tau)s(t - \tau)\,d\tau \tag{11-31}$$
but, by Eq. (11-17), the error is a linear function of the white noise $x(\tau)$
only for values of $\tau > t$. Hence the error is uncorrelated with the past
of s(t), which is a linear function of $x(\tau)$, $\tau < t$. Thus, using Eq. (11-31)
and taking t = 0,
$$E\left\{ \left[ s(\eta) - \int_0^{\infty} h(\tau)s(-\tau)\,d\tau \right] s(-\mu) \right\} = 0 \qquad \mu > 0$$
Or, expanding, we have the predicting version of Eq. (11-9),
$$R_s(\mu + \eta) = \int_0^{\infty} h(\tau)R_s(\mu - \tau)\,d\tau \qquad \mu \ge 0 \tag{11-32}$$
It may be verified directly that formally the function h(t) defined by
$$h(t) = \int_{-\infty}^{\infty} e^{j2\pi ft}H(j2\pi f)\,df \tag{11-33}$$
when $H(j2\pi f)$ is given by Eq. (11-24), satisfies Eq. (11-32).
Gaussian Processes. It has been shown that $\hat{s}_{t+\eta}$ is the best prediction
of $s_{t+\eta}$ which can be got by a linear superposition of the values of $x(\tau)$,
$\tau \le t$, and hence of the past values of s(t). This of course leaves open
the question of whether there are nonlinear operations that can be per-
formed on the past values of s(t) which will yield a better mean-square
approximation to $s_{t+\eta}$. However, if the signal s(t) is a sample function
from a gaussian random process, one can show that the best linear mean-
square approximation $\hat{s}_{t+\eta}$ is as good as any in a mean-square sense. A
justification of this statement is similar to the argument made to show
$\hat{s}_{t+\eta}$ is a best linear estimate; the difference lies in the fact that for gauss-
ian random variables, uncorrelatedness implies statistical independence.
Suppose s(t) is gaussian; then x(t) is gaussian white noise. Let $Y_t$ be
any prediction for $s_{t+\eta}$ which depends only on $s(\tau)$, $\tau < t$. Then $Y_t$ is
independent of $x(\tau)$, $\tau > t$, and $Z_t = Y_t - \hat{s}_{t+\eta}$ is also independent of
$x(\tau)$, $\tau > t$. The mean-square error of $Y_t$ is
$$E\{[s_{t+\eta} - Y_t]^2\} = E\{[s_{t+\eta} - \hat{s}_{t+\eta} - Z_t]^2\}$$
$$= E\{[s_{t+\eta} - \hat{s}_{t+\eta}]^2\} + E\{[Z_t]^2\}$$
which is greater than or equal to the mean-square error of $\hat{s}_{t+\eta}$.
Deterministic and Nondeterministic Processes. As we mentioned above
when we started to consider linear prediction theory using the device of
filtered white noise, we were allowing only spectral densities S_s(f) which
could be factored, as in Eq. (11-14), into a product of G(j2πf) and
G*(j2πf), where G(j2πf) is the system function of a realizable filter.
This restriction is connected with the condition that the random process
be nondeterministic. A wide-sense stationary random process is said to
be deterministic if a future value can be predicted exactly by a linear
operation on its past; otherwise, it is nondeterministic. An important
232 RANDOM SIGNALS AND NOISE
theorem in prediction theory states that a wide-sense stationary random
process is nondeterministic if and only if

    ∫_−∞^∞ |log S(f)| / (1 + f²) df   (11-34)

converges, where S(f) is the spectral density of the process.† Now if
the process is regarded as generated by white noise passed through a
realizable filter,

    S(f) = |H(j2πf)|²

but the condition that a gain function |H(j2πf)| be that of a realizable
filter is that

    ∫_−∞^∞ |log |H(j2πf)|| / (1 + f²) df = ½ ∫_−∞^∞ |log S(f)| / (1 + f²) df   (11-35)

converges. Thus the restriction we imposed was equivalent to restricting
the signals to be sample functions from nondeterministic random
processes.
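The convergence criterion (11-34) can be illustrated numerically. The sketch below is not from the text; the two example spectra are arbitrary choices. It estimates the partial integral of |log S(f)|/(1 + f²) over growing intervals: for a rational spectrum the value settles down, while for a gaussian-shaped spectrum, whose |log S(f)| grows like f², it grows without bound. The logarithm of the spectrum is passed in directly to avoid floating-point underflow of S itself at large f.

```python
import numpy as np

def paley_wiener_partial(log_S, F, n=200001):
    # Trapezoid-rule estimate of the integral of |log S(f)| / (1 + f^2)
    # over [-F, F]; Eq. (11-34) asks whether this converges as F grows.
    f = np.linspace(-F, F, n)
    y = np.abs(log_S(f)) / (1.0 + f**2)
    df = f[1] - f[0]
    return float(df * (0.5 * y[0] + 0.5 * y[-1] + y[1:-1].sum()))

log_S_rational = lambda f: -np.log1p(f**2)   # S(f) = 1/(1 + f^2): nondeterministic
log_S_gaussian = lambda f: -f**2             # S(f) = exp(-f^2): deterministic

for F in (10.0, 100.0, 1000.0):
    print(F, paley_wiener_partial(log_S_rational, F),
             paley_wiener_partial(log_S_gaussian, F))
```

The rational-spectrum column stabilizes (the full integral equals 2π log 2), while the gaussian column grows roughly like 2F, signalling divergence of Eq. (11-34) and hence a deterministic process.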
Example 11-3.2. Let

    s(t) = cos (ωt + φ)   ω = 2πf

where f and φ are independent random variables, where φ is uniformly distributed
over the interval 0 ≤ φ < 2π, and where f has an even but otherwise arbitrary
probability density function p(f). Then s(t) is a stationary random process, every
sample function of which is a pure sine wave. It is easily calculated that

    R_s(τ) = ½E(cos ωτ) = ½ ∫_−∞^∞ cos (2πfτ) p(f) df

But also

    R_s(τ) = ∫_−∞^∞ e^{j2πfτ} S_s(f) df = ∫_−∞^∞ cos (2πfτ) S_s(f) df

Hence S_s(f) = ½p(f). We can choose p(f) either to satisfy or not to satisfy the
criterion Eq. (11-34), and thus the random process s(t) can be made either nondeterministic
or deterministic by suitable choice of p(f). In either case, s(t + η) can be
predicted perfectly if its past is known, but not by a linear operation on its past if it
is nondeterministic.
11-4. Solution of the Equation for Predicting and Filtering

We want now to get the solution of the integral equation (11-9) for
predicting and filtering, using the infinite past. We shall do this only
for the case in which the spectral density S_y(f) of the entire input y(t) is
a rational function. Equation (11-9) can be solved under more general
conditions, and the method of solution is the same. However, there is
† See Doob (II, p. 584) and Wiener (III, p. 74).
a difficulty in the factorization of the spectral density S_y(f), which is
very simple if S_y(f) is rational but calls for more sophisticated function
theory if S_y(f) is not rational.† Practically, it is often sufficient to consider
only the case of the rational spectral density because of the availability
of techniques for approximating arbitrary spectral densities by
rational functions.‡

The difficulty in solving Eq. (11-9) arises entirely from the fact that
h(t) was required to vanish on half the real line. In fact, we have seen
in Art. 11-2 that with this condition waived (infinite-lag filter) the integral
equation takes on a form such that it can be solved immediately by
taking Fourier transforms. The trick in solving Eq. (11-9) is to factor
the spectral density S_y(f) into two parts, one of which is the Fourier
transform of a function which vanishes for negative values of its argument,
the other of which is the Fourier transform of a function which
vanishes for positive values of its argument. We have already used this
trick in the preceding article, where the solution to the special case for
which R_sy(τ) = R_y(τ) = R_s(τ) (pure prediction) was obtained in one form
in Eq. (11-24). If the system functions there are written in terms of the
factors of the spectral density, Eq. (11-24) becomes exactly
the solution obtained below for the case R_sy(τ) = R_y(τ).
First, let us consider the factorization of S_y(f). Since it is assumed
that S_y(f) is rational, it may be written

    S_y(f) = a² (f − w₁) ··· (f − w_N) / [(f − z₁) ··· (f − z_M)]   (11-36)

Since S_y(f) is a spectral density, it has particular properties which imply
certain restrictions on the number and location of its poles and zeros:
1. S_y(f) is real for real f. Hence S_y(f) = S_y*(f), which implies that
a² is real and all w's and z's with nonzero imaginary parts must occur in
conjugate pairs.
2. S_y(f) ≥ 0. Hence any real root of the numerator must occur with
an even multiplicity. (Otherwise the numerator would change sign.)
3. S_y(f) is integrable on the real line. Hence no root of the denominator
can be real, and the degree of the numerator must be less than
the degree of the denominator, N < M.
We can, therefore, split the right side of Eq. (11-36) into two factors,
one of which contains all the poles and zeros with positive imaginary
parts and the other of which contains all the poles and zeros with negative
imaginary parts. Since any real root w_k of the numerator occurs
an even number of times, half the factors (f − w_k) may be put with the
† See Wiener (III, Art. 1.7), Levinson (I, Art. 5), or Titchmarsh (II, Art. 11.17).
‡ See, for example, Wiener (III, Art. 1.03).
term containing the poles and zeros with positive imaginary parts and
half with the other. Thus S_y(f) may be written

    S_y(f) = a (f − w₁) ··· (f − w_P) / [(f − z₁) ··· (f − z_Q)]
             · a (f − w₁*) ··· (f − w_P*) / [(f − z₁*) ··· (f − z_Q*)]   (11-37)

where P < Q, where the z_n, n = 1, …, Q, have positive imaginary
parts, the w_k, k = 1, …, P, have nonnegative imaginary parts,
2P = N, and 2Q = M. Let G(jω) be defined by

    G(jω) = G(j2πf) = a (f − w₁) ··· (f − w_P) / [(f − z₁) ··· (f − z_Q)]

Thus S_y(f) = |G(j2πf)|², where G(j2πf) = G(p) is a rational function of
p = j2πf with all its poles and zeros in the left half p plane except possibly
for zeros on the imaginary axis.
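Numerically, the root-splitting that produces Eq. (11-37) is mechanical. The sketch below is an illustration, not part of the text, and the spectrum 1/(f⁴ + 1) is an arbitrary choice: it finds the denominator roots, keeps those with positive imaginary parts for G, and checks that |G(j2πf)|² reproduces the spectral density on the real axis.

```python
import numpy as np

# Factor S(f) = 1/(f^4 + 1) as |G|^2 by keeping the denominator roots
# with positive imaginary part (the z_n of Eq. (11-37)); the poles of
# G(j2*pi*f), viewed in the p = j2*pi*f plane, then lie in the left half plane.
den = np.array([1.0, 0.0, 0.0, 0.0, 1.0])         # coefficients of f^4 + 1
upper = [z for z in np.roots(den) if z.imag > 0]  # the two roots kept for G

def G(f):
    out = np.ones_like(f, dtype=complex)
    for z in upper:
        out /= (f - z)
    return out

f = np.linspace(-3.0, 3.0, 61)
S = 1.0 / (f**4 + 1.0)
print(np.max(np.abs(np.abs(G(f))**2 - S)))   # machine-precision agreement
```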
Define

    g(t) = ∫_−∞^∞ e^{j2πft} G(j2πf) df
and
    g'(t) = ∫_−∞^∞ e^{j2πft} G*(j2πf) df

Then, since G(j2πf) as a function of f has all its poles in the upper half
f plane, g(t) = 0 for t < 0 and g'(t) = 0 for t > 0. Since R_y(t) is the
inverse Fourier transform of S_y(f), R_y(t) is given by the convolution of
g(t) and g'(t),

    R_y(t) = ∫_−∞^∞ e^{j2πft} G(j2πf)G*(j2πf) df
     = ∫_−∞^0 g(t − u)g'(u) du   (11-38)

where the upper limit of the integral can be taken to be zero because
g'(t) vanishes for t > 0. Intuitively, G(jω) is the system function of a
realizable linear system which would transform white noise into a random
process with autocorrelation function R_y(t).

Let us now introduce a function A(f) defined by

    S_sy(f) = A(f)G*(jω)   (11-39)

[A(f) is G(jω) for the case of pure prediction.] Then, if

    a(t) = ∫_−∞^∞ e^{j2πft} A(f) df   (11-40)

we have

    R_sy(τ) = ∫_−∞^0 a(τ − u)g'(u) du   (11-41)

where the upper limit can again be taken to be zero as in Eq. (11-38).
Substituting R_y(t) as given by Eq. (11-38) and R_sy(t) as given by Eq.
(11-41) into Eq. (11-9) yields

    ∫_−∞^0 a(τ + η − u)g'(u) du = ∫₀^∞ h(μ) dμ ∫_−∞^0 g(τ − μ − u)g'(u) du   τ > 0

or

    ∫_−∞^0 g'(u) {a(τ + η − u) − ∫₀^∞ h(μ)g(τ − μ − u) dμ} du = 0   τ > 0   (11-42)

This equation is satisfied if the expression in braces vanishes for
all u < 0 and τ > 0, that is, if the integral equation

    a(τ + η) = ∫₀^∞ h(μ)g(τ − μ) dμ   τ > 0   (11-43)

is satisfied. Equation (11-43) can be solved directly by taking Fourier
transforms, whereas the original integral equation (11-9) could not. The
reason for the difference is that because g(t) vanishes for t < 0, the transform
of the right side of Eq. (11-43) factors into a product of transforms,
whereas the transform of the right side of Eq. (11-9) will not. We have

    ∫₀^∞ e^{−jωτ} a(τ + η) dτ = ∫₀^∞ h(μ)e^{−jωμ} dμ ∫₀^∞ g(τ − μ)e^{−jω(τ−μ)} dτ
     = ∫₀^∞ h(μ)e^{−jωμ} dμ ∫₀^∞ g(v)e^{−jωv} dv

(putting v = τ − μ and using the fact that g vanishes for negative argument), or

    H(jω) = [1/G(jω)] ∫₀^∞ e^{−jωτ} a(τ + η) dτ   (11-44)
Equation (11-44) gives the solution H(jω) for the system function of the
optimum filter. Using Eqs. (11-39) and (11-40), we can rewrite Eq.
(11-44) in terms of S_sy(f) and the factors of S_y(f),

    H(jω) = [1/G(jω)] ∫₀^∞ e^{−jωτ} dτ ∫_−∞^∞ e^{j2πf′(τ+η)} [S_sy(f′)/G*(j2πf′)] df′   (11-45)
Equation (11-45) is a general formula for the system function of the
optimum linear least-mean-square-error predicting and smoothing filter.
If S_y(f) and S_sy(f) are rational functions, the right side of Eq. (11-45)
can always be evaluated straightforwardly, although the work may be
tedious. Just as in the case of pure prediction, H(jω) may not vanish as
ω → ∞. The remarks made about this possibility in the article on the
pure-prediction problem apply here.

The natural special cases of the filtering and prediction problem are
predicting when there is no noise present, which we have already discussed,
and smoothing without predicting, which we now discuss briefly.
Smoothing. With η = 0, Eq. (11-45) gives the solution for the system
function of the optimum realizable smoothing filter. It is possible to
rewrite the right side of Eq. (11-45) so as to simplify somewhat the calculations
required in evaluating it in most instances and also so as to point
out the relation between it and the solution for the infinite-lag smoothing
filter given by Eq. (11-11).
First we point out that if φ(f) is any rational function (of integrable
square) with all its poles in the top half f plane,

    ∫_−∞^∞ e^{j2πf′τ} φ(f′) df′

vanishes for negative values of τ, and hence

    ∫₀^∞ e^{−j2πfτ} dτ ∫_−∞^∞ e^{j2πf′τ} φ(f′) df′ = φ(f)   (11-46)

Also, if ψ(f) is rational with all its poles in the bottom half plane,

    ∫₀^∞ e^{−j2πfτ} dτ ∫_−∞^∞ e^{j2πf′τ} ψ(f′) df′ = 0   (11-47)

Thus, given a rational function E(f) with no poles on the real axis, if
E(f) is decomposed into the sum of two rational functions φ(f) + ψ(f),
where φ(f) has all its poles in the top half f plane and ψ(f) in the bottom,

    ∫₀^∞ e^{−j2πfτ} dτ ∫_−∞^∞ e^{j2πf′τ} E(f′) df′ = φ(f)   (11-48)

This double integral operator, which appears in Eq. (11-45), may be
thought of in the present context as a "realizable part of" operator.
We introduce the notation, for this section, φ(f) = [E(f)]₊.
If signal and noise are uncorrelated, S_sy(f) = S_s(f). It may be shown
rather easily (see Prob. 3) that S_s(f)/G*(j2πf) has no poles on the real
f axis when S_y(f) = S_s(f) + S_n(f). Thus, using the special notation just
introduced, for the case of smoothing with uncorrelated signal and noise,
Eq. (11-45) can be written

    H(j2πf) = [1/G(j2πf)] [S_s(f)/G*(j2πf)]₊   (11-49)

To find H(j2πf) using Eq. (11-49), a partial-fraction expansion† of
S_s(f)/G*(j2πf) is made, but no integrations need be performed.
Equation (11-49) is in a form which permits an intuitive tie-in with
the solution for the infinite-lag smoothing filter. From Eq. (11-11) we
have, if signal and noise are uncorrelated,

    H_∞(j2πf) = S_s(f)/S_y(f) = [1/G(j2πf)] [S_s(f)/G*(j2πf)]   (11-50)

† See Gardner and Barnes (I, p. 159ff.).
so

    S_s(f)/G*(j2πf) = G(j2πf)H_∞(j2πf)

Then

    H(j2πf) = [1/G(j2πf)] [G(j2πf)H_∞(j2πf)]₊

where H_∞(j2πf) is the system function of the optimum infinite-lag filter.
Equation (11-50) describes a filter which may be thought of as two filters
in tandem. Equation (11-49) describes a filter which differs from that
of Eq. (11-50) only in that the realizable part of the second filter has
been taken.
Example 11-4.1. Suppose that signal and noise are independent and that

    S_s(f) = 1/(1 + f²)   S_n(f) = Nb²/(b² + f²) = N/[1 + (f/b)²]

It is desired to find the system function of the optimum linear smoothing filter. We
have

    S_y(f) = S_s + S_n = [Nb²(1 + f²) + b² + f²] / [(b + jf)(b − jf)(1 + jf)(1 − jf)]
     = (Nb² + 1)(A² + f²) / [(b + jf)(b − jf)(1 + jf)(1 − jf)]

where A² = (N + 1)b²/(Nb² + 1).

S_y(f) may be factored into G(j2πf)G*(j2πf), where

    G(j2πf) = [b√(N + 1)/A] (A + jf) / [(b + jf)(1 + jf)]
    G*(j2πf) = [b√(N + 1)/A] (A − jf) / [(b − jf)(1 − jf)]

Then

    S_s(f)/G*(j2πf) = [A/(b√(N + 1))] (b − jf) / [(1 + jf)(A − jf)]
     = [A/(b√(N + 1))] {[(b + 1)/(A + 1)] 1/(1 + jf) + [(b − A)/(A + 1)] 1/(A − jf)}

so

    [S_s(f)/G*(j2πf)]₊ = [A/(b√(N + 1))] [(b + 1)/(A + 1)] 1/(1 + jf)

and, using Eq. (11-49),

    H(j2πf) = [1/(Nb² + 1)] [(b + 1)/(A + 1)] (b + jf)/(A + jf)   (11-51)

If we take the limit in Eq. (11-51) as b → ∞, we get the optimum system function for
smoothing the signal in white noise of spectral density N. This is

    H(j2πf) = 1/[√(N + 1) + √N] · 1/[√(N + 1) + j√N f]   (11-52)
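The algebra in Example 11-4.1 is easy to mischeck by hand. The sketch below verifies the three key identities numerically; the values b = 2 and N = 1 are arbitrary test choices, not from the text.

```python
import numpy as np

b, N = 2.0, 1.0                      # arbitrary test values
A = np.sqrt((N + 1) * b**2 / (N * b**2 + 1))
f = np.linspace(-5.0, 5.0, 101)
jf = 1j * f

S_s = 1.0 / (1.0 + f**2)
S_n = N * b**2 / (b**2 + f**2)
G = (b * np.sqrt(N + 1) / A) * (A + jf) / ((b + jf) * (1.0 + jf))
assert np.allclose(np.abs(G)**2, S_s + S_n)          # |G|^2 = S_y

# Partial-fraction split of S_s/G*; only the 1/(1 + jf) term is realizable.
c = A / (b * np.sqrt(N + 1))
split = c * ((b + 1) / (A + 1) / (1.0 + jf) + (b - A) / (A + 1) / (A - jf))
assert np.allclose(S_s / np.conj(G), split)

# Eq. (11-49): H = (1/G)[S_s/G*]_+, which must reduce to Eq. (11-51).
H = (c * (b + 1) / (A + 1) / (1.0 + jf)) / G
H51 = (1.0 / (N * b**2 + 1)) * ((b + 1) / (A + 1)) * (b + jf) / (A + jf)
assert np.allclose(H, H51)
print("Example 11-4.1 identities verified")
```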
Smoothing and Prediction Error. The mean-square error of any smoothing
and predicting filter with impulse response h(t) is given by Eq. (11-4).
If it is an optimum filter, then by virtue of Eq. (11-9), the mean-square
error is

    ℰ = R_s(0) − ∫₀^∞ h(τ) dτ ∫₀^∞ h(μ)R_y(μ − τ) dμ   (11-53)

By Parseval's theorem, the double integral can be written in terms of
H(jω),

    ∫₀^∞ h(τ) dτ ∫₀^∞ h(μ)R_y(μ − τ) dμ = ∫_−∞^∞ H(j2πf)[H(j2πf)S_y(f)]* df   (11-54)
     = ∫_−∞^∞ |H(j2πf)|² S_y(f) df

Then

    ℰ = ∫_−∞^∞ S_s(f) df − ∫_−∞^∞ |H(j2πf)|² S_y(f) df   (11-55)

which is probably as convenient a form as any for calculation in the
general case. One can substitute for H(j2πf) from Eq. (11-45) in Eq.
(11-55) and, after some reduction (see Prob. 6), get

    ℰ = R_s(0) − ∫₀^∞ | ∫_−∞^∞ e^{j2πf′(τ+η)} [S_sy(f′)/G*(j2πf′)] df′ |² dτ   (11-56)

For the case of pure prediction, S_sy(f) = S_y(f) = S_s(f) and Eq. (11-56)
becomes

    ℰ = R_s(0) − ∫₀^∞ | ∫_−∞^∞ e^{j2πf′(τ+η)} G(j2πf′) df′ |² dτ
     = ∫_−∞^∞ G(j2πf)G*(j2πf) df − ∫₀^∞ |g(τ + η)|² dτ
     = ∫_−∞^∞ |g(τ)|² dτ − ∫₀^∞ |g(τ + η)|² dτ

where Parseval's theorem has been used. Since g(t) vanishes for t < 0,
this can be written

    ℰ = ∫₀^∞ |g(τ)|² dτ − ∫_η^∞ |g(τ)|² dτ = ∫₀^η |g(τ)|² dτ   (11-57)

This last result could have been obtained directly from Eq. (11-17).
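As a concrete check on Eq. (11-57) (a sketch, not from the text): for the spectrum S_s(f) = 1/(1 + f²), used again in Example 11-6.1 below, the factor G(j2πf) = 1/(1 + jf) gives g(t) = 2π e^{−2πt} for t > 0, so the prediction error should be ℰ = π(1 − e^{−4πη}), rising from 0 toward the total power R_s(0) = π as the lead time η grows.

```python
import numpy as np

def pred_error(eta, n=100001):
    # Trapezoid-rule evaluation of Eq. (11-57): integral of |g(tau)|^2
    # over [0, eta], with g(tau) = 2*pi*exp(-2*pi*tau).
    tau = np.linspace(0.0, eta, n)
    g2 = (2.0 * np.pi * np.exp(-2.0 * np.pi * tau))**2
    dt = tau[1] - tau[0]
    return float(dt * (0.5 * g2[0] + 0.5 * g2[-1] + g2[1:-1].sum()))

for eta in (0.01, 0.1, 1.0):
    closed_form = np.pi * (1.0 - np.exp(-4.0 * np.pi * eta))
    print(eta, pred_error(eta), closed_form)
```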
11-5. Other Filtering Problems Using Least-mean-square Error Criterion

Phillips' Least-mean-square Error Filter. Up to this point, we have
asked for the best fixed-parameter linear filter to perform a smoothing or
predicting operation. A theoretically less interesting but practically
quite useful modification of this approach is to ask for the best filter
when the form of the system function of the filter is specified and only
a finite number of parameters are left free to be chosen. When the
Wiener method, which we have been discussing up to now, is used to
find an optimum filter, there may still remain a considerable difficulty,
after the system function H(jω) is found, in synthesizing a filter whose
system function approximates H(jω). This is particularly true if servomechanisms
are involved, as for example in automatic tracking radar
systems. On the other hand, if the system function is fixed in form to
be that of a practically constructible filter (or servomechanism), then
the solution for the best values of the parameters will simply determine
the sizes and adjustments of the components of the filter. Of course,
by limiting the class of filters to be considered, one in general increases
the minimum error which can be achieved. Thus, in principle, quality
of performance is sacrificed. Actually, in practical situations the approximations
involved in synthesizing a filter to meet the specification of H(jω)
given by the Wiener theory may degrade the performance so that it is no
better than that of a filter designed by the method sketched in this section.
Phillips† has described in detail how to determine an optimum smoothing
filter when the system function of the smoothing filter is prescribed
to be rational and of specified degree in the numerator and denominator.
In outline, his method is as follows. The input signal and noise are
presumed to be sample functions from stationary random processes.
The error, given by

    s(t) − ∫_−∞^∞ h(t − τ)y(τ) dτ

is then also a stationary random process and has a spectral density S_e(f).
The error spectral density can be determined straightforwardly from
Eq. (11-4) and is (see Prob. 9)

    S_e(f) = [1 − H(j2πf)][1 − H*(j2πf)]S_s(f) + |H(j2πf)|² S_n(f)
     − [1 − H(j2πf)]H*(j2πf)S_sn(f) − [1 − H*(j2πf)]H(j2πf)S_ns(f)   (11-58)

The mean-square error is then

    ℰ = ∫_−∞^∞ S_e(f) df   (11-59)

H(j2πf) is taken to be rational with the degrees of the numerator and
denominator fixed. Then if S_s(f), S_n(f), S_sn(f), and S_ns(f) are all rational,
S_e(f) is rational. The integral of Eq. (11-59) is evaluated by the method
of residues with the parameters of H(jω) left in literal form. Then these
parameters are evaluated so as to minimize ℰ. The reader is referred to
the literature‡ for a thorough discussion of the method and examples.
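The flavor of the method can be seen in a small numerical sketch. It is entirely illustrative; the one-parameter filter family and the spectra below are assumptions, not taken from Phillips. With independent signal and noise, the cross terms of Eq. (11-58) drop out, and the mean-square error of Eq. (11-59) becomes an ordinary function of the free parameter, which can then be minimized.

```python
import numpy as np

# Fix the filter form H(j2*pi*f) = 1/(1 + jf/c) and minimize the
# mean-square error over the single free parameter c.
# Signal and noise are independent: S_s(f) = 1/(1 + f^2), S_n(f) = N.
N = 0.1
f = np.linspace(-200.0, 200.0, 400001)
df = f[1] - f[0]
S_s = 1.0 / (1.0 + f**2)

def mse(c):
    H = 1.0 / (1.0 + 1j * f / c)
    S_e = np.abs(1.0 - H)**2 * S_s + np.abs(H)**2 * N  # Eq. (11-58), no cross terms
    return float(np.sum(S_e) * df)

cs = np.linspace(0.1, 5.0, 99)
errors = [mse(c) for c in cs]
c_best = cs[int(np.argmin(errors))]
print("best c on grid:", c_best)
# For this setup, calculus gives the exact minimizer sqrt(10) - 1, about 2.162.
```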
Extensions and Modifications of the Theory. The theory of best linear
predicting and smoothing which has been discussed in the foregoing sections
can be extended and modified in various ways. For example,
instead of asking for a smoothed or predicted value of s(t), one can ask
for a best predicted value of ds/dt or ∫ s(t) dt or of another linear functional
of the signal.§

† Phillips (I).
‡ Phillips (I) and Laning and Battin (I, Art. 5.5).
§ Wiener (III, Chap. V).

The techniques for solving these problems are similar to
those for the basic smoothing and prediction problem. Smoothing and
predicting can be done when there are multiple signals and noises with
known autocorrelations and cross-correlations.† The stationarity restriction
on the input signal and noise processes may be removed;‡ in this
case an integral equation similar to Eq. (11-9), but with the correlation
functions now functions of two variables, is obtained. A polynomial with unknown
coefficients may be added to the signal.§ The observation interval may
be taken to be finite instead of infinite; we shall examine this problem
in some detail in the next article. For an extensive discussion of the
design of linear systems, optimum according to a least-mean-square error
criterion, with many examples, the reader is referred to Chaps. 5 through 8
of Laning and Battin (I).
11-6. Smoothing and Predicting with a Finite Observation Time

Let us now consider the optimization problem in which a smoothing
and predicting filter is to act on a sample of signal and noise of only
finite duration. Here, instead of Eq. (11-2), we have for the output of
the filter

    ŝ(t + η) = ∫_{t−T}^t h(t − τ)y(τ) dτ = ∫₀^T h(u)y(t − u) du   (11-60)

and it is desired to find the h(t) which will minimize

    E{[s(t + η) − ŝ(t + η)]²}

We can now either go through a formal minimization procedure, as at
the beginning of Art. 11-2, or take a short cut and argue, as in the argument
leading to Eq. (11-32), that in order for the estimate ŝ(t + η) to be best in a
least-mean-square sense, the error must be uncorrelated with y(τ), t − T ≤ τ ≤ t;
for otherwise a further linear operation on y(τ) could reduce the error. Thus

    E{[s(t + η) − ŝ(t + η)]y(τ)} = 0   t − T ≤ τ ≤ t   (11-61)

Substituting from Eq. (11-60), we get

    ∫₀^T h(u)E[y(t − u)y(τ)] du − E[s(t + η)y(τ)] = 0   t − T ≤ τ ≤ t

or

    ∫₀^T h(u)R_y(v − u) du = R_sy(v + η)   0 ≤ v ≤ T   (11-62)

Thus h(t) must satisfy Eq. (11-62) to be the impulse response of the
optimum filter, and Eq. (11-62) can readily be shown to be sufficient
as well.

† Wiener (III, Chap. IV).
‡ Booton (I).
§ Zadeh and Ragazzini (I) and Davis (I).
Equation (11-62) may or may not have a solution. As in the discussion
of Eq. (11-9), we shall consider only the case for which the random
process y(t) has rational spectral density. Then, if we let p = j2πf,
we can write

    S_y(f) = N(p²)/D(p²)   (11-63)

where N and D are polynomials of degree n and d, respectively. For
convenience, and because later we shall want to consider integral equations
like Eq. (11-62) where the right-hand side is not a cross-correlation
function, we shall let R_sy(v + η) be written z(v). Then, adopting this
notation and using Eq. (11-63), Eq. (11-62) becomes

    ∫₀^T h(u) du ∫_−∞^∞ e^{p(v−u)} [N(p²)/D(p²)] df = z(v)   0 ≤ v ≤ T   (11-64)

If we operate on both sides of this equation with the differential operator
D(d²/dv²), a polynomial D(p²) is built up in the integrand of the left-hand
side which cancels the D(p²) in the denominator, and we get

    ∫₀^T h(u) du ∫_−∞^∞ e^{p(v−u)} N(p²) df = D(d²/dv²) z(v)   0 < v < T   (11-65)

The polynomial N(p²) can also be built up by differentiation of the
integral, so Eq. (11-65) can be written

    N(d²/dv²) {∫₀^T h(u) du ∫_−∞^∞ e^{p(v−u)} df} = D(d²/dv²) z(v)   0 < v < T

or

    N(d²/dv²) h(v) = D(d²/dv²) z(v)   0 < v < T   (11-66)

where use has been made of the relation

    ∫_−∞^∞ e^{j2πf(v−u)} df = δ(v − u)

Thus any solution of Eq. (11-64) must satisfy the linear differential
equation (11-66). A general solution of Eq. (11-66) has 2n undetermined
constants. To see if for any choice of these constants the solution
of Eq. (11-66) is a solution of Eq. (11-64) and to evaluate the constants,
the general solution of Eq. (11-66) may be substituted in Eq. (11-64).
It can be shown† that there is some choice of constants for which the
solution of Eq. (11-66) for h(v) is a solution of Eq. (11-64) if and only if
the known function z(v) and its derivatives satisfy certain homogeneous
† See Appendix 2.
boundary conditions at v = 0 and v = T. By adding to the general
solution of Eq. (11-66) the sum

    Σ_{i=0}^{2d−2n−2} [a_i δ^(i)(v) + b_i δ^(i)(v − T)]   (11-67)

where the a_i's and b_i's are undetermined and δ^(i)(x) is the ith derivative
of the impulse function, a solution (in a purely formal sense) of Eq. (11-64)
can always be obtained.† If derivatives of impulse functions are required
in the solution for the weighting function h(t), the significance is, of course,
that the optimum filter must perform some differentiation.
Example 11-6.1. Let us consider again Example 11-3.1, but with the stipulation that
the predicting filter is to act on the past of the input only as far back as t − T. It
turned out in Example 11-3.1 that although the predicting filter was to be allowed to
act on the infinite past, the best filter in fact used none of the past history of the signal
at all. Thus we should get the same optimum filter with the changed condition of this
example. We shall see that this is so.

We have

    S(f) = 1/(1 + f²)
    R_y(t) = R_s(t) = π e^{−2π|t|}
    N(p²) = 1   D(p²) = (1/4π²)(4π² − p²)   p = j2πf

Then, by Eqs. (11-66) and (11-67),

    h(v) = (1/4π²)(4π² − d²/dv²) π e^{−2π(v+η)} + a δ(v) + b δ(v − T)
     = a δ(v) + b δ(v − T)

The undetermined coefficients a and b are determined by substituting this expression
for h(v) back in the integral equation

    ∫₀^T h(u) π e^{−2π|v−u|} du = π e^{−2π(v+η)}   0 ≤ v ≤ T

This gives

    a π e^{−2πv} + b π e^{−2π(T−v)} = π e^{−2π(v+η)}   0 ≤ v ≤ T

or a = e^{−2πη}, b = 0. The weighting function of the optimum predicting filter is then

    h(t) = δ(t) e^{−2πη}

which agrees with the result in Example 11-3.1.
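Since the optimum weighting function found above is h(t) = δ(t)e^{−2πη}, the predictor simply scales the most recent sample: ŝ(t + η) = e^{−2πη}s(t). A quick check (a sketch, not from the text): for any one-coefficient predictor a·s(t), the mean-square error is R(0) − 2aR(η) + a²R(0), which is minimized at a = R(η)/R(0) = e^{−2πη}.

```python
import numpy as np

# Mean-square error of the predictor a*s(t) for lead time eta, with
# R(t) = pi * exp(-2*pi*|t|) as in the example above.
eta = 0.1
R = lambda t: np.pi * np.exp(-2.0 * np.pi * abs(t))
a = np.linspace(0.0, 1.0, 100001)
mse = R(0.0) - 2.0 * a * R(eta) + a**2 * R(0.0)
a_best = a[int(np.argmin(mse))]
print(a_best, np.exp(-2.0 * np.pi * eta))   # both approximately 0.5335
```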
Connection with Characteristic-value Problems. The integral equation
(11-62) is closely connected with the integral equation

    ∫₀^T φ(u)R_y(v − u) du = σ²φ(v)   0 ≤ v ≤ T   (11-68)

† See Appendix 2.

In fact, under certain conditions,† h(u) can be expressed as an infinite
series involving the characteristic functions of Eq. (11-68). We now
show this formally. Again, let the function on the right-hand side of
Eq. (11-62) be denoted z(v). Then Eq. (11-62) is

    ∫₀^T h(u)R_y(v − u) du = z(v)   0 ≤ v ≤ T   (11-69)

Let {φ_k(v)} be a set of orthonormal characteristic functions of Eq. (11-68),
and let z(v) be expanded in terms of the φ_k's,‡

    z(v) = Σ_{k=0}^∞ z_k φ_k(v)   0 ≤ v ≤ T   (11-70)

where

    z_k = ∫₀^T z(v)φ_k*(v) dv

Then, since by Mercer's theorem

    R_y(v − u) = Σ_{k=0}^∞ σ_k² φ_k(v)φ_k*(u)

the left side of Eq. (11-69) is

    Σ_{k=0}^∞ σ_k² φ_k(v) ∫₀^T h(u)φ_k*(u) du = Σ_{k=0}^∞ σ_k² φ_k(v) h_k   (11-71)

where h_k is the kth "Fourier coefficient" of h(u) and

    h(u) = Σ_{k=0}^∞ h_k φ_k(u)   (11-72)

Comparing the series from Eqs. (11-70) and (11-71), we see that
Eq. (11-69) is satisfied if

    σ_k² h_k = z_k

that is, if

    h_k = z_k/σ_k²   (11-73)

Applying this result to the filtering equation (11-62), we see that the
solution can be written

    h(u) = Σ_{k=0}^∞ [φ_k(u)/σ_k²] ∫₀^T R_sy(v + η)φ_k*(v) dv   (11-74)

† See Appendix 2.
‡ This can always be done for any z(v) of integrable square if R_y(t) is a correlation
function of "filtered white noise." See Appendix 2.
Equation (11-74) could have been got by attacking the optimization
problem differently. One can expand y(t) in its orthogonal series, Eq.
(6-31), in terms of the φ_k(t) directly, and h(t) in the series given by Eq.
(11-72), and calculate the mean-square error in the form of an infinite
series and then choose the coefficients h_k so as to minimize the error.
When this approach is used, there is no point in requiring stationarity
of y(t), for the orthogonal expansion of y(t) is valid even when y(t) is not
stationary. Davis† has carried through a solution of a generalized version
of the problem by this method. In his treatment he has allowed the
signal to include a polynomial of known degree with unknown coefficients.
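A discretized version of this eigenfunction solution is easy to exercise (a sketch; the grid, kernel, and lead time below are arbitrary choices). On a grid, Eq. (11-62) becomes a matrix equation, Mercer's theorem becomes the eigendecomposition of a symmetric matrix, and Eq. (11-73) gives the solution coefficient by coefficient:

```python
import numpy as np

T, eta, n = 1.0, 0.2, 400
v = np.linspace(0.0, T, n)
dt = v[1] - v[0]
Ry = lambda t: np.pi * np.exp(-2.0 * np.pi * np.abs(t))  # kernel of Example 11-6.1
K = Ry(v[:, None] - v[None, :]) * dt     # discretized integral operator
z = Ry(v + eta)                          # right-hand side z(v) = R_sy(v + eta)

sig2, phi = np.linalg.eigh(K)            # sigma_k^2 and orthonormal phi_k (columns)
z_k = phi.T @ z                          # coefficients of z in the phi basis
h_series = phi @ (z_k / sig2)            # Eq. (11-73): h_k = z_k / sigma_k^2
h_direct = np.linalg.solve(K, z)         # direct discrete solution
print(np.max(np.abs(h_series - h_direct)))   # agreement to round-off
```

The discrete h concentrates its weight near v = 0, consistent with the impulse solution of the example above.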
11-7. Maximizing Signal-to-noise Ratio: The Matched Filter‡

In the preceding sections of this chapter we have been concerned with
linear filters intended to recover the form of a signal masked by noise.
In some circumstances, for example, in the simple detection of a radar
signal, the form of the time function which is the original signal is not
important; merely the presence or absence of a signal is. In such a situation,
the idea of designing a filter to have maximum output signal-to-noise
ratio has considerable appeal, even though the output signal may
be a distortion of the input signal. It should be remarked that there is
no ambiguity in the notions of output signal and noise from a linear filter.
If L denotes the linear operation performed by the filter and the input is

    y(t) = s(t) + n(t)   (11-75)

then the output is

    y_o(t) = L[s + n](t) = L[s](t) + L[n](t)   (11-76)

where it is perfectly reasonable to define L[s](t) to be the output signal
s_o(t) and L[n](t) to be the output noise n_o(t).

One case of particular interest, and the one we shall discuss here, is
that in which the signal s(t) is a known function of time. It is desired
to specify a linear filter to act on the input

    y(t) = s(t) + n(t)

where n(t), the noise, is a sample function from a wide-sense stationary
random process, so that the output signal-to-noise ratio

    (S/N)_o = s_o²(t)/E[n_o²(t)]   (11-77)

† Davis (I).
‡ See Zadeh and Ragazzini (II).
is a maximum at some chosen time t = t₁. We shall suppose that the
filter acts on the input for a finite length of time T. Then

    s_o(t₁) = ∫₀^T h(τ)s(t₁ − τ) dτ   (11-78a)
    n_o(t₁) = ∫₀^T h(τ)n(t₁ − τ) dτ   (11-78b)
and
    E[n_o²(t₁)] = ∫₀^T ∫₀^T h(μ)h(τ)R_n(μ − τ) dμ dτ   (11-79)

Suppose that the maximum output signal-to-noise ratio is 1/λ. Then
for any linear filter we have

    E[n_o²(t₁)] − λ s_o²(t₁) ≥ 0   (11-80)
where the equality sign holds only for the output of an optimum filter.
Since, if the impulse response h(t) of the filter is multiplied by a constant,
the signal-to-noise ratio is not changed, we may suppose the filter
gain to be normalized so that s_o(t₁) = 1. Let us now derive a condition
for the weighting function h(t) of an optimum filter. Let g(t) be any
real function with the property that

    ∫₀^T g(τ)s(t₁ − τ) dτ = 0   (11-81)

Then, for any number ε,

    ∫₀^T [h(τ) + εg(τ)]s(t₁ − τ) dτ = 1   (11-82)

For convenience, let us introduce the notation σ²(h) for E[n_o²(t₁)] when
the filter has weighting function h(t). Then, from Eq. (11-80) and the
remark following it, and from the normalization,

    σ²(h) − λ = 0   (11-83)

and from Eqs. (11-80) and (11-82),

    σ²(h + εg) − λ ≥ 0   (11-84)

Subtracting Eq. (11-83) from (11-84) gives

    σ²(h + εg) − σ²(h) ≥ 0   (11-85)

for any ε and any g(t) satisfying Eq. (11-81). Expanding and cancelling
terms gives

    ε² ∫₀^T ∫₀^T g(μ)g(τ)R_n(τ − μ) dμ dτ + 2ε ∫₀^T ∫₀^T h(μ)g(τ)R_n(τ − μ) dμ dτ ≥ 0

This inequality is satisfied for all values of ε only if the second integral
vanishes:

    ∫₀^T ∫₀^T h(μ)g(τ)R_n(τ − μ) dμ dτ = 0   (11-86)
and Eq. (11-86) is satisfied for all g(τ) satisfying Eq. (11-81) only if

    ∫₀^T h(μ)R_n(τ − μ) dμ = a s(t₁ − τ)   0 ≤ τ ≤ T   (11-87)

where a is any constant. That this is so can be shown by supposing

    ∫₀^T h(μ)R_n(τ − μ) dμ = a(τ)

where a(τ) is not a multiple of s(t₁ − τ), and then letting

    g(τ) = a(τ) − s(t₁ − τ) [∫₀^T a(u)s(t₁ − u) du] / [∫₀^T s²(t₁ − u) du]

It is easily verified that g(τ) thus defined satisfies Eq. (11-81) and that
Eq. (11-86) cannot be satisfied.
Thus Eq. (11-87) is a condition which must be satisfied in order that
h(t) be the impulse response of a filter which maximizes the output signal-to-noise
ratio. The value of a does not matter as far as the signal-to-noise
ratio is concerned but affects only the normalization. By substituting
s(t₁ − τ) from Eq. (11-87) back into Eq. (11-78a), one shows easily that

    a = E[n_o²(t₁)]/s_o(t₁)   (11-88)

The reader may verify directly that Eq. (11-87) is a sufficient as well as
necessary condition that h(t) be optimum. The verification is similar to
that at the beginning of Art. 11-2.
An interesting limiting case of Eq. (11-87) occurs when the noise is
white noise, so that R_n(t) = Nδ(t), where N is the noise power. Then,
taking a = 1, Eq. (11-87) becomes

    N ∫₀^T h(μ)δ(τ − μ) dμ = s(t₁ − τ)   0 ≤ τ ≤ T

whence

    h(τ) = (1/N) s(t₁ − τ)   0 ≤ τ ≤ T

Thus, in this special case, the optimum weighting function has the form
of the signal run backward starting from the fixed time t₁. A filter with
this characteristic is called a matched filter.†
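A discrete-time sketch of this result (illustrative only; the pulse shape, lengths, and noise level are arbitrary choices): with white noise, the filter matched to a known pulse is just the time-reversed pulse, and it yields a visibly larger output signal-to-noise ratio than an unmatched filter of the same length.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N0 = 200, 1.0
s = np.zeros(n)
s[50:80] = 1.0                 # the known signal: a rectangular pulse
h_matched = s[::-1] / N0       # h(tau) = s(t1 - tau)/N with t1 at the end

def output_snr(h, trials=4000):
    s_out = float(np.dot(h, s[::-1]))   # s_o(t1) = sum over tau of h(tau) s(t1 - tau)
    noise = rng.normal(0.0, np.sqrt(N0), (trials, n))
    n_out = noise @ h                   # Monte Carlo output-noise samples
    return s_out**2 / float(n_out.var())

h_average = np.ones(n) / N0    # a crude unmatched comparison filter
print(output_snr(h_matched), output_snr(h_average))
```

For this pulse, theory gives an output SNR equal to the signal energy divided by N (30 here) for the matched filter, against 4.5 for the averaging filter.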
Example 11-7.1. Let the signal be a series of rectangular pulses as shown in Fig. 11-3.
Let the noise have spectral density

    S_n(f) = 1/(a² + 4π²f²)

Then

    R_n(t) = (1/2a) e^{−a|t|}   a > 0

† Cf. Van Vleck and Middleton (I) and Zadeh and Ragazzini (II).

and the integral equation for the h(t) of the maximum signal-to-noise ratio filter is
(taking the constant a in Eq. (11-87) to be 1)

    ∫₀^T h(μ) (1/2a) e^{−a|τ−μ|} dμ = s(t₁ − τ)   0 ≤ τ ≤ T

This equation has the solution (without impulse functions at τ = 0 and τ = T)

    h(t) = a²s(t₁ − t) − s″(t₁ − t)   (11-89)

if the conditions

    a s(t₁) + s′(t₁) = 0
    a s(t₁ − T) − s′(t₁ − T) = 0   (11-90)

are satisfied. With the time origin chosen as shown in Fig. 11-3 and t₁ = T, these
conditions are satisfied.

[Fig. 11-3. Rectangular pulse series.]

Then the optimum filter weighting function is

    h(t) = a²s(T − t) − δ^(1)(t − b) + δ^(1)(t − b − d) − ···
           + δ^(1)(t − (k − 1)T₀ − b − d)   (11-91)
Example 11-7.2. Let the noise be the same as in the previous example and the signal
be

    s(t) = sin² ω₀t   0 ≤ t ≤ T   T = 2πk/ω₀

Then, with t₁ = T, the conditions required by Eq. (11-90) are met, and the solution is
again given by Eq. (11-89), which becomes in this case

    h(t) = a² sin² ω₀(T − t) − 2ω₀² cos 2ω₀(T − t)
     = a²/2 − (a²/2 + 2ω₀²) cos 2ω₀(T − t)
11-8. Problems

1. Let s(t) be a signal which is a sample function from a stationary random process
with spectral density

    S_s(f) = 1/(1 + f⁴)

Show that the least-mean-square error predicting filter which operates on the infinite
past has system function

    H(j2πf) = exp(−2πη/√2) {cos(2πη/√2) + sin(2πη/√2) + j√2 f sin(2πη/√2)}

where η is the lead time.
2.† Suppose that a random voltage e(t) is developed across a parallel RLC circuit
driven by a white-noise current source i(t), as shown in Fig. 11-4.
a. Letting ω₀² = 1/LC and Q = R/ω₀L, find the spectral density S_e(f) in terms of
ω₀, L, and Q, assuming that S_i(f) = 1.
b. Using the normalization α = jω/ω₀ and supposing the circuit sharply enough
tuned so that 1/Q² is negligible with respect to one, show that approximately

    G(jω) = G(αω₀) ≈ αω₀L / [(α + 1/2Q − j)(α + 1/2Q + j)]

c. With the same approximation as in b, show that the optimum predicting filter
for e(t) (using the infinite past) has system function

    H(jω) ≈ exp(−ω₀η/2Q) sin ω₀η [cot ω₀η − 1/2Q − jω/ω₀]

where η is the prediction lead time.

[Fig. 11-4. Parallel RLC circuit: white-noise current source driving R, L, and C in
parallel, with output voltage e(t); Q = R/ω₀L, ω₀² = 1/LC.]
3. Show for the case of uncorrelated signal and noise, where S_y(f) = S_s(f) + S_n(f),
with S_s(f) and S_n(f) any rational spectral densities, that S_s(f)/G*(j2πf) can have no
poles on the real f axis.
4. Suppose that signal and noise are independent and that

    S_s(f) = 1/[1 + (f/β)²]   S_n(f) = N

(the noise is white noise). Find the optimum linear smoothing filter which operates
on the infinite past of the input. Compare the result with the limiting case of Example
11-4.1.
5. Show that for independent signal and noise, the system function of the optimum
linear smoothing filter (infinite past) may be written

    H(j2πf) = 1 − [1/G(j2πf)] [S_n(f)/G*(j2πf)]₊
6. Derive Eq. (11-56) of the text from Eq. (11-55).
7. Using Eq. (11-55), calculate the mean-square error of the output of the smoothing
filter of Prob. 4 (the filter specified by Eq. (11-52)).
8. Calculate the mean-square error of the output of the predicting filter of Example
11-3.1.
a. By using Eq. (11-55).
b. By using Eq. (11-57).
† Lee and Stutt (I).
OPTIMUM LINEAR SYSTEMS 249
9. Derive the expression, Eq. (11-58), for the spectral density of the mean-square error of the output of an arbitrary filter with signal-plus-noise input.
10. Let a signal s(t) be from a stationary random process with correlation function R_s(τ) = e^(−|τ|) + e^(−2|τ|). Find the impulse response of the linear predicting filter for this signal which is optimum among those which operate on the signal only for a time T.
11. Derive Eq. (11-69) of the text directly by expanding y(t) and h(t) in the series

y(t) = Σ_{k=0}^∞ y_k φ_k(t)        h(t) = Σ_{k=0}^∞ h_k φ_k(t)

where the φ_k(t) are given by Eq. (11-68) and the h_k are unknown coefficients to be determined. Find the mean-square error as an infinite series which involves the y_k, and then determine the h_k so as to minimize it.
12. With signal and noise as given in Prob. 4, find the impulse-response function of the optimum linear filter which operates on the input only for a time T. Check the limiting case as T → ∞ with the result of Prob. 4.
13. Find the impulse response of the linear filter which maximizes signal-to-noise ratio for the signal

s(t) = e^(−2π|t|) + ¼[e^(−4π(t−T)) + e^(−4πt)]

when the spectral density of the noise is

S_n(f) = 1/[(1 + f²)(4 + f²)]
CHAPTER 12
NONLINEAR DEVICES: THE DIRECT METHOD
The problem which will concern us in this chapter and the next is that of determining the statistical properties of the output of a nonlinear device,† e.g., a detector or a limiting amplifier. In these chapters, we shall restrict our discussion to those nonlinear devices for which the output y(t) at a given instant of time can be expressed as a function of the input x(t) at the same instant of time. That is, we shall assume that we can write

y(t) = g[x(t)]   (12-1)

where g(x) is a single-valued function of x. We thus rule out of consideration those nonlinear devices which contain energy-storage elements, since such elements generally require that the present value of the output be a function not only of the present value of the input but also of the past history of the input.
12-1. General Remarks
The problem at hand can be stated as follows: knowing the (single-valued) transfer characteristic y = g(x) of a nonlinear device and the statistical properties of its input, what are the statistical properties of its output? Basically, this is simply a problem of a transformation of variables, the elements of which were discussed in earlier chapters. For example, it follows from the results of Art. 3-6 that if A(Y) is the set of points in the sample space of the input random variable x_t which corresponds to the set of points (−∞ < y_t ≤ Y) in the sample space of the output random variable y_t, then

P(y_t ≤ Y) = P[x_t ∈ A(Y)]   (12-2)

and the probability density function of the output random variable is given by

p(Y) = dP[x_t ∈ A(Y)]/dY   (12-3)

wherever the derivative exists. It also follows from the results of Art. 3-6
† Cf. Burgess (I) and Rice (I, Arts. 4.1, 4.2, 4.5, and 4.7).
that if the probability density function p_1(x_t) of the input random variable exists and is continuous almost everywhere, and if the transfer characteristic provides a one-to-one mapping from x to y, then the probability density function p_2(y_t) of the output random variable is given by

p_2(y_t) = p_1(x_t) |dx_t/dy_t|   (12-4)

Although this equation may not always apply (e.g., when the transfer characteristic is constant over an interval in x, as for a half-wave detector), Eq. (12-3) always does.
Further, it follows from Art. 4-1 that averages with respect to the output can always be obtained by averaging with respect to the input. Thus

E[f(y_t)] = E{f[g(x_t)]} = ∫_{−∞}^{+∞} f[g(x_t)] p(x_t) dx_t   (12-5)

The nth moment of the output is therefore given by

E(y_t^n) = ∫_{−∞}^{+∞} g^n(x_t) p(x_t) dx_t   (12-6)

Similarly, the autocorrelation function of the output is

R_y(t_1,t_2) = E[g(x_1)g(x_2)] = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x_1)g(x_2) p(x_1,x_2) dx_1 dx_2   (12-7)

where x_1 = x_{t_1} and x_2 = x_{t_2}.
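The point of Eqs. (12-5) and (12-6) can be sketched numerically: output averages of a memoryless device are computed by averaging over the input alone, with no need for the output distribution. The square-law characteristic g(x) = ax², a zero-mean gaussian input, and all parameter values below are illustrative choices.

```python
import numpy as np

# Numerical sketch of Eqs. (12-5) and (12-6): output averages of a memoryless
# device y = g(x) obtained by averaging over the *input* samples alone.
rng = np.random.default_rng(0)
a, sigma = 2.0, 1.5
x = rng.normal(0.0, sigma, 500_000)   # samples of the input random variable x_t

def g(u):
    """Single-valued transfer characteristic of the nonlinear device."""
    return a * u**2

mean_y = np.mean(g(x))                # estimates E(y_t) = a*sigma**2
second_moment_y = np.mean(g(x)**2)    # estimates E(y_t^2) = 3*a**2*sigma**4
```

The estimates agree with the gaussian-input moments derived later in this article, even though the density of y was never constructed.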
The problem of determining the statistical properties of the output of a nonlinear device can hence be solved in principle by applying the various results which we obtained earlier on the transformation of variables. We shall apply those results in this chapter to a study of two nonlinear devices of some practical importance:† the full-wave square-law detector and the half-wave linear detector. However, when we try to use the method adopted in this chapter to solve more complicated problems, considerable analytical difficulties often arise. In these cases, a less direct method involving the use of the Fourier transform of the transfer characteristic may be used. The transform method of analysis will be discussed in Chap. 13.
12-2. The Square-law Detector
By full-wave square-law detector we mean a full-wave square-law device with the transfer characteristic

y = ax²   (12-8)

† It should be noted that we have already studied the envelope detector, in essence, in Arts. 8-5, 8-6, and 9-5.
where a is a scaling constant, followed by a low-pass, or averaging, filter. Such a detector is shown schematically in Fig. 12-1; the full-wave square-law transfer characteristic is shown in Fig. 12-2.

Fig. 12-1. The square-law detector (square-law device followed by a low-pass filter).

The square-law detector is analytically the simplest of the various nonlinear devices we shall study. It is of interest to us, however, not only because of the simplicity of its analysis but also because of its practical importance. The determination of the statistical properties of the detector output can be made most easily by determining the statistical properties first of the output of the square-law device and then of the low-pass-filter output. In this article, we shall consider first a general input and then an input with gaussian statistics; the case of a sine wave plus narrow-band gaussian noise input will be studied in the next article.

Fig. 12-2. The full-wave square-law transfer characteristic.
The probability density function of the output of a full-wave square-law device was derived in Art. 3-6 and is, from Eqs. (3-45) and (3-46),

p_2(y_t) = [p_1(x_t = +√(y_t/a)) + p_1(x_t = −√(y_t/a))] / (2√(a y_t))   for y_t ≥ 0, and zero otherwise   (12-9)

When the input probability density function is an even function, this becomes

p_2(y_t) = p_1(x_t = √(y_t/a)) / √(a y_t)   for y_t ≥ 0, and zero otherwise   (12-10)
The determination of the probability density function of the output of the low-pass filter was discussed in Art. 9-5. However, for an arbitrary filter characteristic, the density function obtained in general form there is so unwieldy as to make calculation of averages difficult. The calculation of the autocorrelation function, which requires second-order distributions, is even more difficult. It is therefore useful to specialize to the case of a narrow-band gaussian noise input and an idealized type of low-pass filter which simplifies the calculations enough to permit more extensive results. This special case will be considered in the next section.
The nth moment of the output of the square-law device is

E(y_t^n) = a^n ∫_{−∞}^{+∞} x_t^{2n} p_1(x_t) dx_t = a^n E(x_t^{2n})   (12-11)

The autocorrelation function of the output of the square-law device is

R_y(t_1,t_2) = a² E(x_1²x_2²)   (12-12)

which becomes a function of τ = t_1 − t_2 when the input is stationary. It is not feasible to go much beyond these simple results without specifying the statistical properties of the detector input.
Gaussian Input. Suppose now that the detector input x(t) is a sample function of a real gaussian random process with zero mean. In this case,

p_1(x_t) = (1/√(2π)σ_x) exp(−x_t²/2σ_x²)   (12-13)

It then follows from Eq. (12-10) that the probability density function of the output of the square-law device is

p_2(y_t) = (1/√(2πa y_t) σ_x) exp(−y_t/2aσ_x²)   for y_t ≥ 0, and zero otherwise   (12-14)

which is a chi-squared density function.
If the detector input is a narrow-band gaussian random process, we can write, from Eq. (8-73),

x(t) = V(t) cos [ω_c t + φ(t)]   (12-15)

where f_c = ω_c/2π is the center frequency of the input spectral density, V(t) ≥ 0 is the envelope of the input, and 0 ≤ φ(t) ≤ 2π is the input phase. The output of the square-law device then is

y(t) = aV²(t)/2 + (aV²(t)/2) cos [2ω_c t + 2φ(t)]   (12-16)

The first term of this expression has a spectral density centered on zero frequency, whereas the second term has a spectral density centered on 2f_c. If the bandwidth of the input is narrow compared to its center frequency, these two spectral densities will not overlap.† On passing y(t) through a low-pass zonal filter (i.e., a nonrealizable filter which will pass without distortion the low-frequency part of its input and filter out completely the high-frequency part), we obtain for the filter output

z(t) = aV²(t)/2   (12-17)
† Cf. Fig. 12-4.
Since V(t) is the envelope of a narrow-band gaussian random process, it has a Rayleigh probability density function:

p(V_t) = (V_t/σ_x²) exp(−V_t²/2σ_x²)   for V_t ≥ 0, and zero otherwise   (12-18)

from Eq. (8-85). The probability density function of the filter output is therefore

p_3(z_t) = (1/aσ_x²) exp(−z_t/aσ_x²)   for z_t ≥ 0, and zero otherwise   (12-19)

which is an exponential density function. The probability density functions of the normalized input ξ = x/σ_x, the normalized output of the square-law device η = y/aσ_x², and the normalized filter output ζ = z/aσ_x² are shown in Fig. 12-3.
Fig. 12-3. Full-wave square-law detector probability density functions.
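The whole chain of Eqs. (12-15) to (12-19) can be sketched numerically: narrow-band gaussian noise is squared and passed through an ideal (zonal) low-pass filter, and the filter output should then follow the exponential law of Eq. (12-19), with mean aσ_x² and variance a²σ_x⁴. The band edges, sample rate, and record length below are arbitrary choices.

```python
import numpy as np

# Narrow-band gaussian noise -> square-law device -> zonal low-pass filter,
# following Eqs. (12-15)-(12-19).
rng = np.random.default_rng(2)
n, fs = 2**20, 1024.0
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

# Narrow-band gaussian input: white gaussian noise bandpass-filtered around fc.
fc, bw = 128.0, 8.0
spec = np.fft.rfft(rng.normal(0.0, 1.0, n))
spec[np.abs(freqs - fc) > bw / 2] = 0.0
x = np.fft.irfft(spec, n)
x *= 1.0 / x.std()                       # normalize so sigma_x = 1

a = 3.0
y = a * x**2                             # square-law device output

# Zonal low-pass filter: pass the spectral lobe at zero frequency (width ~bw),
# stop the lobe at 2*fc.
yspec = np.fft.rfft(y)
yspec[freqs > 2 * bw] = 0.0
z = np.fft.irfft(yspec, n)

mean_z = z.mean()                        # should approach a*sigma_x**2 = a
var_z = z.var()                          # should approach a**2*sigma_x**4 = a**2
```

The variance of z comes out near half the variance of y, as the text shows below in Eqs. (12-21) and (12-23).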
The nth moments of the output of the square-law device can be obtained by substituting into Eq. (12-11) the previously derived values (Eq. (8-11)) for the even moments of a gaussian random variable. Thus we obtain

E(y_t^n) = a^n [1·3·5 ··· (2n−1)] σ_x^{2n}   (12-20)

In particular,

E(y_t) = aσ_x²   and   E(y_t²) = 3a²σ_x⁴ = 3E²(y_t)   (12-21)

and hence

σ²(y_t) = 2a²σ_x⁴ = 2E²(y_t)
When the detector input is a narrow-band gaussian random process and the output filter is a low-pass zonal filter, the nth moment of the filter output is,† using Eq. (12-19),

E(z_t^n) = ∫_0^∞ (z_t^n/aσ_x²) exp(−z_t/aσ_x²) dz_t = n! a^n σ_x^{2n}   (12-22)

Hence

E(z_t) = aσ_x² = E(y_t)   and   E(z_t²) = 2a²σ_x⁴   (12-23)

and

σ²(z_t) = a²σ_x⁴ = E²(z_t)

Thus we see from Eqs. (12-21) and (12-23) that the means of the outputs of the square-law device and the low-pass filter are both equal to a times the variance of the input and that the variance of the output of the low-pass zonal filter is one-half the variance of the output of the square-law device.
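Eq. (12-22) can be spot-checked by sampling the exponential law of Eq. (12-19) directly; the parameter values below are arbitrary.

```python
import math
import numpy as np

# Spot check of Eq. (12-22): for the exponential density (12-19) with mean
# a*sigma**2, the nth moment is E(z^n) = n! * (a*sigma**2)**n.
rng = np.random.default_rng(3)
a, sigma = 0.5, 2.0
theta = a * sigma**2                       # scale parameter of the exponential law
z = rng.exponential(scale=theta, size=2_000_000)

estimates = [np.mean(z**n) for n in (1, 2, 3)]
theory = [math.factorial(n) * theta**n for n in (1, 2, 3)]
```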
The autocorrelation function of the output of the full-wave square-law device in response to a gaussian input is, from Eqs. (12-12) and (8-121),

R_y(t_1,t_2) = a²E(x_1²)E(x_2²) + 2a²E²(x_1x_2)   (12-24a)

which becomes, on setting τ = t_1 − t_2 and σ_x = σ(x_t) for all t,

R_y(τ) = a²σ_x⁴ + 2a²R_x²(τ)   (12-24b)

when the input random process is stationary. The spectral density of the output of the square-law device, given by the Fourier transform of R_y(τ), is

S_y(f) = a²σ_x⁴δ(f) + 2a² ∫_{−∞}^{+∞} R_x²(τ) exp(−j2πfτ) dτ

since, from Eq. (A1-16), the Fourier transform of a constant is an impulse function. Now

∫_{−∞}^{+∞} R_x²(τ) exp(−j2πfτ) dτ = ∫_{−∞}^{+∞} S_x(f')S_x(f − f') df'

Hence

S_y(f) = a²σ_x⁴δ(f) + 2a² ∫_{−∞}^{+∞} S_x(f')S_x(f − f') df'   (12-25)

Thus the output spectral density is composed of two parts: an impulse part

S_y(f)_d = a²σ_x⁴δ(f)   (12-26a)

which corresponds to the output mean value, and a part

S_y(f)_r = 2a² ∫_{−∞}^{+∞} S_x(f')S_x(f − f') df'   (12-26b)

which corresponds to the random variations of the output.
† Cf. Dwight (I, Eq. 861.2).
A feeling for the above results can perhaps best be obtained by assuming some simple form for the input spectral density. Accordingly, let the input spectral density have a constant value A over a narrow band of width B centered on some frequency f_c, where 0 < B < f_c:

S_x(f) = A   for f_c − B/2 < |f| < f_c + B/2, and zero otherwise   (12-27)

This spectral density is shown in Fig. 12-4a. In this case,

σ_x² = ∫_{−∞}^{+∞} S_x(f) df = 2AB   (12-28)

The impulsive part of the spectral density of the output of the square-law device is therefore

S_y(f)_d = 4a²A²B²δ(f)   (12-29)

and is shown in Fig. 12-4b. The mean and the variance of the output of the square-law device are, from Eqs. (12-21),

E(y_t) = 2aAB   and   σ²(y_t) = 8a²A²B²   (12-30)

and the mean and variance of the output of the low-pass zonal filter are, from Eqs. (12-23),

E(z_t) = 2aAB   and   σ²(z_t) = 4a²A²B²   (12-31)

The spectral density of the fluctuating part of the output of the square-law device is formed, as shown by Eq. (12-26b), by convolving the input spectral density S_x(f) with itself. Here we have

S_y(f)_r = 4a²A²(B − |f|)   for 0 < |f| ≤ B
S_y(f)_r = 2a²A²(B − ||f| − 2f_c|)   for 2f_c − B < |f| < 2f_c + B
S_y(f)_r = 0   otherwise   (12-32)

which is also plotted in Fig. 12-4b. The spectral density of the output of the square-law device is thus nonzero only in the narrow bands centered about zero frequency and twice the center frequency of the input spectral density. The low-pass zonal filter passes that part about zero frequency
and stops the part about 2f_c. Hence

S_z(f) = S_z(f)_d + S_z(f)_r   (12-33)

where

S_z(f)_d = 4a²A²B²δ(f)   (12-34a)

and

S_z(f)_r = 4a²A²(B − |f|)   for 0 < |f| < B, and zero otherwise   (12-34b)
These spectral densities are shown in Fig. 12-4c. From the above results, we see that the bandwidths of the outputs of the square-law device and the low-pass filter are twice the bandwidth of the detector input. A moment's reflection should convince one that these results are consistent with those obtained in the elementary case of sine-wave inputs.

Fig. 12-4. Spectral densities for a full-wave square-law detector in response to a narrow-band gaussian input: (a) input; (b) square-law device output; (c) low-pass filter output.
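The convolution of Eq. (12-26b) is easy to check numerically for the rectangular band of Eq. (12-27): near zero frequency it should reproduce the triangle 4a²A²(B − |f|) of Eq. (12-32). Grid spacing and band parameters below are arbitrary choices.

```python
import numpy as np

# Numerical convolution of the rectangular input spectrum (12-27) with itself,
# compared near f = 0 with the closed form 4*a**2*A**2*(B - |f|) of Eq. (12-32).
a, A, B, fc = 1.0, 2.0, 4.0, 40.0
df = 0.02
M = int(round((2 * fc + 2 * B) / df))
f = np.arange(-M, M + 1) * df                        # symmetric frequency grid
Sx = np.where(np.abs(np.abs(f) - fc) < B / 2, A, 0.0)

# 2*a**2*(Sx * Sx)(f); with a grid symmetric about zero, mode="same" keeps the
# output aligned with the same frequency axis f.
Syr = 2 * a**2 * np.convolve(Sx, Sx, mode="same") * df

i0 = M                                               # index of f = 0
k = int(0.9 * B / df)
probe = np.arange(-k, k + 1)
triangle = 4 * a**2 * A**2 * (B - np.abs(f[i0 + probe]))
err = np.max(np.abs(Syr[i0 + probe] - triangle))     # discretization error only
```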
12-3. The Square-law Detector: Signal-plus-Noise Input
Let us now consider the case in which the input to the full-wave square-law detector is the sum of a signal s(t) and a noise n(t):

x(t) = s(t) + n(t)   (12-35)

where s(t) and n(t) are sample functions from statistically independent random processes with zero means.
Since the output of the square-law device is

y(t) = a[s(t) + n(t)]² = a[s²(t) + 2s(t)n(t) + n²(t)]   (12-36)

and the input processes are independent, the mean of the output is

E(y_t) = a[E(s_t²) + E(n_t²)]   (12-37a)
which, when the input processes are stationary, becomes

E(y_t) = a(σ_s² + σ_n²)   (12-37b)

where σ_s = σ(s_t) and σ_n = σ(n_t) for all t. The mean-square value of the square-law-device output is, in general,

E(y_t²) = a²[E(s_t⁴) + 6E(s_t²)E(n_t²) + E(n_t⁴)]   (12-38a)

which becomes

E(y_t²) = a²[E(s_t⁴) + 6σ_s²σ_n² + E(n_t⁴)]   (12-38b)

when the inputs are stationary.
The autocorrelation function of the output of the square-law device is

R_y(t_1,t_2) = E(y_1y_2) = a²E[(s_1 + n_1)²(s_2 + n_2)²]   (12-39a)

When the input random processes are stationary, we get, on setting τ = t_1 − t_2,

R_y(τ) = a²[R_{s²}(τ) + R_{n²}(τ)] + 4a²R_s(τ)R_n(τ) + 2a²σ_s²σ_n²   (12-39b)

where R_s(τ) and R_n(τ) are the signal and noise autocorrelation functions, respectively, and where

R_{s²}(τ) = E(s_1²s_2²)   and   R_{n²}(τ) = E(n_1²n_2²)   (12-40)

The autocorrelation function of the output of the square-law device therefore contains three types of terms:

R_y(τ) = R_{s×s}(τ) + R_{n×n}(τ) + R_{s×n}(τ)   (12-41)

in which the term

R_{s×s}(τ) = a²R_{s²}(τ)   (12-42a)

is due to the interaction of the signal with itself, the term

R_{n×n}(τ) = a²R_{n²}(τ)   (12-42b)

is due to the interaction of the noise with itself, and the term

R_{s×n}(τ) = 4a²R_s(τ)R_n(τ) + 2a²σ_s²σ_n²   (12-42c)

is due to the interaction of the signal with the noise. Of these terms we may say that only the s×s term (which would be present if there were no noise) relates to the desired signal output; the s×n and n×n terms relate to the output noise.
The spectral density of the output of the square-law device may be obtained by taking the Fourier transform of R_y(τ). Thus we get

S_y(f) = S_{s×s}(f) + S_{n×n}(f) + S_{s×n}(f)   (12-43)

where

S_{s×s}(f) = a² ∫_{−∞}^{+∞} R_{s²}(τ) exp(−j2πfτ) dτ   (12-44a)

S_{n×n}(f) = a² ∫_{−∞}^{+∞} R_{n²}(τ) exp(−j2πfτ) dτ   (12-44b)

and

S_{s×n}(f) = 4a² ∫_{−∞}^{+∞} R_s(τ)R_n(τ) exp(−j2πfτ) dτ + 2a²σ_s²σ_n²δ(f)
          = 4a² ∫_{−∞}^{+∞} S_s(f')S_n(f − f') df' + 2a²σ_s²σ_n²δ(f)   (12-44c)

in which S_s(f) and S_n(f) are the spectral densities of the input signal and noise, respectively. The occurrence of the s×n term shows that the output noise increases in the presence of an input signal. Although this result has been obtained here for a full-wave square-law detector, we shall see later that a similar result holds for any nonlinear device.
Sine Wave Plus Gaussian Noise Input. In the preceding section we obtained various statistical properties of the output of a full-wave square-law detector in response to a signal-plus-noise input of a general nature. Suppose now that the input noise n(t) is a sample function of a stationary real gaussian random process with zero mean and that the input signal is a sine wave of the form

s(t) = P cos(ω_c t + θ)   (12-45)

where P is a constant and θ is a random variable uniformly distributed over the interval 0 ≤ θ ≤ 2π and independent of the input noise process.
It follows from Eq. (12-24b), since the input noise is gaussian, that the n×n part of the autocorrelation function of the output of the full-wave square-law device is

R_{n×n}(τ) = 2a²R_n²(τ) + a²σ_n⁴   (12-46)

The corresponding spectral density then is, from Eq. (12-25),

S_{n×n}(f) = 2a² ∫_{−∞}^{+∞} S_n(f')S_n(f − f') df' + a²σ_n⁴δ(f)   (12-47)

We now require certain properties of the input signal in order to determine the remaining parts of the autocorrelation function of the output of the square-law device. First, the autocorrelation function of the input signal is

R_s(t_1,t_2) = P²E[cos(ω_c t_1 + θ) cos(ω_c t_2 + θ)]
            = (P²/2) cos[ω_c(t_1 − t_2)] + (P²/2)(1/2π) ∫_0^{2π} cos[ω_c(t_1 + t_2) + 2θ] dθ
            = (P²/2) cos[ω_c(t_1 − t_2)]
We can therefore write

R_s(τ) = (P²/2) cos ω_c τ   (12-48)

where τ = t_1 − t_2. The spectral density of the input signal then is, using Eq. (A1-15),

S_s(f) = (P²/4)[δ(f − f_c) + δ(f + f_c)]   (12-49)

where f_c = ω_c/2π. From Eqs. (12-42c) and (12-48), we have

R_{s×n}(τ) = 2a²P²R_n(τ) cos ω_c τ + a²P²σ_n²   (12-50)

The corresponding spectral density is

S_{s×n}(f) = a²P²[S_n(f − f_c) + S_n(f + f_c)] + a²P²σ_n²δ(f)   (12-51)

which follows from Eqs. (12-44c) and (12-49).
Next, the autocorrelation function of the square of the input signal is

E(s_1²s_2²) = P⁴E[cos²(ω_c t_1 + θ) cos²(ω_c t_2 + θ)] = P⁴/4 + (P⁴/8) cos 2ω_c(t_1 − t_2)

The s×s part of the autocorrelation function of the output of the square-law device is, therefore,

R_{s×s}(τ) = a²P⁴/4 + (a²P⁴/8) cos 2ω_c τ   (12-52)

where τ = t_1 − t_2. The corresponding spectral density is

S_{s×s}(f) = (a²P⁴/4)δ(f) + (a²P⁴/16)[δ(f − 2f_c) + δ(f + 2f_c)]   (12-53)
To summarize, when the input to a full-wave square-law device consists of a sine wave plus a gaussian noise with zero mean, the autocorrelation function of the device output is, from Eqs. (12-46), (12-50), and (12-52),

R_y(τ) = a²(P²/2 + σ_n²)² + 2a²R_n²(τ) + 2a²P²R_n(τ) cos ω_c τ + (a²P⁴/8) cos 2ω_c τ   (12-54)

and the output spectral density is

S_y(f) = a²(P²/2 + σ_n²)²δ(f) + 2a² ∫_{−∞}^{+∞} S_n(f')S_n(f − f') df'
       + a²P²[S_n(f − f_c) + S_n(f + f_c)] + (a²P⁴/16)[δ(f − 2f_c) + δ(f + 2f_c)]   (12-55)

which follows from Eqs. (12-47), (12-51), and (12-53).
The first term in Eq. (12-54) is simply the square of the mean of the output of the square-law device,

E(y_t) = a(P²/2 + σ_n²)   (12-56)

The mean square of the output, obtained by evaluating the autocorrelation function at τ = 0, is

E(y_t²) = 3a²(P⁴/8 + P²σ_n² + σ_n⁴)   (12-57)

The variance of the output of the square-law device therefore is

σ_y² = 2a²(P⁴/16 + P²σ_n² + σ_n⁴)   (12-58)
Again we shall try to obtain a feeling for the analytic results by assuming a simple form for the input noise spectral density. As in Art. 12-2, let the input noise spectral density have a constant value A over a narrow band of width B centered on a frequency f_c, where 0 < B < f_c. The total input spectral density then is

S_x(f) = (P²/4)[δ(f − f_c) + δ(f + f_c)] + S_n(f)

where

S_n(f) = A   for f_c − B/2 < |f| < f_c + B/2, and zero otherwise   (12-59)

and is shown in Fig. 12-5a.
Next, let us consider the various terms which make up the output spectral density. We observed above that the noise terms at the square-law-device output resulting from the interaction of the input noise with itself (the n×n terms) are the same as the total-output-noise terms arising in the case of noise alone as an input. Thus, from Eqs. (12-29) and (12-32),

S_{n×n}(f) = 4a²A²B²δ(f) + 4a²A²(B − |f|)   for 0 < |f| ≤ B
S_{n×n}(f) = 2a²A²(B − ||f| − 2f_c|)   for 2f_c − B < |f| < 2f_c + B
S_{n×n}(f) = 0   otherwise   (12-60)

This spectral density is shown in Fig. 12-4b.
Equation (12-53) shows that the output-spectral-density terms resulting from the interaction of the input signal with itself (the s×s terms) consist of three impulses, one located at zero frequency and a pair located at ±2f_c. These terms are shown in Fig. 12-5b.
Fig. 12-5. Spectral densities for a full-wave square-law device in response to a sine wave plus gaussian noise input.
The output-spectral-density terms resulting from the interaction of the input signal with the input noise (the s×n terms) are shown by Eq. (12-51) to consist of an impulse at zero frequency plus a pair of terms resulting from the shifting of the input-noise spectral density by ±f_c. Hence

S_{s×n}(f) = 2a²P²ABδ(f) + 2a²P²A   for 0 < |f| ≤ B/2
S_{s×n}(f) = a²P²A   for 2f_c − B/2 < |f| < 2f_c + B/2
S_{s×n}(f) = 0   otherwise   (12-61)
This spectral density is shown in Fig. 12-5c. The total output spectral density is then given by the sum of the s×s, s×n, and n×n terms and is shown in Fig. 12-5d.
As in Art. 12-2, we can follow a full-wave square-law device by a low-pass zonal filter to obtain a full-wave square-law detector. The output spectral density of the square-law detector is given by those terms in Eqs. (12-53), (12-60), and (12-61) which are in the vicinity of zero frequency. Thus

S_z(f) = a²(P²/2 + 2AB)²δ(f)
       + 2a²P²A   for 0 < |f| ≤ B/2 (and zero otherwise)
       + 4a²A²(B − |f|)   for 0 < |f| < B (and zero otherwise)   (12-62)

The second term in this equation is the result of the interaction between the input signal and noise; the third is the result of the interaction of the input noise with itself. The relative importance of these two terms can be determined by comparing their total areas, σ²(z_{s×n}) and σ²(z_{n×n}), respectively. From Eq. (12-62),
σ²(z_{s×n}) = 2a²P²AB   and   σ²(z_{n×n}) = 4a²A²B²   (12-63)

Now the input signal-to-noise power ratio (i.e., the ratio of the input signal and noise variances) is

(S/N)_i = σ_s²/σ_n² = (P²/2)/2AB   (12-64)

Hence

σ²(z_{s×n})/σ²(z_{n×n}) = 2(S/N)_i   (12-65)

Thus, as the input signal-to-noise power ratio increases, the output noise becomes more and more due to the interaction of the input signal and noise and less and less due to the interaction of the input noise with itself.
Modulated Sine Wave Plus Gaussian Noise Input. In the preceding section, we assumed that the detector input signal was a pure sine wave. Let us now consider the input signal to be a randomly amplitude-modulated sine wave:

s(t) = P(t) cos(ω_c t + θ)   (12-66)

where θ is uniformly distributed over the interval 0 ≤ θ ≤ 2π and where P(t) is a sample function of a stationary real random process which is statistically independent of θ and of the input noise to the detector.
(The analysis which follows is also valid when P(t) is periodic but contains no frequencies commensurable with f_c = ω_c/2π.)
We shall assume, as before, that the input noise is a sample function of a stationary real gaussian process with zero mean. The n×n terms of the square-law-device output autocorrelation function and spectral density are therefore given again by Eqs. (12-46) and (12-47), respectively.
The autocorrelation function of the input signal is

R_s(τ) = E(P_t P_{t+τ}) E{cos(ω_c t + θ) cos[ω_c(t + τ) + θ]} = ½R_P(τ) cos ω_c τ   (12-67)

where R_P(τ) is the autocorrelation function of the input signal-modulating process. The spectral density of the input signal then is

S_s(f) = ¼ ∫_{−∞}^{+∞} S_P(f')[δ(f − f_c − f') + δ(f + f_c − f')] df' = ¼[S_P(f − f_c) + S_P(f + f_c)]   (12-68)

where S_P(f) is the spectral density of the input signal-modulating process. It therefore follows, from Eqs. (12-42c) and (12-67), that the s×n portion of the autocorrelation function of the square-law-device output is

R_{s×n}(τ) = 2a²R_P(τ)R_n(τ) cos ω_c τ + a²R_P(0)σ_n²   (12-69)

The corresponding spectral density is

S_{s×n}(f) = 2a² ∫_{−∞}^{+∞} R_P(τ)R_n(τ) cos 2πf_c τ exp(−j2πfτ) dτ + a²R_P(0)σ_n²δ(f)

The integral in this equation can be evaluated as one-half the sum of the Fourier transforms of the product R_P(τ)R_n(τ) evaluated at f − f_c and at f + f_c. Since the Fourier transform of R_P(τ)R_n(τ) is the convolution of the corresponding spectral densities, we get

S_{s×n}(f) = a² ∫_{−∞}^{+∞} S_n(f')[S_P(f − f_c − f') + S_P(f + f_c − f')] df' + a²R_P(0)σ_n²δ(f)   (12-70)

The s×s part of the autocorrelation function of the output of the square-law device is

R_{s×s}(τ) = a²E(P_t²P_{t+τ}²) E{cos²(ω_c t + θ) cos²[ω_c(t + τ) + θ]}
           = (a²/4)R_{P²}(τ) + (a²/8)R_{P²}(τ) cos 2ω_c τ   (12-71)

where R_{P²}(τ) is the autocorrelation function of the square of the input signal-modulating process. The s×s part of the spectral density of the output of the square-law device is therefore

S_{s×s}(f) = (a²/4)S_{P²}(f) + (a²/16)[S_{P²}(f − 2f_c) + S_{P²}(f + 2f_c)]   (12-72)

where S_{P²}(f) is the spectral density of the square of the input signal-modulating process.
Let us now compare these results with those obtained when the input sine wave is unmodulated. First of all, the n×n portions of the output spectral densities are the same in both cases. Next, a comparison of Eqs. (12-51) and (12-70) shows that the nonimpulsive part of the s×n portion of the unmodulated output spectral density is convolved with the spectral density of P(t) in the case of modulation. Finally, a comparison of Eqs. (12-53) and (12-72) shows that the impulses in the case of no modulation are replaced by terms containing the spectral density of P²(t) when there is modulation. The over-all effect of the modulation is therefore to cause a spreading in frequency of the s×n and s×s parts of the output spectral density.
Signal-to-noise Ratios. As the final topic in our study of the full-wave square-law detector, let us consider the relation between the output and input signal-to-noise power ratios.
From our expression for the s×s part of the autocorrelation function of the output of the square-law device, Eq. (12-71), we see that the signal power at the detector output is

S_o = (a²/4)R_{P²}(0) = (a²/4)E(P⁴)   (12-73)

We can express S_o in terms of the input signal power

S_i = R_s(0) = ½R_P(0) = ½E(P²)   (12-74)

as

S_o = a²k_P S_i²   (12-75)

where

k_P = E(P⁴)/E²(P²)   (12-76)

is a function only of the form of the probability distribution of the modulating signal. Since k_P remains constant when S_i is varied, e.g., when P(t) changes to αP(t), Eq. (12-75) shows that the output signal power varies as the square of the input signal power, hardly an unexpected result.
The noise power at the detector output is, from Eqs. (12-46) and (12-69),

N_o = ½[2a²R_n²(0)] + ½[2a²R_P(0)R_n(0)] = a²[σ_n⁴ + σ_n²E(P²)]   (12-77)

where the factors of ½ arise because half the noise power at the output of the square-law device is centered about zero frequency and half is centered about twice the carrier frequency.† This result can be expressed in terms of the input signal power S_i and the input noise power N_i = σ_n² as

N_o = a²N_i²(1 + 2S_i/N_i)   (12-78)

† Cf. Figs. 12-4 and 12-5.
Fig. 12-6. Output signal-to-noise power ratio versus input signal-to-noise power ratio for a full-wave square-law detector.

This result corresponds to that obtained previously in the case of an unmodulated sine wave, Eq. (12-65).
The ratio of the output signal and noise powers is therefore

S_o/N_o = k_P (S_i/N_i)² / (1 + 2S_i/N_i)   (12-79)

which is plotted in Fig. 12-6. When the input signal-to-noise power ratio is very large, we get, approximately,

S_o/N_o = (k_P/2)(S_i/N_i)   (12-80)

When the input signal-to-noise power ratio is very small, we get, approximately,

S_o/N_o = k_P (S_i/N_i)²   (12-81)

Thus the output signal-to-noise power ratio varies directly as the input signal-to-noise power ratio for large values of the latter and as the
Fig. 12-7. The half-wave linear detector (half-wave linear device followed by a low-pass filter).
square of the input signal-to-noise power ratio for small values of the latter. This result shows the small-signal-suppression effect of a detector. Although we obtained it here for the full-wave square-law detector, we shall show in Chap. 13 that it is a property of detectors in general.
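The relation of Eq. (12-79) and its two limiting forms can be tabulated directly; k_P = 1 below corresponds to an unmodulated sine wave, since a constant P(t) gives E(P⁴) = E²(P²).

```python
import numpy as np

# Output vs. input signal-to-noise power ratio, Eq. (12-79), with its weak-
# and strong-signal asymptotes, Eqs. (12-81) and (12-80).
kp = 1.0
snr_i = np.logspace(-3, 3, 7)                  # input S/N from 0.001 to 1000
snr_o = kp * snr_i**2 / (1 + 2 * snr_i)        # Eq. (12-79)

weak = kp * snr_i[0]**2                        # Eq. (12-81): varies as (Si/Ni)^2
strong = (kp / 2) * snr_i[-1]                  # Eq. (12-80): varies as Si/Ni
```

At the extremes of the tabulated range, the exact ratio is already within a fraction of a per cent of the corresponding asymptote, which is what Fig. 12-6 displays graphically.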
12-4. The Half-wave Linear Detector
For our second application of the direct method of analysis of nonlinear devices, let us study the half-wave linear detector.† This detector consists of a half-wave linear device with the transfer characteristic

y = bx   when x ≥ 0
y = 0   when x < 0   (12-82)

where b is a scaling constant, followed by a low-pass, or averaging, filter. Such a detector is shown schematically in Fig. 12-7; the half-wave linear transfer characteristic is shown in Fig. 12-8.

Fig. 12-8. The half-wave linear transfer characteristic.

The probability distribution function of the output of the half-wave linear device is

P(y_t ≤ Y_1) = 0   for Y_1 < 0
P(y_t ≤ Y_1) = P(x_t ≤ Y_1/b)   for Y_1 ≥ 0
as may be seen from Fig. 12-8. This equation can be expressed in terms of the input probability density function as

P(y_t ≤ Y_1) = 0   for Y_1 < 0
P(y_t ≤ Y_1) = P(x_t < 0) + ∫_0^{Y_1/b} p_1(x_t) dx_t   for Y_1 ≥ 0   (12-83)

By differentiating Eq. (12-83) with respect to Y_1, we get

p_2(y_t) = P(x_t < 0)δ(y_t) + (1/b)p_1(x_t = y_t/b)U(y_t)   (12-84)

where U(y_t) is the unit step function defined by Eq. (A1-6).
† Cf. Rice (I, Arts. 4.2 and 4.7) and Burgess (I).
The various moments of the output can be obtained by substituting Eq. (12-82) into Eq. (12-6), giving

E(y_t^n) = b^n ∫_0^∞ x_t^n p_1(x_t) dx_t   (12-85)

In order to proceed further, we generally need to specify the particular form of the input probability density function. However, when the input probability density function is an even function, we can express the even-order moments of the device output directly in terms of the even-order moments of the detector input. For, in this case,

∫_0^∞ x_t^{2n} p_1(x_t) dx_t = ½E(x_t^{2n})

Hence

E(y_t^{2n}) = (b^{2n}/2)E(x_t^{2n})   (12-86)

where p_1(x_t) = p_1(−x_t).
The autocorrelation function of the output of the half-wave linear device is, from Eqs. (12-7) and (12-82),

R_y(t_1,t_2) = b² ∫_0^∞ ∫_0^∞ x_1x_2 p(x_1,x_2) dx_1 dx_2   (12-87)

where x_1 = x_{t_1} and x_2 = x_{t_2}.
Gaussian Input. Suppose now that the detector input x(t) is a sample function of a real gaussian random process with zero mean. In this case, the probability density function of the detector input is given by Eq. (12-13) and the probability density function of the output of the half-wave linear device is, from Eq. (12-84),

p_2(y_t) = ½δ(y_t) + (1/√(2π) bσ_x) exp(−y_t²/2b²σ_x²)U(y_t)   (12-88)

The probability density function of η = y_t/bσ(x_t), along with that of ξ = x_t/σ(x_t), is plotted in Fig. 12-9.
Since p_1(x_t) is an even function here, the even-order moments of the output can be obtained most directly from Eq. (12-86). Thus, on using Eq. (8-11b),

E(y_t^{2n}) = (b^{2n}/2)[1·3·5 ··· (2n−1)]σ_x^{2n}   (12-89)

where n = 1, 2, .... The odd-order moments of the device output can be determined by substituting Eq. (12-13) into Eq. (12-85), giving
Fig. 12-9. Half-wave linear-device probability density functions.
E(y_t^{2m+1}) = b^{2m+1} ∫_0^∞ x_t^{2m+1} (1/√(2π)σ_x) exp(−x_t²/2σ_x²) dx_t

where m = 0, 1, 2, .... On defining z = x², this becomes

E(y_t^{2m+1}) = (b^{2m+1}/2√(2π)σ_x) ∫_0^∞ z^m exp(−z/2σ_x²) dz

Hence†

E(y_t^{2m+1}) = (2^m m!/√(2π)) b^{2m+1}σ_x^{2m+1}   (12-90)

In particular, the mean value of the output of the half-wave linear device is

E(y_t) = bσ(x_t)/√(2π)   (12-91)

and the variance is

σ²(y_t) = E(y_t²) − E²(y_t) = ½(1 − 1/π)b²σ²(x_t) = 0.3408 b²σ²(x_t)   (12-92)
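A Monte Carlo spot check of Eqs. (12-91) and (12-92) (the values of b and σ_x below are arbitrary):

```python
import numpy as np

# Half-wave linear device y = b*x for x >= 0, else 0, driven by zero-mean
# gaussian noise; sample mean and variance versus Eqs. (12-91)-(12-92).
rng = np.random.default_rng(5)
b, sigma = 2.0, 1.5
x = rng.normal(0.0, sigma, 1_000_000)
y = np.where(x >= 0.0, b * x, 0.0)

mean_theory = b * sigma / np.sqrt(2 * np.pi)            # Eq. (12-91)
var_theory = 0.5 * (1 - 1 / np.pi) * b**2 * sigma**2    # Eq. (12-92)
```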
The autocorrelation function of the output of the half-wave linear device is found by substituting for p(x_1,x_2) in Eq. (12-87) the joint probability density function of two gaussian random variables with zero means. When the input process is stationary, we get, using Eq. (8-22),

R_y(τ) = (b²/2πσ_x²[1 − ρ_x²(τ)]^{1/2}) ∫_0^∞ ∫_0^∞ x_1x_2 exp{−[x_1² + x_2² − 2ρ_x(τ)x_1x_2]/2σ_x²[1 − ρ_x²(τ)]} dx_1 dx_2

† See Dwight (I, Eq. 861.2).
where σ_x = σ(x_t) for all t and ρ_x(τ) = E(x_1x_2)/σ_x². In order to facilitate evaluation of the double integral in this equation, we introduce suitably normalized variables of integration. We may then use the known result†

∫_0^∞ dx ∫_0^∞ dy xy exp(−x² − y² − 2xy cos φ) = ¼ csc²φ (1 − φ cot φ)   (12-93)

when 0 ≤ φ ≤ π. Therefore, on defining cos φ = −ρ_x(τ) (since |ρ_x(τ)| ≤ 1), we obtain

R_y(τ) = (b²σ_x²/2π){[1 − ρ_x²(τ)]^{1/2} + ρ_x(τ) cos^{−1}[−ρ_x(τ)]}   (12-94)

as the autocorrelation function of the output of a half-wave linear device in response to a stationary real gaussian random process input.
In view of the practical difficulties involved in taking the Fourier transform of Eq. (12-94) to get the spectral density of the output of the half-wave linear device, it is convenient to express the output autocorrelation function in series form before taking its transform. The power-series expansion of the arc cosine is‡

cos^{−1}(−ρ) = π/2 + ρ + ρ³/(2·3) + 3ρ⁵/(2·4·5) + (3·5)ρ⁷/(2·4·6·7) + ···,   −1 ≤ ρ ≤ 1   (12-95)

The power-series expansion of [1 − ρ²]^{1/2} is§

[1 − ρ²]^{1/2} = 1 − ρ²/2 − ρ⁴/(2·4) − 3ρ⁶/(2·4·6) − ···,   −1 ≤ ρ ≤ 1   (12-96)

On substituting these expansions in Eq. (12-94) and collecting terms, we find that the first few terms of a series expansion of R_y(τ) are

R_y(τ) = (b²σ_x²/2π)[1 + (π/2)ρ_x(τ) + ½ρ_x²(τ) + (1/24)ρ_x⁴(τ) + ···]   (12-97)

† Rice (I, Eq. 3.5-4).
‡ Dwight (I, Eq. 502).
§ Dwight (I, Eq. 5.3).
Since |ρ_x(τ)| ≤ 1, the terms in ρ_x(τ) above the second power are relatively unimportant. To a reasonable approximation, then,

    R_y(τ) ≈ b²σ_x²/2π + (b²/4)R_x(τ) + (b²/4πσ_x²)R_x²(τ)        (12-98)
We know from Eq. (12-91) that the first term in the series expansion of R_y(τ) is the square of the mean of y_t. Therefore the rest of the series expansion evaluated at τ = 0 must give the variance. Using Eq. (12-98), we then have, approximately, for the variance of y_t

    σ_y² ≈ (b²σ_x²/4)(1 + 1/π) = 0.3295 b²σ_x²        (12-99)

A comparison of this result with the exact value given in Eq. (12-92) shows that the approximate form of the output autocorrelation function, Eq. (12-98), gives a value for the output variance which is in error by only about 3 per cent. Since the approximation afforded by Eq. (12-98) becomes better as R_x(τ) decreases, [R_y(τ) − m_y²] as calculated from Eq. (12-98) is never in error by more than about 3 per cent.
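The 3 per cent figure can be checked directly by comparing the approximation, Eq. (12-98), against the exact result, Eq. (12-94), over the whole range |ρ_x| ≤ 1. A short numerical sketch (not from the text; b = σ_x = 1 without loss of generality):

```python
import numpy as np

# Compare the exact output autocorrelation, Eq. (12-94), with the
# two-term approximation, Eq. (12-98), for b = sigma_x = 1.
rho = np.linspace(-1.0, 1.0, 2001)
R_exact = (np.sqrt(1 - rho**2) + rho * np.arccos(-rho)) / (2 * np.pi)
R_approx = 1 / (2 * np.pi) + rho / 4 + rho**2 / (4 * np.pi)

var_y = 0.5 * (1 - 1 / np.pi)      # exact output variance, Eq. (12-92)
worst = np.max(np.abs(R_exact - R_approx)) / var_y
print(worst)                        # largest error, relative to the variance
```

The worst case occurs at ρ_x = ±1 and comes out at about 3.3 per cent of the output variance, consistent with the claim in the text.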
An approximate expression for the spectral density of the output of the half-wave linear device can now be obtained by taking the Fourier transform of Eq. (12-98). Thus

    S_y(f) = (b²σ_x²/2π)δ(f) + (b²/4)S_x(f) + (b²/4πσ_x²) ∫_{−∞}^{+∞} S_x(f′)S_x(f − f′) df′        (12-100)
where S_x(f) is the spectral density of the input process. As in the study of the square-law detector, let us assume that the input process has the narrow-band rectangular spectrum

    S_x(f) = A  for f_c − B/2 < |f| < f_c + B/2, and 0 otherwise        (12-101)

In this case, σ_x² = 2AB and, from Eq. (12-100),
    S_y(f) = (b²AB/π)δ(f)
      + b²A/4  for f_c − B/2 < |f| < f_c + B/2, and 0 otherwise
      + (b²A/4π)(1 − |f|/B)  for 0 < |f| < B, and 0 otherwise
      + (b²A/8π)(1 − ||f| − 2f_c|/B)  for 2f_c − B < |f| < 2f_c + B, and 0 otherwise        (12-102)
Suppose now that the half-wave linear device is followed by a low-pass zonal filter. The spectral density of the filter output will be nonzero only in the region about zero frequency. Hence

    S_z(f) = (b²AB/π)δ(f) + (b²A/4π)(1 − |f|/B)  for 0 < |f| < B, and 0 otherwise        (12-103)

The various spectra for this case are shown in Fig. 12-10.
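The triangular bands of Eq. (12-102) can be reproduced by carrying out the convolution in Eq. (12-100) numerically. In the sketch below (not from the text; the values of A, B, f_c, and b are arbitrary), only the continuous part of the spectrum is computed:

```python
import numpy as np

# Numerical evaluation of the continuous part of Eq. (12-100) for the
# rectangular input band of Eq. (12-101).
A, B, fc, b = 1.0, 1.0, 3.0, 1.0
f = np.linspace(-8.0, 8.0, 8001)      # odd point count keeps conv aligned
df = f[1] - f[0]
Sx = np.where((np.abs(f) > fc - B / 2) & (np.abs(f) < fc + B / 2), A, 0.0)

sigma2 = 2 * A * B                    # input variance, per the text
S_cont = (b**2 / 4) * Sx \
    + b**2 / (4 * np.pi * sigma2) * np.convolve(Sx, Sx, "same") * df

peak0 = S_cont[np.argmin(np.abs(f))]             # low-frequency band peak
peak2fc = S_cont[np.argmin(np.abs(f - 2 * fc))]  # band at twice the carrier
print(peak0, b**2 * A / (4 * np.pi))
print(peak2fc, b**2 * A / (8 * np.pi))
```

The computed peak heights reproduce b²A/4π at zero frequency and b²A/8π at 2f_c, i.e., the low-frequency band has twice the peak height of the band at twice the carrier, as in Fig. 12-10.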
It is interesting to compare the above results for the half-wave linear detector with those obtained previously for the full-wave square-law detector.

FIG. 12-10. Spectral densities for a half-wave linear detector in response to a narrow-band gaussian input: (a) input; (b) half-wave linear-device output; (c) low-pass filter output.

From a comparison of Figs. 12-4 and 12-10, it may be seen
that the spectra for both detectors have the same shape in the bands centered on zero frequency and on twice the center frequency of the input spectrum. The area of each band, however, is proportional to the square of the input variance in the case of the square-law detector and to the variance itself for the linear detector. A further difference is that the output of the half-wave linear device has an additional noise band
coincident in frequency with the input band. These results are based, of course, on the validity of the approximate relation, Eq. (12-98), for the autocorrelation function of the output of the half-wave linear device.
We have gone here about as far with the analysis of the half-wave linear detector as it is convenient to go with the direct method of analysis. A study of the behavior of this detector in response to an input consisting of a signal plus a noise will be deferred to the next chapter.
12-6. Problems
1. Let the input to a full-wave square-law device whose transfer characteristic is y = ax² be a sample function x(t) of a stationary real gaussian random process with mean m and variance σ².
a. Determine the probability density function of y.
b. Determine the mean and variance of y.
c. Plot the result of a for the case in which m = 3σ.
FIG. 12-11. A square-law detector: input e₀(t), full-wave square-law device (e₂ = ae₁²), and low-pass filter.
2. Consider the system shown schematically in Fig. 12-11. Let the input e₀(t) be a sample function of a stationary real gaussian random process with zero mean and spectral density S₀(f) = A ≫ 2kTR.
a. Determine the autocorrelation function of e₁.
b. Determine the spectral density of e₂.
c. Sketch the autocorrelation functions and spectral densities of e₀, e₁, and e₂.
3.† Consider the system shown schematically in Fig. 12-11 and let the input be as in Prob. 2. Let the unit impulse response of the low-pass filter be

    h(t,T) = ae^{−at}  for 0 ≤ t ≤ T, and 0 otherwise        (12-104)

where a is a constant.
a. Determine the mean m₃ of e₃.
b. Determine the variance σ₃² of e₃.
c. Determine the ratio (σ₃/m₃) and its limiting value as T → ∞.
4. Consider the system shown schematically in Fig. 12-11 and let the input be as in Prob. 2. Let the low-pass filter be an integrating filter with the unit impulse response

    h(t,T) = 1/T  for 0 ≤ t ≤ T, and 0 otherwise        (12-105)

a. Determine m₃.
b. Determine σ₃.
c. Determine (σ₃/m₃) and its limiting value as T → ∞.

† Cf. Probs. 9 and 10 of Chap. 9.
5. Let the input to a full-wave square-law device be of the form

    x(t) = P(1 + m cos ω_m t) cos ω_c t + n(t)        (12-106)

where P and m are constants, where 0 < m < 1 and ω_m ≪ ω_c, and where n(t) is a sample function of a stationary real gaussian random process with zero mean. The spectral density of the noise has a constant value N₀ for ω_c − 2ω_m ≤ |ω| ≤ ω_c + 2ω_m and is zero elsewhere.
a. Determine the output spectral density and sketch.
The desired output signal is the sinusoidal wave of angular frequency ω_m.
b. Determine the power ratio of the output low-frequency distortion to the desired output.
c. Determine the power ratio of the desired output signal to the entire low-frequency noise.
FIG. 12-12. A synchronous detector: the input s_i(t) = s(t) + n(t) and the local-oscillator output L(t) feed a multiplier, which is followed by a low-pass filter.
6. Consider the synchronous detector shown schematically in Fig. 12-12. Let the input signal be a randomly modulated sine wave of the form

    s(t) = P(t) cos ω_c t        (12-107)

where P(t) is a sample function of a stationary real random process with the spectral density, where f_s ≪ f_c,

    S_P(f) = S₀  for 0 ≤ |f| ≤ f_s, and 0 otherwise        (12-108)

and which is statistically independent of the input noise process. Let the local-oscillator output be

    L(t) = A cos ω_c t        (12-109)

where A is a constant. Further, let the output filter be a low-pass zonal filter which passes all frequencies 0 ≤ f ≤ f_s unaltered and which attenuates completely all other frequencies.
a. Suppose that the input noise n(t) is a sample function of a stationary real random process with zero mean and spectral density

    S_n(f) = N₀  for f_c − f_s ≤ |f| ≤ f_c + f_s, and 0 otherwise        (12-110)

Determine the signal-to-noise power ratio at the output of the low-pass filter in terms of the input signal-to-noise power ratio.
b. Let the input noise be of the form

    n(t) = n_c(t) cos ω_c t        (12-111)

where n_c(t) is a sample function of a stationary real random process with zero mean and spectral density

    S_{n_c}(f) = N₀  for 0 ≤ |f| ≤ f_s, and 0 otherwise        (12-112)

Determine the signal-to-noise power ratio at the output of the low-pass filter in terms of the input signal-to-noise power ratio.
7. Consider the synchronous detector shown schematically in Fig. 12-12. Let the input noise n(t) be a sample function of a stationary real gaussian random process with zero mean and spectral density

    S_n(f) = exp[−(f − f_c)²/2f_T²] + exp[−(f + f_c)²/2f_T²]        (12-113)

where f_T ≪ f_c. Further, let the input signal be of the form

    s(t) = P cos(ω_c t + θ)        (12-114)

where P is a constant and θ is a uniformly distributed (over 0 ≤ θ < 2π) random variable which is independent of the input noise process, and let the local-oscillator output be of the form

    L(t) = A cos ω_c t

where A is a constant.
a. Determine the spectral density of the multiplier output.
b. Sketch the spectral densities of the input signal plus noise, and of the multiplier output.
8. Let the low-pass output filter in Prob. 7 be a simple RC network with a half-power bandwidth ≪ f_T. Derive an expression for the signal-to-noise power ratio at the filter output in terms of the filter half-power bandwidth and of the input signal-to-noise power ratio.
FIG. 12-13. A receiver: the input x(t) and a local oscillator feed a mixer; the mixer output M(t) passes through a band-pass filter, a full-wave square-law device, and a low-pass filter.
9. Consider the receiver shown schematically in Fig. 12-13. Let the input be of the form

    x(t) = P(t) cos ω_x t        (12-115)

where P(t) is a sample function of a stationary real random process with zero mean and spectral density

    S_P(f) = S₀  for 0 ≤ |f| ≤ 0.05f_x, and 0 otherwise        (12-116)
The output of the mixer is

    M(t) = [1 + x(t)]g(t)        (12-117)

where g(t) is the periodic function of time shown in Fig. 12-14, with ω₀ as indicated in that figure. The band-pass filter has the system function

    H(f) = 1  for 0.1f_x ≤ |f| ≤ 0.4f_x, and 0 otherwise        (12-118)

and the low-pass filter has the system function

    H(f) = 1  for 0 ≤ |f| ≤ 0.15f_x, and 0 otherwise        (12-119)

a. Determine the spectral density of the mixer output in terms of the input spectral density and sketch.
b. Determine the spectral density of the low-pass filter output and sketch.
c. Determine the mean and variance of the low-pass filter output.
FIG. 12-14. The periodic function g(t).

FIG. 12-15. The limiter transfer characteristic.
10. The transfer characteristic of a given symmetrical limiter is shown in Fig. 12-15.
a. Derive an expression for the probability density function of the limiter output in terms of the probability density function of the limiter input.
b. Evaluate a when the input random process is a stationary real gaussian random process with a zero mean.
c. Determine the mean and variance of the limiter output for part b.
d. If the input probability density function is even about a mean of zero but otherwise arbitrary, determine the probability density function of the limiter output for the case of the ideal limiter (i.e., y₀ > 0 but x₀ = 0).
e. Evaluate the mean and variance for the limiter output in part d.
CHAPTER 13
NONLINEAR DEVICES: THE TRANSFORM METHOD
When the direct method of analysis of nonlinear devices is applied to problems other than those discussed in the preceding chapter, analytic difficulties often arise in the evaluation of the various integrals involved. In many interesting cases, such difficulties can be avoided through the use of the transform method of analysis, which we shall discuss in this chapter.† We shall first introduce the Fourier (or Laplace) transform of the transfer characteristic of a nonlinear device; then we shall apply it to the determination of the autocorrelation function of the device output and, finally, study the specific case of a νth-law detector. As in Chap. 12, we shall restrict our attention to those nonlinear devices for which the device output at time t can be expressed as some single-valued function of the device input at time t.
13-1. The Transfer Function
Let y = g(x) be the transfer characteristic of the nonlinear device in question. If g(x) and its derivative are sectionally continuous and if g(x) is absolutely integrable, i.e., if

    ∫_{−∞}^{+∞} |g(x)| dx < +∞        (13-1)

then the Fourier transform f(jv) of the transfer characteristic exists, where

    f(jv) = ∫_{−∞}^{+∞} g(x)e^{−jvx} dx        (13-2)

and the output of the nonlinear device may be expressed in terms of its input by the inverse Fourier transformation‡

    y = g(x) = (1/2π) ∫_{−∞}^{+∞} f(jv)e^{jvx} dv        (13-3)

We shall call f(jv) the transfer function of the nonlinear device. The representation of a nonlinear device by its transfer function was introduced by Bennett and Rice† and was apparently first applied to the study of noise in nonlinear devices, more or less simultaneously, by Bennett,‡ Rice,§ and Middleton.¶

† Cf. Middleton (II) and Rice (I, Part 4).
‡ See, for example, Churchill (I, p. 89ff.).
In many interesting cases (e.g., the half-wave linear device), the transfer characteristic is not absolutely integrable, its Fourier transform does not exist, and Eq. (13-3) cannot be used. However, the definition of the transfer function may often be extended to cover such cases, and results similar to Eq. (13-3) can be obtained. For example,†† suppose that g(x) is zero for x < 0, that it and its derivative are sectionally continuous, and that g(x) is of exponential order as x → +∞; i.e.,

    |g(x)| ≤ M₁e^{u₁x}  for x ≥ 0        (13-4)

where M₁ and u₁ are constants. The function

    g(x)e^{−u′x}  for x ≥ 0, and 0 for x < 0

where u′ > u₁, then, is absolutely integrable, since |g(x)e^{−u′x}| ≤ M₁e^{−(u′−u₁)x}, and its Fourier transform does exist. We can therefore write, from Eq. (13-3),

    g(x)e^{−u′x} = (1/2π) ∫_{−∞}^{+∞} e^{jvx} dv ∫₀^∞ g(ξ)e^{−(u′+jv)ξ} dξ

and hence

    g(x) = (1/2π) ∫_{−∞}^{+∞} e^{(u′+jv)x} dv ∫₀^∞ g(ξ)e^{−(u′+jv)ξ} dξ

If now we introduce the complex variable w = u + jv, the integral with respect to the real variable v may be replaced by a contour integral along the line w = u′ + jv in the w plane, and we can write

    g(x) = (1/2πj) ∫_{u′−j∞}^{u′+j∞} f(w)e^{wx} dw        (13-5)

where u′ > u₁ and

    f(w) = ∫₀^∞ g(x)e^{−wx} dx        (13-6)
The transfer function of the nonlinear device in this case may therefore be defined as the unilateral Laplace transform,† Eq. (13-6), of the transfer characteristic, and the device output can be obtained from the inverse Laplace transformation, Eq. (13-5).

† Bennett and Rice (I).
‡ Bennett (I).
§ Rice (I, Art. 4.8).
¶ Middleton (II) and (III).
†† Cf. Section 5-7 of Churchill (II).
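As a purely illustrative check of this inversion formula, take the half-wave linear characteristic g(x) = x for x > 0, whose unilateral Laplace transform by Eq. (13-6) is f(w) = 1/w². Evaluating the contour integral of Eq. (13-5) numerically along a line u′ > 0 recovers g(x); the contour placement u′ = 1 and the truncation limit below are arbitrary numerical choices, not from the text:

```python
import numpy as np

# Numerical inverse Laplace transform, Eq. (13-5), for g(x) = x (x > 0),
# f(w) = 1/w**2, along the contour w = u' + jv with u' = 1.
u = 1.0
v = np.linspace(-2000.0, 2000.0, 400001)
dv = v[1] - v[0]
w = u + 1j * v

def g(x):
    # (1/2πj) ∫ f(w) e^{wx} dw  with dw = j dv  →  (1/2π) ∫ f(w) e^{wx} dv
    return np.sum(np.exp(w * x) / w**2).real * dv / (2 * np.pi)

print(g(1.5))    # ≈ 1.5, the value of g on the positive axis
print(g(-1.0))   # ≈ 0, since g vanishes for x < 0
```

For x < 0 the contour can be closed to the right of the line u′, where f(w) is analytic, which is why the integral vanishes there.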
In many problems, the transfer characteristic does not vanish over a semi-infinite interval, and the above transform relations cannot be used. It often happens, however, that the transfer characteristic does satisfy the usual continuity conditions and is of exponential order for both positive and negative values of x; i.e.,

    |g(x)| ≤ M₂e^{u₂x}  for x > 0
and
    |g(x)| ≤ M₃e^{u₃x}  for x < 0

where M₂, u₂, M₃, and u₃ are constants. In such a case, let us define the half-wave transfer characteristics‡

    g₊(x) = g(x) when x > 0, and 0 when x ≤ 0        (13-7a)

and

    g₋(x) = 0 when x ≥ 0, and g(x) when x < 0        (13-7b)

Then

    g(x) = g₊(x) + g₋(x)        (13-7c)
Now g₊(x) and g₋(x) do satisfy the conditions of the preceding paragraph and hence do have unilateral Laplace transforms, say f₊(w) and f₋(w), respectively, where

    f₊(w) = ∫₀^∞ g₊(x)e^{−wx} dx        (13-8a)

which converges for u > u₂, and

    f₋(w) = ∫_{−∞}^0 g₋(x)e^{−wx} dx        (13-8b)

which converges for u < u₃. The transfer function of the given device may then be thought of as the pair f₊(w) and f₋(w). Since g₊(x) can be obtained from f₊(w) by a contour integral of the type in Eq. (13-5) and g₋(x) can similarly be obtained from f₋(w), it follows from Eq. (13-7c) that the complete transfer characteristic is given by the equation

    g(x) = (1/2πj) ∫_{C₊} f₊(w)e^{wx} dw + (1/2πj) ∫_{C₋} f₋(w)e^{wx} dw        (13-9)

where the contour C₊ may be taken to be the line w = u′ + jv, u′ > u₂, and C₋ to be the line w = u″ + jv, u″ < u₃.
If in the above u₂ < u₃, f₊(w) and f₋(w) have overlapping regions of

† Churchill (II, Chap. VI).
‡ Cf. Titchmarsh (II, Art. 1.3).
280 RANDOM SIGNALS AND NOISE
convergence in the tD plane and we can accordingly define the tranafer
function of the device to be the bilateral Laplace transform,t
f
+
/(tD) == J+(tD) + == . _. g(%)e dz (1310)
which converges for 'UI < U < 'UI. In this case, the inversion contours
may both be taken to lie in the overlap region, and hence can be the same.
13-2. νth-Law Devices
An important class of nonlinear devices is that based on the half-wave νth-law transfer characteristic

    γ(x) = ax^ν when x > 0, and 0 when x ≤ 0        (13-11)

where a is a scaling constant and ν is some nonnegative real number. Particular examples are the half-wave νth-law device with the transfer characteristic g(x) = γ(x); the full-wave (even) νth-law device with the transfer characteristic

    g_e(x) = γ(x) + γ(−x) = ax^ν when x > 0, 0 when x = 0, and a(−x)^ν when x < 0        (13-12)

and the full-wave (odd) νth-law device with the transfer characteristic

    g_o(x) = γ(x) − γ(−x) = ax^ν when x > 0, 0 when x = 0, and −a(−x)^ν when x < 0        (13-13)

Plots of several such transfer characteristics are shown in Fig. 13-1. The half-wave and full-wave (even) νth-law devices are often used in detectors, and the full-wave (odd) νth-law device occurs in some nonlinear amplifiers.
The Laplace transform of the half-wave νth-law transfer characteristic is

    φ(w) = a ∫₀^∞ x^ν e^{−wx} dx

The integral on the right converges only when ℜ(w) > 0. The corresponding inverse transformation contour C₊ must therefore lie to the right of the w = jv axis. Letting t = wx,

    φ(w) = (a/w^{ν+1}) ∫₀^∞ t^ν e^{−t} dt

Hence

    φ(w) = aΓ(ν + 1)/w^{ν+1}        (13-14)

FIG. 13-1. νth-law transfer characteristics: (a) half-wave; (b) full-wave (even); (c) full-wave (odd).

† Cf. van der Pol and Bremmer (I, Art. II.6).
where Γ(z) is the gamma function† defined by the integral

    Γ(z) = ∫₀^∞ e^{−t} t^{z−1} dt        (13-15)

where ℜ(z) > 0. It then follows that the transfer function of the half-wave νth-law device is

    f(w) = φ(w) = aΓ(ν + 1)/w^{ν+1}        (13-16)

Since

    a ∫_{−∞}^0 (−x)^ν e^{−wx} dx = a ∫₀^∞ t^ν e^{wt} dt = φ(−w)

the transfer-function pair of the full-wave (even) νth-law device is

    f_{e+}(w) = φ(w)  and  f_{e−}(w) = φ(−w)        (13-17)

and that of the full-wave (odd) νth-law device is

    f_{o+}(w) = φ(w)  and  f_{o−}(w) = −φ(−w)        (13-18)

The integral for φ(−w) converges only when ℜ(w) < 0; the corresponding inverse transformation contour C₋ must therefore lie to the left of the w = jv axis. We shall subsequently choose C₊ to be the line w = ε + jv and C₋ to be the line w = −ε + jv, where ε > 0 and −∞ < v < +∞, as shown in Fig. 13-2.
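The transfer-function formula can be verified numerically for a non-integer exponent. The sketch below (not from the text; the constants a, ν, and the real value of w are arbitrary) checks Eq. (13-14) by direct quadrature:

```python
import numpy as np
from math import gamma

# Quadrature check of φ(w) = a Γ(ν+1)/w^{ν+1}, Eq. (13-14),
# for ν = 1/2 and a real w > 0.
a, nu, w = 2.0, 0.5, 1.7
x = np.linspace(0.0, 60.0, 600001)
dx = x[1] - x[0]
phi_num = a * np.sum(x**nu * np.exp(-w * x)) * dx
phi_exact = a * gamma(nu + 1) / w**(nu + 1)
print(phi_num, phi_exact)
```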
Narrow-band Inputs. Suppose that the input to a half-wave νth-law device is a narrow-band wave of the form

    x(t) = V(t) cos θ(t) = V(t) cos[ω_c t + φ(t)]        (13-19)

where V(t) ≥ 0. The output of that device can then be obtained by substituting in Eq. (13-5) for x(t) from Eq. (13-19) and for f(w) from Eq. (13-16). Thus

    y = (1/2πj) ∫_{C₊} f(w)e^{wV cos θ} dw = [aΓ(ν + 1)/2πj] ∫_{ε−j∞}^{ε+j∞} e^{wV cos θ} dw/w^{ν+1}

FIG. 13-2. νth-law inversion contours.
The exponential can be expanded using the Jacobi-Anger formula†

    e^{z cos θ} = Σ_{m=0}^∞ ε_m I_m(z) cos mθ        (13-20)

where ε_m is the Neumann factor (ε₀ = 1; ε_m = 2 for m ≥ 1) and where the modified Bessel function of the first kind is

    I_m(z) = (z/2)^m Σ_{n=0}^∞ (z/2)^{2n}/[n! Γ(m + n + 1)]        (13-21)

† Cf. Watson (I, Art. 13.24).
‡ Watson (I, Art. 3.7) or Magnus and Oberhettinger (I, Chap. 3, Art. 1).
FIG. 13-3. Contour of integration.
the Bessel function I_m(z) varies as z^m for small values of z. If, then, we assume for the moment that m > ν + 1, the singularity of I_m(ζ)/ζ^{ν+1} at the origin vanishes, and that function becomes analytic inside of, and on, the contour of Fig. 13-3. It therefore follows from the Cauchy theorem† that

    I₁ + I₂ + I₃ + I₄ = 0        (13-28)

Consider now the integrals I₂ and I₄. An asymptotic expansion of I_m(z) for large z is‡

    I_m(z) = [e^z/(2πz)^{1/2}] Σ_{n=0}^∞ (−1)^n Γ(m + n + ½)/[n! Γ(m − n + ½)(2z)^n]

For large values of |z|, then,

    I_m(z) ≈ e^z/(2πz)^{1/2}

approximately. Hence, for large values of β,

    I₂ ≈ (1/√(2π)) ∫₀^ε e^ξ dξ/(ξ + jβ)^{ν+3/2}        (13-29)

and, when ν + 3/2 > 0,

    |I₂| ≤ (1/√(2π)) εe^ε/β^{ν+3/2} → 0  as β → ∞        (13-30)

† Churchill (II, Sec. 48) or Titchmarsh (I, Art. 2.33).
‡ Watson (I, Sec. 7.23).
It therefore follows that I₂ → 0, and similarly that I₄ → 0, as β → ∞, and hence that

    I = ∫_{−j∞}^{+j∞} I_m(ζ) dζ/ζ^{ν+1}        (13-31)
On the imaginary axis, ζ = jη and dζ = j dη. Hence

    I = j^{(m−ν)} ∫_{−∞}^{+∞} J_m(η) dη/η^{ν+1}        (13-32)

since I_m(jη) = j^m J_m(η). Now, setting η = −t,

    ∫_{−∞}^0 J_m(η) dη/η^{ν+1} = −(−1)^{(m−ν)} ∫₀^∞ J_m(t) dt/t^{ν+1}

Therefore

    I = j^{(m−ν)}[1 − (−1)^{(m−ν)}] ∫₀^∞ J_m(t) dt/t^{ν+1} = 2j sin[(m − ν)π/2] ∫₀^∞ J_m(t) dt/t^{ν+1}        (13-33)
The integral in this equation is Weber's infinite integral† and is

    ∫₀^∞ J_m(t) dt/t^{ν+1} = Γ[(m − ν)/2]/{2^{ν+1}Γ[1 + (m + ν)/2]}        (13-34)

under the conditions that ν + 3/2 > 0 and m > ν (which are met by our assumptions so far). Since‡

    Γ(z)Γ(1 − z) = π/sin πz        (13-35)

it follows from Eqs. (13-22), (13-33), (13-34), and (13-35) that

    C(ν,m) = ε_m aΓ(ν + 1)/{2^{ν+1}Γ[1 − (m − ν)/2]Γ[1 + (m + ν)/2]}        (13-36)

when m > ν + 1 and ν + 3/2 > 0. Since, for fixed ν, C(ν,m)/ε_m is a single-valued analytic function of m for all m, the theory of analytic continuation permits us to remove the restriction m > ν + 1. The

† Watson (I, Art. 13.24).
‡ Magnus and Oberhettinger (I, Chap. 1, Art. 1).
§ Churchill (II, Sec. 50) or Titchmarsh (I, Arts. 4.1 through 4.4).
remaining condition ν + 3/2 > 0 is automatically satisfied by our original assumption that ν is nonnegative.
Since |Γ(−n)| = ∞ for n = 0, 1, 2, . . . , it follows from Eq. (13-36) that the coefficients C(ν,m) are zero whenever (m − ν)/2 is a positive integer. This occurs, for example, when m is even and greater than ν if ν is an even integer, and when m is odd and greater than ν if ν is an odd integer. It therefore follows from Eq. (13-24) that if ν is an even integer, the output harmonics of a full-wave (even) νth-law device vanish whenever m > ν; similarly it follows from Eq. (13-25) that if ν is an odd integer, the output harmonics of a full-wave (odd) νth-law device vanish whenever m > ν.
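Equation (13-36) is easy to tabulate. The sketch below (not from the text) implements it with math.gamma, using the poles of the gamma function to return the vanishing coefficients, and reproduces the special values quoted below for the linear detector, the square-law detector, and the ideal limiter:

```python
from math import gamma, pi

def C(nu, m, a=1.0):
    """Coefficients of Eq. (13-36); eps_m is the Neumann factor."""
    eps_m = 1 if m == 0 else 2
    try:
        denom = 2**(nu + 1) * gamma(1 - (m - nu) / 2) * gamma(1 + (m + nu) / 2)
    except ValueError:
        # math.gamma raises at its poles 0, -1, -2, ...:
        # C(nu, m) = 0 whenever (m - nu)/2 is a positive integer.
        return 0.0
    return eps_m * a * gamma(nu + 1) / denom

print(C(1, 0), 1 / pi)      # half-wave linear detector: C(1,0) = a/π
print(C(2, 0))              # C(2,0) = a/4, so 2C(2,0) = a/2
print(C(0, 1), 2 / pi)      # ideal limiter: 2C(0,1) = 4a/π
print(C(2, 4), C(1, 3))     # vanishing harmonics, (m − ν)/2 a positive integer
```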
νth-Law Detectors and Nonlinear Amplifiers. If a half-wave or full-wave (even) νth-law device is followed by a low-pass zonal filter, a νth-law detector is formed. The output of the half-wave νth-law detector is, from Eq. (13-22),

    z(t) = C(ν,0)V^ν(t)        (13-37a)

and the output of a full-wave νth-law detector is, from Eq. (13-24),

    z_e(t) = 2C(ν,0)V^ν(t)        (13-37b)

where, from Eq. (13-36),

    C(ν,0) = aΓ(ν + 1)/[2^{ν+1}Γ²(1 + ν/2)]        (13-38)

In particular, the output of a half-wave linear detector is

    z₁(t) = C(1,0)V(t) = (a/π)V(t)

and the output of a full-wave square-law detector is†

    z₂(t) = 2C(2,0)V²(t) = (a/2)V²(t)

The half-wave linear detector is thus an envelope detector.
If a full-wave (odd) νth-law device is followed by a band-pass zonal filter centered on f_c = ω_c/2π (i.e., a filter whose system function is unity over the frequency interval about f_c and zero elsewhere), a νth-law nonlinear amplifier is formed. The output of such an amplifier is, from Eq. (13-25),

    z_o(t) = 2C(ν,1)V^ν(t) cos[ω_c t + φ(t)]        (13-39a)

where, from Eq. (13-36),

    C(ν,1) = aΓ(ν + 1)/{2^ν Γ[(ν + 1)/2]Γ[(ν + 3)/2]}        (13-39b)

† Cf. Art. 12-2, Eq. (12-17).
When ν = 0, the output of the full-wave (odd) νth-law device can assume only the values ±a, as shown by Fig. 13-1c. The device is then called an ideal limiter, and the cascade of such a device with a band-pass zonal filter, centered on the input carrier frequency, is called an ideal band-pass limiter. When the input to an ideal band-pass limiter is a narrow-band wave, as in Eq. (13-19), the output is, from Eq. (13-39),

    z_o(t) = 2C(0,1) cos[ω_c t + φ(t)] = (4a/π) cos[ω_c t + φ(t)]

Hence, when the input to an ideal band-pass limiter is a narrow-band wave, the output is a purely phase-modulated wave; the phase modulation of the output is identical to that of the input.
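The 4a/π amplitude is just the first Fourier coefficient of a square wave, and a quick numerical check (not from the text) confirms it:

```python
import numpy as np

# Ideal limiter (ν = 0 full-wave odd device) driven by cos t:
# the resulting ±a square wave has fundamental amplitude 4a/π.
a = 1.0
t = np.linspace(0.0, 2 * np.pi, 200000, endpoint=False)
z = np.where(np.cos(t) > 0, a, -a)       # limiter output over one period
c1 = 2 * np.mean(z * np.cos(t))          # first-harmonic cosine coefficient
print(c1, 4 * a / np.pi)
```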
13-3. The Output Autocorrelation Function and Spectral Density
The autocorrelation function of the output of a nonlinear device can be stated as follows in terms of the transfer function of that device, from Eq. (13-5):†

    R_y(t₁,t₂) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} g(x₁)g(x₂)p(x₁,x₂) dx₁ dx₂
     = [1/(2πj)²] ∫_C f(w₁) dw₁ ∫_C f(w₂) dw₂ ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} p(x₁,x₂) exp(w₁x₁ + w₂x₂) dx₁ dx₂

The double integral with respect to x₁ and x₂ is, from Eq. (4-25), the joint characteristic function of x₁ and x₂ expressed as a function of the complex variables w₁ and w₂. Hence

    R_y(t₁,t₂) = [1/(2πj)²] ∫_C f(w₁) dw₁ ∫_C f(w₂)M_x(w₁,w₂) dw₂        (13-40)

Equation (13-40) is the fundamental equation of the transform method of analysis of nonlinear devices in response to random inputs. The remainder of this chapter is concerned with the evaluation of this expression for various specific devices and inputs.
In many problems, the device input is the sum of a signal and a noise:

    x(t) = s(t) + n(t)        (13-41)

† We assume here for compactness that the transfer function may in fact be expressed either as a unilateral Laplace transform of the transfer characteristic, as in Eq. (13-6), or as a bilateral Laplace transform, as in Eq. (13-10). For those cases in which the transfer function must be expressed as a transform pair, as in Eq. (13-8), each inversion contour in this equation must be replaced by a pair of contours, as in Eq. (13-9).
where s(t) and n(t) are sample functions of statistically independent random processes. In these cases the input characteristic function factors, and Eq. (13-40) becomes

    R_y(t₁,t₂) = [1/(2πj)²] ∫_C f(w₁) dw₁ ∫_C f(w₂)M_s(w₁,w₂)M_n(w₁,w₂) dw₂        (13-42)

where M_s(w₁,w₂) is the joint characteristic function of s₁ and s₂, and M_n(w₁,w₂) is the joint characteristic function of n₁ and n₂.
Gaussian Input Noise. When the input noise is a sample function of a real gaussian random process with zero mean, we have, from Eq. (8-23),

    M_n(w₁,w₂) = exp{½[σ₁²w₁² + 2R_n(t₁,t₂)w₁w₂ + σ₂²w₂²]}        (13-43)

where σ₁ = σ(n₁), σ₂ = σ(n₂), and R_n(t₁,t₂) = E(n₁n₂). The output autocorrelation function then becomes

    R_y(t₁,t₂) = [1/(2πj)²] ∫_C f(w₁) exp(σ₁²w₁²/2) dw₁ ∫_C f(w₂) exp(σ₂²w₂²/2) exp[R_n(t₁,t₂)w₁w₂]M_s(w₁,w₂) dw₂

If now exp[R_n(t₁,t₂)w₁w₂] and M_s(w₁,w₂) could each be factored as a product of a function of w₁ times a function of w₂, or as a sum of such products, then the double integral in the above equation could be evaluated as a product of integrals. That the exponential term can be so factored may be seen by expanding it in a power series:

    exp[R_n(t₁,t₂)w₁w₂] = Σ_{k=0}^∞ [R_n^k(t₁,t₂)/k!] w₁^k w₂^k        (13-44)
The autocorrelation function of the output of the nonlinear device can therefore be written as

    R_y(t₁,t₂) = Σ_{k=0}^∞ [R_n^k(t₁,t₂)/k!(2πj)²] ∫_C f(w₁)w₁^k exp(σ₁²w₁²/2) dw₁ ∫_C f(w₂)w₂^k exp(σ₂²w₂²/2) M_s(w₁,w₂) dw₂        (13-45)

when the input noise is gaussian. In order to proceed further, the characteristic function of the input signal must be specified.
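Before specializing the signal, it is worth verifying the gaussian characteristic function, Eq. (13-43), on which this factorization rests. A Monte-Carlo sketch (not from the text; real w₁, w₂ and arbitrary second moments are assumed for the check):

```python
import numpy as np

# Monte-Carlo check of Eq. (13-43) for real w1, w2.
rng = np.random.default_rng(1)
sigma1 = sigma2 = 1.0
Rn = 0.6                                  # covariance E(n1 n2), assumed value
cov = [[sigma1**2, Rn], [Rn, sigma2**2]]
n = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000)
w1, w2 = 0.3, -0.5
mc = np.mean(np.exp(w1 * n[:, 0] + w2 * n[:, 1]))
exact = np.exp(0.5 * (sigma1**2 * w1**2 + 2 * Rn * w1 * w2 + sigma2**2 * w2**2))
print(mc, exact)
```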
Sine-wave Signals. Suppose next that the input signal is an amplitude-modulated sine wave, i.e., that

    s(t) = P(t) cos θ(t) = P(t) cos(ω_c t + φ)        (13-46)

where P(t) is a sample function of a low-frequency random process (i.e., one whose spectral density is nonzero only in a region about zero frequency which is narrow compared to f_c) and where the random variable φ is uniformly distributed over the interval 0 ≤ φ < 2π and independent of both the input modulation and noise. The characteristic function of such a signal is

    M_s(w₁,w₂) = E[exp(w₁P₁ cos θ₁ + w₂P₂ cos θ₂)]

The exponential can be expanded using the Jacobi-Anger formula, Eq. (13-20), to give

    M_s(w₁,w₂) = Σ_{m=0}^∞ Σ_{n=0}^∞ ε_m ε_n E[I_m(w₁P₁)I_n(w₂P₂)]E(cos mθ₁ cos nθ₂)

Since

    E(cos mθ₁ cos nθ₂) = E[cos m(ω_c t₁ + φ) cos n(ω_c t₂ + φ)] = 0 when n ≠ m, and (1/ε_m) cos mω_cτ when n = m

where τ = t₁ − t₂, it follows that

    M_s(w₁,w₂) = Σ_{m=0}^∞ ε_m E[I_m(w₁P₁)I_m(w₂P₂)] cos mω_cτ        (13-47)

when the input signal is an amplitude-modulated sine wave.
The autocorrelation function of the output of a nonlinear device in response to a sine-wave signal and a gaussian noise may now be obtained by substituting Eq. (13-47) into Eq. (13-45). Thus, if we define the function

    h_{mk}(t_i) = (1/2πj) ∫_C f(w)w^k I_m(wP_i) exp(σ_i²w²/2) dw        (13-48)

where P_i = P(t_i) and σ_i = σ(n_i), and the autocorrelation function

    R_{mk}(t₁,t₂) = E[h_{mk}(t₁)h_{mk}(t₂)]        (13-49)

where the averaging is with respect to the input signal modulation, it follows that the output autocorrelation function can be expressed as

    R_y(t₁,t₂) = Σ_{m=0}^∞ Σ_{k=0}^∞ (ε_m/k!) R_{mk}(t₁,t₂)R_n^k(t₁,t₂) cos mω_cτ        (13-50)

When both the input signal modulation and noise are stationary, Eq. (13-50) becomes

    R_y(τ) = Σ_{m=0}^∞ Σ_{k=0}^∞ (ε_m/k!) R_{mk}(τ)R_n^k(τ) cos mω_cτ        (13-51)
If, in addition, the input signal is an unmodulated sine wave

    s(t) = P cos(ω_c t + φ)

we get

    R_y(τ) = Σ_{m=0}^∞ Σ_{k=0}^∞ (ε_m h_{mk}²/k!) R_n^k(τ) cos mω_cτ        (13-52)

since in this case the coefficients h_{mk}(t₁) and h_{mk}(t₂) are constant and equal.
Output Signal and Noise Terms. For the moment, consider the case in which the input signal is an unmodulated sine wave. The output autocorrelation function is then given by Eq. (13-52). Let us now expand that result and examine the various terms:

    R_y(τ) = h₀₀² + 2 Σ_{m=1}^∞ h_{m0}² cos mω_cτ + Σ_{k=1}^∞ (h_{0k}²/k!)R_n^k(τ) + 2 Σ_{m=1}^∞ Σ_{k=1}^∞ (h_{mk}²/k!)R_n^k(τ) cos mω_cτ        (13-53)

The first term in this equation corresponds to the constant part of the device output. The set of terms (m ≥ 1, k = 0) corresponds to the periodic part of the output and is primarily due to the interaction of the input signal with itself. The remaining terms correspond to the random variations of the output, i.e., the output noise. Of these remaining terms, those in the set (m = 0, k ≥ 1) are due mainly to the interaction of the input noise with itself, and those in the set (m ≥ 1, k ≥ 1) to the interaction of the input signal with the input noise.
If we express the output of the nonlinear device in terms of its mean, its periodic components, and its random part as

    y(t) = m_y + Σ_{m=1}^∞ A_m cos m(ω_c t + φ) + ε(t)        (13-54)

then

    R_y(τ) = m_y² + ½ Σ_{m=1}^∞ A_m² cos mω_cτ + R_ε(τ)        (13-55)

where R_ε(τ) = E(ε₁ε₂). A comparison of Eqs. (13-53) and (13-55) shows that the output mean value and the amplitudes of the output periodic components can be expressed directly in terms of the coefficients h_{mk}. Thus

    m_y = h₀₀        (13-56)

and

    A_m = 2h_{m0}   (m ≥ 1)        (13-57)
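As a sanity check on Eq. (13-56), the mean of a full-wave square-law device output (y = ax², cf. Chap. 12) driven by a sine wave plus independent gaussian noise should be a(P²/2 + σ²). A Monte-Carlo sketch (not from the text; the constants are arbitrary):

```python
import numpy as np

# Check that the output mean m_y = h_00 of the full-wave square-law
# device y = a*x**2, with x = P cos(w_c t + phi) + n, is a(P^2/2 + sigma^2).
rng = np.random.default_rng(2)
a, P, sigma = 2.0, 1.5, 0.8
phase = rng.uniform(0.0, 2 * np.pi, 1_000_000)   # uniformly distributed phase
noise = rng.normal(0.0, sigma, 1_000_000)        # independent gaussian noise
y = a * (P * np.cos(phase) + noise)**2
m_y = a * (P**2 / 2 + sigma**2)
print(y.mean(), m_y)
```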
In addition, the autocorrelation function of the random part of the output may be expressed as

    R_ε(τ) = R_{(N×N)}(τ) + R_{(S×N)}(τ)        (13-58)

where

    R_{(N×N)}(τ) = Σ_{k=1}^∞ (h_{0k}²/k!)R_n^k(τ)        (13-59)

represents that part of the output noise due mainly to the interaction of the input noise with itself, and where

    R_{(S×N)}(τ) = 2 Σ_{m=1}^∞ Σ_{k=1}^∞ (h_{mk}²/k!)R_n^k(τ) cos mω_cτ        (13-60)

represents that part of the output noise that is due to the interaction of the input signal with the input noise.† The expansion of the output autocorrelation function leading to Eq. (13-52) has thus enabled us to isolate the output mean, the output periodic components, and the (N×N) and (S×N) parts of the output noise.
These results were obtained under the assumption of a stationary input noise and an unmodulated sine wave as an input signal. However, a similar splitting up of the output autocorrelation function is possible in the general case, and we can write

    R_y(t₁,t₂) = R_{(S×S)}(t₁,t₂) + R_{(N×N)}(t₁,t₂) + R_{(S×N)}(t₁,t₂)        (13-61)

where we have defined, from Eq. (13-50),

    R_{(S×S)}(t₁,t₂) = Σ_{m=0}^∞ ε_m R_{m0}(t₁,t₂) cos mω_cτ        (13-62)

    R_{(N×N)}(t₁,t₂) = Σ_{k=1}^∞ (1/k!) R_{0k}(t₁,t₂)R_n^k(t₁,t₂)        (13-63)

and

    R_{(S×N)}(t₁,t₂) = 2 Σ_{m=1}^∞ Σ_{k=1}^∞ (1/k!) R_{mk}(t₁,t₂)R_n^k(t₁,t₂) cos mω_cτ        (13-64)

It should be noted that, strictly speaking, all the above terms are functions of the input signal-modulation process.
Just which of the various (S×S) terms in Eq. (13-62) represents, in fact, the output signal depends entirely upon the use of the nonlinear device. For example, if the device is used as a detector, the desired output signal is centered on zero frequency. In this case, the signal part of the output autocorrelation function would be

    R_{S_o}(t₁,t₂) = R₀₀(t₁,t₂)        (13-65)

On the other hand, when the device is a nonlinear amplifier,

    R_S(t₁,t₂) = 2R₁₀(t₁,t₂) cos ω_cτ        (13-66)

since then the desired output signal is centered on the input carrier frequency, i.e., on f_c.

† Note that R_{(S×N)}(τ) = 0 when P(t) ≡ 0, since I_m(0) = 0 for m ≥ 1 in Eq. (13-48).
13-4. The Output Spectral Density
The spectral density of the output of a nonlinear device may be obtained, as usual, by taking the Fourier transform of the output autocorrelation function. Consider first the case in which the input to the nonlinear device is an unmodulated sine wave plus a stationary gaussian noise. In this case, the output autocorrelation function is given by Eq. (13-53). Hence

    S_y(f) = h₀₀²δ(f) + Σ_{m=1}^∞ h_{m0}²[δ(f + mf_c) + δ(f − mf_c)] + Σ_{k=1}^∞ (h_{0k}²/k!) ₖS_n(f) + Σ_{m=1}^∞ Σ_{k=1}^∞ (h_{mk}²/k!)[ₖS_n(f + mf_c) + ₖS_n(f − mf_c)]        (13-67)

where we have defined ₖS_n(f) to be the Fourier transform of R_n^k(τ):

    ₖS_n(f) = ∫_{−∞}^{+∞} R_n^k(τ) exp(−j2πfτ) dτ        (13-68)

The first term in Eq. (13-67) is an impulse located at zero frequency corresponding to the mean value of the output. The set of impulses located at ±mf_c corresponds to the periodic components of the output, and the remaining terms are due to the output noise. As before, the output noise terms may be separated into two parts: the part representing the interaction of the input noise with itself,

    S_{(N×N)}(f) = Σ_{k=1}^∞ (h_{0k}²/k!) ₖS_n(f)        (13-69)

and the part representing the interaction of the input signal and noise,

    S_{(S×N)}(f) = Σ_{m=1}^∞ Σ_{k=1}^∞ (h_{mk}²/k!)[ₖS_n(f + mf_c) + ₖS_n(f − mf_c)]        (13-70)

These two equations are, of course, simply the Fourier transforms of the corresponding autocorrelation functions as given by Eqs. (13-59) and (13-60), respectively.
Let us now determine the relation between ,8,,(/) and 8,,(/). When
k = 1, it follows from Eq. (1368) that
(1371)
(1372)
Further, when k ~ 2, R,,'(T) can be factored to give
f
+ OO
"S,,(!) = _. R"kl(T)R,,(T) exp( j2.../.,) d"
Hence kSn(!) may be expressed as a convolution of k
1
S,.(f) with 8,.(J):
,,8..(f) = t: 1118..(/')8..(1  I') dJ'
By repeated application of this recursion formula, we get
,,8..(f) = f+: t: 8.(/"1)8.(1t1  ItI)
8,,(!  !1) d!il dJl (1373)
Thus the spectral density ,,8,,(/) may be expressed as the (k  1) fold
convolution of the input noise spectral density with itself.
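The recursion (13-72) is easy to check numerically. The sketch below forms {}_kS_n(f) by repeated convolution of an example spectral density with itself; the grid spacing and the gaussian-shaped example spectrum are illustrative assumptions, not values from the text. Since {}_kS_n is the transform of R_n^k(τ), its total integral must equal R_n(0)^k.

```python
# Sketch: {}_k S_n(f) as the (k-1)-fold self-convolution of S_n(f),
# per Eq. (13-72).  Grid and example spectrum are assumptions.
import numpy as np

N, df = 4096, 0.01                     # frequency-grid size and spacing (Hz)
f = (np.arange(N) - N // 2) * df
S1 = np.exp(-0.5 * f**2)               # example input spectral density S_n(f)

def kfold(S, k, df):
    """Return {}_k S_n(f) by repeated convolution with S_n(f)."""
    out = S.copy()
    for _ in range(k - 1):
        out = np.convolve(out, S, mode="same") * df
    return out

# Power check: the integral of {}_k S_n equals R_n(0)**k, i.e. the
# k-th power of the integral of S_n itself.
R0 = S1.sum() * df
for k in (2, 3):
    Pk = kfold(S1, k, df).sum() * df
    assert abs(Pk - R0**k) / R0**k < 1e-3
```

The check mirrors the text: each convolution widens the spectrum (here, the gaussian's variance adds), which is why {}_kS_n spreads the noise power over a band k times as wide as the input.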
Consider next the case in which the input signal is an amplitude-modulated sine wave and in which the input signal modulation and input noise are stationary. The output autocorrelation function in this case, obtained by setting τ = t_1 − t_2 in Eq. (13-61), is

R_y(\tau) = R_{(S\times S)}(\tau) + R_{(N\times N)}(\tau) + R_{(S\times N)}(\tau)   (13-74)

where, from Eqs. (13-62), (13-63), and (13-64),

R_{(S\times S)}(\tau) = \sum_{m=0}^{\infty} \epsilon_m\,R_{m0}(\tau)\cos m\omega_c\tau

and

R_{(N\times N)}(t_1,t_2) = \sum_{\substack{k=2\\(k\ \mathrm{even})}}^{\infty} R_{0k}(t_1,t_2)\,R_n^k(\tau)
which becomes, on expanding R_n^k(τ) from Eq. (13-87). Since both R_{0k}(τ) and R_v^k(τ) correspond to low-frequency fluctuations, the terms in the first series correspond to spectral terms concentrated about zero frequency, whereas the terms in the double series correspond to spectral terms centered on 2f_c, . . . , (k − 2r)f_c, . . . , kf_c. Only the first set contributes to the detector output z(t); hence

R_{z(N\times N)}(t_1,t_2) = \sum_{\substack{k=2\\(k\ \mathrm{even})}}^{\infty} \frac{k!}{[(k/2)!]^2\,2^k}\,R_{0k}(t_1,t_2)\,R_v^k(\tau)   (13-88)

is the (N×N) portion of the detector output autocorrelation function.
The (S×N) portion of the detector output autocorrelation function may be obtained similarly. Examination of Figs. 13-5 and 13-6 shows that the noise bands in {}_kS_n(f) are spaced by twice the carrier frequency and that these bands extend only over the frequency interval from −kf_c to +kf_c. It follows that the noise bands in {}_kS_n(f + mf_c) and {}_kS_n(f − mf_c) are also spaced by twice f_c and that these spectral densities have bands about zero frequency only when (m + k) is even and m ≤ k. The only (S×N) terms of the autocorrelation function of the device output which can contribute to the detector output are therefore given by, from Eq. (13-64),

R_{(S\times N)}(t_1,t_2) = 2\sum_{k=1}^{\infty}\sum_{\substack{m=1\\(m+k\ \mathrm{even})}}^{k} R_{mk}(t_1,t_2)\,R_n^k(\tau)\cos m\omega_c\tau

The (S×N) terms of the autocorrelation function of the detector output may now be determined by substituting in this equation for R_n^k(τ) from Eq. (13-87) and picking out the low-frequency terms. In this way we get

R_{z(S\times N)}(t_1,t_2) = \sum_{k=1}^{\infty}\sum_{\substack{m=1\\(m+k\ \mathrm{even})}}^{k} \frac{k!\,R_{mk}(t_1,t_2)\,R_v^k(\tau)}{2^k\left(\frac{k+m}{2}\right)!\left(\frac{k-m}{2}\right)!}   (13-89)

The sum of Eqs. (13-88) and (13-89) gives the autocorrelation function of the detector output noise. If the detector low-pass output filter is not a zonal filter, the methods of Chap. 9 may be used, in conjunction with Eqs. (13-88) and (13-89), to find the actual autocorrelation function of the detector output noise.
Nonlinear Amplifiers. As a second example, let us consider a nonlinear band-pass amplifier. The amplifier nonlinearity may be undesired and due to practical inability to obtain a sufficiently large dynamic range, or it may actually be desired, as in a "logarithmic" radar receiver or an FM limiter. In either case, it is important to determine the effect of the nonlinearity. As in the preceding sections of this article, we shall assume that the system input is the sum of an amplitude-modulated sine wave and a stationary gaussian noise whose spectral density is concentrated in the vicinity of the carrier frequency of the input signal.

Since the amplifier is nonlinear, it may generate signal and noise components in the vicinity of zero frequency, the input carrier frequency f_c, and all harmonics of f_c. Since the system is supposed to be an "amplifier," however, we are interested only in those components in the vicinity of f_c itself. For convenience, let us represent the nonlinear amplifier by a nonlinear device in cascade with a band-pass zonal filter; the system function of the filter is assumed to be unity over the frequency interval centered about f_c and zero elsewhere. Other filter characteristics can be accounted for by applying the methods of Chap. 9.
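The device-plus-zonal-filter model above can be sketched directly. In the example below the nonlinear device is taken to be a hard limiter and the zonal filter is implemented by zeroing FFT bins outside a band around f_c; both of these specific choices, and the grid parameters, are illustrative assumptions rather than values from the text. The limiter generates a third-harmonic term that the zonal filter removes, leaving only the components near f_c.

```python
# Sketch: nonlinear band-pass "amplifier" = memoryless nonlinear device
# (here, a hard limiter -- an assumption) followed by a band-pass zonal
# filter that is unity near f_c and zero elsewhere.
import numpy as np

N, k0 = 1024, 64                          # carrier lands exactly on bin k0
t = np.arange(N) / N
x = np.cos(2 * np.pi * k0 * t)            # unmodulated input carrier

y = np.sign(x)                            # nonlinear device output
Y = np.fft.rfft(y)

# Before filtering, the device has generated a third harmonic near 3*f_c.
assert abs(Y[3 * k0]) / abs(Y[k0]) > 0.25

# Zonal filter: unity over a band centered on f_c, zero elsewhere.
idx = np.arange(Y.size)
H = np.where((idx > k0 // 2) & (idx < 3 * k0 // 2), 1.0, 0.0)
z = np.fft.irfft(Y * H, n=N)

# After the zonal filter, only the components in the vicinity of f_c remain.
Z = np.fft.rfft(z)
assert abs(Z[3 * k0]) / abs(Z[k0]) < 1e-9
```

The same structure, with the limiter replaced by a νth-law device, is what the remainder of the article analyzes.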
The techniques of the preceding several sections may now be used to sort out those terms in the autocorrelation function of the output of the nonlinear device which contribute to the amplifier output. In this way it can be shown that

(13-90)

is the (N×N) part of the autocorrelation function of the amplifier output and that

R_{z(S\times N)}(t_1,t_2) = \sum_{k=1}^{\infty}\sum_{\substack{m=0\\(m+k\ \mathrm{odd})}}^{k+1} \frac{(k+1)\,R_{mk}(t_1,t_2)\,R_v^k(\tau)}{2^k\left(\frac{k+m+1}{2}\right)!\left(\frac{k-m+1}{2}\right)!}\cos[\omega_c\tau + \theta(\tau)]   (13-91)

is the (S×N) portion of that autocorrelation function.
13-6. νth-Law Detectors†
The results of the several preceding articles are general in that the transfer characteristic of the nonlinear device in question has been left unspecified. Let us now consider a specific example, the νth-law detector. We shall assume that the detector consists either of a half-wave νth-law nonlinear device or of a full-wave (even) νth-law device, followed by a low-pass zonal filter. The transfer characteristics and transfer functions of the νth-law devices were discussed in Art. 13-2.

Let the input to a νth-law detector consist of the sum of an amplitude-modulated sine wave and a gaussian noise. If the nonlinear device is a half-wave νth-law device, the autocorrelation function of its output is given by Eq. (13-50), where the coefficient h_{mk}(t_i) in R_{mk}(t_1,t_2) is

h_{mk}(t_i) = \frac{a\Gamma(\nu+1)}{2\pi j}\int_C w^{k-\nu-1}\,I_m(wP_i)\exp\!\left(\frac{\sigma_i^2 w^2}{2}\right)dw   (13-92)

which follows from Eq. (13-48) on substituting for f(w) from Eq. (13-16). If, on the other hand, the nonlinear device is a full-wave (even) νth-law device, it follows from Eqs. (13-17) and (13-18) that its coefficients are

\begin{cases} 0 & \text{for } m+k \text{ odd} \\ 2h_{mk}(t_i) & \text{for } m+k \text{ even} \end{cases}   (13-93)

where h_{mk}(t_i) is given by Eq. (13-92). When the input noise is narrow-band, it follows from the preceding article that the only coefficients h_{mk}(t_i) which contribute to the detector output are those for which m + k is even.
† Cf. Rice (I, Sec. 4.10) and Middleton (II, Sec. 4).

The only difference between the autocorrelation function of the output of a half-wave νth-law detector and that of a full-wave νth-law detector is therefore a factor of 2² = 4.†

Evaluation of the Coefficients. Let us now evaluate the coefficients h_{mk}(t_i) as given by Eq. (13-92) for the νth-law devices. The method to be used is essentially the same as that used in Art. 13-2 to evaluate the coefficients C(ν,m).

Consider the integral of

w^{k-\nu-1}\,I_m(wP_i)\exp(\sigma_i^2 w^2/2)

around the rectangular contour shown in Fig. 13-7. Let I_1 be the integral along the line w = ε + jv from v = −β to v = +β; let I_2 be the integral along the line w = u + jβ from u = ε to u = 0; let I_3 be the integral along the line w = jv from v = β to v = −β; and let I_4 be the integral along w = u − jβ from u = 0 to u = ε.

FIG. 13-7. Contour of integration (in the w = u + jv plane).

Since, from Eq. (13-27), the Bessel function I_m(z) varies as z^m for small z, and since the exponential varies as z^0 for small z, there is no singularity of the integrand at the origin if we assume for the moment that m + k − ν − 1 > 0. The integrand is then analytic inside, and on, the contour of Fig. 13-7, and it follows from the Cauchy theorem that

I_1 + I_2 + I_3 + I_4 = 0   (13-94)

Consider the integrals I_2 and I_4. It follows, using Eq. (13-30), that, for large values of β,

|I_2| \le \frac{a\Gamma(\nu+1)}{2}\,\epsilon\,\exp\!\left(\epsilon P_i + \frac{\sigma_i^2\epsilon^2}{2}\right)\frac{\exp(-\sigma_i^2\beta^2/2)}{\beta^{\nu+1-k}} \to 0 \qquad\text{as } \beta\to+\infty

for any m, k, ν, since exp(−σ_i²β²/2) vanishes faster than 1/β^{ν+1−k} can

† Cf. Eq. (13-37) in Art. 13-2.
‡ Cf. Watson (I, Art. 13.3), or Middleton (II, Appendix III).
increase as β → +∞. It therefore follows that I_2 → 0, and similarly I_4 → 0, as β → +∞. Hence, using Eq. (13-94),

h_{mk}(t_i) = \frac{a\Gamma(\nu+1)}{2\pi j}\int_{-j\infty}^{+j\infty} w^{k-\nu-1}\,I_m(wP_i)\exp\!\left(\frac{\sigma_i^2 w^2}{2}\right)dw = \frac{a\Gamma(\nu+1)\sin[(k+m-\nu)\pi/2]}{\pi}\int_0^{\infty} v^{k-\nu-1}\,J_m(vP_i)\exp\!\left(-\frac{\sigma_i^2 v^2}{2}\right)dv

when m + k − ν − 1 > 0. The integral in this equation, Hankel's exponential integral,† is (when m + k − ν > 0)

\int_0^{\infty} v^{k-\nu-1}\,J_m(vP_i)\exp\!\left(-\frac{\sigma_i^2 v^2}{2}\right)dv = \frac{(P_i^2/2\sigma_i^2)^{m/2}\,\Gamma[(k+m-\nu)/2]}{2\,m!\,(\sigma_i^2/2)^{(k-\nu)/2}}\ {}_1F_1\!\left(\frac{k+m-\nu}{2};\,m+1;\,-\frac{P_i^2}{2\sigma_i^2}\right)   (13-95)

where {}_1F_1(a;c;z) is the confluent hypergeometric function defined by the series‡

{}_1F_1(a;c;z) = \sum_{r=0}^{\infty} \frac{(a)_r\,z^r}{(c)_r\,r!} = 1 + \frac{a}{c}\frac{z}{1!} + \frac{a(a+1)}{c(c+1)}\frac{z^2}{2!} + \cdots   (13-96)

It therefore follows, using Eq. (13-35), that

h_{mk}(t_i) = \frac{a\Gamma(\nu+1)\,(P_i^2/2\sigma_i^2)^{m/2}}{2\,m!\,\Gamma[1-(m+k-\nu)/2]\,(\sigma_i^2/2)^{(k-\nu)/2}}\ {}_1F_1\!\left(\frac{m+k-\nu}{2};\,m+1;\,-\frac{P_i^2}{2\sigma_i^2}\right)   (13-97)
when m + k − ν − 1 > 0. However, for fixed m and k, the function h_{mk}(t_i)/Γ(ν + 1) as defined by Eq. (13-92) can be shown to be a single-valued analytic function of ν for all ν in any bounded region. Since the right side of Eq. (13-97) divided by Γ(ν + 1) is also a single-valued analytic function of ν for all ν in any bounded region, the theory of analytic continuation permits us to extend h_{mk}(t_i) as given by Eq. (13-97) to all values of ν and so remove the restriction m + k − ν − 1 > 0.

It should be noted that, since |Γ(−n)| = ∞ for n = 0, 1, 2, . . . , the coefficient h_{mk}(t_i) vanishes whenever m + k − ν is an even integer greater than zero.
† Watson (I, Art. 13.3) or Magnus and Oberhettinger (I, Chap. 3, Art. 7).
‡ See Whittaker and Watson (I, Chap. XVI), or Magnus and Oberhettinger (I, Chap. VI, Art. 1). Convenient tables and plots for the confluent hypergeometric function are given by Middleton and Johnson (I). Some plots, using the symbol M(a,γ,x), are also given in Jahnke and Emde (I, Chap. X).
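For the square-law case ν = 2 the hypergeometric series in Eq. (13-97) terminates, which gives a quick independent check of the m = k = 0 coefficient. The sketch below evaluates h₀₀ from Eq. (13-99) (the m = k = 0 specialization derived in the next article) and compares it with the mean output of a half-wave square-law device computed directly; the particular amplitudes used are illustrative assumptions, and scipy's `hyp1f1` stands in for the ₁F₁ series.

```python
# Sketch: check of the nu-th-law coefficient formula at nu = 2, where
# 1F1(-1; 1; -z) = 1 + z, so h00 = (a/2)(sigma^2 + P^2/2) -- the direct
# mean (a/2)E[(P cos + n)^2] of a half-wave square-law device.
import math
from scipy.special import hyp1f1

def h00(a, nu, P, sigma):
    """h00 per Eq. (13-99) for a half-wave nu-th-law device."""
    z = P**2 / (2 * sigma**2)
    pref = (a * math.gamma(nu + 1) * sigma**nu
            / (math.gamma(nu / 2 + 1) * 2 ** ((nu + 2) / 2)))
    return pref * hyp1f1(-nu / 2, 1, -z)

a, P, sigma = 1.0, 3.0, 2.0
expected = (a / 2) * (sigma**2 + P**2 / 2)   # direct computation for nu = 2
assert abs(h00(a, 2.0, P, sigma) - expected) < 1e-10
```

For non-integer ν the series does not terminate, and the analytic continuation argument above is what justifies using Eq. (13-97) there as well.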
Output Signal and Noise. We shall assume in the remainder of this article that the input to the νth-law detector is the sum of an amplitude-modulated sine-wave signal and a stationary narrow-band gaussian noise whose spectral density is concentrated about the carrier frequency of the signal.

If we further assume, for the moment, that the input sine wave is unmodulated, the mean value m_o of the detector output is, from Eq. (13-56),

m_o = h_{00}(t_i)   (13-98)

where, from Eq. (13-97), for the half-wave νth-law detector,

h_{00}(t_i) = \frac{a\Gamma(\nu+1)\,\sigma_i^{\nu}}{\Gamma(\nu/2+1)\,2^{(\nu+2)/2}}\ {}_1F_1\!\left(-\frac{\nu}{2};\,1;\,-\frac{P_i^2}{2\sigma_i^2}\right)   (13-99)
If we now vary slowly the amplitude of the input sine wave, the output mean value will also vary slowly. We shall define the output signal to be the variation of the output mean with respect to its zero-signal value,† i.e.,

s_o(t_i) = h_{00}(t_i) - [h_{00}(t_i)]_{P_i=0}   (13-100)

which becomes, for the half-wave νth-law detector,

s_o(t_i) = \frac{a\Gamma(\nu+1)\,\sigma_i^{\nu}}{\Gamma(\nu/2+1)\,2^{(\nu+2)/2}}\left[{}_1F_1\!\left(-\frac{\nu}{2};1;-\frac{P_i^2}{2\sigma_i^2}\right) - 1\right]   (13-101)

The output signal power is defined to be the mean square of the output signal:

S_o(t_i) = E\{(h_{00}(t_i) - [h_{00}(t_i)]_{P_i=0})^2\}   (13-102)

which becomes a constant when the input amplitude modulation is stationary. The output signal power from a half-wave νth-law detector then is

S_o(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{\Gamma^2(\nu/2+1)\,2^{\nu+2}}\,E\left\{\left[{}_1F_1\!\left(-\frac{\nu}{2};1;-\frac{P_i^2}{2\sigma_i^2}\right) - 1\right]^2\right\}   (13-103)

The (N×N) portion of the detector output autocorrelation function is, from Eq. (13-88),

R_{o(N\times N)}(t_1,t_2) = \sum_{\substack{k=2\\(k\ \mathrm{even})}}^{\infty} \frac{E[h_{0k}(t_1)\,h_{0k}(t_2)]}{[(k/2)!]^2\,2^k}\,R_v^k(\tau)   (13-104)

where, from Eq. (13-97), for the half-wave νth-law detector,

h_{0k}(t_i) = \frac{a\Gamma(\nu+1)\ {}_1F_1[(k-\nu)/2;\,1;\,-P_i^2/2\sigma_i^2]}{2\,\Gamma[1-(k-\nu)/2]\,(\sigma_i^2/2)^{(k-\nu)/2}}   (13-105)

† Just how output signal is to be defined depends strongly on the particular application. This definition is useful for the following analyses.
The corresponding noise power output is, on setting t_1 = t_2 in Eq. (13-104),

N_{o(N\times N)}(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{2^{\nu+2}}\sum_{\substack{k=2\\(k\ \mathrm{even})}}^{\infty} \frac{E\{{}_1F_1^2[(k-\nu)/2;\,1;\,-P_i^2/2\sigma_i^2]\}}{[(k/2)!]^2\,\Gamma^2[1-(k-\nu)/2]}   (13-106)

since R_v(0) = R_n(0) = σ_i². The terms k > ν in Eqs. (13-104) and (13-106) vanish when ν is an even integer because of the factor Γ[1 − (k − ν)/2].

The (S×N) part of the detector output autocorrelation function is, from Eq. (13-89),

R_{o(S\times N)}(t_1,t_2) = \sum_{k=1}^{\infty}\sum_{\substack{m=1\\(m+k\ \mathrm{even})}}^{k} \frac{E[h_{mk}(t_1)\,h_{mk}(t_2)]}{2^{k-1}\left(\frac{k+m}{2}\right)!\left(\frac{k-m}{2}\right)!}\,R_v^k(\tau)   (13-107)

where h_{mk}(t_i) is given by Eq. (13-97) for the half-wave νth-law detector. The corresponding noise power output is

N_{o(S\times N)}(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{2^{\nu+1}}\sum_{k=1}^{\infty}\sum_{\substack{m=1\\(m+k\ \mathrm{even})}}^{k} \frac{E\left\{(P_i^2/2\sigma_i^2)^m\ {}_1F_1^2\!\left[\frac{m+k-\nu}{2};\,m+1;\,-\frac{P_i^2}{2\sigma_i^2}\right]\right\}}{(m!)^2\left(\frac{k+m}{2}\right)!\left(\frac{k-m}{2}\right)!\,\Gamma^2\!\left[1-\frac{m+k-\nu}{2}\right]}   (13-108)

The terms m + k > ν in Eqs. (13-107) and (13-108) vanish when ν is an even integer because of the factor Γ[1 − (m + k − ν)/2].

The total noise power output from the detector is given by the sum of Eqs. (13-106) and (13-108). It should be noted from these equations that the total detector output noise power is independent of the shape of the input noise spectrum.† This fact is, of course, a result of the assumptions of a narrow-band input noise and a zonal output filter.

Small Input Signal-to-noise Ratios. These expressions for output signal and noise powers are rather complicated functions of the input signal and noise powers. Considerably simpler results may be obtained for either very small or very large values of the input signal-to-noise power ratio

\left(\frac{S}{N}\right)_i(t_i) = \frac{E[P_i^2]}{2\sigma_i^2}   (13-109)

When the input signal-to-noise power ratio is very small, it is convenient to expand the confluent hypergeometric function in series form by Eq. (13-96). In this way we get, from Eq. (13-103),

S_o(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{2^{\nu+2}\,\Gamma^2(\nu/2)}\,\frac{E(P_i^4)}{E^2(P_i^2)}\left[\left(\frac{S}{N}\right)_i(t_i)\right]^2   (13-110)

The output signal power of a νth-law detector hence varies as the square of the input signal-to-noise power ratio for small values of the latter.

† Middleton (II, Sec. 3).
The (N×N) part of the output noise power becomes, from Eq. (13-106),

N_{o(N\times N)}(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{2^{\nu+2}}\sum_{\substack{k=2\\(k\ \mathrm{even})}}^{\infty} \frac{1}{[(k/2)!]^2\,\Gamma^2[1-(k-\nu)/2]}   (13-111)

which is independent of the input signal-to-noise power ratio. The (S×N) part of the output noise power becomes, on using Eq. (13-96) in Eq. (13-108),

N_{o(S\times N)}(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{2^{\nu+1}}\sum_{k=1}^{\infty}\sum_{\substack{m=1\\(m+k\ \mathrm{even})}}^{k} \frac{E(P_i^{2m})}{E^m(P_i^2)}\,\frac{\left[\left(\frac{S}{N}\right)_i(t_i)\right]^m}{(m!)^2\left(\frac{k+m}{2}\right)!\left(\frac{k-m}{2}\right)!\,\Gamma^2\!\left[1-\frac{m+k-\nu}{2}\right]}

For small values of the input signal-to-noise ratio, the dominant terms in this equation are those for which m = 1. Hence, approximately,

N_{o(S\times N)}(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{2^{\nu+1}}\left[\sum_{\substack{k=1\\(k\ \mathrm{odd})}}^{\infty} \frac{1}{\left(\frac{k+1}{2}\right)!\left(\frac{k-1}{2}\right)!\,\Gamma^2\!\left[1-\frac{k+1-\nu}{2}\right]}\right]\left(\frac{S}{N}\right)_i(t_i)   (13-112)
A comparison of Eqs. (13-111) and (13-112) shows that the output noise is primarily of the (N×N) type when the input signal-to-noise ratio is very small.

On combining the above expressions, the output signal-to-noise power ratio is found to be

\left(\frac{S}{N}\right)_o(t_i) = C[\nu,p(P_i)]\left[\left(\frac{S}{N}\right)_i(t_i)\right]^2 \qquad\text{when } \left(\frac{S}{N}\right)_i(t_i) \ll 1   (13-113)

where the constant, given by the ratio of the coefficients in Eqs. (13-110) and (13-111),

C[\nu,p(P_i)] = \frac{E(P_i^4)/E^2(P_i^2)}{\Gamma^2(\nu/2)\displaystyle\sum_{\substack{k=2\\(k\ \mathrm{even})}}^{\infty} \frac{1}{[(k/2)!]^2\,\Gamma^2[1-(k-\nu)/2]}}   (13-114)

is a function only of ν and the probability distribution of the input amplitude modulation. Since the same ratio is obtained for the full-wave νth-law detector, the output signal-to-noise power ratio for a νth-law detector is proportional to the square of the input signal-to-noise power ratio for small values of the latter and all values of ν. This is the "small-signal" suppression effect.
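The suppression effect is easy to see numerically in the square-law case, where a closed form is often quoted for an unmodulated carrier in narrow-band gaussian noise (an assumption here, in the spirit of the square-law detector analysis of Chap. 12): (S/N)_o = (S/N)_i² / (1 + 2(S/N)_i). The sketch below checks its two asymptotes.

```python
# Sketch: small- and large-signal behavior of a full-wave square-law
# detector (nu = 2), using the closed form assumed in the lead-in:
#   (S/N)_out = (S/N)_in**2 / (1 + 2*(S/N)_in)

def sq_law_output_snr(snr_in: float) -> float:
    """Output SNR of a square-law detector with zonal filters (assumed form)."""
    return snr_in**2 / (1.0 + 2.0 * snr_in)

# Small-signal suppression: output SNR falls as the *square* of input SNR.
for x in (1e-2, 1e-3):
    assert abs(sq_law_output_snr(x) / x**2 - 1.0) < 0.05

# Large-signal behavior: output SNR is directly proportional to input SNR.
for x in (1e2, 1e3):
    assert abs(sq_law_output_snr(x) / (x / 2.0) - 1.0) < 0.05
```

The quadratic small-signal slope is exactly the behavior of Eq. (13-113), and the linear large-signal slope anticipates Eq. (13-119).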
Large Input Signal-to-noise Ratios. When the input signal-to-noise power ratio is very large, it is most convenient to expand the confluent hypergeometric function in the asymptotic series†

{}_1F_1(a;c;z) = \frac{\Gamma(c)}{\Gamma(c-a)\,z^a}\sum_{r=0}^{\infty} \frac{(a)_r\,(a-c+1)_r}{r!\,z^r} = \frac{\Gamma(c)}{\Gamma(c-a)\,z^a}\left[1 + \frac{a(a-c+1)}{z} + \frac{a(a+1)(a-c+1)(a-c+2)}{2z^2} + \cdots\right]   (13-115)
On substituting Eq. (13-115) into Eq. (13-103), we get

S_o(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{\nu^4\,\Gamma^4(\nu/2)\,2^{\nu-2}}\,\frac{E(P_i^{2\nu})}{E^{\nu}(P_i^2)}\left[\left(\frac{S}{N}\right)_i(t_i)\right]^{\nu}   (13-116)

The (N×N) part of the output noise power becomes, approximately, on substituting Eq. (13-115) into Eq. (13-106) and picking out the dominant term,

N_{o(N\times N)}(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{\Gamma^4(\nu/2)\,2^{\nu+2}}\,\frac{E(P_i^{2(\nu-2)})}{E^{\nu-2}(P_i^2)}\left[\left(\frac{S}{N}\right)_i(t_i)\right]^{\nu-2}   (13-117)

and the (S×N) part of the output noise power becomes, approximately, on substituting Eq. (13-115) into Eq. (13-108) and keeping the dominant term,

N_{o(S\times N)}(t_i) = \frac{a^2\Gamma^2(\nu+1)\,\sigma_i^{2\nu}}{\nu^2\,\Gamma^4(\nu/2)\,2^{\nu-1}}\,\frac{E[P_i^{2(\nu-1)}]}{E^{\nu-1}(P_i^2)}\left[\left(\frac{S}{N}\right)_i(t_i)\right]^{\nu-1}   (13-118)

A comparison of Eqs. (13-117) and (13-118) shows that the output noise is primarily of the (S×N) type when the input signal-to-noise ratio is very large.
On combining the above expressions, the output signal-to-noise power ratio is found to be

\left(\frac{S}{N}\right)_o(t_i) = K[\nu,p(P_i)]\left(\frac{S}{N}\right)_i(t_i) \qquad\text{when } \left(\frac{S}{N}\right)_i(t_i) \gg 1   (13-119)

where the constant

K[\nu,p(P_i)] = \frac{2}{\nu^2}\,\frac{E(P_i^{2\nu})}{E(P_i^2)\,E(P_i^{2(\nu-1)})}   (13-120)

† Magnus and Oberhettinger (I, Chap. VI, Art. 1).
is a function only of ν and the probability distribution of the input amplitude modulation. Thus the output signal-to-noise power ratio for a νth-law detector is directly proportional to the input signal-to-noise power ratio for large values of the latter and all values of ν. Hence all νth-law detectors behave in essentially the same way as the full-wave square-law detector in so far as signal-to-noise power ratios are concerned.
13-7. Problems
1. Show by direct evaluation of the contour integral† that

\frac{1}{2\pi j}\int_{\epsilon-j\infty}^{\epsilon+j\infty} \frac{a\Gamma(\nu+1)}{w^{\nu+1}}\,e^{zw}\,dw = \begin{cases} az^{\nu} & \text{when } z > 0 \\ 0 & \text{when } z \le 0 \end{cases}   (13-121)

where ε > 0, a is real, and ν is real and nonnegative.

2. Let the input to a nonlinear device be

x(t) = s(t) + n(t)   (13-122)

where s(t) and n(t) are sample functions of independent real gaussian processes with zero means and variances σ²(s_t) and σ²(n_t), respectively. Show that the autocorrelation function of the output of the nonlinear device can be expressed as

R_y(t_1,t_2) = \sum_{m=0}^{\infty}\sum_{k=0}^{\infty} \frac{h_{mk}(t_1)\,h_{mk}(t_2)}{m!\,k!}\,R_s^m(t_1,t_2)\,R_n^k(t_1,t_2)   (13-123)

where the coefficients h_{mk}(t_i) are

h_{mk}(t_i) = \frac{1}{2\pi j}\int_C f(w)\,w^{m+k}\exp\!\left(\frac{\sigma_i^2 w^2}{2}\right)dw   (13-124)

where f(w) is the transfer function of the nonlinear device and σ_i² = σ²(s_i) + σ²(n_i).
3. Let the nonlinear device of Prob. 2 be an ideal limiter with the transfer characteristic

g(x) = \begin{cases} a & \text{for } x > 0 \\ 0 & \text{for } x = 0 \\ -a & \text{for } x < 0 \end{cases}   (13-125)

Show that the coefficients in Prob. 2 become

h_{mk}(t_i) = \begin{cases} \dfrac{(-1)^{(m+k-1)/2}\,2a\,(m+k-1)!}{2^{(m+k-1)/2}\left(\frac{m+k-1}{2}\right)!\,\sqrt{2\pi}\,\sigma_i^{m+k}} & \text{for } m+k \text{ odd} \\[2ex] 0 & \text{for } m+k \text{ even} \end{cases}   (13-126)

4. Let the input to a nonlinear device be

x(t) = \cos(pt + \theta_1) + A\cos(qt + \theta_2)   (13-127)

† Cf. Whittaker and Watson (I, Art. 12.22).
where |q − p| ≪ p, where θ₁ and θ₂ are independent random variables each uniformly distributed over the interval (0, 2π), and where A is a constant. Show that the autocorrelation function of the output of the nonlinear device can be expressed as

R_y(\tau) = \sum_{m=0}^{\infty}\sum_{k=0}^{\infty} \epsilon_m\,\epsilon_k\,h_{mk}^2\cos mp\tau\,\cos kq\tau   (13-128)

where ε_m and ε_k are Neumann factors and the coefficient h_{mk} is given by

h_{mk} = \frac{1}{2\pi j}\int_C f(w)\,I_m(w)\,I_k(wA)\,dw   (13-129)

where f(w) is the transfer function of the nonlinear device and where I_m(z) is a modified Bessel function of the first kind.
5. Let the nonlinear device of Prob. 4 be an ideal limiter with the transfer characteristic given by Eq. (13-125). Show that the coefficient h_{mk} in Prob. 4 becomes

h_{mk} = \begin{cases}
\dfrac{(-1)^{(m+k-1)/2}\,a\,\Gamma\!\left(\frac{m+k}{2}\right)\ {}_2F_1\!\left(\frac{m+k}{2},\frac{m-k}{2};\,m+1;\,\frac{1}{A^2}\right)}{\pi\,A^m\,\Gamma\!\left(1+\frac{k-m}{2}\right)m!} & \text{for } m+k \text{ odd and } A > 1 \\[3ex]
\dfrac{(-1)^{(m+k-1)/2}\,a}{\pi\,\frac{m+k}{2}\,\Gamma\!\left(1+\frac{k-m}{2}\right)\Gamma\!\left(1+\frac{m-k}{2}\right)} & \text{for } m+k \text{ odd and } A = 1 \\[3ex]
\dfrac{(-1)^{(m+k-1)/2}\,a\,A^k\,\Gamma\!\left(\frac{m+k}{2}\right)\ {}_2F_1\!\left(\frac{m+k}{2},\frac{k-m}{2};\,k+1;\,A^2\right)}{\pi\,\Gamma\!\left(1+\frac{m-k}{2}\right)k!} & \text{for } m+k \text{ odd and } A < 1 \\[3ex]
0 & \text{for } m+k \text{ even}
\end{cases}   (13-130)

where ₂F₁(a,b;c;z) is a hypergeometric function.†
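The A < 1 case of Eq. (13-130), as reconstructed here, can be sanity-checked against the ordinary square-wave Fourier coefficients that it must reduce to when the second tone vanishes (A → 0): h_{m0} = (−1)^{(m−1)/2}·2a/(mπ) for odd m. The sketch below does exactly that, with scipy's `hyp2f1` supplying ₂F₁.

```python
# Sketch: the two-tone limiter coefficients of Eq. (13-130), case
# m + k odd and A < 1 (as reconstructed here), checked at A -> 0
# against the square-wave Fourier coefficients of a single-tone limiter.
import math
from scipy.special import hyp2f1

def h_mk(a, m, k, A):
    """Eq. (13-130), case m + k odd and A < 1 (reconstructed form)."""
    if (m + k) % 2 == 0:
        return 0.0
    sign = (-1.0) ** ((m + k - 1) // 2)
    num = (a * A**k * math.gamma((m + k) / 2)
           * hyp2f1((m + k) / 2, (k - m) / 2, k + 1, A**2))
    den = math.pi * math.gamma(1 + (m - k) / 2) * math.factorial(k)
    return sign * num / den

a = 1.0
assert abs(h_mk(a, 1, 0, 0.0) - 2 * a / math.pi) < 1e-12   # fundamental
assert abs(h_mk(a, 3, 0, 0.0) + 2 * a / (3 * math.pi)) < 1e-12  # 3rd harmonic
assert h_mk(a, 2, 0, 0.5) == 0.0                           # m + k even
```

At A = 1 the ₂F₁ value Γ(c)Γ(c−a−b)/[Γ(c−a)Γ(c−b)] reproduces the middle case of (13-130), which is one consistency check between the three branches.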
6. Let the nonlinear device of Probs. 4 and 5 be followed by a band-pass zonal filter, centered about p/2π, to form an ideal band-pass limiter. Show that the filter output z(t) has the autocorrelation function

R_z(\tau) = \frac{2a^2}{\pi^2}\left[\frac{1}{A^2}\cos p\tau + 4\cos q\tau + \frac{1}{A^2}\cos(2q-p)\tau\right]   (13-131)

when A ≫ 1, and

R_z(\tau) = \frac{2a^2}{\pi^2}\left[4\cos p\tau + A^2\cos q\tau + A^2\cos(2p-q)\tau\right]   (13-132)

when A ≪ 1.

7. Derive Eqs. (13-90) and (13-91).

† Cf. Magnus and Oberhettinger (I, Chap. II) or Whittaker and Watson (I, Chap. XIV).
8. Let the input to a half-wave linear detector be an unmodulated sine wave plus a stationary narrow-band real gaussian noise. Show that the mean value of the detector output is, approximately,

m_z = \begin{cases} \dfrac{a\sigma_n}{\sqrt{2\pi}} & \text{when } \dfrac{P^2}{2} \ll \sigma_n^2 \\[2ex] \dfrac{aP}{\pi} & \text{when } \dfrac{P^2}{2} \gg \sigma_n^2 \end{cases}   (13-133)

where a is the detector scaling constant, P is the sine-wave amplitude, and σ_n² is the noise variance.

9. For the detector, and input, of Prob. 8, show that the detector noise output power can, to a reasonable approximation, be expressed as†

(13-134)
10. Let the input to a full-wave (odd)‡ νth-law device be the sum of an amplitude-modulated sine wave and a stationary narrow-band real gaussian noise. Show that the autocorrelation function of the device output is

R_y(t_1,t_2) = \sum_{m=0}^{\infty}\sum_{\substack{k=0\\(m+k\ \mathrm{odd})}}^{\infty} \frac{\epsilon_m}{k!}\,E[h_{mk}(t_1)\,h_{mk}(t_2)]\,R_n^k(\tau)\cos m\omega_c\tau   (13-135)

where ε_m is the Neumann factor, R_n(τ) is the autocorrelation function of the input noise, f_c = ω_c/2π is the carrier frequency of the signal, and the coefficients h_{mk}(t_i) are given by Eq. (13-92) and hence by Eq. (13-97).

11. Let the full-wave (odd) νth-law device of Prob. 10 be followed by a zonal band-pass filter, centered about f_c, to form a νth-law nonlinear amplifier.§ Show that

\left(\frac{S}{N}\right)_o(t_i) = C'(\nu)\left(\frac{S}{N}\right)_i(t_i) \qquad\text{when } \left(\frac{S}{N}\right)_i(t_i) \ll 1   (13-136)

where the constant C'(ν) is

C'(\nu) = \frac{1}{\Gamma^2\!\left(\frac{1+\nu}{2}\right)\displaystyle\sum_{\substack{k=1\\(k\ \mathrm{odd})}}^{\infty} \frac{1}{\Gamma^2\!\left(1-\frac{k-\nu}{2}\right)\left(\frac{k+1}{2}\right)!\left(\frac{k-1}{2}\right)!}}   (13-137)

and hence that the output signal-to-noise power ratio for a νth-law nonlinear amplifier is directly proportional to the input signal-to-noise power ratio for small values of the latter and all values of ν.

12. For the νth-law nonlinear amplifier of Prob. 11, show that

\left(\frac{S}{N}\right)_o(t_i) = K'[\nu,p(P_i)]\left(\frac{S}{N}\right)_i(t_i) \qquad\text{when } \left(\frac{S}{N}\right)_i(t_i) \gg 1   (13-138)

† Cf. Rice (I, Art. 4.10).
‡ Cf. Art. 13-2.
§ Cf. Art. 13-6.
where the constant K'[ν,p(P_i)] is

K'[\nu,p(P_i)] = \frac{2\,E(P_i^{2\nu})}{(1+\nu^2)\,E(P_i^2)\,E(P_i^{2(\nu-1)})}   (13-139)

and hence that the output signal-to-noise power ratio for a νth-law nonlinear amplifier is directly proportional to the input signal-to-noise power ratio for large values of the latter and all values of ν.
13. The νth-law nonlinear amplifier of Probs. 11 and 12 becomes an ideal band-pass limiter when ν = 0. Let the input to an ideal band-pass limiter be the sum of an unmodulated sine wave and a stationary narrow-band real gaussian noise. Show that†

\left(\frac{S}{N}\right)_o = \frac{\pi}{4}\left(\frac{S}{N}\right)_i \qquad\text{when } \left(\frac{S}{N}\right)_i \ll 1   (13-140)

and that

\left(\frac{S}{N}\right)_o = 2\left(\frac{S}{N}\right)_i \qquad\text{when } \left(\frac{S}{N}\right)_i \gg 1   (13-141)

where (S/N)_o and (S/N)_i are, respectively, the output and input signal-to-noise power ratios.

14. Show that a cascade of N ideal band-pass limiters is equivalent to a single ideal band-pass limiter.

† Cf. Davenport (I).
CHAPTER 14
STATISTICAL DETECTION OF SIGNALS
In radio communications and in radar, a signal meant to carry intelligence to a user is always partly masked and distorted in transmission before it is made available at the receiving end. Some of this distortion is due to natural causes which cannot be removed. For example, thermal noise cannot be avoided in any receiver; sea-clutter return cannot be avoided for a radar looking at objects on the ocean; distortion of signal waveforms by multipath propagation cannot be avoided under certain conditions in long-distance radio communication. Thus, even after careful engineering of all parts of a communications or radar system to minimize the disturbing influences, there remains some signal distortion, which can deprive the user of at least part of the intelligence carried by the signal.
The question naturally arises, then, as to how a received signal can be processed to recover as much information from it as possible. If a signal is perturbed in an unknown way by an agent which behaves with some statistical regularity, as those mentioned above do, it is appropriate to apply a statistical analysis to the problem of how the user should process the received signal. This chapter contains a preliminary treatment of the subject of statistical analysis of received signals. Certain statistical procedures are introduced and then applied to a few typical examples from radio and radar. No attempt is made to discuss the practical engineering aspects of these radio and radar problems, and no attempt is made to catalogue those problems for which a satisfactory solution exists. There is a connection between optimizing a receiver and optimizing the form of the signals to be used, but we shall not discuss this second problem at all. There is also a strong tie-in between the optimization procedures discussed here and those discussed in Chap. 11. We shall need certain concepts and results from two parts of the theory of statistical inference: the testing of hypotheses and the estimation of parameters. These are discussed, necessarily very briefly, in Arts. 14-2, 14-3, and 14-4.
14-1. Application of Statistical Notions to Radio and Radar
Two examples will illustrate the applicability of statistics to radio and radar-receiver design. First, suppose an FSK (frequency-shift keying) radio teletype operates at a frequency between 5 and 15 megacycles over a long distance, say, two to four thousand miles. The basic teletype alphabet is a two-symbol alphabet, with symbols called mark (M) and space (S). Each letter of the English alphabet is coded as a sequence of five marks and spaces, e.g., A = MMSSS. In teletype operation, each mark or space has a fixed time duration of T seconds. The FSK teletype is a system in which a mark is transmitted as a carrier at a given frequency f₀ for a duration of T seconds, and a space is transmitted as a carrier at a slightly different frequency f₁ for a duration of T seconds. Thus, a letter of the English alphabet is broadcast as a signal s(t) given by
s(t) = A\cos[\omega(t)\,t + \phi(t)]

where

\omega(t) = 2\pi f^{(k)} \qquad\text{for } (k-1)T < t \le kT,\ k = 1, 2, 3, 4, 5
\phi(t) = \phi^{(k)} \qquad\text{for } (k-1)T < t \le kT   (14-1)

and where f^{(k)} is either f₀ or f₁, depending on the message, and φ^{(k)} is a fixed phase shift: φ₀ if f^{(k)} = f₀ and φ₁ if f^{(k)} = f₁.
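Equation (14-1) can be sketched directly as a waveform generator for one letter. The particular frequencies, phases, symbol length, and sample rate below are illustrative assumptions, not values from the text.

```python
# Sketch: the FSK teletype signal of Eq. (14-1) for one five-symbol
# letter.  All numeric parameters are illustrative assumptions.
import numpy as np

f0, f1 = 1000.0, 1200.0          # mark / space carrier frequencies (Hz)
phi0, phi1 = 0.0, 0.3            # fixed phase shifts
T = 0.01                         # symbol duration (s)
fs = 48000.0                     # sample rate (Hz)
A = 1.0

def fsk_letter(symbols):
    """Build s(t) = A cos[w(t) t + phi(t)] for a 5-symbol mark/space string."""
    n_per = int(T * fs)
    is_mark = np.repeat(np.array([c == "M" for c in symbols]), n_per)
    t = np.arange(5 * n_per) / fs
    w = np.where(is_mark, 2 * np.pi * f0, 2 * np.pi * f1)
    phi = np.where(is_mark, phi0, phi1)
    return t, A * np.cos(w * t + phi)

t, s = fsk_letter("MMSSS")       # the letter A in the five-unit code
assert s.size == int(5 * T * fs)
assert abs(s).max() <= A + 1e-12
```

Note that, as in Eq. (14-1), the phase is reset per symbol against absolute time, so the waveform is in general discontinuous at symbol boundaries; a practical sender might instead keep phase continuity.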
At the frequencies and ranges indicated, there will be times when there are multiple paths for the radiated waves to take between transmitter and receiver. The different paths will introduce different delays and different attenuations, and these delays and attenuations will change with time. At the receiver there will be added, necessarily, antenna noise and thermal and shot noise. Thus, if a transmitted signal is given by s(t), if we assume that the path delays do not change appreciably over the duration of a message, and if we neglect other disturbing influences, such as frequency dependence of the path parameters and the small doppler shifts of frequency, the received signal is given by a function

y(t) = a_1(t)\,s(t-\tau_1) + a_2(t)\,s(t-\tau_2) + \cdots + a_N(t)\,s(t-\tau_N) + n(t)   (14-2)

where n(t) is stationary gaussian noise. The path delays τ₁, . . . , τ_N and the path attenuations a₁(t), . . . , a_N(t) are not known exactly; there may or may not be some information about them. Even the number of paths N may not be known.
What the receiver must do is decide, for each time interval (k − 1)T < t ≤ kT, whether the transmitter was sending a mark or a space, i.e., whether during that time interval the transmitter frequency was f₀ or f₁. This is a statistical decision problem; one of two alternative hypotheses must be true: that the frequency is f₀ or that it is f₁. A decision between these alternatives must be based on imperfect observations, with only statistical information available about n(t) and perhaps not even that about the a_i(t) and the τ_i.
Even without analyzing the problem in detail, one can make the following observations. If the largest difference between the delays τ_i is small compared to T and the a_i(t) do not change much in time T, then, except for a little interference from the previous symbol at the beginning of each new symbol, the effect of the various delays is just to add sine waves of the same frequency but different phases. Since the receiver needs only to measure frequency, the addition of the different sine waves does not distort the signal form in a significant way, and the additive noise is essentially the only source of error. However, the different sine waves may combine for certain values of the a_i(t) in such a way as to cancel; consequently there is very little signal energy at the receiver and the noise n(t) completely masks the signal. In this situation, optimizing the receiver can accomplish little; if there is no signal present at the input to the receiver, no amount of sophistication of design in the receiver can recover it. On the other hand, if T is decreased (resulting in an increase

FIG. 14-1. Transmitted signal from a pulsed radar.

in the rate at which messages can be sent) to a point where the differences between the τ_i are comparable with or even greater than T, then, in addition to the possible cancellation effect, the signal is smeared out over an observation interval. There will then usually be present at the receiver components of both mark and space frequencies. In this situation, if some information is available about the a_i(t) and the τ_i, a receiver which is optimum according to appropriate statistical criteria may work a good deal better on the average than a receiver which is not optimum.
Although not much can be done at a single receiver about fading of signal level due to cancellation in multipath propagation, this effect can be countered to some extent. One thing that can be done is to operate on two different frequency channels simultaneously, i.e., to use so-called frequency diversity. Then a combination of path lengths which causes cancellation at one wavelength probably will not at the other. Thus there will be two receivers providing information from which to make a decision. This leads to a statistical problem of how to weight the information from each receiver in arriving at a decision.
As a second example, let us consider a narrow-beam scanning pulsed radar used to locate ships at sea. The carrier frequency is, say, several thousand megacycles, and the pulse length is a few microseconds. If A(t) is the modulation (a periodic sequence of pulses, as shown in Fig. 14-1), the signal emitted from the transmitter can be represented by

s(t) = A(t)\cos\omega_c t

The signal radiated in a fixed direction increases to a maximum as the scanning antenna sweeps by that direction and then falls off essentially to zero. This imposes a second and relatively slowly varying modulation on the signal. Letting G(t) be the antenna (voltage) gain in a fixed direction as the antenna rotates, the signal radiated in one direction is given by

G(t)\,A(t)\cos\omega_c t

If the radiated energy is reflected from a fixed object at a distance r from the transmitter and received on the same antenna, the echo signal received at the radar is given by

\frac{1}{r^2}\,a\!\left(t-\frac{r}{c}\right)G(t)\,G\!\left(t-\frac{2r}{c}\right)A\!\left(t-\frac{2r}{c}\right)\cos\omega_c\!\left(t-\frac{2r}{c}\right)
where c is the velocity of propagation and a(t) is proportional to the ratio of incident signal field strength to the field strength of the signal reflected back to the radar at time t by the object. Now if the object is moving in a radial direction with constant velocity v, there is a doppler shift of the frequency spectrum of the signal, which, since the modulation is narrow-band, can be accounted for approximately by replacing ω_c by ω_c(1 − 2v/c). Then, adding the noise n(t) at the receiver, the total received signal is approximately

y(t) = \frac{1}{r^2}\,a\!\left(t-\frac{r}{c}\right)G(t)\,G(t-\tau')\,A(t-\tau')\cos[(\omega_c+\omega_d)t - \tau'\omega_c] + n(t)   (14-3)

where

\tau' = \frac{2r}{c} \qquad\text{and}\qquad \omega_d = -\frac{2v}{c}\,\omega_c
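The two-way doppler shift used in Eq. (14-3) is a one-line computation; the carrier frequency and target speed below are illustrative assumptions.

```python
# Sketch: the two-way doppler shift w_d = -(2v/c) w_c of Eq. (14-3),
# expressed in hertz.  Numeric inputs are illustrative assumptions.

C = 3.0e8                        # propagation velocity (m/s)

def doppler_shift_hz(f_c: float, v: float) -> float:
    """f_d = -(2v/c) f_c for radial velocity v (positive v taken here
    as a receding target, giving a negative shift)."""
    return -(2.0 * v / C) * f_c

# A 3000-megacycle (3-GHz) radar and a target receding at 15 m/s:
fd = doppler_shift_hz(3.0e9, 15.0)
assert abs(fd + 300.0) < 1e-6    # a shift of -300 cycles per second
```

Shifts of this size are tiny compared with the pulse bandwidth, which is why the text treats the doppler effect as a simple carrier offset rather than a distortion of the modulation.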
If there are other objects or sea clutter illuminated at the same time, then during the time interval that the return from the first object as given by Eq. (14-3) is not zero, there will be other received signals of the same form as the first term in Eq. (14-3) added in. Sea clutter will cause the superposition of a great many such terms, each of relatively small amplitude.

The usual function of the radar is to detect the presence of an object and determine its position in range and azimuth. The data available at the receiver are signals which have been corrupted by random noise and perhaps by sea-clutter return, which is also random. Thus a decision has to be made from statistical data; and if it is decided an object is present, the parameters range and azimuth have to be estimated from
FIG. 14-2. Decision situations at the receiver. [(a) Transmitted signals s₀ and s₁; (b) spaces of possible received signals from s₀ and from s₁.]
statistical data. The problem is how to use any information available
about the statistical nature of the noise and sea clutter, together
with the knowledge of signal structure, to make the detection and location
of the object as sure and precise as possible.
Of course, many different situations are encountered in radio and radar
reception with respect both to the nature of the signal and to random
disturbances of the signal. Common to all, however, is the presence of
additive gaussian noise caused by thermal effects. Since often the
additive gaussian noise is the only random disturbance of any importance,
considerable emphasis has been placed on signal detection and extraction
problems in which the received signal is the same as the transmitted
signal (or a multiple of it) plus gaussian noise. Generally speaking,
any random signal perturbations other than additive gaussian noise
complicate the receiver problem considerably.
However, it is possible to have a signal present at the receiver which
has been "hashed up," by multipath propagation, for example, but which
is strong enough completely to override the thermal noise. In such a
case the reception problem may or may not be really a statistical problem.
For example, in FSK radio-teletype communication, if we idealize the
situation by supposing the additive gaussian noise to be zero, it may be
that the mode of propagation is such that a received signal, although
distorted, could have come from only one of the two possible transmitted
signals. Then there is no ambiguity at the receiver in deciding whether
a mark or space was sent. Symbolically, for the simple case of
point-to-point communication using a two-letter alphabet, the possibilities
for ambiguity at the receiver are as shown in Fig. 14-2. The diagram of
Fig. 14-2a is meant to indicate that the transmitted signal s0 is
transformed into one received signal y0, and s1 into y1. Thus, although
y0 and y1 differ from s0 and s1, they are fixed and distinct; consequently,
if sufficient knowledge about the transmission medium, the channel, is
available at the receiver, there need never be any ambiguity in reception.
In general, if there is a fixed 1:1
STATISTICAL DETECTION OF SIGNALS 317
transformation carrying transmitted signals into received signals, the user
can tell exactly what was transmitted. In Fig. 14-2b, both possible
transmitted signals are perturbed randomly so that there is a whole set of
possible received signals that might have come from s0 and a whole set
that might have come from s1, but these sets have no elements in common.
Again, a sure decision can be made at the receiver as to which signal was
sent. The statistical problem of deciding between s0 and s1 on the basis
of received data is said to be singular in these cases where a correct
decision can always be made. In Fig. 14-2c there is a set of possible
received signals which could arise either from s0 or s1, and in Fig. 14-2d
any received signal could have been caused by either s0 or s1. Always,
in this last case, and sometimes in the previous one, a nontrivial decision
has to be made, after the received signal is observed, as to whether s0 or
s1 was sent.
14-2. Testing of Statistical Hypotheses†
This article and the two that follow contain some basic statistical
principles and procedures presented with no more than passing reference
to communications or radar. These principles and procedures cannot be
presented in detail here, of course.‡ Let us start by discussing the
problem of testing between two alternate statistical hypotheses, the
simplest case of the theory of testing of hypotheses.§
The situation is this: An event is observed which could have been produced
by either of two possible mutually exclusive causes, or which is a
concomitant of one of two possible mutually exclusive states of nature.
The hypothesis that a particular one of the two possible causes was
responsible for the event is designated by H0, and the hypothesis that
the other possible cause was responsible, by H1. It is desired to establish
a reasonable rule for deciding between H0 and H1, that is, for deciding
which possible cause of the observed event is, in some sense, most likely.
The decision rule is to be laid down before the event is observed. Some
knowledge about the causes and their connections with the events that
can occur is obviously necessary before a useful decision rule can be
formulated. In statistical decision theory this knowledge is taken to be
in the form of given probability distributions.
† The statistical treatment of signal detection and extraction problems can be
unified and generalized by using the statistical decision theory associated with the name
of Wald. Cf. Middleton and Van Meter (I).
‡ The reader is referred to Mood (I) for general statistical background. The
chapters on hypothesis testing and estimation are particularly pertinent.
§ A comprehensive treatment of the subject of hypothesis testing on a more
advanced level than in Mood is given in Lehman (I). There is a large body of published
material on this subject; some of the early papers of Neyman and Pearson, who are
responsible for much of the development of the subject, may be particularly interesting,
e.g., Neyman and Pearson (I) and (II).
To be more precise, we postulate a space Y of all possible values of
some observed variable (or variables) which depends on two mutually
exclusive causes C0 and C1, one of which must be in effect. We assume
that if C0 is in effect, there is a known probability law P0 induced on Y
which determines the probability that the observed variable takes on
any specified set of values. Similarly, we assume that if C1 is in effect,
there is a different known probability law P1 induced on Y. We shall
suppose that these probabilities have probability density functions p0(y)
and p1(y), respectively, for points y in Y. In addition we sometimes
assume that there are known probabilities π0 and π1, where π0 + π1 = 1,
for the existence of the causes C0 and C1. The problem is to establish
a decision rule for choosing H0 or H1. This amounts exactly to
partitioning the space Y into two parts, so that for any observation y we can
say that we believe either C0 or C1 to be the true cause.
Bayes' Solution: A Posteriori Probability. Consider the problem
formulated above with known probabilities π0 and π1 for C0 and C1. These
probabilities are usually called a priori probabilities. If M is a set of
values of y, by Bayes' rule the conditional probabilities of C0 and C1,
given M, are

    P(C0|M) = P(M|C0)P(C0)/P(M)    (14-4a)
    P(C1|M) = P(M|C1)P(C1)/P(M)    (14-4b)

Letting M be the set of values y ≤ Y < y + dy and using the notation
introduced above, the ratio of these conditional probabilities is

    P(C0|y)/P(C1|y) = p0(y)π0 / [p1(y)π1]    (14-5)

The probabilities P(C0|y) and P(C1|y) are sometimes called a posteriori
probabilities of C0 and C1. An eminently reasonable rule for making a
decision in many examples is: once an observation y has been made,
choose hypothesis H0 if the conditional probability P(C0|y) exceeds
P(C1|y); choose H1 if P(C1|y) exceeds P(C0|y); and choose either one,
say H1, if they are equal. That is,

    accept hypothesis H0 if p0(y)/p1(y) > π1/π0    (14-6)
    accept hypothesis H1 if p0(y)/p1(y) ≤ π1/π0, or if p0(y) = p1(y) = 0
If there is no difference in importance between the two possible kinds of
mistakes, i.e., choosing H 0 when H 1 is in fact true, and choosing HI when
H0 is in fact true, then by adopting this decision rule an observer will
guarantee himself a minimum penalty due to incorrect inferences over a
long sequence of repeated independent observations.
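The decision rule of (14-6) is simple to mechanize. The following Python sketch is our own illustration, not the text's; the function name and the densities used in the test are assumptions:

```python
import math

def map_decide(p0, p1, pi0, pi1, y):
    """Choose a hypothesis by the a posteriori probability rule of Eq. (14-6).

    p0, p1 are the densities under causes C0, C1; pi0, pi1 are the a priori
    probabilities.  Returns 0 for H0, 1 for H1 (ties go to H1, as in the text).
    """
    num, den = p0(y), p1(y)
    if num == 0.0 and den == 0.0:
        return 1                      # convention of Eq. (14-6)
    # accept H0 iff p0(y)/p1(y) > pi1/pi0, i.e. iff pi0*p0(y) > pi1*p1(y)
    return 0 if pi0 * num > pi1 * den else 1
```

With p0 a unit-variance gaussian density of mean zero, p1 one of mean one, and π0 = π1 = 1/2, the rule reduces to comparing y with 1/2, as in the examples of Art. 14-3.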
If the two kinds of error are not of equal importance, then it seems
reasonable to prejudice the test one way or the other by multiplying the
threshold value π1/π0 by some constant different from one. Exactly how
to do this shows up more clearly when we look at the same problem from
a different point of view in the following paragraph.
After a decision has been made, one of four possible situations obtains:
(1) H0 may be true, and the decision is that H0 is true; (2) H0 true,
decision is H1; (3) H1 true, decision is H0; (4) H1 true, decision is H1.
Each one of these situations represents an addition of true or false
information to the observer's state of knowledge. Let us suppose that we
can assign a numerical gain or loss to each of these four possibilities.
We may as well assign a loss to each with the understanding that a
negative loss is a gain. Thus we assign loss values, which are numbers
designated by L00, L01, L10, L11, to each of the four situations listed above, in
the order listed. The expected loss to the observer is the average value
of the four loss values, each weighted according to the probability of its
occurrence. That is,

    L = E[loss] = Σ(j=0,1; k=0,1) Ljk P(Hj is true and Hk is chosen)    (14-7)
The probabilities involved depend on the decision rule used. Thus, if we
denote by Y0 that part of Y such that if the observation y belongs to Y0,
we choose H0; and if correspondingly we denote by Y1 the remainder of
Y, where we choose H1, Eq. (14-7) becomes

    L = E[loss] = Σ(j=0,1; k=0,1) Ljk πj Pj(Yk)    (14-8)

where Pj(Yk) is the probability that y falls in Yk if Hj is actually true.
It is desired to choose Y0 (and hence Y1 = Y - Y0) so as to minimize
the loss. In order to do this we impose one highly reasonable constraint
on the loss values Ljk; we assume that whichever hypothesis is actually
true, a false decision carries a greater loss than a true decision. Thus

    L01 > L00
    L10 > L11

Now, using the facts that

    Pj(Y0) = ∫(Y0) pj(y) dy    and    Pj(Y1) = 1 - Pj(Y0)    (14-9)

we have, from Eq. (14-8),

    L = L01 π0 + L11 π1 + ∫(Y0) [π0(L00 - L01)p0(y) + π1(L10 - L11)p1(y)] dy    (14-10)
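The minimization that follows can be checked numerically. The sketch below is our own construction, not the book's: it evaluates the expected loss of Eq. (14-8) by grid integration for a threshold rule applied to the gaussian pair p0 = N(0, 1), p1 = N(1, 1), for which the boundary prescribed by the forthcoming inequality (14-11) works out to the threshold 1/2 - log[π1(L10 - L11)/(π0(L01 - L00))]:

```python
import math

def expected_loss(threshold, pi0, L, n=4000, lo=-8.0, hi=9.0):
    """Numerically evaluate the expected loss of Eq. (14-8) for the rule that
    accepts H0 when y < threshold, with p0 = N(0,1) and p1 = N(1,1).
    L = (L00, L01, L10, L11).  A sketch using midpoint grid integration."""
    L00, L01, L10, L11 = L
    pi1 = 1.0 - pi0
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        p0 = math.exp(-y * y / 2) / math.sqrt(2 * math.pi)
        p1 = math.exp(-(y - 1) ** 2 / 2) / math.sqrt(2 * math.pi)
        if y < threshold:        # y in Y0: we choose H0
            total += (pi0 * L00 * p0 + pi1 * L10 * p1) * dy
        else:                    # y in Y1: we choose H1
            total += (pi0 * L01 * p0 + pi1 * L11 * p1) * dy
    return total
```

With π0 = π1 = 1/2 and the symmetric losses L00 = L11 = 0, L01 = L10 = 1, the minimizing threshold is 1/2, and moving it in either direction increases the computed loss.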
The first two terms in Eq. (14-10) are constants and hence not affected
by the choice of Y0, so all that has to be done to minimize L is to choose
Y0 so as to minimize the integral in Eq. (14-10). This is easy, because
since the first term in the integrand is everywhere less than or equal to
zero and the second term is everywhere greater than or equal to zero,
it is sufficient to choose for Y0 that set of points y where the first term in
the integrand is greater in absolute value than the second. That is,
Y0 is the set of points y where

    p0(y)/p1(y) > π1(L10 - L11) / [π0(L01 - L00)]    (14-11)

Y1 is all the rest of Y, including those points y for which p0(y) = p1(y) = 0.
This criterion for a decision reduces to that of the inequality (14-6) if
L10 - L11 = L01 - L00. But if L10 - L11 = L01 - L00, the minimization
which led to the inequality (14-11) determines Y0 so as to minimize the
overall probability of error. Thus, minimizing the probability of error
gives a decision rule which is identical to the following rule: Choose that
hypothesis H0 or H1 with the greater a posteriori probability.
The solution to the problem of testing between two hypotheses given
by the inequality (14-11), which we shall call the Bayes' solution, seems
to be satisfactory when it can be applied. However, there are often
practical difficulties in assigning loss values, and there are both semantic
and practical difficulties in assigning a priori probabilities. Since only
the differences L10 - L11 and L01 - L00 enter into the Bayes' solution, one
can without sacrificing any freedom of choice always set L11 = L00 = 0.
Then it is necessary to give values to L10 and L01 which reflect the
relative loss to the observer of the two kinds of error. Obviously, this can
be hard to do in some circumstances, because all the implications of the
two kinds of wrong decision may not be known. For example, if the
problem is to test whether a radar echo indicates the presence of an
object at sea or not, it is sometimes a much more serious error to decide
there is no object present when one actually is than to decide an object
is present when there is none. But rarely can one say with assurance
that the first kind of error is a thousand times more serious than the second.
The meaning of a priori probabilities is a delicate question. When one
assigns a probability π0, as we have used the term probability, to a
hypothesis H0, he is taking H0 to be an "event," i.e., a subset of a sample
space. Whether or not H0 turns out to be true should then depend on
the outcome of a random experiment. But actually, the truth or falsity
of H0 is often determined in a nonrandom way, perhaps by known natural
laws, or by the will of an individual; and the only reason there is a
statistical test at all is that the observer does not know the true state of
things. In such a circumstance, no random experiment is performed
which has the hypothesis H0 as a possible outcome. When one assigns
a priori probabilities, he is introducing an apparent randomness to
account for his ignorance. Thus, in the example of reception of a
radio-teletype message, whether the symbol mark or space is actually
transmitted is the result of a deliberate rational choice made by the sender.
One could perhaps justify attaching a priori probabilities to the symbols
in this case on the ground that the fact that a particular message is sent
at a particular time is the result of what may be regarded as a random
set of circumstances.
This brief remark indicates that the notion of an a priori probability
may not always fit with the usual formulation of probability as a
weighting of subsets of a sample space. On the other hand, it can be
maintained that subjective probabilities, which represent degrees of belief,
however arrived at, can legitimately be used for a priori probabilities.
But even if one concedes that it is meaningful to postulate a priori
probabilities for hypotheses, he will most likely agree that in some
situations it is awkward to give actual values for π0 and π1. In the
radio-teletype example mentioned above, there is no practical difficulty; if H0
is the hypothesis that the transmitted signal is a space, then it is natural
to take for π0 the relative frequency with which spaces are known to
occur in coded English text. But in a radar detection problem, if H0 is
the hypothesis that a ship lies, say, in a one-mile-square area located at
the maximum range of the radar and there is no recent auxiliary
information about ship movements, then a believable value for π0 is obviously
hard to determine.
To summarize, the Bayes' test, as given by the inequality (14-11),
guarantees minimum expected loss to the observer. There is often
difficulty in applying it, first because of difficulty in assigning losses, second
because of difficulty in assigning a priori probabilities. If the losses for
the two kinds of errors are set equal, the test reduces to the simpler
Bayes' test of (14-6), which minimizes the total probability of error, or
maximizes the a posteriori probability of the possible hypotheses. Let
us emphasize this last point. A reasonable mode of behavior for the
observer is to minimize his expected loss, i.e., to use the test specified
by the inequality (14-11) if he has sufficient information to do so. If
he feels he cannot justify assigning particular loss values, then it is
reasonable (and common practice) for him to minimize probability of
error, i.e., to use the test specified by the inequality (14-6). But this
means his decision rule is exactly the same as if he had set the loss values
equal in the first place, even though he set up his test without
considering loss.
14-3. Likelihood Tests
Simple Hypotheses. If we are not willing to introduce a priori
probabilities into a hypothesis-testing problem, we cannot calculate overall
expected loss or probability of error. The decision rule must then be
based on some different idea. A more or less obvious basis for a decision
rule is given by what we call the likelihood principle, by which we infer
that the cause (or state of nature) exists which would more probably
yield the observed value. That is, if p0(y) > p1(y), one chooses H0; if
p1(y) ≥ p0(y), one chooses H1 (where again the decision for the case
p1(y) = p0(y) is arbitrary). This can be restated as follows: take the set
Y0 to be that set of values y for which p0(y) ≠ 0 and

    p0(y)/p1(y) > 1    (14-12)

Here a test is given which is not founded on considerations involving
a priori probability or loss and yet which gives the same decision rule
as the Bayes' test with π1(L10 - L11) = π0(L01 - L00).
Every decision criterion discussed to this point has involved the ratio
p0(y)/p1(y). This function of y, which we shall denote by l(y), is called
the likelihood ratio. Since p0(y)/p1(y) is not defined when both p0(y) = 0
and p1(y) = 0, a special definition for l(y) is required for these points;
we take l(y) = 0 when both p0(y) and p1(y) vanish. The likelihood ratio
has a central role in statistical hypothesis testing; in fact, it will transpire
that all the hypothesis tests we consider are likelihood ratio tests,
sometimes with a more general form of the likelihood ratio. Since l(y) is a
function defined on the sample space Y (the space of observations), it is
a random variable. It has two different probability distributions. If
H0 is true, there is a probability law effective on Y given by p0(y), so the
probability distribution of l is

    P0(l < a) = ∫ p0(y) dy,  the integral taken over {y: p0(y)/p1(y) < a}    (14-13)

If H1 is true, the probability distribution of l is

    P1(l < a) = ∫ p1(y) dy,  the integral taken over {y: p0(y)/p1(y) < a}    (14-14)
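The two distributions (14-13) and (14-14) of the random variable l can be estimated by simulation. The following sketch is our own illustration (the gaussian pair and the function name are assumptions, not the text's); for p0 = N(0, 1) and p1 = N(1, 1) the likelihood ratio is l(y) = exp(1/2 - y):

```python
import math
import random

def lr_distributions(a, n=20000, seed=1):
    """Monte Carlo estimates of P0(l < a) and P1(l < a), Eqs. (14-13), (14-14),
    for the pair p0 = N(0,1), p1 = N(1,1), where l(y) = p0(y)/p1(y) = exp(1/2 - y)."""
    rng = random.Random(seed)
    def l(y):
        return math.exp(0.5 - y)
    # draw observations under each hypothesis and count how often l(y) < a
    under_h0 = sum(l(rng.gauss(0.0, 1.0)) < a for _ in range(n)) / n
    under_h1 = sum(l(rng.gauss(1.0, 1.0)) < a for _ in range(n)) / n
    return under_h0, under_h1
```

For a = 1 the event l < 1 is the event y > 1/2, so the two estimates should be near the exact tail probabilities 0.3085 under H0 and 0.6915 under H1.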
Before going further, it is convenient to introduce here a few
conventional statistical terms that apply to hypothesis testing. Thus far we
have considered only the testing of one hypothesis against one other
hypothesis. These individual hypotheses are described as simple in order
to distinguish this situation from one we shall discuss later, where so-called
composite hypotheses, which are groups of simple hypotheses, are
considered. As has been made clear by now, any decision rule for testing
between two simple hypotheses is equivalent to partitioning the space of
observations Y into two parts Y0 and Y1. If an observation y occurs
which belongs to Y0, the rule is to choose H0; if y belongs to Y1, the rule
is to choose H1. A rule is determined when either of the sets Y0 or Y1
is specified, because the other set is the remaining part of Y. It is
common practice to discuss hypothesis tests in terms of one hypothesis H0.
Thus, in terms of H0, Y0 is called the acceptance region and Y1 is called
the rejection region, or more commonly, the critical region. If the
observation y falls in Y1, so that we reject H0, when in fact H0 is true,
we say we have made an error of the first kind; if y falls in Y0 when H1
is true, we say we have made an error of the second kind. The
probability of rejecting H0 when it is true, P0(Y1), is called the level or
the size of the test. The probability of rejecting H0 when it is false,
P1(Y1), is called the power of the test. These definitions are repeated in
Fig. 14-3.
[FIG. 14-3. Test between two hypotheses, showing the sample space Y partitioned into Y0 and Y1.]
As mentioned above, when there are no a priori probabilities given,
one cannot determine expected loss or overall probability of error and
hence one cannot establish a test to minimize either of these quantities.
The likelihood principle was pointed out as being one criterion on which
a test can then be based. Another criterion, which applies in different
circumstances but is perhaps more convincing, is to keep the probability
of error of the first kind less than or equal to a prescribed value and
minimize the probability of error of the second kind. A famous theorem
of Neyman and Pearson† states that a likelihood ratio test will satisfy
this criterion. Precisely, if η is a real nonnegative number, the critical
region Y1(η) which consists of all y for which p1(y)/p0(y) ≥ η gives a test
of H0 against H1 of maximum power from among all tests of level ≤
P0[Y1(η)].
The proof of the theorem is as follows. Let α = P0(Y1) = the level
of the test. Let T1 be any region in Y for which P0(T1) ≤ α. We shall
show that the power of the test which has critical region Y1 is greater
than or equal to the power of the test which has critical region T1. Let
U be the subset of Y which contains those points y in both T1 and Y1
(U may be empty), and let the notation Y1 - U denote the set of points
† See Lehman (I, Art. 2.1) or Neyman and Pearson (I, Art. III).
y in Y1 which does not belong to U. First, for every point y in Y1, and
thus in Y1 - U, p1(y) ≥ η p0(y); hence

    P1(Y1 - U) = ∫(Y1-U) p1(y) dy ≥ η ∫(Y1-U) p0(y) dy = η P0(Y1 - U)    (14-15)

Therefore

    P1(Y1) = P1(Y1 - U) + P1(U) ≥ η P0(Y1 - U) + P1(U)    (14-16)

But

    η P0(Y1 - U) + P1(U) = η P0(Y1) - η P0(U) + P1(U)    (14-17)

and since, by hypothesis,

    P0(T1) ≤ α = P0(Y1)

it follows that

    η P0(Y1) - η P0(U) + P1(U) ≥ η P0(T1) - η P0(U) + P1(U)    (14-18)

Now

    η P0(T1) - η P0(U) + P1(U) = η P0(T1 - U) + P1(U)    (14-19)

Since points in T1 - U do not belong to Y1, p1(y) ≤ η p0(y) for all y in
T1 - U. Hence

    η P0(T1 - U) + P1(U) ≥ P1(T1 - U) + P1(U)    (14-20)

Finally,

    P1(T1 - U) + P1(U) = P1(T1)    (14-21)

The relations (14-16) through (14-21) combine to give

    P1(Y1) ≥ P1(T1)

which is what we wished to show.
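The theorem can be checked by brute force when the observation space is finite, since every possible critical region can then be enumerated. The following sketch is an illustration of ours, not from the text; it verifies that no region of level at most α has greater power than the likelihood-ratio region:

```python
from itertools import combinations

def neyman_pearson_check(p0, p1, eta):
    """Brute-force check of the Neyman-Pearson theorem on a finite observation
    space: among all critical regions T1 with P0(T1) <= P0(Y1), none has larger
    power than the likelihood-ratio region Y1 = {y : p1(y) >= eta * p0(y)}.
    p0, p1 are probability vectors over the points 0..n-1."""
    n = len(p0)
    Y1 = [y for y in range(n) if p1[y] >= eta * p0[y]]
    alpha = sum(p0[y] for y in Y1)          # level of the LR test
    power = sum(p1[y] for y in Y1)          # power of the LR test
    for r in range(n + 1):
        for T1 in combinations(range(n), r):
            if sum(p0[y] for y in T1) <= alpha + 1e-12:
                assert sum(p1[y] for y in T1) <= power + 1e-12
    return alpha, power
```

For instance, with p0 = (0.5, 0.3, 0.15, 0.05), p1 = (0.1, 0.2, 0.3, 0.4), and η = 1, the likelihood-ratio region is {2, 3}, with level 0.2 and power 0.7, and the enumeration confirms no competing region of level ≤ 0.2 does better.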
One observes that as η increases the size of the set Y1 decreases and
the level of the test, α = P0(Y1), decreases (or, strictly speaking, does not
increase). Other than this one cannot say anything in general about the
functional dependence of α on η. In particular, α need not take on all
values between zero and one; that is, α as a function of η may have
jump discontinuities. It may also be observed that the set Y1 may be
taken to be those values of y for which p1(y)/p0(y) > η (instead of ≥)
and the statement and proof of the theorem remain unchanged. The
difference is that the level α corresponding to η may be different. If
Y1 is chosen to be the set of y's for which p1(y)/p0(y) ≥ η, then Y0 is
the set where p1(y)/p0(y) < η, which is equivalent to p0(y)/p1(y) > 1/η.
By the preceding remark, this fact implies that Y0 is a critical region of
maximum power for a test of H1 against H0 at level P1(Y0).
Example 14-3.1. Let the space of observations Y be the real numbers. Let H0 be the
hypothesis that the observed values are distributed according to a gaussian
distribution with mean zero and variance one. Let H1 be the hypothesis that the observed
values are distributed according to a gaussian distribution with mean one and variance
one. (See Fig. 14-4.) We test H0 against H1. Then

    p0(y) = (1/√(2π)) exp(-y²/2)
    p1(y) = (1/√(2π)) exp[-(y - 1)²/2]
and
    p1(y)/p0(y) = exp[(1/2)(2y - 1)]

[FIG. 14-4. Test for the mean of a gaussian distribution, showing the level of the test and the power of the test as areas under the two densities.]

Since the logarithm is a real-valued strictly increasing function when its argument is
positive, we can replace the condition

    p1(y)/p0(y) = exp(y - 1/2) ≥ η

by the equivalent condition

    log[p1(y)/p0(y)] = y - 1/2 ≥ log η = η′

If η′ = -1/2, Y1 is then the set of all positive real numbers and Y0 is the set of all
negative real numbers and zero. The level of the test is

    α = P0(Y1) = 1/2

The power of the test is

    P1(Y1) = (1/√(2π)) ∫(0,∞) exp[-(y - 1)²/2] dy = (1/√(2π)) ∫(-1,∞) exp(-y²/2) dy ≈ 0.841

In general, Y1 is the set of numbers y ≥ η′ + 1/2; the level is

    α = (1/√(2π)) ∫(η′+1/2, ∞) exp(-y²/2) dy

and the power is

    P1(Y1) = (1/√(2π)) ∫(η′-1/2, ∞) exp(-y²/2) dy
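The level and power of Example 14-3.1 can be evaluated for any η′ with the standard normal tail; a short sketch of ours (the function name is an assumption):

```python
import math

def gaussian_test_level_power(eta_prime):
    """Level and power for Example 14-3.1: Y1 = {y >= eta' + 1/2}, with
    p0 = N(0,1) and p1 = N(1,1).  Uses the standard normal tail via erfc."""
    t = eta_prime + 0.5                     # threshold on y
    def upper_tail(x):                      # P(N(0,1) >= x)
        return 0.5 * math.erfc(x / math.sqrt(2.0))
    level = upper_tail(t)                   # P0(Y1)
    power = upper_tail(t - 1.0)             # P1(Y1) = P(N(1,1) >= t)
    return level, power
```

At η′ = -1/2 this reproduces the level 1/2 and power ≈ 0.841 computed in the example.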
Example 14-3.2. Let Y be the set of all real numbers from zero to three. Let p0(y)
and p1(y) be given by

    p0(y) = 1/2 for 0 ≤ y ≤ 2,  0 otherwise
    p1(y) = 1/2 for 1 ≤ y ≤ 3,  0 otherwise

Then

    p1(y)/p0(y) = 0    for 0 ≤ y < 1
                = 1    for 1 ≤ y ≤ 2
                = +∞   for 2 < y ≤ 3

If η > 1, Y1 is the set of numbers 2 < y ≤ 3. The level of the test is zero and the
power is 1/2. If η ≤ 1, Y1 is the set of numbers 1 ≤ y ≤ 3; the level is 1/2 and the
power is 1. This is a very simple example of what is called a singular hypothesis-testing
problem, where there is some subset of Y which has nonzero probability under
one hypothesis and zero probability under the other.
Example 14-3.3. The problem is to test between two N-variate gaussian distributions
with the same covariance matrices but different means. This is a test between two
simple hypotheses. The space of observations Y is N-dimensional vector space; i.e.,
each observation y is a vector y = (y1, …, yN). We suppose the coordinates are
chosen so that y1, …, yN are uncorrelated random variables. The hypotheses are

    H0: p0(y1, …, yN) = [1/((2π)^(N/2) σ1⋯σN)] exp[-(1/2) Σ(k=1,N) (yk - ak)²/σk²]    (14-22a)
    H1: p1(y1, …, yN) = [1/((2π)^(N/2) σ1⋯σN)] exp[-(1/2) Σ(k=1,N) (yk - bk)²/σk²]    (14-22b)

thus a = (a1, …, aN) is the mean of p0 and b = (b1, …, bN) is the mean of p1.
The likelihood ratio is

    l(y1, …, yN) = p0(y1, …, yN)/p1(y1, …, yN) = exp{-(1/2) Σ(k=1,N) [(yk - ak)² - (yk - bk)²]/σk²}    (14-23)

A likelihood-ratio test is determined by choosing Y0 to contain those y = (y1, …, yN)
for which

    l(y1, …, yN) ≥ η

We may as well consider the logarithm of l(y1, …, yN), and it is convenient to do so.
Then the test becomes

    log l(y1, …, yN) ≥ log η = η′

This gives

    Σ(k=1,N) yk(ak - bk)/σk² ≥ η′ + (1/2) Σ(k=1,N) (ak² - bk²)/σk² = c    (14-24)

as the inequality determining Y0. Geometrically this inequality means that Y0 is the
region on one side (if c is positive, it is the far side looking from the origin) of a
hyperplane in N-dimensional space. The hyperplane lies perpendicular to the vector from
the origin with components

    [(a1 - b1)/σ1², (a2 - b2)/σ2², …, (aN - bN)/σN²]

and is at a distance from the origin of

    c / [Σ(k=1,N) (ak - bk)²/σk⁴]^(1/2)

Since the yk are independent gaussian random variables,

    ξ = Σ(k=1,N) yk(ak - bk)/σk²    (14-25)

is also a gaussian random variable. If H0 is true,

    E0(ξ) = Σ(k=1,N) ak(ak - bk)/σk²    (14-26a)
    σ0²(ξ) = Σ(k=1,N) (ak - bk)²/σk²    (14-26b)

If H1 is true,

    E1(ξ) = Σ(k=1,N) bk(ak - bk)/σk²    (14-27a)
    σ1²(ξ) = Σ(k=1,N) (ak - bk)²/σk²    (14-27b)

The error probabilities then are found by substituting the appropriate parameters in
a gaussian distribution. For example,

    level = P0(Y1) = [1/(√(2π) σ0(ξ))] ∫(-∞, c) exp{-[ξ - E0(ξ)]²/(2σ0²(ξ))} dξ    (14-28)

where E0(ξ) and σ0²(ξ) are given by Eqs. (14-26a and b).
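The level formula (14-28) reduces to a standard normal tail once E0(ξ), σ0²(ξ), and c are computed. A sketch of ours, in the notation of Example 14-3.3 (the function name is an assumption):

```python
import math

def level_for_gaussian_means(a, b, sigma, eta_prime):
    """Level of the likelihood-ratio test of Example 14-3.3.  The statistic
    xi = sum_k y_k (a_k - b_k)/sigma_k^2 is gaussian; Y1 is {xi < c} with
    c = eta' + (1/2) sum_k (a_k^2 - b_k^2)/sigma_k^2, as in Eq. (14-24)."""
    c = eta_prime + 0.5 * sum((ak**2 - bk**2) / sk**2
                              for ak, bk, sk in zip(a, b, sigma))
    mean0 = sum(ak * (ak - bk) / sk**2 for ak, bk, sk in zip(a, b, sigma))
    var = sum((ak - bk)**2 / sk**2 for ak, bk, sk in zip(a, b, sigma))
    # P0(xi < c) for xi ~ N(mean0, var), via the complementary error function
    return 0.5 * math.erfc((mean0 - c) / math.sqrt(2.0 * var))
```

As a check, with N = 1, a = (0), b = (1), σ = (1), and η′ = 0, the level is Φ(-1/2) ≈ 0.3085, consistent with Example 14-3.1.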
Composite Alternatives. Thus far we have considered hypothesis tests
in which each of the two hypotheses relates to a single possible cause of
the observations. That is, each hypothesis is simple. We now want to
consider hypothesis tests in which one or both of the hypotheses relate
to a whole set of possible causes, that is, in which one or both hypotheses
are composite. Let us denote by Ω a set of labels or indices of all the
possible simple hypotheses, so that a simple hypothesis is represented by
Hω where ω is an element of Ω. Let Ω0 be a subset of Ω and Ω1 be the
set of all points of Ω not contained in Ω0, Ω1 = Ω - Ω0. Corresponding
to each ω, there is a probability law defined on Y with a density pω(y).
The composite hypothesis H0 is that the probability density actually
governing the observations y is a pω(y) with ω an element of Ω0; the
composite hypothesis H1 is that a density pω′(y) is in effect with ω′ an
element of Ω1. This formulation includes the case of testing between
simple hypotheses; for if Ω0 and Ω1 each contain only one element, the
hypotheses are simple. If Ω0 contains only one element, but Ω1 contains
many, then the problem is to test a simple hypothesis H0 against a
composite alternative, a fairly commonly occurring situation in statistics.
In the problems in which we are interested, composite hypotheses
usually arise because of the existence of unknown parameters which
affect the observations. For example, in the radar problem alluded to
in Art. 14-1, H0 might be taken to be the hypothesis that there is no
echo returned from objects in a certain area. Then H1 is the hypothesis
that there is a returned echo. But because of the unknown size of the
reflecting object and the variability of propagation, the amplitude of the
returned signal is unknown. Since any observation made on the returned
signal is influenced not only by the presence or absence of an object but
also by the strength of the return if there is an object, H1 is a composite
hypothesis. The set Ω is composed of a set Ω0 containing only one
point ω0, which gives the simple hypothesis H0, and a set Ω1 containing
an infinity of points (corresponding to the infinity of possible amplitudes),
which gives the hypothesis H1.
If a probability law is given for Ω, that is, if a set of a priori
probabilities is given for the possible causes, then a minimum expected loss
or a minimum probability of error test (a Bayes' solution) can be obtained
in essentially the same way as in the case of simple hypotheses. In fact,
if π(ω) is the a priori probability density on Ω, π0 P0(Yk) is replaced in
Eq. (14-8) by

    ∫(Ω0) Pω(Yk) π(ω) dω,    k = 0, 1

and π1 P1(Yk) is replaced by

    ∫(Ω1) Pω(Yk) π(ω) dω,    k = 0, 1

The region Y0 is then the set of points y where

    ∫(Ω0) pω(y) π(ω) dω / ∫(Ω1) pω(y) π(ω) dω > (L10 - L11)/(L01 - L00)    (14-29)

which is analogous to the condition (14-11).
When no a priori probability distribution for Ω is given and one or
both of the hypotheses are composite, it is often hard to find a
satisfactory decision rule. We shall discuss briefly two possible approaches
to the problem of testing hypotheses when there are no a priori
probabilities and the hypotheses are composite: a direct application of the
maximum-likelihood principle, and maximization of the power of a test
when the level is held fixed. These two ideas were used in the preceding
section in the discussion of simple alternatives.
The maximum-likelihood principle, which is closely related to
maximum-likelihood estimation, is that the observer should choose that ω
from Ω which renders the observation y most likely; that is, given an
observation y, he should choose ω so as to maximize pω(y). Then if ω
belongs to Ω0, the observer decides on H0; if ω belongs to Ω1, he decides
on H1. This criterion can be stated in terms of a likelihood-ratio test.
We define the likelihood ratio for the general case to be

    l(y) = max(ω0) pω0(y) / max(ω1) pω1(y)    (14-30)

where ω0 ranges over the set Ω0 and ω1 ranges over the set Ω1.† A
generalized likelihood-ratio test is a test in which Y0 is the set of points y
for which

    l(y) > η

where η is some predetermined nonnegative number.
The likelihood principle can be untrustworthy in certain applications.
Obviously, if the observer uses a likelihood-ratio test when there is
actually a probability distribution on Ω of which he is unaware, he may get
a test which differs radically from the test for minimum probability of
error.
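For the model of Example 14-3.4 below (mean zero under H0, mean βb with unknown β > 0 under H1), the generalized likelihood ratio of Eq. (14-30) can be computed in closed form, since the maximizing β is a clipped least-squares estimate. The following sketch is our own construction, not the text's:

```python
import math

def glr_statistic(y, b, sigma):
    """Generalized likelihood ratio of Eq. (14-30) for the model of Example
    14-3.4: H0 has mean 0; H1 has mean beta*b with unknown beta > 0.
    Returns l(y) = p0(y) / sup_beta p_beta(y)."""
    num = sum(yk * bk / sk**2 for yk, bk, sk in zip(y, b, sigma))
    den = sum(bk**2 / sk**2 for bk, sk in zip(b, sigma))
    beta_hat = max(0.0, num / den)          # maximizing beta, clipped to beta >= 0
    # log p0 - log p_beta_hat; the common gaussian normalizer cancels
    log_l = -0.5 * sum(yk**2 / sk**2 for yk, sk in zip(y, sigma)) \
            + 0.5 * sum((yk - beta_hat * bk)**2 / sk**2
                        for yk, bk, sk in zip(y, b, sigma))
    return math.exp(log_l)
```

Note that l(y) ≤ 1 always, with l(y) = 1 exactly when the data are anticorrelated with b (so the constrained maximizing β is zero); small values of l(y) therefore favor H1.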
In considering tests of maximum power at a given level, let us, for the
sake of simplicity, consider only the special case that the null hypothesis
H0 is simple and the alternative H1 is composite. Then for each ω in Ω1
one could test H0 at level α against the simple hypothesis that corre-
† The usual definition for the likelihood ratio is

    l(y) = max(ω0) pω0(y) / max(ω) pω(y)

where ω ranges over all of Ω. It is more convenient in this chapter to use the
definition of Eq. (14-30), and the two are essentially equivalent.
sponds to CAl. A maximum power test to do this would have a critical
region Y.. consisting of the set of X'S for which
p.,(x) >
Po(x)  "..
o
1 
"'0
FlO. 145. Powerfunction curves.
for some number 'I.,. If the critical regions Y.. are identical for all C&J
in Ot, then the test with critical region Y
I
= Y.., CAl in 0
1
, is said to be a
uniformly most powerful test of H0 against HI at level CI. When this
fortuitous situation occurs, one is in just as good a position to test H₀ against H₁ as if H₁ were simple. The ambiguity introduced by the fact that H₁ is composite turns out in this case to be of no importance. In some useful examples, as we shall see later, there exists a uniformly most powerful test. When there does not, there is no simple criterion as to what is the best test of level α. The situation is clarified perhaps by reference to the power function. The power function of a test with critical region Y₁ is the probability of the critical region Y₁ as a function of ω. Suppose that Ω is an interval on the real line, so that every element ω is a real number, say, between a and b. Then typical power-function curves are as shown in Fig. 14-5, in which ω₀ is the point in Ω corresponding to H₀, so that the value of the power function at ω₀ is the level of the test. Each curve in Fig. 14-5 is the power-function graph of a different test; all have level α. If there is a test with power function lying above all the rest which have the same value at ω₀, like curve (a) in Fig. 14-5, it is a uniformly most powerful test at its level. Usually every power function will for some values of ω lie under the power function of another test, like curves (b) and (c) in Fig. 14-5.
When there is no uniformly most powerful test, some further criterion must be used to find a "best" test. There are various possibilities: the class of tests considered may be made smaller by considering only those tests which have some particular desired property. There may then be a uniformly most powerful test with respect to this smaller class. One such class is that of unbiased tests, whose power functions have a minimum at ω₀; curves (a) and (b) in Fig. 14-5 are power functions of unbiased tests, but curve (c) is not. Another possibility is to judge the goodness of tests at the same level according to some over-all property of the power function.†

† See Lehmann (I, Chap. 6).
STATISTICAL DETECTION OF SIGNALS 331
Example 14-3.4. Let us modify Example 14-3.3 by setting a_k = 0, k = 1, …, N, and replacing b_k by βb_k, where β is an unknown real positive number. H₀ is then the simple hypothesis

p_0(y_1,\ldots,y_N) = \frac{1}{(2\pi)^{N/2}\sigma_1\cdots\sigma_N}\exp\left[-\frac{1}{2}\sum_{k=1}^N \frac{y_k^2}{\sigma_k^2}\right]    (14-31a)

and H₁ is the composite hypothesis

p_1(y_1,\ldots,y_N;\beta) = \frac{1}{(2\pi)^{N/2}\sigma_1\cdots\sigma_N}\exp\left[-\frac{1}{2}\sum_{k=1}^N \frac{(y_k - \beta b_k)^2}{\sigma_k^2}\right] \qquad \beta > 0    (14-31b)
We now show for this example that a likelihood-ratio test of H₀ against H₁ is a uniformly most powerful test at its level. First, we observe that with nonsingular gaussian probability densities, as in this example, the level α, for any fixed β, is a continuous as well as monotonic function of η which runs from zero to one. Hence at least one η = η(β) can be found to yield any specified level α. From Example 14-3.3, for a fixed value of β we have that the critical region for a most powerful test of H₀ against H₁ at level α consists of those y = (y₁, …, y_N) for which

\sum_{k=1}^N \frac{y_k\,\beta b_k}{\sigma_k^2} > c(\beta)

or

\sum_{k=1}^N \frac{y_k b_k}{\sigma_k^2} > \frac{c(\beta)}{\beta} = k(\beta)    (14-32)

Now if for two different values of β, β₁ and β₂, k(β₁) is strictly greater than k(β₂), then it follows from Eq. (14-32) that Y_{β₂} contains at least all the points y = (y₁, …, y_N) contained in Y_{β₁}. But

P_0(Y_{\beta_1}) = \alpha = P_0(Y_{\beta_2})

Hence P₀(Y_{β₂} − Y_{β₁}) = 0. But then, by our convention regarding sets of y's where p₀(y) = 0 in the definition of the critical region for a maximum power test, Y_{β₂} = Y_{β₁}. Hence all the Y_β are equal, and we can take Y₁ = Y_β. That is, the maximum-power test at level α of H₀ against each of the simple hypotheses comprising H₁ is the same as each other one; so this test is a uniformly most powerful test of H₀ against H₁.
Example 14-3.5. We now further modify the preceding examples to get an example in which both hypotheses are composite. Let β be an unknown real number. It is desired to test between the hypotheses

H_0:\quad p_0(y_1,\ldots,y_N;\beta) = \frac{1}{(2\pi)^{N/2}\sigma_1\cdots\sigma_N}\exp\left[-\frac{1}{2}\sum_{k=1}^N \frac{(y_k - \beta a_k)^2}{\sigma_k^2}\right]    (14-33a)

H_1:\quad p_1(y_1,\ldots,y_N;\beta) = \frac{1}{(2\pi)^{N/2}\sigma_1\cdots\sigma_N}\exp\left[-\frac{1}{2}\sum_{k=1}^N \frac{(y_k - \beta b_k)^2}{\sigma_k^2}\right]    (14-33b)

We shall determine the ratio, as defined in Eq. (14-30),

l_N(y) = \frac{\max_\beta\, p_0(y_1,\ldots,y_N;\beta)}{\max_\beta\, p_1(y_1,\ldots,y_N;\beta)}
The maximum of p₀(y₁, …, y_N;β) occurs for the same value of β as the maximum of log p₀(y₁, …, y_N;β). Since log p₀(y₁, …, y_N;β) is a second-degree polynomial in β, we can find its stationary point by setting its derivative equal to zero. Thus

\log p_0(y_1,\ldots,y_N;\beta) = \text{const.} - \frac{1}{2}\sum_{k=1}^N \frac{(y_k - \beta a_k)^2}{\sigma_k^2}

and

\hat\beta = \frac{\displaystyle\sum_{k=1}^N a_k y_k/\sigma_k^2}{\displaystyle\sum_{k=1}^N a_k^2/\sigma_k^2}    (14-34)

From the form of log p₀ it is obvious that this value of β in fact gives a maximum for log p₀ and hence for p₀. The value of β to maximize p₁(y₁, …, y_N;β) is similarly

\hat\beta = \frac{\displaystyle\sum_{k=1}^N b_k y_k/\sigma_k^2}{\displaystyle\sum_{k=1}^N b_k^2/\sigma_k^2}    (14-35)

Substituting these values of β in the expressions for p₀ and p₁, respectively, and performing a slight reduction yields

l_N(y_1,\ldots,y_N) = \exp\left[\frac{1}{2}\frac{\left(\sum_{k=1}^N a_k y_k/\sigma_k^2\right)^2}{\sum_{k=1}^N a_k^2/\sigma_k^2} - \frac{1}{2}\frac{\left(\sum_{k=1}^N b_k y_k/\sigma_k^2\right)^2}{\sum_{k=1}^N b_k^2/\sigma_k^2}\right]    (14-36)

Let

c_N^2 = \sum_{k=1}^N \frac{a_k^2}{\sigma_k^2} \qquad d_N^2 = \sum_{k=1}^N \frac{b_k^2}{\sigma_k^2}    (14-37)

Then

\log l_N(y_1,\ldots,y_N) = \frac{1}{2c_N^2}\left(\sum_{k=1}^N \frac{a_k y_k}{\sigma_k^2}\right)^2 - \frac{1}{2d_N^2}\left(\sum_{k=1}^N \frac{b_k y_k}{\sigma_k^2}\right)^2    (14-38)

and Y₀ is the region where this expression is greater than log η.
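Equation (14-38) is easy to check numerically. The Python sketch below (all constants are hypothetical, chosen only for illustration) computes the estimates of Eqs. (14-34) and (14-35), substitutes them into the two gaussian log densities directly, and confirms that the difference of maximized log densities agrees with the closed form of Eq. (14-38).

```python
import math

def log_gen_likelihood_ratio(y, a, b, sigma):
    """Generalized likelihood ratio of Eq. (14-38):
    log l_N = (1/2cN^2)(sum a_k y_k/s_k^2)^2 - (1/2dN^2)(sum b_k y_k/s_k^2)^2"""
    Sa = sum(ak * yk / s**2 for ak, yk, s in zip(a, y, sigma))
    Sb = sum(bk * yk / s**2 for bk, yk, s in zip(b, y, sigma))
    cN2 = sum(ak**2 / s**2 for ak, s in zip(a, sigma))
    dN2 = sum(bk**2 / s**2 for bk, s in zip(b, sigma))
    return Sa**2 / (2 * cN2) - Sb**2 / (2 * dN2)

def log_density(y, mean, sigma):
    # log of an independent-gaussian joint density with the given means
    return sum(-0.5 * math.log(2 * math.pi * s**2) - (yk - m)**2 / (2 * s**2)
               for yk, m, s in zip(y, mean, sigma))

# Cross-check (14-38) against direct maximization over beta
a = [1.0, -0.5, 2.0]; b = [0.5, 1.5, -1.0]; sigma = [1.0, 0.5, 2.0]
y = [0.7, 0.2, -1.1]                       # hypothetical observation
beta0 = (sum(ak*yk/s**2 for ak, yk, s in zip(a, y, sigma)) /
         sum(ak**2/s**2 for ak, s in zip(a, sigma)))   # Eq. (14-34)
beta1 = (sum(bk*yk/s**2 for bk, yk, s in zip(b, y, sigma)) /
         sum(bk**2/s**2 for bk, s in zip(b, sigma)))   # Eq. (14-35)
direct = (log_density(y, [beta0*ak for ak in a], sigma)
          - log_density(y, [beta1*bk for bk in b], sigma))
assert abs(direct - log_gen_likelihood_ratio(y, a, b, sigma)) < 1e-9
```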
14-4. Statistical Estimation
Statistical estimation can be described briefly as follows: An event is observed which could have been produced by any one of a family of mutually exclusive possible causes (or is related to any one of a family of possible states of nature). Each of these possible causes is indexed by a parameter or by a set of parameters. It is desired to infer from the observation which cause could most reasonably be supposed to exist, that is, which parameter value or set of values obtained. Thus, as for the problem of testing statistical hypotheses, we postulate a space Y of all possible values of some observed variable (or variables) and a set of known probabilities P_ω on Y, where ω belongs to a set Ω. But now Ω is partitioned into a collection of mutually exclusive subsets Ω_a, where a is a number or a vector labeling the subset. The sets Ω_a may consist of only single elements ω from Ω. An estimator of a is a function â(y) defined on Y which is intended, of course, to yield a reasonable guess of the true value of a. Thus an estimator is a random variable on the space of observations.
This abstract formulation is perhaps made clearer in terms of a common example. Suppose it is known that some quantity y is determined by a gaussian distribution of unknown mean m and variance σ², and an estimate of m is wanted. A set of N independent measurements of y is made. We designate these y₁, y₂, …, y_N. Then the space of observations Y is N-dimensional vector space, and a point y in Y has coordinates (y₁, …, y_N). The set Ω is the upper half of the plane, with the abscissa of a point ω representing a mean m and the ordinate (which is positive) representing a variance σ². The parameter a to be estimated is m, and the sets Ω_a are lines drawn vertically up from the horizontal axis. An example of an estimator for m is the sample mean†

\hat m = \frac{y_1 + y_2 + \cdots + y_N}{N}

which is a function of the point y in Y. Under the same conditions, it might be desired to estimate m and σ². Then Y and Ω are the same as before, but now a is a vector with coordinates (m, σ²) and Ω_a is a single point in Ω.
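As a concrete illustration of an estimator as a function on the sample space, the following Python sketch (the true parameters and sample size are hypothetical, not from the text) computes the sample mean and sample variance from N simulated measurements; each is a mapping from the point y = (y₁, …, y_N) in Y to a number, and for large N both fall near the true m and σ².

```python
import random

def sample_mean(y):
    # The estimator m-hat = (y_1 + ... + y_N)/N, a function of the point y in Y
    return sum(y) / len(y)

def sample_variance(y):
    # Companion estimator for sigma^2 when a = (m, sigma^2) is the parameter
    m = sample_mean(y)
    return sum((yk - m) ** 2 for yk in y) / (len(y) - 1)

# Hypothetical true parameters (the unknown point omega in Omega)
m_true, sigma_true = 3.0, 2.0
rng = random.Random(1)
y = [rng.gauss(m_true, sigma_true) for _ in range(10_000)]  # N measurements
print(sample_mean(y), sample_variance(y))  # near 3.0 and 4.0
```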
According to the definition of an estimator, any function of y is one, whether it gives a good, fair, or ridiculous guess of the true value of the parameter. There are several criteria for judging the reasonableness and quality of estimators. First, if the distributions on Y are completely determined by a, so that any one of the family of probability densities on Y can be written as p_a(y), we say an estimator â of a is unbiased† if it gives the correct value on the average; i.e., â is unbiased if for all a

E_a(\hat a) = a    (14-39)

where E_a(â) is the expectation of â when a is true. This condition can also be written

\int \hat a(y)\,p_a(y)\,dy = a

† See Art. 5-2.
If the probability distributions on Y are not completely determined by a single parameter, a condition corresponding to Eq. (14-39) may still be written, but it represents a more severe restriction. Thus, for example, if the family of distributions is determined by two parameters a and b, we may say that an estimator â of a is unbiased if for every a

E_{a,b}(\hat a) = a \qquad \text{for all } b    (14-40)
The property of being unbiased does not constitute a sufficient guarantee of the goodness of an estimator. An estimator â(y) can be unbiased and yet in repeated identical independent trials yield values which fluctuate wildly around the true value. This situation reflects the fact that for some set of densities p_a(y), â(y) considered as a random variable has a distribution which does not concentrate the probability weighting closely around the mean. One measure of the fluctuations of â(y) about its mean is given by its variance

\sigma_a^2(\hat a) = E_a\{[\hat a - E_a(\hat a)]^2\}    (14-41)

If â(y) is an unbiased estimator, this reduces to

\sigma_a^2(\hat a) = E_a[(\hat a - a)^2]    (14-42)
A condition of optimality of an estimator which we shall use is that it be an estimator of minimum variance‡ from some class of unbiased estimators.

As with hypothesis testing, we are often interested in how the statistical inference can be improved as the number of independent observations becomes large. Let a sequence â_N(y) of estimators be given, where â_N(y) is an estimator on the joint sample space of N independent observations. If this sequence has the property that for any true value of the parameter a, â_N(y) converges in probability§ to a, the sequence of estimators is said to be consistent.¶

† See Cramér (I, Art. 32.3).
‡ See Cramér (I, Art. 32.3).
§ See Art. 4-6.
¶ See Art. 5-3.
Maximum-likelihood Estimation. The likelihood principle mentioned in a preceding section applies also to estimation. The intuitive idea is that if any one of a family of probability densities p_a(y) on Y might be operative when a particular observation y is made, it is reasonable to suppose the true value of a is one which establishes a probability distribution on Y so as to make y most likely, i.e., that the parameter a is such as to maximize p_a(y), which as a function of a and y is called the likelihood function and sometimes written λ(y;a). Since the logarithm is a real increasing function for positive arguments, we may consider log λ(y;a) instead of λ(y;a) when searching for values of a which yield a maximum. It turns out usually to be convenient to do this. If log λ(y;a) has a continuous derivative with respect to a, then we may often expect to find the value or values of a which maximize the likelihood among the solutions of the equation

\frac{\partial \log \lambda(y;a)}{\partial a} = 0    (14-43)

This equation is called the likelihood equation. A solution for a is a function â(y), which is an estimator of a. Solutions can occur of the form â ≡ constant;† these are not reasonable estimators, since they do not depend on y, and we drop them from consideration. The remaining solutions of Eq. (14-43) are called maximum-likelihood estimators.
Maximum-likelihood estimators are known to have properties which help justify their use in some circumstances beyond the rough intuitive justification given above. Under certain conditions, maximum-likelihood estimators have minimum variance in a wide class of unbiased estimators.‡ Under more general conditions, maximum-likelihood estimators are consistent and have asymptotically minimum variance.§

If there are n different parameters a₁, …, a_n to be estimated, then the maximum-likelihood method consists of solving the n equations

\frac{\partial}{\partial a_i} \log \lambda(y;a_1,\ldots,a_n) = 0 \qquad i = 1, \ldots, n    (14-44)

for a₁, …, a_n. If an a priori probability distribution is known for the parameter values, with density function π(a), then there is a Bayes' estimate for a obtained by maximizing π(a)p_a(y) for each y. This is closely analogous to the situation in hypothesis testing with given a priori probabilities.
† For example, if λ(y;a) = (y² + a²)⁻¹, â ≡ 0 is a solution of the likelihood equation.
‡ Cramér (I, Art. 33.2).
§ Cramér (I, Art. 33.3).
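A minimal numerical sketch of maximum-likelihood estimation (with hypothetical data, not from the text): for N independent gaussian observations with known σ, the likelihood equation (14-43) gives m̂ equal to the sample mean, and the code below verifies that this solution is indeed a maximum of log λ.

```python
import math
import random

def log_likelihood(y, m, sigma):
    # log lambda(y; m) for N independent gaussian observations, known sigma
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (yk - m)**2 / (2 * sigma**2) for yk in y)

# Likelihood equation d/dm log lambda = sum(y_k - m)/sigma^2 = 0
# has the solution m = mean(y), an estimator depending on y as required.
rng = random.Random(2)
sigma = 1.5
y = [rng.gauss(5.0, sigma) for _ in range(2000)]   # hypothetical sample
m_ml = sum(y) / len(y)                              # solves Eq. (14-43)

# Check that the stationary point is a maximum: likelihood drops on each side
assert log_likelihood(y, m_ml, sigma) > log_likelihood(y, m_ml + 0.1, sigma)
assert log_likelihood(y, m_ml, sigma) > log_likelihood(y, m_ml - 0.1, sigma)
```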
Example 14-4.1. Let Y be N-dimensional vector space. There is given a set of gaussian N-variate probability density functions

p_\beta(y_1,\ldots,y_N) = \frac{1}{(2\pi)^{N/2}\sigma_1\cdots\sigma_N}\exp\left[-\frac{1}{2}\sum_{k=1}^N \frac{(y_k - \beta a_k)^2}{\sigma_k^2}\right]    (14-45)

and it is desired to estimate β. A maximum-likelihood estimator has already been obtained for this problem in Example 14-3.5. By Eq. (14-34),

\hat\beta(y) = \frac{\displaystyle\sum_{k=1}^N a_k y_k/\sigma_k^2}{\displaystyle\sum_{k=1}^N a_k^2/\sigma_k^2}    (14-46)

This estimator is unbiased, for

E_\beta[\hat\beta(y)] = \frac{\sum_{k=1}^N a_k E_\beta(y_k)/\sigma_k^2}{\sum_{k=1}^N a_k^2/\sigma_k^2} = \beta\,\frac{\sum_{k=1}^N a_k^2/\sigma_k^2}{\sum_{k=1}^N a_k^2/\sigma_k^2} = \beta    (14-47)

The variance of β̂(y) is, since the y_k are independent,

\sigma^2(\hat\beta) = \frac{\sum_{k=1}^N (a_k^2/\sigma_k^4)\,\sigma_k^2}{\left(\sum_{k=1}^N a_k^2/\sigma_k^2\right)^2} = \frac{1}{\displaystyle\sum_{k=1}^N a_k^2/\sigma_k^2}    (14-48)

which does not depend on β. The variance of the estimator does not depend on β because the parameter β influences only the mean of the distribution on Y.

The estimator β̂(y) is linear; i.e., if y = (y₁, …, y_N) and z = (z₁, …, z_N) are two observations and c₁ and c₂ are any constants,

\hat\beta(c_1 y + c_2 z) = c_1\hat\beta(y) + c_2\hat\beta(z)

We can now show that β̂ has minimum variance in the class of all unbiased, linear estimators of β. Any linear estimator β′(y) can be written in the form

\beta'(y) = \sum_{k=1}^N d_k y_k    (14-49)

with some set of numbers d_k. In order for β′ to be unbiased,

E_\beta[\beta'(y)] = \sum_{k=1}^N d_k E_\beta(y_k) = \beta \sum_{k=1}^N d_k a_k = \beta

hence the following relation must be satisfied by the d_k:

\sum_{k=1}^N d_k a_k = 1    (14-50)

The variance of β′(y) is

\sigma^2[\beta'(y)] = \sum_{k=1}^N d_k^2 \sigma_k^2    (14-51)

Now, from Eq. (14-50) and the Schwartz inequality,

1 = \left(\sum_{k=1}^N d_k a_k\right)^2 = \left[\sum_{k=1}^N (d_k\sigma_k)\frac{a_k}{\sigma_k}\right]^2 \le \left(\sum_{k=1}^N d_k^2\sigma_k^2\right)\left(\sum_{k=1}^N \frac{a_k^2}{\sigma_k^2}\right)

Hence,

\sigma^2[\beta'(y)] = \sum_{k=1}^N d_k^2\sigma_k^2 \ge \frac{1}{\displaystyle\sum_{k=1}^N a_k^2/\sigma_k^2} = \sigma^2(\hat\beta)
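The claims of Eqs. (14-46) through (14-51) can be illustrated by simulation. In the Python sketch below (the a_k, σ_k, and β are hypothetical, not from the text), the estimator of Eq. (14-46) is compared by Monte Carlo with another unbiased linear estimator; its sample variance comes out near 1/Σ(a_k²/σ_k²) and below that of the competitor, as the Schwartz-inequality argument requires.

```python
import random

# Hypothetical setup: y_k = beta*a_k + z_k, var(z_k) = sigma_k^2
a = [1.0, 2.0, 0.5, 1.5]
sigma = [0.5, 2.0, 1.0, 1.5]
beta_true = 0.8

A2 = sum(ak**2 / sk**2 for ak, sk in zip(a, sigma))   # sum a_k^2/sigma_k^2

def beta_ml(y):
    # Eq. (14-46): the minimum-variance unbiased linear estimator
    return sum(ak * yk / sk**2 for ak, yk, sk in zip(a, y, sigma)) / A2

def beta_naive(y):
    # Another unbiased linear estimator: d_k = 1/(N*a_k), so sum d_k a_k = 1
    return sum(yk / ak for yk, ak in zip(y, a)) / len(a)

rng = random.Random(3)
trials = 100_000
ml, naive = [], []
for _ in range(trials):
    y = [beta_true * ak + rng.gauss(0, sk) for ak, sk in zip(a, sigma)]
    ml.append(beta_ml(y))
    naive.append(beta_naive(y))

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m)**2 for x in xs) / len(xs)

print(var(ml), 1 / A2, var(naive))  # var(ml) ~ 1/A2, and var(ml) <= var(naive)
```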
Observations and Statistics. Thus far, in the discussions of both the hypothesis-testing and estimation problems, the space of observations Y, or the sample space, has been assumed given. In many classical statistical testing procedures, some number N of prescribed measurements or observations are made, and these provide the statistical data. These N measurements then provide a set of N real numbers y₁, …, y_N, which can be thought of as the coordinates of a point y = (y₁, …, y_N) in an N-dimensional vector space Y, the sample space. Once the method of making the measurements is decided, Y is fixed. Y contains the raw data of the test from which a statistical inference is to be drawn. In some types of problem, however, particularly in the applications we consider here, there is too much raw data to be handled directly, and the analysis really proceeds in two stages. First, the original data are treated so as to yield a smaller set of data, and then statistical procedures are applied to these reduced data. In this situation the sample space Y is the space of possible values of the reduced data.
For example, suppose a pulsed radar is used to determine whether there is an object in a certain direction at a distance of between 20 and 21 miles. Ideally the data available at the radar receiver from which a decision can be made consist of a sequence of sections, one for each pulse, of a continuous record of voltage against time of duration 2/186,000 seconds (the time for a radio wave to travel two miles). A continuous record of this sort involves an uncountable infinity of values. Various things may be done to reduce the quantity of data. One is to sample each returned pulse once; i.e., the voltage amplitude received is measured once during each time interval corresponding to the range interval from 20 to 21 miles. Suppose the radar is trained in the required direction long enough for K pulses to be returned. Then a sample point has K coordinates and Y is K-dimensional space.
The reduction of data from the original space of observations to the sample space can be regarded as a mapping or transformation. If the original space of observations is denoted by M, with points m, then for each observation m a point y in the sample space is determined. Thus a mapping y(m) is defined from M to Y; such a mapping is called a statistic. In general, any mapping from a space of observations to a sample space or from one sample space to another is called a statistic. Thus, in the situation described in the preceding paragraph, the mapping which carries a received waveform into a point in K-dimensional space is a statistic. If, further, an average is taken of the coordinates y₁, …, y_K of the point in Y, it constitutes a mapping of Y into a new one-dimensional sample space and is another statistic, the sample mean.
It is clear that the choice of a statistic is part of the over-all statistical problem and that it must be made with some care. Usually, when the original data are reduced, some information pertinent to the decision to be made is lost, but not always. Thus, for example, it can be shown† that, if a set of numbers is known to be distributed according to a gaussian probability law with unknown mean and variance, and if n independent samples are taken, the sample mean and the sample variance contain just as much information about the distribution as do the n sample values. Here the statistic with the two coordinates, sample mean and sample variance, maps an n-dimensional sample space onto a two-dimensional one.
Grenander has introduced the term "observable coordinates" for the
initial statistics used in making statistical inferences on random processes.
This term seems apt for the situations we are concerned with, and we
shall use it in the articles that follow.
14-5. Communication with Fixed Signals in Gaussian Noise‡
The first statistical signal-reception problem we shall consider is the following: One of two known possible signals s₀(t) and s₁(t) is transmitted for a fixed interval of time 0 ≤ t ≤ T. The transmitted signal is corrupted by the addition of stationary gaussian noise with a known autocorrelation function, so the received signal y(t) is given by the equation

y(t) = s_i(t) + n(t) \qquad 0 \le t \le T, \quad i = 0, 1    (14-52)

where n(t) is the noise. The decision to be made at the receiver is whether s₀(t) or s₁(t) was actually transmitted. If s₀(t) and s₁(t) are sine waves of different frequencies, then this problem becomes an idealized per-symbol analysis of the FSK radio teletype discussed in Art. 14-1. The s_i(t) may be quite arbitrary, however, so the application is more general. Application may also be made to radar, as we shall see in Art. 14-7. The chief idealization of the problem represented by Eq. (14-52) with respect to almost any practical application is that no parameters have been introduced to account for ambiguous amplitude and phase of the signal s_i(t). Modifications in which unknown signal amplitude or phase is introduced will be discussed later.

† Cramér (I, p. 494, Example 1). The sample mean and variance are "sufficient estimators," which implies the statement above.
‡ The material of this article is an immediate application of parts of a theory developed in Grenander (I, particularly Arts. 3 and 4).
This is a problem in the testing of statistical hypotheses. The hypothesis H₀ is that s₀(t) was actually sent; the hypothesis H₁ is that s₁(t) was actually sent. The observation is a real-valued function y(t) on the fixed interval 0 ≤ t ≤ T; the observation space is the set of all such functions. We are going to derive a likelihood-ratio test to choose between the two hypotheses. The significance of the likelihood-ratio test and the establishing of a threshold depend on what particular application one has in mind. For example, in radio-teletype transmission, it is usually reasonable to assign a priori probabilities π₀ = ½, π₁ = ½ and equal losses to each kind of error. Then the threshold for the likelihood-ratio test is one.
The first step is to choose a statistic, or set of observable coordinates, and thereby specify a sample space on which to calculate the likelihood ratio. Following Grenander,† we take as observable coordinates a set of weighted averages of y(t) as follows: From Art. 6-4, we know that the noise n(t) can be written

n(t) = \sum_k z_k \phi_k(t) \qquad 0 \le t \le T    (14-53)

where

z_k = \int_0^T n(t)\phi_k^*(t)\,dt \qquad E(z_k) = 0 \qquad E(z_k z_m^*) = \begin{cases} 0 & k \ne m \\ \sigma_k^2 & k = m \end{cases}    (14-54)

and where the φ_k(t) are a set of orthonormal functions satisfying

\int_0^T R_n(s - t)\phi_k(t)\,dt = \sigma_k^2\phi_k(s) \qquad 0 \le s \le T    (14-55)

† Grenander (I), Art. 3.
Since R_n(t) is real, the φ_k(t) may be taken to be real,† and we shall assume that this is done. We take for observable coordinates

y_k = \int_0^T y(t)\phi_k(t)\,dt \qquad k = 1, 2, \ldots    (14-56)

If we define a_k and b_k, k = 1, 2, …, by

a_k = \int_0^T s_0(t)\phi_k(t)\,dt \qquad b_k = \int_0^T s_1(t)\phi_k(t)\,dt    (14-57)

then from Eqs. (14-56), (14-57), and (14-54) we have

y_k = a_k + z_k \quad \text{if } i = 0
y_k = b_k + z_k \quad \text{if } i = 1    (14-58)
From Eq. (14-54) it follows, according to a principle presented in Art. 8-4, that any finite collection of z_k's has a joint gaussian distribution. Each z_k has mean zero, and any two different z_k's are uncorrelated; hence the z_k's are mutually independent (gaussian) random variables. The a_k's and b_k's are numbers, not random variables. Hence, if i = 0, the y_k's are mutually independent gaussian random variables with mean value a_k and variance σ_k². Similarly, if i = 1, the y_k's are mutually independent gaussian random variables with mean value b_k and variance σ_k². The reason for choosing the observable coordinates in this way should now be clear. The coordinates y_k are mutually independent, and hence it is straightforward to write joint probability densities for (y₁, …, y_N), where N is arbitrary. An approximate likelihood ratio can be written that involves only y₁, …, y_N, and then a limit taken as N → ∞. The important point is that this choice of observable coordinates leads to independent random variables. This decomposition in terms of the orthogonal expansion of n(t) is the infinite-dimensional analogue of the diagonalization of the covariance matrix of a finite set of random variables.
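The finite-dimensional analogue just mentioned can be sketched directly: diagonalizing the covariance matrix of two correlated gaussian variables yields coordinates that are uncorrelated, just as projecting n(t) on the φ_k(t) does. The following Python fragment (with a hypothetical 2 × 2 covariance matrix, not from the text) computes the eigenvalues and orthonormal eigenvectors and checks the decorrelation by simulation.

```python
import math
import random

R = [[2.0, 0.8],
     [0.8, 1.0]]   # hypothetical covariance matrix, standing in for R_n

# Eigen-decomposition of a symmetric 2x2 matrix (finite analogue of Eq. 14-55)
half_trace = (R[0][0] + R[1][1]) / 2
disc = math.sqrt(((R[0][0] - R[1][1]) / 2)**2 + R[0][1]**2)
lam1, lam2 = half_trace + disc, half_trace - disc   # eigenvalues ("sigma_k^2")
theta = 0.5 * math.atan2(2 * R[0][1], R[0][0] - R[1][1])
phi1 = (math.cos(theta), math.sin(theta))           # orthonormal eigenvectors
phi2 = (-math.sin(theta), math.cos(theta))          # (finite "phi_k")

rng = random.Random(4)
def sample():
    # generate (n1, n2) with covariance R via a Cholesky-like construction
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    n1 = math.sqrt(R[0][0]) * u
    n2 = (R[0][1] / math.sqrt(R[0][0]) * u
          + math.sqrt(R[1][1] - R[0][1]**2 / R[0][0]) * v)
    return n1, n2

z1s, z2s = [], []
for _ in range(100_000):
    n1, n2 = sample()
    z1s.append(n1 * phi1[0] + n2 * phi1[1])   # z_k = projection on phi_k
    z2s.append(n1 * phi2[0] + n2 * phi2[1])

cross = sum(x * y for x, y in zip(z1s, z2s)) / len(z1s)
print(cross)   # near 0: the new coordinates are uncorrelated
```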
The orthonormal functions φ_k(t) may or may not form a complete‡ set if no restrictions are put on R_n(t) except that it be a correlation function.

† Since R_n(t) is real, if φ_k(t) is a characteristic function, ℜ[φ_k(t)] and ℑ[φ_k(t)] are characteristic functions. It can be shown that if φ₁(t), …, φ_K(t) is a set of linearly independent complex-valued functions, some set of K of their real and imaginary parts is linearly independent. Hence, given a set of K linearly independent complex-valued characteristic functions for a characteristic value λ, a set of K linearly independent real-valued characteristic functions for λ can be found, and from this a set of K real-valued orthonormal characteristic functions.
‡ See Appendix 2, Art. A2-1.
If the functions φ_k(t) are not complete, then there are functions ψ(t) which are orthogonal to all the φ_k(t), i.e., with the property that

\int_0^T \psi(t)\phi_k(t)\,dt = 0    (14-59)

for all k. If in addition some function ψ(t) which satisfies Eq. (14-59) for all k also satisfies

\int_0^T \psi(t)s_0(t)\,dt \ne \int_0^T \psi(t)s_1(t)\,dt    (14-60)

then a correct decision may be made with probability one. This can be shown as follows: let ψ(t) be a function which satisfies Eq. (14-59) for all k and also Eq. (14-60); then

\int_0^T \psi(t)y(t)\,dt = \int_0^T \psi(t)s_0(t)\,dt + \int_0^T \psi(t)n(t)\,dt \quad \text{if } i = 0
\int_0^T \psi(t)y(t)\,dt = \int_0^T \psi(t)s_1(t)\,dt + \int_0^T \psi(t)n(t)\,dt \quad \text{if } i = 1

But the second integral in each of the expressions on the right is equal to zero because, using the representation of Eq. (14-53), since ψ(t) is orthogonal to all the φ_k(t), it is orthogonal to n(t). Hence, if s₀(t) was sent,

\int_0^T \psi(t)y(t)\,dt = \int_0^T \psi(t)s_0(t)\,dt = C_0

whereas if s₁(t) was sent,

\int_0^T \psi(t)y(t)\,dt = \int_0^T \psi(t)s_1(t)\,dt = C_1

and C₀ and C₁ are different. The situation described in this paragraph is called the extreme singular case by Grenander. It does not occur in conventional problems, because, if n(t) is filtered white noise, the φ_k(t) are a complete set of orthonormal functions.† Intuitively, this extreme singular case is that in which the noise can be completely "tuned out" (see Prob. 14-5 for a simple example).
We now turn to the usual situation in which there is no function ψ(t) orthogonal to all the φ_k(t) and satisfying the condition (14-60). This includes the case that the φ_k(t) are a complete set. Since (y₁, …, y_N) is a set of mutually independent gaussian random variables with variances (σ₁², …, σ_N²) and with mean values (a₁, …, a_N) under hypothesis H₀ and (b₁, …, b_N) under hypothesis H₁, the natural logarithm of the likelihood ratio for the first N observable coordinates is, from Eq. (14-25),

\log l_N(y_1,\ldots,y_N) = \sum_{k=1}^N \frac{a_k - b_k}{\sigma_k^2}\left[y_k - \frac{a_k + b_k}{2}\right]    (14-61)

A likelihood-ratio test based on just these N observable coordinates is then, from Eq. (14-24), to choose H₀ if

\sum_{k=1}^N \frac{a_k - b_k}{\sigma_k^2}\left[y_k - \frac{a_k + b_k}{2}\right] > \log \eta    (14-62)
and to choose H₁ otherwise. Before discussing the limiting behavior as N → ∞, it is convenient to put the likelihood ratio in a different form. Let

f_N(t) = \sum_{k=1}^N \frac{a_k - b_k}{\sigma_k^2}\,\phi_k(t)    (14-63)

Then

\log l_N(y_1,\ldots,y_N) = \int_0^T f_N(t)\left[y(t) - \frac{s_0(t) + s_1(t)}{2}\right]dt    (14-64)

We note also, for future use, that by Eq. (14-63),

\int_0^T R_n(u - t)f_N(t)\,dt = \sum_{k=1}^N (a_k - b_k)\phi_k(u)    (14-65)
It can be shown† that log l_N(y₁, …, y_N) converges as N → ∞ (if either hypothesis is true). Thus the limiting form of the likelihood-ratio test is given by

\log l(y) = \lim_{N\to\infty} \int_0^T f_N(t)\left[y(t) - \frac{s_0(t) + s_1(t)}{2}\right]dt > \log \eta    (14-66)

If

\sum_{k=1}^\infty \frac{(a_k - b_k)^2}{\sigma_k^2} = \infty    (14-67)

† Grenander (I, Art. 4).
it can be shown by a relatively straightforward calculation using the Chebyshev inequality that

\log l_N(y_1,\ldots,y_N) \to +\infty \quad \text{in probability if } H_0 \text{ is true}

and

\log l_N(y_1,\ldots,y_N) \to -\infty \quad \text{in probability if } H_1 \text{ is true}
This is also a singular case in which perfect detection is possible in the limit. If the infinite series of Eq. (14-67) converges to a finite limit (the only other alternative, since the series has positive terms), the limiting likelihood ratio is not singular, and we call this the regular case. In some types of systems, natural constraints on signals and noise guarantee that the series of Eq. (14-67) converges. For example, if the noise is considered to be introduced as white noise at the input to the receiver, then the singular case cannot occur. A proof of this is outlined in Prob. 14-6. Let us now discuss further the regular case.
Fixed Signal in Gaussian Noise: Regular Case. The form of the likelihood-ratio test given by Eq. (14-66) is inconvenient in that the likelihood ratio is obtained as a limit and the test function f_N(t) is defined by a series which becomes an infinite series as N → ∞. Formally one can see that if there is no trouble connected with passing to the limit and the limit of f_N(t) is denoted by f(t), then Eqs. (14-64) and (14-65) give

\log l(y) = \int_0^T f(t)\left[y(t) - \frac{s_0(t) + s_1(t)}{2}\right]dt    (14-68)

where f(t) is the solution of the integral equation

\int_0^T R_n(u - t)f(t)\,dt = \sum_{k=1}^\infty a_k\phi_k(u) - \sum_{k=1}^\infty b_k\phi_k(u) = s_0(u) - s_1(u) \qquad 0 \le u \le T    (14-69)

It can be shown† rigorously that if f_N(t) converges in mean square to a function f(t) of integrable square, then the logarithm of the likelihood ratio is given by Eq. (14-68) and f(t) satisfies the integral equation (14-69). Conversely, if a function f(t) of integrable square is a solution of Eq. (14-69), then it can be used in Eq. (14-68) to give the likelihood ratio. If the characteristic functions of R_n(t) are not a complete set, f(t) will not be unique.

† Grenander (I, Art. 4.6).

We shall assume for the rest of this section that the likelihood ratio is specified by Eqs. (14-68) and (14-69). First, let us point out the very close connection between this likelihood-ratio test and the maximum signal-to-noise ratio filter discussed in Art. 11-7. Let us suppose for convenience that s₁(t) ≡ 0. Define the function g(t) by

g(T - t) = f(t) \qquad \text{or} \qquad g(t) = f(T - t)

Then Eq. (14-69) becomes

s_0(u) = \int_0^T R_n(u - t)g(T - t)\,dt = \int_0^T R_n(u + v - T)g(v)\,dv

or

s_0(T - t) = \int_0^T R_n(v - t)g(v)\,dv    (14-70)

and the test inequality becomes

\int_0^T g(T - t)y(t)\,dt > \frac{1}{2}\int_0^T g(T - t)s_0(t)\,dt + \log \eta    (14-71)

From Eq. (14-70) it follows that g(t) is the weighting function of a filter which will maximize signal-to-noise ratio at time T when the input signal is s₀(t), 0 ≤ t ≤ T, and the noise has autocorrelation R_n(t). Hence the test given by Eq. (14-71) can be interpreted as putting the received signal y(t) into this maximum signal-to-noise ratio filter and comparing the output with what is obtained when s₀(t) is put into the same filter.
The probabilities of the two kinds of error can be calculated straightforwardly because log l(y) is a gaussian random variable under either hypothesis. If s₁(t) was actually sent, using Eq. (14-68),

m_1 = E[\log l(y)] = \frac{1}{2}\int_0^T f(t)[s_1(t) - s_0(t)]\,dt    (14-72)

\sigma^2(l) = \sigma^2[\log l(y)] = E\left[\int_0^T\!\!\int_0^T f(t)n(t)f(u)n(u)\,dt\,du\right] = \int_0^T\!\!\int_0^T R_n(t - u)f(t)f(u)\,dt\,du    (14-73)

From Eq. (14-69) it follows that m₁ can be written in the form

m_1 = -\frac{1}{2}\int_0^T\!\!\int_0^T R_n(t - u)f(t)f(u)\,dt\,du    (14-74)

or

m_1 = -\frac{1}{2}\sigma^2(l)    (14-75)

Then the probability that the received signal is identified as s₀(t) when it is actually s₁(t) is

\frac{1}{\sqrt{2\pi}}\int_{\log\eta/\sigma(l)\, +\, \frac{1}{2}\sigma(l)}^{\infty} \exp\left(-\frac{1}{2}u^2\right)du

The probability that the received signal is identified as s₁(t) when it is actually s₀(t) is

\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\log\eta/\sigma(l)\, -\, \frac{1}{2}\sigma(l)} \exp\left(-\frac{1}{2}u^2\right)du    (14-76)
If the noise is white noise, so that the correlation function is an impulse function R_n(t) = Nδ(t), then Eq. (14-69) reduces to

Nf(u) = s_0(u) - s_1(u)

where N is the noise power. Substituting this in Eq. (14-68) yields

\log l(y) = \frac{1}{N}\int_0^T s_0(t)y(t)\,dt - \frac{1}{N}\int_0^T s_1(t)y(t)\,dt - \frac{1}{2N}\int_0^T s_0^2(t)\,dt + \frac{1}{2N}\int_0^T s_1^2(t)\,dt    (14-77)

If now the two signals s₀(t) and s₁(t) have equal energy, the last two terms in Eq. (14-77) cancel, and if η = 1, the likelihood test becomes simply: choose H₀ if

\int_0^T s_0(t)y(t)\,dt > \int_0^T s_1(t)y(t)\,dt    (14-78)

and H₁ otherwise. A detector which instruments the criterion of (14-78) is called a correlation detector. The correlation function cannot be strictly an impulse function, of course, but it can be an approximation to one as far as the integral equation (14-69) is concerned if the spectral density of the noise is fairly flat over a frequency range considerably wider than the frequency band of the signals. This approximation is still reasonable in the usual case of narrow-band signals and white narrow-band noise; the requirements are that the center frequency ω₀ be very much greater than the noise bandwidth and the noise bandwidth be greater than the signal bandwidth.
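A sampled version of the correlation detector of Eq. (14-78) can be sketched as follows (the two equal-energy tones, the sampling rate, and the noise level are hypothetical, and independent per-sample noise stands in for white noise):

```python
import math
import random

M = 200                                  # samples over [0, T]
T = 1.0
dt = T / M
t = [k * dt for k in range(M)]
s0 = [math.sin(2 * math.pi * 5 * tk) for tk in t]   # two equal-energy
s1 = [math.sin(2 * math.pi * 9 * tk) for tk in t]   # FSK-like tones

def correlate(u, v):
    # discrete approximation to integral(u(t) v(t) dt) over [0, T]
    return sum(uk * vk for uk, vk in zip(u, v)) * dt

def detect(y):
    # Eq. (14-78): choose H0 when the correlation with s0 is larger
    return 0 if correlate(s0, y) > correlate(s1, y) else 1

rng = random.Random(5)
noise_sigma = 0.8
errors, trials = 0, 2000
for _ in range(trials):
    sent = rng.randrange(2)
    s = s0 if sent == 0 else s1
    y = [sk + rng.gauss(0, noise_sigma) for sk in s]
    errors += (detect(y) != sent)
print(errors / trials)   # small error probability
```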
14-6. Signals with Unknown Parameters in Gaussian Noise
A more realistic model of a received radio signal than that given by Eq. (14-52) can be had by introducing parameters to account for unknown amplitude, phase, additional terms due to multipath transmission, etc. The simplest model of a signal in gaussian noise which includes such a parameter is

y(t) = \beta s_i(t) + n(t) \qquad 0 \le t \le T, \quad i = 0, 1    (14-79)

Let us discuss detection of a signal of the form given by Eq. (14-79) under two different statistical assumptions: that β is a random variable with known probability distribution, and that β is completely unknown.

Using the notations of the preceding article, we have, instead of Eq. (14-58), the equations

y_k = \beta a_k + z_k \quad \text{if } i = 0, \quad k = 1, 2, \ldots
y_k = \beta b_k + z_k \quad \text{if } i = 1    (14-80)
Suppose now that β is a random variable independent of the noise n(t) and with a probability density function p(β). Then on the sample space of the first N of the y_k's there are the probability densities

p_0(y_1,\ldots,y_N) = \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{N/2}\sigma_1\cdots\sigma_N}\exp\left[-\frac{1}{2}\sum_{k=1}^N \frac{(y_k - \beta a_k)^2}{\sigma_k^2}\right] p(\beta)\,d\beta    (14-81)

p_1(y_1,\ldots,y_N) = \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{N/2}\sigma_1\cdots\sigma_N}\exp\left[-\frac{1}{2}\sum_{k=1}^N \frac{(y_k - \beta b_k)^2}{\sigma_k^2}\right] p(\beta)\,d\beta    (14-82)

if s₀(t) was actually sent or s₁(t) was actually sent, respectively. The likelihood ratio l_N(y₁, …, y_N) is then

l_N(y_1,\ldots,y_N) = \frac{\displaystyle\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}\sum_{k=1}^N \frac{(y_k - \beta a_k)^2}{\sigma_k^2}\right] p(\beta)\,d\beta}{\displaystyle\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}\sum_{k=1}^N \frac{(y_k - \beta b_k)^2}{\sigma_k^2}\right] p(\beta)\,d\beta}    (14-83)

The limit of l_N as N → ∞ is the likelihood ratio on the sample space of all the y_k.
If there is actually very little known about β, it may be reasonable to assign it a uniform distribution between two limits β₁ and β₂. Then

p(\beta) = \frac{1}{\beta_2 - \beta_1} \qquad \beta_1 \le \beta \le \beta_2
p(\beta) = 0 \qquad\qquad\quad\ \text{otherwise}    (14-84)

After canceling common factors, the likelihood ratio is

l_N(y_1,\ldots,y_N) = \frac{\displaystyle\int_{\beta_1}^{\beta_2} \exp\left(\beta\sum_{k=1}^N \frac{a_k y_k}{\sigma_k^2} - \frac{\beta^2}{2}\sum_{k=1}^N \frac{a_k^2}{\sigma_k^2}\right)d\beta}{\displaystyle\int_{\beta_1}^{\beta_2} \exp\left(\beta\sum_{k=1}^N \frac{b_k y_k}{\sigma_k^2} - \frac{\beta^2}{2}\sum_{k=1}^N \frac{b_k^2}{\sigma_k^2}\right)d\beta}
This cannot be expressed in a very simple form. If we let

Y_0 = \sum_{k=1}^N \frac{a_k y_k}{\sigma_k^2} \qquad A^2 = \sum_{k=1}^N \frac{a_k^2}{\sigma_k^2}

Y_1 = \sum_{k=1}^N \frac{b_k y_k}{\sigma_k^2} \qquad B^2 = \sum_{k=1}^N \frac{b_k^2}{\sigma_k^2}

then

l_N(y_1,\ldots,y_N) = \frac{B}{A}\,\exp\left(\frac{Y_0^2}{2A^2} - \frac{Y_1^2}{2B^2}\right)\frac{F(A\beta_2 - Y_0/A) - F(A\beta_1 - Y_0/A)}{F(B\beta_2 - Y_1/B) - F(B\beta_1 - Y_1/B)}    (14-85)

where F is the cumulative normal distribution

F(X) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{X} \exp\left(-\frac{x^2}{2}\right)dx
It is not difficult in principle to introduce parameters with known probability densities, but as indicated even by this relatively simple example, the calculations become very unwieldy.

If the amplitude parameter β is unknown and there is no a priori information about it, then deciding at the receiver whether s₀(t) or s₁(t) was sent is a problem of testing between two composite hypotheses. The generalized likelihood ratio on the space of the first N observable coordinates has already been calculated in Example 14-3.5 and is given by Eq. (14-38). The limit of this as N → ∞ is the likelihood ratio on the space of all the y_k.
This likelihood test can also be interpreted in terms of filters which maximize signal-to-noise ratio. Using the notations c_N and d_N as in Eq. (14-37), we shall suppose that

c^2 = \lim_{N\to\infty} c_N^2 \qquad d^2 = \lim_{N\to\infty} d_N^2    (14-86)

are finite and that the integral equations

\int_0^T f_0(t)R_n(u - t)\,dt = s_0(u)    (14-87)

\int_0^T f_1(t)R_n(u - t)\,dt = s_1(u)    (14-88)
348 RANDOM: SIGNALS AND NOISE
have solutions f_0(t) and f_1(t) which are of integrable square. Then, by a
development like that which produced Eq. (14-69), we have

lim_{N→∞} Σ_{k=1}^N a_k y_k/σ_k² = ∫₀ᵀ f_0(t)y(t) dt    (14-89a)
lim_{N→∞} Σ_{k=1}^N b_k y_k/σ_k² = ∫₀ᵀ f_1(t)y(t) dt    (14-89b)

From Eq. (14-38) this gives, for the logarithm of the likelihood ratio
on the space of all the y_k,

log l(y) = (1/2c²)[∫₀ᵀ f_0(t)y(t) dt]² − (1/2d²)[∫₀ᵀ f_1(t)y(t) dt]²
If now the functions g_0(t) and g_1(t) are defined by

g_0(T − t) = f_0(t)    (14-90a)
g_1(T − t) = f_1(t)    (14-90b)

we have, as in Eq. (14-71),

log l(y) = (1/2c²)[∫₀ᵀ g_0(T − t)y(t) dt]² − (1/2d²)[∫₀ᵀ g_1(T − t)y(t) dt]²    (14-91)

where g_0(t) and g_1(t) satisfy

∫₀ᵀ R_n(v − t)g_0(t) dt = s₀(T − v)    0 ≤ v ≤ T    (14-92a)
∫₀ᵀ R_n(v − t)g_1(t) dt = s₁(T − v)    0 ≤ v ≤ T    (14-92b)
Thus the terms in parentheses in Eq. (14-91) can be regarded as outputs
of filters which maximize the signal-to-noise ratio for the signals s₀(t) and
s₁(t) respectively. If the signals are normalized so that c² = d² and the
likelihood threshold η = 1, the likelihood test may be implemented by
putting the received signal through each of these two "matched filters"
and choosing s₀(t) or s₁(t) according to which of the filters has the greater
output at time T. (See Fig. 14-6.)
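The matched-filter receiver described above can be sketched numerically for the special case of white noise, where the filter weighting functions are proportional to the signals themselves. The sample rate, signal shapes, and noise level below are assumed toy values, not anything from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# White-noise sketch of the two-filter receiver: when R_n is (nearly) flat,
# the solutions of the integral equations reduce to the signals themselves,
# so each "matched filter" output at time T is just the correlation of the
# received waveform with s_0 or s_1 over (0, T).
fs = 1000                         # assumed sample rate
t = np.arange(0, 1.0, 1 / fs)     # symbol interval 0 <= t <= T = 1
s0 = np.cos(2 * np.pi * 5 * t)    # equal-energy binary signals (c^2 = d^2)
s1 = np.cos(2 * np.pi * 9 * t)

def decide(y):
    # likelihood threshold eta = 1: choose whichever output is larger
    out0 = np.dot(y, s0) / fs
    out1 = np.dot(y, s1) / fs
    return 0 if out0 > out1 else 1

received = s1 + 0.5 * rng.standard_normal(t.size)  # s_1 sent, in noise
choice = decide(received)
```

Because the two cosines are orthogonal over the interval, each filter responds only to its own signal, and the comparison at time T implements the likelihood test.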
The generalized maximum-likelihood test which we have just used is
clearly equivalent to the following statistical procedure: Assuming H₀ is
true, a maximum-likelihood estimate is made for the unknown parameter
(or parameters); this estimated value (or values) is then used in place of
the actual parameter value in determining a probability density on the
(Figure: the received signal βs_i(t) + n(t) is applied to two weighting
functions, g_0(t) and g_1(t) — matched filters for s₀(t) and s₁(t) — and
their outputs are compared at time T.)
FIG. 14-6. Optimum receiver for binary signals in gaussian noise.
sample space under hypothesis H₀. Then another estimate (or estimates)
is made, assuming H₁ is true, and this estimated value is used in
determining a probability density under hypothesis H₁. The ratio of
these two densities is the generalized likelihood ratio. Thus, parameter
estimation is an integral part of the hypothesis-testing procedure.
Let us consider very briefly the problem of estimating β and τ in

y(t) = βs(t − τ) + n(t)    (14-93)

where now s(t) is presumed to be defined outside the interval 0 ≤ t ≤ T
as well as inside it. Let

s_k(β,τ) = β ∫₀ᵀ s(t − τ)φ_k(t) dt    (14-94)

Then the probability density for the first N observable coordinates for
fixed β and τ is

p_N(y_1, …, y_N; β,τ) = Π_{k=1}^N (2πσ_k²)^{−1/2} exp{−[y_k − s_k(β,τ)]²/2σ_k²}    (14-95)

and the likelihood equations are obtained by setting the derivatives with
respect to the parameters equal to zero:

∂ log p_N/∂β = 0    ∂ log p_N/∂τ = 0    (14-96)

Solutions of these equations for β and τ as functions of y_1, …, y_N are
maximum-likelihood estimates. It is hard to say much about these solutions
until the form of s(t) is specified. As an example, let us take s(t)
to be a sine wave with an integral number of periods in the time interval
0 ≤ t ≤ T, and estimate τ:

s(t) = cos mω₀t    (14-97)

Then, from Eq. (14-94),

s_k(β,τ) = β ∫₀ᵀ cos mω₀(t − τ)φ_k(t) dt = βc_k cos mω₀τ + βd_k sin mω₀τ    (14-98)

where

c_k = ∫₀ᵀ φ_k(t) cos mω₀t dt    (14-99a)
d_k = ∫₀ᵀ φ_k(t) sin mω₀t dt    (14-99b)
Substituting from Eq. (14-98) and its derivatives into Eqs. (14-95)
and (14-96) and eliminating β yields

tan mω₀τ = [(Σ_{k=1}^N c_k d_k/σ_k²)(Σ_{k=1}^N c_k y_k/σ_k²) − (Σ_{k=1}^N c_k²/σ_k²)(Σ_{k=1}^N d_k y_k/σ_k²)] / [(Σ_{k=1}^N c_k d_k/σ_k²)(Σ_{k=1}^N d_k y_k/σ_k²) − (Σ_{k=1}^N d_k²/σ_k²)(Σ_{k=1}^N c_k y_k/σ_k²)]    (14-100)

If the limits exist, this becomes

tan mω₀τ = [E ∫₀ᵀ y(t)g(t) dt − C ∫₀ᵀ y(t)h(t) dt] / [E ∫₀ᵀ y(t)h(t) dt − D ∫₀ᵀ y(t)g(t) dt]    (14-101)

where

C = Σ_{k=1}^∞ c_k²/σ_k²    D = Σ_{k=1}^∞ d_k²/σ_k²    E = Σ_{k=1}^∞ c_k d_k/σ_k²

and g(t) and h(t) satisfy

∫₀ᵀ R_n(u − t)g(t) dt = cos mω₀u    0 ≤ u ≤ T
∫₀ᵀ R_n(u − t)h(t) dt = sin mω₀u    0 ≤ u ≤ T    (14-102)

If the noise is white noise, Eq. (14-101) reduces to

tan mω₀τ = ∫₀ᵀ y(t) sin mω₀t dt / ∫₀ᵀ y(t) cos mω₀t dt

as one would anticipate. A principal value for τ from Eqs. (14-100),
(14-101), and (14-102) is then an estimate of the phase of s(t).
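The white-noise form of the phase estimate above is easy to try numerically: correlate the received waveform with sine and cosine references and take a principal value of the arctangent. The delay, noise level, and sample count below are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(1)

# White-noise phase estimate: the ratio of the sine and cosine correlations
# gives tan(m*omega_0*tau); arctan2 supplies a principal value.
m_w0 = 2 * np.pi * 4                   # four whole periods in 0 <= t <= 1
tau_true = 0.07                        # hypothetical delay to be estimated
t = np.linspace(0.0, 1.0, 4000, endpoint=False)
dt = t[1] - t[0]
y = np.cos(m_w0 * (t - tau_true)) + 0.3 * rng.standard_normal(t.size)

s_int = np.sum(y * np.sin(m_w0 * t)) * dt   # Riemann-sum correlations
c_int = np.sum(y * np.cos(m_w0 * t)) * dt
tau_hat = np.arctan2(s_int, c_int) / m_w0
```

The estimate is unambiguous only while mω₀τ lies within one principal branch, which is the usual phase-ambiguity limitation of this estimator.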
General Remarks. Although the choice of observable coordinates which
we have made is in a sense the most natural one for problems involving
gaussian noise and finite observation times, it is of course not the only
possible one. Another natural way to make observations is to sample
the received signal at discrete intervals. If the sampling intervals are
appreciably longer than the "correlation time," the samples are approximately
independent gaussian random variables; hence it is relatively easy
to write approximately the joint probability density functions for the
samples in order to calculate the likelihood ratio. Some information
available in the received signal is lost when this is done. If the sampling
intervals are taken to be short, the joint probability densities of the
samples require the inversion of the correlation matrix for the samples.
The detection of a sine wave in noise has been treated using sampling
at intervals.† In the limit as the intervals between samples approach
zero, the matrix inversion becomes formally the solution of the integral
equation (14-69).
We have restricted ourselves in Arts. 14-5 and 14-6 to what might be
termed synchronous communication, in which every symbol is represented
by a function of the same time duration. Radio teletype often
meets this condition; Morse code does not. However, one can think of
Morse-code symbols as being further divided into symbols on basic intervals
which do meet this condition.‡ Thus if the dot, dash, symbol
space, and letter space have relative time durations of, say, 1, 3, 1, 3,
then the length of a dot may be taken as the basic time interval and each
of the Morse-code symbols may be expressed in terms of marks and spaces
on this basic interval. In general, of course, any code can be broken up
into a more elementary one in which each symbol has the same duration,
if the symbol durations in the original code are rationally related.
Also, we have concerned ourselves only with a per-symbol analysis of
the received signal. In a more complete study, one should analyze the
interrelations among symbols. The parameters which we have treated
as unknown constants might actually be constants, or might vary with
time, but slowly enough so that for the duration of one symbol they are
nearly constant. In either case, information is obtained implicitly with
the reception of each symbol which should be useful in estimating parameter
values for succeeding symbols.
One of the more difficult practical communication problems to which
statistical theory can be fruitfully applied is that of long-distance multipath
propagation, referred to in Art. 14-1. Rather complicated statistical
models of the received signals are necessary to account for the distortion
caused by the fluctuating multiple paths. Some results have been
obtained in this area, for which the reader is referred to the literature.§
† See, for example, Reich and Swerling (I).
‡ This is strictly true only if the keying is done by a machine rather than by hand.
§ See Price (I), Price (III), Turin (I), and Root and Pitcher (II).
An additional restriction on the material of Arts. 14-5 and 14-6 is that
it applies only to two-symbol alphabets. The use of more than two symbols
does not change the problem much, however, as long as the number
of symbols is finite. If a priori probabilities exist for the symbols, then a
decision may be made that will minimize the probability of error; if no
a priori probabilities are given, the likelihood may be maximized. In
either case the calculations are essentially the same as those above.
If the communication is not carried out with a discrete set of symbols,
as with ordinary voice radio, for example, the reception problem is to
recover the form of the transmitted signal as accurately as possible.
That is, the reception problem is a smoothing problem such as those
discussed in Chap. 11. The maximum-likelihood method has also been
applied to the extraction of continuously modulated signals from noise.†
14-7. Radar Signals in Gaussian Noise
Most of the early applications of statistical methods to radio-wave
detection were to radar.‡ The initial problem was to find an optimum
receiver for detecting target-reflected signals in noise using a pulsed
transmitter, and we shall confine the discussion here to this question.
Statistical methods are appropriate in the treatment of many other radar
problems, however, such as detecting a target in the presence of sea
clutter, measuring accurately range and radial velocity with either pulsed
or modulated continuous-wave radars, and measuring accurately azimuth
and elevation, particularly with scanning radars.
We shall consider a "conventional" radar emitting a periodic sequence
of rectangular pulses of radio waves. The signal is as shown in Fig. 14-1.
For simplicity we shall assume the target to be stationary and to have a
fixed reflecting surface, and we shall assume the radar to be trained
permanently in one direction. The expression for the received signal
given by Eq. (14-3) for a target at range r then becomes simply

y(t) = A(t − τ) cos ω₀(t − τ) + n(t)    (14-103)

Finally, we shall assume that the radar does what is known as range
gating. This means that the time between pulses is divided into equal
intervals, which correspond to intervals of range, and a decision is
required for each interval as to whether a reflected signal is present or
not. The pulse length T is taken to be equal to the length of these
intervals. (See Fig. 14-7.)
If we look at any one of these range intervals, the problem is to make a
choice between two alternative hypotheses: H₀, there is no target present;
† Youla (I).
‡ For example, Lawson and Uhlenbeck (I), Woodward (I), Marcum (I), Hanae (I),
Schwartz (I), Middleton (IV).
(Figure: transmitted pulse train and received signal, with pulse period T₀
and the selected range interval of length T marked.)
FIG. 14-7. Range-gated pulsed-radar return.
and H₁, there is a target present. It will be assumed that if a target is
present, it is at the beginning of the range interval. For a single pulse,
then, H₀ is the hypothesis that a particular section of received signal of
T seconds duration is noise alone, and H₁ is the hypothesis that it is
sine-wave-plus-noise. This is a special case of the hypothesis-testing
problem discussed in Art. 14-5. As in Eqs. (14-70) and (14-71) we see
that the outcome of a maximum-likelihood test is to choose H₁, i.e., to
decide a target is present, if

∫₀ᵀ g(T − t)y(t) dt > const.    (14-104)

where g(t) is determined by

∫₀ᵀ R_n(v − t)g(v) dv = cos ω₀(T − t)    0 ≤ t ≤ T    (14-105)

and the constant is determined by the level of the test. In the usual
radar terminology the false-alarm rate is the average ratio of false target
sightings to the total number of decisions when no target is present.
The false-alarm rate is therefore equal to the probability of rejecting
H₀ when H₀ is true; that is, the false-alarm rate is equal to the level of
the test. It was shown in Example 14-3.1 that the maximum-likelihood
test on y₁, …, y_N is a uniformly most powerful test of the null hypothesis
H₀ against a composite hypothesis which includes all positive amplitudes
of the signal. The argument given there carries over immediately
to the limiting case as N → ∞; hence the test described above is uniformly
most powerful at its level, or false-alarm rate, against the hypothesis
that there is a target echo of any positive amplitude in the received
signal.
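The identity between false-alarm rate and test level can be checked by simulation. The sketch below uses a discrete white-noise stand-in for the matched-filter statistic; the echo shape, sample count, and level α are assumed values, and the bisection inverse of the normal distribution is just a convenience to avoid extra dependencies.

```python
import numpy as np
from math import sqrt, erf

rng = np.random.default_rng(2)

def phi(x):
    # standard normal cumulative distribution
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def phi_inv(p, lo=-10.0, hi=10.0):
    # bisection inverse of phi
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Discrete stand-in for the matched-filter statistic: under H0 it is a
# Gaussian with known variance, so the threshold ("const.") can be set
# for any desired level alpha; the observed false-alarm rate should match.
n_trials, n_samp, alpha = 200_000, 64, 0.05
s = np.cos(2 * np.pi * np.arange(n_samp) / n_samp)  # hypothetical echo shape
threshold = np.sqrt(np.dot(s, s)) * phi_inv(1.0 - alpha)

stats = rng.standard_normal((n_trials, n_samp)) @ s  # noise-only trials
false_alarm_rate = float(np.mean(stats > threshold))
```

The measured rate sits at the chosen level to within Monte Carlo error, which is exactly the sense in which the level of the test *is* the false-alarm rate.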
Usual radar practice is to have a sequence of returned pulses to examine
instead of only one. Suppose the radar is trained in one direction long
enough for K pulses to be returned, again considering only one range
interval. The portion of the received waveform which is of interest is as
shown in Fig. 14-7. It will be assumed, quite reasonably, that the noise
bandwidth is great enough so that the noise is completely uncorrelated
with itself one pulse period removed. Thus the noise in each section of
the received waveform, as shown in Fig. 14-7, will be independent of
that in each other section.
Take the pulse period to be T₀ and let t be measured from the leading
edge of the range interval in question after each pulse. Let
y^(1)(t) = y(t), 0 ≤ t ≤ T, be the first section of received waveform,
y^(2)(t) = y(t + T₀), 0 ≤ t ≤ T, be the second section of received waveform,
and so on. Suppose that the modulating pulses are phase-locked
to the rf carrier so that the signal s₁(t) is identical for each section of the
received waveform. Let

y_m^(1) = ∫₀ᵀ y^(1)(t)φ_m(t) dt
y_m^(2) = ∫₀ᵀ y^(2)(t)φ_m(t) dt
· · ·
y_m^(K) = ∫₀ᵀ y^(K)(t)φ_m(t) dt    (14-106)

Then, since y_m^(p) is independent of y_m^(q), p ≠ q, we have the probability
densities

p₀ = Π_{m=1}^N Π_{j=1}^K (2πσ_m²)^{−1/2} exp[−(y_m^(j))²/2σ_m²]    (14-107)

if there is noise alone and

p₁ = Π_{m=1}^N Π_{j=1}^K (2πσ_m²)^{−1/2} exp[−(y_m^(j) − b_m)²/2σ_m²]    (14-108)

if there is a target present. The logarithm of the likelihood ratio is

log l_N[y₁^(1), …, y_N^(K)] = −(K/2) Σ_{m=1}^N b_m²/σ_m² + Σ_{m=1}^N y_m^(1)b_m/σ_m² + ··· + Σ_{m=1}^N y_m^(K)b_m/σ_m²    (14-109)
Passing to the limit and introducing the test function f(t) defined by
Eq. (14-69) gives

log l(y) = ∫₀ᵀ f(t)[y^(1)(t) + y^(2)(t) + ··· + y^(K)(t) − (K/2)s₁(t)] dt    (14-110)
From this formula for log l(y) one can interpret the likelihood-ratio test in
terms of signal-to-noise-ratio-maximizing filters as before. The result is
as follows: The received waveform for the selected time interval after
each pulse is put into such a filter. The output from the filter at the
end of each interval is stored, and after K pulses, these outputs are added.
If their sum exceeds an established threshold, the decision is that a target
is present. Again, this is a uniformly most powerful test at its level for
any positive signal amplitude. As given by Eq. (14-110), log l(y) is a
gaussian random variable, so the probability of each kind of error can be
determined from the means and variances. With the notations used in
Eqs. (14-72) and (14-73), the probability that a target is missed is

(1/√(2πK) σ^(1)) ∫_{−∞}^{log η} exp[−(x − Km^(1))²/2K(σ^(1))²] dx

or

(1/√2π) ∫_{√K[m^(1)/σ^(1) − (log η)/(Kσ^(1))]}^{∞} exp(−u²/2) du    (14-111)

and the probability of a false alarm is

(1/√2π) ∫_{√K[(log η)/(Kσ^(0)) − m^(0)/σ^(0)]}^{∞} exp(−u²/2) du    (14-112)

These error probabilities reduce of course to the expressions given by
(14-75) and (14-76) when K = 1.
The probability-of-error formulas (14-111) and (14-112) are the same
as would be obtained if, instead of K pulses, only one pulse were received
but with magnitude √K times greater. This can be easily verified by
multiplying s₁(t) by √K and substituting in Eqs. (14-69), (14-72), and
(14-73). Thus the effective voltage signal-to-noise ratio at the receiver
(for the detector being considered) is proportional to the square root of
the number of independent pulses received from one target.
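The √K equivalence can be illustrated with a Monte Carlo sketch: summing K independent filter outputs of mean μ and unit noise variance gives the same error probability (at correspondingly scaled thresholds) as one output whose signal mean is √K·μ. All numbers below are toy values.

```python
import numpy as np

rng = np.random.default_rng(3)

# sqrt(K) rule: the K-pulse sum statistic is N(K*mu, K), while a single
# pulse of amplitude sqrt(K)*mu gives N(sqrt(K)*mu, 1); with thresholds at
# half the signal mean in each case, the miss probabilities coincide.
K, mu, n = 16, 1.0, 400_000
sum_stat = rng.normal(mu, 1.0, (n, K)).sum(axis=1)   # K-pulse statistic
one_stat = rng.normal(np.sqrt(K) * mu, 1.0, n)       # one bigger pulse

p_miss_sum = float(np.mean(sum_stat < K * mu / 2))
p_miss_one = float(np.mean(one_stat < np.sqrt(K) * mu / 2))
```

Both miss probabilities come out near the common theoretical value, the tail of a standard normal at deflection √K·μ/2.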
Envelope Detector. In the preceding section, the "received" signal,
here written z(t), was taken to be

z(t) = s_i(t) + n(t)    i = 0, 1;  0 ≤ t ≤ T
s₀(t) ≡ 0    (14-113)

with n(t) gaussian noise. The sort of matched-filter techniques which
have been developed for optimum reception of such signals are sometimes
difficult to implement practically because they are sensitive to
RF phase. For this reason, it is often desirable to take the envelope of
the signal first, i.e., to rectify the signal and put it through a low-pass
filter in the conventional way so that only the modulating frequencies
are left, before doing any special signal processing. It should be realized
that doing this involves throwing away some information of possible
value.
When dealing with a signal z(t) as given by Eq. (14-113), it is expedient
to expand z(t) in the orthogonal series using the characteristic functions
of the correlation function of the noise, because the random coefficients
of the expansion are gaussian and statistically independent. However,
nothing is gained by expanding the envelope of z(t) in such a series; the
coefficients are no longer either gaussian or statistically independent.
Consequently, a different set of observable coordinates will be chosen.
The usual procedure, which we shall follow here, is to use for observable
coordinates samples of the envelope waveform taken at regular intervals
and far enough apart so that it is a reasonable approximation to suppose
them statistically independent. We suppose, as before, that the time
between pulses is divided into intervals of equal length and that it is
desired to test for the presence of a target echo in each one of these
intervals. The simplifying assumptions made in the previous section
are in effect.
We also suppose the radar is trained in a fixed direction long enough
for K pulses to be returned from a target, and we look at a particular
range interval. The observable coordinates y₁, …, y_K are chosen to
be a sequence of values of the envelope, one from the time interval in
question after each pulse.
The signal input to the second detector (demodulator) of the radar
receiver is

z(t) = s_i(t) + n(t)    0 ≤ t ≤ T;  i = 0, 1
s₀(t) ≡ 0

as before. This can be written in the form of a narrow-band signal.
From Eq. (8-82) we have

z(t) = A_i cos ω_c t + x_c(t) cos ω_c t − x_s(t) sin ω_c t    i = 0, 1    (14-114)
A₀ = 0,  A₁ = const.
where x_c and x_s are gaussian random variables with the properties stated
in Art. 8-5. The demodulator is assumed to be a perfect envelope detector,
so its output y(t) is, from Eqs. (8-79) and (8-85a),

y(t) = {[A_i + x_c(t)]² + x_s²(t)}^{1/2}    i = 0, 1    (14-115)

From Eq. (8-91), the probability density for the envelope at time t on the
hypothesis that the sampled waveform is noise alone is

p₀(y_t) = (y_t/σ_n²) exp(−y_t²/2σ_n²)    y_t ≥ 0    (14-116)
       = 0    otherwise

where σ_n² = R_n(0). The probability density for the envelope at time t
on the hypothesis that the sampled waveform is sine-wave-plus-noise is,
from Eq. (8-115),

p₁(y_t) = (y_t/σ_n²) exp[−(y_t² + A₁²)/2σ_n²] I₀(A₁y_t/σ_n²)    y_t ≥ 0
       = 0    otherwise
The likelihood ratio for the samples y₁ = y_{t₁}, …, y_K = y_{t_K} is, to a
good approximation,

l_K(y₁, …, y_K) = Π_{i=1}^K p₁(y_i)/p₀(y_i) = Π_{i=1}^K exp(−A₁²/2σ_n²) I₀(A₁y_i/σ_n²)    (14-117)

since the y_i are nearly independent. Thus a likelihood test is to choose
hypothesis H₁ (target present) if

Σ_{i=1}^K log I₀(A₁y_i/σ_n²) > log η + KA₁²/2σ_n² = const.    (14-118)
The Bessel function I₀(z) has the series expansion

I₀(z) = Σ_{k=0}^∞ (z/2)^{2k}/(k!)² = 1 + z²/4 + z⁴/64 + ···

Hence if the target echo has a small amplitude (low signal-to-noise ratio)
it is approximately true that

log I₀(A₁y_i/σ_n²) ≈ log(1 + A₁²y_i²/4σ_n⁴)    (14-119)

Using the further approximation, valid for small values of the argument,

log (1 + a) ≈ a    (14-120)

the test given by (14-118) becomes, for small signals,

(A₁²/4σ_n⁴) Σ_{i=1}^K y_i² > const.

or simply

Σ_{i=1}^K y_i² > const.    (14-121)
The small-signal approximation can often be justified in practice by the
idea that it is only necessary to have a near-optimum detector for weak
echoes, since strong echoes will be detected even if the receiver is well
below optimum. This test is easy to implement, since the y_i² are just
the outputs at regularly spaced time intervals of a square-law demodulator.
Detection probabilities for the square-law envelope detector have
been tabulated by Marcum.†
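A small simulation shows both halves of the argument above: the small-argument approximation to log I₀, and the square-law statistic Σy² applied to simulated envelope samples (Rayleigh-distributed under noise alone, Rice-distributed with a sine wave present). The amplitude, noise variance, K, and false-alarm level are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(4)

# Square-law envelope detector on simulated envelope samples.
K, A1, sigma, n = 10, 1.0, 1.0, 20_000

def envelopes(A, shape):
    # envelope of a sine wave of amplitude A in gaussian noise:
    # sqrt((A + x_c)^2 + x_s^2), Rayleigh when A = 0, Rice otherwise
    xc = rng.normal(A, sigma, shape)
    xs = rng.normal(0.0, sigma, shape)
    return np.hypot(xc, xs)

t0 = (envelopes(0.0, (n, K)) ** 2).sum(axis=1)  # noise-only statistics
t1 = (envelopes(A1, (n, K)) ** 2).sum(axis=1)   # target-present statistics
thr = np.quantile(t0, 0.95)                     # 5 per cent false-alarm rate
detection_prob = float(np.mean(t1 > thr))

# small-argument chain: log I0(z) is close to z^2/4 for small z
small_z = 0.1
exact = float(np.log(np.i0(small_z)))
approx = small_z ** 2 / 4
```

Even at this modest signal-to-noise ratio the square-law sum detects far more often than the 5 per cent false-alarm rate, and the log I₀ approximation error is negligible at small arguments.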
Phase Detector. As a final example, we shall consider what is called a
phase detector, which can be useful for the detection of moving objects.
Again we want to test for the presence of a target echo in a fixed range
interval. The assumptions made for the above examples are still to
hold. The input signal to the demodulator is given by Eq. (14-114),
which can be rewritten in the form

z(t) = y_i(t) cos [ω_c t + φ_i(t)]    i = 0, 1    (14-122)

where

y₀(t) = [x_c²(t) + x_s²(t)]^{1/2}
y₁(t) = {[A₁ + x_c(t)]² + x_s²(t)}^{1/2}

and

φ₀(t) = tan⁻¹ [x_s(t)/x_c(t)]
φ₁(t) = tan⁻¹ [x_s(t)/(A₁ + x_c(t))]    (14-123)
The observable coordinates are taken to be a sequence of K values of the
phase φ(t), one from the time interval in question after each pulse (these
values of phase must themselves be got by what amounts to an estimation
procedure; see Art. 14-6). If there is noise alone, the probability density
function for the phase at any time t is

p₀(φ_t) = 1/2π    −π ≤ φ_t ≤ π
       = 0    otherwise
If there is a sine wave present in the noise, the probability density of
φ_t is 2π times the density given by Eq. (8-118). For large signal-to-noise
ratio, A₁²/2σ_n² very much greater than 1, φ(t) is usually small and the
approximate density function given by Eq. (8-114) can be used.
Replacing cos φ_t by 1 and sin φ_t by φ_t and multiplying by 2π so as to get
the density for fixed t, one obtains

p₁(φ_t) = (1/√2π)(A₁/σ_n) exp(−A₁²φ_t²/2σ_n²)    −π ≤ φ_t ≤ π
       = 0    otherwise

† Marcum (I).
Then the likelihood ratio for K independent samples is

l_K(φ₁, …, φ_K) = (2π)^{K/2} (A₁/σ_n)^K exp[−(A₁²/2σ_n²)(φ₁² + ··· + φ_K²)]    (14-124)

and the likelihood test gives the decision criterion: no target present if

Σ_{k=1}^K φ_k² > const.    (14-125)
This test can be modified slightly to be made a test to detect targets
moving with a known constant velocity toward the radar. The effect of
target velocity is to give a doppler shift of frequency. For ordinary
pulse lengths, and for targets such as ships or airplanes, this shift in frequency
results in an almost imperceptible total phase shift during one
pulse length. Thus one can think of a pure echo signal as having constant
phase over one pulse length but with this phase advancing linearly
from pulse to pulse. The kth returned pulse should then have phase kγ,
where γ is a constant depending on the velocity of the target. The
observable coordinates are then taken to be φ_k − kγ instead of φ_k,
and the test for no target at the given velocity is

Σ_{k=1}^K (φ_k − kγ)² > const.
14-8. Problems
1. Find a test for deciding between the hypotheses H₀ and H₁ which minimizes the
expected loss. H₀ is the hypothesis that an observed real number x is distributed
according to the rectangular distribution

p₀(x) = ¼    0 ≤ x ≤ 4
     = 0    otherwise

H₁ is the hypothesis that x is distributed according to

p₁(x) = (1/√2π) exp(−x²/2)

The a priori probabilities are p₀ = ¼, p₁ = ¾. The loss values for each kind of error
are equal. Find the total probability of error.
2. a. Show that the likelihood ratio as defined by Eq. (14-30) gives a hypothesis
test which is equivalent to a test using either

l′(y) = max_a p₁ₐ(y) / max_a p₀ₐ(y)

or

l′′(y) = max_a p₁ₐ(y) / max over both families of densities evaluated at y

b. H₀ is the simple hypothesis that an observed real number x is distributed according
to

p₀(x) = (1/√2π) e^{−x²/2}

H₁(b) is the simple hypothesis

p_b(x) = (b/2) e^{−b|x|}

H₁ is the composite hypothesis containing all the H₁(b) for b > 0. Determine
the generalized likelihood ratio for H₀ against H₁.
3. Let p(y;a) be the family of probability density functions

p(y;a) = 0             y < a − 1
       = y + 1 − a     a − 1 ≤ y < a
       = −y + 1 + a    a ≤ y < a + 1
       = 0             a + 1 ≤ y

Estimate the parameter a by maximizing the likelihood:
a. One observation y₁ of y is made:  y₁ = 2.
b. Three independent observations of y are made:  y₁ = 1, y₂ = 2½, y₃ = 2.
4. N independent samples y₁, …, y_N are drawn from a population which is
known to be distributed according to a gaussian law but with unknown mean m and
variance σ².
a. Show that the maximum-likelihood estimator of the mean is the sample mean ȳ,
given by

ȳ = (1/N) Σ_{k=1}^N y_k

and the maximum-likelihood estimator of the variance is the sample variance s²,
given by

s² = (1/N) Σ_{k=1}^N (y_k − ȳ)²

b. Show that ȳ is an unbiased estimator of m but that s² is a biased estimator of σ².
In particular,

E(ȳ) = m    E(s²) = ((N − 1)/N) σ²
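The bias result in part b of the problem above is easy to confirm by simulation; the particular N, variance, and trial count below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)

# Monte Carlo check of the bias of the sample variance: with
# s^2 = (1/N) sum (y_k - ybar)^2, the expectation is (N-1)/N * sigma^2.
N, sigma2, trials = 5, 4.0, 200_000
y = rng.normal(0.0, np.sqrt(sigma2), (trials, N))
ybar = y.mean(axis=1, keepdims=True)
s2 = ((y - ybar) ** 2).mean(axis=1)
mean_s2 = float(s2.mean())   # expected to be near (N-1)/N * sigma2 = 3.2
```

The simulated mean of s² falls at 3.2 rather than 4.0, which is the (N−1)/N bias factor for N = 5.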
5. Let a sample waveform be given by

y(t) = s_i(t) + n(t)    0 ≤ t ≤ 2π,  i = 0, 1
s₀(t) = cos 3t
s₁(t) = cos 2t
n(t) = x₁ cos t + x₂ sin t

with x₁ and x₂ independent gaussian random variables with mean zero and variance σ².
(Figure: (a) the unit impulse function, of unit area, at x₀; (b) the unit
step function at x₀.)
FIG. A1-1. Singularity functions.
(Figure: (a) the rectangular pulse function, of width 2a and height 1/2a
centered on x₀; (b) the gaussian pulse function centered on x₀.)
FIG. A1-2.
density function to cover the case of discrete random variables through
the use of the unit impulse function. To give some validity to the unit
impulse function, or rather to the operations we shall perform with it,
it is often convenient to consider the unit impulse function to be the
limiting member of some infinite sequence of ordinary functions.
THE IMPULSE FUNCTION
Consider the rectangular pulse function

f_a(x − x₀) = 1/2a    when x₀ − a < x < x₀ + a    (A1-7)
           = 0      otherwise

where a > 0. This function is shown in Fig. A1-2. For this function

∫_{−∞}^{+∞} f_a(x − x₀) dx = 1

for all values of a > 0. If now we let a → 0, the width of the pulse
approaches zero and the height approaches infinity, whereas the area
remains constant at unity. The unit impulse function could therefore
be considered to be the limiting member of a sequence of rectangular
pulse functions:

δ(x − x₀) = lim_{a→0} f_a(x − x₀)    (A1-8)
Although the rectangular pulse function is a simple and convenient
prototype for the impulse function, it is a discontinuous function. In
certain problems of interest, it is convenient to use a prototype which
possesses derivatives. One such function is the gaussian pulse function

g_a(x − x₀) = (a/√π) exp[−a²(x − x₀)²]    (A1-9)

where a > 0. This function is also shown in Fig. A1-2. Now†

∫_{−∞}^{+∞} g_a(x − x₀) dx = 1

for all values of a > 0. Further, the height of g_a(x − x₀) increases
toward infinity as a → ∞, while the skirts collapse in toward zero. The
gaussian pulse function hence satisfies the defining equations for the unit
impulse function in the limit a → ∞, and we may set

δ(x − x₀) = lim_{a→∞} g_a(x − x₀)    (A1-10)
A1-2. The Sifting Integral
Consider the integral

I = ∫_{−∞}^{+∞} f(x)δ(x − x₀) dx

where f(x) is continuous at x₀. From the properties of the unit impulse
function, the integrand of I is nonzero only at the point x = x₀. The
only contribution of f(x) to the integral is thus at the point x = x₀, and
we can write

I = f(x₀) ∫_{−∞}^{+∞} δ(x − x₀) dx

Hence, using Eq. (A1-2),

∫_{−∞}^{+∞} f(x)δ(x − x₀) dx = f(x₀)    (A1-11)

Thus the effect of integrating the product of some given function and
the unit impulse function centered on the point x₀ is simply to evaluate
the given function at that point. The integral in Eq. (A1-11) is known
as the sifting integral.
† Cf. Dwight (I, Eq. (861.3)).
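The sifting property can be demonstrated numerically by letting the gaussian pulse prototype stand in for the impulse. The test function f and the value of a below are arbitrary choices.

```python
import numpy as np

# Numerical sifting: with g_a of Eq. (A1-9) standing in for the impulse,
# the integral of f(x) g_a(x - x0) approaches f(x0) as a grows.
def g_a(x, x0, a):
    # gaussian pulse prototype, unit area for every a > 0
    return a / np.sqrt(np.pi) * np.exp(-((a * (x - x0)) ** 2))

x = np.linspace(-10.0, 10.0, 400_001)
dx = x[1] - x[0]
f = np.cos(x) + 0.5 * x          # an arbitrary smooth test function
x0 = 1.25
sift = float(np.sum(f * g_a(x, x0, a=200.0)) * dx)
```

For a = 200 the pulse is already so narrow that the integral reproduces f(x₀) to within the quadrature error, illustrating why the limit defines the sifting integral.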
A1-3. Fourier Transforms
The Fourier transform Δ(u,x₀) of the unit impulse function δ(x − x₀) is

Δ(u,x₀) = ∫_{−∞}^{+∞} δ(x − x₀)e^{−jux} dx    (A1-12)

Then, from the sifting property of the impulse function,

Δ(u,x₀) = e^{−jux₀}    (A1-13)

and hence

Δ(u,0) = 1    (A1-14)

Formal application of the Fourier inversion integral then gives

(1/2π) ∫_{−∞}^{+∞} e^{jux} e^{−jux₀} du = δ(x − x₀)    (A1-15)

and

(1/2π) ∫_{−∞}^{+∞} e^{jux} du = δ(x)    (A1-16)

From Eqs. (A1-13) and (A1-14), we see that both the unit impulse function
δ(x) and its Fourier transform are even functions. Therefore

∫_{−∞}^{+∞} δ(x) cos ux dx = 1

and

(1/2π) ∫_{−∞}^{+∞} cos ux du = δ(x)    (A1-17)

From Eqs. (A1-12) and (A1-15) and the identity

cos ux₀ = (e^{jux₀} + e^{−jux₀})/2

we obtain the Fourier transform pair

∫_{−∞}^{+∞} ½[δ(x − x₀) + δ(x + x₀)]e^{−jux} dx = cos ux₀

and

(1/2π) ∫_{−∞}^{+∞} e^{jux} cos ux₀ du = ½[δ(x − x₀) + δ(x + x₀)]    (A1-18)

Since both cos ux₀ and the pair of impulses are even functions, these can
also be written in the form

(1/2π) ∫_{−∞}^{+∞} cos ux cos ux₀ du = ½[δ(x − x₀) + δ(x + x₀)]    (A1-19)
A1-4. Derivatives of Impulse Functions†
Since, from Eqs. (A1-5) and (A1-7), the rectangular pulse function
can be expressed in terms of the unit step function as

f_a(x − x₀) = (1/2a){U[x − (x₀ − a)] − U[x − (x₀ + a)]}    (A1-20)

the derivative of that function is, from Eq. (A1-6),

f_a′(x − x₀) = (1/2a){δ[x − (x₀ − a)] − δ[x − (x₀ + a)]}    (A1-21)

Suppose now that we integrate the product of this derivative and some
function f(x) which has a continuous derivative at x = x₀. Using Eq.
(A1-11) we get

∫_{−∞}^{+∞} f(x)f_a′(x − x₀) dx = (1/2a) ∫_{−∞}^{+∞} f(x){δ[x − (x₀ − a)] − δ[x − (x₀ + a)]} dx
                              = [f(x₀ − a) − f(x₀ + a)]/2a

The limit of this as a → 0 is minus the derivative of f(x) evaluated at
x = x₀; i.e.,

lim_{a→0} ∫_{−∞}^{+∞} f(x)f_a′(x − x₀) dx = −f′(x₀)    (A1-22)

We shall define the derivative of the unit impulse function to be the
appropriate limit of the derivative of some one of its prototypes; thus,
for example,

δ′(x − x₀) = lim_{a→0} f_a′(x − x₀)    (A1-23)

We can then rewrite Eq. (A1-22) as

∫_{−∞}^{+∞} f(x)δ′(x − x₀) dx = −f′(x₀)    (A1-24)

The result of integrating the product of some given function with a
continuous derivative at x = x₀ and the derivative of the unit impulse
function centered on x₀ is therefore minus the derivative of the given
function at that point.
The nth derivative of the unit impulse function may similarly be
defined as a limit of the nth derivative of some one of its prototypes.
† Cf. van der Pol and Bremmer (I, Chap. V, Art. 10).
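The sifted expression [f(x₀ − a) − f(x₀ + a)]/2a is just a central difference with reversed sign, so its limit can be checked numerically; f, x₀, and a below are arbitrary choices.

```python
import numpy as np

# Check of Eqs. (A1-22)/(A1-24): integrating f against the rectangular
# prototype's derivative sifts out [f(x0 - a) - f(x0 + a)]/2a, which
# tends to -f'(x0) as a -> 0.
def f(x):
    return np.sin(2.0 * x) + x ** 2

x0 = 0.7
a = 1e-4
integral = (f(x0 - a) - f(x0 + a)) / (2 * a)          # sifted value
minus_fprime = -(2.0 * np.cos(2.0 * x0) + 2.0 * x0)   # exact -f'(x0)
```

For small a the two agree to second order in a, which is why the limit serves as the definition of ∫f(x)δ′(x − x₀)dx.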
It can then be shown that, if f(x) has a continuous nth derivative at
x = x₀,

∫_{−∞}^{+∞} f(x)δ⁽ⁿ⁾(x − x₀) dx = (−1)ⁿ f⁽ⁿ⁾(x₀)    (A1-25)

The Fourier transform of the nth derivative of the unit impulse function
is therefore

Δ_n(u,x₀) = ∫_{−∞}^{+∞} δ⁽ⁿ⁾(x − x₀)e^{−jux} dx = (ju)ⁿ e^{−jux₀}    (A1-26)

Hence, using Eq. (A1-13),

Δ_n(u,x₀) = (ju)ⁿ Δ(u,x₀)    (A1-27)
APPENDIX 2
INTEGRAL EQUATIONS
In Chaps. 6, 9, 11, and 14, we deal with certain linear integral equations.
These take either the form

∫_a^b R(s,t)φ(t) dt = λφ(s)    (A2-1)

where a, b are constants, R(s,t) is an autocorrelation function, and λ and
φ(t) are unknown, or the form

∫_a^b R(s,t)x(t) dt = y(s)    (A2-2)

where a, b, R(s,t) are as before, y(s) is known, and x(t) is unknown. In
either Eq. (A2-1) or (A2-2), R(s,t) is called the kernel of the equation.
We shall state here some standard results from the theory of integral
equations pertaining to Eqs. (A2-1) and (A2-2).
First, however, some definitions are necessary.
A2-1. Definitions†
A real- or complex-valued function f(t) of a real variable t is said to be
of integrable square on the interval a ≤ t ≤ b if

∫_a^b |f(t)|² dt < ∞    (A2-3)

From the definition, it follows that if a function f(t) is of integrable
square, so also are its conjugate f*(t) and its magnitude |f(t)|.
For two functions f(t) and g(t) of integrable square, it can be shown
(using the Schwartz inequality) that the integral

∫_a^b f(t)g*(t) dt

exists. If

∫_a^b f(t)g*(t) dt = 0    (A2-4)

† See Courant and Hilbert (I, p. 49). They consider mostly the class of piecewise
continuous functions, which is a subclass of the functions of integrable square. See
also Courant and Hilbert (I, p. 110).
the functions f(t) and g(t) are said to be orthogonal on the interval
a ≤ t ≤ b. If

∫_a^b |f(t)|² dt = 1

f(t) is said to be normalized. A class of functions f_n(t) (containing either
finitely or infinitely many member functions) is an orthogonal class of
functions if every pair of functions in the class is orthogonal. If, in
addition, every f_n(t) is normalized, the class of functions is said to be
orthonormal.
A class of orthogonal functions f_n(t) is complete in the class of functions
of integrable square on the interval a ≤ t ≤ b if any function g(t) of
integrable square on a ≤ t ≤ b can be approximated arbitrarily closely in
mean square by a linear combination of the f_n(t), i.e., if there are constants
a_k so that

lim_{K→∞} ∫_a^b |g(t) − Σ_{k=1}^K a_k f_k(t)|² dt = 0

We write this

g(t) = l.i.m._{K→∞} Σ_{k=1}^K a_k f_k(t)    (A2-5)

where "l.i.m." stands for limit in the mean.
Any function which is continuous except at most at a finite number
of points in any finite interval, at which points the limits from each side
are finite, is called piecewise continuous. Such a function is necessarily
of integrable square over any finite interval a ≤ t ≤ b. However, there
are functions of integrable square which are not piecewise continuous.
A function K(s,t) of two variables which satisfies

K(s,t) = K*(t,s)    (A2-6)

is symmetric.† If K(s,t) is real-valued, this condition reduces, of course, to

K(s,t) = K(t,s)

A symmetric function K(s,t) which has the property that

∫_a^b ∫_a^b K(s,t)g(s)g*(t) ds dt ≥ 0    (A2-7)

for any g(t) of integrable square is nonnegative definite.‡ If the inequality
≥ in (A2-7) is replaced by > for any g(t) such that

∫_a^b |g(t)|² dt > 0

K(s,t) is positive definite.‡ As pointed out in Chap. 6, autocorrelation
functions R(s,t) satisfy (A2-6) and (A2-7) and hence are nonnegative
definite. They may or may not be positive definite.
† See Courant and Hilbert (I, p. 122).
‡ The terminology is not standardized. Often, for example, our term "nonnegative
definite" is replaced by "positive definite" and our "positive definite" by
"strictly positive definite."
Aii. Theorems
We can now state without proof the fundamental theorems]' for inte
gral equations of the type (A21).
If R(s,t) is symmetric and
L" L" IR(8,t) I' d8de < ClO
(A28)
Eq. (A21) is satisfied for at least one real number A~ 0 and some
function 4(t) such that
A number λ and a function φ(t) which satisfy Eq. (A2-1) are called a
characteristic value and an associated characteristic function, respectively,
of the integral equation. The following properties all hold:
1. If φ₁(t) and φ₂(t) are characteristic functions associated with the
characteristic value λ, then aφ₁(t) + bφ₂(t) is also a characteristic function associated with λ, for any numbers a and b. Thus in particular to
each characteristic value λ there corresponds at least one normalized
characteristic function.
2. If λ' and λ'' are different characteristic values, then any characteristic functions φ'(t) and φ''(t) associated with λ' and λ'', respectively, are
orthogonal.
3. There is at most a countably infinite set (or sequence) of characteristic values λ_k; and for some constant A < ∞, |λ_k| < A for all k.
4. For each characteristic value λ_k there are at most a finite number
N_k of linearly independent characteristic functions associated with λ_k.
The integer N_k is called the multiplicity of λ_k. Any N_k linearly independent characteristic functions can be transformed into N_k orthonormal
characteristic functions by the Gram-Schmidt process.
Thus, counting each λ_k as many times as its multiplicity, there is a
sequence λ₁, λ₂, . . . (finite or infinite) of characteristic values and a
sequence φ₁(t), φ₂(t), . . . of orthonormal characteristic functions such
that φ_k(t) is associated with λ_k and such that there are no more characteristic functions orthogonal to all the φ_k(t).
† Courant and Hilbert (I, Chap. III, Arts. 4 and 5). Also Riesz and Nagy (I, p. 242).
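The Gram-Schmidt process invoked in property 4 can be sketched numerically on sampled functions, with the integral inner product approximated by a Riemann sum. This is a hypothetical setup, assuming NumPy is available; the functions 1, t, t² are merely an example:

```python
import numpy as np

def gram_schmidt(funcs, dt):
    """Orthonormalize sampled functions with respect to the inner product
    <f, g> = integral f(t) g*(t) dt, approximated by a Riemann sum."""
    ortho = []
    for f in funcs:
        g = f.astype(complex)
        for e in ortho:
            g = g - np.sum(g * np.conj(e)) * dt * e   # subtract projections
        norm = np.sqrt(np.sum(np.abs(g) ** 2) * dt)
        if norm > 1e-12:                              # drop dependent functions
            ortho.append(g / norm)
    return ortho

t, dt = np.linspace(0.0, 1.0, 1001, retstep=True)
phis = gram_schmidt([np.ones_like(t), t, t ** 2], dt)

# The Gram matrix of the result is the identity in the discrete inner product.
G = np.array([[np.sum(a * np.conj(b)) * dt for b in phis] for a in phis])
print(np.allclose(G, np.eye(3), atol=1e-8))  # True
```

Because the orthonormalization uses the same discrete inner product as the check, the Gram matrix is the identity to machine precision, independent of the quadrature accuracy.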
5. Every function g(t) of integrable square admits the expansion, convergent in mean square,

g(t) = h(t) + l.i.m._{K→∞} Σ_{k=1}^{K} g_k φ_k(t)    (A2-9)

where

g_k = ∫_a^b g(t) φ_k*(t) dt    (A2-10)

and where h(t) is some function satisfying

∫_a^b R(s,t) h(t) dt = 0    (A2-11)

6. The kernel R(s,t) can be expanded in the series

R(s,t) = l.i.m._{K→∞} Σ_{k=1}^{K} λ_k φ_k(s) φ_k*(t)    (A2-12)
7. If R(s,t) is nonnegative definite, all the nonzero characteristic
values are positive real numbers.
8. If in addition R(s,t) is positive definite, the orthonormal set of
characteristic functions forms a complete orthonormal set, and the h(t) in
Eq. (A2-9) may be taken to be zero. That is, for any function g(t) of
integrable square, there is a generalized Fourier-series expansion in terms
of the orthonormal characteristic functions:

g(t) = l.i.m._{K→∞} Σ_{k=1}^{K} g_k φ_k(t)    (A2-13)

where the g_k are given by Eq. (A2-10).
In addition to these results, there is Mercer's theorem,† which is superficially similar to property 6 above but is stronger in the case to which it
applies: If R(s,t) is nonnegative definite, then

R(s,t) = Σ_{k=1}^{∞} λ_k φ_k(s) φ_k*(t)    (A2-14)

where the convergence is uniform for s, t satisfying a ≤ s ≤ b, a ≤ t ≤ b.
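Mercer's theorem can be illustrated by discretizing a nonnegative definite kernel and summing λ_k φ_k(s) φ_k*(t) over the leading characteristic pairs. A sketch assuming NumPy is available; the Wiener-process covariance min(s,t) is a standard nonnegative definite example, not one taken from the text:

```python
import numpy as np

# Discretize R(s,t) = min(s,t) (the Wiener-process covariance, nonnegative
# definite) and check Mercer's expansion: R ~ sum_k lam_k phi_k(s) phi_k*(t).
n = 400
t, dt = np.linspace(0.0, 1.0, n, retstep=True)
R = np.minimum.outer(t, t)

# Characteristic pairs of the discretized integral operator (R dt) phi = lam phi.
w, v = np.linalg.eigh(R * dt)
idx = np.argsort(w)[::-1]
lam, phi = w[idx], v[:, idx] / np.sqrt(dt)   # columns of phi have unit L2 norm

# Partial sums of Mercer's series approach the kernel uniformly.
K = 50
R_K = (phi[:, :K] * lam[:K]) @ phi[:, :K].T
print(np.max(np.abs(R - R_K)) < 0.02)
```

For this kernel the characteristic values decay like 1/k², so fifty terms already reproduce R(s,t) to a few parts in a thousand everywhere on the square.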
The integral equation (A2-2) is closely related to Eq. (A2-1). Obviously, if y(s) is a characteristic function of Eq. (A2-1) with characteristic
value λ, then x(s) = y(s)/λ is a solution of Eq. (A2-2). More generally, if

y(s) = a₁φ₁(s) + · · · + a_nφ_n(s)

the solution of Eq. (A2-2) is, again obviously,

x(s) = (a₁/λ₁)φ₁(s) + · · · + (a_n/λ_n)φ_n(s)
† Courant and Hilbert (I, p. 138). Also Riesz and Nagy (I, p. 246).
This extends, with restrictions, to a y(s) which is an infinite linear combination of characteristic functions. A general theorem, due to Picard,†
says in this context:
Equation (A2-2) has a solution x(t) of integrable square if and only if

y(t) = l.i.m._{N→∞} Σ_{n=1}^{N} y_n φ_n(t)    (A2-15)

where

y_n = ∫_a^b y(t) φ_n*(t) dt

and the series

Σ_{n=1}^{∞} |y_n|²/λ_n² < ∞

A solution, if one exists, is

x(t) = l.i.m._{N→∞} Σ_{n=1}^{N} (y_n/λ_n) φ_n(t)    (A2-16)

and this is unique if the φ_n(t) are complete.
A2-3. Rational Spectra
The existence of solutions to the characteristic-value problem, Eq.
(A2-1), and some important properties of them are thus established quite
generally. There remains the problem of actually finding the characteristic values and functions. For one special class of autocorrelation
functions R(s,t) which is very important in engineering applications,
Eq. (A2-1) can always be solved directly for the λ_k and φ_k(t). The
functions R(s,t) referred to are those which are Fourier transforms of
rational functions. In this case the kernel R(s,t) is R(s − t) (to be
strictly correct, we should introduce a different symbol, since the first R
is a function of two arguments and the second of one) where

R(s − t) = R(τ) = ∫_{−∞}^{∞} e^{j2πfτ} S(f) df    (A2-17)
S(f) being a nonnegative rational integrable even function. The nonnegativeness and integrability of S(f) are necessary in order that R(τ)
be nonnegative definite, i.e., that R(τ) be an autocorrelation function.‡
The evenness of S(f) makes R(τ) real. Introducing the variable p = j2πf
and making use of the special properties of S(f),§ we can write

S(f) = N(p²)/D(p²)    (A2-18)
† See Courant and Hilbert (I, p. 160).
‡ See Art. 6-6.
§ See Art. 11-4.
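As a concrete instance of (A2-17) and (A2-18), the rational density S(f) = 2β/(β² + (2πf)²), for which N(p²) = 2β and D(p²) = β² − p², has R(τ) = e^{−β|τ|}. This transform pair is introduced here for illustration and is easy to verify numerically (a sketch assuming NumPy is available):

```python
import numpy as np

# S(f) = 2 beta / (beta^2 + (2 pi f)^2)  <->  R(tau) = exp(-beta |tau|)
beta = 1.5
f = np.linspace(-400.0, 400.0, 800_001)   # wide grid: S(f) decays like 1/f^2
df = f[1] - f[0]
S = 2 * beta / (beta ** 2 + (2 * np.pi * f) ** 2)

ok = True
for tau in (0.0, 0.3, 1.0):
    R_num = np.sum(np.exp(2j * np.pi * f * tau) * S).real * df  # Fourier integral
    ok &= abs(R_num - np.exp(-beta * abs(tau))) < 1e-3
print(ok)  # True
```

The residual error comes from truncating the tails of the 1/f² density at |f| = 400, which contributes only a few parts in ten thousand.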
376 APPENDIX
where N(p²) is a polynomial of degree n in p² and D(p²) is a polynomial
of degree d in p². D(p²) can have no real roots, and d > n. One can
see very easily heuristically that for φ(t) to be a solution of Eq. (A2-1),
it must satisfy a linear differential equation with constant coefficients.
In fact, substituting from Eq. (A2-18) in Eq. (A2-1), we have

λφ(s) = ∫_a^b φ(t) dt ∫_{−∞}^{∞} e^{p(s−t)} [N(p²)/D(p²)] df    a ≤ s ≤ b    (A2-19)

Differentiating the right-hand side with respect to s is equivalent to
multiplying the integrand by p. Hence

λ D(d²/ds²) φ(s) = N(d²/ds²) [∫_a^b φ(t) dt ∫_{−∞}^{∞} e^{p(s−t)} df]
                 = N(d²/ds²) ∫_a^b δ(s − t) φ(t) dt    (A2-20)
                 = N(d²/ds²) φ(s)    a < s < b
To solve Eq. (A2-1), one first solves the homogeneous differential equation (A2-20). The solution will contain the parameter λ and 2d arbitrary
constants c₁, c₂, . . . , c₂d. This solution is substituted for φ(t) in the
integral equation (A2-1). It will be found that the integral equation can
be satisfied only for a discrete set of values for λ = λ_k, k = 1, 2, . . . ,
and that for each value the constants c₁, . . . , c₂d must satisfy certain
conditions. These λ_k are the characteristic values. Any φ(s) which
satisfies Eq. (A2-20) and the conditions on c₁, . . . , c₂d that arise when
λ = λ_k is a characteristic function associated with λ_k. If there are
more than one linearly independent functions φ(s) associated with λ_k,
they may be orthogonalized by the Gram-Schmidt process† and then
normalized. A rigorous demonstration that Eq. (A2-20) imposes a
necessary condition on φ(s), and that solutions of this equation are
guaranteed to yield the characteristic values λ_k and the characteristic
functions φ_k(s), is given by Slepian.‡ An example of this procedure is
contained in Art. 6-3.
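For the kernel R(τ) = e^{−β|τ|}, so that N(p²) = 2β and D(p²) = β² − p², Eq. (A2-20) says that every characteristic function satisfies λ(β²φ − φ″) = 2βφ inside (a,b). This necessary condition can be checked on a numerically computed characteristic pair; a sketch assuming NumPy is available, with φ″ approximated by central differences:

```python
import numpy as np

# R(tau) = exp(-beta |tau|) has S(f) = 2 beta / (beta^2 + (2 pi f)^2), so
# N(p^2) = 2 beta and D(p^2) = beta^2 - p^2.  Eq. (A2-20) then requires
#     lam * (beta^2 phi - phi'') = 2 beta phi     inside (a, b).
beta = 1.0
t, dt = np.linspace(0.0, 1.0, 1001, retstep=True)
A = np.exp(-beta * np.abs(t[:, None] - t[None, :])) * dt

w, v = np.linalg.eigh(A)
lam, phi = w[-1], v[:, -1]                    # largest characteristic value

phi_dd = (phi[2:] - 2.0 * phi[1:-1] + phi[:-2]) / dt ** 2   # interior phi''
lhs = lam * (beta ** 2 * phi[1:-1] - phi_dd)
rhs = 2.0 * beta * phi[1:-1]
rel = np.max(np.abs(lhs - rhs)) / np.max(np.abs(rhs))
print(rel < 0.05)  # True: the differential relation holds to discretization error
```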
It is often useful to expand some arbitrary function of integrable square
in a generalized Fourier series as in Eq. (A2-13). Thus it is useful to
know when the φ_k(t) form a complete set. One sufficient condition for
the characteristic functions of

∫_a^b R(s − t) φ(t) dt = λφ(s)

† See Courant and Hilbert (I, p. 50).
‡ Slepian (I, Appendix 1).
to form a complete set is that R(t) be the Fourier transform of a spectral
density.† Thus, in the case discussed above, where R(t) is the transform
of a rational spectral density, the φ_k(t) are always complete.
Again, if the autocorrelation function which is used for the kernel
satisfies Eq. (A2-18), the integral equation (A2-2) can be treated directly
by elementary methods.‡ We do this here, restricting ourselves to the
case of real R(t). There is some similarity between this discussion and
Slepian's treatment of the related characteristic-value problem.
Exactly as in the characteristic-value problem, it can be shown that a
solution x(t) of the integral equation must satisfy a linear differential
relation. In this case it is

N(d²/dt²) x(t) = D(d²/dt²) y(t)    a < t < b    (A2-21)

We consider first the case N(p²) ≡ 1. Then, from Eq. (A2-21), if a solution exists, it has to be

x(t) = D(d²/dt²) y(t)    a < t < b    (A2-22)

The problem now is to show under what conditions on y(t) the x(t) as given
by Eq. (A2-22) satisfies the integral equation. We find these conditions
by substituting x(t) from Eq. (A2-22) back in the integral equation and
performing repeated integrations by parts.
First, we need to establish some facts about the behavior of R(t).
From Eq. (A2-18), for the case we are considering,

R(t) = ∫_{−∞}^{∞} [e^{pt}/D(p²)] df    (A2-23)

and

R^(k)(t) = ∫_{−∞}^{∞} [p^k e^{pt}/D(p²)] df    k = 0, 1, 2, . . . , 2d − 2    (A2-24)

These integrals converge absolutely, and R^(k)(t) exists for all t for k ≤
2d − 2. Also, by a conventional residue argument, if Γ is a closed contour containing all the poles in the top half f plane,

R(t) = ∫_Γ [e^{pt}/D(p²)] df    t > 0    (A2-25)

From (A2-25) it follows that R(t) has derivatives of all orders for t > 0,
and

R^(k)(t) = ∫_Γ [p^k e^{pt}/D(p²)] df    t > 0, k = 0, 1, 2, . . .    (A2-26a)
† See Youla (I, Appendix A) or Root and Pitcher (II, Appendix I).
‡ See Zadeh and Ragazzini (I).
Similarly, if Γ′ is a closed contour containing all the poles in the
bottom half f plane,

R^(k)(t) = ∫_{Γ′} [p^k e^{pt}/D(p²)] df    t < 0, k = 0, 1, 2, . . .    (A2-26b)
Let us calculate R^(2d−1)(0+) − R^(2d−1)(0−). Let the coefficient of p^{2d} in
D(p²) be a_{2d}; then

R^(2d−1)(t) = ∫_Γ [(j2πf)^{2d−1} e^{j2πft}] / [a_{2d} (j2π)^{2d} (f − z₁) · · · (f − z_{2d})] df
           = (1/a_{2d}) Σ {residues in top half plane}    t > 0    (A2-27)

where the residues in the summation are those of the integrand of the
integral in (A2-27). Similarly,

R^(2d−1)(t) = −(1/a_{2d}) Σ {residues in bottom half plane}    t < 0    (A2-28)

where the residues in the summation are those of the same function as
above. Hence

R^(2d−1)(t) − R^(2d−1)(−t) = (1/a_{2d}) Σ {all residues}    t > 0    (A2-29)

and

R^(2d−1)(0+) − R^(2d−1)(0−) = (1/a_{2d}) lim_{t→0} Σ {all residues} = 1/a_{2d}    (A2-30)
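The jump of the (2d − 1)st derivative can be sanity-checked in the simplest case R(t) = e^{−|t|}. Here S(f) = 2/(1 + (2πf)²), so D(p²) = (1 − p²)/2, d = 1, and the coefficient of p² is a₂ = −1/2; the predicted jump is R′(0+) − R′(0−) = 1/a₂ = −2. A quick numerical check, assuming NumPy is available:

```python
import numpy as np

# For R(t) = exp(-|t|):  S(f) = 2/(1 + (2 pi f)^2), so D(p^2) = (1 - p^2)/2.
# Here d = 1, the coefficient of p^2 is a_2 = -1/2, and the predicted jump is
#     R'(0+) - R'(0-) = 1/a_2 = -2.
h = 1e-6
R = lambda t: np.exp(-np.abs(t))
jump = (R(h) - R(0.0)) / h - (R(0.0) - R(-h)) / h   # one-sided derivatives at 0
print(int(round(jump)))  # -2
```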
From Art. 11-4 we know that D(p²) can be factored into D₁(j2πf) D₂(j2πf),
where D₁(j2πf) has all its zeros in the bottom half f plane and D₂(j2πf)
has all its zeros in the top half f plane. Hence, from Eq. (A2-25),

D₂(d/dt) R(t) = ∫_Γ [e^{pt}/D₁(p)] df = 0    t > 0

D₁(d/dt) R(t) = ∫_{Γ′} [e^{pt}/D₂(p)] df = 0    t < 0

and

D(d²/dt²) R(t) = 0    t ≠ 0    (A2-31)

We now write

D(p²) = Σ_{k=0}^{d} a_{2k} p^{2k}    (A2-32)
so that

x(t) = Σ_{k=0}^{d} a_{2k} (d/dt)^{2k} y(t)    a < t < b    (A2-33)

Then,

∫_a^b x(τ) R(t − τ) dτ = Σ_{k=0}^{d} a_{2k} ∫_a^t y^(2k)(τ) R(t − τ) dτ
    + Σ_{k=0}^{d} a_{2k} ∫_t^b y^(2k)(τ) R(t − τ) dτ
Integrating once by parts yields

∫_a^b x(τ) R(t − τ) dτ = Σ_{k=1}^{d} a_{2k} y^(2k−1)(τ) R(t − τ) |_a^t
    + Σ_{k=1}^{d} a_{2k} y^(2k−1)(τ) R(t − τ) |_t^b
    + Σ_{k=1}^{d} a_{2k} ∫_a^t y^(2k−1)(τ) R′(t − τ) dτ
    + Σ_{k=1}^{d} a_{2k} ∫_t^b y^(2k−1)(τ) R′(t − τ) dτ
    + a₀ ∫_a^b y(τ) R(t − τ) dτ    (A2-34)

The terms evaluated at τ = t cancel in the two sets of terms which are
integrated out. The other terms which are integrated out are

−[Σ_{k=1}^{d} a_{2k} y^(2k−1)(a)] R(t − a) + [Σ_{k=1}^{d} a_{2k} y^(2k−1)(b)] R(t − b)    (A2-35)
Integrating by parts once more, the third and fourth sets of terms in
Eq. (A2-34) yield

Σ_{k=1}^{d} a_{2k} y^(2k−2)(τ) R′(t − τ) |_a^t + Σ_{k=1}^{d} a_{2k} y^(2k−2)(τ) R′(t − τ) |_t^b
    + Σ_{k=1}^{d} a_{2k} ∫_a^t y^(2k−2)(τ) R″(t − τ) dτ
    + Σ_{k=1}^{d} a_{2k} ∫_t^b y^(2k−2)(τ) R″(t − τ) dτ    (A2-36)

The terms evaluated at τ = t in the two sets of terms which are integrated again cancel. The other terms which are integrated out are

−[Σ_{k=1}^{d} a_{2k} y^(2k−2)(a)] R′(t − a) + [Σ_{k=1}^{d} a_{2k} y^(2k−2)(b)] R′(t − b)    (A2-37)
The process of integrating by parts is repeated 2d times, leaving the
integrals of the form

Σ_k a_{2k} ∫_a^t y^(2k−j)(τ) R^(j)(t − τ) dτ + Σ_k a_{2k} ∫_t^b y^(2k−j)(τ) R^(j)(t − τ) dτ

unintegrated as they occur at every other step. At each step except the
last one, the middle terms in the integrated-out terms cancel algebraically.
Linear combinations of derivatives of y at the two boundary points multiplying a derivative of R(t), such as Eqs. (A2-35) and (A2-37), are left.
Thus, after the 2d integrations by parts, we have

∫_a^b x(τ) R(t − τ) dτ = Y₀ R(t − a) + Z₀ R(t − b) + Y₁ R′(t − a)
    + Z₁ R′(t − b) + · · · + Y_{2d−2} R^(2d−2)(t − a) + Z_{2d−2} R^(2d−2)(t − b)
    + a_{2d} y(t) R^(2d−1)(0+) − a_{2d} y(t) R^(2d−1)(0−)
    + ∫_a^b y(τ) Σ_{k=0}^{d} a_{2k} R^(2k)(t − τ) dτ    (A2-38)

where the Y_k's and Z_k's are linear combinations of derivatives of y(t)
at a and b respectively. But by Eq. (A2-31), the last integral in Eq.
(A2-38) vanishes; and using Eq. (A2-30), we have
∫_a^b x(τ) R(t − τ) dτ = y(t) + Σ_{k=0}^{2d−2} Y_k R^(k)(t − a)
    + Σ_{k=0}^{2d−2} Z_k R^(k)(b − t)    a ≤ t ≤ b    (A2-39)

Thus x(t) as given by Eq. (A2-33) satisfies the integral equation if and
only if the linear homogeneous boundary condition on y(t) given by

Σ_{k=0}^{2d−2} Y_k R^(k)(t − a) + Σ_{k=0}^{2d−2} Z_k R^(k)(b − t) = 0    (A2-40)
is satisfied identically in t, a ≤ t ≤ b. It is sufficient for Eq. (A2-40)
that all the Y_k and Z_k be zero, but it is not necessary. In fact, since
D₂(p) is of degree d, and

D₂(d/dt) R(t − a) = 0    t > a

it follows that each R^(k)(t − a), k = d, d + 1, . . . , 2d − 2, can be
expressed as a linear combination of the first d − 1 derivatives of
R(t − a), t > a. Since the first 2d − 2 derivatives exist everywhere
and are continuous, this property also holds at t − a = 0, or t = a.
Thus, the first sum in Eq. (A2-40) can be replaced by a sum of d − 1
linearly independent terms. An analogous argument allows the second
sum in Eq. (A2-40) to be replaced by a sum of d − 1 linearly independent terms. Thus there are left 2d − 2 linearly independent conditions to be satisfied by y(t) and its derivatives at the end points a and b.
It may be observed that, since Eq. (A2-2) certainly has a solution if
y(s) is a characteristic function of the associated equation (A2-1) with
the same kernel, all these characteristic functions satisfy the linear
boundary conditions, Eq. (A2-40).
If Eq. (A2-2) with R(t) as given by Eq. (A2-23) does not have a solution, properly speaking, because y(s) does not satisfy Eq. (A2-40), it still
always has a singular solution, i.e., a solution including impulse functions
and their derivatives. To see this, add to the x(t) of Eq. (A2-33) a sum

Σ_{k=0}^{2d−2} [b_k δ^(k)(t − a) + c_k δ^(k)(t − b)]    (A2-41)

with as yet undetermined coefficients. When this sum is substituted
into the left side of Eq. (A2-2), there results

Σ_{k=0}^{2d−2} [b_k R^(k)(t − a) + c_k R^(k)(b − t)]    (A2-42)

Hence, by choosing b_k = −Y_k and c_k = −Z_k, the boundary conditions
are eliminated and the integral equation can always be satisfied.
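The singular-solution construction can be made concrete for the kernel R(t) = e^{−|t|} on [0, 1], where D(p²) = (1 − p²)/2, so the proper part is x(t) = (y(t) − y″(t))/2 and, since 2d − 2 = 0, only plain impulses at the end points are needed. The end-point weights below were worked out for this particular kernel by carrying through the integrations by parts; they are an illustration, not values quoted in the text (a sketch assuming NumPy is available):

```python
import numpy as np

# Kernel R(t) = exp(-|t|) on [0, 1]:  D(p^2) = (1 - p^2)/2, d = 1, 2d - 2 = 0.
# For y(t) = sin t the proper part is x(t) = (y(t) - y''(t))/2 = sin t, and
# carrying out the integrations by parts gives impulse weights
#     b0 = (y(0) - y'(0))/2   at t = 0,
#     c0 = (y(1) + y'(1))/2   at t = 1
# (computed here for this kernel; not quoted in the text).
b0 = (np.sin(0.0) - np.cos(0.0)) / 2
c0 = (np.sin(1.0) + np.cos(1.0)) / 2

tau = np.linspace(0.0, 1.0, 200_001)
dtau = tau[1] - tau[0]
ok = True
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    lhs = np.sum(np.sin(tau) * np.exp(-np.abs(t - tau))) * dtau  # proper part
    lhs += b0 * np.exp(-t) + c0 * np.exp(t - 1.0)                # impulse terms
    ok &= abs(lhs - np.sin(t)) < 1e-4
print(ok)  # True
```

Without the two impulse terms the left side misses y(t) by multiples of e^{−t} and e^{t−1}, which is exactly the boundary residue (A2-39) predicts.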
Now we remove the restriction that N(p²) ≡ 1; that is, we only suppose that R(t) is given by Eq. (A2-18). We have stated already that
a solution x(t) of the integral equation must be some solution of the
differential equation (A2-21). We show now that Eq. (A2-2) with R(t)
as given by Eq. (A2-18) has a solution if and only if some one of a set of
associated equations of the type just considered has a solution.
Let z(t) be a solution, unspecified, of

N(d²/dt²) z(t) = y(t)    (A2-43)

and consider the integral equation

∫_a^b x(τ) R̂(t − τ) dτ = z(t)    a ≤ t ≤ b    (A2-44)

where

R̂(t) = ∫_{−∞}^{∞} [e^{pt}/D(p²)] df

Since

N(d²/dt²) ∫_a^b x(τ) R̂(t − τ) dτ = ∫_a^b x(τ) R(t − τ) dτ = y(t)

if x(τ) is a solution of Eq. (A2-44) for any z(t) satisfying Eq. (A2-43),
it is a solution of the desired equation (A2-2). Conversely, if x(τ) is a
solution of Eq. (A2-2) with R(t) as given by Eq. (A2-18), then

N(d²/dt²) ∫_a^b x(τ) R̂(t − τ) dτ = N(d²/dt²) z₁(t)
where z₁(t) is chosen arbitrarily from among the solutions of Eq. (A2-43).
Let

ε(t) = ∫_a^b x(τ) R̂(t − τ) dτ − z₁(t)    a ≤ t ≤ b    (A2-45)

then

N(d²/dt²) ε(t) = 0

Let

ζ(t) = z₁(t) + ε(t)    (A2-46)

then ζ(t) is a solution of Eq. (A2-43), and by Eq. (A2-45) x(τ) is a solution of Eq. (A2-44) with z(t) = ζ(t). Thus x(τ) is a solution of Eq. (A2-2)
if and only if it is a solution of Eq. (A2-44) for some z(t) satisfying Eq.
(A2-43).
The practical procedure for solving Eq. (A2-2) is therefore to operate
on the general solution of Eq. (A2-43) by D(d²/dt²) and substitute the
resulting expression back in the integral equation. If the integral equation has a proper solution, it is then guaranteed that, for some choice of
the undetermined constants, this expression is it. Otherwise, impulse
functions and their derivatives must be added, as explained above.
BIBLIOGRAPHY
Bartlett, M. S.:
I. "An Introduction to Stochastic Processes," Cambridge University Press, New
York, 1955.
Bennett, William R.:
I. Response of a Linear Rectifier to Signal and Noise, Jour. Acoustical Soc.
America, 15 (3): 165-172, January, 1944.
II. Methods of Solving Noise Problems, Proc. IRE, 44: 609-638, May, 1956.
and S. O. Rice:
I. Note on Methods of Computing Modulation Products, Philosophical Magazine,
series 7, 18: 422-424, September, 1934.
Blanc-Lapierre, André, and Robert Fortet:
I. "Théorie des fonctions aléatoires," Masson et Cie, Paris, 1953.
Bode, Hendrik W.:
I. "Network Analysis and Feedback Amplifier Design," D. Van Nostrand,
Princeton, N.J., 1945.
and Claude E. Shannon:
I. A Simplified Derivation of Linear Least-square Smoothing and Prediction
Theory, Proc. IRE, 38: 417-426, April, 1950.
Booton, Richard C., Jr.:
I. An Optimization Theory for Time-varying Linear Systems with Nonstationary
Statistical Inputs, Proc. IRE, 40: 977-981, August, 1952.
Bose, A. G., and S. D. Pezaris:
I. A Theorem Concerning Noise Figures, IRE Convention Record, part 8, pp.
35-41, 1955.
Bunimovich, V. I.:
I. "Fluctuation Processes in Radio Receivers," Sovetskoe Radio, 1951 (in
Russian).
Burgess, R. E.:
I. The Rectification and Observation of Signals in the Presence of Noise, Philosophical Magazine, series 7, 42 (328): 475-503, May, 1951.
Callen, Herbert B., and Theodore A. Welton:
I. Irreversibility and Generalized Noise, Phys. Rev., 83 (1): 34-40, July 1, 1951.
Carnap, Rudolf:
I. "Logical Foundations of Probability," University of Chicago Press, Chicago,
1950.
Chessin, P. L.:
I. A Bibliography on Noise, IRE Trans. on Information Theory, IT-1 (2): 15-31,
September, 1955.
Churchill, Ruel V.:
I. "Fourier Series and Boundary Value Problems," McGraw-Hill Book Co., New
York, 1941.
II. "Modern Operational Mathematics in Engineering," McGraw-Hill Book Co.,
New York, 1944.
Courant, Richard:
I. "Differential and Integral Calculus," I, rev. ed., 1937; II, 1936, Interscience
Publishers, New York.
and D. Hilbert:
I. "Methods of Mathematical Physics," I, Interscience Publishers, New York,
1953.
Cramér, Harald:
I. "Mathematical Methods of Statistics," Princeton University Press, Princeton,
N.J., 1946.
Davenport, Wilbur B., Jr.:
I. Signal-to-Noise Ratios in Band-pass Limiters, Jour. Applied Physics, 24 (6):
720-727, June, 1953.
, Richard A. Johnson, and David Middleton:
I. Statistical Errors in Measurements on Random Time Functions, Jour. Applied
Physics, 23 (4): 377-388, April, 1952.
Davis, R. C.:
I. On the Theory of Prediction of Nonstationary Stochastic Processes, Jour.
Applied Physics, 23: 1047-1053, September, 1952.
Doob, Joseph L.:
I. Time Series and Harmonic Analysis, Proc. Berkeley Symposium on