# Estimation and Detection

Lecture 9: Introduction to Detection Theory
(Chs 1,2,3)

Dr. ir. Richard C. Hendriks – 9/12/2015

Example – Speech Processing
In speech processing applications a voice activity detector (VAD) is commonly used, e.g.,
• In speech enhancement: to determine whether speech is present or not. If speech is not present, the remaining signal consists of noise only and can be used to estimate the noise statistics.
• In speech coding: to detect whether speech is present. If speech is not present, there is no need for the device (phone) to transmit any information.


Example – Speech Processing
A VAD can be implemented using a Bayesian hypothesis test:
$H_0:\; Y_k(l) = N_k(l)$ (speech absence)
$H_1:\; Y_k(l) = S_k(l) + N_k(l)$ (speech presence)

Based on statistical models for S and N and the right hypothesis criterion, we can automatically decide whether speech is absent or present.
(More details on speech processing are given in the course Digital Audio and Speech Processing, IN4182, 4th quarter.)

How to optimally make the decision? ⇒ Detection theory.

Example – Radio Pulsar Navigation
Pulsars (pulsating stars):
• Highly magnetized rotating neutron stars that emit a beam of electromagnetic radiation.
• Radiation can only be observed when the beam of emission is pointing toward the earth (lighthouse model).
• Wideband (100 MHz to 85 GHz).
• Extremely accurate pulse sources.
Kramer (University of Manchester)

Example – Radio Pulsar Navigation
For some millisecond pulsars, the regularity of pulsation is more precise than an atomic clock.
• Pulsars are "ideal" for time-of-arrival
• Pulsar signals are weak (SNR = -90 dB)
How to optimally make the decision? ⇒ Detection theory.

What is Detection Theory?
Definition
Assume a set of data $\{x[0], x[1], \ldots, x[N-1]\}$ is available. To arrive at a decision, we first form a function of the data, $T(x[0], x[1], \ldots, x[N-1])$, and then make a decision based on its value. Determining the function $T$ and its mapping to a decision is the central problem addressed in Detection Theory.

The Simplest Detection Problem
Binary detection: Determine whether a certain signal that is embedded in noise is present or not.
$H_0:\; x[n] = w[n]$
$H_1:\; x[n] = s[n] + w[n]$
Note that if the number of hypotheses is more than two, then the problem becomes a multiple hypothesis testing problem. One example is detection of different digits in speech processing.

Example (1)
Detection of a DC level of amplitude $A = 1$ embedded in white Gaussian noise $w[n]$ with variance $\sigma^2$, with only one sample.
$H_0:\; x[0] = w[0]$
$H_1:\; x[0] = 1 + w[0]$
One possible detection rule:
$H_0:\; x[0] < \frac{1}{2}$
$H_1:\; x[0] > \frac{1}{2}$
For the case where $x[0] = \frac{1}{2}$, we might arbitrarily choose one of the possibilities. However, the probability of such a case is zero!

Example (2)
The probability density function of $x[0]$ under each hypothesis is as follows
$$p(x[0]; H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\, x^2[0]\right)$$
$$p(x[0]; H_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\, (x[0] - 1)^2\right)$$
Deciding between $H_0$ and $H_1$, we are essentially asking whether $x[0]$ has been generated according to the pdf $p(x[0]; H_0)$ or the pdf $p(x[0]; H_1)$.
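A minimal sketch of this comparison in Python, assuming unit noise variance by default. The helper names `gaussian_pdf` and `decide` are illustrative and not part of the lecture. It evaluates both pdfs at the observed sample and picks the hypothesis with the larger value, which for this example coincides with thresholding x[0] at 1/2.

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian pdf with the given mean and variance, evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def decide(x0, var=1.0):
    """Return 1 (decide H1) if p(x0; H1) > p(x0; H0), else 0 (decide H0).

    H0: x[0] = w[0], H1: x[0] = 1 + w[0], with w[0] Gaussian noise of variance var.
    """
    return 1 if gaussian_pdf(x0, 1.0, var) > gaussian_pdf(x0, 0.0, var) else 0

# Hypothetical observations on both sides of the crossover point 1/2.
decisions = [decide(x0) for x0 in (-0.3, 0.4, 0.6, 1.2)]
```

The two pdfs cross at x[0] = 1/2 for any noise variance, so the variance only affects how often the decision is wrong, not where the boundary lies.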

Detection Performance
• Can we expect to always make a correct decision? Depending on the noise variance $\sigma^2$, it will be more or less likely to make a decision error.
• How to make an optimal decision?
• The data under both $H_0$ and $H_1$ can be modelled with two different pdfs. Using these pdfs, a decision rule can be formulated. A typical example:
$$T = \frac{1}{N} \sum_{n=0}^{N-1} x[n] > \gamma$$
• The detection performance will increase as the "distance" between the pdfs under $H_0$ and $H_1$ increases.
• Performance measure: deflection coefficient
$$d^2 = \frac{\left(E(T; H_1) - E(T; H_0)\right)^2}{\mathrm{var}(T; H_0)}$$
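As an illustration of the deflection coefficient, the sketch below computes d² for the sample-mean statistic of a DC level in white Gaussian noise, once from the known moments and once from simulated data. The values A = 1, σ² = 1, N = 10 and the function names are assumptions chosen for the example.

```python
import random

def deflection(mean_h1, mean_h0, var_h0):
    """Deflection coefficient d^2 = (E(T;H1) - E(T;H0))^2 / var(T;H0)."""
    return (mean_h1 - mean_h0) ** 2 / var_h0

def sample_mean_stats(N, A, sigma2, trials, rng):
    """Monte Carlo estimate of the mean and variance of T = (1/N) sum x[n]
    when x[n] = A + w[n], with w[n] white Gaussian noise of variance sigma2."""
    sigma = sigma2 ** 0.5
    values = []
    for _ in range(trials):
        values.append(sum(A + rng.gauss(0.0, sigma) for _ in range(N)) / N)
    mean = sum(values) / trials
    var = sum((t - mean) ** 2 for t in values) / (trials - 1)
    return mean, var

# Assumed illustrative values: DC level A = 1, sigma^2 = 1, N = 10 samples.
N, A, sigma2 = 10, 1.0, 1.0

# Known moments of the sample mean: mean 0 or A, variance sigma^2 / N.
d2_theory = deflection(A, 0.0, sigma2 / N)

rng = random.Random(0)  # fixed seed so the run is repeatable
mean_h0, var_h0 = sample_mean_stats(N, 0.0, sigma2, 5000, rng)
mean_h1, _ = sample_mean_stats(N, A, sigma2, 5000, rng)
d2_estimate = deflection(mean_h1, mean_h0, var_h0)
```

For this statistic the theoretical value is N A²/σ², so averaging more samples or lowering the noise variance both increase the "distance" between the two pdfs.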

Today:
• Important pdfs
• Neyman-Pearson Theorem
• Minimum Probability of Error

Important pdfs – Gaussian pdf
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\, (x - \mu)^2\right), \qquad -\infty < x < +\infty$$
where $\mu$ is the mean and $\sigma^2$ is the variance of $x$.
Standard normal pdf: $\mu = 0$ and $\sigma^2 = 1$.
The cumulative distribution function (cdf) of a standard normal pdf:
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} t^2\right) dt$$
A more convenient description is the right-tail probability, which is defined as $Q(x) = 1 - \Phi(x)$. This function, called the Q-function, is used frequently in detection problems where the signal and noise are normally distributed.
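The Q-function and its inverse can be evaluated with the Python standard library alone. The sketch below is one possible way to do so; the names `Q` and `Qinv` are chosen here for illustration. It uses the identity Q(x) = ½ erfc(x/√2) and the standard normal quantile function.

```python
import math
from statistics import NormalDist

def Q(x):
    """Right-tail probability of a standard normal: Q(x) = 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Qinv(p):
    """Inverse Q-function for 0 < p < 1, via the standard normal quantile."""
    return NormalDist().inv_cdf(1.0 - p)

tail_at_zero = Q(0.0)        # half of the probability mass lies above the mean
round_trip = Q(Qinv(0.01))   # should reproduce 0.01 up to rounding
```

Using `erfc` rather than `1 - cdf` avoids losing precision for large arguments, where the tail probability is very small.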

Important pdfs – Gaussian pdf
[Figure, two panels: the Gaussian pdf versus x, and the cdf $\Phi(x)$ together with 1-cdf $Q(x)$ versus x, both for x from -20 to 20.]

Important pdfs – central Chi-squared
A chi-squared pdf arises as the pdf of $x$, where $x = \sum_{i=1}^{v} x_i^2$, if the $x_i$ are independent standard normally distributed random variables. The chi-squared pdf with $v$ degrees of freedom is defined as
$$p(x) = \begin{cases} \dfrac{1}{2^{v/2}\, \Gamma(v/2)}\, x^{v/2 - 1} \exp\left(-\frac{1}{2} x\right), & x > 0 \\ 0, & x < 0 \end{cases}$$
and is denoted by $\chi_v^2$. $v$ is assumed to be integer and $v \geq 1$. The function $\Gamma(u)$ is the Gamma function and is defined as
$$\Gamma(u) = \int_0^{\infty} t^{u-1} \exp(-t)\, dt$$
[Figure: $\chi^2$ pdf versus x for $v = 2$ (exponential pdf) and $v = 20$ (approaching Gaussian), for x from 0 to 100.]
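A short sketch that evaluates the central chi-squared pdf directly from its definition, using `math.gamma` for the Gamma function. The function name `chi2_pdf` and the check values are illustrative assumptions.

```python
import math

def chi2_pdf(x, v):
    """Central chi-squared pdf with v >= 1 degrees of freedom.

    Returns 0 for x <= 0; the value assigned at the single point x = 0
    does not affect any probability.
    """
    if x <= 0.0:
        return 0.0
    return (x ** (v / 2.0 - 1.0) * math.exp(-x / 2.0)
            / (2.0 ** (v / 2.0) * math.gamma(v / 2.0)))

# v = 2 reduces to the exponential pdf 0.5 * exp(-x / 2).
value_v2 = chi2_pdf(1.0, 2)

# Crude Riemann-sum check that the v = 4 pdf integrates to about 1 on (0, 100].
step = 0.01
area_v4 = sum(chi2_pdf(k * step, 4) * step for k in range(1, 10001))
```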

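Important pdfs – non-central Chi-squared
If $x = \sum_{i=1}^{v} x_i^2$, where the $x_i$'s are independent Gaussian random variables with mean $\mu_i$ and variance $\sigma^2 = 1$, then $x$ has a noncentral chi-squared pdf with $v$ degrees of freedom and noncentrality parameter $\lambda = \sum_{i=1}^{v} \mu_i^2$. The pdf then becomes
$$p(x) = \begin{cases} \dfrac{1}{2} \left(\dfrac{x}{\lambda}\right)^{\frac{v-2}{4}} \exp\left[-\frac{1}{2}(x + \lambda)\right] I_{\frac{v}{2} - 1}\left(\sqrt{\lambda x}\right), & x > 0 \\ 0, & x < 0 \end{cases}$$
where $I_r(\cdot)$ is the modified Bessel function of the first kind of order $r$.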

Making Optimal Decisions
Remember the example:
$H_0:\; x[0] < \gamma$
$H_1:\; x[0] > \gamma$
Using detection theory, rules can be derived on how to choose $\gamma$.
• Neyman-Pearson Theorem: Maximize detection probability for a given false alarm probability.
• Minimum probability of error
• Bayesian detector

Neyman-Pearson Theorem - Introduction
Example: Assume that we observe a random variable whose pdf is either $\mathcal{N}(0, 1)$ or $\mathcal{N}(1, 1)$. Our hypothesis problem is then:
$H_0:\; \mu = 0$
$H_1:\; \mu = 1$
Detection rule:
$H_0:\; x[0] < \frac{1}{2}$
$H_1:\; x[0] > \frac{1}{2}$
Hence, in this example, for $x[0] > \frac{1}{2}$ we have $p(x[0]; H_1) > p(x[0]; H_0)$.
Notice that two different types of errors can be made.
S. Kay – detection theory, Figs. 3.2 and 3.3.

Neyman-Pearson Theorem – Detection Performance
Detection performance of a system is measured mainly by two factors:
1. Probability of false alarm: $P_{FA} = P(H_1; H_0)$
2. Probability of detection: $P_D = P(H_1; H_1)$
Note that sometimes, instead of the probability of detection, the probability of miss detection, $P_M = 1 - P_D$, is used.

Neyman-Pearson Theorem – Detection Performance
• These two errors can be traded off against each other.
• It is not possible to reduce both error probabilities simultaneously.
• False alarm probability $P_{FA} = P(H_1; H_0)$
• Detection probability $P_D = P(H_1; H_1)$
• To design the optimal detector, the Neyman-Pearson approach is to maximise $P_D$ while keeping $P_{FA}$ fixed (small).
S. Kay – detection theory, Figs. 3.2 and 3.3.
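The trade-off can be made concrete for the introductory example with pdfs N(0, 1) under H0 and N(1, 1) under H1. The sketch below computes the false alarm, detection and miss probabilities for a few thresholds. The function name `error_rates` and the set of thresholds are assumptions made for illustration.

```python
import math

def Q(x):
    """Right-tail probability of a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def error_rates(threshold, mu1=1.0):
    """P_FA, P_D and P_M for the rule 'decide H1 if x[0] > threshold',
    with x[0] ~ N(0, 1) under H0 and x[0] ~ N(mu1, 1) under H1."""
    p_fa = Q(threshold)          # P(H1; H0): noise alone exceeds the threshold
    p_d = Q(threshold - mu1)     # P(H1; H1): signal plus noise exceeds it
    return p_fa, p_d, 1.0 - p_d

# Thresholds below, at, and above the midpoint 1/2.
rates = {g: error_rates(g) for g in (0.0, 0.5, 1.0)}
```

Raising the threshold lowers the false alarm probability but also lowers the detection probability, so one error type can only be reduced at the expense of the other. At the midpoint 1/2 the two error probabilities are equal.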

Neyman-Pearson Theorem
Problem statement
Assume a data set $\mathbf{x} = [x[0], x[1], \ldots, x[N-1]]^T$ is available. The detection problem is defined as follows
$H_0:\; T(\mathbf{x}) < \gamma$
$H_1:\; T(\mathbf{x}) > \gamma$
where $T$ is the decision function and $\gamma$ is the detection threshold. Our goal is to design $T$ so as to maximize $P_D$ subject to $P_{FA} < \alpha$.

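Neyman-Pearson Theorem
To maximize $P_D$ for a given $P_{FA} = \alpha$, decide $H_1$ if
$$L(\mathbf{x}) = \frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > \gamma$$
where the threshold $\gamma$ is found from
$$P_{FA} = \int_{\{\mathbf{x}:\, L(\mathbf{x}) > \gamma\}} p(\mathbf{x}; H_0)\, d\mathbf{x} = \alpha$$
The function $L(\mathbf{x})$ is called the likelihood ratio and the entire test is called the likelihood ratio test (LRT).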

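Neyman-Pearson Theorem - Derivation
max $P_D$ subject to $P_{FA} = \alpha$
Constrained optimization, use the Lagrangian:
$$F = P_D + \lambda (P_{FA} - \alpha)$$
$$= \int_{R_1} p(\mathbf{x}; H_1)\, d\mathbf{x} + \lambda \left( \int_{R_1} p(\mathbf{x}; H_0)\, d\mathbf{x} - \alpha \right)$$
$$= \int_{R_1} \left( p(\mathbf{x}; H_1) + \lambda\, p(\mathbf{x}; H_0) \right) d\mathbf{x} - \lambda \alpha$$
The problem now is (see Figures) to select the right regions $R_1$ and $R_0$. As we want to maximise $F$, a value $\mathbf{x}$ should only be included in $R_1$ if it increases the integral, that is, if the integrand is positive there. So, $\mathbf{x}$ should only be included in $R_1$ if
$$p(\mathbf{x}; H_1) + \lambda\, p(\mathbf{x}; H_0) > 0$$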

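Neyman-Pearson Theorem - Derivation
$$p(\mathbf{x}; H_1) + \lambda\, p(\mathbf{x}; H_0) > 0 \;\Rightarrow\; \frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > -\lambda$$
A likelihood ratio is always positive, so $\gamma = -\lambda > 0$ (if $\lambda > 0$ we would have $P_{FA} = 1$), and
$$\frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > \gamma$$
where $\gamma$ is found from $P_{FA} = \alpha$.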

Neyman-Pearson Theorem – Example DC in WGN
Consider the following signal detection problem
$H_0:\; x[n] = w[n], \quad n = 0, 1, \ldots, N-1$
$H_1:\; x[n] = s[n] + w[n], \quad n = 0, 1, \ldots, N-1$
where the signal is $s[n] = A$ for $A > 0$ and $w[n]$ is WGN with variance $\sigma^2$. Now the NP detector decides $H_1$ if
$$\frac{\dfrac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\dfrac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n] - A)^2\right]}{\dfrac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\dfrac{1}{2\sigma^2} \sum_{n=0}^{N-1} x^2[n]\right]} > \gamma$$
Taking the logarithm of both sides and simplification results in
$$\frac{1}{N} \sum_{n=0}^{N-1} x[n] > \frac{\sigma^2}{NA} \ln\gamma + \frac{A}{2} = \gamma'$$
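The simplification step can be spelled out as follows, using only the quantities already defined for this example. The common normalisation factor cancels, and the logarithm of the likelihood ratio becomes

```latex
\begin{aligned}
\ln L(\mathbf{x})
  &= -\frac{1}{2\sigma^2}\left[\sum_{n=0}^{N-1}(x[n]-A)^2 - \sum_{n=0}^{N-1}x^2[n]\right] \\
  &= -\frac{1}{2\sigma^2}\left[-2A\sum_{n=0}^{N-1}x[n] + NA^2\right] \\
  &= \frac{A}{\sigma^2}\sum_{n=0}^{N-1}x[n] - \frac{NA^2}{2\sigma^2} \;>\; \ln\gamma .
\end{aligned}
```

Since A > 0, multiplying both sides by σ²/(NA) preserves the inequality and moves the term A/2 to the right-hand side, which gives the threshold test on the sample mean.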

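Neyman-Pearson Theorem – Example DC in WGN
The NP detector compares the sample mean $\bar{x} = \frac{1}{N} \sum_{n=0}^{N-1} x[n]$ to a threshold $\gamma'$. To determine the detection performance, we first note that the test statistic $T(\mathbf{x}) = \bar{x}$ is Gaussian under each hypothesis and its distribution is as follows
$$T(\mathbf{x}) \sim \begin{cases} \mathcal{N}\left(0, \frac{\sigma^2}{N}\right) & \text{under } H_0 \\ \mathcal{N}\left(A, \frac{\sigma^2}{N}\right) & \text{under } H_1 \end{cases}$$
We have then
$$P_{FA} = \Pr(T(\mathbf{x}) > \gamma'; H_0) = Q\left(\frac{\gamma'}{\sqrt{\sigma^2/N}}\right) \;\Rightarrow\; \gamma' = \sqrt{\frac{\sigma^2}{N}}\; Q^{-1}(P_{FA})$$
and
$$P_D = \Pr(T(\mathbf{x}) > \gamma'; H_1) = Q\left(\frac{\gamma' - A}{\sqrt{\sigma^2/N}}\right)$$
$P_D$ and $P_{FA}$ are related to each other according to the following equation
$$P_D = Q\left(Q^{-1}(P_{FA}) - \sqrt{\frac{NA^2}{\sigma^2}}\right)$$
where $NA^2/\sigma^2$ is the signal energy-to-noise ratio.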
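The relations for the DC level in WGN can be checked numerically. The sketch below sets the threshold on the sample mean from a prescribed false alarm probability, using γ' = √(σ²/N) Q⁻¹(P_FA), and computes the resulting detection probability in two ways. The parameter values and the function name `np_dc_detector` are assumptions for illustration.

```python
import math
from statistics import NormalDist

def Q(x):
    """Right-tail probability of a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Qinv(p):
    """Inverse Q-function for 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)

def np_dc_detector(N, A, sigma2, p_fa):
    """Threshold on the sample mean and resulting P_D for a DC level A > 0
    in WGN of variance sigma2, at a prescribed false alarm probability."""
    std_mean = math.sqrt(sigma2 / N)        # standard deviation of the sample mean
    threshold = std_mean * Qinv(p_fa)       # gamma' = sqrt(sigma^2/N) * Q^{-1}(P_FA)
    p_d = Q((threshold - A) / std_mean)     # P_D = Q((gamma' - A) / sqrt(sigma^2/N))
    return threshold, p_d

# Assumed illustrative values, not taken from the lecture.
N, A, sigma2, p_fa = 10, 1.0, 1.0, 0.1
threshold, p_d = np_dc_detector(N, A, sigma2, p_fa)

# Closed-form relation P_D = Q(Q^{-1}(P_FA) - sqrt(N A^2 / sigma^2)) for comparison.
p_d_closed_form = Q(Qinv(p_fa) - math.sqrt(N * A ** 2 / sigma2))
```

Note that the threshold depends only on the noise statistics and on P_FA, not on the signal amplitude A. The amplitude enters only through the detection probability.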

Neyman-Pearson Theorem – Example DC in WGN
$$P_D = Q\left(Q^{-1}(P_{FA}) - \sqrt{\frac{NA^2}{\sigma^2}}\right)$$
Remember the deflection coefficient
$$d^2 = \frac{\left(E(T; H_1) - E(T; H_0)\right)^2}{\mathrm{var}(T; H_0)}$$
In this case $d^2 = \frac{NA^2}{\sigma^2}$. Further notice that the detection performance ($P_D$) increases monotonically with the deflection coefficient.

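Neyman-Pearson Theorem – Example Change Var
Consider an IID process $x[n]$,
$H_0:\; x[n] \sim \mathcal{N}(0, \sigma_0^2)$
$H_1:\; x[n] \sim \mathcal{N}(0, \sigma_1^2)$,
with $\sigma_1^2 > \sigma_0^2$. Neyman-Pearson test:
$$\frac{\dfrac{1}{(2\pi\sigma_1^2)^{N/2}} \exp\left[-\dfrac{1}{2\sigma_1^2} \sum_{n=0}^{N-1} x^2[n]\right]}{\dfrac{1}{(2\pi\sigma_0^2)^{N/2}} \exp\left[-\dfrac{1}{2\sigma_0^2} \sum_{n=0}^{N-1} x^2[n]\right]} > \gamma$$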

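Neyman-Pearson Theorem – Example Change Var
Taking the logarithm of both sides and simplification results in
$$-\frac{1}{2} \left( \frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2} \right) \sum_{n=0}^{N-1} x^2[n] > \ln\gamma + \frac{N}{2} \ln\frac{\sigma_1^2}{\sigma_0^2}$$
we then have
$$\frac{1}{N} \sum_{n=0}^{N-1} x^2[n] > \gamma'$$
with
$$\gamma' = \frac{\frac{2}{N} \ln\gamma + \ln\frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}}$$
What about $P_D$?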

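Neyman-Pearson Theorem – Example Change Var
For $N = 1$ we decide for $H_1$ if $|x[0]| > \sqrt{\gamma'}$.
$$P_{FA} = \Pr\left\{ |x[0]| > \sqrt{\gamma'};\, H_0 \right\} = 2 \Pr\left\{ x[0] > \sqrt{\gamma'};\, H_0 \right\} = 2 Q\left( \frac{\sqrt{\gamma'}}{\sqrt{\sigma_0^2}} \right) \;\Rightarrow\; \sqrt{\gamma'} = \sqrt{\sigma_0^2}\; Q^{-1}\left( \tfrac{1}{2} P_{FA} \right)$$
$$P_D = 2 Q\left( \frac{\sqrt{\gamma'}}{\sqrt{\sigma_1^2}} \right) = 2 Q\left( \frac{\sqrt{\sigma_0^2}}{\sqrt{\sigma_1^2}}\; Q^{-1}\left( \tfrac{1}{2} P_{FA} \right) \right)$$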
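For the single-sample variance-change detector, the relation P_D = 2 Q((σ0/σ1) Q⁻¹(P_FA/2)) can be evaluated as sketched below. The function name `pd_variance_change` and the numerical values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def Q(x):
    """Right-tail probability of a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Qinv(p):
    """Inverse Q-function for 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)

def pd_variance_change(p_fa, sigma0_sq, sigma1_sq):
    """P_D for N = 1 when deciding H1 if |x[0]| > sqrt(gamma'),
    with x[0] ~ N(0, sigma0_sq) under H0 and N(0, sigma1_sq) under H1."""
    sqrt_threshold = math.sqrt(sigma0_sq) * Qinv(p_fa / 2.0)  # from P_FA = 2 Q(sqrt(gamma')/sigma0)
    return 2.0 * Q(sqrt_threshold / math.sqrt(sigma1_sq))

# Assumed illustrative values: P_FA = 0.1 and a fourfold variance increase.
p_d = pd_variance_change(0.1, 1.0, 4.0)

# With equal variances the hypotheses are indistinguishable, so P_D = P_FA.
p_d_no_change = pd_variance_change(0.1, 1.0, 1.0)
```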

Receiver Operating Characteristics
The alternative way of summarizing the detection performance of an NP detector is to plot $P_D$ versus $P_{FA}$. This plot is called the Receiver Operating Characteristic (ROC). For the former DC level detection example, the ROC is shown here. Note that here $\frac{NA^2}{\sigma^2} = 1$.
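A few points of the ROC for the DC level example can be tabulated with the relation P_D = Q(Q⁻¹(P_FA) − √(NA²/σ²)). This is a sketch with NA²/σ² = 1 as stated above; the grid of false alarm probabilities and the name `roc_point` are assumptions.

```python
import math
from statistics import NormalDist

def Q(x):
    """Right-tail probability of a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Qinv(p):
    """Inverse Q-function for 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)

def roc_point(p_fa, d2):
    """Detection probability of the NP detector at false alarm probability
    p_fa, for an energy-to-noise ratio (deflection coefficient) d2."""
    return Q(Qinv(p_fa) - math.sqrt(d2))

# Hypothetical grid of false alarm probabilities.
p_fa_grid = [0.01, 0.05, 0.1, 0.2, 0.5, 0.9]
roc = [(p, roc_point(p, 1.0)) for p in p_fa_grid]
```

For d2 = 0 the curve collapses onto the diagonal P_D = P_FA, which corresponds to guessing. Larger values push the curve toward the upper-left corner.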

Minimum Probability of Error
Assume the prior probabilities of $H_0$ and $H_1$ are known and represented by $P(H_0)$ and $P(H_1)$, respectively. The probability of error, $P_e$, is then defined as
$$P_e = P(H_1) P(H_0 | H_1) + P(H_0) P(H_1 | H_0) = P(H_1) P_M + P(H_0) P_{FA}$$
Our goal is to design a detector that minimizes $P_e$. It can be shown that the following detector is optimal in this case: decide $H_1$ if
$$\frac{p(\mathbf{x} | H_1)}{p(\mathbf{x} | H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma$$
In case $P(H_0) = P(H_1)$, the detector is called the maximum likelihood detector.

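Minimum Probability of Error - Derivation
$$P_e = P(H_1) P(H_0 | H_1) + P(H_0) P(H_1 | H_0)$$
$$= P(H_1) \int_{R_0} p(\mathbf{x} | H_1)\, d\mathbf{x} + P(H_0) \int_{R_1} p(\mathbf{x} | H_0)\, d\mathbf{x}$$
We know that
$$\int_{R_0} p(\mathbf{x} | H_1)\, d\mathbf{x} = 1 - \int_{R_1} p(\mathbf{x} | H_1)\, d\mathbf{x},$$
such that
$$P_e = P(H_1) \left( 1 - \int_{R_1} p(\mathbf{x} | H_1)\, d\mathbf{x} \right) + P(H_0) \int_{R_1} p(\mathbf{x} | H_0)\, d\mathbf{x}$$
$$= P(H_1) + \int_{R_1} \left[ P(H_0)\, p(\mathbf{x} | H_0) - P(H_1)\, p(\mathbf{x} | H_1) \right] d\mathbf{x}$$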

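Minimum Probability of Error - Derivation
$$P_e = P(H_1) + \int_{R_1} \left[ P(H_0)\, p(\mathbf{x} | H_0) - P(H_1)\, p(\mathbf{x} | H_1) \right] d\mathbf{x}$$
We want to minimize $P_e$, so an $\mathbf{x}$ should only be included in the region $R_1$ if the integrand $\left[ P(H_0)\, p(\mathbf{x} | H_0) - P(H_1)\, p(\mathbf{x} | H_1) \right]$ is negative for that $\mathbf{x}$:
$$P(H_0)\, p(\mathbf{x} | H_0) < P(H_1)\, p(\mathbf{x} | H_1)$$
$$\Rightarrow\; \frac{p(\mathbf{x} | H_1)}{p(\mathbf{x} | H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma$$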

Minimum Probability of Error – Example DC in WGN
Consider the following signal detection problem
$H_0:\; x[n] = w[n], \quad n = 0, 1, \ldots, N-1$
$H_1:\; x[n] = s[n] + w[n], \quad n = 0, 1, \ldots, N-1$
where the signal is $s[n] = A$ for $A > 0$ and $w[n]$ is WGN with variance $\sigma^2$. Now the minimum probability of error detector decides $H_1$ if
$$\frac{p(\mathbf{x} | H_1)}{p(\mathbf{x} | H_0)} > \frac{P(H_0)}{P(H_1)} = 1$$
(assuming $P(H_0) = P(H_1) = 0.5$), leading to
$$\frac{\dfrac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\dfrac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n] - A)^2\right]}{\dfrac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\dfrac{1}{2\sigma^2} \sum_{n=0}^{N-1} x^2[n]\right]} > 1$$
Taking the logarithm of both sides and simplification results in
$$\frac{1}{N} \sum_{n=0}^{N-1} x[n] > \frac{A}{2}$$

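Minimum Probability of Error – Example DC in WGN
$P_e$ is then given by
$$P_e = \frac{1}{2} \left[ P(H_0 | H_1) + P(H_1 | H_0) \right]$$
$$= \frac{1}{2} \left[ \Pr\left( \frac{1}{N} \sum_{n=0}^{N-1} x[n] < A/2 \,\Big|\, H_1 \right) + \Pr\left( \frac{1}{N} \sum_{n=0}^{N-1} x[n] > A/2 \,\Big|\, H_0 \right) \right]$$
$$= \frac{1}{2} \left[ \left( 1 - Q\left( \frac{A/2 - A}{\sqrt{\sigma^2/N}} \right) \right) + Q\left( \frac{A/2}{\sqrt{\sigma^2/N}} \right) \right]$$
$$= Q\left( \sqrt{\frac{NA^2}{4\sigma^2}} \right)$$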
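The closed-form error probability Q(√(NA²/(4σ²))) can be compared against a simulation of the detector that decides H1 when the sample mean exceeds A/2. The sketch below assumes equally likely hypotheses, as in the example. The values N = 4, A = 1, σ² = 1 and the function names are chosen for illustration.

```python
import math
import random

def Q(x):
    """Right-tail probability of a standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pe_theory(N, A, sigma2):
    """P_e = Q(sqrt(N A^2 / (4 sigma^2))) for equal priors."""
    return Q(math.sqrt(N * A ** 2 / (4.0 * sigma2)))

def pe_monte_carlo(N, A, sigma2, trials, rng):
    """Simulate equally likely hypotheses and count the errors of the rule
    'decide H1 if the sample mean exceeds A/2'."""
    sigma = math.sqrt(sigma2)
    errors = 0
    for _ in range(trials):
        h1_true = rng.random() < 0.5                 # draw the true hypothesis
        level = A if h1_true else 0.0
        mean = sum(level + rng.gauss(0.0, sigma) for _ in range(N)) / N
        decide_h1 = mean > A / 2.0
        errors += (decide_h1 != h1_true)             # count misses and false alarms
    return errors / trials

# Assumed illustrative values.
N, A, sigma2 = 4, 1.0, 1.0
pe_closed_form = pe_theory(N, A, sigma2)
pe_simulated = pe_monte_carlo(N, A, sigma2, 20000, random.Random(1))
```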

Minimum Probability of Error – MAP detector
Starting from
$$\frac{p(\mathbf{x} | H_1)}{p(\mathbf{x} | H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma$$
we can use Bayes' rule:
$$p(H_i | \mathbf{x}) = \frac{p(\mathbf{x} | H_i)\, p(H_i)}{p(\mathbf{x})}$$
and arrive at
$$p(H_1 | \mathbf{x}) > p(H_0 | \mathbf{x}).$$
This is called the MAP detector, which, if $P(H_1) = P(H_0)$, reduces again to the ML detector.

Bayes Risk
A generalisation of the minimum $P_e$ criterion is one where costs are assigned to each type of error: let $C_{ij}$ be the cost if we decide $H_i$ while $H_j$ is true. The expected cost (Bayes risk) to be minimized is
$$R = E[C] = \sum_{i=0}^{1} \sum_{j=0}^{1} C_{ij}\, P(H_i | H_j)\, P(H_j)$$
If $C_{10} > C_{00}$ and $C_{01} > C_{11}$, the detector that minimises the Bayes risk is to decide $H_1$ when
$$\frac{p(\mathbf{x} | H_1)}{p(\mathbf{x} | H_0)} > \frac{(C_{10} - C_{00})\, P(H_0)}{(C_{01} - C_{11})\, P(H_1)} = \gamma.$$
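The Bayes risk threshold is simple to compute once the costs and priors are fixed. The sketch below uses a hypothetical helper `bayes_threshold` and assumed cost values to show how the threshold reduces to the minimum probability of error case and how unequal costs shift it.

```python
def bayes_threshold(c00, c01, c10, c11, p_h0, p_h1):
    """Likelihood ratio threshold gamma that minimises the Bayes risk.

    c_ij is the cost of deciding H_i when H_j is true.
    Requires C10 > C00 and C01 > C11 (errors cost more than correct decisions).
    """
    if not (c10 > c00 and c01 > c11):
        raise ValueError("need C10 > C00 and C01 > C11")
    return (c10 - c00) * p_h0 / ((c01 - c11) * p_h1)

# Uniform costs (0 for correct decisions, 1 for errors) give gamma = P(H0)/P(H1).
gamma_min_pe = bayes_threshold(0.0, 1.0, 1.0, 0.0, 0.7, 0.3)

# Assumed illustrative costs: a miss (C01) is ten times as costly as a false alarm (C10).
gamma_costly_miss = bayes_threshold(0.0, 10.0, 1.0, 0.0, 0.5, 0.5)
```

Making misses more expensive lowers the threshold, so the detector decides H1 more readily and accepts more false alarms in exchange for fewer misses.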