
IN5340/IN9340

Detection theory
Lecture IV, Version 1.1

Andreas Austeng

Dept. of Informatics, University of Oslo

March 2022
Detection versus estimation

Detection:
The ability to discern between information-bearing patterns (signals) and random
patterns that distract from the information (noise).
Used for (automated) scanning of (large) data sets for the purpose of detecting,
localizing, and identifying particular targets of interest.

Estimation:
Extension of the detection part. It answers the question as to how much of a
particular item of interest there is (i.e., a fine localization in the parameter space)
and provides a measure of the accuracy or confidence.
Estimation

Some typical estimation examples are


- Frequency estimation (i.e. determine the actual Doppler shift).
- Differential time delay estimation (i.e. determine the distance or differential distance to a target).
- Amplitude estimation (i.e. target strength, size, distance).
- Phase and/or bandwidth estimation (i.e. target identification).
- Spectral estimation (i.e. target identification).
- Analog waveform estimation (i.e. noise-free recovery of the original waveform using optimal filtering).
Detection

Some typical detection examples are


- Radar: passive, active, or bi-static (i.e. it uses an antenna and tests for target presence).
- Sonar: bi-static, active, or passive (i.e. it uses hydrophones and tests for target presence).
- Digital communication: coded binary words (symbols) (i.e. one wants to detect transmitted, possibly encoded, message bits consisting of binary zeros and ones).
- Acoustic detection (i.e. it uses microphones): for intrusion alarms, emitter detection (i.e. gunnery, helicopter sound detection/localization, etc.).
- Seismic detection (i.e. it uses geophones): to detect earthquakes, tunnel digging, nuclear testing, etc.
Decision model
The spaces involved ...

Source: Detection theory : applications and digital signal processing / Ralph D. Hippenstiel; Figure 1.1.
Decision model (cont.)
E: The event space

Examples of events can be


- Messages: alphabetical, numeric, Morse code, etc.
- Parameters: frequency, phase angle, object moving or not, object speed, etc.
- Targets: target present/target absent, friend/foe, etc.

The message may be in a discrete or continuous form, and may be a random variable, a random vector, or a realization of a random process.

Example
- In the event space E, one of the I possible events e_i, i = 0, ..., I − 1, can happen.
- We may or may not know the a priori probability Pr{e_i} of the event e_i.
Decision model (cont.)
S: The signal space

The events are converted into representative signals.


Example (Event encoding, Ex 1.8)
Events:        Signals:
A = e_0   →   s_0(t) : signal representing event A
B = e_1   →   s_1(t) : signal representing event B
C = e_2   →   s_2(t) : signal representing event C
  ...          ...     for 0 ≤ t ≤ T

Example (Intrusion alarm, Ex 1.12)
Events:                     Signals:
e_0 (no intruder)       →   s_0(t) = A cos(ω_0 t + θ)
e_1 (intruder present)  →   s_1(t) = A cos([ω_0 + ω_D(t)] t + θ)
where ω_D(t) is a time-dependent frequency shift.
Decision model (cont.)
R: The observer space

The signals are observed (i.e. obtained) after having passed through some channel (i.e. the medium).
In general, the received data is the transmitted signal plus an additive noise component:
    r(t) = s(t) + n(t)

Example (Target detection, Ex 1.13)
Events:                   Received data:
e_0 (no target)       →   r_0(t) = n(t),                  for 0 ≤ t ≤ T.
e_1 (target present)  →   r_1(t) = A s(t − t_R) + n(t),   for t_R ≤ t ≤ T + t_R,
where A is the signal loss coefficient and t_R the propagation delay. (No Doppler shift assumed.)
Decision model (cont.)
D: The decision space

- Usually: dim(D) = dim(E)
- We will focus on: how to obtain a mapping from R to D.
- The mapping must be such that:
  - An unambiguous decision is made.
  - Every outcome must lead to some decision.

Example (Multiple decision, Ex 1.16)
d_i: choose h_i, the hypothesis that the event e_i occurred, meaning that the observation lies in the space (region) R_i, for i = 0, 1, 2, 3.
The Gaussian (normal) density function

If x ~ N(m_x, σ²), then f_X(x) = (1/(√(2π) σ)) e^(−(x − m_x)²/(2σ²))

- The pdf is symmetric about the mean m_x.
- The most likely value of x equals the mean.
- The pdf is completely described by the mean and the variance.
- For Gaussian x: independent ⇔ uncorrelated.
- Gaussianity is closed under scalar multiplication and linear transformations.
  This means that filtering, convolution, integration, differentiation, and all common transforms (Hilbert, Fourier, Laplace, Z-transform) retain the Gaussian character of the random variables.
- The central limit theorem.
- The model fits many physical phenomena.
Probability

[Figure: noise-only pdf f_n(x) with the interval (x_1, x_2) marked.]

- The probability that a single random value falls between x_1 and x_2 (given the probability density function f_X(x)):
      P(x_1 < x < x_2) = ∫_{x_1}^{x_2} f_X(x) dx
- For any pdf: ∫_{−∞}^{∞} f_X(x) dx = 1.
- And if x ~ N(m_x, σ²):
  - P(m_x − σ < x < m_x + σ) = 0.6827,  P(m_x − 2σ < x < m_x + 2σ) = 0.9545
  - P(m_x − 3σ < x < m_x + 3σ) = 0.9973
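The three k-sigma probabilities above can be checked numerically; a minimal Python sketch using the standard-library error function (by standardizing, P(m_x − kσ < x < m_x + kσ) = erf(k/√2) regardless of m_x and σ):

```python
from math import erf, sqrt

def prob_within(k):
    # P(m_x - k*sigma < x < m_x + k*sigma) for x ~ N(m_x, sigma^2),
    # which equals erf(k / sqrt(2)) after standardizing
    return erf(k / sqrt(2))

p1, p2, p3 = prob_within(1), prob_within(2), prob_within(3)
# p1 ~ 0.6827, p2 ~ 0.9545, p3 ~ 0.9973
```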
Right tail probability

- Given the standard normal distribution (zero mean, unit variance):
      f_X(x) = (1/√(2π)) e^(−x²/2)
- The cumulative distribution is:
      Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^(−t²/2) dt
- The probability of exceeding a given value, Q(x), is given by the complementary cumulative distribution:
      Q(x) = (1/√(2π)) ∫_{x}^{∞} e^(−t²/2) dt = 1 − Φ(x) = P_FA
- Q is invertible (monotonically decreasing):
      γ = Q^{−1}(P_FA)
- In MATLAB:
      p = normspec([-Inf,1],0,1,'outside')
  [Figure: standard normal density with the tail above the critical value 1 shaded; "Probability Greater than Upper Bound is 0.15866".]
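The slide uses MATLAB's normspec; a Python equivalent of Q and Q⁻¹ using only the standard library might look as follows (Q(1) ≈ 0.15866 matches the figure):

```python
from statistics import NormalDist

std = NormalDist()  # zero mean, unit variance

def Q(x):
    # Right-tail probability: Q(x) = 1 - Phi(x)
    return 1.0 - std.cdf(x)

def Q_inv(p_fa):
    # Threshold gamma whose right-tail probability equals p_fa
    return std.inv_cdf(1.0 - p_fa)

p = Q(1.0)        # ~ 0.15866, as in the MATLAB normspec example
gamma = Q_inv(p)  # recovers the critical value 1.0
```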
Set operations in probability theory

- Union: A ∪ B
- Intersection: A ∩ B
- P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
- Conditional probabilities:
      P(A | B) = P(A ∩ B) / P(B)
      P(B | A) = P(A ∩ B) / P(A)

Bayes' rule:
- From P(B | A) = P(A ∩ B) / P(A)  ⇒  P(A ∩ B) = P(B | A) P(A)
- Then:
      P(A | B) = P(A ∩ B) / P(B) = P(B | A) P(A) / P(B)
A basic signal detection problem

In a basic problem, we would have:


- A single known value of the signal, θ.
- Random zero-mean, normally distributed additive noise n.
- A single observable measurement x.
- To decide either H_0 or H_1.
A basic signal detection problem
The noise model

[Figure: noise-only pdf f_n(x).]

- f_n(x): probability density function of the noise.
- The noise probability density can be arbitrary.
- A common one is normally distributed with zero mean (σ = standard deviation):
      f_n(x) = (1/(√(2π) σ)) e^(−x²/(2σ²))
A basic signal detection problem
Signal + Noise

[Figure: noise-only pdf f_n(x) and signal-plus-noise pdf f_s(x).]

- f_s(x): probability density function of signal + noise.
- Could be arbitrary, but often normally distributed with mean θ (σ = standard deviation), i.e.
      f_s(x) = (1/(√(2π) σ)) e^(−(x − θ)²/(2σ²))
- We define the signal-to-noise ratio (SNR) as d′ = θ/σ,
  i.e. as the separation between noise-only and signal+noise measured in standard deviations.
- If σ = 1, then d′ = θ.
A basic signal detection problem
Thresholded decision

[Figure: f_n(x) and f_s(x) with the decision threshold x_c marked between them.]

Decision criterion:
- If x > x_c, decide H_1 (signal + noise).
- Otherwise, decide H_0 (noise only).
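A Monte Carlo sketch of this decision rule, comparing the empirical rates with the analytic tail probabilities (the values of θ, σ, and x_c are illustrative, not from the slides):

```python
import random
from statistics import NormalDist

random.seed(0)
theta, sigma, x_c = 2.0, 1.0, 1.0   # illustrative values
N = 200_000

noise = [random.gauss(0.0, sigma) for _ in range(N)]      # H0 data
signal = [random.gauss(theta, sigma) for _ in range(N)]   # H1 data

# Decide H1 whenever the observation exceeds x_c
p_fa_hat = sum(x > x_c for x in noise) / N
p_d_hat = sum(x > x_c for x in signal) / N

# Analytic values: right tails of the two densities
std = NormalDist()
p_fa = 1.0 - std.cdf((x_c - 0.0) / sigma)
p_d = 1.0 - std.cdf((x_c - theta) / sigma)
```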
A basic signal detection problem
Decision bias, β

[Figure: f_n(x) and f_s(x) with the threshold x_c; a low threshold gives a small β, a high threshold a large β.]

Decision bias:
- We define the decision bias, β, as
      β = f_s(x_c) / f_n(x_c),
- i.e. as the likelihood of observing the threshold value x_c when the signal is present, relative to when the signal is not present.
Decision outcomes

                          Decision
True state                Noise only (H_0)          Signal + Noise (H_1)
H_0 (Noise only)          Correct Rejection (CR)    False Alarm (FA)
                          True Negative (TN)        Type I Error / False Positive (FP)
H_1 (Signal + Noise)      Missed Detection (M)      Correct Detection (D)
                          Type II Error /           True Positive (TP)
                          False Negative (FN)
Decision outcomes
Noise only

[Figure: f_n(x) with the area CR (below x_c) and FA (above x_c) marked.]

- If truly noise only, then
      P(FA|N) = ∫_{x_c}^{∞} f_n(x) dx    and    P(CR|N) = ∫_{−∞}^{x_c} f_n(x) dx
- ... and we want x_c as large as possible.


Decision outcomes
Signal + Noise

[Figure: f_s(x) with the area M (below x_c) and D (above x_c) marked.]

- If the signal is truly present, then
      P(D|S) = ∫_{x_c}^{∞} f_s(x) dx    and    P(M|S) = ∫_{−∞}^{x_c} f_s(x) dx
- ... and we want x_c as small as possible.
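The four outcome probabilities for the Gaussian model can be computed directly with the standard library (θ, σ, and x_c are illustrative choices):

```python
from statistics import NormalDist

theta, sigma, x_c = 1.5, 1.0, 0.75   # illustrative values
fn = NormalDist(0.0, sigma)     # noise only
fs = NormalDist(theta, sigma)   # signal + noise

P_FA = 1.0 - fn.cdf(x_c)   # integral of f_n from x_c to infinity
P_CR = fn.cdf(x_c)         # integral of f_n up to x_c
P_D  = 1.0 - fs.cdf(x_c)   # integral of f_s from x_c to infinity
P_M  = fs.cdf(x_c)         # integral of f_s up to x_c
```

Note how the pairs sum to one under each hypothesis, so fixing x_c fixes all four probabilities at once.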


"a priori" and "a posteriori" probability
( ... all these probabilities! )

- Suppose two possible events: binary "one" and "zero".
- The probability of event i (i.e. hypothesis H_i) given observation y is described by the a posteriori probability Pr{H_i | y}.
- The a priori probability of event i (i.e. the probability of hypothesis H_i) is given by Pr{H_i}.
- I.e. a priori is something we know (or can know) beforehand, while a posteriori is something we have to experience.
"a priori" and "a posteriori" probability
( ... all these probabilities! )

Example (binary channel):
Hypothesis (transmit)     Transition probabilities              Observation (receive)
                          Pr{observed data | data transmitted} = Pr{y | H}
Pr{0T} = P_0 = 0.7  "0"   Pr{0R|0T} = 0.8,  Pr{1R|0T} = 0.2    "0"  Pr{0R}
Pr{1T} = P_1 = 0.3  "1"   Pr{0R|1T} = 0.2,  Pr{1R|1T} = 0.8    "1"  Pr{1R}

- The quantity of interest is the probability Pr{"(actual) symbol transmitted" | "observed symbol"}.
- Find this by using Bayes' rule: Pr{H_i | y} = (Pr{y | H_i} / Pr{y}) Pr{H_i}
"a priori" and "a posteriori" probability
( ... all these probabilities! )

Example (Binary transmission)

- Pr{0R} = Pr{0R|1T} Pr{1T} + Pr{0R|0T} Pr{0T} = 0.2 × 0.3 + 0.8 × 0.7 = 0.62
- Pr{1R} = Pr{1R|0T} Pr{0T} + Pr{1R|1T} Pr{1T} = 0.2 × 0.7 + 0.8 × 0.3 = 0.38
- Pr{1T|1R} = Pr{1R|1T} Pr{1T} / Pr{1R} = 0.8 × 0.3 / 0.38 = 0.632
- Pr{0T|0R} = Pr{0R|0T} Pr{0T} / Pr{0R} = 0.8 × 0.7 / 0.62 = 0.903
- And
  - Pr{1T, 1R} = Pr{1T|1R} Pr{1R} = 0.632 × 0.38 = 0.24
  - Pr{0T, 0R} = Pr{0T|0R} Pr{0R} = 0.903 × 0.62 = 0.56
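The example's numbers are easy to verify in a few lines of Python:

```python
# Priors and transition probabilities from the binary-channel example
P0T, P1T = 0.7, 0.3
P0R_0T, P1R_0T = 0.8, 0.2
P0R_1T, P1R_1T = 0.2, 0.8

# Total probability of each received symbol
P0R = P0R_1T * P1T + P0R_0T * P0T   # 0.62
P1R = P1R_0T * P0T + P1R_1T * P1T   # 0.38

# A posteriori probabilities via Bayes' rule
P1T_1R = P1R_1T * P1T / P1R         # ~ 0.632
P0T_0R = P0R_0T * P0T / P0R         # ~ 0.903

# Joint probabilities
P_1T_and_1R = P1T_1R * P1R          # 0.24
P_0T_and_0R = P0T_0R * P0R          # 0.56
```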
Bayes’ detection
Requirements
Given:
- The observations: y_n = s_n + n_n for n = 1, ..., N, i.e. y = (y_1, y_2, ..., y_N)^T
Requirements:
- The likelihood function f_{y|H_i}(y | H_i) = f_i(y)
  = the density of y under the i'th hypothesis, for i = 0, 1.
- The prior probability Pr{H_i}, denoted P_i.
- Payoff matrix:
  the cost associated with choosing hypothesis i when hypothesis j is true: C_ij.
Bayes’ detection
Details

- Possible outcomes:
  (a) Choose H_0; H_0 is true (correct)
  (b) Choose H_1; H_0 is true (mistake)
  (c) Choose H_1; H_1 is true (correct)
  (d) Choose H_0; H_1 is true (mistake)
- The costs for the problem (C_ij: choosing i when j is true):
  (a) C_00 (choose 0, 0 is true)
  (b) C_10 (choose 1, 0 is true)
  (c) C_11 (choose 1, 1 is true)
  (d) C_01 (choose 0, 1 is true)
- The average cost (risk or penalty function):
      C = C_00 Pr{choose 0, 0 is true}
        + C_10 Pr{choose 1, 0 is true}
        + C_11 Pr{choose 1, 1 is true}
        + C_01 Pr{choose 0, 1 is true}
- Using Bayes' rule, this becomes
      C = C_00 P_0 Pr{choose 0 | 0 is true}
        + C_10 P_0 Pr{choose 1 | 0 is true}
        + C_11 P_1 Pr{choose 1 | 1 is true}
        + C_01 P_1 Pr{choose 0 | 1 is true}
- I.e.
      C = C_00 P_0 (1 − P_FA) + C_10 P_0 P_FA + C_11 P_1 P_D + C_01 P_1 P_M
Bayes’ detection
Details (continuing)

- The average cost
      C = C_00 P_0 (1 − P_FA) + C_10 P_0 P_FA + C_11 P_1 P_D + C_01 P_1 P_M
  can be expressed as
      C = C_00 P_0 ∫_{R_0} f_0(y) dy + C_10 P_0 ∫_{R_1} f_0(y) dy
        + C_11 P_1 ∫_{R_1} f_1(y) dy + C_01 P_1 ∫_{R_0} f_1(y) dy.
- By using
      R = R_1 ∪ R_0,  where R_1 ∩ R_0 = ∅,  and  ∫_R f_i(y) dy = 1 for i = 0, 1,
  it can be expressed as
      C = C_00 P_0 ∫_{R_0} f_0(y) dy + C_10 P_0 ∫_{R−R_0} f_0(y) dy
        + C_11 P_1 ∫_{R−R_0} f_1(y) dy + C_01 P_1 ∫_{R_0} f_1(y) dy.
Bayes’ detection
Details (continuing ...)

- The average cost reduces to:
      C = C_00 P_0 ∫_{R_0} f_0(y) dy
        + C_10 P_0 [1 − ∫_{R_0} f_0(y) dy]
        + C_11 P_1 [1 − ∫_{R_0} f_1(y) dy]
        + C_01 P_1 ∫_{R_0} f_1(y) dy.
- This can be rewritten as:
      C = C_10 P_0 + C_11 P_1                                                   (fixed cost)
        + ∫_{R_0} [ P_1 (C_01 − C_11) f_1(y) − P_0 (C_10 − C_00) f_0(y) ] dy    (variable cost to be optimized)
- I.e. the optimal value depends on R_0.
- Assuming the cost of a bad decision is always larger than the cost of a good one, i.e.
      C_10 − C_00 > 0  and  C_01 − C_11 > 0,
  both coefficients in the variable-cost integrand are positive.
- Optimal choice of R_0: assign y to R_0 when the negative term in the integral is dominant!
- Conversely: assign y to R_1 when the first term in the integral is dominant; then no additional cost is added to the fixed cost.
Bayes’ detection
- The decision criterion becomes:
      P_1 [C_01 − C_11] f_1(y)  ≷  P_0 [C_10 − C_00] f_0(y)   (H_1 if >, H_0 if <)
- The Bayes formulation:
      Λ(y) = f_1(y)/f_0(y)  ≷  (P_0 [C_10 − C_00]) / (P_1 [C_01 − C_11]) = λ   (H_1 if >, H_0 if <)
- Λ(y) = f_1(y)/f_0(y) is known as the likelihood ratio (LR).
- Interpretation:
  - If Λ(y) > λ, decide on H_1.
  - If Λ(y) < λ, decide on H_0.
- Note: both sides are non-negative, so the inequality is preserved when taking the logarithm of both sides (or applying any other monotonically increasing function):
      ln Λ(y)  ≷  ln λ   (H_1 if >, H_0 if <)
- ln Λ(y) is known as the log-likelihood ratio (LLR).
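A sketch of the Bayes likelihood-ratio test for a Gaussian mean-shift problem (the priors, costs, θ, and σ are illustrative assumptions; the closed-form threshold y_c follows from taking logarithms of the LR test):

```python
from math import log
from statistics import NormalDist

# Illustrative problem and costs (not from the slides)
theta, sigma = 1.0, 1.0
P0, P1 = 0.6, 0.4
C00, C11 = 0.0, 0.0
C10, C01 = 1.0, 2.0   # false alarm costs 1, miss costs 2

f0 = NormalDist(0.0, sigma)
f1 = NormalDist(theta, sigma)

# Bayes threshold on the likelihood ratio
lam = (P0 * (C10 - C00)) / (P1 * (C01 - C11))

def decide(y):
    # Return 1 (H1) if the likelihood ratio exceeds lambda, else 0 (H0)
    Lambda = f1.pdf(y) / f0.pdf(y)
    return 1 if Lambda > lam else 0

# For Gaussians, ln Lambda(y) = (theta/sigma^2) y - theta^2/(2 sigma^2),
# so the LR test is equivalent to a single threshold on y:
y_c = (sigma**2 / theta) * log(lam) + theta / 2.0
```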
Maximum a posteriori (MAP) detection
- We have prior probabilities but either cannot, or do not want to, assign costs to the possible decisions: the maximum a posteriori (MAP) detection philosophy.
- Start with the dominant a posteriori probability:
      Pr(H_1 | y)  ≷  Pr(H_0 | y)   (H_1 if >, H_0 if <)
- I.e. given y, which of H_0 and H_1 is more likely true?
- Rewrite in incremental form:
      Pr(H_1 | y < Y ≤ y + dy)  ≷  Pr(H_0 | y < Y ≤ y + dy)
- Bayes' theorem gives
      Pr(H_i | y < Y ≤ y + dy) = [Pr(y < Y ≤ y + dy | H_i) / Pr(y < Y ≤ y + dy)] Pr(H_i),  i = 0, 1.
- Using (when dy becomes small)
      Pr(y < Y ≤ y + dy | H_i) = f_i(y) dy  and  Pr(y < Y ≤ y + dy) = f(y) dy,
- we get
      Pr(H_i | y) = [f_i(y) / f(y)] Pr(H_i),  i = 0, 1,
- and the MAP detector
      f_1(y)/f_0(y)  ≷  Pr(H_0)/Pr(H_1) = P_0/P_1   (H_1 if >, H_0 if <)
Maximum likelihood (ML) criterion

- If neither prior probabilities nor any cost information is available, then comparing the likelihood functions still gives a meaningful detection scheme.
- That is: "Given the data, which conditional pdf (i.e. f_0(y) or f_1(y)) is more likely to have generated the data?"
- In the scalar case, the detection scheme becomes
      f_1(y)  ≷  f_0(y)   (H_1 if >, H_0 if <)
- and for multiple observations
      f_1(y)/f_0(y)  ≷  1   (H_1 if >, H_0 if <)
Minimum probability of error criterion

- The criterion is used in binary communication problems where the costs of the two error types are the same and the cost of a correct decision is zero.
- We do allow non-equal prior probabilities (depending on the source coding).
- We set up the average cost in the same way as in the Bayes case.
- The average cost becomes (C_00 = C_11 = 0, C_10 = C_01 = 1):
      C = P_0 ∫_{R_1} f_0(y) dy + P_1 ∫_{R_0} f_1(y) dy
- The Bayes approach then gives
      Λ(y) = f_1(y)/f_0(y)  ≷  P_0/P_1   (H_1 if >, H_0 if <)
- This is the same expression (including the threshold) as that of the MAP detector!
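For the Gaussian mean-shift problem, the minimum-probability-of-error (equivalently MAP) test reduces to a single threshold on y; a sketch with illustrative parameters:

```python
from math import log
from statistics import NormalDist

theta, sigma = 2.0, 1.0   # illustrative values
P0, P1 = 0.7, 0.3

f0 = NormalDist(0.0, sigma)
f1 = NormalDist(theta, sigma)

# Taking logs of Lambda(y) = f1/f0 vs P0/P1 gives a single threshold on y:
y_c = theta / 2.0 + (sigma**2 / theta) * log(P0 / P1)

# Resulting error probability: P_e = P0 * P_FA + P1 * P_M
P_FA = 1.0 - f0.cdf(y_c)
P_M = f1.cdf(y_c)
P_e = P0 * P_FA + P1 * P_M
```

At y_c the likelihood ratio equals P_0/P_1, as the MAP rule requires.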
Min-Max criterion
- The cost information (i.e. C_ij, i, j = 0, 1) is available, but not the a priori probabilities P_0 or P_1.
- The best Bayes detection scheme under the worst possible choice of P_i, i = 0, 1 (Murphy's law), is then desirable.
- The average Bayes cost (using P_FA and P_M):
      C = P_0 C_00 + P_1 C_11 + P_1 (C_01 − C_11) P_M + P_0 (C_10 − C_00) P_FA
- When P_0 = 1: P_1 = 0, P_FA = 0, and C = C_00.
- When P_1 = 1: P_0 = 0, P_M = 0, and C = C_11.
- If we assume C_00 = C_11 = 0 (as in most real-life situations), we can plot C as a function of P_1 and either get a straight line or a curve as in the figure.
- In the figure, the maximum occurs where the slope of the curve is zero.
- To find the maximum, set the partial derivative of C with respect to P_1 to zero.
Min-Max criterion ...
- Given the maximum average cost, we minimize it; hence the name min-max criterion:
      ∂C/∂P_1 = ∂/∂P_1 { (1 − P_1) C_00 + P_1 C_11
                         + P_1 (C_01 − C_11) P_M + (1 − P_1)(C_10 − C_00) P_FA }
- Setting it to zero gives:
      (C_11 − C_00) + (C_01 − C_11) P_M − (C_10 − C_00) P_FA = 0
- If C_00 = C_11 = 0, we get
      C_01 P_M = C_10 P_FA
- The detector form using the Bayes cost formula is the Bayes detector with a threshold λ that makes the above equation hold.
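The min-max threshold can be found numerically by solving C_01 P_M = C_10 P_FA; a bisection sketch for an illustrative Gaussian problem (with C_00 = C_11 = 0; all parameter values are assumptions, not from the slides):

```python
from statistics import NormalDist

theta, sigma = 2.0, 1.0   # illustrative values
C10, C01 = 1.0, 2.0       # C00 = C11 = 0

f0 = NormalDist(0.0, sigma)
f1 = NormalDist(theta, sigma)

def g(x_c):
    # Min-max equation C01 * P_M - C10 * P_FA; monotone increasing in x_c
    P_FA = 1.0 - f0.cdf(x_c)
    P_M = f1.cdf(x_c)
    return C01 * P_M - C10 * P_FA

# Bisection for the root of g (the min-max threshold)
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if g(mid) > 0:
        hi = mid
    else:
        lo = mid
x_c = 0.5 * (lo + hi)
```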
The Neyman-Pearson criterion
- One of the first in a class called CFAR (constant false alarm rate) detectors.
- Typically used in radar and sonar applications.
- Allows a constant false alarm rate.
- Attractive when loss of life or essential equipment is involved.
- Desirable when resource allocation is important and the maximum false alarm rate has to be kept at a tolerable level.
- In essence, the false alarm rate is fixed while the detection probability is maximized.
- Approach:
  - Fix the false alarm rate P_FA at a value α (i.e. force (P_FA − α) to zero) while maximizing P_D (or minimizing P_M, since P_M = 1 − P_D).
  - Easily solved with Lagrange multipliers.
- Cost function:
      J = ∫_{R_0} f_1(y) dy  +  λ [ ∫_{R_1} f_0(y) dy − α ]
          (minimize)            (equality constraint)
        = ∫_{R_0} f_1(y) dy + λ [ 1 − ∫_{R_0} f_0(y) dy − α ]
        = λ(1 − α) + ∫_{R_0} [ f_1(y) − λ f_0(y) ] dy
The Neyman-Pearson criterion ...
- Solving the cost function:
      J = λ(1 − α) + ∫_{R_0} [ f_1(y) − λ f_0(y) ] dy
- For fixed λ, the first term is constant (> 0).
- We need only consider
      ∫_{R_0} [ f_1(y) − λ f_0(y) ] dy
- Same as for the Bayes detector. We get
      f_1(y)  ≷  λ f_0(y)   (H_1 if >, H_0 if <)
- or
      Λ(y) = f_1(y)/f_0(y)  ≷  λ   (H_1 if >, H_0 if <)
- By solving the above equation, we get a test statistic z compared against a threshold γ:
      z  ≷  γ   (H_1 if >, H_0 if <)
- The false alarm rate is then given by
      P_FA = ∫_{γ}^{∞} f_0(z) dz = α
- The threshold γ is chosen so that the P_FA constraint is met (i.e. P_FA = α).
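For the Gaussian mean-shift problem the Neyman-Pearson recipe is simply γ = Q⁻¹(α); a sketch with illustrative values of α and d′:

```python
from statistics import NormalDist

alpha = 0.01   # target false alarm rate (illustrative)
d = 3.0        # SNR d' = theta/sigma (illustrative)

std = NormalDist()
gamma = std.inv_cdf(1.0 - alpha)   # gamma = Q^{-1}(alpha)

# With the normalized observation as test statistic z:
P_FA = 1.0 - std.cdf(gamma)        # equals alpha by construction
P_D = 1.0 - std.cdf(gamma - d)     # detection probability at this threshold
```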
Receiver operating characteristic (ROC)
Detector performance evaluation

- Usually, we want to know
  - the probability of detection, P_D,
  - (the probability of miss, P_M, but P_D + P_M = 1),
  - the probability of false alarm, P_FA.
- By plotting P_D against P_FA we can compare the performance of different detectors.
- For each unique threshold, a unique pair of P_D and P_FA exists!
Receiver operating characteristic (ROC) (continuing)

[Figure: ROC curve, P(D|S) versus P(FA|N). A small x_c (small β) gives the upper right end of the curve; a large x_c (large β) gives the lower left end. The operating point moves along the curve as β increases.]

- Generally: β = λ = ∂P_D/∂P_FA, i.e. the slope of the ROC curve is the threshold of the detector.
- Also, from the Bayes detector: β = f_s(x_c)/f_n(x_c).
Receiver operating characteristic (ROC) (continuing ...)

[Figure: family of ROC curves, P(D|S) versus P(FA|N), moving toward the upper left corner with increasing d′.]

- Signal-to-noise ratio (SNR): d′ = θ/σ.
- A given value of d′ traces out a unique ROC curve as β varies.
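For the Gaussian case the ROC curve has the closed form P_D = Q(Q⁻¹(P_FA) − d′); a sketch comparing two illustrative values of d′:

```python
from statistics import NormalDist

std = NormalDist()

def roc_point(p_fa, d):
    # P_D on the ROC curve of the Gaussian mean-shift problem:
    # P_D = Q(Q^{-1}(P_FA) - d'), with Q the right-tail probability
    gamma = std.inv_cdf(1.0 - p_fa)    # Q^{-1}(P_FA)
    return 1.0 - std.cdf(gamma - d)

# A larger d' gives a uniformly better curve
curve_d1 = [roc_point(p, 1.0) for p in (0.01, 0.1, 0.5)]
curve_d2 = [roc_point(p, 2.0) for p in (0.01, 0.1, 0.5)]
```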
Terminology
                          Decision
True state                Noise only (H_0)          Signal + Noise (H_1)
H_0 (Noise only)          Correct Rejection (CR)    False Alarm (FA)
                          True Negative (TN)        Type I Error / False Positive (FP)
H_1 (Signal + Noise)      Missed Detection (M)      Correct Detection (D)
                          Type II Error /           True Positive (TP)
                          False Negative (FN)
Terminology ...
From wikipedia.org

- Sensitivity or true positive rate (TPR): TPR = TP/P = TP/(TP + FN)
- Specificity (SPC) or true negative rate: SPC = TN/N = TN/(FP + TN)
- Precision or positive predictive value (PPV): PPV = TP/(TP + FP)
- Negative predictive value (NPV): NPV = TN/(TN + FN)
- Fall-out or false positive rate (FPR): FPR = FP/N = FP/(FP + TN) = 1 − SPC
- False negative rate (FNR): FNR = FN/(TP + FN) = 1 − TPR
- False discovery rate (FDR): FDR = FP/(TP + FP) = 1 − PPV
- Accuracy (ACC): ACC = (TP + TN)/(P + N)
Terminology (From wikipedia.org)

Note: the table orientation is flipped compared to the previous slide.

Sensitivity

- Consider the example of a medical test used to identify a disease.
- The sensitivity of the test is the proportion of people known to have the disease who test positive for it:

      Sensitivity = no. of true positives / (no. of true positives + no. of false negatives)
                  = no. of true positives / total no. of sick individuals in the population
                  = prob. of a positive test, given that the patient is ill

- A negative result in a test with high sensitivity is useful for ruling out disease.
- https://en.wikipedia.org/wiki/Sensitivity_and_specificity
Specificity

- Consider the example of a medical test for diagnosing a disease.
- The specificity of a test is the proportion of healthy patients known not to have the disease who will test negative for it:

      Specificity = no. of true negatives / (no. of true negatives + no. of false positives)
                  = no. of true negatives / total no. of well individuals in the population
                  = prob. of a negative test, given that the patient is well

- A positive result in a test with high specificity is useful for ruling in disease.
- https://en.wikipedia.org/wiki/Sensitivity_and_specificity
Sensitivity and Specificity
From faktisk.no

https://www.faktisk.no/artikler/r8q/er-14-av-15-positive-koronaprover-falske
Worked example

- Norwegian Directorate of Health:
  - Evaluation of Abbott's Panbio COVID-19 rapid antigen test in Norway, 3 December 2020.
  - https://www.helsedirektoratet.no/rapporter/evaluation-of-abbots-panbio-covid-19-rapid-antigen-test-in-norway/
- From a total of 3991 cases, 250 were positive.
- The rapid covid test found
  - 186 out of the 250 positive,
  - 3738 out of the 3741 negative.
- This gives the following outcomes:

  True state      Decision: negative    Decision: positive
  Negative        CR = TN = 3738        FA = FP = 3
  Positive        M = FN = 64           D = TP = 186

- This gives
  - Sensitivity, TP / (TP + FN): 74.4 %
  - Specificity, TN / (FP + TN): 99.9 %
- If we test 100 000 healthy people, about 100 of these would be found positive.
- If we test 1000 people with a prevalence of 4 ‰, we will miss about one of the positives and get about one false positive among the negatives.
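The slide's numbers can be reproduced directly from the confusion-matrix counts:

```python
# Confusion-matrix counts from the Panbio evaluation
TP, FN = 186, 64      # of 250 truly positive
TN, FP = 3738, 3      # of 3741 truly negative

sensitivity = TP / (TP + FN)                  # ~ 0.744
specificity = TN / (TN + FP)                  # ~ 0.9992
ppv = TP / (TP + FP)                          # positive predictive value
npv = TN / (TN + FN)                          # negative predictive value
accuracy = (TP + TN) / (TP + TN + FP + FN)
```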
