
IN5340/IN9340

Detection theory
Lecture IV, Version 1.1

Andreas Austeng

Dept. of Informatics, University of Oslo

March 2022
Detection versus estimation

Detection:
The ability to discern between information-bearing patterns (signals) and random
patterns that distract from the information (noise).
Used for (automated) scanning of (large) data sets for the purpose of detecting,
localizing, and identifying particular targets of interest.

Estimation:
Extension of the detection part. It answers the question as to how much of a
particular item of interest there is (i.e., a fine localization in the parameter space)
and provides a measure of the accuracy or confidence.
Estimation

Some typical estimation examples are


- Frequency estimation (i.e. determine the actual Doppler shift).
- Differential time delay estimation (i.e. determine the distance or differential distance to a target).
- Amplitude estimation (i.e. target strength, size, distance).
- Phase and/or bandwidth estimation (i.e. target identification).
- Spectral estimation (i.e. target identification).
- Analog waveform estimation (i.e. noise-free recovery of the original waveform using optimal filtering).
Detection

Some typical detection examples are


- Radar: passive, active, or bi-static (i.e. it uses an antenna and tests for target presence).
- Sonar: bi-static, active, or passive (i.e. it uses hydrophones and tests for target presence).
- Digital communication: coded binary words (symbols) (i.e. one wants to detect transmitted, possibly encoded, message bits consisting of binary zeros and ones).
- Acoustic detection (i.e. it uses microphones): for intrusion alarms, emitter detection (i.e. gunnery, helicopter sound detection/localization, etc.).
- Seismic detection (i.e. it uses geophones): to detect earthquakes, tunnel digging, nuclear testing, etc.
Decision model
The spaces involved ...

Source: Detection theory : applications and digital signal processing / Ralph D. Hippenstiel; Figure 1.1.
Decision model (cont.)
E: The event space

Examples of events can be


- Messages: alphabetical, numeric, Morse code, etc.
- Parameters: frequency, phase angle, object moving or not, object speed, etc.
- Targets: target present/target absent, friend/foe, etc.

The message may be in a discrete or continuous form, and may be a random variable, a random vector, or a realization of a random process.

Example
- In the event space E, one of the I possible events e_i, i = 0, ..., I − 1, can happen.
- We may or may not know the a priori probability Pr{e_i} of the event e_i.
Decision model (cont.)
S: The signal space

The events are converted into representative signals.


Example (Event encoding, Ex 1.8)
Events:        Signals:
A = e_0   →   s_0(t) : signal representing event A
B = e_1   →   s_1(t) : signal representing event B
C = e_2   →   s_2(t) : signal representing event C
  ...          ...     for 0 ≤ t ≤ T

Example (Intrusion alarm, Ex 1.12)
Events:                     Signals:
e_0 (no intruder)       →   s_0(t) = A cos(ω_0 t + θ)
e_1 (intruder present)  →   s_1(t) = A cos([ω_0 + ω_D(t)] t + θ)
where ω_D(t) is a time-dependent frequency shift.
Decision model (cont.)
R: The observer space

The signals are observed (i.e. obtained) after having passed through some channel (i.e. the medium).
In general, the received data is the transmitted signal plus an additive noise component:
    r(t) = s(t) + n(t)

Example (Target detection, Ex 1.13)
Events:                   Received data:
e_0 (no target)       →   r_0(t) = n(t),                  for 0 ≤ t ≤ T.
e_1 (target present)  →   r_1(t) = A s(t − t_R) + n(t),   for t_R ≤ t ≤ T + t_R,
where A is the signal loss coefficient and t_R the propagation delay. (No Doppler shift assumed.)
Decision model (cont.)
D: The decision space

- Usually: dim(D) = dim(E)
- We will focus on: how to obtain a mapping from R to D.
- The mapping must be such that:
  - An unambiguous decision is made.
  - Every outcome must lead to some decision.

Example (Multiple decision, Ex 1.16)
d_i: choose h_i, the hypothesis that the event e_i occurred, meaning that the observation lies in the space (region) R_i, for i = 0, 1, 2, 3.
The Gaussian (normal) density function

If x ~ N(m_x, σ²), then f_X(x) = (1/(√(2π) σ)) e^(−(x − m_x)²/(2σ²))

- The pdf is symmetric about the mean m_x.
- The most likely value of x equals the mean.
- The pdf is completely described by the mean and the variance.
- For Gaussian x: independent ⇔ uncorrelated.
- Gaussianity is closed under scalar multiplication and linear transformations.
  This means that filtering, convolution, integration, differentiation, and all common transforms (Hilbert, Fourier, Laplace, Z-transform) retain the Gaussian character of the random variables.
- The central limit theorem.
- The model fits many physical phenomena.
Probability

[Figure: noise-only pdf f_n(x) with the interval (x_1, x_2) marked.]

- The probability that a single random value falls between x_1 and x_2 (given the probability density function f_X(x)):
      P(x_1 < x < x_2) = ∫_{x_1}^{x_2} f_X(x) dx
- For any pdf: ∫_{−∞}^{∞} f_X(x) dx = 1.
- And if x ~ N(m_x, σ²):
  - P(m_x − σ < x < m_x + σ) = 0.6827,  P(m_x − 2σ < x < m_x + 2σ) = 0.9545
  - P(m_x − 3σ < x < m_x + 3σ) = 0.9973
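The three k-sigma probabilities above can be checked numerically; a minimal Python sketch using the standard-library error function (by standardizing, P(m_x − kσ < x < m_x + kσ) = erf(k/√2) regardless of m_x and σ):

```python
from math import erf, sqrt

def prob_within(k):
    # P(m_x - k*sigma < x < m_x + k*sigma) for x ~ N(m_x, sigma^2),
    # which equals erf(k / sqrt(2)) after standardizing
    return erf(k / sqrt(2))

p1, p2, p3 = prob_within(1), prob_within(2), prob_within(3)
# p1 ~ 0.6827, p2 ~ 0.9545, p3 ~ 0.9973
```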
Right tail probability

- Given the standard normal distribution (zero mean, unit variance):
      f_X(x) = (1/√(2π)) e^(−x²/2)
- The cumulative distribution is:
      Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^(−t²/2) dt
- The probability of exceeding a given value, Q(x), is given by the complementary cumulative distribution:
      Q(x) = (1/√(2π)) ∫_{x}^{∞} e^(−t²/2) dt = 1 − Φ(x) = P_FA
- Q is invertible (monotonically decreasing):
      γ = Q^{−1}(P_FA)
- In MATLAB:
      p = normspec([-Inf,1],0,1,'outside')
  [Figure: standard normal density with the tail above the critical value 1 shaded; "Probability Greater than Upper Bound is 0.15866".]
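The slide uses MATLAB's normspec; a Python equivalent of Q and Q⁻¹ using only the standard library might look as follows (Q(1) ≈ 0.15866 matches the figure):

```python
from statistics import NormalDist

std = NormalDist()  # zero mean, unit variance

def Q(x):
    # Right-tail probability: Q(x) = 1 - Phi(x)
    return 1.0 - std.cdf(x)

def Q_inv(p_fa):
    # Threshold gamma whose right-tail probability equals p_fa
    return std.inv_cdf(1.0 - p_fa)

p = Q(1.0)        # ~ 0.15866, as in the MATLAB normspec example
gamma = Q_inv(p)  # recovers the critical value 1.0
```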
Set operations in probability theory

- Union: A ∪ B
- Intersection: A ∩ B
- P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
- Conditional probabilities:
      P(A | B) = P(A ∩ B) / P(B)
      P(B | A) = P(A ∩ B) / P(A)

Bayes' rule:
- From P(B | A) = P(A ∩ B) / P(A)  ⇒  P(A ∩ B) = P(B | A) P(A)
- Then:
      P(A | B) = P(A ∩ B) / P(B) = P(B | A) P(A) / P(B)
A basic signal detection problem

In a basic problem, we would have:


- A single known value of the signal, θ.
- Random zero-mean, normally distributed additive noise n.
- A single observable measurement x.
- To decide either H_0 or H_1.
A basic signal detection problem
The noise model

[Figure: noise-only pdf f_n(x).]

- f_n(x): probability density function of the noise.
- The noise probability density can be arbitrary.
- A common one is normally distributed with zero mean (σ = standard deviation):
      f_n(x) = (1/(√(2π) σ)) e^(−x²/(2σ²))
A basic signal detection problem
Signal + Noise

[Figure: noise-only pdf f_n(x) and signal-plus-noise pdf f_s(x).]

- f_s(x): probability density function of signal + noise.
- Could be arbitrary, but often normally distributed with mean θ (σ = standard deviation), i.e.
      f_s(x) = (1/(√(2π) σ)) e^(−(x − θ)²/(2σ²))
- We define the signal-to-noise ratio (SNR) as d′ = θ/σ,
  i.e. as the separation between noise-only and signal+noise measured in standard deviations.
- If σ = 1, then d′ = θ.
A basic signal detection problem
Thresholded decision

[Figure: f_n(x) and f_s(x) with the decision threshold x_c marked between them.]

Decision criterion:
- If x > x_c, decide H_1 (signal + noise).
- Otherwise, decide H_0 (noise only).
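A Monte Carlo sketch of this decision rule, comparing the empirical rates with the analytic tail probabilities (the values of θ, σ, and x_c are illustrative, not from the slides):

```python
import random
from statistics import NormalDist

random.seed(0)
theta, sigma, x_c = 2.0, 1.0, 1.0   # illustrative values
N = 200_000

noise = [random.gauss(0.0, sigma) for _ in range(N)]      # H0 data
signal = [random.gauss(theta, sigma) for _ in range(N)]   # H1 data

# Decide H1 whenever the observation exceeds x_c
p_fa_hat = sum(x > x_c for x in noise) / N
p_d_hat = sum(x > x_c for x in signal) / N

# Analytic values: right tails of the two densities
std = NormalDist()
p_fa = 1.0 - std.cdf((x_c - 0.0) / sigma)
p_d = 1.0 - std.cdf((x_c - theta) / sigma)
```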
A basic signal detection problem
Decision bias, β

[Figure: f_n(x) and f_s(x) with the threshold x_c; a low threshold gives a small β, a high threshold a large β.]

Decision bias:
- We define the decision bias, β, as
      β = f_s(x_c) / f_n(x_c),
- i.e. as the likelihood of observing the threshold value x_c when the signal is present, relative to when the signal is not present.
Decision outcomes

                          Decision
True state                Noise only (H_0)          Signal + Noise (H_1)
H_0 (Noise only)          Correct Rejection (CR)    False Alarm (FA)
                          True Negative (TN)        Type I Error / False Positive (FP)
H_1 (Signal + Noise)      Missed Detection (M)      Correct Detection (D)
                          Type II Error /           True Positive (TP)
                          False Negative (FN)
Decision outcomes
Noise only

[Figure: f_n(x) with the area CR (below x_c) and FA (above x_c) marked.]

- If truly noise only, then
      P(FA|N) = ∫_{x_c}^{∞} f_n(x) dx    and    P(CR|N) = ∫_{−∞}^{x_c} f_n(x) dx
- ... and we want x_c as large as possible.


Decision outcomes
Signal + Noise

[Figure: f_s(x) with the area M (below x_c) and D (above x_c) marked.]

- If the signal is truly present, then
      P(D|S) = ∫_{x_c}^{∞} f_s(x) dx    and    P(M|S) = ∫_{−∞}^{x_c} f_s(x) dx
- ... and we want x_c as small as possible.
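The four outcome probabilities for the Gaussian model can be computed directly with the standard library (θ, σ, and x_c are illustrative choices):

```python
from statistics import NormalDist

theta, sigma, x_c = 1.5, 1.0, 0.75   # illustrative values
fn = NormalDist(0.0, sigma)     # noise only
fs = NormalDist(theta, sigma)   # signal + noise

P_FA = 1.0 - fn.cdf(x_c)   # integral of f_n from x_c to infinity
P_CR = fn.cdf(x_c)         # integral of f_n up to x_c
P_D  = 1.0 - fs.cdf(x_c)   # integral of f_s from x_c to infinity
P_M  = fs.cdf(x_c)         # integral of f_s up to x_c
```

Note how the pairs sum to one under each hypothesis, so fixing x_c fixes all four probabilities at once.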


"a priori" and "a posteriori" probability
( ... all these probabilities! )

- Suppose two possible events: binary "one" and "zero".
- The probability of event i (i.e. hypothesis H_i) given observation y is described by the a posteriori probability Pr{H_i | y}.
- The a priori probability of event i (i.e. the probability of hypothesis H_i) is given by Pr{H_i}.
- I.e. a priori is something we know (or can know) beforehand, while a posteriori is something we have to experience.
"a priori" and "a posteriori" probability
( ... all these probabilities! )

Example (binary channel):
Hypothesis (transmit)     Transition probabilities              Observation (receive)
                          Pr{observed data | data transmitted} = Pr{y | H}
Pr{0T} = P_0 = 0.7  "0"   Pr{0R|0T} = 0.8,  Pr{1R|0T} = 0.2    "0"  Pr{0R}
Pr{1T} = P_1 = 0.3  "1"   Pr{0R|1T} = 0.2,  Pr{1R|1T} = 0.8    "1"  Pr{1R}

- The quantity of interest is the probability Pr{"(actual) symbol transmitted" | "observed symbol"}.
- Find this by using Bayes' rule: Pr{H_i | y} = (Pr{y | H_i} / Pr{y}) Pr{H_i}
"a priori" and "a posteriori" probability
( ... all these probabilities! )

Example (Binary transmission)

- Pr{0R} = Pr{0R|1T} Pr{1T} + Pr{0R|0T} Pr{0T} = 0.2 × 0.3 + 0.8 × 0.7 = 0.62
- Pr{1R} = Pr{1R|0T} Pr{0T} + Pr{1R|1T} Pr{1T} = 0.2 × 0.7 + 0.8 × 0.3 = 0.38
- Pr{1T|1R} = Pr{1R|1T} Pr{1T} / Pr{1R} = 0.8 × 0.3 / 0.38 = 0.632
- Pr{0T|0R} = Pr{0R|0T} Pr{0T} / Pr{0R} = 0.8 × 0.7 / 0.62 = 0.903
- And
  - Pr{1T, 1R} = Pr{1T|1R} Pr{1R} = 0.632 × 0.38 = 0.24
  - Pr{0T, 0R} = Pr{0T|0R} Pr{0R} = 0.903 × 0.62 = 0.56
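The example's numbers are easy to verify in a few lines of Python:

```python
# Priors and transition probabilities from the binary-channel example
P0T, P1T = 0.7, 0.3
P0R_0T, P1R_0T = 0.8, 0.2
P0R_1T, P1R_1T = 0.2, 0.8

# Total probability of each received symbol
P0R = P0R_1T * P1T + P0R_0T * P0T   # 0.62
P1R = P1R_0T * P0T + P1R_1T * P1T   # 0.38

# A posteriori probabilities via Bayes' rule
P1T_1R = P1R_1T * P1T / P1R         # ~ 0.632
P0T_0R = P0R_0T * P0T / P0R         # ~ 0.903

# Joint probabilities
P_1T_and_1R = P1T_1R * P1R          # 0.24
P_0T_and_0R = P0T_0R * P0R          # 0.56
```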
Bayes’ detection
Requirements
Given:
- The observations: y_n = s_n + n_n for n = 1, ..., N, i.e. y = (y_1, y_2, ..., y_N)^T
Requirements:
- The likelihood function f_{y|H_i}(y | H_i) = f_i(y)
  = the density of y under the i'th hypothesis, for i = 0, 1.
- The prior probability Pr{H_i}, denoted P_i.
- Payoff matrix:
  the cost associated with choosing hypothesis i when hypothesis j is true: C_ij.
Bayes’ detection
Details

- Possible outcomes:
  (a) Choose H_0; H_0 is true (correct)
  (b) Choose H_1; H_0 is true (mistake)
  (c) Choose H_1; H_1 is true (correct)
  (d) Choose H_0; H_1 is true (mistake)
- The costs for the problem (C_ij: choosing i when j is true):
  (a) C_00 (choose 0, 0 is true)
  (b) C_10 (choose 1, 0 is true)
  (c) C_11 (choose 1, 1 is true)
  (d) C_01 (choose 0, 1 is true)
- The average cost (risk or penalty function):
      C = C_00 Pr{choose 0, 0 is true}
        + C_10 Pr{choose 1, 0 is true}
        + C_11 Pr{choose 1, 1 is true}
        + C_01 Pr{choose 0, 1 is true}
- Using Bayes' rule, this becomes
      C = C_00 P_0 Pr{choose 0 | 0 is true}
        + C_10 P_0 Pr{choose 1 | 0 is true}
        + C_11 P_1 Pr{choose 1 | 1 is true}
        + C_01 P_1 Pr{choose 0 | 1 is true}
- I.e.
      C = C_00 P_0 (1 − P_FA) + C_10 P_0 P_FA + C_11 P_1 P_D + C_01 P_1 P_M
Bayes’ detection
Details (continuing)

- The average cost
      C = C_00 P_0 (1 − P_FA) + C_10 P_0 P_FA + C_11 P_1 P_D + C_01 P_1 P_M
  can be expressed as
      C = C_00 P_0 ∫_{R_0} f_0(y) dy + C_10 P_0 ∫_{R_1} f_0(y) dy
        + C_11 P_1 ∫_{R_1} f_1(y) dy + C_01 P_1 ∫_{R_0} f_1(y) dy.
- By using
      R = R_1 ∪ R_0,  where R_1 ∩ R_0 = ∅,  and  ∫_R f_i(y) dy = 1 for i = 0, 1,
  it can be expressed as
      C = C_00 P_0 ∫_{R_0} f_0(y) dy + C_10 P_0 ∫_{R−R_0} f_0(y) dy
        + C_11 P_1 ∫_{R−R_0} f_1(y) dy + C_01 P_1 ∫_{R_0} f_1(y) dy.
Bayes’ detection
Details (continuing ...)

- The average cost reduces to:
      C = C_00 P_0 ∫_{R_0} f_0(y) dy
        + C_10 P_0 [1 − ∫_{R_0} f_0(y) dy]
        + C_11 P_1 [1 − ∫_{R_0} f_1(y) dy]
        + C_01 P_1 ∫_{R_0} f_1(y) dy.
- This can be rewritten as:
      C = C_10 P_0 + C_11 P_1                                                   (fixed cost)
        + ∫_{R_0} [ P_1 (C_01 − C_11) f_1(y) − P_0 (C_10 − C_00) f_0(y) ] dy    (variable cost to be optimized)
- I.e. the optimal value depends on R_0.
- Assuming the cost of a bad decision is always larger than the cost of a good one, i.e.
      C_10 − C_00 > 0  and  C_01 − C_11 > 0,
  both coefficients in the variable-cost integrand are positive.
- Optimal choice of R_0: assign y to R_0 when the negative term in the integral is dominant!
- Conversely: assign y to R_1 when the first term in the integral is dominant; then no additional cost is added to the fixed cost.
Bayes’ detection
- The decision criterion becomes:
      P_1 [C_01 − C_11] f_1(y)  ≷  P_0 [C_10 − C_00] f_0(y)   (H_1 if >, H_0 if <)
- The Bayes formulation:
      Λ(y) = f_1(y)/f_0(y)  ≷  (P_0 [C_10 − C_00]) / (P_1 [C_01 − C_11]) = λ   (H_1 if >, H_0 if <)
- Λ(y) = f_1(y)/f_0(y) is known as the likelihood ratio (LR).
- Interpretation:
  - If Λ(y) > λ, decide on H_1.
  - If Λ(y) < λ, decide on H_0.
- Note: both sides are non-negative, so the inequality is preserved when taking the logarithm of both sides (or applying any other monotonically increasing function):
      ln Λ(y)  ≷  ln λ   (H_1 if >, H_0 if <)
- ln Λ(y) is known as the log-likelihood ratio (LLR).
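A sketch of the Bayes likelihood-ratio test for a Gaussian mean-shift problem (the priors, costs, θ, and σ are illustrative assumptions; the closed-form threshold y_c follows from taking logarithms of the LR test):

```python
from math import log
from statistics import NormalDist

# Illustrative problem and costs (not from the slides)
theta, sigma = 1.0, 1.0
P0, P1 = 0.6, 0.4
C00, C11 = 0.0, 0.0
C10, C01 = 1.0, 2.0   # false alarm costs 1, miss costs 2

f0 = NormalDist(0.0, sigma)
f1 = NormalDist(theta, sigma)

# Bayes threshold on the likelihood ratio
lam = (P0 * (C10 - C00)) / (P1 * (C01 - C11))

def decide(y):
    # Return 1 (H1) if the likelihood ratio exceeds lambda, else 0 (H0)
    Lambda = f1.pdf(y) / f0.pdf(y)
    return 1 if Lambda > lam else 0

# For Gaussians, ln Lambda(y) = (theta/sigma^2) y - theta^2/(2 sigma^2),
# so the LR test is equivalent to a single threshold on y:
y_c = (sigma**2 / theta) * log(lam) + theta / 2.0
```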
Maximum a posteriori (MAP) detection
- We have prior probabilities but either cannot, or do not want to, assign costs to the possible decisions: the maximum a posteriori (MAP) detection philosophy.
- Start with the dominant a posteriori probability:
      Pr(H_1 | y)  ≷  Pr(H_0 | y)   (H_1 if >, H_0 if <)
- I.e. given y, which of H_0 and H_1 is more likely true?
- Rewrite in incremental form:
      Pr(H_1 | y < Y ≤ y + dy)  ≷  Pr(H_0 | y < Y ≤ y + dy)
- Bayes' theorem gives
      Pr(H_i | y < Y ≤ y + dy) = [Pr(y < Y ≤ y + dy | H_i) / Pr(y < Y ≤ y + dy)] Pr(H_i),  i = 0, 1.
- Using (when dy becomes small)
      Pr(y < Y ≤ y + dy | H_i) = f_i(y) dy  and  Pr(y < Y ≤ y + dy) = f(y) dy,
- we get
      Pr(H_i | y) = [f_i(y) / f(y)] Pr(H_i),  i = 0, 1,
- and the MAP detector
      f_1(y)/f_0(y)  ≷  Pr(H_0)/Pr(H_1) = P_0/P_1   (H_1 if >, H_0 if <)
Maximum likelihood (ML) criterion

- If neither prior probabilities nor any cost information is available, then comparing the likelihood functions still gives a meaningful detection scheme.
- That is: "Given the data, which conditional pdf (i.e. f_0(y) or f_1(y)) is more likely to have generated the data?"
- In the scalar case, the detection scheme becomes
      f_1(y)  ≷  f_0(y)   (H_1 if >, H_0 if <)
- and for multiple observations
      f_1(y)/f_0(y)  ≷  1   (H_1 if >, H_0 if <)
Minimum probability of error criterion

- The criterion is used in binary communication problems where the costs of the two error types are the same and the cost of a correct decision is zero.
- We do allow non-equal prior probabilities (depending on the source coding).
- We set up the average cost in the same way as in the Bayes case.
- The average cost becomes (C_00 = C_11 = 0, C_10 = C_01 = 1):
      C = P_0 ∫_{R_1} f_0(y) dy + P_1 ∫_{R_0} f_1(y) dy
- The Bayes approach then gives
      Λ(y) = f_1(y)/f_0(y)  ≷  P_0/P_1   (H_1 if >, H_0 if <)
- This is the same expression (including the threshold) as that of the MAP detector!
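For the Gaussian mean-shift problem, the minimum-probability-of-error (equivalently MAP) test reduces to a single threshold on y; a sketch with illustrative parameters:

```python
from math import log
from statistics import NormalDist

theta, sigma = 2.0, 1.0   # illustrative values
P0, P1 = 0.7, 0.3

f0 = NormalDist(0.0, sigma)
f1 = NormalDist(theta, sigma)

# Taking logs of Lambda(y) = f1/f0 vs P0/P1 gives a single threshold on y:
y_c = theta / 2.0 + (sigma**2 / theta) * log(P0 / P1)

# Resulting error probability: P_e = P0 * P_FA + P1 * P_M
P_FA = 1.0 - f0.cdf(y_c)
P_M = f1.cdf(y_c)
P_e = P0 * P_FA + P1 * P_M
```

At y_c the likelihood ratio equals P_0/P_1, as the MAP rule requires.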
Min-Max criterion
- The cost information (i.e. C_ij, i, j = 0, 1) is available, but not the a priori probabilities P_0 or P_1.
- The best Bayes detection scheme under the worst possible choice of P_i, i = 0, 1 (Murphy's law), is then desirable.
- The average Bayes cost (using P_FA and P_M):
      C = P_0 C_00 + P_1 C_11 + P_1 (C_01 − C_11) P_M + P_0 (C_10 − C_00) P_FA
- When P_0 = 1: P_1 = 0, P_FA = 0, and C = C_00.
- When P_1 = 1: P_0 = 0, P_M = 0, and C = C_11.
- If we assume C_00 = C_11 = 0 (as in most real-life situations), we can plot C as a function of P_1 and either get a straight line or a curve as in the figure.
- In the figure, the maximum occurs where the slope of the curve is zero.
- To find the maximum, set the partial derivative of C with respect to P_1 to zero.
Min-Max criterion ...
- Given the maximum average cost, we minimize it; hence the name min-max criterion:
      ∂C/∂P_1 = ∂/∂P_1 { (1 − P_1) C_00 + P_1 C_11
                         + P_1 (C_01 − C_11) P_M + (1 − P_1)(C_10 − C_00) P_FA }
- Setting it to zero gives:
      (C_11 − C_00) + (C_01 − C_11) P_M − (C_10 − C_00) P_FA = 0
- If C_00 = C_11 = 0, we get
      C_01 P_M = C_10 P_FA
- The detector form using the Bayes cost formula is the Bayes detector with a threshold λ that makes the above equation hold.
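The min-max threshold can be found numerically by solving C_01 P_M = C_10 P_FA; a bisection sketch for an illustrative Gaussian problem (with C_00 = C_11 = 0; all parameter values are assumptions, not from the slides):

```python
from statistics import NormalDist

theta, sigma = 2.0, 1.0   # illustrative values
C10, C01 = 1.0, 2.0       # C00 = C11 = 0

f0 = NormalDist(0.0, sigma)
f1 = NormalDist(theta, sigma)

def g(x_c):
    # Min-max equation C01 * P_M - C10 * P_FA; monotone increasing in x_c
    P_FA = 1.0 - f0.cdf(x_c)
    P_M = f1.cdf(x_c)
    return C01 * P_M - C10 * P_FA

# Bisection for the root of g (the min-max threshold)
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if g(mid) > 0:
        hi = mid
    else:
        lo = mid
x_c = 0.5 * (lo + hi)
```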
The Neyman-Pearson criterion
- One of the first in a class called CFAR (constant false alarm rate) detectors.
- Typically used in radar and sonar applications.
- Allows a constant false alarm rate.
- Attractive when loss of life or essential equipment is involved.
- Desirable when resource allocation is important and the maximum false alarm rate has to be kept at a tolerable level.
- In essence, the false alarm rate is fixed while the detection probability is maximized.
- Approach:
  - Fix the false alarm rate P_FA at a value α (i.e. force (P_FA − α) to zero) while maximizing P_D (or minimizing P_M, since P_M = 1 − P_D).
  - Easily solved with Lagrange multipliers.
- Cost function:
      J = ∫_{R_0} f_1(y) dy  +  λ [ ∫_{R_1} f_0(y) dy − α ]
          (minimize)            (equality constraint)
        = ∫_{R_0} f_1(y) dy + λ [ 1 − ∫_{R_0} f_0(y) dy − α ]
        = λ(1 − α) + ∫_{R_0} [ f_1(y) − λ f_0(y) ] dy
The Neyman-Pearson criterion ...
- Solving the cost function:
      J = λ(1 − α) + ∫_{R_0} [ f_1(y) − λ f_0(y) ] dy
- For fixed λ, the first term is constant (> 0).
- We need only consider
      ∫_{R_0} [ f_1(y) − λ f_0(y) ] dy
- Same as for the Bayes detector. We get
      f_1(y)  ≷  λ f_0(y)   (H_1 if >, H_0 if <)
- or
      Λ(y) = f_1(y)/f_0(y)  ≷  λ   (H_1 if >, H_0 if <)
- By solving the above equation, we get a test statistic z compared against a threshold γ:
      z  ≷  γ   (H_1 if >, H_0 if <)
- The false alarm rate is then given by
      P_FA = ∫_{γ}^{∞} f_0(z) dz = α
- The threshold γ is chosen so that the P_FA constraint is met (i.e. P_FA = α).
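For the Gaussian mean-shift problem the Neyman-Pearson recipe is simply γ = Q⁻¹(α); a sketch with illustrative values of α and d′:

```python
from statistics import NormalDist

alpha = 0.01   # target false alarm rate (illustrative)
d = 3.0        # SNR d' = theta/sigma (illustrative)

std = NormalDist()
gamma = std.inv_cdf(1.0 - alpha)   # gamma = Q^{-1}(alpha)

# With the normalized observation as test statistic z:
P_FA = 1.0 - std.cdf(gamma)        # equals alpha by construction
P_D = 1.0 - std.cdf(gamma - d)     # detection probability at this threshold
```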
Receiver operating characteristic (ROC)
Detector performance evaluation

- Usually, we want to know
  - the probability of detection, P_D,
  - (the probability of miss, P_M, but P_D + P_M = 1),
  - the probability of false alarm, P_FA.
- By plotting P_D against P_FA we can compare the performance of different detectors.
- For each unique threshold, a unique pair of P_D and P_FA exists!
Receiver operating characteristic (ROC) (continuing)

[Figure: ROC curve, P(D|S) versus P(FA|N). A small x_c (small β) gives the upper right end of the curve; a large x_c (large β) gives the lower left end. The operating point moves along the curve as β increases.]

- Generally: β = λ = ∂P_D/∂P_FA, i.e. the slope of the ROC curve is the threshold of the detector.
- Also, from the Bayes detector: β = f_s(x_c)/f_n(x_c).
Receiver operating characteristic (ROC) (continuing ...)

[Figure: family of ROC curves, P(D|S) versus P(FA|N), moving toward the upper left corner with increasing d′.]

- Signal-to-noise ratio (SNR): d′ = θ/σ.
- A given value of d′ traces out a unique ROC curve as β varies.
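For the Gaussian case the ROC curve has the closed form P_D = Q(Q⁻¹(P_FA) − d′); a sketch comparing two illustrative values of d′:

```python
from statistics import NormalDist

std = NormalDist()

def roc_point(p_fa, d):
    # P_D on the ROC curve of the Gaussian mean-shift problem:
    # P_D = Q(Q^{-1}(P_FA) - d'), with Q the right-tail probability
    gamma = std.inv_cdf(1.0 - p_fa)    # Q^{-1}(P_FA)
    return 1.0 - std.cdf(gamma - d)

# A larger d' gives a uniformly better curve
curve_d1 = [roc_point(p, 1.0) for p in (0.01, 0.1, 0.5)]
curve_d2 = [roc_point(p, 2.0) for p in (0.01, 0.1, 0.5)]
```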
Terminology
                          Decision
True state                Noise only (H_0)          Signal + Noise (H_1)
H_0 (Noise only)          Correct Rejection (CR)    False Alarm (FA)
                          True Negative (TN)        Type I Error / False Positive (FP)
H_1 (Signal + Noise)      Missed Detection (M)      Correct Detection (D)
                          Type II Error /           True Positive (TP)
                          False Negative (FN)
Terminology ...
From wikipedia.org

- Sensitivity or true positive rate (TPR): TPR = TP/P = TP/(TP + FN)
- Specificity (SPC) or true negative rate: SPC = TN/N = TN/(FP + TN)
- Precision or positive predictive value (PPV): PPV = TP/(TP + FP)
- Negative predictive value (NPV): NPV = TN/(TN + FN)
- Fall-out or false positive rate (FPR): FPR = FP/N = FP/(FP + TN) = 1 − SPC
- False negative rate (FNR): FNR = FN/(TP + FN) = 1 − TPR
- False discovery rate (FDR): FDR = FP/(TP + FP) = 1 − PPV
- Accuracy (ACC): ACC = (TP + TN)/(P + N)
Terminology (From wikipedia.org)

Note: the table orientation is flipped compared to the previous slide.

Sensitivity

- Consider the example of a medical test used to identify a disease.
- The sensitivity of the test is the proportion of people known to have the disease who test positive for it:

      Sensitivity = no. of true positives / (no. of true positives + no. of false negatives)
                  = no. of true positives / total no. of sick individuals in the population
                  = prob. of a positive test, given that the patient is ill

- A negative result in a test with high sensitivity is useful for ruling out disease.
- https://en.wikipedia.org/wiki/Sensitivity_and_specificity
Specificity

- Consider the example of a medical test for diagnosing a disease.
- The specificity of a test is the proportion of healthy patients known not to have the disease who will test negative for it:

      Specificity = no. of true negatives / (no. of true negatives + no. of false positives)
                  = no. of true negatives / total no. of well individuals in the population
                  = prob. of a negative test, given that the patient is well

- A positive result in a test with high specificity is useful for ruling in disease.
- https://en.wikipedia.org/wiki/Sensitivity_and_specificity
Sensitivity and Specificity
From faktisk.no

https://www.faktisk.no/artikler/r8q/er-14-av-15-positive-koronaprover-falske
Worked example

- Norwegian Directorate of Health:
  - Evaluation of Abbott's Panbio COVID-19 rapid antigen test in Norway, 3 December 2020.
  - https://www.helsedirektoratet.no/rapporter/evaluation-of-abbots-panbio-covid-19-rapid-antigen-test-in-norway/
- From a total of 3991 cases, 250 were positive.
- The rapid covid test found
  - 186 out of the 250 positive,
  - 3738 out of the 3741 negative.
- This gives the following outcomes:

  True state      Decision: negative    Decision: positive
  Negative        CR = TN = 3738        FA = FP = 3
  Positive        M = FN = 64           D = TP = 186

- This gives
  - Sensitivity, TP / (TP + FN): 74.4 %
  - Specificity, TN / (FP + TN): 99.9 %
- If we test 100 000 healthy people, about 100 of these would be found positive.
- If we test 1000 people with a prevalence of 4 ‰, we will miss about one of the positives and get about one false positive among the negatives.
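The slide's numbers can be reproduced directly from the confusion-matrix counts:

```python
# Confusion-matrix counts from the Panbio evaluation
TP, FN = 186, 64      # of 250 truly positive
TN, FP = 3738, 3      # of 3741 truly negative

sensitivity = TP / (TP + FN)                  # ~ 0.744
specificity = TN / (TN + FP)                  # ~ 0.9992
ppv = TP / (TP + FP)                          # positive predictive value
npv = TN / (TN + FN)                          # negative predictive value
accuracy = (TP + TN) / (TP + TN + FP + FN)
```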
