Basic concepts
Florian RUPPIN
Université Grenoble Alpes / LPSC
ruppin@lpsc.in2p3.fr
Course content: Benoit Clément
SAMPLE
• Finite size
• Selected through a random process
e.g. the results of a measurement
POPULATION
• Potentially infinite size
e.g. all possible results
Experiment
• Described by:
  - a universe Ω: the set of all possible outcomes;
  - events: subsets A ⊆ Ω (Ā denotes the complement of A);
  - a probability measure P : {Events} → [0, 1], A ↦ P(A).
• Properties:
P(Ω) = 1
P(A or B) = P(A) + P(B) if (A and B) = ∅
• Interpretation: events are represented as subsets of the universe Ω.
• Event "A and B" is associated with the intersection of the subsets A and B.
• Event "A or B" is associated with the union of the subsets A and B:
P(A or B) = P(A) + P(B) − P(A and B)
• Conditional probability: the probability of A knowing that B is realized, i.e. restricting the universe to Ω' = B:
P(A|B) = P(A and B) / P(B)
• The definition of the conditional probability leads to:
P(A and B) = P(A|B) · P(B) = P(B|A) · P(A)
Relation between P(A|B) and P(B|A), the Bayes theorem:
P(A|B) = P(B|A) · P(A) / P(B)
• Two events are independent if the realization of one is not linked in any way to the realization of the other: P(A|B) = P(A) and P(B|A) = P(B)
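A quick numerical illustration of Bayes' theorem (all numbers below are made up for the example): even a selective condition B can correspond to a small P(A|B) when the prior P(A) is small.

```python
# Minimal illustration of Bayes' theorem with made-up numbers:
# a detector flags events ("B") given a true signal ("A").
p_A = 0.01             # prior probability of a true signal, P(A)
p_B_given_A = 0.95     # efficiency, P(B|A)
p_B_given_notA = 0.05  # false-positive rate, P(B|not A)

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(f"P(A|B) = {p_A_given_B:.3f}")  # ~0.161: most flagged events are background
```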
• For a real variable, P(X = x) = 0, so the variable is described by:
Cumulative distribution function: F(x) = P(X < x)
Probability density function: f(x) = \frac{dF}{dx}
By construction:
\int_{-\infty}^{+\infty} f(x)\,dx = P(\Omega) = 1, \qquad F(-\infty) = P(\emptyset) = 0, \qquad F(+\infty) = P(\Omega) = 1
F(a) = \int_{-\infty}^{a} f(x)\,dx, \qquad P(a < X < b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx
Note - Discrete variables can also be described by a probability density function using Dirac distributions:
f(x) = \sum_i p(i)\,\delta(x - i), \quad \text{with} \quad \sum_i p(i) = 1
How to infer the properties of a population from a sample?
• To this end, it is useful to consider a sample as a finite set from which one can randomly draw elements, with equiprobability.
We can then associate to this process a probability density, the empirical density or sample density:
f_{\text{sample}}(x) = \frac{1}{n} \sum_i \delta(x - x_i)
This density will be useful to translate properties of distributions to a finite sample.
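A minimal sketch of this idea in practice (toy sample assumed): under the empirical density, the expectation of any g(X) reduces to the sample average of g(x_i).

```python
import numpy as np

# Under f_sample(x) = (1/n) * sum_i delta(x - x_i), integrals against
# f_sample become plain sample averages.
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=1000)  # assumed toy sample

mean = np.mean(sample)              # integral of x * f_sample(x) dx
second_moment = np.mean(sample**2)  # integral of x^2 * f_sample(x) dx

# The corresponding empirical CDF is the fraction of points below x:
F_at_2 = np.mean(sample < 2.0)
print(mean, second_moment, F_at_2)
```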
• Measure of location: the mean value
• Measure of dispersion: the standard deviation
Uncertainty / Error
Mean value: sum (integral) of all possible values weighted by the probability of occurrence:
\mu = \bar{x} = \int_{-\infty}^{+\infty} x f(x)\,dx \qquad \text{(sample: } \mu = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \text{)}
Standard deviation (\sigma) and variance (v = \sigma^2): mean value of the squared deviation to the mean:
v = \sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\,dx \qquad \text{(sample: } v = \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 \text{)}
Koenig's theorem:
\sigma^2 = \int x^2 f(x)\,dx + \mu^2 \int f(x)\,dx - 2\mu \int x f(x)\,dx = \overline{x^2} - \mu^2 = \overline{x^2} - \bar{x}^2
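Koenig's theorem is easy to check numerically on a toy sample; a minimal sketch (illustrative values only):

```python
import numpy as np

# Numerical check of Koenig's theorem: the variance equals
# the mean of the squares minus the square of the mean.
rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=100_000)

v_direct = np.mean((x - x.mean())**2)   # mean squared deviation to the mean
v_koenig = np.mean(x**2) - x.mean()**2  # Koenig's theorem
print(v_direct, v_koenig)               # identical up to rounding
```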
Poisson distribution
Variable: K
Parameter: \lambda
Law: P(k; \lambda) = \frac{e^{-\lambda} \lambda^k}{k!}
Mean: \mu = \lambda
Variance: \sigma^2 = \lambda
[Figure: example of a Poisson distribution, \lambda = 6.5]
Limit of the binomial law P(k; n, p) = \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k} (mean \mu = np, \sigma = \sqrt{np(1-p)}) for p small and k \ll n, with \lambda = np.
For \lambda \gtrsim 25 (n \gtrsim 30 for the binomial), the distribution tends to a Gaussian of mean \mu = \lambda and \sigma = \sqrt{\lambda}.
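Both limits can be checked numerically; a sketch with assumed toy parameter values, using scipy:

```python
import numpy as np
from scipy import stats

# Binomial -> Poisson for small p (lambda = n*p fixed):
n, p = 1000, 0.005  # small p, lambda = 5
k = np.arange(15)
print(np.max(np.abs(stats.binom.pmf(k, n, p) - stats.poisson.pmf(k, n * p))))

# Poisson -> Gaussian for large lambda:
lam = 25.0
k = np.arange(0, 60)
gauss = stats.norm.pdf(k, loc=lam, scale=np.sqrt(lam))
print(np.max(np.abs(stats.poisson.pmf(k, lam) - gauss)))
```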
Chi-square distribution
f(C; n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, C^{n/2 - 1}\, e^{-C/2}
\mu = n, \qquad \sigma = \sqrt{2n}
Independent ⇒ uncorrelated (the converse does not hold in general).
Multinomial distribution
Parameters: n, p_1, p_2, \dots, p_S
Law: P(\vec{k}; n, \vec{p}) = \frac{n!}{k_1! k_2! \cdots k_S!}\, p_1^{k_1} p_2^{k_2} \cdots p_S^{k_S}
Mean: \mu_i = n p_i
Variance: \sigma_i^2 = n p_i (1 - p_i), \qquad \mathrm{Cov}(K_i, K_j) = -n p_i p_j
Note: the variables are not independent. The binomial corresponds to S = 2 but has only one independent variable.
• Multinormal distribution:
Parameters: \vec{\mu}, \Sigma
Law: f(\vec{x}; \vec{\mu}, \Sigma) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}}\, e^{-\frac{1}{2} (\vec{x} - \vec{\mu})^T \Sigma^{-1} (\vec{x} - \vec{\mu})}
If uncorrelated: f(\vec{x}; \vec{\mu}, \Sigma) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{(x_i - \mu_i)^2}{2\sigma_i^2}}
For Gaussian variables, uncorrelated ⇒ independent: the two notions are then equivalent.
Central limit theorem:
f_C(c) \xrightarrow[n \to +\infty]{} \frac{1}{\sqrt{2\pi}}\, e^{-c^2/2}
The sum of many random fluctuations is normally distributed.
[Figure: distributions of X_1, (X_1 + X_2)/\sqrt{2}, (X_1 + X_2 + X_3)/\sqrt{3} and (X_1 + \cdots + X_5)/\sqrt{5}, each compared to a Gaussian.]
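A minimal simulation of this convergence, in the spirit of the figure above (centered uniform variables assumed as the toy ingredient):

```python
import numpy as np

# Normalized sums (X1 + ... + Xn)/sqrt(n) of centered uniforms:
# the variance stays that of one uniform (1/12), the shape becomes Gaussian.
rng = np.random.default_rng(2)
for n in (1, 2, 5, 50):
    # 100000 realizations of the normalized sum of n uniform variables
    x = rng.uniform(-0.5, 0.5, size=(100_000, n)).sum(axis=1) / np.sqrt(n)
    print(n, x.mean(), x.var())  # mean ~0, variance ~1/12 for every n
```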
Measured value: θ; true value: θ₀
Measured value: \theta = \theta_0 + \epsilon, with \epsilon = \epsilon_O + \epsilon_S + \epsilon_P
• Each \epsilon_i is a realization of a random variable of mean 0 and variance \sigma_i^2.
For uncorrelated error sources, quoting \delta_O = \alpha\sigma_O, \delta_S = \alpha\sigma_S, \delta_P = \alpha\sigma_P gives:
\delta_{tot}^2 = (\alpha\sigma_{tot})^2 = \alpha^2 (\sigma_O^2 + \sigma_S^2 + \sigma_P^2) = \delta_O^2 + \delta_S^2 + \delta_P^2
• Choice for \alpha: with many sources of error, the central limit theorem gives a normal distribution, so:
\alpha = 1 gives (approximately) a 68% confidence interval
\alpha = 2 gives a 95% confidence interval
Error propagation
• Measure: x \pm \delta x
• Compute: f(x) \to f \pm \delta f\,?
Assuming small errors and using the Taylor expansion:
f(x + \delta x) = f(x) + \frac{df}{dx}\,\delta x + \frac{1}{2}\frac{d^2 f}{dx^2}\,\delta x^2
f(x - \delta x) = f(x) - \frac{df}{dx}\,\delta x + \frac{1}{2}\frac{d^2 f}{dx^2}\,\delta x^2
\delta f = \frac{1}{2}\left| f(x + \delta x) - f(x - \delta x) \right| = \left| \frac{df}{dx} \right| \delta x
[Figure: the interval x \pm \delta x is propagated through the curve f(x) into the interval f \pm \delta f.]
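The same propagation can be done numerically; a minimal sketch with an assumed toy function and values for x and δx:

```python
import numpy as np

# Numerical version of the 1D propagation formula df = |df/dx| * dx,
# using the symmetric finite difference from the Taylor expansion above.
def f(x):
    return np.exp(x) / x  # assumed toy function

x, dx = 2.0, 0.05
df = 0.5 * abs(f(x + dx) - f(x - dx))  # = |df/dx| * dx for small dx
print(f(x), "+/-", df)
```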
Error propagation
• Measure: x \pm \delta x, \; y \pm \delta y
• Compute: f(x, y, \dots) \to f \pm \delta f\,?
\delta_x f = \left| \frac{\partial f}{\partial x} \right| \delta x \quad \text{and} \quad \delta_y f = \left| \frac{\partial f}{\partial y} \right| \delta y
Then:
\delta f^2 = \delta_x f^2 + \delta_y f^2 + 2\rho_{xy}\,\delta_x f\,\delta_y f = \left( \frac{\partial f}{\partial x} \right)^2 \delta x^2 + \left( \frac{\partial f}{\partial y} \right)^2 \delta y^2 + 2\rho_{xy} \frac{\partial f}{\partial x} \frac{\partial f}{\partial y}\,\delta x\,\delta y
In the general case:
\delta f^2 = \sum_i \left( \frac{\partial f}{\partial x_i} \right)^2 \delta x_i^2 + 2 \sum_{i,\,j<i} \rho_{x_i x_j} \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j}\,\delta x_i\,\delta x_j
[Figure: z_m = f(x_m, y_m); at fixed y_m, the curve z = f(x, y_m) has slope \partial f/\partial x = df(x, y_m)/dx at x_m.]
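The general formula is conveniently written as δf² = gᵀVg, with g the gradient of f and V the covariance matrix of the inputs; a sketch with assumed toy values (ρ_xy is the assumed correlation coefficient):

```python
import numpy as np

def f(x, y):
    return x * y  # assumed toy function

x, dx = 3.0, 0.1
y, dy = 5.0, 0.2
rho_xy = 0.5

grad = np.array([y, x])                    # (df/dx, df/dy) for f = x*y
V = np.array([[dx**2, rho_xy * dx * dy],
              [rho_xy * dx * dy, dy**2]])  # covariance matrix of (x, y)
df = np.sqrt(grad @ V @ grad)              # df^2 = g^T V g
print(f(x, y), "+/-", df)
```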
• For any unbiased estimator \hat\theta of \theta, the variance cannot be smaller than the Cramer-Rao bound:
\sigma_{\hat\theta}^2 \geq \frac{1}{E\left[ \left( \frac{\partial \ln L}{\partial \theta} \right)^2 \right]} = \frac{-1}{E\left[ \frac{\partial^2 \ln L}{\partial \theta^2} \right]}
Variance of the estimator (convergence):
\sigma_{\hat\sigma^2}^2 = \frac{\sigma^4}{n} \left( \gamma_2 + \frac{2n}{n-1} \right) \xrightarrow[n \to +\infty]{} 0
(where \gamma_2 is the excess kurtosis)
The maximum likelihood estimator \hat\theta solves:
\left. \frac{\partial L}{\partial \theta} \right|_{\theta = \hat\theta} = 0
Note: this becomes a system of equations for several parameters.
Note: minimizing -\ln L often simplifies the expression.
The maximum likelihood estimator is:
• Asymptotically unbiased
• Asymptotically efficient (reaches the Cramer-Rao bound)
• Asymptotically normally distributed
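A minimal numerical sketch of such a fit (toy Gaussian data with known σ = 1 assumed, so the MLE for the mean is the sample mean), minimizing −lnL with scipy:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Maximum likelihood fit of the mean of a Gaussian with known sigma = 1.
rng = np.random.default_rng(3)
data = rng.normal(loc=4.2, scale=1.0, size=500)  # toy data, assumed true mean

def neg_log_likelihood(theta):
    return 0.5 * np.sum((data - theta)**2)  # -lnL up to a constant

result = minimize_scalar(neg_log_likelihood)
print(result.x, data.mean())  # the MLE coincides with the sample mean here
```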
f(\hat{\vec\theta}; \vec\theta, \Sigma) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}}\, e^{-\frac{1}{2} (\hat{\vec\theta} - \vec\theta)^T \Sigma^{-1} (\hat{\vec\theta} - \vec\theta)}
\quad \text{with} \quad (\Sigma^{-1})_{ij} = E\left[ \frac{\partial \ln L}{\partial \theta_i} \frac{\partial \ln L}{\partial \theta_j} \right]
• Goodness of fit: the value of -2\ln L(\hat\theta) is chi-square distributed with
ndf = sample size − number of parameters
p\text{-value} = \int_{-2\ln L(\hat\theta)}^{+\infty} f_{\chi^2}(x; \text{ndf})\,dx \quad \text{(probability of getting a worse agreement)}
\Delta \ln L = \ln L(\hat\theta) - \ln L(\theta) = \frac{1}{2} \sum_{i,j} \Sigma^{-1}_{ij} (\theta_i - \hat\theta_i)(\theta_j - \hat\theta_j) + O(\theta^3)
Confidence contours are defined by the equation:
\Delta \ln L = \beta(n_\theta, \alpha) \quad \text{with} \quad \alpha = \int_0^{2\beta} f_{\chi^2}(x; n_\theta)\,dx
Values of \beta for different numbers of parameters n_\theta and confidence levels \alpha:
\alpha (%)   n_\theta = 1   n_\theta = 2   n_\theta = 3
68.3         0.5            1.15           1.76
95.4         2              3.09           4.01
99.7         4.5            5.92           7.08
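The table can be reproduced from chi-square quantiles, since 2β is the quantile of f_{χ²}(x; n_θ) at level α; a quick check with scipy (the 99.7% row corresponds to 3σ, i.e. α = 0.9973):

```python
from scipy.stats import chi2

# beta = chi2 quantile at level alpha, divided by 2
for alpha in (0.683, 0.954, 0.9973):
    print([round(chi2.ppf(alpha, n) / 2, 2) for n in (1, 2, 3)])
# -> approximately the beta values tabulated above
```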
For independent Gaussian measurements y_i \pm \sigma_{y_i}, L = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_{y_i}}\, e^{-\frac{(y_i - f(x_i, \theta))^2}{2\sigma_{y_i}^2}}, so that:
\frac{\partial L}{\partial \theta} = 0 \;\Leftrightarrow\; \frac{\partial K^2}{\partial \theta} = -2 \frac{\partial \ln L}{\partial \theta} = 0, \quad \text{with} \quad K^2(\theta) = \sum_i \frac{(y_i - f(x_i, \theta))^2}{\sigma_{y_i}^2}
Least squares or chi-square fit is the maximum likelihood estimator for Gaussian errors.
• Generic case with correlations: K^2(\vec\theta) = (\vec{y} - \vec{f}(x, \vec\theta))^T\, \Sigma^{-1}\, (\vec{y} - \vec{f}(x, \vec\theta))
Example: fitting a line
• For f(x) = ax:
K^2(a) = A a^2 - 2B a + C = -2\ln L
\text{with} \quad A = \sum_i \frac{x_i^2}{\sigma_{y_i}^2}, \quad B = \sum_i \frac{x_i y_i}{\sigma_{y_i}^2}, \quad C = \sum_i \frac{y_i^2}{\sigma_{y_i}^2}
\frac{\partial K^2}{\partial a} = 2Aa - 2B = 0 \;\Rightarrow\; \hat{a} = \frac{B}{A}
\frac{\partial^2 K^2}{\partial a^2} = 2A = \frac{2}{\sigma_{\hat a}^2} \;\Rightarrow\; \sigma_{\hat a} = \sigma_a = \frac{1}{\sqrt{A}}
• For f(x) = ax + b, defining
A = \sum_i \frac{x_i^2}{\sigma_{y_i}^2}, \quad B = \sum_i \frac{1}{\sigma_{y_i}^2}, \quad C = \sum_i \frac{x_i}{\sigma_{y_i}^2}, \quad D = \sum_i \frac{x_i y_i}{\sigma_{y_i}^2}, \quad E = \sum_i \frac{y_i}{\sigma_{y_i}^2}:
\frac{\partial K^2}{\partial a} = 2Aa + 2Cb - 2D = 0
\frac{\partial K^2}{\partial b} = 2Ca + 2Bb - 2E = 0
\;\Rightarrow\; \hat{a} = \frac{BD - CE}{AB - C^2}, \quad \hat{b} = \frac{AE - CD}{AB - C^2}
\frac{\partial^2 K^2}{\partial a^2} = 2A = 2(\Sigma^{-1})_{11}, \quad \frac{\partial^2 K^2}{\partial b^2} = 2B = 2(\Sigma^{-1})_{22}, \quad \frac{\partial^2 K^2}{\partial a \partial b} = 2C = 2(\Sigma^{-1})_{12}
\Sigma^{-1} = \begin{pmatrix} A & C \\ C & B \end{pmatrix} \;\Rightarrow\; \Sigma = \frac{1}{AB - C^2} \begin{pmatrix} B & -C \\ -C & A \end{pmatrix}
\sigma_{\hat a} = \sigma_a = \sqrt{\frac{B}{AB - C^2}}, \quad \sigma_{\hat b} = \sigma_b = \sqrt{\frac{A}{AB - C^2}}
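A direct implementation of these closed-form estimators as a sketch (toy data with an assumed constant error σ_y):

```python
import numpy as np

# Analytic chi-2 fit of y = a*x + b using the sums A, B, C, D, E above.
rng = np.random.default_rng(4)
x = np.linspace(0, 10, 20)
sigma_y = 0.5
y = 1.8 * x + 0.7 + rng.normal(0, sigma_y, size=x.size)  # toy data

w = 1.0 / sigma_y**2
A, B, C = np.sum(w * x**2), np.sum(w), np.sum(w * x)
D, E = np.sum(w * x * y), np.sum(w * y)

det = A * B - C**2
a_hat = (B * D - C * E) / det
b_hat = (A * E - C * D) / det
sigma_a, sigma_b = np.sqrt(B / det), np.sqrt(A / det)
print(a_hat, "+/-", sigma_a, ";", b_hat, "+/-", sigma_b)
```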
[Figure: confidence contours at 68.3%, 95.4% and 99.7% in the plane of the fitted parameters; parameter a is the slope.]
Finally: f(x) = \lim_{n \to +\infty,\ \Delta \to 0} \frac{1}{n\Delta}\, h(x), where h(x) is the histogram content and \Delta the bin width.
• For Bayesians: the posterior density is the probability density of the true value. It can be used to estimate an interval: P(\theta \in [a, b]) = \alpha
• No such thing for a Frequentist: the interval itself becomes the random variable; [a, b] is a realization of [A, B].
Systematic uncertainties
The likelihood depends on nuisance parameters: L(\theta, \nu), with \nu = \nu_0 \pm \sigma_\nu or \nu_0\,^{+\sigma_\nu^+}_{-\sigma_\nu^-}
• In Bayesian statistics, nuisance parameters are dealt with by assigning them a prior \pi(\nu).
• Usually a multinormal law is used, with mean \nu_0 and a covariance matrix estimated from the quoted uncertainties on \nu_0 (+ correlations if needed):
f(\theta, \nu | x) = \frac{f(x | \theta, \nu)\,\pi(\theta)\,\pi(\nu)}{\iint f(x | \theta, \nu)\,\pi(\theta)\,\pi(\nu)\,d\theta\,d\nu}
• The final posterior distribution is obtained by marginalization over the nuisance parameters:
f(\theta | x) = \int f(\theta, \nu | x)\,d\nu = \frac{\int f(x | \theta, \nu)\,\pi(\theta)\,\pi(\nu)\,d\nu}{\iint f(x | \theta, \nu)\,\pi(\theta)\,\pi(\nu)\,d\theta\,d\nu}
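A grid-based sketch of this marginalization for a deliberately simple toy model (a single Gaussian measurement with an additive nuisance offset; all densities and numbers are illustrative assumptions):

```python
import numpy as np

# Marginalize the nuisance parameter nu out of the posterior on a grid.
x_obs = 1.2
theta = np.linspace(-2, 4, 400)[:, None]  # parameter of interest (column)
nu = np.linspace(-2, 2, 300)[None, :]     # nuisance parameter (row)

def gauss(u, mu, sigma):
    return np.exp(-0.5 * ((u - mu) / sigma)**2) / (np.sqrt(2 * np.pi) * sigma)

likelihood = gauss(x_obs, theta + nu, 0.5)  # f(x | theta, nu)
prior_nu = gauss(nu, 0.0, 0.3)              # pi(nu): nu0 = 0 +/- 0.3
prior_theta = 1.0                           # flat pi(theta)

joint = likelihood * prior_theta * prior_nu
posterior = joint.sum(axis=1)                  # marginalize over nu
posterior /= np.trapz(posterior, theta[:, 0])  # normalize in theta
print(theta[np.argmax(posterior), 0])          # posterior mode, ~x_obs
```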
• The profile likelihood PL(\theta) has the same asymptotic statistical properties as the regular likelihood.
Checking the compatibility of two datasets \{x_i\}, \{y_i\}: are they drawn from the same distribution?
• In every case:
• Test for unbinned data (Kolmogorov-Smirnov): compare the sample cumulative distribution function to the tested one.
Example: D_n = 0.069, p-value = 0.0617.
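For two unbinned samples, scipy provides the two-sample version of this test; a minimal sketch with toy data:

```python
import numpy as np
from scipy import stats

# Two-sample Kolmogorov-Smirnov test: the statistic D_n is the maximum
# distance between the two empirical CDFs.
rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=500)  # toy dataset 1
y = rng.normal(0.1, 1.0, size=500)  # toy dataset 2, slightly shifted

result = stats.ks_2samp(x, y)
print(result.statistic, result.pvalue)  # compare p-value to the chosen level
```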