You are on page 1of 1

ACAT 2019, Poster Session Saas-Fee, Switzerland

Generalization of Homogeneity Tests Used in HEP Experiments


Jakub Trusina, Jiřı́ Franc, and Adam Novotný
Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering,
Czech Technical University in Prague
jakub.trusina@fjfi.cvut.cz

• To develop homogeneity (two sample) tests which can be applied to weighted unbinned data samples in ROOT
Goals: • To verify that suggested generalized tests statistics have their presumed distribution
• To compare power of tests for χ2 test, Kolmogorov-Smirnov test, Anderson-Darling test, and Cramér-von Mises test

I. Introduction Experiment’s description


Homogeneity tests currently available in ROOT We repeated the whole procedure 10 000 times. Then we plotted EDF of each test’s p-values and compared
it to CDF of U(0,1).
• TH1::Chi2Test
• Second sample
– it allows testing weighted data • First sample
– distribution: Y ∼ N(0.5,1.52)
– it is unreliable when sample sizes are significantly different – distribution: X ∼ N(0,1) – sample size: m = 100 000
– it can be applied only to binned data; however, various binning can – sample size: n = 1000 1.5ϕ(Yi) 
lead to different test’s conclusion – weights: Vi = Yi −0.5
where ϕ is the standard normal distribution
– weights: Wi = 1 100 ϕ 1.5
• TH1::KolmogorovTest is a modification of the Kolmogorov-
Smirnov (KS) test that can be applied to binned weighted data; how- Results
ever, returned p-value is higher than the true one 1
2
pvals' distribution • Even though p-values of χ2 test are correct in this experiment, we
0.9
U(0,1) can show an counterexample. Let now X, Y ∼ N(0, 1) and Wi, Vi ∼
• TH1::AndersonDarlingTest is a modification of the Anderson- FN (pval)

0.8 Gamma(k, θ).


Darling (AD) test, it can be applied to binned unweighted data only
0.7 0.4

• TMath::KolmogorovTest is the classical KS test which can be 0.6 • The ratios of rejec- 0.052

0.05

applied only to unweighted and unbinned data tion (on significance

F(x)
0.5 0.3
0.048

• ROOT::Math::GoFTest 0.4 level 0.05) for this an- 0.046

Var[W]
0.3 other experiment dif- 0.2 0.044

– this class contains implementations of KS and AD test 0.042

0.2 fers significantly for 0.04

– both tests are applicable to unweighted and unbinned samples 0.1 various combinations
0.1

0.038

– AD test can be applied also to binned data 0 of k and θ. 0


0.036

0 0.2 0.4 0.6 0.8 1 0 0.25 0.5 0.75 1


E[W]
ordered p-values
Problems with binned data KS pvals' distribution AD pvals' distribution CvM pvals' distribution
1 1 1
hA U(0,1) U(0,1) U(0,1)
0.9 0.9 0.9
200 chi2 pval: 0.038683
• Different results FN (pval) FN (pval) FN (pval)

0.8 0.8 0.8


180
– χ2 test (or any other test of 0.7 0.7 0.7
160

140
binned data) can lead to differ- 0.6 0.6 0.6
120 ent decision if user adjusts the
F(x)

F(x)

F(x)
0.5 0.5 0.5
100

80
binning configuration 0.4 0.4 0.4
60

40
– higher number of bins usually 0.3 0.3 0.3

20 leads to higher p-value 0.2 0.2 0.2


−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5
0.1 0.1 0.1
• Advantages of tests based
0 0 0
hC on EDF 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
ordered p-values ordered p-values ordered p-values
180 chi2 pval: 0.097174
– while histogram loses informa- From the figures above, we can see that all four tests have their p-values distributed uniformly if the null
hypothesis is true. In this case, we consider the null hypothesis not as FX = FY but as FnW (x) → F (x) a.s.
160

140
tion of sample’s distribution
120 within each bin, EDF keeps and Fm V (x) → F (x) a.s. for every x ∈ R. For KS, AD and CvM tests we also verified p-values distribution
100
complete information in case of FX = FY and both samples have weights produced independently from some random nonnegative
80

60
– every difference inside bin can distribution (as in the counterexample).
40 be counted (such as Cramér-
20
von Mises (CvM) or AD test)
IV. Power of test comparison
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5
Power of test differs for various experiments’ setting. We carried out another experiment in which we observed
– binned KS test does not find
the effect of six parameters on the power of test.
An example of various binning of the same sample and its effect. the true maximum distance be-
Samples were produced from N(0,1) and N(0.1,1) tween EDFs but maximum dis- Experiment’s description
• Two different binning configuration tance between cumulative his- We produced two samples from N(0,1) and N(µs, (1+σs)2). All weights of the first sample are equal to 1 while
tograms which is most likely weights of the second sample were independently generated from Gamma(k, θ). Parameters k and θ will be
nbins = 10, min = -2.5, max = 2.5, pval = 0.0387
lower represented by mean (µw ) and variance (σw ) of weights. The first sample’s size is equal to n while the other
nbins = 11, min = -2.45, max = 2.55, pval = 0.0972 sample’s is equal to k·n. For every setting of (µs, σs, µw , µw , σw , n, k) we repeated procedure 1000 times
and calculated ratio of rejected tests (r) on significance level α = 0.05 which is power of test’s estimate.
II. Generalized homogeneity tests
We suggest modifications of KS, CvM and AD homogeneity tests statistics. Let (X, W ) = Results
=0 =0.1 =0.2
((X1, ..., Xn)0, (W1, ...,P
Wn)0) be first sample with its weights and (Y , V ) = ((Y1, ..., Ym)0, (V1, ..., Vm)0) the 1
s
1
s
1
s

second one. Let W• = ni=1 Wi and K 1 (·) be Bessel function of the third kind. AD test
4 0.8 0.8 0.8
CvM test
• Fundamental definitions KS test
0.6 2
test 0.6 0.6
Weighted EDF Effective sample size Mixed sample’s WEDF
r

 n
2 0.4 0.4 0.4
P
n Wi
W ,V neFnW (x)+meFmV (x)
FnW (x) = W1 i=1
P
Wi1(−∞,Xi](x)

ne = n Hne,me (x) = ne+me
0.2 0.2 0.2
Wi2
P
i=1
i=1 0 0 0
0 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5
• Kolmogorov-Smirnov test s s s
q
W ,V neme W (x) − F V (x)
Test statistic Tn,m = ne+me sup F n m Figure 1: Parameter setting: (µw , σw , n, k) = (0.3, 0.1, 200, 10). AD test has the highest ratio of rejected tests
x∈R for both changing parameter µs and σs. This is also true for µs = 0.3, 0.4, 0.5.
+∞ 2 2 w
=0.01 w
=0.2 w
=0.4
Presumed asymptotic distribution K(λ) = 1−2
P k+1
(−1) e −2k λ 1 1 1
k=1 AD test
0.8 CvM test 0.8 0.8
• Cramér-von Mises test KS test
 2 0.6 2
test 0.6 0.6
W ,V W (x) − F V (x) dH W ,V
= nne+m
e me
R
Test statistic Tn,m F
r

e n m ne,me
R 0.4 0.4 0.4

+∞ − 12
 2
  2

1 (1+4k) (1+4k) 0.2 0.2 0.2
(−1)k k
P √
Presumed as. dist. LCvM(z) = π√ z
1 + 4k exp − 16z K1 16z
4
k=0 0 0 0
0 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5
• Anderson-Darling test
w w w
2
W ,V FnW (x)−FmV (x)
( ) W ,V
Tn,m = nne+m
e me
R
Test statistic   dHne,me Figure 2: Parameter setting: (µs, σs, n, k) = (0.1, 0.1, 500, 20). AD test has the highest r for both changing
e W ,V W ,V
0<HnWe,m
,V
e (x)<1
Hne,me (x) 1−Hne,me (x) parameter µw and σw . However, it is interesting discovery that rising σw lowers power of test. χ2 test is
√ +∞ 1 2 2
unstable for small number of events (when µw = 0.01).

 
LAD(z) = z2π 2 (1 + 4k)exp − (1+4k) π
P
Presumed k 8z k=1 k=20 k=40
k=0 1 1 1
asympt.
distribution +∞
(1+4k)2π 2w2 0.8 0.8 0.8
 
R z
exp 8(w2+1) − dw
8z
0 0.6 0.6 0.6
r

AD test
III. Numerical verification of presumed distributions 0.4
CvM test
0.4 0.4

Since no theoretical proof of asymptotic properties has been done yet, we can demonstrate them numerically. KS test
0.2 2
0.2 0.2
test
If we consider data as random variables, distribution of test statistic is a continuous function, and the null
0 0 0
hypothesis is true then 10 2 10 3
10 10 24
10 3
10 10 2
4
10 3 10 4
  n n n
W ,V
p-value =˙ 1 − FT Tn,m ∼ U(0, 1).
Figure 3: Parameter setting: (µs, σs, µw , σw ) = (0.1, 0.2, 0.4, 0.01). AD test has again the highest ratio of
rejected tests for both changing parameter k and n.
We carried out an experiment in which we produced two samples from different distributions and assigned
them weights in such a way that their WEDFs converge to the same distribution. Afterward, we applied Acknowledgment We acknowledge support from the Czech CTU grant SGS18/188/OHK4/3T/14 and
homogeneity tests. MEYS grants LM2015068 and LTT18001.

You might also like