Trusina Acat2019
Goals:
• To develop homogeneity (two-sample) tests that can be applied to weighted unbinned data samples in ROOT
• To verify that the suggested generalized test statistics have their presumed distributions
• To compare the power of the χ2 test, the Kolmogorov-Smirnov (KS) test, the Anderson-Darling (AD) test, and the Cramér-von Mises (CvM) test
• TMath::KolmogorovTest is the classical KS test which can be […]
– both tests are applicable to unweighted and unbinned samples
• […] binned data) can lead to a different decision if the user adjusts the binning configuration
– higher number of bins usually […]
– […]tion of sample's distribution within each bin, EDF keeps complete information
– every difference inside a bin can be counted (such as in the Cramér-von Mises (CvM) or AD test)
– the binned KS test does not find the true maximum distance between the EDFs but the maximum distance between the cumulative histograms, which is most likely lower

An example of various binnings of the same sample and their effect. Samples were produced from N(0,1) and N(0.1,1).
• Two different binning configurations:
nbins = 10, min = -2.5, max = 2.5, pval = 0.0387
nbins = 11, min = -2.45, max = 2.55, pval = 0.0972

[Plots of F(x) and of the ratio of rejected tests against Var[W] omitted. The accompanying note is truncated: "The ratios of rejec[…] other experiment dif[…] various combinations […]".]

[…] and $F_m^V(x) \to F(x)$ a.s. for every $x \in \mathbb{R}$. For the KS, AD and CvM tests we also verified the distribution of p-values in the case where $F_X = F_Y$ and both samples have weights produced independently from some random nonnegative distribution (as in the counterexample).

IV. Power of test comparison
The power of a test differs for various experiment settings. We carried out another experiment in which we observed the effect of six parameters on the power of the tests.

Experiment's description
We produced two samples from N(0,1) and N(µs, (1+σs)²). All weights of the first sample are equal to 1, while the weights of the second sample were independently generated from Gamma(k, θ). The parameters k and θ will be represented by the mean (µw) and variance (σw) of the weights. The first sample's size is equal to n, while the other sample's size is equal to k·n (here k denotes the ratio of sample sizes, not the Gamma parameter). For every setting of (µs, σs, µw, σw, n, k) we repeated the procedure 1000 times and calculated the ratio of rejected tests (r) at significance level α = 0.05, which is an estimate of the power of the test.
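The sample generation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Gamma(k, θ) is the shape-scale parametrization (mean kθ, variance kθ²), and the names `gamma_shape_scale`, `make_samples` and `ratio` (the size multiplier called k in the text) are hypothetical.

```python
import random

def gamma_shape_scale(mean_w, var_w):
    """Convert a Gamma mean and variance into (shape, scale).

    Assumes the shape-scale form: mean = shape * scale,
    variance = shape * scale**2.
    """
    shape = mean_w ** 2 / var_w
    scale = var_w / mean_w
    return shape, scale

def make_samples(mu_s, sigma_s, mean_w, var_w, n, ratio, rng):
    """Return (x, w, y, v) for one repetition of the experiment."""
    # First sample: N(0, 1) with unit weights.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = [1.0] * n
    # Second sample: N(mu_s, (1 + sigma_s)^2), so the standard
    # deviation is 1 + sigma_s; its size is ratio * n.
    m = ratio * n
    y = [rng.gauss(mu_s, 1.0 + sigma_s) for _ in range(m)]
    # Independent Gamma weights with the requested mean and variance.
    shape, scale = gamma_shape_scale(mean_w, var_w)
    v = [rng.gammavariate(shape, scale) for _ in range(m)]
    return x, w, y, v

rng = random.Random(12345)  # fixed seed for reproducibility
x, w, y, v = make_samples(0.1, 0.1, 0.3, 0.1, n=200, ratio=10, rng=rng)
print(len(x), len(y), sum(v) / len(v))
```

Estimating the power would then amount to repeating `make_samples` many times, applying a test to each pair of samples, and counting the fraction of p-values below α.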
Results
The ratios of rejected tests are shown in Figures 1, 2 and 3.

II. Generalized homogeneity tests
We suggest modifications of the KS, CvM and AD homogeneity test statistics. Let $(X, W) = ((X_1, \dots, X_n)', (W_1, \dots, W_n)')$ be the first sample with its weights and $(Y, V) = ((Y_1, \dots, Y_m)', (V_1, \dots, V_m)')$ the second one. Let $W_\bullet = \sum_{i=1}^{n} W_i$ and let $K_{1/4}(\cdot)$ be the Bessel function of the third kind.
• Fundamental definitions
Weighted EDF: $F_n^W(x) = \frac{1}{W_\bullet} \sum_{i=1}^{n} W_i \, \mathbf{1}_{(-\infty, x]}(X_i)$
Effective sample size: $n_e = \frac{\left(\sum_{i=1}^{n} W_i\right)^2}{\sum_{i=1}^{n} W_i^2}$
Mixed sample's WEDF: $H_{n_e,m_e}^{W,V}(x) = \frac{n_e F_n^W(x) + m_e F_m^V(x)}{n_e + m_e}$
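As an illustration of the three definitions, the following sketch evaluates them directly on small lists. It is a plain transcription under the assumption that the weighted EDF at x accumulates the weights of all points with X_i ≤ x; the function names are hypothetical and no ROOT functionality is used.

```python
def wedf(xs, ws, t):
    """Weighted EDF at t: share of total weight on points with x <= t."""
    total = sum(ws)
    return sum(w for x, w in zip(xs, ws) if x <= t) / total

def effective_size(ws):
    """Effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    return sum(ws) ** 2 / sum(w * w for w in ws)

def mixed_wedf(xs, ws, ys, vs, t):
    """WEDF of the mixed sample, averaging both WEDFs by effective size."""
    ne, me = effective_size(ws), effective_size(vs)
    return (ne * wedf(xs, ws, t) + me * wedf(ys, vs, t)) / (ne + me)

xs, ws = [1.0, 2.0, 3.0], [1.0, 1.0, 2.0]
ys, vs = [0.0, 4.0], [1.0, 1.0]
print(wedf(xs, ws, 2.0))            # weight 2 out of 4 lies at or below 2
print(effective_size(ws))           # 16 / 6
print(mixed_wedf(xs, ws, ys, vs, 2.0))
```

With unit weights the effective sample size reduces to the ordinary sample size, and the weighted EDF reduces to the classical EDF.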
• Kolmogorov-Smirnov test
Test statistic: $T_{n,m}^{W,V} = \sqrt{\frac{n_e m_e}{n_e + m_e}} \, \sup_{x \in \mathbb{R}} \left| F_n^W(x) - F_m^V(x) \right|$
Presumed asymptotic distribution: $K(\lambda) = 1 - 2 \sum_{k=1}^{+\infty} (-1)^{k+1} e^{-2 k^2 \lambda^2}$

[Figure 1: three panels showing the ratio of rejected tests r for the AD, CvM, KS and χ2 tests; horizontal axis from 0 to 0.5.]
Figure 1: Parameter setting: (µw, σw, n, k) = (0.3, 0.1, 200, 10). The AD test has the highest ratio of rejected tests as both µs and σs change. This is also true for µs = 0.3, 0.4, 0.5.
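A minimal sketch of the weighted KS statistic and of the series K(λ) given above, written in plain Python. The function names and the truncation at 100 terms are assumptions made for illustration. The supremum is evaluated at the pooled sample points, where the step functions change.

```python
import math

def effective_size(ws):
    return sum(ws) ** 2 / sum(w * w for w in ws)

def weighted_ks_statistic(xs, ws, ys, vs):
    """sqrt(ne*me/(ne+me)) * sup_x |F(x) - G(x)| for two weighted samples."""
    ne, me = effective_size(ws), effective_size(vs)
    sw, sv = sum(ws), sum(vs)
    # Each pooled point carries its normalized jump in F or in G.
    pooled = sorted([(x, w / sw, 0.0) for x, w in zip(xs, ws)] +
                    [(y, 0.0, v / sv) for y, v in zip(ys, vs)])
    f = g = d = 0.0
    i = 0
    while i < len(pooled):
        t = pooled[i][0]
        # Apply all jumps located at t before comparing (handles ties).
        while i < len(pooled) and pooled[i][0] == t:
            f += pooled[i][1]
            g += pooled[i][2]
            i += 1
        d = max(d, abs(f - g))
    return math.sqrt(ne * me / (ne + me)) * d

def kolmogorov_cdf(lam, terms=100):
    """K(lam) = 1 - 2 * sum_{k>=1} (-1)^(k+1) exp(-2 k^2 lam^2), truncated."""
    if lam < 0.2:
        # K(0.2) is below 1e-12 and the truncated alternating series
        # is unreliable for very small lam, so return 0 directly.
        return 0.0
    s = sum((-1) ** (k + 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 1.0 - 2.0 * s))

t = weighted_ks_statistic([1.0, 2.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
print(t, 1.0 - kolmogorov_cdf(t))  # statistic and presumed p-value
```

Because the weights are normalized by their sums, rescaling all weights of one sample by a constant leaves the statistic unchanged.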
• Cramér-von Mises test
Test statistic: $T_{n,m}^{W,V} = \frac{n_e m_e}{n_e + m_e} \int_{\mathbb{R}} \left( F_n^W(x) - F_m^V(x) \right)^2 \, dH_{n_e,m_e}^{W,V}(x)$
Presumed asymptotic distribution: $L_{CvM}(z) = \frac{1}{\pi \sqrt{z}} \sum_{k=0}^{+\infty} (-1)^k \binom{-\frac{1}{2}}{k} \sqrt{1 + 4k} \, \exp\left( -\frac{(1+4k)^2}{16 z} \right) K_{1/4}\left( \frac{(1+4k)^2}{16 z} \right)$
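The integral with respect to the mixed WEDF can be evaluated as a finite sum, since H is a step function that jumps only at the pooled sample points. The sketch below assumes that the integrand is taken at its right-continuous value at each jump; the function names are hypothetical.

```python
def effective_size(ws):
    return sum(ws) ** 2 / sum(w * w for w in ws)

def weighted_cvm_statistic(xs, ws, ys, vs):
    """ne*me/(ne+me) * integral of (F - G)^2 dH over the pooled points."""
    ne, me = effective_size(ws), effective_size(vs)
    sw, sv = sum(ws), sum(vs)
    pooled = sorted([(x, w / sw, 0.0) for x, w in zip(xs, ws)] +
                    [(y, 0.0, v / sv) for y, v in zip(ys, vs)])
    f = g = total = 0.0
    i = 0
    while i < len(pooled):
        t = pooled[i][0]
        df = dg = 0.0
        # Collect the jumps of F and G at location t (handles ties).
        while i < len(pooled) and pooled[i][0] == t:
            df += pooled[i][1]
            dg += pooled[i][2]
            i += 1
        f += df
        g += dg
        dh = (ne * df + me * dg) / (ne + me)  # jump of H at t
        total += (f - g) ** 2 * dh
    return ne * me / (ne + me) * total

print(weighted_cvm_statistic([1.0, 2.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))
```

For unit weights this reduces to the classical two-sample CvM sum over the pooled sample.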
• Anderson-Darling test
Test statistic: $T_{n,m}^{W,V} = \frac{n_e m_e}{n_e + m_e} \int_{0 < H_{n_e,m_e}^{W,V}(x) < 1} \frac{\left( F_n^W(x) - F_m^V(x) \right)^2}{H_{n_e,m_e}^{W,V}(x) \left( 1 - H_{n_e,m_e}^{W,V}(x) \right)} \, dH_{n_e,m_e}^{W,V}(x)$
Presumed asymptotic distribution: $L_{AD}(z) = \frac{\sqrt{2\pi}}{z} \sum_{k=0}^{+\infty} \binom{-\frac{1}{2}}{k} (1 + 4k) \exp\left( -\frac{(1+4k)^2 \pi^2}{8 z} \right) \int_0^{+\infty} \exp\left( \frac{z}{8 (w^2 + 1)} - \frac{(1+4k)^2 \pi^2 w^2}{8 z} \right) dw$

[Figure 2: three panels showing the ratio of rejected tests r for the AD, CvM, KS and χ2 tests; horizontal axis from 0 to 0.5.]
Figure 2: Parameter setting: (µs, σs, n, k) = (0.1, 0.1, 500, 20). The AD test has the highest r as both µw and σw change. Interestingly, however, rising σw lowers the power of the test. The χ2 test is unstable for a small number of events (when µw = 0.01).
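The AD statistic differs from the CvM one only by the weight 1 / (H (1 − H)) and by the restriction to points where 0 < H < 1. A minimal sketch, again treating the integral as a sum over the jumps of H and evaluating the integrand at its right-continuous value (an assumption about the discretization; function names are hypothetical):

```python
def effective_size(ws):
    return sum(ws) ** 2 / sum(w * w for w in ws)

def weighted_ad_statistic(xs, ws, ys, vs):
    """ne*me/(ne+me) * integral of (F - G)^2 / (H (1 - H)) dH on 0 < H < 1."""
    ne, me = effective_size(ws), effective_size(vs)
    sw, sv = sum(ws), sum(vs)
    pooled = sorted([(x, w / sw, 0.0) for x, w in zip(xs, ws)] +
                    [(y, 0.0, v / sv) for y, v in zip(ys, vs)])
    f = g = h = total = 0.0
    i = 0
    while i < len(pooled):
        t = pooled[i][0]
        df = dg = 0.0
        # Collect the jumps of F and G at location t (handles ties).
        while i < len(pooled) and pooled[i][0] == t:
            df += pooled[i][1]
            dg += pooled[i][2]
            i += 1
        f += df
        g += dg
        dh = (ne * df + me * dg) / (ne + me)  # jump of H at t
        h += dh
        # Skip points where H is 0 or 1; at the largest pooled point
        # H equals 1 (up to rounding) and F - G vanishes.
        if 0.0 < h < 1.0:
            total += (f - g) ** 2 / (h * (1.0 - h)) * dh
    return ne * me / (ne + me) * total

print(weighted_ad_statistic([1.0, 2.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0]))
```

The extra weight emphasizes differences in the tails, where H is close to 0 or 1.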
III. Numerical verification of presumed distributions
Since no theoretical proof of the asymptotic properties has been given yet, we demonstrate them numerically. If we consider the data as random variables, the distribution of the test statistic is a continuous function, and the null hypothesis is true, then

$\text{p-value} \doteq 1 - F_T\left( T_{n,m}^{W,V} \right) \sim U(0, 1).$
We carried out an experiment in which we produced two samples from different distributions and assigned them weights in such a way that their WEDFs converge to the same distribution. Afterward, we applied the homogeneity tests.

[Figure 3: three panels (k = 1, k = 20, k = 40) showing the ratio of rejected tests r against n for the AD, CvM, KS and χ2 tests.]
Figure 3: Parameter setting: (µs, σs, µw, σw) = (0.1, 0.2, 0.4, 0.01). The AD test again has the highest ratio of rejected tests as both k and n change.

Acknowledgment: We acknowledge support from the Czech CTU grant SGS18/188/OHK4/3T/14 and MEYS grants LM2015068 and LTT18001.