Appendix to Fast Cross-Validation via Sequential Analysis
Tammo Krueger, Danny Panknin, Mikio Braun
Technische Universitaet Berlin, Machine Learning Group, 10587 Berlin
t.krueger@tu-berlin.de, {panknin|mikio}@cs.tu-berlin.de
1 Selection of Meta-Parameters for the Fast Cross-validation
The algorithm has a number of free parameters, as can be seen from the pseudo-code in Algorithm 1: maxSteps, the number of subsample sizes to consider; $\alpha$, the significance level for the binarization of the test errors; $\alpha_l$ and $\beta_l$, the significance levels for the sequential analysis test; and earlyStoppingWindow, the number of steps to look back in the early stopping procedure. While we give an in-depth treatment of the selection of $\pi_0$, $\pi_1$ and the maxSteps parameter in the following sections, we give some suggestions for the other parameters here. The parameter $\alpha$ controls the significance level in each step of the test for similar behavior. We suggest setting it to the usual level of $\alpha = 0.05$. Furthermore, $\beta_l$ and $\alpha_l$ control the significance levels of $H_0$ (configuration is a loser) and $H_1$ (configuration is a winner), respectively. We suggest an asymmetric setup: $\beta_l = 0.1$, since we want to drop loser configurations relatively fast, and $\alpha_l = 0.01$, since we want to be really sure when we accept a configuration as overall winner. Finally, we set earlyStoppingWindow to 3 for maxSteps $= 10$ and to 6 for maxSteps $= 20$, as we have observed that this choice works well in practice.
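The suggested settings can be collected in a small configuration object. This is a minimal sketch: the class name `FastCVParams`, its field names, and the lookup table are hypothetical and not part of Algorithm 1. Only the two earlyStoppingWindow values stated above are encoded, and no value is extrapolated for other choices of maxSteps.

```python
from dataclasses import dataclass

# Early-stopping windows stated in the text (maxSteps -> earlyStoppingWindow).
# Other maxSteps values are deliberately not covered.
_EARLY_STOPPING_WINDOW = {10: 3, 20: 6}


@dataclass(frozen=True)
class FastCVParams:
    """Hypothetical container for the suggested meta-parameters."""

    max_steps: int
    alpha: float = 0.05    # significance level for the test for similar behavior
    alpha_l: float = 0.01  # sequential test: be very sure before accepting a winner
    beta_l: float = 0.1    # sequential test: drop loser configurations quickly

    @property
    def early_stopping_window(self) -> int:
        """Look-back window; defined only for the maxSteps values given in the text."""
        try:
            return _EARLY_STOPPING_WINDOW[self.max_steps]
        except KeyError:
            raise ValueError(
                f"no suggested earlyStoppingWindow for maxSteps={self.max_steps}"
            ) from None
```

For example, `FastCVParams(max_steps=20).early_stopping_window` gives 6, while an unsupported maxSteps raises an error instead of guessing a window.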
1.1 Choosing the Optimal Sequential Test Parameters
As outlined in the main part of the paper, we want to use the sequential testing framework to eliminate underperforming configurations as fast as possible while postponing the decision for a winner as long as possible. Using the parameters of the sequential testing framework, we have to choose $\pi_0$ and $\pi_1$ such that the area of acceptance for $H_0$ (region $H_0(\pi_0, \pi_1, \beta_l, \alpha_l)$, denoted by “LOSER” in the overview figure) is maximized, while the earliest point of acceptance of $H_1$ ($S_a(\pi_0, \pi_1, \beta_l, \alpha_l)$ in the overview figure) is postponed until the procedure has run at least maxSteps steps:
$$(\pi_0, \pi_1) = \operatorname*{argmax}_{\pi_0, \pi_1} H_0(\pi_0, \pi_1, \beta_l, \alpha_l) \quad \text{s.t.} \quad S_a(\pi_0, \pi_1, \beta_l, \alpha_l) \in (\mathrm{maxSteps} - 1, \mathrm{maxSteps}] \tag{1}$$
It turns out that the global optimization in Equation (1) can be approximated by
$$\pi_0 = 0.5 \;\wedge\; \pi_1 = \min_{\pi_1} \mathrm{ASN}(\pi_0, \pi_1 \mid \pi = 1.0) \ge \mathrm{maxSteps} \tag{2}$$
where $\mathrm{ASN}(\cdot, \cdot)$ (Average Sample Number) is the expected number of steps until the given test yields a decision if the true $\pi = 1.0$. For details of the sequential analysis, please consult [1]. Note that sequential analysis formally requires i.i.d. variables, which is clearly not the case in our setting. However, we focus on loser configurations, which are always zero (ergo deterministic) and therefore i.i.d. by construction. Also note that the true distribution of the trace matrix is complex and in general unknown. Our method should therefore be considered a first approximation, with more refined methods being the topic of future work.
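To make the role of $\pi_1$ concrete, the following sketch runs a Bernoulli sequential probability ratio test on a binary trace. It assumes Wald's standard approximate thresholds $\log(\beta_l/(1-\alpha_l))$ and $\log((1-\beta_l)/\alpha_l)$ on the cumulative log-likelihood ratio; whether Algorithm 1 uses exactly these thresholds is an assumption. The function names are hypothetical, and `boundary_pi1` is an illustrative helper that returns the value of $\pi_1$ at which an all-ones trace reaches the $H_1$ threshold after exactly maxSteps steps. It is not presented as the authors' solution of Equation (2).

```python
import math


def wald_log_bounds(alpha_l, beta_l):
    """Wald's approximate SPRT thresholds on the cumulative log-likelihood ratio."""
    lower = math.log(beta_l / (1.0 - alpha_l))   # at or below: accept H0 (loser)
    upper = math.log((1.0 - beta_l) / alpha_l)   # at or above: accept H1 (winner)
    return lower, upper


def sprt_decision(trace, pi0, pi1, alpha_l=0.01, beta_l=0.1):
    """Run the test on a 0/1 trace; return (decision, number of steps used)."""
    lower, upper = wald_log_bounds(alpha_l, beta_l)
    step_one = math.log(pi1 / pi0)                    # increment for an observed 1
    step_zero = math.log((1.0 - pi1) / (1.0 - pi0))   # increment for an observed 0
    llr = 0.0
    for step, x in enumerate(trace, start=1):
        llr += step_one if x else step_zero
        if llr <= lower:
            return "loser", step
        if llr >= upper:
            return "winner", step
    return "undecided", len(trace)


def boundary_pi1(max_steps, pi0=0.5, alpha_l=0.01, beta_l=0.1):
    """pi1 for which an all-ones trace hits the H1 threshold at max_steps.

    Hypothetical helper: solves max_steps * log(pi1 / pi0) = upper for pi1.
    """
    _, upper = wald_log_bounds(alpha_l, beta_l)
    pi1 = pi0 * math.exp(upper / max_steps)
    if not pi0 < pi1 < 1.0:
        raise ValueError("max_steps too small for a valid pi1 < 1")
    return pi1
```

With $\alpha_l = 0.01$, $\beta_l = 0.1$ and maxSteps $= 10$, `boundary_pi1(10)` is roughly 0.78. Under these assumptions an all-zeros trace is declared a loser after three steps, while ten consecutive ones with a slightly smaller $\pi_1$ of 0.78 still leave the test undecided, which matches the intent of dropping losers early and postponing the winner decision.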