Finding Important Variables and Interactions in Black Boxes

Quasi Regression and black boxes

Theme

As dimension increases, many numerical problems become more statistical.
Example: integration

I = ∫_{(0,1)^d} f(x) dx

Sampling methods:

1. Monte Carlo: n^{-1/2}
2. Quasi-Monte Carlo: n^{-1} (log n)^{d-1}, but no practical error estimate
3. Randomized quasi-Monte Carlo: replication-based error estimates, and n^{-3/2} (log n)^{(d-1)/2}

Rates are asymptotic, under mild conditions on f.

Also statistical: approximation.

Example: mortgage-backed securities integrand
(Paskov & Traub; Caflisch, Morokoff & Owen)

Y = present value of 30 years of monthly cash flows.

Prepayment:
1. puts lumps into the payment stream
2. is more common when interest rates are low

MBS model (from Goldman Sachs):

Y = f(X),   X ~ U[0,1]^360  →  Z = Φ^{-1}(X)

Interest rates r_1, ..., r_360: geometric Brownian motion driven by Z.
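A sketch of the replication-based error estimates of item 3, using plain Monte Carlo replicates as a stand-in for randomized QMC, and a hypothetical product integrand whose true integral is 1:

```python
import numpy as np

def f(x):
    # toy integrand on [0,1]^d with known integral 1 (each E[2*x_j] = 1)
    return np.prod(2.0 * x, axis=-1)

def replicated_estimate(f, d, n, reps, rng):
    """Average `reps` independent estimates of I = ∫ f over [0,1]^d and
    use their spread as a replication-based standard error."""
    ests = np.array([f(rng.random((n, d))).mean() for _ in range(reps)])
    return ests.mean(), ests.std(ddof=1) / np.sqrt(reps)

rng = np.random.default_rng(0)
I_hat, se = replicated_estimate(f, d=5, n=10_000, reps=10, rng=rng)
# I_hat is close to 1, and se quantifies the sampling error
```

With randomized QMC the same recipe applies, only the points inside each replicate change.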
QMC points x_i are very uniform in low-dimensional projections.
1, 2, 3 dimensional ANOVA effects, after numerical investigation exploiting symmetry and Gaussianity.

Which variables are important? Which interact?
Examples

                X              Y
Semiconductors  Device design  Speed, heat
Automotive      Auto frame     Strength, weight

Kriging widely used: Journel & Huijbregts; Sacks, Ylvisaker, Welch, Wynn, Mitchell.

CPU example: predict log10(perf) from the others, where

perf  published performance of computer
mmin  minimum main memory in kilobytes

Function found by training on the data.
A fitted neural net:

f(x) = 2.82 S(1.12 + 0.45 x_1 + 2.24 x_2 + 2.51 x_3 - 1.63 x_4 - 0.56 x_5 + 0.43 x_6)
     + 3.17 S(1.09 + 2.28 x_1 - 0.10 x_2 + 1.44 x_3 + 2.70 x_4 + 1.24 x_5 + 0.25 x_6)
     + 0.39 S(0.04 - 0.11 x_1 + 0.11 x_2 + 0.12 x_3 - 0.10 x_4 - 0.04 x_5 + 0.02 x_6)

where S is a sigmoidal function.

1. Nearly linear?
2. Nearly additive?
3. Nearly quadratic?

We would like:
1. a systematic approach
2. that also predicts f
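Treating the fitted net as the black box, it can be evaluated directly. A minimal sketch, assuming S is the logistic sigmoid (the slide only says "a sigmoidal function") and taking the printed coefficient signs at face value:

```python
import numpy as np

def S(t):
    # assumed logistic sigmoid; any bounded sigmoidal unit would do
    return 1.0 / (1.0 + np.exp(-t))

# coefficients transcribed from the slide (signs as printed)
a = np.array([2.82, 3.17, 0.39])       # output weights
b = np.array([1.12, 1.09, 0.04])       # hidden-node constants
W = np.array([[0.45,  2.24, 2.51, -1.63, -0.56, 0.43],
              [2.28, -0.10, 1.44,  2.70,  1.24, 0.25],
              [-0.11, 0.11, 0.12, -0.10, -0.04, 0.02]])

def f(x):
    # black-box prediction of log10(perf) at x in [0,1]^6
    return float(a @ S(W @ x + b))

y = f(np.full(6, 0.5))   # one evaluation of the black box
```

Questions 1–3 are hard to answer by staring at these coefficients, which is the point of the systematic approach that follows.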
Basis functions: e.g. orthogonal polynomials, sinusoids, wavelets; Hermite(Φ^{-1}(·)), Chebyshev(qbeta(·)).

Apply graphical and numerical interpretation to f̃.
Quasi Regression and black boxes
$'
15 Quasi Regression and black boxes
$
16
& %& %
3
Order(r ) krk1 = max1jd r(j ) B1 4 2 Lin Lin Quad 3 d3
p = 1 + 3d + 3d(d 1) + (2=3)d(d 1)(d 2)
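A quick check of this count (a sketch; `basis_size` is our name, not the slides'):

```python
def basis_size(d):
    # p = 1 + 3d + 3d(d-1) + (2/3) d(d-1)(d-2); the product of three
    # consecutive integers is divisible by 3, so integer division is exact
    return 1 + 3*d + 3*d*(d - 1) + (2*d*(d - 1)*(d - 2)) // 3

p6 = basis_size(6)   # the d = 6 CPU example needs only 189 basis functions
```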
Interpretation; approximation through integration

Variance of f is  Σ_{r≠0} β_r² + ∫ ε(x)² dx.

Importance of S is  Σ_{r∈S} β_r².

Estimate it by  Σ_{r∈S} [β̃_r² - Var(β̃_r)].

Define:  Z(x) = (ψ_0(x), ..., ψ_{p-1}(x))^T.
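One concrete reading of the importance measure, with hypothetical coefficients for a d = 2 toy problem: sum β_r² over the multi-indices r whose active inputs are exactly the set S.

```python
def importance(beta, subset):
    """Sum of beta_r^2 over multi-indices r whose nonzero-degree
    positions are exactly `subset` (an ANOVA effect)."""
    subset = frozenset(subset)
    return sum(b*b for r, b in beta.items()
               if frozenset(j for j, deg in enumerate(r) if deg > 0) == subset)

# hypothetical coefficients, indexed by per-input degree tuples
beta = {
    (0, 0): 2.0,              # constant; excluded from the variance
    (1, 0): 0.7, (2, 0): 0.1, # main effect of input 0
    (0, 1): 0.3,              # main effect of input 1
    (1, 1): 0.2,              # interaction of inputs 0 and 1
}

main_x1 = importance(beta, {0})     # 0.7^2 + 0.1^2 = 0.50
inter   = importance(beta, {0, 1})  # 0.2^2 = 0.04
total   = sum(b*b for r, b in beta.items() if any(r))
```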
Precursors of quasi-regression

Quasi-interpolation (Chui & Diamond; Wang): "ignore the denominator" (Z^T Z) to get fast approximate interpolation.

Observations:

β = [∫ Z(x) Z(x)^T dx]^{-1} ∫ Z(x) f(x) dx
  = ∫ Z(x) f(x) dx,   by orthogonality.

Quasi-regression: Owen (1992) describes quasi-regression for Latin hypercube sampling. Define:

β̃ = (1/n) Z^T Y
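A one-dimensional sketch of quasi-regression with an orthonormal shifted-Legendre basis (the slides use tensor products in d dimensions); note there is no (Z^T Z)^{-1} solve:

```python
import numpy as np
from numpy.polynomial import legendre

def psi(r, x):
    # Legendre polynomial of degree r, shifted to [0,1] and scaled so
    # that ∫_0^1 psi_r(x)^2 dx = 1
    c = np.zeros(r + 1)
    c[r] = 1.0
    return np.sqrt(2*r + 1) * legendre.legval(2.0*x - 1.0, c)

def quasi_regression(f, degrees, n, rng):
    # beta_tilde_r = (1/n) sum_i psi_r(x_i) f(x_i)
    x = rng.random(n)
    y = f(x)
    return np.array([np.mean(psi(r, x) * y) for r in degrees])

rng = np.random.default_rng(1)
f = lambda x: 1.0 + 2.0*psi(1, x) + 0.5*psi(2, x)   # known coefficients
b = quasi_regression(f, degrees=[0, 1, 2, 3], n=200_000, rng=rng)
# b approaches (1.0, 2.0, 0.5, 0.0) as n grows
```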
Define:

β̃_r^{(n)} = (1/n) Σ_{i=1}^n ψ_r(x_i) f(x_i)

S_r^{(n)} = (1/n) Σ_{i=1}^n [ψ_r(x_i) f(x_i) - β̃_r^{(n)}]²

Then:

β̃_r^{(n)} = β̃_r^{(n-1)} + (1/n) [ψ_r(x_n) f(x_n) - β̃_r^{(n-1)}]

S_r^{(n)} = ((n-1)/n) S_r^{(n-1)} + ((n-1)/n²) [ψ_r(x_n) f(x_n) - β̃_r^{(n-1)}]²

(cf. Chan, Golub & LeVeque, who use n S_r^{(n)}), with

E[(n-1)^{-1} S_r^{(n)}] = Var(β̃_r^{(n)}).

Regression and quasi-regression

ε_i = Y_i - β^T Z_i,   δ_{p×1} = (1/n) Z^T Y,   A_{p×p} = (1/n) Z^T Z - I.

Now

β̃ = (1/n) Z^T Y = (1/n) Z^T (Zβ + ε) = δ,

while

β̂ = (Z^T Z)^{-1} Z^T (Zβ + ε)
  = (Z^T Z)^{-1} Z^T Y
  = (I + A)^{-1} δ
  = (I - A + A² - A³ + ...) δ
  ≈ δ - Aδ.
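The running updates for β̃_r and S_r make a one-pass, O(1)-memory-per-coefficient algorithm; a minimal sketch for a single coefficient (class and method names are ours):

```python
class RunningCoef:
    """Streaming coefficient with an updatable variance estimate,
    following the slide's recursions (Chan-Golub-LeVeque style)."""
    def __init__(self):
        self.n = 0
        self.beta = 0.0   # beta_tilde_r^(n)
        self.S = 0.0      # S_r^(n)

    def update(self, a):
        """a = psi_r(x_n) * f(x_n) for the newest point x_n."""
        self.n += 1
        n = self.n
        d = a - self.beta                      # uses beta^(n-1)
        self.S = (n - 1) / n * self.S + (n - 1) / n**2 * d * d
        self.beta += d / n

    def var_beta(self):
        # E[S_r^(n) / (n-1)] = Var(beta_tilde_r^(n))
        return self.S / (self.n - 1)

rc = RunningCoef()
for a in (1.0, 3.0, 2.0):
    rc.update(a)
# rc.beta is the running coefficient; rc.var_beta() its estimated variance
```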
Updatable accuracy estimates

Predict f(x_n) by f̃_{n-1}(x_n);  x_n is independent of f̃_{n-1}.

Average recent squared errors:

ÎSE(n_m) = 1/(n_m - n_{m-1}) Σ_{i=n_{m-1}+1}^{n_m} [f(x_i) - f̃_{i-1}(x_i)]²

on the subsequence n_m = m(m+1)/2; this estimates average ISE over the most recent ≈ √(2n) values.

Presented as lack-of-fit:

LOF = AVG(f - f̃)² / AVG(f - β̃_0)² = 1 - R²

log10(LOF)      R²
   -4        99.99%
   -3        99.9%
   -2        99%
   -1        90%
    0        0%
    1        -900%

Diagnostic: large LOF and small Σ_r Var(β̃_r)  ⇒  need a bigger basis.
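The LOF ↔ R² correspondence in the table is just R² = 1 - LOF, on a log grid of LOF values:

```python
def lof_to_r2(lof):
    # R^2 = 1 - LOF, expressed as a percentage
    return 100.0 * (1.0 - lof)

# reproduces the slide's table: 99.99, 99.9, 99, 90, 0, -900 percent
table = {k: lof_to_r2(10.0**k) for k in (-4, -3, -2, -1, 0, 1)}
```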
Footprint: costs of algebra

           Regression   Quasi-regression   Kriging
Time       O(np²)       O(np)              O(n³ + p³)
Space      O(p²)        O(p)               O(n² + p²)
Dimension  High         High               Low

Quasi-regression allows larger n, or much larger p.

Incorporating shrinkage

f̃_{γ,n}(x) = Σ_r γ_{r,n} β̃_{r,n} ψ_r(x),   γ_{r,n} ∈ [0,1]

Optimally

γ_{r,n} = β_r² / (β_r² + Var(β̃_{r,n}))

e.g.

γ̂_{r,n} = [β̃_r^{(n-1)}]² / ([β̃_r^{(n-1)}]² + S_r^{(n-1)})

Still updatable.
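The plug-in weight γ̂ is computable from the running quantities; a small sketch (`shrink_weight` is our name):

```python
def shrink_weight(beta, S):
    """gamma = beta^2 / (beta^2 + S): coefficients whose estimated
    sampling noise S rivals beta^2 are pulled toward zero."""
    b2 = beta * beta
    return b2 / (b2 + S) if b2 + S > 0.0 else 0.0

big_clean   = shrink_weight(2.0, 0.01)   # ≈ 0.998: kept nearly intact
small_noisy = shrink_weight(0.05, 0.01)  # 0.0025/0.0125 = 0.2: heavily shrunk
```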
Exploiting residuals

For r ≠ 0:  β_r(f) = β_r(f - c)  for any c ∈ ℝ, but

Var[(1/n) Σ_{i=1}^n ψ_r(x_i) (f(x_i) - c)]

depends on c. Try c = β̃_0. More generally,

β̃_r^{(n)} = (1/n) Σ_{i=1}^n ψ_r(x_i) [f(x_i) - Σ_s γ_{s,i-1} β̃_s^{(i-1)} ψ_s(x_i)]

Bounding the feedback: β̃_r and S_r are still updatable.
NB: n(β̃_r - β_r) is a martingale in n.

N-net example

f(x) is the neural net prediction of log10(perf); d = 6.
The ψ_r are tensor products of Legendre polynomials.
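Why centering helps: for a hypothetical f with a large constant term, plain and residual-based quasi-regression agree in expectation but differ sharply in variance (here the known mean stands in for β̃_0):

```python
import numpy as np

def psi1(x):
    # degree-1 orthonormal shifted Legendre polynomial on [0,1]
    return np.sqrt(3.0) * (2.0*x - 1.0)

rng = np.random.default_rng(2)
f = lambda x: 10.0 + 0.5*psi1(x)     # large mean, modest slope
x = rng.random((500, 2000))          # 500 replicates of n = 2000

raw = np.mean(psi1(x) * f(x), axis=1)           # plain quasi-regression
ctr = np.mean(psi1(x) * (f(x) - 10.0), axis=1)  # residual (centered) version

# both estimate beta_1 = 0.5 without bias; centering slashes the variance
```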
Neural net results

[Figure: LOF versus sample size, n = 1000 to 100000]

Beta[0] (constant factor) is 2.0717.
Sample mean is 2.0719; sample variance is 0.14359.

Unbiased estimates of dimension variances:
0.11441  0.026592  0.0027723  0.0  0.0  0.0

Dimension probabilities: ratios of dimension variances to sample variance.

Var %   syct   mmin   mmax   cach   chmin  chdel  total
        0.520  0.011  0.088  0.131  0.037  0.009  0.797

Biggest main effect: syct, at 52%.
Biggest interaction: syct × cach, at 5.5%.
Caveats

Relationships found are not necessarily causal.

Extrapolation with no data: false negatives. ∫(f̃ - f)² dx might be dominated by x away from the data; a small error and a simple model might mask poor fit in the training region. (It is easy to compare f and f̃ on training data.)

The functions ψ_r and estimated ANOVA components are correlated on the empirical distribution. Using the product of empirical margins mitigates the problem (only slightly).

Degree     1        2        3          4
Coef      -0.272   -0.030    0.00242    0.0000777
% of f̃    51.38    0.630     0.00041    0.000004
CPU inputs

[Figure: scatterplot matrix of the CPU inputs, with panels labeled mmin, cach, chmin, chdel]

Biggest interaction

Cycle time × Cache size: 5.5% of f̃.

[Figure: "Cycle Time x Cache Size Interaction", a surface over syct and cach]

Cycle time × Main memory max: 5.4% of f̃.

[Figure: the corresponding surface over syct and mmax]
N-net conclusions

[Conclusions list largely lost in extraction; item 2 concerns mmax.]
Next directions

1. MARS-like dynamic choice of basis
2. Comparisons of f and f̃ on training data
4. Distinguishing f structure from f̃ artifacts
5. More types of statistical/ML black boxes
10. Examples with noise (→ unusable basis fns)

Robot arm function

The robot arm has 4 joints, with lengths L_j and angles θ_j:

u = Σ_{j=1}^4 L_j cos(Σ_{k=1}^j θ_k)
v = Σ_{j=1}^4 L_j sin(Σ_{k=1}^j θ_k)
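The robot arm function is easy to implement as a test black box; a minimal sketch:

```python
import math

def robot_arm(L, theta):
    """End-effector position (u, v) of a planar arm: joint j contributes
    a segment of length L[j] at cumulative angle theta[0] + ... + theta[j]."""
    u = v = 0.0
    phi = 0.0
    for Lj, tj in zip(L, theta):
        phi += tj
        u += Lj * math.cos(phi)
        v += Lj * math.sin(phi)
    return u, v

# fully extended arm: all angles zero, so u = sum of the lengths and v = 0
u, v = robot_arm([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])  # → (4.0, 0.0)
```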