Michael Importance Weighting

Feb 27, 2017

April 15, 2015

Introduction

Want to utilize information about the test distribution

Correct the bias or discrepancy between the training and testing distributions

1

Importance Weighting

Unlabeled test data from the target distribution P; labeled training data drawn from the source distribution Q

Weight the cost of errors on training instances.

Common definition of the weight for a point x: w(x) = P(x)/Q(x)

2
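As a minimal numerical sketch of this reweighting idea (not from the slides; the function and variable names are hypothetical), the importance-weighted empirical risk can be computed as:

```python
import numpy as np

def importance_weighted_risk(losses, p, q):
    """Importance-weighted empirical risk (1/m) * sum_i w(x_i) L_h(x_i),
    where w(x) = P(x)/Q(x): `p` and `q` hold the target (P) and source (Q)
    densities at each training point, and `losses` holds L_h(x_i)."""
    w = p / q
    return np.mean(w * losses)

# Sanity check: when P = Q every weight is 1, so the weighted risk
# reduces to the ordinary empirical risk.
losses = np.array([0.1, 0.4, 0.2])
p = q = np.array([0.2, 0.5, 0.3])
risk = importance_weighted_risk(losses, p, q)
```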

Importance Weighting

Can we give generalization bounds for this method?

When does domain adaptation (DA) work? When does it not work?

How should we weight the costs?

3

Overview

Preliminaries

Learning guarantee when loss is bounded

Learning guarantee when loss is unbounded, but second moment

is bounded

Algorithm

4

Preliminaries: Rényi divergence

D_\alpha(P \| Q) = \frac{1}{\alpha - 1} \log_2 \sum_x P(x) \left[ \frac{P(x)}{Q(x)} \right]^{\alpha - 1}

d_\alpha(P \| Q) = 2^{D_\alpha(P \| Q)} = \left[ \sum_x \frac{P^\alpha(x)}{Q^{\alpha - 1}(x)} \right]^{\frac{1}{\alpha - 1}}

D_\alpha(P \| Q) = 0 iff P = Q

5
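These definitions translate directly into code for discrete distributions (a sketch; `d_alpha` is a hypothetical helper name):

```python
import numpy as np

def d_alpha(p, q, alpha):
    """d_alpha(P||Q) = [sum_x P(x)^alpha / Q(x)^(alpha-1)]^(1/(alpha-1)),
    i.e. 2 ** D_alpha(P||Q), for probability vectors p, q and alpha > 1."""
    return float(np.sum(p**alpha / q**(alpha - 1)) ** (1.0 / (alpha - 1)))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
# D_alpha(P||Q) = 0 iff P = Q translates to d_alpha(P||P) = 2**0 = 1,
# while any mismatch makes d_alpha strictly greater than 1.
same = d_alpha(p, p, 2.0)
diff = d_alpha(p, q, 2.0)
```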

Preliminaries: Importance weights

Notation: L_h(x) \in [0, 1] is the loss of hypothesis h at point x, R(h) = E_{x \sim P}[L_h(x)] is the target risk, and \widehat{R}_w(h) = \frac{1}{m} \sum_{i=1}^m w(x_i) L_h(x_i) is the importance-weighted empirical risk on a sample of size m drawn from Q.

Lemma 1: E_{x \sim Q}[w(x)] = 1, E_{x \sim Q}[w^2(x)] = d_2(P \| Q), and

E_{x \sim Q}[w^2(x) L_h^2(x)] \le d_{\alpha + 1}(P \| Q) \, R(h)^{1 - \frac{1}{\alpha}}

Proof:

E_Q[w^2] = \sum_{x \in X} w^2(x) Q(x) = \sum_{x \in X} \left( \frac{P(x)}{Q(x)} \right)^2 Q(x) = d_2(P \| Q)

6
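The identities E_Q[w] = 1 and E_Q[w²] = d₂(P‖Q) are easy to verify numerically (a sketch with made-up discrete distributions):

```python
import numpy as np

# Two arbitrary discrete distributions on three points.
p = np.array([0.5, 0.3, 0.2])    # target P
q = np.array([0.25, 0.25, 0.5])  # source Q
w = p / q                        # importance weights w(x) = P(x)/Q(x)

mean_w = np.sum(q * w)           # E_Q[w] = sum_x Q(x) w(x), always 1
second_moment = np.sum(q * w**2) # E_Q[w^2]
d2 = np.sum(p**2 / q)            # d_2(P||Q) = sum_x P(x)^2 / Q(x)
```

The second moment E_Q[w²] = d₂(P‖Q) grows with the mismatch between P and Q, while E_Q[w] stays at 1.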

Preliminaries: Importance weights

Hölder's inequality (Jin, Wilson, and Nobel, 2014): let \frac{1}{p} + \frac{1}{q} = 1; then

\sum_x |a_x b_x| \le \left( \sum_x |a_x|^p \right)^{\frac{1}{p}} \left( \sum_x |b_x|^q \right)^{\frac{1}{q}}

7

Preliminaries: Importance weights

E_{x \sim Q}[w^2(x) L_h^2(x)] = \sum_x \left[ \frac{P(x)}{Q(x)} \right]^2 Q(x) \, L_h^2(x) = \sum_x P(x) \frac{P(x)}{Q(x)} L_h^2(x)

\le \left[ \sum_x P(x) \left( \frac{P(x)}{Q(x)} \right)^\alpha \right]^{\frac{1}{\alpha}} \left[ \sum_x P(x) L_h^{\frac{2\alpha}{\alpha - 1}}(x) \right]^{1 - \frac{1}{\alpha}}

= d_{\alpha + 1}(P \| Q) \left[ \sum_x P(x) L_h(x) \, L_h^{\frac{\alpha + 1}{\alpha - 1}}(x) \right]^{1 - \frac{1}{\alpha}}

\le d_{\alpha + 1}(P \| Q) \, R(h)^{1 - \frac{1}{\alpha}} B^{1 + \frac{1}{\alpha}} = d_{\alpha + 1}(P \| Q) \, R(h)^{1 - \frac{1}{\alpha}} \quad (B = 1)

8

Learning Guarantees: Bounded case

\sup_x w(x) = \sup_x \frac{P(x)}{Q(x)} = d_\infty(P \| Q) = M. Let d_\infty(P \| Q) < +\infty. Fix h \in H.

Then, for any \delta > 0, with probability at least 1 - \delta,

|R(h) - \widehat{R}_w(h)| \le M \sqrt{ \frac{\log \frac{2}{\delta}}{2m} }

9
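Plugging numbers into this Hoeffding-style deviation term shows how it behaves (a sketch; the values of M, m, and delta are arbitrary):

```python
import math

def deviation_bound(M, m, delta):
    """Deviation term M * sqrt(log(2/delta) / (2m)) for a fixed hypothesis
    when the weighted losses w(x) L_h(x) lie in [0, M]."""
    return M * math.sqrt(math.log(2.0 / delta) / (2.0 * m))

# The term decays as O(1/sqrt(m)) but scales linearly with M = d_inf(P||Q):
# a large worst-case weight can dominate even for large samples.
b_small = deviation_bound(M=10.0, m=1_000, delta=0.05)
b_large = deviation_bound(M=10.0, m=100_000, delta=0.05)
```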

Overview

Preliminaries

Learning guarantee when loss is bounded

Learning guarantee when loss is unbounded, but second moment

is bounded

Algorithm

10

Learning Guarantees: Bounded case

Theorem 1: Let M = d_\infty(P \| Q) < +\infty. Fix h \in H. Then, for any \delta > 0, with probability at least 1 - \delta, the following bound holds:

R(h) \le \widehat{R}_w(h) + \frac{2M \log \frac{1}{\delta}}{3m} + \sqrt{ \frac{2 \left[ d_{\alpha + 1}(P \| Q) \, R(h)^{1 - \frac{1}{\alpha}} - R(h)^2 \right] \log \frac{1}{\delta}}{m} }

11

Learning Guarantees: Bounded case

Bernstein's inequality: for independent zero-mean random variables x_i with variance \sigma^2,

\Pr\left[ \frac{1}{n} \sum_{i=1}^n x_i \ge \epsilon \right] \le \exp\left( \frac{-n\epsilon^2}{2\sigma^2 + 2M\epsilon/3} \right)

when |x_i| \le M.

12

Learning Guarantees: Bounded case

Let Z = w(x) L_h(x); then 0 \le Z \le M and E_Q[Z] = R(h). Thus, by Lemma 1, the variance of Z can be bounded in terms of d_{\alpha + 1}(P \| Q):

\sigma^2(Z) = E_Q[w^2(x) L_h(x)^2] - R(h)^2 \le d_{\alpha + 1}(P \| Q) \, R(h)^{1 - \frac{1}{\alpha}} - R(h)^2

\Pr[R(h) - \widehat{R}_w(h) > \epsilon] \le \exp\left( \frac{-m\epsilon^2/2}{\sigma^2(Z) + \epsilon M/3} \right).

13

Learning Guarantees: Bounded case

Setting the right-hand side to \delta and solving for \epsilon:

R(h) \le \widehat{R}_w(h) + \frac{M \log \frac{1}{\delta}}{3m} + \sqrt{ \frac{M^2 \log^2 \frac{1}{\delta}}{9m^2} + \frac{2\sigma^2(Z) \log \frac{1}{\delta}}{m} }

\le \widehat{R}_w(h) + \frac{2M \log \frac{1}{\delta}}{3m} + \sqrt{ \frac{2\sigma^2(Z) \log \frac{1}{\delta}}{m} }

14

Learning Guarantees: Bounded case

Theorem 2: Let H be a finite hypothesis set and M = d_\infty(P \| Q) < +\infty. Then, for any \delta > 0, with probability at least 1 - \delta, the following holds for all h \in H for the importance weighting method:

R(h) \le \widehat{R}_w(h) + \frac{2M \left( \log |H| + \log \frac{1}{\delta} \right)}{3m} + \sqrt{ \frac{2 d_2(P \| Q) \left( \log |H| + \log \frac{1}{\delta} \right)}{m} }

15

Learning Guarantees: Bounded case

In the case of \alpha = 1:

R(h) \le \widehat{R}_w(h) + \frac{2M \log \frac{1}{\delta}}{3m} + \sqrt{ \frac{2 d_2(P \| Q) \log \frac{1}{\delta}}{m} }

16
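The α = 1 right-hand side is simple to evaluate (a sketch; the helper name and input values are made up):

```python
import math

def alpha1_bound(emp_risk, M, d2, m, delta):
    """Right-hand side of the alpha = 1 bound:
    hat-R_w(h) + 2*M*log(1/delta)/(3m) + sqrt(2*d_2(P||Q)*log(1/delta)/m)."""
    t = math.log(1.0 / delta)
    return emp_risk + 2.0 * M * t / (3.0 * m) + math.sqrt(2.0 * d2 * t / m)

# The O(1/m) term involves the worst-case weight M = d_inf(P||Q), while the
# dominant O(1/sqrt(m)) term only involves the second moment d_2(P||Q).
b = alpha1_bound(emp_risk=0.10, M=50.0, d2=2.0, m=5_000, delta=0.05)
```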

Learning Guarantees: Bounded case

Lower bound: assume there exists h_0 \in H such that L_{h_0}(x) = 1 for all x. There exists an absolute constant c, c = 2/41^2, such that

\Pr\left[ \sup_{h \in H} |R(h) - \widehat{R}_w(h)| \ge \sqrt{ \frac{d_2(P \| Q)}{4m} } \right] \ge c > 0

Hence deviations of order \sqrt{d_2(P \| Q)/m} are unavoidable with constant probability.

17

Overview

Preliminaries

Learning guarantee when loss is bounded

Learning guarantee when loss is unbounded, but second moment

is bounded

Algorithm

18

Learning Guarantees: Unbounded case

Let P and Q be Gaussian distributions with standard deviations \sigma_P, \sigma_Q and means \mu_P, \mu_Q:

\frac{P(x)}{Q(x)} = \frac{\sigma_Q}{\sigma_P} \exp\left[ \frac{\sigma_P^2 (x - \mu_Q)^2 - \sigma_Q^2 (x - \mu_P)^2}{2\sigma_P^2 \sigma_Q^2} \right]

Thus, whenever \sigma_P > \sigma_Q, even if \mu_P = \mu_Q, d_\infty(P \| Q) = \sup_x \frac{P(x)}{Q(x)} = \infty, and Theorem 1 is not informative.

19

Learning Guarantees: Unbounded case

The second moment can nonetheless be finite:

d_2(P \| Q) = E_Q[w^2(x)] = \int_{-\infty}^{+\infty} \frac{\sigma_Q}{\sqrt{2\pi} \, \sigma_P^2} \exp\left[ \frac{(x - \mu_Q)^2}{2\sigma_Q^2} - \frac{(x - \mu_P)^2}{\sigma_P^2} \right] dx,

which converges if and only if 2\sigma_Q^2 > \sigma_P^2.

20

Learning Guarantees: Unbounded case

But a sample from Q has only a few points far from the mean

A few extreme sample points would then carry very large weights

21
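This effect is easy to reproduce by sampling (a sketch; the specific σ values are made up, chosen so that d₂(P‖Q) is finite while d_∞(P‖Q) is not):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
sigma_p, sigma_q = 1.0, 0.75   # sigma_P > sigma_Q: w(x) is unbounded,
m = 100_000                    # but 2*sigma_Q^2 > sigma_P^2, so d_2 is finite
x = rng.normal(0.0, sigma_q, size=m)  # training sample from Q

w = gaussian_pdf(x, 0.0, sigma_p) / gaussian_pdf(x, 0.0, sigma_q)
mean_w = w.mean()   # concentrates near E_Q[w] = 1
max_w = w.max()     # a handful of extreme points carry very large weights
```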

Learning Guarantees: Unbounded case

Theorem 3: Let H be a hypothesis set such that Pdim(\{L_h(x) : h \in H\}) = p < \infty. Assume that d_2(P \| Q) < +\infty and w(x) \ne 0 for all x. Then, for any \delta > 0, with probability at least 1 - \delta, the following holds for all h \in H:

R(h) \le \widehat{R}_w(h) + 2^{5/4} \sqrt{d_2(P \| Q)} \left( \frac{p \log \frac{2me}{p} + \log \frac{4}{\delta}}{m} \right)^{3/8}

22

Learning Guarantees: Unbounded case

Relative deviation bound for losses taking values in [0, 1] (\Pi_H denotes the growth function):

\Pr\left[ \sup_{h \in H} \frac{R(h) - \widehat{R}(h)}{\sqrt{R(h)}} > \epsilon \right] \le 4 \, \Pi_H(2m) \exp\left( \frac{-m\epsilon^2}{4} \right)

Applied to the thresholded losses, uniformly over h and t:

\Pr\left[ \sup_{h \in H,\, t} \frac{\Pr[L_h > t] - \widehat{\Pr}[L_h > t]}{\sqrt{\Pr[L_h > t]}} > \epsilon \right] \le 4 \, \Pi_H(2m) \exp\left( \frac{-m\epsilon^2}{4} \right)

With Pdim(\{L_h : h \in H\}) = p, \Pi_H(2m) \le \left( \frac{2em}{p} \right)^p, so

\Pr\left[ \sup_{h \in H} \frac{E[L_h(x)] - \widehat{E}[L_h(x)]}{\sqrt{E[L_h(x)]}} > \epsilon \right] \le 4 \exp\left( p \log \frac{2em}{p} - \frac{m\epsilon^2}{4} \right)

For unbounded non-negative losses with bounded second moment, the analogous relative bound is

\Pr\left[ \sup_{h \in H} \frac{E[L_h(x)] - \widehat{E}[L_h(x)]}{\sqrt{E[L_h^2(x)]}} > \epsilon \right] \le 4 \exp\left( p \log \frac{2em}{p} - \frac{m\epsilon^{8/3}}{4^{5/3}} \right)

Setting the right-hand side to \delta and solving for \epsilon (using 4^{5/8} = 2^{5/4}):

|E[L_h(x)] - \widehat{E}[L_h(x)]| \le 2^{5/4} \max\left\{ \sqrt{E[L_h^2(x)]}, \sqrt{\widehat{E}[L_h^2(x)]} \right\} \left( \frac{p \log \frac{2me}{p} + \log \frac{8}{\delta}}{m} \right)^{3/8}

23

Learning Guarantees: Unbounded case

\Pr\left[ \sup_{h \in H} \frac{R(h) - \widehat{R}_w(h)}{\sqrt{d_2(P \| Q)}} > \epsilon \right] \le 4 \exp\left( p \log \frac{2em}{p} - \frac{m\epsilon^{8/3}}{4^{5/3}} \right).

Here the relative deviation bound is applied to H' = \{w(x) L_h(x) : h \in H\}. A set A = \{x_1, \ldots, x_k\} is shattered by H' with witnesses r_i iff it is shattered by \{L_h : h \in H\} with witnesses s_i = r_i / w(x_i) (using w(x_i) \ne 0), so Pdim(H') = p.

24

Overview

Learning guarantee when loss is bounded

Learning guarantee when loss is unbounded, but second moment

is bounded

Algorithm

25

Alternative algorithms

Let u : X \to \mathbb{R}, u > 0. Define \widehat{R}_u(h) = \frac{1}{m} \sum_{i=1}^m u(x_i) L_h(x_i), and let \widehat{Q} be the empirical distribution.

Theorem 4: Let H be a hypothesis set such that Pdim(\{L_h(x) : h \in H\}) = p < \infty. Assume that 0 < E_Q[u^2(x)] < +\infty and u(x) \ne 0 for all x. Then, for any \delta > 0, with probability at least 1 - \delta,

|R(h) - \widehat{R}_u(h)| \le |E_Q[(w(x) - u(x)) L_h(x)]| + 2^{5/4} \max\left\{ \sqrt{E_Q[u^2(x) L_h^2(x)]}, \sqrt{E_{\widehat{Q}}[u^2(x) L_h^2(x)]} \right\} \left( \frac{p \log \frac{2me}{p} + \log \frac{4}{\delta}}{m} \right)^{3/8}

26

Alternative algorithms

Minimize the upper bound. Since

\max\left\{ \sqrt{E_Q[u^2]}, \sqrt{E_{\widehat{Q}}[u^2]} \right\} \le \sqrt{E_{\widehat{Q}}[u^2]} \left( 1 + O(1/\sqrt{m}) \right),

this suggests the algorithm

\min_{u \in U} \; E_{\widehat{Q}}[|w(x) - u(x)|] + \lambda \, E_{\widehat{Q}}[u^2]

(\lambda > 0 a regularization parameter): a trade-off between bias and variance minimization.

27
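One concrete member of such a family U is weight capping, u(x) = min(w(x), c); this specific choice is an illustration, not the slides' algorithm. Raising the cap c shrinks the bias term E_Q[|w - u|] while inflating the second moment E_Q[u^2]:

```python
import numpy as np

def gaussian_pdf(x, sigma):
    return np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(1)
sigma_p, sigma_q = 1.0, 0.8
x = rng.normal(0.0, sigma_q, size=50_000)       # sample from Q
w = gaussian_pdf(x, sigma_p) / gaussian_pdf(x, sigma_q)

def bias_and_variance_terms(c):
    """Empirical E_Q[|w - u|] and E_Q[u^2] for capped weights u = min(w, c)."""
    u = np.minimum(w, c)
    return np.abs(w - u).mean(), (u**2).mean()

bias_lo_cap, second_lo_cap = bias_and_variance_terms(2.0)
bias_hi_cap, second_hi_cap = bias_and_variance_terms(20.0)
```

Sweeping c traces out the bias-variance trade-off; a richer family U and the regularized objective give the algorithm more freedom than a single cap.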

