You are on page 1of 18

Online Appendix for Bray (2021)

Rust’s Algorithm Under a General Sampling Density

Robert L. Bray

Kellogg School of Management, Northwestern University

August 1, 2021

Introduction

My argument seamlessly extends to this case in which the Monte Carlo sample is drawn from a
general density function. Regardless of the sampling distribution, Rust’s algorithm breaks the curse
of dimensionality only if the number of state variables that are meaningfully history dependent—i.e.,
contingent of the prior period’s state s or action a—is O(log(d)). However, the nature of this near
memorylessness varies with the sampling distribution. Under the uniform sampling distribution,
almost all state variables, ti , have a distribution that’s arbitrarily close to a uniform distribution
from almost all states s ∈ [0, 1]d . But under sampling density µd , almost all state variables, ti , have
a distribution that’s arbitrarily close to the conditional marginal sampling distribution, µid (ti |t<i ),
from almost all states s ∈ [0, 1]d .

Analysis

I will now derive the analog of each of Bray’s (2021) results under general sampling density µd .
Suppose the elements of md = {mid }bi=1
d
are drawn from a distribution with general density function
µd , which has full support over [0, 1]d .
In this case, Rust’s random Bellman operator would be

i µ
Pbd i
i=1 V (md )pda (md |s)
(Γ̂bµ
d V )(s) ≡ max uda (s) + β ,
a∈ a Pbd µ i
i=1 pda (md |s)

where pµda (t|s) ≡pda (t|s)/µd (t).

1
We will call this operator’s fixed point, V̂dbµ , the Rust value function under sampling density µ.
Note that we can also express the standard Bellman operator in terms of pµda and µd :
Z
(Γd V )(s) = max uda (s) + β V (t)pµda (t|s)µd (dt),
a∈ a t∈[0,1]d

where µd (dt) ≡µd (t)λd (dt)

We will now re-express our definitions, assumptions, and propositions in terms of µd .

Definition 7. A sequence of dynamic programs is strongly Rust solvable under sampling density
µ if for all  > 0 there exists b ∈ b such that supd∈N E(||V̂dbµ − Vd ||) < .

Definition 8. A sequence of dynamic programs is weakly Rust solvable under sampling density µ
if for all  > 0 there exists b ∈ b such that supd∈N E(||V̂dbµ − Vd ||1 ) < .

Definition 9. The ith state variable is -dependent under sampling density µ if ||Vdµ,−i − Vd ||1 < ,
where Vdµ,−i is the value function under density function pµ,−i i i
da (t|s) ≡ pda (t|s)µd (ti |t<i )/pd (ti |t<i ).

Definition 10. A sequence of dynamic programs is nearly memoryless under sampling density µ
if for all  > 0, the number of state variables that are not -dependent under sampling density µ is
O(log(d)) as d → ∞.

Definition 11. A sequence of dynamic programs whose Rust value functions under sampling density

µ are {V̂ d }d∈N Rust -approximates under sampling density µ a sequence of dynamic programs

with value functions {Vd }d∈N if there exists b ∈ b for which supd∈N E(||V̂ d − Vd ||1 ) < .

Assumption 7. The scaled transition density function is Lipschitz continuous in its second argu-
ment: For each d ∈ N and t ∈ [0, 1]d , there exists Lipschitz constant `µd (t) ∈ R+ such that
|pµ µ
maxa∈a supr∈[0,1]d sups∈[0,1]d \r da (t|s)−pda (t|r)|
µ
`d (t)||s−r||2
≤ 1.

Assumption 8. The square integral of the scaled transition density Lipschitz function with re-
spect to measure µd is bounded by a polynomial function of d: There exists Lµ ∈ b such that
supd∈N t∈[0,1]d `µd (t)2 µd (dt)/Lµd ≤ 1.
R

Assumption 9. At the origin, the scaled transition density function is bounded by a polynomial
function of d: there exists M µ ∈ b such that maxa∈a supd∈N supt∈[0,1]d pµda (t|0)/Mdµ ≤ 1.

Assumption 10. The scaled transition density function is bounded by a polynomial function of d:
there exists M µ ∈ b such that maxa∈a supd∈N sups,t∈[0,1]d pµda (t|s)/Mdµ ≤ 1.

2
Proposition 1. Any sequence of dynamic programs that satisfies Assumptions 1, 2, 7, 8, and 9 is
strongly Rust solvable under sampling density µ.

Proposition 2. Any sequence of dynamic programs that satisfies Assumptions 1, 2, and 10 is


weakly Rust solvable under sampling density µ.

Proposition 3. Every sequence of dynamic programs that satisfies Assumptions 1, 2, 7, 8, and 9


can be -approximated by a sequence that satisfies Assumptions 1, 2, and 10, for all  > 0.

Proposition 4. Any sequence of dynamic programs that satisfies Assumptions 1, 2, and 10 is


nearly memoryless under sampling density µ.

Proposition 5. Any sequence of dynamic programs that (i) is weakly Rust solvable under sampling
density µ and (ii) satisfies Assumptions 1 and 2 can be Rust -approximated under sampling density
µ by a sequence of dynamic programs that is nearly memoryless under sampling density µ, for all
 > 0.

Proofs

Proposition 1. Any sequence of dynamic programs that satisfies Assumptions 1, 2, 7, 8, and 9 is


strongly Rust solvable under sampling density µ.

Proof. For d ∈ N, V ∈ Vd , a ∈ a, and b ∈ b define

i µ
Pbd i
i=1 V (md )pda (md |s)
(Γ̂bµ
da V )(s) ≡uda (s) + β Pbd µ ,
i
i=1 pda (md |s)
bd
(Γ̃bµ V (mid )pµda (mid |s)/bd ,
X
da V )(s) ≡uda (s) + β
i=1
bd

gdi V (mid )pµda (mid |s)/ bd ,
X p
and Zda (s) ≡β
i=1

bd
where {gdi }i=1 is a set of independent standard normal random variables. Since E((Γ̃bµ
da V )(s)) =
(Γda V )(s), where Γda is defined in the proof of Proposition 1, Pollard’s (1989) seventh equation
implies that

2 2π  
E ||Γ̃bµ
da V − Γda V || ≤√ E sup Zda bµ
(s)2 . (1)
bd s∈[0,1]d

3
We will now bound the expectation on the right. First, note that

bd
bµ bµ 2 2
|V (mid )(pµda (mid |t) − pµda (mid |s))| /bd
X
E(|Zda (t) − Zda (s)| : md ) =β 2
i=1
 bd
2 X
βKd ||t − s||2
≤ `µd (mid )2 /bd . (2)
1−β
i=1

This expression implies that


v
u bd
βKd uX µ
δdm ≡ t d`d (mid )2 /bd
1−β
i=1
q
bµ bµ 2
≥ sup E(|Zda (t) − Zda (s)| : md ). (3)
s,t∈[0,1]d

f (x)
Now for a given x > 0, divide [0, 1]d into f (x) ≡ dδdm /(2x)ed equally sized cubes, and define {cix }i=1
as the center points of these cubes. By design, these center points satisfy

x(1 − β)
sup min ||s − cix ||2 ≤ qP ,
s∈[0,1]d i∈{1,··· ,f (x)} βKd bd µ i 2
i=1 `d (md ) /bd

which with (2) implies that

bµ bµ i 2
sup min E(|Zda (s) − Zda (cx )| : md ) ≤ x2 . (4)
s∈[0,1]d i∈{1,··· ,f (x)}

Now combining (3) and (4) with Pollard’s (1989) eighth equation yields
s   q
bµ 2 bµ
E sup Zda (s) : md − E(Zda (0)2 : md )
s∈[0,1]d
Z pδdm
≤C log(f (x))dx
0
√ Z 1q
=Cδdm d

log 1/u du
0

=Cδdm dπ/2,

where C ≥ 1 is a universal constant that’s independent of all model parameters. And since x−y ≤ z

4
implies x2 ≤ 2y 2 + 2z 2 for all x, y, z ∈ R, this implies that

    
bµ bµ
E sup Zda (s)2 = E E sup Zda (s)2 : md
s∈[0,1]d s∈[0,1]d
 

≤ E 2 E(Zda (0)2 : md ) + (Cδdm )2 dπ
 βK M µ 2  dβCK 2
d d d
≤2 +π E(`µd (mid )2 )
1−β 1−β
 βK 2
d
≤ (2(Mdµ )2 + πd2 C 2 Lµd ).
1−β

Now combining this with (1) and applying Jensen’s inequality yields for all V ∈ Vd
q
2
E(||Γ̃bµ
da V − Γda V ||) ≤ E ||Γ̃bµ
da V − Γ da V ||
s
2π  bµ

≤ √ E sup Zda (s)2
bd s∈[0,1]d

2πβKd
q
≤ 1/4 (Mdµ )2 + d2 C 2 Lµd . (5)
bd (1 − β)

Next, define constant function ιd ∈ Vd , where ιd (t) = Kd /(1 − β) for all t ∈ [0, 1]d . With this,
(5) yields

bd
!
  Pbd V (mi )pµ (mi |s)
E(||Γ̂bµ Γ̃bµ pµda (mid |s)/bd
X
da Vd − da Vd ||) =E sup 1− β i=1
Pbd µ
d da d
p (mi |s)
s∈[0,1]d i=1 i=1 da d
bd
!
βKd
pµda (mid |s)/bd
X
≤ E sup 1−
1−β s∈[0,1]d i=1

||Γ̃bµ

=E − Γda ιd ||
da ιd
2πβKd
q
≤ 1/4 (Mdµ )2 + d2 C 2 Lµd .
bd (1 − β)

5
Combining this with (5) yields

E ||Γ̂bµ bµ
 
d Vd − Vd || = E ||Γ̂d Vd − Γd Vd ||

E ||Γ̂bµ
X 
≤ da Vd − Γda Vd ||
a∈ a
E ||Γ̂bµ bµ bµ
X  
≤ da Vd − Γ̃da Vd || + E ||Γ̃da Vd − Γda Vd ||
a∈ a
4|a|πβKd
q
≤ 1/4
(Mdµ )2 + d2 C 2 Lµd .
bd (1 − β)

And with this, Rust’s (1997) Lemma 2.2 establishes that

E ||V̂dbµ − Vd || ≤ E ||Γ̂bµ
 
d Vd − Vd || /(1 − β)
4|a|πβKd
q
≤ 1/4 (Mdµ )2 + d2 C 2 Lµd ,
bd (1 − β) 2

which is smaller than  when bd is larger than



a
4| |πβKd
q 4
(Mdµ )2 + d2 C 2 Lµd .
(1−β)2

Proposition 2. Any sequence of dynamic programs that satisfies Assumptions 1, 2, and 10 is


weakly Rust solvable under sampling density µ.

Proof. Assumptions 2 and 10 ensure that |V (t)pµda (t|s)| ≤ Kd Mdµ /(1 − β) for all V ∈ V. Hence, for
V ∈ V, Popoviciu’s inequality implies that Var(V (mid )pµda (mid |s)) ≤ (Kd Mdµ )2 /(1 − β)2 , and thus
that

bd
X  (Kd Mdµ )2
Var V (mid )pµda (mid |s)/bd ≤ .
bd (1 − β)2
i=1

Accordingly, Chebyshev’s inequality establishes, for a given δ > 0 and V ∈ V, that

(Kd Mdµ )2
Pr(ydb (s) = 1) ≤ ,
δ 2 bd (1 − β)2
( b )
d Z
yd (s) ≡1 i µ
X
b i
where V (md )pda (md |s)/bd − Vd (t)pda (t|s)λd (dt) > δ .
i=1 t∈[0,1]d

6
And this implies for V ∈ V that


˜ V )(s) − (Γda V )(s)|)
E(|(Γda
bd Z !
(mid )pµda (mid |s)/bd
X
=β E V − Vd (t)pda (t|s)λd (dt)
i=1 t∈[0,1]d
bd Z !
V (mid )pµda (mid |s)/bd −
X
≤ Pr(ydb (s) = 0) E Vd (t)pda (t|s)λd (dt) : ydb (s) = 0
i=1 t∈[0,1]d
bd Z !
V (mid )pµda (mid |s)/bd −
X
+ Pr(ydb (s) = 1) E Vd (t)pda (t|s)λd (dt) : ydb (s) = 1
i=1 t∈[0,1]d

(Kd Mdµ )2
≤1 · δ + 2 · 2Kd Mdµ /(1 − β)
δ bd (1 − β)2
2(Kd Mdµ )3
=δ + 2 ,
δ bd (1 − β)3

where Γ̃bµ
da is defined in the proof of Proposition 1 and Γda is defined in the proof of Proposition 1.
Hence, for V ∈ V we have

2(Kd Mdµ )3
E(||Γ̃bµ
da V − Γda V ||1 ) ≤ δ + . (6)
δ 2 bd (1 − β)3

And this, in turn, implies that

bd
!
Z   Pbd V (mi )pµ (mi |s)
d
E(||Γ̂bµ Γ̃bµ pµda (mid |s)/bd
X
da Vd − da Vd ||1 ) =E 1− β i=1
Pbd µ
d da d
λd (ds)
p (mi |s)
s∈[0,1]d i=1 i=1 da d
bd
Z !
βKd
pµda (mid |s)/bd λd (ds)
X
≤ E 1−
1−β s∈[0,1]d i=1

||Γ̃bµ

=E da ιd − Γda ιd ||1
2(Kd Mdµ )3
≤δ + (7)
δ 2 bd (1 − β)3

7
where Γ̂bµ
da and ιd are defined in the proof of Proposition 1. Now combining (6) and (7) yields

E ||Γ̂bµ bµ
 
d Vd − Vd ||1 = E ||Γ̂d Vd − Γd Vd ||1

E ||Γ̂bµ
X 
≤ da Vd − Γda Vd ||1
a∈ a
E ||Γ̂bµ bµ bµ
X  
≤ da Vd − Γ̃da Vd ||1 + E ||Γ̃da Vd − Γda Vd ||1
a∈ a
4|a|(Kd Mdµ )3
≤2|a|δ + .
δ 2 bd (1 − β)3

And with this, Rust’s (1997) Lemma 2.2 establishes that

E ||V̂dbµ − Vd ||1 ≤ E ||Γ̂bµ


 
d Vd − Vd ||1 /(1 − β)
4|a|(Kd Mdµ )3
≤2|a|δ/(1 − β) + 2 ,
δ bd (1 − β)4

(1−β) a
8| |(Kd Mdµ )3
which is less than  when δ < a
4| | and bd > δ 2 (1−β)4
.

Proposition 3. Every sequence of dynamic programs that satisfies Assumptions 1, 2, 7, 8, and 9


can be -approximated by a sequence that satisfies Assumptions 1, 2, and 10, for all  > 0.
2
Proof. First, choose  > 0 small enough so that t∈[0,1]d `µd (t)µd (dt) > δd ≡ 2(1−β)
R
aβKd d
√ . Second, set

γd ∈ R such that t∈[0,1]d max(0, `µd (t) − γd )µd (dt) = δd . Third, define probability density function
R

1 `µd (t) ≥ γd

ωdµ (t) ≡R .
r∈[0,1]d 1 `d (r) ≥ γd µd (dr)
 µ

8
Fourth, Jensen’s inequality yields
Z Z
`µd (t)2 µd (dt) 1 `µd (t) ≥ γd `µd (t)2 µd (dt)


t∈[0,1]d t∈[0,1]d
Z Z
ωdµ (t)`µd (t)2 µd (dt) 1 `µd (r) ≥ γd µd (dr)

=
t∈[0,1]d r∈[0,1]d
Z !2 Z
ωdµ (t)`µd (t)µd (dt) 1 `µd (r) ≥ γd µd (dr)


t∈[0,1]d r∈[0,1]d
!2 Z
max(0, `µd (t) − γd )µd (dt)
R
t∈[0,1]d
1 `µd (r) ≥ γd µd (dr)

= γd + R
r∈[0,1]d 1 `d (r) ≥ γd µd (dr)
 µ
r∈[0,1]d
!2 Z
δ
 µ d 1 `µd (r) ≥ γd µd (dr)

= γd + R
r∈[0,1]d 1 `d (r) ≥ γd µd (dr) r∈[0,1]d

≥ min (γd + δd /y)2 y


y∈ R+
=2γd δd ,

which with Assumption 8 implies

γd ≤ Lµd /(2δd ).

Fifth, define probability density function


Z
pµda (t|s) ≡pµda (t|s) +1− pµda (r|s)µd (dr),
r∈[0,1]d

pµda (t|s) ≡ max pµda (t|0) + dγd , pµda (t|s) .

where

This density function satisfies Assumption 10, since

pµda (t|s) ≤pµda (t|s) + 1



≤pµda (t|0) + dγd + 1

≤Mdµ + dLµd /(2δd ) + 1
aβKd Lµd d
=Mdµ + + 1,
(1 − β)2

9
which is polynomially bounded. Also, by design we have
Z
|pµda (t|s) − pµda (t|s)|µd (dt)
t∈[0,1]d
Z Z
= pµda (t|s) +1− pµda (r|s)µd (dr) − pµda (t|s) µd (dt)
t∈[0,1]d r∈[0,1]d
Z Z
pµda (t|s) pµda (r|s) − pµda (r|s) µd (dr) − pµda (t|s) µd (dt)

= +
t∈[0,1]d r∈[0,1]d
Z Z
≤ |pµda (t|s) − pµda (t|s)|µd (dt) + |pµda (r|s) − pµda (r|s)|µd (dr)
t∈[0,1]d r∈[0,1]d
Z
=2 |pµda (t|s) − pµda (t|s)|µd (dt)
t∈[0,1]d
Z √
max 0, `µd (t)||t|| −

≤2 dγd µd (dt)
t∈[0,1]d
√ Z
max 0, `µd (t) − γd ) µd (dt)

≤2 d
t∈[0,1]d

=2 dδd . (8)

Now let Γµd represent the Bellman operator under density function pµda . With (8), we find that this
operator satisfies

||Γµd Vd − Vd || =||Γµd Vd − Γd Vd ||
Z
Vd (t)(pµda (t|s) − pµda (t|s))µd (dt)
X
≤ sup β
a∈ a s∈[0,1]d
Z
t∈[0,1]d
βKd X
≤ sup |pµ (t|s) − pµda (t|s)|µd (dt)
1 − β a∈a s∈[0,1]d t∈[0,1]d da
βKd X √
≤ sup 2 dδd
1 − β a∈a s∈[0,1]d

2aβδd Kd d
=
1−β
=(1 − β).

And with this, Rust’s (1997) Lemma 2.2 establishes that ||V µd − Vd || ≤ , where V µd is the fixed
point of Γµd .

Proposition 4. Any sequence of dynamic programs that satisfies Assumptions 1, 2, and 10 is


nearly memoryless under sampling density µ.

10
q
Proof. Pinsker’s inequality establishes that ||λ1 − pida (·|s, t<i )||1 ≤ 2κ(pida (·|s, t<i ), λ1 ). And this
implies that
Z
p−i
da (t<i , t≥i |s) − pda (t<i , t≥i |s) λd−i+1 (dt≥i )
t≥i ∈[0,1]d−i+1
Z Z
= p−i
da (t<i , ti , t≥i+1 |s) − pda (t<i , ti t≥i+1 |s) λd−i (dt≥i+1 )λ1 (dti )
t ∈[0,1] t≥i+1 ∈[0,1]d−i
Zi Z
= 1− pida (ti |s, t<i ) p−i
da (t<i , ti , t≥i+1 |s)λd−i (dt≥i+1 )λ1 (dti )
ti ∈[0,1] t≥i+1 ∈[0,1]d−i
R
Z
t≥i+1 ∈[0,1]d−i pda (t<i , ti , t≥i+1 |s)λd−i (dt≥i+1 )
= 1 − pida (ti |s, t<i ) λ1 (dti )
ti ∈[0,1] pida (ti |s, t<i )
p<i+1 (t<i , ti )
Z
= 1 − pida (ti |s, t<i ) di λ1 (dti )
ti ∈[0,1] pda (ti |s, t<i )
Z
<i
=pd (t<i ) 1 − pida (ti |s, t<i ) λ1 (dti )
ti ∈[0,1]

=p<i i
d (t<i )||λ1 − pda (·|s, t<i )||1
q
≤p<i
d (t<i ) 2κ(pida (·|s, t<i ), λ1 ).

And with this, Jensen’s inequality yields


Z
||p−i
da (·|s) − pda (·|s)||1 = p−i
da (t|s) − pda (t|s) λd (dt)
t∈[0,1]d
Z Z
= p−i
da (t<i , t≥i |s) − pda (t<i , t≥i |s) λd−i+1 (dt≥i )λi−1 (dt<i )
t ∈[0,1]i−1 t≥i ∈[0,1]d−i+1
Z <i q
≤ p<i
d (t<i ) 2κ(pida (·|s, t<i ), λ1 )λi−1 (dt<i )
t<i ∈[0,1]i−1
s Z
≤ 2 p<i i
d (t<i )κ(pda (·|s, t<i ), λ1 )λi−1 (dt<i )
t<i ∈[0,1]i−1
q
= 2 E κ(pida (·|s, t<i ), λ1 ) .


N such that
Pd
κ(pida (·|s, t<i ), λ1 ) <

Next, Lemma 1 implies that there exists m, n ∈ i=1 E
m + n log(d), for all d ∈ N. Since κ(pida (·|s, t<i ), λ1 ) ≥ 0, this implies that for a given γ >

q
0 the inequality 2 E κ(pida (·|s, t<i ), λ1 ) ≤ γ holds for at least d − 2(m + n log(d))/γ 2 values


of i. And with the result above, this implies that ||p−i


da (·|s) − pda (·|s)||1 < γ holds for at least
d − 2(m + n log(d))/γ 2 values of i. Now define Ωida = {s ∈ [0, 1]d : ||p−i
da (·|s) − pda (·|s)||1 < γ} as the
set points that satisfy this inequality for a given i ∈ {1, · · · , d}. The Lebesgue measures of these

11
sets satisfy

d d Z
1 s ∈ Ωida λd (ds)
X X
λd (Ωida )

=
i=1 i=1 s∈[0,1]d
Z d
1 s ∈ Ωida λd (ds)
X 
=
s∈[0,1]d i=1
Z
d − 2(m + n log(d))/γ 2 λd (ds)


s∈[0,1]d

=d − 2(m + n log(d))/γ 2 .

Since λd (Ωida ) ≤ 1 this implies that

λd (Ωida ) ≥ 1 − δ (9)

holds for at least d − 2(m + n log(d))/(δγ 2 ) values of i. Thus, for i that satisfies (10), we have

Z Z
||Γ−i
da Vd − Γda Vd ||1 = uda (s) + β V (t)p−i
da (t|s)λd (dt)
s∈[0,1]d t∈[0,1]d
Z
− uda (s) − β V (t)pda (t|s)λd (dt) λd (ds)
t∈[0,1]d
Z Z
=β Vd (t)(p−i
da (t|s) − pda (t|s))λd (dt) λd (ds)
s∈[0,1]d t∈[0,1]d
Z
≤β||Vd || ||p−i
da (·|s) − pda (·|s)||1 λd (ds)
s∈[0,1]d
Z
≤β||Vd || ||p−i
da (·|s) − pda (·|s)||1 λd (ds)
s∈Ωida
Z !
||p−i

+ da (·|s)||1 + ||pda (·|s)||1 λd (ds)
s∈[0,1]d \Ωida
Z Z !
≤β||Vd || γλd (ds) + 2λd (ds)
s∈Ωida s∈[0,1]d \Ωida

=β||Vd || γλd (Ωida ) + 2(1 − λd (Ωida ))




≤β(Kd /(1 − β))(γ + 2δ),

where Γda is the action-a Bellman operator defined in the proof of Proposition 1, and Γ−i
da is the

12
analogous operator under p−i
da . Thus, for i that satisfies (10), we have

||Γ−i −i
d Vd − Vd ||1 =||Γd Vd − Γd Vd ||1
X
≤ ||Γ−i
da Vd − Γda Vd ||1
a∈ a
≤|a|β(Kd /(1 − β))(γ + 2δ),

where Γ−i −i
d is the analog of Γd under pd . And with this, Rust’s (1997) Lemma 2.2 implies that
||Vd−i − Vd ||1 < |a|βKd (γ+2δ)/(1−β)2 holds for i that satisfies (10). Hence, setting γ = δ = 4| (1−β) 2

3a|βKd
,
|a|βKd

we find that ||Vd−i − Vd ||1 <  holds for at least d − 2(m + n log(d))/(δγ 2 ) = d − 128 (1−β) 2 (m +
n log(d)) values of i.

Lemma 1. Assumption 10 implies that there exist m, n ∈ N such that


Pd
E(κ(pid (·|t<i ),µid (·|t<i ))
maxa∈a supd∈N sups∈[0,1]d i=1
m+n log(d) < 1.

Proof. The following implies the result:

 
sup pµda (t|s) = exp sup log(pµda (t|s))
t∈[0,1]d t∈[0,1]d
Z 
≥ exp log(pµda (t|s))pda (t|s)λd (dt)
t∈[0,1]d
d Z
X  pi (t |s, t )  
da i <i
= exp log p da (t|s)λ d (dt)
i=1 t∈[0,1]
d µid (ti |t<i )
Xd

= exp E κ(pida (·|s, t<i ), µid (·|t<i )) .
i=1

Proposition 4. Any sequence of dynamic programs that satisfies Assumptions 1, 2, and 10 is


nearly memoryless under sampling density µ.
q
Proof. Pinsker’s inequality establishes that ||pida (·|s, t<i ) − µid (·|t<i )||1 ≤ 2κ(pida (·|s, t<i ), µid (·|t<i )).

13
And this implies that
Z
pµ,−i
da (t<i , t≥i |s) − pda (t<i , t≥i |s) λd−i+1 (dt≥i )
t≥i ∈[0,1]d−i+1
Z
= µid (ti |t<i )/pida (ti |s, t<i ) − 1 pda (t<i , ti , t≥i+1 |s)λd−i+1 (dt≥i )
t≥i ∈[0,1]d−i+1
Z
= µid (ti |t<i )/pida (ti |s, t<i ) − 1 p<i+1
d (t<i , ti )λ1 (dti )
ti ∈[0,1]
Z
=p<i
d (t<i ) µid (ti |t<i ) − pida (ti |s, t<i ) λ1 (dti )
ti ∈[0,1]

=p<i i
d (t<i )||µd (·|t<i ) − pida (·|s, t<i )||1
q
≤p<i
d (t<i ) 2κ(pida (·|s, t<i ), µid (·|t<i )).

And with this, Jensen’s inequality yields


Z
||pµ,−i
da (·|s) − pda (·|s)||1 = pµ,−i
da (t|s) − pda (t|s) λd (dt)
t∈[0,1]d
Z q
≤ p<i
d (t<i ) 2κ(pida (·|s, t<i ), µid (·|t<i ))λi−1 (dt<i )
t<i ∈[0,1]i−1
s Z
≤ 2 p<i i i
d (t<i )κ(pda (·|s, t<i ), µd (·|t<i ))λi−1 (dt<i )
t<i ∈[0,1]i−1
q
= 2 E κ(pida (·|s, t<i ), µid (·|t<i )) .


Next, Lemma 1 implies that there exists m, n ∈ N such that


Pd
κ(pida (·|s, t<i ), µid (·|t<i )) <

i=1 E
m + n log(d), for all d ∈ N. Since κ(pida (·|s, t<i ), µid (·|t<i )) ≥ 0, this implies that for a given γ > 0
q
the inequality 2 E κ(pida (·|s, t<i ), µid (·|t<i )) ≤ γ holds for at least d − 2(m + n log(d))/γ 2 values


of i. And with the result above, this implies that ||pµ,−i


da (·|s) − pda (·|s)||1 < γ holds for at least
d − 2(m + n log(d))/γ 2 values of i. Now define Ωµ,i d µ,−i
da = {s ∈ [0, 1] : ||pda (·|s) − pda (·|s)||1 < γ}
as the set points that satisfy this inequality for a given i ∈ {1, · · · , d}. The Lebesgue measures of

14
these sets satisfy

d d Z
λd (Ωµ,i 1 s ∈ Ωµ,i
X X 
da ) = da λd (ds)
i=1 i=1 s∈[0,1]d
Z d
1 s ∈ Ωµ,i
X 
= da λd (ds)
s∈[0,1]d i=1
Z
d − 2(m + n log(d))/γ 2 λd (ds)


s∈[0,1]d

=d − 2(m + n log(d))/γ 2 .

Since λd (Ωµ,i
da ) ≤ 1 this implies that

λd (Ωµ,i
da ) ≥ 1 − δ (10)

holds for at least d − 2(m + n log(d))/(δγ 2 ) values of i. Thus, for i that satisfies (10), we have

Z Z
||Γµ,−i
da Vd − Γda Vd ||1 = uda (s) + β V (t)pµ,−i
da (t|s)λd (dt)
s∈[0,1]d t∈[0,1]d
Z
− uda (s) − β V (t)pda (t|s)λd (dt) λd (ds)
t∈[0,1]d
Z Z
=β Vd (t)(pµ,−i
da (t|s) − pda (t|s))λd (dt) λd (ds)
s∈[0,1]d t∈[0,1]d
Z
≤β||Vd || ||pµ,−i
da (·|s) − pda (·|s)||1 λd (ds)
s∈[0,1]d
Z
≤β||Vd || ||pµ,−i
da (·|s) − pda (·|s)||1 λd (ds)
s∈Ωµ,i
da
Z !
||pµ,−i

+ da (·|s)||1 + ||pda (·|s)||1 λd (ds)
s∈[0,1]d \Ωµ,i
da
Z Z !
≤β||Vd || γλd (ds) + 2λd (ds)
s∈Ωµ,i
da s∈[0,1]d \Ωµ,i
da

=β||Vd || γλd (Ωµ,i


da ) + 2(1 − λd (Ωµ,i 
da ))

≤β(Kd /(1 − β))(γ + 2δ),

where Γda is the action-a Bellman operator defined in the proof of Proposition 1, and Γµ,−i
da is the

15
analogous operator under pµ,−i
da . Thus, for i that satisfies (10), we have

||Γµ,−i
d Vd − Vd ||1 =||Γµ,−i
d Vd − Γd Vd ||1
X µ,−i
≤ ||Γda Vd − Γda Vd ||1
a∈ a
≤|a|β(Kd /(1 − β))(γ + 2δ),

where Γµ,−i
d is the analog of Γd under pµ,−i
d . And with this, Rust’s (1997) Lemma 2.2 implies
that ||Vdµ,−i − Vd ||1 < |a|βKd (γ + 2δ)/(1 − β)2 holds for i that satisfies (10). Hence, setting γ =
(1−β) 2 µ,−i
δ = 4| a |βKd , we find that ||Vd − Vd ||1 <  holds for at least d − 2(m + n log(d))/(δγ 2 ) = d −
|a|βKd
 3
128 (1−β) 2 (m + n log(d)) values of i.

Proposition 5. Any sequence of dynamic programs that (i) is weakly Rust solvable under sampling
density µ and (ii) satisfies Assumptions 1 and 2 can be Rust -approximated under sampling density
µ by a sequence of dynamic programs that is nearly memoryless under sampling density µ, for all
 > 0.

Proof. Define Ξµd as a general set with measure b−2 µd (dt) = b−2
R
d under sampling density µ: t∈Ξµ
d
d .
And use this to define the following probability transition density function:

pµb (t|s) ≡1 pµda (t|s) ≤ b2d pµda (t|s) + b2d 1 t ∈ Ξµd pµda (s),
 
da
Z
where pµda (s) ≡ 1 pµda (r|s) > b2d pµda (r|s)µd (dr).

r∈[0,1]d

This density function funnels all the mass that exceeds to b2d into Ξµd . Since it never exceeds 2b2d , this
density function satisfies Assumption 10. So Proposition 4 establishes that this density function
corresponds with a sequence of dynamic programs that’s nearly memoryless under sampling density
µ (given Assumptions 1 and 2).
Define the following as the set of points for which the new density function equals the old density
function:

d (s) ≡ t ∈ [0, 1] : pda (t|s) = pda (t|s) ∀ a ∈ a .


Ωµb
 d µb µ

16
The measure of this set under sampling density µ satisfies
Z Z XZ
1 pµda (r|s) > b2d µd (dt)
 
µd (dt) ≥1 − µd (dt) −
t∈Ωµ
d (s) t∈Ξµ
d a∈ a t∈[0,1]d
XZ
≥1 − b−2 −2
1 pµda (r|s) > b2d pµda (r|s)µd (dt)

d − bd
a∈ a Zt∈[0,1]d
pµda (r|s)µd (dt)
X
≥1 − b−2 −2
d − bd
a∈ a t∈[0,1]d

≥1 − (1 + |a|)/b2d .

The probability that this set contains all mid values satisfies
Z bd
 
Pr ∪bi=1
d
mid ⊂ Ωbd (s) = µd (dt)
t∈Ωµ
d (s)

≥(1 − (1 + |a|)/b2d )bd

= exp bd log(1 − (1 + |a|)/b2d )




= exp bd (−(1 + |a|)/b2d − (1 + |a|)2 /(2b4d ) − (1 + |a|)3 /(3b6d ) − · · · )




> exp(−(1 + |a|)/bd )

>1 − (1 + |a|)/bd . (11)

µb
Next, define Γ̂d as Rust’s random Bellman operator evaluated under density function pµb
da . Since
V̂dµb ∈ Vd , we have

µb
|(Γ̂d V̂dµb )(s) − (Γ̂µb µb
d V̂d )(s)| ≤ βKd /(1 − β). (12)

And if ∪bi=1
d
mid ⊂ Ωbd (s) then

µb
|(Γ̂d V̂dµb )(s) − (Γ̂µb µb
d V̂d )(s)| = 0. (13)

17
Now combining (11)–(13) yields

µb µb
E ||Γ̂d V̂dµb − V̂dµb ||1 = E ||Γ̂d V̂dµb − Γ̂µb µb 

d V̂d ||1
Z
µb
E |(Γ̂d V̂dµb )(s) − (Γ̂µb µb 
= d V̂d )(s)| λd (ds)
s∈[0,1]d
Z
βKd /(1 − β) 1 − Pr ∪µb i µb 
≤ i=1 md ⊂ Ωd (s) λd (ds)
d

s∈[0,1]d
βKd (1 + |a|)
Z
< λd (ds)
s∈[0,1]d bd (1 − β)
βKd (1 + |a|)
= .
bd (1 − β)

And with this, we can use Rust’s (1997) Lemma 2.2 to establishes that

µb µb βKd (1 + |a|)
E ||V̂ d − V̂dµb ||1 ≤ E ||Γ̂d V̂dµb − V̂dµb ||1 /(1 − β) <
 
. (14)
bd (1 − β)2

Now for a given  > 0 choose b ∈ b such that a


βKd (1+| |)
< /2 and E(||V̂dµb − Vd ||1 ) < /2. And
bd (1−β)2
µb µb µb 
E(||V̂dµb − Vd ||1 ) < .

with this, (14) yields E ||V̂ d − Vd ||1 ≤ E ||V̂ d − V̂d ||1 +

References
Bray, Robert L. 2021. A Comment on ”Using Randomization to Break the Curse of Dimensionality”.
Working Paper .

Pollard, David. 1989. Asymptotics via empirical processes. Statistical Science 4(4) 341–354.

Rust, John. 1997. Using randomization to break the curse of dimensionality. Econometrica 65(3)
487–516.

18

You might also like