
Lecture 9: Property Testing for Distributions

1
Problem Statement
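• Let $\mathcal{P}$ be a family of distributions on a domain $\Omega$
• Given an unknown distribution $p$, distinguish whether:
• $p \in \mathcal{P}$, or
• $p$ is $\varepsilon$-far from $\mathcal{P}$ (every $q \in \mathcal{P}$ is at distance at least $\varepsilon$ from $p$).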

2
Access to the Distribution
• Oracle access: i.i.d. samples $x_1, \dots, x_m \sim p$.

3
Distance Between Distributions
Statistical distance (total variation distance):
For distributions $p, q$ on $\Omega$: $\Delta(p, q) = \max_{A \subseteq \Omega} |p(A) - q(A)|$

4
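Statistical Distance vs. $\ell_1$ Distance
• A distribution is a vector, $p \in [0,1]^{\Omega}$
• Distance between vectors: $\|p - q\|_1 = \sum_{x \in \Omega} |p(x) - q(x)|$

• Claim: $\Delta(p, q) = \frac{1}{2} \|p - q\|_1$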
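
The claim can be checked numerically on a small domain. The sketch below is illustrative only: it assumes distributions are given as Python lists over a domain $\{0, \dots, n-1\}$, and the helper names `tv_by_events` and `half_l1` are hypothetical. It brute-forces the maximum over all events and compares it with half the $\ell_1$ distance.

```python
from itertools import combinations

def tv_by_events(p, q):
    """Max over all events A of |p(A) - q(A)| (brute force, small domains only)."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for A in combinations(range(n), r):
            gap = abs(sum(p[x] for x in A) - sum(q[x] for x in A))
            best = max(best, gap)
    return best

def half_l1(p, q):
    """Half of the l1 distance between the probability vectors p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Example distributions (chosen for illustration, not taken from the lecture).
p = [0.5, 0.25, 0.25, 0.0]
q = [0.25, 0.25, 0.25, 0.25]
print(tv_by_events(p, q), half_l1(p, q))
```

The brute force enumerates all $2^n$ events, so it is only meant as a sanity check for tiny $n$.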

5
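Statistical Distance vs. $\ell_1$ Distance

$\Delta(p, q) = \max_{A \subseteq \Omega} |p(A) - q(A)| \ge \frac{1}{2} \|p - q\|_1$, where $\|p - q\|_1 = \sum_{x \in \Omega} |p(x) - q(x)|$

[Figure: the graphs of $p$ and $q$ over $\Omega$]
What event $A$ maximizes $|p(A) - q(A)|$?
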
6
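Statistical Distance vs. $\ell_1$ Distance

$\Delta(p, q) = \max_{A \subseteq \Omega} |p(A) - q(A)| \ge \frac{1}{2} \|p - q\|_1$, where $\|p - q\|_1 = \sum_{x \in \Omega} |p(x) - q(x)|$

Let $A^* = \{x \in \Omega : p(x) \ge q(x)\}$.

[Figure: the graphs of $p$ and $q$ over $\Omega$, vertical axis from 0 to 1]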
7
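Statistical Distance vs. $\ell_1$ Distance

$\Delta(p, q) = \max_{A \subseteq \Omega} |p(A) - q(A)| \le \frac{1}{2} \|p - q\|_1$, where $\|p - q\|_1 = \sum_{x \in \Omega} |p(x) - q(x)|$

Let $A^* = \{x \in \Omega : p(x) \ge q(x)\}$.
Let $A$ be any event.
Then: $|p(A) - q(A)| \le p(A^*) - q(A^*) = \frac{1}{2} \|p - q\|_1$

[Figure: the graphs of $p$ and $q$ over $\Omega$, vertical axis from 0 to 1]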
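
Both directions can be written out in a few lines. The derivation below is a sketch of the standard argument, with $A^* = \{x \in \Omega : p(x) \ge q(x)\}$ as the assumed choice of maximizing event; the slides' exact wording may differ.

```latex
% Since p and q both sum to 1, the positive and negative parts of p - q balance:
\sum_{x \in A^*} \bigl(p(x) - q(x)\bigr)
  = \sum_{x \notin A^*} \bigl(q(x) - p(x)\bigr)
  = \tfrac{1}{2} \sum_{x \in \Omega} |p(x) - q(x)|
  = \tfrac{1}{2} \|p - q\|_1 .

% Lower bound: the event A^* already achieves half the l1 distance.
\Delta(p, q) \ge p(A^*) - q(A^*) = \tfrac{1}{2} \|p - q\|_1 .

% Upper bound: for any event A, dropping the negative terms and adding the
% remaining nonnegative ones can only increase the sum.
p(A) - q(A) = \sum_{x \in A} \bigl(p(x) - q(x)\bigr)
  \le \sum_{x \in A \cap A^*} \bigl(p(x) - q(x)\bigr)
  \le \sum_{x \in A^*} \bigl(p(x) - q(x)\bigr)
  = \tfrac{1}{2} \|p - q\|_1 .
```

The bound on $q(A) - p(A)$ follows in the same way with the complement of $A^*$ in place of $A^*$.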
8
Why Test Properties of Distributions?
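• Suppose $\mathcal{A}$ is designed assuming $p \in \mathcal{P}$
• Example:
• Randomness: uniformly random
• Noise: Gaussian
• When $p \in \mathcal{P}$: guaranteed that $\mathcal{A}$ errs with small probability
• What if we’re not sure?
1. Test whether $p \in \mathcal{P}$ or $p$ is $\varepsilon$-far from $\mathcal{P}$
2. If tester says “$p$ is $\varepsilon$-far from $\mathcal{P}$”: abort
3. If tester says “$p \in \mathcal{P}$”: run $\mathcal{A}$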

9
Testing Uniformity
• Question: is $p$ uniform on $[n]$, or $\varepsilon$-far from uniform?

• Strategy:
1. Take $m$ samples $x_1, \dots, x_m$
2. Count collisions: how many pairs $i < j$ s.t. $x_i = x_j$?
3. Few collisions $\to$ accept, lots of collisions $\to$ reject
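
The strategy above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's exact tester: the acceptance threshold $(1 + 2\varepsilon^2)/n$ on the fraction of colliding pairs is an assumption of this sketch, and the function names are hypothetical.

```python
import random
from collections import Counter

def count_collisions(samples):
    """Number of index pairs i < j with samples[i] == samples[j]."""
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())

def collision_tester(samples, n, eps):
    """Return True (accept) iff the fraction of colliding pairs is at most
    (1 + 2*eps**2) / n. This threshold is an assumption of the sketch."""
    m = len(samples)
    pairs = m * (m - 1) // 2
    return count_collisions(samples) <= pairs * (1 + 2 * eps**2) / n

rng = random.Random(0)  # fixed seed so the demo is repeatable
n, m, eps = 100, 2000, 0.5
uniform = [rng.randrange(n) for _ in range(m)]
# Uniform on half of the domain: statistical distance 1/2 from uniform on [n].
half = [rng.randrange(n // 2) for _ in range(m)]
print(collision_tester(uniform, n, eps), collision_tester(half, n, eps))
```

Counting via a histogram takes time linear in $m$, rather than examining all $\binom{m}{2}$ pairs explicitly.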

10
Basic Observation
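• The uniform distribution minimizes the collision probability: $\Pr_{x, y \sim p}[x = y] = \sum_{x \in [n]} p(x)^2 \ge \frac{1}{n}$

[Figure: two histograms with vertical axes from 0 to 1, labelled “Uniform” and “Far From Uniform”]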

11
Collision Probability

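• Lemma: $\Pr_{x, y \sim p}[x = y] = \sum_{x \in [n]} p(x)^2 = \frac{1}{n} + \sum_{x \in [n]} \left( p(x) - \frac{1}{n} \right)^2$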

12
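Collision Probability
• Lemma: $\Pr_{x, y \sim p}[x = y] = \sum_{x \in [n]} p(x)^2 = \frac{1}{n} + \sum_{x \in [n]} \left( p(x) - \frac{1}{n} \right)^2$

• Cauchy-Schwarz: $\sum_{x \in [n]} \left( p(x) - \frac{1}{n} \right)^2 \ge \frac{1}{n} \left[ \sum_{x \in [n]} \left| p(x) - \frac{1}{n} \right| \right]^2$

• Corollary: if $p$ is $\varepsilon$-far from uniform, then $\Pr_{x, y \sim p}[x = y] \ge \frac{1 + 4\varepsilon^2}{n}$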
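
As a sanity check, the identity behind the lemma and the Cauchy-Schwarz step can be verified with exact rational arithmetic. The distribution below is an arbitrary example chosen for this sketch, and the helper names are hypothetical.

```python
from fractions import Fraction

def collision_probability(p):
    """Pr[x = y] for independent x, y ~ p, i.e. the sum of squared probabilities."""
    return sum(px * px for px in p)

def l2_sq_from_uniform(p):
    """Squared l2 distance between p and the uniform distribution."""
    n = len(p)
    return sum((px - Fraction(1, n)) ** 2 for px in p)

def l1_from_uniform(p):
    """l1 distance between p and the uniform distribution."""
    n = len(p)
    return sum(abs(px - Fraction(1, n)) for px in p)

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
n = len(p)
eps = l1_from_uniform(p) / 2  # statistical distance of p from uniform

print("collision probability:", collision_probability(p))
print("1/n + squared l2 distance:", Fraction(1, n) + l2_sq_from_uniform(p))
print("Cauchy-Schwarz lower bound on l2^2:", l1_from_uniform(p) ** 2 / n)
print("(1 + 4 eps^2)/n:", (1 + 4 * eps**2) / n)
```

The first two printed values coincide, and the collision probability is at least the last value, as the corollary (as reconstructed here) predicts.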

13
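Analyzing the Tester
Threshold: accept iff the fraction of colliding pairs is at most a threshold lying between the two values below
• Let $X_{ij}$ indicate $x_i = x_j$
• If $p$ is uniform: $\mathbb{E}[X_{ij}] = \frac{1}{n}$

• If $p$ is $\varepsilon$-far from uniform: $\mathbb{E}[X_{ij}] \ge \frac{1 + 4\varepsilon^2}{n}$
[Figure: number line marking the “uniform” and “$\varepsilon$-far” values, with the threshold between them]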

14
Concentration Bound
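Want: the number of collisions to be close to its expectation with high probability
• Let $X = \sum_{i < j} X_{ij}$ be the number of collisions
• The $X_{ij}$ are not independent!
• Claim: if $m$ is large enough, then $X$ is close to $\mathbb{E}[X]$ with high probability

Chebyshev: $\Pr\left[ |X - \mathbb{E}[X]| \ge t \right] \le \frac{\mathrm{Var}(X)}{t^2}$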

15
Bounding the Variance
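• For a single indicator: $\mathrm{Var}(X_{ij}) \le \mathbb{E}[X_{ij}^2] = \mathbb{E}[X_{ij}]$

• Independence?
$X_{ij}$ and $X_{k\ell}$ are independent if $\{i, j\} \cap \{k, \ell\} = \emptyset$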
• What’s left?

16
The Contribution of Triplets

• Triple collision: $\Pr[x_i = x_j = x_k] = \sum_{x \in [n]} p(x)^3$

17
Bounding the Variance
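$\mathrm{Var}(X) = \underbrace{\sum_{i < j} \mathrm{Var}(X_{ij})}_{\text{pairs}} + \underbrace{\sum_{\{i, j\} \ne \{k, \ell\}} \mathrm{Cov}(X_{ij}, X_{k\ell})}_{\text{triplets + quadruplets}}$

Covariance terms: $\le \mu$ for triplets, $0$ for quadruplets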
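
The decomposition can be checked exactly on a toy instance. The sketch below is an illustration under stated assumptions: it enumerates all $n^m$ sample tuples for a small example distribution and compares the brute-force variance with the pairs-plus-triplets formula, using $\mathrm{Cov}(X_{ij}, X_{jk}) = \sum_x p(x)^3 - \left(\sum_x p(x)^2\right)^2$ and the count $6\binom{m}{3}$ of ordered pairs of index pairs sharing exactly one index. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

def exact_moments(p, m):
    """Exact mean and variance of the collision count, by enumerating all tuples."""
    n = len(p)
    mean = Fraction(0)
    second = Fraction(0)
    for xs in product(range(n), repeat=m):
        prob = Fraction(1)
        for x in xs:
            prob *= p[x]
        c = sum(1 for i, j in combinations(range(m), 2) if xs[i] == xs[j])
        mean += prob * c
        second += prob * c * c
    return mean, second - mean * mean

def variance_by_decomposition(p, m):
    """Pairs term plus triplet covariances; disjoint index pairs contribute 0."""
    s2 = sum(px**2 for px in p)
    s3 = sum(px**3 for px in p)
    return comb(m, 2) * (s2 - s2**2) + 6 * comb(m, 3) * (s3 - s2**2)

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # example distribution
m = 4
mean, var = exact_moments(p, m)
print(mean, var, variance_by_decomposition(p, m))
```

Exact fractions avoid any floating-point doubt about whether the two variance computations agree.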

18
How Small Is ?
Threshold:
Accept iff

[Figure: number line marking the “uniform” and “$\varepsilon$-far” cases]

19
How Small Is ?
• Prevent flip from “no” to “yes”:

• Prevent flip from “yes” to “no”:

• Setting we get both:

20
Overall Sample Complexity

• Can be improved to

21
Testing Identity to any Fixed Distribution

22
Identity Testing
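• Fix a known distribution $q$
• To test whether $p = q$ or $\Delta(p, q) \ge \varepsilon$:
1. “Discretize” the distributions $p$ and $q$
2. Check whether the discretized distributions are equal, by reduction to uniformity testing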

23
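Identity Testing for Discretized Distributions
• Idea: “flatten” the distribution
• If every $q(x)$ is an integer multiple $k_x$ of a common unit: map $x$ to $(x, j)$, where $j$ is uniform in $\{1, \dots, k_x\}$

[Figure: left, a distribution on $\{a, b\}$; right, its flattened version on $(a,1), (a,2), (a,3), (b,1), (b,2)$; vertical axes from 0 to 1 in steps of 1/5]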
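
A small sketch of the flattening map follows. The values $q(a) = 3/5$ and $q(b) = 2/5$ are an assumption chosen to match the five flattened cells in the figure, the distribution `p` is an arbitrary example, and all function names are hypothetical. The sketch also compares statistical distances before and after flattening.

```python
import random
from fractions import Fraction

def flatten_sample(x, k, rng):
    """Map a sample x to (x, j) with j uniform in {1, ..., k[x]}."""
    return (x, rng.randint(1, k[x]))

def flattened_distribution(p, k):
    """Exact distribution of flatten_sample(x, k, rng) when x ~ p."""
    return {(x, j): p[x] / k[x] for x in k for j in range(1, k[x] + 1)}

def tv(p, q):
    """Statistical distance between two distributions given as dicts."""
    return sum(abs(p[z] - q[z]) for z in p) / 2

k = {"a": 3, "b": 2}                               # q(x) = k[x] * (1/5), assumed
q = {"a": Fraction(3, 5), "b": Fraction(2, 5)}
p = {"a": Fraction(1, 5), "b": Fraction(4, 5)}     # example unknown distribution

flat_q = flattened_distribution(q, k)
flat_p = flattened_distribution(p, k)
print(sorted(flat_q.items()))
print(tv(p, q), tv(flat_p, flat_q))
print(flatten_sample("a", k, random.Random(1)))
```

In this example the flattened $q$ is uniform on the five cells, which is what lets identity testing be reduced to uniformity testing.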
24
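Identity Testing for Discretized Distributions
• Map $x$ to $(x, j)$, where $j$ is uniform in $\{1, \dots, k_x\}$ and $q(x)$ is $k_x$ times the common unit
• What does the map do to $p$?

[Figure: the flattened domain $(a,1), (a,2), (a,3), (b,1), (b,2)$]

Note: $p$ doesn’t need to be discretized!
Sample complexity?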
26
Discretizing the Distribution
• Idea: round probabilities to multiples of $\gamma / n$
• How?

[Figure: a distribution on $\Omega$ before and after rounding, vertical axes from 0 to 1]
27
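Discretizing the Distribution
• Idea: round probabilities to multiples of $\gamma / n$
• How?
Attempt #1: given $i \sim q$,
Let $j = \lfloor q(i) / (\gamma / n) \rfloor$
W.p. $\frac{j (\gamma / n)}{q(i)}$: output $i$
Else: output $\bot$

$\Pr_{F(q)}[i] = q(i) \cdot \frac{j (\gamma / n)}{q(i)} = j \left( \frac{\gamma}{n} \right)$

[Figure: $q$ on $\Omega$ (left) and the filtered distribution $F(q)$ on $\Omega \cup \{\bot\}$ (right), vertical axes from 0 to 1]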
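
The rounding filter of Attempt #1 can be sketched as follows. This is an illustration under assumptions: elements are indexed from 0 rather than 1, `None` stands in for $\bot$, the example $q$ and $\gamma$ are arbitrary, and the function names are hypothetical.

```python
import random
from fractions import Fraction
from math import floor

def rounded_distribution(q, gamma):
    """Exact output distribution of the filter on i ~ q; None plays the role of ⊥."""
    n = len(q)
    unit = gamma / n                       # probabilities become multiples of gamma/n
    out = {i: floor(qi / unit) * unit for i, qi in enumerate(q)}
    out[None] = 1 - sum(out.values())      # all rounded-off mass goes to ⊥
    return out

def filter_sample(i, q, gamma, rng):
    """Apply the filter to one sample i: keep it w.p. j*(gamma/n)/q(i), else ⊥."""
    unit = gamma / len(q)
    j = floor(q[i] / unit)
    return i if rng.random() < j * unit / q[i] else None

q = [Fraction(7, 10), Fraction(1, 4), Fraction(1, 20)]  # example distribution
gamma = Fraction(1, 2)
print(rounded_distribution(q, gamma))
print(filter_sample(0, q, gamma, random.Random(0)))
```

In this example the last element has probability below $\gamma / n = 1/6$, so the filter never outputs it.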
28
Discretizing the Distribution
• Problem: very small values $\to 0$
• Example:

$F(q_1)(i) = F(q_2)(i) = \begin{cases} 0, & i \in [n] \\ 1, & i = \bot \end{cases}$
29
Discretizing the Distribution
• Solution:
• First “smooth” by mixing with the uniform distribution
• Ensures:
• No element has probability “too small”
• Statistical distance (roughly) preserved
• Then apply the rounding filter

30
Mixing With the Uniform Distribution
• Given a sample $i$:
• W.p. ½: output $i$
• W.p. ½: output a uniform element in $[n]$
• The output has probability $\ge \frac{1}{2n}$ for any $i \in [n]$
• What happens to $\Delta(p, q)$?
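
The effect of the mixing step can be computed exactly. In the sketch below the example distributions and helper names are assumptions made for illustration; the mixed version of $q$ assigns probability $\frac{1}{2} q(i) + \frac{1}{2n}$ to each $i$.

```python
from fractions import Fraction

def mix_with_uniform(q):
    """Distribution of: w.p. 1/2 output a sample from q, w.p. 1/2 a uniform element."""
    n = len(q)
    return [Fraction(1, 2) * qi + Fraction(1, 2 * n) for qi in q]

def tv(p, q):
    """Statistical distance as half the l1 distance."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

p = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]  # example: point mass
q = [Fraction(1, 4)] * 4                                  # example: uniform
mp, mq = mix_with_uniform(p), mix_with_uniform(q)

print(min(mp), min(mq))        # smallest probabilities after mixing
print(tv(p, q), tv(mp, mq))    # distance before and after mixing
```

Because the mixed distributions differ by exactly $\frac{1}{2}(p - q)$, the statistical distance is halved, while every element ends up with probability at least $\frac{1}{2n}$.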

31
Applying the Rounding Filter
: set
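• Given $i$,
• Let $j = \lfloor q(i) / (\gamma / n) \rfloor$, where $q$ now denotes the smoothed distribution
• W.p. $\frac{j (\gamma / n)}{q(i)}$: output $i$
• Else: output $\bot$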

32
Identity Testing
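• Fix a known distribution $q$
• To test whether $p = q$ or $\Delta(p, q) \ge \varepsilon$:
1. “Discretize” the distributions $p$ and $q$
2. Check whether the discretized distributions are equal, by reduction to uniformity testing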

33
Lower Bound on Uniformity Testing

34
Testing Uniformity
• Question: is $p$ uniform on $[n]$, or $\varepsilon$-far from uniform?

• Strategy:
1. Take $m$ samples $x_1, \dots, x_m$
2. Count collisions: how many pairs $i < j$ s.t. $x_i = x_j$?
3. Few collisions $\to$ accept, lots of collisions $\to$ reject

35
Lower Bound for Uniformity
• Claim: $\Omega(\sqrt{n})$ samples required for testing uniformity with constant $\varepsilon$.

• Observation:
• Uniformity is preserved under name-changes

36
Label-Invariant Testers
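• A tester is label-invariant if its decision is preserved under name-changes:
its output on $(x_1, \dots, x_m)$ equals its output on $(\pi(x_1), \dots, \pi(x_m))$ for any permutation $\pi$ of $[n]$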

• Any label-invariant property has a label-invariant tester:

37
Lower Bound on Uniformity
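• Claim: $\Omega(\sqrt{n})$ samples required for testing uniformity with constant $\varepsilon$.
• Assume the tester is label-invariant with $m = o(\sqrt{n})$ samples
• On the uniform distribution: $\Pr[\text{tester sees a collision}] = o(1)$
• Let $p$ be uniform on a subset of $[n]$ containing a constant fraction of the elements
• On $p$: $\Pr[\text{tester sees a collision}] = o(1)$ as well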
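
The birthday-style calculation behind this argument can be illustrated numerically. The sketch assumes the hard distribution is uniform on half of $[n]$ (an assumption made here for concreteness) and uses a hypothetical helper `prob_some_collision`; it computes the exact probability that $m$ uniform samples from $k$ elements contain at least one collision.

```python
def prob_some_collision(m, k):
    """Probability that m i.i.d. uniform samples from k elements are not all distinct."""
    all_distinct = 1.0
    for i in range(m):
        # The (i+1)-th sample must avoid the i distinct values seen so far.
        all_distinct *= max(0.0, 1 - i / k)
    return 1 - all_distinct

n = 10**6  # sqrt(n) = 1000
for m in (100, 1000, 10000):
    print(m, prob_some_collision(m, n), prob_some_collision(m, n // 2))
```

For $m$ well below $\sqrt{n}$ both probabilities are small, so with high probability the samples are all distinct in both cases, and a label-invariant tester then sees the same collision-free pattern under either distribution.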

38
End (Part II)

39
