Lecture9 DistributionTesting
Distributions
Problem Statement
• Let $\mathcal{P}$ be a family of distributions on a domain $\Omega$
• Given an unknown distribution $p$, distinguish whether:
• $p \in \mathcal{P}$, or
• $p$ is $\epsilon$-far from $\mathcal{P}$.
Access to the Distribution
• Oracle access: i.i.d. samples $x_1, x_2, \ldots \sim p$.
Distance Between Distributions
Statistical distance (total variation distance):
For distributions $p, q$ on $\Omega$: $\Delta(p, q) = \max_{A \subseteq \Omega} |p(A) - q(A)|$
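Statistical Distance vs. $\ell_1$ Distance
• A distribution is a vector, $p \in [0,1]^{\Omega}$
• Distance between vectors: $\|p - q\|_1 = \sum_{x \in \Omega} |p(x) - q(x)|$
• Claim: $\Delta(p, q) = \frac{1}{2} \|p - q\|_1$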
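Statistical Distance vs. $\ell_1$ Distance
$\Delta(p, q) = \max_{A \subseteq \Omega} |p(A) - q(A)| \ge \frac{1}{2} \cdot \|p - q\|_1 = \frac{1}{2} \sum_{x \in \Omega} |p(x) - q(x)|$
What event maximizes $|p(A) - q(A)|$?
Let $A^* = \{x \in \Omega : p(x) > q(x)\}$. Since $\sum_{x \in \Omega} (p(x) - q(x)) = 0$, we get $p(A^*) - q(A^*) = \frac{1}{2} \|p - q\|_1$.
[Figure: $p$ and $q$ plotted over $\Omega$, vertical axis from 0 to 1]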
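Statistical Distance vs. $\ell_1$ Distance
$\Delta(p, q) = \max_{A \subseteq \Omega} |p(A) - q(A)| \le \frac{1}{2} \cdot \|p - q\|_1 = \frac{1}{2} \sum_{x \in \Omega} |p(x) - q(x)|$
Let $A^* = \{x \in \Omega : p(x) > q(x)\}$.
Let $B \subseteq \Omega$ be any event.
Then: $p(B) - q(B) = \sum_{x \in B} (p(x) - q(x)) \le \sum_{x \in A^*} (p(x) - q(x)) = \frac{1}{2} \|p - q\|_1$, and symmetrically $q(B) - p(B) \le \frac{1}{2} \|p - q\|_1$.
[Figure: $p$ and $q$ plotted over $\Omega$, vertical axis from 0 to 1]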
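As a sanity check on the claim $\Delta(p, q) = \frac{1}{2}\|p - q\|_1$, the following Python sketch compares a brute-force maximum over all events with half the $\ell_1$ distance on a tiny domain. The function names and the example distributions are illustrative assumptions, not part of the lecture.

```python
from itertools import chain, combinations

def tv_by_events(p, q):
    """Brute-force max over all events A of |p(A) - q(A)|.

    Exponential in the domain size, so only suitable for tiny examples.
    p and q are dicts over the same keys.
    """
    domain = list(p)
    events = chain.from_iterable(
        combinations(domain, r) for r in range(len(domain) + 1)
    )
    return max(abs(sum(p[x] - q[x] for x in A)) for A in events)

def half_l1(p, q):
    """Half of the l1 distance between p and q."""
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)

# Hypothetical example distributions on a three-element domain.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
print(tv_by_events(p, q), half_l1(p, q))
```

On this example the maximizing event is $\{a\}$, the set where $p$ exceeds $q$, matching the choice of event in the proof.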
Why Test Properties of Distributions?
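• Suppose an algorithm $\mathcal{A}$ is designed assuming $p \in \mathcal{P}$
• Example:
• Randomness: uniformly random
• Noise: Gaussian
• When $p \in \mathcal{P}$: guaranteed bound on $\Pr[\mathcal{A} \text{ errs}]$
• What if we’re not sure?
1. Test whether $p \in \mathcal{P}$ or $p$ is $\epsilon$-far from $\mathcal{P}$
2. If tester says “$p$ is $\epsilon$-far from $\mathcal{P}$”: abort
3. If tester says “$p \in \mathcal{P}$”: run $\mathcal{A}$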
Testing Uniformity
• Question: is $p$ uniform on $[n]$, or $\epsilon$-far from uniform?
• Strategy:
1. Take $m$ samples $x_1, \ldots, x_m$
2. Count collisions: how many pairs $i < j$ s.t. $x_i = x_j$?
3. Few collisions ⇒ accept, lots of collisions ⇒ reject
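The collision strategy can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's exact tester: the threshold $(1 + 2\epsilon^2)/n$ is an assumed midpoint between the collision probability $1/n$ of the uniform distribution and the value $(1 + 4\epsilon^2)/n$ that any $\epsilon$-far distribution must exceed, and the function names are hypothetical.

```python
from collections import Counter

def count_collisions(samples):
    """Count pairs i < j with samples[i] == samples[j]."""
    # A value that appears c times contributes c*(c-1)/2 colliding pairs.
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())

def collision_tester(samples, n, eps):
    """Return True (accept) iff the fraction of colliding pairs is small.

    The midpoint threshold (1 + 2*eps^2)/n is an assumption of this sketch.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples")
    pairs = m * (m - 1) // 2
    threshold = (1 + 2 * eps ** 2) / n
    return count_collisions(samples) / pairs <= threshold

print(count_collisions([1, 2, 1, 1, 3, 2]))  # three pairs of 1s plus one pair of 2s
```

Counting through a histogram takes time linear in $m$, instead of comparing all $\binom{m}{2}$ pairs directly.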
Basic Observation
• The uniform distribution minimizes the collision probability: $\Pr_{x, y \sim p}[x = y] = \sum_{x \in [n]} p(x)^2 \ge \frac{1}{n}$
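Collision Probability
• Lemma: $\|p\|_2^2 = \|p - U_n\|_2^2 + \frac{1}{n}$, where $U_n$ is the uniform distribution on $[n]$
• Cauchy-Schwarz: $\sum_{x \in [n]} \left( p(x) - \frac{1}{n} \right)^2 \ge \frac{1}{n} \left[ \sum_{x \in [n]} \left| p(x) - \frac{1}{n} \right| \right]^2$
• Corollary: if $p$ is $\epsilon$-far from uniform, then $\|p\|_2^2 \ge \frac{1 + 4\epsilon^2}{n}$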
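The relation between collision probability and distance from uniform can be checked numerically. The sketch below assumes the identity $\|p\|_2^2 = \|p - U_n\|_2^2 + \frac{1}{n}$ and the Cauchy-Schwarz consequence $\|p\|_2^2 \ge \frac{1 + 4\epsilon^2}{n}$ with $\epsilon = \Delta(p, U_n)$; the function names and the example vector are illustrative.

```python
def collision_probability(p):
    """Pr[x = y] for two independent samples x, y from p (a list of probabilities)."""
    return sum(px * px for px in p)

def collision_bounds(p):
    """Return (collision probability, l2 identity value, Cauchy-Schwarz lower bound)."""
    n = len(p)
    l2_sq = sum((px - 1 / n) ** 2 for px in p)   # squared l2 distance to uniform
    eps = 0.5 * sum(abs(px - 1 / n) for px in p)  # statistical distance to uniform
    return collision_probability(p), l2_sq + 1 / n, (1 + 4 * eps ** 2) / n

# Hypothetical example on n = 4 elements.
coll, identity, lower = collision_bounds([0.4, 0.3, 0.2, 0.1])
print(coll, identity, lower)
```

For this example the collision probability and the identity value agree (both about 0.30), and both sit above the lower bound 0.29.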
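Analyzing the Tester
• Let $X_{ij}$ indicate $x_i = x_j$
• If $p$ is uniform: $\mathbb{E}[X_{ij}] = \frac{1}{n}$
• If $p$ is $\epsilon$-far from uniform: $\mathbb{E}[X_{ij}] \ge \frac{1 + 4\epsilon^2}{n}$
Threshold: $\tau$ with $\frac{1}{n} < \tau < \frac{1 + 4\epsilon^2}{n}$
Accept iff $\frac{1}{\binom{m}{2}} \sum_{i < j} X_{ij} \le \tau$
[Figure: number line marking the expected collision rate in the uniform case and in the $\epsilon$-far case]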
Concentration Bound
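Want: the number of collisions is close to its expectation with high probability
• Let $X = \sum_{i < j} X_{ij}$, where $X_{ij}$ indicates $x_i = x_j$
• The $X_{ij}$ are not independent!
• Claim: if $m$ is large enough, then $X$ falls on the correct side of the threshold with high probability
Chebyshev: $\Pr\left[ |X - \mathbb{E}[X]| \ge t \right] \le \frac{\mathrm{Var}(X)}{t^2}$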
Bounding the Variance
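• For a single indicator: $\mathrm{Var}(X_{ij}) \le \mathbb{E}[X_{ij}^2] = \mathbb{E}[X_{ij}] = \|p\|_2^2$
• Independence?
$X_{ij}$ and $X_{k\ell}$ are independent if $\{i, j\} \cap \{k, \ell\} = \emptyset$
• What’s left?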
The Contribution of Triplets
• Triple collision: $\Pr[x_i = x_j = x_k] = \sum_{x \in [n]} p(x)^3 = \|p\|_3^3$
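To see why pairs that share an index matter for the variance, the following sketch computes the covariance of two collision indicators with a common sample by exhaustive enumeration. It assumes indicators $X_{12}$ and $X_{23}$ for three i.i.d. samples; the function name is hypothetical. The result equals $\|p\|_3^3 - \|p\|_2^4$, which is zero for the uniform distribution and positive otherwise.

```python
from itertools import product

def shared_index_covariance(p):
    """Cov(X_12, X_23) for three i.i.d. samples from p, by exhaustive enumeration."""
    support = range(len(p))
    e_both = 0.0    # E[X_12 * X_23] = Pr[x1 = x2 = x3]
    e_single = 0.0  # E[X_12] = Pr[x1 = x2], which also equals E[X_23]
    for a, b, c in product(support, repeat=3):
        weight = p[a] * p[b] * p[c]
        if a == b:
            e_single += weight
            if b == c:
                e_both += weight
    return e_both - e_single ** 2

print(shared_index_covariance([0.25] * 4))            # uniform case
print(shared_index_covariance([0.4, 0.3, 0.2, 0.1]))  # non-uniform case
```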
Bounding the Variance
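$\mathrm{Var}(X) = \underbrace{\sum_{i < j} \mathrm{Var}(X_{ij})}_{\text{pairs}} + \underbrace{\sum_{|\{i,j\} \cap \{k,\ell\}| = 1} \mathrm{Cov}(X_{ij}, X_{k\ell})}_{\text{triplets}} + \underbrace{\sum_{\{i,j\} \cap \{k,\ell\} = \emptyset} \mathrm{Cov}(X_{ij}, X_{k\ell})}_{\text{quadruplets}}$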
How Small Is ?
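Threshold: $\tau$ with $\frac{1}{n} < \tau < \frac{1 + 4\epsilon^2}{n}$
Accept iff $\frac{1}{\binom{m}{2}} \sum_{i < j} X_{ij} \le \tau$
[Figure: number line marking the expected collision rate in the uniform case and in the $\epsilon$-far case]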
How Small Is ?
• Prevent flip from “no” to “yes”:
Overall Sample Complexity
• Can be improved to $O(\sqrt{n} / \epsilon^2)$
Testing Identity to any Fixed Distribution
Identity Testing
• Fix a known distribution $q$ on $[n]$
• To test whether $p = q$ or $\Delta(p, q) \ge \epsilon$:
1. “Discretize” the distributions into $p', q'$
2. Check whether $p' = q'$ by reduction to uniformity testing
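Identity Testing for Discretized Distributions
Assume: every $q(x)$ is an integer multiple of $\frac{1}{k}$, say $q(x) = \frac{k_x}{k}$
If $p = q$: the flattened distribution is uniform
• Idea: “flatten” the distribution. Map $x$ to $(x, j)$ where $j$ is uniform in $\{1, \ldots, k_x\}$
[Figure: $q(a) = 3/5$, $q(b) = 2/5$ is flattened to the uniform distribution on $(a,1), (a,2), (a,3), (b,1), (b,2)$, each with probability $1/5$]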
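The flattening idea can be sketched as follows, using the two-element example from the figure ($q(a) = 3/5$, $q(b) = 2/5$). The names `counts`, `flatten_sample`, and `flattened_distribution`, and the second distribution `p`, are assumptions made for illustration. The sketch also checks that flattening leaves the statistical distance unchanged, since each $x$ is split into equally weighted copies under both distributions.

```python
import random
from fractions import Fraction

def flatten_sample(x, counts, rng=random):
    """Map a sample x to (x, j) with j uniform in {1, ..., counts[x]}."""
    return (x, rng.randint(1, counts[x]))

def flattened_distribution(p, counts):
    """Exact distribution of the flattened sample when x is drawn from p."""
    return {(x, j): p[x] / counts[x]
            for x in p for j in range(1, counts[x] + 1)}

def tv(p, q):
    """Statistical distance between two dicts over the same keys."""
    return sum(abs(p[z] - q[z]) for z in p) / 2

q = {"a": Fraction(3, 5), "b": Fraction(2, 5)}
counts = {"a": 3, "b": 2}                       # q(x) = counts[x] / 5
p = {"a": Fraction(1, 5), "b": Fraction(4, 5)}  # hypothetical unknown distribution

flat_q = flattened_distribution(q, counts)
flat_p = flattened_distribution(p, counts)
print(flat_q)
print(tv(p, q), tv(flat_p, flat_q))
```

Exact rational arithmetic via `Fraction` avoids floating-point noise when comparing the flattened $q$ with the uniform distribution.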
Identity Testing for Discretized Distributions
• What is the flattened $p$? Map $x \sim p$ to $(x, j)$, where $j$ is uniform in $\{1, \ldots, k_x\}$ and $q(x) = k_x / k$; each $(x, j)$ then has probability $p(x) / k_x$
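Discretizing the Distribution
• Idea: round probabilities to multiples of $\gamma / n$
• How?
Attempt #1: given a sample $i$,
Let $j = \lfloor q(i) / (\gamma / n) \rfloor$
W.p. $\frac{j (\gamma / n)}{q(i)}$: output $i$
Else: output $\perp$
$\Pr_{F(q)}[i] = q(i) \cdot \frac{j (\gamma / n)}{q(i)} = j \left( \frac{\gamma}{n} \right)$
[Figure: $q$ plotted over $\Omega$, and the rounded distribution $F(q)$ plotted over $\Omega \cup \{\perp\}$]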
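A minimal sketch of the rounding filter, assuming the rounding is downward (so that the keep probability $j(\gamma/n)/q(i)$ is at most 1) and using `None` to stand for $\perp$. The function names and the example values of $q$, $\gamma$, and $n$ are illustrative assumptions.

```python
import random
from fractions import Fraction

BOTTOM = None  # stands for the rejection symbol ⊥

def rounding_filter(i, q, gamma, n, rng=random):
    """Keep sample i with probability j*(gamma/n)/q[i], else output BOTTOM."""
    grain = Fraction(gamma) / n
    if q[i] == 0:
        return BOTTOM
    j = q[i] // grain  # number of whole grains that fit in q[i] (rounds down)
    keep_probability = j * grain / q[i]
    return i if rng.random() < keep_probability else BOTTOM

def filtered_distribution(q, gamma, n):
    """Exact output distribution F(q) when the input sample is drawn from q."""
    grain = Fraction(gamma) / n
    out = {i: (q[i] // grain) * grain for i in range(len(q))}
    out[BOTTOM] = 1 - sum(out.values())  # leftover mass goes to BOTTOM
    return out

# Hypothetical example: n = 4, gamma = 1/2, so probabilities round to multiples of 1/8.
q = [Fraction(1, 2), Fraction(5, 16), Fraction(1, 8), Fraction(1, 16)]
print(filtered_distribution(q, Fraction(1, 2), 4))
```

In this example the element with probability $1/16$ is rounded down to 0, which illustrates how very small probabilities are lost entirely.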
Discretizing the Distribution
• Problem: very small values → 0
• Example:
$F(q_1)(i) = F(q_2)(i) = \begin{cases} 0, & i \in [n] \\ 1, & i = \perp \end{cases}$
Discretizing the Distribution
• Solution:
• First “smooth” by mixing with the uniform distribution
• Ensures:
• No element has probability “too small”
• Statistical distance (roughly) preserved
• Then apply the rounding filter
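Mixing With the Uniform Distribution
• Given a sample $i$:
• W.p. ½: output $i$
• W.p. ½: output uniform element in $[n]$
• The mixed distribution satisfies $\frac{1}{2} q(i) + \frac{1}{2n} \ge \frac{1}{2n}$ for any $i \in [n]$
• What happens to $\Delta(p, q)$?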
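The mixing step can be sketched as below, with elements indexed $0, \ldots, n-1$ rather than $1, \ldots, n$. The function names and the example distributions are assumptions. The sketch computes the mixed distributions exactly and shows that every element gets probability at least $\frac{1}{2n}$, while the statistical distance between the two mixed distributions is exactly half the original distance.

```python
import random
from fractions import Fraction

def mix_sample(i, n, rng=random):
    """With probability 1/2 keep sample i, otherwise output a uniform element of {0..n-1}."""
    return i if rng.random() < 0.5 else rng.randrange(n)

def mixed_distribution(q):
    """Exact distribution (1/2) q + (1/2) U_n for q given as a list of length n."""
    n = len(q)
    return [Fraction(1, 2) * qi + Fraction(1, 2 * n) for qi in q]

def tv(p, q):
    """Statistical distance between two probability lists of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

# Hypothetical example: two point masses at distance 1.
p = [Fraction(0), Fraction(1), Fraction(0), Fraction(0)]
q = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
mp, mq = mixed_distribution(p), mixed_distribution(q)
print(min(mq), tv(p, q), tv(mp, mq))
```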
Applying the Rounding Filter
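Rounding filter with parameter $\gamma$:
• Given a sample $i$,
• Let $j = \lfloor q(i) / (\gamma / n) \rfloor$
• W.p. $\frac{j (\gamma / n)}{q(i)}$: output $i$
• Else: output $\perp$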
Identity Testing
• Fix a known distribution $q$ on $[n]$
• To test whether $p = q$ or $\Delta(p, q) \ge \epsilon$:
1. “Discretize” the distributions into $p', q'$
2. Check whether $p' = q'$ by reduction to uniformity testing
Lower Bound on Uniformity Testing
Testing Uniformity
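• Question: is $p$ uniform on $[n]$, or $\epsilon$-far from uniform?
• Strategy:
1. Take $m$ samples $x_1, \ldots, x_m$, where $m = O(\sqrt{n}) \cdot \mathrm{poly}(1/\epsilon)$
2. Count collisions: how many pairs $i < j$ s.t. $x_i = x_j$?
3. Few collisions ⇒ accept, lots of collisions ⇒ reject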
Lower Bound for Uniformity
• Claim: $\Omega(\sqrt{n})$ samples required for testing uniformity with constant $\epsilon$.
• Observation:
• Uniformity is preserved under name-changes
Label-Invariant Testers
• A tester $T$ is label invariant if its decision is preserved under name-changes:
$T(x_1, \ldots, x_m) = T(\pi(x_1), \ldots, \pi(x_m))$ for any permutation $\pi : [n] \to [n]$
Lower Bound on Uniformity
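• Claim: $\Omega(\sqrt{n})$ samples required for testing uniformity with constant $\epsilon$.
• Assume $T$ is label-invariant with $m = o(\sqrt{n})$ samples
• If $p$ is uniform on $[n]$: $\Pr[T \text{ sees a collision}] \le \binom{m}{2} \cdot \frac{1}{n} = o(1)$
• Let $p$ be uniform on a subset $S \subset [n]$ with $|S| = n/2$
• $\Pr[T \text{ sees a collision}] \le \binom{m}{2} \cdot \frac{2}{n} = o(1)$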
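The intuition behind the lower bound is that with far fewer than $\sqrt{n}$ samples, a collision is unlikely under the uniform distribution and also under a distribution that is far from uniform. The sketch below computes the exact collision probability for $m$ uniform samples from $k$ elements and compares it with the union bound $\binom{m}{2}/k$. The choice of a uniform distribution on half the domain as the far instance is an assumption of this sketch, as are the function names and the numbers used.

```python
def collision_probability(m, k):
    """Exact probability that m i.i.d. uniform samples from k elements contain a repeat."""
    no_collision = 1.0
    for t in range(m):
        # The (t+1)-th sample must avoid the t distinct values seen so far.
        no_collision *= max(0.0, 1 - t / k)
    return 1 - no_collision

def union_bound(m, k):
    """Upper bound: number of pairs times the per-pair collision probability 1/k."""
    return m * (m - 1) / (2 * k)

n, m = 10 ** 6, 30  # m is far below sqrt(n) = 1000
print(collision_probability(m, n), union_bound(m, n))            # uniform on [n]
print(collision_probability(m, n // 2), union_bound(m, n // 2))  # uniform on half of [n]
```

In both cases the collision probability is well below one in a thousand, so a label-invariant tester almost always sees $m$ distinct elements and has nothing to distinguish the two cases by.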
End (Part II)