Lecture2 FrequencyMoments
CountMin & CountSketch
Estimate quality comparison ($f_{-i}$ is the frequency vector of $[n] \setminus \{i\}$):
• CountMin: w.h.p., $f_i \le \hat{f}_i \le f_i + \epsilon \|f_{-i}\|_1$
• CountSketch: w.h.p., $|\hat{f}_i - f_i| \le \epsilon \|f_{-i}\|_2$
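[Figure: each element of the stream $\sigma = a_1, \dots, a_i, \dots$ is hashed by $h: [n] \to [k]$ and adds +1 to one of $k \ll n$ cells.]
Question: How big should $k$ be for $\mathrm{E}[\hat{f}_i]$ to be "good"?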
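CountMin: $f_i \le \hat{f}_i \le f_i + \epsilon \|f_{-i}\|_1$
[Figure: each element of $\sigma = a_1, \dots, a_i, \dots$ is hashed by $h: [n] \to [k]$ and adds +1 to one of $k \ll n$ cells.]
• At the end: $C[j] = \sum_{i : h(i) = j} f_i$
• Query $i$: return $\hat{f}_i = C[h(i)]$
• $\mathrm{E}[\hat{f}_i - f_i] = \|f_{-i}\|_1 / k$, which is $\frac{\epsilon}{2} \|f_{-i}\|_1$ for $k = 2/\epsilon$
• By Markov, with $k = 2/\epsilon$: $\Pr[\hat{f}_i - f_i > \epsilon \|f_{-i}\|_1] \le 1/2$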
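The expectation and the Markov step above can be spelled out as follows. This is a sketch of the standard argument, under the assumption (not stated on the slide) that $\Pr[h(j) = h(i)] = 1/k$ for every $j \ne i$, as holds for a pairwise independent uniform hash family, and that all frequencies are non-negative.

```latex
\begin{align*}
\hat{f}_i - f_i &= \sum_{j \ne i} f_j \cdot \mathbf{1}[h(j) = h(i)] \;\ge\; 0, \\
\mathrm{E}[\hat{f}_i - f_i] &= \sum_{j \ne i} f_j \cdot \Pr[h(j) = h(i)]
  = \frac{\|f_{-i}\|_1}{k}, \\
\Pr\bigl[\hat{f}_i - f_i > \epsilon \|f_{-i}\|_1\bigr]
  &\le \frac{\|f_{-i}\|_1 / k}{\epsilon \|f_{-i}\|_1}
  = \frac{1}{\epsilon k}
  = \frac{1}{2} \quad \text{for } k = 2/\epsilon.
\end{align*}
```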
Estimate Quality
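• Single estimate $\hat{f}_i$:
  • Always: $\hat{f}_i \ge f_i$
  • W.p. at least ½: $\hat{f}_i \le f_i + \epsilon \|f_{-i}\|_1$
• To amplify the success probability:
  • Compute independent $\hat{f}_i^{1}, \dots, \hat{f}_i^{\ell}$
  • Output: $\hat{f}_i = \min\{\hat{f}_i^{1}, \dots, \hat{f}_i^{\ell}\}$
$\Pr[\hat{f}_i - f_i > \epsilon \|f_{-i}\|_1] \le (1/2)^{\ell}$
$\Rightarrow \ell = \log(1/\delta)$ gives failure probability at most $\delta$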
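CountMin: $f_i \le \hat{f}_i \le f_i + \epsilon \|f_{-i}\|_1$
$h_1, h_2, \dots, h_\ell: [n] \to [k]$
[Figure: each element of $\sigma = a_1, \dots, a_i, \dots$ adds +1 to one cell in each of $\ell = \log(1/\delta)$ rows; each row has $k = 2/\epsilon$ cells.]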
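The full structure (rows of $k = 2/\epsilon$ counters, $\ell = \log(1/\delta)$ independent hash functions, minimum over rows) could be sketched in Python as below. The class name, the `seed` parameter, and the hash family $((a x + b) \bmod p) \bmod k$ are illustrative assumptions, not taken from the lecture; items are assumed to be non-negative integers smaller than the prime.

```python
import math
import random


class CountMin:
    """Illustrative CountMin sketch: ell rows of k counters, one hash per row."""

    PRIME = (1 << 61) - 1  # assumed to exceed every item id

    def __init__(self, eps, delta, seed=0):
        rng = random.Random(seed)
        self.k = math.ceil(2 / eps)                          # cells per row
        self.ell = max(1, math.ceil(math.log2(1 / delta)))   # number of rows
        # One (a, b) pair per row for h(x) = ((a*x + b) mod p) mod k.
        # This family is an assumption; it is only approximately universal.
        self.coeffs = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(self.ell)
        ]
        self.table = [[0] * self.k for _ in range(self.ell)]

    def _cell(self, row, item):
        a, b = self.coeffs[row]
        return ((a * item + b) % self.PRIME) % self.k

    def update(self, item):
        """Process one stream element: +1 in one cell of every row."""
        for row in range(self.ell):
            self.table[row][self._cell(row, item)] += 1

    def query(self, item):
        """Return the minimum over rows; it never underestimates f_item."""
        return min(
            self.table[row][self._cell(row, item)] for row in range(self.ell)
        )
```

Because every counter only ever receives non-negative updates, each row overestimates, and taking the minimum keeps the best of the $\ell$ overestimates.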
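CountSketch
$h: [n] \to [k]$, $r: [n] \to \{-1, +1\}$
[Figure: each element of $\sigma = a_1, \dots, a_i, \dots$ adds $\pm 1$ to one of $k \ll n$ cells.]
• At the end: $C[j] = \sum_{i : h(i) = j} r(i) f_i$
Question: What is $\mathrm{E}[C[h(i)]]$? How does it relate to $f_i$?
• $\mathrm{E}[C[h(i)]] = r(i) \cdot f_i$, where the expectation is over the other signs: every colliding $i' \ne i$ contributes $\mathrm{E}[r(i') f_{i'}] = 0$
• So $\hat{f}_i = r(i) \cdot C[h(i)]$ is an unbiased estimate of $f_i$
By Chebyshev:
$\Pr\left[ |\hat{f}_i - f_i| > 2 \|f_{-i}\|_2 / \sqrt{k} \right] \le 1/4$
Want: $|\hat{f}_i - f_i| \le \epsilon \|f_{-i}\|_2$. Set $k = \Theta(1/\epsilon^2)$.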
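The variance behind the Chebyshev step can be worked out as follows. This is a sketch of the standard calculation, assuming (beyond what the slide states) that $r$ is pairwise independent, that $r$ is independent of $h$, and that $\Pr[h(j) = h(i)] = 1/k$ for $j \ne i$.

```latex
\begin{align*}
\hat{f}_i - f_i &= \sum_{j \ne i} r(i)\, r(j)\, f_j \cdot \mathbf{1}[h(j) = h(i)], \\
\mathrm{E}[\hat{f}_i - f_i] &= 0, \\
\mathrm{Var}[\hat{f}_i] &= \sum_{j \ne i} f_j^2 \cdot \Pr[h(j) = h(i)]
  = \frac{\|f_{-i}\|_2^2}{k}
  \quad \text{(cross terms vanish since } \mathrm{E}[r(j)\, r(j')] = 0 \text{ for } j \ne j'\text{)}, \\
\Pr\Bigl[\,|\hat{f}_i - f_i| > 2 \cdot \tfrac{\|f_{-i}\|_2}{\sqrt{k}}\Bigr] &\le \frac{1}{4}
  \quad \text{(Chebyshev with } c = 2\text{)}.
\end{align*}
```

Choosing $k = 4/\epsilon^2$ makes $2/\sqrt{k} = \epsilon$, which is one concrete instance of $k = \Theta(1/\epsilon^2)$.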
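CountSketch: $|\hat{f}_i - f_i| \le \epsilon \|f_{-i}\|_2$
$h_1, h_2, \dots, h_\ell: [n] \to [k]$
$r_1, r_2, \dots, r_\ell: [n] \to \{-1, +1\}$
[Figure: each element of $\sigma = a_1, \dots, a_i, \dots$ adds $\pm 1$ to one cell in each of $\ell = \log(1/\delta)$ rows; each row has $k = \Theta(1/\epsilon^2)$ cells.]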
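A minimal Python sketch of the multi-row CountSketch follows. The slides do not say how the $\ell$ row estimates are combined; taking their median is the usual choice and is an assumption here, as are the class name, the `seed` parameter, and the linear hash families used for both $h$ and $r$. Items are assumed to be non-negative integers smaller than the prime.

```python
import random
import statistics


class CountSketch:
    """Illustrative CountSketch: ell rows of k signed counters."""

    PRIME = (1 << 61) - 1  # assumed to exceed every item id

    def __init__(self, k, ell, seed=0):
        rng = random.Random(seed)
        self.k, self.ell = k, ell
        p = self.PRIME
        # Per-row coefficients for the bucket hash h and the sign hash r.
        # Both families are assumptions made for this example.
        self.h = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(ell)]
        self.r = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(ell)]
        self.table = [[0] * k for _ in range(ell)]

    def _cell(self, row, item):
        a, b = self.h[row]
        return ((a * item + b) % self.PRIME) % self.k

    def _sign(self, row, item):
        a, b = self.r[row]
        return 1 if ((a * item + b) % self.PRIME) & 1 else -1

    def update(self, item, count=1):
        """Add r(item) * count to one cell in every row."""
        for row in range(self.ell):
            self.table[row][self._cell(row, item)] += self._sign(row, item) * count

    def query(self, item):
        """Median over rows of r(item) * C[h(item)] (median is an assumption)."""
        return statistics.median(
            self._sign(row, item) * self.table[row][self._cell(row, item)]
            for row in range(self.ell)
        )
```

With an odd number of rows the median is one of the row estimates, so the output stays an integer for integer streams.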
Linear Sketching
Random linear mapping $M \in \mathbb{R}^{s \times m}$:
compute the sketch $M\sigma$, then derive the answer from $M\sigma$ alone
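CountMin as a Linear Sketch
What is $M$? Given $h: [n] \to [k]$ and $\sigma = a_1, \dots, a_i, \dots$ (identified with its frequency vector):
• Row $j$ of $M$ is a 0/1 vector with a 1 exactly at the items $i$ with $h(i) = j$, e.g. (1 0 0 1 1 0 0 0 0 1)
• Cell $j$ of the sketch is row $j$ of $M$ times the frequency vector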
[Figure: the same matrix $M$ applied to two streams: $M \cdot \sigma_1 = M\sigma_1$ and $M \cdot \sigma_2 = M\sigma_2$.]
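To make the matrix view concrete, the following Python sketch builds the 0/1 matrix of a single CountMin row and checks that sketching is linear in the frequency vector. The function names are hypothetical, and the hash is a fully random table chosen only for illustration.

```python
import random


def countmin_matrix(n, k, seed=0):
    """k x n matrix with M[j][i] = 1 iff item i hashes to cell j."""
    rng = random.Random(seed)
    h = [rng.randrange(k) for _ in range(n)]  # illustrative fully random hash
    return [[1 if h[i] == j else 0 for i in range(n)] for j in range(k)]


def frequency_vector(stream, n):
    """Count how often each item of [0, n) occurs in the stream."""
    f = [0] * n
    for a in stream:
        f[a] += 1
    return f


def apply_sketch(M, f):
    """Matrix-vector product M f, i.e. the array of counters."""
    return [sum(m * x for m, x in zip(row, f)) for row in M]


M = countmin_matrix(n=10, k=3)
f1 = frequency_vector([0, 1, 1, 4], 10)
f2 = frequency_vector([4, 9, 9, 9], 10)
combined = [a + b for a, b in zip(f1, f2)]

# Linearity: the sketch of the concatenated stream is the sum of the sketches.
sketch_sum = [a + b for a, b in zip(apply_sketch(M, f1), apply_sketch(M, f2))]
print(apply_sketch(M, combined) == sketch_sum)
```

Linearity is what lets sketches of separate streams be merged by simple addition.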
Linear Sketches = "Dimensionality reduction":
embed $v_1, \dots, v_k \in \mathbb{R}^N$ into $\mathbb{R}^n$ while preserving essential properties
- distances (Johnson-Lindenstrauss Lemma)
- ...
Break
Estimating Higher Frequency Moments
Higher Frequency Moments
• Reminder: $F_k = \sum_{i \in [n]} f_i^k$
• Higher $k$ ⇒ weighted towards the highest $f_i$
• $F_2$ is the second moment of $f$ (its variance, up to centering and normalization)
• Higher moments have applications in databases (?)
Estimating $F_2$: the Tug-Of-War Sketch
Tug-of-War Sketch
• Choose $h: [n] \to \{-1, +1\}$ from a 4-wise independent family
• $x \leftarrow 0$
• Process $a_i \in [n]$: $x \leftarrow x + h(a_i)$
• Output $x^2$
Since $\mathrm{E}[h(i) h(j)] = 0$ for $i \ne j$ and $\mathrm{E}[h(i) h(j)] = 1$ for $i = j$:
$\mathrm{E}[x^2] = \mathrm{E}\left[ \left( \sum_{i \in [n]} h(i) f_i \right)^2 \right] = \mathrm{E}\left[ \sum_{i,j \in [n]} h(i) h(j) f_i f_j \right] = \sum_{i \in [n]} f_i^2 = \|f\|_2^2 = F_2$
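The steps above can be sketched in Python as follows. The 4-wise independent family is realized here, as an assumption, by a random degree-3 polynomial over a prime field whose lowest output bit picks the sign; that bit is only negligibly biased because the prime is odd. The function names are hypothetical, and items are assumed to be non-negative integers smaller than the prime.

```python
import random

PRIME = (1 << 61) - 1  # assumed to exceed every item id


def make_sign_hash(rng):
    """Return h: item -> {-1, +1} from a random cubic polynomial mod PRIME."""
    coeffs = [rng.randrange(PRIME) for _ in range(4)]

    def h(item):
        acc = 0
        for c in coeffs:  # Horner evaluation of the degree-3 polynomial
            acc = (acc * item + c) % PRIME
        return 1 if acc & 1 else -1

    return h


def tug_of_war(stream, h):
    """Single tug-of-war estimate of F_2: keep one counter, output its square."""
    x = 0
    for a in stream:
        x += h(a)
    return x * x


h = make_sign_hash(random.Random(0))
# With a single distinct item the estimate is exact: x = +-7, so x^2 = F_2 = 49.
print(tug_of_war([5] * 7, h))
```

For streams with several distinct items a single estimate can be far from $F_2$; only its expectation equals $F_2$.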
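Tug-of-War Sketch
• $\mathrm{Var}[x^2] \le 2 F_2^2$
• By Chebyshev, for any $c > 0$:
$\Pr\left[ |x^2 - F_2| > c \sqrt{2} \cdot F_2 \right] \le 1/c^2$
• We want:
$\Pr\left[ |x^2 - F_2| > \epsilon \cdot F_2 \right] \le \delta$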
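The variance of the tug-of-war estimate can be checked with a short calculation. This is a sketch of the standard argument; it uses 4-wise independence of $h$, so that $\mathrm{E}[h(i) h(j) h(k) h(l)]$ is nonzero only when the indices pair up, and it yields the constant 2 in the bound.

```latex
\begin{align*}
\mathrm{E}[x^4] &= \sum_{i} f_i^4 + 3 \sum_{i \ne j} f_i^2 f_j^2, \\
\mathrm{E}[x^2]^2 &= \Bigl( \sum_i f_i^2 \Bigr)^2
  = \sum_i f_i^4 + \sum_{i \ne j} f_i^2 f_j^2, \\
\mathrm{Var}[x^2] &= \mathrm{E}[x^4] - \mathrm{E}[x^2]^2
  = 2 \sum_{i \ne j} f_i^2 f_j^2
  = 2 \bigl( F_2^2 - F_4 \bigr)
  \le 2 F_2^2 .
\end{align*}
```

So the standard deviation of a single estimate is at most $\sqrt{2} \cdot F_2$, which is why one estimate alone cannot give a $(1 \pm \epsilon)$ guarantee.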
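Median-of-Means Trick
Chebyshev for an estimate $\hat{z}$ of $z$: $\Pr[|\hat{z} - z| > c \cdot \mathrm{std}] \le 1/c^2$
Step 1: take the mean of $\Theta(1/\epsilon^2)$ independent estimates
Step 2: take the median of $\Theta(\log(1/\delta))$ means
Space: $O\left( \frac{\log n \cdot \log(1/\delta)}{\epsilon^2} \right)$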
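The two steps could be combined with the tug-of-war counter roughly as in the Python sketch below. The function names and the parameters `group_size` (estimates per mean, playing the role of $\Theta(1/\epsilon^2)$) and `num_groups` (number of means, playing the role of $\Theta(\log(1/\delta))$) are hypothetical, and the sign hash is an assumed cubic-polynomial construction over a prime field.

```python
import random
import statistics

PRIME = (1 << 61) - 1  # assumed to exceed every item id


def make_sign_hash(rng):
    """Sign hash from a random cubic polynomial mod PRIME (an assumption)."""
    coeffs = [rng.randrange(PRIME) for _ in range(4)]

    def h(item):
        acc = 0
        for c in coeffs:
            acc = (acc * item + c) % PRIME
        return 1 if acc & 1 else -1

    return h


def estimate_f2(stream, group_size, num_groups, seed=0):
    """Median of num_groups means, each over group_size tug-of-war estimates."""
    rng = random.Random(seed)
    hashes = [[make_sign_hash(rng) for _ in range(group_size)]
              for _ in range(num_groups)]
    counters = [[0] * group_size for _ in range(num_groups)]
    for a in stream:  # one pass; every counter sees every element
        for g in range(num_groups):
            for c in range(group_size):
                counters[g][c] += hashes[g][c](a)
    # Step 1: mean of the squared counters within each group.
    means = [sum(x * x for x in group) / group_size for group in counters]
    # Step 2: median across the group means.
    return statistics.median(means)


print(estimate_f2([3] * 10, group_size=8, num_groups=5))
```

Averaging shrinks the variance by the group size, and the median then turns a constant success probability per group into a failure probability that decays exponentially in the number of groups.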