Introduction to Streaming Algorithms - Pt1
Logistics
• Prerequisites:
Algorithms + Complexity
or
Probability + Computational Models with grade ≥ 85
• Grade:
• 70% exam
• 30% HW assignments (5-6)
• 5 bonus points for participating in the Mentimeter quizzes during class
• Participate in at least 11 (out of 13) quizzes
This Course
• Part I: Streaming Algorithms
• Part II: Sublinear-Time Algorithms
• Part III: Distributed Algorithms
Streaming Algorithms
[Figure: an algorithm with a workspace reads the data $x \in \{0,1\}^n$ as a stream of items (data1, data5, data3, data2, …).]
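[Figure: the universe of inputs, split into YES instances (having property 𝒫) and NO instances. An input is “close to 𝒫” if one needs to change at most 𝜖 ⋅ 𝑛 of the object to get 𝒫, and “far from 𝒫” otherwise.]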
Property Testing (Formally)
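Given $x \in \{0,1\}^n$ and a property $P \subseteq \{0,1\}^n$, distinguish between:
• $x \in P$, and
• $x$ is $\epsilon$-far from $P$:
for all $y \in P$ we have $\Delta(x, y) \ge \epsilon \cdot n$, where $\Delta(x, y)$ = number of positions in which $x$ and $y$ differ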
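For small $n$ the definition can be checked by brute force. The sketch below is illustrative only: it assumes $P$ is given as an explicit list of strings and measures $\Delta$ as the number of differing positions (matching “need to change at most 𝜖 ⋅ 𝑛 of the object”). The example property, sorted binary strings 0…01…1, and all function names are hypothetical choices for this illustration.

```python
from itertools import product

def hamming(x, y):
    """Number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def distance_to_property(x, P):
    """min over y in P of Delta(x, y); P is an explicit collection of strings."""
    return min(hamming(x, y) for y in P)

def is_eps_far(x, P, eps):
    """True iff every y in P differs from x in at least eps * n positions."""
    return distance_to_property(x, P) >= eps * len(x)

# Hypothetical example property: sorted binary strings of length n (0...01...1).
n = 6
SORTED = [y for y in product((0, 1), repeat=n) if list(y) == sorted(y)]

x = (1, 1, 1, 0, 0, 0)
print(distance_to_property(x, SORTED))
print(is_eps_far(x, SORTED, 0.5))
```

Here $x = 111000$ must change 3 of its 6 bits to become sorted, so it is $\tfrac12$-far from the property. A property tester aims to make this distinction while reading only a few bits of $x$, whereas this brute-force check reads all of $x$ and enumerates all of $P$.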
Back to Sortedness
• “𝜖-close to sorted”?
• Need to change at most 𝜖 ⋅ 𝑛 values to get a sorted list
Naïve Attempt
• Sample $k$ uniformly random indices $i$ and verify $x_i \le x_{i+1}$
• How large should 𝑘 be?
• Bad example:
$L = (1, 2, \dots, n/2, 1, 2, \dots, n/2)$
• How far from sorted?
• How large 𝑘?
value: 3  2  1  6  5  4  9  8  7  12 11 10 15 14 13 16
index: 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
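The naïve attempt and the bad example can be sketched as follows. This is a minimal illustration, not the course's tester: `naive_sortedness_test` and `distance_to_sorted` are hypothetical names. The distance is computed as $n$ minus the length of a longest non-decreasing subsequence, since keeping such a subsequence and overwriting every other value gives a sorted list with the fewest changes.

```python
import random
from bisect import bisect_right

def naive_sortedness_test(xs, k, rng=random):
    """Naive tester: sample k uniformly random indices i and check xs[i] <= xs[i+1].
    Accepts (True) if no sampled adjacent pair is out of order."""
    n = len(xs)
    for _ in range(k):
        i = rng.randrange(n - 1)          # 0-based index in {0, ..., n-2}
        if xs[i] > xs[i + 1]:
            return False                  # violation found: reject
    return True                           # no violation seen: accept

def distance_to_sorted(xs):
    """Fewest values to change to make xs sorted = n - (longest non-decreasing subsequence)."""
    tails = []                            # tails[j]: smallest last value of a non-decreasing subsequence of length j+1
    for x in xs:
        j = bisect_right(tails, x)
        if j == len(tails):
            tails.append(x)
        else:
            tails[j] = x
    return len(xs) - len(tails)

n = 16
bad = list(range(1, n // 2 + 1)) * 2     # L = 1, 2, ..., n/2, 1, 2, ..., n/2
violations = sum(bad[i] > bad[i + 1] for i in range(n - 1))
print(distance_to_sorted(bad), violations)
```

For $n = 16$ the bad example is $n/2 - 1 = 7$ changes away from sorted, yet only 1 of the $n - 1$ adjacent pairs is out of order. Each sample therefore catches it with probability $1/(n-1)$, so $k$ must grow linearly in $n$. For the 16-element list in the table above, `distance_to_sorted` returns 10.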
Correctness
• Need to show:
• If 𝐿 is sorted, we accept w.h.p.
• If 𝐿 is 𝜖-far from sorted, we reject w.h.p.
[Figure: events 𝐴1, 𝐴2, 𝐴3, …]
Use the smallest event that occurred to estimate the number of elements.
Flajolet-Martin Algorithm
• Sequence of events $A_0 \supseteq A_1 \supseteq \cdots \supseteq A_{\log n}$
• Event $A_i$: the binary encoding of the number ends with at least $i$ zeroes
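The nesting of the events and their probabilities can be checked by enumeration. The sketch assumes that “ends with $i$ zeroes” means *at least* $i$ trailing zeroes, that the number is uniform over $\{1, \dots, n\}$, and that $n$ is a power of two. Under these assumptions $\Pr[A_i] = 2^{-i}$. The helper names are hypothetical.

```python
from fractions import Fraction

def trailing_zeros(v):
    """Number of trailing zero bits in the binary encoding of v (v >= 1)."""
    return (v & -v).bit_length() - 1

def event_set(n, i):
    """Values in {1, ..., n} whose binary encoding ends with at least i zeroes (A_i)."""
    return {v for v in range(1, n + 1) if trailing_zeros(v) >= i}

def event_probability(n, i):
    """Pr[A_i] for a number drawn uniformly from {1, ..., n}."""
    return Fraction(len(event_set(n, i)), n)

n = 16                                    # assumption: n is a power of two
for i in range(5):
    print(i, event_probability(n, i))     # Pr[A_i] = 2^(-i)
```

Intuitively, if $d$ distinct elements are hashed, $A_i$ is hit about $d \cdot 2^{-i}$ times in expectation. The rarest event that still occurs is therefore typically at $i \approx \log_2 d$, which motivates using the smallest event that occurred as an estimate.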
Mentimeter Experiment
Flajolet-Martin Algorithm
• Let 𝑚 = 0
• Let $h: \{1, \dots, n\} \to \{1, \dots, n\}$ be a random hash function*
• To process 𝑎𝑖 :
• Let $z$ = number of trailing zeroes in the binary representation of $h(a_i)$
• $m \leftarrow \max(m, z)$
• Output: $2^m$
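The steps above can be sketched as one estimator in Python. The slides only say “random hash function”. This sketch instead assumes the family $h(x) = (a x + b) \bmod p$, with $p = 2^{31} - 1$ prime and $a, b$ uniform in $\{0, \dots, p-1\}$. Its range is $\{0, \dots, p-1\}$ rather than $\{1, \dots, n\}$, so the trailing-zero probabilities are only approximately $2^{-i}$. The class and helper names are hypothetical.

```python
import random

def trailing_zeros(v, max_bits):
    """Trailing zero bits in the binary encoding of v; v == 0 is capped at max_bits."""
    if v == 0:
        return max_bits
    return (v & -v).bit_length() - 1

class FlajoletMartin:
    """Single Flajolet-Martin estimator for the number of distinct elements.

    Assumption (not from the slides): h(x) = (a*x + b) mod p with p prime and
    a, b uniform in {0, ..., p-1}."""

    def __init__(self, p=2_147_483_647, rng=random):
        self.p = p                        # prime 2^31 - 1; must exceed every stream element
        self.a = rng.randrange(p)
        self.b = rng.randrange(p)
        self.max_bits = p.bit_length()
        self.m = 0                        # largest number of trailing zeroes seen so far

    def h(self, x):
        return (self.a * x + self.b) % self.p

    def process(self, x):
        z = trailing_zeros(self.h(x), self.max_bits)
        self.m = max(self.m, z)           # m <- max(m, z)

    def estimate(self):
        return 2 ** self.m                # output 2^m

fm = FlajoletMartin(rng=random.Random(0))
for a in [5, 3, 5, 9, 3, 3, 42]:
    fm.process(a)
print(fm.estimate())
```

The sketch stores only $a$, $b$ and $m$. A repeated element always hashes to the same value, so duplicates can never change $m$. The output therefore depends only on the set of distinct elements seen.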
Analysis of Flajolet-Martin
Space Complexity
The Hash Function
• Pairwise independence: for every $x_1 \neq x_2 \in [n]$ and $y_1, y_2 \in [n]$,
$$\Pr_h[h(x_1) = y_1 \wedge h(x_2) = y_2] = \frac{1}{n^2}$$
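The definition can be verified exhaustively for a small example. The sketch assumes $n = p$ is prime, relabels $[n]$ as $\{0, \dots, p-1\}$, and uses the family $h_{a,b}(x) = (a x + b) \bmod p$ with $(a, b)$ uniform over $\{0, \dots, p-1\}^2$. This standard choice is not stated on the slides. For $x_1 \neq x_2$, exactly one pair $(a, b)$ maps $(x_1, x_2)$ to $(y_1, y_2)$. The last line shows why the condition $x_1 \neq x_2$ is needed.

```python
from fractions import Fraction
from itertools import product

def pair_probability(p, x1, x2, y1, y2):
    """Fraction of (a, b) in Z_p x Z_p with h(x1) = y1 and h(x2) = y2,
    where h(x) = (a*x + b) mod p."""
    hits = sum(1 for a, b in product(range(p), repeat=2)
               if (a * x1 + b) % p == y1 and (a * x2 + b) % p == y2)
    return Fraction(hits, p * p)

def is_pairwise_independent(p):
    """Exhaustively check the definition for all x1 != x2 and all y1, y2."""
    target = Fraction(1, p * p)
    return all(pair_probability(p, x1, x2, y1, y2) == target
               for x1, x2 in product(range(p), repeat=2) if x1 != x2
               for y1, y2 in product(range(p), repeat=2))

print(is_pairwise_independent(5))         # True
print(pair_probability(5, 1, 1, 2, 2))    # 1/5, not 1/25: x1 = x2 is excluded
```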