Introduction To Streaming Algorithm - Pt1

Lecture 1: Introduction
Logistics
• Prerequisites:
Algorithms + Complexity
or
Probability + Computational Models with grade ≥ 85
Logistics
• Grade:
• 70% exam
• 30% HW assignments (5-6)
• 5 bonus points for participating in Mentimeter quiz during class
• Participate in at least 11 (out of 13) quizzes
• Office hours: email me (roshman@tau.ac.il)

Logistics
IF YOU DON’T FEEL WELL, STAY HOME

What Is This Course About?
• Traditional models of computing:
[Figure: an algorithm with access to the data and a workspace]
This Course
• Part I: Streaming Algorithms
• Part II: Sublinear-Time Algorithms
• Part III: Distributed Algorithms
Streaming Algorithms
[Figure: an algorithm with a workspace, reading the data]
Goal: compute $f(\mathit{data})$
… approximately, w.h.p.
Streaming Algorithms
• Useful when:
• Data really is a stream
• Many cases where it’s not
Sublinear-Time Algorithms
[Figure: the algorithm queries a few positions of the input $x \in \{0,1\}^n$]

One Current Example….
Distributed Algorithms
[Figure: the data split into five parts: data1, data2, data3, data4, data5]

Course Goals
• See some cool algorithms and lower bounds
• Get a “feel” for randomized algorithms and probability
Today: a Tasting Menu
• One sublinear-time algorithm
• One streaming algorithm
• One distributed algorithm
Testing List Sortedness in Sublinear Time
[Ergün, Kannan, Kumar, Rubinfeld, Viswanathan ’00]
List Sortedness
• Input: a list $L = (x_1, \dots, x_n)$ of integers
• Output: is $L$ sorted?
For every $i \in \{1, \dots, n-1\}$: $x_i \le x_{i+1}$
• Can’t answer without reading the entire list

• What can we do?
Property Testing
[Figure: the universe of objects, divided into three regions]
• YES: objects with property 𝒫
• ???: objects “close to 𝒫” (need to change at most 𝜖 ⋅ 𝑛 of the object to get 𝒫)
• NO: objects “far from 𝒫”
Property Testing (Formally)
Given $x \in \{0,1\}^n$ and a property $P \subseteq \{0,1\}^n$, distinguish between:
• $x \in P$,
• $x$ is $\epsilon$-far from $P$:
for all $y \in P$ we have $\Delta(x, y) \ge \epsilon \cdot n$, where $\Delta$ = “edit distance”
Back to Sortedness
• “𝜖-close to sorted”?
• Need to change at most 𝜖 ⋅ 𝑛 values to get a sorted list
Naïve Attempt
• Sample $k$ uniformly random indices $i$ and verify $x_i \le x_{i+1}$
• How large should $k$ be?
• Bad example:
$L = (1, 2, \dots, n/2, 1, 2, \dots, n/2)$
• How far from sorted?
• How large $k$?
• What about checking pairs, $x_i \le x_j$, for random $i < j$?

$L = (3, 2, 1, 6, 5, 4, 9, 8, 7, \dots, n, n-1, n-2)$
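The two bad examples can be checked numerically. The sketch below is only an illustration: the helper names and the choice n = 300 are assumptions made here, not part of the lecture. It counts how many adjacent pairs and how many random pairs are out of order, and compares that with the number of values that must change to sort the list (computed as n minus the length of a longest non-decreasing subsequence).

```python
from bisect import bisect_right

def adjacent_violations(lst):
    """Number of indices i with lst[i] > lst[i+1]."""
    return sum(1 for i in range(len(lst) - 1) if lst[i] > lst[i + 1])

def inverted_pairs(lst):
    """Number of pairs i < j with lst[i] > lst[j] (brute force, O(n^2))."""
    n = len(lst)
    return sum(1 for i in range(n) for j in range(i + 1, n) if lst[i] > lst[j])

def changes_needed(lst):
    """Values that must change to sort lst: n minus the length of a
    longest non-decreasing subsequence (patience-sorting technique)."""
    tails = []
    for v in lst:
        pos = bisect_right(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(lst) - len(tails)

n = 300  # hypothetical size, divisible by 2 and by 3
two_runs = list(range(1, n // 2 + 1)) * 2                     # 1,...,n/2,1,...,n/2
blocks = [b + d for b in range(0, n, 3) for d in (3, 2, 1)]   # 3,2,1,6,5,4,...

# Only one adjacent pair is out of order, yet about n/2 values must change.
print(adjacent_violations(two_runs), changes_needed(two_runs))
# Only n of the n(n-1)/2 pairs are inverted, yet 2n/3 values must change.
print(inverted_pairs(blocks), changes_needed(blocks))
```

In both cases a uniformly random check finds a violation with probability only about 1/n, so roughly n samples would be needed even though the list is far from sorted.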
Actual Algorithm
Repeat $k$ times:
• Sample uniform index $i$
• Perform binary search for the value $x_i$
• If binary search ends at a position different from $i$: reject
Finally: accept
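The steps above can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: it assumes the list values are distinct, and the names `search_end_position` and `sortedness_tester` are made up for illustration. A search that never finds the value returns -1, which always counts as ending at a position different from the sampled index.

```python
import random

def search_end_position(lst, target):
    """Run ordinary binary search for target, as if lst were sorted.
    Returns the index where target is found, or -1 if the search fails."""
    lo, hi = 0, len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if lst[mid] == target:
            return mid
        if lst[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def sortedness_tester(lst, k, rng=random):
    """Accept (True) or reject (False) after k random binary-search checks.
    Assumes distinct values (an assumption of this sketch)."""
    for _ in range(k):
        i = rng.randrange(len(lst))
        if search_end_position(lst, lst[i]) != i:
            return False  # reject: found a bad index
    return True  # accept

print(sortedness_tester(list(range(1000)), k=20))         # a sorted list is always accepted
print(sortedness_tester(list(range(1000, 0, -1)), k=20))  # a reversed list is rejected w.h.p.
```

Each iteration reads only O(log n) list entries, so the whole test reads O(k log n) entries.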
Example
value:  3  2  1  6  5  4  9  8  7 12 11 10 15 14 13 16
index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Correctness
• Need to show:
• If 𝐿 is sorted, we accept w.h.p.
• If 𝐿 is 𝜖-far from sorted, we reject w.h.p.
• Say $i$ is a good index if binary search for $x_i$ ends up in position $i$

• Claim: the elements at good indices are sorted!
Proof of Claim
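• Let $i < j$ be good indices
• Let $k$ = last common index in the binary search for $x_i$ and for $x_j$
• At $k$ the searches split: the search for $x_i$ stops or goes left, the search for $x_j$ stops or goes right
• Then: $x_i \le x_k \le x_j$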
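The claim can be checked on the 16-element list from the Example slide. The sketch below assumes distinct values and uses a hypothetical helper `search_end_position`; it lists the good indices and confirms that the elements at those indices appear in sorted order.

```python
def search_end_position(lst, target):
    """Binary search for target as if lst were sorted; -1 if not found."""
    lo, hi = 0, len(lst) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if lst[mid] == target:
            return mid
        if lst[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

example = [3, 2, 1, 6, 5, 4, 9, 8, 7, 12, 11, 10, 15, 14, 13, 16]

# An index i is good if the search for example[i] ends at position i.
good = [i for i, v in enumerate(example) if search_end_position(example, v) == i]
good_values = [example[i] for i in good]

print(good)         # [1, 3, 7, 11, 13, 15]
print(good_values)  # [2, 6, 8, 10, 14, 16]
print(good_values == sorted(good_values))  # True
```

Only 6 of the 16 indices are good here, so a single uniform sample lands on a bad index with probability 10/16.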
Using the Claim
• Suppose $L$ is $\epsilon$-far from sorted
⇒ at most $(1 - \epsilon) n$ good indices in $L$
(otherwise: replace just the bad indices)
⇒ at least $\epsilon n$ bad indices in $L$, so:
$\Pr[\text{sampling a bad index}] \ge \frac{\epsilon n}{n} = \epsilon$
• How many samples needed to find a bad index w.h.p.?
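One way to answer this is the short calculation below. It is a sketch in which $\delta$ (the allowed failure probability) is a symbol introduced here, not in the slides. The $k$ samples are independent, and each one is bad with probability at least $\epsilon$:

```latex
\Pr[\text{all } k \text{ samples are good}]
  \;\le\; (1-\epsilon)^k
  \;\le\; e^{-\epsilon k}
  \;\le\; \delta
\qquad\text{whenever}\qquad
k \;\ge\; \frac{1}{\epsilon}\ln\frac{1}{\delta}.
```

So $k = O(1/\epsilon)$ samples are enough for any constant $\delta$, and since each binary search reads $O(\log n)$ entries, the tester reads $O(\frac{\log n}{\epsilon})$ entries in total.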

Streaming Algorithm for Distinct Elements
[Flajolet, Martin ’84]
Distinct Elements
• $f(a_1, \dots, a_m) = |\{a_1, \dots, a_m\}|$, where $a_1, \dots, a_m \in [n]$
• Naïve solution?
• Claim: can’t do it deterministically with $o(n)$ bits

• Another claim: can’t do it exactly with $o(n)$ bits. But…
• Can get $(1 + \epsilon)$-multiplicative approximation in $O(\log n + \frac{1}{\epsilon^2})$ bits
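One natural naïve solution, sketched below under the assumption that elements are integers in 1..n, keeps one bit per possible element. The function name `count_distinct_exact` is made up for illustration. It is exact and deterministic, but it uses n bits of workspace, which is the cost the claims above say cannot be avoided.

```python
def count_distinct_exact(stream, n):
    """Exact number of distinct elements in a stream over {1, ..., n},
    using a bit vector of n bits plus a counter."""
    seen = bytearray((n + 7) // 8)  # n bits, all initially 0
    count = 0
    for a in stream:
        byte, bit = divmod(a - 1, 8)  # position of element a in the bit vector
        mask = 1 << bit
        if not seen[byte] & mask:
            seen[byte] |= mask  # first time we see a
            count += 1
    return count

print(count_distinct_exact([1, 5, 5, 3, 1, 8], n=8))  # 4
```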
Lower Bound of $\Omega(n)$ for Exact, Deterministic Algorithms
Flajolet-Martin Algorithm
• Choose random hash function $h: \{1, \dots, n\} \to \{1, \dots, n\}$
• Define sequence of events over $\{1, \dots, n\}$, of decreasing probability:
Event $A_0 = \{1, \dots, n\}$
Event $A_1$
Event $A_2$
Event $A_3$
Use the smallest event that occurred to estimate the number of elements
• Sequence of events $A_0 \supseteq \dots \supseteq A_{\log n}$
• Event $A_i$: the binary encoding of the number ends with at least $i$ zeroes
Mentimeter Experiment
• Let $m = 0$
• Let $h: \{1, \dots, n\} \to \{1, \dots, n\}$ be a random hash function*
• To process $a_i$:
  • Let $z$ = number of trailing zeroes in binary representation of $h(a_i)$
  • $m \leftarrow \max(m, z)$
• Output: $2^m$
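The procedure above can be sketched in Python. Several details here are assumptions of this sketch rather than part of the slides: the idealized random hash function is replaced by a function $x \mapsto (ax + b) \bmod Q$ with random $a, b$ and the fixed prime $Q = 2^{61} - 1$, stream elements are assumed to be positive integers smaller than $Q$, and the hash value is shifted by 1 so that it is never 0.

```python
import random

Q = (1 << 61) - 1  # a Mersenne prime; assumed larger than every stream element

def trailing_zeros(v):
    """Number of trailing zero bits in the binary representation of v > 0."""
    return (v & -v).bit_length() - 1

def flajolet_martin(stream, rng=random):
    """Estimate the number of distinct elements as 2^m, where m is the
    largest number of trailing zeroes seen among the hash values."""
    a = rng.randrange(Q)  # random hash parameters (assumption: affine hash mod Q)
    b = rng.randrange(Q)
    m = 0
    for x in stream:
        h = (a * x + b) % Q + 1  # shift into 1..Q so h is never 0
        m = max(m, trailing_zeros(h))
    return 2 ** m

rng = random.Random(2024)
stream = [rng.randrange(1, 10_001) for _ in range(50_000)]
print(len(set(stream)), flajolet_martin(stream, rng))  # exact count vs. estimate
```

Repeated elements hash to the same value, so they never change $m$. The workspace is just $a$, $b$ and $m$, and the output is always a power of two, so a single run can be off by a constant factor.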
Analysis of Flajolet-Martin
Space Complexity
The Hash Function
• Pairwise-independence: for every distinct $x_1, x_2 \in [n]$ and every $y_1, y_2 \in [n]$,
$\Pr_h[h(x_1) = y_1 \wedge h(x_2) = y_2] = \frac{1}{n^2}$
• Example: for every prime $q$, the family
$\mathcal{H} = \{ h_{a,b}(x) = ax + b \bmod q : a, b \in \mathbb{F}_q \}$
is pairwise-independent.
• Representing $h_{a,b}$?
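For a small $q$ the pairwise-independence of this family can be verified by brute force. The sketch below uses illustrative names (`make_hash`, `is_pairwise_independent`) that are not from the lecture. For every pair of distinct inputs it checks that each pair of outputs is produced by exactly one choice of $(a, b)$, which means probability exactly $1/q^2$ over a uniform choice of $h_{a,b}$.

```python
from itertools import product

def make_hash(a, b, q):
    """The function h_{a,b}(x) = (a*x + b) mod q."""
    return lambda x: (a * x + b) % q

def is_pairwise_independent(q):
    """Brute-force check over the whole family {h_{a,b} : a, b in 0..q-1}."""
    for x1, x2 in product(range(q), repeat=2):
        if x1 == x2:
            continue
        counts = {}
        for a, b in product(range(q), repeat=2):
            h = make_hash(a, b, q)
            key = (h(x1), h(x2))
            counts[key] = counts.get(key, 0) + 1
        # All q*q output pairs must occur, each for exactly one (a, b).
        if len(counts) != q * q or any(c != 1 for c in counts.values()):
            return False
    return True

print(is_pairwise_independent(5))  # True: 5 is prime
print(is_pairwise_independent(4))  # False: 4 is not prime
```

As for representation, a function $h_{a,b}$ is fully described by the pair $(a, b)$, so storing it takes about $2 \log_2 q$ bits.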
Improving the Accuracy
• Result must be of the form $2^i$
• High variance
• How to improve?
Distributed Algorithm for All-Pairs Shortest Paths
