Streaming Algorithm Count-Min Sketch

The Count-min sketch is a probabilistic data structure that uses multiple hash functions to map item counts onto a matrix of counters, in order to track how often each element occurs in a dataset. It can overcount frequencies because of hash collisions; using more hash functions lowers the chance that every counter for an item is inflated. Applications include compressed sensing, networking, NLP, stream processing, and frequency tracking. The Count-min sketch estimates how many times an item occurred, but it may overestimate when different items map to the same index. Increasing the size of the matrix reduces overcounting and improves the accuracy of frequency estimates.

Uploaded by Sridevi Unnikrishnan

Count-min sketch
• The Count-min sketch was proposed by Graham Cormode and
S. Muthukrishnan; the original paper appeared in 2004, and their
paper "Approximating Data with the Count-Min Sketch" was published
in 2011.
• The Count-min sketch is a probabilistic data structure: a simple
technique for summarizing large amounts of frequency data in a small,
fixed amount of memory.
• The Count-min sketch keeps track of counts, i.e., how many times each
element has appeared in a stream (or multiset).
• Exact counts could easily be kept in Java with a HashMap or Hashtable,
but that needs one entry per distinct element; the sketch gives up
exactness to keep its memory use fixed.
• It uses multiple hash functions, one per row, to map each element onto
one counter in every row of a matrix of counters.
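In the worked example, the four counters for the queried element ended up holding 3, 2, 2,
and 2. Collisions can only add to a counter, so the smallest counter is the least inflated;
take the minimum of these entries as the result. Here min(3, 2, 2, 2) = 2, so the queried
element is estimated to have appeared two times in the input list.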
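The update and query steps described above can be sketched in Python. This is a minimal sketch under stated assumptions: each row's hash function is hashlib.blake2b salted with the row number (a simple stand-in, not the hash family analysed in the original paper), and the class CountMinSketch with its add and estimate methods is a hypothetical name chosen for illustration, not a library API. collections.Counter plays the role of the Java HashMap as an exact baseline.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Hypothetical illustration: a depth x width matrix of counters, one hash per row."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # Assumption: salting blake2b with the row number gives each row its own hash function.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: str) -> int:
        # Each counter is >= the true count, so the minimum is the tightest estimate.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))


stream = ["a", "b", "a", "c", "a", "b", "d"]
sketch = CountMinSketch(width=8, depth=4)
exact = Counter(stream)  # exact baseline, one entry per distinct item
for item in stream:
    sketch.add(item)

for item in sorted(exact):
    print(item, exact[item], sketch.estimate(item))
```

Every estimate is at least the exact count from the Counter; with only 8 columns, some items may collide in every row and be overestimated.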
Issue with Count-min sketch and its solution:
• If two or more different elements hash to the same cell, they all increment the same
counter, so that counter is inflated by the hash collision.
• The Count-min sketch therefore overcounts (it never undercounts) because of hash
collisions. The more hash functions (rows) we use, the less likely it is that the queried
element collides with other elements in every row, so the minimum is less likely to be inflated.
• With fewer hash functions there is a higher probability that every counter for an element is
inflated. Hence it is usually recommended to use several hash functions, while widening each
row is what makes individual collisions rarer.
• Applications of Count-min sketch:
• Compressed Sensing
• Networking
• NLP
• Stream Processing
• Frequency tracking
• Extension: Heavy-hitters
• Extension: Range-query
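The claim in the collision bullets that more hash functions reduce overcounting can be checked with a small experiment. The code is illustrative only: the helper names bucket, build, and estimate are hypothetical, and row r always uses the same row-salted blake2b hash, so a depth-5 sketch contains the depth-1 sketch as its first row. Because the estimate is a minimum over rows, adding rows can only keep or lower it, and it never drops below the true count.

```python
import hashlib
from collections import Counter


def bucket(row: int, item: str, width: int) -> int:
    # Row-salted hash (an assumption): row r always uses the same hash function,
    # so a deeper sketch reuses the rows of a shallower one.
    digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % width


def build(stream, width: int, depth: int):
    # Add every item to one counter per row.
    table = [[0] * width for _ in range(depth)]
    for item in stream:
        for row in range(depth):
            table[row][bucket(row, item, width)] += 1
    return table


def estimate(table, item: str, width: int) -> int:
    # Point query: minimum over all rows of the item's counter.
    return min(counts[bucket(r, item, width)] for r, counts in enumerate(table))


# 40 distinct videos squeezed into 4 columns: collisions are unavoidable.
stream = [f"video-{i % 40}" for i in range(200)]
exact = Counter(stream)
width = 4
shallow = build(stream, width, depth=1)
deep = build(stream, width, depth=5)

for item in sorted(exact)[:5]:
    print(item, exact[item], estimate(shallow, item, width), estimate(deep, item, width))
```

With only 4 columns for 40 distinct videos, some videos are still overestimated even by the deeper sketch: extra rows lower the chance that every row is inflated, while extra columns are what make individual collisions rarer.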
Count-Min Sketches — Approximately How
Many Times did People Watch Adele’s “Hello”?
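• Let’s take an example. Suppose we have a 3-column, 4-row
Count-Min Sketch, where each of the three hash functions owns one
column and maps a video to one of the four rows, so a cell is written
(row, column). The very first user on your platform decides
to watch “History of Japan”.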
• When that video’s id is hashed by the three hashes, we get 1, 2,
and 1, indicating that we should increment the counters at (1,1),
(2,2), and (1,3).
• Now suppose that another user watches “Hello” by Adele, which
hashes to 1, 1, and 4.
• Following that, yet another user watches “History of Japan”.
• Now we want to know how many times users watched “Hello”.
• “Hello” hashes to 1, 1, and 4, so we should examine the counts at
the cells (1,1), (1,2), and (4,3). These counts are 3, 1, and 1
respectively, and their minimum, 1, is the estimate.
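The “Hello” walk-through above can be replayed in a few lines of Python. The hash values are copied from the example and stored in a hypothetical HASHES lookup table instead of being computed by real hash functions; cells use the same 1-indexed (row, column) coordinates as the text.

```python
# Hypothetical lookup table: the example's hash values, one per hash function
# (column), each selecting a row 1..4. These stand in for real hash functions.
HASHES = {
    "History of Japan": (1, 2, 1),
    "Hello": (1, 1, 4),
}
ROWS, COLUMNS = 4, 3

# table[row - 1][col - 1] holds the counter at cell (row, col).
table = [[0] * COLUMNS for _ in range(ROWS)]


def watch(video: str) -> None:
    # Increment one cell per hash function (column).
    for col, row in enumerate(HASHES[video], start=1):
        table[row - 1][col - 1] += 1


def counters(video: str) -> list[int]:
    # Read back the video's cell in every column.
    return [table[row - 1][col - 1] for col, row in enumerate(HASHES[video], start=1)]


for video in ["History of Japan", "Hello", "History of Japan"]:
    watch(video)

print(counters("Hello"))       # [3, 1, 1]
print(min(counters("Hello")))  # 1
```

The first counter is inflated to 3 by the collision with “History of Japan”, but the minimum still recovers the true count of 1.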

• “Hello” was watched only once, and yet, in the Count-Min Sketch
above, one of the counters says “Hello” was watched 3 times!
• This is because “Hello” and “Japan” collide on the first hash
function, so any views of “Japan”, which was watched twice, will
also count towards “Hello”.
• Essentially, the Count-Min Sketch doubles up, allowing
multiple events to share the same counter in order to save
space.
• The more cells in the table, the more counters we store in
memory, so the less doubling-up will occur, and the more
accurate our counters will be.
• In a second sequence of views, “Hello” was never watched, yet each
of the videos that was watched collided with “Hello” in at least one
hash function, and together they incremented every one of
“Hello”’s counters.
• As a result, when we query the view count for Adele’s “Hello”,
we are incorrectly told that it has been watched once.
• We can reduce both the probability and the magnitude of such errors
by using a larger table.
• The Count-Min Sketch is therefore best suited to cases where the goal
is to understand general trends rather than to make precise
measurements.
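The observation that a larger table reduces both the probability and the magnitude of errors is made precise by the standard Count-Min analysis: with width w = ⌈e/ε⌉ and depth d = ⌈ln(1/δ)⌉, every estimate is at least the true count and, with probability at least 1 - δ, exceeds it by at most ε·N, where N is the total number of updates. The width therefore controls the magnitude of the error and the depth controls its probability. The helper cms_dimensions below is a hypothetical name that simply evaluates these two formulas.

```python
import math


def cms_dimensions(epsilon: float, delta: float) -> tuple[int, int]:
    """Hypothetical helper: width and depth from the standard Count-Min bounds.

    With width = ceil(e / epsilon) and depth = ceil(ln(1 / delta)), each estimate
    exceeds the true count by at most epsilon * N (N = total of all updates)
    with probability at least 1 - delta.
    """
    width = math.ceil(math.e / epsilon)      # controls the error magnitude
    depth = math.ceil(math.log(1 / delta))   # controls the failure probability
    return width, depth


for epsilon, delta in [(0.01, 0.01), (0.001, 0.01), (0.01, 0.0001)]:
    width, depth = cms_dimensions(epsilon, delta)
    print(f"epsilon={epsilon}, delta={delta}: {width} x {depth} = {width * depth} counters")
```

Shrinking ε by a factor of 10 multiplies the width by about 10, whereas going from δ = 0.01 to δ = 0.0001 only doubles the depth, because the depth grows logarithmically in 1/δ.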
