
Declaration

I hereby declare that the work being presented in the project report titled “Implementation of
CluStream Algorithm for Clustering of Data Streams” in partial fulfillment of the requirement
for the award of the degree of Integrated Dual Degree in Computer Science & Engineering,
submitted in the Department of Electronics and Computer Engineering, Indian Institute of
Technology Roorkee, is an authentic record of my own work carried out under the guidance of
Dr. Durga Toshniwal, Assistant Professor, Department of Electronics and Computer Engineering,
Indian Institute of Technology Roorkee.

I have not submitted the matter embodied in this project report for the award of any other degree.

Dated: 20/10/2010

Place: IIT Roorkee

(Parthsarthi Mishra)


Certificate

This is to certify that the declaration made by the candidate is correct to the best of my knowledge and belief.

Date: 20/10/2010

Place: IIT Roorkee

(Dr. Durga Toshniwal)
Department of Electronics and Computer Engineering,
IIT Roorkee

Abstract
Clustering is a challenging problem in the data stream domain. Traditional clustering algorithms
assume that the entire data set is available in main memory, which makes them inefficient for data
arriving as a continuous stream. The existing one-pass algorithms developed for this problem do not
address two issues: (1) the evolution of the data over time and (2) the discovery of clusters over
different portions of the stream.

CluStream is designed to address both problems by dividing the clustering task into an online
component and an offline component. The online component periodically stores detailed summary
statistics of the stream. The offline component uses these summary statistics to answer
user-specified queries about the clusters over various time horizons. The concepts of the
pyramidal time frame and micro-clustering are used to store these statistics efficiently.
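The two storage ideas named above can be illustrated with a minimal Python sketch. It assumes the micro-cluster layout described in the original CluStream paper (per-dimension linear and squared sums of the points, linear and squared sums of their arrival timestamps, and a point count), and a pyramidal time frame in which a snapshot taken at clock time t is of order i whenever t is divisible by α^i, with only the last α^l + 1 snapshots of each order retained. The class names, method names, the parameters `alpha` and `l`, and the toy stream are hypothetical choices for illustration, not code from this project.

```python
import math
from collections import deque


class MicroCluster:
    """Micro-cluster summary: linear and squared sums of the points (per dimension)
    and of their arrival timestamps, plus the point count. Points are not stored."""

    def __init__(self, dim):
        self.cf1x = [0.0] * dim  # per-dimension sum of values
        self.cf2x = [0.0] * dim  # per-dimension sum of squared values
        self.cf1t = 0.0          # sum of timestamps
        self.cf2t = 0.0          # sum of squared timestamps
        self.n = 0               # number of points

    def absorb(self, point, t):
        """Add one d-dimensional point that arrived at clock time t."""
        for j, x in enumerate(point):
            self.cf1x[j] += x
            self.cf2x[j] += x * x
        self.cf1t += t
        self.cf2t += t * t
        self.n += 1

    def _combine(self, other, sign):
        out = MicroCluster(len(self.cf1x))
        out.cf1x = [a + sign * b for a, b in zip(self.cf1x, other.cf1x)]
        out.cf2x = [a + sign * b for a, b in zip(self.cf2x, other.cf2x)]
        out.cf1t = self.cf1t + sign * other.cf1t
        out.cf2t = self.cf2t + sign * other.cf2t
        out.n = self.n + sign * other.n
        return out

    def merge(self, other):
        """Additivity: summary of the union of two disjoint point sets."""
        return self._combine(other, 1)

    def subtract(self, older):
        """Subtractivity: summary of the points absorbed after an older snapshot."""
        return self._combine(older, -1)

    def centroid(self):
        return [s / self.n for s in self.cf1x]

    def rms_radius(self):
        """Root-mean-square distance of the points from the centroid."""
        var = sum(s2 / self.n - (s1 / self.n) ** 2
                  for s1, s2 in zip(self.cf1x, self.cf2x))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors


class PyramidalTimeFrame:
    """Tracks snapshot times: time t is an order-i snapshot when alpha**i divides t,
    and only the last alpha**l + 1 snapshots of each order are retained."""

    def __init__(self, alpha=2, l=1):
        assert alpha >= 2, "alpha must be an integer >= 2"
        self.alpha = alpha
        self.capacity = alpha ** l + 1
        self.orders = {}  # order i -> deque of snapshot times, oldest first

    def record(self, t):
        """Register a snapshot at positive integer clock time t."""
        i = 0
        # Divisibility by alpha**i implies divisibility by all lower powers,
        # so the orders of t form a prefix 0, 1, ..., k and we can stop early.
        while self.alpha ** i <= t and t % (self.alpha ** i) == 0:
            slot = self.orders.setdefault(i, deque(maxlen=self.capacity))
            slot.append(t)  # a full deque drops its oldest time automatically
            i += 1

    def stored_times(self):
        """Distinct times still referenced by some order (each kept only once)."""
        return sorted({s for slot in self.orders.values() for s in slot})


# Usage on a hypothetical stream: one 2-D point [t, 1.0] per clock tick.
frame = PyramidalTimeFrame(alpha=2, l=2)
current = MicroCluster(dim=2)
snapshots = {}
for t in range(1, 56):
    current.absorb([float(t), 1.0], t)
    frame.record(t)
    snapshots[t] = current.merge(MicroCluster(2))  # merging with an empty cluster copies it
    kept = set(frame.stored_times())
    snapshots = {s: snap for s, snap in snapshots.items() if s in kept}

# Points that arrived in the horizon (48, 55]: current snapshot minus the one at time 48.
horizon = snapshots[55].subtract(snapshots[48])
print(horizon.n, horizon.centroid(), horizon.rms_radius())
```

Because the summaries are additive and subtractive, a query over a horizon h needs only the current statistics and one stored snapshot close to t_c - h, and the pyramidal scheme keeps the number of stored snapshots roughly proportional to (α^l + 1)·log_α(T) after T clock ticks. For brevity the sketch tracks a single micro-cluster; an online component would maintain one such summary per micro-cluster in each snapshot.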
