Clustering and classification in Information Retrieval: from standard techniques towards the state of the art

Published by Vincenzo Russo
This document is an overview of clustering and classification techniques in the Information Retrieval (IR) application domain. The first part covers classical, well-established techniques for both clustering and classification in information retrieval. The second part covers the most recent developments in machine learning applied to document mining. For every technique, we cite experiments found in the most important literature.

Published by: Vincenzo Russo on Sep 08, 2011
Copyright:Attribution Non-commercial

Clustering and classification in Information Retrieval: from standard techniques towards the state of the art
Vincenzo Russo (vincenzo.russo@neminis.org)
Department of Physics
Università degli Studi di Napoli “Federico II”
Complesso Universitario di Monte Sant’Angelo
Via Cinthia, I-80126 Naples, Italy
September 2008
Technical Report TR-9-2008 – SoLCo Project
Abstract
This document is an overview of clustering and classification techniques in the Information Retrieval (IR) application domain. The first part covers classical, well-established techniques for both clustering and classification in information retrieval. The second part covers the most recent developments in machine learning applied to document mining. For every technique, we cite experiments found in the most important literature.
 
Contents
2.1 Bag of words model
  2.1.1 Stop words
  2.1.2 Stemming
  2.1.3 Lemmatization
2.2 The vector space model
2.3 The vector space model in the information retrieval
  2.3.1 Term frequency and weighting
3.1 Problem statement
3.2 Clustering in Information Retrieval
3.3 Hierarchical approaches
4.1 Problem statement
4.2 Classification in Information Retrieval
4.3 Hierarchical approaches
5.1 Quality of the classification results
5.2 Quality of the clustering results
7.1 Flat clustering
7.2 Hierarchical clustering
7.3 Proposed state-of-the-art techniques
  7.3.1 Bregman Co-clustering
  7.3.2 Support Vector Clustering
7.4 Proposed state-of-the-art techniques: computational complexity
  7.4.1 Bregman Co-clustering
  7.4.2 Support Vector Clustering
7.5 Proposed state-of-the-art techniques: experimental results
  7.5.1 Bregman Co-clustering
  7.5.2 Support Vector Clustering
8.1 Support Vector Machines
8.2 Proposed state-of-the-art technique
8.3 Proposed state-of-the-art technique: computational complexity
8.4 Proposed state-of-the-art technique: experimental results
