
LUCENE Demo

By Swapnil Chaudhari

Lucene Features

Field-based indexing and searching
Different fields of a webpage are:
    Title
    URL
    Anchor text
    Content, etc.
Different boost factors to give importance to fields
Uses an inverted index to store the content of crawled documents
Open-source Apache project
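To illustrate how per-field boost factors change the importance of a match, here is a minimal sketch in plain Java. It does not use Lucene and is not Lucene's actual scoring formula: the class BoostedFieldScore, its methods, and the "boost times term count" rule are all assumptions made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of field boosts; not part of Lucene. */
class BoostedFieldScore {

    /** Counts how often a term occurs among the whitespace-separated, lowercased tokens of a text. */
    static int termFrequency(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /**
     * Simplified score: sum over fields of (boost * term count).
     * Fields without an explicit boost get a boost of 1.0.
     */
    static double score(Map<String, String> docFields, Map<String, Double> boosts, String term) {
        double total = 0.0;
        for (Map.Entry<String, String> field : docFields.entrySet()) {
            double boost = boosts.getOrDefault(field.getKey(), 1.0);
            total += boost * termFrequency(field.getValue(), term);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("title", "mango farming");
        page.put("content", "mango trees need sun and mango trees need water");

        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("title", 3.0); // a match in the title counts three times as much

        // title: 3.0 * 1 occurrence, content: 1.0 * 2 occurrences
        System.out.println(score(page, boosts, "mango")); // prints 5.0
    }
}
```

With the title boost set to 3.0, a single title match outweighs the two matches in the content field, which is the effect a boost factor is meant to have.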

Lucene Index

Concepts

Index: sequence of documents (a.k.a. Directory)
Document: sequence of fields
Field: named sequence of terms
Term: a text string (e.g., a word)

Statistics

Term frequencies and positions
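The concepts above (an index of documents, documents made of named fields, fields made of terms, and per-term frequencies and positions) can be sketched as a toy in-memory inverted index. This is plain Java, not Lucene's API: the class ToyIndex, its method names, and the naive lowercase-and-split tokenizer are assumptions chosen only to make the data structure concrete.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy inverted index; it only illustrates the concepts, not Lucene's implementation. */
class ToyIndex {
    // "field:term" -> (document id -> positions of the term within that field)
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();
    private int docCount = 0;

    /** Adds a document given as field name -> raw text and returns its document id. */
    int addDocument(Map<String, String> fields) {
        int docId = docCount++;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Naive analysis: lowercase, then split on anything that is not a letter or digit.
            String[] tokens = field.getValue().toLowerCase().split("[^a-z0-9]+");
            int position = 0;
            for (String token : tokens) {
                if (token.isEmpty()) {
                    continue;
                }
                String key = field.getKey() + ":" + token;
                postings.computeIfAbsent(key, k -> new TreeMap<>())
                        .computeIfAbsent(docId, d -> new ArrayList<>())
                        .add(position++);
            }
        }
        return docId;
    }

    /** Positions of a term in one field of one document (empty list if absent). */
    List<Integer> positions(String field, String term, int docId) {
        Map<Integer, List<Integer>> docs = postings.get(field + ":" + term);
        if (docs == null || !docs.containsKey(docId)) {
            return Collections.emptyList();
        }
        return docs.get(docId);
    }

    /** Term frequency: how many times the term occurs in that field of that document. */
    int termFrequency(String field, String term, int docId) {
        return positions(field, term, docId).size();
    }

    /** Document frequency: how many documents contain the term in that field. */
    int docFrequency(String field, String term) {
        Map<Integer, List<Integer>> docs = postings.get(field + ":" + term);
        return docs == null ? 0 : docs.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Mango plant");
        doc.put("contents", "The mango is a fruit. A mango plant grows slowly.");
        int id = index.addDocument(doc);

        System.out.println(index.positions("contents", "mango", id));     // prints [1, 6]
        System.out.println(index.termFrequency("contents", "mango", id)); // prints 2
    }
}
```

Keying the postings by field and term together is what makes field-based search possible: a lookup for contents:mango never touches the occurrences of mango in the title field.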

Installation Procedure

Download the Lucene 3.5 source from

http://archive.apache.org/dist/lucene/java/3.5.0/

Extract the archive.
Copy the folder lucene-3.5.0/contrib/demo to lucene-3.5.0/src.
Create a new Java project in Eclipse using the extracted Lucene source.
Configure the Build Path of the project to display only the src/java and src/demo/src/java packages.

Local File Indexing using Lucene


Create a folder, say documents.
Put all the files you want to index in this folder.
Run IndexFiles.java in the demo/src/java package with the parameters

-index <path of folder where the index needs to be created> -docs <full path of folder containing documents>

Verifying Index in Luke-All

Download luke-all from

http://code.google.com/p/luke/downloads/detail?name=lukeall-3.5.0.jar&

Run the command

java -jar lukeall-3.5.0.jar -index <path of index>

to open the index in luke-all.
The number of documents indexed should be non-zero.

Searching Index in Lucene

Run SearchFiles.java in the demo/src/java package with the parameters

-index <full path of index created by IndexFiles.java>

Enter the query in plain text.
All the documents containing the query terms will be retrieved.

Query Formulation

Query term occurrence types

+ (plus prefix): Must
no prefix: Should
- (minus prefix): Must not

E.g., the query +contents:mango contents:plant will retrieve all documents that contain the term mango and may or may not contain the term plant.
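The Must / Should / Must not rules can be sketched as a small matcher in plain Java. This is not Lucene's query parser: the class ToyBooleanQuery and its matches method are hypothetical, a document is reduced to a set of "field:term" strings, and the rule that a query without any Must clause needs at least one matching Should clause is an assumption about the intended semantics.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical matcher for +Must / Should / -Must not clauses; not Lucene's implementation. */
class ToyBooleanQuery {

    /**
     * Returns true if a document, given as a set of "field:term" strings,
     * satisfies a whitespace-separated query such as "+contents:mango contents:plant".
     */
    static boolean matches(String query, Set<String> docTerms) {
        boolean hasMust = false;
        boolean shouldMatched = false;
        for (String clause : query.trim().split("\\s+")) {
            if (clause.isEmpty()) {
                continue;
            }
            if (clause.startsWith("+")) {
                // Must: the document is rejected if the term is missing.
                hasMust = true;
                if (!docTerms.contains(clause.substring(1))) {
                    return false;
                }
            } else if (clause.startsWith("-")) {
                // Must not: the document is rejected if the term is present.
                if (docTerms.contains(clause.substring(1))) {
                    return false;
                }
            } else if (docTerms.contains(clause)) {
                // Should: optional, but remembered in case there is no Must clause.
                shouldMatched = true;
            }
        }
        // Assumption: with a Must clause, Should clauses are optional;
        // without one, at least one Should clause has to match.
        return hasMust || shouldMatched;
    }

    public static void main(String[] args) {
        Set<String> onlyMango = new HashSet<>(Arrays.asList("contents:mango"));
        Set<String> onlyPlant = new HashSet<>(Arrays.asList("contents:plant"));
        String query = "+contents:mango contents:plant";

        System.out.println(matches(query, onlyMango)); // prints true
        System.out.println(matches(query, onlyPlant)); // prints false
    }
}
```

In this sketch a Should clause only decides whether a document matches when no Must clause is present; in a real search engine it would typically also raise the score of documents that contain the term.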