

By Swapnil Chaudhari

Lucene Features

Field-based indexing and searching
Different fields of a webpage are
    Title
    URL
    Anchor text
    Content, etc.
Different boost factors to give importance to fields
Uses inverted index to store content of crawled documents
Open source Apache project

Lucene Index


Index: sequence of documents (a.k.a. Directory)
Document: sequence of fields
Field: named sequence of terms
Term: a text string (e.g., a word)
Term frequencies and positions
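The structure above can be sketched as a toy inverted index for a single field. This is an illustration only, not Lucene's API or on-disk format: the class ToyInvertedIndex and its methods are hypothetical names, and tokenization is assumed to be a simple lowercase split on whitespace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index for one field: term -> (document id -> positions).
// Hypothetical illustration; not Lucene's API or storage format.
class ToyInvertedIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Lowercases the text, splits it on whitespace, and records the
    // position of every term. Returns the id assigned to the document.
    int addDocument(String text) {
        int docId = nextDocId++;
        String[] terms = text.trim().toLowerCase().split("\\s+");
        for (int pos = 0; pos < terms.length; pos++) {
            if (terms[pos].isEmpty()) {
                continue; // empty input yields one empty token; skip it
            }
            postings.computeIfAbsent(terms[pos], t -> new LinkedHashMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
        return docId;
    }

    // Positions of the term inside the given document (empty if absent).
    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> docs = postings.get(term);
        if (docs == null || !docs.containsKey(docId)) {
            return new ArrayList<>();
        }
        return docs.get(docId);
    }

    // Term frequency: how many times the term occurs in the document.
    int termFrequency(String term, int docId) {
        return positions(term, docId).size();
    }

    // Ids of all documents containing the term, in insertion order.
    List<Integer> documentsContaining(String term) {
        Map<Integer, List<Integer>> docs = postings.get(term);
        return docs == null ? new ArrayList<>() : new ArrayList<>(docs.keySet());
    }
}
```

Looking up a term goes straight to the list of documents that contain it, which is why an inverted index suits keyword search. The stored positions are what make phrase and proximity matching possible.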


Installation Procedure

Download lucene 3.5 src from

Extract the archive.
Copy the folder lucene-3.5.0/contrib/demo to lucene-3.5.0/src.
Create a new Java project in Eclipse using the extracted Lucene source.
Configure the Build Path of the project to display only the src/java and src/demo/src/java packages.

Local File Indexing using Lucene

Create a folder, say documents.
Put all the files you want to index in this folder.
Run in demo/src/java package with parameters

-index <path of folder where index needs to be created> -docs <full path of folder containing documents>

Verifying Index in Luke-All

Download luke-all from name=lukeall-3.5.0.jar&

Run the command java -jar lukeall-3.5.0.jar -index <path of index> to open the index in luke-all.
The number of documents indexed should be nonzero.

Searching Index in lucene

Run in demo/src/java package with parameters

-index <full path of index created by

Enter the query in plain text.
All the documents containing the query terms will be retrieved.

Query Formulation

Query term occurrence types

+ Must
(no prefix) Should
- Must not

E.g. The query +contents:mango contents:plant will retrieve all documents that contain mango and may or may not contain the term plant.
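The occurrence types can be sketched as a small matcher over a single field. This is a simplified illustration, not Lucene's query parser: ToyBooleanMatch and matches are hypothetical names, field prefixes such as contents: are left out, and scoring is ignored (in Lucene, a matching "should" term also raises the document's score).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy evaluation of must (+), should (no prefix) and must-not (-) clauses.
// Hypothetical illustration for a single field; not Lucene's query parser.
class ToyBooleanMatch {
    // Returns true if the document text satisfies a query written as
    // whitespace-separated clauses such as "+mango plant -seed".
    static boolean matches(String query, String documentText) {
        Set<String> docTerms = new HashSet<>(
                Arrays.asList(documentText.toLowerCase().trim().split("\\s+")));
        boolean hasMust = false;
        boolean anyShouldHit = false;
        for (String clause : query.toLowerCase().trim().split("\\s+")) {
            if (clause.isEmpty()) {
                continue;
            }
            if (clause.startsWith("+")) {
                hasMust = true;
                // A missing must term rejects the document.
                if (!docTerms.contains(clause.substring(1))) {
                    return false;
                }
            } else if (clause.startsWith("-")) {
                // A present must-not term rejects the document.
                if (docTerms.contains(clause.substring(1))) {
                    return false;
                }
            } else if (docTerms.contains(clause)) {
                anyShouldHit = true;
            }
        }
        // With at least one must clause, should clauses are optional;
        // without any, at least one should clause has to match.
        return hasMust || anyShouldHit;
    }
}
```

With this sketch, matches("+mango plant", "ripe mango") is true because the must term is present, while matches("+mango plant", "green plant") is false because mango is missing.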