Webinar Intro To Solr
Consulting Subscriptions
Training
Certified
Search Distributions
Customers
Building Better, Faster, Less Costly Search Applications
Health Checks
Best Practices
Scalability
823 billion documents searched by Lucene at MySpace.com
Performance
Real time: LinkedIn search covers 48 million members, adding one new member (with new content) per second
Relevancy
Open source APIs deliver better customization and the ability to fine-tune results
Economics
5-8x reduction in server footprint over commercial search
No vendor lock-in lowers lifecycle costs
Relevance
Vector Space Model (VSM) for relevance
Common across many search engines
Apache Lucene is a highly optimized implementation of the VSM
Indexing
Finds and maps terms and documents
Conceptually similar to a book index
At the heart of fast search/retrieve
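To make the two ideas above concrete, here is a minimal sketch of a book-style inverted index plus TF-IDF cosine scoring. It is a toy illustration of the Vector Space Model, not Lucene's actual scoring formula; the function names, the whitespace tokenizer, and the weighting scheme are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()

def build_index(docs):
    """Map each term to the set of doc ids that contain it, like a book index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def tfidf_vector(text, index, num_docs):
    """Weight each known term by term frequency times a simple IDF."""
    vec = {}
    for term, tf in Counter(tokenize(text)).items():
        df = len(index.get(term, ()))
        if df:
            vec[term] = tf * math.log(1 + num_docs / df)
    return vec

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def search(query, docs, index):
    """Return (score, doc_id) pairs, best match first."""
    qvec = tfidf_vector(query, index, len(docs))
    # The index narrows scoring to documents sharing at least one query term.
    candidates = set().union(*(index.get(t, set()) for t in tokenize(query)))
    scored = [(cosine(qvec, tfidf_vector(docs[d], index, len(docs))), d) for d in candidates]
    return sorted(scored, reverse=True)
```

The index lookup is what keeps retrieval fast: only documents containing a query term are ever scored.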
Solr Config
Define low-level Lucene controls
Specify how clients interact with Solr via Request Handlers (“mini servlets”)
Configure highlighting, spell checking, admin, etc.
Get to know your content
Get to know your users
Collection/Aggregate
Examine collection level stats, like:
MIME Types
Number of Docs
Update rates
Languages present
Much, much more
Look for patterns and relationships
Identify helpful resources
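A minimal sketch of gathering some of these collection-level stats, assuming the documents are already available as Python dicts. The keys mime_type and language are hypothetical field names chosen for the example, and update rates are left out because they need timestamps the sketch does not model.

```python
from collections import Counter

def collection_stats(docs):
    """Summarize a collection given as a list of dicts (hypothetical keys)."""
    return {
        "num_docs": len(docs),
        # Documents missing a field are counted under "unknown".
        "mime_types": Counter(d.get("mime_type", "unknown") for d in docs),
        "languages": Counter(d.get("language", "unknown") for d in docs),
    }
```

Skewed counts (for example, one MIME type dominating) are the kind of pattern worth noticing before designing a schema.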
© 2010 Lucid Imagination, Inc. 21
Modeling your Content
Randomly sample a set of your documents
Look for:
Common structures like titles, tables, columns, etc.
Important metadata
Tokenization issues
Try them out at http://localhost:8983/solr/admin/analysis.jsp
Importance Indicators
May also look at paragraph, sentence, word and character issues
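One way to start the sampling step is sketched below, assuming the documents fit in a Python list. The helper names are hypothetical, and the "odd token" rule (letters mixed with digits or punctuation) is just one rough heuristic for spotting likely tokenization issues.

```python
import random
import re

def sample_docs(docs, k, seed=0):
    """Draw up to k documents at random; a fixed seed keeps the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(docs, min(k, len(docs)))

def odd_tokens(text):
    """Return whitespace tokens that mix letters with digits or punctuation."""
    return [
        t for t in text.split()
        if re.search(r"[A-Za-z]", t) and re.search(r"[^A-Za-z]", t)
    ]
```

Tokens flagged this way (model numbers, hyphenated names) are good candidates to paste into the analysis page.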
HTTP support by definition:
http://localhost:8983/solr/select/?q=*:*&fl=score,id
http://localhost:8983/solr/select/?q=name:iPod&fl=score,id
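Because the interface is plain HTTP, a client only needs to build a URL. The sketch below assembles the same kind of select request with Python's standard library; the helper name select_url is an assumption, and the base URL is the local example server shown above. Note that urlencode percent-escapes characters such as ":" and ",", which is equivalent to the literal form.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select/"

def select_url(query, fields=("score", "id")):
    """Build a select URL with a query (q) and a field list (fl)."""
    params = {"q": query, "fl": ",".join(fields)}
    return SOLR_SELECT + "?" + urlencode(params)
```

The resulting URL can be opened in a browser or fetched with any HTTP client.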
Common Techniques
Analysis:
Lowercase, stemming, synonyms, stopwords, compound analysis (e.g. STR-AV220 -> STR AV 220)
Faceting
Spell Checking
Editorial
See http://lucene.li/U
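The analysis techniques listed above can be sketched as a small pipeline. This is a toy stand-in, not Solr's analyzer chain: the stopword set and synonym map are made-up examples, stemming is omitted, and the compound rule simply splits on hyphens and letter/digit boundaries.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "and"}   # illustrative list (assumption)
SYNONYMS = {"tv": "television"}               # hypothetical synonym map

def split_compound(token):
    """Split on hyphens and letter/digit boundaries: STR-AV220 -> STR, AV, 220."""
    return re.findall(r"[A-Za-z]+|\d+", token)

def analyze(text):
    """Compound-split, lowercase, drop stopwords, then map synonyms."""
    terms = []
    for raw in text.split():
        for part in split_compound(raw):
            term = part.lower()
            if term in STOPWORDS:
                continue
            terms.append(SYNONYMS.get(term, term))
    return terms
```

Applying the same chain at index time and query time is what lets a search for "str av 220" match a document containing "STR-AV220".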
Improving Findability
Phrase Queries and other Position-based Queries (SpanQuery)
Disjunction Max Query (aka “DisMax”)
Intent Analysis
Invisible Queries
Fake Queries
Relevance Feedback and “More Like This”
See http://lucene.li/S
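As an illustration of the DisMax item above, the sketch below builds a request that spreads a user's raw query across several boosted fields using Solr's defType, qf, and pf parameters. The helper name dismax_url and the field names in the usage note are assumptions, and the base URL is a local example server.

```python
from urllib.parse import urlencode

def dismax_url(user_query, qf, pf=None, base="http://localhost:8983/solr/select/"):
    """Build a DisMax request; qf and pf are lists of (field, boost) pairs."""
    def boosts(pairs):
        return " ".join(f"{field}^{boost}" for field, boost in pairs)

    params = [
        ("defType", "dismax"),   # select the DisMax query parser
        ("q", user_query),       # raw user input, no field syntax needed
        ("qf", boosts(qf)),      # fields to search, with boosts
    ]
    if pf:
        params.append(("pf", boosts(pf)))  # phrase fields: reward terms appearing together
    return base + "?" + urlencode(params)
```

For example, dismax_url("ipod case", [("name", 2.0), ("features", 1.0)]) searches both hypothetical fields and weights matches in name twice as heavily.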
Solr Support
http://www.lucidimagination.com/How-We-Can-Help
solr-user@lucene.apache.org