
Introductions!

Who the hell am I?
- Jay Hill, Lucid Imagination
- 7 years Lucene experience
- 4 years Solr experience
- Author of Lucid Training
- SME for Lucid Certification

Who the hell are you?
- New to search?
- New to Lucene/Solr?
- Battle-tested veterans?

Lucid Imagination, Inc.

We'll Leave Time For Q&A!


Who's doing what?
- Solr 3.1?
- Solr 1.4.1?
- Nightly build?
- Solr 1.3 or older?

Are there any specific problems you're having?
Meanwhile, interrupt, ask questions as we go, etc.


A Brief Word About Lucid Imagination!


Lucid Imagination: The commercial company supporting Lucene/Solr open source search.
- Founded by:
  - Yonik Seeley, creator of Solr
  - Erik Hatcher, co-author, Lucene in Action
  - Grant Ingersoll, Apache PMC Chair
  - Marc Krellenstein, Lucid CTO
- Staff includes 9 Lucene/Solr committers
- Training, certification, support, LucidWorks Enterprise


Lucid Customers (That I've Worked With)!


On To The Sinning!!


Sins As Anti-Patterns?!
"Sorta kinda"
- Specify Nothing (Sloth)
- Creeping Featuritis (Greed)
- Blowhard Jamboree (Pride)
- Boat Anchor (Lust)
- Not Invented Here (Envy)
- Phatware (Gluttony)
- Emperor's New Clothes (Wrath)


Sins Can Contradict One Another!!


- You'll notice that many of the "sins" we see will be the exact opposite of others
- Just as some of us tend towards laziness, others tend towards excess
- Sometimes you should "Look before you leap." Other times, "He who hesitates is lost."
- In Solr (or any search app), one size never fits all

"I don't know and I don't care."


Sloth!
- "We aren't really into open source."
- Lack of commitment to Solr and/or the search application itself
- Not developing in-house Solr expertise
- Not paying enough attention to JVM settings, garbage collection, and RAM allocation


Sloth!
- Neglecting to get familiar with the source code
  - It is open source after all!
- Not taking the time to understand the main parts of Solr:
  - Request handlers
  - Search components
  - Query parsers
    - Extend the QParserPlugin class
  - ValueSource & ValueSourceParser custom functions
    - New pseudo-fields in 4.x
  - Response writers

Sloth!
Not keeping up with new features and developments in Lucene and Solr

CHANGES.txt: use "diff" to keep up on changes
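The "diff" tip can also be scripted. The following is a minimal sketch, assuming you have the text of CHANGES.txt from the release you run and from the release you are considering; the function name `added_lines` is hypothetical, and only the Python standard library is used.

```python
import difflib


def added_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines present in new_text but not in old_text, in order."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" opcodes both carry new-side lines.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


if __name__ == "__main__":
    # Illustrative content only, not real CHANGES.txt entries.
    old = "Release A\n* Feature one\n"
    new = "Release B\n* Feature two\nRelease A\n* Feature one\n"
    for line in added_lines(old, new):
        print(line)
```

In practice you would read the two files from disk and skim only the printed lines, which is the same thing the command-line `diff` gives you.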


Sloth!
New features in Solr 3.1:
- Solr spatial
- Edismax query parser: NOT experimental!
- Dynamic metadata extraction via UIMA
- Numeric range faceting (like date faceting)
- Lucene RAMDirectoryFactory available
- Faceting performance improvements
- Spellcheck and Terms components now work for distributed search
- Suggester component: better autosuggest!
  - Can add custom dict., phrases, etc.

Sloth!
New features coming in Solr 4.x:
- Lucene DocumentsWriterPerThread (DWPT)
- Moving towards "real time"
  - UpdateHandler upgrade to work with real-time
- Field collapsing/grouping
- Pivot facets
- SolrCloud (ZooKeeper)
- Fuzzy queries 100 times faster
- Pseudo-fields via functions
- Relevancy function queries: n, idf, docFreq, norm,


Sloth: The Path To Salvation!


- Commit to the project and to learning Solr
- Stay up to date on Solr changes
- Stay current with ongoing releases
- Get familiar with the source code
- Spend some time to understand the main configuration files:
  - solrconfig.xml
  - schema.xml
- Read through the entire Solr Wiki once every so often
- Develop in-house Solr expertise


Save a penny, lose a customer.


Greed!
- Skimping on resources such as:
  - RAM: "Here's a quarter buddy, go buy some RAM!"
  - Storage space
- You will get what you pay for!
- On the other hand, not every company has "deep pockets"


Greed!
Trying to "squeeze by", indexing to, and searching on, the same server
[Diagram labels: Indexing, Shards (Indexers), Slave/Searchers, Load Balancer, Searches]

Greed!
Not making the effort to find the right balance between precision and recall
- Recall: What fraction of the relevant documents in the collection were returned by the system?
- Precision: What fraction of the returned results are relevant to the information need?
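The two definitions above can be written out as a small calculation. This is an illustrative sketch only: the document IDs are made up, and the relevance judgments are assumed to come from someone such as a domain expert marking which documents are relevant to a query.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for one query.

    returned: IDs of the documents the search system returned.
    relevant: IDs of the documents judged relevant to the information need.
    """
    returned_set = set(returned)
    relevant_set = set(relevant)
    hits = returned_set & relevant_set
    # Precision: fraction of the returned results that are relevant.
    precision = len(hits) / len(returned_set) if returned_set else 0.0
    # Recall: fraction of the relevant documents that were returned.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical IDs: 4 results returned, 3 documents judged relevant.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

Returning more documents can only keep recall the same or raise it, but it usually lowers precision, which is the balance this slide is about.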


Greed!
A few thoughts about relevance:
- Get feedback from domain experts
- Is it better to have lots of results with less precision, or fewer, more targeted results?
- Different sites will have very different requirements


Greed: The Path To Salvation!


- Pry open your wallet: don't be cheap
- You don't have to push the envelope
- Find the right balance between recall and precision
- Don't push for more results over precision unless that is a clear requirement (sometimes it is)


"What could possibly go wrong?"


Pride!
- Reinventing the wheel
  - "Why don't we just write our own search libraries?"
- Nobody has a use case like us, right?
  - "We need to change the scoring algorithms."


Pride!
- Thinking you can "do it all" in Solr
  - Solr is rarely a good choice as a SOR (system of record)
- Consider other tools to work with Solr:
  - Nutch
  - Mahout
  - OpenNLP
  - Google Connector Framework
  - Your own code


Pride!
Stubbornly refusing to use resources such as the mailing lists:
- Solr user list: solr-user@lucene.apache.org
- Solr developer list: dev@lucene.apache.org
- Lucene user list: java-user@lucene.apache.org
- LucidFind: http://www.lucidimagination.com/search/


Pride!
- "I will not yield!"
  - Trying to "win battles" on the mailing lists
- Good karma: be a good citizen in the community


Pride: The Path To Salvation!


- Ask for help when needed
- Let the business needs define the project: don't let the tail wag the dog
- Get a feel for the Solr community and respect the experience of others
- Your situation, while possibly unique, is probably not completely dissimilar to others
- Learn from the pioneers and Solr veterans


"Someone stop me!"


Lust!
- Obsessing over unimportant details too early in the project
  - An agile approach is well suited to Solr development: iterate!
- Trying to "push the envelope"
  - Necessary sometimes, but it's not called the "bleeding edge" without reason
  - "Ease in" to major changes
- Too much attention to JVM settings
  - Solr experts are not usually JVM/GC experts


Lust!
- "Anti-greed": committing too many resources to Solr
  - Make sure the OS has plenty of RAM to cache files, etc.
- "If one is good, a dozen must be better!"
  - As much as possible, try to get a sense of what your query volume will be, and don't just throw money at building a monstrous farm of searchers
  - Solr has proven to be much more efficient than some large, commercial search solutions


Lust!
- Blood from a turnip: trying some absurd new technique, "just because"
- RAMDirectoryFactory: not a secret way to faster indexing/searching
  - No disk-backed persistence
  - Usually not worth it, but you never know
- Research first before going "extreme"

Lust!
- No need to index millions of docs for development
  - Better to work with small sets of data while getting started
- Don't worry too much about field types as you get started
  - Get data in the index, then analyze and refine


Lust: The Path To Salvation!


- Use an agile approach: start simply, build your application slowly, iterate
- Deal with the low-hanging fruit first
- Measure twice, cut once
- Don't miss the forest for the trees: no need to obsess over details in the early stages
- Do some due diligence before trying unorthodox approaches
- Get a small sample of data indexed without worrying about type, then go through iterations of refinement


"If we had some bacon we could have some bacon and eggs if we had some eggs."


Envy!
- Adding "cool" features you see on other sites, but don't really need
- Keep it "lean and mean", especially to start
- Resist the urge to include the "kitchen sink"


Envy!
You too can master dismax!
- Don't be afraid of dismax/edismax
- Lots of controls to learn, but also lots of power
- Flexibility to search multiple fields
- Boost different fields
- Boost phrase fields (pf) higher than query fields (qf)
- Use boost queries (bq) and function queries (bf)
- Most intimidating params: tie, mm
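To make the parameters above less intimidating, here is a minimal sketch of one edismax request assembled as plain HTTP parameters. The parameter names (defType, qf, pf, bq, bf, tie, mm) are Solr's own; the field names (title, body, inStock, pubdate), the boost values, and the helper name `edismax_params` are all assumptions chosen for illustration, not recommendations.

```python
from urllib.parse import urlencode


def edismax_params(user_query: str) -> dict:
    """Build request parameters for an edismax query (hypothetical schema)."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Query fields: where individual terms are matched, with boosts.
        "qf": "title^2 body",
        # Phrase fields: boosted higher than qf, so documents containing
        # the whole phrase rank above documents with scattered terms.
        "pf": "title^4 body^2",
        # Boost query: nudges in-stock documents upward (assumed field).
        "bq": "inStock:true^1.5",
        # Boost function: favors recent documents (assumed date field).
        "bf": "recip(ms(NOW,pubdate),3.16e-11,1,1)",
        # tie: how much non-best field matches add to the score (0..1).
        "tie": "0.1",
        # mm: minimum-should-match; up to 2 terms all required, then 75%.
        "mm": "2<75%",
    }


if __name__ == "__main__":
    # The resulting string is what follows "/select?" in a request URL.
    print(urlencode(edismax_params("red shoes")))
```

Starting from a small set like this and changing one parameter at a time makes it much easier to see what tie and mm actually do to your results.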

Envy!
- Spatial search seems complicated, but major sites make it look easy
- Now, in Solr 3.1, it is easy! You can:
  - Store spatial data in your index
  - Filter by distance
  - Sort by distance
  - Boost/bias by distance
  - Facet by distance
- Also consider:
  - Search-based navigation such as "Show me in-stock items only"
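As a rough illustration of "filter by distance" and "sort by distance", the sketch below assembles the request parameters for a Solr spatial query using the geofilt filter and the geodist() function. The field name `store`, the coordinates, and the helper name `nearby_params` are assumptions for the example; the field would need to be defined as a spatial (lat/lon) type in schema.xml.

```python
from urllib.parse import urlencode


def nearby_params(lat: float, lon: float, radius_km: float) -> dict:
    """Build parameters that filter and sort by distance from a point."""
    return {
        "q": "*:*",
        # Keep only documents within radius_km of the point.
        "fq": "{!geofilt}",
        # sfield: the spatial field (assumed to be named "store").
        "sfield": "store",
        # pt: the center point as "lat,lon".
        "pt": f"{lat},{lon}",
        # d: the distance in kilometers.
        "d": str(radius_km),
        # Nearest documents first.
        "sort": "geodist() asc",
    }


if __name__ == "__main__":
    print(urlencode(nearby_params(45.15, -93.85, 5)))
```

The "in-stock items only" navigation mentioned above would typically be one more `fq` on an ordinary field, added alongside the spatial filter.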

Envy: The Path To Salvation!


- Focus on your requirements; don't try to add "bells and whistles" you don't need
- Don't be hesitant to dive into the power of dismax/edismax
- Take advantage of new features such as Solr spatial, if those features will add value to the end user experience


"A fat stomach never breeds fine thoughts."


Gluttony!
- Staying fit and trim is usually good practice when designing and running Solr applications
  - Once again: keep it "lean and mean"
- A lot of these issues cross over into the Sloth category
  - The effort needed to keep your configuration and data efficiently managed is not considered important
- Don't lose control of your configuration files
  - Remove unnecessary elements
  - Version control all configuration files


Gluttony!
Slim down those "bloated" queries:

q="red shoes"& accountId=(12343 OR 338899 OR 554443 OR 243445 OR 55442 OR 3330899 OR 59927 OR 3888999 OR 549 OR 440293579 34201 OR 339917 OR 300191 OR 339338 OR 109823 OR 679176 OR 31407815 OR 3001756 OR 134322 OR 311123 OR 987888 OR 997181 OR 771819 OR 100292 OR 3389474 OR 5505759 OR 2459577 OR 4499957 OR 1996571 OR 559590 OR 220299 OR 4404872 OR 151510 OR 66017 OR 666 OR 113459 OR 890575 OR 505725 OR 330393 OR 349940 OR 4094994 OR 1245995 OR 2459959 OR 4255909 OR 899955 OR 7878899 OR 100999 )
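The slide does not say how to slim this query down, so the following is only one common option, offered as an assumption: keep the user's search in q and move the long, repeated restriction into a filter query (fq), which Solr caches separately from the main query. The helper name `account_filtered_params` is hypothetical, and the account IDs are a short sample. If the list stands for something like "accounts this user may see", indexing a group identifier on each document and filtering on that single value would shrink it further.

```python
from urllib.parse import urlencode


def account_filtered_params(user_query: str, account_ids: list[int]) -> dict:
    """Keep the scoring query small; put the account restriction in fq."""
    id_clause = " OR ".join(str(account_id) for account_id in account_ids)
    return {
        # Only the user's search terms affect relevance scoring.
        "q": user_query,
        # The restriction does not affect scoring, so it can live in fq
        # and be reused from the filter cache across different searches.
        "fq": f"accountId:({id_clause})",
    }


if __name__ == "__main__":
    params = account_filtered_params('"red shoes"', [12343, 338899, 554443])
    print(urlencode(params))
```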


Gluttony!
Stay in shape: Flex Your Solr Muscles!
- Keep up on new features
- Training, when appropriate
- Certification
- Contribute!
- Follow the user lists
- Refactor when new features can help
- Keep up to date on new releases


Gluttony: The Path To Salvation!


- Keep configuration files clean and trim; remove unused elements
- Periodically review queries to make sure they are efficient
- Refactor when necessary: keep your application fit and trim


"Hope is the denial of reality."


Wrath!
- Wrath is usually synonymous with anger, but let's use an older definition here:
  - A vehement denial of the truth, both to others and in the form of self-denial and impatience.
- Step back every now and then and look objectively at your application


Wrath!
Resist the push to rush to production


Wrath!
- Ignoring new Solr releases
  - OK to wait until a release is proven
  - But getting too far behind makes upgrading more painful with each release
- We don't have time to do it right, but we always have time to fix it


Wrath!
- Ignoring complaints about results relevance
- Disregarding feedback from stakeholders
  - Remember, the point of your search application is to support the business, not to "build cool stuff"
- Not taking advantage of log files
  - Consider mining log files and storing the data in a relational DB for generating reports
  - Capturing user queries and query counts can be extremely useful
  - Can also be used for query-based autosuggest (not just indexed terms)
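As a starting point for log mining, the sketch below counts how often each user query appears. It assumes request log lines that contain a `params={...}` section with a `q=` parameter, in the style of the sample lines shown; your log format may differ, so treat the regular expression as an assumption to adjust. The function name `count_queries` is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Assumed shape: "... params={q=red+shoes&rows=10} hits=12 ..."
# Note: this simple pattern stops at the first "}", so parameter values
# that themselves contain braces would need a more careful parser.
PARAMS_RE = re.compile(r"params=\{([^}]*)\}")


def count_queries(log_lines):
    """Return a Counter of normalized user queries found in log lines."""
    counts = Counter()
    for line in log_lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue
        params = parse_qs(match.group(1))
        for query in params.get("q", []):
            # Normalize case and whitespace so variants are counted together.
            counts[query.strip().lower()] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "INFO: [] webapp=/solr path=/select params={q=red+shoes&rows=10} hits=12 status=0 QTime=3",
        "INFO: [] webapp=/solr path=/select params={q=Red+Shoes} hits=12 status=0 QTime=2",
        "INFO: [] webapp=/solr path=/update params={commit=true} status=0 QTime=40",
    ]
    for query, count in count_queries(sample).most_common():
        print(count, query)
```

The resulting counts could be loaded into a relational table for reporting, or used as the seed list for query-based autosuggest.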


Wrath: The Path To Salvation!


- Keep your version of Solr up to date
  - OK to wait "awhile", but don't skip versions
- Seek and embrace feedback from business and domain experts
- Constantly gauge and improve relevance as an ongoing task
- Avoid the push to release too soon (as best you can)
- Take advantage of log files to understand what users are doing, and what is not working well


Search, and you shall find!
