Solr & Lucene at Etsy

Gregg Donovan Technical Lead, Search gregg@etsy.com

1.5 years Solr & Lucene at Etsy.com
3 years Solr & Lucene at TheLadders.com

8+ million members

9.3 million items

800k+ active sellers

1+ billion pageviews / month

Maximize Solr out-of-the-box

Hack at a low-level

Know when to do each

Or

Don’t fear trunk

builds.apache.org/job/Solr-trunk/changes

http://localhost:8983/solr/placesuggest/select?
    q={!lucene}s*
    &sfield=latlong&pt=37.595804,-122.364521
    &sort=div(geodist(),sqrt(sum(population,50)))%20asc
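To make the sort expression concrete, here is a minimal Java sketch of the arithmetic that div(geodist(),sqrt(sum(population,50))) performs per document. The class and method names are hypothetical. The haversine helper is only a stand-in for Solr's geodist(), and the Earth radius is an approximation.

```java
final class PlaceSuggestRank {
    // Approximate mean Earth radius in kilometers (assumption for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/lon points; stand-in for geodist().
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Mirrors div(geodist(), sqrt(sum(population, 50))): smaller is better.
    static double sortKey(double distanceKm, double population) {
        return distanceKm / Math.sqrt(population + 50.0);
    }
}
```

Sorting ascending on this key favors places that are both near and populous. The constant 50 keeps very small populations from producing a near-zero divisor.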

{!lucene} {!field} {!term} {!boost} {!func} {!dismax} {!edismax}

Cheap ranking awesomeness

ExternalFileField ftw!

schema.xml:
<fieldType name="file" keyField="treasury_id" defVal="0" stored="false" indexed="true" class="solr.ExternalFileField" valType="float"/>
<field name="hotness" type="file"/>

/search/data/treasury/external_hotness.1306390802088:
1=2.3
2=1.7
3=1.1

Solr query:
sort={!func}hotness+desc
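The external file is plain key=value text, so any batch job can produce it. A minimal Java sketch of a generator follows. The class and method names are hypothetical, and treating the file-name suffix as a millisecond timestamp is an assumption based on the file name shown on the slide.

```java
import java.util.Map;
import java.util.TreeMap;

final class HotnessFileWriter {
    // Builds a file name like external_hotness.1306390802088 (suffix meaning assumed).
    static String fileName(String fieldName, long timestampMillis) {
        return "external_" + fieldName + "." + timestampMillis;
    }

    // Renders one "key=value" line per entry, keys in ascending order.
    static String render(Map<Integer, Float> scoresByKey) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Integer, Float> e : new TreeMap<>(scoresByKey).entrySet()) {
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }
}
```

The rendered string can then be written into the index data directory under the generated name.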

ExternalFileField caveats

More relevance: boost query

http://localhost:8983/solr/listings/select?
    q={!boost b=$rel v=$qq}
    &rel=category:furniture^10+OR+((-material:acrylic)^5)
    &qq=desk
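Because $rel and $qq are dereferenced from separate request parameters, the user's query and the relevance tweak can be assembled independently. A minimal Java sketch of building that request URL with proper encoding follows. The class name and method signature are hypothetical, and the sketch only builds the URL; it does not issue the request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class BoostQueryUrl {
    // Assembles q, rel and qq in the same order as on the slide.
    static String build(String baseUrl, String userQuery, String relevanceBoost) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!boost b=$rel v=$qq}"); // dereferences the two params below
        params.put("rel", relevanceBoost);
        params.put("qq", userQuery);

        StringJoiner queryString = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            queryString.add(e.getKey() + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "?" + queryString;
    }
}
```

Keeping the boost expression in its own parameter means it can be changed per experiment without touching how the user's text is parsed.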

Impression tracking

etsy.com/search?q=desk&explain=1

Side-by-Side testing

Cheap performance wins

Put off sharding till you must

cat ${indexDir}/* > /dev/null
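The one-liner above reads every index file once so the operating system pulls it into the page cache. Where the warm-up has to run from inside the JVM, the same idea can be sketched in Java as below. The class and method names are hypothetical, and the sketch reads only regular files directly inside the directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class IndexWarmer {
    // Reads every regular file in indexDir and discards the bytes.
    // Returns the number of bytes read, which is handy for logging.
    static long readAll(Path indexDir) throws IOException {
        byte[] buffer = new byte[1 << 16];
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                try (InputStream in = Files.newInputStream(file)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        total += n;
                    }
                }
            }
        }
        return total;
    }
}
```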

Return IDs, minimize stored fields

RAM: $10-20 / GB

SSD: 0.1ms vs 10ms seek

Custom?

solr-user

Tools for low-level hacking

Continuous deployment

One button. So easy a dog could do it.

MTTR > MTBF

github.com/etsy/logster

Tracking GC

export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -Xloggc:/var/log/search/gc.log"
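With -XX:+PrintGCApplicationStoppedTime enabled, the GC log contains lines of the form "Total time for which application threads were stopped: 0.0123456 seconds". A minimal Java sketch that sums those pauses, for example to feed a graph or an alert, follows. The class and method names are hypothetical, and the line format is assumed to match the HotSpot output just quoted.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GcPauseTotals {
    // Matches the stopped-time line and captures the seconds value.
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

    // Sums the stopped time over all matching lines; other lines are ignored.
    static double totalStoppedSeconds(List<String> logLines) {
        double total = 0.0;
        for (String line : logLines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }
}
```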

Alerting

Testing

SaveAsFixture

Profiling

Java Primitive Library
fastutil trove4j

Know the hooks
SolrRequestHandler SearchComponent QParserPlugin SolrEventListener SolrCache ValueSourceParser

SolrIndexSearcher gotchas
reference counting
using it as a cache key:
WeakHashMap<SolrIndexSearcher,MyValue> myCache...
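A minimal Java sketch of the weak-key pattern follows. The type parameter S stands in for SolrIndexSearcher so the example runs without Solr on the classpath, and the class name and loader function are hypothetical. The point of the weak keys is that a cache entry does not by itself keep a closed searcher reachable, provided the cached value holds no strong reference back to its key.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

final class PerSearcherCache<S, V> {
    // Weak keys: an entry becomes collectable once nothing else references the searcher.
    private final Map<S, V> cache = Collections.synchronizedMap(new WeakHashMap<>());
    private final Function<S, V> loader;

    PerSearcherCache(Function<S, V> loader) {
        this.loader = loader;
    }

    // Computes the value once per searcher instance and reuses it afterwards.
    // The loaded value must not reference the searcher, or the entry never clears.
    V get(S searcher) {
        return cache.computeIfAbsent(searcher, loader);
    }
}
```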

Example: personalized collections

fq={!term f=id}123 OR {!term f=id}456

Need a map of PK to docId

Use custom SolrCache plus SolrEventListener to fill it

github.com/giokincade/FastTermFilter
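The idea on these slides, as far as it can be read from them, is to resolve primary keys to Lucene doc ids through a precomputed map instead of a term lookup per id. A minimal Java sketch using only java.util follows. The class and method names are hypothetical and this is not the FastTermFilter code. A real version would be rebuilt for each new searcher and would probably use a primitive map from fastutil or trove4j.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class PkFilterSketch {
    private final Map<Long, Integer> pkToDocId = new HashMap<>();
    private final int maxDoc;

    // pkByDocId[docId] holds the primary key of that document.
    PkFilterSketch(long[] pkByDocId) {
        this.maxDoc = pkByDocId.length;
        for (int docId = 0; docId < pkByDocId.length; docId++) {
            pkToDocId.put(pkByDocId[docId], docId);
        }
    }

    // Builds a doc-id bitset for the requested keys; unknown keys are skipped.
    BitSet filterFor(long... primaryKeys) {
        BitSet bits = new BitSet(maxDoc);
        for (long pk : primaryKeys) {
            Integer docId = pkToDocId.get(pk);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

Because doc ids change whenever the index changes, the map is only valid for the searcher it was built against.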

i18n currency sorting and filtering

currency.xml:
<currencyConfig version="1.0">
  <currencies>
    <currency name="United States Dollar" symbol="$" code="USD"/>
    <currency name="Australian Dollar" symbol="$" code="AUD"/>
    <currency name="Canadian Dollar" symbol="$" code="CAD"/>
    <currency name="Czech Koruna" symbol="Kč" code="CZK"/>
    ...
  </currencies>
  <rates>
    <rate from="USD" to="AUD" rate="1.168750"/>
    <rate from="USD" to="CAD" rate="1.085000"/>
    <rate from="USD" to="CZK" rate="20.107500"/>
    <rate from="USD" to="DKK" rate="5.323750"/>
    ...
  </rates>
</currencyConfig>

price:[$10.00 TO $50.00]
price:[10.00USD TO 50.00USD]
price:20.00EUR

MoneyFieldType.java:

@Override
public Query getRangeQuery(QParser parser, SchemaField field,
                           String part1, String part2,
                           final boolean minInclusive, final boolean maxInclusive) {
    final MoneyValue p1 = MoneyValue.parse(part1, defaultCurrency);
    final MoneyValue p2 = MoneyValue.parse(part2, defaultCurrency);

    // Both bounds must be expressed in the same currency.
    if (!p1.getCurrencyCode().equals(p2.getCurrencyCode())) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            new ParseException("Cannot parse range query " + part1 + " to " + part2
                + ": range queries only supported when upper and lower bound have same currency."));
    }

    String currencyCode = p1.getCurrencyCode();
    final MoneyValueSource vs = new MoneyValueSource(field, currencyCode, parser);

    return new SolrConstantScoreQuery(
        new ValueSourceRangeFilter(vs, p1.getAmount() + "", p2.getAmount() + "",
                                   minInclusive, maxInclusive));
}
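The deck does not show MoneyValueSource, so the following is only a sketch of the kind of conversion it would need in order to compare prices stored in different currencies. It assumes every rate pivots on USD, as in the currency.xml excerpt. The class name, the method, and the rounding to two decimal places are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.HashMap;
import java.util.Map;

final class CurrencyRates {
    // Units of each currency per 1 USD, taken from the currency.xml excerpt.
    private final Map<String, BigDecimal> perUsd = new HashMap<>();

    CurrencyRates() {
        perUsd.put("USD", BigDecimal.ONE);
        perUsd.put("AUD", new BigDecimal("1.168750"));
        perUsd.put("CAD", new BigDecimal("1.085000"));
        perUsd.put("CZK", new BigDecimal("20.107500"));
        perUsd.put("DKK", new BigDecimal("5.323750"));
    }

    // Converts via USD: amount / rate(from) * rate(to), rounded to 2 decimals.
    BigDecimal convert(BigDecimal amount, String from, String to) {
        BigDecimal fromRate = perUsd.get(from);
        BigDecimal toRate = perUsd.get(to);
        if (fromRate == null || toRate == null) {
            throw new IllegalArgumentException("Unknown currency: " + from + " or " + to);
        }
        return amount.multiply(toRate).divide(fromRate, 2, RoundingMode.HALF_UP);
    }
}
```

With these rates, 10.00 USD converts to 10.85 CAD. A range query in one currency can then be evaluated against every listing by converting each stored price into the query's currency before comparing.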

Replication gotcha

SOLR-2202

Related Searches

Autosuggest!

bjewlery dewelry ejewelry ejwelry ewelery ewerly ewlery fewelry fewlery fjewelery fjewelry gewerly gewlery hewelery hewelry hewerly hewlery hjewelry iewelry ijewelry jawelery jawlery jeawlery jeelery jeelry jeewelery jeewelry jeewlery jeewlry jefwelry jejelry jelelry jelery jellery jelwelery jelwelry jelwlery jemelry jemerly jemwelry jeqwelry jerelery jerelry jerely jererly jerlery jerwelery jerwelry jerwely jerwerly jeselery jeselry jevelry jeverly jewalery jewdelry jewedlry jeweelrry jeweelry jeweely jeweer jeweery jeweilry jeweiry jewejery jewejlry jewejrly jewejry jewekey jewekry jewelary jeweldy jewele jewelee jewelelry jewelera jewelerey jewelerly jewelert jewelerty jeweleru jeweleruy jeweleryl jewelerys jeweleryy jewelet jewelety jeweleya jewelfry jewelfy jeweliy jewellryp jewelltry jewelly jewelory jewelra jewelray jewelre jewelree jewelreyy jewelrfy jewelrh jewelri jewelrky jewelrly jewelrr jewelrs jewelrsy jewelrt jewelrty jewelru jewelruy jewelrye jewelryh jewelryl jewelrym jewelryr jewelrys jewelryt jewelryu jewelryuk jewelryy jewelrz jewelsry jewelsy jeweltry jewelty jewelw jewelwery jewelwey jewelwy jewelya jewelyj jewelyr jewelyry jewelyu jewelyy jewelzry jeweory jewerey jeweriy jewerky jewerlary jewerley jewerli jewerlly jewerls jewerlt jewerlu jewerlyh jewerlyr jewerlys jewerlyu jewerry jeweryl jewetry jewewlry jewewly jewewrly jewewry jeweylry jewiery jewilary jewkery jewlary jewledy jewleery jewlelery jewlely

The TermDictionary is not a whitelist