
Solr & Lucene at Etsy

Gregg Donovan
Technical Lead, Search
gregg@etsy.com
1.5 years Solr & Lucene at Etsy.com

3 years Solr & Lucene at TheLadders.com


8+ million members
9.3 million items
800k+ active sellers
1+ billion pageviews / month
Maximize Solr out-of-the-box
Hack at a low-level
Know when to do each
Or
Don’t fear trunk
builds.apache.org/job/Solr-trunk/changes
http://localhost:8983/solr/placesuggest/select?
q={!lucene}s*
&sfield=latlong&pt=37.595804,-122.364521
&sort=div(geodist(),sqrt(sum(population,50)))%20asc
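The sort expression above divides distance by the square root of (population + 50), so large places can outrank closer small ones. A minimal Java sketch of that arithmetic, assuming geodist() behaves like a haversine distance in kilometers; the class name PlaceScore, the Earth radius constant, and the candidate coordinates and populations are hypothetical, chosen only for illustration:

```java
final class PlaceScore {
    // Mean Earth radius in km; an assumption for this sketch.
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Haversine great-circle distance in kilometers. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Mirrors div(geodist(), sqrt(sum(population, 50))); lower values sort first. */
    static double sortKey(double distanceKm, double population) {
        return distanceKm / Math.sqrt(population + 50);
    }

    public static void main(String[] args) {
        double ptLat = 37.595804, ptLon = -122.364521; // pt= from the query
        // Hypothetical candidates: a small place nearby, a large place farther away.
        double small = sortKey(haversineKm(ptLat, ptLon, 37.60, -122.40), 1_000);
        double large = sortKey(haversineKm(ptLat, ptLon, 37.77, -122.42), 800_000);
        System.out.println("small, nearby:  " + small);
        System.out.println("large, farther: " + large);
    }
}
```

The +50 keeps the denominator away from zero for places with no recorded population.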
{!lucene}
{!field}
{!term}
{!boost}
{!func}
{!dismax}
{!edismax}
Cheap ranking awesomeness
ExternalFileField ftw!
schema.xml:
<fieldType name="file" keyField="treasury_id" defVal="0"
stored="false" indexed="true" class="solr.ExternalFileField"
valType="float"/>
<field name="hotness" type="file"/>

/search/data/treasury/external_hotness.1306390802088:
1=2.3
2=1.7
3=1.1

Solr query:
sort={!func}hotness+desc
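The external file is just key=value lines, with defVal used for any key that is missing. A rough Java sketch of that lookup and a descending sort, outside Solr; the class ExternalHotness and its methods are hypothetical and only mimic the behavior described above:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ExternalHotness {
    static final float DEF_VAL = 0f; // mirrors defVal="0" in schema.xml

    /** Parses key=value lines, skipping blank or malformed entries. */
    static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> values = new HashMap<>();
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq <= 0) continue; // no key or no '='
            try {
                values.put(line.substring(0, eq).trim(),
                        Float.parseFloat(line.substring(eq + 1).trim()));
            } catch (NumberFormatException e) {
                // Skip unparsable values in this sketch.
            }
        }
        return values;
    }

    /** Orders ids by hotness, highest first; ids absent from the file get DEF_VAL. */
    static List<String> sortByHotnessDesc(List<String> ids, Map<String, Float> hotness) {
        List<String> sorted = new ArrayList<>(ids);
        sorted.sort(Comparator.comparing(
                (String id) -> hotness.getOrDefault(id, DEF_VAL)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Float> hotness = parse(List.of("1=2.3", "2=1.7", "3=1.1"));
        // Id "4" is not in the file, so it sorts with the default value.
        System.out.println(sortByHotnessDesc(List.of("3", "4", "1", "2"), hotness));
    }
}
```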
ExternalFileField caveats
More relevance: boost query
http://localhost:8983/solr/listings/select?
q={!boost b=$rel v=$qq}
&rel=category:furniture^10+OR+((-material:acrylic)^5)
&qq=desk
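The boost request above keeps the user's query in qq and the relevance tweak in rel, with q referencing both by name. A small Java sketch of assembling such a URL with proper encoding; the class BoostQueryUrl and its method names are hypothetical, and only the parameter names q, rel, and qq come from the request above:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class BoostQueryUrl {
    /** Parameters for a {!boost} query; insertion order is kept for readability. */
    static Map<String, String> boostParams(String userQuery, String relevanceBoost) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!boost b=$rel v=$qq}"); // dereferences rel and qq below
        params.put("rel", relevanceBoost);
        params.put("qq", userQuery);
        return params;
    }

    /** Joins URL-encoded key=value pairs onto a base URL. */
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = boostParams(
                "desk", "category:furniture^10 OR ((-material:acrylic)^5)");
        System.out.println(build("http://localhost:8983/solr/listings/select", params));
    }
}
```

Keeping the user text in its own parameter means the boost expression can change without touching how the user's query is passed.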
Impression tracking
etsy.com/search?q=desk&explain=1
Side-by-Side testing
Cheap performance wins
Put off sharding till you must
cat ${indexDir}/* > /dev/null
Return IDs, minimize stored fields
RAM: $10-20 / GB
SSD: 0.1ms vs 10ms seek
Custom?
solr-user
Tools for low-level hacking
Continuous deployment
One button.
So easy a dog could do it.
MTTR > MTBF
github.com/etsy/logster
Tracking GC
export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution -XX:+PrintGCDetails -Xloggc:/var/log/search/gc.log"
Alerting
Testing
SaveAsFixture
Profiling
Java Primitive Library
fastutil
trove4j
Know the hooks
SolrRequestHandler
SearchComponent
QParserPlugin
SolrEventListener
SolrCache
ValueSourceParser
SolrIndexSearcher gotchas
reference counting
using it as a cache key:
WeakHashMap<SolrIndexSearcher,MyValue> myCache...
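A minimal sketch of the weak-keyed cache idea, assuming one derived value per searcher instance. PerSearcherCache and FakeSearcher are hypothetical stand-ins so the example runs without Solr; a real SolrIndexSearcher would take FakeSearcher's place as the key:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Stand-in for a searcher; uses default identity equals/hashCode. */
final class FakeSearcher {
    final String name;
    FakeSearcher(String name) { this.name = name; }
}

final class PerSearcherCache<S, V> {
    // Weak keys: an entry can be dropped once nothing else references its searcher.
    // Values must not hold a strong reference back to the key, or the entry never clears.
    private final Map<S, V> cache = Collections.synchronizedMap(new WeakHashMap<>());
    private final Function<S, V> loader;

    PerSearcherCache(Function<S, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for this searcher, building it on first use. */
    V get(S searcher) {
        return cache.computeIfAbsent(searcher, loader);
    }

    public static void main(String[] args) {
        PerSearcherCache<FakeSearcher, String> cache =
                new PerSearcherCache<>(s -> "values built for " + s.name);
        FakeSearcher current = new FakeSearcher("searcher-1");
        System.out.println(cache.get(current));
        // A new searcher (for example after a commit) gets its own entry.
        System.out.println(cache.get(new FakeSearcher("searcher-2")));
    }
}
```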
Example:
personalized collections
fq={!term f=id}123 OR {!term f=id}456
Need a map of PK to docId
Use custom SolrCache plus SolrEventListener
to fill it
github.com/giokincade/FastTermFilter
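A rough sketch of the primary-key-to-docId idea, not the FastTermFilter implementation: build the map once per searcher, then turn a list of IDs into a bit set instead of a long OR of term queries. The class PkDocIdFilter and the sample keys are hypothetical; a primitive map from fastutil or trove4j could replace the boxed HashMap used here for brevity:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

final class PkDocIdFilter {
    private final Map<Long, Integer> pkToDocId = new HashMap<>();

    /** Built once per searcher; pksByDocId[docId] is that document's primary key. */
    PkDocIdFilter(long[] pksByDocId) {
        for (int docId = 0; docId < pksByDocId.length; docId++) {
            pkToDocId.put(pksByDocId[docId], docId);
        }
    }

    /** Sets one bit per known primary key; unknown keys are skipped. */
    BitSet filterFor(long... pks) {
        BitSet docs = new BitSet();
        for (long pk : pks) {
            Integer docId = pkToDocId.get(pk);
            if (docId != null) {
                docs.set(docId);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        // Hypothetical index: docId 0 has pk 456, docId 1 has pk 789, docId 2 has pk 123.
        PkDocIdFilter filter = new PkDocIdFilter(new long[] {456L, 789L, 123L});
        System.out.println(filter.filterFor(123L, 456L, 999L));
    }
}
```

Because docIds change whenever a new searcher opens, the map has to be rebuilt per searcher, which is where the event listener comes in.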
i18n currency sorting and filtering
currency.xml:

<currencyConfig version="1.0">
  <currencies>
    <currency name="United States Dollar" symbol="$" code="USD"/>
    <currency name="Australian Dollar" symbol="$" code="AUD"/>
    <currency name="Canadian Dollar" symbol="$" code="CAD"/>
    <currency name="Czech Koruna" symbol="Kč" code="CZK"/>
    ...
  </currencies>
  <rates>
    <rate from="USD" to="AUD" rate="1.168750"/>
    <rate from="USD" to="CAD" rate="1.085000"/>
    <rate from="USD" to="CZK" rate="20.107500"/>
    <rate from="USD" to="DKK" rate="5.323750"/>
    ...
  </rates>
</currencyConfig>
price:[$10.00 TO $50.00]

price:[10.00USD TO 50.00USD]

price:20.00EUR
MoneyFieldType.java:

@Override
public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2,
                           final boolean minInclusive, final boolean maxInclusive) {
    final MoneyValue p1 = MoneyValue.parse(part1, defaultCurrency);
    final MoneyValue p2 = MoneyValue.parse(part2, defaultCurrency);

    // Reject ranges whose bounds use different currencies.
    if (!p1.getCurrencyCode().equals(p2.getCurrencyCode())) {
        throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
            new ParseException("Cannot parse range query " + part1 + " to " + part2 +
                ": range queries only supported when upper and lower bound have same currency."));
    }

    final String currencyCode = p1.getCurrencyCode();
    final MoneyValueSource vs = new MoneyValueSource(field, currencyCode, parser);

    // Constant-score range filter over the value source.
    return new SolrConstantScoreQuery(new ValueSourceRangeFilter(vs,
        p1.getAmount() + "", p2.getAmount() + "", minInclusive, maxInclusive));
}
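To show the arithmetic behind comparing prices across currencies, here is a small sketch that converts through USD using the rates listed in currency.xml. It is not the MoneyValueSource implementation; the class CurrencyRates and its methods are hypothetical, and doubles are used only to keep the example short:

```java
import java.util.HashMap;
import java.util.Map;

final class CurrencyRates {
    // Units of each currency per one US dollar, as in the <rates> section.
    private final Map<String, Double> perUsd = new HashMap<>();

    CurrencyRates() {
        perUsd.put("USD", 1.0);
    }

    CurrencyRates add(String code, double ratePerUsd) {
        perUsd.put(code, ratePerUsd);
        return this;
    }

    /** Converts via USD: source amount to dollars, then dollars to the target. */
    double convert(double amount, String from, String to) {
        Double fromRate = perUsd.get(from);
        Double toRate = perUsd.get(to);
        if (fromRate == null || toRate == null) {
            throw new IllegalArgumentException(
                    "Unknown currency: " + (fromRate == null ? from : to));
        }
        return amount / fromRate * toRate;
    }

    public static void main(String[] args) {
        CurrencyRates rates = new CurrencyRates()
                .add("AUD", 1.168750)
                .add("CAD", 1.085000)
                .add("CZK", 20.107500)
                .add("DKK", 5.323750);
        // A listing priced in CZK, checked against a range given in USD.
        double priceInUsd = rates.convert(500.0, "CZK", "USD");
        System.out.println(priceInUsd >= 10.0 && priceInUsd <= 50.0);
    }
}
```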
Replication gotcha
SOLR-2202
Related Searches
Autosuggest!
bjewlery dewelry ejewelry ejwelry ewelery ewerly ewlery fewelry
fewlery fjewelery fjewelry gewerly gewlery hewelery hewelry hewerly
hewlery hjewelry iewelry ijewelry jawelery jawlery jeawlery jeelery
jeelry jeewelery jeewelry jeewlery jeewlry jefwelry jejelry jelelry
jelery jellery jelwelery jelwelry jelwlery jemelry jemerly jemwelry
jeqwelry jerelery jerelry jerely jererly jerlery jerwelery jerwelry
jerwely jerwerly jeselery jeselry jevelry jeverly jewalery jewdelry
jewedlry jeweelrry jeweelry jeweely jeweer jeweery jeweilry jeweiry
jewejery jewejlry jewejrly jewejry jewekey jewekry jewelary jeweldy
jewele jewelee jewelelry jewelera jewelerey jewelerly jewelert
jewelerty jeweleru jeweleruy jeweleryl jewelerys jeweleryy jewelet
jewelety jeweleya jewelfry jewelfy jeweliy jewellryp jewelltry
jewelly jewelory jewelra jewelray jewelre jewelree jewelreyy
jewelrfy jewelrh jewelri jewelrky jewelrly jewelrr jewelrs jewelrsy
jewelrt jewelrty jewelru jewelruy jewelrye jewelryh jewelryl
jewelrym jewelryr jewelrys jewelryt jewelryu jewelryuk
jewelryy jewelrz jewelsry jewelsy jeweltry jewelty jewelw jewelwery
jewelwey jewelwy jewelya jewelyj jewelyr jewelyry jewelyu jewelyy
jewelzry jeweory jewerey jeweriy jewerky jewerlary jewerley jewerli
jewerlly jewerls jewerlt jewerlu jewerlyh jewerlyr jewerlys jewerlyu
jewerry jeweryl jewetry jewewlry jewewly jewewrly jewewry jeweylry
jewiery jewilary jewkery jewlary jewledy jewleery jewlelery jewlely
The TermDictionary is not a whitelist
