Who the hell am I?
  Jay Hill, Lucid Imagination
  7 years Lucene experience
  4 years Solr experience
  Author of Lucid Training
  SME for Lucid Certification
Who the hell are you?
  New to search?
  New to Lucene/Solr?
  Battle-tested veterans?
On To The Sinning!!
Sins As Anti-Patterns?!
"Sorta kinda"
  Specify Nothing (Sloth)
  Creeping Featuritis (Greed)
  Blowhard Jamboree (Pride)
  Boat Anchor (Lust)
  Not Invented Here (Envy)
  Phatware (Gluttony)
  Emperor's New Clothes (Wrath)
Sloth!
"We aren't really into open source."
  Lack of commitment to Solr and/or the search application itself
  Not developing in-house Solr expertise
  Not paying enough attention to JVM settings, garbage collection, and RAM allocation
Sloth!
Neglecting to get familiar with the source code
  It is open source after all!
Not taking the time to understand the main parts of Solr:
  Request handlers
  Search components
  Query parsers
    Extend the QParserPlugin class
  ValueSource & ValueSourceParser
    Custom functions
    New pseudo-fields in 4.x
  Response writers
Lucid Imagination, Inc.
Sloth!
Not keeping up with new features and developments in Lucene and Solr
Sloth!
New features in Solr 3.1:
  Solr spatial
  Edismax query parser
    NOT experimental!
  Dynamic metadata extraction via UIMA
  Numeric range faceting (like date faceting)
  Lucene RAMDirectoryFactory available
  Faceting performance improvements
  Spellcheck and Terms components now work for distributed search
  Suggester component
    Better autosuggest!
    Can add custom dictionaries, phrases, etc.
Sloth!
New features coming in Solr 4.x:
  Lucene DocumentsWriterPerThread (DWPT)
    Moving towards "real time"
  UpdateHandler upgrade to work with real-time
  Field collapsing/grouping
  Pivot facets
  SolrCloud (ZooKeeper)
  Fuzzy queries 100 times faster
  Pseudo-fields via functions
  Relevancy function queries: tf, idf, docFreq, norm, ...
Greed!
Skimping on resources such as:
  RAM
    "Here's a quarter buddy, go buy some RAM!"
  Storage space
You will get what you pay for!
  On the other hand, not every company has "deep pockets"
Greed!
Trying to "squeeze by", indexing to, and searching on, the same server
[Figure: indexing requests go to the shards (indexers); searches go through a load balancer to the slaves/searchers]
Greed!
Not making the effort to find the right balance between precision and recall
  Recall: What fraction of the relevant documents in the collection were returned by the system?
  Precision: What fraction of the returned results are relevant to the information need?
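
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name and the document ids are hypothetical, and the relevance judgments are assumed to come from a human judge such as a domain expert.

```python
def precision_recall(returned_ids, relevant_ids):
    """Return (precision, recall) for one query.

    returned_ids: ids of the documents the search system returned
    relevant_ids: ids of the documents judged relevant (hypothetical judgments)
    """
    returned = set(returned_ids)
    relevant = set(relevant_ids)
    hits = returned & relevant  # relevant documents that were actually returned

    # Guard against division by zero for empty result or judgment sets.
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 results returned, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.50
```

Returning more documents tends to raise recall and lower precision, which is the balance the slide refers to.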
Greed!
A few thoughts about relevance:
  Get feedback from domain experts
  Is it better to have lots of results with less precision, or fewer, more targeted results?
  Different sites will have very different requirements
Pride!
Reinventing the wheel
  "Why don't we just write our own search libraries?"
  Nobody has a use case like us, right?
  "We need to change the scoring algorithms."
Pride!
Thinking you can "do it all" in Solr
  Solr is rarely a good choice as a SOR (system of record)
Consider other tools to work with Solr:
  Nutch
  Mahout
  OpenNLP
  Google Connector Framework
  Your own code
Pride!
Stubbornly refusing to use resources such as the mailing lists:
  Solr user list: solr-user@lucene.apache.org
  Solr developer list: dev@lucene.apache.org
  Lucene user list: java-user@lucene.apache.org
  LucidFind: http://www.lucidimagination.com/search/
Pride!
"I will not yield!"
  Trying to "win battles" on the mailing lists
  Good Karma: be a good citizen in the community
Lust!
Obsessing over unimportant details too early in the project
  Agile approach is well suited to Solr development: iterate!
Trying to "push the envelope"
  Necessary sometimes, but it's not called the "bleeding edge" without reason
  "Ease in" to major changes
Too much attention to JVM settings
  Solr experts are not usually JVM/GC experts
Lust!
"Anti-greed"
Committing too many resources to Solr
  Make sure the OS has plenty of RAM to cache files, etc.
"If one is good, a dozen must be better!"
  As much as possible, try to get a sense of what your query volume will be, and don't just throw money at building a monstrous farm of searchers
  Solr has proven to be much more efficient than some large, commercial search solutions
Lust!
Blood from a turnip:
Trying some absurd new technique, "just because"
  RAMDirectoryFactory is not a secret way to faster indexing/searching
    No disk-backed persistence
  Usually not worth it, but you never know
  Research first before going "extreme"
Lust!
No need to index millions of docs for development
  Better to work with small sets of data while getting started.
  Don't worry too much about field types as you get started.
  Get data in the index, then analyze and refine.
"If we had some bacon we could have some bacon and eggs if we had some eggs."
Envy!
Adding "cool" features you see on other sites, but don't really need
  Keep it "lean and mean", especially to start
  Resist the urge to include the "kitchen sink"
Envy!
You too can master dismax!
Don't be afraid of dismax/edismax
  Lots of controls to learn, but also lots of power
  Flexibility to search multiple fields
  Boost different fields
  Boost phrase fields (pf) higher than query fields (qf)
  Use boost queries (bq) and function queries (bf)
  Most intimidating params: tie, mm
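
To make the parameters above less intimidating, here is a rough sketch of a request that uses all of them. The parameter names (defType, qf, pf, bq, bf, tie, mm) are the ones Solr's dismax/edismax parsers accept; the field names (title, body, inStock, published), the boost values, and the helper function are assumptions made up for illustration.

```python
from urllib.parse import urlencode


def edismax_params(user_query):
    """Build request parameters for an edismax search (hypothetical schema)."""
    return {
        "q": user_query,
        "defType": "edismax",
        # qf: fields to search, with per-field boosts.
        "qf": "title^2 body",
        # pf: phrase fields, boosted higher than qf so that documents
        # containing the whole phrase rank above scattered term matches.
        "pf": "title^10 body^3",
        # bq: a boost query that nudges in-stock items upward.
        "bq": "inStock:true^1.5",
        # bf: a function query that favors recently published documents.
        "bf": "recip(ms(NOW,published),3.16e-11,1,1)",
        # tie: how much lower-scoring field matches add to the best field's score.
        "tie": "0.1",
        # mm: minimum-should-match; up to 2 terms all required, beyond that 75%.
        "mm": "2<75%",
    }


query_string = urlencode(edismax_params("red shoes"))
print("/select?" + query_string)
```

Starting with only q, defType, and qf, then adding one parameter at a time while watching the results, fits the iterate-first advice given earlier in the deck.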
Envy!
Spatial search seems complicated, but major sites make it look easy
Now, in Solr 3.1 it is easy! You can:
  Store spatial data in your index
  Filter by distance
  Sort by distance
  Boost/bias by distance
  Facet by distance
Also consider:
  Search-based navigation such as "Show me in-stock items only"
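
As a rough illustration of filtering and sorting by distance, combined with an "in-stock only" filter, the request parameters might look like the sketch below. The geofilt filter, the geodist() function, and the sfield, pt, and d parameters are part of Solr's spatial support; the field names store and inStock, the coordinates, and the helper function are hypothetical.

```python
from urllib.parse import urlencode


def nearby_params(user_query, lat, lon, radius_km):
    """Build parameters for a distance-filtered, distance-sorted search."""
    return [
        ("q", user_query),
        ("sfield", "store"),        # hypothetical location field in the index
        ("pt", f"{lat},{lon}"),     # the point to measure distance from
        ("d", str(radius_km)),      # radius in kilometers
        ("fq", "{!geofilt}"),       # filter: keep documents within d of pt
        ("fq", "inStock:true"),     # search-based navigation: in-stock items only
        ("sort", "geodist() asc"),  # nearest results first
    ]


params = nearby_params("red shoes", 45.15, -93.85, 5)
print("/select?" + urlencode(params))
```

Both restrictions go into fq rather than q, so they narrow the result set without changing relevance scores.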
Gluttony!
Staying fit and trim is usually good practice when designing and running Solr applications
  Once again, keep it "lean and mean"
A lot of these issues cross over into the Sloth category
  The effort needed to keep your configuration and data efficiently managed is not considered important
Don't lose control of your configuration files
  Remove unnecessary elements
  Version control all configuration files
Gluttony!
Slim down those "bloated" queries:
q="red shoes"&accountId=(12343 OR 338899 OR 554443 OR 243445 OR 55442 OR 3330899 OR 59927 OR
3888999 OR 549 OR 440293579 34201 OR 339917 OR 300191 OR 339338 OR 109823 OR 679176 OR
31407815 OR 3001756 OR 134322 OR 311123 OR 987888 OR 997181 OR 771819 OR 100292 OR 3389474 OR
5505759 OR 2459577 OR 4499957 OR 1996571 OR 559590 OR 220299 OR 4404872 OR 151510 OR 66017 OR
666 OR 113459 OR 890575 OR 505725 OR 330393 OR 349940 OR 4094994 OR 1245995 OR 2459959 OR
4255909 OR 899955 OR 7878899 OR 100999)
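
The slide does not say how to slim such a query down, so the following is only one possible approach, sketched under assumptions: de-duplicate the ids and move the account restriction out of the scored query into a filter query (fq). The helper name is hypothetical, and whether Solr's filter cache actually helps depends on how often the same id set repeats. If the same groups of accounts recur, indexing a single group field and filtering on that may slim the query further.

```python
from urllib.parse import urlencode


def slim_query_params(user_query, account_ids):
    """Keep the user's query in q and put the account restriction in fq."""
    # De-duplicate and sort so the same set of accounts always yields the
    # same filter string, whatever order the ids arrived in.
    unique_ids = sorted(set(account_ids))
    account_filter = "accountId:(" + " OR ".join(str(i) for i in unique_ids) + ")"
    return [("q", user_query), ("fq", account_filter)]


# Hypothetical call with a short id list that contains a duplicate.
params = slim_query_params('"red shoes"', [12343, 338899, 554443, 12343])
print(params[1][1])  # accountId:(12343 OR 338899 OR 554443)
print("/select?" + urlencode(params))
```

A filter query restricts the result set without contributing to the relevance score, so the list of accounts no longer competes with "red shoes" when results are ranked.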
Gluttony!
Stay in shape
Flex Your Solr Muscles!
  Keep up on new features
  Training, when appropriate
  Certification
  Contribute!
  Follow the user lists
  Refactor when new features can help
  Keep up to date on new releases
Wrath!
Wrath: usually synonymous with anger, but let's use an older definition here:
  "A vehement denial of the truth, both to others and in the form of self-denial and impatience."
Step back every now and then and look objectively at your application
Wrath!
Resist the push to rush to production
Wrath!
Ignoring new Solr releases
  OK to wait until a release is proven
  But getting too far behind makes upgrading more painful with each release
We don't have time to do it right, but we always have time to fix it
Wrath!
Ignoring complaints about results relevance
Disregarding feedback from stakeholders
  Remember the point of your search application is to support the business, not to "build cool stuff"
Not taking advantage of log files
  Consider mining log files, storing data in relational DB for generating reports
  Capturing user queries and query counts can be extremely useful
  Can also be used for query-based autosuggest (not just indexed terms)
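
A minimal sketch of the log-mining idea: pull the q parameter out of each request line and count how often each query occurs. The sample lines and their layout (a params={...} section per request) are assumptions modeled on typical Solr request logging, so the pattern should be adjusted to whatever your logs actually contain. A query that itself contains "}" would defeat this simple pattern.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Assumed log layout: each request line carries "params={...}".
PARAMS_RE = re.compile(r"params=\{([^}]*)\}")


def count_queries(log_lines):
    """Count user queries (the q parameter) across request log lines."""
    counts = Counter()
    for line in log_lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue  # not a request line
        params = parse_qs(match.group(1))
        for q in params.get("q", []):
            # Normalize case and whitespace so "Red Shoes" and "red shoes" merge.
            counts[q.strip().lower()] += 1
    return counts


# Hypothetical sample lines, not taken from a real server.
sample_log = [
    "INFO: [] webapp=/solr path=/select params={q=red+shoes&rows=10} hits=42 status=0 QTime=3",
    "INFO: [] webapp=/solr path=/select params={q=Red+Shoes&rows=10} hits=42 status=0 QTime=2",
    "INFO: [] webapp=/solr path=/select params={q=blue+hat} hits=7 status=0 QTime=1",
    "INFO: [] webapp=/solr path=/update params={commit=true} status=0 QTime=120",
]
print(count_queries(sample_log).most_common(2))
# [('red shoes', 2), ('blue hat', 1)]
```

The resulting counts could be loaded into a relational table for reporting, or used to seed a query-based autosuggest dictionary.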