Rapid Prototyping With Solr

Rapid Prototyping with Solr
Erik Hatcher, Lucid Imagination erik.hatcher @ lucidimagination.com, May 25, 2011
Abstract
Got data? Let's make it searchable! This interactive presentation will demonstrate getting documents into Solr quickly, will provide some tips in adjusting Solr's schema to match your needs better, and finally will discuss how showcase your data in a flexible search user interface. We'll see how to rapidly leverage faceting, highlighting, spell checking, and debugging. Even after all that, there will be enough time left to outline the next steps in developing your search application and taking it to production.
My Background
Erik Hatcher
Lucid Imagination
Technical Staff
Co-author
Java Development with Ant / Ant in Action (Manning) Lucene in Action (Manning)
Apache Software Foundation

Committer Lucene / Solr PMC Lucene TLP Member
Why prototype?
Demonstrate Solr can handle your data and searching needs; mitigate risk, learn the unknown Its quick and easy, with very little time investment Immediate functional user interface impresses decision makers and target users; get buy-in
The user interface IS the app
Prior Art
Hoss amazing ISFDB work
http://www.lucidimagination.com/blog/tag/isfdb/
Previous Rapid Prototyping with Solr presentations

Data.gov Catalog on Solr: http://www.lucidimagination.com/blog/2010/11/05/data-govon-solr/ Rich text files on Solr: http://www.lucidimagination.com/Community/Hear-fromthe-Experts/Podcasts-and-Videos/Rapid-PrototypingSearch-Applications-Solr CSV (conference attendee data) on Solr: http://www.slideshare.net/erikhatcher/rapid-prototypingwith-solr-4312681
Rapid Prototyping using CSV

Fired up Solrs example configuration /update/csv
http://localhost:8983/solr/update/csv? commit=true&stream.file=EuroCon2010.csv&fieldnames=fi rst,last,company,title,country&header=true&f.country.map =Great+Britain:United+Kingdom
Tweak configuration
schema: domain-centric field names solrconfig: /browse request handler Template adjustments
Instant classic search results view, tree map visualization of facet data, and random selection of contest winners
7
CSV results
using rich text files

curl "http://localhost:8983 /solr/update/extract? stream.file=/docs/file.pdf &literal.id=/docs/file.pdf
using Data.Gov catalog data

/update/csv again!
10
Explaining
11
Suggest
12
Venn Viz
13
E-commerce data
http://bbyopen.com/ Product data, via easy HTTP JSON API
14
Ingesting the data

require 'solr! #...! 1.upto(max_pages) do |page|! puts "Processing page #{page}"! json = fetch_page(page)! ! response = JSON.parse(json, :symbolize_names=>true)! puts "Total products: #{response[:total]}" if page == 1! ! mapping = {! :id => :sku,! :name_t => :name,! :thumbnail_s => :thumbnailImage,! :url_s => :url,! :type_s => :type,! :category_s => Proc.new {|prod| ! prod[:categoryPath].collect {|cat| cat[:name]}.join(' >> ')},! :department_s => :department,! :class_s => :class,! :subclass_s => :subclass,! :sale_price_f => :salePrice! }! ! Solr::Indexer.new(response[:products], mapping, ! {:debug => debug, :buffer_docs => 500}).index! end!
15
solr-rubys secret power

Solr::Indexer.new( source, mapping, options ).index Quacks like a duck source simply #eachs mapping simply #[]s
16
on Prism
17
What is Prism?
Yet another opinionated brainstorm from Erik https://github.com/lucidimagination/Prism Under the covers
Ruby
because its beautiful
Sinatra
to be lightweight and have elegant flexible routing
Velocity
because it is easy to learn and use, and has powerful features, facilitates edit/refresh work
Separate from Solr, Rack-savvy, allows easy coding of new routes and capabilities Designed to work with any arbitrary Solr instance, and already has some basic LucidWorks Enterprise capability Totally a proof-of-concept at this point just a quick hack
18
on Solritas
19
Solritas?
Pronounced: so-LAIR-uh-toss Celeritas is a Latin word, translated as "swiftness" or "speed". It is often given as the origin of the symbol c, the universal notation for the speed of light - http:// en.wikipedia.org/wiki/Celeritas Technically its the VelocityResponseWriter (wt=velocity)
simply passes the Solr response through the Apache Velocity templating engine
http://wiki.apache.org/solr/VelocityResponseWriter Built into Solr, available instantly out of the box at: http://localhost:8983/solr/browse
20
on Blacklight
21
Blacklight?
http://projectblacklight.org/ Blacklight is a free and open source Ruby on Rails based discovery interface (a.k.a. next-generation catalog) especially optimized for heterogeneous collections. You can use it as a library catalog, as a front end for a digital repository, or as a single-search interface to aggregate digital content that would otherwise be siloed. Production sites:
http://search.lib.virginia.edu/ http://searchworks.stanford.edu/ Features: Authentication Saved searches Bookmarks saved result items Selected items for exporting to 3rd party systems Customizable / extensible UI
22
Prototyping Tips and Tools

Get data into Solr in the simplest possible way
CSV if it fits, its really nice
Schema adjusting
<dynamicField name="*" type="string" multiValued="true"/> <copyField source="*" dest="text"/>
Data analysis
Understand what Solr is doing with your fields Solrs Schema Browser and /admin/luke request handler
UI
/browse easy tweaking of <solr-home>/conf/velocity/*.vm templates
23
Now what?
Script the indexing process: full and incremental/delta Work with real users on real needs Integrate into production systems Iterate on schema enhancements and configuration tweaks Deploy to staging/production environments and work at scale: collection size, real queries and volume, hardware and JVM settings
24
Test
Performance Scalability Relevance Automate all of the above, start baselines, avoid regressions
25
Thanks!
26

Rapid Prototyping With Solr

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Rapid Prototyping With Solr

Uploaded by

Copyright:

Available Formats

Rapid Prototyping with Solr

Erik Hatcher, Lucid Imagination erik.hatcher @ lucidimagination.com, May 25, 2011

Apache Software Foundation

Previous Rapid Prototyping with Solr presentations

Rapid Prototyping using CSV

using rich text files

using Data.Gov catalog data

Ingesting the data

solr-rubys secret power

Prototyping Tips and Tools

You might also like