DBSight

Instant Scalable Full Text Search on Any Database
Some Customers
• Websites
• Workopolis.com, biggest job site in Canada
• Twenga.com, busy
• Current.com
• Corporate
• eBay
• Costco
• FactSet
• Genentech
• Consulting
• Computer Sciences Corporation
• Federal
• Federal Procurement Data System
• Banking
• European Central Bank, International Counterfeit Deterrence Centre (ECB-ICDC)
Features
• Add Search when you want to
• Database Independent
• Language Independent
• Very Easy to Create
• Very Easy to Modify
• Very Easy to Monitor/Maintain
• Feature Rich
• Facet Search
• Real Time Search
• Flexible ranking
• …
• High Performance
• Linearly Scalable
The Pain to Add Search
• Do you have this? A common but slow SQL pattern:
select * from tableA
where column1 like '%abc%'
or column2 like '%abc%'
or …
• Some databases have search features, but they are still
• Not easy to customize
• No facet search
• Database specific solution
• Some search solutions are Lucene based, but
• Do not cover the development life cycle
• Hard to maintain.
• Will it scale if your data grows?
• Too closely coupled with your program
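To illustrate why the LIKE query above is slow: a pattern with a leading wildcard cannot use an ordinary B-tree index, so the database scans every row. The sketch below uses Python's built-in sqlite3 module with a hypothetical tableA and an index name made up for the example, and contrasts the scan with a tiny in-memory inverted index. The inverted index shows the general idea behind Lucene-style full-text search; it is not DBSight's actual implementation.

```python
import re
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)")
conn.execute("CREATE INDEX idx_column1 ON tableA (column1)")  # hypothetical index
conn.executemany(
    "INSERT INTO tableA (id, column1, column2) VALUES (?, ?, ?)",
    [
        (1, "abc widget", "blue"),
        (2, "plain gadget", "contains abc inside"),
        (3, "other item", "red"),
    ],
)

like_sql = "SELECT * FROM tableA WHERE column1 LIKE '%abc%' OR column2 LIKE '%abc%'"

# Ask SQLite how it would run the query; the last column holds the plan text.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + like_sql)]
like_ids = sorted(row[0] for row in conn.execute(like_sql))


def build_inverted_index(rows):
    """Map each lowercase word to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, *texts in rows:
        for text in texts:
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(row_id)
    return index


index = build_inverted_index(conn.execute("SELECT id, column1, column2 FROM tableA"))
# A term lookup is a dictionary access instead of a pass over every row.
# Note: this matches whole words, not arbitrary substrings as LIKE does.
index_ids = sorted(index.get("abc", set()))

print(plan)
print(like_ids, index_ids)
```

The plan text reports a table scan even though column1 is indexed, while the inverted index answers the same term lookup without touching unrelated rows.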
DBSight Design Goals
• Very fast to create and adjust
• “Knobs” to tune search
• Features beyond basic search
o Facet Search
o Results ranking by attributes like “price”, “time”!
• Minimal administration
• Low total cost of ownership
• Off-the-shelf, no expensive consulting fees
• Flexible
• Customizable analyzers, similarities, data retrieval, UI scaffolding, different output formats, APIs.
Why DBSight? – Simple!
• Create search as easy as 1-2-3
1. Set up a JDBC connection
2. Select content with SQL
3. Generate the search configuration and results template
• Java knowledge is not necessary
Simple to Create
• Web UI to
o set SQL to retrieve content
Simple to Create
• Scaffolding to generate search result template
Simple to Create
• Keep it DRY – Do not Repeat Yourself
• Make full use of existing metadata
• From your SQL
• Generate most Lucene configuration
• Generate most search options
• Generate most rendering templates
• UI to fine-tune customization
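The "make full use of existing metadata" idea can be sketched as follows: run the user's SQL once, read the column names from the result metadata, and derive a default field configuration from them. The product table, the query, and the config keys (searchable, facet) are hypothetical illustrations, not DBSight's actual configuration format.

```python
import sqlite3


def generate_field_config(conn, sql):
    """Derive a default per-column search config from a query's result metadata."""
    cursor = conn.execute(sql)
    first_row = cursor.fetchone()
    config = []
    for position, column in enumerate(cursor.description):
        name = column[0]  # DB-API: the first item of each entry is the column name
        sample = first_row[position] if first_row else None
        config.append({
            "name": name,
            # Assumption: text columns are full-text searchable by default,
            # numeric columns are offered as facets or sort keys instead.
            "searchable": isinstance(sample, str),
            "facet": isinstance(sample, (int, float)),
        })
    return config


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER, title TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (1, 'Classic guitar', 1245.34)")

config = generate_field_config(conn, "SELECT id, title, price FROM product")
for field in config:
    print(field)
```

Because the defaults come from the SQL itself, changing the query regenerates the configuration, and only the exceptions need manual tuning.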
Why DBSight? – Not So Simple!
• A whole Solution
• No consulting fees
• UI to manage everything
• Set it up and let it run. No babysitting.
• Agnostic of programming languages or frameworks
• Create and maintain, with basic SQL
• Built-in Usage Statistics
• Scalable
• Linearly Scalable Sharded Search
• Separated Indexing and Searching
• Many Customization Points
• Search Results templates easy to customize
• API for deep integration
DBSight covers SDLC
• No existing production-ready solution on the market covers the whole software development life cycle.
o DBSight makes search a separate concern
o Change easily when database schema changes
o Enterprise Ready
▪ Monitoring
▪ Portable across Dev => Test => Stage => Production environments
DBSight – Loosely Coupled
• Database Independent
• Programming language Independent
• Framework Independent
• Works during the whole software life cycle
• Allows frequent adjustment
• Easily re-create the whole index
• Linearly scalable for high concurrency and for high data volume
DBSight – Enterprise Ready
• Easy to move between deployment environments
o Development, Testing (QA), Staging, Production
• Secure
o Access control
o Protection of sensitive database passwords
• Package-able Solution
o Import/Export configuration
o Customizable enterprise-specific scaffolding
DBSight – SQL Friendly
• Incremental Indexing
o Handle New/Updated/Deleted records
o Find Deleted Records Efficiently!
▪ Support Hard Deleted Records
▪ Support Soft Deleted Records
• Flexible User Defined SQL
o Star-schema like content retrieval
o No too-smart auto discovery
• Efficient
o Multi-threaded data retrieval
o Caching to minimize database load
o Customizable number of SQL connections
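A minimal sketch of how incremental indexing can find new, updated, and deleted records with plain SQL. It assumes a hypothetical doc table with a modified_at timestamp column and an is_deleted flag for soft deletes; hard deletes are found by comparing the primary keys already in the index with those still in the table. The column names and the strategy are assumptions for illustration, not DBSight's documented behavior.

```python
import sqlite3


def find_changes(conn, indexed_ids, last_run):
    """Return (upserts, deletes) since the previous indexing run."""
    # New or updated rows: modified after the last run and not soft-deleted.
    upserts = [row[0] for row in conn.execute(
        "SELECT id FROM doc WHERE modified_at > ? AND is_deleted = 0 ORDER BY id",
        (last_run,))]
    # Soft deletes: the row still exists but is flagged.
    soft = {row[0] for row in conn.execute("SELECT id FROM doc WHERE is_deleted = 1")}
    # Hard deletes: ids in the index that no longer exist in the table.
    live = {row[0] for row in conn.execute("SELECT id FROM doc")}
    hard = set(indexed_ids) - live
    deletes = sorted((soft & set(indexed_ids)) | hard)
    return upserts, deletes


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT, "
    "modified_at INTEGER, is_deleted INTEGER)"
)
conn.executemany("INSERT INTO doc VALUES (?, ?, ?, ?)", [
    (1, "unchanged", 100, 0),
    (2, "updated", 250, 0),
    (3, "soft deleted", 260, 1),
    (5, "new", 270, 0),
])

# Ids 1 to 4 were indexed at time 200; row 4 has since been hard-deleted.
upserts, deletes = find_changes(conn, indexed_ids={1, 2, 3, 4}, last_run=200)
print(upserts, deletes)
```

Only the hard-delete check needs the full key list; the other two queries touch just the rows changed since the last run.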
Easy to maintain
• Scheduled Jobs
o Incremental indexing
o Re-create the index
o Build spell-checking, synonym, and stop-word dictionaries
• Web UI to
o Monitor Indexing process
o Monitor Search Usage
Facet Search
• Not all facet searches are the same!
• Single Value
• Multiple Value
• Range Value
• Fast!
• Memory Efficient!
• More Features!
• List the most-used facets according to usage!
• Sum()/Avg() functions
Fast Facet Search
• In Memory Facets
• Several caches for more speed
• Cache for top facets
• Cache for recent facets
• Automatic pre-warming
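One way to picture the "cache for recent facets" is a small least-recently-used cache keyed by query and facet field. The class below is a generic LRU sketch built on Python's OrderedDict; the class name, key shape, and capacity are hypothetical, and DBSight's real caches may work differently.

```python
from collections import OrderedDict


class RecentFacetCache:
    """Keep the facet counts of the most recently used (query, field) pairs."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = compute()
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return value

    def __contains__(self, key):
        return key in self._entries


cache = RecentFacetCache(capacity=2)
cache.get(("guitar", "category"), lambda: {"Classic": 23})
cache.get(("guitar", "price"), lambda: {"$2 ~ $3": 7})
cache.get(("guitar", "category"), lambda: {})            # hit: compute is not called
cache.get(("piano", "category"), lambda: {"Grand": 1})   # evicts ("guitar", "price")
print(cache.hits, cache.misses)
```

Pre-warming, in this picture, would amount to calling get for the most common keys before the first user query arrives.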
Examples of Facet Search
• Basic Facet Search
• Example: Category
• Classic (23 matches)
• Multi-Valued Facet Search
• Example: Tag, or Tag Cloud. Several tags for one record
• Configurable dynamic facet grouping
o Example: Price Range
▪ $2 ~ $3 (7 matches)
• Average/Sum for each facet
o Example: Average price for the Year Range
▪ 1970 ~ 1980 (6 matches, average price $1,245.34)
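The facet types listed above can be sketched in plain Python over a handful of hypothetical records: single-valued counts (category), multi-valued counts (tags), and range buckets with an average per bucket. The records, function names, and bucket boundaries are made up for illustration.

```python
from collections import Counter, defaultdict

records = [
    {"category": "Classic", "tags": ["wood", "vintage"], "year": 1972, "price": 1000.0},
    {"category": "Classic", "tags": ["vintage"], "year": 1978, "price": 1500.0},
    {"category": "Electric", "tags": ["wood"], "year": 1985, "price": 800.0},
]


def single_value_facet(rows, field):
    """Count rows per value of a single-valued field."""
    return Counter(row[field] for row in rows)


def multi_value_facet(rows, field):
    """Count rows per value when each row can carry several values."""
    return Counter(value for row in rows for value in row[field])


def range_facet_with_average(rows, field, ranges, measure):
    """For each half-open range [low, high), return (match count, average of measure)."""
    buckets = defaultdict(list)
    for row in rows:
        for low, high in ranges:
            if low <= row[field] < high:
                buckets[(low, high)].append(row[measure])
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in buckets.items()}


print(single_value_facet(records, "category"))
print(multi_value_facet(records, "tags"))
print(range_facet_with_average(records, "year", [(1970, 1980), (1980, 1990)], "price"))
```

A multi-valued facet can count one record under several values, so its counts may sum to more than the number of matching records.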
Feature: Separated Indexing and Searching
• Problem: Search Pauses!
o Indexing is CPU and Disk intensive
o Searching is CPU, Disk, and Memory intensive
o Resource competition
• Solution: Separated Indexing and Searching
o Different JVM processes
▪ Easier to manage resources via JVM settings
o Indexing and Searching can be on different machines.
▪ Improve performance
▪ No hiccups from CPU, memory, or disk resource contention.
▪ Cluster of Searching nodes for better scalable performance!
DBSight Architecture
• Crawl database via user defined SQL
• Multiple database tables
• Support star-schema like outer joins
• Create and Maintain Lucene index
• Incremental Indexing
• Re-creating the Index
• Serve Search results via
• user defined templates
• XML/HTML/JSON/JSONP
• API, protocol buffer for Java and other languages
• Linear Scalability
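As a rough illustration of serving one result set in several output formats, the sketch below renders hypothetical search hits as JSON and as JSONP (the same JSON wrapped in a caller-supplied callback). The function names, field names, and callback name are assumptions, not DBSight's actual response schema.

```python
import json
import re


def render_json(hits, total):
    """Serialize search hits as a JSON document."""
    return json.dumps({"total": total, "hits": hits}, sort_keys=True)


def render_jsonp(hits, total, callback):
    """Wrap the JSON payload in a JavaScript callback for cross-domain use."""
    # Only allow simple identifiers so the callback cannot inject script.
    if not re.fullmatch(r"[A-Za-z_$][\w$]*", callback):
        raise ValueError("invalid callback name")
    return f"{callback}({render_json(hits, total)});"


hits = [{"id": 1, "title": "Classic guitar"}]
print(render_json(hits, total=1))
print(render_jsonp(hits, total=1, callback="handleResults"))
```

Keeping the payload builder separate from the wrappers means an XML or HTML template could reuse the same hit data.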
DBSight Common Setup
• Embedded
• Multiple Indexes on single node
• Single Node, Indexing + Searching
• Two Nodes, Separated Indexing and Searching
• Setup for LAN
• Setup for WAN
• Cluster of Searching nodes via Replication, one Indexing node
• Cluster of Sharded Nodes, each with Indexing + Searching
• Cluster of Searching nodes via Sharding, one or several Indexing nodes
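The sharded setups above follow a common scatter-gather pattern: documents are routed to a shard by key, each shard searches its own slice, and the partial results are merged into a global top list. Below is a minimal in-process sketch with hypothetical Shard and ShardedIndex classes and a toy term-count score; a real deployment would place each shard on its own node.

```python
import heapq


class Shard:
    """One index partition; here just a dict of doc id -> text."""

    def __init__(self):
        self.docs = {}

    def search(self, term, k):
        # Toy score: number of times the term occurs in the document.
        scored = ((text.lower().split().count(term), doc_id)
                  for doc_id, text in self.docs.items())
        return heapq.nlargest(k, (hit for hit in scored if hit[0] > 0))


class ShardedIndex:
    def __init__(self, shard_count):
        self.shards = [Shard() for _ in range(shard_count)]

    def add(self, doc_id, text):
        # Route by integer key so each document lives on exactly one shard.
        self.shards[doc_id % len(self.shards)].docs[doc_id] = text

    def search(self, term, k):
        # Scatter: ask every shard for its local top k.
        partial = [hit for shard in self.shards for hit in shard.search(term, k)]
        # Gather: merge the partial lists into a global top k.
        return [doc_id for _, doc_id in heapq.nlargest(k, partial)]


index = ShardedIndex(shard_count=2)
index.add(1, "java search java")
index.add(2, "search engine")
index.add(3, "java java java")
index.add(4, "database")
print(index.search("java", k=2))
```

Each shard only ever returns k hits, so the merge step stays small no matter how many documents a shard holds.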
