on Any Databases
Some Customers
• Websites
  o Workopolis.com, the biggest job site in Canada
  o Twenga.com, a busy site
  o Current.com
• Corporate
  o eBay
  o Costco
  o FactSet
  o Genentech
• Consulting
  o Computer Sciences Corporation
• Federal
  o Federal Procurement Data System
• Banking
  o European Central Bank, International Counterfeit Deterrence Centre (ECB-ICDC)
Features
• Add Search when you want to
• Database Independent
• Language Independent
• Very Easy to Create
• Very Easy to Modify
• Very Easy to Monitor/Maintain
• Feature Rich
  o Facet Search
  o Real-Time Search
  o Flexible ranking
  o …
• High Performance
• Linearly Scalable
The Pain to Add Search
• Do you have it? The common, but slow, approach is SQL such as:
  select * from tableA where column1 like '%abc%' or column2 like '%abc%' or …
• Some databases have search features, but they are still
  o Not easy to customize
  o Without facet search
  o A database-specific solution
• Some Lucene-based search solutions exist, but they
  o Do not cover the development life cycle
  o Are hard to maintain
  o Will it scale if your data grows?
  o Are too closely coupled with your program
DBSight Design Goals
• Very fast to create and adjust
• “Knobs” to tune search
• Features beyond basic search
  o Facet Search
  o Results ranking by attributes like “price” and “time”!
• Minimal administration
• Low total cost of ownership
• Off-the-shelf, no expensive consulting fees
• Flexible
  o Customizable analyzers, similarities, data retrieval, UI scaffolding, output formats, and APIs
Why DBSight? – Simple!
• Create Search as 1-2-3
  1. Set up a JDBC connection
  2. Select with SQL
  3. Generate the search configuration and results template
• Java knowledge is not necessary
Simple to Create
• Web UI to set the SQL that retrieves content
• Scaffolding to generate search result templates
• Keep it DRY – Don't Repeat Yourself
  o Make full use of existing metadata from your SQL
  o Generate most of the Lucene configuration
  o Generate most search options
  o Generate most rendering templates
  o UI to fine-tune customization
Why DBSight? – Not So Simple!
• A whole solution
  o No consulting fees
  o UI to manage everything
  o Set it up and let it run. No babysitting.
• Agnostic of programming languages and frameworks
  o Create and maintain with basic SQL
• Built-in Usage Statistics
• Scalable
  o Linearly Scalable Sharded Search
  o Separated Indexing and Searching
• Many customization points
  o Search results templates are easy to customize
  o API for deep integration
DBSight covers SDLC
• No existing production-ready solution on the market covers the whole software development life cycle (SDLC).
  o DBSight makes search a separate concern
  o Change easily when the database schema changes
  o Enterprise Ready
      Monitoring
      Portable across Dev => Test => Stage => Production environments
DBSight – Loosely Coupled
• Database Independent
• Programming Language Independent
• Framework Independent
• Works during the whole software life cycle
• Allows frequent adjustment
• Easily re-create the whole index
• Linearly scalable for high concurrency and for high data volume
DBSight – Enterprise Ready
• Easy to move between deployment environments
  o Development, Testing (QA), Staging, Production
• Secure
  o Access control
  o Sensitive database passwords
• Package-able Solution
  o Import/Export configuration
  o Customizable enterprise-specific scaffolding
DBSight – SQL Friendly
• Incremental Indexing
  o Handle new/updated/deleted records
  o Find deleted records efficiently!
      Support hard-deleted records
      Support soft-deleted records
• Flexible user-defined SQL
  o Star-schema-like content retrieval
  o No too-smart auto discovery
• Efficient
  o Multi-threaded data retrieval
  o Caching to minimize database load
  o Customizable number of SQL connections
Easy to Maintain
• Scheduled jobs
  o Incremental indexing
  o Re-creating the index
  o Building spell-checking, synonym, and stop-word dictionaries
• Web UI to
  o Monitor the indexing process
  o Monitor search usage
Facet Search
• Not all facet searches are the same!
  o Single Value
  o Multiple Value
  o Range Value
• Fast!
• Memory Efficient!
• More Features!
  o List the most-used facets according to usage!
  o Sum()/Avg() functions
Fast Facet Search
• In-Memory Facets
• Several caches for more speed
  o Cache for top facets
  o Cache for recent facets
• Automatic pre-warm-up
Examples of Facet Search
• Basic Facet Search
  o Example: Category
      Classic (23 matches)
• Multi-Valued Facet Search
  o Example: Tag, or Tag Cloud. Several tags for one record
• Configurable dynamic facet grouping
  o Example: Price Range $2 ~ $3 (7 matches)
• Average/Sum for each facet
  o Example: Average price for the Year Range 1970~1980 (6 matches, average price $1,245.34)
Feature: Separated Indexing and Searching
• Problem: Search Pauses!
  o Indexing is CPU and disk intensive
  o Searching is CPU, disk, and memory intensive
  o The two compete for resources
• Solution: Separated Indexing and Searching
  o Different JVM processes
      Easier to manage resources via JVM settings
  o Indexing and Searching can be on different machines
      Improved performance
      No hiccups caused by CPU, memory, or disk resource contention
      Cluster of Searching nodes for more scalable performance!
DBSight Architecture
• Crawl the database via user-defined SQL
  o Multiple database tables
  o Support for star-schema-like outer joins
• Create and maintain the Lucene index
  o Incremental Indexing
  o Re-Creating the Index
• Serve search results via
  o user-defined templates
  o XML/HTML/JSON/JSONP
  o API, protocol buffers for Java and other languages
• Linear Scalability
DBSight Common Setup
• Embedded
• Multiple Indexes on a single node
• Single Node, Indexing + Searching
• Two Nodes, Separated Indexing and Searching
  o Setup for LAN
  o Setup for WAN
• Cluster of Searching nodes via Replication, one Indexing node
• Cluster of Sharded Nodes, each with Indexing + Searching
• Cluster of Searching nodes via Sharding, one or several Indexing nodes
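The "Pain to Add Search" slide contrasts `like '%abc%'` queries with a dedicated search engine. A leading-wildcard LIKE generally cannot use an ordinary B-tree index, so the database has to scan every row. Lucene instead keeps an inverted index that maps each term to the documents containing it. The sketch below is a toy inverted index in plain Python. It assumes a simple lowercase, split-on-non-word-characters tokenizer, and every function name in it is hypothetical. It is not DBSight's or Lucene's code.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Toy analyzer (assumption): lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_inverted_index(rows):
    """Map each term to the set of row ids whose columns contain it.

    rows: dict of row_id -> list of column values (strings).
    """
    index = defaultdict(set)
    for row_id, columns in rows.items():
        for value in columns:
            for term in tokenize(value):
                index[term].add(row_id)
    return index

def search(index, query):
    """Return sorted ids of rows containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get() does not insert missing keys into the defaultdict
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)

rows = {
    1: ["Classic cars", "Restored 1970 coupe"],
    2: ["Modern cars", "Electric sedan"],
    3: ["Classic books", "First edition"],
}
index = build_inverted_index(rows)
print(search(index, "classic"))       # [1, 3]
print(search(index, "classic cars"))  # [1]
```

The semantics differ: `like '%abc%'` matches substrings anywhere, while a term index matches whole tokens. In exchange, each query only reads the posting sets for its own terms instead of every row in every searched column.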
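The "SQL Friendly" slide lists incremental indexing of new, updated, and deleted records, covering both hard deletes (the row is removed) and soft deletes (the row is flagged). The slides do not say how DBSight does this. The sketch below shows one common approach under stated assumptions:

- Pull rows changed since the last run through a timestamp column.
- Drop rows whose soft-delete flag is set.
- Find hard deletes by comparing the ids still in the table with the ids already indexed.

The table `items(id, title, updated_at, is_deleted)` and the function `incremental_update` are hypothetical. A Python dict stands in for the search index.

```python
import sqlite3

def incremental_update(conn, index, last_run):
    """Apply new/updated/deleted rows to `index` (dict of id -> title).

    Assumes a hypothetical table items(id, title, updated_at, is_deleted).
    Returns the timestamp to pass as `last_run` on the next call.
    """
    # 1. Rows inserted, updated, or soft-deleted since the previous run.
    changed = conn.execute(
        "SELECT id, title, updated_at, is_deleted FROM items WHERE updated_at > ?",
        (last_run,),
    ).fetchall()
    new_last_run = last_run
    for row_id, title, updated_at, is_deleted in changed:
        if is_deleted:
            index.pop(row_id, None)   # soft delete: row exists but is flagged
        else:
            index[row_id] = title     # insert or overwrite
        new_last_run = max(new_last_run, updated_at)

    # 2. Hard deletes: ids we indexed that no longer exist in the table.
    live_ids = {r[0] for r in conn.execute("SELECT id FROM items")}
    for row_id in set(index) - live_ids:
        del index[row_id]
    return new_last_run

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, "
             "updated_at INTEGER, is_deleted INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, 0)",
                 [(1, "alpha", 1), (2, "beta", 1), (3, "gamma", 1)])

index = {}
last_run = incremental_update(conn, index, 0)  # initial full load

conn.execute("UPDATE items SET title = 'beta v2', updated_at = 2 WHERE id = 2")  # update
conn.execute("UPDATE items SET is_deleted = 1, updated_at = 2 WHERE id = 3")     # soft delete
conn.execute("DELETE FROM items WHERE id = 1")                                   # hard delete
last_run = incremental_update(conn, index, last_run)
print(index)     # {2: 'beta v2'}
print(last_run)  # 2
```

Comparing full id sets is the simplest way to detect hard deletes, but it reads every id on each run. A delete-log table maintained by a database trigger is a common alternative when tables are large.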
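The facet slides distinguish four kinds of facets:

- single-valued facets, such as Category;
- multi-valued facets, such as tags;
- range facets, such as a price range of $2 ~ $3;
- per-facet aggregates, such as the average price within a year range.

The sketch below computes all four in memory over a list of matching records. It assumes the records are plain dicts with hypothetical `category`, `tags`, `price`, and `year` fields, and that range buckets are half-open (`low <= value < high`), which the slides do not specify. It shows the counting logic only and is not DBSight's facet engine or API.

```python
from collections import Counter

def facet_counts(docs, field):
    """Single-valued facet: number of matching docs per field value."""
    return Counter(doc[field] for doc in docs)

def multi_facet_counts(docs, field):
    """Multi-valued facet: a doc counts once for each distinct value it has."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field]))
    return counts

def range_facet(docs, field, ranges, agg_field=None):
    """Range facet over half-open buckets (low <= value < high, an assumption).

    If agg_field is given, also report the sum and average of that field.
    """
    result = {}
    for low, high in ranges:
        bucket = [d for d in docs if low <= d[field] < high]
        entry = {"count": len(bucket)}
        if agg_field is not None:
            total = sum(d[agg_field] for d in bucket)
            entry["sum"] = total
            entry["avg"] = total / len(bucket) if bucket else None
        result[(low, high)] = entry
    return result

# Hypothetical matching records for a query
docs = [
    {"category": "Classic", "tags": ["red", "coupe"], "price": 2.5, "year": 1972},
    {"category": "Classic", "tags": ["red"], "price": 3.5, "year": 1978},
    {"category": "Modern", "tags": ["coupe", "electric"], "price": 2.0, "year": 2015},
]
print(facet_counts(docs, "category"))
print(multi_facet_counts(docs, "tags"))
print(range_facet(docs, "price", [(2, 3), (3, 4)]))
print(range_facet(docs, "year", [(1970, 1980)], agg_field="price"))
```

A production engine would avoid rescanning documents for each bucket, for example by keeping per-field value arrays in memory. That is one reason the slides stress in-memory facets and caching.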
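The architecture and setup slides mention clusters of sharded nodes. With sharding, each node indexes only part of the data, so a query is sent to every shard and the per-shard top hits are merged into one ranked list. The sketch below shows only that merge step. It assumes each shard returns `(score, doc_id)` pairs already sorted by descending score. The shard data and the function name are hypothetical.

```python
import heapq
from itertools import islice

def merge_shard_results(shard_hits, k):
    """Merge per-shard hit lists into a global top-k.

    shard_hits: list of lists of (score, doc_id), each sorted by score descending.
    Returns up to k (score, doc_id) pairs, highest score first.
    """
    # reverse=True requires every input list to be sorted in descending order
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))

shard_a = [(9.1, "a7"), (4.0, "a2")]
shard_b = [(7.5, "b3"), (6.2, "b9"), (1.1, "b1")]
print(merge_shard_results([shard_a, shard_b], 3))
# [(9.1, 'a7'), (7.5, 'b3'), (6.2, 'b9')]
```

For the merged top-k to be exact, each shard must return at least its own top k hits. The sketch also assumes scores are comparable across shards. Real engines have to address that explicitly, for example with consistent term statistics.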