
When SQL is not Enough

…here comes Elasticsearch


About me
 Project Manager @
 13 years professional experience
 .NET Web Development MCPD
 SQL Server 2012 (MCSA)
 External Expert Horizon 2020
 Business Interests
 Web Development, SOA, Integration
 Security & Performance Optimization
 Contact
 ivelin.andreev@icb.bg
 www.linkedin.com/in/ivelin
 www.slideshare.net/ivoandreev

Agenda

 What
 Why
 Jump start
 Analysis in depth
 Side by side with SQL
 Demo
What is ES
 Powerful real-time search and analytics engine

“…It has a very advanced distributed model, speaks JSON natively, and exposes many advanced search features, all seamlessly expressed through JSON DSL…”
Shay Banon – Creator, Founder, CTO
 What else…
 Document-oriented
 Sophisticated RESTful API
 Entirely open source
 Based on Apache Lucene
 Requires Java
Popularity (All DB Engines)

All DB Engines Ranking


Popularity (Search Engines)
Who Uses ES
“You don’t learn to walk by following rules. You learn by doing”
(Richard Branson)

First Steps in Elasticsearch


Terms
Elasticsearch    RDBMS
Index            Database
Type             Table
Field            Column
Document         Row

 Scaling
 Cluster; Node; Shard (Primary/ Replica)
RESTful APIs

 Document APIs
 Index, Get, Update, Delete
 Bulk API available
POST /[index]/[type] { "…","…" }
GET /[index]/[type]/[ID] { }
PUT /[index]/[type]/[ID] { "…","…" }
DELETE /[index]/[type]/[ID]
 Search APIs
 Send/Receive JSON
 Basic queries via query string
http://localhost:9200/{indexName}/{type}/_search?q=searchstr&size=100

http://localhost:9200/{index1,index2}/{type}/_search?q=createdby:ivo

http://localhost:9200/_search?q=tag:spam
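The endpoints above can be called from any HTTP client. The following is a minimal sketch using only the Python standard library, assuming a node at http://localhost:9200 as in the slides. The helper names and the `blog`/`post` index and type are hypothetical, and the requests are only built here, not sent.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:9200"  # HTTP endpoint used throughout the slides


def document_request(method, index, doc_type, doc_id=None, body=None):
    """Build (without sending) a request for the document APIs."""
    parts = [p for p in (index, doc_type, doc_id) if p is not None]
    path = "/".join(quote(str(p), safe="") for p in parts)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(f"{BASE}/{path}", data=data, method=method,
                   headers={"Content-Type": "application/json"})


def search_url(q, indices=(), doc_type=None, size=None):
    """Build a query-string search URL; several indices are comma-separated."""
    parts = [",".join(indices)] if indices else []
    if doc_type:
        parts.append(doc_type)
    parts.append("_search")
    params = {"q": q}
    if size is not None:
        params["size"] = size
    return f"{BASE}/{'/'.join(parts)}?{urlencode(params)}"


# Index a document under an explicit ID, then search for it.
put_req = document_request("PUT", "blog", "post", 1, {"createdby": "ivo"})
url = search_url("createdby:ivo", ["blog"], "post", size=100)
```

Passing `put_req` to `urllib.request.urlopen` would send it. Note that `urlencode` percent-encodes the colon in `field:value`, which the server decodes again.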
Query DSL

 Entire JSON object is the Query DSL


 Query
 Full text queries
 Results ordered by relevance
 Every field is searchable
 Filter
 Binary – either a field matches or it does not
 Filters and queries can be nested
 Nesting passes relevance to parents
Query - for full-text search or for any condition
that should affect the relevance score

Filter – for everything else
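
The rule of thumb above can be sketched as a single request body that keeps the scored part and the yes/no part apart. This assumes Elasticsearch 2.0 or later, where the `bool` query accepts a `filter` clause (in 1.x the `filtered` query played that role). The helper name `search_body` is hypothetical; the field names are taken from the slides.

```python
import json


def search_body(query, filters=()):
    """Put the scored query in `must` and non-scoring conditions in `filter`."""
    bool_clause = {"must": query}
    if filters:
        bool_clause["filter"] = list(filters)
    return {"query": {"bool": bool_clause}}


body = search_body(
    {"match": {"title": "how to make millions"}},    # affects relevance
    [
        {"term": {"folder": "inbox"}},               # matches or not, no score
        {"range": {"date": {"gte": "2014-01-01"}}},
    ],
)
print(json.dumps(body, indent=2))
```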


How To (Filters)
 ES provides 27 filters (Sep 2015)
 Term/Terms filter
{ "term": { "date": "2015-10-10" }}
 Range filter
{"range": {"age": {"gte":20, "lt":30}}}
 Exists/Missing filter
{"exists": {"field": "title"}}
 Bool filter
{"bool": {
"must": { "term": { "folder": "inbox" }},
"must_not": { "term": { "tag": "spam" }},
"should": [{ "term": { "starred": true }}, { "term": { "unread": true }}]
}}
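Filters like these are plain JSON, so they compose well from small helper functions. The sketch below rebuilds the examples above in Python; the helper names are hypothetical. Wrapping a filter in a `constant_score` query is one way to run it on its own without relevance scoring.

```python
import json


def term(field, value):
    return {"term": {field: value}}


def range_(field, **bounds):
    """bounds are range operators such as gte, gt, lte, lt."""
    return {"range": {field: bounds}}


def exists(field):
    return {"exists": {"field": field}}


def bool_(must=None, must_not=None, should=None):
    """Combine filters; clauses that were not supplied are left out."""
    clauses = {"must": must, "must_not": must_not, "should": should}
    return {"bool": {k: v for k, v in clauses.items() if v}}


def filter_only(filter_clause):
    """Wrap a filter so it can be sent as a search body without scoring."""
    return {"query": {"constant_score": {"filter": filter_clause}}}


inbox = bool_(
    must=term("folder", "inbox"),
    must_not=term("tag", "spam"),
    should=[term("starred", True), term("unread", True)],
)
print(json.dumps(filter_only(inbox)))
```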
How To (Queries)
 ES provides 38 queries (Sep 2015)
 match query
{ "match": { "tweet": "About Search" }}
 multi_match query
{ "multi_match": {
"query": "full text search",
"fields": [ "title", "body" ] }}
 bool query
{ "bool": {
"must": { "match": { "title": "how to make millions" }},
"must_not": { "match": { "tag": "spam" }},
"should": [
{ "match": { "tag": "starred" }},
{ "range": { "date": { "gte": "2014-01-01" }}}
]}}

 fuzzy query
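The fuzzy query matches terms that lie within a small edit distance of the search term, which tolerates typos. The sketch below shows a fuzzy query body (the field name `title` is hypothetical) next to a plain Levenshtein distance function that illustrates what "edit distance" means. It is a simplified illustration, not the Lucene implementation, which can also count a swap of two adjacent characters as a single edit.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy(field, value, fuzziness=2):
    """Build a fuzzy query that allows up to `fuzziness` edits."""
    return {"fuzzy": {field: {"value": value, "fuzziness": fuzziness}}}


query = fuzzy("title", "serch", fuzziness=1)
```

With `fuzziness` set to 1, the misspelled "serch" is close enough to an indexed term "search", because one inserted character turns one into the other.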
Any index search solution is way better than “LIKE”
How does SQL Full-text Index Work

 Column-level language
 Used by stemmers and tokenizers
 Different columns for different languages
 Language tags are respected (XML, binary)
 Stop words
ALTER FULLTEXT STOPLIST ProductSL
ADD 'blah' LANGUAGE 1033;
 Thesaurus files
 (e.g. “song” -> “tune”)
Inverted Index
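
An inverted index maps each term to the documents that contain it, so a search becomes a lookup instead of a scan. The following is a toy sketch of the idea, assuming whitespace tokenization and lowercasing only. The sample documents and function names are made up for illustration, and real Lucene indexes store far more (positions, frequencies, and so on).

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """IDs of documents that contain every query term (AND semantics)."""
    postings = [set(index.get(t, ())) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


docs = {1: "The quick brown fox", 2: "Quick brown dogs", 3: "The lazy dog"}
index = build_inverted_index(docs)
print(search_all(index, "quick brown"))
```

Note that "dogs" and "dog" end up as different terms here, which is the gap that analysis (stemming in particular) closes.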
ES Analysis Process

 Character filters
 Simplify data (“&” -> “and”, “ü” -> “u”)
 Tokenizers
 Split data into words (terms, tokens)
 Token filters
 Lowercase
 Remove words w/o relevance impact (“a”, “the”)
 Synonyms added
 Stemming
 Reduce to root form (“dogs” -> “dog”)
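The three stages above can be sketched as a small pipeline. This is a toy illustration and not how Lucene implements analysis: the stop list, the synonym map and the plural-stripping "stemmer" are deliberately tiny assumptions.

```python
import re

STOP_WORDS = {"a", "an", "the"}     # tiny illustrative stop list
SYNONYMS = {"song": ["tune"]}       # hypothetical synonym map


def char_filter(text):
    """Character filters: simplify the raw text before tokenizing."""
    return text.replace("&", " and ").replace("ü", "u")


def tokenize(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)


def naive_stem(token):
    """Crude plural stripping; real stemmers (e.g. Porter) do far more."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


def analyze(text):
    tokens = [t.lower() for t in tokenize(char_filter(text))]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    expanded = []
    for t in tokens:                # synonyms are added next to the original
        expanded.append(t)
        expanded.extend(SYNONYMS.get(t, []))
    return [naive_stem(t) for t in expanded]


print(analyze("The Dogs & a song"))
```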
Analyzers
 Full-text (FT) fields are analyzed into terms to build the inverted index
 Configured when index is created
"Set the shape to semi-transparent by calling set_trans(5)"
Analyzer Type      Example
Whitespace         Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
Standard (Def.)    set, the, shape, to, semi, transparent, by, calling, set_trans, 5
Simple             set, the, shape, to, semi, transparent, by, calling, set, trans
Stop               set, shape, semi, transparent, calling, set, trans
Language (EN)      set, shape, semi, transpar, call, set_tran, 5
Pattern            "nonword": { "type": "pattern", "pattern": "[^\\w]+" }
Custom             Allows combination of Tokenizer[1:1] and TokenFilters[0:N]
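A custom analyzer is declared in the index settings when the index is created. The sketch below builds such a settings body as a Python dictionary. The names `my_text` and `amp_to_and` are hypothetical; `html_strip`, `mapping`, `standard`, `lowercase` and `stop` are built-in components.

```python
import json

index_settings = {
    "settings": {
        "analysis": {
            "char_filter": {
                # Character filter that rewrites "&" before tokenizing.
                "amp_to_and": {"type": "mapping", "mappings": ["& => and"]}
            },
            "analyzer": {
                "my_text": {
                    "type": "custom",
                    "char_filter": ["html_strip", "amp_to_and"],
                    "tokenizer": "standard",          # exactly one tokenizer
                    "filter": ["lowercase", "stop"],  # zero or more token filters
                }
            },
        }
    }
}

print(json.dumps(index_settings, indent=2))
```

The printed JSON would be sent as the body of the request that creates the index, after which fields can reference `my_text` as their analyzer.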
Security Remarks

 RAM is Important
 Data structures reside in-memory
 Performance and reliability depend on it
• Be Aware
• No authentication!
• Protect private data yourself
• Prevent expensive requests (DoS)
• Protect http://localhost:9200
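One common way to act on these remarks is to keep port 9200 unreachable from outside and let the application, or a small proxy, decide what gets forwarded. The following is a minimal sketch of such a guard. The function name, the read-only policy and the cap of 100 hits are assumptions for illustration, not a complete security solution.

```python
from urllib.parse import parse_qs, urlsplit

MAX_SIZE = 100                 # assumed cap on hits per request
ALLOWED_METHODS = {"GET"}      # read-only access for outside callers


def is_allowed(method, url):
    """Return True if a request may be forwarded to Elasticsearch."""
    if method.upper() not in ALLOWED_METHODS:
        return False           # no indexing, updating or deleting
    parts = urlsplit(url)
    if not parts.path.endswith("/_search"):
        return False           # only search endpoints are exposed
    params = parse_qs(parts.query)
    try:
        size = int(params.get("size", ["10"])[0])   # ES default size is 10
    except ValueError:
        return False
    return 0 <= size <= MAX_SIZE   # refuse very large result pages


print(is_allowed("GET", "http://localhost:9200/blog/_search?q=tag:spam&size=100"))
```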
Side by Side
                        Elasticsearch    SQL Full-text Search
Performance             RAM mainly       Disk I/O mainly
Licensing               Open Source      Commercial
Platform                Any (Java)       Windows Only
Wildcards               Yes              Partly
FTS Syntax              Rich             Basic
Extensibility           Plugins          CLR or custom code
Scale Out               Yes              No
Relational Integrity    No               Yes
Security                No               Yes
FT Search Setup         Manual           Wizard
Index Update            Manual           Auto
From SQL to Elasticsearch

 Rivers (deprecated)
 Logstash
 Open source log management tool
 Client libraries
 .NET
 Elasticsearch.Net
 Nest
 Also Java, JS, Perl, Python, Ruby, PHP
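Whatever tool moves the data, rows usually end up in Elasticsearch through the Bulk API, which takes newline-delimited JSON: one action line followed by one document line per row. The sketch below builds such a payload from SQL rows. An in-memory SQLite table stands in for the real database, and the table, index and type names are made up for illustration.

```python
import json
import sqlite3


def bulk_payload(rows, index, doc_type, id_column="id"):
    """Build a newline-delimited Bulk API body from dict-like rows."""
    lines = []
    for row in rows:
        doc = dict(row)
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": doc.pop(id_column)}}
        lines.append(json.dumps(action))   # what to do and where
        lines.append(json.dumps(doc))      # the document itself
    return "\n".join(lines) + "\n"         # the body must end with a newline


conn = sqlite3.connect(":memory:")         # stand-in for the SQL database
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, tag TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(1, "Hello", "intro"), (2, "Search", "spam")])

payload = bulk_payload(conn.execute("SELECT * FROM posts ORDER BY id"),
                       "blog", "post")
print(payload)
```

The payload would then be POSTed to the `_bulk` endpoint, either with a plain HTTP client or through one of the client libraries listed above.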
Summary

 Not a replacement of RDBMS


 Real-time search applications
 Built for scalability
 Easy to install
 RESTful API and JSON
Deployment (Windows)

 Install Java
 Download ES zip
 Install
 [ESHome]/bin> service install
 Set ES service to start automatically
 [ESHome]/bin> service manager
 Open in browser http://localhost:9200/
 Plugin Install
 [ESHome]/bin> plugin -i elasticsearch/marvel/latest
 Restart ES
Takeaways

 Tools
 Kopf: https://github.com/lmenezes/elasticsearch-kopf
 Marvel: https://www.elastic.co/products/marvel
 Curl: http://curl.haxx.se/download.html
 JDBC Driver: http://www.java2s.com/Code/Jar/s/Downloadsqljdbc430jar.htm

 Community
 https://discuss.elastic.co
 Getting Started
 http://joelabrahamsson.com/elasticsearch-101/
Sponsors
