
CRUD Operations and Search Queries in Elasticsearch (ES)

(*) The idea for this tutorial comes from http://www.ponybai.com/elastic/instructions/labs.html#lab3


(**) Reference on why types were removed in ES 7+:
https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html#_index_per_document_type

For Windows installation, refer to this guideline to install ES and Kibana:


https://www.elastic.co/start

1. Dev tools
Go to the Elasticsearch status page by typing localhost:9200 in the browser.

Figure 1: Elastic booted up

Then go to localhost:5601/.

Open the Dev Tools console in Kibana by typing Dev tools in the search bar.
Figure 2: Open Dev tools by typing Dev tools in Search Elastic

The first time you run Elasticsearch, only the indexes that ship with ES exist; no indexes from outside have been added yet. Indexes in ES are similar to tables in a relational database.

3. Try some commands


Try a command by typing _search in the console to see what is displayed.

Figure 3: Typing _search in elastic

It returns results from the indexes in ES. An index in ES is similar to a table in MySQL.


Figure 4: Result
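The request in Figure 3 appears only as a screenshot; in the Dev Tools console it is simply:

GET _search

This searches across every index and returns the first matching documents.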

Below the previous _search request, enter the following request to get basic cluster and node
information.

Figure 5: Enter / to get node information


Figure 6: Node information

To get the list of indexes in the system, run:

Figure 7: Get list of indices in the system

Figure 8: List of indices
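Figures 7–8 are screenshots; a request that lists the indices (the v parameter adds column headers) is:

GET _cat/indices?v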

Imagine a dataset that is row oriented (e.g. a spreadsheet or a traditional database). How would you write a JSON document based on sample entries like the following? Think about field structure and empty fields:
Figure 9: Item for the first row

Figure 10: Item for the second row

Now that we have defined the documents, let's index them. Notice that the id field defined inside the documents is just a normal field like any other; the actual document id is defined in the request URL when you index the documents. Index both JSON documents into the my_blogs index, using _doc for the type and their respective ids. Use the PUT method to add a document with an explicit id, in contrast to POST.
Figure 11: Add index

Figure 12: Result
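The documents in Figures 9–11 appear only as screenshots, so they are not reproduced here. A sketch of indexing a document with an explicit id, using field names from the blog mapping in Lab 4 and hypothetical values:

PUT my_blogs/_doc/1
{
  "id": 1,
  "title": "A sample blog post",
  "author": "Jane Doe",
  "publish_date": "2021-01-01"
}

The response reports "result": "created" and the _id taken from the URL.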

The index operation can also be executed without specifying an id. In that case, use POST instead of PUT and an id will be generated automatically.
Index the following document without an id and check the _id in the response. Make sure you use POST.
Figure 13: Using POST to create document without id
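A sketch with POST and no id in the URL (field values again hypothetical):

POST my_blogs/_doc
{
  "title": "Another sample post",
  "author": "Jane Doe"
}

The _id in the response is auto-generated.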

Use a GET command to retrieve the document with id 1 and type _doc from the my_blogs index.

Figure 14: Output of GET specific document

Delete the document with id = 2 in Elasticsearch and re-verify the documents.


Figure 15: Delete in ElasticSearch
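The delete request in Figure 15 uses the same URL scheme as the GET:

DELETE my_blogs/_doc/2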

Re-verify the index

Figure 16: Reverify index

Figure 17: Result

Finally, delete the my_blogs index

Figure 18: Delete index
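Deleting the whole index:

DELETE my_blogs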

Get information about the mappings and data of the my_blogs index.

Figure 19: Re-get index

If found
Figure 20: If mappings found (before deletion)

Figure 21: Mappings not found


Lab 3: Query data
Objective: In this lab, you will write various queries that search documents in the ecommerce
index using Search API. You will use queries like match, range, and bool. 

Navigate to home page, and add sample data

Figure 22: Sample elasticsearch data

Choose the “Try sample data” tab, then choose “Sample eCommerce orders”. This lets you load ecommerce data to experiment with.
Figure 23: Sample Ecommerce order Elasticsearch

Figure 24: Get indexes inside Elasticsearch

Figure 25: Kibana_sample_data_ecommerce index found

Now we come to the main part: querying.

In Elasticsearch, there are 2 types of searching:

• Term-level query: used to search for a precise value such as a price, product_id, or username. Use this to search keyword fields or for exact values.
• Full-text query: used to search text, descriptions, etc., with options for fuzzy matching. It returns documents that match a provided text, number, date, or boolean value; the provided text is analyzed before matching.

Term level query example:

Figure 26: Term level query

Result of term

Figure 27: Searching by term-level query result
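The query in Figure 26 is a screenshot; a sketch of a term-level query, assuming the currency keyword field from the sample data (the field in the figure may differ):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "term": {
      "currency": "EUR"
    }
  }
}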


Full-text query example
1. Write a command that matches all documents, giving them all a _score of 1.0.

Figure 28: Get all documents
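The match_all query described in this step:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  }
}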

2. Add the "size" parameter to your previous request and set it to 100. You should now see 100 documents from kibana_sample_data_ecommerce in the results. This is similar to LIMIT in SQL.

_search returns the matching documents themselves; by default, Elasticsearch returns only 10 results.

Figure 29: Specify size to get data

With "size": 100, Elasticsearch will return 100 documents.
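The same query with size:

GET kibana_sample_data_ecommerce/_search
{
  "size": 100,
  "query": {
    "match_all": {}
  }
}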

3. Range in Elasticsearch
A range can be specified by using the "range" key with "gte" (>=) and "lte" (<=). To specify strict inequality (< and >), use "lt" and "gt".
Figure 30: Using range in Elasticsearch
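A sketch of a range query, assuming the taxful_total_price field from the sample data (the field and bounds in Figure 30 may differ):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "range": {
      "taxful_total_price": {
        "gte": 100,
        "lte": 200
      }
    }
  }
}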

4. Write and execute a match query for ecommerce documents that have the term "Eddie" in the "customer_full_name" field.

Full-text search includes simple text-level queries (a "query" with "match") and advanced boolean logic (a "query" with "bool").

Figure 31: Search contain in elastic search


In the query above, any document whose customer_full_name field contains “Eddie” is returned.

Figure 32: A result contain Eddie in customer_full_name field
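The match query for this step:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_full_name": "Eddie"
    }
  }
}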

If you want an exact match with both words present in a single field, use the operator "and" inside the match "query". With "and", the field must contain both searched values. If "operator" is not specified, the default is "or" (see below).
Figure 33: Search "Eddie" and "Underwood" in customer_full_name, both must exists in customer_full_name field

Figure 34: Result will contain both Eddie and Underwood
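The query with the "and" operator (per Figure 33):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_full_name": {
        "query": "Eddie Underwood",
        "operator": "and"
      }
    }
  }
}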

Without “and”, Elasticsearch will search customer_full_name for Eddie or Underwood.

With “and”, ES will search customer_full_name for documents containing both Eddie and Underwood, including “Eddie Underwood” and “Underwood Eddie” (order does not matter). To search with word order preserved, see match_phrase below.

Exercise: Write queries for the following cases


• Searched items that have the word "high" in their "product_name" field.
• Searched items that have "high" or "heeled" in their "product_name" field.
• Searched items that have "high" and "heeled" in their "product_name" field.

Write your answer here

With match_phrase, the search needs to match the following criteria:

• all the terms must appear in the field
• they must be in the same order as the input value
For example, if you index the following documents (using standard analyzer for the field foo):

Figure 35: Documents to be indexed

match_phrase will return the first and second documents

Figure 36: Match phrase

Search match_phrase example


Figure 37: Match_phrase

Figure 38: Match phrase
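A match_phrase sketch on the same field used earlier (the figures use a different example field, foo):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_phrase": {
      "customer_full_name": "Eddie Underwood"
    }
  }
}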

minimum_should_match: requires at least X of the terms to match.

Figure 39: Minimum should match


minimum_should_match here returns documents containing at least 2 of the terms short, coat, and white/black, for example "short coat" or "short white-black".
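A sketch of minimum_should_match, using the terms from the explanation above:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "products.product_name": {
        "query": "short coat white/black",
        "minimum_should_match": 2
      }
    }
  }
}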

Boolean query with must / must_not: results must (or must not) contain the given terms. For example:

Figure 40:Must and must not

Explanation: the result must match at least 2 of the terms in products.product_name; however, returned results must not contain oil or green.

Figure 41: Must/must not


Exercise: Write a query on products.product_name that must not contain navi, but must match at least 2 of the terms in "Short coat - white/black".
Write your answer here

should: If these clauses match, they increase the _score; otherwise, they have no effect

Figure 42: Using should increase score

When using should, documents whose product_name contains dark get a higher _score. A higher score means the document appears nearer the top of the results.
Figure 43: Score increased by using should
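A sketch of should as described above (boosting documents whose product_name contains "dark"):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "products.product_name": "coat" } }
      ],
      "should": [
        { "match": { "products.product_name": "dark" } }
      ]
    }
  }
}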
Lab 4: Read index mappings and define your own mapping

At this point we stop using the existing ecommerce dataset and instead create a new index for the blog type, using the following data:

Figure 44: Sample dataset

Now we will create a new index called tmp_blogs with type = _doc and id = 1.

Figure 45: _doc with id = 1

To view the mapping of the index, use _mapping:

Figure 46: _mapping
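The request itself is simply:

GET tmp_blogs/_mapping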


Figure 47: Fragment of mapping

In the mapping, fields can be typed as both text and keyword. A text field is broken down into words; a keyword field is kept as-is. For example:
• “Roosters crow everyday” → “roosters”, “crow”, “everyday” (text)
• “Roosters crow everyday” → “Roosters crow everyday” (keyword)

Let's analyze every field mapped:


• The publish_date field is correctly mapped as date.
• The author field is a string mapped as both text and keyword, which is great: we can search authors as free text or by the whole string, and also sort and aggregate on them (discussed later). Similarly, the title and url fields are mapped correctly, as we may want to search and sort on those fields.
• The locales array is an array of fixed strings and there is no need to full-text search it, so we can update it to keyword only.
• The category field is sometimes used for search, but in general it backs a drop-down menu or a list on the website, so we can also map it as a keyword-only field.
• We will never use the content field to sort or aggregate, only to search, so we can change its mapping to text only.
Now delete and recreate the tmp_blogs index:

DELETE tmp_blogs

PUT tmp_blogs
{
  "mappings": {
    "properties": {
      "publish_date": {
        "type": "date"
      },
      "author": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "category": {
        "type": "keyword"
      },
      "content": {
        "type": "text"
      },
      "locales": {
        "type": "keyword"
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "url": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Figure 48: Mapping (1)
Figure 49: Mapping (2)

Note the “fields” property: a field in Elasticsearch can be mapped as several types, not just one.
Result is

Figure 50: Result after creating index with mappings

Re-index a record:

Figure 51: Re-index

Figure 52: output

Search for engineering in the category field


Figure 53: Search engineering category

The result is empty because type "keyword" is used, which means the query must match the stored value exactly. If you change the query to use the keyword value as below:

Figure 54: Query with keyword type

Result is
Figure 55: Result
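The query in Figure 54 likely resembles the following; since category is a keyword field, the value must match the stored string exactly (the value here is hypothetical):

GET tmp_blogs/_search
{
  "query": {
    "match": {
      "category": "Engineering"
    }
  }
}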

Text analysis: analyze text by calling the _analyze API.

Figure 56: Default analyzer of Elasticsearch: standard

By default, Elasticsearch uses the standard analyzer to analyze text for searching. In this sentence, the text is broken down into smaller tokens [Introducing, beta, releases, Elasticsearch, ‘Kibana’, Docker, images] → the tokenizer step is complete.

Next, the standard analyzer applies the lowercase token filter, so all tokens are transformed to lowercase.
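A sketch of the _analyze call (the sample sentence is inferred from the token list above):

GET _analyze
{
  "analyzer": "standard",
  "text": "Introducing beta releases: Elasticsearch and Kibana Docker images"
}

Replacing "standard" with "whitespace" reproduces the whitespace-analyzer behaviour shown in Figures 58–59.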

See more at: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
Figure 57: Elasticsearch token

If we use a different analyzer, whitespace:

Figure 58: Whitespace analyzer


Figure 59: Whitespace analyzer

With the whitespace analyzer, punctuation is not removed and text is not lowercased; note that the colon (:) is kept attached to its token.

See more analyzers here:


https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html

Standard (default): splits text into terms, removes punctuation, lowercases, and supports stop-word removal – disabled by default but can be enabled.
Eg: The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.
→ [the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone]

Simple: divides text into terms whenever it encounters a character that is not a letter, and lowercases.
Eg: The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.
→ [the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone]

Whitespace: divides text into terms whenever it encounters a whitespace character.

Stop: same as simple, but removes stop words (a/an/the, …).
Eg: The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.
→ [quick, brown, foxes, jumped, over, lazy, dog, s, bone]

Keyword: outputs the entire input as a single term.
Eg: The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.
→ [The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.]

Pattern: splits text on a regular expression, which defaults to \W+.

Language: language-specific analyzers (e.g. english, french), whose criteria you configure yourself.

Lab 7: Improving search results


Use a multi_match query if you want to search multiple fields.

Figure 60: Using multi_match query


Here, the multi_match query searches both the product category and product_name for “shoes”. After running this command:
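A sketch of the multi_match query (assuming the category and products.product_name fields of the sample data):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "multi_match": {
      "query": "shoes",
      "fields": ["products.product_name", "category"]
    }
  }
}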

Figure 61: Result contain shoes in product name or category

• Boosting a field with “boost” makes it count more toward the relevance score. The default boost of a field is 1.0.

Figure 62: Giving boost to field category of product

The score contribution of the boosted field is multiplied by 2.
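A sketch of the boosted query from Figure 62, using the caret syntax to boost category by 2:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "multi_match": {
      "query": "shoes",
      "fields": ["products.product_name", "category^2"]
    }
  }
}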


• Use the _source field to control which fields are returned; this is similar to SELECT <field> in SQL.
Figure 63: Using _source to specify which field to take

Figure 64: Output

• Sorting: use "sort": [], similar to ORDER BY in SQL.


Figure 65: Using sort to sort field
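A sketch combining _source and sort (field names from the sample data; the fields in Figures 63–65 may differ):

GET kibana_sample_data_ecommerce/_search
{
  "_source": ["customer_full_name", "taxful_total_price", "order_date"],
  "sort": [
    { "order_date": { "order": "desc" } }
  ]
}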

Exercise:
1. Modify your previous query so that the results are sorted first by email ascending, then
created_on from newest to oldest.
2. Our web application only shows three blog hits at a time. Modify the previous query so
that it only returns the top 3 hits.
3. Suppose a user clicks on page 4 of the search results from your previous query. Write a
query that returns the 3 hits of page 4.
Hint: use from along with size; from is similar to OFFSET in SQL.

Lab 8: Fixing data, introducing “scripts” (Optional)


Another way to update, similar to Lab 3 above, is a script update. Script updates are used to apply more complex operations to a field, for example incrementing the field's existing value by 1.

First, we create a new index for testing, so we don't harm the original index we have already created.
Figure 66: Reindex data

Use """ (triple quotes) for multiline scripts.

Here we create a new index, kibana_sample_data_ecommerce_fixed, based on the existing data. After running this command, two new fields exist in the _source: number_of_views and reindexBatch.
Figure 67: Number of views and reindexBatch fields are created

Behind the scenes, the reindex script looks at each document's _source and adds the two fields reindexBatch and number_of_views, with initial values of 1 and 0 respectively.
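The reindex request (Figure 66) likely resembles:

POST _reindex
{
  "source": { "index": "kibana_sample_data_ecommerce" },
  "dest": { "index": "kibana_sample_data_ecommerce_fixed" },
  "script": {
    "lang": "painless",
    "source": """
      ctx._source.reindexBatch = 1;
      ctx._source.number_of_views = 0;
    """
  }
}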

2. Using the _update API, increase the existing number_of_views by 41 for the document with _id = zw6hiXYBJeOyv4l3qtHH. The result will show 41 in the number_of_views field.

Figure 68: Update number_of_views to 41
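A sketch of the scripted update (the initial value is 0, so adding 41 yields 41):

POST kibana_sample_data_ecommerce_fixed/_update/zw6hiXYBJeOyv4l3qtHH
{
  "script": {
    "source": "ctx._source.number_of_views += 41"
  }
}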

After updating, check whether number_of_views is now 41 by running this command:
Figure 69: Search by id and get number_of_views field

Figure 70: Result
