
CRUD Operations and Search Queries in Elasticsearch (ES)

(*) The idea for this tutorial comes from http://www.ponybai.com/elastic/instructions/labs.html#lab3


(**) Reference on why types were removed in ES 7+:
https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html#_index_per_document_type

For Windows installation, refer to this guideline to install ES and Kibana:


https://www.elastic.co/start

1. Dev tools
Go to the Elasticsearch status page by typing localhost:9200 in the browser.

Figure 1: Elastic booted up

Then go to localhost:5601/.

Open the Dev Tools console in Kibana by typing Dev tools in the search bar.
Figure 2: Open Dev tools by typing Dev tools in Search Elastic

The first time you run Elasticsearch, only the indexes that ship with ES exist; no indexes from outside have been added yet. Indexes in ES are similar to tables in a relational database.

3. Try some commands


Try a command by typing _search in the console to see what is displayed.

Figure 3: Typing _search in elastic

It returns results from the indexes in ES. An index in ES is similar to a table in MySQL.


Figure 4: Result
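The request in Figure 3 appears only as a screenshot; in the Dev Tools console it is simply:

GET _search

This searches across every index and returns the first matching documents.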

Below the previous _search request, enter the following request to get basic cluster and node
information.

Figure 5: Enter / to get node information


Figure 6: Node information

To get the list of indexes in the system, run:

Figure 7: Get list of indices in the system

Figure 8: List of indices
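Figures 7–8 are screenshots; a request that lists the indices (the v parameter adds column headers) is:

GET _cat/indices?v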

Imagine a dataset that is row oriented (e.g. a spreadsheet or a traditional database). How would you write a JSON document based on sample entries like the following? Think about field structure and empty fields:
Figure 9: Item for the first row

Figure 10: Item for the second row

Now that we have defined the documents, let's index them. Notice that the id field defined inside the documents is just a normal field like any other; the actual document id is defined in the request URL when you index the documents. Index both JSON documents into the my_blogs index, using _doc for the type and their respective ids. Use the PUT method to add a document with an explicit id, in contrast to POST.
Figure 11: Add index

Figure 12: Result
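The documents in Figures 9–11 appear only as screenshots, so they are not reproduced here. A sketch of indexing a document with an explicit id, using field names from the blog mapping in Lab 4 and hypothetical values:

PUT my_blogs/_doc/1
{
  "id": 1,
  "title": "A sample blog post",
  "author": "Jane Doe",
  "publish_date": "2021-01-01"
}

The response reports "result": "created" and the _id taken from the URL.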

The index operation can also be executed without specifying an id. In that case, use POST instead of PUT and an id will be generated automatically.
Index the following document without an id and check the _id in the response. Make sure you use POST.
Figure 13: Using POST to create document without id
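A sketch with POST and no id in the URL (field values again hypothetical):

POST my_blogs/_doc
{
  "title": "Another sample post",
  "author": "Jane Doe"
}

The _id in the response is auto-generated.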

Use a GET command to retrieve the document with id 1 and type _doc from the my_blogs index.

Figure 14: Output of GET specific document

Delete the document with id = 2 in Elasticsearch and re-verify the documents.


Figure 15: Delete in ElasticSearch
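The delete request in Figure 15 uses the same URL scheme as the GET:

DELETE my_blogs/_doc/2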

Re-verify the index

Figure 16: Reverify index

Figure 17: Result

Finally, delete the my_blogs index

Figure 18: Delete index
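Deleting the whole index:

DELETE my_blogs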

Get information about the mappings and data of the my_blogs index.

Figure 19: Re-get index

If found
Figure 20: If mappings found (before deletion)

Figure 21: Mappings not found


Lab 3: Query data
Objective: In this lab, you will write various queries that search documents in the ecommerce
index using Search API. You will use queries like match, range, and bool. 

Navigate to home page, and add sample data

Figure 22: Sample elasticsearch data

Choose the “Try sample data” tab, then choose “Sample eCommerce orders”. This lets you load ecommerce data to experiment with.
Figure 23: Sample Ecommerce order Elasticsearch

Figure 24: Get indexes inside Elasticsearch

Figure 25: Kibana_sample_data_ecommerce index found

Now we come to the main part: querying.

In Elasticsearch, there are 2 types of searching:

• Term-level query: used to search for a precise value such as a price, product_id, or username. Use this to search keyword fields or for exact values.
• Full-text query: used to search text, descriptions, etc., with options for fuzzy matching. It returns documents that match a provided text, number, date, or boolean value; the provided text is analyzed before matching.

Term level query example:

Figure 26: Term level query

Result of term

Figure 27: Searching by term-level query result
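The query in Figure 26 is a screenshot; a sketch of a term-level query, assuming the currency keyword field from the sample data (the field in the figure may differ):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "term": {
      "currency": "EUR"
    }
  }
}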


Full-text query example
1. Write a command that matches all documents, giving them all a _score of 1.0.

Figure 28: Get all documents
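The match_all query described in this step:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  }
}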

2. Add the "size" parameter to your previous request and set it to 100. You should now see 100 documents from kibana_sample_data_ecommerce in the results. This is similar to LIMIT in SQL.

_search returns the matching documents themselves; by default, Elasticsearch returns only 10 results.

Figure 29: Specify size to get data

With "size": 100, Elasticsearch will return 100 documents.
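The same query with size:

GET kibana_sample_data_ecommerce/_search
{
  "size": 100,
  "query": {
    "match_all": {}
  }
}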

3. Range in Elasticsearch
A range can be specified by using the "range" key with "gte" (>=) and "lte" (<=). To specify strict inequality (< and >), use "lt" and "gt".
Figure 30: Using range in Elasticsearch
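A sketch of a range query, assuming the taxful_total_price field from the sample data (the field and bounds in Figure 30 may differ):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "range": {
      "taxful_total_price": {
        "gte": 100,
        "lte": 200
      }
    }
  }
}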

4. Write and execute a match query for ecommerce documents that have the term "Eddie" in the "customer_full_name" field.

Full-text search includes simple text-level queries (a "query" with "match") and advanced boolean logic (a "query" with "bool").

Figure 31: Search contain in elastic search


In the query above, any document whose customer_full_name field contains “Eddie” is returned.

Figure 32: A result contain Eddie in customer_full_name field
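The match query for this step:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_full_name": "Eddie"
    }
  }
}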

If you want an exact match with both words present in a single field, use the operator "and" inside the match "query". With "and", the field must contain both searched values. If "operator" is not specified, the default is "or" (see below).
Figure 33: Search "Eddie" and "Underwood" in customer_full_name, both must exists in customer_full_name field

Figure 34: Result will contain both Eddie and Underwood
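The query with the "and" operator (per Figure 33):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_full_name": {
        "query": "Eddie Underwood",
        "operator": "and"
      }
    }
  }
}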

Without “and”, Elasticsearch will search customer_full_name for Eddie or Underwood.

With “and”, ES will search customer_full_name for documents containing both Eddie and Underwood, including “Eddie Underwood” and “Underwood Eddie” (order does not matter). To search with word order preserved, see match_phrase below.

Exercise: Write queries for the following cases


• Searched items that have the word "high" in their "product_name" field.
• Searched items that have "high" or "heeled" in their "product_name" field.
• Searched items that have "high" and "heeled" in their "product_name" field.

Write your answer here

With match_phrase, the search needs to match the following criteria:

• all the terms must appear in the field
• they must be in the same order as the input value
For example, if you index the following documents (using standard analyzer for the field foo):

Figure 35: Documents to be indexed

match_phrase will return the first and second documents

Figure 36: Match phrase

Search match_phrase example


Figure 37: Match_phrase

Figure 38: Match phrase
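A match_phrase sketch on the same field used earlier (the figures use a different example field, foo):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_phrase": {
      "customer_full_name": "Eddie Underwood"
    }
  }
}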

minimum_should_match: requires at least X of the terms to match.

Figure 39: Minimum should match


minimum_should_match here returns documents containing at least 2 of the terms short, coat, and white/black, for example "short coat" or "short white-black".
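A sketch of minimum_should_match, using the terms from the explanation above:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "products.product_name": {
        "query": "short coat white/black",
        "minimum_should_match": 2
      }
    }
  }
}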

Boolean query with must / must_not: results must (or must not) contain the given terms. For example:

Figure 40:Must and must not

Explanation: the result must match at least 2 of the terms in products.product_name; however, returned results must not contain oil or green.

Figure 41: Must/must not


Exercise: Write a query on products.product_name that must not contain navi, but must match at least 2 of the terms in "Short coat - white/black".
Write your answer here

should: If these clauses match, they increase the _score; otherwise, they have no effect

Figure 42: Using should increase score

When using should, documents whose product_name contains dark get a higher _score. A higher score means the document appears nearer the top of the results.
Figure 43: Score increased by using should
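A sketch of should as described above (boosting documents whose product_name contains "dark"):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "products.product_name": "coat" } }
      ],
      "should": [
        { "match": { "products.product_name": "dark" } }
      ]
    }
  }
}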
Lab 4: Read index mappings and define your own mapping

At this point we stop using the existing ecommerce dataset and instead create a new index for the blog type, using the following data:

Figure 44: Sample dataset

Now we will create a new index called tmp_blogs with type = _doc and id = 1.

Figure 45: _doc with id = 1

To view the mapping of the index, use _mapping:

Figure 46: _mapping
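The request itself is simply:

GET tmp_blogs/_mapping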


Figure 47: Fragment of mapping

In the mapping, fields can be typed as both text and keyword. A text field is broken down into words; a keyword field is kept as-is. For example:
• “Roosters crow everyday” → “roosters”, “crow”, “everyday” (text)
• “Roosters crow everyday” → “Roosters crow everyday” (keyword)

Let's analyze every field mapped:


• The publish_date field is correctly mapped as date.
• The author field is a string mapped as both text and keyword, which is great: we can search authors as free text or by the whole string, and also sort and aggregate on them (discussed later). Similarly, the title and url fields are mapped correctly, as we may want to search and sort on those fields.
• The locales array is an array of fixed strings and there is no need to full-text search it, so we can update it to keyword only.
• The category field is sometimes used for search, but in general it backs a drop-down menu or a list on the website, so we can also map it as a keyword-only field.
• We will never use the content field to sort or aggregate, only to search, so we can change its mapping to text only.
Now delete and recreate the tmp_blogs index:

DELETE tmp_blogs

PUT tmp_blogs
{
  "mappings": {
    "properties": {
      "publish_date": {
        "type": "date"
      },
      "author": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "category": {
        "type": "keyword"
      },
      "content": {
        "type": "text"
      },
      "locales": {
        "type": "keyword"
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "url": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Figure 48: Mapping (1)
Figure 49: Mapping (2)

Note the “fields” property: a field in Elasticsearch can be mapped as several types, not just one.
Result is

Figure 50: Result after creating index with mappings

Re-index a record:

Figure 51: Re-index

Figure 52: output

Search for engineering in the category field


Figure 53: Search engineering category

The result is empty because type "keyword" is used, which means the query must match the stored value exactly. If you change the query to use the keyword value as below:

Figure 54: Query with keyword type

Result is
Figure 55: Result
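The query in Figure 54 likely resembles the following; since category is a keyword field, the value must match the stored string exactly (the value here is hypothetical):

GET tmp_blogs/_search
{
  "query": {
    "match": {
      "category": "Engineering"
    }
  }
}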

Text analysis: analyze text by calling the _analyze API.

Figure 56: Default analyzer of Elasticsearch: standard

By default, Elasticsearch uses the standard analyzer to analyze text for searching. In this sentence, the text is broken down into smaller tokens [Introducing, beta, releases, Elasticsearch, ‘Kibana’, Docker, images] → the tokenizer step is complete.

Next, the standard analyzer applies the lowercase token filter, so all tokens are transformed to lowercase.
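A sketch of the _analyze call (the sample sentence is inferred from the token list above):

GET _analyze
{
  "analyzer": "standard",
  "text": "Introducing beta releases: Elasticsearch and Kibana Docker images"
}

Replacing "standard" with "whitespace" reproduces the whitespace-analyzer behaviour shown in Figures 58–59.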

See more at: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
Figure 57: Elasticsearch token

If we use a different analyzer, whitespace:

Figure 58: Whitespace analyzer


Figure 59: Whitespace analyzer

With the whitespace analyzer, punctuation is not removed and text is not lowercased; note that the colon (:) is kept attached to its token.

See more analyzers here:


https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html

Standard (default): splits text into terms, removes punctuation, lowercases, and supports stop-word removal – disabled by default but can be enabled.
Eg: The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.
→ [the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone]

Simple: divides text into terms whenever it encounters a character that is not a letter, and lowercases.
Eg: The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.
→ [the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone]

Whitespace: divides text into terms whenever it encounters a whitespace character.

Stop: same as simple, but removes stop words (a/an/the, …).
Eg: The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.
→ [quick, brown, foxes, jumped, over, lazy, dog, s, bone]

Keyword: outputs the entire input as a single term.
Eg: The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.
→ [The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.]

Pattern: splits text on a regular expression, which defaults to \W+.

Language: language-specific analyzers (e.g. english, french), whose criteria you configure yourself.

Lab 7: Improving search results


Use a multi_match query if you want to search multiple fields.

Figure 60: Using multi_match query


Here, the multi_match query searches both the product category and product_name for “shoes”. After running this command:
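A sketch of the multi_match query (assuming the category and products.product_name fields of the sample data):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "multi_match": {
      "query": "shoes",
      "fields": ["products.product_name", "category"]
    }
  }
}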

Figure 61: Result contain shoes in product name or category

• Boosting a field with “boost” makes it count more toward the relevance score. The default boost of a field is 1.0.

Figure 62: Giving boost to field category of product

The score contribution of the boosted field is multiplied by 2.
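A sketch of the boosted query from Figure 62, using the caret syntax to boost category by 2:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "multi_match": {
      "query": "shoes",
      "fields": ["products.product_name", "category^2"]
    }
  }
}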


• Use the _source field to control which fields are returned; this is similar to SELECT <field> in SQL.
Figure 63: Using _source to specify which field to take

Figure 64: Output

• Sorting: use "sort": [], similar to ORDER BY in SQL.


Figure 65: Using sort to sort field
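A sketch combining _source and sort (field names from the sample data; the fields in Figures 63–65 may differ):

GET kibana_sample_data_ecommerce/_search
{
  "_source": ["customer_full_name", "taxful_total_price", "order_date"],
  "sort": [
    { "order_date": { "order": "desc" } }
  ]
}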

Exercise:
1. Modify your previous query so that the results are sorted first by email ascending, then
created_on from newest to oldest.
2. Our web application only shows three blog hits at a time. Modify the previous query so
that it only returns the top 3 hits.
3. Suppose a user clicks on page 4 of the search results from your previous query. Write a
query that returns the 3 hits of page 4.
Hint: use from along with size; from is similar to OFFSET in SQL.

Lab 8: Fixing data, introducing “scripts” (Optional)


Another way to update, similar to Lab 3 above, is a script update. Script updates are used to apply more complex operations to a field, for example incrementing the field's existing value by 1.

First, we create a new index for testing, so we don't harm the original index we have already created.
Figure 66: Reindex data

Use """ (triple quotes) for multiline scripts.

Here we create a new index, kibana_sample_data_ecommerce_fixed, based on the existing data. After running this command, two new fields exist in the _source: number_of_views and reindexBatch.
Figure 67: Number of views and reindexBatch fields are created

Behind the scenes, the reindex script looks at each document's _source and adds the two fields reindexBatch and number_of_views, with initial values of 1 and 0 respectively.
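The reindex request (Figure 66) likely resembles:

POST _reindex
{
  "source": { "index": "kibana_sample_data_ecommerce" },
  "dest": { "index": "kibana_sample_data_ecommerce_fixed" },
  "script": {
    "lang": "painless",
    "source": """
      ctx._source.reindexBatch = 1;
      ctx._source.number_of_views = 0;
    """
  }
}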

2. Using the _update API, increase the existing number_of_views by 41 for the document with _id = zw6hiXYBJeOyv4l3qtHH. The result will show 41 in the number_of_views field.

Figure 68: Update number_of_views to 41
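A sketch of the scripted update (the initial value is 0, so adding 41 yields 41):

POST kibana_sample_data_ecommerce_fixed/_update/zw6hiXYBJeOyv4l3qtHH
{
  "script": {
    "source": "ctx._source.number_of_views += 41"
  }
}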

After updating, check whether number_of_views is now 41 by running this command:
Figure 69: Search by id and get number_of_views field

Figure 70: Result
