1. Dev tools
Go to the Elasticsearch status page by typing localhost:9200 in the browser.
Then go to localhost:5601.
Open the Dev Tools console in Kibana by typing "Dev tools" in the search bar.
Figure 2: Open Dev tools by typing Dev tools in Search Elastic
The first time you run Elasticsearch, only the built-in system indexes exist; no user-created
indexes are present yet. Indexes in Elasticsearch are similar to tables in a relational database.
Below the previous _search request, enter the following request to get basic cluster and node
information.
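In the Dev Tools console, basic cluster and node information comes from the root endpoint:

GET /

The response includes the cluster name, the node name, and the Elasticsearch version.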
Imagine a dataset that is row oriented (e.g. a spreadsheet or a traditional database). How
would you write a JSON document based on sample entries that look like the following? Think
about field structure and empty fields:
Figure 9: Item for the first row
Now that we have defined the documents, let's index them. Notice that the id field defined
inside the documents is just a normal field like any other; the actual document id is defined in
the request URL when you index the documents. Index both JSON documents into
the my_blogs index. Use _doc for the type and their respective ids. Use the PUT method to add
a document with an explicit id, in contrast to POST.
Figure 11: Add index
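The indexing request might look like the following sketch (the document id goes in the URL; the field values here are illustrative, not taken from the dataset):

PUT my_blogs/_doc/1
{
  "id": 1,
  "title": "Better query execution",
  "author": "Adrien Grand",
  "category": "Engineering"
}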
The index operation can also be executed without specifying the id. In such a case, you should
use a POST instead of a PUT and an id will be generated automatically.
Index the following document without an id and check the _id in the response. Make sure you
use POST.
Figure 13: Using POST to create document without id
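A sketch of such a request (field values are illustrative):

POST my_blogs/_doc
{
  "title": "A blog without an explicit id",
  "author": "Jane Doe"
}

The response contains an auto-generated _id.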
Re-verify the index by requesting its mappings.
If the mappings are found:
Figure 20: If mappings found (before deletion)
Choose the "Try sample data" tab and then choose "Sample eCommerce orders". This allows the user
to add ecommerce data for experimentation.
Figure 23: Sample Ecommerce order Elasticsearch
Result of the term query:
2. Add the "size" parameter to your previous request and set it to 100. You should now
see 100 documents from "kibana_sample_data_ecommerce" in the results. This is similar
to "LIMIT" in SQL.
The _search endpoint returns the documents stored in an index. By default, Elasticsearch returns only 10 results.
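A sketch of the request with the size parameter added:

GET kibana_sample_data_ecommerce/_search
{
  "size": 100
}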
3. Range in Elasticsearch
A range can be specified by using the "range" key with "gte" (>=) and "lte" (<=). To specify the
strict inequalities < and >, use "lt" and "gt".
Figure 30: Using range in Elasticsearch
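A sketch of a range query (taxful_total_price is a field in the sample ecommerce index; the bounds are illustrative):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "range": {
      "taxful_total_price": {
        "gte": 50,
        "lte": 100
      }
    }
  }
}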
4. Write and execute a match query for ecommerce that have the term "Eddie" in
the "customer_full_name" field.
Full-text search ranges from a simple text-level query, represented by "query" – "match", to
advanced boolean logic, represented by "query" – "bool".
If the user wants an exact match in which both words exist in a single field, use the operator
"and" inside the query. "and" forces the field to contain both searched values. If "operator" is
not specified, the default is "or" (see below).
Figure 33: Search "Eddie" and "Underwood" in customer_full_name, both must exists in customer_full_name field
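A sketch matching the figure's description, requiring both terms in customer_full_name:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_full_name": {
        "query": "Eddie Underwood",
        "operator": "and"
      }
    }
  }
}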
Boolean query: must / must_not: the result must (or must not) contain the given terms. For example:
Explanation: must and must_not: the result must match documents whose products.product_name
contains at least the two given terms; however, the returned result must not contain "oil" or "green".
should: if these clauses match, they increase the _score; otherwise, they have no effect.
When using should, product names that contain "dark" get a higher _score. A higher score means
the document appears nearer the top of the results.
Figure 43: Score increased by using should
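A sketch combining these clauses (the product terms here are illustrative, chosen to mirror the description above):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "products.product_name": "trousers" } }
      ],
      "must_not": [
        { "match": { "products.product_name": "oil green" } }
      ],
      "should": [
        { "match": { "products.product_name": "dark" } }
      ]
    }
  }
}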
Lab 4: Read index mappings and define own mapping
At this point, we stop using the existing ecommerce dataset; we will create a new index
for the blog type, using the following data.
Now we will create a new index called tmp_blogs with type _doc and id 1.
A field mapped as type "text" is analyzed: it is broken down into words. A field mapped as
type "keyword" is kept as-is. For example:
"Roosters crow everyday" => [roosters, crow, everyday] (text)
"Roosters crow everyday" => "Roosters crow everyday" (keyword)
DELETE tmp_blogs
PUT tmp_blogs
{
"mappings": {
"properties": {
"publish_date": {
"type": "date"
},
"author": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
    "category": {
      "type": "keyword"
    }
    }
  }
}
Note that by using "fields", a field in Elasticsearch can be mapped into several types, not just
one.
The result is:
The result is empty because type "keyword" requires an exact match, exactly as typed. If
you change the category query value as below:
The result is:
Figure 55: Result
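A sketch of exact matching on the keyword field (the category value is an assumption; a term query only matches when the stored value is identical, including case):

GET tmp_blogs/_search
{
  "query": {
    "term": {
      "category": "Engineering"
    }
  }
}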
By default, Elasticsearch uses the standard analyzer to analyze text for searching. In this
sentence, the text is broken down into smaller tokens: [Introducing, beta, releases,
Elasticsearch, Kibana, Docker, images]. This completes the tokenizer step.
Next, the standard analyzer applies the lowercase token filter to the searched string, so all
tokens are converted to lowercase.
In the whitespace analyzer, punctuation is not removed and text is not lowercased; this is why
the colon (:) remains in the output.
Analyzer – Meaning
Standard (default): Cuts text into terms, removes punctuation, lowercases, and supports
stop-word removal (disabled by default but can be turned on: a, of, many, ...).
E.g. "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
=> [2, quick, brown, foxes, jumped, over, lazy, dog's, bone] (with stop-word removal enabled)
Simple: Divides text into terms whenever it encounters a character that is not a letter, and
lowercases the text.
E.g. "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
=> [the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone]
Whitespace: Divides text into terms whenever it encounters whitespace.
Stop: Same as "simple", but removes stop words (a/an/the, ...).
E.g. "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
=> [quick, brown, foxes, jumped, over, lazy, dog, s, bone]
Keyword: Returns the entire input as a single term.
E.g. "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
=> [The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.]
Pattern: Splits by a regex pattern, defaulting to \W+.
Language: Language-specific analysis; the criteria are configured by you.
…
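The analyzers above can be tried directly with the _analyze API, using the example sentence:

GET _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The response lists each emitted token along with its position and offsets.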
A field can be boosted by giving it a "boost" value, making it count more towards the relevance
score. The default boost of a field is 1.0.
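One way to sketch this is multi_match field boosting, where the ^2 suffix doubles the weight of customer_full_name (fields are from the sample ecommerce index):

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "multi_match": {
      "query": "Eddie",
      "fields": ["customer_full_name^2", "customer_first_name"]
    }
  }
}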
Exercise:
1. Modify your previous query so that the results are sorted first by email ascending, then
created_on from newest to oldest.
2. Our web application only shows three blog hits at a time. Modify the previous query so
that it only returns the top 3 hits.
3. Suppose a user clicks on page 4 of the search results from your previous query. Write a
query that returns the 3 hits of page 4.
Hint: use "from" along with "size". "from" is similar to OFFSET in SQL.
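For exercise 3, page 4 with 3 hits per page starts at offset (4 - 1) × 3 = 9. A sketch against the blogs index (the index name is an assumption; sorting from the previous exercise is omitted here):

GET my_blogs/_search
{
  "from": 9,
  "size": 3
}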
First, we will create a new index for testing, which does not harm the original index that we have
already created.
Figure 66: Reindex data
Here we create a new index, named kibana_sample_data_ecommerce_fixed, based on the existing
data. After running this command, two new fields appear in the _source: number_of_views
and reindexBatch.
Figure 67: Number of views and reindexBatch fields are created
Behind the scenes, the reindex query looks at the _source field of each document and adds the
two fields reindexBatch and number_of_views, with initial values set to 1 and 0 respectively.
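A sketch of such a reindex request with a Painless script (the destination name follows the text; the script body is an assumption consistent with the initial values described):

POST _reindex
{
  "source": { "index": "kibana_sample_data_ecommerce" },
  "dest": { "index": "kibana_sample_data_ecommerce_fixed" },
  "script": {
    "lang": "painless",
    "source": "ctx._source.reindexBatch = 1; ctx._source.number_of_views = 0;"
  }
}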