ElasticSearch Learning Notes

#1 ElasticSearch
2018.4.17
#2 Slide/Conf
#3 Elasticsearch for beginners
link
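Created in 2010 by Shay Banon, written in Java
Easy to Scale
RESTful API: can be used from almost any programming language
Per-operation Persistence: low chance of data loss
Excellent Query DSL: the API speaks JSON, and the query DSL is well designed
Multi-tenancy (don't understand this yet)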
Support for advanced search features (Full Text)
..
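Basic concepts
Cluster
Node: it used to be best practice to run one node per server, but that no longer seems to be the case
Index: similar to a database
Type: similar to a table
Document: a JSON document, similar to a row in a table. Every document is a JSON object.
Each document is stored in an index and has a type, an id, and key-value fields.
Field: similar to a column
Mapping: similar to a schema definition
Shard (don't understand this yet)
Primary Shard (don't understand this yet)
Replica Shard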
ElasticSearch Routing
By default, documents are spread across the cluster's shards effectively at random (by a hash of each
document's id), so for a search Elasticsearch has no idea where to look for your documents. It has no
choice but to broadcast the request to all shards.
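Which shard a document lands on follows roughly `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document's `_id`. A minimal Python sketch of that formula, assuming `zlib.crc32` as a stand-in for the murmur3 hash Elasticsearch actually uses (so the shard numbers are illustrative only); `pick_shard` and the `user-7` routing value are hypothetical:

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard: hash(routing) % number_of_primary_shards.

    Assumption: zlib.crc32 stands in for the murmur3 hash Elasticsearch really
    uses, so these numbers will NOT match what a real cluster picks.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Default routing uses the document id: placement is deterministic,
# but looks random from the outside.
placement = {doc_id: pick_shard(doc_id, 5) for doc_id in ["1", "2", "3", "42"]}
print(placement)

# A custom routing value (a hypothetical user id) sends all of that user's
# documents to one shard, so a search routed the same way asks one shard only.
print({doc_id: pick_shard("user-7", 5) for doc_id in ["a", "b", "c"]})
```

The same formula is why the primary shard count is essentially fixed once an index is created: changing it would map existing ids to different shards. Passing the same `routing` parameter when indexing and when searching lets Elasticsearch skip the broadcast.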
Searching
#4 Searching
Searching and querying takes the format of:

http://localhost:9200/[index]/[type]/[operation]
Search across all indexes and all types
http://localhost:9200/_search
Search all types in the test-data index
http://localhost:9200/test-data/_search
Search explicitly for documents of type cities within the test-data index.
http://localhost:9200/test-data/cities/_search
Search explicitly for documents of type cities within the test-data index using paging
http://localhost:9200/test-data/cities/_search?size=5&from=10
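The URL pattern above can also be assembled in code. A small Python sketch using only the standard library; `search_url` and `BASE_URL` are hypothetical helpers, a local node on the default port is assumed, and `from` is passed through a dict because it is a Python keyword:

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"  # assumption: local node, default port


def search_url(index=None, doc_type=None, **params):
    """Build /[index]/[type]/_search, leaving out the parts that are not given."""
    parts = [p for p in (index, doc_type) if p]
    url = f"{BASE_URL}/" + "/".join(parts + ["_search"])
    if params:
        url += "?" + urlencode(params)  # keeps the order the params were given in
    return url


print(search_url())                   # all indexes, all types
print(search_url("test-data"))        # all types in test-data
print(search_url("test-data", "cities", size=5, **{"from": 10}))  # paging
```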
Ranking
Weight: give some data a somewhat higher weight; good weights are hard to decide.
#3 Using Elasticsearch with rails
link
Shard - a single Lucene instance. It is a low-level "worker" unit which is managed
automatically by Elasticsearch. An index is a logical namespace which points to
primary and replica shards.
Primary Shard - Each document is stored in a single primary shard. When you index a
document, it is indexed first on the primary shard, then on all replicas of the primary
shard. By default, an index has 5 primary shards (Elasticsearch 7.0 changed the default to 1).
You can specify fewer or more primary shards to scale the number of documents that your
index can handle.
Replica Shard - Each primary can have zero or more replicas. A replica is a copy of the
primary shard, and has two purposes:
1. increase failover: a replica can be promoted to primary if the primary fails
2. increase performance: get and search requests can be served by primary or replica shards
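Shard and replica counts are set per index through the `number_of_shards` and `number_of_replicas` settings when the index is created. A Python sketch that only builds the request with `urllib.request.Request` (nothing is sent); the helper names and the 3/1 values are illustrative assumptions:

```python
import json
from urllib.request import Request


def create_index_request(index, primaries, replicas, base_url="http://localhost:9200"):
    """Build (but do not send) a PUT /<index> request with shard settings."""
    body = {
        "settings": {
            "number_of_shards": primaries,   # static: fixed when the index is created
            "number_of_replicas": replicas,  # dynamic: can be changed later
        }
    }
    return Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def total_shards(primaries, replicas):
    # every primary plus `replicas` copies of it
    return primaries * (1 + replicas)


req = create_index_request("customer", primaries=3, replicas=1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
print(total_shards(3, 1))
```

`urllib.request.urlopen(req)` would send it to a running node. A replica is never allocated on the same node as its primary, so on a single-node cluster the replicas stay unassigned and cluster health reports yellow.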
#3 Ruby Conf 2013 Elasticsearch With Ruby
link
#3 Using Elasticsearch with Rails Apps by Brian Gugliemetti
link
#2 Blog
#3 Asynchronous Elasticsearch bulk reindexing with Rails, Searchkick and Sidekiq
link
An approach to asynchronous reindexing
#3 Configuring Elasticsearch On Rails
link
#2 Video
#3 65 Searchkick and Elasticsearch
link
#2 Wiki
#3 Install
link
You can set the cluster and node names when starting Elasticsearch:
./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name
Get health status: GET /_cat/health?v
curl -X GET "http://localhost:9200/_cat/health?v"

List all nodes: GET /_cat/nodes?v
curl -X GET "http://localhost:9200/_cat/nodes?v"
List all indices: GET /_cat/indices?v
curl -X GET "http://localhost:9200/_cat/indices?v"
Create an index: PUT /customer?pretty
curl -X PUT "http://localhost:9200/customer?pretty"
Put something into our customer index: PUT /customer/_doc/1?pretty
curl -X PUT "http://localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'{ "name": "John Doe"}'
You don't have to create an index manually before indexing into it; Elasticsearch creates it automatically.
Retrieve the document we just indexed: GET /customer/_doc/1?pretty
curl -X GET "http://localhost:9200/customer/_doc/1?pretty"
Delete an index: DELETE /customer?pretty
curl -X DELETE "http://localhost:9200/customer?pretty"
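The same calls can be issued from code. A Python sketch that mirrors the curl commands above as `urllib.request.Request` objects without sending them; `es_request` and `BASE_URL` are hypothetical names, and a local node on port 9200 is assumed:

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: local node, default port


def es_request(method, path, body=None):
    """Build (but do not send) a request; a JSON body gets the JSON content type."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


calls = [
    es_request("GET", "/_cat/health?v"),                                  # cluster health
    es_request("PUT", "/customer?pretty"),                                # create index
    es_request("PUT", "/customer/_doc/1?pretty", {"name": "John Doe"}),   # index doc 1
    es_request("GET", "/customer/_doc/1?pretty"),                         # get doc 1
    es_request("DELETE", "/customer?pretty"),                             # delete index
]
for req in calls:
    print(req.get_method(), req.full_url)
```

`urllib.request.urlopen(req)` sends a request; it raises `urllib.error.HTTPError` for 4xx/5xx answers, for example a 404 when getting a document that does not exist.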
The pattern we use with Elasticsearch:
<REST Verb> /<Index>/<Type>/<ID>
Create a document with an explicit document id:
curl -X PUT "http://localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d '{ "name": "John Doe" }'
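If you index again under the same id, the id stays the same, the version goes up by 1, and the content
is changed (note that this is a replace, not an update).
Using a new id creates a new document under that id.
If you don't specify a document id, Elasticsearch generates a long random id for you, and you must use POST
instead of PUT:
curl -X POST "http://localhost:9200/customer/_doc?pretty" -H 'Content-Type: application/json' -d '{ "name": "John Doe" }'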
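The replace-versus-create behaviour described above can be modelled with a tiny in-memory class. This is a toy for intuition only, not how Elasticsearch stores documents; `ToyIndex` is hypothetical and `uuid4` stands in for Elasticsearch's own id generator:

```python
import uuid


class ToyIndex:
    """In-memory toy model of index-by-id semantics; NOT Elasticsearch itself."""

    def __init__(self):
        self.docs = {}  # id -> {"_version": int, "_source": dict}

    def put(self, doc_id, source):
        # PUT /index/_doc/<id>: create, or fully replace an existing document
        old = self.docs.get(doc_id)
        version = old["_version"] + 1 if old else 1
        self.docs[doc_id] = {"_version": version, "_source": dict(source)}
        return {"_id": doc_id, "_version": version, "result": "updated" if old else "created"}

    def post(self, source):
        # POST /index/_doc: the server picks the id (uuid4 is a stand-in here)
        return self.put(uuid.uuid4().hex, source)


idx = ToyIndex()
print(idx.put("1", {"name": "John Doe", "age": 20}))
print(idx.put("1", {"name": "Jane Doe"}))  # replace: "age" is gone, version is now 2
print(idx.docs["1"])
print(idx.post({"name": "John Doe"})["result"])
```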
Updating Documents
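Under the hood, Elasticsearch does not update a document in place: it deletes the old document and indexes a new one with the update applied, in one operation.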
curl -X POST "http://localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d'
{
  "doc": { "name": "Jane Doe" }
}
'
curl -X POST "http://localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d'
{
  "doc": { "name": "Jane Doe", "age": 20 }
}
'
Updates can also use a script:
curl -X POST "http://localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d'
{
  "script" : "ctx._source.age += 5"
}
'
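Unlike indexing with PUT, which replaces the whole document, an update with `"doc"` only touches the fields it names. A toy Python model of the three update calls above, assuming flat documents (nested objects are not handled) and using plain Python in place of the Painless script; both helper names are hypothetical:

```python
import copy


def apply_partial_doc(source, doc):
    """Toy "doc" update: fields in `doc` overwrite or extend `source`.

    Simplification: only top-level fields, enough for these flat examples.
    """
    updated = copy.deepcopy(source)
    updated.update(doc)
    return updated


def apply_age_script(source, delta=5):
    # Python stand-in for the Painless script "ctx._source.age += 5"
    updated = copy.deepcopy(source)
    updated["age"] += delta
    return updated


src = {"name": "John Doe"}
src = apply_partial_doc(src, {"name": "Jane Doe"})
src = apply_partial_doc(src, {"name": "Jane Doe", "age": 20})
src = apply_age_script(src)
print(src)
```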
Deleting Documents
curl -X DELETE "http://localhost:9200/customer/_doc/2?pretty"
Batch Processing
The _bulk API executes many operations in one batch, so you don't have to call the API once per operation.
Index two documents in one request:
curl -X POST "http://localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
'
Update the first document, then delete the second:
curl -X POST "http://localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'
If an individual action fails, the remaining actions still run; the response reports the status of each
action, in the order the actions were sent.
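The bulk body is newline-delimited JSON: one action line, optionally followed by one source line, and the whole body must end with a newline. A Python sketch that serializes the update/delete example above; `bulk_body` is a hypothetical helper:

```python
import json


def bulk_body(actions):
    """Serialize (action, source) pairs into the newline-delimited _bulk body.

    `source` is None for actions such as delete that have no second line.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        if source is not None:
            lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ({"update": {"_id": "1"}}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ({"delete": {"_id": "2"}}, None),
])
print(body, end="")
```

This is also why the large import below uses `--data-binary`: curl's `-d @file` strips newlines from the file, which breaks the bulk format.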
Next, try importing a large batch of data: download-link
curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"
Then check the result with:
curl -X GET "http://localhost:9200/_cat/indices?v"
Searching
endpoint is _search
Return all documents in the bank index:
curl -X GET "http://localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty"
q=* instructs Elasticsearch to match all documents in the index.
sort=account_number:asc indicates to sort the results by the account_number field, in ascending order.
pretty just tells Elasticsearch to return pretty-printed JSON results.
The content we got:
{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : null,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],
      "_score" : null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}
took - time in milliseconds for Elasticsearch to execute the search
timed_out - tells us whether the search timed out
_shards - tells us how many shards were searched, as well as a count of the
successful/failed searched shards
hits - search results
hits.total - total number of documents matching our search criteria
hits.hits - actual array of search results (defaults to the first 10 documents)
hits.sort - sort key for results (missing if sorting by score)
hits._score and max_score - ignore these fields for now
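Reading the fields described above from code. A Python sketch over a trimmed copy of the sample response (only the fields it reads); `summarize` is a hypothetical helper, and it also accepts the object form `{"value": ..., "relation": ...}` that Elasticsearch 7.0 and later use for `hits.total`:

```python
import json

# Trimmed copy of the sample response above.
raw = """
{
  "took": 63,
  "timed_out": false,
  "hits": {
    "total": 1000,
    "max_score": null,
    "hits": [
      {"_id": "0", "sort": [0], "_score": null, "_source": {"account_number": 0, "balance": 16623}},
      {"_id": "1", "sort": [1], "_score": null, "_source": {"account_number": 1, "balance": 39225}}
    ]
  }
}
"""


def summarize(response):
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # 7.0+: {"value": ..., "relation": ...}
        total = total["value"]
    rows = [(h["_id"], h.get("sort"), h["_source"]["balance"]) for h in hits["hits"]]
    return total, rows


total, rows = summarize(json.loads(raw))
print(total)
print(rows)
```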
We can also get the same results with the request-body method:
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'
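Unlike a typical SQL database, Elasticsearch is completely done with a request once it returns the results:
it keeps no server-side cursor that you could use to page through the rest of them.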
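Return a specific range of results with from and size (from is the 0-based offset of the first hit, size is how many hits to return):
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "sort" : [
    { "account_number": "asc" }
  ],
  "from": 10,
  "size": 10
}
'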
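`from` and `size` are usually derived from a page number. A Python sketch, assuming 1-based pages; `page_params` is hypothetical, and the 10000 limit mirrors the default of the `index.max_result_window` setting, beyond which Elasticsearch rejects from/size paging (`search_after` or the scroll API are the alternatives for deep pages):

```python
def page_params(page, per_page=10, max_result_window=10_000):
    """Translate a 1-based page number into from/size for a search body."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    offset = (page - 1) * per_page
    # from + size may not exceed index.max_result_window (default 10000)
    if offset + per_page > max_result_window:
        raise ValueError("from + size exceeds index.max_result_window")
    return {"from": offset, "size": per_page}


# Page 2 with 10 hits per page reproduces the request body above.
body = {"sort": [{"account_number": "asc"}], **page_params(2)}
print(body)
```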
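Return the top 10 accounts by balance (sort by balance, descending; size defaults to 10):
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "sort": { "balance": { "order": "desc" } }
}
'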
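Return only some fields (account_number and balance) from _source:
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "_source": ["account_number", "balance"]
}
'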
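match query: find the record whose account_number is 20
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "account_number": 20 } }
}
'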
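match query: find all records whose address contains "mill"
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill" } }
}
'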
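match query: find all records whose address contains "mill" or "lane"
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "address": "mill lane" } }
}
'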
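match_phrase query: find all records whose address contains the phrase "mill lane" (both words, next to each other and in that order)
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
'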
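Why `"mill lane"` behaves differently in the `match` and `match_phrase` examples above: `match` analyzes the query into terms and, with the default `or` operator, matches documents containing any of them, while `match_phrase` requires the terms to be adjacent and in order. A simplified Python model of just that difference; the tokenizer is a rough stand-in for the standard analyzer, scoring is ignored, the helper names are hypothetical, and two of the addresses are made up:

```python
import re


def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def match(field_text, query):
    """match with the default OR operator: any query term appears in the field."""
    terms = set(tokenize(field_text))
    return any(t in terms for t in tokenize(query))


def match_phrase(field_text, query):
    """match_phrase: the query terms appear next to each other, in order."""
    tokens, phrase = tokenize(field_text), tokenize(query)
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


# The first two addresses are made up; the last two come from the sample response.
addresses = ["990 Mill Road", "198 Mill Lane", "244 Columbus Place", "880 Holmes Lane"]
print([a for a in addresses if match(a, "mill lane")])
print([a for a in addresses if match_phrase(a, "mill lane")])
```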
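bool + must + match query: find all records whose address contains both "mill" and "lane"
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'
This way you can combine many search conditions.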
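bool + should + match query: find all records whose address contains "mill" or "lane"
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'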
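bool + must_not + match query: find all records whose address contains neither "mill" nor "lane"
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
'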
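The three bool examples above differ only in the clause type. A toy Python model of which documents each one returns, with single terms in place of `match` clauses and scoring ignored; `bool_matches` and the sample documents are hypothetical. It encodes one rule worth remembering: when a bool has no `must` (or `filter`) clause, at least one `should` clause has to match.

```python
def bool_matches(doc_terms, must=(), should=(), must_not=()):
    """Toy model of whether a bool query returns a document (no scoring).

    Each clause is one term tested for membership in doc_terms. With no
    must clauses, at least one should clause must match; when must clauses
    exist, should only affects scoring, which this toy ignores.
    """
    if any(t not in doc_terms for t in must):
        return False
    if any(t in doc_terms for t in must_not):
        return False
    if should and not must:
        return any(t in doc_terms for t in should)
    return True


docs = {  # made-up sample documents: address -> its terms
    "198 Mill Lane": {"198", "mill", "lane"},
    "990 Mill Road": {"990", "mill", "road"},
    "244 Columbus Place": {"244", "columbus", "place"},
}
both = [a for a, t in docs.items() if bool_matches(t, must=["mill", "lane"])]
either = [a for a, t in docs.items() if bool_matches(t, should=["mill", "lane"])]
neither = [a for a, t in docs.items() if bool_matches(t, must_not=["mill", "lane"])]
print(both, either, neither)
```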
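A bool query can be nested inside another bool query, and must, must_not, and should can be used side by side in the same bool.
Find everyone who is 40 years old but does not live in ID (Idaho):
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'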
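Search plus a numeric filter: find records whose balance is between 20000 and 30000, inclusive
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
'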
Search + Aggregations
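Group all the accounts by state, then return the top 10 states (the terms aggregation's default size)
sorted by count, descending. "size": 0 suppresses the search hits so that only the aggregation results are returned:
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}
'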
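Count the records in each state and compute each state's average balance, then return the top 10 states:
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'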
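Building on the previous example, sort the states by average balance, descending. The order refers to the
sub-aggregation by its name, average_balance, so the sub-aggregation must be named that:
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
'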
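What the two terms/avg aggregations above compute can be modelled in a few lines of Python. A toy sketch on made-up rows, not the real `bank` data; `terms_with_avg` is hypothetical, the average is flattened into the bucket (Elasticsearch nests it as `"average_balance": {"value": ...}`), and ties are broken by key here only to keep the output deterministic:

```python
from collections import defaultdict


def terms_with_avg(docs, group_field, avg_field, size=10, order_by_avg=False):
    """Toy terms aggregation with an avg sub-aggregation named average_balance.

    Buckets are sorted by doc_count (default) or by the average, descending,
    and only the top `size` buckets are kept.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[avg_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "average_balance": sum(vals) / len(vals)}
        for key, vals in groups.items()
    ]
    sort_field = "average_balance" if order_by_avg else "doc_count"
    buckets.sort(key=lambda b: (-b[sort_field], b["key"]))
    return buckets[:size]


accounts = [  # made-up sample rows
    {"state": "TX", "balance": 10000},
    {"state": "TX", "balance": 30000},
    {"state": "MD", "balance": 45000},
    {"state": "ID", "balance": 5000},
    {"state": "ID", "balance": 7000},
    {"state": "ID", "balance": 9000},
]
by_count = terms_with_avg(accounts, "state", "balance")
by_avg = terms_with_avg(accounts, "state", "balance", order_by_avg=True)
print([(b["key"], b["doc_count"]) for b in by_count])
print([(b["key"], b["average_balance"]) for b in by_avg])
```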
This example demonstrates how we can group by age brackets (ages 20-29, 30-39, and
40-49), then by gender, and then finally get the average account balance, per age bracket,
per gender:
curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 20, "to": 30 },
          { "from": 30, "to": 40 },
          { "from": 40, "to": 50 }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}
'
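The age brackets follow the range aggregation's rule that `from` is inclusive and `to` is exclusive, so an age of 30 lands in the 30-40 bracket and 50 in none. A toy Python model of the nested range/terms/avg grouping above, on made-up rows; the helper names are hypothetical:

```python
from collections import defaultdict

RANGES = [(20, 30), (30, 40), (40, 50)]  # same brackets as the query above


def bracket(age):
    # range aggregation semantics: "from" is inclusive, "to" is exclusive
    for low, high in RANGES:
        if low <= age < high:
            return (low, high)
    return None  # ages outside every range fall into no bucket


def avg_balance_by_age_and_gender(accounts):
    sums = defaultdict(lambda: [0, 0])  # (bracket, gender) -> [total, count]
    for acc in accounts:
        b = bracket(acc["age"])
        if b is None:
            continue
        entry = sums[(b, acc["gender"])]
        entry[0] += acc["balance"]
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


accounts = [  # made-up sample rows
    {"age": 29, "gender": "F", "balance": 16000},
    {"age": 30, "gender": "F", "balance": 20000},
    {"age": 32, "gender": "M", "balance": 40000},
    {"age": 39, "gender": "M", "balance": 30000},
    {"age": 55, "gender": "F", "balance": 99999},  # outside all brackets
]
result = avg_balance_by_age_and_gender(accounts)
print(result)
```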
#3 Mapping with Zero Downtime
link
#3 Amazon Elasticsearch Service
link
#2 Video
#3 Upgrading Your Elastic Stack
link
6.2 - Upgrade Elasticsearch
The full, detailed upgrade instructions can be found here: https://www.elastic.co/products/upgrade_guide
#2 Book
#3 Learning Elastic Stack 6.0
#2 Repo
#3 Searchkick
link
#1 Other
Count how many lines of code a project has
link
cloc .
#2 Topic study
#3 Index
Elasticsearch stores its data in one or more indices.
An index is similar to a database.
Elasticsearch uses the Apache Lucene library to write and read the data in an index.
A single Elasticsearch index may be built from more than one Apache Lucene index, by
using shards and replicas.
