
#1 ElasticSearch

2018.4.17

#2 Slide/Conf

#3 Elasticsearch for beginners

link

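Created by Shay Banon in 2010; written in Java.

Easy to Scale
RESTful API: can be used from any programming language
Per-operation Persistence: low risk of data loss
Excellent Query DSL: requests and responses are JSON, and the DSL is well designed
Multi-tenancy (not understood yet)
Support for advanced search features (Full Text)
..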

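Basic concepts

Cluster
Node: ideally one node per server, although that no longer seems to be the rule
Index: similar to a database
Type: similar to a table
Document: a JSON document, similar to a row in a table. Every document is a JSON object.
Each document is stored in an index and has a type, an id, and key-value fields.
Field: similar to a column
Mapping: similar to a schema definition
Shard (not understood yet)
Primary Shard (not understood yet)
Replica Shard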

ElasticSearch Routing

Without a routing value, Elasticsearch has no idea where to look for your document. The
docs are distributed around your cluster by a hash of their routing value (the document id
by default), so Elasticsearch has no choice but to broadcast the request to all shards.
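
As a rough illustration of why a routing value saves work, the sketch below mimics a simplified form of the rule described in the Elasticsearch documentation, shard = hash(routing) % number_of_primary_shards. It is only a toy model: `zlib.crc32` stands in for the Murmur3 hash that Elasticsearch uses, and the function names `pick_shard` and `shards_to_query` are made up for these notes.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # the default mentioned later in these notes


def pick_shard(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a routing value (the document id by default) to one shard."""
    # crc32 is only a stand-in here; Elasticsearch uses a Murmur3 hash.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def shards_to_query(routing=None, num_shards: int = NUM_PRIMARY_SHARDS):
    """Without a routing value, every shard has to be asked."""
    if routing is None:
        return list(range(num_shards))
    return [pick_shard(routing, num_shards)]


print(shards_to_query())                # [0, 1, 2, 3, 4]
print(len(shards_to_query("user-42")))  # 1
```

The same routing value always lands on the same shard, which is what lets a routed request skip the broadcast.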

#4 Searching

Searching and querying take the format of:

http://localhost:9200/[index]/[type]/[operation]

Search across all indexes and all types:
http://localhost:9200/_search

Search all types in the test-data index:
http://localhost:9200/test-data/_search

Search explicitly for documents of type cities within the test-data index:
http://localhost:9200/test-data/cities/_search

Search explicitly for documents of type cities within the test-data index using paging:
http://localhost:9200/test-data/cities/_search?size=5&from=10
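
The paging URL above can be derived from a page number. This is a minimal sketch; the helper name `search_url` and the 1-based page convention are assumptions for illustration, not part of Elasticsearch.

```python
from urllib.parse import urlencode


def search_url(index: str, page: int, per_page: int,
               base: str = "http://localhost:9200") -> str:
    """Build a _search URL for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    # "from" is the number of hits to skip, "size" the number to return.
    params = {"size": per_page, "from": (page - 1) * per_page}
    return f"{base}/{index}/_search?{urlencode(params)}"


print(search_url("test-data/cities", page=3, per_page=5))
# http://localhost:9200/test-data/cities/_search?size=5&from=10
```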

Ranking

Weight: give some data (here, port) a slightly higher weight. The right value is hard to decide.

#3 Using Elasticsearch with Rails

link

Shard - a single Lucene instance. It is a low-level "worker" unit which is managed
automatically by Elasticsearch. An index is a logical namespace which points to
primary and replica shards.
Primary Shard - Each document is stored in a single primary shard. When you index a
document, it is indexed first on the primary shard, then on all replicas of the primary
shard. By default, an index has 5 primary shards. You can specify fewer or more
primary shards to scale the number of documents that your index can handle.
Replica Shard - Each primary can have zero or more replicas. A replica is a copy of the
primary shard, and has two purposes:
1. increase failover
2. increase performance
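
The write order and the two purposes of replicas can be pictured with a toy in-memory model. Everything here (`ShardGroup`, `total_shards`) is hypothetical and only mirrors the description above; it is not how Elasticsearch is implemented.

```python
class ShardGroup:
    """One primary shard plus its replica copies (toy model)."""

    def __init__(self, replicas: int):
        self.primary = {}
        self.replicas = [{} for _ in range(replicas)]

    def index(self, doc_id: str, doc: dict) -> None:
        # Write to the primary first, then copy to every replica.
        self.primary[doc_id] = doc
        for replica in self.replicas:
            replica[doc_id] = dict(doc)

    def read(self, doc_id: str, copy: int = 0) -> dict:
        # Any copy can serve a read: 0 is the primary, 1.. are replicas.
        copies = [self.primary] + self.replicas
        return copies[copy][doc_id]


def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Number of shard copies the cluster has to host for one index."""
    return primaries * (1 + replicas_per_primary)


group = ShardGroup(replicas=1)
group.index("1", {"name": "John Doe"})
print(group.read("1", copy=1))  # {'name': 'John Doe'}
print(total_shards(5, 1))       # 10
```

Reading from a replica shows both purposes at once: the copy survives if the primary is lost, and reads can be spread over more than one copy.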

#3 Ruby Conf 2013 Elasticsearch With Ruby

link

#3 Using Elasticsearch with Rails Apps by Brian Gugliemetti

link

#2 Blog

#3 Asynchronous Elasticsearch bulk reindexing with Rails, Searchkick and Sidekiq

link

An approach to asynchronous reindexing.

#3 Configuring Elasticsearch On Rails

link

#2 Video

#3 65 Searchkick and Elasticsearch

link

#2 Wiki

#3 Install

link

The cluster name and node name can be set at startup:

./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name

Get health status: GET /_cat/health?v

curl -X GET "http://localhost:9200/_cat/health?v"


List all nodes: GET /_cat/nodes?v

curl -X GET "http://localhost:9200/_cat/nodes?v"

List all indices: GET /_cat/indices?v

curl -X GET "http://localhost:9200/_cat/indices?v"

Create an index: PUT /customer?pretty

curl -X PUT "http://localhost:9200/customer?pretty"

Put something into our customer index: PUT /customer/_doc/1?pretty

curl -X PUT "http://localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d '{ "name": "John Doe" }'

You do not have to create the index manually before indexing; Elasticsearch creates it automatically.

Retrieve the document we just indexed: GET /customer/_doc/1?pretty

curl -X GET "http://localhost:9200/customer/_doc/1?pretty"

Delete an index: DELETE /customer?pretty

curl -X DELETE "http://localhost:9200/customer?pretty"

The Pattern we use with Elasticsearch

<REST Verb> /<Index>/<Type>/<ID>

Create a document with an explicit document id:

curl -X PUT "http://localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d '{ "name": "Jone Doe" }'

Indexing again under the same id keeps the id, increments the version by 1, and changes the
content (note that this is a replace, not an update).

Using a new id creates the document under that new id.

To create a document without specifying an id, use POST; a long random id is generated for us:

curl -X POST "http://localhost:9200/customer/_doc?pretty" -H 'Content-Type: application/json' -d '{ "name": "John Doe" }'

Updating Documents

Under the hood, updating a document deletes the old document and indexes a new one.

curl -X POST "http://localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d'
{
"doc": { "name": "Jane Doe" }
}
'

curl -X POST "http://localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d'
{
"doc": { "name": "Jane Doe", "age": 20 }
}
'

Updates can also use a script:

curl -X POST "http://localhost:9200/customer/_doc/1/_update?pretty" -H 'Content-Type: application/json' -d'
{
"script" : "ctx._source.age += 5"
}
'
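
To keep the difference between replacing (PUT with an existing id) and updating (`_update` with a `doc`) straight, here is a small in-memory model. `ToyIndex` is a hypothetical class written for these notes; it only imitates the version bump and the merge of flat fields, and it does not cover scripted updates.

```python
class ToyIndex:
    """In-memory stand-in for one index (not the real storage model)."""

    def __init__(self):
        self.docs = {}  # id -> {"_version": int, "_source": dict}

    def put(self, doc_id, source):
        """PUT /index/_doc/<id>: replace the whole document."""
        version = self.docs.get(doc_id, {"_version": 0})["_version"] + 1
        self.docs[doc_id] = {"_version": version, "_source": dict(source)}
        return self.docs[doc_id]

    def update(self, doc_id, partial):
        """POST .../_update with {"doc": ...}: merge, then index again."""
        merged = {**self.docs[doc_id]["_source"], **partial}
        return self.put(doc_id, merged)


idx = ToyIndex()
idx.put("1", {"name": "John Doe"})
idx.update("1", {"name": "Jane Doe", "age": 20})
print(idx.docs["1"])
# {'_version': 2, '_source': {'name': 'Jane Doe', 'age': 20}}
idx.put("1", {"name": "Jone Doe"})
print(idx.docs["1"])
# {'_version': 3, '_source': {'name': 'Jone Doe'}}
```

The last PUT drops the `age` field because a replace keeps nothing from the old document, while the update kept every field it did not mention.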

Deleting Documents

curl -X DELETE "http://localhost:9200/customer/_doc/2?pretty"

Batch Processing

A batch runs many operations in a single request, so the API does not have to be called once per operation.

Index two documents in one request:

curl -X POST "http://localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
'

Update the first document and delete the second document:

curl -X POST "http://localhost:9200/customer/_doc/_bulk?pretty" -H 'Content-Type: application/json' -d'
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'

If some of the actions fail, the other actions are still executed; the response reports the
result of every individual action.
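
The bulk body is newline-delimited JSON: one action line, optionally followed by one source line. The sketch below builds such a body and picks failed actions out of a response. The helper names are made up, and `sample_response` is hand-written in the general shape of a bulk response rather than captured from a real cluster.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, payload) triples as NDJSON.

    payload is None for actions without a source line (delete).
    """
    lines = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}))
        if payload is not None:
            lines.append(json.dumps(payload))
    # The bulk API expects a newline after the last line as well.
    return "\n".join(lines) + "\n"


def failed_items(response):
    """Pick the per-action results that carry an "error" entry."""
    return [result for item in response.get("items", [])
            for result in item.values() if "error" in result]


body = bulk_body([
    ("update", {"_id": "1"}, {"doc": {"name": "John Doe becomes Jane Doe"}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")

# Hand-written example response (an assumption, not real output).
sample_response = {"errors": True, "items": [
    {"update": {"_id": "1", "status": 200}},
    {"index": {"_id": "3", "status": 400,
               "error": {"type": "mapper_parsing_exception"}}},
]}
print([r["_id"] for r in failed_items(sample_response)])  # ['3']
```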

Next, try importing a large batch of data: download-link

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

Then check the result by listing the indices:

curl -X GET "http://localhost:9200/_cat/indices?v"

Searching

The search endpoint is _search.

Return all documents in the bank index:

curl -X GET "http://localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty"

q=* instructs Elasticsearch to match all documents in the index.
sort=account_number:asc sorts the results by the account_number field in ascending order.
pretty tells Elasticsearch to return pretty-printed JSON results.

The response we got:

{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : null,
    "hits" : [ {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "0",
      "sort": [0],
      "_score" : null,
      "_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
    }, {
      "_index" : "bank",
      "_type" : "_doc",
      "_id" : "1",
      "sort": [1],
      "_score" : null,
      "_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
    }, ...
    ]
  }
}

took - time in milliseconds for Elasticsearch to execute the search
timed_out - tells us if the search timed out or not
_shards - tells us how many shards were searched, as well as a count of the
successful/failed searched shards
hits - search results
hits.total - total number of documents matching our search criteria
hits.hits - actual array of search results (defaults to first 10 documents)
hits.sort - sort key for results (missing if sorting by score)
hits._score and max_score - ignore these fields for now
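
A short sketch of reading those fields from a client. The response below is a trimmed, hand-shortened version of the one shown in these notes (where `hits.total` is a plain number), and `summarize` is a hypothetical helper.

```python
import json

# Trimmed-down response in the shape shown above (values abbreviated).
raw = """
{
  "took": 63,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 1000,
    "max_score": null,
    "hits": [
      {"_id": "0", "sort": [0], "_source": {"account_number": 0, "balance": 16623}},
      {"_id": "1", "sort": [1], "_source": {"account_number": 1, "balance": 39225}}
    ]
  }
}
"""


def summarize(response: dict) -> dict:
    """Pull out the fields described in the list above."""
    shards = response["_shards"]
    return {
        "took_ms": response["took"],
        # Treat the result as complete only if nothing timed out or failed.
        "complete": not response["timed_out"] and shards["failed"] == 0,
        "total": response["hits"]["total"],
        "sources": [hit["_source"] for hit in response["hits"]["hits"]],
    }


summary = summarize(json.loads(raw))
print(summary["total"], summary["complete"])       # 1000 True
print([s["balance"] for s in summary["sources"]])  # [16623, 39225]
```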

The request body method returns the same results:

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
"query": { "match_all": {} },
"sort": [
{ "account_number": "asc" }
]
}
'

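Unlike a typical SQL database, Elasticsearch is completely done with a request once it has
returned the results; it keeps no server-side cursor for you to page through.

Return a specific slice of the results with from and size:

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '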
{
"query": { "match_all": {} },
"sort" : [
{ "account_number": "asc" }
],
"from": 10,
"size": 10
}
'

Return the top 10 accounts by balance (size defaults to 10):

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": { "match_all": {} },
"sort": { "balance": { "order": "desc" } }
}
'

Return only some fields (account_number and balance):

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": { "match_all": {} },
"_source": ["account_number", "balance"]
}
'

match search: find the record whose account_number is 20:

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": { "match": { "account_number": 20 } }
}
'

match search: find all records whose address contains "mill":

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": { "match": { "address": "mill" } }
}
'

match search: find all records whose address contains "mill" or "lane":

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": { "match": { "address": "mill lane" } }
}
'

match_phrase search: find all records whose address contains the phrase "mill lane" (an exact
match of the whole phrase, including the space):

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": { "match_phrase": { "address": "mill lane" } }
}
'
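
The difference between match and match_phrase can be imitated on plain strings. This is a deliberately crude sketch: `tokens` only lowercases and splits on whitespace (the real analyzer does more), the two functions ignore scoring entirely, and the first two addresses are made-up sample data.

```python
def tokens(text: str):
    """Very rough analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(field_value: str, query: str) -> bool:
    """match: true if ANY query term appears in the field."""
    terms = set(tokens(field_value))
    return any(term in terms for term in tokens(query))


def match_phrase(field_value: str, query: str) -> bool:
    """match_phrase: query terms must appear adjacent and in order."""
    doc, phrase = tokens(field_value), tokens(query)
    return any(doc[i:i + len(phrase)] == phrase
               for i in range(len(doc) - len(phrase) + 1))


addresses = ["990 Mill Road", "198 Mill Lane", "880 Holmes Lane"]
print([a for a in addresses if match(a, "mill lane")])
# ['990 Mill Road', '198 Mill Lane', '880 Holmes Lane']
print([a for a in addresses if match_phrase(a, "mill lane")])
# ['198 Mill Lane']
```
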
bool + must + match search: find all records whose address contains both "mill" and "lane":

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
'

Many search conditions can be combined this way.

bool + should + match search: find all records whose address contains "mill" or "lane":

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": {
"bool": {
"should": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
'

bool + must_not + match search: find all records whose address contains neither "mill" nor "lane":

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": {
"bool": {
"must_not": [
{ "match": { "address": "mill" } },
{ "match": { "address": "lane" } }
]
}
}
}
'

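A bool query can be nested inside another bool query, and must, must_not, and should can all be
used side by side.

Find the accounts of people aged 40 whose state is not ID (Idaho):

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '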
{
"query": {
"bool": {
"must": [
{ "match": { "age": "40" } }
],
"must_not": [
{ "match": { "state": "ID" } }
]
}
}
}
'
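
How must, should, and must_not combine can be sketched as a small predicate builder. This only models which documents match, not how they are scored; `contains` and `bool_query` are hypothetical helpers, and the sample accounts are made up.

```python
def contains(field, term):
    """Leaf clause: does doc[field] contain the term (case-insensitive)?"""
    return lambda doc: term.lower() in str(doc.get(field, "")).lower().split()


def bool_query(must=(), should=(), must_not=()):
    """Combine clauses roughly the way a bool query decides on a match."""
    def matches(doc):
        if not all(clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        # With no must clauses, at least one should clause has to match.
        if should and not must:
            return any(clause(doc) for clause in should)
        return True
    return matches


accounts = [
    {"age": 40, "state": "ID"},
    {"age": 40, "state": "CO"},
    {"age": 29, "state": "CO"},
]
query = bool_query(must=[contains("age", "40")],
                   must_not=[contains("state", "ID")])
print([a for a in accounts if query(a)])  # [{'age': 40, 'state': 'CO'}]
```

Because `bool_query` returns a clause with the same call shape as `contains`, one bool can be passed as a clause of another, which mirrors nesting.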

Search plus a plain numeric filter: find the records with a balance between 20000 and 30000 (inclusive):

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": {
"range": {
"balance": {
"gte": 20000,
"lte": 30000
}
}
}
}
}
}
'

Search + Aggregations

Group all the accounts by state, and return the top 10 states sorted by count descending
(that is, count the records in each state and return the top 10):

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
}
}
}
}
'

Count the records in each state, compute the average balance per state, and return the top 10:

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
'

Building on the previous query, sort the buckets by average balance:

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"size": 0,
"aggs": {
"group_by_state": {
"terms": {
"field": "state.keyword",
"order": {
"average_balance": "desc"
}
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
'
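
For intuition, the terms aggregation with a nested avg and an order clause does roughly what the following plain Python does on a list of accounts. This is an in-memory analogy, not how Elasticsearch computes it; the function name and the third sample account are made up.

```python
from collections import defaultdict


def group_by_state(accounts, top=10):
    """Bucket accounts by state; return count and average balance per bucket."""
    balances = defaultdict(list)
    for account in accounts:
        balances[account["state"]].append(account["balance"])
    buckets = [
        {"key": state, "doc_count": len(vals),
         "average_balance": sum(vals) / len(vals)}
        for state, vals in balances.items()
    ]
    # Mirror "order": {"average_balance": "desc"} and the default of 10 buckets.
    buckets.sort(key=lambda b: b["average_balance"], reverse=True)
    return buckets[:top]


accounts = [
    {"state": "CO", "balance": 16623},
    {"state": "IL", "balance": 39225},
    {"state": "CO", "balance": 20000},
]
for bucket in group_by_state(accounts):
    print(bucket)
# {'key': 'IL', 'doc_count': 1, 'average_balance': 39225.0}
# {'key': 'CO', 'doc_count': 2, 'average_balance': 18311.5}
```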

This example demonstrates how we can group by age brackets (ages 20-29, 30-39, and
40-49), then by gender, and then finally get the average account balance, per age bracket,
per gender:

curl -X GET "http://localhost:9200/bank/_search" -H 'Content-Type: application/json' -d '
{
"size": 0,
"aggs": {
"group_by_age": {
"range": {
"field": "age",
"ranges": [
{
"from": 20,
"to": 30
},
{
"from": 30,
"to": 40
},
{
"from": 40,
"to": 50
}
]
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender.keyword"
},
"aggs": {
"average_balance": {
"avg": {
"field": "balance"
}
}
}
}
}
}
}
}
'
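
The nested bucketing in this query can be pictured in plain Python as well. As in the range aggregation, "from" is inclusive and "to" is exclusive. The bracket labels, the function name, and the third sample account are assumptions made for illustration.

```python
from collections import defaultdict

BRACKETS = [(20, 30), (30, 40), (40, 50)]  # "from" inclusive, "to" exclusive


def average_balance_by_age_and_gender(accounts):
    """Average balance per (age bracket, gender) pair."""
    sums = defaultdict(lambda: [0, 0])  # (bracket, gender) -> [total, count]
    for account in accounts:
        for low, high in BRACKETS:
            if low <= account["age"] < high:
                cell = sums[(f"{low}-{high}", account["gender"])]
                cell[0] += account["balance"]
                cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


accounts = [
    {"age": 29, "gender": "F", "balance": 16623},
    {"age": 32, "gender": "M", "balance": 39225},
    {"age": 30, "gender": "M", "balance": 10000},
]
print(average_balance_by_age_and_gender(accounts))
# {('20-30', 'F'): 16623.0, ('30-40', 'M'): 24612.5}
```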

#3 Mapping with Zero Downtime

link

#3 Amazon Elasticsearch Service

link

#2 Video

#3 Upgrading Your Elastic Stack

link
6.2 - Upgrade Elasticsearch

The complete, detailed upgrade steps can be found here: https://www.elastic.co/products/upgrade_guide

#2 Book

#3 Learning Elastic Stack 6.0

#2 Repo

#3 Searchkick

link

#1 Other
Count the lines of code in a project

link

cloc .

#2 Topic study

#3 Index

Elasticsearch stores its data in one or more indices.

An index is something similar to a database.

Elasticsearch uses the Apache Lucene library to write and read the data from the index.

A single Elasticsearch index may be built of more than a single Apache Lucene index, by
using shards and replicas.
